Benchmarks
zugbruecke performs reasonably well given its complexity: on modern hardware, a simple function call carries about 0.15 ms of overhead on average. Very complex function calls involving callback functions and memory synchronization can add an overhead of several milliseconds.
Note
zugbruecke is not yet optimized for speed. The inter-process communication via a multiprocessing connection adds overhead to every function call. Packing and unpacking pointers and structures for arguments and return values adds further overhead. Calls are slow in general, and the first call of an individual routine within a session is slower still because of the initialization it requires. Depending on the use case, it can be significantly faster to isolate the functionality that depends on DLL calls in a dedicated Python script and run it directly with a Windows Python interpreter under Wine, instead of going through zugbruecke. zugbruecke offers a Wine Python Environment for this purpose.
For comparison and overhead measurements, see the individual benchmarks.
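The overhead column in each table is the zugbruecke timing minus the ctypes timing of the same call. The following is a minimal sketch of that arithmetic, together with one way a per-call timing in microseconds could be obtained with `timeit`. The helper names `time_call_us` and `overhead_us` are hypothetical and not part of zugbruecke; the project's actual benchmark code may work differently.

```python
import timeit


def time_call_us(func, number=10_000):
    """Average wall-clock time of one call to func, in microseconds."""
    total_seconds = timeit.timeit(func, number=number)
    return total_seconds / number * 1e6


def overhead_us(ctypes_us, zugbruecke_us):
    """Per-call overhead: zugbruecke timing minus native ctypes timing."""
    return zugbruecke_us - ctypes_us


# Values taken from the first row of the first table (3.7.9, win32, cdll)
row_overhead = round(overhead_us(11.6, 201.0), 1)

# Timing an arbitrary callable; a no-op stands in for a real DLL call here
noop_us = time_call_us(lambda: None, number=1_000)
```

To measure a real routine, the same callable would be timed once through ctypes under Wine and once through zugbruecke, and the two results subtracted.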
version | arch | convention | ctypes [µs] | zugbruecke [µs] | overhead [µs] |
---|---|---|---|---|---|
3.7.9 | win32 | cdll | 11.6 | 201.0 | 189.4 |
3.7.9 | win32 | windll | 11.6 | 205.7 | 194.1 |
3.7.9 | win64 | cdll | 8.6 | 185.4 | 176.8 |
3.7.9 | win64 | windll | 8.4 | 186.7 | 178.3 |
3.8.10 | win32 | cdll | 10.1 | 192.4 | 182.3 |
3.8.10 | win32 | windll | 10.0 | 199.7 | 189.7 |
3.8.10 | win64 | cdll | 7.2 | 180.1 | 172.9 |
3.8.10 | win64 | windll | 7.1 | 179.7 | 172.6 |
3.9.13 | win32 | cdll | 9.4 | 195.1 | 185.7 |
3.9.13 | win32 | windll | 9.4 | 203.7 | 194.3 |
3.9.13 | win64 | cdll | 7.2 | 174.4 | 167.2 |
3.9.13 | win64 | windll | 7.1 | 180.3 | 173.2 |
3.10.9 | win32 | cdll | 9.8 | 194.8 | 185.0 |
3.10.9 | win32 | windll | 9.7 | 200.1 | 190.4 |
3.10.9 | win64 | cdll | 7.3 | 183.3 | 176.0 |
3.10.9 | win64 | windll | 7.3 | 180.3 | 173.0 |
3.11.1 | win32 | cdll | 9.4 | 190.8 | 181.4 |
3.11.1 | win32 | windll | 9.3 | 193.1 | 183.8 |
3.11.1 | win64 | cdll | 7.2 | 179.7 | 172.5 |
3.11.1 | win64 | windll | 7.2 | 176.7 | 169.5 |
The “memsync” benchmark, whose results are listed in the table above, is a basic test of bidirectional memory synchronization
via a `memsync` directive for a pointer argument,
an array of single-precision floating point numbers.
The benchmark uses 10 numbers per array.
The array is passed to the DLL function together with its length as a `c_int`.
The DLL function performs a classic bubblesort algorithm in place
on the passed / synchronized memory.
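To make the data layout concrete, here is a minimal sketch of the arguments described above, using only `ctypes` from the standard library. The function `bubblesort` is a pure-Python stand-in for the compiled DLL routine, and all names and sample values are assumptions for illustration. The real benchmark calls into a DLL through zugbruecke with a `memsync` directive, which this sketch does not reproduce.

```python
import ctypes


def bubblesort(values, length):
    """Pure-Python stand-in for the DLL routine: sorts the memory in place.

    values: ctypes.POINTER(ctypes.c_float), length: number of elements.
    """
    for i in range(length - 1):
        for j in range(length - 1 - i):
            if values[j] > values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]


# 10 single-precision numbers per array, as in the benchmark
data = [9.0, 3.5, 7.25, 1.0, 8.0, 2.5, 6.0, 0.5, 5.0, 4.0]
array = (ctypes.c_float * len(data))(*data)

# The routine receives a pointer to the array and the length as a c_int
pointer = ctypes.cast(array, ctypes.POINTER(ctypes.c_float))
length = ctypes.c_int(len(array))

bubblesort(pointer, length.value)
result = list(array)  # the original array now holds the sorted values
```

Because the routine writes through the pointer, the caller sees the sorted values in `array` afterwards. With zugbruecke, the pointed-to memory lives in another process, which is why it has to be synchronized in both directions.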
version | arch | convention | ctypes [µs] | zugbruecke [µs] | overhead [µs] |
---|---|---|---|---|---|
3.7.9 | win32 | cdll | 0.7 | 151.8 | 151.1 |
3.7.9 | win32 | windll | 0.7 | 143.5 | 142.8 |
3.7.9 | win64 | cdll | 0.6 | 132.1 | 131.5 |
3.7.9 | win64 | windll | 0.6 | 133.7 | 133.1 |
3.8.10 | win32 | cdll | 0.7 | 143.1 | 142.4 |
3.8.10 | win32 | windll | 0.7 | 139.2 | 138.5 |
3.8.10 | win64 | cdll | 0.5 | 128.4 | 127.9 |
3.8.10 | win64 | windll | 0.5 | 132.5 | 132.0 |
3.9.13 | win32 | cdll | 0.7 | 146.8 | 146.1 |
3.9.13 | win32 | windll | 0.7 | 147.3 | 146.6 |
3.9.13 | win64 | cdll | 0.5 | 130.4 | 129.9 |
3.9.13 | win64 | windll | 0.5 | 135.0 | 134.5 |
3.10.9 | win32 | cdll | 0.7 | 146.2 | 145.5 |
3.10.9 | win32 | windll | 0.7 | 146.1 | 145.4 |
3.10.9 | win64 | cdll | 0.5 | 139.1 | 138.6 |
3.10.9 | win64 | windll | 0.5 | 131.5 | 131.0 |
3.11.1 | win32 | cdll | 0.7 | 142.5 | 141.8 |
3.11.1 | win32 | windll | 0.7 | 142.5 | 141.8 |
3.11.1 | win64 | cdll | 0.5 | 131.8 | 131.3 |
3.11.1 | win64 | windll | 0.5 | 133.8 | 133.3 |
The “minimal” benchmark, whose results are listed in the table above, is a simple function call with
two `c_int` parameters and a single `c_int` return value.
The DLL function simply adds the two numbers and returns the result.
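The call signature described above can be sketched with `ctypes` alone. In this sketch a Python function wrapped in a `ctypes.CFUNCTYPE` prototype stands in for the DLL function, so the example runs without a DLL or Wine. The names `ADD_PROTOTYPE`, `add_ints` and `per_call_us` are hypothetical, and timings obtained this way are not comparable to the tables because the call goes through the ctypes callback machinery instead of a compiled routine.

```python
import ctypes
import timeit

# Prototype matching: int add_ints(int a, int b)
ADD_PROTOTYPE = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)


def _add(a, b):
    """Stand-in for the DLL function: adds the two numbers."""
    return a + b


# Wrap the Python function so it is called through a C function pointer
add_ints = ADD_PROTOTYPE(_add)

result = add_ints(3, 4)

# Average time per call in microseconds
number = 10_000
per_call_us = timeit.timeit(lambda: add_ints(3, 4), number=number) / number * 1e6
```

With a real DLL, the same argument and return types would be declared on the loaded routine through its `argtypes` and `restype` attributes.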
version | arch | convention | ctypes [µs] | zugbruecke [µs] | overhead [µs] |
---|---|---|---|---|---|
3.7.9 | win32 | cdll | 67.3 | 2,573.5 | 2,506.2 |
3.7.9 | win32 | windll | 66.8 | 2,578.1 | 2,511.3 |
3.7.9 | win64 | cdll | 49.9 | 2,301.9 | 2,252.0 |
3.7.9 | win64 | windll | 51.4 | 2,313.5 | 2,262.1 |
3.8.10 | win32 | cdll | 59.9 | 2,406.5 | 2,346.6 |
3.8.10 | win32 | windll | 60.5 | 2,402.5 | 2,342.0 |
3.8.10 | win64 | cdll | 45.2 | 2,166.9 | 2,121.7 |
3.8.10 | win64 | windll | 44.7 | 2,181.1 | 2,136.4 |
3.9.13 | win32 | cdll | 59.6 | 2,434.6 | 2,375.0 |
3.9.13 | win32 | windll | 59.8 | 2,415.7 | 2,355.9 |
3.9.13 | win64 | cdll | 45.9 | 2,187.5 | 2,141.6 |
3.9.13 | win64 | windll | 45.6 | 2,188.5 | 2,142.9 |
3.10.9 | win32 | cdll | 62.6 | 2,361.9 | 2,299.3 |
3.10.9 | win32 | windll | 62.5 | 2,439.2 | 2,376.7 |
3.10.9 | win64 | cdll | 47.0 | 2,177.9 | 2,130.9 |
3.10.9 | win64 | windll | 46.5 | 2,173.7 | 2,127.2 |
3.11.1 | win32 | cdll | 58.5 | 2,490.1 | 2,431.6 |
3.11.1 | win32 | windll | 58.6 | 2,347.9 | 2,289.3 |
3.11.1 | win64 | cdll | 42.7 | 2,113.6 | 2,070.9 |
3.11.1 | win64 | windll | 42.7 | 2,123.4 | 2,080.7 |
The “maximal” benchmark, whose results are listed in the table above, runs through everything that zugbruecke has to offer.
The DLL function takes three arguments: two pointers to structs and a function pointer.
The structs themselves contain pointers to memory of arbitrary length, which is handled by `memsync`.
The function pointer passes a reference to a callback function written in pure Python.
The callback takes a single pointer to a struct, again containing a pointer to memory of arbitrary length,
yet again handled by `memsync`, and returns a single integer.
The callback is invoked 9 times per DLL function call.
The test is based on a simple monochrome image filter where the DLL function iterates over every pixel
in a 3x3 pixel monochrome image while the filter’s kernel is provided by the callback function.
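The call structure described above can be sketched with `ctypes` alone. The struct layout (`Image`), the clamped 3x3 neighbourhood, the mean kernel and all names below are assumptions made for illustration. `apply_filter` is a pure-Python stand-in for the DLL function, and the real definitions live in the project's benchmark directory and may differ.

```python
import ctypes


class Image(ctypes.Structure):
    """Hypothetical struct: a pixel buffer of arbitrary length plus its shape."""
    _fields_ = [
        ("data", ctypes.POINTER(ctypes.c_int)),
        ("width", ctypes.c_int),
        ("height", ctypes.c_int),
    ]


# Callback prototype: int kernel(Image *neighbourhood)
KERNEL = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(Image))


def apply_filter(source, target, kernel):
    """Pure-Python stand-in for the DLL routine: one callback call per pixel."""
    src, dst = source.contents, target.contents
    width, height = src.width, src.height
    for y in range(height):
        for x in range(width):
            # Gather the 3x3 neighbourhood, clamping coordinates at the borders
            window = [
                src.data[min(max(y + dy, 0), height - 1) * width
                         + min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            buffer = (ctypes.c_int * 9)(*window)
            hood = Image(ctypes.cast(buffer, ctypes.POINTER(ctypes.c_int)), 3, 3)
            dst.data[y * width + x] = kernel(ctypes.pointer(hood))


call_log = []  # one entry per callback invocation


def mean_kernel(neighbourhood):
    """Callback written in pure Python: integer mean of the neighbourhood."""
    call_log.append(1)
    hood = neighbourhood.contents
    count = hood.width * hood.height
    return sum(hood.data[i] for i in range(count)) // count


source_buffer = (ctypes.c_int * 9)(0, 9, 18, 27, 36, 45, 54, 63, 72)
target_buffer = (ctypes.c_int * 9)()
source = Image(ctypes.cast(source_buffer, ctypes.POINTER(ctypes.c_int)), 3, 3)
target = Image(ctypes.cast(target_buffer, ctypes.POINTER(ctypes.c_int)), 3, 3)

apply_filter(ctypes.pointer(source), ctypes.pointer(target), KERNEL(mean_kernel))
filtered = list(target_buffer)
```

Each of the 9 pixels triggers one callback invocation. With zugbruecke, every such invocation crosses the process boundary and carries its own synchronized memory, which is consistent with this benchmark showing the largest overhead.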
Benchmarks were performed on an “AMD EPYC 7443P 24-Core Processor” CPU, Linux 5.15.0-56-generic 64bit, and Wine 7.17 (Staging).
zugbruecke was configured with `log_level` set to `0` (logs off) for minimal overhead. For the corresponding source code, both Python and C, check the benchmark directory of this project.