Benchmarks

zugbruecke performs reasonably well given its complexity: on modern hardware, a simple function call carries about 0.15 ms of overhead on average. Very complex function calls involving callback functions and memory synchronization can incur an overhead of several milliseconds.

Note

zugbruecke is not yet optimized for speed. The inter-process communication via multiprocessing connection adds overhead to every function call. Packing and unpacking pointers and structures for arguments and return values adds further overhead. Calls are slow in general, and the first call of an individual routine within a session is slower still because of the initialization it requires. Depending on the use case, it can be significantly faster to isolate the functionality that depends on DLL calls in a dedicated Python script and run it directly with a Windows Python interpreter under Wine instead of going through zugbruecke. zugbruecke offers a Wine Python Environment for this purpose.

For comparison and overhead measurements, see the individual benchmarks.

“memsync” benchmark, CPython 3.10.6 on linux, versions of CPython on Wine

version  arch   convention  ctypes [µs]  zugbruecke [µs]  overhead [µs]
-------  -----  ----------  -----------  ---------------  -------------
3.7.9    win32  cdll               11.6            201.0          189.4
3.7.9    win32  windll             11.6            205.7          194.1
3.7.9    win64  cdll                8.6            185.4          176.8
3.7.9    win64  windll              8.4            186.7          178.3
3.8.10   win32  cdll               10.1            192.4          182.3
3.8.10   win32  windll             10.0            199.7          189.7
3.8.10   win64  cdll                7.2            180.1          172.9
3.8.10   win64  windll              7.1            179.7          172.6
3.9.13   win32  cdll                9.4            195.1          185.7
3.9.13   win32  windll              9.4            203.7          194.3
3.9.13   win64  cdll                7.2            174.4          167.2
3.9.13   win64  windll              7.1            180.3          173.2
3.10.9   win32  cdll                9.8            194.8          185.0
3.10.9   win32  windll              9.7            200.1          190.4
3.10.9   win64  cdll                7.3            183.3          176.0
3.10.9   win64  windll              7.3            180.3          173.0
3.11.1   win32  cdll                9.4            190.8          181.4
3.11.1   win32  windll              9.3            193.1          183.8
3.11.1   win64  cdll                7.2            179.7          172.5
3.11.1   win64  windll              7.2            176.7          169.5

The “memsync” benchmark is a basic test of bidirectional memory synchronization via a memsync directive for a pointer argument, here an array of single-precision floating point numbers. The benchmark uses 10 numbers per array. The array is passed to the DLL function together with its length as a c_int. The DLL function performs a classic bubble sort in place on the passed and synchronized memory.
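To make the workload concrete, the following is a minimal pure-Python sketch of what the DLL routine is described as doing: an in-place bubble sort over a pointer to 10 single-precision floats plus a length. It is a stand-in for illustration only, not the project's C code, and it does not involve zugbruecke or a memsync directive. The name `bubblesort` and the sample values are assumptions.

```python
import ctypes


def bubblesort(values, length):
    """Sort the first `length` floats behind `values` in place.

    Hypothetical Python stand-in for the C routine in the benchmark DLL.
    """
    for i in range(length - 1):
        for j in range(length - 1 - i):
            if values[j] > values[j + 1]:
                values[j], values[j + 1] = values[j + 1], values[j]


# 10 single-precision floats, as in the benchmark (sample values are made up)
data = (ctypes.c_float * 10)(5.5, 2.0, 9.25, 0.5, 7.0, 3.0, 8.0, 1.0, 6.0, 4.0)

# Pass a pointer and a length, mirroring the (float *, int) call described above
pointer = ctypes.cast(data, ctypes.POINTER(ctypes.c_float))
bubblesort(pointer, len(data))

print(list(data))
```

Because the routine writes into the caller's memory, the sorted values have to travel back from the Wine side after the call, which is why the synchronization is bidirectional.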

“minimal” benchmark, CPython 3.10.6 on linux, versions of CPython on Wine

version  arch   convention  ctypes [µs]  zugbruecke [µs]  overhead [µs]
-------  -----  ----------  -----------  ---------------  -------------
3.7.9    win32  cdll                0.7            151.8          151.1
3.7.9    win32  windll              0.7            143.5          142.8
3.7.9    win64  cdll                0.6            132.1          131.5
3.7.9    win64  windll              0.6            133.7          133.1
3.8.10   win32  cdll                0.7            143.1          142.4
3.8.10   win32  windll              0.7            139.2          138.5
3.8.10   win64  cdll                0.5            128.4          127.9
3.8.10   win64  windll              0.5            132.5          132.0
3.9.13   win32  cdll                0.7            146.8          146.1
3.9.13   win32  windll              0.7            147.3          146.6
3.9.13   win64  cdll                0.5            130.4          129.9
3.9.13   win64  windll              0.5            135.0          134.5
3.10.9   win32  cdll                0.7            146.2          145.5
3.10.9   win32  windll              0.7            146.1          145.4
3.10.9   win64  cdll                0.5            139.1          138.6
3.10.9   win64  windll              0.5            131.5          131.0
3.11.1   win32  cdll                0.7            142.5          141.8
3.11.1   win32  windll              0.7            142.5          141.8
3.11.1   win64  cdll                0.5            131.8          131.3
3.11.1   win64  windll              0.5            133.8          133.3

The “minimal” benchmark is a simple function call with two c_int parameters and a single c_int return value. The DLL function simply adds the two numbers and returns the result.
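The overhead column in these tables is the zugbruecke timing minus the ctypes timing. As a small worked example, the sketch below recomputes it for the win64 / cdll rows of the “minimal” table and averages the result. The variable names are illustrative and the snippet is not part of the benchmark suite.

```python
from statistics import mean

# (ctypes µs, zugbruecke µs) per Wine-side CPython version,
# copied from the win64 / cdll rows of the "minimal" table
minimal_win64_cdll = {
    "3.7.9": (0.6, 132.1),
    "3.8.10": (0.5, 128.4),
    "3.9.13": (0.5, 130.4),
    "3.10.9": (0.5, 139.1),
    "3.11.1": (0.5, 131.8),
}

# overhead = zugbruecke - ctypes, rounded to the table's precision
overheads = {
    version: round(zugbruecke_us - ctypes_us, 1)
    for version, (ctypes_us, zugbruecke_us) in minimal_win64_cdll.items()
}

mean_overhead_ms = mean(overheads.values()) / 1000.0

for version, overhead_us in overheads.items():
    print(f"{version:>7}: {overhead_us:6.1f} µs")
print(f"mean overhead: {mean_overhead_ms:.3f} ms")
```

For these rows the mean comes out at roughly 0.13 ms per call, the same order of magnitude as the 0.15 ms quoted for simple function calls.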

“maximal” benchmark, CPython 3.10.6 on linux, versions of CPython on Wine

version  arch   convention  ctypes [µs]  zugbruecke [µs]  overhead [µs]
-------  -----  ----------  -----------  ---------------  -------------
3.7.9    win32  cdll               67.3          2,573.5        2,506.2
3.7.9    win32  windll             66.8          2,578.1        2,511.3
3.7.9    win64  cdll               49.9          2,301.9        2,252.0
3.7.9    win64  windll             51.4          2,313.5        2,262.1
3.8.10   win32  cdll               59.9          2,406.5        2,346.6
3.8.10   win32  windll             60.5          2,402.5        2,342.0
3.8.10   win64  cdll               45.2          2,166.9        2,121.7
3.8.10   win64  windll             44.7          2,181.1        2,136.4
3.9.13   win32  cdll               59.6          2,434.6        2,375.0
3.9.13   win32  windll             59.8          2,415.7        2,355.9
3.9.13   win64  cdll               45.9          2,187.5        2,141.6
3.9.13   win64  windll             45.6          2,188.5        2,142.9
3.10.9   win32  cdll               62.6          2,361.9        2,299.3
3.10.9   win32  windll             62.5          2,439.2        2,376.7
3.10.9   win64  cdll               47.0          2,177.9        2,130.9
3.10.9   win64  windll             46.5          2,173.7        2,127.2
3.11.1   win32  cdll               58.5          2,490.1        2,431.6
3.11.1   win32  windll             58.6          2,347.9        2,289.3
3.11.1   win64  cdll               42.7          2,113.6        2,070.9
3.11.1   win64  windll             42.7          2,123.4        2,080.7

The “maximal” benchmark runs through everything that zugbruecke has to offer. The DLL function takes three arguments: two pointers to structs and a function pointer. The structs themselves contain pointers to memory of arbitrary length, which is handled by memsync. The function pointer is used to pass a reference to a callback function written in pure Python. The callback takes a single pointer to a struct, which again contains a pointer to memory of arbitrary length handled by memsync, and returns a single integer. The callback is invoked 9 times per DLL function call. The test is based on a simple monochrome image filter: the DLL function iterates over every pixel of a 3x3 pixel monochrome image, while the filter’s kernel is provided by the callback function.
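The shape of this call can be sketched with plain ctypes, without a DLL or zugbruecke. In the simplified model below, the struct `Window`, the callback type `KernelFunc`, and the functions `mean_kernel` and `apply_filter` are all hypothetical names chosen for illustration, and `apply_filter` is a pure-Python stand-in for the DLL function that takes flat arrays rather than the two struct pointers of the real benchmark. What it does show is the pattern described above: a callback that receives a pointer to a struct holding a pointer to a variable-length buffer, returns an integer, and is invoked once per pixel of a 3x3 image.

```python
import ctypes


class Window(ctypes.Structure):
    """Hypothetical struct: pointer to pixel data plus the number of pixels."""

    _fields_ = [
        ("data", ctypes.POINTER(ctypes.c_int)),
        ("length", ctypes.c_int),
    ]


# Callback type: takes a pointer to a Window, returns a single integer
KernelFunc = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.POINTER(Window))

calls = 0


def mean_kernel(window_ptr):
    """Filter kernel in pure Python: integer mean of the pixel neighbourhood."""
    global calls
    calls += 1
    window = window_ptr.contents
    values = [window.data[i] for i in range(window.length)]
    return sum(values) // len(values)


def apply_filter(src, dst, width, height, kernel):
    """Pure-Python stand-in for the DLL function: one callback call per pixel."""
    for y in range(height):
        for x in range(width):
            neighbours = [
                src[ny * width + nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            buffer = (ctypes.c_int * len(neighbours))(*neighbours)
            window = Window(
                ctypes.cast(buffer, ctypes.POINTER(ctypes.c_int)),
                len(neighbours),
            )
            dst[y * width + x] = kernel(ctypes.pointer(window))


# 3x3 monochrome image with a single bright pixel in the centre
source = (ctypes.c_int * 9)(0, 0, 0, 0, 9, 0, 0, 0, 0)
target = (ctypes.c_int * 9)()

apply_filter(source, target, 3, 3, KernelFunc(mean_kernel))

print(list(target), "callback calls:", calls)
```

With zugbruecke, each of those 9 callback invocations crosses the process boundary from the Wine side back to the Unix side, including synchronization of the buffer behind the struct's pointer, which is consistent with the overhead of a few milliseconds per call in the table above.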

Benchmarks were performed on an “AMD EPYC 7443P 24-Core Processor” CPU with Linux 5.15.0-56-generic (64-bit) and Wine 7.17 (Staging).

zugbruecke was configured with log_level set to 0 (logs off) for minimal overhead. For the corresponding source code, both Python and C, check the benchmark directory of this project.