Releases: ashvardanian/StringZilla
Release v3.12.5
Release v3.12.4
Release v3.12.3
Release v3.12.2
Release v3.12.1
GoLang support in StringZilla v3.12 🥳
Together with @MarkReedZ, we've added basic GoLang bindings to StringZilla, which look surprisingly fast compared to native GoLang strings. We currently use the new cgo annotations available in Go 1.24:
Cgo has gained new capabilities in Go 1.24, supporting new C function annotations to improve runtime performance. Among them, `#cgo noescape cFunctionName` is used to inform the compiler that the memory passed to `cFunctionName` will not escape; `#cgo nocallback cFunctionName` indicates that this C function will not call back any Go functions. In addition, Cgo's inspection of multiple incompatible declarations of C functions has become more stringent. When there are incompatible declarations in different files, errors can be detected and reported in a more timely and accurate manner.
I used an Intel Sapphire Rapids machine on AWS for preliminary testing and benchmarking. I precompiled StringZilla with dynamic dispatch enabled and linked it to the thin GoLang binding layer:
$ ~/StringZilla/golang$ CGO_CFLAGS="-I$(pwd)/../include" \
CGO_LDFLAGS="-L$(pwd)/../build_golang -lstringzilla_shared" \
LD_LIBRARY_PATH="$(pwd)/../build_golang:$LD_LIBRARY_PATH" \
go run ../scripts/bench.go --input ../leipzig1M.txt --split lines --seed 42
... and compared to native GoLang strings on some key operations:
Benchmarking on `../leipzig1M.txt` with seed 42.
Total input length: 129644797
Total lines: 1000000
Average line length: 128.64
Running benchmark using `testing.Benchmark`.
strings.Contains : 309 3818144 ns/op
sz.Contains : 664 1881251 ns/op
strings.Index : 325 3669081 ns/op
sz.Index : 624 1990093 ns/op
strings.LastIndex : 12 85201713 ns/op
sz.LastIndex : 494 2306318 ns/op
strings.IndexAny : 6321228 181.0 ns/op
sz.IndexAny : 10608960 112.6 ns/op
strings.Count : 156 8015292 ns/op
sz.Count (non-overlap) : 285 4206698 ns/op
sz.Count (overlap) : 284 4204370 ns/op
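To read the output above as speedups, divide the `strings.*` time per operation by the matching `sz.*` time. A minimal Python sketch of that arithmetic follows; the numbers are copied from the run above, while the dictionary layout and the `speedups` helper are illustrative names, not part of StringZilla:

```python
# ns/op values copied from the benchmark output above: (strings.*, sz.*)
NS_PER_OP = {
    "Contains": (3818144, 1881251),
    "Index": (3669081, 1990093),
    "LastIndex": (85201713, 2306318),
    "IndexAny": (181.0, 112.6),
    "Count": (8015292, 4206698),  # sz.Count in non-overlapping mode
}


def speedups(results):
    """Return how many times faster the second timing is than the first."""
    return {name: native / sz for name, (native, sz) in results.items()}


for name, ratio in speedups(NS_PER_OP).items():
    print(f"{name:<10} {ratio:6.2f}x")
```

On this run, `LastIndex` shows by far the largest gap (roughly 37x), while the other operations land between about 1.6x and 2x.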
So if you are processing a lot of text in Go, try doing so with StringZilla and stay tuned for the upcoming 4.0 release #201 🥳
Release v3.11.3
Release v3.11.2
v3.11.1: Matching N3322 for `memcpy` UB in C2y
v3.11.0: Checksums in AVX-512, AVX2, NEON
- 🆕 `sz_checksum(char const *, size_t)` C 99 interface
- 🆕 `sz::str().checksum()` C++ 11 interface
- 🆕 `sz.checksum(str)` Python interface
Database and other systems engineers, you can now use StringZilla to dynamically dispatch different checksum kernels for AVX2-capable Haswell+ CPUs, AVX-512BW-capable Ice Lake+ CPUs, and Arm NEON CPUs on mobile. In AVX-512, masked loads are used extensively, resulting in a 10% improvement even on typical English words averaging 5 bytes in length, and a 20x performance improvement over the serial code for longer strings.
On the technical side, on x86, the kernels use the well-known `SAD(text, zeros)` idiom to accumulate absolute differences between individual bytes into 64-bit words. They also use bidirectional traversal to saturate the core, which is capable of performing 2 loads per CPU cycle. Moreover, on large inputs, they switch to streaming loads, handling the head and the tail separately, similar to our `memcpy` alternative, which also outperforms LibC on AVX-512-capable machines 😎
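The idea behind the kernel can be emulated in plain Python. This is a rough sketch, not the actual implementation: it assumes the checksum is the plain sum of byte values wrapped to 64 bits (which is what a SAD against zeros computes), and the function names, the 64-byte block size, and the 8-lane layout are illustrative choices modeled on a 512-bit register.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # assumed 64-bit wraparound of the result


def checksum_serial(data: bytes) -> int:
    """Reference: sum of all bytes as unsigned integers, wrapped to 64 bits."""
    return sum(data) & MASK64


def sad_against_zeros(block: bytes) -> list:
    """Emulate SAD(block, zeros) on 64 bytes: one partial sum per 8-byte group.

    |byte - 0| is just the byte value, so each lane holds the sum of 8 bytes.
    """
    assert len(block) == 64
    return [sum(block[i:i + 8]) for i in range(0, 64, 8)]


def checksum_blocked(data: bytes) -> int:
    """Accumulate 8 lanes, consuming one block from each end per iteration."""
    lanes = [0] * 8
    head, tail = 0, len(data)
    # Bidirectional traversal: two independent loads per step, front and back.
    while tail - head >= 128:
        front = sad_against_zeros(data[head:head + 64])
        back = sad_against_zeros(data[tail - 64:tail])
        lanes = [(l + f + b) & MASK64 for l, f, b in zip(lanes, front, back)]
        head += 64
        tail -= 64
    # The remaining middle part (under 128 bytes) is handled serially here;
    # a SIMD kernel would use masked loads for it instead.
    return (sum(lanes) + sum(data[head:tail])) & MASK64


text = bytes(range(256)) * 10
print(checksum_blocked(text) == checksum_serial(text))
```

The blocked variant only reorders additions, so it must agree with the serial reference on every input; that equality is a convenient property to test when porting such a kernel.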