Releases · ashvardanian/StringZilla

GoLang support in StringZilla v3.12 🥳

Together with @MarkReedZ, we've added basic GoLang bindings to StringZilla, which look surprisingly fast compared to native GoLang strings. The bindings currently rely on the new cgo annotations available in Go 1.24:

Cgo gained new capabilities in Go 1.24: it supports new C function annotations that improve runtime performance. Among them, `#cgo noescape cFunctionName` tells the compiler that memory passed to `cFunctionName` does not escape, and `#cgo nocallback cFunctionName` indicates that the C function will not call back into any Go functions. In addition, cgo's checks for incompatible declarations of the same C function have become stricter: when declarations in different files conflict, the errors are detected and reported more promptly and accurately.

I used an Intel Sapphire Rapids machine on AWS for preliminary testing and benchmarking. I precompiled StringZilla with dynamic dispatch enabled and linked it to the thin GoLang binding layer:

~/StringZilla/golang$ CGO_CFLAGS="-I$(pwd)/../include" \
        CGO_LDFLAGS="-L$(pwd)/../build_golang -lstringzilla_shared" \
        LD_LIBRARY_PATH="$(pwd)/../build_golang:$LD_LIBRARY_PATH" \
        go run ../scripts/bench.go --input ../leipzig1M.txt --split lines --seed 42

... and compared to native GoLang strings on some key operations:

Benchmarking on `../leipzig1M.txt` with seed 42.
Total input length: 129644797
Total lines: 1000000
Average line length: 128.64
Running benchmark using `testing.Benchmark`.
strings.Contains              :      309           3818144 ns/op
sz.Contains                   :      664           1881251 ns/op
strings.Index                 :      325           3669081 ns/op
sz.Index                      :      624           1990093 ns/op
strings.LastIndex             :       12          85201713 ns/op
sz.LastIndex                  :      494           2306318 ns/op
strings.IndexAny              :  6321228             181.0 ns/op
sz.IndexAny                   : 10608960             112.6 ns/op
strings.Count                 :      156           8015292 ns/op
sz.Count (non-overlap)        :      285           4206698 ns/op
sz.Count (overlap)            :      284           4204370 ns/op

So if you are processing a lot of text in Go, try doing so with StringZilla and stay tuned for the upcoming 4.0 release #201 🥳

Contributors

MarkReedZ

26 Dec 19:44

ashvardanian

v3.11.3

a8a74ca

Release v3.11.3

Release: v3.11.3 [skip ci]

Patch

Improve: Pointer casting rules (a3f2f00)
Docs: LLVM build instruction (89be0cb)

19 Dec 11:30

ashvardanian

v3.11.2

0d47be2

Release v3.11.2

Release: v3.11.2 [skip ci]

Patch

Make: Using SIMD on FreeBSD (#205) (6700fcc)

11 Dec 14:46

ashvardanian

v3.11.1

5affd61

v3.11.1: Matching N3322 for `memcpy` UB in C2y

Release: v3.11.1 [skip ci]

Patch

Fix: Matching N3322 for memcpy UB in C2y (14ee92c)

01 Dec 10:11

ashvardanian

v3.11.0

152ed04

v3.11.0: Checksums in AVX-512, AVX2, NEON

🆕 sz_checksum(char const *, size_t) C99 interface
🆕 sz::str().checksum() C++11 interface
🆕 sz.checksum(str) Python interface

Database and other Systems Engineers, you can now use StringZilla to dynamically dispatch different checksum kernels for AVX2-capable Haswell+ CPUs, AVX-512BW-capable Ice Lake+ CPUs, and Arm NEON CPUs on mobile. In AVX-512, masked loads are used extensively, resulting in a 10% improvement even on typical English words averaging 5 bytes in length, and a 20x improvement over the serial code on longer strings.

On the technical side, on x86, the kernels use the well-known SAD(text, zeros) idiom: the sum of absolute differences between the input bytes and a zero vector accumulates the byte values into 64-bit words. They also use bidirectional traversal to saturate the core, which is capable of performing 2 loads per CPU cycle. Moreover, on large inputs, they switch to streaming loads, handling the head and the tail separately, similar to our memcpy alternative, which also outperforms LibC on AVX-512-capable machines 😎
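As a scalar illustration of the arithmetic, the sketch below assumes the checksum is a plain 64-bit sum of byte values, which is what SAD against a zero vector accumulates. It is not the SIMD kernel: `checksumSerial` and `checksumBidirectional` are hypothetical names, and the second function only mimics the head-and-tail traversal order with two independent accumulators.

```go
package main

import "fmt"

// checksumSerial adds every byte value into one 64-bit accumulator.
func checksumSerial(text []byte) uint64 {
	var sum uint64
	for _, b := range text {
		sum += uint64(b)
	}
	return sum
}

// checksumBidirectional walks from both ends toward the middle, keeping
// two independent accumulators so the two loads do not depend on each other.
func checksumBidirectional(text []byte) uint64 {
	var head, tail uint64
	i, j := 0, len(text)-1
	for i < j {
		head += uint64(text[i])
		tail += uint64(text[j])
		i++
		j--
	}
	if i == j { // odd length: one byte is left in the middle
		head += uint64(text[i])
	}
	return head + tail
}

func main() {
	text := []byte("hello")
	fmt.Println(checksumSerial(text), checksumBidirectional(text))
}
```

Addition is commutative, so both traversal orders produce the same sum; the choice of order affects only how well the loads can be overlapped.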

Minor

Add: Checksums in Python (1b77de9)
Add: Checksum tests (c2b997c)
Add: Checksum kernels (a99337b)

Patch

Docs: Simpler Python doc-strings (ad5fa2c)
Fix: sz_checksum visibility (9bec0eb)
Fix: Missing _mm_cvtsi128_si64x in Clang (c8c6c7c)
