Description
When doing a filtered search over DiskBBQ, we do the simple thing and explore more centroids until we capture the expected overall percentage of vectors.
However, this means we just explore more and more centroids, scoring more of them and visiting useless ones. While we don't actually do any vector ops, it's interesting to see how decoding docIDs and figuring out that there are no matches becomes a strangely dominant cost.
I think we can speed up highly filtered search by adding (though this may be expensive) an additional mapping from vectorOrd -> [centroid_primary, centroid_overspill].
When we detect very specific filters, such that the probability of hitting the vectors in a centroid becomes very low, we can do a first pass with that restricted filter to gather the matching centroids, and then only score those.
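To make the idea concrete, here is a rough sketch of the two-pass approach. It is not DiskBBQ code: the in-memory dict `ORD_TO_CENTROIDS` and the functions `matching_centroids` and `centroids_to_score` are hypothetical names, and the real mapping would presumably be an on-disk structure. It assumes each vector has one primary centroid and at most one overspill centroid.

```python
from collections import defaultdict

# Hypothetical mapping: vectorOrd -> (centroid_primary, centroid_overspill).
# overspill is None when the vector is assigned to a single centroid.
ORD_TO_CENTROIDS = {
    0: (0, 1),
    1: (0, None),
    2: (1, 2),
    3: (2, None),
    4: (2, 0),
    5: (3, None),
}


def matching_centroids(filtered_ords, ord_to_centroids):
    """First pass: group the ordinals accepted by the filter by centroid."""
    hits = defaultdict(list)
    for vector_ord in filtered_ords:
        primary, overspill = ord_to_centroids[vector_ord]
        hits[primary].append(vector_ord)
        if overspill is not None:
            hits[overspill].append(vector_ord)
    return dict(hits)


def centroids_to_score(filtered_ords, ord_to_centroids, centroid_order):
    """Keep the nearest-first centroid order, dropping centroids with no matches."""
    hits = matching_centroids(filtered_ords, ord_to_centroids)
    return [c for c in centroid_order if c in hits]


# A filter that accepts only ordinals 3 and 5; centroids ordered nearest-first.
print(centroids_to_score([3, 5], ORD_TO_CENTROIDS, [1, 0, 3, 2]))
```

With this filter, centroids 0 and 1 are never visited, so their docIDs are never decoded. The first pass costs one lookup per filtered ordinal, which is why it only seems worthwhile when the filter is very restrictive.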
This should be optional, as I expect it to add overhead at index time and to index size, though I expect the index size not to be affected too much?
//cc @jimczi