Improve DiskBBQ filtered search through filtering centroids #132933

@benwtrent

Description

When doing a filtered search over DiskBBQ, we do the simple thing and explore more centroids until we capture the expected overall percentage of vectors.

However, this means we just explore more and more centroids, scoring more of them and visiting useless ones. While we don't actually do any vector ops for those, it's interesting to see how decoding docIDs and figuring out that there are no matches becomes a strangely dominant cost.
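To make the cost concrete, here is a minimal sketch of the "keep exploring" loop, not the actual DiskBBQ code. The names `explore_until_budget`, `centroid_postings`, `accepts`, and `target_matches` are all hypothetical, and the postings are assumed to be ordered nearest centroid first. It counts how many docIDs get decoded and how many centroids turn out to hold no matching vector at all.

```python
from __future__ import annotations

from typing import Callable, Sequence


def explore_until_budget(
    centroid_postings: Sequence[Sequence[int]],
    accepts: Callable[[int], bool],
    target_matches: int,
) -> tuple[list[int], int, int]:
    """Walk centroids nearest-first until enough filtered docs are seen.

    Returns (matching doc IDs, doc IDs decoded, centroids with zero matches).
    Hypothetical illustration only; real posting lists are encoded on disk.
    """
    matched: list[int] = []
    decoded = 0
    empty_centroids = 0
    for postings in centroid_postings:
        if len(matched) >= target_matches:
            break
        # Every doc ID in the posting list is decoded, match or not.
        decoded += len(postings)
        hits = [doc for doc in postings if accepts(doc)]
        if not hits:
            # No vector ops happen here, but the decode cost was still paid.
            empty_centroids += 1
        matched.extend(hits)
    return matched, decoded, empty_centroids


if __name__ == "__main__":
    postings = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    allowed = {7, 10}
    print(explore_until_budget(postings, lambda d: d in allowed, 2))
```

With a filter that accepts only two of twelve docs, the loop decodes every posting list and half of the visited centroids contribute nothing, which is the wasted work described above.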

I think we can speed up highly filtered search by adding (though this may be expensive) an additional mapping from vectorOrd -> [centroid_primary, centroid_overspill].

When we detect very specific filters, such that the probability of hitting the vectors in a centroid becomes very low, we can do a first pass with that restricted filter to gather the matching centroids, and then only score those.
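A rough sketch of how that first pass could look, assuming the vectorOrd -> [centroid_primary, centroid_overspill] mapping is available as a plain list. Everything here is hypothetical: the function names, the `None` marker for a vector with no overspill centroid, and the "expected matches per centroid" heuristic with its threshold of 1.0 are illustration only, not a proposal for the real on-disk format or the real trigger condition.

```python
from __future__ import annotations

from typing import Iterable, Optional, Sequence

# Hypothetical in-memory stand-in for the proposed mapping:
# ord_to_centroids[vector_ord] == (centroid_primary, centroid_overspill or None)
CentroidPair = tuple[int, Optional[int]]


def should_prefilter(
    num_accepted: int, num_centroids: int, threshold: float = 1.0
) -> bool:
    """Assumed heuristic: restrict centroids when a centroid is expected to
    hold fewer than `threshold` matching vectors on average."""
    if num_centroids == 0:
        return False
    return num_accepted / num_centroids < threshold


def centroids_for_filter(
    ord_to_centroids: Sequence[CentroidPair],
    accepted_ords: Iterable[int],
) -> set[int]:
    """First pass: collect every centroid that holds an accepted vector."""
    candidates: set[int] = set()
    for vector_ord in accepted_ords:
        primary, overspill = ord_to_centroids[vector_ord]
        candidates.add(primary)
        if overspill is not None:
            candidates.add(overspill)
    return candidates


def restrict_ranked_centroids(
    ranked_centroids: Sequence[int], candidates: set[int]
) -> list[int]:
    """Second pass input: keep the query's centroid ranking, drop the rest."""
    return [c for c in ranked_centroids if c in candidates]


if __name__ == "__main__":
    mapping: list[CentroidPair] = [(0, 1), (0, None), (1, 2), (3, None), (2, 3)]
    accepted = [2]  # the filter matches only vector ord 2
    if should_prefilter(len(accepted), num_centroids=4):
        keep = centroids_for_filter(mapping, accepted)
        print(restrict_ranked_centroids([3, 0, 2, 1], keep))
```

The first pass costs one mapping lookup per accepted vector, so it only pays off when the filter is selective, which is why the sketch gates it behind a selectivity check.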

This should be optional, as I expect it to add overhead at index time and to index size, though I expect the index size not to be affected too much?

//cc @jimczi
