IVFFlat indexes
IVFFlat is a type of vector index for approximate nearest neighbor search. It is a frequently used index type that can improve performance when querying highly-dimensional vectors, like those representing embeddings.
Choosing an index#
Today pgvector
supports two types of indexes:
In general we recommend using HNSW because of its performance and robustness against changing data. If you have a special use case that requires IVFFlat instead, keep reading.
Usage#
The way you create an IVFFlat index depends on the distance operator you are using. pgvector
includes 3 distance operators:
Operator | Description | Operator class |
---|---|---|
<-> | Euclidean distance | vector_l2_ops |
<#> | negative inner product | vector_ip_ops |
<=> | cosine distance | vector_cosine_ops |
Use the following SQL commands to create an IVFFlat index for the operator(s) used in your queries.
Euclidean L2 distance (vector_l2_ops
)#
_10create index on items using ivfflat (column_name vector_l2_ops) with (lists = 100);
Inner product (vector_ip_ops
)#
_10create index on items using ivfflat (column_name vector_ip_ops) with (lists = 100);
Cosine distance (vector_cosine_ops
)#
_10create index on items using ivfflat (column_name vector_cosine_ops) with (lists = 100);
Currently vectors with up to 2,000 dimensions can be indexed.
How does IVFFlat work?#
IVF stands for 'inverted file indexes'. It works by clustering your vectors in order to reduce the similarity search scope. Rather than comparing a vector to every other vector, the vector is only compared against vectors within the same cell cluster (or nearby clusters, depending on your configuration).
Inverted lists (cell clusters)#
When you create the index, you choose the number of inverted lists (cell clusters). Increase this number to speed up queries, but at the expense of recall.
For example, to create an index with 100 lists on a column that uses the cosine operator:
_10create index on items using ivfflat (column_name vector_cosine_ops) with (lists = 100);
For more info on the different operators, see Distance operations.
For every query, you can set the number of probes (1 by default). The number of probes corresponds to the number of nearby cells to probe for a match. Increase this for better recall at the expense of speed.
To set the number of probes for the duration of the session run:
_10set ivfflat.probes = 10;
To set the number of probes only for the current transaction run:
_10begin;_10set local ivfflat.probes = 10;_10select ..._10commit;
If the number of probes is the same as the number of lists, exact nearest neighbor search will be performed and the planner won't use the index.
Approximate nearest neighbor#
One important note with IVF indexes is that nearest neighbor search is approximate, since exact search on high dimensional data can't be indexed efficiently. This means that similarity results will change (slightly) after you add an index (trading recall for speed).
When should you create IVFFlat indexes?#
pgvector
recommends building IVFFlat indexes only after the table has sufficient data, so that the internal IVFFlat cell clusters are based on your data's distribution. Anytime the distribution changes significantly, consider rebuilding indexes.
Resources#
Read more about indexing on pgvector
's GitHub page.