similarities.annoy – Approximate Vector Search using Annoy
This module integrates Spotify’s Annoy (Approximate Nearest Neighbors Oh Yeah) library with Gensim’s Word2Vec, Doc2Vec, FastText and KeyedVectors word embeddings.
Important
To use this module, you must have the annoy library installed. To install it, run pip install annoy.
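As a small sketch of how a script might guard against the missing dependency described above, the check below uses only the standard library. The helper name annoy_available is hypothetical and is not part of Gensim or Annoy.

```python
import importlib.util


def annoy_available() -> bool:
    """Return True if the `annoy` package can be found in this environment.

    Hypothetical helper for illustration; not part of Gensim or Annoy.
    """
    return importlib.util.find_spec("annoy") is not None


if not annoy_available():
    # Mirror the installation hint from the note above.
    print("annoy is not installed; run `pip install annoy` before using AnnoyIndexer")
```

Checking up front gives a clearer message than an ImportError raised later, when the indexer module is first imported.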
- class gensim.similarities.annoy.AnnoyIndexer(model=None, num_trees=None)
This class allows the use of Annoy for fast (approximate) vector retrieval in most_similar() calls of Word2Vec, Doc2Vec, FastText and Word2VecKeyedVectors models.
- Parameters
model (trained model, optional) – Use vectors from this model as the source for the index.
num_trees (int, optional) – Number of trees for Annoy indexer.
Examples
>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1)
>>>
>>> indexer = AnnoyIndexer(model, 2)
>>> model.wv.most_similar("cat", topn=2, indexer=indexer)
[('cat', 1.0), ('dog', 0.32011348009109497)]
- load(fname)
Load an AnnoyIndexer instance from disk.
- Parameters
fname (str) – The path as previously used by save().
Examples
>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>> from tempfile import mkstemp
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1, epochs=10)
>>>
>>> indexer = AnnoyIndexer(model, 2)
>>> _, temp_fn = mkstemp()
>>> indexer.save(temp_fn)
>>>
>>> new_indexer = AnnoyIndexer()
>>> new_indexer.load(temp_fn)
>>> new_indexer.model = model
- most_similar(vector, num_neighbors)
Find the num_neighbors most similar items.
- Parameters
vector (numpy.array) – Vector for word/document.
num_neighbors (int) – Number of most similar items to return.
- Returns
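List of most similar items in the format [(item, cosine_similarity), … ]. As in the class example, where the query word itself scores 1.0, higher values mean more similar.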
- Return type
list of (str, float)
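The score paired with each item behaves like a cosine similarity: in the class example above, the query word itself scores 1.0. The sketch below shows one way such a score can be derived from an Annoy distance, assuming the index uses Annoy's angular metric, where the distance between two vectors is the Euclidean distance between their normalized forms, sqrt(2 * (1 - cos)). This section does not state which metric the indexer uses, so treat this as an illustration of the relation, not a description of the indexer's internals. Both function names are hypothetical.

```python
import math


def angular_distance(u, v):
    """Euclidean distance between unit-normalized u and v: sqrt(2 * (1 - cos))."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp at zero to absorb tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def similarity_from_angular(distance):
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity."""
    return 1.0 - distance ** 2 / 2.0


# Identical directions give distance 0, hence similarity 1.0.
same = similarity_from_angular(angular_distance([1.0, 0.0], [1.0, 0.0]))
# Orthogonal directions give distance sqrt(2), hence similarity close to 0.
orthogonal = similarity_from_angular(angular_distance([1.0, 0.0], [0.0, 1.0]))
```

Under this assumption, a distance of 0 maps to a score of 1.0 and a distance of 2 (opposite directions) maps to -1.0.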
- save(fname, protocol=4)
Save AnnoyIndexer instance to disk.
- Parameters
fname (str) – Path to output. Save will produce 2 files: fname (the Annoy index itself) and fname.dict (the index metadata).
protocol (int, optional) – Protocol for pickle.
Notes
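This method saves only the index. The trained model isn’t preserved, so assign it to the indexer again after loading, as the load() example does with new_indexer.model = model.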
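Because save() is documented to write two files, it can help to confirm that both are present before calling load(), for example after copying an index to another machine. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the demonstration uses empty placeholder files in place of a real saved index.

```python
import os
import tempfile


def index_files(fname):
    """Return the two paths that save() is documented to write for `fname`."""
    return [fname, fname + ".dict"]


def missing_index_files(fname):
    """Return the documented index files that are not present on disk."""
    return [path for path in index_files(fname) if not os.path.exists(path)]


# Demonstration with a placeholder file standing in for a saved index.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_fname = os.path.join(tmp_dir, "demo.index")
    open(demo_fname, "wb").close()  # stands in for the Annoy index file
    # Only the metadata file is reported missing at this point.
    print(missing_index_files(demo_fname))
```

An empty result means both files are in place. The trained model still has to be supplied separately, since it is not part of what save() writes.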