similarities.annoy – Approximate Vector Search using Annoy
This module integrates Spotify’s Annoy (Approximate Nearest Neighbors Oh Yeah) library with Gensim’s Word2Vec, Doc2Vec, FastText and KeyedVectors embeddings.
Important
To use this module, you must have the annoy library installed.
To install it, run pip install annoy.
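Code that should fail gracefully when annoy is missing can check for the package before creating an indexer. The following is a minimal standard-library sketch; the function name `annoy_available` is hypothetical and is not part of gensim.

```python
import importlib.util


def annoy_available() -> bool:
    """Return True if the `annoy` package can be found in this environment."""
    # find_spec locates a top-level module without actually importing it.
    return importlib.util.find_spec("annoy") is not None


if __name__ == "__main__":
    if annoy_available():
        print("annoy is installed; AnnoyIndexer can be used.")
    else:
        print("annoy is missing; run: pip install annoy")
```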
- class gensim.similarities.annoy.AnnoyIndexer(model=None, num_trees=None)
This class allows the use of Annoy for fast (approximate) vector retrieval in most_similar() calls of
Word2Vec, Doc2Vec, FastText and Word2VecKeyedVectors models.
- Parameters
model (trained model, optional) – Use vectors from this model as the source for the index.
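num_trees (int, optional) – Number of trees to build in the Annoy index. More trees give more accurate results at the cost of a larger index and a longer build time. If model or num_trees is omitted, no index is built; an empty AnnoyIndexer() is typically used together with load().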
Examples
>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1)
>>>
>>> indexer = AnnoyIndexer(model, 2)
>>> model.wv.most_similar("cat", topn=2, indexer=indexer)
[('cat', 1.0), ('dog', 0.32011348009109497)]
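Because Annoy returns approximate neighbors, it can help to compare its output against an exact search on a small vocabulary. The sketch below is a plain-Python brute-force baseline, not part of gensim: `exact_most_similar` and `recall_at_k` are hypothetical helpers, and the vectors are toy values rather than trained embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exact_most_similar(
    vectors: Dict[str, Sequence[float]], query: Sequence[float], topn: int
) -> List[Tuple[str, float]]:
    """Hypothetical brute-force baseline: score every key, keep the top `topn`."""
    scored = [(key, cosine(vec, query)) for key, vec in vectors.items()]
    # Highest similarity first; ties broken alphabetically for a deterministic order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:topn]


def recall_at_k(approx: List[Tuple[str, float]], exact: List[Tuple[str, float]]) -> float:
    """Fraction of the exact top-k keys that also appear in the approximate result."""
    exact_keys = {key for key, _ in exact}
    if not exact_keys:
        return 1.0
    approx_keys = {key for key, _ in approx}
    return len(exact_keys & approx_keys) / len(exact_keys)


# Toy 2-D vectors chosen so the cosines are exact: cos(cat, dog) = 3 / (1 * 5) = 0.6.
toy = {"cat": [1.0, 0.0], "dog": [3.0, 4.0], "woof": [0.0, 1.0]}
exact = exact_most_similar(toy, toy["cat"], topn=2)
print(exact)  # [('cat', 1.0), ('dog', 0.6)]
print(recall_at_k([("cat", 1.0), ("woof", 0.0)], exact))  # 0.5
```

With a trained gensim 4 model, the `vectors` mapping could be built as `{key: model.wv[key] for key in model.wv.index_to_key}` (assuming gensim 4's `KeyedVectors.index_to_key`) and compared against the indexer's results; a larger num_trees should generally push recall toward 1.0.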
- load(fname)
Load a previously saved index into this AnnoyIndexer instance.
- Parameters
fname (str) – The path previously passed to save().
Examples
>>> import os
>>> from gensim.similarities.annoy import AnnoyIndexer
>>> from gensim.models import Word2Vec
>>> from tempfile import mkstemp
>>>
>>> sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
>>> model = Word2Vec(sentences, min_count=1, seed=1, epochs=10)
>>>
>>> indexer = AnnoyIndexer(model, 2)
>>> fd, temp_fn = mkstemp()
>>> os.close(fd)  # only the path is needed; close the open file descriptor
>>> indexer.save(temp_fn)
>>>
>>> new_indexer = AnnoyIndexer()
>>> new_indexer.load(temp_fn)
>>> new_indexer.model = model  # the model is not saved with the index
- most_similar(vector, num_neighbors)
Find the num_neighbors items most similar to vector.
- Parameters
vector (numpy.array) – Query vector for a word or document.
num_neighbors (int) – Number of most similar items to return.
- Returns
List of most similar items in format [(item, cosine_similarity), … ]
- Return type
list of (str, float)
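Annoy's angular metric does not report the cosine directly: for vectors u and v it returns the Euclidean distance between their normalized forms, d = sqrt(2(1 - cos(u, v))), so a cosine can be recovered as 1 - d²/2. A score of 1.0 for an identical vector, as for 'cat' in the class example, corresponds to d = 0. The sketch below shows this arithmetic in plain Python; it assumes AnnoyIndexer builds its index with the angular metric and is an illustration, not gensim's source code.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between normalized vectors, i.e. sqrt(2 * (1 - cos(u, v)))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(unit(u), unit(v))))


def to_cosine_similarity(distance: float) -> float:
    """Invert d = sqrt(2 * (1 - cos)) to recover cos = 1 - d**2 / 2."""
    return 1.0 - distance ** 2 / 2.0


u, v = [1.0, 0.0], [3.0, 4.0]
d = angular_distance(u, v)
print(round(d, 6))                                   # 0.894427, i.e. sqrt(0.8)
print(round(to_cosine_similarity(d), 6))             # 0.6
print(to_cosine_similarity(angular_distance(u, u)))  # 1.0
```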
- save(fname, protocol=4)
Save the AnnoyIndexer instance to disk.
- Parameters
fname (str) – Path to output. Saving produces two files: fname (the Annoy index itself) and fname.dict (the index metadata).
protocol (int, optional) – Pickle protocol used when writing the metadata file.
Notes
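This method saves only the index and its metadata. The trained model isn’t preserved, so after load() it must be reattached by assigning it to the model attribute, as in the load() example.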
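The two-file layout described under save() (the index at fname, pickled metadata at fname.dict) can be illustrated with the standard library alone. This is a sketch under stated assumptions, not gensim's implementation: `save_metadata`, `load_metadata` and the dictionary keys are hypothetical, and an empty placeholder file stands in for the real Annoy index. Note that neither file holds the trained model.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, List


def save_metadata(fname: str, dim: int, num_trees: int, labels: List[str], protocol: int = 4) -> str:
    """Hypothetical helper: pickle index metadata to `fname + '.dict'` beside the index file."""
    path = fname + ".dict"
    with open(path, "wb") as fout:
        pickle.dump({"dim": dim, "num_trees": num_trees, "labels": labels}, fout, protocol=protocol)
    return path


def load_metadata(fname: str) -> Dict[str, Any]:
    """Hypothetical helper: check that both files exist, then unpickle the metadata."""
    path = fname + ".dict"
    for required in (fname, path):
        if not os.path.exists(required):
            raise IOError(f"missing file: {required}")
    with open(path, "rb") as fin:
        return pickle.load(fin)


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "index.ann")
    # Empty placeholder standing in for the Annoy index file written at `fname`.
    with open(index_path, "wb"):
        pass
    save_metadata(index_path, dim=100, num_trees=2, labels=["cat", "dog"])
    meta = load_metadata(index_path)

print(meta)  # {'dim': 100, 'num_trees': 2, 'labels': ['cat', 'dog']}
```

Checking for both files before reading either one means a half-copied index fails with a clear error instead of a confusing unpickling or loading failure.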
