models.utils_any2vec – Utils for any2vec models
General functions used for any2vec models.
One of the goals of this module is to provide an abstraction over the Cython extensions for FastText. If they are not available, then the module substitutes slower Python versions in their place.
A related piece of FastText functionality is computing the ngrams of a word. The compute_ngrams() and compute_ngrams_bytes() functions do that.
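To make the ngram step concrete, here is a minimal pure-Python sketch. It is not the module's implementation: the name `char_ngrams` is hypothetical, and the sketch assumes the usual FastText convention of wrapping the word in `<` and `>` boundary markers before slicing it. The order in which the module's own functions return ngrams may differ.

```python
def char_ngrams(word, minn, maxn):
    """Return the character ngrams of `word` with lengths minn..maxn.

    Hypothetical helper for illustration only; not part of Gensim.
    """
    # Assumption: the word is wrapped in boundary markers so that
    # prefixes and suffixes yield ngrams distinct from inner substrings.
    wrapped = "<" + word + ">"
    ngrams = []
    # Lengths longer than the wrapped word cannot produce any ngram.
    for n in range(minn, min(maxn, len(wrapped)) + 1):
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


print(char_ngrams("cat", 3, 4))
# ['<ca', 'cat', 'at>', '<cat', 'cat>']
```

A bytes-oriented variant would additionally encode each ngram, for example with `ngram.encode("utf-8")`, before hashing.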
Closely related is the functionality for hashing ngrams, implemented by the ft_hash() and ft_hash_broken() functions. The module exposes “working” and “broken” hash functions in order to maintain backwards compatibility with older versions of Gensim.
For compatibility with older Gensim, use compute_ngrams() and ft_hash_broken() to hash each ngram. For compatibility with the current Facebook implementation, use compute_ngrams_bytes() and ft_hash_bytes().
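The byte-oriented hashing step can be sketched as follows. This is an illustration under stated assumptions, not the module's code: it assumes that the Facebook implementation hashes the UTF-8 bytes of an ngram with 32-bit FNV-1a and sign-extends each byte before the XOR (so bytes of 128 and above affect the upper bits). The name `fnv1a_32_signed_bytes` is hypothetical.

```python
def fnv1a_32_signed_bytes(data):
    """Hash a bytes object with 32-bit FNV-1a, sign-extending each byte.

    Hypothetical helper for illustration only; not part of Gensim.
    """
    h = 2166136261  # FNV-1a 32-bit offset basis
    for byte in data:
        # Assumption: each byte is treated as a signed 8-bit value and
        # widened to 32 bits, so values >= 128 set the upper 24 bits.
        if byte >= 128:
            byte |= 0xFFFFFF00
        h ^= byte
        # Multiply by the FNV prime and keep the low 32 bits.
        h = (h * 16777619) & 0xFFFFFFFF
    return h


print(fnv1a_32_signed_bytes("<ca".encode("utf-8")))
```

For pure ASCII input the sign extension never triggers, so the result matches plain FNV-1a. The two only diverge on non-ASCII bytes, which is where a hash that ignores the byte-level details would stop agreeing with the Facebook implementation.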
gensim.models.utils_any2vec.ft_ngram_hashes(word, minn, maxn, num_buckets, fb_compatible=True)
Calculate the ngrams of the word and hash them.
word (str) – The word to calculate ngram hashes for.
minn (int) – Minimum ngram length.
maxn (int) – Maximum ngram length.
num_buckets (int) – The number of buckets.
fb_compatible (boolean, optional) – True for compatibility with the Facebook implementation. False for compatibility with the old Gensim implementation.
Returns a list of hashes (integers), one per detected ngram.
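As a rough end-to-end picture, the sketch below computes ngrams, hashes each one, and maps the hash into a bucket. It is a self-contained illustration, not the library function: `sketch_ngram_hashes` is a hypothetical name, and the sketch assumes `<`/`>` boundary markers, FNV-1a over sign-extended UTF-8 bytes, and a modulo by `num_buckets` as the bucket mapping. It only mirrors the Facebook-compatible path and has no `fb_compatible` switch.

```python
def sketch_ngram_hashes(word, minn, maxn, num_buckets):
    """Return one bucket index per character ngram of `word`.

    Hypothetical stand-in for illustration only; not part of Gensim.
    """
    wrapped = "<" + word + ">"  # assumed boundary markers
    buckets = []
    for n in range(minn, min(maxn, len(wrapped)) + 1):
        for start in range(len(wrapped) - n + 1):
            h = 2166136261  # FNV-1a 32-bit offset basis
            for byte in wrapped[start:start + n].encode("utf-8"):
                # Assumed sign extension of bytes >= 128.
                h ^= (byte | 0xFFFFFF00) if byte >= 128 else byte
                h = (h * 16777619) & 0xFFFFFFFF
            # Assumption: the bucket is the hash modulo num_buckets.
            buckets.append(h % num_buckets)
    return buckets


hashes = sketch_ngram_hashes("cat", 3, 4, 2000000)
print(len(hashes), hashes)
```

Because many distinct ngrams share a bucket once `num_buckets` is smaller than the hash range, the returned list can contain repeated values even when all ngrams differ.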