models._utils_any2vec – Cython utils for any2vec models
General functions used for any2vec models.
gensim.models._utils_any2vec.compute_ngrams(word, unsigned int min_n, unsigned int max_n)
Get the list of all possible ngrams for a given word.
word (str) – The word whose ngrams need to be computed.
min_n (unsigned int) – Minimum character length of the ngrams.
max_n (unsigned int) – Maximum character length of the ngrams.
Returns: Sequence of character ngrams.
Return type: list of str
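The enumeration can be illustrated in pure Python. This is a minimal sketch, not the Cython implementation: the name `char_ngrams_sketch` is hypothetical, and it assumes the fastText convention of wrapping the word in `<` and `>` boundary markers before slicing, which the description above does not spell out.

```python
def char_ngrams_sketch(word, min_n, max_n):
    """Enumerate character ngrams of a word (illustrative sketch only)."""
    # Assumption: the word is wrapped in boundary markers, as fastText does.
    extended = "<" + word + ">"
    ngrams = []
    # Shorter ngrams first; within one length, left to right.
    for length in range(min_n, min(len(extended), max_n) + 1):
        for start in range(len(extended) - length + 1):
            ngrams.append(extended[start:start + length])
    return ngrams


print(char_ngrams_sketch("cat", 3, 4))
```

Under these assumptions, the word "cat" with min_n=3 and max_n=4 yields three ngrams of length 3 followed by two of length 4.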
gensim.models._utils_any2vec.compute_ngrams_bytes(word, unsigned int min_n, unsigned int max_n)
Computes ngrams for a word.
Ported from the original Facebook fastText implementation.
word (str) – A unicode string.
min_n (unsigned int) – The minimum ngram length.
max_n (unsigned int) – The maximum ngram length.
Returns: A list of ngrams, where each ngram is a sequence of bytes.
Return type: list of str
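A byte-level variant can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the Cython code: the name `byte_ngrams_sketch` is hypothetical, the word is assumed to be wrapped in `<` and `>` and encoded as UTF-8, and ngram length is counted in characters, so the continuation bytes of a multi-byte character are never split from their lead byte.

```python
def byte_ngrams_sketch(word, min_n, max_n):
    """Enumerate ngrams of a word as UTF-8 byte strings (illustrative sketch)."""
    # Assumption: boundary markers are added before encoding.
    data = ("<" + word + ">").encode("utf-8")
    size = len(data)
    ngrams = []
    for i in range(size):
        # Skip UTF-8 continuation bytes (bit pattern 10xxxxxx):
        # an ngram may only start at the first byte of a character.
        if (data[i] & 0xC0) == 0x80:
            continue
        j, n = i, 1
        while j < size and n <= max_n:
            j += 1
            # Consume the remaining bytes of the current character.
            while j < size and (data[j] & 0xC0) == 0x80:
                j += 1
            # Drop single-character ngrams that are just a boundary marker.
            if n >= min_n and not (n == 1 and (i == 0 or j == size)):
                ngrams.append(data[i:j])
            n += 1
    return ngrams


print(byte_ngrams_sketch("cat", 3, 4))
```

In this sketch the ngrams come out grouped by start position rather than by length, so the ordering differs from a length-first enumeration over characters.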
gensim.models._utils_any2vec.ft_hash_broken(unicode string)
Calculate hash based on string.
This implementation is broken; see https://github.com/RaRe-Technologies/gensim/issues/2059. It is kept only for backwards compatibility with older models.
string (unicode) – The string whose hash needs to be calculated.
Returns: The hash of the string.
Return type: unsigned int
gensim.models._utils_any2vec.ft_hash_bytes(bytes bytez)
Calculate hash based on bytez. Reproduces the hash method of the Facebook fastText implementation.
bytez (bytes) – The string whose hash needs to be calculated, encoded as UTF-8.
Returns: The hash of the string.
Return type: unsigned int
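The fastText hash is a 32-bit FNV-1a variant, which can be sketched in pure Python. The function name `fnv1a_fasttext_sketch` and the constant names are hypothetical; the sketch assumes that each byte is interpreted as a signed 8-bit value before being widened to 32 bits, mirroring the cast in the fastText C++ source, and it emulates unsigned 32-bit overflow with an explicit mask.

```python
FNV_OFFSET_BASIS = 2166136261  # standard 32-bit FNV offset basis
FNV_PRIME = 16777619           # standard 32-bit FNV prime
MASK32 = 0xFFFFFFFF


def fnv1a_fasttext_sketch(bytez):
    """Hash a UTF-8 byte string, fastText style (illustrative sketch)."""
    h = FNV_OFFSET_BASIS
    for b in bytez:
        # Assumption: the byte is treated as int8, then widened to uint32,
        # so bytes >= 0x80 are sign-extended to 0xFFFFFFxx.
        signed = b - 0x100 if b >= 0x80 else b
        h = (h ^ (signed & MASK32)) & MASK32
        # Multiply modulo 2**32 to emulate unsigned 32-bit overflow.
        h = (h * FNV_PRIME) & MASK32
    return h


print(fnv1a_fasttext_sketch("<ca".encode("utf-8")))
```

For pure ASCII input the sign extension has no effect and the result matches plain FNV-1a; it only changes the hash for bytes of non-ASCII characters.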