models.fasttext_inner – Cython routines for training FastText models

Optimized Cython functions for training a FastText model.

The main entry points are train_batch_sg() and train_batch_cbow(). They may be called directly from Python code.
Notes

The implementation of the above functions heavily depends on the FastTextConfig struct defined in gensim/models/fasttext_inner.pxd.

The FAST_VERSION constant determines what flavor of BLAS we're currently using:

    0: double
    1: float
    2: no BLAS, use Cython loops instead
gensim.models.fasttext_inner.init()

Precompute the sigmoid function sigmoid(x) = 1 / (1 + exp(-x)) for x values discretized into the table EXP_TABLE. Also calculate log(sigmoid(x)) into LOG_TABLE.
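For intuition, here is a minimal NumPy sketch of the kind of lookup tables init() builds. The constants EXP_TABLE_SIZE and MAX_EXP and the helper fast_sigmoid are illustrative assumptions, not names exported by the module; the real tables live in C memory inside the compiled extension.

import numpy as np

# Illustrative sketch only: these constants are assumptions, not
# gensim's actual values.
EXP_TABLE_SIZE = 1000   # number of discretized x values (assumed)
MAX_EXP = 6             # x is clipped to [-MAX_EXP, MAX_EXP] (assumed)

x = (np.arange(EXP_TABLE_SIZE) / EXP_TABLE_SIZE * 2.0 - 1.0) * MAX_EXP
EXP_TABLE = 1.0 / (1.0 + np.exp(-x))  # sigmoid(x) for each discretized x
LOG_TABLE = np.log(EXP_TABLE)         # log(sigmoid(x))

def fast_sigmoid(value):
    """Look up sigmoid(value) in the precomputed table instead of calling exp()."""
    idx = int((value + MAX_EXP) * (EXP_TABLE_SIZE / (2.0 * MAX_EXP)))
    return EXP_TABLE[min(max(idx, 0), EXP_TABLE_SIZE - 1)]

Trading one table lookup for an exp() call on every dot product is what makes the inner training loops cheap.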
gensim.models.fasttext_inner.FAST_VERSION

Enumeration to signify the underlying data type returned by the BLAS dot product calculation. 0 signifies double, 1 signifies float, and 2 signifies that custom Cython loops were used instead of BLAS.

Type: {0, 1, 2}
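A quick way to see which flavor your installed build uses is to read this constant directly. The snippet below assumes only that the compiled extension imports successfully; the flavor labels paraphrase the enumeration above.

from gensim.models import fasttext_inner

flavors = {
    0: "BLAS, double precision",
    1: "BLAS, single (float) precision",
    2: "no BLAS; plain Cython loops",
}
print(flavors[fasttext_inner.FAST_VERSION])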
gensim.models.fasttext_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1)

Update the CBOW model by training on a sequence of sentences.

Each sentence is a list of string tokens, which are looked up in the model's vocab dictionary. Called internally from train().

Parameters:
    model (FastText) – Model to be trained.
    sentences (iterable of list of str) – A single batch: part of the corpus streamed directly from disk/network.
    alpha (float) – Learning rate.
    _work (np.ndarray) – Private working memory for each worker.
    _neu1 (np.ndarray) – Private working memory for each worker.

Returns:
    Effective number of words trained.

Return type:
    int
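As a rough usage sketch (normally train() allocates the buffers and drives this loop itself), one might call the function directly as below. The toy corpus, the buffer sizing, and the gensim 4.x-style constructor parameters (vector_size, sg) are assumptions for illustration, not a documented contract.

import numpy as np
from gensim.models import FastText
from gensim.models.fasttext_inner import train_batch_cbow

sentences = [["human", "interface", "computer"],
             ["survey", "user", "computer", "system"]]

model = FastText(vector_size=24, min_count=1, sg=0)  # sg=0 selects CBOW
model.build_vocab(sentences)

# Per-worker scratch buffers; sizing them to vector_size mirrors how
# train() provisions worker memory, but is an assumption here.
work = np.zeros(model.vector_size, dtype=np.float32)
neu1 = np.zeros(model.vector_size, dtype=np.float32)

tally = train_batch_cbow(model, sentences, 0.025, work, neu1)
print(tally)  # effective number of words trained in this batch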
gensim.models.fasttext_inner.train_batch_sg(model, sentences, alpha, _work, _l1)

Update the skip-gram model by training on a sequence of sentences.

Each sentence is a list of string tokens, which are looked up in the model's vocab dictionary. Called internally from train().

Parameters:
    model (FastText) – Model to be trained.
    sentences (iterable of list of str) – A single batch: part of the corpus streamed directly from disk/network.
    alpha (float) – Learning rate.
    _work (np.ndarray) – Private working memory for each worker.
    _l1 (np.ndarray) – Private working memory for each worker.

Returns:
    Effective number of words trained.

Return type:
    int
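The skip-gram entry point can be exercised the same way. As with the CBOW sketch above, the corpus and buffer sizing are illustrative assumptions.

import numpy as np
from gensim.models import FastText
from gensim.models.fasttext_inner import train_batch_sg

sentences = [["graph", "minors", "survey"], ["graph", "trees", "system"]]

model = FastText(vector_size=24, min_count=1, sg=1)  # sg=1 selects skip-gram
model.build_vocab(sentences)

work = np.zeros(model.vector_size, dtype=np.float32)  # per-worker scratch
l1 = np.zeros(model.vector_size, dtype=np.float32)    # per-worker scratch

tally = train_batch_sg(model, sentences, 0.025, work, l1)
print(tally)  # effective number of words trained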