models.fasttext_inner – Cython routines for training FastText models

Optimized Cython functions for training a FastText model.

The main entry points are train_batch_sg() and train_batch_cbow(). They may be called directly from Python code.
Notes

The implementation of the above functions heavily depends on the FastTextConfig struct defined in gensim/models/fasttext_inner.pxd.

The FAST_VERSION constant determines what flavor of BLAS we're currently using:

    0: double
    1: float
    2: no BLAS, use Cython loops instead
gensim.models.fasttext_inner.init()

Precompute the sigmoid function sigmoid(x) = 1 / (1 + exp(-x)) for x values discretized into the table EXP_TABLE. Also calculate log(sigmoid(x)) into LOG_TABLE.
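For intuition, here is a minimal NumPy sketch of the kind of lookup tables init() builds. The constants EXP_TABLE_SIZE and MAX_EXP and the helper fast_sigmoid are illustrative assumptions, not names exported by the module; the real tables live in C memory inside the compiled extension.

import numpy as np

# Illustrative sketch only: these constants are assumptions, not
# gensim's actual values.
EXP_TABLE_SIZE = 1000   # number of discretized x values (assumed)
MAX_EXP = 6             # x is clipped to [-MAX_EXP, MAX_EXP] (assumed)

x = (np.arange(EXP_TABLE_SIZE) / EXP_TABLE_SIZE * 2.0 - 1.0) * MAX_EXP
EXP_TABLE = 1.0 / (1.0 + np.exp(-x))  # sigmoid(x) for each discretized x
LOG_TABLE = np.log(EXP_TABLE)         # log(sigmoid(x))

def fast_sigmoid(value):
    """Look up sigmoid(value) in the precomputed table instead of calling exp()."""
    idx = int((value + MAX_EXP) * (EXP_TABLE_SIZE / (2.0 * MAX_EXP)))
    return EXP_TABLE[min(max(idx, 0), EXP_TABLE_SIZE - 1)]

Trading one table lookup for an exp() call on every dot product is what makes the inner training loops cheap.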
gensim.models.fasttext_inner.FAST_VERSION

Enumeration to signify the underlying data type returned by the BLAS dot product calculation. 0 signifies double, 1 signifies float, and 2 signifies that custom Cython loops were used instead of BLAS.

Type: {0, 1, 2}
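A quick way to see which flavor your installed build uses is to read this constant directly. The snippet below assumes only that the compiled extension imports successfully; the flavor labels paraphrase the enumeration above.

from gensim.models import fasttext_inner

flavors = {
    0: "BLAS, double precision",
    1: "BLAS, single (float) precision",
    2: "no BLAS; plain Cython loops",
}
print(flavors[fasttext_inner.FAST_VERSION])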
gensim.models.fasttext_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1)

Update the CBOW model by training on a sequence of sentences.

Each sentence is a list of string tokens, which are looked up in the model's vocab dictionary. Called internally from train().

Parameters:
    model (FastText) – Model to be trained.
    sentences (iterable of list of str) – A single batch: part of the corpus streamed directly from disk/network.
    alpha (float) – Learning rate.
    _work (np.ndarray) – Private working memory for each worker.
    _neu1 (np.ndarray) – Private working memory for each worker.

Returns:
    Effective number of words trained.

Return type:
    int
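As a rough usage sketch (normally train() allocates the buffers and drives this loop itself), one might call the function directly as below. The toy corpus, the buffer sizing, and the gensim 4.x-style constructor parameters (vector_size, sg) are assumptions for illustration, not a documented contract.

import numpy as np
from gensim.models import FastText
from gensim.models.fasttext_inner import train_batch_cbow

sentences = [["human", "interface", "computer"],
             ["survey", "user", "computer", "system"]]

model = FastText(vector_size=24, min_count=1, sg=0)  # sg=0 selects CBOW
model.build_vocab(sentences)

# Per-worker scratch buffers; sizing them to vector_size mirrors how
# train() provisions worker memory, but is an assumption here.
work = np.zeros(model.vector_size, dtype=np.float32)
neu1 = np.zeros(model.vector_size, dtype=np.float32)

tally = train_batch_cbow(model, sentences, 0.025, work, neu1)
print(tally)  # effective number of words trained in this batch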
gensim.models.fasttext_inner.train_batch_sg(model, sentences, alpha, _work, _l1)

Update the skip-gram model by training on a sequence of sentences.

Each sentence is a list of string tokens, which are looked up in the model's vocab dictionary. Called internally from train().

Parameters:
    model (FastText) – Model to be trained.
    sentences (iterable of list of str) – A single batch: part of the corpus streamed directly from disk/network.
    alpha (float) – Learning rate.
    _work (np.ndarray) – Private working memory for each worker.
    _l1 (np.ndarray) – Private working memory for each worker.

Returns:
    Effective number of words trained.

Return type:
    int
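The skip-gram entry point can be exercised the same way. As with the CBOW sketch above, the corpus and buffer sizing are illustrative assumptions.

import numpy as np
from gensim.models import FastText
from gensim.models.fasttext_inner import train_batch_sg

sentences = [["graph", "minors", "survey"], ["graph", "trees", "system"]]

model = FastText(vector_size=24, min_count=1, sg=1)  # sg=1 selects skip-gram
model.build_vocab(sentences)

work = np.zeros(model.vector_size, dtype=np.float32)  # per-worker scratch
l1 = np.zeros(model.vector_size, dtype=np.float32)    # per-worker scratch

tally = train_batch_sg(model, sentences, 0.025, work, l1)
print(tally)  # effective number of words trained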