
models.fasttext_inner – Cython routines for training FastText models


Optimized Cython functions for training a FastText model.

The main entry points are train_batch_sg() and train_batch_cbow(). They may be called directly from Python code.

Notes

The implementation of the above functions heavily depends on the FastTextConfig struct defined in gensim/models/fasttext_inner.pxd.

The FAST_VERSION constant determines which flavor of BLAS we’re currently using:

  • 0: double precision

  • 1: single precision (float)

  • 2: no BLAS; use plain Cython loops instead
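The code-to-flavor mapping can be expressed as a small helper; this is a minimal sketch for illustration only (the helper name is hypothetical, not part of gensim's API):

```python
# Hypothetical helper: translate a FAST_VERSION code into a human-readable
# description of the BLAS flavor in use. The values mirror the list above.
def describe_fast_version(code):
    flavors = {
        0: "BLAS, double precision",
        1: "BLAS, single precision (float)",
        2: "no BLAS; plain Cython loops",
    }
    return flavors.get(code, "unknown")
```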

gensim.models.fasttext_inner.init()

Precompute the sigmoid function sigmoid(x) = 1 / (1 + exp(-x)) for x values discretized over a fixed range, storing the results in the table EXP_TABLE. Also precompute log(sigmoid(x)) into LOG_TABLE.

Returns

Enumeration signifying the underlying data type returned by the BLAS dot product calculation. 0 signifies double, 1 signifies float, and 2 signifies that custom Cython loops were used instead of BLAS.

Return type

{0, 1, 2}
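A pure-Python sketch of the table precomputation init() performs. The table size (1000) and clipping range (±6) follow the conventions of the original word2vec C implementation and are assumptions here, not values read from gensim's Cython source:

```python
import math

EXP_TABLE_SIZE = 1000  # assumed table resolution
MAX_EXP = 6            # sigmoid is effectively 0 or 1 outside [-6, 6]

EXP_TABLE = []   # EXP_TABLE[i] ~= sigmoid(x) for x on a grid over [-MAX_EXP, MAX_EXP)
LOG_TABLE = []   # LOG_TABLE[i] ~= log(sigmoid(x)) for the same grid

for i in range(EXP_TABLE_SIZE):
    # map table index i to a point x in [-MAX_EXP, MAX_EXP)
    x = (2.0 * i / EXP_TABLE_SIZE - 1.0) * MAX_EXP
    sig = 1.0 / (1.0 + math.exp(-x))
    EXP_TABLE.append(sig)
    LOG_TABLE.append(math.log(sig))
```

At training time, a sigmoid evaluation then reduces to an index computation and a table lookup, avoiding a call to exp() in the inner loop.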

gensim.models.fasttext_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1)

Update the CBOW model by training on a sequence of sentences.

Each sentence is a list of string tokens, which are looked up in the model’s vocab dictionary. Called internally from train().

Parameters
  • model (FastText) – Model to be trained.

  • sentences (iterable of list of str) – A single batch: part of the corpus streamed directly from disk/network.

  • alpha (float) – Learning rate.

  • _work (np.ndarray) – Private working memory for each worker.

  • _neu1 (np.ndarray) – Private working memory for each worker.

Returns

Effective number of words trained.

Return type

int
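The "effective number of words" returned above is, roughly, the count of tokens that survive the vocabulary lookup (out-of-vocabulary tokens are skipped). A simplified pure-Python sketch of that counting, assuming a plain set-like vocab in place of gensim's internal structures:

```python
def count_effective_words(sentences, vocab):
    """Count tokens that would actually be trained on: those present in the
    vocabulary. The real Cython code also applies frequency-based
    downsampling, which is omitted here for brevity."""
    effective = 0
    for sentence in sentences:
        for token in sentence:
            if token in vocab:
                effective += 1
    return effective
```

For example, with vocab = {"cat", "sat", "mat"} and the batch [["the", "cat", "sat"], ["on", "the", "mat"]], only three tokens are in-vocabulary, so three words are effectively trained.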

gensim.models.fasttext_inner.train_batch_sg(model, sentences, alpha, _work, _l1)

Update the skip-gram model by training on a sequence of sentences.

Each sentence is a list of string tokens, which are looked up in the model’s vocab dictionary. Called internally from train().

Parameters
  • model (FastText) – Model to be trained.

  • sentences (iterable of list of str) – A single batch: part of the corpus streamed directly from disk/network.

  • alpha (float) – Learning rate.

  • _work (np.ndarray) – Private working memory for each worker.

  • _l1 (np.ndarray) – Private working memory for each worker.

Returns

Effective number of words trained.

Return type

int
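Skip-gram differs from CBOW in how context is used: each word predicts its surrounding words individually, rather than being predicted from an averaged context. A pure-Python sketch of the (center, context) pair generation, assuming a fixed window size rather than the per-word reduced window gensim actually samples:

```python
def skipgram_pairs(sentence, window):
    """Yield (center, context) training pairs for one tokenized sentence."""
    pairs = []
    for i, center in enumerate(sentence):
        start = max(0, i - window)
        stop = min(len(sentence), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is never its own context
                pairs.append((center, sentence[j]))
    return pairs
```

Each such pair drives one gradient update in the inner training loop; the _work and _l1 buffers hold the per-pair intermediate results so workers never allocate inside the loop.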