
models.word2vec_inner – Cython routines for training Word2Vec models

Optimized Cython functions for training the Word2Vec model.

gensim.models.word2vec_inner.init()
Precompute the function sigmoid(x) = 1 / (1 + exp(-x)) for x values discretized into the table EXP_TABLE, and also precompute log(sigmoid(x)) into LOG_TABLE.
Returns: Enumeration signifying the underlying data type returned by the BLAS dot product calculation. 0 signifies double precision, 1 signifies single precision (float), and 2 signifies that custom Cython loops were used instead of BLAS.
Return type: {0, 1, 2}
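
As a quick sketch (the module typically runs init() once at import time, so this is purely illustrative), the return value can be used to check which BLAS path was detected:

    from gensim.models import word2vec_inner

    # init() fills EXP_TABLE / LOG_TABLE and probes the BLAS dot product.
    # It runs at import time; calling it again is harmless.
    blas_mode = word2vec_inner.init()

    if blas_mode in (0, 1):
        precision = "double" if blas_mode == 0 else "single"
        print("BLAS dot product in use (%s precision)" % precision)
    else:
        print("Custom Cython loops in use instead of BLAS")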
gensim.models.word2vec_inner.score_sentence_cbow(model, sentence, _work, _neu1)

Obtain likelihood score for a single sentence in a fitted CBOW representation.

Notes

This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model should have been trained using the CBOW algorithm (model.sg == 0).

Parameters:
  • model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the CBOW algorithm.
  • sentence (list of str) – The words comprising the sentence to be scored.
  • _work (np.ndarray) – Private working memory for each worker.
  • _neu1 (np.ndarray) – Private working memory for each worker.
Returns:

The likelihood score assigned to this sentence by the CBOW model.

Return type:

float
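
This routine is normally reached through the public Word2Vec.score() method, which allocates the _work and _neu1 buffers itself. A minimal sketch (keyword names follow gensim 4.x; older releases use size instead of vector_size):

    from gensim.models import Word2Vec

    sentences = [
        ["human", "interface", "computer"],
        ["survey", "user", "computer", "system", "response", "time"],
    ]

    # Scoring requires hierarchical softmax (hs=1); sg=0 selects CBOW.
    model = Word2Vec(sentences, vector_size=50, min_count=1, hs=1, negative=0, sg=0)

    # score() dispatches each sentence to score_sentence_cbow() internally.
    log_scores = model.score(sentences, total_sentences=len(sentences))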

gensim.models.word2vec_inner.score_sentence_sg(model, sentence, _work)

Obtain likelihood score for a single sentence in a fitted skip-gram representation.

Notes

This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model should have been trained using the skip-gram algorithm (model.sg == 1).

Parameters:
  • model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the skip-gram algorithm.
  • sentence (list of str) – The words comprising the sentence to be scored.
  • _work (np.ndarray) – Private working memory for each worker.
Returns:

The likelihood score assigned to this sentence by the skip-gram model.

Return type:

float
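
The skip-gram variant is reached the same way; the only difference is that the model must have been trained with sg=1 (a sketch, with the same version caveats as above):

    from gensim.models import Word2Vec

    sentences = [["graph", "minors", "survey"], ["graph", "trees", "paths"]]

    # hs=1 is required for scoring; sg=1 selects skip-gram.
    model = Word2Vec(sentences, vector_size=50, min_count=1, hs=1, negative=0, sg=1)

    # score() hands each sentence, plus a per-worker _work buffer, to score_sentence_sg().
    log_scores = model.score(sentences, total_sentences=len(sentences))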

gensim.models.word2vec_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1, compute_loss)

Update CBOW model by training on a batch of sentences.

Called internally from Word2Vec.train().

Parameters:
  • model (Word2Vec) – The Word2Vec model instance to train.
  • sentences (iterable of list of str) – The corpus used to train the model.
  • alpha (float) – The learning rate.
  • _work (np.ndarray) – Private working memory for each worker.
  • _neu1 (np.ndarray) – Private working memory for each worker.
  • compute_loss (bool) – Whether or not the training loss should be computed in this batch.
Returns:

Number of words in the vocabulary actually used for training (they exist in the vocabulary and were not discarded by downsampling of frequent words).

Return type:

int
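
In normal use this function is not called directly; Word2Vec.train() batches the corpus and supplies the worker buffers. A sketch of the usual entry point:

    from gensim.models import Word2Vec

    sentences = [["human", "interface", "computer"],
                 ["survey", "user", "computer", "system"]]

    model = Word2Vec(vector_size=50, min_count=1, sg=0, compute_loss=True)
    model.build_vocab(sentences)

    # train() splits the corpus into batches and passes each batch, together
    # with per-worker _work/_neu1 buffers, to train_batch_cbow().
    model.train(sentences, total_examples=model.corpus_count, epochs=5)
    print(model.get_latest_training_loss())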

gensim.models.word2vec_inner.train_batch_sg(model, sentences, alpha, _work, compute_loss)

Update skip-gram model by training on a batch of sentences.

Called internally from Word2Vec.train().

Parameters:
  • model (Word2Vec) – The Word2Vec model instance to train.
  • sentences (iterable of list of str) – The corpus used to train the model.
  • alpha (float) – The learning rate.
  • _work (np.ndarray) – Private working memory for each worker.
  • compute_loss (bool) – Whether or not the training loss should be computed in this batch.
Returns:

Number of words in the vocabulary actually used for training (they exist in the vocabulary and were not discarded by downsampling of frequent words).

Return type:

int
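
For illustration, a direct call is also possible once the vocabulary is built. This sketch assumes the scratch buffer is a float32 array of model.layer1_size elements, mirroring what Word2Vec.train() allocates per worker thread; treat it as an approximation rather than a supported entry point:

    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec_inner import train_batch_sg

    sentences = [["graph", "minors", "survey"], ["graph", "trees", "paths"]]

    model = Word2Vec(vector_size=50, min_count=1, sg=1, compute_loss=True)
    model.build_vocab(sentences)

    # Per-worker scratch memory, normally allocated inside Word2Vec.train().
    work = np.zeros(model.layer1_size, dtype=np.float32)

    effective_words = train_batch_sg(model, sentences, model.alpha, work, True)
    print(effective_words, model.get_latest_training_loss())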