models.word2vec_inner – Cython routines for training Word2Vec models

Optimized Cython functions for training the Word2Vec model.

gensim.models.word2vec_inner.init()

Precompute the function sigmoid(x) = 1 / (1 + exp(-x)) for x values discretized into the table EXP_TABLE. Also calculate log(sigmoid(x)) into LOG_TABLE.

Returns
Enumeration to signify the underlying data type returned by the BLAS dot product calculation. 0 signifies double, 1 signifies float, and 2 signifies that custom Cython loops were used instead of BLAS.
Return type
{0, 1, 2}
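
The module calls init() itself at import time to build the tables and probe BLAS, so in practice one only inspects the result. A minimal sketch, assuming the compiled extension is importable:

    from gensim.models.word2vec_inner import init

    # init() is safe to call again; it rebuilds the tables and returns the
    # same backend enumeration chosen at import time.
    backend = init()
    labels = {
        0: "BLAS dot products, double precision",
        1: "BLAS dot products, single precision (float)",
        2: "custom Cython loops (no usable BLAS)",
    }
    print("word2vec_inner backend:", labels[backend])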

gensim.models.word2vec_inner.score_sentence_cbow(model, sentence, _work, _neu1)

Obtain likelihood score for a single sentence in a fitted CBOW representation.

Notes
This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model must have been trained using the CBOW algorithm (model.sg == 0).

Parameters
model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the CBOW algorithm.
sentence (list of str) – The words comprising the sentence to be scored.
_work (np.ndarray) – Private working memory for each worker.
_neu1 (np.ndarray) – Private working memory for each worker.

Returns
The probability assigned to this sentence by the CBOW model.
Return type
float
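
In normal use this routine is not called directly; Word2Vec.score() allocates the working buffers and dispatches to it for models with hs=1 and sg=0. A minimal sketch, with an illustrative toy corpus and hyperparameters:

    from gensim.models import Word2Vec

    corpus = [["the", "cat", "sat"], ["the", "dog", "barked"]]
    # CBOW (sg=0) with hierarchical softmax (hs=1), as this scorer requires.
    model = Word2Vec(corpus, vector_size=50, min_count=1, sg=0, hs=1, negative=0)

    # score() returns one log-probability per sentence, computed by
    # score_sentence_cbow under the hood.
    print(model.score([["the", "cat", "barked"]], total_sentences=1))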

gensim.models.word2vec_inner.score_sentence_sg(model, sentence, _work)

Obtain likelihood score for a single sentence in a fitted skip-gram representation.

Notes
This scoring function is only implemented for hierarchical softmax (model.hs == 1). The model must have been trained using the skip-gram algorithm (model.sg == 1).

Parameters
model (Word2Vec) – The trained model. It MUST have been trained using hierarchical softmax and the skip-gram algorithm.
sentence (list of str) – The words comprising the sentence to be scored.
_work (np.ndarray) – Private working memory for each worker.

Returns
The probability assigned to this sentence by the skip-gram model.
Return type
float
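
The routine can also be invoked directly once a suitable model exists. A hedged sketch; the single-float scratch buffer mirrors how Word2Vec.score() allocates it for skip-gram scoring, which is an assumption of this sketch:

    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec_inner import score_sentence_sg

    corpus = [["the", "cat", "sat"], ["the", "dog", "barked"]]
    # Skip-gram (sg=1) with hierarchical softmax (hs=1), as this scorer requires.
    model = Word2Vec(corpus, vector_size=50, min_count=1, sg=1, hs=1, negative=0)

    # The score accumulates into _work[0], so a one-element float32 buffer
    # suffices (assumed buffer sizing).
    work = np.zeros(1, dtype=np.float32)
    print(score_sentence_sg(model, ["the", "dog", "sat"], work))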

gensim.models.word2vec_inner.train_batch_cbow(model, sentences, alpha, _work, _neu1, compute_loss)

Update CBOW model by training on a batch of sentences. Called internally from train().

Parameters
model (Word2Vec) – The Word2Vec model instance to train.
sentences (iterable of list of str) – The corpus used to train the model.
alpha (float) – The learning rate.
_work (np.ndarray) – Private working memory for each worker.
_neu1 (np.ndarray) – Private working memory for each worker.
compute_loss (bool) – Whether or not the training loss should be computed in this batch.

Returns
Number of words in the vocabulary actually used for training (i.e. words that were present in the vocabulary and were not discarded by frequent-word downsampling).
Return type
int
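
Rather than calling this batch routine by hand, training is normally driven through Word2Vec.train(), which allocates the per-worker _work and _neu1 buffers and feeds sentence batches to train_batch_cbow. A minimal sketch with an illustrative toy corpus:

    from gensim.models import Word2Vec

    corpus = [["the", "cat", "sat"], ["the", "dog", "barked"]]
    model = Word2Vec(vector_size=50, min_count=1, sg=0)  # sg=0 selects CBOW
    model.build_vocab(corpus)

    # train() gives each worker its own scratch arrays and calls
    # train_batch_cbow on successive batches; compute_loss=True is threaded
    # through to the batch routine.
    model.train(corpus, total_examples=model.corpus_count, epochs=5, compute_loss=True)
    print(model.get_latest_training_loss())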

gensim.models.word2vec_inner.train_batch_sg(model, sentences, alpha, _work, compute_loss)

Update skip-gram model by training on a batch of sentences. Called internally from train().

Parameters
model (Word2Vec) – The Word2Vec model instance to train.
sentences (iterable of list of str) – The corpus used to train the model.
alpha (float) – The learning rate.
_work (np.ndarray) – Private working memory for each worker.
compute_loss (bool) – Whether or not the training loss should be computed in this batch.

Returns
Number of words in the vocabulary actually used for training (i.e. words that were present in the vocabulary and were not discarded by frequent-word downsampling).
Return type
int
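
For experimentation the batch routine can be driven by hand. A hedged sketch: the scratch buffer sized to model.layer1_size mirrors what Word2Vec.train() allocates per worker, and the direct call itself is an assumption of this sketch rather than a supported public API:

    import numpy as np
    from gensim.models import Word2Vec
    from gensim.models.word2vec_inner import train_batch_sg

    corpus = [["the", "cat", "sat"], ["the", "dog", "barked"]]
    model = Word2Vec(corpus, vector_size=50, min_count=1, sg=1)

    # Per-worker scratch buffer, sized to the hidden layer (layer1_size equals
    # vector_size for Word2Vec), mirroring train()'s internal allocation.
    work = np.zeros(model.layer1_size, dtype=np.float32)

    effective = train_batch_sg(model, corpus, 0.025, work, compute_loss=False)
    print(effective)  # number of effective words trained on in this batch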