models.doc2vec_inner – Cython routines for training Doc2Vec models

Optimized Cython functions for training the Doc2Vec model.
gensim.models.doc2vec_inner.train_document_dbow(model, doc_words, doctag_indexes, alpha, work=None, train_words=False, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None)

Update the distributed bag of words model (“PV-DBOW”) by training on a single document.
Called internally from train() and infer_vector().
Parameters

model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (list of float, optional) – Updates to be performed on each neuron in the hidden layer of the underlying network.
train_words (bool, optional) – Word vectors will be updated exactly as per Word2Vec skip-gram training only if both learn_words and train_words are set to True.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Word vectors will be updated exactly as per Word2Vec skip-gram training only if both learn_words and train_words are set to True.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
word_locks (numpy.ndarray, optional) – A lock factor for each word vector: a value of 0.0 blocks updates completely, while a value of 1.0 allows the word vector to be updated fully.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctag_locks (numpy.ndarray, optional) – The lock factors for each tag, same as word_locks, but for document-vectors.
Returns

int – Number of words in the input document that were actually used for training.
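The lock-factor parameters above gate how strongly each vector is updated. As a rough NumPy sketch of the idea (not the actual Cython implementation; pv_dbow_step, sigmoid, and all shapes here are illustrative assumptions), a single PV-DBOW negative-sampling step for one document vector might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pv_dbow_step(doctag_vec, hidden, labels, alpha, lock=1.0, learn_hidden=True):
    """One negative-sampling update for a single document vector.

    doctag_vec : (d,)   document vector being trained
    hidden     : (k, d) output weights for the true word plus negative samples
    labels     : (k,)   1.0 for the true word, 0.0 for the negatives
    lock       : lock factor; 0.0 blocks the doctag update, 1.0 applies it fully
    """
    pred = sigmoid(hidden @ doctag_vec)       # (k,) predicted probabilities
    grad = (labels - pred) * alpha            # scaled error per output row
    neu1e = grad @ hidden                     # accumulated error for the input vector
    if learn_hidden:
        hidden += np.outer(grad, doctag_vec)  # update output-layer weights
    doctag_vec += lock * neu1e                # the lock factor scales the update
    return doctag_vec, hidden
```

A lock of 0.0 leaves the document vector untouched while still allowing the hidden weights to train, which mirrors how the lock arrays are described above.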
gensim.models.doc2vec_inner.train_document_dm(model, doc_words, doctag_indexes, alpha, work=None, neu1=None, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None)

Update the distributed memory model (“PV-DM”) by training on a single document. This method implements the DM model with a projection (input) layer that is either the sum or the mean of the context vectors, depending on the model’s dm_mean configuration field.
Called internally from train() and infer_vector().
Parameters

model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (np.ndarray, optional) – Private working memory for each worker.
neu1 (np.ndarray, optional) – Private working memory for each worker.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Whether the word vectors should be updated.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
word_locks (numpy.ndarray, optional) – A lock factor for each word vector: a value of 0.0 blocks updates completely, while a value of 1.0 allows the word vector to be updated fully.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctag_locks (numpy.ndarray, optional) – The lock factors for each tag, same as word_locks, but for document-vectors.
Returns

int – Number of words in the input document that were actually used for training.
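The sum-or-mean projection layer described above can be sketched in a few lines of NumPy (dm_projection is a hypothetical helper for illustration, not a gensim function; only the dm_mean switch mirrors the model's configuration field):

```python
import numpy as np

def dm_projection(context_vecs, doctag_vecs, dm_mean=True):
    """Build the PV-DM projection-layer input by combining the context
    word vectors and the doctag vectors: their mean when dm_mean is
    truthy, their plain sum otherwise."""
    stacked = np.vstack([context_vecs, doctag_vecs])  # (n_ctx + n_tags, d)
    combined = stacked.sum(axis=0)                    # (d,)
    return combined / len(stacked) if dm_mean else combined
```

Either way, the projection-layer input stays at vector_size, regardless of the window size, unlike the concatenating variant below.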
gensim.models.doc2vec_inner.train_document_dm_concat(model, doc_words, doctag_indexes, alpha, work=None, neu1=None, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None)

Update the distributed memory model (“PV-DM”) by training on a single document, using a concatenation of the context window word vectors (rather than a sum or average). This might be slower since the input at each batch will be significantly larger.
Called internally from train() and infer_vector().
Parameters

model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (np.ndarray, optional) – Private working memory for each worker.
neu1 (np.ndarray, optional) – Private working memory for each worker.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Whether the word vectors should be updated.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
word_locks (numpy.ndarray, optional) – A lock factor for each word vector: a value of 0.0 blocks updates completely, while a value of 1.0 allows the word vector to be updated fully.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctag_locks (numpy.ndarray, optional) – The lock factors for each tag, same as word_locks, but for document-vectors.
Returns

int – Number of words in the input document that were actually used for training.
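To see why the concatenating variant is costlier, note that it stacks the doctag and context vectors side by side, so the projection-layer width grows with the window size. A minimal NumPy sketch (dm_concat_input is a hypothetical name for illustration, not a gensim function):

```python
import numpy as np

def dm_concat_input(context_vecs, doctag_vecs):
    """Concatenate doctag vectors and context-window word vectors into one
    long input row; its length is (n_tags + n_ctx) * vector_size, which is
    why each batch is larger (and slower) than with sum/mean projection."""
    return np.concatenate(list(doctag_vecs) + list(context_vecs))
```

For one document tag and a window of four context words at vector_size 3, the projection input is 15-dimensional, versus a fixed 3 dimensions for the sum/mean variant.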