models.doc2vec_inner – Cython routines for training Doc2Vec models

Optimized Cython functions for training the Doc2Vec model.
gensim.models.doc2vec_inner.train_document_dbow(model, doc_words, doctag_indexes, alpha, work=None, train_words=False, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None)

Update the distributed bag of words model (“PV-DBOW”) by training on a single document.
Called internally from train() and infer_vector().
Parameters

model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (list of float, optional) – Updates to be performed on each neuron in the hidden layer of the underlying network.
train_words (bool, optional) – Word vectors will be updated exactly as per Word2Vec skip-gram training only if both learn_words and train_words are set to True.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Word vectors will be updated exactly as per Word2Vec skip-gram training only if both learn_words and train_words are set to True.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
word_locks (numpy.ndarray, optional) – A lock factor for each word vector: a value of 0.0 blocks updates completely, while a value of 1.0 allows the word vector to be updated fully.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctag_locks (numpy.ndarray, optional) – The lock factors for each tag, same as word_locks, but for document-vectors.
Returns

int – Number of words in the input document that were actually used for training.
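The lock-factor parameters above gate how strongly each vector is updated. As a rough NumPy sketch of the idea (not the actual Cython implementation; pv_dbow_step, sigmoid, and all shapes here are illustrative assumptions), a single PV-DBOW negative-sampling step for one document vector might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pv_dbow_step(doctag_vec, hidden, labels, alpha, lock=1.0, learn_hidden=True):
    """One negative-sampling update for a single document vector.

    doctag_vec : (d,)   document vector being trained
    hidden     : (k, d) output weights for the true word plus negative samples
    labels     : (k,)   1.0 for the true word, 0.0 for the negatives
    lock       : lock factor; 0.0 blocks the doctag update, 1.0 applies it fully
    """
    pred = sigmoid(hidden @ doctag_vec)       # (k,) predicted probabilities
    grad = (labels - pred) * alpha            # scaled error per output row
    neu1e = grad @ hidden                     # accumulated error for the input vector
    if learn_hidden:
        hidden += np.outer(grad, doctag_vec)  # update output-layer weights
    doctag_vec += lock * neu1e                # the lock factor scales the update
    return doctag_vec, hidden
```

A lock of 0.0 leaves the document vector untouched while still allowing the hidden weights to train, which mirrors how the lock arrays are described above.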
gensim.models.doc2vec_inner.train_document_dm(model, doc_words, doctag_indexes, alpha, work=None, neu1=None, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None)

Update the distributed memory model (“PV-DM”) by training on a single document. This method implements the DM model with a projection (input) layer that is either the sum or the mean of the context vectors, depending on the model’s dm_mean configuration field.
Called internally from train() and infer_vector().
Parameters

model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (np.ndarray, optional) – Private working memory for each worker.
neu1 (np.ndarray, optional) – Private working memory for each worker.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Whether the word vectors should be updated.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
word_locks (numpy.ndarray, optional) – A lock factor for each word vector: a value of 0.0 blocks updates completely, while a value of 1.0 allows the word vector to be updated fully.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctag_locks (numpy.ndarray, optional) – The lock factors for each tag, same as word_locks, but for document-vectors.
Returns

int – Number of words in the input document that were actually used for training.
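The sum-or-mean projection layer described above can be sketched in a few lines of NumPy (dm_projection is a hypothetical helper for illustration, not a gensim function; only the dm_mean switch mirrors the model's configuration field):

```python
import numpy as np

def dm_projection(context_vecs, doctag_vecs, dm_mean=True):
    """Build the PV-DM projection-layer input by combining the context
    word vectors and the doctag vectors: their mean when dm_mean is
    truthy, their plain sum otherwise."""
    stacked = np.vstack([context_vecs, doctag_vecs])  # (n_ctx + n_tags, d)
    combined = stacked.sum(axis=0)                    # (d,)
    return combined / len(stacked) if dm_mean else combined
```

Either way, the projection-layer input stays at vector_size, regardless of the window size, unlike the concatenating variant below.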
gensim.models.doc2vec_inner.train_document_dm_concat(model, doc_words, doctag_indexes, alpha, work=None, neu1=None, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None)

Update the distributed memory model (“PV-DM”) by training on a single document, using a concatenation of the context window word vectors (rather than a sum or average). This might be slower since the input at each batch will be significantly larger.
Called internally from train() and infer_vector().
Parameters

model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (np.ndarray, optional) – Private working memory for each worker.
neu1 (np.ndarray, optional) – Private working memory for each worker.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Whether the word vectors should be updated.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
word_locks (numpy.ndarray, optional) – A lock factor for each word vector: a value of 0.0 blocks updates completely, while a value of 1.0 allows the word vector to be updated fully.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctag_locks (numpy.ndarray, optional) – The lock factors for each tag, same as word_locks, but for document-vectors.
Returns

int – Number of words in the input document that were actually used for training.
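To see why the concatenating variant is costlier, note that it stacks the doctag and context vectors side by side, so the projection-layer width grows with the window size. A minimal NumPy sketch (dm_concat_input is a hypothetical name for illustration, not a gensim function):

```python
import numpy as np

def dm_concat_input(context_vecs, doctag_vecs):
    """Concatenate doctag vectors and context-window word vectors into one
    long input row; its length is (n_tags + n_ctx) * vector_size, which is
    why each batch is larger (and slower) than with sum/mean projection."""
    return np.concatenate(list(doctag_vecs) + list(context_vecs))
```

For one document tag and a window of four context words at vector_size 3, the projection input is 15-dimensional, versus a fixed 3 dimensions for the sum/mean variant.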