models.doc2vec_inner – Cython routines for training Doc2Vec models
Optimized Cython functions for training a Doc2Vec model.
- gensim.models.doc2vec_inner.train_document_dbow(model, doc_words, doctag_indexes, alpha, work=None, train_words=False, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, words_lockf=None, doctag_vectors=None, doctags_lockf=None)
Update distributed bag of words model (“PV-DBOW”) by training on a single document.
Called internally from train() and infer_vector().
- Parameters
model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (list of float, optional) – Updates to be performed on each neuron in the hidden layer of the underlying network.
train_words (bool, optional) – Word vectors will be updated exactly as per Word2Vec skip-gram training only if both learn_words and train_words are set to True.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Word vectors will be updated exactly as per Word2Vec skip-gram training only if both learn_words and train_words are set to True.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
words_lockf (numpy.ndarray, optional) – EXPERIMENTAL. A learning lock factor for each word-vector; value 0.0 completely blocks updates, a value of 1.0 allows normal updates to word-vectors.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctags_lockf (numpy.ndarray, optional) – EXPERIMENTAL. The lock factors for each tag, same as words_lockf, but for document-vectors.
- Returns
Number of words in the input document that were actually used for training.
- Return type
int
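The flag and lock-factor parameters above can be illustrated with a minimal pure-Python sketch. It assumes that a lock factor acts as a plain multiplier on each vector update, which matches the documented endpoints (0.0 blocks, 1.0 allows normal updates) but is not taken from the Cython source. The helper names `word_vectors_trainable` and `apply_locked_update` are hypothetical and are not part of gensim.

```python
def word_vectors_trainable(train_words, learn_words):
    """Hypothetical helper: word vectors change only if both flags are True."""
    return train_words and learn_words


def apply_locked_update(vector, update, lock_factor):
    """Hypothetical helper: scale the update by the lock factor, then add it.

    Assumption: the lock factor is a simple multiplier on the update.
    """
    return [v + lock_factor * u for v, u in zip(vector, update)]


vector = [0.5, -0.25]
update = [0.1, 0.2]

# With the signature defaults (train_words=False, learn_words=True),
# word vectors are left untouched.
default_trains_words = word_vectors_trainable(train_words=False, learn_words=True)

# A lock factor of 0.0 leaves the vector unchanged; 1.0 applies the full update.
frozen = apply_locked_update(vector, update, lock_factor=0.0)
normal = apply_locked_update(vector, update, lock_factor=1.0)
```

Under this assumption, setting an entry of words_lockf or doctags_lockf to 0.0 freezes the corresponding vector while the rest of the model keeps training.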
- gensim.models.doc2vec_inner.train_document_dm(model, doc_words, doctag_indexes, alpha, work=None, neu1=None, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, words_lockf=None, doctag_vectors=None, doctags_lockf=None)
Update distributed memory model (“PV-DM”) by training on a single document. This method implements the DM model with a projection (input) layer that is either the sum or mean of the context vectors, depending on the model’s dm_mean configuration field.
Called internally from train() and infer_vector().
- Parameters
model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (np.ndarray, optional) – Private working memory for each worker.
neu1 (np.ndarray, optional) – Private working memory for each worker.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Whether the word vectors should be updated.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
words_lockf (numpy.ndarray, optional) – EXPERIMENTAL. A learning lock factor for each word-vector; value 0.0 completely blocks updates, a value of 1.0 allows normal updates to word-vectors.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctags_lockf (numpy.ndarray, optional) – EXPERIMENTAL. The lock factors for each tag, same as words_lockf, but for document-vectors.
- Returns
Number of words in the input document that were actually used for training.
- Return type
int
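The sum-versus-mean projection layer described for this function can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the Cython implementation: the helper `project_context` is hypothetical, and treating the doctag vector as one of the vectors being combined is an assumption made for the example.

```python
def project_context(vectors, use_mean):
    """Hypothetical helper: combine equal-length vectors into one input vector.

    use_mean=True averages the vectors; use_mean=False sums them,
    mirroring the two behaviours selected by the model's dm_mean field.
    """
    dim = len(vectors[0])
    summed = [sum(vec[i] for vec in vectors) for i in range(dim)]
    if use_mean:
        return [x / len(vectors) for x in summed]
    return summed


doctag_vec = [1.0, 2.0]
context_word_vecs = [[3.0, 4.0], [5.0, 6.0]]

# Assumption: the doctag vector is combined together with the word vectors.
inputs = [doctag_vec] + context_word_vecs

summed = project_context(inputs, use_mean=False)
averaged = project_context(inputs, use_mean=True)
```

In both cases the projection keeps the dimensionality of a single vector, regardless of how many context vectors are combined.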
- gensim.models.doc2vec_inner.train_document_dm_concat(model, doc_words, doctag_indexes, alpha, work=None, neu1=None, learn_doctags=True, learn_words=True, learn_hidden=True, word_vectors=None, words_lockf=None, doctag_vectors=None, doctags_lockf=None)
Update distributed memory model (“PV-DM”) by training on a single document, using a concatenation of the context window word vectors (rather than a sum or average). This will be slower since the input at each batch will be significantly larger.
Called internally from train() and infer_vector().
- Parameters
model (Doc2Vec) – The model to train.
doc_words (list of str) – The input document as a list of words to be used for training. Each word will be looked up in the model’s vocabulary.
doctag_indexes (list of int) – Indices into doctag_vectors used to obtain the tags of the document.
alpha (float) – Learning rate.
work (np.ndarray, optional) – Private working memory for each worker.
neu1 (np.ndarray, optional) – Private working memory for each worker.
learn_doctags (bool, optional) – Whether the tag vectors should be updated.
learn_words (bool, optional) – Whether the word vectors should be updated.
learn_hidden (bool, optional) – Whether or not the weights of the hidden layer will be updated.
word_vectors (numpy.ndarray, optional) – The vector representation for each word in the vocabulary. If None, these will be retrieved from the model.
words_lockf (numpy.ndarray, optional) – EXPERIMENTAL. A learning lock factor for each word-vector; value 0.0 completely blocks updates, a value of 1.0 allows normal updates to word-vectors.
doctag_vectors (numpy.ndarray, optional) – Vector representations of the tags. If None, these will be retrieved from the model.
doctags_lockf (numpy.ndarray, optional) – EXPERIMENTAL. The lock factors for each tag, same as words_lockf, but for document-vectors.
- Returns
Number of words in the input document that were actually used for training.
- Return type
int
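To see why concatenation makes the input larger than a sum or average, the following pure-Python sketch concatenates one doctag vector with the vectors of a context window. The helper `concat_input` is hypothetical, and the ordering used here (doctag vectors first, then window words) is an assumption made for illustration only.

```python
def concat_input(doctag_vecs, window_vecs):
    """Hypothetical helper: flatten all vectors into one long input vector."""
    flat = []
    for vec in doctag_vecs + window_vecs:
        flat.extend(vec)
    return flat


vector_size = 2
doctag_vecs = [[1.0, 2.0]]  # one tag for the document
window_vecs = [             # four context words around the target
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 8.0],
    [9.0, 10.0],
]

concatenated = concat_input(doctag_vecs, window_vecs)

# The input length grows with the number of vectors, whereas a sum or
# mean would stay at vector_size.
input_length = len(concatenated)
expected_length = (len(doctag_vecs) + len(window_vecs)) * vector_size
```

With these toy values the concatenated input has five times as many entries as a summed or averaged one, which is the source of the extra cost mentioned in the description.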