models.poincare – Train and use Poincare embeddings

Python implementation of Poincaré Embeddings.

These embeddings are better at capturing latent hierarchical information than traditional Euclidean embeddings. The method is described in detail in Maximilian Nickel, Douwe Kiela - “Poincaré Embeddings for Learning Hierarchical Representations”.

The main use-case is to automatically learn hierarchical representations of nodes from a tree-like structure, such as a Directed Acyclic Graph (DAG), using a transitive closure of the relations. Representations of nodes in a symmetric graph can also be learned.
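To illustrate the transitive closure mentioned above, the sketch below expands direct (child, parent) pairs into every (node, ancestor) pair they imply. The helper name `transitive_closure` is hypothetical and is not part of gensim; it assumes the relations are acyclic.

```python
from collections import defaultdict


def transitive_closure(relations):
    """Return all (node, ancestor) pairs implied by direct (child, parent) pairs.

    Hypothetical helper for illustration; assumes the relations form a DAG.
    """
    parents = defaultdict(set)
    for child, parent in relations:
        parents[child].add(parent)

    closure = set()
    for node in list(parents):
        # Depth-first walk over everything reachable from `node`.
        stack = list(parents[node])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            closure.add((node, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return closure


direct = [('kangaroo', 'marsupial'), ('marsupial', 'mammal')]
closed = transitive_closure(direct)
print(sorted(closed))
```

The closed set adds the implied ('kangaroo', 'mammal') pair to the two direct relations.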

This module allows training Poincaré Embeddings from a training file containing the relations of a graph in a csv-like format, or from a Python iterable of relations.

Examples

Initialize and train a model from a list

>>> from gensim.models.poincare import PoincareModel
>>> relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
>>> model = PoincareModel(relations, negative=2)
>>> model.train(epochs=50)

Initialize and train a model from a file containing one relation per line

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>> file_path = datapath('poincare_hypernyms.tsv')
>>> model = PoincareModel(PoincareRelations(file_path), negative=2)
>>> model.train(epochs=50)

class gensim.models.poincare.LexicalEntailmentEvaluation(filepath)

Bases: object

Evaluate lexical entailment on the HyperLex dataset for any embedding.

Initialize evaluation instance with HyperLex text file containing relation pairs.

Parameters

filepath (str) – Path to HyperLex text file.

static create_vocab_trie(embedding)

Create trie with vocab terms of the given embedding to enable quick prefix searches.

Parameters

embedding (PoincareKeyedVectors) – Embedding for which trie is to be created.

Returns

Trie containing vocab terms of the input embedding.

Return type

pygtrie.Trie

evaluate_spearman(embedding)

Evaluate Spearman scores for lexical entailment for the given embedding.

Parameters

embedding (PoincareKeyedVectors) – Embedding for which evaluation is to be done.

Returns

Spearman correlation score for the task for input embedding.

Return type

float

static find_matching_terms(trie, word)

Find terms in the trie beginning with the word.

Parameters
  • trie (pygtrie.Trie) – Trie to use for finding matching terms.

  • word (str) – Input word to use for prefix search.

Returns

List of matching terms.

Return type

list of str

score_function(embedding, trie, term_1, term_2)

Compute predicted score - extent to which term_1 is a type of term_2.

Parameters
  • embedding (PoincareKeyedVectors) – Embedding to use for computing predicted score.

  • trie (pygtrie.Trie) – Trie to use for finding matching vocab terms for input terms.

  • term_1 (str) – Input term.

  • term_2 (str) – Input term.

Returns

Predicted score (the extent to which term_1 is a type of term_2).

Return type

float

class gensim.models.poincare.LinkPredictionEvaluation(train_path, test_path, embedding)

Bases: object

Evaluate link prediction on given network for given embedding.

Initialize evaluation instance with tsv file containing relation pairs and embedding to be evaluated.

Parameters
  • train_path (str) – Path to tsv file containing relation pairs used for training.

  • test_path (str) – Path to tsv file containing relation pairs to evaluate.

  • embedding (PoincareKeyedVectors) – Embedding to be evaluated.

evaluate(max_n=None)

Evaluate all defined metrics for the link prediction task.

Parameters

max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None.

Returns

(metric_name, metric_value) pairs, e.g. {'mean_rank': 50.3, 'MAP': 0.31}.

Return type

dict of (str, float)

evaluate_mean_rank_and_map(max_n=None)

Evaluate mean rank and MAP for link prediction.

Parameters

max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None.

Returns

(mean_rank, MAP), e.g. (50.3, 0.31).

Return type

tuple (float, float)

static get_unknown_relation_ranks_and_avg_prec(all_distances, unknown_relations, known_relations)

Compute ranks and Average Precision of unknown positive relations.

Parameters
  • all_distances (numpy.array of float) – Array of all distances for a specific item.

  • unknown_relations (list of int) – List of indices of unknown positive relations.

  • known_relations (list of int) – List of indices of known positive relations.

Returns

The list contains ranks of the unknown positive relations, in the same order as unknown_relations. The float is the Average Precision of the ranking, e.g. ([1, 2, 3, 20], 0.610).

Return type

tuple (list of int, float)

class gensim.models.poincare.NegativesBuffer(items)

Bases: object

Buffer and return negative samples.

Initialize instance from list or numpy array of samples.

Parameters

items (list/numpy.array) – List or array containing negative samples.

get_items(num_items)

Get the next num_items from buffer.

Parameters

num_items (int) – Number of items to fetch.

Returns

Slice containing num_items items from the original data.

Return type

numpy.array or list

Notes

No error is raised if fewer than num_items items remain; all the remaining items are returned.

num_items()

Get the number of items remaining in the buffer.

Returns

Number of items in the buffer that haven’t been consumed yet.

Return type

int
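The documented behaviour of get_items() and num_items() can be sketched with a small stand-in class. `BufferSketch` is a hypothetical name and this is not gensim's implementation; it only mirrors what the descriptions above say, including returning a short slice when the buffer runs out.

```python
class BufferSketch:
    """Hypothetical stand-in mirroring the documented buffer behaviour."""

    def __init__(self, items):
        self._items = items
        self._current_index = 0

    def num_items(self):
        # Items not yet handed out.
        return len(self._items) - self._current_index

    def get_items(self, num_items):
        start = self._current_index
        end = start + num_items
        self._current_index = min(end, len(self._items))
        # Slicing past the end simply returns what is left.
        return self._items[start:end]


buffer = BufferSketch([3, 1, 4, 1, 5])
first = buffer.get_items(3)
second = buffer.get_items(3)
print(first, second, buffer.num_items())
```

The second call asks for three items but receives only the two that remain, without raising an error.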

class gensim.models.poincare.PoincareBatch(vectors_u, vectors_v, indices_u, indices_v, regularization_coeff=1.0)

Bases: object

Compute Poincare distances, gradients and loss for a training batch.

Store intermediate state to avoid recomputing multiple times.

Initialize instance with sets of vectors for which distances are to be computed.

Parameters
  • vectors_u (numpy.array) – Vectors of all nodes u in the batch. Expected shape (batch_size, dim).

  • vectors_v (numpy.array) – Vectors of all positively related nodes v and negatively sampled nodes v’, for each node u in the batch. Expected shape (1 + neg_size, dim, batch_size).

  • indices_u (list of int) – List of node indices for each of the vectors in vectors_u.

  • indices_v (list of lists of int) – Nested list of lists, each of which is a list of node indices for each of the vectors in vectors_v for a specific node u.

  • regularization_coeff (float, optional) – Coefficient to use for l2-regularization.

compute_all()

Convenience method to perform all computations.

compute_distance_gradients()

Compute and store partial derivatives of poincare distance d(u, v) w.r.t all u and all v.

compute_distances()

Compute and store norms, euclidean distances and poincare distances between input vectors.

compute_gradients()

Compute and store gradients of loss function for all input vectors.

compute_loss()

Compute and store loss value for the given batch of examples.
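As a rough illustration of the kind of loss involved, the sketch below follows the softmax-style objective described in the Nickel and Kiela paper for a single node u: the positive example competes against the negative samples through exp(-distance). The function name `batch_example_loss` is hypothetical, the regularization term is omitted, and whether this matches the library's stored value in every detail is an assumption.

```python
import math


def batch_example_loss(distances):
    """Loss for one node u, where distances[0] is d(u, v) for the positive
    example and the rest are d(u, v') for the negative samples."""
    exp_neg = [math.exp(-d) for d in distances]
    return -math.log(exp_neg[0] / sum(exp_neg))


close_positive = batch_example_loss([0.5, 2.0, 3.0])
far_positive = batch_example_loss([3.0, 2.0, 0.5])
print(close_positive, far_positive)
```

The loss is smaller when the positive example is closer to u than the negatives are, which is the behaviour training tries to encourage.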

class gensim.models.poincare.PoincareKeyedVectors(vector_size)

Bases: gensim.models.keyedvectors.BaseKeyedVectors

Vectors and vocab for the PoincareModel training class.

Used to perform operations on the vectors such as vector lookup, distance calculations etc.

__contains__(entity)
__getitem__(entities)

Get vector representation of entities.

Parameters

entities ({str, list of str}) – Input entity/entities.

Returns

Vector representation for entities (1D if entities is string, otherwise - 2D).

Return type

numpy.ndarray

add(entities, weights, replace=False)

Append entities and their vectors manually. If an entity is already in the vocabulary, the old vector is kept unless the replace flag is True.

Parameters
  • entities (list of str) – Entities specified by string ids.

  • weights (list of numpy.ndarray or numpy.ndarray) – List of 1D np.array vectors or a 2D np.array of vectors.

  • replace (bool, optional) – Flag indicating whether to replace vectors for entities which already exist in the vocabulary, if True - replace vectors, otherwise - keep old vectors.

ancestors(node)

Get the list of recursively closest parents from the given node.

Parameters

node ({str, int}) – Key for node for which ancestors are to be found.

Returns

Ancestor nodes of the node node.

Return type

list of str
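The phrase "recursively closest parents" can be read as repeatedly following the closest-parent link until none is left. The sketch below shows that reading with a toy lookup table; `ancestors_sketch` and `toy_parents` are hypothetical names, and the real method works from the trained embedding rather than a dictionary.

```python
def ancestors_sketch(node, closest_parent):
    """Follow closest-parent links until a node has no parent.

    `closest_parent` is a hypothetical callable returning a node key or None.
    """
    chain = []
    parent = closest_parent(node)
    while parent is not None:
        chain.append(parent)
        parent = closest_parent(parent)
    return chain


toy_parents = {'kangaroo': 'marsupial', 'marsupial': 'mammal'}
kangaroo_ancestors = ancestors_sketch('kangaroo', toy_parents.get)
print(kangaroo_ancestors)
```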

closer_than(entity1, entity2)

Get all entities that are closer to entity1 than entity2 is to entity1.

closest_child(node)

Get the node closest to node that is lower in the hierarchy than node.

Parameters

node ({str, int}) – Key for node for which closest child is to be found.

Returns

Node closest to node that is lower in the hierarchy than node. If there are no nodes lower in the hierarchy, None is returned.

Return type

{str, None}

closest_parent(node)

Get the node closest to node that is higher in the hierarchy than node.

Parameters

node ({str, int}) – Key for node for which closest parent is to be found.

Returns

Node closest to node that is higher in the hierarchy than node. If there are no nodes higher in the hierarchy, None is returned.

Return type

{str, None}

descendants(node, max_depth=5)

Get the list of recursively closest children from the given node, up to a max depth of max_depth.

Parameters
  • node ({str, int}) – Key for node for which descendants are to be found.

  • max_depth (int) – Maximum number of descendants to return.

Returns

Descendant nodes from the node node.

Return type

list of str

difference_in_hierarchy(node_or_vector_1, node_or_vector_2)

Compute relative position in hierarchy of node_or_vector_1 relative to node_or_vector_2. A positive value indicates node_or_vector_1 is higher in the hierarchy than node_or_vector_2.

Parameters
  • node_or_vector_1 ({str, int, numpy.array}) – Input node key or vector.

  • node_or_vector_2 ({str, int, numpy.array}) – Input node key or vector.

Returns

Relative position in hierarchy of node_or_vector_1 relative to node_or_vector_2.

Return type

float

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> model.kv.difference_in_hierarchy('mammal.n.01', 'dog.n.01')
0.05382517902410999

>>> model.kv.difference_in_hierarchy('dog.n.01', 'mammal.n.01')
-0.05382517902410999

Notes

The returned value can be positive or negative, depending on whether node_or_vector_1 is higher or lower in the hierarchy than node_or_vector_2.
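Since a lower norm indicates a higher position in the hierarchy, one way to obtain a signed difference with the documented sign convention is to subtract the norms. The sketch below assumes that reading; the function names and the toy vectors are hypothetical.

```python
import math


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def difference_in_hierarchy_sketch(vector_1, vector_2):
    # Positive when vector_1 has the smaller norm, i.e. sits nearer the origin.
    return norm(vector_2) - norm(vector_1)


general = [0.1, 0.0]    # near the origin: higher in the hierarchy
specific = [0.6, 0.3]   # nearer the boundary: lower in the hierarchy
up = difference_in_hierarchy_sketch(general, specific)
down = difference_in_hierarchy_sketch(specific, general)
print(up, down)
```

Swapping the arguments flips the sign and keeps the magnitude, as in the example outputs above.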

distance(w1, w2)

Calculate Poincare distance between vectors for nodes w1 and w2.

Parameters
  • w1 ({str, int}) – Key for first node.

  • w2 ({str, int}) – Key for second node.

Returns

Poincare distance between the vectors for nodes w1 and w2.

Return type

float

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # What is the distance between the words 'mammal' and 'carnivore'?
>>> model.kv.distance('mammal.n.01', 'carnivore.n.01')
2.9742298803339304

Raises

KeyError – If either of w1 and w2 is absent from vocab.

distances(node_or_vector, other_nodes=())

Compute Poincare distances from given node_or_vector to all nodes in other_nodes. If other_nodes is empty, return distance between node_or_vector and all nodes in vocab.

Parameters
  • node_or_vector ({str, int, numpy.array}) – Node key or vector from which distances are to be computed.

  • other_nodes ({iterable of str, iterable of int, None}, optional) – For each node in other_nodes distance from node_or_vector is computed. If None or empty, distance of node_or_vector from all nodes in vocab is computed (including itself).

Returns

Array containing distances to all nodes in other_nodes from input node_or_vector, in the same order as other_nodes.

Return type

numpy.array

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Check the distances between a word and a list of other words.
>>> model.kv.distances('mammal.n.01', ['carnivore.n.01', 'dog.n.01'])
array([2.97422988, 2.83007402])

>>> # Check the distances between a word and every other word in the vocab.
>>> all_distances = model.kv.distances('mammal.n.01')

Raises

KeyError – If either node_or_vector or any node in other_nodes is absent from vocab.

get_vector(entity)

Get the entity’s representations in vector space, as a 1D numpy array.

Parameters

entity (str) – Identifier of the entity to return the vector for.

Returns

Vector for the specified entity.

Return type

numpy.ndarray

Raises

KeyError – If the given entity identifier doesn’t exist.

property index2entity

classmethod load(fname_or_handle, **kwargs)

Load an object previously saved using save() from a file.

Parameters
  • fname (str) – Path to file that contains needed object.

  • mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.

See also

save()

Save object to file.

Returns

Object loaded from fname.

Return type

object

Raises

AttributeError – When called on an object instance instead of class (this is a class method).

classmethod load_word2vec_format(fname, fvocab=None, binary=False, encoding='utf8', unicode_errors='strict', limit=None, datatype=<class 'numpy.float32'>)

Load the input-hidden weight matrix from the original C word2vec-tool format. Use _load_word2vec_format().

Note that the information stored in the file is incomplete (the binary tree is missing), so while you can query for word similarity etc., you cannot continue training with a model loaded this way.

Parameters
  • fname (str) – The file path to the saved word2vec-format file.

  • fvocab (str, optional) – File path to the vocabulary. Word counts are read from the fvocab file, if set (this is the file generated by the -save-vocab flag of the original C tool).

  • binary (bool, optional) – If True, the data is in binary word2vec format.

  • encoding (str, optional) – If you trained the C model using non-utf8 encoding for words, specify that encoding in encoding.

  • unicode_errors (str, optional) – default ‘strict’, is a string suitable to be passed as the errors argument to the unicode() (Python 2.x) or str() (Python 3.x) function. If your source file may include word tokens truncated in the middle of a multibyte unicode character (as is common from the original word2vec.c tool), ‘ignore’ or ‘replace’ may help.

  • limit (int, optional) – Sets a maximum number of word-vectors to read from the file. The default, None, means read all.

  • datatype (type, optional) – (Experimental) Can coerce dimensions to a non-default float type (such as np.float16) to save memory. Such types may result in much slower bulk operations or incompatibility with optimized routines.

Returns

Loaded Poincare keyed vectors.

Return type

PoincareKeyedVectors

most_similar(node_or_vector, topn=10, restrict_vocab=None)

Find the top-N most similar nodes to the given node or vector, sorted in increasing order of distance.

Parameters
  • node_or_vector ({str, int, numpy.array}) – node key or vector for which similar nodes are to be found.

  • topn (int or None, optional) – Number of top-N similar nodes to return, when topn is int. When topn is None, distances for all nodes are returned.

  • restrict_vocab (int or None, optional) – Optional integer which limits the range of vectors which are searched for most-similar values. For example, restrict_vocab=10000 would only check the first 10000 node vectors in the vocabulary order. This may be meaningful if vocabulary is sorted by descending frequency.

Returns

When topn is int, a sequence of (node, distance) is returned in increasing order of distance. When topn is None, distances for all nodes are returned as a one-dimensional numpy array with the size of the vocabulary.

Return type

list of (str, float) or numpy.array

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Which words are most similar to 'kangaroo'?
>>> model.kv.most_similar('kangaroo.n.01', topn=2)
[(u'kangaroo.n.01', 0.0), (u'marsupial.n.01', 0.26524229460827725)]

most_similar_to_given(entity1, entities_list)

Get the entity from entities_list most similar to entity1.

norm(node_or_vector)

Compute absolute position in hierarchy of input node or vector. Values range between 0 and 1. A lower value indicates the input node or vector is higher in the hierarchy.

Parameters

node_or_vector ({str, int, numpy.array}) – Input node key or vector for which position in hierarchy is to be returned.

Returns

Absolute position in the hierarchy of the input vector or node.

Return type

float

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Get the norm of the embedding of the word `mammal`.
>>> model.kv.norm('mammal.n.01')
0.6423008703542398

Notes

The position in hierarchy is based on the norm of the vector for the node.

rank(entity1, entity2)

Rank of the distance of entity2 from entity1, in relation to distances of all entities from entity1.
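The relationship between closer_than() and rank() can be sketched with a toy table of distances from entity1. The function names, the dictionary layout and the distance values below are all hypothetical; the sketch assumes the rank is one more than the number of entities that are strictly closer.

```python
def closer_than_sketch(distances_from_entity1, entity2):
    """`distances_from_entity1` is a hypothetical dict mapping each other
    entity to its distance from entity1 (entity1 itself excluded)."""
    threshold = distances_from_entity1[entity2]
    return [e for e, d in distances_from_entity1.items() if d < threshold]


def rank_sketch(distances_from_entity1, entity2):
    # One more than the number of entities that are strictly closer.
    return len(closer_than_sketch(distances_from_entity1, entity2)) + 1


# Made-up distances from 'kangaroo', for illustration only.
toy = {'marsupial': 0.27, 'phalanger': 0.40, 'metatherian': 0.55, 'mammal': 0.90}
closer = closer_than_sketch(toy, 'metatherian')
position = rank_sketch(toy, 'metatherian')
print(closer, position)
```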

save(fname_or_handle, **kwargs)

Save the object to a file.

Parameters
  • fname_or_handle (str or file-like) – Path to output file or already opened file-like object. If the object is a file handle, no special array handling will be performed, all attributes will be saved to the same file.

  • separately (list of str or None, optional) –

    If None, automatically detect large numpy/scipy.sparse arrays in the object being stored, and store them into separate files. This prevents memory errors for large objects, and also allows memory-mapping the large arrays for efficient loading and sharing the large arrays in RAM between multiple processes.

    If list of str: store these attributes into separate files. The automated size check is not performed in this case.

  • sep_limit (int, optional) – Don’t store arrays smaller than this separately. In bytes.

  • ignore (frozenset of str, optional) – Attributes that shouldn’t be stored at all.

  • pickle_protocol (int, optional) – Protocol number for pickle.

See also

load()

Load object from file.

save_word2vec_format(fname, fvocab=None, binary=False, total_vec=None)

Store the input-hidden weight matrix in the same format used by the original C word2vec-tool, for compatibility, using _save_word2vec_format().

Parameters
  • fname (str) – Path to file that will be used for storing.

  • fvocab (str, optional) – File path used to save the vocabulary.

  • binary (bool, optional) – If True, the data will be saved in binary word2vec format, else it will be saved in plain text.

  • total_vec (int, optional) – Explicitly specify total number of vectors (in case word vectors are appended with document vectors afterwards).

similarity(w1, w2)

Compute similarity based on Poincare distance between vectors for nodes w1 and w2.

Parameters
  • w1 ({str, int}) – Key for first node.

  • w2 ({str, int}) – Key for second node.

Returns

Similarity between the vectors for nodes w1 and w2 (between 0 and 1).

Return type

float

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # What is the similarity between the words 'mammal' and 'carnivore'?
>>> model.kv.similarity('mammal.n.01', 'carnivore.n.01')
0.25162107631176484

Raises

KeyError – If either of w1 and w2 is absent from vocab.
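The example outputs on this page for distance() and similarity() on the same pair of nodes are consistent with similarity = 1 / (1 + distance). The sketch below assumes that mapping; treat it as an inference from those outputs rather than a documented guarantee. The function name is hypothetical.

```python
def similarity_from_distance(distance):
    # Maps a distance in [0, inf) to a similarity in (0, 1].
    return 1.0 / (1.0 + distance)


mammal_carnivore = similarity_from_distance(2.9742298803339304)
identical = similarity_from_distance(0.0)
print(mammal_carnivore, identical)
```

Under this mapping a distance of zero gives a similarity of 1, and the similarity falls towards 0 as the distance grows.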

static vector_distance(vector_1, vector_2)

Compute poincare distance between two input vectors. Convenience method over vector_distance_batch.

Parameters
  • vector_1 (numpy.array) – Input vector.

  • vector_2 (numpy.array) – Input vector.

Returns

Poincare distance between vector_1 and vector_2.

Return type

numpy.float

static vector_distance_batch(vector_1, vectors_all)

Compute poincare distances between one vector and a set of other vectors.

Parameters
  • vector_1 (numpy.array) – vector from which Poincare distances are to be computed, expected shape (dim,).

  • vectors_all (numpy.array) – for each row in vectors_all, distance from vector_1 is computed, expected shape (num_vectors, dim).

Returns

Poincare distance between vector_1 and each row in vectors_all, shape (num_vectors,).

Return type

numpy.array
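For reference, the Poincaré distance from the Nickel and Kiela paper is d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2))). The sketch below evaluates it in pure Python for one pair and for a batch; the function names are hypothetical, and the library methods operate on numpy arrays instead of lists.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit ball (norms must be < 1)."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_euclidean = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1 + 2 * sq_euclidean / ((1 - sq_norm_u) * (1 - sq_norm_v)))


def poincare_distance_batch(vector_1, vectors_all):
    # One distance per row of vectors_all.
    return [poincare_distance(vector_1, row) for row in vectors_all]


origin = [0.0, 0.0]
point = [0.5, 0.0]
single = poincare_distance(origin, point)
batch = poincare_distance_batch(origin, [point, origin])
print(single, batch)
```

Distances grow without bound as a point approaches the boundary of the unit ball, which is what leaves room for deep hierarchies.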

property vectors

word_vec(word)

Get the word’s representations in vector space, as a 1D numpy array.

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Query the trained model.
>>> wv = model.kv.word_vec('kangaroo.n.01')

words_closer_than(w1, w2)

Get all words that are closer to w1 than w2 is to w1.

Parameters
  • w1 (str) – Input word.

  • w2 (str) – Input word.

Returns

List of words that are closer to w1 than w2 is to w1.

Return type

list of str

Examples

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Which term is closer to 'kangaroo' than 'metatherian' is to 'kangaroo'?
>>> model.kv.words_closer_than('kangaroo.n.01', 'metatherian.n.01')
[u'marsupial.n.01', u'phalanger.n.01']

class gensim.models.poincare.PoincareModel(train_data, size=50, alpha=0.1, negative=10, workers=1, epsilon=1e-05, regularization_coeff=1.0, burn_in=10, burn_in_alpha=0.01, init_range=(-0.001, 0.001), dtype=<class 'numpy.float64'>, seed=0)

Bases: gensim.utils.SaveLoad

Train, use and evaluate Poincare Embeddings.

The model can be stored/loaded via its save() and load() methods, or stored/loaded in the word2vec format via model.kv.save_word2vec_format and load_word2vec_format().

Notes

Training cannot be resumed from a model loaded via load_word2vec_format. To train further, use the save() and load() methods instead.

An important attribute, which provides a lot of additional functionality when accessed directly, is the keyed vectors:

self.kv (PoincareKeyedVectors) – This object essentially contains the mapping between nodes and embeddings, as well as the vocabulary of the model (the set of unique nodes seen by the model). After training, it can be used to perform operations on the vectors such as vector lookup and distance and similarity calculations. See the documentation of its class for usage examples.

Initialize and train a Poincare embedding model from an iterable of relations.

Parameters
  • train_data ({iterable of (str, str), gensim.models.poincare.PoincareRelations}) – Iterable of relations, e.g. a list of tuples, or a gensim.models.poincare.PoincareRelations instance streaming from a file. Note that the relations are treated as ordered pairs, i.e. a relation (a, b) does not imply the opposite relation (b, a). In case the relations are symmetric, the data should contain both relations (a, b) and (b, a).

  • size (int, optional) – Number of dimensions of the trained model.

  • alpha (float, optional) – Learning rate for training.

  • negative (int, optional) – Number of negative samples to use.

  • workers (int, optional) – Number of threads to use for training the model.

  • epsilon (float, optional) – Constant used for clipping embeddings below a norm of one.

  • regularization_coeff (float, optional) – Coefficient used for l2-regularization while training (0 effectively disables regularization).

  • burn_in (int, optional) – Number of epochs to use for burn-in initialization (0 means no burn-in).

  • burn_in_alpha (float, optional) – Learning rate for burn-in initialization, ignored if burn_in is 0.

  • init_range (2-tuple (float, float)) – Range within which the vectors are randomly initialized.

  • dtype (numpy.dtype) – The numpy dtype to use for the vectors in the model (numpy.float64, numpy.float32 etc). Using lower precision floats may be useful in increasing training speed and reducing memory usage.

  • seed (int, optional) – Seed for random to ensure reproducibility.
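Because relations are treated as ordered pairs, symmetric data has to list both directions explicitly. The sketch below shows one way to prepare such input; `symmetrize` is a hypothetical helper and is not provided by gensim.

```python
def symmetrize(relations):
    """Hypothetical helper: add (b, a) for every (a, b), without duplicates."""
    seen = set()
    result = []
    for a, b in relations:
        for pair in ((a, b), (b, a)):
            if pair not in seen:
                seen.add(pair)
                result.append(pair)
    return result


undirected_edges = [('alice', 'bob'), ('bob', 'carol')]
both_directions = symmetrize(undirected_edges)
print(both_directions)
```

The resulting list could then be passed as train_data in place of the original one-directional pairs.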

Examples

Initialize a model from a list:

>>> from gensim.models.poincare import PoincareModel
>>> relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
>>> model = PoincareModel(relations, negative=2)

Initialize a model from a file containing one relation per line:

>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>> file_path = datapath('poincare_hypernyms.tsv')
>>> model = PoincareModel(PoincareRelations(file_path), negative=2)

See PoincareRelations for more options.

build_vocab(relations, update=False)

Build the model’s vocabulary from known relations.

Parameters
  • relations ({iterable of (str, str), gensim.models.poincare.PoincareRelations}) – Iterable of relations, e.g. a list of tuples, or a gensim.models.poincare.PoincareRelations instance streaming from a file. Note that the relations are treated as ordered pairs, i.e. a relation (a, b) does not imply the opposite relation (b, a). In case the relations are symmetric, the data should contain both relations (a, b) and (b, a).

  • update (bool, optional) – If true, only new nodes’ embeddings are initialized. Use this when the model already has an existing vocabulary and you want to update it. If false, all nodes’ embeddings are initialized. Use this when you’re creating a new vocabulary from scratch.

Examples

Train a model and update vocab for online training:

>>> from gensim.models.poincare import PoincareModel
>>>
>>> # train a new model from initial data
>>> initial_relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal')]
>>> model = PoincareModel(initial_relations, negative=1)
>>> model.train(epochs=50)
>>>
>>> # online training: update the vocabulary and continue training
>>> online_relations = [('striped_skunk', 'mammal')]
>>> model.build_vocab(online_relations, update=True)
>>> model.train(epochs=50)

classmethod load(*args, **kwargs)

Load model from disk, inherited from SaveLoad.

See also

save()

Parameters
  • *args – Positional arguments passed to load().

  • **kwargs – Keyword arguments passed to load().

Returns

The loaded model.

Return type

PoincareModel

save(*args, **kwargs)

Save complete model to disk, inherited from SaveLoad.

See also

load()

Parameters
  • *args – Positional arguments passed to save().

  • **kwargs – Keyword arguments passed to save().

train(epochs, batch_size=10, print_every=1000, check_gradients_every=None)

Train Poincare embeddings using loaded data and model parameters.

Parameters
  • epochs (int) – Number of iterations (epochs) over the corpus.

  • batch_size (int, optional) – Number of examples to train on in a single batch.

  • print_every (int, optional) – Prints progress and average loss after every print_every batches.

  • check_gradients_every (int or None, optional) – Compares computed gradients and autograd gradients after every check_gradients_every batches. Useful for debugging, doesn’t compare by default.

Examples

>>> from gensim.models.poincare import PoincareModel
>>> relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
>>> model = PoincareModel(relations, negative=2)
>>> model.train(epochs=50)

class gensim.models.poincare.PoincareRelations(file_path, encoding='utf8', delimiter='\t')

Bases: object

Stream relations for PoincareModel from a tsv-like file.

Initialize instance from file containing a pair of nodes (a relation) per line.

Parameters
  • file_path (str) –

    Path to file containing a pair of nodes (a relation) per line, separated by delimiter. Since the relations are asymmetric, the order of the u and v nodes in each pair matters. To express a “u is v” relation, each line should take the form u delimiter v, e.g. kangaroo mammal is a tab-delimited line expressing a “kangaroo is a mammal” relation.

    For a full input file example, see gensim/test/test_data/poincare_hypernyms.tsv.

  • encoding (str, optional) – Character encoding of the input file.

  • delimiter (str, optional) – Delimiter character for each relation.

__iter__()

Stream relations from self.file_path decoded into unicode strings.

Yields

(unicode, unicode) – Relation from input file.
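The streaming behaviour described here can be approximated with the standard csv module. This is a minimal sketch under the assumption that each line holds exactly one delimited pair; `stream_relations` is a hypothetical function, not the class's actual implementation, and the sample file is created only to keep the example self-contained.

```python
import csv
import os
import tempfile


def stream_relations(file_path, encoding='utf8', delimiter='\t'):
    """Yield one (u, v) tuple per line of a delimited text file."""
    with open(file_path, encoding=encoding, newline='') as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if row:  # skip blank lines
                yield tuple(row)


# Write a tiny sample file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False, encoding='utf8') as tmp:
    tmp.write('kangaroo\tmarsupial\nkangaroo\tmammal\n')
    sample_path = tmp.name

relations = list(stream_relations(sample_path))
os.remove(sample_path)
print(relations)
```

Reading lazily, one line at a time, means the whole relation file never has to be held in memory at once.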

class gensim.models.poincare.ReconstructionEvaluation(file_path, embedding)

Bases: object

Evaluate reconstruction on given network for given embedding.

Initialize evaluation instance with tsv file containing relation pairs and embedding to be evaluated.

Parameters
  • file_path (str) – Path to tsv file containing relation pairs.

  • embedding (PoincareKeyedVectors) – Embedding to be evaluated.

evaluate(max_n=None)

Evaluate all defined metrics for the reconstruction task.

Parameters

max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None.

Returns

(metric_name, metric_value) pairs, e.g. {'mean_rank': 50.3, 'MAP': 0.31}.

Return type

dict of (str, float)

evaluate_mean_rank_and_map(max_n=None)

Evaluate mean rank and MAP for reconstruction.

Parameters

max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None.

Returns

(mean_rank, MAP), e.g. (50.3, 0.31).

Return type

(float, float)

static get_positive_relation_ranks_and_avg_prec(all_distances, positive_relations)

Compute ranks and Average Precision of positive relations.

Parameters
  • all_distances (numpy.array of float) – Array of all distances (floats) for a specific item.

  • positive_relations (list) – List of indices of positive relations for the item.

Returns

The list contains ranks of positive relations in the same order as positive_relations. The float is the Average Precision of the ranking, e.g. ([1, 2, 3, 20], 0.610).

Return type

(list of int, float)
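One common way to compute these quantities is sketched below: each positive item is ranked against the negative items only, and Average Precision is then taken over the positions of the positives in the combined ranking. The function name is hypothetical and the exact formula used by the library is an assumption here.

```python
def positive_ranks_and_avg_prec(all_distances, positive_relations):
    """Rank each positive item among the negatives and compute Average Precision.

    Illustrative only: the rank of a positive item is 1 plus the number of
    negative items that are strictly closer.
    """
    positives = set(positive_relations)
    negative_distances = [d for i, d in enumerate(all_distances) if i not in positives]

    ranks = [
        1 + sum(1 for d in negative_distances if d < all_distances[i])
        for i in positive_relations
    ]

    # Position of each positive in the full ranking: its rank among negatives
    # plus the number of positives ranked ahead of it.
    full_positions = [r + k for k, r in enumerate(sorted(ranks))]
    precisions = [(k + 1) / pos for k, pos in enumerate(full_positions)]
    return ranks, sum(precisions) / len(precisions)


ranks, avg_prec = positive_ranks_and_avg_prec([0.1, 0.5, 0.2, 0.9, 0.3], [0, 3])
print(ranks, avg_prec)
```

In this toy input the first positive is closer than every negative (rank 1), while the second is farther than all three negatives (rank 4).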