models.poincare – Train and use Poincaré embeddings
Python implementation of Poincaré Embeddings.
These embeddings are better at capturing latent hierarchical information than traditional Euclidean embeddings. The method is described in detail in Maximilian Nickel, Douwe Kiela, “Poincaré Embeddings for Learning Hierarchical Representations”.
The main use case is to automatically learn hierarchical representations of nodes from a tree-like structure, such as a Directed Acyclic Graph (DAG), using the transitive closure of the relations. Representations of nodes in a symmetric graph can also be learned.
This module allows training Poincaré Embeddings from a training file containing the relations of a graph in a csv-like format, or from a Python iterable of relations.
Examples
Initialize and train a model from a list
>>> from gensim.models.poincare import PoincareModel
>>> relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
>>> model = PoincareModel(relations, negative=2)
>>> model.train(epochs=50)
Initialize and train a model from a file containing one relation per line
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>> file_path = datapath('poincare_hypernyms.tsv')
>>> model = PoincareModel(PoincareRelations(file_path), negative=2)
>>> model.train(epochs=50)
gensim.models.poincare.
LexicalEntailmentEvaluation
(filepath)¶Bases: object
Evaluate lexical entailment for any embedding.
Initialize evaluation instance with HyperLex text file containing relation pairs.
Parameters:  filepath (str) – Path to HyperLex text file. 

create_vocab_trie
(embedding)¶Create trie with vocab terms of the given embedding to enable quick prefix searches.
Parameters:  embedding (PoincareKeyedVectors ) – Embedding for which trie is to be created. 

Returns:  Trie containing vocab terms of the input embedding. 
Return type:  pygtrie.Trie 
evaluate_spearman
(embedding)¶Evaluate spearman scores for lexical entailment for given embedding.
Parameters:  embedding (PoincareKeyedVectors ) – Embedding for which evaluation is to be done. 

Returns:  Spearman correlation score for the task for input embedding. 
Return type:  float 
find_matching_terms
(trie, word)¶Find terms in the trie beginning with the word.
Parameters: 


Returns:  List of matching terms. 
Return type:  list of str 
score_function
(embedding, trie, term_1, term_2)¶Compute the predicted score, i.e. the extent to which term_1 is a type of term_2.
Parameters: 


Returns:  Predicted score (the extent to which term_1 is a type of term_2). 
Return type:  float 
gensim.models.poincare.
LinkPredictionEvaluation
(train_path, test_path, embedding)¶Bases: object
Evaluate link prediction on given network for given embedding.
Initialize evaluation instance with tsv file containing relation pairs and embedding to be evaluated.
Parameters: 


evaluate
(max_n=None)¶Evaluate all defined metrics for the link prediction task.
Parameters:  max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None. 

Returns:  (metric_name, metric_value) pairs, e.g. {‘mean_rank’: 50.3, ‘MAP’: 0.31}. 
Return type:  dict of (str, float) 
evaluate_mean_rank_and_map
(max_n=None)¶Evaluate mean rank and MAP for link prediction.
Parameters:  max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None. 

Returns:  (mean_rank, MAP), e.g. (50.3, 0.31). 
Return type:  tuple (float, float) 
get_unknown_relation_ranks_and_avg_prec
(all_distances, unknown_relations, known_relations)¶Compute ranks and Average Precision of unknown positive relations.
Parameters: 


Returns:  The list contains ranks of the unknown positive relations in the same order as unknown_relations. The float is the Average Precision of the ranking, e.g. ([1, 2, 3, 20], 0.610). 
Return type:  tuple (list of int, float) 
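The ranking step described above can be sketched in plain Python. This is a minimal illustration, assuming that all_distances holds the distance from one node to every other node and that the two relation arguments are lists of indices into it; the function name unknown_relation_ranks is hypothetical and this is not presented as the library's implementation.

```python
def unknown_relation_ranks(all_distances, unknown_relations, known_relations):
    """Rank each unknown positive relation against the remaining (negative) nodes."""
    # Known and unknown positives are both excluded from the set of negatives.
    excluded = set(unknown_relations) | set(known_relations)
    negative_distances = [d for i, d in enumerate(all_distances) if i not in excluded]
    # Rank = 1 + number of negatives that are strictly closer than the positive.
    return [
        1 + sum(1 for d in negative_distances if d < all_distances[i])
        for i in unknown_relations
    ]


# Hypothetical distances from one node to five other nodes.
all_distances = [0.4, 0.1, 0.9, 0.3, 0.7]
ranks_masked = unknown_relation_ranks(all_distances, unknown_relations=[4], known_relations=[1])
ranks_unmasked = unknown_relation_ranks(all_distances, unknown_relations=[4], known_relations=[])
```

Excluding the known relation at index 1 keeps a training pair from being counted as a negative, which would otherwise worsen the rank of the unknown relation.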
gensim.models.poincare.
NegativesBuffer
(items)¶Bases: object
Buffer and return negative samples.
Initialize instance from list or numpy array of samples.
Parameters:  items (list/numpy.array) – List or array containing negative samples. 

get_items
(num_items)¶Get the next num_items from buffer.
Parameters:  num_items (int) – Number of items to fetch. 

Returns:  Slice containing num_items items from the original data. 
Return type:  numpy.array or list 
Notes
No error is raised if fewer than num_items items remain; all the remaining items are returned.
num_items
()¶Get the number of items remaining in the buffer.
Returns:  Number of items in the buffer that haven’t been consumed yet. 

Return type:  int 
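The buffer behaviour described above (consume items in order, return whatever is left when fewer than requested remain) can be sketched as follows. The class name NegativesBufferSketch and its internals are hypothetical and only mirror the documented interface; clamping the internal index is a choice made for this sketch.

```python
class NegativesBufferSketch:
    """Hand out consecutive slices of a pre-sampled list of negatives."""

    def __init__(self, items):
        self._items = items
        self._current_index = 0

    def num_items(self):
        # Number of items that have not been consumed yet.
        return len(self._items) - self._current_index

    def get_items(self, num_items):
        start = self._current_index
        end = min(start + num_items, len(self._items))
        self._current_index = end
        # Slicing never raises, so a short (or empty) slice is returned near the end.
        return self._items[start:end]


buffer = NegativesBufferSketch([11, 12, 13, 14, 15])
first = buffer.get_items(2)
remaining_after_first = buffer.num_items()
rest = buffer.get_items(10)
remaining_after_rest = buffer.num_items()
```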
gensim.models.poincare.
PoincareBatch
(vectors_u, vectors_v, indices_u, indices_v, regularization_coeff=1.0)¶Bases: object
Compute Poincare distances, gradients and loss for a training batch.
Store intermediate state to avoid recomputing multiple times.
Initialize instance with sets of vectors for which distances are to be computed.
Parameters: 


compute_all
()¶Convenience method to perform all computations.
compute_distance_gradients
()¶Compute and store partial derivatives of poincare distance d(u, v) w.r.t all u and all v.
compute_distances
()¶Compute and store norms, euclidean distances and poincare distances between input vectors.
compute_gradients
()¶Compute and store gradients of loss function for all input vectors.
compute_loss
()¶Compute and store loss value for the given batch of examples.
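As a rough illustration of the loss computed for a batch, the sketch below assumes the softmax-style objective from the Nickel and Kiela paper cited at the top of this page: for each example, the positive pair's distance is compared against the distances to the sampled negatives. The regularization term is omitted, and the function name batch_loss and the input layout are assumptions made for this example.

```python
import math


def batch_loss(distances_per_example):
    """Sum of -log softmax(-distance) of the positive pair over a batch.

    Each inner list holds d(u, v) for the positive pair first,
    followed by d(u, v') for every sampled negative v'.
    """
    total = 0.0
    for distances in distances_per_example:
        exp_negative = [math.exp(-d) for d in distances]
        total += -math.log(exp_negative[0] / sum(exp_negative))
    return total


# Positive and negative equally far away: the loss is log(2).
tied_loss = batch_loss([[0.5, 0.5]])
# A close positive and a distant negative give a smaller loss than the reverse.
good_loss = batch_loss([[0.1, 3.0]])
bad_loss = batch_loss([[3.0, 0.1]])
```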
gensim.models.poincare.
PoincareKeyedVectors
(vector_size)¶Bases: gensim.models.keyedvectors.BaseKeyedVectors
Vectors and vocab for the PoincareModel training class.
Used to perform operations on the vectors such as vector lookup, distance calculations etc.
__contains__
(entity)¶
__getitem__
(entities)¶Get vector representation of entities.
Parameters:  entities ({str, list of str}) – Input entity/entities. 

Returns:  Vector representation for entities (1D if entities is string, otherwise  2D). 
Return type:  numpy.ndarray 
add
(entities, weights, replace=False)¶Append entities and their vectors manually. If an entity is already in the vocabulary, the old vector is kept unless the replace flag is True.
Parameters: 


ancestors
(node)¶Get the list of recursively closest parents from the given node.
Parameters:  node ({str, int}) – Key for node for which ancestors are to be found. 

Returns:  Ancestor nodes of the node node. 
Return type:  list of str 
closer_than
(entity1, entity2)¶Get all entities that are closer to entity1 than entity2 is to entity1.
closest_child
(node)¶Get the node closest to node that is lower in the hierarchy than node.
Parameters:  node ({str, int}) – Key for node for which closest child is to be found. 

Returns:  Node closest to node that is lower in the hierarchy than node. If there are no nodes lower in the hierarchy, None is returned. 
Return type:  {str, None} 
closest_parent
(node)¶Get the node closest to node that is higher in the hierarchy than node.
Parameters:  node ({str, int}) – Key for node for which closest parent is to be found. 

Returns:  Node closest to node that is higher in the hierarchy than node. If there are no nodes higher in the hierarchy, None is returned. 
Return type:  {str, None} 
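The ancestors and closest_parent descriptions above can be illustrated with a small pure-Python sketch. It assumes that "higher in the hierarchy" means "smaller vector norm" and that closeness is measured with the Poincaré distance; the vectors and helper names are hypothetical, not taken from a trained model.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def poincare_distance(u, v):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sum(a * a for a in u)) * (1.0 - sum(b * b for b in v))
    return math.acosh(1.0 + 2.0 * sq_dist / denom)


def closest_parent(vectors, node):
    """Closest node with a strictly smaller norm, or None if there is none."""
    node_norm = norm(vectors[node])
    candidates = [key for key, vec in vectors.items() if norm(vec) < node_norm]
    if not candidates:
        return None
    return min(candidates, key=lambda key: poincare_distance(vectors[node], vectors[key]))


def ancestors(vectors, node):
    """Follow closest_parent repeatedly; norms strictly decrease, so this terminates."""
    result = []
    current = closest_parent(vectors, node)
    while current is not None:
        result.append(current)
        current = closest_parent(vectors, current)
    return result


# Hypothetical 2D embedding: more general concepts sit closer to the origin.
vectors = {"entity": [0.0, 0.0], "mammal": [0.3, 0.0], "dog": [0.6, 0.0], "cat": [0.0, 0.6]}
dog_ancestors = ancestors(vectors, "dog")
cat_ancestors = ancestors(vectors, "cat")
```

In this toy embedding 'cat' lies off the 'mammal' axis, so its closest parent is the root directly; a trained embedding would normally place related nodes along similar directions.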
descendants
(node, max_depth=5)¶Get the list of recursively closest children from the given node, up to a max depth of max_depth.
Parameters: 


Returns:  Descendant nodes from the node node. 
Return type:  list of str 
difference_in_hierarchy
(node_or_vector_1, node_or_vector_2)¶Compute relative position in hierarchy of node_or_vector_1 relative to node_or_vector_2. A positive value indicates node_or_vector_1 is higher in the hierarchy than node_or_vector_2.
Parameters: 


Returns:  Relative position in hierarchy of node_or_vector_1 relative to node_or_vector_2. 
Return type:  float 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> model.kv.difference_in_hierarchy('mammal.n.01', 'dog.n.01')
0.05382517902410999
>>> model.kv.difference_in_hierarchy('dog.n.01', 'mammal.n.01')
-0.05382517902410999
Notes
The returned value can be positive or negative, depending on whether node_or_vector_1 is higher or lower in the hierarchy than node_or_vector_2.
distance
(w1, w2)¶Calculate Poincare distance between vectors for nodes w1 and w2.
Parameters: 


Returns:  Poincare distance between the vectors for nodes w1 and w2. 
Return type:  float 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # What is the distance between the words 'mammal' and 'carnivore'?
>>> model.kv.distance('mammal.n.01', 'carnivore.n.01')
2.9742298803339304
Raises:  KeyError – If either of w1 and w2 is absent from vocab. 

distances
(node_or_vector, other_nodes=())¶Compute Poincare distances from given node_or_vector to all nodes in other_nodes. If other_nodes is empty, return distance between node_or_vector and all nodes in vocab.
Parameters: 


Returns:  Array containing distances to all nodes in other_nodes from input node_or_vector, in the same order as other_nodes. 
Return type:  numpy.array 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Check the distances between a word and a list of other words.
>>> model.kv.distances('mammal.n.01', ['carnivore.n.01', 'dog.n.01'])
array([2.97422988, 2.83007402])
>>> # Check the distances between a word and every other word in the vocab.
>>> all_distances = model.kv.distances('mammal.n.01')
Raises:  KeyError – If either node_or_vector or any node in other_nodes is absent from vocab. 

get_vector
(entity)¶Get the entity’s representations in vector space, as a 1D numpy array.
Parameters:  entity (str) – Identifier of the entity to return the vector for. 

Returns:  Vector for the specified entity. 
Return type:  numpy.ndarray 
Raises:  KeyError – If the given entity identifier doesn’t exist. 
index2entity
¶
load
(fname_or_handle, **kwargs)¶Load an object previously saved using save() from a file.
Parameters: 


See also
save()
Returns:  Object loaded from fname. 

Return type:  object 
Raises:  AttributeError – When called on an object instance instead of class (this is a class method). 
load_word2vec_format
(fname, fvocab=None, binary=False, encoding='utf8', unicode_errors='strict', limit=None, datatype=<type 'numpy.float32'>)¶Load the input-hidden weight matrix from the original C word2vec-tool format.
Use _load_word2vec_format().
Note that the information stored in the file is incomplete (the binary tree is missing), so while you can query for word similarity etc., you cannot continue training with a model loaded this way.
Parameters: 


Returns:  Loaded Poincare model. 
Return type: 
most_similar
(node_or_vector, topn=10, restrict_vocab=None)¶Find the topN most similar nodes to the given node or vector, sorted in increasing order of distance.
Parameters: 


Returns:  List of tuples containing (node, distance) pairs in increasing order of distance. 
Return type:  list of (str, float) 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Which words are most similar to 'kangaroo'?
>>> model.kv.most_similar('kangaroo.n.01', topn=2)
[(u'kangaroo.n.01', 0.0), (u'marsupial.n.01', 0.26524229460827725)]
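A top-N query of this kind can be sketched without a trained model. The example below assumes a plain dict of hypothetical 2D vectors and ranks every key by Poincaré distance to the query; like the example output above, the query node itself appears first with distance 0.0. The helper names are illustrative only.

```python
import heapq
import math


def poincare_distance(u, v):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sum(a * a for a in u)) * (1.0 - sum(b * b for b in v))
    return math.acosh(1.0 + 2.0 * sq_dist / denom)


def most_similar(vectors, query_key, topn=10):
    """Return (key, distance) pairs in increasing order of distance to query_key."""
    query = vectors[query_key]
    scored = ((key, poincare_distance(query, vec)) for key, vec in vectors.items())
    return heapq.nsmallest(topn, scored, key=lambda pair: pair[1])


# Hypothetical vectors, all strictly inside the unit disc.
vectors = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.5, 0.0], "d": [0.0, 0.9]}
top_two = most_similar(vectors, "a", topn=2)
```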
most_similar_to_given
(entity1, entities_list)¶Get the entity from entities_list most similar to entity1.
norm
(node_or_vector)¶Compute absolute position in hierarchy of input node or vector. Values range between 0 and 1. A lower value indicates the input node or vector is higher in the hierarchy.
Parameters:  node_or_vector ({str, int, numpy.array}) – Input node key or vector for which position in hierarchy is to be returned. 

Returns:  Absolute position in the hierarchy of the input vector or node. 
Return type:  float 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Get the norm of the embedding of the word `mammal`.
>>> model.kv.norm('mammal.n.01')
0.6423008703542398
Notes
The position in hierarchy is based on the norm of the vector for the node.
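To make the link between norm and hierarchy position concrete, the sketch below computes the Euclidean norm of two hypothetical vectors and derives a relative position from it. The formula norm(vector_2) - norm(vector_1) is an assumption chosen to match the documented sign convention (positive when the first argument is higher in the hierarchy, i.e. has the smaller norm).

```python
import math


def norm(vector):
    """Euclidean norm; a smaller value means higher in the hierarchy."""
    return math.sqrt(sum(x * x for x in vector))


def difference_in_hierarchy(vector_1, vector_2):
    # Assumed convention: positive when vector_1 is closer to the origin than vector_2.
    return norm(vector_2) - norm(vector_1)


general = [0.3, 0.0]   # hypothetical vector of a general concept
specific = [0.0, 0.8]  # hypothetical vector of a specific concept
forward = difference_in_hierarchy(general, specific)
backward = difference_in_hierarchy(specific, general)
```

Swapping the arguments flips the sign and keeps the magnitude, which is the behaviour the difference_in_hierarchy notes describe.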
rank
(entity1, entity2)¶Rank of the distance of entity2 from entity1, in relation to distances of all entities from entity1.
save
(fname_or_handle, **kwargs)¶Save the object to a file.
Parameters: 


See also
load()
save_word2vec_format
(fname, fvocab=None, binary=False, total_vec=None)¶Store the input-hidden weight matrix in the same format used by the original C word2vec-tool, for compatibility, using _save_word2vec_format().
Parameters: 


similarity
(w1, w2)¶Compute similarity based on Poincare distance between vectors for nodes w1 and w2.
Parameters: 


Returns:  Similarity between the vectors for nodes w1 and w2 (between 0 and 1). 
Return type:  float 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # What is the similarity between the words 'mammal' and 'carnivore'?
>>> model.kv.similarity('mammal.n.01', 'carnivore.n.01')
0.25162107631176484
Raises:  KeyError – If either of w1 and w2 is absent from vocab. 
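The example values on this page are consistent with the mapping similarity = 1 / (1 + distance): the distance example for 'mammal.n.01' and 'carnivore.n.01' reports 2.9742298803339304, and the similarity example reports 0.25162107631176484. The sketch below checks that arithmetic. The formula is inferred from those two numbers and should be read as an assumption rather than a statement of the library's implementation; the function name is hypothetical.

```python
def similarity_from_distance(distance):
    # Distance 0 maps to similarity 1; large distances approach similarity 0.
    return 1.0 / (1.0 + distance)


documented_distance = 2.9742298803339304
similarity = similarity_from_distance(documented_distance)
```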

vector_distance
(vector_1, vector_2)¶Compute poincare distance between two input vectors. Convenience method over vector_distance_batch.
Parameters: 


Returns:  Poincare distance between vector_1 and vector_2. 
Return type:  numpy.float 
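For reference, the Poincaré distance between two points u and v inside the unit ball is arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))). A minimal pure-Python sketch of that formula follows; it works on plain lists rather than numpy arrays, and the function name is chosen for this example.

```python
import math


def poincare_distance(u, v):
    """Poincare distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_euclidean = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_euclidean / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


origin = [0.0, 0.0]
point = [0.5, 0.0]
near_boundary = [0.99, 0.0]
d_origin_point = poincare_distance(origin, point)
d_origin_boundary = poincare_distance(origin, near_boundary)
```

Distances grow rapidly as a point approaches the boundary of the ball, which is what leaves room for the many leaves of a hierarchy.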
vector_distance_batch
(vector_1, vectors_all)¶Compute poincare distances between one vector and a set of other vectors.
Parameters: 


Returns:  Poincare distance between vector_1 and each row in vectors_all, shape (num_vectors,). 
Return type:  numpy.array 
vectors
¶
word_vec
(word)¶Get the word’s representations in vector space, as a 1D numpy array.
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Query the trained model.
>>> wv = model.kv.word_vec('kangaroo.n.01')
words_closer_than
(w1, w2)¶Get all words that are closer to w1 than w2 is to w1.
Parameters: 


Returns:  List of words that are closer to w1 than w2 is to w1. 
Return type:  list (str) 
Examples
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>>
>>> # Read the sample relations file and train the model
>>> relations = PoincareRelations(file_path=datapath('poincare_hypernyms_large.tsv'))
>>> model = PoincareModel(train_data=relations)
>>> model.train(epochs=50)
>>>
>>> # Which term is closer to 'kangaroo' than 'metatherian' is to 'kangaroo'?
>>> model.kv.words_closer_than('kangaroo.n.01', 'metatherian.n.01')
[u'marsupial.n.01', u'phalanger.n.01']
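The "closer than" query and the related rank method can be sketched on a plain dict of hypothetical vectors. The sketch assumes that the rank of w2 relative to w1 is one plus the number of nodes strictly closer to w1; that definition and the helper names are assumptions for illustration.

```python
import math


def poincare_distance(u, v):
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sum(a * a for a in u)) * (1.0 - sum(b * b for b in v))
    return math.acosh(1.0 + 2.0 * sq_dist / denom)


def closer_than(vectors, key_1, key_2):
    """Keys (other than key_1) that are closer to key_1 than key_2 is."""
    threshold = poincare_distance(vectors[key_1], vectors[key_2])
    return [
        key for key, vec in vectors.items()
        if key != key_1 and poincare_distance(vectors[key_1], vec) < threshold
    ]


def rank(vectors, key_1, key_2):
    # Assumed definition: 1 + number of keys strictly closer to key_1 than key_2.
    return len(closer_than(vectors, key_1, key_2)) + 1


# Hypothetical vectors, all strictly inside the unit disc.
vectors = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [0.5, 0.0], "d": [0.0, 0.9]}
closer_to_a_than_c = closer_than(vectors, "a", "c")
rank_of_d = rank(vectors, "a", "d")
```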
gensim.models.poincare.
PoincareModel
(train_data, size=50, alpha=0.1, negative=10, workers=1, epsilon=1e-05, regularization_coeff=1.0, burn_in=10, burn_in_alpha=0.01, init_range=(-0.001, 0.001), dtype=<type 'numpy.float64'>, seed=0)¶Bases: gensim.utils.SaveLoad
Train, use and evaluate Poincare Embeddings.
The model can be stored/loaded via its save() and load() methods, or stored/loaded in the word2vec format via model.kv.save_word2vec_format and load_word2vec_format().
Notes
Training cannot be resumed from a model loaded via load_word2vec_format; if you wish to train further, use the save() and load() methods instead.
An important attribute (that provides a lot of additional functionality when directly accessed) is the keyed vectors, model.kv, a PoincareKeyedVectors instance.
Initialize and train a Poincare embedding model from an iterable of relations.
Parameters: 


Examples
Initialize a model from a list:
>>> from gensim.models.poincare import PoincareModel
>>> relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
>>> model = PoincareModel(relations, negative=2)
Initialize a model from a file containing one relation per line:
>>> from gensim.models.poincare import PoincareModel, PoincareRelations
>>> from gensim.test.utils import datapath
>>> file_path = datapath('poincare_hypernyms.tsv')
>>> model = PoincareModel(PoincareRelations(file_path), negative=2)
See PoincareRelations for more options.
load
(*args, **kwargs)¶Load model from disk, inherited from SaveLoad.
See also
Parameters:  

Returns:  The loaded model. 
Return type: 
train
(epochs, batch_size=10, print_every=1000, check_gradients_every=None)¶Train Poincare embeddings using loaded data and model parameters.
Parameters: 


Examples
>>> from gensim.models.poincare import PoincareModel
>>> relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]
>>> model = PoincareModel(relations, negative=2)
>>> model.train(epochs=50)
gensim.models.poincare.
PoincareRelations
(file_path, encoding='utf8', delimiter='\t')¶Bases: object
Stream relations for PoincareModel from a tsv-like file.
Initialize instance from file containing a pair of nodes (a relation) per line.
Parameters: 


__iter__
()¶Stream relations from self.file_path decoded into unicode strings.
Yields:  (unicode, unicode) – Relation from input file. 
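The streaming behaviour described above can be sketched with the standard csv module. The class name TsvRelationsSketch is hypothetical and the file is a temporary one created for the example; the point is that the object re-opens the file on every iteration, so it can be passed to code that iterates over the relations more than once.

```python
import csv
import os
import tempfile


class TsvRelationsSketch:
    """Re-iterable stream of (node, node) pairs read from a delimited text file."""

    def __init__(self, file_path, encoding="utf8", delimiter="\t"):
        self.file_path = file_path
        self.encoding = encoding
        self.delimiter = delimiter

    def __iter__(self):
        # Opening the file here (not in __init__) makes repeated iteration possible.
        with open(self.file_path, encoding=self.encoding, newline="") as handle:
            for row in csv.reader(handle, delimiter=self.delimiter):
                if row:  # skip blank lines
                    yield tuple(row)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "relations.tsv")
    with open(path, "w", encoding="utf8", newline="") as handle:
        handle.write("kangaroo\tmarsupial\nkangaroo\tmammal\n")
    relations = TsvRelationsSketch(path)
    first_pass = list(relations)
    second_pass = list(relations)
```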

gensim.models.poincare.
ReconstructionEvaluation
(file_path, embedding)¶Bases: object
Evaluate reconstruction on given network for given embedding.
Initialize evaluation instance with tsv file containing relation pairs and embedding to be evaluated.
Parameters: 


evaluate
(max_n=None)¶Evaluate all defined metrics for the reconstruction task.
Parameters:  max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None. 

Returns:  (metric_name, metric_value) pairs, e.g. {‘mean_rank’: 50.3, ‘MAP’: 0.31}. 
Return type:  dict of (str, float) 
evaluate_mean_rank_and_map
(max_n=None)¶Evaluate mean rank and MAP for reconstruction.
Parameters:  max_n (int, optional) – Maximum number of positive relations to evaluate, all if max_n is None. 

Returns:  (mean_rank, MAP), e.g. (50.3, 0.31). 
Return type:  (float, float) 
get_positive_relation_ranks_and_avg_prec
(all_distances, positive_relations)¶Compute ranks and Average Precision of positive relations.
Parameters: 


Returns:  The list contains ranks of positive relations in the same order as positive_relations. The float is the Average Precision of the ranking, e.g. ([1, 2, 3, 20], 0.610). 
Return type:  (list of int, float) 
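The rank and Average Precision computation can be sketched in plain Python. The sketch assumes that each rank counts only the negatives that are closer than the positive, and that Average Precision is then taken over the positives sorted by rank; under that reading, the ranks [1, 2, 3, 20] give roughly 0.610, matching the example above. Function names are hypothetical and positive_relations is assumed to be non-empty.

```python
def ranks_among_negatives(all_distances, positive_relations):
    """Rank of each positive = 1 + number of negatives that are strictly closer."""
    positives = set(positive_relations)
    negative_distances = [d for i, d in enumerate(all_distances) if i not in positives]
    return [
        1 + sum(1 for d in negative_distances if d < all_distances[i])
        for i in positive_relations
    ]


def average_precision(ranks):
    """Average Precision of a ranking given ranks computed against negatives only."""
    # Position in the full ranking = rank among negatives + positives ranked ahead.
    overall = [rank + ahead for ahead, rank in enumerate(sorted(ranks))]
    precisions = [(k + 1) / position for k, position in enumerate(overall)]
    return sum(precisions) / len(precisions)


# Hypothetical distances from one node to five others; indices 0 and 4 are positives.
all_distances = [0.1, 0.5, 0.2, 0.9, 0.3]
ranks = ranks_among_negatives(all_distances, positive_relations=[0, 4])
avg_prec = average_precision(ranks)
documented_example = average_precision([1, 2, 3, 20])
```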