gensim logo

gensim
gensim tagline

Get Expert Help From The Gensim Authors

Consulting in Machine Learning & NLP

• Commercial document similarity engine: ScaleText.ai

Corporate trainings in Python Data Science and Deep Learning

test.utils – Common utils

test.utils – Common utils

Module contains common utilities used in automated code tests for Gensim modules.

Attributes:

module_path : str
Full path to this module directory.
common_texts : list of list of str
Toy dataset.
common_dictionary : Dictionary
Dictionary of toy dataset.
common_corpus : list of list of (int, int)
Corpus of toy dataset.

Examples:

It’s easy to keep objects in temporary folder and reuse’em if needed:

>>> from gensim.models import word2vec
>>> from gensim.test.utils import get_tmpfile, common_texts
>>>
>>> model = word2vec.Word2Vec(common_texts, min_count=1)
>>> temp_path = get_tmpfile('toy_w2v')
>>> model.save(temp_path)
>>>
>>> new_model = word2vec.Word2Vec.load(temp_path)
>>> result = new_model.wv.most_similar("human", topn=1)

Let’s print first document in toy dataset and then recreate it using its corpus and dictionary.

>>> from gensim.test.utils import common_texts, common_dictionary, common_corpus
>>> print(common_texts[0])
['human', 'interface', 'computer']
>>> assert common_dictionary.doc2bow(common_texts[0]) == common_corpus[0]

We can find our toy set in test data directory.

>>> from gensim.test.utils import datapath
>>>
>>> with open(datapath("testcorpus.txt")) as f:
...     texts = [line.strip().split() for line in f]
>>> print(texts[0])
['computer', 'human', 'interface']

If you don’t need to keep temporary objects on disk use temporary_file():

>>> from gensim.test.utils import temporary_file, common_corpus, common_dictionary
>>> from gensim.models import LdaModel
>>>
>>> with temporary_file("temp.txt") as tf:
...     lda = LdaModel(common_corpus, id2word=common_dictionary, num_topics=3)
...     lda.save(tf)
gensim.test.utils.datapath(fname)

Get full path for file fname in test data directory placed in this module directory. Usually used to place corpus to test_data directory.

Parameters:fname (str) – Name of file.
Returns:Full path to fname in test_data folder.
Return type:str

Example

Let’s get path of test GloVe data file and check if it exits.

>>> from gensim.corpora import MmCorpus
>>> from gensim.test.utils import datapath
>>>
>>> corpus = MmCorpus(datapath("testcorpus.mm"))
>>> for document in corpus:
...     pass
gensim.test.utils.get_tmpfile(suffix)

Get full path to file suffix in temporary folder. This function doesn’t creates file (only generate unique name). Also, it may return different paths in consecutive calling.

Parameters:suffix (str) – Suffix of file.
Returns:Path to suffix file in temporary folder.
Return type:str

Examples

Using this function we may get path to temporary file and use it, for example, to store temporary model.

>>> from gensim.models import LsiModel
>>> from gensim.test.utils import get_tmpfile, common_dictionary, common_corpus
>>>
>>> tmp_f = get_tmpfile("toy_lsi_model")
>>>
>>> model = LsiModel(common_corpus, id2word=common_dictionary)
>>> model.save(tmp_f)
>>>
>>> loaded_model = LsiModel.load(tmp_f)
gensim.test.utils.temporary_file(*args, **kwds)

This context manager creates file name in temporary directory and returns its full path. Temporary directory with included files will deleted at the end of context. Note, it won’t create file.

Parameters:name (str) – Filename.
Yields:str – Path to file name in temporary directory.

Examples

This example demonstrates that created temporary directory (and included files) will deleted at the end of context.

>>> import os
>>> from gensim.test.utils import temporary_file
>>> with temporary_file("temp.txt") as tf, open(tf, 'w') as outfile:
...     outfile.write("my extremely useful information")
...     print("Is this file exists? {}".format(os.path.exists(tf)))
...     print("Is this folder exists? {}".format(os.path.exists(os.path.dirname(tf))))
Is this file exists? True
Is this folder exists? True
>>>
>>> print("Is this file exists? {}".format(os.path.exists(tf)))
Is this file exists? False
>>> print("Is this folder exists? {}".format(os.path.exists(os.path.dirname(tf))))
Is this folder exists? False