test.utils – Internal testing functions
This module contains common utilities used in automated tests for Gensim modules.
Attributes:
- module_path (str) – Full path to this module directory.
- common_texts (list of list of str) – Toy dataset.
- common_dictionary (Dictionary) – Dictionary of toy dataset.
- common_corpus (list of list of (int, int)) – Corpus of toy dataset.
Examples:
It’s easy to keep objects in a temporary folder and reuse them if needed:
>>> from gensim.models import word2vec
>>> from gensim.test.utils import get_tmpfile, common_texts
>>>
>>> model = word2vec.Word2Vec(common_texts, min_count=1)
>>> temp_path = get_tmpfile('toy_w2v')
>>> model.save(temp_path)
>>>
>>> new_model = word2vec.Word2Vec.load(temp_path)
>>> result = new_model.wv.most_similar("human", topn=1)
Let’s print the first document in the toy dataset and then check that the dictionary converts it into the first entry of the corpus:
>>> from gensim.test.utils import common_texts, common_dictionary, common_corpus
>>> print(common_texts[0])
['human', 'interface', 'computer']
>>> assert common_dictionary.doc2bow(common_texts[0]) == common_corpus[0]
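The relationship between tokenized texts, a dictionary, and a bag-of-words corpus can be illustrated without Gensim. The sketch below is a simplified stand-in: it assumes a plain dict as the token-to-id mapping and uses a hypothetical helper named `to_bow`. It is not Gensim's `Dictionary` implementation, and the ids it assigns need not match those in `common_dictionary`.

```python
from collections import Counter

# Two toy documents, already tokenized (illustrative data only).
texts = [
    ["human", "interface", "computer"],
    ["survey", "user", "computer", "system", "response", "time"],
]

# Assign an integer id to each distinct token in order of first appearance.
# Assumption: a plain dict stands in for a dictionary object.
token2id = {}
for doc in texts:
    for token in doc:
        token2id.setdefault(token, len(token2id))


def to_bow(doc, token2id):
    """Hypothetical helper: return sorted (token_id, count) pairs for known tokens."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())


# The corpus is the bag-of-words form of every document.
corpus = [to_bow(doc, token2id) for doc in texts]
print(corpus[0])  # [(0, 1), (1, 1), (2, 1)]
```

Each corpus entry is a list of `(token_id, count)` pairs, which is the same shape as the `list of list of (int, int)` type given for `common_corpus`.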
The toy dataset can also be found in the test data directory:
>>> from gensim.test.utils import datapath
>>>
>>> with open(datapath("testcorpus.txt")) as f:
...     texts = [line.strip().split() for line in f]
>>> print(texts[0])
['computer', 'human', 'interface']
If you don’t need to keep temporary objects on disk, use temporary_file():
>>> from gensim.test.utils import temporary_file, common_corpus, common_dictionary
>>> from gensim.models import LdaModel
>>>
>>> with temporary_file("temp.txt") as tf:
...     lda = LdaModel(common_corpus, id2word=common_dictionary, num_topics=3)
...     lda.save(tf)
- class gensim.test.utils.LeeCorpus
Bases: object
- gensim.test.utils.datapath(fname)
Get the full path to the file fname in the test data directory, which is located in this module directory. Typically used to locate a corpus file in the test_data directory.
- Parameters
fname (str) – Name of file.
- Returns
Full path to fname in test_data folder.
- Return type
str
Example
Let’s get the path of a test corpus file in Matrix Market format and iterate over its documents.
>>> from gensim.corpora import MmCorpus
>>> from gensim.test.utils import datapath
>>>
>>> corpus = MmCorpus(datapath("testcorpus.mm"))
>>> for document in corpus:
...     pass
- gensim.test.utils.get_tmpfile(suffix)
Get the full path to a file named suffix in a temporary folder. This function doesn’t create the file; it only generates a unique path. Consecutive calls may return different paths.
- Parameters
suffix (str) – Suffix of file.
- Returns
Path to the file suffix in a temporary folder.
- Return type
str
Examples
Using this function, we can get the path to a temporary file and use it, for example, to store a temporary model.
>>> from gensim.models import LsiModel
>>> from gensim.test.utils import get_tmpfile, common_dictionary, common_corpus
>>>
>>> tmp_f = get_tmpfile("toy_lsi_model")
>>>
>>> model = LsiModel(common_corpus, id2word=common_dictionary)
>>> model.save(tmp_f)
>>>
>>> loaded_model = LsiModel.load(tmp_f)
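The behavior described for this function (a unique path is returned, but no file is created) can be sketched with the standard library alone. `make_tmp_path` is a hypothetical name, and creating a fresh directory on every call is an assumption made for illustration, not a statement about how `get_tmpfile` is implemented.

```python
import os
import tempfile


def make_tmp_path(suffix):
    """Hypothetical helper: return a path ending in `suffix` inside a fresh temp directory.

    The directory is created, but the file itself is not.
    """
    return os.path.join(tempfile.mkdtemp(), suffix)


first = make_tmp_path("toy_model")
second = make_tmp_path("toy_model")

print(os.path.exists(first))                  # False: the file has not been created
print(os.path.isdir(os.path.dirname(first)))  # True: its parent directory has
print(first != second)                        # True: each call yields a different path
```

In this sketch nothing removes the directories afterwards, so a caller that wants automatic cleanup would need a context manager instead.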
- gensim.test.utils.temporary_file(name='')
This context manager yields the full path to a file called name in a temporary directory. The temporary directory and any files in it are deleted at the end of the context. Note that the file itself is not created.
- Parameters
name (str) – Filename.
- Yields
str – Path to file name in temporary directory.
Examples
This example demonstrates that the temporary directory (and the files in it) is deleted at the end of the context.
>>> import os
>>> from gensim.test.utils import temporary_file
>>> with temporary_file("temp.txt") as tf, open(tf, 'w') as outfile:
...     # Assign the return value so the character count is not echoed.
...     _ = outfile.write("my extremely useful information")
...     print("Does this file exist? {}".format(os.path.exists(tf)))
...     print("Does this folder exist? {}".format(os.path.exists(os.path.dirname(tf))))
Does this file exist? True
Does this folder exist? True
>>>
>>> print("Does this file exist? {}".format(os.path.exists(tf)))
Does this file exist? False
>>> print("Does this folder exist? {}".format(os.path.exists(os.path.dirname(tf))))
Does this folder exist? False
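The lifecycle described for this context manager can be sketched with `contextlib`, `tempfile`, and `shutil` from the standard library. `scratch_file` is a hypothetical name, and the body is an illustrative assumption about how such a helper could be written rather than Gensim's own code.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_file(name=""):
    """Hypothetical context manager: yield a path inside a temp directory,
    then delete the directory and everything in it on exit."""
    tmp_dir = tempfile.mkdtemp()
    try:
        # Only the directory exists at this point; the caller creates the file.
        yield os.path.join(tmp_dir, name)
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


with scratch_file("temp.txt") as path:
    existed_before_write = os.path.exists(path)
    with open(path, "w") as outfile:
        outfile.write("scratch data")
    existed_after_write = os.path.exists(path)

# After the block exits, the whole temporary directory is gone.
removed_after_exit = not os.path.exists(os.path.dirname(path))
print(existed_before_write, existed_after_write, removed_after_exit)  # False True True
```

Putting the cleanup in a `finally` clause means the directory is removed even if the body of the `with` block raises.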