scripts.word2vec2tensor – Convert the word2vec format to TensorFlow 2D tensor
This script converts word vectors from word2vec format into the TensorFlow 2D tensor and metadata formats. The resulting files are used to visualize word vectors in Embedding Visualization.
How to use
Convert your word vectors with this script (this example uses a model from gensim-data):
python -m gensim.downloader -d glove-wiki-gigaword-50  # download model in word2vec format
python -m gensim.scripts.word2vec2tensor -i ~/gensim-data/glove-wiki-gigaword-50/glove-wiki-gigaword-50.gz -o /tmp/my_model_prefix
In Embedding Visualization, click the “Load Data” button in the left menu.
Under “Load a TSV file of vectors.”, select “Choose file” and pick the “/tmp/my_model_prefix_tensor.tsv” file.
Under “Load a TSV file of metadata.”, select “Choose file” and pick the “/tmp/my_model_prefix_metadata.tsv” file.
???
PROFIT!
For more information about the TensorBoard TSV format, see: https://www.tensorflow.org/versions/master/how_tos/embedding_viz/
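To make the two output files concrete, here is a minimal sketch of the conversion for an uncompressed, text-format word2vec file. It is not gensim's implementation: the helper name `text_word2vec_to_tsv` is hypothetical, and it assumes the usual text layout (a header line with vocabulary size and vector size, then one word and its space-separated values per line). It does not handle `.gz` or binary input.

```python
import os
import tempfile


def text_word2vec_to_tsv(word2vec_path, prefix):
    """Write <prefix>_tensor.tsv and <prefix>_metadata.tsv from a text word2vec file.

    Hypothetical helper for illustration; it is not gensim's implementation.
    Returns the number of vectors written.
    """
    tensor_path = prefix + "_tensor.tsv"
    metadata_path = prefix + "_metadata.tsv"
    written = 0
    with open(word2vec_path, encoding="utf-8") as src, \
            open(tensor_path, "w", encoding="utf-8") as tensor_out, \
            open(metadata_path, "w", encoding="utf-8") as metadata_out:
        # Header line of the text format: "<vocab_size> <vector_size>".
        _vocab_size, vector_size = (int(x) for x in src.readline().split())
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            parts = line.rstrip().split(" ")
            if len(parts) != vector_size + 1:
                raise ValueError(f"unexpected number of fields: {line!r}")
            word, values = parts[0], parts[1:]
            # One tab-separated vector per line; the matching word goes on
            # the same line number of the metadata file.
            tensor_out.write("\t".join(values) + "\n")
            metadata_out.write(word + "\n")
            written += 1
    return written


# Tiny demonstration with a two-word, three-dimensional model.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "tiny.txt")
    with open(model_path, "w", encoding="utf-8") as f:
        f.write("2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")
    prefix = os.path.join(tmp_dir, "my_model_prefix")
    print(text_word2vec_to_tsv(model_path, prefix), "vectors written")
```

Line N of the metadata file names the vector on line N of the tensor file, so the visualization relies on the two files staying in the same order.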
Command line arguments
...
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to input file in word2vec format
  -o OUTPUT, --output OUTPUT
                        Prefix path for output files
  -b, --binary          Set this flag if word2vec model in binary format
                        (default: False)
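The options above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the help text, not the script's actual source; the function name `build_parser` and the choice to mark `-i` and `-o` as required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Convert a word2vec file to TSV tensor and metadata files."
    )
    # Assumption: input and output are treated as required here.
    parser.add_argument("-i", "--input", required=True,
                        help="Path to input file in word2vec format")
    parser.add_argument("-o", "--output", required=True,
                        help="Prefix path for output files")
    # store_true gives the documented default of False when -b is absent.
    parser.add_argument("-b", "--binary", action="store_true", default=False,
                        help="Set this flag if word2vec model in binary format")
    return parser


args = build_parser().parse_args(["-i", "model.txt", "-o", "/tmp/my_model_prefix"])
print(args.input, args.output, args.binary)
```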
- gensim.scripts.word2vec2tensor.word2vec2tensor(word2vec_model_path, tensor_filename, binary=False)
Convert a file in Word2Vec format and write two files in 2D tensor TSV format.
The file “tensor_filename”_tensor.tsv contains the word vectors, and “tensor_filename”_metadata.tsv contains the words.
- Parameters
word2vec_model_path (str) – Path to a file in Word2Vec format.
tensor_filename (str) – Prefix for output files.
binary (bool, optional) – True if the input file is in binary format.
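The function can also be called from Python rather than from the command line. The sketch below wraps the documented call; the helper names `expected_outputs` and `convert` are hypothetical, and gensim is imported inside `convert` so that the path helper works without it installed.

```python
def expected_outputs(prefix):
    """Return the two file names documented for a given output prefix."""
    return f"{prefix}_tensor.tsv", f"{prefix}_metadata.tsv"


def convert(word2vec_model_path, prefix, binary=False):
    """Run the documented conversion and return the expected output paths."""
    # Lazy import: requires gensim to be installed when convert() is called.
    from gensim.scripts.word2vec2tensor import word2vec2tensor

    word2vec2tensor(word2vec_model_path, prefix, binary=binary)
    return expected_outputs(prefix)


print(expected_outputs("/tmp/my_model_prefix"))
```

Calling `convert(path_to_model, "/tmp/my_model_prefix")` with a real word2vec file should produce the two files that the how-to steps load into the visualization.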