scripts.word2vec2tensor – Convert the word2vec format to TensorFlow 2D tensor

This script converts word vectors from word2vec format into TensorFlow 2D tensor and metadata files. The output is used to visualize word vectors in the Embedding Projector.

How to use

  1. Convert your word vectors with this script (this example uses a model from gensim-data):

    python -m gensim.downloader -d glove-wiki-gigaword-50  # download model in word2vec format
    python -m gensim.scripts.word2vec2tensor -i ~/gensim-data/glove-wiki-gigaword-50/glove-wiki-gigaword-50.gz -o /tmp/my_model_prefix
    
  2. Open http://projector.tensorflow.org/

  3. Click the “Load Data” button in the left menu.

  4. Select “Choose file” under “Load a TSV file of vectors.” and choose the “/tmp/my_model_prefix_tensor.tsv” file.

  5. Select “Choose file” under “Load a TSV file of metadata.” and choose the “/tmp/my_model_prefix_metadata.tsv” file.

  6. ???

  7. PROFIT!

For more information about the TensorBoard TSV format, please visit: https://www.tensorflow.org/versions/master/how_tos/embedding_viz/
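
As a rough illustration of the two TSV files that the steps above load into the projector, the following sketch converts a word2vec text file by hand using only the standard library. It is not the gensim implementation: the function name word2vec_text_to_tsv and the toy vectors are hypothetical, and it assumes the plain-text word2vec layout (a "vocab_size vector_size" header line followed by one "word v1 v2 ..." line per word).

```python
import os
import tempfile


def word2vec_text_to_tsv(word2vec_path, output_prefix):
    """Write <prefix>_tensor.tsv and <prefix>_metadata.tsv; return the number of words written.

    Hypothetical helper for illustration only (text format, no binary support).
    """
    tensor_path = output_prefix + "_tensor.tsv"
    metadata_path = output_prefix + "_metadata.tsv"
    count = 0
    with open(word2vec_path, encoding="utf-8") as src, \
            open(tensor_path, "w", encoding="utf-8") as tensor_file, \
            open(metadata_path, "w", encoding="utf-8") as metadata_file:
        # Header line of the text format: "<vocab_size> <vector_size>".
        _vocab_size, vector_size = (int(x) for x in src.readline().split())
        for line in src:
            parts = line.rstrip().split(" ")
            if len(parts) != vector_size + 1:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            # One vector per line, components separated by tabs.
            tensor_file.write("\t".join(values) + "\n")
            # One word per line, in the same order as the vectors.
            metadata_file.write(word + "\n")
            count += 1
    return count


# Toy input (made-up vectors) written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_input = os.path.join(demo_dir, "toy_vectors.txt")
with open(demo_input, "w", encoding="utf-8") as f:
    f.write("2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")

demo_prefix = os.path.join(demo_dir, "toy")
n_written = word2vec_text_to_tsv(demo_input, demo_prefix)
```

Row N of the tensor file and row N of the metadata file describe the same word, which is why the projector asks for both files.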

Command line arguments

...
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Path to input file in word2vec format
  -o OUTPUT, --output OUTPUT
                        Prefix path for output files
  -b, --binary          Set this flag if word2vec model in binary format
                        (default: False)
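
To make the options above concrete, here is a minimal argparse sketch that accepts the same three flags. It is an assumption-laden reconstruction from the help text, not the script's actual source: the function name build_parser is hypothetical, and marking -i and -o as required is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (illustration only)."""
    parser = argparse.ArgumentParser(
        description="Convert word2vec format to TSV files (sketch)"
    )
    # Assumption: input and output are treated as required here.
    parser.add_argument("-i", "--input", required=True,
                        help="Path to input file in word2vec format")
    parser.add_argument("-o", "--output", required=True,
                        help="Prefix path for output files")
    # store_true makes the flag default to False when it is omitted.
    parser.add_argument("-b", "--binary", action="store_true",
                        help="Set this flag if word2vec model in binary format")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-i", "vectors.txt", "-o", "/tmp/my_model_prefix"])
binary_args = build_parser().parse_args(["-i", "vectors.bin", "-o", "/tmp/out", "-b"])
```

Omitting -b leaves args.binary as False, which matches the documented default.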
gensim.scripts.word2vec2tensor.word2vec2tensor(word2vec_model_path, tensor_filename, binary=False)

Convert a file in Word2Vec format and write two files in 2D tensor TSV format.

The file “tensor_filename”_tensor.tsv contains the word vectors; “tensor_filename”_metadata.tsv contains the words.

Parameters
  • word2vec_model_path (str) – Path to file in Word2Vec format.

  • tensor_filename (str) – Prefix for output files.

  • binary (bool, optional) – True if the input file is in binary format.
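
After a conversion it can be useful to confirm that the two output files line up before uploading them. The following standard-library sketch does that under stated assumptions: the helper name check_projector_files is hypothetical, the toy files stand in for real output, and the files are assumed to contain one tab-separated vector per line and one word per line, respectively.

```python
import os
import tempfile


def check_projector_files(prefix):
    """Return (row_count, vector_size) if the file pair is consistent.

    Hypothetical helper: raises ValueError if the row counts differ,
    the vectors have mixed lengths, or a component is not numeric.
    """
    with open(prefix + "_tensor.tsv", encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f]
    with open(prefix + "_metadata.tsv", encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]

    if len(rows) != len(words):
        raise ValueError(
            "row count mismatch: %d vectors, %d words" % (len(rows), len(words))
        )
    sizes = {len(row) for row in rows}
    if len(sizes) > 1:
        raise ValueError("vectors have inconsistent lengths: %s" % sorted(sizes))
    for row in rows:
        for value in row:
            float(value)  # raises ValueError for a non-numeric component
    return len(rows), (sizes.pop() if sizes else 0)


# Toy files (made-up values) standing in for real converter output.
check_dir = tempfile.mkdtemp()
check_prefix = os.path.join(check_dir, "toy")
with open(check_prefix + "_tensor.tsv", "w", encoding="utf-8") as f:
    f.write("0.1\t0.2\t0.3\n0.4\t0.5\t0.6\n")
with open(check_prefix + "_metadata.tsv", "w", encoding="utf-8") as f:
    f.write("king\nqueen\n")

shape = check_projector_files(check_prefix)
```

With the toy files above, shape is (2, 3): two words, each with a three-component vector.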