`scripts.word2vec_standalone` – Train word2vec on text file CORPUS¶

USAGE: %(program)s -train CORPUS -output VECTORS -size SIZE -window WINDOW -cbow CBOW -sample SAMPLE -hs HS -negative NEGATIVE -threads THREADS -iter ITER -min_count MIN-COUNT -alpha ALPHA -binary BINARY -accuracy FILE

Trains a neural embedding model on text file CORPUS. Parameters essentially reproduce those used by the original C tool (see https://code.google.com/archive/p/word2vec/).

Parameters for training:

-train <file>: Use text data from <file> to train the model
-output <file>: Use <file> to save the resulting word vectors / word clusters
-size <int>: Set size of word vectors; default is 100
-window <int>: Set max skip length between words; default is 5
-sample <float>: Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)
-hs <int>: Use Hierarchical Softmax; default is 0 (not used)
-negative <int>: Number of negative examples; default is 5, common values are 3 - 10 (0 = not used)
-threads <int>: Use <int> threads (default 3)
-iter <int>: Run more training iterations (default 5)
-min_count <int>: This will discard words that appear less than <int> times; default is 5
-alpha <float>: Set the starting learning rate; default is 0.025 for skip-gram and 0.05 for CBOW
-binary <int>: Save the resulting vectors in binary moded; default is 0 (off)
-cbow <int>: Use the continuous bag of words model; default is 1 (use 0 for skip-gram model)
-accuracy <file>: Compute accuracy of the resulting model analogical inference power on questions file <file> See an example of questions file at https://code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

Example: python -m gensim.scripts.word2vec_standalone -train data.txt -output vec.txt -size 200 -sample 1e-4 -binary 0 -iter 3

Please sponsor Gensim to help sustain this open source project!

scripts.word2vec_standalone – Train word2vec on text file CORPUS¶

`scripts.word2vec_standalone` – Train word2vec on text file CORPUS¶