models._fasttext_bin – Facebook’s fastText I/O

Load models from the native binary format released by Facebook.

The main entry point is the load() function. It returns a Model namedtuple containing everything loaded from the binary.


Load a model from a binary file:

>>> from gensim.test.utils import datapath
>>> from gensim.models._fasttext_bin import load
>>> with open(datapath('crime-and-punishment.bin'), 'rb') as fin:
...     model = load(fin)
>>> model.nwords
>>> model.vectors_ngrams.shape
(391, 5)
>>> sorted(model.raw_vocab, key=lambda w: len(w), reverse=True)[:5]
['останавливаться', 'изворачиваться,', 'раздражительном', 'exceptionally', 'проскользнуть']
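As a rough illustration of what load() reads first, the sketch below parses only the header of a .bin stream with the standard struct module. The magic number 793712314, the version 12, the field order and the little-endian byte order are assumptions taken from the fastText C++ sources, and read_header is a hypothetical helper, not part of gensim. Unlike the real loader, it also reads nothing past the header and rejects any stream without the magic number.

```python
import io
import struct

# Assumed values of FASTTEXT_FILEFORMAT_MAGIC_INT32 and FASTTEXT_VERSION
# from the fastText C++ sources.
FASTTEXT_MAGIC = 793712314
FASTTEXT_VERSION = 12

# Assumed field order written by fastText's Args::save: twelve int32 values,
# followed by the float64 sampling threshold t.
ARG_FIELDS = ("dim", "ws", "epoch", "min_count", "neg", "word_ngrams",
              "loss", "model", "bucket", "minn", "maxn", "lr_update_rate")


def read_header(fin):
    """Hypothetical helper: read the file signature and the model arguments.

    Assumes little-endian byte order (what fastText writes on x86 machines).
    Returns (version, args), where args maps field names to values.
    """
    magic, version = struct.unpack("<2i", fin.read(8))
    if magic != FASTTEXT_MAGIC:
        raise ValueError("not a fastText .bin file (magic=%d)" % magic)
    ints = struct.unpack("<12i", fin.read(12 * 4))
    (t,) = struct.unpack("<d", fin.read(8))
    args = dict(zip(ARG_FIELDS, ints))
    args["t"] = t
    return version, args


if __name__ == "__main__":
    # Build a fake header in memory instead of opening a real .bin file.
    fake = io.BytesIO(
        struct.pack("<2i", FASTTEXT_MAGIC, FASTTEXT_VERSION)
        + struct.pack("<12i", 5, 5, 5, 5, 5, 1, 2, 2, 100, 3, 6, 100)
        + struct.pack("<d", 1e-4)
    )
    version, args = read_header(fake)
    print(version, args["dim"], args["bucket"])  # → 12 5 100
```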

See also

FB Implementation.

class gensim.models._fasttext_bin.Model(bucket, dim, epoch, hidden_output, loss, lr_update_rate, maxn, min_count, minn, model, neg, ntokens, nwords, raw_vocab, t, vectors_ngrams, vocab_size, word_ngrams, ws)

Bases: tuple

Holds data loaded from the Facebook binary.

Parameters

  • dim (int) – The dimensionality of the vectors.

  • ws (int) – The window size.

  • epoch (int) – The number of training epochs.

  • neg (int) – The number of negative samples. If non-zero, indicates that the model uses negative sampling.

  • loss (int) – If equal to 1, indicates that the model uses hierarchical softmax.

  • model (int) – If equal to 2, indicates that the model uses skip-grams.

  • bucket (int) – The number of buckets into which character n-grams are hashed.

  • min_count (int) – The minimum frequency; terms that occur fewer times are ignored.

  • t (float) – The sampling threshold used to downsample frequent words.

  • minn (int) – The minimum length of character n-grams.

  • maxn (int) – The maximum length of character n-grams.

  • raw_vocab (collections.OrderedDict) – A map from words (str) to their frequency (int). The order in the dict corresponds to the order of the words in the Facebook binary.

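  • nwords (int) – The number of words in the vocabulary.

  • vocab_size (int) – The total number of vocabulary entries stored in the binary: the words plus any labels, so it differs from nwords only for supervised models.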

  • vectors_ngrams (numpy.ndarray) – The matrix of vectors learned by the model, one vector per row. It has nwords + bucket rows (one per word plus one per bucket) and dim columns.

  • hidden_output (numpy.ndarray) – The output-layer weight matrix of the shallow neural network. It has one row per word (nwords rows) and dim columns, so it lacks the bucket rows of vectors_ngrams. May be None (for example, when loaded with full_model=False); in that case, the model cannot be trained further.
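The integer codes above are easier to read once mapped to names. The following sketch assumes the enum values of fastText v0.9.x (loss: 1 hierarchical softmax, 2 negative sampling, 3 softmax, 4 one-vs-all; model: 1 cbow, 2 skip-gram, 3 supervised); only loss == 1 and model == 2 are stated in this documentation. summarize is a hypothetical helper that works on any object exposing these fields, including the Model returned by load().

```python
from collections import OrderedDict
from types import SimpleNamespace

# Assumed enum values from fastText's args.h (v0.9.x):
# loss_name {hs = 1, ns = 2, softmax = 3, ova = 4}, model_name {cbow = 1, sg = 2, sup = 3}.
LOSS_NAMES = {1: "hierarchical softmax", 2: "negative sampling", 3: "softmax", 4: "one-vs-all"}
MODEL_NAMES = {1: "cbow", 2: "skip-gram", 3: "supervised"}


def summarize(m):
    """Hypothetical helper: describe the integer codes and sizes of a loaded Model."""
    return {
        "architecture": MODEL_NAMES.get(m.model, "unknown"),
        "loss": LOSS_NAMES.get(m.loss, "unknown"),
        # vectors_ngrams has one row per word plus one row per n-gram bucket.
        "expected_ngram_rows": m.nwords + m.bucket,
        "ngram_lengths": (m.minn, m.maxn),
        # raw_vocab keeps the word order of the Facebook binary.
        "first_words": list(m.raw_vocab)[:3],
    }


if __name__ == "__main__":
    # A stand-in with only the fields summarize() reads; real code would pass load(fin).
    fake = SimpleNamespace(model=2, loss=2, nwords=4, bucket=10, minn=3, maxn=6,
                           raw_vocab=OrderedDict([("the", 9), ("a", 7), ("cat", 2), ("dog", 1)]))
    print(summarize(fake))
```

Using dict.get with an "unknown" fallback keeps the helper from failing on codes it does not know about.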

__getitem__(key, /)

Return self[key].

bucket

Alias for field number 0

count(value, /)

Return number of occurrences of value.

dim

Alias for field number 1

epoch

Alias for field number 2

hidden_output

Alias for field number 3

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

loss

Alias for field number 4

lr_update_rate

Alias for field number 5

maxn

Alias for field number 6

min_count

Alias for field number 7

minn

Alias for field number 8

model

Alias for field number 9

neg

Alias for field number 10

ntokens

Alias for field number 11

nwords

Alias for field number 12

raw_vocab

Alias for field number 13

t

Alias for field number 14

vectors_ngrams

Alias for field number 15

vocab_size

Alias for field number 16

word_ngrams

Alias for field number 17

ws

Alias for field number 18
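Because Model is a namedtuple, each field can be read either by name or by position, and the field numbers follow the order of the names in the Model signature. The sketch below builds a hypothetical stand-in with the same field names (it is not gensim's class) to show that mapping, plus the standard namedtuple _replace method for producing a modified copy.

```python
from collections import namedtuple

# Field names in the order of the Model signature above. This is a hypothetical
# stand-in with the same layout, not gensim's own class.
FIELDS = ("bucket dim epoch hidden_output loss lr_update_rate maxn min_count minn "
          "model neg ntokens nwords raw_vocab t vectors_ngrams vocab_size "
          "word_ngrams ws").split()
ModelStandIn = namedtuple("ModelStandIn", FIELDS)


def field_number(name):
    """Return the tuple position that the attribute `name` is an alias for."""
    return ModelStandIn._fields.index(name)


if __name__ == "__main__":
    m = ModelStandIn(*range(len(FIELDS)))   # dummy values 0..18
    print(field_number("vectors_ngrams"), m.vectors_ngrams, m[15])  # → 15 15 15
    # Tuples are immutable; _replace returns a modified copy instead.
    light = m._replace(hidden_output=None)
    print(light.hidden_output, m.hidden_output)  # → None 3
```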

gensim.models._fasttext_bin.load(fin, encoding='utf-8', full_model=True)

Load a model from a binary stream.

Parameters

  • fin (file) – The readable binary stream.

  • encoding (str, optional) – The encoding to use for decoding text.

  • full_model (bool, optional) – If False, skips loading the hidden output matrix. This saves a fair bit of CPU time and RAM, but prevents training continuation.

Returns

The loaded model.

Return type

Model

(model, fout, fb_fasttext_parameters, encoding)

Save a model, including its word embeddings, in Facebook’s native fastText .bin format.

Parameters

  • fout (file name or writable binary stream) – The file or stream to which the model is saved.

  • model (gensim.models.fasttext.FastText) – The model to save.

  • fb_fasttext_parameters (dict) – Parameters that the gensim implementation does not use, namely lr_update_rate and word_ngrams, and which therefore must be provided externally.

  • encoding (str) – The encoding to use in the output file.


Unfortunately, Facebook’s native fastText .bin format is not documented.

This function is a reimplementation of FastText::saveModel.

It is based on fastText v0.9.1, specifically commit da2745fcccb848c7a225a7d558218ee4c64d5333.

The code follows the naming conventions of the original C++ code.
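To show where fb_fasttext_parameters ends up, here is a minimal sketch of the first part of such a file: the signature and the block of training arguments. It assumes the same magic number (793712314), version (12), field order and little-endian layout as the fastText C++ sources; write_header is a hypothetical helper and stops before the vocabulary and matrices that FastText::saveModel writes next.

```python
import io
import struct

# Assumed constants from the fastText C++ sources.
FASTTEXT_MAGIC = 793712314
FASTTEXT_VERSION = 12


def write_header(fout, args, fb_fasttext_parameters):
    """Hypothetical helper: write the signature and Args block that saveModel begins with.

    `args` holds values the gensim model knows; lr_update_rate and word_ngrams come
    from `fb_fasttext_parameters` because gensim does not use them. Little-endian
    byte order is assumed. The vocabulary and matrices that follow are not written.
    """
    a = dict(args, **fb_fasttext_parameters)
    fout.write(struct.pack("<2i", FASTTEXT_MAGIC, FASTTEXT_VERSION))
    fout.write(struct.pack(
        "<12i",
        a["dim"], a["ws"], a["epoch"], a["min_count"], a["neg"], a["word_ngrams"],
        a["loss"], a["model"], a["bucket"], a["minn"], a["maxn"], a["lr_update_rate"],
    ))
    fout.write(struct.pack("<d", a["t"]))


if __name__ == "__main__":
    buf = io.BytesIO()
    gensim_side = dict(dim=5, ws=5, epoch=5, min_count=5, neg=5, loss=2, model=2,
                       bucket=100, minn=3, maxn=6, t=1e-4)
    write_header(buf, gensim_side, {"lr_update_rate": 100, "word_ngrams": 1})
    print(len(buf.getvalue()))  # → 64
```

Leaving either external parameter out raises a KeyError, which mirrors why the real function needs them supplied explicitly.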