models._fasttext_bin – Facebook I/O

Load models from the native binary format released by Facebook.

The main entry point is the load() function. It returns a Model namedtuple containing everything loaded from the binary.
Examples
Load a model from a binary file:
>>> from gensim.test.utils import datapath
>>> from gensim.models._fasttext_bin import load
>>> with open(datapath('crime-and-punishment.bin'), 'rb') as fin:
...     model = load(fin)
>>> model.nwords
291
>>> model.vectors_ngrams.shape
(391, 5)
>>> sorted(model.raw_vocab, key=lambda w: len(w), reverse=True)[:5]
['останавливаться', 'изворачиваться,', 'раздражительном', 'exceptionally', 'проскользнуть']
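The shape of vectors_ngrams follows directly from the header fields: its row count is nwords plus bucket, and its column count is dim (see the Model fields below). A short continuation of the example, assuming the same model object is still in scope:

>>> model.vectors_ngrams.shape == (model.nwords + model.bucket, model.dim)
True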
class gensim.models._fasttext_bin.Model(bucket, dim, epoch, hidden_output, loss, maxn, min_count, minn, model, neg, nwords, raw_vocab, t, vectors_ngrams, vocab_size, ws)

Bases: tuple
Holds data loaded from the Facebook binary.
dim (int) – The dimensionality of the vectors.
ws (int) – The window size.
epoch (int) – The number of training epochs.
neg (int) – If non-zero, indicates that the model uses negative sampling.
loss (int) – If equal to 1, indicates that the model uses the hierarchical softmax loss.
model (int) – If equal to 2, indicates that the model uses skip-grams.
bucket (int) – The number of buckets.
min_count (int) – The threshold below which the model ignores terms.
t (float) – The sample threshold.
minn (int) – The minimum ngram length.
maxn (int) – The maximum ngram length.
raw_vocab (collections.OrderedDict) – A map from words (str) to their frequency (int). The order in the dict corresponds to the order of the words in the Facebook binary.
nwords (int) – The number of words.
vocab_size (int) – The size of the vocabulary.
vectors_ngrams (numpy.array) – A matrix that contains the vectors learned by the model. Each row corresponds to a vector. The number of rows is equal to the number of words plus the number of buckets, and the number of columns is equal to the vector dimensionality (see the sketch after this list).
hidden_output (numpy.array) – This is a matrix that contains the shallow neural network output. This array has the same dimensions as vectors_ngrams. May be None - in that case, it is impossible to continue training the model.
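To illustrate how raw_vocab and vectors_ngrams line up, here is a minimal sketch. It assumes that the word rows come first in vectors_ngrams (before the bucket rows) and follow the order of raw_vocab; word_row is a hypothetical helper, not part of this module, and model is the object loaded in the example at the top of this page:

>>> def word_row(model, word):
...     """Return the row of vectors_ngrams stored for `word` (word rows only, no ngram sum)."""
...     words = list(model.raw_vocab)  # raw_vocab preserves the order of words in the binary
...     return model.vectors_ngrams[words.index(word)]
>>> word_row(model, 'exceptionally').shape
(5,)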
__getitem__()
Return self[key].
bucket
Alias for field number 0
count(value) → integer -- return number of occurrences of value
dim
Alias for field number 1
epoch
Alias for field number 2
hidden_output
Alias for field number 3
index(value[, start[, stop]]) → integer -- return first index of value.
Raises ValueError if the value is not present.
loss
Alias for field number 4
maxn
Alias for field number 5
min_count
Alias for field number 6
minn
Alias for field number 7
model
Alias for field number 8
neg
Alias for field number 9
nwords
Alias for field number 10
raw_vocab
Alias for field number 11
t
Alias for field number 12
vectors_ngrams
Alias for field number 13
vocab_size
Alias for field number 14
ws
Alias for field number 15
gensim.models._fasttext_bin.load(fin, encoding='utf-8', full_model=True)

Load a model from a binary stream.
fin (file) – The readable binary stream.
encoding (str, optional) – The encoding to use for decoding text.
full_model (bool, optional) – If False, skips loading the hidden output matrix. This saves a fair bit of CPU time and RAM, but prevents continued training of the model (see the example below).
Returns: Model – The loaded model.
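For example, to skip the hidden output matrix when continued training is not needed (a sketch reusing the test file from the example at the top of this page):

>>> from gensim.test.utils import datapath
>>> from gensim.models._fasttext_bin import load
>>> with open(datapath('crime-and-punishment.bin'), 'rb') as fin:
...     model = load(fin, full_model=False)
>>> model.hidden_output is None
True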