models._fasttext_bin – Facebook I/O

Load models from the native binary format released by Facebook.

The main entry point is the load() function. It returns a Model namedtuple containing everything loaded from the binary.
Examples
Load a model from a binary file:
>>> from gensim.test.utils import datapath
>>> from gensim.models._fasttext_bin import load
>>> with open(datapath('crime-and-punishment.bin'), 'rb') as fin:
...     model = load(fin)
>>> model.nwords
291
>>> model.vectors_ngrams.shape
(391, 5)
>>> sorted(model.raw_vocab, key=lambda w: len(w), reverse=True)[:5]
['останавливаться', 'изворачиваться,', 'раздражительном', 'exceptionally', 'проскользнуть']
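The shape of vectors_ngrams follows directly from the header fields: its row count is nwords plus bucket, and its column count is dim (see the Model fields below). A short continuation of the example, assuming the same model object is still in scope:

>>> model.vectors_ngrams.shape == (model.nwords + model.bucket, model.dim)
True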
class gensim.models._fasttext_bin.Model(bucket, dim, epoch, hidden_output, loss, maxn, min_count, minn, model, neg, nwords, raw_vocab, t, vectors_ngrams, vocab_size, ws)

Bases: tuple
Holds data loaded from the Facebook binary.
dim (int) – The dimensionality of the vectors.
ws (int) – The window size.
epoch (int) – The number of training epochs.
neg (int) – If non-zero, indicates that the model uses negative sampling.
loss (int) – If equal to 1, indicates that the model uses the hierarchical softmax loss.
model (int) – If equal to 2, indicates that the model uses skip-grams.
bucket (int) – The number of buckets.
min_count (int) – The threshold below which the model ignores terms.
t (float) – The sample threshold.
minn (int) – The minimum ngram length.
maxn (int) – The maximum ngram length.
raw_vocab (collections.OrderedDict) – A map from words (str) to their frequency (int). The order in the dict corresponds to the order of the words in the Facebook binary.
nwords (int) – The number of words.
vocab_size (int) – The size of the vocabulary.
vectors_ngrams (numpy.array) – A matrix that contains the vectors learned by the model. Each row corresponds to a vector. The number of rows is equal to the number of words plus the number of buckets, and the number of columns is equal to the vector dimensionality (see the sketch after this list).
hidden_output (numpy.array) – This is a matrix that contains the shallow neural network output. This array has the same dimensions as vectors_ngrams. May be None - in that case, it is impossible to continue training the model.
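To illustrate how raw_vocab and vectors_ngrams line up, here is a minimal sketch. It assumes that the word rows come first in vectors_ngrams (before the bucket rows) and follow the order of raw_vocab; word_row is a hypothetical helper, not part of this module, and model is the object loaded in the example at the top of this page:

>>> def word_row(model, word):
...     """Return the row of vectors_ngrams stored for `word` (word rows only, no ngram sum)."""
...     words = list(model.raw_vocab)  # raw_vocab preserves the order of words in the binary
...     return model.vectors_ngrams[words.index(word)]
>>> word_row(model, 'exceptionally').shape
(5,)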
__getitem__()
Return self[key].
bucket
Alias for field number 0
count(value) → integer -- return number of occurrences of value
dim
Alias for field number 1
epoch
Alias for field number 2
hidden_output
Alias for field number 3
index(value[, start[, stop]]) → integer -- return first index of value.
Raises ValueError if the value is not present.
loss
Alias for field number 4
maxn
Alias for field number 5
min_count
Alias for field number 6
minn
Alias for field number 7
model
Alias for field number 8
neg
Alias for field number 9
nwords
Alias for field number 10
raw_vocab
Alias for field number 11
t
Alias for field number 12
vectors_ngrams
Alias for field number 13
vocab_size
Alias for field number 14
ws
Alias for field number 15
gensim.models._fasttext_bin.load(fin, encoding='utf-8', full_model=True)

Load a model from a binary stream.
fin (file) – The readable binary stream.
encoding (str, optional) – The encoding to use for decoding text.
full_model (bool, optional) – If False, skips loading the hidden output matrix. This saves a fair bit of CPU time and RAM, but prevents continued training of the model (see the example below).
Returns: Model – The loaded model.
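For example, to skip the hidden output matrix when continued training is not needed (a sketch reusing the test file from the example at the top of this page):

>>> from gensim.test.utils import datapath
>>> from gensim.models._fasttext_bin import load
>>> with open(datapath('crime-and-punishment.bin'), 'rb') as fin:
...     model = load(fin, full_model=False)
>>> model.hidden_output is None
True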