models.normmodel – Normalization model

class gensim.models.normmodel.NormModel(corpus=None, norm='l2')

Bases: gensim.interfaces.TransformationABC

Objects of this class realize the explicit normalization of vectors (l1 and l2).

Compute the l1 or l2 normalization by normalizing separately for each document in a corpus.

If v_{i,j} is the i-th component of the vector representing document j, the l1 normalization is

l1_{i, j} = \frac{v_{i,j}}{\sum_k |v_{k,j}|}

the l2 normalization is

l2_{i, j} = \frac{v_{i,j}}{\sqrt{\sum_k v_{k,j}^2}}

Parameters:
  • corpus (iterable of iterable of (int, number), optional) – Input corpus.
  • norm ({'l1', 'l2'}, optional) – Norm used to normalize.
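The two formulas above can be checked by hand with a small pure-Python sketch. The helper names `l1_normalize` and `l2_normalize` and the sample document are hypothetical, introduced only for illustration; they are not part of gensim.

```python
import math


def l1_normalize(bow):
    """Divide every value by the sum of absolute values in the document."""
    total = sum(abs(value) for _, value in bow)
    return [(term_id, value / total) for term_id, value in bow]


def l2_normalize(bow):
    """Divide every value by the Euclidean length of the document."""
    length = math.sqrt(sum(value ** 2 for _, value in bow))
    return [(term_id, value / length) for term_id, value in bow]


# Hypothetical document in BoW format: (term_id, count) pairs.
doc = [(0, 3), (1, 4)]

l1_doc = l1_normalize(doc)  # values are 3/7 and 4/7
l2_doc = l2_normalize(doc)  # values are 3/5 and 4/5
```

After l1 normalization the absolute values sum to 1; after l2 normalization the squared values sum to 1.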

__getitem__(bow)

Call normalize() on the given document.

Parameters: bow (list of (int, number)) – Document in BoW format.
Returns: Normalized document.
Return type: list of (int, number)

calc_norm(corpus)

Calculate the norm by calling unitvec() with the norm parameter.

Parameters: corpus (iterable of iterable of (int, number)) – Input corpus.
classmethod load(fname, mmap=None)

Load a previously saved object (using save()) from file.

Parameters:
  • fname (str) – Path to the file that contains the needed object.
  • mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.

Returns: Object loaded from fname.
Return type: object
Raises: IOError – When the method is called on an instance (it should be called on the class).

normalize(bow)

Normalize a simple count representation.

Parameters: bow (list of (int, number)) – Document in BoW format.
Returns: Normalized document.
Return type: list of (int, number)
save(fname_or_handle, separately=None, sep_limit=10485760, ignore=frozenset([]), pickle_protocol=2)

Save the object to file.

Parameters:
  • fname_or_handle (str or file-like) – Path to output file or already opened file-like object. If the object is a file handle, no special array handling will be performed; all attributes will be saved to the same file.
  • separately (list of str or None, optional) – If None, automatically detect large numpy/scipy.sparse arrays in the object being stored and store them in separate files. This avoids pickle memory errors and allows large arrays to be memory-mapped back efficiently on load. If a list of str, these attributes will be stored in separate files and the automatic check is not performed.
  • sep_limit (int) – Limit for automatic separation.
  • ignore (frozenset of str) – Attributes that should not be serialized/stored.
  • pickle_protocol (int) – Protocol number for pickle.
