models.normmodel – Normalization model

class gensim.models.normmodel.NormModel(corpus=None, norm='l2')

Bases: gensim.interfaces.TransformationABC

Objects of this class apply explicit l1 or l2 normalization to vectors.

The normalization is computed separately for each document in a corpus.

If v_{i,j} is the i-th component of the vector representing document j, the l1 normalization is

l1_{i, j} = \frac{v_{i,j}}{\sum_k |v_{k,j}|}

the l2 normalization is

l2_{i, j} = \frac{v_{i,j}}{\sqrt{\sum_k v_{k,j}^2}}

Parameters:
  • corpus (iterable of iterable of (int, number), optional) – Input corpus.
  • norm ({'l1', 'l2'}, optional) – Norm used to normalize.
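
The two formulas above can be sketched in plain Python for a single BoW document. This is only an illustration of the arithmetic, not the library's implementation: the helper name `normalize_bow` is hypothetical, and returning an empty or all-zero document unchanged is a choice made for this sketch.

```python
import math


def normalize_bow(bow, norm="l2"):
    """Scale the weights of one BoW document so that its l1 or l2 norm is 1.

    Hypothetical helper for illustration only; not part of gensim.
    """
    if norm == "l1":
        # Denominator of the l1 formula: sum of absolute weights.
        length = sum(abs(value) for _, value in bow)
    elif norm == "l2":
        # Denominator of the l2 formula: square root of the sum of squares.
        length = math.sqrt(sum(value ** 2 for _, value in bow))
    else:
        raise ValueError("norm must be 'l1' or 'l2'")
    if length == 0:
        # Assumption of this sketch: nothing to scale, return a copy as-is.
        return list(bow)
    return [(term_id, value / length) for term_id, value in bow]


doc = [(0, 3.0), (1, 4.0)]
l1_doc = normalize_bow(doc, "l1")  # weights become 3/7 and 4/7
l2_doc = normalize_bow(doc, "l2")  # weights become 3/5 and 4/5
```

Term ids are left untouched; only the weights are rescaled.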
__getitem__(bow)

Call normalize() on the given document.

Parameters: bow (list of (int, number)) – Document in BoW format.
Returns: Normalized document.
Return type: list of (int, number)
calc_norm(corpus)

Calculate the norm by calling unitvec() with the norm parameter.

Parameters: corpus (iterable of iterable of (int, number)) – Input corpus.
load(fname, mmap=None)

Load an object previously saved using save() from a file.

Parameters:
  • fname (str) – Path to file that contains needed object.
  • mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.

See also

save()
Save object to file.
Returns: Object loaded from fname.
Return type: object
Raises: AttributeError – When called on an object instance instead of class (this is a class method).
normalize(bow)

Normalize a simple count representation.

Parameters: bow (list of (int, number)) – Document in BoW format.
Returns: Normalized document.
Return type: list of (int, number)
save(fname_or_handle, separately=None, sep_limit=10485760, ignore=frozenset([]), pickle_protocol=2)

Save the object to a file.

Parameters:
  • fname_or_handle (str or file-like) – Path to output file or already opened file-like object. If the object is a file handle, no special array handling is performed and all attributes are saved to the same file.
  • separately (list of str or None, optional) –

    If None, automatically detect large numpy/scipy.sparse arrays in the object being stored, and store them in separate files. This prevents memory errors for large objects, and also allows the large arrays to be memory-mapped for efficient loading and shared in RAM between multiple processes.

    If list of str: store these attributes into separate files. The automated size check is not performed in this case.

  • sep_limit (int, optional) – Size threshold in bytes. Arrays smaller than this are not stored separately.
  • ignore (frozenset of str, optional) – Attributes that shouldn’t be stored at all.
  • pickle_protocol (int, optional) – Protocol number for pickle.

See also

load()
Load object from file.