corpora.csvcorpus – Corpus in CSV format
Corpus in CSV format.
- class gensim.corpora.csvcorpus.CsvCorpus(fname, labels)
Corpus in CSV format.
The CSV delimiter, headers etc. are guessed automatically based on the file content. All row values are expected to be ints/floats.
fname (str) – Path to corpus.
labels (bool) – If True, ignore the first column (class labels).
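The reading behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not Gensim's implementation: `iter_csv_bow` is a hypothetical helper, the candidate delimiter set is an assumption, and the toy file contents are made up for illustration.

```python
import csv
import itertools
import os
import tempfile


def iter_csv_bow(fname, labels):
    """Yield each CSV row as a list of (column_index, float_value) pairs."""
    with open(fname, newline="") as f:
        # Guess the delimiter and header presence from the first few lines.
        sample = "".join(itertools.islice(f, 5))
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(sample, delimiters=",;\t")  # assumed candidates
        has_header = sniffer.has_header(sample)
        f.seek(0)
        reader = csv.reader(f, dialect)
        if has_header:
            next(reader)  # skip the header row
        for row in reader:
            if labels:
                row = row[1:]  # drop the class-label column
            yield list(enumerate(float(x) for x in row))


# Toy data: a header row, then a class label followed by two numeric features.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    with open(path, "w", newline="") as f:
        f.write("label,x1,x2\n1,0.5,2.0\n0,1.5,3.0\n")
    docs = list(iter_csv_bow(path, labels=True))
```

With `labels=True` the first column is dropped before the remaining values are numbered, so feature ids start at 0 for the first non-label column.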
- add_lifecycle_event(event_name, log_level=20, **event)
Append an event into the lifecycle_events attribute of this object, and also optionally log the event at log_level.
Events are important moments during the object’s life, such as “model created”, “model saved”, “model loaded”, etc.
The lifecycle_events attribute is persisted across the object's save() and load() operations. It has no impact on the use of the model, but is useful during debugging and support.
Set self.lifecycle_events = None to disable this behaviour. Calls to add_lifecycle_event() will not record events into self.lifecycle_events then.
event_name (str) – Name of the event. Can be any label, e.g. “created”, “stored” etc.
event (dict) –
Key-value mapping to append to self.lifecycle_events. Should be JSON-serializable, so keep it simple. Can be empty.
This method will automatically add the following key-values to event, so you don’t have to specify them:
datetime: the current date & time
gensim: the current Gensim version
python: the current Python version
platform: the current platform
event: the name of this event
log_level (int) – Also log the complete event dict, at the specified log level. Set to False to not log at all.
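The bookkeeping described for add_lifecycle_event() can be pictured with a small stand-in class. This is a simplified sketch under stated assumptions, not Gensim's code: `LifecycleSketch` is hypothetical, logging is omitted, and the `gensim` version key is left out because the sketch does not import Gensim.

```python
import platform
import sys
from datetime import datetime


class LifecycleSketch:
    """Hypothetical stand-in that mimics the documented event bookkeeping."""

    def __init__(self):
        self.lifecycle_events = []

    def add_lifecycle_event(self, event_name, **event):
        if self.lifecycle_events is None:
            return  # recording has been disabled
        # Keys added automatically, following the documented list
        # (minus "gensim", which this sketch cannot know).
        event["datetime"] = datetime.now().isoformat()
        event["python"] = sys.version
        event["platform"] = platform.platform()
        event["event"] = event_name
        self.lifecycle_events.append(event)


obj = LifecycleSketch()
obj.add_lifecycle_event("created", rows=3)
recorded = obj.lifecycle_events  # keep a reference before disabling

obj.lifecycle_events = None
obj.add_lifecycle_event("stored")  # silently ignored
```

Caller-supplied keys such as `rows` sit alongside the automatic ones in the same dict, which is why the event payload should stay JSON-serializable.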
- classmethod load(fname, mmap=None)
Load an object previously saved using save() from a file.
fname (str) – Path to file that contains needed object.
mmap (str, optional) – Memory-map option. If the object was saved with large arrays stored separately, you can load these arrays via mmap (shared memory) using mmap='r'. If the file being loaded is compressed (either '.gz' or '.bz2'), then mmap=None must be set.
See also: save() – Save object to file.
Returns: Object loaded from fname.
Return type: object
Raises: AttributeError – When called on an object instance instead of class (this is a class method).
- save(*args, **kwargs)
Saves the in-memory state of the corpus (pickles the object).
This saves only the “internal state” of the corpus object, not the corpus data!
To save the corpus data, use the serialize method of your desired output format instead.
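The distinction between the object's internal state and the corpus data can be illustrated with a toy class. This is a sketch under assumptions rather than Gensim's implementation: `PathCorpusSketch` is hypothetical, and it pickles only the instance attributes to keep the example self-contained.

```python
import os
import pickle
import tempfile


class PathCorpusSketch:
    """Hypothetical corpus that only remembers where its data lives."""

    def __init__(self, fname):
        self.fname = fname

    def __iter__(self):
        with open(self.fname) as f:
            for line in f:
                yield [float(x) for x in line.split(",")]

    def save(self, fname):
        # Persist the internal state (here: just the path), not the rows.
        with open(fname, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, fname):
        obj = cls.__new__(cls)
        with open(fname, "rb") as f:
            obj.__dict__.update(pickle.load(f))
        return obj


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.csv")
    with open(data_path, "w") as f:
        f.write("1,2\n3,4\n")

    state_path = os.path.join(tmp, "corpus.pkl")
    PathCorpusSketch(data_path).save(state_path)

    restored = PathCorpusSketch.load(state_path)
    rows = list(restored)  # works only while data.csv still exists

    os.remove(data_path)
    data_gone = not os.path.exists(restored.fname)
```

Once the data file is removed, the restored object still loads but has nothing to iterate over, which is the situation the warning above describes.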
- static save_corpus(fname, corpus, id2word=None, metadata=False)
Save corpus to disk.
Some formats support saving the dictionary (feature_id -> word mapping), which can be provided by the optional id2word parameter.
Some corpora also support random access via document indexing, so that the documents on disk can be accessed in O(1) time.
In this case, save_corpus() is automatically called internally by serialize(), which does save_corpus() plus saves the index at the same time.
Calling serialize() is preferred to calling gensim.interfaces.CorpusABC.save_corpus() directly.
fname (str) – Path to output file.
corpus (iterable of list of (int, number)) – Corpus in BoW format.
id2word (Dictionary, optional) – Dictionary of corpus.
metadata (bool, optional) – If True, also write additional metadata to a separate file.
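The O(1) access mentioned in the save_corpus() description depends on storing an index next to the data. The toy sketch below shows the idea with byte offsets. It is an assumption-laden illustration, not any Gensim format: `save_corpus_sketch`, `read_doc`, and the `id:weight` line layout are all hypothetical.

```python
import os
import tempfile


def save_corpus_sketch(fname, corpus):
    """Write one document per line and return each document's byte offset."""
    offsets = []
    with open(fname, "wb") as f:
        for doc in corpus:
            offsets.append(f.tell())  # where this document starts
            line = " ".join(f"{term_id}:{weight}" for term_id, weight in doc)
            f.write(line.encode("utf8") + b"\n")
    return offsets


def read_doc(fname, offset):
    """Seek straight to a stored offset and parse a single document."""
    with open(fname, "rb") as f:
        f.seek(offset)
        line = f.readline().decode("utf8")
    pairs = (item.split(":") for item in line.split())
    return [(int(term_id), float(weight)) for term_id, weight in pairs]


# Corpus in BoW format: each document is a list of (int, number) pairs.
corpus = [[(0, 1.0), (2, 3.0)], [(1, 2.0)], [(0, 4.0), (1, 5.0)]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    index = save_corpus_sketch(path, corpus)
    third = read_doc(path, index[2])  # no scan over earlier documents
```

Writing the data and the offsets in one pass mirrors the relationship described above, where serialize() performs the save and stores the index at the same time.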