Why Gensim?
Super fast
The fastest library for training vector embeddings, in Python or otherwise. The core algorithms in Gensim use battle-hardened, highly optimized and parallelized C routines.
Data Streaming
Gensim can process arbitrarily large corpora using data-streamed algorithms: documents are read and processed one at a time, so there is no "dataset must fit in RAM" limitation.
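The streaming idea can be sketched in plain Python. A corpus only needs to be an object whose `__iter__` yields one document at a time, so memory use stays flat no matter how large the file is. The `StreamedCorpus` class and the one-document-per-line file layout below are hypothetical illustrations, not part of Gensim's API.

```python
import os
import tempfile
from typing import Iterator, List


class StreamedCorpus:
    """Hypothetical restartable corpus: one whitespace-tokenized document per line.

    Each call to __iter__ reopens the file, so the corpus can be iterated
    several times (e.g. once to build a vocabulary, then once per training
    epoch) while holding only the current line in memory.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[List[str]]:
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.lower().split()
                if tokens:  # skip blank lines
                    yield tokens


if __name__ == "__main__":
    # Tiny sample file for illustration; a real corpus could be many gigabytes.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("Human machine interface\n\nGraph of trees\n")
        sample_path = tmp.name

    corpus = StreamedCorpus(sample_path)
    for document in corpus:
        print(document)
    os.remove(sample_path)
```

The sketch uses a class rather than a one-shot generator because training algorithms such as Gensim's `Word2Vec` iterate over their input more than once; an object like this could then be passed as `gensim.models.Word2Vec(sentences=corpus)`.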
Platform independent
Gensim runs on Linux, Windows and OS X, as well as any other platform that supports Python and NumPy.
Proven
With thousands of companies using Gensim every day, over 2600 academic citations and 1M downloads per week, Gensim is one of the most mature ML libraries.
Open source
All Gensim source code is hosted on GitHub under the GNU LGPL license and maintained by its open source community. For commercial arrangements, see Business Support.
Ready-to-use models and corpora
The Gensim community also publishes pretrained models for specific domains like legal or health, via the Gensim-data project.
Installation
Quick install
Run in your terminal (recommended):
pip install --upgrade gensim
Or, alternatively, for conda environments:
conda install -c conda-forge gensim
That's it! Congratulations, you can proceed to the tutorials.
Code dependencies
Gensim runs on Linux, Windows and Mac OS X, and should run on any other platform that supports Python 3.8+ and NumPy. Gensim depends on the following software:
- Python, tested with versions 3.8, 3.9, 3.10 and 3.11.
- NumPy for number crunching.
- smart_open for transparently opening files on remote storages or compressed files.
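To illustrate what "transparently opening compressed files" means, here is a minimal stdlib-only sketch that picks a decompressor from the file extension. The `open_transparently` helper is hypothetical and is not how smart_open is implemented; smart_open also covers remote storage (S3, HTTP and others) and more formats.

```python
import bz2
import gzip
import os
import tempfile
from typing import IO

# Map file extensions to the stdlib opener that understands them;
# anything else falls back to the built-in open().
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_transparently(path: str, mode: str = "rt") -> IO[str]:
    """Hypothetical helper: open `path` in text mode, (de)compressing by extension."""
    if mode not in ("rt", "wt"):
        raise ValueError("this sketch only supports the text modes 'rt' and 'wt'")
    opener = _OPENERS.get(os.path.splitext(path)[1], open)
    return opener(path, mode, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        for name in ("docs.txt", "docs.txt.gz", "docs.txt.bz2"):
            path = os.path.join(tmpdir, name)
            with open_transparently(path, "wt") as out:
                out.write("graph of trees\n")
            # The calling code reads every file the same way, compressed or not.
            with open_transparently(path) as inp:
                print(name, inp.read().strip())
```

With smart_open itself, the documented pattern is `from smart_open import open`, which acts as a drop-in replacement for the built-in `open`.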
Testing Gensim
To install and test Gensim locally from a source checkout:
pip install -e . # compile and install Gensim from the current directory, in editable mode
pytest gensim # run the tests
Who is using Gensim?
Doing something interesting with Gensim? Sponsor Gensim and ask to be featured among adopters.