gensim

Gensim is a FREE Python library
  • Scalable statistical semantics
  • Analyze plain-text documents for semantic structure
  • Retrieve semantically similar documents

Installation

Quick install

Run in your terminal (recommended):

pip install --upgrade gensim
            

or, alternatively for conda environments:

conda install -c conda-forge gensim
            

That's it! Congratulations, you can proceed to the tutorials.
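To confirm that the install worked, a quick check from Python can help. The snippet below is a minimal sketch, assuming Python 3; the helper name `gensim_installed` is made up for illustration and is not part of Gensim.

```python
import importlib.util


def gensim_installed() -> bool:
    """Return True if the gensim package can be found by this interpreter."""
    return importlib.util.find_spec("gensim") is not None


if gensim_installed():
    import gensim

    # Every released Gensim package exposes its version string here.
    print("gensim version:", gensim.__version__)
else:
    print("gensim is not installed in this environment")
```

If the second message is printed, the package probably went into a different Python environment than the one running the check.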

If that failed, make sure you're installing into a writable location (a directory your user account has permission to modify).
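One way to see whether the install location is writable is to ask Python where pure-Python packages go and test that directory. This is only a sketch: the helper name `install_location_writable` is hypothetical, and it inspects the interpreter that runs the script, which may differ from the one your `pip` command uses.

```python
import os
import sysconfig


def install_location_writable() -> bool:
    """Return True if this interpreter's site-packages directory is writable."""
    target = sysconfig.get_paths()["purelib"]
    # Walk up to the nearest existing directory, in case site-packages
    # has not been created yet.
    while not os.path.isdir(target):
        parent = os.path.dirname(target)
        if parent == target:
            return False
        target = parent
    return os.access(target, os.W_OK)


print(sysconfig.get_paths()["purelib"], "writable:", install_location_writable())
```

If this reports `False`, installing into a virtual environment or running pip with its `--user` option usually avoids the permission problem.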


Code dependencies

Gensim runs on Linux, Windows and Mac OS X, and should run on any other platform that supports Python 2.7 or 3.5+ and NumPy. Gensim depends on the following software:

  • Python, tested with versions 2.7, 3.5, 3.6 and 3.7.
  • NumPy for number crunching.
  • smart_open for transparently opening files on remote storages or compressed files.
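The requirements above can be checked with a short script. This is a rough sketch, assuming it runs under Python 3; the names `python_version_supported`, `missing_modules` and `REQUIRED_MODULES` are invented for this example, and the version rule simply mirrors the "2.7 or 3.5+" statement above.

```python
import importlib.util
import sys

# Import names of the dependencies listed above (assumed to match the package names).
REQUIRED_MODULES = ("numpy", "smart_open")


def python_version_supported(version=sys.version_info) -> bool:
    """Return True for Python 2.7 or for Python 3.5 and newer."""
    major_minor = (version[0], version[1])
    return major_minor == (2, 7) or major_minor >= (3, 5)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version supported:", python_version_supported())
print("Missing dependencies:", missing_modules() or "none")
```

pip normally installs NumPy and smart_open together with Gensim, so a non-empty list here usually points to a broken or partial install.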

Testing Gensim

Gensim uses continuous integration, automatically running a full test suite on each pull request with the following services:

CI service   Task
Travis       Run tests on Linux and check code style
AppVeyor     Run tests on Windows
CircleCI     Build documentation

Problems?

Use the Gensim discussion group for questions and troubleshooting. See the support page for commercial support.

Who is using Gensim?
Doing something interesting with Gensim? Ask to be featured here.

  • “Here at Tailwind, we use Gensim to help our customers post interesting and relevant content to Pinterest. No fuss, no muss. Just fast, scalable language processing.” Waylon Flinn, Tailwind
  • “We are using Gensim every day. Over 15 thousand times per day to be precise. Gensim’s LDA module lies at the very core of the analysis we perform on each uploaded publication to figure out what it’s all about. It simply works.” Andrius Butkus, Issuu
  • “Gensim hits the sweetest spot of being a simple yet powerful way to access some incredibly complex NLP goodness.” Alan J. Salmoni, Roistr.com
  • “I used Gensim at Ghent university. I found it easy to build prototypes with various models, extend it with additional features and gain empirical insights quickly. It's a reliable library that can be used beyond prototyping too.” Dieter Plaetinck, IBCN group
  • “We used Gensim in several text mining projects at Sports Authority. The data were from free-form text fields in customer surveys, as well as social media sources. Having Gensim significantly sped our time to development, and it is still my go-to package for topic modeling with large retail data sets.” Josh Hemann, Sports Authority
  • “Semantic analysis is a hot topic in online marketing, but there are few products on the market that are truly powerful. Gensim is undoubtedly one of the best frameworks that efficiently implement algorithms for statistical analysis. Few products, even commercial, have this level of quality.” Bruno Champion, DynAdmic
  • “Based on our experience with Gensim on DML-CZ, we naturally opted to use it on a much bigger scale for similarity of fulltexts of scientific papers in the European Digital Mathematics Library. In evaluation with other approaches, Gensim became a clear winner, especially because of speed, scalability and ease of use.” Petr Sojka, EuDML
  • “We have been using Gensim in several DTU courses related to digital media engineering and find it immensely useful as the tutorial material provides students an excellent introduction to quickly understand the underlying principles in topic modeling based on both LSA and LDA.” Michael Kai Petersen, Technical University of Denmark