Documentation

Core Tutorials: New Users Start Here!

If you’re new to gensim, we recommend going through all core tutorials in order. Understanding this functionality is vital for using gensim effectively.

Core Concepts

Corpora and Vector Spaces

Topics and Transformations

Similarity Queries

Tutorials: Learning Oriented Lessons

Learning-oriented lessons that introduce a particular gensim feature, e.g. a model (Word2Vec, FastText) or technique (similarity queries or text summarization).

Word2Vec Model

Doc2Vec Model

Ensemble LDA

FastText Model

Fast Similarity Queries with Annoy and Word2Vec

LDA Model

Word Mover's Distance

Soft Cosine Measure

How-to Guides: Solve a Problem

These goal-oriented guides demonstrate how to solve a specific problem using gensim.

How to download pre-trained models and corpora

How to Author Gensim Documentation

How to reproduce the doc2vec 'Paragraph Vector' paper

How to Compare LDA Models

Other Resources

Blog posts, tutorial videos, hackathons and other useful Gensim resources from around the internet.

  • Use FastText or Word2Vec? Comparison of embedding quality and performance. Jupyter notebook

  • Multiword phrases extracted from How I Met Your Mother. Blog post by Mark Needham

  • Using Gensim LDA for hierarchical document clustering. Jupyter notebook by Brandon Rose

  • Evolution of Voldemort topic through the 7 Harry Potter books. Blog post

  • Movie plots by genre: document classification using various techniques (TF-IDF, word2vec averaging, Deep IR, Word Mover's Distance and doc2vec). GitHub repo

  • Word2vec: Faster than Google? Optimization lessons in Python, talk by Radim Řehůřek at PyData Berlin 2014. YouTube video

  • Word2vec & friends, talk by Radim Řehůřek at MLMU.cz 7.1.2015. YouTube video
