Parallelizing word2vec in Python

The final installment on optimizing word2vec in Python: how to make use of multicore machines.

You may want to read Part One and Part Two first.


The original C toolkit allows setting a “-threads N” parameter, which effectively splits the training corpus into N parts, each to be processed by a separate thread in parallel. The result is a nice speed-up: 1.9x for N=2 threads, 3.2x for N=4.
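The split-and-thread scheme can be sketched in a few lines of stdlib Python. This is only an illustration of how the corpus gets partitioned among workers — `train_chunk` below is a hypothetical stand-in (it just counts words), not the actual word2vec update; in gensim the equivalent knob is the `workers` parameter of the `Word2Vec` class.

```python
import threading

def train_chunk(chunk, results, idx):
    # Stand-in for the per-thread training step: here we merely
    # count tokens in the chunk, writing into a preallocated slot
    # so no locking is needed.
    results[idx] = sum(len(sentence.split()) for sentence in chunk)

def parallel_train(corpus, n_threads):
    # Split the corpus into up to n_threads contiguous parts,
    # one per worker thread (mirroring the C toolkit's -threads N).
    chunk_size = (len(corpus) + n_threads - 1) // n_threads
    chunks = [corpus[i:i + chunk_size]
              for i in range(0, len(corpus), chunk_size)]
    results = [0] * len(chunks)
    threads = [threading.Thread(target=train_chunk, args=(c, results, i))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Note that for a CPU-bound pure-Python workload like this stand-in, the GIL would prevent real parallelism; the actual speed-ups come from the hot loops running in native code (C, or Cython in gensim's case) with the GIL released.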

Read more on Parallelizing word2vec in Python…

Deep learning with word2vec and gensim

Neural networks have been a bit of a punching bag historically: neither particularly fast, robust, nor accurate, nor open to introspection by humans curious to gain insights from them. But things have been changing lately, with deep learning becoming a hot topic in academia and delivering spectacular results. I decided to check out one deep learning algorithm via gensim.

Read more on Deep learning with word2vec and gensim…