Radim Řehůřek : Articles: Page 2

Performance Shootout of Nearest Neighbours: Contestants

posted on December 8, 2013 by Radim | 15 Comments

Continuing the benchmark of libraries for nearest-neighbour similarity search, part 2. What is the best software out there for similarity search in high dimensional vector spaces?

Document Similarity @ English Wikipedia

Performance Shootout of Nearest Neighbours: Intro

posted on November 30, 2013 by Radim | 2 Comments

Violent as the title sounds, I’ll actually be benchmarking software packages that implement nearest-neighbour search in high dimensional vector spaces. Which approach is the fastest, the easiest to use, the best? No neighbours were harmed in writing this post.

Money, startups, fame, bullshit

posted on November 15, 2013 by Radim | 2 Comments

I write these lines as I recover my voice from the Pioneers Festival 2013 in Vienna, a major event in the world of IT startups, investors and the media. I’m not used to talking so much, for so long :-)

Five Years of Gensim

posted on October 28, 2013 by Radim | 7 Comments

Gensim, the machine learning library for unsupervised learning I started in late 2008, will be celebrating its fifth anniversary this November. Time to reminisce and mull over its successes and failures :)

Technology vs. politics, round 1

posted on October 25, 2013 by Radim | 2 Comments

I intend to keep this blog mostly technical. But since it’s the eve of parliamentary elections here in the Czech Republic, I feel a small politico-technical rant is in order.

Parallelizing word2vec in Python

posted on October 4, 2013 by Radim | 13 Comments

The final installment on optimizing word2vec in Python: how to make use of multicore machines.

You may want to read Part One and Part Two first.

Word2vec in Python, Part Two: Optimizing

posted on September 21, 2013 by Radim | 44 Comments

Last weekend, I ported Google’s word2vec into Python. The result was clean, concise and readable code that plays well with other Python NLP packages. One problem remained: it ran 20x slower than the original C code, even after all the obvious NumPy optimizations.

Deep learning with word2vec and gensim

posted on September 17, 2013 by Radim | 26 Comments

Neural networks have been a bit of a punching bag historically: neither particularly fast, nor robust or accurate, nor open to introspection by humans curious to gain insights from them. But things have been changing lately: deep learning has become a hot topic in academia, with spectacular results. I decided to check out one deep learning algorithm via gensim.

Site under construction

posted on August 30, 2013 by Radim | No comments

EDIT: went live on 8th September, comments and suggestions welcome :-)