For our vacation this year, we picked Vietnam: Christmas in the touristy region of Mui Ne, then New Year in Saigon (aka Ho Chi Minh City). This post is a short travel log with pictures, highlights, travel costs and tips from our stay there. My hope is this may help someone planning a similar trip.
One year ago, Tomáš Mikolov (together with his colleagues at Google) made some ripples by releasing word2vec, an unsupervised algorithm for learning the meaning behind words. In this blog post, I’ll evaluate some extensions that have appeared over the year.
The latest gensim release of 0.10.3 has a new class named Doc2Vec. All credit for this class, which is an implementation of Quoc Le & Tomáš Mikolov: “Distributed Representations of Sentences and Documents”, as well as for this tutorial, goes to the illustrious Tim Emerick.
Latent Dirichlet Allocation (LDA), one of the most used modules in gensim, has received a major performance revamp recently. Using all your machine cores at once now, chances are the new LdaMulticore class is limited by the speed you can feed it input data. Make sure your CPU fans are in working order!
Now that I have a blog, I figured I could start posting more info about our travels. So here’s a little digest from one of our recent trips. I’m hoping it will be useful to other tourists looking to visit Gran Canaria, especially in the same season we went (February).
There are tools and concepts in computing that are very powerful but potentially confusing to novices. One such concept is data streaming (aka lazy evaluation), which can be realized neatly and natively in Python. Do you know when and how to use generators, iterators and iterables?
Previous posts explained the whys & whats of nearest-neighbour search, the available OSS libraries and Python wrappers. We converted the English Wikipedia to vector space, to be used as our testing dataset for retrieving “similar articles”. In this post, I finally get to some hard performance numbers.
The end of the year is proving crazy busy as usual, but gensim acquired a cool new feature that I just had to blog about. Ben Trahan sent a patch that allows automatic tuning of Latent Dirichlet Allocation (LDA) hyperparameters in gensim. This means that an optimal, asymmetric alpha can now be trained directly from your data.