The European Digital Mathematics Library (EuDML) makes European mathematics literature available online in the form of an enduring digital collection.
Who, where, when
EuDML is an EU project, developed and maintained by a network of 15 institutions from Portugal, France, the UK, Germany, the Czech Republic, Poland, Spain, Bulgaria and Greece. Its goal is to collect, digitize (using OCR where necessary), archive and present all mathematical literature, from medieval texts to modern articles.
During 2010-2013, I collaborated with Petr Sojka and the MUNI partner (Masaryk University, Brno) on a module for text analysis and similarity search.
I participated in the project’s work package on metadata enhancement and similarity search. The result was a server that indexes mathematical texts (including formulae, where these are available from MathML or OCR) and retrieves similar records.
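The page does not describe the server's internals. To illustrate what "retrieving similar records" can mean, here is a minimal pure-Python sketch of TF-IDF weighting with cosine similarity. It is an assumption-laden illustration, not the EuDML implementation: the function names, the regex tokenizer and the smoothed IDF formula are all hypothetical choices, and formulae are treated as ordinary word tokens.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase word tokens only (no MathML handling).
    return re.findall(r"\w+", text.lower())


def _vectorize(tokens, idf):
    # Build a unit-length TF-IDF vector; terms unknown to the index are dropped.
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Return (idf, vectors), where vectors[i] is the TF-IDF vector of docs[i]."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Smoothed IDF (assumption): a term found in every document still gets weight 1.
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    return idf, [_vectorize(toks, idf) for toks in tokenized]


def cosine(a, b):
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def most_similar(query, idf, vectors, topn=3):
    """Return up to topn (doc_index, score) pairs, highest score first."""
    q = _vectorize(tokenize(query), idf)
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]
```

A call such as `most_similar("zeros of the zeta function", idf, vecs)` returns document indices paired with cosine scores. A real digital-library index would also need persistence, incremental updates and formula-aware tokenization, none of which this sketch attempts.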
Technologies, data sets, tools
The server was written in Python and communicated with a Java framework through a TCP/IP API. EuDML contains hundreds of thousands of records but sees only a moderate rate of access and updates, so the main challenges in this project were logistical and interpersonal (15 partner institutions!) rather than technical.
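The actual wire protocol between the Python server and the Java framework is not documented here. The following is a minimal sketch of how such a TCP/IP API could look on the Python side, using the standard `socketserver` module. The protocol is an assumption: one JSON request per line and one JSON reply per line. The `op`/`text`/`topn` fields and the backend's `similar(text, topn)` method are hypothetical. A Java client would simply open a socket and exchange newline-terminated JSON strings.

```python
import json
import socketserver


class SimilarityHandler(socketserver.StreamRequestHandler):
    """Hypothetical line protocol: one JSON request in, one JSON reply out."""

    def handle(self):
        # Iterating rfile yields one request line (bytes) at a time until the client disconnects.
        for raw in self.rfile:
            try:
                request = json.loads(raw)
                reply = {"ok": True, "result": self.server.dispatch(request)}
            except Exception as exc:
                # Report the error to the client instead of dropping the connection.
                reply = {"ok": False, "error": str(exc)}
            self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))
            self.wfile.flush()


class SimilarityServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, backend):
        super().__init__(address, SimilarityHandler)
        # Assumption: backend is any object exposing similar(text, topn).
        self.backend = backend

    def dispatch(self, request):
        # Route a decoded request to the backend; unknown operations raise ValueError.
        if request.get("op") == "similar":
            return self.backend.similar(request["text"], request.get("topn", 10))
        raise ValueError("unknown op: %r" % request.get("op"))
```

A threaded server keeps one slow client from blocking the others. With a moderate request rate, as the text notes, this simple design would not need anything heavier.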
Digital Libraries / Semantic Analysis