Radim Řehůřek : Naviga

Overview

Naviga is a SaaS (software as a service) discovery solution for academic institutions and libraries.

Who, where, when

I architected and, together with my team, implemented Naviga for Suweco.cz, a company that handles journal and consortial subscriptions in central Europe. The bulk of the design, API specification and system implementation was done in early 2012; my cooperation with Suweco on new features and products extended into 2013.

The core service performs fulltext search through tens of millions of articles, with complex faceting and access rights, in sub-second times.

What

Naviga contains several services, known in the “e-library world” as:

search and discovery, where academic articles, abstracts, reviews and other scholarly material are collected from publishers and indexed for search from a single point of access. Instead of checking dozens of publishing platforms individually, users search the entire academic universe through one consolidated service. Competing products: Serials Solutions’ Summon, EBSCO’s Discovery, or PRIMO by Ex Libris.
alphabet, a service that manages customers’ titles and holdings data. A competing product to EBSCO’s A-to-Z, Ex Libris’ A-Z journal list etc.
link resolver, a service for flexible linking to electronic resources, using OpenURL. A competing product to Ex Libris’ SFX, EBSCO’s LinkSource or Serials Solutions’ 360 Link.
semantica, a service for analyzing documents for subjects and themes, plus retrieving similar articles by topic. A custom, novel solution.
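
To illustrate what "retrieving similar articles by topic" means in practice, here is a minimal sketch. It is an assumption-laden toy, not semantica itself: the text notes that semantica was built using gensim, whereas this example swaps in plain TF-IDF weighting with cosine similarity, written against the Python standard library only. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic tokens (a deliberately naive tokenizer)."""
    return [word for word in text.lower().split() if word.isalpha()]


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: weight} vector with smoothed IDF."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    return dot / (norm_a * norm_b)


def most_similar(query_index, vectors):
    """Rank all other documents by similarity to the document at query_index."""
    scores = [
        (i, cosine(vectors[query_index], vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample abstracts, for illustration only.
docs = [
    "faceted search over scholarly journal articles",
    "distributed search architecture for journal articles",
    "protein folding in yeast cells",
]
ranked = most_similar(0, tfidf_vectors(docs))
```

Here `ranked` lists the second document ahead of the third, because only the second shares vocabulary with the query document. A topic model of the kind gensim provides would go further and match documents that share themes without sharing exact words.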

These were all full-blown products, with stringent requirements on user interface, privilege management of electronic resources, scalability and efficiency. The major challenge was creating a high-availability, faceted, distributed search architecture over tens of millions of scholarly records.
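
The interplay of faceting and privilege management can be sketched in a few lines. This is an in-memory toy under stated assumptions, not Naviga's architecture (which is not public): the `Record` fields, the idea of a `collection` as the licensing unit, and the `search` function are all hypothetical. The point it illustrates is that access rights are applied before facet counts are computed, so a user never sees counts for material they cannot open.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    publisher: str
    year: int
    collection: str  # hypothetical licensing unit a customer subscribes to


def search(records, query, entitled_collections, filters=None):
    """Return (hits, facets) for a query, restricted to entitled collections.

    filters maps a field name to the exact value it must have.
    """
    filters = filters or {}
    terms = query.lower().split()
    hits = [
        record for record in records
        if record.collection in entitled_collections          # access rights
        and all(term in record.title.lower() for term in terms)  # naive matching
        and all(getattr(record, field) == value for field, value in filters.items())
    ]
    # Facet counts are computed over the permitted, filtered hits only.
    facets = {
        field: Counter(getattr(record, field) for record in hits)
        for field in ("publisher", "year")
    }
    return hits, facets


# Hypothetical sample data.
records = [
    Record("Graph search methods", "Alpha Press", 2011, "open"),
    Record("Search in digital libraries", "Beta Books", 2012, "licensed-a"),
    Record("Faceted search at scale", "Alpha Press", 2012, "licensed-b"),
    Record("Medieval history", "Beta Books", 2012, "open"),
]
hits, facets = search(records, "search", {"open", "licensed-a"})
```

A production system would push both the entitlement filter and the facet aggregation into the search index rather than scanning records in Python; the sketch only shows the order of operations.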

Exclusive solutions included semantica (built using gensim) and ElasticFind, a core proprietary search platform powering Naviga that transparently combines central and federated search in a single, scalable platform.
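
ElasticFind is proprietary and its internals are not described here, so the following is only a generic sketch of what combining central and federated search can look like: one query is sent to a local (central) index and to several remote sources in parallel, unreachable sources are skipped, and the hits are deduplicated by identifier. Every name in it (`combined_search`, the `id` and `score` keys, the sample sources) is a hypothetical choice made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def combined_search(query, central, federated_sources):
    """Query the central index and all federated sources, then merge the hits.

    Each source is a callable taking the query and returning a list of
    dicts with "id" and "score" keys. A source that raises is skipped.
    """
    sources = [central, *federated_sources]
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source, query) for source in sources]
        batches = []
        for future in futures:
            try:
                batches.append(future.result())
            except Exception:
                batches.append([])  # one failing source must not fail the search

    # Deduplicate by identifier, keeping the best-scoring copy of each record.
    best = {}
    for batch in batches:
        for hit in batch:
            current = best.get(hit["id"])
            if current is None or hit["score"] > current["score"]:
                best[hit["id"]] = hit
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)


# Hypothetical sources, for illustration only.
def central_index(query):
    return [{"id": "doi:1", "score": 0.9}, {"id": "doi:2", "score": 0.4}]


def remote_ok(query):
    return [{"id": "doi:2", "score": 0.7}, {"id": "doi:3", "score": 0.5}]


def remote_down(query):
    raise ConnectionError("source unreachable")


merged = combined_search("open access", central_index, [remote_ok, remote_down])
```

The sketch compares scores from different sources directly, which is a simplification: relevance scores from independent engines are generally not on a common scale, and a real system needs some normalization or re-ranking step.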

Relevant technologies

The world of digital libraries comes with its own bag of protocols and acronyms (such as Z39.50, SRU/SRW, OpenURL, OPAC, CCL, MARC, DOI, OAI-PMH…), as well as terminology (such as “discovery” for search and “patron” for user).
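
As one concrete example from this list, an OpenURL is an ordinary HTTP URL whose query string describes the item being cited, leaving it to the library's link resolver to decide where the user should be sent. The sketch below builds an OpenURL 1.0 (Z39.88-2004) request in key/encoded-value form for a journal article. The resolver address and the article metadata are made-up placeholders, and `build_openurl` is a hypothetical helper, not part of Naviga's API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(base_url, *, atitle, jtitle, issn, volume, spage, date, doi=None):
    """Build an OpenURL 1.0 key/encoded-value link for a journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", atitle),   # article title
        ("rft.jtitle", jtitle),   # journal title
        ("rft.issn", issn),
        ("rft.volume", volume),
        ("rft.spage", spage),     # start page
        ("rft.date", date),
    ]
    if doi:
        params.append(("rft_id", f"info:doi/{doi}"))
    return f"{base_url}?{urlencode(params)}"


# Placeholder resolver address and fictional article metadata.
link = build_openurl(
    "https://resolver.example.org/openurl",
    atitle="Faceted search in digital libraries",
    jtitle="Journal of Example Studies",
    issn="1234-5679",
    volume="12",
    spage="45",
    date="2012",
    doi="10.1234/example",
)
decoded = parse_qs(urlsplit(link).query)
```

Because the link carries a description of the article rather than a fixed target address, the same citation can resolve to different full-text providers depending on each institution's holdings.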

Tools, data, architecture

Naviga uses a combination of technologies for its different services and subsystems: Linux, Apache, Selenium, Java, Node.js, Python, C, Jenkins, git…

The detailed architecture and data sources for Naviga are not public.

Project details

Project URL: http://naviga.cz/