`similarities.levenshtein` – Fast soft-cosine semantic similarity search

This module allows fast fuzzy search between strings, using kNN queries with Levenshtein similarity.

class gensim.similarities.levenshtein.LevenshteinSimilarityIndex(dictionary, alpha=1.8, beta=5.0, max_distance=2)

Retrieve the most similar terms from a static set of terms (“dictionary”) given a query term, using Levenshtein similarity.

“Levenshtein similarity” is a modification of the Levenshtein (edit) distance, defined in [charletetal17].
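To make the roles of alpha and beta concrete, here is a minimal pure-Python sketch. It assumes the similarity takes the form alpha * (1 - distance / max(len(t1), len(t2))) ** beta, which is how the definition in [charletetal17] is commonly stated. The helper names `levenshtein_distance` and `levenshtein_similarity` are illustrative and not part of the gensim API.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(t1: str, t2: str, alpha: float = 1.8, beta: float = 5.0) -> float:
    """Assumed form: alpha * (1 - distance / max_length) ** beta."""
    max_length = max(len(t1), len(t2)) or 1  # avoid division by zero for two empty strings
    distance = levenshtein_distance(t1, t2)
    return alpha * (1.0 - distance / max_length) ** beta


if __name__ == "__main__":
    for pair in [("holiday", "holiday"), ("holiday", "holyday"), ("holiday", "monday")]:
        print(pair, round(levenshtein_similarity(*pair), 4))
```

Under this form, identical terms score exactly alpha, and the exponent beta makes the score fall off steeply as the normalized distance grows.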

This implementation uses the FastSS algorithm for fast k-nearest-neighbor (kNN) retrieval.
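The core idea behind FastSS-style retrieval can be sketched in a few lines: index every term under all strings obtained by deleting up to `max_distance` characters, look up the query's deletion variants in that index, and then verify the candidates with a true edit-distance computation. The following is a simplified illustration of that idea, not gensim's implementation; the function names `deletion_variants`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def deletion_variants(word: str, max_distance: int) -> set:
    """All strings obtained by deleting up to max_distance characters from word."""
    variants = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), k):
            skip = set(positions)
            variants.add("".join(ch for i, ch in enumerate(word) if i not in skip))
    return variants


def build_index(terms, max_distance: int) -> dict:
    """Map each deletion variant to the set of terms that produce it."""
    index = defaultdict(set)
    for term in terms:
        for variant in deletion_variants(term, max_distance):
            index[variant].add(term)
    return dict(index)


def edit_distance(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance, used to verify candidates."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def search(index: dict, query: str, max_distance: int) -> dict:
    """Return {term: distance} for indexed terms within max_distance of query."""
    candidates = set()
    for variant in deletion_variants(query, max_distance):
        candidates |= index.get(variant, set())
    # Sharing a deletion variant is necessary but not sufficient, so verify.
    distances = {term: edit_distance(query, term) for term in candidates}
    return {term: d for term, d in distances.items() if d <= max_distance}


if __name__ == "__main__":
    index = build_index(["holiday", "holidays", "hollow", "monday"], max_distance=2)
    print(search(index, "holyday", max_distance=2))
```

The number of deletion variants per term grows combinatorially with `max_distance`, which is why a small cutoff keeps both index size and query time manageable.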

Parameters

dictionary (Dictionary) – A dictionary that specifies the considered terms.
alpha (float, optional) – Multiplicative factor alpha for the Levenshtein similarity. See [charletetal17].
beta (float, optional) – The exponential factor beta for the Levenshtein similarity. See [charletetal17].
max_distance (int, optional) – Do not consider terms with Levenshtein distance larger than this as “similar”. This is done for performance reasons: keep this value below 3 for reasonable retrieval performance. Default is 2.
