When searching multilingual text on the Internet or in databases, it helps to know which language is used in a given part of the text (e.g. a sentence, paragraph or article). Automatic language recognition is important for further text processing such as indexing, lemmatizing, tagging and searching. For entities working in multiple languages, language detection is the first step when searching, processing or further analyzing large volumes of text or audio data.
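To make the idea of per-segment language detection concrete, here is a minimal sketch in Python. It is not the product's algorithm: it simply counts hits against tiny, hand-picked stopword lists, and the names `STOPWORDS`, `detect_language` and `detect_per_sentence` are hypothetical. A production detector would rely on much richer language models.

```python
import re

# Tiny illustrative stopword lists (an assumption for this sketch,
# not data taken from the product).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "in", "it"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein"},
    "cs": {"je", "se", "na", "že", "ale", "nebo"},
}


def detect_language(text):
    """Return the language code whose stopwords occur most often, or None."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No stopword matched at all: report the language as unknown.
    return best if scores[best] > 0 else None


def detect_per_sentence(text):
    """Split text into sentences and label each sentence separately."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(sentence, detect_language(sentence)) for sentence in sentences]


if __name__ == "__main__":
    mixed = "The dog is in the garden. Das ist nicht der Hund."
    for sentence, lang in detect_per_sentence(mixed):
        print(lang, "|", sentence)
```

Labeling each sentence separately, rather than the whole document, is what lets later steps such as lemmatizing pick the right language for each part of a mixed text.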
The Lemmatizer is an integral part of any fully fledged full-text search function for an inflectional language. It can identify the base form (or lemma) of any given word form and generate all correct word forms for a given lemma. In Slavic languages there can be dozens of different word forms for one lemma; in Finno-Ugric languages, even hundreds. Depending on the language, we also handle other regular word-formation phenomena such as deverbatives, compounding and the formation of various numerals. Our solution covers over 30 languages in a unified manner, which means you only need to integrate a single library into your application, reducing its development, testing and maintenance costs.
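The two directions described above (word form to lemma, and lemma to all word forms) can be sketched with a toy lexicon. This is only an illustration under stated assumptions: the `LEXICON` data and the functions `lemmatize`, `generate_forms` and `expand_query` are hypothetical and do not reflect the actual library API, which is based on a formal morphological description rather than a hand-written table.

```python
from collections import defaultdict

# Hypothetical mini-lexicon mapping each lemma to its word forms.
LEXICON = {
    # Czech noun "žena" (woman): all distinct singular and plural case forms.
    "žena": ["žena", "ženy", "ženě", "ženu", "ženo", "ženou",
             "žen", "ženám", "ženách", "ženami"],
    # English verb "go".
    "go": ["go", "goes", "went", "gone", "going"],
}

# Reverse index: word form -> set of possible lemmas.
FORM_TO_LEMMAS = defaultdict(set)
for lemma, forms in LEXICON.items():
    for form in forms:
        FORM_TO_LEMMAS[form].add(lemma)


def lemmatize(word_form):
    """Return all lemmas the given word form may belong to."""
    return sorted(FORM_TO_LEMMAS.get(word_form.lower(), set()))


def generate_forms(lemma):
    """Return all known word forms of a lemma."""
    return list(LEXICON.get(lemma.lower(), []))


def expand_query(word_form):
    """Expand a query word to every form of every lemma it can belong to."""
    expanded = set()
    for lemma in lemmatize(word_form):
        expanded.update(generate_forms(lemma))
    return expanded


if __name__ == "__main__":
    print(lemmatize("went"))
    print(sorted(expand_query("ženou")))
```

In a full-text search, `expand_query` corresponds to the typical use case: the user types one form and the index is searched for all forms of the same lemma.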
The entire solution is based on a detailed formal description of each language's morphology, which enables us to assign all relevant grammatical categories to any given word. These categories include, for example:
- grammatical case, number or gender for nouns
- person, number, mood, tense and aspect for verbs
- categories of pronouns, adverbs, numerals or conjunctions
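One way to picture the result of such an analysis is a record holding the lemma, the part of speech and a set of grammatical categories. The sketch below is an assumption made for illustration: the `Analysis` class, the `ANALYSES` table and the category names are hypothetical, not the library's real data model.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """One possible morphological reading of a word form (hypothetical model)."""
    lemma: str
    part_of_speech: str
    categories: dict = field(default_factory=dict)


# Hand-written example entries; a real system derives these from its
# formal description of the language's morphology.
ANALYSES = {
    "ženou": [
        Analysis("žena", "noun",
                 {"case": "instrumental", "number": "singular", "gender": "feminine"}),
    ],
    "went": [
        Analysis("go", "verb", {"tense": "past"}),
    ],
}


def analyze(word_form):
    """Return every known analysis of a word form (empty list if unknown)."""
    return ANALYSES.get(word_form.lower(), [])


if __name__ == "__main__":
    for reading in analyze("Ženou"):
        print(reading.lemma, reading.part_of_speech, reading.categories)
```

A list of readings is returned because a single word form can be ambiguous between several lemmas or several sets of categories.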
A single universal library handles the specifics of most European languages, whether Germanic, Romance, Slavic, Baltic, Finno-Ugric or Greek. We support high-quality solutions for over 30 languages.
The Thesaurus is a useful component that lets the user extend the scope of a search to other words with the same meaning. When searching, it can also be used to match loanwords with domestic words of the same meaning (if such words exist). In combination with the Lemmatizer, it can look up synonyms for a word regardless of the morphological form in which it is entered.
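The combination of the Thesaurus and the Lemmatizer can be sketched as a two-step lookup: reduce the entered form to its lemma, then find the synonym group containing that lemma. The data and the `synonyms` function below are hypothetical and serve only as an illustration.

```python
# Hypothetical lemmatizer table: word form -> lemma.
FORM_TO_LEMMA = {
    "car": "car", "cars": "car",
    "automobile": "automobile", "automobiles": "automobile",
    "auto": "auto", "autos": "auto",
}

# Hypothetical thesaurus: each set groups lemmas with the same meaning.
SYNONYM_GROUPS = [
    {"car", "automobile", "auto"},
]


def synonyms(word_form):
    """Return synonyms of a word, whatever form it was entered in."""
    key = word_form.lower()
    # Step 1: reduce the entered form to its lemma (fall back to the form itself).
    lemma = FORM_TO_LEMMA.get(key, key)
    # Step 2: look the lemma up in the synonym groups.
    for group in SYNONYM_GROUPS:
        if lemma in group:
            return sorted(group - {lemma})
    return []


if __name__ == "__main__":
    print(synonyms("Cars"))
```

Because the lookup goes through the lemma, the plural query "Cars" finds the same synonym group as "car" would.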
The Term translator searches analyzed text in the source language for terms (including multi-word terms) and maps them to their translations in the target language (which may again be multi-word terms). This is useful mainly in query analysis for cross-lingual search. For example, you can search large amounts of English text by entering queries in your native language.
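As a rough illustration of multi-word term translation in query analysis, the sketch below scans a Czech query and replaces the longest matching term at each position with its English counterpart. The `TERM_DICTIONARY` contents and the `translate_terms` function are hypothetical, and greedy longest-match is a technique chosen for this sketch, not necessarily the one the product uses. A real system would also lemmatize the query first so that inflected terms still match.

```python
# Hypothetical bilingual term dictionary (Czech -> English).
# Keys and values are tuples of tokens so that multi-word terms are supported.
TERM_DICTIONARY = {
    ("umělá", "inteligence"): ("artificial", "intelligence"),
    ("strojové", "učení"): ("machine", "learning"),
    ("překlad",): ("translation",),
}

# Longest source term, in tokens; bounds the search window.
MAX_TERM_LENGTH = max(len(term) for term in TERM_DICTIONARY)


def translate_terms(query):
    """Return target-language terms found in the query (greedy longest match)."""
    tokens = query.lower().split()
    translated = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(MAX_TERM_LENGTH, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + length])
            if candidate in TERM_DICTIONARY:
                translated.append(" ".join(TERM_DICTIONARY[candidate]))
                i += length
                break
        else:
            # No term starts at this token; skip it.
            i += 1
    return translated


if __name__ == "__main__":
    print(translate_terms("Strojové učení a umělá inteligence"))
```

The translated terms can then be used as the query against an English-language index.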