Full Text Search
When building a software system with full text search, you will want a tool that can find words in whatever form they appear. When searching for kurzy akcií, for example, you would expect the program to also find an article containing vývoj kurzu akcií or just akcie. Our component serves this purpose by providing:
- the principal form of a word, derived from any of its inflected forms
- all inflected forms of a given principal form
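The two operations above can be illustrated with a minimal sketch. It assumes a tiny hand-written lexicon; the names FORM_TO_LEMMA, principal_form, all_forms and matches are hypothetical and are not part of the actual component, which works from a full morphological description rather than a lookup table.

```python
from collections import defaultdict

# Hypothetical toy lexicon: inflected form -> principal form.
# Only forms mentioned in the text are listed; a real dictionary covers millions.
FORM_TO_LEMMA = {
    "kurz": "kurz", "kurzu": "kurz", "kurzy": "kurz",
    "akcie": "akcie", "akcií": "akcie",
}

# Reverse mapping: principal form -> set of its known inflected forms.
LEMMA_TO_FORMS = defaultdict(set)
for form, lemma in FORM_TO_LEMMA.items():
    LEMMA_TO_FORMS[lemma].add(form)


def principal_form(word):
    """Return the principal form of a word, or the word itself if unknown."""
    word = word.lower()
    return FORM_TO_LEMMA.get(word, word)


def all_forms(lemma):
    """Return all known forms of a principal form."""
    return set(LEMMA_TO_FORMS.get(lemma, {lemma}))


def matches(query, document):
    """True if any query word shares a principal form with a document word."""
    query_lemmas = {principal_form(w) for w in query.split()}
    doc_lemmas = {principal_form(w) for w in document.split()}
    return bool(query_lemmas & doc_lemmas)


print(matches("kurzy akcií", "vývoj kurzu akcií"))
print(matches("kurzy akcií", "akcie"))
```

Comparing principal forms instead of surface strings is what lets the query kurzy akcií match both example documents, even though neither contains the query words verbatim.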
Linguistic part
As with spellcheckers, this solution is based on a formal description of morphology, enriched with additional information. This mainly covers all grammatical categories:
- case, number and gender of nouns
- person, number, mood, tense and aspect of verbs
- categories of pronouns, numerals, adverbs or conjunctions
Finding the principal form of a word is not as simple as it seems. Besides regular forms, the various root alternations that are very common in Czech must be handled. Examples include pairs like mráz-mrazu, stůl-stolu, Bůh-Bohu, brontosaurus-brontosauři, pelyněk-pelyňku, péct-peče, stonat-stůně, or even hnát-ženu, Zeus-Dia and čest-cti, where even the first letters differ. Similar situations can be found in virtually every language. In English, common examples include pairs like come-came, break-broken or go-went.
Then there is the issue of homonymy. Many forms have an ambiguous origin: ženu, for example, can be derived either from žena (accusative singular) or from hnát (1st person singular). Moreover, hnát can be either a verb referring to movement or a noun denoting a limb. There are numerous examples like this, so do not be surprised to get more than one search result.
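The ambiguity described above means a lookup has to return a list of analyses rather than a single answer. The following is a minimal sketch under that assumption; the Analysis record, the ANALYSES table and both functions are hypothetical illustrations, and the table holds only the examples from the text.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str  # principal form
    pos: str    # part of speech
    tags: str   # grammatical categories of this reading


# Hypothetical hand-written data covering only the examples in the text.
ANALYSES = {
    "ženu": [
        Analysis("žena", "noun", "accusative singular"),
        Analysis("hnát", "verb", "1st person singular"),
    ],
    "hnát": [
        Analysis("hnát", "verb", "infinitive"),
        Analysis("hnát", "noun", "nominative singular"),
    ],
}


def analyse(form):
    """Return every known analysis of a word form (possibly none)."""
    return ANALYSES.get(form, [])


def candidate_lemmas(form):
    """Return the distinct (principal form, part of speech) pairs for a form."""
    return sorted({(a.lemma, a.pos) for a in analyse(form)})


for reading in analyse("ženu"):
    print(reading.lemma, reading.pos, reading.tags)
```

A search layer built on such a function can either search for all candidate principal forms or use the part-of-speech information to narrow them down.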
Software solution
Our software solution is very economical. Did you know that Czech has more than 6 700 000 word forms? And yet this huge number, including all morphological information, takes up only 1 MB. At roughly 8 million bits for 6.7 million forms, that is little more than one bit per Czech word form.
Available functions
- Return the principal form of a word.
- Return all morphologically related forms of a word.
- Decline a whole noun group consisting of a noun and an adjective, e.g. akciová společnost, akciové společnosti, ..., akciovou společností, akciové společnosti etc.
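Declining a noun group means keeping the adjective and the noun in agreement in every case. The sketch below shows the idea for the example above, assuming small hand-written tables of singular forms (vocative omitted); ADJECTIVE_FORMS, NOUN_FORMS and decline_group are hypothetical names, and the real component derives these forms from its morphological description rather than from literal tables.

```python
# Singular cases covered by this toy example.
CASES = ["nominative", "genitive", "dative", "accusative", "locative", "instrumental"]

# Hypothetical tables: principal form -> case -> inflected form (singular).
ADJECTIVE_FORMS = {
    "akciová": {
        "nominative": "akciová", "genitive": "akciové", "dative": "akciové",
        "accusative": "akciovou", "locative": "akciové", "instrumental": "akciovou",
    },
}
NOUN_FORMS = {
    "společnost": {
        "nominative": "společnost", "genitive": "společnosti", "dative": "společnosti",
        "accusative": "společnost", "locative": "společnosti", "instrumental": "společností",
    },
}


def decline_group(adjective, noun):
    """Pair the adjective and noun forms case by case so that they agree."""
    adjective_forms = ADJECTIVE_FORMS[adjective]
    noun_forms = NOUN_FORMS[noun]
    return {case: f"{adjective_forms[case]} {noun_forms[case]}" for case in CASES}


for case, phrase in decline_group("akciová", "společnost").items():
    print(f"{case}: {phrase}")
```

Indexing both tables by the same case label is what guarantees agreement: the phrase for a given case is always built from the adjective and noun forms of that same case.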
We currently support full text search for a wide range of languages (see table) and platforms (see overview). To make your search even more user-friendly, we recommend combining it with our Thesaurus - Dictionary of synonyms.
For multilingual search, it can be combined with our translation technologies; for audio or video search, with our speech technologies.
References
Try out the full capabilities of morphological search for various languages in our Lingea Lexicon applications. They are primarily used in the search engines of various products or corporate systems.