Full Text Search

When building a software system with full-text search, you will probably want a tool that can find words in whatever form they appear. For example, when searching for kurzy akcií (share prices), you will surely appreciate that the program also finds an article containing vývoj kurzu akcií (share price development) or just akcie (shares). Our component is designed for exactly this purpose, providing you with:

 
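The basic idea of morphological search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our component's actual API: the lexicon `FORM_TO_LEMMAS` and the functions `lemmatize` and `search` are hypothetical names, and the lexicon holds only the handful of Czech forms from the example above. Both the query and each document are reduced to lemmas (principal forms), so different forms of the same word match.

```python
# Hypothetical miniature lexicon: each word form maps to the set of
# lemmas (principal forms) it may belong to. A real morphological
# lexicon covers millions of forms.
FORM_TO_LEMMAS: dict[str, set[str]] = {
    "kurzy": {"kurz"},
    "kurzu": {"kurz"},
    "kurz": {"kurz"},
    "akcií": {"akcie"},
    "akcie": {"akcie"},
    "vývoj": {"vývoj"},
}


def lemmatize(text: str) -> set[str]:
    """Map every word in the text to all of its possible lemmas."""
    lemmas: set[str] = set()
    for word in text.lower().split():
        # Words missing from the lexicon fall back to their surface form.
        lemmas |= FORM_TO_LEMMAS.get(word, {word})
    return lemmas


def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents that share at least one lemma with the query."""
    query_lemmas = lemmatize(query)
    return [doc for doc in documents if query_lemmas & lemmatize(doc)]


docs = ["vývoj kurzu akcií", "akcie", "počasí"]
print(search("kurzy akcií", docs))  # ['vývoj kurzu akcií', 'akcie']
```

In this sketch a document matches if it shares any lemma with the query (an OR query); requiring all query lemmas would only mean replacing the intersection test with a subset test.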

Linguistic part

As with our spellcheckers, this solution is based on a formal description of morphology, enriched with additional information. This mainly includes all grammatical categories:

 

Finding the principal form (lemma) of a word is not as simple as it seems. Besides regular forms, the various root alternations that are very common in Czech must be handled. Examples include pairs like mráz-mrazu, stůl-stolu, Bůh-Bohu, brontosaurus-brontosauři, pelyněk-pelyňku, péct-peče, stonat-stůně, or even hnát-ženu, Zeus-Dia, čest-cti, etc., where even the first letters differ. Similar situations can be found in virtually every language. In English, common examples include pairs like come-came, break-broken or go-went.
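
The pairs above show why simple suffix stripping is not enough. The sketch below compares a deliberately naive stemmer (`naive_stem`, a hypothetical stand-in for rule-based stemming that only drops one trailing vowel) with a lookup in a full-form lexicon (`LEXICON`, also hypothetical) that stores every form together with its lemma. It illustrates the general problem, not how our lexicon is built.

```python
# Pairs (lemma, inflected form) taken from the examples above.
PAIRS = [
    ("mráz", "mrazu"),
    ("stůl", "stolu"),
    ("Bůh", "Bohu"),
    ("pelyněk", "pelyňku"),
    ("čest", "cti"),
    ("go", "went"),
]


def naive_stem(word: str) -> str:
    """Drop a single trailing vowel: a deliberately simplistic stemmer."""
    return word[:-1] if word and word[-1] in "aeiouy" else word


# Hypothetical full-form lexicon: every form (including the lemma itself)
# points to its lemma, so root alternations need no rules at all.
LEXICON = {form: lemma for lemma, form in PAIRS}
LEXICON.update({lemma: lemma for lemma, _ in PAIRS})

for lemma, form in PAIRS:
    stem_match = naive_stem(form) == naive_stem(lemma)
    lexicon_match = LEXICON[form] == LEXICON[lemma]
    print(f"{lemma:>8} / {form:<8} stem match: {stem_match!s:<5} "
          f"lexicon match: {lexicon_match}")
```

The stemmer fails to connect any of these pairs (for example, it turns stolu into stol, not stůl), while the lexicon lookup connects all of them.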

And then there is the issue of homonymy. Many forms can be derived from more than one root: ženu, for example, can be either a form of žena, "woman" (singular accusative), or of hnát (1st person singular). Moreover, hnát can be either a verb referring to movement or a noun denoting a limb. There are numerous examples like this, so don't be surprised if you get more than one search result.
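
Homonymy means that one form can carry several analyses, each with its own lemma and grammatical categories. A minimal sketch, assuming a hypothetical `Analysis` record and a hypothetical `ANALYSES` table (our lexicon's real tag format is not shown here), reproduces the ženu and hnát cases; a search for such a form has to consider every candidate lemma, which is why several results can come back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One possible reading of a word form (hypothetical layout)."""
    lemma: str       # principal form
    pos: str         # part of speech
    categories: str  # grammatical categories, kept as plain text here


# Hypothetical lexicon entries reproducing the homonymy described above.
ANALYSES: dict[str, list[Analysis]] = {
    "ženu": [
        Analysis("žena", "noun", "singular, accusative"),
        Analysis("hnát", "verb", "1st person singular"),
    ],
    "hnát": [
        Analysis("hnát", "verb", "infinitive"),
        Analysis("hnát", "noun", "singular, nominative"),
    ],
}


def analyze(form: str) -> list[Analysis]:
    """Return every possible analysis of a form (empty if unknown)."""
    return ANALYSES.get(form.lower(), [])


def index_keys(form: str) -> list[tuple[str, str]]:
    """Distinct (lemma, part of speech) pairs a search must consider."""
    return sorted({(a.lemma, a.pos) for a in analyze(form)})


for analysis in analyze("ženu"):
    print(analysis.lemma, analysis.pos, analysis.categories)
```

Keying on the part of speech as well as the lemma keeps the verb hnát and the noun hnát apart, even though they share the same principal form.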

Software solution

Our software solution is very economical. Did you know that Czech has more than 6,700,000 word forms? And yet this huge number, including all morphological information, takes up only 1 MB. That works out to just over one bit per word form.
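
The storage figure can be checked with simple arithmetic. Taking 1 MB as 2^20 bytes (with 10^6 bytes the result is about 1.19 bits), the average cost per word form is:

```latex
\frac{1\ \text{MB}}{6\,700\,000\ \text{forms}}
  = \frac{8 \times 2^{20}\ \text{bits}}{6\,700\,000\ \text{forms}}
  = \frac{8\,388\,608}{6\,700\,000}
  \approx 1.25\ \text{bits per form}
```

This is only an average over the whole lexicon; it says nothing about how the data is actually encoded.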

Available functions

 

We currently support full-text search for a wide range of languages (see table) and platforms (see overview). To make your search even more user-friendly, we recommend combining it with our Thesaurus - Dictionary of synonyms.
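
One plausible way to combine a thesaurus with morphological search is to expand the query after lemmatization, so that every inflected form of a word also brings in its synonyms. The sketch below assumes a hypothetical `THESAURUS` table keyed by lemma and a hypothetical `expand` function; it does not describe the interface of our Thesaurus product.

```python
# Hypothetical thesaurus entries, keyed by lemma.
THESAURUS: dict[str, set[str]] = {
    "car": {"automobile", "vehicle"},
    "buy": {"purchase"},
}


def expand(lemmas: set[str]) -> set[str]:
    """Return the query lemmas together with all of their synonyms."""
    expanded = set(lemmas)
    for lemma in lemmas:
        expanded |= THESAURUS.get(lemma, set())
    return expanded


print(sorted(expand({"buy", "car"})))
# ['automobile', 'buy', 'car', 'purchase', 'vehicle']
```

Because expansion works on lemmas, a query containing "bought" would first be reduced to "buy" by the morphological step and only then extended with "purchase".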

For multilingual search, it can be combined with our translation technologies; for audio or video search, with our speech technologies.

References

Try out the full capabilities of morphological search for various languages in our Lingea Lexicon applications. The technology is primarily used in the search engines of various products and corporate systems.