Poliqarp for DjVu
Poliqarp for DjVu is an open-source search engine for DjVu corpora. It is released under the GNU GPL and was developed by Janusz S. Bień at the University of Warsaw. It relies on the DjVu format and makes it possible to present the results of advanced language technologies to end users.
Poliqarp for DjVu was conceived as a modification of the Poliqarp (Polyinterpretation Indexing Query and Retrieval Processor) corpus query tool. From its predecessor it inherits powerful search facilities based on two-level regular expressions, which can be used in queries to work around OCR errors. It also inherits the ability to represent low-level ambiguities and other linguistic phenomena. Search results are presented with highlighting and in KWIC (key word in context) form.
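The idea of a two-level regular expression can be illustrated with a small sketch. At the lower level, a character-level regex must fully match a single token. At the upper level, a sequence of such token conditions is matched against the running text. The Python below is a minimal illustration under several assumptions. It does not reproduce Poliqarp's query syntax. The names `find_matches` and `kwic` are hypothetical. The token-level part supports only concatenation, whereas a real engine would also need alternation and repetition. The sample tokens and patterns are invented. The character classes such as `[sſf]` show how a query can tolerate a typical OCR confusion, here the long s (ſ) being read as "f".

```python
import re
from typing import List, Sequence, Tuple


def find_matches(tokens: Sequence[str], pattern: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans where consecutive tokens fully match
    the corresponding character-level regexes, in order.

    Simplified sketch (hypothetical, not Poliqarp syntax): the token level
    supports only concatenation of conditions.
    """
    compiled = [re.compile(p) for p in pattern]  # character-level regexes
    n, k = len(tokens), len(compiled)
    hits = []
    for start in range(n - k + 1):
        # Token level: every condition must match the token at its position.
        if all(rx.fullmatch(tokens[start + i]) for i, rx in enumerate(compiled)):
            hits.append((start, start + k))
    return hits


def kwic(tokens: Sequence[str], span: Tuple[int, int], width: int = 2) -> str:
    """Format a hit as a KWIC line: left context, [hit], right context."""
    start, end = span
    left = " ".join(tokens[max(0, start - width):start])
    hit = "[" + " ".join(tokens[start:end]) + "]"
    right = " ".join(tokens[end:end + width])
    return " ".join(part for part in (left, hit, right) if part)


# Invented sample: "kroleftwie" stands for an OCR misreading of "kroleſtwie".
tokens = ["w", "kroleftwie", "polſkim", "byli", "ludzie"]
# Each character class accepts s, long s or f, the latter being an OCR error.
pattern = [r"krole[sſf]tw.*", r"pol[sſf]k.*"]

for span in find_matches(tokens, pattern):
    print(kwic(tokens, span))
```

Because each condition is anchored to a whole token with `fullmatch`, a character-level pattern can never run across a word boundary. Sequencing across words is left entirely to the token level, which is the separation the two-level design provides.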
Although the tool is currently used mainly to provide access to the results of uncorrected ("dirty") OCR, it is also ready to handle the more sophisticated output of linguistic technologies.
In particular, Poliqarp for DjVu is used for a non-medieval corpus of historical Polish (1570 to 1756). This corpus nevertheless shares issues typical of medieval corpora, such as spelling variation and abbreviations.