Projects

We have implemented our algorithms for understanding the contents of natural-language documents in Pathevo, an application from the American company Owen Software that facilitates planning for education and professional careers. We carried out a feasibility study, tuned our algorithms for commercial use, and delivered our know-how in the form of contract research and a subsequent licensing agreement.
We have implemented a system for the automatic categorization of press releases (sport, politics, etc.) for the Czech News Agency. It simplifies both adding new press releases and searching them. The challenge is the multi-label nature of the task: each press release can belong to several categories, and we do not know in advance how many apply. Our system works with the semantics of the press releases, because looking at individual words without knowing the relations between them is not sufficient. Our know-how was delivered in the form of a software license.
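The multi-label setting described above can be illustrated with a minimal sketch. It assumes that some upstream model has already produced an independent score per category; the function name `assign_labels`, the threshold value and the example scores are hypothetical and are not taken from the deployed system, whose semantic modelling is not shown here.

```python
from typing import Dict, List


def assign_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every category whose score reaches the threshold.

    Each category is decided independently, so the number of labels
    is not fixed in advance: a press release may get zero, one or many.
    """
    chosen = [label for label, score in scores.items() if score >= threshold]
    # Order the chosen labels from the highest score to the lowest.
    return sorted(chosen, key=lambda label: scores[label], reverse=True)


# Hypothetical per-category scores for one press release.
example_scores = {"sport": 0.91, "politics": 0.62, "economy": 0.08}
labels = assign_labels(example_scores)
```

Thresholding each category separately is only one common way to handle an unknown number of labels; the threshold itself would normally be tuned on held-out data.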
We have implemented a system for a renowned Czech law firm that processes the contents of legal documents, automatically creates an archive and enables intelligent search. The query for our search engine is a previously solved legal problem. We enabled the processing of thousands of documents that lacked any metadata. The challenge is that the system has to work with document semantics; processing based on keywords alone is not sufficient. We delivered our know-how in the form of software licenses.
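Searching with a whole document as the query can be sketched as ranking archived documents by their similarity to the query document. The sketch below assumes that every document has already been mapped to a numeric vector by some semantic model; the vectors, the document identifiers and the function names `cosine` and `rank_similar` are hypothetical, and the similarity measure used by the real system is not stated in the text.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(
    query_vec: Sequence[float],
    archive: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k archived documents most similar to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in archive.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional document vectors, for illustration only.
archive = {"case_a": [1.0, 0.0], "case_b": [0.0, 1.0], "case_c": [1.0, 1.0]}
results = rank_similar([1.0, 0.0], archive)
```

Because the comparison happens in vector space rather than on shared keywords, two documents can rank as similar even when they use different wording, provided the vectors capture their meaning.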
The company originated at the University of West Bohemia. Josef, currently an associate professor at the university, came up with the idea of transferring the latest technologies from the university to the business environment. Together with two other researchers, Michal and Tomáš, he formed the original team. The idea was developed further during our participation in the Microsoft Innovation Center and StartupYard. It soon became apparent that the team had to be complemented by someone with business experience. Luckily, we met Jan and Slávek.
HPS is a multilingual stemming tool based on unsupervised training on unlabeled corpora. HPS has achieved state-of-the-art performance on several tasks (e.g., Named Entity Recognition, Sentiment Analysis, Language Modeling, Information Retrieval) and in many languages.
MEVEX is a multilingual corpus of news articles annotated with event metadata. The events in the corpus come from the domains of violence and of natural and man-made disasters. The main goal of the corpus is the automatic evaluation of event detection and extraction systems in different languages. The event annotation follows the attached event taxonomy. There are 109 topics. Each topic contains comparable articles (source: Wikinews) in different languages. In total, there are 342 articles in 14 languages, with the best coverage for Czech and English.
DIME is a corpus of news articles about three natural disasters. We collected the articles from various online media sources and annotated the effects of the disasters. Most of the events we cover are harm to people, damage to infrastructure, and interruptions of roads, telecommunications and other services such as electricity and water.