Projects

We have implemented our algorithms for understanding the content of documents written in natural language in Pathevo, an application from the American company Owen Software that facilitates planning for education and professional careers. We carried out a feasibility study, tuned our algorithms for commercial use and delivered our know-how in the form of contract research followed by a licensing agreement.
We have implemented a system for the automatic categorization of press releases (sport, politics, etc.) for the Czech News Agency. It simplifies both adding new press releases and searching through them. The challenge is the multi-label nature of the task: each press release can belong to several categories, and we do not know in advance how many apply. Our system works with the semantics of the press releases, because relying on individual words without knowing their relations is not sufficient. Our know-how was delivered in the form of a software license.
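To illustrate why the number of categories cannot be fixed in advance, here is a minimal, generic sketch of a multi-label decision rule. It assumes a classifier (not described here) that outputs an independent score per category; every category whose score reaches a threshold is kept, so one press release may receive one label and another several. The function name `assign_categories`, the 0.5 threshold and the top-1 fallback are hypothetical choices for illustration, not the actual Czech News Agency system.

```python
from typing import Dict, List


def assign_categories(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Hypothetical multi-label decision rule (illustrative assumption).

    Keeps every category whose score reaches the threshold, so the number
    of labels varies per document. If no category reaches the threshold,
    falls back to the single best-scoring category so that every press
    release receives at least one label.
    """
    if not scores:
        return []
    # Independent per-category decision: no fixed number of labels.
    selected = [label for label, score in scores.items() if score >= threshold]
    if not selected:
        # Assumed fallback: keep the top-scoring category.
        selected = [max(scores, key=scores.get)]
    # Order labels by descending score for readable output.
    return sorted(selected, key=lambda label: scores[label], reverse=True)


if __name__ == "__main__":
    print(assign_categories({"sport": 0.91, "politics": 0.62, "culture": 0.08}))
    print(assign_categories({"sport": 0.30, "politics": 0.20}))
```

Thresholding each category independently is the simplest way to let the label count vary; a real system might instead tune a separate threshold per category on validation data.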
We have implemented a system for a renowned Czech law firm that processes the content of legal documents, automatically builds an archive and enables intelligent search. A query to our search engine is a previously solved legal problem. We made it possible to process thousands of documents that lacked any metadata. The challenge is that the system has to work with the semantics of the documents; processing based on keywords alone is not sufficient. We delivered our know-how in the form of software licenses.
CCID is a research and development competence centre established in cooperation with the University of West Bohemia in Pilsen. The goal of CCID is to provide the technology and innovation leadership needed for new (“BIG”) data-driven ICT projects and solutions. The main focus is on developing core competencies in Data Driven Solutions, Advanced Data Intelligence Services and new HTML5 Explorative Data Driven Interfaces, and on further exploring practical applications of data science in general.
HPS is a multilingual stemming tool trained in an unsupervised manner on unlabeled corpora. HPS has achieved state-of-the-art performance on several tasks (e.g., Named Entity Recognition, Sentiment Analysis, Language Modeling, Information Retrieval) in many languages.
MEVEX is a multilingual news corpus annotated with event metadata. The events in the corpus come from the domains of violence and natural and man-made disasters. The corpus is primarily intended for the automatic evaluation of event detection and extraction systems in different languages. The event annotation follows the attached event taxonomy. The corpus covers 109 topics, and each topic contains comparable articles (source: Wikinews) in different languages. In total, there are 342 articles in 14 languages, with the best coverage for Czech and English.
DIME is a corpus of news articles about three natural disasters. We collected the articles from various online media sources and annotated the effects of the disasters. Most of the events we cover are harm to people, damage to infrastructure, and disruption of roads, telecommunications and other services such as electricity and water.