NLP group

Medical Diagnosis Assistant

The outcome of project no. CZ.01.1.02/0.0/0.0/20_321/0024835 is the Medical Diagnosis Assistant, a tool that supports the coding and reporting of diagnoses, primarily on the basis of medical reports.


BAGOM - Development of technology for automation of seamless and georeferenced map of stable cadastre from 1826-1843 and implementation software

This project deals with the automatic positioning of map sheets in the Gusterberg and Svatý Štěpán coordinate systems, based on machine detection and recognition of the nomenclatures. The main goal is to automatically cut individual map sheets along the map frame and the cadastral boundary and to join all map sheets into a common seamless, georeferenced map. The final goal of the project is the automatic creation of a register of toponyms for the entire joined map.


Modern Access to Historical Sources

The main goal of this project is to make archival resources from the Czech-Bavarian border region accessible using state-of-the-art information technologies. It will be possible to search for information by geolocation. We also focus on clear presentation and effective search of documents stored as raster images. In addition, we provide intelligent full-text access to printed documents in both Czech and German. The information will be available through the existing Porta Fontium portal.


New Generation DMS

The output of the project is a modern web-based Document Management System that uses elements of self-learning artificial intelligence and machine learning, with a focus on the use of neural networks. The product is unique on the market due to its use of state-of-the-art technologies and the latest findings in the field of semantic search and automatic natural language processing. The solution’s competitive edge is further enhanced by Blockchain technology, which has an experimental connection with digital signatures.


Pathevo: Course syllabi understanding

We have implemented our algorithms for understanding the contents of natural-language documents in Pathevo, an application from the American company Owen Software that facilitates planning for education and professional careers. We carried out a feasibility study, tuned our algorithms for commercial use, and delivered our know-how in the form of contractual research and a subsequent licensing agreement.


MEVEX

MEVEX is a multilingual corpus of news articles annotated with event metadata. The events in the corpus come from the domains of violence and natural and man-made disasters. The main purpose of the corpus is the automatic evaluation of event detection and extraction systems in different languages. The event annotation follows the attached event taxonomy. There are 109 topics. Each topic contains comparable articles (source: Wikinews) in different languages. In total, there are 342 articles in 14 languages, with the best coverage for Czech and English.


DIME

DIME (Disaster-related micro-events) is a corpus of news articles about three natural disasters that occurred in 2016: the floods in Houston, the landslides in South China, and the magnitude 7.8 earthquake that hit Ecuador in April. We collected news articles from different online media sources and annotated the effects of the disasters. Most of the events we cover are harm to people, damage to infrastructure, and the interruption of roads, telecommunications and other services such as electricity and water. For each news article we provide the date, the location, the title and the text, in which the micro-events are annotated in bold. The current version of the corpus contains around 50 articles, and we plan to expand it. A Twitter channel, Disastrobot, collects tweets that report the effects of natural disasters.


ČTK: Document classification

We have implemented a system for the automatic categorization of press releases (sport, politics, etc.) for the Czech News Agency. It simplifies both adding new press releases and searching them. The challenge is the multi-label nature of the task: each press release can belong to multiple categories, and we do not know in advance how many apply. Our system works with the semantics of the press releases, because using words alone, without knowing the relations between them, is not sufficient. Our know-how was delivered in the form of a software license.
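
To illustrate what "multi-label" means here, the following is a minimal sketch and not the delivered system. It assumes that some model has already produced a score per category for each press release; the function name `assign_labels`, the threshold of 0.5 and the example scores are all hypothetical. Every category whose score reaches the threshold is kept, so the number of labels is not fixed in advance.

```python
from typing import Dict, List


def assign_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every category whose score reaches the threshold.

    The number of returned labels is not fixed in advance; it depends
    only on how many scores pass the (hypothetical) threshold.
    """
    chosen = [label for label, score in scores.items() if score >= threshold]
    # Fall back to the single best category so that no release stays unlabeled.
    if not chosen and scores:
        chosen = [max(scores, key=scores.get)]
    return sorted(chosen)


# Hypothetical per-category scores for two press releases.
release_a = {"sport": 0.91, "politics": 0.08, "economy": 0.64}
release_b = {"sport": 0.12, "politics": 0.31, "economy": 0.22}

print(assign_labels(release_a))  # ['economy', 'sport']
print(assign_labels(release_b))  # ['politics']
```

Thresholding independent per-category scores is only one common way to handle an unknown number of labels; the source does not say which method the system uses.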


Arzinger & Partners: Intelligent search

We have implemented a system for a renowned Czech law firm that processes the contents of legal documents, automatically creates an archive and enables intelligent search. The query for our search engine is a previously solved legal problem. We enabled the processing of thousands of documents that lacked any metadata. The challenge lies in the fact that the system has to work with document semantics; processing based on keywords alone is not sufficient. We delivered our know-how in the form of software licenses.


Machine Learning Approach to Fact-checking in West Slavic Languages

Fake news is designed to incite agitation against an individual or a group of people. Its aim is to influence and manipulate public opinion on targeted topics. Fake news detection and fact-checking, which can serve as the first step of a detection system, are currently receiving a lot of attention in the research community and in journalism.


HPS: High Precision stemmer

HPS is a multilingual stemming tool based on unsupervised training from unlabeled corpora. HPS has demonstrated state-of-the-art performance on several tasks (e.g., Named Entity Recognition, Sentiment Analysis, Language Modeling, Information Retrieval) and in many languages. More technical details can be found in our paper (see [Brychcín and Konopík, 2015]), which was accepted by the Information Processing and Management journal.


SentiSquare

The company originated at the University of West Bohemia. Josef, currently an associate professor at the university, came up with the idea of transferring the latest technologies from the university to the business environment. Together with two other researchers, Michal and Tomáš, he formed the original team. The idea was further developed during our participation in the Microsoft Innovation Center and StartupYard. Very soon, it became apparent that the team had to be complemented by someone with business experience. Luckily, we met Jan and Slávek.


R&D of Technologies for Advanced Digitalization in the Pilsen Metropolitan Area (DigiTech) No. CZ.02.01.01/00/23_021/0008436.

The DigiTech project will significantly expand the cooperation of the NTIS research center at the Faculty of Applied Sciences of the UWB in Pilsen with application partners. The main purpose is to build effective cooperation that enables research results to be applied in practice. The outputs and results of the research project "R&D technologies for advanced digitization" will help establish the environment and processes for the long-term sustainability of cooperation, intellectual property protection and technology transfer between the NTIS research center and partners from industry and services.

