NLP group

Czech medical coding assistant based on transformer networks

Ladislav Lenc and Jiří Martínek and Josef Baloun and Pavel Přibáň and Martin Prantl and Stephen Taylor and Pavel Král
Computers in Biology and Medicine (2024)

Research topics:

Language Modeling | Document Classification | Neural Networks

Abstract

The International Classification of Diseases (ICD) hierarchical taxonomy is used for the clinical coding of medical reports, which are typically written as unstructured text. In the Czech Republic, this coding is currently carried out manually by a clinical coder. Because it relies on human effort, the process is error-prone and expensive: the coder must be properly trained and spends significant effort on each report, and occasional mistakes still occur. The main goal of this paper is to propose and implement a system that serves as an assistant to the coder and automatically predicts diagnosis codes. These predictions are then presented to the coder for approval or correction, with the aim of improving efficiency and accuracy. We consider two classification tasks: predicting the main (principal) diagnosis and predicting all diagnoses. Crucial requirements for the implementation include minimal memory consumption, generality, ease of portability, and sustainability. The main contribution lies in the proposal and evaluation of ICD classification models for the Czech language with relatively few trainable parameters, allowing swift deployment on the computer systems prevalent in Czech hospitals and enabling easy retraining or fine-tuning with newly available data. First, we introduce a small transformer-based model for each task, followed by the design of a transformer-based “Four-headed” model incorporating four distinct classification heads. This model achieves results comparable to, and sometimes better than, those of four individual models. Moreover, this novel model significantly reduces memory usage and training time. We also show that our models achieve results comparable to state-of-the-art English models on the MIMIC-IV dataset, even though our models are significantly smaller.
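The memory argument for a shared encoder with several classification heads can be illustrated with a small parameter count. The sketch below is not the authors' implementation: the encoder size, hidden size, head names, and label counts are all hypothetical values chosen for illustration, and the abstract does not say what each of the four heads predicts. It only assumes that each head is a single linear layer on top of the encoder output.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Head:
    """A linear classification head (assumed form): weights plus biases."""
    name: str
    num_labels: int

    def num_params(self, hidden_size: int) -> int:
        # hidden_size x num_labels weight matrix plus one bias per label
        return hidden_size * self.num_labels + self.num_labels


def separate_models_params(encoder_params: int, hidden_size: int,
                           heads: Sequence[Head]) -> int:
    """One full model per head: the encoder is duplicated for every head."""
    return sum(encoder_params + h.num_params(hidden_size) for h in heads)


def shared_model_params(encoder_params: int, hidden_size: int,
                        heads: Sequence[Head]) -> int:
    """One encoder shared by all heads."""
    return encoder_params + sum(h.num_params(hidden_size) for h in heads)


# Hypothetical sizes, not taken from the paper.
ENCODER_PARAMS = 10_000_000
HIDDEN_SIZE = 256
HEADS = [
    Head("head_a", 100),
    Head("head_b", 1_000),
    Head("head_c", 100),
    Head("head_d", 1_000),
]

separate = separate_models_params(ENCODER_PARAMS, HIDDEN_SIZE, HEADS)
shared = shared_model_params(ENCODER_PARAMS, HIDDEN_SIZE, HEADS)
print(f"four separate models:  {separate:,} parameters")
print(f"one four-headed model: {shared:,} parameters")
```

Under these assumptions, the saving equals three copies of the encoder, because the heads themselves are counted once in both setups. The same reasoning suggests why training one shared model can take less time than training four separate ones, although the actual gain depends on details the abstract does not give.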

Authors

Ing. Ladislav Lenc, Ph.D.

Researcher

Ing. Jiří Martínek, Ph.D.

Researcher

Ing. Josef Baloun

PhD student

Ing. Pavel Přibáň

Researcher

Ing. Martin Prantl, Ph.D.

Researcher

Stephen Taylor, Ph.D.

Researcher

Doc. Ing. Pavel Král, Ph.D.

Team leader
