Mild Blue

Project / product name: Disquill

Link to the project: http://disquill.mild.blue

Team leader: Jakub Monhart

Challenge: 7. Automated Medical Discharge Message Enhancement

Problem: High administrative burden in healthcare. It is inefficient and expensive to spend doctors' time on paperwork that could be at least partially automated.

Solution: Our solution is two-fold. First, we provide an API that handles all of the AI processing. The hospital only needs to send the relevant information about a patient to our service, and a summary is generated. Second, we take care of the models. We can fine-tune an open-source model on the data you provide and deploy the whole solution on-prem, on your hardware. If you prefer, we can instead use proprietary models that run in the cloud and do not require you to upgrade your IT infrastructure.
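To illustrate the "send patient information, receive a summary" flow, here is a minimal Python sketch of what a client call from a hospital information system might look like. The endpoint path, the JSON field names and the response shape are all assumptions made for this illustration; they are not taken from the actual Disquill API.

```python
import json
import urllib.request

# Hypothetical endpoint path; the real Disquill routes are not documented here.
API_URL = "http://disquill.mild.blue/api/summary"


def build_summary_request(patient_record: dict, api_url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the patient record as JSON."""
    body = json.dumps(patient_record, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def request_summary(patient_record: dict, api_url: str = API_URL, timeout: float = 30.0) -> str:
    """Send the record and return the generated summary text.

    Assumes the service answers with JSON of the form {"summary": "..."}.
    """
    request = build_summary_request(patient_record, api_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["summary"]


# Illustrative record; all field names and values are made up for this sketch.
example_record = {
    "patient_id": "anonymized-001",
    "admission_report": "Pacient přijat pro bolest na hrudi.",
    "procedures": ["EKG", "koronarografie"],
}
```

Calling `request_summary(example_record)` would perform the actual network request. Building the request in a separate function keeps the payload easy to inspect and test without contacting the service.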

Impact: These are initial steps toward automating administration in healthcare. Our solution could potentially be used in many hospitals every day, saving time. All this is possible without relying on third-party access to generative models, since our solution can be based on open-source technology and trained on your data directly by us. If hospitals are able to provide enough medical text data, we believe fine-tuning open-source models can bring promising results, and not only in summarizing medical reports.

Feasibility & financials: The hardware to deploy our solution on-prem would cost hundreds of thousands of CZK, depending on the scale of the intended usage. Development and fine-tuning can be done using cloud providers (e.g. Azure) or on-prem servers. Bear in mind that training these models is considerably more demanding than inference alone. Since our solution is based on a simple API, it can be quickly integrated into any hospital information system.

What is new about your solution?: We believe that, so far, there has been no focus on using open-source large language models for medical data in the Czech language.

What you have built at the hackathon - text explanation + code (e.g. GitHub link): https://github.com/matejkloucek/european-healthcare-hackathon The frontend and backend (API) were developed by us during the hackathon. We did a lot of data preprocessing. We used the OpenAI API for two of the outputs; for one of them, the model was further fine-tuned on the provided data. Our main model is open-source. It is a multilingual large language model fine-tuned on the provided data to generate correct outputs. Before the fine-tuning, the model performed poorly on the presented task.

What you had before the hackathon, please mention open source as well: For the open-source model, we used the multilingual mBART model as further fine-tuned by the AIC NLP research group from FEE CTU. They fine-tuned it to summarize Czech news articles (SumeCzech dataset) into headlines. We used their checkpoint as the starting point for further fine-tuning on hospitalization data, which was crucial to get sensible outputs.

What comes next and what you wish to achieve: We would like to spend more time researching and fine-tuning open-source models on data of this nature, ideally in collaboration with IKEM. We hope that in the future a system like ours will be used every day to alleviate time-consuming paperwork.

Video: