Revolt BI

Project / product name: Reno AI

Link to the project: https://github.com/RevoltBI/EHH2022-challenge6

Team leader: František Navrkal

Challenge: 6. Are you kidneying?

Problem: Create an algorithm to detect CKD risk based on laboratory tests of blood and urine and other data from IKEM hospital system. The solution should work well despite the fact that not all data sets have all provided attributes (features). Find out which attributes (features) are the most important for the detection of CKD.

Solution: Using the data provided, we developed a neural network model that can detect CKD more than 800 days before its diagnosis with over 84 % accuracy. We were also able to determine which attributes/features (i.e. laboratory tests) are the most relevant for detecting CKD. Both the 800-day lead and the accuracy could probably be improved if more data (more patients over longer time periods) were used to train the model.

Impact: Early treatment of CKD is both much cheaper and much more effective, so detecting CKD onset earlier can drastically improve patients' health outcomes and save money for healthcare payers. Since a single patient on hemodialysis (which may be necessary to treat late-stage CKD) costs around 1 million CZK (40,000 EUR) per year in the Czech Republic, the cost savings alone could be quite significant.

Feasibility & financials: We achieved 84 % accuracy on test data in less than 2 days with only modest computational resources (preparing the data and training the model cost less than 60 EUR in total). This suggests that developing a model ready for deployment in practice is almost certainly feasible and, considering the possible savings in the healthcare system and the improved health outcomes for patients, probably very cheap.

What is new about your solution?: We applied modern machine learning approaches to this problem and achieved surprisingly good results. Our model is based on recurrent neural networks that read the patient's lab records and capture not only the values but also their evolution over time. It seems that with the right preprocessing and the right deep learning models, there is still low-hanging fruit in the data.
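
One common way to prepare irregular lab records for a recurrent network is to turn each visit into a fixed-length vector of values, a mask showing which tests were actually measured, and the time gap since the previous visit. The following is only a minimal sketch of that idea, not the preprocessing used in the repository; the feature names, the function encode_visits and the sample patient are hypothetical.

```python
from datetime import date

# Hypothetical feature list; the real lab-test names are not given in this text.
FEATURES = ["creatinine", "urea", "albumin"]


def encode_visits(visits):
    """Turn a patient's dated lab results into per-visit input vectors.

    visits: list of (date, {test_name: value}) tuples, in any order.
    Each returned row holds the values (0.0 where a test is missing),
    a 0/1 mask marking which tests were measured, and the number of
    days since the previous visit.
    """
    rows = []
    previous_day = None
    for day, results in sorted(visits, key=lambda visit: visit[0]):
        values = [float(results.get(name, 0.0)) for name in FEATURES]
        mask = [1.0 if name in results else 0.0 for name in FEATURES]
        gap = 0.0 if previous_day is None else float((day - previous_day).days)
        rows.append(values + mask + [gap])
        previous_day = day
    return rows


# Made-up patient with two visits, given out of chronological order.
patient = [
    (date(2020, 3, 1), {"creatinine": 95.0, "urea": 6.1}),
    (date(2020, 1, 1), {"creatinine": 88.0}),
]
sequence = encode_visits(patient)
```

The mask lets a model tell a missing test apart from a genuine zero, which matters when not every patient has every attribute, and the day gap gives the network a way to see how quickly values change.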

What you have built at the hackathon - text explanation + code (e.g. GitHub link): We have created a machine learning pipeline for the provided data: scripts for cleaning and transforming the data, and code for training the model, with which we trained it. Additionally, we have created a simple analysis of the model and reported the features with the strongest gradients, which can be interpreted as the features most relevant for the detection of CKD. GitHub: https://github.com/RevoltBI/EHH2022-challenge6
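
The gradient-based analysis can be illustrated on a toy model: take the gradient of the predicted risk with respect to each input feature, average its absolute value over patients, and rank the features by that average. The sketch below assumes a tiny logistic model with made-up weights in place of the actual recurrent network; the feature names, weights and samples are hypothetical and serve only to show the mechanics.

```python
import math

# Hypothetical feature names and made-up weights; a stand-in for the real model.
FEATURES = ["creatinine", "urea", "albumin"]
WEIGHTS = [1.5, 0.4, -0.8]
BIAS = -0.2


def predict(x):
    """Toy logistic model: predicted risk for one (normalised) feature vector."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(x):
    """Gradient of the prediction with respect to each input feature.

    For a logistic model, d p / d x_i = p * (1 - p) * w_i.
    """
    p = predict(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def rank_features(samples):
    """Rank features by mean absolute input gradient over all samples."""
    totals = [0.0] * len(FEATURES)
    for x in samples:
        for i, g in enumerate(input_gradient(x)):
            totals[i] += abs(g)
    means = [total / len(samples) for total in totals]
    return sorted(zip(FEATURES, means), key=lambda pair: pair[1], reverse=True)


# Made-up, already normalised samples.
samples = [[0.2, -0.1, 0.5], [1.0, 0.3, -0.4], [-0.6, 0.8, 0.1]]
ranking = rank_features(samples)
```

With a deep learning framework the same gradients would normally come from automatic differentiation rather than a hand-written formula. A large gradient shows that the prediction is sensitive to a feature, which is a useful hint about relevance but not proof of a causal role.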

What you had before the hackathon, please mention open source as well: We used standard open-source data science tools (Jupyter, TensorFlow, Pandas, NumPy) for data processing and model creation. During the hackathon we used some Google Cloud Platform services, such as Google Cloud Storage, Vertex AI Workbench and BigQuery. We did not have any code or data prepared before the hackathon. For the video, we used Renderforest, DaVinci Resolve and Play.ht.

What comes next and what you wish to achieve: Our model should be validated using data from different sources (e.g., different hospitals, a country-wide database). The model we have tested here is relatively small (1.2 million parameters); a larger dataset would provide an opportunity for further refinement and for enlarging the model. We would also like to apply this approach to the detection of other diseases that are common enough in the population to provide sufficient data for training such models.

Video: