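Beijing Biomedical Engineering, Vol. 40, No. 6, December 2021

Decision tree algorithm applied to MIMIC-III database for the prediction of acute kidney injury in ICU patients

GAO Wenpeng¹, LYU Haijin², ZHOU Lang¹, GUO Shengwen³

1 Department of Biomedical Engineering, School of Material Science and Engineering, South China University of Technology, Guangzhou 510006; 2 SICU, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510630; 3 School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640
Corresponding author: GUO Shengwen (E-mail: shwguo@scut.edu.cn)

Abstract
Objective Acute kidney injury (AKI) is one of the most common complications and causes of death in the intensive care unit (ICU). Accurately predicting which patients are at risk of AKI, and identifying the key factors associated with its onset, can provide effective guidance for clinical decision-making and for intervention in at-risk patients.
Methods Records of 30,020 patients (17,222 AKI patients and 12,798 non-AKI patients) were extracted from the public critical care database MIMIC-III, and clinical information from their ICU stay was collected, including basic information, physiological and biochemical indicators, drug use, and comorbidities. Patients were randomly divided into a training set and an independent test set at a ratio of 4:1. Three machine learning methods (logistic regression, random forest, and LightGBM) were used to build AKI prediction models at three time points: 24 h, 48 h, and 72 h. Ten-fold cross-validation was used to train and test each model, to predict whether a patient would develop AKI, and to obtain the important features. In addition, the 24 h prediction model was used to make a prediction for ICU patients every 24 h within a one-week time window.
Results Of the three models, LightGBM performed best. For its 24 h, 48 h, and 72 h models, the area under the receiver operating characteristic curve (ROC curve), i.e. the area under curve (AUC), for predicting AKI was 0.90, 0.88, and 0.87, and the F1 score was 0.91, 0.88, and 0.86, respectively. When predicting every 24 h, the success rates for predicting AKI 1 d, 2 d, and 3 d in advance were 89%, 83%, and 80%, respectively. Length of hospital stay to date, body weight, albumin, systolic blood pressure, bicarbonate, glucose, white blood cell count, body temperature, diastolic blood pressure, and blood urea nitrogen were among the important features for predicting AKI in ICU patients. Using only 24 important features, the model still achieved good predictive performance.
Conclusions Based on clinical information about ICU patients, including basic information, physiological and biochemical indicators, drug use, and comorbidities, machine learning models can effectively predict at multiple time points whether AKI will occur and can identify its key risk factors.

Keywords acute kidney injury; intensive care unit; machine learning; risk prediction; important features

DOI: 10.3969/j.issn.1002-3208.2021.06.010
CLC number: R318  Document code: A  Article number: 1002-3208(2021)06-0609-09

Citation format: GAO Wenpeng, LYU Haijin, ZHOU Lang, et al. Decision tree algorithm applied to MIMIC-III database for the prediction of acute kidney injury in ICU patients [J]. Beijing Biomedical Engineering, 2021, 40(6): 609-617.

[Abstract]
Objective Acute kidney injury (AKI) is one of the most common complications and causes of death in the intensive care unit (ICU). Accurate prediction of AKI risk and identification of the key factors related to AKI can provide effective guidance for clinical decision-making and for intervention in patients at risk of AKI.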
Methods A total of 30,020 ICU patients (17,222 AKI patients and 12,798 non-AKI patients) were selected from the public MIMIC-III database, and basic information, physiological and biochemical indicators, drug use, and comorbidities during their ICU stay were collected. All patients were randomly divided into a training set and an independent test set at a ratio of 4:1, and logistic regression, random forest, and LightGBM were applied to construct AKI prediction models at three time points: 24 h, 48 h, and 72 h. Ten-fold cross-validation was used to train and validate the models, to predict the occurrence of AKI, and to obtain the important features. Furthermore, the 24 h prediction models were used to predict AKI every 24 h during a 7-day window.
Results LightGBM achieved the best perf...