Transforming Healthcare Administration: Industry Language Model for Medical Code Extraction

Rachit Gupta, Generative AI Industry Solutions Lead, Cognizant
Sentheesh Lingam, Chief Architect - Generative AI, Cognizant

© 2025-2027 Cognizant | Private

Scaling Enterprise AI with Cognizant & NVIDIA

Innovation meets manufacturing operations

Revolutionizing Services at Scale

AI Agents

Agent factory

Digital Twin

NVIDIA Omniverse

Integrating foundational elements, platforms and solutions

Democratization through platform on platform

Tailor-made models, deep domain relevance

Industry LLMs

Healthcare LLM

Cognizant Neuro AI

NVIDIA NIM™

AI Foundation

Data

Rewire for AI - NVIDIA RAPIDS™

Infrastructure

AI factory - Service as a Software


The era of industry-specific LLMs

Industries need AI that speaks their language: one size doesn't fit all!

RELEVANT & EFFICIENT RESPONSES
• Contextual Awareness
• Task Optimization

CUSTOMIZATION & ADAPTABILITY
• Fine-tuned for specific tasks
• Easier Integration

EFFICIENCY GAINS
• Reduced Inference Cost
• Efficient processing

HIGHER ACCURACY & DEEP DOMAIN EXPERTISE
• Understands Complex Jargon
• Reduces Hallucinations

IMPROVED COMPLIANCE & SECURITY
• Regulatory Alignment (like HIPAA, GDPR, SOX)
• Better Data Security

As vendors begin to adapt core GenAI technologies to industries and business domains, expect the market to be complemented by a rich set of solutions specializing by role, business unit and industry through H1 2025.

By 2025, two-thirds of businesses will leverage a combination of GenAI and retrieval-augmented generation to power domain-specific self-service knowledge discovery, improving decision efficacy by 50%.

By 2027, over 50% of the generative AI models used by enterprises will be domain-specific (industry or business function), up from 1% today.


Transforming healthcare management throughout the payer value chain

GenAI solutions across the healthcare value chain, ready to be powered by a healthcare language model

• Sales and marketing operations: estimated 25-30% productivity increase
• Contract Management and Administration: estimated 40-50% decrease in cycle times
• Appeals and Grievance: estimated 30-40% productivity improvement
• Member Plan Shopping Assistant: estimated 50-60% improvement in plan selection
• Automated Star Profiling: intuitive insights, benchmark care quality and contextual recommendations
• Medical Code Extraction: 20-60% effort savings compared to manual code extraction
• FWA Assistant: estimated 40-50% effort reduction in FWA case identification and research
• Prior Authorization Assistant: estimated 20-30% productivity improvement
• Personalized care plan generation: improved care manager efficiencies (>20%) and health outcomes

Supported by Deep Domain Expertise and Strong Engineering Capabilities

• Rich SME Base
• Data Science CoE
• NVIDIA Accelerated Computing
• NVIDIA NeMo™
• NVIDIA NIM™
• TriZetto


Medical code extraction continues to be a major challenge in healthcare

…a problem that we can solve through an industry language model

OPERATIONAL CHALLENGES

Voice of Medical Coders
• “The coding systems are so complex, and dynamic coding standards make us more prone to mistakes”
• “The pressure to meet productivity standards sometimes compromises our accuracy”
• “There's a shortage of certified coders, and the workload can be overwhelming”

• Coding mistakes contribute to 63% of all billing errors
• There is an estimated 30% shortage of professional medical coders across the United States

TECHNICAL CHALLENGES
• Models are prone to errors due to a lack of domain-specific knowledge, resulting in incorrect or irrelevant code assignments
• High hallucination rates in general-purpose models
• Longer time for retrieving and processing relevant information, which can slow down the coding process
• The annual maintenance and retraining overhead to keep the model up to date is high for general-purpose models or RAG models

HEALTHCARE ORGANIZATIONAL CHALLENGES
• ~$60+ billion per annum loss for hospitals due to coding errors
• Constant updates to coding standards and navigating complex regulations
• 42% of denials are caused by coding issues, leading to significant revenue losses
• Around 11% of all claims are denied, with some providers experiencing denial rates as high as 30%


Certified medical coders discuss obstacles in extracting medical codes

• “Difficult to extract specific codes due to lack of context stemming from incomplete medical records”
• “Combination codes are complex in nature and even tougher when trying to find active conditions”
• “Missing codes due to manual reading of large paragraphs”
• “Coders have to parse through multiple documents such as treatment plan option, medication records to provide proof of condition being active during the particular encounter.”
• “Misinterpreting ICD-10-CM descriptions, leading to errors in mapping codes to clinical documentation.”
• “Missing add-on codes, which affects reimbursement and data accuracy”
• “Not capturing more specific diagnoses, leading to underreporting of patient conditions”
• “Even experienced coders, despite knowing the guidelines, may unintentionally overlook key details”
• “These challenges can significantly impact coding quality, leading to increased audit reviews, potential compliance risks, and delays in data submission”


Revolutionizing medical coding with precision and efficiency

Improved operational efficiency and consistency through automation and fine-tuning

30%-40%

Improved medical coding accuracy and consistency as compared to general-purpose LLMs

30%-75%

40%-45%

Faster time to market by utilizing NVIDIA NeMo™

Reduction in effort for accurate medical code extraction

*Depending on complexity

Benefits
• Reduced latency
• Significant cost and effort reduction
• Efficient Revenue Cycle Management
• Accurate Risk Adjustment Determination
• Improvement in Quality-of-Care Reporting Compliance
• Decreased exposure of proprietary health data


Strategic fine-tuning approach enhanced by Cognizant's industry knowledge

1 Data Preparation
• Collection & injection of medical domain taxonomy, guidelines and ICD code descriptions
• Data curation using NeMo Curator
• Prepared 2k+ redacted clinical notes and generated 10k+ synthetic data samples using NeMo Curator

2 Benchmarking
• Selection of medical domain and open-source models
• Baselining using different approaches such as RAG, CoT and few-shot
• Define and evaluate models

3 Model Fine-Tuning
• Medical domain adaptation using medical terminologies, guidelines and ICD-10 coding
• Fine-tuning the model on medical code extraction tasks using synthetic data
• Leveraged Parameter-Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA) techniques

4 Model Inference
• Model deployment, provisioning & configuration using a NIM container
• Medical code extraction business app build and deployment
• Model monitoring, performance & scalability

Cognizant's Value
• Strong technical expertise in fine-tuning and customizing language models across domains
• 1000+ resources trained and certified across NVIDIA, GCP, Generative AI
• Agentic modular architecture for easy scalability
• Seamless integration with enterprise apps and the TriZetto product ecosystem
• Experienced SME base: 300+ AAPC certified coders with a proven track record on sensitive submissions (e.g., to CMS)
• Access to rich and diverse 2000+ datasets and ontologies
• TriZetto product and healthcare domain expertise in model validation
• Leverage Cognizant's benchmarking framework on NVIDIA evaluator

Tech Stack (NVIDIA)
• NeMo Curator
• NeMo Customizer, Evaluator
• NeMo Evaluator
• NeMo Retriever, Guardrails, NIM (LoRA)
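
The fine-tuning step above names Low-Rank Adaptation (LoRA). As a rough illustration of why it counts as parameter-efficient, the sketch below builds the effective weight W + (alpha / r) · B·A from a frozen weight W and two small trainable matrices A and B. The layer sizes, rank and alpha are hypothetical values picked for the example and are not taken from the deck; the code uses plain Python lists rather than any NVIDIA NeMo API.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank (rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # same shape as W: d_out x d_in
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Parameters trained by LoRA (A: r x d_in, B: d_out x r) vs. the full matrix."""
    return r * d_in + d_out * r, d_out * d_in


# Hypothetical toy layer; real model layers are far larger.
d_out, d_in, rank, alpha = 6, 4, 2, 4.0
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(rank)]   # trainable
B = [[0.0] * rank for _ in range(d_out)]  # trainable, zero-initialised

# With B at zero the adapter contributes nothing, so training starts from W.
W_eff = lora_effective_weight(W, A, B, alpha)

# Hypothetical 4096 x 4096 projection with rank 8.
lora_count, full_count = trainable_params(4096, 4096, 8)
```

For the hypothetical 4096 x 4096 layer, the adapter trains 65,536 values against 16,777,216 for the full matrix, which is the kind of saving that makes repeated domain fine-tuning runs affordable.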


Blueprint for successful fine-tuning: solution architecture

NVIDIA Accelerated Computing & NVIDIA NIM™
• Parallel processing power: training epochs up to 28 times faster than an 8-core CPU
• Energy efficient: NVIDIA NIM™ with TensorRT-LLM optimizes LLM inference, reducing energy consumption by up to 3x
• High memory bandwidth: faster transfer between core and memory, minimizing AI model training and inference time

NVIDIA NeMo™
• Low code, faster build
• Time & effort efficiency: maximizes throughput and minimizes training time with multi-node, multi-GPU training and inference

NVIDIA NGC Registry
• Comprehensive AI resources
• Accelerated workflows
• Ease of integration
• Rich documentation and support

[Architecture diagram. Stages: 1 Data Preparation, 2 Benchmarking (SME), 3 Fine-Tuning, 4 Model Inference. Labeled components: Gradio UI (Upload Medical Notes), Medical Coder UI, Cloud Run, Cloud Storage Data, Local Storage, BigQuery, Vertex AI, Vector Search, Indexer, Synthetic Data Generator, Model Evaluator, Open Source Model, Fine-Tuning Jobs, Fine Tuned Model, NeMo Curator, NeMo Customizer, NeMo Evaluator, NeMo Retriever, NeMo Guardrails, NVIDIA NIM (LoRA), NVIDIA NGC Registry, NVIDIA NeMo™, NVIDIA AI Enterprise™, NVIDIA Accelerated Computing, Cloud Ops (Operations, Security, IAM, Rules, Firewall).]


Dataset preparation for medical code extraction & evaluation

Synthetic Data Generation

Sampling Generation Methodology
• Samples were generated from a subset of ICD-10 codes chosen from a set of ~80k codes
• ICD codes were grouped together based on potential medical conditions that can co-exist
• Sample notes were written for each grouping, each describing one or more possible medical conditions
• Synthetic notes were generated for all ICD code groupings
• Synthetic codes were validated by SMEs

Medical Document Types
• Wellness Forms
• Physician Consult Progress Note
• Discharge Summary
• Patient History
• Examination Findings

Complexity Evaluation
• Identifying factors contributing to the complexity
• Grouping & scoring each factor's contribution towards complexity
• Identifying the presence of contributing factors in medical documents
• Classifying and scoring documents

[Figure: Medical Coding Hierarchy]
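
The complexity evaluation steps above (score each factor, detect which factors a document contains, then classify the document) can be sketched as a weighted sum mapped onto bands. The factor names, weights and thresholds below are hypothetical placeholders, since the deck does not list the real ones; only the Low, Medium and High bands come from the slides.

```python
# Hypothetical complexity factors and weights; the deck does not publish its own.
FACTOR_WEIGHTS = {
    "multiple_conditions": 3,
    "combination_codes": 3,
    "abbreviations": 2,
    "history_vs_active": 2,
}

# Hypothetical minimum scores for each band, checked from highest to lowest.
THRESHOLDS = [(6, "High"), (3, "Medium"), (0, "Low")]


def score_document(factors_present):
    """Sum the weights of the recognised factors found in one document."""
    return sum(FACTOR_WEIGHTS[f] for f in factors_present if f in FACTOR_WEIGHTS)


def classify(score):
    """Map a numeric score onto the Low / Medium / High bands."""
    for minimum, label in THRESHOLDS:
        if score >= minimum:
            return label
    return "Low"


# Example: a note with several co-existing conditions and combination codes.
example_label = classify(score_document({"multiple_conditions", "combination_codes"}))
```

Keeping the weights in a single table makes it straightforward for SMEs to review and adjust how much each factor contributes.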


Evaluating LLMs based on document intricacy and prompting method

[Figure: Accuracy by Complexity. Two bar-chart panels, "Parent Level Accuracy by Complexity" (axis 0% to 50%) and "Child Level Accuracy by Complexity" (axis 0% to 30%), comparing MedLM, Llama-2, Llama, Gemini and DeepSeek-R1 on Low, Medium and High complexity documents.]

[Figure: Accuracy by Approach. Two bar-chart panels, "Parent Level Accuracy by Approach" (axis 0% to 60%) and "Child Level Accuracy by Approach" (axis 0% to 40%), comparing MedLM, Llama-2, Llama, Gemini and DeepSeek-R1 under the RAG and Direct approaches.]

Key Considerations:
• Benchmarking across both open-source and commercial models
• Medical documents categorized by complexity levels for evaluation
• Multiple approaches considered for evaluating model performance, including the RAG approach
• Extracted medical codes (ICD-10) were matched at both parent and child levels

Key Observations
• Open-source models trained in the medical domain perform better than generic models
• The RAG approach improves the performance of all models
• Data complexity has a huge impact on model performance, particularly when multiple scenarios are present in the medical documents
• Extracting accurate child codes is challenging for most models
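
Matching at parent and child levels can be illustrated with a small scoring helper. The sketch assumes that "parent" means the three-character ICD-10-CM category and "child" means the full code, and that the score is the share of expected codes the model recovered; the deck does not spell out its exact metric, so both choices are assumptions. The sample codes are illustrative inputs only.

```python
def normalize(code):
    """Strip the decimal point and upper-case, so 'e11.9' and 'E119' compare equal."""
    return code.replace(".", "").upper()


def parent(code):
    """ICD-10-CM category: the first three characters of a code."""
    return normalize(code)[:3]


def match_rates(predicted, expected):
    """Return (parent_rate, child_rate): share of expected codes found at each level."""
    if not expected:
        raise ValueError("expected must contain at least one code")
    exp_child = {normalize(c) for c in expected}
    pred_child = {normalize(c) for c in predicted}
    exp_parent = {parent(c) for c in expected}
    pred_parent = {parent(c) for c in predicted}
    parent_rate = len(exp_parent & pred_parent) / len(exp_parent)
    child_rate = len(exp_child & pred_child) / len(exp_child)
    return parent_rate, child_rate


# Illustrative input: the prediction finds two of three categories,
# but only one full code matches exactly.
parent_rate, child_rate = match_rates(
    predicted=["E119", "I4891"],
    expected=["E11.319", "I48.91", "N18.6"],
)
```

A gap between the two rates, as in this example, mirrors the observation that models often land in the right category but miss the specific child code.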


Challenges impacting the accuracy of medical code extraction

HIGH COMPLEXITY
• Missed diagnoses in discharge notes
• Missed coding specificity
• Inaccurate ICD code prediction
• Missed 'Initial Encounter' coding
• Missed 'Z' codes for health status
• Uninterpreted abbreviations

MEDIUM COMPLEXITY
• Invalid code prediction
• Hallucination
• Entity-code mismatch
• Missed 'Acute' vs. 'Chronic'

LOW COMPLEXITY
• Missed 's/p' (Status Post) condition
• Missed common terms
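
Invalid code predictions and hallucinated codes can be caught with a cheap post-check before any clinical review. The sketch below first tests the general ICD-10-CM shape (three to seven characters, a letter first, a digit second, an optional decimal point after the third character) and then checks membership in a code set. The `known_codes` set here is a tiny hypothetical stand-in for the official code list, and the shape test alone cannot prove that a code exists.

```python
import re

# Structural shape only: letter, digit, alphanumeric, then up to four more
# alphanumerics with an optional decimal point after the third character.
ICD10CM_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.?[0-9A-Z]{1,4})?")


def is_well_formed(code):
    """True when the code has a plausible ICD-10-CM shape."""
    return ICD10CM_SHAPE.fullmatch(code.upper()) is not None


def filter_predictions(predicted, known_codes):
    """Split predictions into codes found in the code set and rejected ones."""
    kept, rejected = [], []
    for code in predicted:
        norm = code.replace(".", "").upper()
        if is_well_formed(code) and norm in known_codes:
            kept.append(norm)
        else:
            rejected.append(code)
    return kept, rejected


# Hypothetical stand-in for the official code set (stored without decimal points).
known_codes = {"I497", "K850", "E119"}
kept, rejected = filter_predictions(["I49.7", "Z9999", "12AB", "E119"], known_codes)
```

Here "Z9999" has a valid shape but is rejected because it is absent from the stand-in code set, while "12AB" fails the shape test outright.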


Our approach aims to replicate the reasoning process of a medical coder

Medical Coder

Foundation
• Familiarize with medical taxonomy to map clinical terms to ICD-10 codes.
• Identify common medical abbreviations to accurately interpret clinical conditions.
• Study conventions and general coding guidelines from the ICD-10-CM Official Guidelines.

Extraction
• Identify medical entities from clinical notes for diagnosis, symptoms, support, historical statements, referrals, plans, and health status changes.

Lookup
• Search within the Alphabetic Index to find the ICD-10 code for specific medical entities or conditions.

Validation
• Chapter-wise guideline check to validate the ICD-10-CM code against chapter-specific guidelines and general coding conventions.

Finalization
• Review the ICD-10 code from the ICD-10-CM Tabular List of Diseases and Injuries to ensure it matches the patient's diagnosis. Identify any notes impacting the code selection, along with any additional characters/codes needed to best suit the diagnosis.

Healthcare Language Model

Domain Specific Finetuning
• Augmented the existing knowledge base of the LLMs with the ICD-10-CM Official Guidelines, AMA Glossary of Medical Terms and ICD-10-CM Index documents.
• A supervised fine-tuning approach is implemented to expand the LLM's existing knowledge bank.

Medical Entity Extraction
• Successfully implemented a LoRA-based fine-tuning approach to further train the LLMs using synthetic medical documents.
• Leveraging the augmented knowledge base, specific medical entities are extracted via Q&A prompting.

ICD-10 Codes Generation
• Leveraged the enhanced knowledge base of the fine-tuned LLMs to identify the ICD-10 codes from the extracted medical entities.

ICD-10 Guideline Validations
• Validated the identified ICD-10 codes against chapter-specific guidelines and general coding conventions using prompt engineering to ensure adherence to coding protocols.

ICD-10 Codes Finalization
• Catalog the final list of ICD-10 codes that are relevant to the context of the document and adhere to the additional requirements to best suit the content.


ICD10 Code Extraction

Pre-trained model
Extracted ICD10 Codes: ['T446X5S', 'T444X5S']
Expectation Match: 0%

Fine-tuned model
Extracted ICD10 Codes: ['I497', 'K850', 'M6282', 'I2690', 'E119', 'N185', 'I420', 'H4410', 'I230', 'N400']
Expectation Match: 40%

Analysis of fine-tuned model output for highly complex clinical notes

Expected entities in clinical note (compared Before Finetuning and After Finetuning):
• Acute renal failure requiring haemodialysis
• Amiodarone toxicity of lung and liver
• Anaemia due to blood loss
• Atrial fibrillation
• Benign prostatic hypertrophy
• Cardiomyopathy
• Chronic obstructive pulmonary disease
• Chronic renal failure
• Diabetes type 2 with retinopathy
• Hypercholesterolemia
• Hypertension
• NPH insulin 5 mg q.a.m., 3 mg q.hs.
• Peripheral vascular disease
• Pneumonia
• Systolic congestive heart failure
• Tachy-brady syndrome status post DDD pacemaker placement


Overview of the results from the fine-tuned model

Accuracy & Coverage Comparison: Llama (8b) Model

[Figure: three bar-chart panels (Low Complexity, Medium Complexity, High Complexity), each comparing the Baseline Model and the Finetuned Model on % Parent Coverage, % Child Coverage, Parent Accuracy and Child Accuracy.]

• 59% increase in overall accuracy
• 78% increase in overall accuracy & coverage (weighted accuracy)

Key Considerations:
• The model was refined progressively using various synthetic datasets that vary in complexity and batch sizes.
• Techniques such as LoRA and SFT were applied for model fine-tuning.
• The model underwent instruction fine-tuning for ICD10 code extraction.
• Chain-of-thought strategies were employed to enhance the model's performance in code extraction, ensuring reasoning and justification during inference.

Key Observations
• The RAG-based method applied to the fine-tuned model enhanced the overall performance across all models.
• Fine-tuned models demonstrated superior capabilities in handling complex structured documents that necessitate linking information found in different sections.
• These models achieved improved outcomes without relying on extensive prompting techniques or intricate RAG methods.


Application of a fine-tuned model - extracting medical codes

• Automated ICD Coding: The system automates ICD code extraction from medical documents, reducing manual effort and potential errors.
• Two-Pass Approach for Accuracy: A two-pass approach leverages GenAI to initially identify possible ICD codes and then refine the selection with full document context.
• Diagnosis Term Recognition & Mapping: Utilizes GenAI to identify diagnosis terms within medical documents and maps them to corresponding ICD codes via an Excel database.
• Contextual Filtering for Precision: The second pass incorporates the full medical document as context to filter and refine the initial set of ICD codes.
• User-Friendly Interface with Exclusion Functionality: The UI offers a feature to exclude specific diagnosis terms, providing user control over the code extraction process.
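
The two-pass flow described above can be sketched in a few lines. In this illustration the first pass maps already-recognised diagnosis terms to candidate codes through a lookup table (a hypothetical in-memory stand-in for the Excel database) and honours user exclusions; the second pass filters candidates with a full-document check. The `naive_is_supported` function is a deliberately simple placeholder for the GenAI contextual filter, and all names are illustrative and not taken from the actual application.

```python
def first_pass(terms, lookup, excluded=()):
    """Map recognised diagnosis terms to candidate codes, skipping excluded terms."""
    skip = {t.lower() for t in excluded}
    candidates = {}
    for term in terms:
        key = term.lower()
        if key in skip:
            continue
        if key in lookup:
            candidates[term] = lookup[key]
    return candidates


def second_pass(candidates, document, is_supported):
    """Keep only the candidate codes that the full-document check supports."""
    return [code for term, code in candidates.items() if is_supported(term, document)]


def naive_is_supported(term, document):
    """Placeholder for the model call: reject terms negated as 'no <term>'."""
    text = document.lower()
    needle = term.lower()
    return needle in text and f"no {needle}" not in text


# Hypothetical lookup rows standing in for the Excel database.
LOOKUP = {"atrial fibrillation": "I4891", "pneumonia": "J189", "hypertension": "I10"}

note = ("History of atrial fibrillation. Chest X-ray shows no pneumonia. "
        "Hypertension, controlled.")
terms = ["Atrial fibrillation", "Pneumonia", "Hypertension"]

codes = second_pass(
    first_pass(terms, LOOKUP, excluded=["Hypertension"]),
    note,
    naive_is_supported,
)
```

In this run the user exclusion removes hypertension in the first pass and the contextual check drops the negated pneumonia mention in the second, leaving only the atrial fibrillation code.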


Cognizant TriZetto® agentic AI system leading the transformation of core ad
