Meaningful query using SNOMED CT: How can semantic searches improve patient recruitment?

1
Meaningful query using SNOMED CT How can seman;c searches improve pa;ent recruitment? Brandon Ulrich & Orsolya Bali – snowowl@b2interna;onal.com Background Data that help the health industry improve pa3ent care and reduce costs will be their most valuable asset in 5 years. Twothirds of health organiza3ons expect their secondary data use to increase significantly. The availability of seman3cally rich electronic health records (EHRs) u3lizing SNOMED CT as a reference terminology con3nues to grow providing new opportuni3es for secondary use. Exclusion 1: We need to exclude some pa3ents that have had arthri3s or a joint inflamma3on, but only where the condi3on was CAUSED BY an organism. (e.g. virus or bacteria). Background A clinical trial is tes3ng a medica3on that helps pa3ents in certain risk groups avoid developing arthri3s. We seman3cally search 1,200,000 EHRs from the Observa3onal Medical Outcomes Partnership (OMOP) to iden3fy candidate pa3ents for the trial based on study inclusion and exclusion criteria. "- +./0.1*2 %. 3%1&/ 1&4+55+*%& 6+5)( +&( 2/%.)( 78).1)2 9)5+&*: /).51&%,%;< 78).< Inclusion criteria: High arthri;s risk and suitable age Inclusion 1: Exposure to certain DNA virus families increases the risk of arthri3s. Instead of specifying all the possible relevant diseases (e.g. viral pneumonia, human papilloma virus), simply specify all diseases CAUSED BY a nonenveloped dsDNA virus. Inclusion 2: Pa3ent is less than 32 years old. Exclusion 2: Some substances have a posi3ve effect on arthri3s, so pa3ents that have taken drugs with an ACTIVE INGREDIENT of dexamethasone or estrogen should be excluded. Don’t iden3fy all possible drugs and trade names, simply exclude the pa3ent based on the ac3ve ingredients of the drugs they have taken. Exclusion criteria: Cause and Ac;ve Ingredient Results Saves ;me: Don’t need to specify every infec3on virus by virus. Improves accuracy: All drugs containing any dexamethasone or estrogen are automa3cally included. Futureproof: When the criteria are reused in future trials, newly discovered viruses and released drugs are automa3cally included. Observa3onal Medical Outcomes Partnership (OMOP) reference informa3on model SNOMED CT value set binding value set binding Metamodel Re Map EHR store Reduce Seman3c query Snow Owl EHR server Sta3s3cal analysis Results Import Distributed File System RDBMS Dashboard, business intelligence Commodity servers / cloud to scale Highvolume EHR data flow Parallel data processing Parallel sta3s3cal algorithms Commodity servers / cloud to scale Result postprocessing Commodity servers / cloud to scale Snow Owl PlaTorm Implica;ons Unlocks Seman;cs: Meaningful queries allow seman3c aggrega3on instead of simple hierarchical grouping—avoiding combinatorial explosion later in the data analysis stage. Allows queries and analy3cs for meaningful aggregates such as OMOP’s Health Outcomes of Interest (e.g. acute renal failure). Model versus Code: Flexible, modeling paradigm allows reconfigura3on as opposed to coding effort. Scalability: Massively parallel plaborm allows real3me meaningful queries and analy3cs. Problem Current systems are unable to perform real3me, seman3cally rich searches on EHR data. Real3me queries are: Limited to detailed informa3on from a single pa3ent Limited to aggregated informa3on for all pa3ents with a small number of predefined concepts Unable to access seman3cs Which limits extrac3on of knowledge from the EHR data: Limi3ng the value of clinical decision support Preven3ng use of medical knowledge and seman3c rela3onships between concepts to iden3fy associa3ons in exis3ng data Limi3ng con3nuous iden3fica3on of emerging associa3ons as data accumulates Furthermore: The volume of EHR data is growing exponen3ally Tradi3onal data warehouses struggle to exploit the full meaning within the EHRs, as the data is built around a limited number of concepts Solu;on Informa3on model combined with SNOMED CT that encapsulates seman3c informa3on Meaning not constrained to a fixed set of concepts Query language to leverage stored seman3c meaning Real3me queries can be run on opera3onal stores without requiring extrac3on and aggrega3on to analy3cal data stores Scalable Supports na3onal to global numbers of electronic health records Plaborm supports massive parallel processing either on local networks or in the cloud Flexible Works with different informa3on model standards, including HL7, openEHR, EN 13606, LRA, LIM, UML, MOF as it is agnos3c to the internal structure of the EHR data Configure instead of code; designed to support changes to informa3on models and terminology Tooling support Terminology authoring, mapping, and subset crea3on, informa3on model authoring, EHR mapping Support for report design and analysis References “Big data: The next fron3er for innova3on, compe33on, and produc3vity,” McKinsey, May 2011 “Secondary uses of Electronic Health Record data in Life Sciences,” Deloile, June 2010 “Transforming healthcare through secondary use of health data,” PriceWaterhouseCoopers, October 2009 Ques;ons? Drop by the B2i Healthcare booth in the exhibi;on area Example: Improving clinical trial pa;ent recruitment – Arthri;s study protocol Ad hoc search result: 1,200,000 EHRs seman3cally searched in 4.4 seconds iden3fying 6,237 candidate pa3ents. Note: Hardware was a single 2007 vintage Xeon 2.0 GHz 8core server !" $%&'()&*+, -./ (0123 4567 %)360%2)& %0 ()8+9)67+3%&) :+9)( +&( 36%0)( ;1)05)3 <)9+&*= 6)095&%,%2> ;1)0> <)?6)9@)0 ABC -B!! Exclusions & inclusions created from exis3ng seman3c queries 6237 candidate pa3ents returned from EHR Server

Transcript of Meaningful query using SNOMED CT: How can semantic searches improve patient recruitment?

Page 1: Meaningful query using SNOMED CT: How can semantic searches improve patient recruitment?

Meaningful  query  using  SNOMED  CT  How  can  seman;c  searches  improve  pa;ent  recruitment?  

 

Brandon  Ulrich  &  Orsolya  Bali  –  snowowl@b2interna;onal.com  

Background  Data   that   help   the  health   industry   improve  pa3ent  care   and   reduce   costs   will   be   their   most   valuable  asset  in  5  years.    Two-­‐thirds   of   health   organiza3ons   expect   their  secondary  data  use  to  increase  significantly.    The  availability  of  seman3cally  rich  electronic  health  records   (EHRs)   u3lizing   SNOMED  CT   as   a   reference  terminology   con3nues   to   grow   providing   new  opportuni3es  for  secondary  use.          

Exclusion  1:    We  need  to  exclude  some  pa3ents  that  have  had  arthri3s  or  a  joint  inflamma3on,  but  only  where  the  condi3on  was  CAUSED  BY  an  organism.  (e.g.  virus  or  bacteria).  

Background  A  clinical  trial  is  tes3ng  a  medica3on  that  helps  pa3ents  in  certain  risk  groups  avoid  developing  arthri3s.    We  seman3cally  search  1,200,000  EHRs  from  the  Observa3onal  Medical  Outcomes  Partnership  (OMOP)  to  iden3fy  candidate  pa3ents  for  the  trial  based  on  study  inclusion  and  exclusion  criteria.  

!"#$%&'()&*+,#

"-#+./0.1*2#%.#3%1&/#1&4+55+*%&##

6+5)(#+&(#2/%.)(#78).1)2#

9)5+&*:#/).51&%,%;<#

78).<#

9)=/)5>).#?@A#?-BB#

Inclusion  criteria:  High  arthri;s  risk  and  suitable  age  Inclusion  1:  Exposure  to  certain  DNA  virus  families  increases  the  risk  of  arthri3s.  Instead  of  specifying  all  the  possible  relevant  diseases  (e.g.  viral  pneumonia,  human  papilloma  virus),  simply  specify  all  diseases  CAUSED  BY  a  non-­‐enveloped  dsDNA  virus.  Inclusion  2:  Pa3ent  is  less  than  32  years  old.    

Exclusion  2:    Some  substances  have  a  posi3ve  effect  on  arthri3s,  so  pa3ents  that  have  taken  drugs  with  an  ACTIVE  INGREDIENT  of  dexamethasone  or  estrogen  should  be  excluded.  Don’t  iden3fy  all  possible  drugs  and  trade  names,  simply  exclude  the  pa3ent  based  on  the  ac3ve  ingredients  of  the  drugs  they  have  taken.    

Exclusion  criteria:  Cause  and  Ac;ve  Ingredient     Results  

Saves  ;me:  Don’t  need  to  specify  every  infec3on  virus  by  virus.  Improves  accuracy:  All  drugs  containing  any  dexamethasone  or  estrogen  are  automa3cally  included.      Future-­‐proof:  When  the  criteria  are  reused  in  future  trials,  newly  discovered  viruses  and  released  drugs  are  automa3cally  included.  

Observa3onal  Medical  Outcomes  Partnership  (OMOP)    reference  informa3on  model  

SNOMED  CT  

value  set  binding  

value  set  binding  

Meta-­‐model  

Re  

Map  

EHR  store  

Reduce  

Seman3c  query  

Snow  Owl    EHR  server  

Sta3s3cal  analysis  

Results   Import  

Distributed  File  System  

RDBMS  

Dashboard,    business  intelligence  

Commodity  servers  /  cloud  to  scale  

High-­‐volume  EHR  data  flow   Parallel  data  processing   Parallel  sta3s3cal  algorithms  

Commodity  servers  /  cloud  to  scale  

Result  post-­‐processing  

Commodity  servers  /  cloud  to  scale  

Snow  Owl  PlaTorm  

 

Implica;ons  Unlocks  Seman;cs:  Meaningful  queries  allow  seman3c  aggrega3on  instead  of  simple  hierarchical  grouping—avoiding  combinatorial  explosion  later  in  the  data  analysis  stage.  Allows  queries  and  analy3cs  for  meaningful  aggregates  such  as  OMOP’s  Health  Outcomes  of  Interest  (e.g.  acute  renal  failure).  

Model  versus  Code:  Flexible,  modeling  paradigm  allows  reconfigura3on  as  opposed  to  coding  effort.  

Scalability:  Massively  parallel  plaborm  allows  real-­‐3me  meaningful  queries  and  analy3cs.  

 

Problem  Current  systems  are  unable  to  perform  real-­‐3me,  seman3cally  rich  searches  on  EHR  data.  Real-­‐3me  queries  are:  •  Limited  to  detailed  informa3on  from  a  single  pa3ent  •  Limited  to  aggregated  informa3on  for  all  pa3ents  with  a  small  number  of  predefined  concepts  •  Unable  to  access  seman3cs  

 Which  limits  extrac3on  of  knowledge  from  the  EHR  data:  •  Limi3ng  the  value  of  clinical  decision  support  •  Preven3ng  use  of  medical  knowledge  and  seman3c  rela3onships  between  concepts  to  iden3fy  associa3ons  in  

exis3ng  data  •  Limi3ng  con3nuous  iden3fica3on  of  emerging  associa3ons  as  data  accumulates  

 Furthermore:  •  The  volume  of  EHR  data  is  growing  exponen3ally  •  Tradi3onal  data  warehouses  struggle  to  exploit  the  full  meaning  within  the  EHRs,  as  the  data  is  built  around  a  

limited  number  of  concepts  

Solu;on  •  Informa3on  model  combined  with  SNOMED  CT  that  encapsulates  seman3c  informa3on  •  Meaning  not  constrained  to  a  fixed  set  of  concepts  •  Query  language  to  leverage  stored  seman3c  meaning  •  Real-­‐3me  queries  can  be  run  on  opera3onal  stores  without  requiring  extrac3on  and  aggrega3on  to  analy3cal  data  stores  

Scalable  •  Supports  na3onal  to  global  numbers  of  electronic  health  records  •  Plaborm  supports  massive  parallel  processing  either  on  local  networks  or  in  the  cloud    Flexible      •  Works  with  different  informa3on  model  standards,  including  HL7,  openEHR,  EN  13606,  LRA,  LIM,  UML,  MOF  as  it  is  agnos3c  to  the  internal  structure  of  the  EHR  data  

•  Configure  instead  of  code;  designed  to  support  changes  to  informa3on  models  and  terminology    Tooling  support  •  Terminology  authoring,  mapping,  and  subset  crea3on,  informa3on  model  authoring,  EHR  

mapping  •  Support  for  report  design  and  analysis  

References    -­‐  “Big  data:  The  next  fron3er  for  innova3on,  compe33on,  and  produc3vity,”  McKinsey,  May  2011  -­‐  “Secondary  uses  of  Electronic  Health  Record  data  in  Life  Sciences,”  Deloile,  June  2010  -­‐  “Transforming  healthcare  through  secondary  use  of  health  data,”  PriceWaterhouseCoopers,  October  2009  Ques;ons?  Drop  by  the  B2i  Healthcare  booth  in  the  exhibi;on  area  

Example:  Improving  clinical  trial  pa;ent  recruitment  –  Arthri;s  study  protocol  

Ad  hoc  search  result:  1,200,000  EHRs  seman3cally  searched  in  4.4  seconds  iden3fying  6,237  candidate  pa3ents.    Note:  Hardware  was  a  single  2007  vintage  Xeon  2.0  GHz  8-­‐core  server  

 !"#$%&'()&*+,#

-./#(0123#4567#%)360%2)&#%0#()8+9)67+3%&)#

:+9)(#+&(#36%0)(#;1)05)3#

<)9+&*=#6)095&%,%2>#

;1)0>#

<)?6)9@)0#ABC#-B!!#

Exclusions  &  inclusions  created  from  exis3ng  seman3c  queries  

6237  candidate  pa3ents  returned  from  EHR  Server