AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

11
AI has embraced medical applications from its inception, and some of the earliest work in suc- cessful application of AI technology occurred in medical contexts. Medicine in the twenty-first cen- tury will be very different than medicine in the late twentieth century. Fortunately, the technical challenges to AI that emerge are similar, and the prospects for success are high. W hen I was asked to make this presen- tation, the organizers specifically asked me to review a bit of the histo- ry of AI in medicine (AIM) and to provide an update of sorts. I have therefore taken the lib- erty of dividing the last 30 years of medical AI research into three eras: the era of diagnosis, the era of managed care, and the era of molec- ular medicine. A description of these eras allows me to review for you some of the early and current work in AIM and then tell you about some of the exciting opportunities now emerging. Why is AI in medicine even worth consider- ing? In the late 1950s, medicine was already drawing the attention of computer scientists principally because it contains so many stereo- typical reasoning tasks. At the same time, these tasks are fairly structured and so are amenable to automation. Every medical student learns that when one thinks about a disease, one thinks in an orderly way about epidemiology, pathophysiology, diagnosis, treatment, and then prognosis. These are the bins into which medical information is parsed. These sorts of structured reasoning methods made medicine an attractive application area. In addition, medicine is clearly knowledge intensive, and so at places like Stanford (where knowledge was power [Feigenbaum 1984]), it was very tempting to try to encode knowledge for the purposes of reproducing expert performance at diagnosis and treatment. The working hypoth- esis was that rich knowledge representations would be sufficient, with only relatively weak inference algorithms required. There was (and is) considerable debate about how complex inference should be for expert performance, but it is clear that medicine is a field in which there is a lot of knowledge required for good performance. It is also clear that physicians constantly feel a sense of overload as they deal with the individual data associated with their patients as well as the content knowledge of medicine that they are trying to apply to the treatment of these patients. I can try to provide a feel for the information-processing load on a physician: A full-time general practitioner is currently expected to longitudinally follow a panel of 2000 to 2500 patients. Of course, the severity of illness varies, but it is clear that physicians need systems (computer or other- wise) to track the data pertaining to these patients and turn it into working hypotheses for diagnosis, treatment, and long-term prog- nosis. The other appeal to working in AI in medi- cine is that the field is large, and so virtually all aspects of intelligent behavior can be studied in one part or another of medicine. You can study issues of image processing, automated management of database information, robotic automation of laboratories, computer-assisted diagnosis, multimedia for physician and patient education, virtual and telesurgery, and many other issues. For some, AI in medicine provides a kinder, gentler, “greener” applica- tion area in which to apply their techniques. Three Eras for AI in Medicine The first era of AI in medicine was the “Era of Diagnosis.” The first aspect of medical reason- ing that caught the imagination of AI researchers was the process of collecting clini- cal data and applying inference rules to make the diagnosis of a disease. This is the common image of the doctor as sleuth, determining what disease is causing the patient’s symp- Articles FALL 1999 67 AI in Medicine The Spectrum of Challenges from Managed Care to Molecular Medicine Russ B. Altman, M.D., Ph.D Copyright © 1999, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1999 / $2.00 AI Magazine Volume 20 Number 3 (1999) (© AAAI)

Transcript of AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

Page 1: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

� AI has embraced medical applications from itsinception, and some of the earliest work in suc-cessful application of AI technology occurred inmedical contexts. Medicine in the twenty-first cen-tury will be very different than medicine in thelate twentieth century. Fortunately, the technicalchallenges to AI that emerge are similar, and theprospects for success are high.

When I was asked to make this presen-tation, the organizers specificallyasked me to review a bit of the histo-

ry of AI in medicine (AIM) and to provide anupdate of sorts. I have therefore taken the lib-erty of dividing the last 30 years of medical AIresearch into three eras: the era of diagnosis,the era of managed care, and the era of molec-ular medicine. A description of these erasallows me to review for you some of the earlyand current work in AIM and then tell youabout some of the exciting opportunities nowemerging.

Why is AI in medicine even worth consider-ing? In the late 1950s, medicine was alreadydrawing the attention of computer scientistsprincipally because it contains so many stereo-typical reasoning tasks. At the same time, thesetasks are fairly structured and so are amenableto automation. Every medical student learnsthat when one thinks about a disease, onethinks in an orderly way about epidemiology,pathophysiology, diagnosis, treatment, andthen prognosis. These are the bins into whichmedical information is parsed. These sorts ofstructured reasoning methods made medicinean attractive application area. In addition,medicine is clearly knowledge intensive, andso at places like Stanford (where knowledgewas power [Feigenbaum 1984]), it was verytempting to try to encode knowledge for thepurposes of reproducing expert performance atdiagnosis and treatment. The working hypoth-esis was that rich knowledge representationswould be sufficient, with only relatively weak

inference algorithms required. There was (andis) considerable debate about how complexinference should be for expert performance,but it is clear that medicine is a field in whichthere is a lot of knowledge required for goodperformance. It is also clear that physiciansconstantly feel a sense of overload as they dealwith the individual data associated with theirpatients as well as the content knowledge ofmedicine that they are trying to apply to thetreatment of these patients. I can try to providea feel for the information-processing load on aphysician: A full-time general practitioner iscurrently expected to longitudinally follow apanel of 2000 to 2500 patients. Of course, theseverity of illness varies, but it is clear thatphysicians need systems (computer or other-wise) to track the data pertaining to thesepatients and turn it into working hypothesesfor diagnosis, treatment, and long-term prog-nosis.

The other appeal to working in AI in medi-cine is that the field is large, and so virtually allaspects of intelligent behavior can be studiedin one part or another of medicine. You canstudy issues of image processing, automatedmanagement of database information, roboticautomation of laboratories, computer-assisteddiagnosis, multimedia for physician andpatient education, virtual and telesurgery, andmany other issues. For some, AI in medicineprovides a kinder, gentler, “greener” applica-tion area in which to apply their techniques.

Three Eras for AI in MedicineThe first era of AI in medicine was the “Era ofDiagnosis.” The first aspect of medical reason-ing that caught the imagination of AIresearchers was the process of collecting clini-cal data and applying inference rules to makethe diagnosis of a disease. This is the commonimage of the doctor as sleuth, determiningwhat disease is causing the patient’s symp-

Articles

FALL 1999 67

AI in MedicineThe Spectrum of Challenges from

Managed Care to Molecular Medicine

Russ B. Altman, M.D., Ph.D

Copyright © 1999, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1999 / $2.00

AI Magazine Volume 20 Number 3 (1999) (© AAAI)

Page 2: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

bution from the NLM was the creation of anonline database of the published biomedicalliterature, MEDLINE. Having gone through anumber of transformations, the MEDLINE data-base was recently made available to the generalpublic via the PUBMED resource on the WorldWide Web (www.ncbi.nlm.nih.gov/PubMed/).For better or worse (I believe for the better),physicians and patients now have unprece-dented access to a literature that is growingexponentially. The challenges in indexing,searching, and parsing this literature representa major challenge to AI investigators.

The 1970s brought the push for diagnosticperformance. De Dombal et al. in 1972 showedthat you could make clinically accurate diag-noses using Bayesian inference (de Dombal etal. 1972). Also in 1972, Kulikowski and theteam at Rutgers created the CASNET system inwhich they explored methods for using causalmodels for somewhat deeper diagnostic tasks(Kulikowski and Weiss 1982). The models weredeeper because physiological models were nowbeing used to explain symptoms and describediagnostic possibilities. Shortliffe, Buchanan,and coworkers showed soon afterwards (withthe MYCIN system in Buchanan and Shortliffe[1984]) that production rules could be used tomake expert-level diagnosis of infectious dis-eases. Pauker and coworkers created the PIP sys-tem (Presenting Illness Program) in which thecognitive processes associated with short-termand long-term memory were modeled in orderto create programs that could consider multi-ple diagnoses but then focus on the few mostlikely solutions quickly (Szolovits and Pauker1976). Figure 1 shows a figure from one of thePIP papers, in which an associative memorystructure is modeled. As particular concepts areactivated and drawn into the river represent-ing active memory, they drag into the riverwith them associated ideas that then come tothe attention of the inference engine.

A magnum opus during this period was theINTERNIST knowledge base and inference pro-gram published by Miller, Myers, and cowork-ers (Miller, Pople, and Myers 1982). INTERNIST

had the goal of diagnosing any problem withingeneral internal medicine—basically any sys-temic disease or disease of the organs betweenthe neck and the pelvis. INTERNIST was based ona very large knowledge base that was trans-ferred to a PC-based program called QMR, whichnow forms the basis for a commercial product(Miller, Masarie, and Myers 1986). TheINTERNIST/QMR knowledge base associated dis-eases with findings using two numbers: a fre-quency of association and an evoking strength.There was then an algorithm created for col-

toms. The second era of AI in medicine waswhat I have called the “Era of Managed Care ofChronic Disease.” This era has approached aset of problems quite distinct from those tack-led in the preceding period, as I will discuss.Finally, we are on the precipice of the “Era ofMolecular Medicine,” which is once againgoing to raise issues that are different fromthose occupying researchers during the firsttwo.

The Era of DiagnosisIn 1959, Ledley and Lusted (1959) published apaper in Science entitled “The Reasoning Foun-dations of Medical Diagnosis.” This classicpaper has the feature of many classic papers: Itputs forth a series of statements that are nowtaken as almost self-evident. Ledley and Lustedpointed out that medical reasoning was notmagic but instead contained well-recognizedinference strategies: Boolean logic, symbolicinference, and Bayesian probability. In partic-ular, diagnostic reasoning could be formulatedusing all three of these techniques. Their papermapped a research program for the next 15years, as investigators spun out the conse-quences of applying these inference strategiesto medical domains.

The research that followed was varied andexcellent, and I cannot properly review all thecontributions but instead will pick some exem-plary efforts. For example, in 1965 LawrenceWeed introduced a computer system calledPROMIS to support a problem-oriented medicalinformation methodology (Tufo et al. 1977).Weed’s work was among the first to demon-strate a truly electronic medical record. More-over, this record was a highly structured,strongly typed data structure (in many wayssimilar to our modern frame-based systems)that even today is rarely matched in its insis-tence on structured data input. Weed’s workwas limited by the absence of standard termi-nologies to use within his data structure, buthis belief in structured data is still a major goalwithin the medical informatics community.

In the late 1960s the National Library ofMedicine (NLM) (www.nlm.nih.gov/) wasestablished as one of the National Institutes ofHealth (NIH). This was remarkable for manyreasons, not least of which was that most insti-tutes within the NIH are associated with anorgan or a disease (for example, The NationalInstitute of Heart, Lung, and Blood or TheNational Cancer Institute). The NLM is still insearch of its organ or disease. Nevertheless, theextramural research program of the NLM hasbeen a principal source of research funds for AIin medicine. The principal intramural contri-

Articles

68 AI MAGAZINE

Page 3: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

lecting findings and computing the most likelydiagnoses. Since the introduction of this pro-gram, others have been introduced that arebased on similar ideas, including DXPLAIN (Bar-nett et al. 1987) and ILIAD (Bouhaddou et al.1995). The performance of these programs hasbeen evaluated and compared by runningthem on some challenging case reports (calledclinicopathological cases, or CPCs) such as thosethat appear each week in the New England Jour-nal of Medicine (www.nejm.com) (Berner, Jack-son, and Algina 1996; Wolfram 1995; Feldmanand Barnett 1991). In most cases, the perfor-mance of the programs is comparable to expertdiagnostic performance (as judged by a blindedreview of diagnoses produced by both expertsand the programs, or unblinded evaluation ofthe performance using defined performancecriteria for success). The programs routinelyoutperform medical students and physicians intraining.

In the mid- to late 1980s, Heckerman andcoworkers showed that the preliminary workof De Dombal could be extended using

Bayesian networks for diagnosis, in which theconditional dependencies between variablescould be modeled in a somewhat natural man-ner (Heckerman, Horvitz, and Nathwani1992). They also were able to recast some ofthe assumptions behind the other (apparentlynonprobabilistic) systems (MYCIN and INTERNIST)to create a unified probabilistic “map” of thespace of diagnostic algorithms (Dan andDudeck 1992; Middleton et al. 1991; Shwe etal. 1991). So by the end of the 1980s, there wasa large and distinguished literature on medicaldiagnosis. This literature has continued andexpanded to nonmedical areas such as thediagnosis of faults in electronic circuitry andother engineering applications.

The Era of Managed Care of Chronic DiseaseSo what happened to the Era of Diagnosis? Allof these systems were evaluated, and all ofthem seemed to perform near the level ofhuman experts. Well, there were a few prob-lems. First, physicians did not embrace these

Figure 1. A Representation of the Associative Memory Required for Medical Diagnosis from the PIP Work (Szolovits and Pauker 1976).

A. A “fact” is searching for a connection to the network, so that the appropriate concepts can be pulled down into the short-term memo-ry. B. A “fact” has found a matching concept and thus pulls the appropriate associated concepts into the short-term memory.

Articles

FALL 1999 69

A B

Page 4: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

One of the ways to reduce the cost of healthcare is to move it out of expensive hospitalrooms and into outpatient clinics. So insteadof intense episodes in the hospital, we havethese much more frequent less intenseepisodes in the clinic where similar things arebeing done but in a more fragmented manner.The fragmentation may cause confusion as weask physicians to track the progress of 2500patients with periodic interactions.

One way to capture the look and feel of AI inMedicine today is to look at the contents of arecent meeting. The AI in Medicine Europe(AIME) conference was held in Grenoble in1997 (Shahar, Miksch, and Johnson 1997). Anexamination of the table of contents revealsthree subjects, in particular, that reflect currentconcerns: (1) the representation and manipu-lation of protocols and guidelines, (2) naturallanguage and terminology, and (3) temporalreasoning and planning. Other areas of impor-tance include knowledge acquisition andlearning, image and signal processing, decisionsupport, and (our old friend) diagnostic rea-soning.

Protocols and guidelines have become animportant way to standardize care and reducevariance. Guidelines are created by panels ofphysicians who assess available data and rec-ommend treatment strategies. For example,how should a newly discovered breast lump beevaluated? The AI challenges follow directly:How do we develop robust and reusable repre-sentations of process? How do we create adap-tive plans that respond to changes in availableinformation? How do we distinguish betweenhigh-level plan recommendations and theirspecific local implementation? How do wemodify guidelines in response to data collectedduring their execution? How do we model theeffects of guidelines on organizations? There isan increasing interest in the representationand simulation of organization systems inorder to predict the effects of interventions inmedical care. One recent development in thisarea has been the development of a guidelineinterchange format (GLIF) (Ohno-Machado etal. 1998). GLIF is a syntax for specifying clinicalprotocols. It contains a language for represent-ing actions, branch steps, and synchronizationsteps (among others) needed to specify a clini-cal guideline (figure 2).

Natural language and standardized termi-nologies remain a critical issue in medicalcomputing. The medical goal is to create stan-dards for communication that move awayfrom hand-written natural language. Medicineis the only major industry still relying onhand-written documentation. How do we

technologies. Clinical data, unlike billing data,were not routinely available in a digital form,so when you ran these programs there werethese very awkward interfaces that asked youlots of questions in order to get the informa-tion necessary to do the diagnosis. Clinicianssimply did not want to spend time enteringdata that were already written into a noteusing natural language. The AI in medicinecommunity realized that they needed elec-tronic medical records as a prerequisite infra-structural element to allow the deployment ofthese technologies. Thus, issues of knowledgerepresentation, automatic data acquisition,federation of databases, and standard termi-nologies became quite important. The secondproblem for diagnostic programs was thatphysicians did not want help with diagnosis.Diagnosis is fun, and physicians are trained todo it well in medical school and only improvewith years of practice. They did not want togive up that fun to a computer. The most sig-nificant problem, however, was that diagnosisis a actually very small part of what physiciansdo in the delivery of medicine. Most visits to aphysician are for an existing, previously diag-nosed problem. The challenge to the physicianis to follow the problem and respond to itsevolution intelligently. Diagnosis is a relativelyrare event, probably accounting for less than 5percent of physician time. What physiciansreally need is help following chronic and slow-ly evolving disease in 2500 patients that areseen in brief episodes but require expert inter-ventions. So we have the era of chronic caredriving AI in medicine research. This problemis compounded by an aging population withmore chronic diseases.

There is one other element of medicine thathas changed the imperatives for AI research,and this is the emergence of new economicmodels for funding medicine (Selby 1997; Det-sky and Naglie 1990). The traditional modelhas been fee for service: A physician performsa service and gets paid an agreed-uponamount. If the physician performs lots of ser-vices, the physician makes more money. Thenew model of medical funding is based on astandard rate per patient that is paid to aphysician, regardless of the usage of services bythe patient. Now, the financial incentives arereversed. If the physician provides a service,then its cost in time and resources is taken outof the pot of money that represents potentialprofit. Now physicians still want to treat ill-ness, but there is now a huge incentive todeliver cost-effective, high-quality care. Sys-tems for supporting these activities becomethe mandate.

The AI inmedicine

communityrealized thatthey needed

electronicmedical

records as aprerequisite

infra-structural element toallow the

deployment of these

technologies.

Articles

70 AI MAGAZINE

Page 5: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

define formal semantics so that when we cre-ate these electronic medical records, we canpopulate them with clean data? What is theunderlying ontology for clinical medicine?How do you map natural language into stan-dard terminologies? How do we accommodatelocal and global changes to these terminolo-gies? How do we integrate legacy databaseswith newer, semantically clean databases?How can we have machine-learning tech-niques for extracting new medical knowledgefrom our semantically clean databases? It isimportant here to mention the Unified Med-ical Language System (UMLS), a project at theNational Library of Medicine with the goal ofintegrating a number of existing medicalvocabularies using a common semantic struc-ture (Bodenreider et al. 1998). The existing ter-minologies include those for specifying diag-noses, medical procedures, and bibliographicindexing (Cote and Robboy 1980; Slee 1978).The UMLS is based on a semantic network andhas about 500,000 terms that have been classi-fied into about 150 semantic types with speci-fied relationships. A fragment of its semanticnetwork is shown in figure 3.

Temporal reasoning and planning becomecritical in a setting where diseases are chronic,and interactions are episodic. The challengesare to integrate database and knowledge basetechnology with temporal inferencing capabil-ities. How do we actually modify medical data-bases so that we can do effective temporalinference with them? How can we recognizeand abstract temporal trends in clinical data?Nonmonotonic reasoning becomes essential:As new data are collected, we retract old infer-ences and assert different ones. How do we cre-ate “smooth” models of patient state based onepisodic data collection? Finally, how can wecreate plans for treatment over time?

My colleague at Stanford, Yuval Shahar, hasdone excellent work in the area of temporalabstraction and has a system that is able toautomatically take a set of discrete data pointsand transform them into sensible intervalsthat can, in turn, be grouped together intoeven higher-level abstractions (Shahar,Miksch, and Johnson 1998) (as summarized infigure 4).

There are some other application areas with-in medicine that deserve mention, includingtelemedicine, how to deliver medical care at adistance using multimedia; intensive care med-icine, with emphasis on reasoning with limitedresources; and clinical trials, methods to auto-matically recognize that a patient is eligible fora trial and to enroll them.

The Era of Molecular MedicineAlthough the management of chronic diseaseunder conditions of capitated payment arelikely to continue, I believe that there is aneven more revolutionary set of changes com-ing to medicine. These changes will arise fromthe work being done in basic biology in deter-mining the complete DNA sequence of boththe human organism as well as most major dis-ease-causing organisms. There is an excellentpaper in the IAAI-98 proceedings by Rick Lath-rop and coworkers (Lathrop et al. 1998) that isan example of the opportunities in linkingmolecular concepts with medical care and AIresearch.

Some Biology First, it is appropriate to givesome background about the genome sequenc-ing efforts. The entire development, structure,and function of an organism is specified by asequence of four DNA letters: A, T, C, and G arethe abbreviation of their chemical names

Articles

FALL 1999 71

Figure 2. The Guideline Interchange Format.This is a segment of a representation of the process of evaluating a breast mass.The language has a syntax for sequences of events, branches, and other struc-

tured metainformation about the process of evaluation.

Guideline the_breast_mass_guideline{ name = "Breast Mass Guideline";

authors = SEQUENCE 1 {"Max Borten, MD, JD";};eligibility_criteria = NULL;intention = "Evaluation of breast mass.";steps =

SEQUENCE 40{ (Branch_Step 1);

(Action_Step 101);(Action_Step 102);(Action_Step 103);(Synchronization_Step 1031);(Conditional_Step 104);(Conditional_Step 105);

Action_Step 102{ name = "Elicit Risk Factors for Breast Cancer in Personal History";

action = Action_Spec 102.1{ name = "Elicit Personal History";

description = "Elicit Personal History";patient_data = SEQUENCE

{ Patient_Data 102.1{ name = "Personal risk factors for breast cancer";type = "" ;possible_values = Sequence 0 {};temporal_constraint = "valid thru all time";didactics = SEQUENCE 0 {};};

};didactics = SEQUENCE 0 {};

};

subguideline = NULL;next_step = (Synchronization_Step 1031);didactics = SEQUENCE 0 {};}

Page 6: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

these 3 billion into subsegments for logisticalreasons, with an average length of 256 millionDNA letters. Genes are subsequences within thesequence of 3 billion that encode for particularfunctions or structures that exist in your body.

(energy). A human organism is specified bythree billion letters, arranged serially, that con-stitute its genome. With 2 bits per DNA letter, ittakes about 750 megabytes of data to specify ahuman. There are 23 chromosomes that divide

Articles

72 AI MAGAZINE

Figure 3. A Subset of the Semantic Net Created for the Unified Medical Language System (UMLS), in Which the Concept of Biological Function Is Specialized into Subsets.

The semantic network is used to organize about 500,000 concepts in the UMLS.

BiologicFunction

PathologicFunction

PhysiologicFunction

OrganismFunction

Organor

TissueFunction

CellFunction

MolecularFunction

MentalProcess

GeneticFunction

Mentalor

BehavioralDysfunction

NeoplasticProcess

Diseaseor

Syndrome

Cellor

MolecularDysfunction

ExperimentalModel

ofDisease

.

0 40020010050

∆∆

1000

2000

∆( )∆ ∆ ∆

100K

150K( )

•••

• • • •∆∆∆

•• •

∆∆∆∆∆∆•••

WhiteCellcount

• • •∆ ∆∆ ∆

Time (days)

Plateletcounts

PAZ protocol

M[0] M[1] M[2] M[3] M[0] M[0]

BMT

Expected CGVHD

Figure 4. Temporal Abstraction in a Medical Domain.Raw data are plotted over time at the bottom. External events and the abstractions computed from the data are plotted as intervalsabove the data. BMT = a bone-marrow transplantation event; PAZ = a therapy protocol for treating chronic graft-versus-host disease

(CGVHD), a complication of BMT; � � = event; • = platelet counts; ∆ = white cell counts; ∞ = context interval; �� = abstraction interval;M[n] = bone-marrow–toxicity grade n.

Page 7: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

There are about 100,000 genes within a humangenome. More than 99.9 percent of thegenome is identical for all humans. And so allthe diversity of human life is contained in the0.1 percent that is different. One of the humangenes encodes a channel that allows a chlorideion to pass from the outside of a cell to itsinside. This channel sometimes has a mutationthat leads to the disease cystic fibrosis. Anunderstanding of how the DNA letters differ inpatients with cystic fibrosis allows biologists tobegin to understand the mechanism of the dis-ease and ways to alter its course. The cysticfibrosis gene was isolated in a relatively expen-sive, focused manner before the genomesequencing project was under way. The logicbehind a genome project is to isolate all genesfor an organism and catalog them usingeconomies of scale. The primary genome pro-ject is the Human Genome Project, but thereare also genome projects for important modelorganisms and organisms that cause humandisease. The principal funding agencies for thegenome project are the National Institutes ofHealth (via the National Human GenomeResearch Institute, www.nhgri.nih.gov/) andthe Department of Energy (www.er.doe.gov/facepage/hug.htm).

Associated with the genome sequencingprojects are a number of other new technolo-gies in biology that will allow data to be col-lected on a large scale, never before possible.Soon it will be possible to assess the completeset of genes that are active in a cell and com-pare this set with the genes that are active in adiseased version of the same cell (Marshall andHodgson 1998). Thus, we can find out whichgenes are active in a normal prostate cell aswell as which genes are active in a prostatecancer cell. The differences are the obviousplaces to look for new treatments of prostatecancer. The differences may also provide newways to make a more sensitive and specificdiagnosis of prostate cancer in a patient. Final-ly, the differences may be used to determinethe likely prognosis of a particular prostatecancer, based on its constellation of genes, andwhether they are associated with an indolentor aggressive type of cancer.

Having defined all the genes in a biologicalsystem, there are incredible opportunities forinformation storage, retrieval, and analysistechnologies. First, the epidemiology of dis-ease will now have a molecular basis. We willtrack the spread of infections by identifyingthe unique sequences of the offending bacteriaand using this as a signature to follow itsspread through the population. For example,investigators have tracked the spread of tuber-

culosis with these technologies (Behr et al.1998; Blower, Small, and Hopewell 1996). Sec-ond, clinical trials will have patients who arestratified by the differences and similarities intheir genes. We will be able to relate particularclinical syndromes to particular treatmentsbased on a molecular understanding of thebasis of these syndromes. The diagnosis ofmany diseases will become a simple lookup inthe genome of the patient to see which variantis present (Winters et al. 1998). We will be ableto focus treatments using this information,once we have learned from the data the bestdrugs to use against different disease/gene vari-ations. Finally, we will have prognostic infor-mation beyond anything currently availablebecause we will have access to the full geneticendowment of a patient and, when relevant,the infectious pathogens causing disease. Insome cases, in fact, we may know decadesbefore a disease is evident that a patient is athigh risk for that disease. At this point, it isimportant to mention the ethical, social, andlegal issues associated with the HumanGenome Project. A certain fraction of theannual genome project budget is spent ongrants addressing these issues, including issuesof privacy, ethical use of medical information,patients rights to information, and the like(www.nhgri.nih.gov/About_NHGRI/Der/Elsi/).

What’s the status of the genome sequencingprojects? Although this is not AI per se, it isuseful to get a feeling for the amounts andtypes of data that are being generated. Consid-er the GENBANK database of DNA sequences(Benson et al. 1998). A recent release of thatdatabase contained 1.6 billion bases. (Remem-ber, there are 3 billion bases in a human). How-ever, this database contains DNA sequencesfrom all organisms, and not just humans. Fig-ure 5 shows the growth in the size of this data-base since its inception in 1982. All these dataare available on the World Wide Web, and oneof the remarkable aspects of the explosion ofbiological data is the ease in which it can beaccessed, and so it becomes something of aplayground for information scientists whoneed to test ideas and theories. Table 1 showsthe ranking of species in the DNA databases bythe values of sequenced bases for eachsequence. For example, we have roughly 700million bases of human genome sequence. Thehuman genome is currently scheduled to becompleted around 2003. Other organismsinclude important laboratory test organisms(for example, mouse, rat, or fruit fly) or humanpathogens (for example, the HIV virus, malar-ia, syphilis, or tuberculosis). One of the mostexciting challenges that arises as we learn the

Articles

FALL 1999 73

Page 8: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

ear sequences of the DNA and of protein mol-ecules (Durbin et al. 1998).

Second, the technologies for definingontologies, terminologies, and their logicalrelationships have been used to create formaltheories for areas within biology (Schulze-Kre-mer 1998).

Third, genetic algorithms and genetic pro-gramming have been used to create solutionsthat are in some cases superior to solutions cre-ated by hand (Koza 1994).

Fourth, neural networks have been used, asthey have in many other fields, to achieve clas-sification performance that is often quiteimpressive. The work in predicting aspects ofthree-dimensional structure from sequenceinformation alone has received considerableattention (Rost and Sander 1994).

Fifth, unsupervised cluster analysis of bio-logical sequences and structures, includingBayesian approaches, has been successful increating sensible categories within biologicaldata sets (States, Harris, and Hunter 1993).

Sixth, case-based reasoning has become veryimportant in areas that are still data poor. Forexample, information about the three-dimen-sional structure of biological molecules is stilllagging behind the associated DNA sequenceinformation. Thus, of the 100,000 proteins ina human, we only know the structure of about700 of them. These examples represent valu-able “cases” that are constantly being used toextrapolate new information about theremaining 99,000 proteins (Jones 1997).

Seventh, knowledge representation tech-niques have been used to represent the entireset of metabolic capabilities of the bacteriaEscherichia coli. The resulting ECOCYC knowl-edge base has been used to infer metaboliccapabilities and compare these capabilitiesacross organisms (Karp et al. 1998).

Eighth, new knowledge representation anddigital library techniques have been used torepresent the complete literature of a subdisci-pline within molecular biology using ontolo-gies for biological data, biological structure,and scientific publishing (Chen, Felciano, andAltman 1997). We have created a collaborativeresource, RIBOWEB, that allows scientists tointeract with these data and compute with itover the web.

Ninth, intelligent agents are being designedto assist biologists in understanding and min-ing the data that are accumulating from thenew high-throughput biological experiments.The National Center for Biotechnology Infor-mation (www.ncbi.nlm.nih.gov/) is the clear-ing house for many sources of useful biologicaldata. Their collection includes data about DNA

complete genetic background of bacteria is todevelop comparative methods for understand-ing how differences in genetic endowment cre-ate differences in preference for mode of infec-tion, type of infection, and virulence. Table 2shows some organisms whose genomes arecompletely known.

An international society (The InternationalSociety for Computational Biology,www.iscb.org/) has been formed to create acommunity for researchers in biocomputing.In many ways, it is a spin-off from the AAAIcommunity. A conference entitled IntelligentSystems for Molecular Biology was first held in1993 in association with AAAI (//ismb99.gmd.de/). It subsequently was disconnected fromAAAI for scheduling reasons but has beenremarkably successful. The proceeds from thismeeting provided the core funds to spawn thesociety. The name of the field (computationalbiology, bioinformatics, biocomputing, intelli-gent systems for molecular biology) has beenthe matter of some debate and often reflectsthe disciplinary training of the debate partici-pants (Altman 1998).

AI Contributions to Molecular Medi-cine I would like to briefly review some ofthe successes in the AI and molecular biologycommunity. In many cases, existing technolo-gy has been transferred effectively, or new vari-ants have been created in response to theneeds of the field.

First, Hidden Markov Models, developedoriginally for natural language processing,have become a powerful tool for analyzing lin-

Articles

74 AI MAGAZINE

Size of GENBANK DNA Database

0

500000000

1000000000

1500000000

2000000000

2500000000

Dec-8

2

May

-86

Jun-8

8

Jun-9

0

Sep-

92

Apr-9

4

Oct-9

5

Apr-9

7

Oct-9

8

GENBANK Release Date

DN

A B

ases

in

GEN

BA

NK

Figure 5. The Rapid Increase in Genetic Data Contained in a Major DNA Sequence Database.

Page 9: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

Articles

FALL 1999 75

Genes DNA Bases Species (Common Name)

1573906 1000128755 Homo sapiens (human)403552 191558011 Mus musculus (mouse)76540 142527757 Caenorhabditis elegans (soil nematode)71079 78218600 Arabidopsis thaliana56199 63799526 Drosophila melanogaster (fruit fly)10581 28685645 Saccharomyces cerevisiae (baker’s yeast)45849 28537572 Rattus norvegicus (rat)4953 18023376 Escherichia coli41866 17672014 Rattus sp.32190 16498151 Fugu rubripes (puffer fish)36345 16196521 Oryza sativa (rice)9610 12068959 Schizosaccharomyces pombe25383 11280798 Human immunodeficiency virus type 1 (HIV)1094 9985595 Bacillus subtilis4734 7009140 Plasmodium falciparum (malaria)16688 6331052 Brugia malayi (filariasis)5379 5922144 Gallus gallus (chicken)685 5711838 Mycobacterium tuberculosis (tuberculosis)5136 4648144 Bos taurus (cow)10847 4413291 Toxoplasma gondii (toxoplasmosis)

Aquifex aeolicus (bacteria that grows at 85° to 95° C!)Archaeoglobus fulgidus (bacteria that metabolizes sulfur, lives at high temperatures)Bacillus subtilis (ubiquitous soil bacteria)Borrelia burgdorferi (causes Lyme Disease)Chlamydia trachomatis (causes blindness in developing countries)Escherichia coli (can cause urinary tract infections, dysentery)Haemophilus influenzae (causes upper respiratory infections)Methanobacterium thermoautotrophicum (bacteria that produces methane, lives at 70° C)Helicobacter pylori (causes ulcers, maybe cancer)Methanococcus jannaschii (bacteria that produces methane)Mycobacterium tuberculosis (causes tuberculosis)Mycoplasma genitalium (smallest genome of known independent organisms)Mycoplasma pneumoniae (causes “walking pneumonia”)Pyrococus horikoshii (grows best at 98° C!)Saccharomyces cerevisiae (baker’s yeast)Treponema pallidum (causes syphillis)

Table 1. Number of Genes and Total Bases of DNA Sequenced for Various Organisms as of 10/98.

Table 2. Some Completed, Fully Sequenced Genomes.

Page 10: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

mission of Mycobacterium Tuberculosis.American Journal of Respiratory and CriticalCare Medicine 158(2): 465-469.

Benson, D. A.; Boguski, M. S.; Lipman, D. J.;Ostell, J.; and Ouellette, B. F. 1998. GENBANK.Nucleic Acids Research 26(1): 1–7.

Berner, E. S.; Jackson, J. R.; and Algina, J.1996. Relationships among PerformanceScores of Four Diagnostic Decision SupportSystems. Journal of the American MedicalInformatics Association 3(3): 208–215.

Blower, S. M.; Small, P. M.; and Hopewell, P.C. 1996. Control Strategies for TuberculosisEpidemics: New Models for Old Problems.Science 273(5274): 497–500.

Bodenreider, O.; Burgun, A.; Botti, G.;Fieschi, M.; LeBeux, P.; and Kohler, F. 1998.Evaluation of the Unified Medical Lan-guage System as a Medical KnowledgeSource. Journal of the American MedicalInformatics Association 5(1): 76–87.

Bouhaddou, O.; Lambert, J. G.; and Mor-gan, G. E. 1995. Iliad and the MedicalHouse Call: Evaluating the Impact of Com-mon Sense Knowledge on the DiagnosticAccuracy of a Medical Expert System. InProceedings of the Nineteenth AnnualSymposium on Computer Applications inMedical Care, 742–746. Salt Lake City,Utah: Applied Medical Informatics.

Buchanan, B., and Shortliffe, E., eds. 1984.Rule-Based Expert Systems: The MYCIN Experi-ments of the Stanford Heuristics ProgrammingProject. Menlo Park, Calif.: Addison-Wesley.

Chen, R. O.; Felciano, R.; and Altman, R. B.1997. RIBOWEB: Linking Structural Compu-tations to a Knowledge Base of PublishedExperimental Data. Intelligent Systems forMolecular Biology 5:84–87.

Cote, R. A., and Robboy, S. 1980. Progressin Medical Information Management. Sys-tematized Nomenclature of Medicine(SNOMED). Journal of the American MedicalAssociation 243(8): 756–762.

Dan, Q., and Dudeck, J. 1992. CertaintyFactor Theory and Its Implementation in aMedical Expert System Shell. Medical Infor-matics (London) 17(2): 87–103.

de Dombal, F. T.; Leaper, D. J.; Staniland, J.R.; McCann, A. P.; and Horrocks, J. C. 1972.Computer-Aided Diagnosis of AcuteAbdominal Pain. British Medical Journal2(5804): 9–13.

Department of Energy. 1996. HumanGenome Project. To Know Ourselves. Avail-able at www.ornl.gov/hgmis/tko/. Alsoavailable at www.ornl.gov/TechResources/Human_Genome/publicat/primer/intro.html.

Detsky, A. S., and Naglie, I. G. 1990. A Clin-ician’s Guide to Cost-Effectiveness Analysis[see comments]. Annals of Internal Medicine113(2): 147–154.

7. We need systems to provideand document continuing educa-tion for physicians.

Evaluation Challenges

8. We need demonstrations of thecost-effectiveness of advancedinformation technology.

9. We need to create new medicalknowledge with machine-learn-ing and/or data-mining tech-niques. Having established thedata infrastructure for clinicaldata and biological data, therewill be unprecedented opportuni-ties for gaining new knowledge.

10. Finally, we need to ensurethat there is equitable access tothese technologies across patientand provider populations.

ConclusionsThe Era of Diagnosis got things rollingand created excitement as existinginferencing strategies were tested inthe real-world application domain ofmedicine. The current Era of ChronicDisease and Managed Care haschanged the focus of our efforts. Thecoming Era of Molecular Medicinecontains challenges that can keepinformation technologists busy fordecades.

AcknowledgmentsRBA is supported by grants from NIH(LM-05652, LM-06422), NSF (DBI-9600637, agreement ACI-9619020), aFaculty Scholar grant from IBM, and agift from Sun Microsystems Inc.

ReferencesAltman, R. 1998. A Curriculum for Bioin-formatics: The Time Is Ripe. Bioinformatics14(8): 549–550.

Altman, R. B. 1997. Informatics in the Careof Patients: Ten Notable Challenges [seecomments]. Western Journal of Medicine166(2): 118–122.

Barnett, G. O.; Cimino, J. J.; Hupp, J. A.;and Hoffer, E. P. 1987. DXPLAIN. An EvolvingDiagnostic Decision-Support System. Jour-nal of the American Medical Association258(1): 67–74.

Behr, M. A.; Hopewell, P. C.; Paz, E. A.;Kawamura, L. M.; Schecter, G. F.; and Small,P. M. 1998. Predictive Value of ContactInvestigation for Identifying Recent Trans-

sequences, human genetic diseases,protein and nucleic acid structures,the biomedical literature, and otherimportant sources. The good news isthat this information is easily avail-able. The bad news is that it is not yetintegrated in a manner that allowsrapid discovery and inferencing. Thus,a biologist with lots of time can even-tually find the answer to a questionusing these databases, but they havenot yet been distilled to the pointwhere a physician in a clinical practicecan use the data to guide clinical deci-sion making. The intelligent integra-tion of biological information thusbecomes one of the major bottlenecksin progress for information processing(Markowitz and Ritter 1995).

Ten Challenges I want to end mypresentation with the 10 grand chal-lenges to medicine I have previouslyproposed (Altman 1997). These can bedivided into infrastructure, perfor-mance, and evaluation goals and aresummarized here:

Infrastructure Challenges

1. We need an electronic medicalrecord based on semanticallyclean knowledge representationtechniques.

2. We need automated capture ofclinical data, from the speech,natural language, or structuredentry, in order to provide the datarequired to move forward.

3. We need computable represen-tations of the literature. Bothclinical and basic biology datashould be structured and univer-sally accessible for automateddata analysis.

Performance Challenges

4. We still need to do automateddiagnosis. Despite the passing ofits era, it is still worth under-standing because there are timesit is useful.

5. We need automated decisionsupport for providers who inter-act with patients episodically andneed help in making decisionsabout the treatment trajectory.

6. We need systems for improvingaccess to information and expla-nation for patients.

Articles

76 AI MAGAZINE

Page 11: AI Magazine Volume 20 Number 3 (1999) (© AAAI) AI in Medicine

Durbin, R.; Eddy, S.; Krogh, A.; and Mitchi-son, G. 1998. Biological Sequence Analysis.Cambridge, U.K.: Cambridge UniversityPress.

Feigenbaum, E. A. 1984. Knowledge Engi-neering: The Applied Side of Artificial Intel-ligence. Annals of the New York Academy ofSciences 246:91–107.

Feldman, M. J., and Barnett, G. O. 1991. AnApproach to Evaluating the Accuracy ofDXPLAIN. Computer Methods and Programs inBiomedicine 35(4): 261–266.

Heckerman, D. E.; Horvitz, E. J.; and Nath-wani, B. N. 1992. Toward Normative ExpertSystems: Part I. The PATHFINDER Project.Methods of Information in Medicine 31(2):90–105.

Jones, D. T. 1997. Progress in Protein Struc-ture Prediction. Current Opinion in Structur-al Biology 7(3): 377–387.

Karp, P. D.; Riley, M.; Paley, S. M.; Pellegri-ni-Toole, A.; and Krummenacker, M. 1998.ECOCYC: Encyclopedia of Escherichia coliGenes and Metabolism. Nucleic AcidsResearch 26(1): 50–53.

Koza, J. R. 1994. Evolution of a ComputerProgram for Classifying Protein Segmentsas Transmembrane Domains Using GeneticProgramming. Intelligent Systems for Molec-ular Biology 2:244–252.

Kulikowski, C., and Weiss, S. 1982. Repre-sentation of Expert Knowledge for Consul-tation: The CASNET and EXPERT Projects. InArtificial Intelligence in Medicine, ed. P.Szolovits. Boulder, Colo.: Westview.

Lathrop, R. H.; Steffen, N. R.; Raphael, M.;Deeds-Rubin, S.; Pazzani, M. J.; Cimoch, P.J.; See, D. M.; and Tilles, J. G. 1998. Knowl-edge-Based Avoidance of Drug-ResistantHIV Mutants. In Proceedings of the TenthAnnual Innovative Applications of Artifi-cial Intelligence Conference, 1071–1078.Menlo Park, Calif.: American Associationfor Artificial Intelligence.

Ledley, R. S., and Lusted, L. B. 1959. Rea-soning Foundation of Medical Diagnosis:Symbolic Logic, Probability, and Value The-ory Aid Our Understanding of How Physi-cians Reason. Science 130(9): 9–12.

Markowitz, V. M., and Ritter, O. 1995.Characterizing Heterogeneous MolecularBiology Database Systems. Journal of Com-putational Biology 2(4): 547–556.

Marshall, A., and Hodgson, J. 1998. DNAChips: An Array of Possibilities. NatureBiotechnology 16(1): 27–31.

Middleton, B.; Shwe, M. A.; Heckerman, D.E.; Henrion, M.; Horvitz, E. J.; Lehmann, H.P.; and Cooper, G. F. 1991. ProbabilisticDiagnosis Using a Reformulation of theINTERNIST-1/QMR Knowledge Base. II. Evalua-tion of Diagnostic Performance. Methods ofInformation in Medicine 30(4): 256–267.

Miller, R.; Masarie, F.; and Myers, J. 1986.Quick Medical Reference (QMR) for Diag-nostic Assistance. MD Computing 3(5):34–48.

Miller, R. A.; Pople, H. E.; and Myers, J. D.1982. INTERNIST-1: An Experimental Com-puter-Based Diagnostic Consultant forGeneral Internal Medicine. New EnglandJournal of Medicine 307(8): 468–476.

Ohno-Machado, L.; Gennari, J. H.; Mur-phy, S. N.; Jain, N. L.; Tu, S. W.; Oliver, D.E.; Pattison-Gordon, E.; Greenes, R. A.;Shortliffe, E. H.; and Barnett, G. O. 1998.The Guideline Interchange Format: A Mod-el for Representing Guidelines. Journal ofthe American Medical Informatics Association5(4): 357–372.

Rost, B., and Sander, C. 1994. CombiningEvolutionary Information and Neural Net-works to Predict Protein Secondary Struc-ture. Proteins 19(1): 55–72.

Schulze-Kremer, S. 1998. Ontologies forMolecular Biology. Paper presented at thePacific Symposium on Biocomputing, 4–9January, Maui, Hawaii.

Selby, J. V. 1997. Linking Automated Data-bases for Research in Managed Care Set-tings. Annals of Internal Medicine 127(8 Pt2): 719–724.

Shahar, Y.; Miksch, S.; and Johnson, P.1997. A Task-Specific Ontology for theApplication and Critiquing of Time-Orient-ed Clinical Guidelines. In Proceedings of theSixth Conference on Artificial Intelligence inMedicine, ed. E. Keravnou, 51–61. New York:Springer-Verlag.

Shahar, Y.; Miksch, S.; and Johnson, P.1998. The ASGAARD Project: A Task-SpecificFramework for the Application and Cri-tiquing of Time-Oriented Clinical Guide-lines [in process citation]. Artificial Intelli-gence in Medicine 14(1–2): 29–51.

Shwe, M. A.; Middleton, B.; Heckerman, D.E.; Henrion, M.; Horvitz, E. J.; Lehmann, H.P.; and Cooper, G. F. 1991. ProbabilisticDiagnosis Using a Reformulation of theINTERNIST-1/QMR Knowledge Base. I. TheProbabilistic Model and Inference Algo-rithms. Methods of Information in Medicine30(4): 241–255.

Slee, V. N. 1978. The International Classifi-cation of Diseases: Ninth Revision (ICD-9)[Editorial]. Annals of Internal Medicine 88(3):424–426.

States, D. J.; Harris, N. L.; and Hunter, L.1993. Computationally Efficient ClusterRepresentation in Molecular SequenceMegaclassification. Intelligent Systems forMolecular Biology 1:387–94.

Szolovits, P., and Pauker, S. G. 1976.Research on a Medical Consultation Systemfor Taking the Present Illness. In Proceed-ings of the Third Illinois Conference on

Medical Information Systems, 299–320.Chicago: University of Illinois at Chicago.

Tufo, H. M.; Bouchard, R. E.; Rubin, A. S.;Twitchell, J. C.; VanBuren, H. C.; Weed, L.B.; and Rothwell, M. 1977. Problem-Orient-ed Approach to Practice. I. EconomicImpact. Journal of the American MedicalAssociation 238(5): 414–417.

Winters, M. A.; Coolley, K. L.; Girard, Y. A.;Levee, D. J.; Hamdan, H.; Shafer, R. W.;Katzenstein, D. A.; and Merigan, T. C. 1998.A 6-Basepair Insert in the Reverse Transcrip-tase Gene of Human ImmunodeficiencyVirus Type 1 Confers Resistance to MultipleNucleoside Inhibitors [in process citation].Journal of Clinical Investigation 102(10):1769–1775.

Wolfram, D. A. 1995. An Appraisal ofINTERNIST-1. Artificial Intelligence in Medicine7(2): 93–116.

Russ Altman is an assis-tant professor of medi-cine (medical informat-ics, and computerscience by courtesy) atStanford University. Heis interested in the appli-cation of advanced com-putational techniques to

problems in biomedicine and molecularbiology. He has an AB from Harvard Uni-versity in biochemistry and molecular biol-ogy, a Ph.D. from Stanford University inmedical information sciences, and an MDfrom Stanford. He is a member of the Asso-ciation of Computing Machinery, theAmerican College of Physicians, the Amer-ican Association for Artificial Intelligence,and the Institute for Electrical and Elec-tronics Engineers and is a founding mem-ber of the International Society for Compu-tational Biology. He is a fellow of theAmerican College of Medical Informatics.He is a past recipient of a U.S. PresidentialEarly Career Award for Scientists and Engi-neers and a National Science FoundationCAREER Award. His web page iswww.smi.stanford.edu/people/altman/. Hise-mail is [email protected].

Articles

FALL 1999 77