
Characterizing Diagnostic Search Patterns in Digital Breast Pathology: Scanners and Drillers

Ezgi Mercan 1 & Linda G. Shapiro 1 & Tad T. Brunyé 2 & Donald L. Weaver 3 & Joann G. Elmore 4

© Society for Imaging Informatics in Medicine 2017

Abstract  Following a baseline demographic survey, 87 pathologists interpreted 240 digital whole slide images of breast biopsy specimens representing a range of diagnostic categories from benign to atypia, ductal carcinoma in situ, and invasive cancer. A web-based viewer recorded pathologists’ behaviors while interpreting a subset of 60 randomly selected and randomly ordered slides. To characterize diagnostic search patterns, we used the viewport location, time stamp, and zoom level data to calculate four variables: average zoom level, maximum zoom level, zoom level variance, and scanning percentage. Two distinct search strategies were confirmed: scanning is characterized by panning at a constant zoom level, while drilling involves zooming in and out at various locations. Statistical analysis was applied to examine the associations of different visual interpretive strategies with pathologist characteristics, diagnostic accuracy, and efficiency. We found that females scanned more than males, and age was positively correlated with scanning percentage, while facility size was negatively correlated. Throughout the 60 cases, the scanning percentage and total interpretation time per slide decreased, and these two variables were positively correlated. The scanning percentage was not predictive of diagnostic accuracy. Increasing average zoom level, maximum zoom level, and zoom variance were correlated with over-interpretation.

Keywords  Digital pathology · Diagnostic decision-making · Breast cancer · Breast histopathology · Whole slide imaging · Diagnostic interpretation

Background and Significance

Digital imaging technologies have revolutionized clinical medicine, particularly within diagnostic radiology. In pathology, digital whole slide images (WSIs) are well-established; have proven efficient and reliable for research [1], education [2–5], and archiving [6]; and are now being utilized in pathologic diagnosis [7, 8]. Although they are not approved by the US FDA for primary pathologic diagnosis, digital WSIs are increasingly used to obtain second opinions remotely. In addition to their advantages in clinical settings, the use of computers to interpret digital WSIs provides a unique opportunity to study pathologists’ viewing behaviors and better understand how their interpretive strategies relate to diagnostic accuracy and efficiency.

Pathologic diagnosis is a complex process characterized by visual search and interpretation strategies. Previous research concerning the visual search patterns of physicians has focused on volumetric lung images [9–11], mammography [12], and breast pathology [13, 14]. The method of investigation has usually included eye tracking or video recordings of physicians interpreting medical images in a setting controlled by the experimenter. Three outcomes from published research are relevant to the present study. First, physicians reviewing medical images tend to adopt one of two search strategies: drilling versus scanning. Drilling involves restricting a search to a region of interest and zooming in to high magnification levels. Conversely, scanning involves maintaining a particular zoom level while searching relatively broad regions of interest [11]. Second, search strategies change as a function of acquired experience in an expert domain [9, 15] and prior experience with novel review formats [16]. Third, certain visual search strategies have been associated with greater diagnostic accuracy and efficiency. In radiology, physicians who use a drilling search pattern tend to show higher accuracy and efficiency when detecting lung nodules in volumetric images [11, 17], though no research has explored drilling and scanning strategies by pathologists reviewing non-volumetric images.

Electronic supplementary material  The online version of this article (doi:10.1007/s10278-017-9990-5) contains supplementary material, which is available to authorized users.

* Ezgi Mercan, [email protected]

1 Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
2 Department of Psychology, Tufts University, Medford, MA, USA
3 Department of Pathology and UVM Cancer Center, University of Vermont, Burlington, VT, USA
4 Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA

J Digit Imaging, DOI 10.1007/s10278-017-9990-5

To address this knowledge gap, our study attempts to provide an initial understanding of the interpretative strategies pathologists use when reviewing digital slides of breast biopsy specimens. In this study, we investigated three aims. First, we considered how various pathologist characteristics are associated with the two image review strategies (drilling and scanning) identified in the extant cognitive science literature [11]. Second, we tracked how these image review strategies may change as pathologists gain experience with the digital imaging format. Finally, we examined the extent to which each interpretive strategy is associated with diagnostic accuracy and efficiency.

While digital slides are becoming a powerful adjunct tool for breast pathology, understanding the diagnostic processes used by pathologists as they interpret cases may provide insight to improve the education and training of pathologists and lead to the development of computational tools that can aid in the diagnostic decision-making process.

Materials and Methods

Data were collected as part of the Breast Pathology (B-Path) and Digital Pathology (digiPATH) studies. The detailed explanation of methods used for test case development and recruitment of participant pathologists has been previously described [18, 19] and is briefly summarized below.

Case Selection

The 240 excisional (N = 102) and core (N = 138) breast biopsy specimens were selected from pathology registries in Vermont and New Hampshire using a random sampling stratified by woman’s age, breast density (N = 118 low density and N = 122 high density), and initial diagnosis. New glass slides were prepared from the selected tissue blocks.

The newly prepared glass slides were scanned at ×40 magnification (iScan Coreo, Ventana Medical Systems, Tucson, AZ, USA) to create digital WSIs, which were then reviewed by a research technician and a breast pathologist to ensure consistency and quality. A web-based digital viewer, which was developed specifically for this study, allowed users to pan the image and zoom in or out (up to ×40 actual and ×60 digital magnification), providing an interface similar to industry-sponsored WSI viewers but enhanced with study-specific data collection capabilities.

Expert Consensus Diagnosis

The digital WSIs were independently interpreted by three experienced breast pathologists to determine independent diagnoses and representative regions of interest (ROIs); these pathologists then established a consensus diagnosis for each case following a modified Delphi approach in subsequent webinars and in-person meetings [19, 20]. Cases spanned a wide range of diagnostic categories: benign without atypia (N = 60), atypia (N = 80), ductal carcinoma in situ (DCIS) (N = 78), and invasive cancer (N = 22). See Supplementary Table 1 for details.

Participants

More than 200 pathologists from across the USA (Alaska, Maine, Minnesota, New Hampshire, New Mexico, Oregon, Vermont, and Washington), who regularly interpret breast biopsy specimens in their clinical practices, were invited to participate in the study. Each participant completed a baseline survey that included demographic data, experience with breast pathology, and perceptions about breast cancer interpretation.

Each participant was randomly assigned to interpret the cases in glass or digital format. A small portion of the participants did not complete the study. In this work, we use the data collected from the 87 pathologists who were assigned to the digital format.

Data Collection on Interpretations

The 240 cases were arranged into four sets of 60 cases each that preserved the distribution of diagnostic categories and breast densities of the overall case set. Participants were randomly assigned to interpret one of the four test sets. The order of the 60 cases was randomized for each participant, and they interpreted each case independently, considering histopathological features and accompanying information regarding patient age and biopsy type. After viewing each case, participants were instructed to select all applicable diagnoses on an electronic histology form listing 14 possible diagnostic interpretations. The same categorical mapping scheme was used for participant diagnoses as was used for the expert consensus diagnoses (see Supplementary Table 1).


The study was conducted in two phases so that each participant interpreted the same test set twice, in glass slide format, in digital slide format, or both. The study is explained in detail in [21]. Participants were not informed that they were seeing the same cases in phase II, and the cases were presented in a different order for each participant and were randomly reordered in phase II.

Detailed tracking data were automatically logged by the web-based digital viewer. As pathologists navigated each slide, the viewer software logged their coordinate positions in the digital WSI, their magnification (zoom) levels, and time stamps.

Tracking data were collected only for those interpreting the cases in digital format in phase II. Half of the participants in phase II were then asked to electronically annotate the digital WSI with an ROI supporting the highest-order (most severe) diagnosis, while the other half were not asked to mark an ROI on the digital image. This was done to control for any potential impact of the ROI placement task on the diagnostic decision-making process. The participants randomized to mark the ROI used a tool in the web-based viewer to draw a rectangular ROI following their diagnostic interpretation. The relationship between ROI identification and diagnostic concordance was explored in [20].

Tracking Data Analysis

A viewport scene is the rectangular part of the image that is visible on the pathologist’s computer monitor at any time during an interpretation. The time spent on each viewport scene was calculated using logged timestamps. If an entry exceeded a total duration of 1 min, it was excluded under the assumption that the pathologist was not actively interpreting during that time. From the tracking logs, several variables were calculated to characterize the viewing behaviors of each participant, as described in the following sections.
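As a concrete illustration, the 1-min exclusion rule could be implemented as below; the tuple layout and field order are our assumptions for illustration, not the study's actual log schema:

```python
# Hypothetical log format (not the study's actual schema): each entry is
# (x, y, zoom, duration_seconds), where duration is the time the viewport
# scene stayed on screen, derived from consecutive timestamps.

def filter_active_entries(entries, max_duration=60.0):
    """Drop viewport scenes displayed longer than max_duration seconds,
    on the assumption that the pathologist was not actively interpreting."""
    return [e for e in entries if e[3] <= max_duration]

log = [(100, 200, 10, 4.2), (150, 220, 20, 75.0), (160, 230, 40, 2.1)]
active = filter_active_entries(log)  # the 75-second entry is excluded
```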

Average Zoom Level, Maximum Zoom Level, and Zoom Level Variance  The web-based viewer allowed zoom levels from ×1 to ×60. For each interpretation, viewport tracking logs provided a variable number of zoom level values depending on pathologists’ interpretive behavior; for this reason, summary statistics were used to describe zoom level behavior during each interpretation. For each interpretation, we calculated the average zoom level by summing the zoom level values of all viewport scenes and dividing by the number of viewport scenes. Similarly, we calculated the maximum zoom level of each interpretation, and we used the standard deviation of the zoom level variable as the zoom level variance.
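The three zoom summary variables can be sketched as follows; this is a minimal illustration with an assumed input format (one zoom value per logged viewport scene), not the study's implementation:

```python
import statistics

def zoom_summary(zoom_levels):
    """Summary statistics over one interpretation's zoom-level trace.
    Note: the study labels the standard deviation of the zoom levels
    'zoom level variance'."""
    return {
        "average_zoom": sum(zoom_levels) / len(zoom_levels),
        "max_zoom": max(zoom_levels),
        "zoom_variance": statistics.stdev(zoom_levels),
    }

# Illustrative trace: one zoom value per logged viewport scene.
trace = [1, 4, 10, 10, 40, 20, 4]
summary = zoom_summary(trace)  # average ~12.7, maximum 40
```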

Scanning Percentage  We quantified scanning behavior by calculating the percentage of log entries associated with panning behavior (i.e., changing viewport scene coordinates) in each interpretation. Unlike average zoom level, maximum zoom level, and zoom level variance, scanning percentage considers the changes of zoom level in consecutive log entries, regardless of the zoom level itself. In other words, scanning percentage quantifies a behavior that can manifest at different zoom levels. Scanning percentage approaches 100% when the pathologist pans across different areas of the digital image at a constant zoom, and it approaches 0% when the pathologist zooms in and out at different locations, with little panning or only infrequent but long-distance pans at a low magnification. For analysis, the scanning percentages were grouped into five categories (0–20, 20–40, 40–60, 60–80, and 80–100%).
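One plausible reading of this definition treats each transition between consecutive log entries as a pan when the position changes at a constant zoom level; the sketch below follows that reading (the entry format and the transition-based counting are our assumptions):

```python
def scanning_percentage(entries):
    """Percentage of consecutive viewport transitions that are pans
    (position changes at a constant zoom level).
    Assumed entry format: (x, y, zoom)."""
    transitions = list(zip(entries, entries[1:]))
    if not transitions:
        return 0.0
    pans = sum(
        1 for (x0, y0, z0), (x1, y1, z1) in transitions
        if z0 == z1 and (x0, y0) != (x1, y1)
    )
    return 100.0 * pans / len(transitions)

def scanning_category(pct):
    """Bin a scanning percentage into the study's five categories."""
    bins = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]
    return bins[min(int(pct // 20), 4)]

viewports = [(0, 0, 1), (100, 0, 1), (200, 0, 1), (200, 0, 10), (210, 5, 10)]
# Transitions: pan, pan, zoom change, pan -> 3 of 4 are pans.
pct = scanning_percentage(viewports)
```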

Analysis

To assess how pathologist demographics influenced interpretive strategy, we modeled our data using repeated-measures regressions, implementing the generalized estimating equation (GEE) approach. The model included ten categorical predictors (factors), as detailed in Table 1, and used scanning percentage as a linear dependent variable (outcome).

To assess how case order within each set of 60 cases influenced viewing behaviors, we again modeled our data using repeated-measures regressions, implementing the GEE approach. We implemented two models, both including interpretation order as the continuous predictor. Each model used a linear dependent variable (outcome): scanning percentage for the first model and total interpretation time per case for the second model.

To assess how interpretive strategy influenced diagnostic outcome, we conducted four separate repeated-measures analyses of variance (ANOVA) with the four variables that describe interpretative behavior. Each model included one of the four continuous variables (average zoom level, maximum zoom level, zoom level variance, or scanning percentage) and one of three categorical dependent variables for diagnostic outcome (over-interpretation compared to the expert consensus diagnosis, concordance with the expert consensus diagnosis, and under-interpretation compared to the expert consensus diagnosis). To assess the effect of interpretative behaviors on diagnostic efficiency, we used a repeated-measures ANOVA with a continuous dependent variable (time) and one of five independent categorical variables (scanning percentages 0–20, 20–40, 40–60, 60–80, or 80–100%).


Results

Viewport tracking data from 87 pathologists, who completed 60 cases in the digital format, were analyzed, producing a total of 5220 interpretations and approximately 1.03 million viewport log entries. Nine hundred seven entries were excluded because they exceeded 1 min in total duration.

Tracking logs were visualized and analyzed to summarize the interpretive strategy of each pathologist. Figure 1 contrasts visualizations representing two different pathologists. The pathologist represented on the left, a scanner, chose a consistent zoom level and systematically panned to investigate the whole image. The scanner pathologist used the same zoom level on the majority of their cases. In contrast, the pathologist represented on the right, a driller, zoomed out periodically, selected a new area to view, then zoomed in again. The driller pathologist zoomed in and out on different regions throughout their interpretations. It could be argued that the driller scanned the image with eye movements (rather than screen pans) at a lower resolution to determine areas for drilling. Some of the scanning versus drilling strategies may reflect the pathologist’s comfort level when scanning with eye movements at lower magnifications. The scanning percentage for the visualization on the left is close to 100%, while it is closer to 0% for the visualization on the right.

Table 1  Characteristics and average scanning percentages of pathologists (N = 87)

Variable                                              N (%)      Avg. scanning %   p value   Wald chi square
Age at survey (years)                                                              0.041     8.251
  30–39                                               10 (11%)   69
  40–49                                               25 (29%)   77
  50–59                                               36 (41%)   75
  60+                                                 16 (18%)   70
Gender                                                                             0.035     4.439
  Male                                                57 (66%)   70
  Female                                              30 (34%)   82
Affiliation with academic medical center                                           0.642     0.216
  Yes                                                 19 (22%)   77
  No                                                  68 (78%)   73
Facility size                                                                      0.019     5.484
  <10 pathologists                                    55 (63%)   76
  ≥10 pathologists                                    32 (37%)   69
Fellowship training in surgical or breast pathology                                0.076     3.141
  No                                                  41 (47%)   75
  Yes                                                 46 (53%)   73
Do your colleagues consider you an expert in breast pathology?                     0.103     2.666
  No                                                  70 (80%)   73
  Yes                                                 17 (20%)   79
Breast pathology experience (years)                                                0.073     3.210
  <20                                                 65 (75%)   76
  ≥20                                                 22 (25%)   68
Number of breast cases per week                                                    0.490     1.426
  <5                                                  19 (22%)   73
  5–9                                                 36 (41%)   75
  ≥10                                                 32 (37%)   72
Marked an ROI                                                                      0.565     0.330
  Yes                                                 44 (51%)   71
  No                                                  43 (49%)   77
How confident are you in your assessments of breast cases?                         0.100     7.783
  1 (very confident)                                  13 (15%)   67
  2                                                   43 (49%)   75
  3                                                   21 (24%)   75
  4                                                   8 (9%)     77
  5 (not confident at all)                            2 (2%)     83


Pathologist Demographics and Viewing Behaviors

Overall, pathologists tended to show scanning percentages exceeding 50% (μ = 74%, σ = 16%), demonstrating a disproportionate trend toward scanning rather than drilling. This pattern was confirmed with a one-sample t test comparing to 50%, t(86) = 13.53, p < 0.001. However, this pattern also varied significantly as a function of certain pathologist demographics.
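For reference, the one-sample t statistic follows the standard formula t = (x̄ − μ₀)/(s/√n); a minimal sketch using the rounded summary values from the text:

```python
import math

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """One-sample t statistic for a sample mean against a null mean mu0."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Rounded summary values from the text: mean 74%, SD 16%, n = 87,
# tested against the 50% null. The result comes out near 14; the
# reported t(86) = 13.53 presumably reflects unrounded inputs.
t = one_sample_t(74, 16, 87, 50)
```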

The GEE model goodness of fit was 1140.98 (QIC), with three significant main effects. First, age positively predicted scanning percentage (χ² = 8.25, p < 0.05), with higher age groups showing increasingly higher scanning percentages. Second, there were higher scanning percentages among female versus male pathologists (χ² = 4.44, p < 0.05). Finally, facility group size negatively predicted scanning percentage (χ² = 5.48, p < 0.05), with pathologists working in larger facility groups showing lower scanning percentages. No other patterns reached traditional (α = 0.05) significance levels.

Interpretation Order

The GEE model showed a significant negative relationship between case position and scanning percentage (χ² = 16.01, p < 0.001), with scanning percentage decreasing over the course of the 60 cases (see Fig. 2). The total time spent on the interpretation of each case also decreased on average with interpretation order: participants interpreted later cases in less time compared to earlier cases (χ² = 67.36, p < 0.001). In a previous study, we found that diagnostic concordance with the expert panel does not change significantly over the 60 cases interpreted digitally [21].

Fig. 1  Visualization of viewport tracking logs for a scanner (left) and a driller (right) on the same image. Each participant starts at the center of the image with a zoom level of ×1. The rings indicate the center of each viewport, the size of a ring indicates the zoom level (the larger the ring, the lower the zoom level), the thickness of a ring indicates the time spent at that viewport, and the lines connect consecutive viewports

Fig. 2  Average scanning percentage of 87 pathologists during the interpretation of 60 test cases. The order of the 60 cases was randomized for each pathologist so that the nth case included a random sampling of cases from all diagnostic categories


Diagnostic Concordance with Expert Consensus Diagnosis

The mean values of the average zoom level, maximum zoom level, zoom level variance, and scanning percentage variables for interpretations are shown by expert consensus diagnosis and concordance with expert consensus diagnosis in Table 2. Supplementary Table 2 provides detailed results of the ANOVA tests.

Over-interpretation was associated with increased drilling (higher average zoom level, maximum zoom level, and zoom level variance). Relative to concordant interpretations, average zoom level, maximum zoom level, and zoom level variance were higher for over-interpretations and lower for under-interpretations. This trend was replicated in benign, atypia, and invasive cases. For DCIS cases, both over-interpretation and under-interpretation were associated with higher zoom values. All associations except those for invasive cases were statistically significant.

No association was noted between scanning percentage and accuracy (Table 2). Supplementary Fig. 1 shows the average over-interpretation and under-interpretation rates within different scanning percentage groups.

Diagnostic Efficiency

Efficiency in arriving at an accurate diagnosis was negatively predicted by the extent to which pathologists followed a scanning strategy; in other words, higher scanning percentage was associated with lower efficiency. A repeated-measures ANOVA revealed a main effect of scanning percentage category, F(4, 52) = 6.72, p < 0.001, demonstrating significantly higher case review times as a function of increased scanning percentage. This pattern is depicted in Fig. 3. Follow-up paired t tests demonstrated significant differences between all pairwise category comparisons, with the exception of the first (0–20%) versus second (20–40%) categories and the fourth (60–80%) versus fifth (80–100%) categories. In contrast, rates of diagnostic concordance with the expert consensus diagnosis showed no significant difference across scanning percentage groups.

Table 2  Zoom and scanning variables by concordance with expert consensus diagnosis

Consensus   Concordance      No. of            Average     p        Maximum     p        Zoom level  p        Scanning    p
diagnosis   with consensus   interpretations   zoom level  value    zoom level  value    variance    value    percentage  value
All         Under            760               7.89        ≤0.001   24.93       ≤0.001   6.22        ≤0.001   75%         0.574
            Agree            3672              8.86                 27.29                6.98                 74%
            Over             788               9.94                 31.87                8.10                 73%
Benign      Under            –                 –           ≤0.001   –           ≤0.001   –           ≤0.001   –           0.278
            Agree            933               6.64                 22.10                5.28                 75%
            Over             348               9.71                 31.49                7.98                 72%
Atypia      Under            492               7.39        ≤0.001   23.21       ≤0.001   5.78        ≤0.001   76%         0.276
            Agree            882               8.88                 27.51                7.03                 72%
            Over             384               10.00                31.79                8.13                 73%
DCIS        Under            259               8.74        ≤0.001   28.14       0.001    7.03        0.003    73%         0.365
            Agree            1386              8.36                 27.66                6.88                 74%
            Over             56                10.95                34.82                8.64                 78%
Invasive    Under            9                 10.40       0.312    26.67       0.79     7.73        0.763    74%         0.652
            Agree            471               14.72                36.10                10.57                75%
            Over             –                 –                    –                    –                    –

Fig. 3  Average total time of interpretation in five categories of scanning percentage


Discussion

The field of pathology has begun adopting the digital WSI format as it offers great potential for teaching [2–5] and research [1], as well as for archival purposes [6] and gathering second opinions [7, 8]. To better understand the visual search patterns used in breast pathology, 87 pathologists across the USA interpreted 60 digital WSIs of breast biopsies representing a range of diagnostic categories, amounting to 5220 individual independent interpretations for analysis.

A web-based viewer tracked and recorded the interpretive behaviors of pathologists as they viewed each digital WSI. The viewer provided pathologists with two possible actions: zooming and panning. Zooming in to an area allowed pathologists to examine cytological, cellular, and nuclear structural details, revealing those that are not as visible to the human eye at lower magnification, but also limiting the portion of the whole slide image viewable on the screen. The panning action allowed pathologists to view neighboring areas of the whole slide image that were not viewable on the screen at higher magnifications.

Combinations of both actions were used by all pathologists to interpret the digital WSIs, but interpretive patterns emerged when we analyzed the tracking logs. Specifically, we found that participants varied in their extent of panning and zooming behaviors over time and across cases. Drilling behavior showed a relative tendency to zoom in on a particular region, use panning actions sparingly to examine that region, and then zoom out to a lower magnification. In contrast, scanning behavior showed a relative tendency to use panning actions to systematically explore the complete image at a constant, and relatively low, magnification. We conceptualize drilling and scanning as two complementary strategies falling at the ends of a bipolar continuum. To quantify image review behavior along this continuum, we calculated the proportion of case review behavior indicative of scanning (i.e., scanning percentage). We wanted to explore potential explanations for the interpretative strategies through their correlation with diagnostic accuracy and efficiency, as well as determining whether these patterns change over time.

A number of pathologist demographic characteristics were associated with changes in scanning percentage, including age, gender, and facility size. Higher age was positively correlated with increased scanning percentage, females scanned more than males, and pathologists from smaller facilities had higher scanning percentages. It may be that younger participants had relatively more prior experience with similar computer interfaces or image manipulation tools (e.g., mapping software, digital slide viewers, image editing software), thereby making them more comfortable with image drilling behaviors [22]. Although not a statistically significant trend, pathologists with higher scanning percentages also reported lower baseline confidence in their breast pathology skills. This finding suggests that increased scanning may be related to personality-level (e.g., neuroticism [23]) and/or situation-level (e.g., anxiety [24]) factors. The scanning percentage and total time per slide decreased as pathologists gained experience throughout the set of 60 cases. This suggests a learning curve in which participants who started with a scanning-based strategy adopted a more hybrid approach of scanning and drilling as they interpreted the digital images. This learning curve may be due to prior inexperience with digital slides and computer-based viewing systems that pathologists began to overcome through their experience in this study. Previous research shows a learning curve for interpreting mammograms before and after residency, suggesting a correlation between interpretive behavior and experience [25]. The participants who marked an ROI at the end of their interpretations had lower scanning percentages than those who did not mark an ROI, but the difference was not statistically significant. In other words, we could not find a link between the additional task of looking for a region of interest and the visual search strategy of the participants.

We also noted a pattern of over-interpretation at higher zoom levels. For all diagnostic categories except invasive cancer, the cases that were over-interpreted relative to the expert consensus diagnosis had higher values of average zoom level, maximum zoom level, and zoom level variance. This relationship aligns with some research in the cognitive science and visual search literature. Specifically, when observers repeatedly examine a visual scene in detail, the probability of making an erroneous “guess” increases [26]. These inaccurate interpretations likely result from a failed match between perceived image features and stored histopathological features in memory.

To analyze the association of scanning with accuracy and total interpretation time, we divided image interpretations into five categories based on scanning percentage. As scanning percentage increased, so did interpretation time, though rates of over- and under-interpretation were not affected. Scanning was thus a less efficient strategy for diagnostic interpretation, and the learning-curve results indicated that pathologists adopted a more balanced and efficient strategy as they progressed through the set of 60 cases.
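As context for these comparisons, the four viewport-derived variables (average zoom level, maximum zoom level, zoom level variance, and scanning percentage) could be computed from a viewer's event log roughly as follows. This is a minimal sketch, assuming a simple (timestamp, zoom) record per viewport event and treating any viewport change at an unchanged zoom level as a pan; it is not the authors' exact definition.

```python
import statistics

def interpretation_stats(events):
    """Summarize one slide interpretation from viewer log records.

    events: chronological list of (timestamp, zoom) viewport records.
    Returns (average zoom, maximum zoom, zoom variance, scanning %).
    """
    zooms = [zoom for _, zoom in events]
    avg_zoom = statistics.mean(zooms)
    max_zoom = max(zooms)
    zoom_var = statistics.pvariance(zooms)
    # A viewport change at an unchanged zoom level is counted as a pan
    # (scanning); any change in zoom level counts as drilling behavior.
    moves = list(zip(zooms, zooms[1:]))
    pans = sum(1 for a, b in moves if a == b)
    scan_pct = 100.0 * pans / len(moves) if moves else 0.0
    return avg_zoom, max_zoom, zoom_var, scan_pct
```

Per-interpretation scanning percentages computed this way could then be binned into five categories (e.g., quintiles) for the efficiency analysis described above.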

There are a few reasons why scanning may prove a less efficient, and sometimes less effective [11], method for searching visual images. Scanning at a moderate magnification level involves constantly monitoring and updating past and current positions relative to the entire image space, when only small portions of the overall image can be seen at a time. As demonstrated in prior literature, this type of constant monitoring can be very demanding on working memory, particularly when it is done simultaneously with a more important (primary) task (i.e., identifying malignancy) [27, 28]. In contrast, drilling enables a pathologist to focus attention on a single well-defined region at a time: examining a single region of interest in great depth and detail, and then iteratively returning to low magnification and examining the next region. In this manner, the searcher need only remember which salient region(s) they have or have not already “drilled into,” which involves monitoring and updating only a representation of salient regions in the low-magnification space. The present results speak to the relative efficiency of drilling, suggesting support for this possibility; however, no research has specifically examined the relative memory cost of employing drilling versus scanning search strategies.

Recent research on volumetric lung images revealed that radiologists adopt distinct visual search strategies during interpretation [11]. Though this earlier research used eye tracking to monitor and interpret visual search patterns, our findings suggest that similar distinctions can be ascertained by recording zooming and panning behaviors. We expect that this is specifically the case with 2D digital pathology images. Indeed, these images require pathologists to zoom in and out dramatically in order to magnify breast tissue and reveal specific structural and cellular features. This process results in high-density zooming and panning data, which is likely uncharacteristic of viewing behavior with narrow slices of volumetric images. The unique characteristics of these breast biopsy digital WSIs may explain why our data did not suggest any influence of visual search strategy (i.e., scanning percentage) on diagnostic accuracy, unlike earlier research with volumetric lung radiographs [11, 17]. Of course, when attempting to identify specific structural or cellular features that were viewed or neglected during the interpretive process, eye tracking is an invaluable technique.

Several notable works in pathology have studied diagnostic search patterns on digital slides [12, 13, 15, 29–31]. The work of Krupinski et al. on breast pathology suggested a link between expertise and search patterns [12, 13, 15, 31]. In our study, participants with more than 20 years of experience had lower scanning percentages, but the correlation was not significant (p = 0.1). However, our participant cohort included only practicing pathologists, in contrast to the studies by Krupinski et al., which recruited both trainees and experts to examine changes in search patterns. The work of Treanor et al. compared localization errors with interpretation errors in esophageal biopsies and found a trend in which lower zoom levels were correlated with inaccurate diagnoses [29]. Our findings suggest that an opposite trend exists in breast pathology, where higher zoom levels are correlated with over-diagnosis of pre-invasive lesions, and there seems to be a “happy medium” of magnification for an accurate diagnosis. The existence of a link between magnification and diagnostic accuracy is an important insight, but the nature of the relationship depends on the biopsy type and the visual characteristics of the tissue. Finally, Mello-Thoms et al. described a “focused and efficient” strategy that correlated with the correct outcome in dermatopathology [30]. In our study, we found that drilling is the more efficient strategy in terms of interpretation time, but we did not find any links to the diagnostic outcome. Putting aside the differences between breast and skin biopsies, the selection of diagnostic categories, the difficulty of the cases, and the demographics of the pathologists are all important factors when comparing two studies. To the best of our knowledge, our work is the first to use an objective quantification of viewing behavior in a large study.

Limitations and Strengths

This study was limited to one slide per case, which does not reflect actual clinical practice—a factor that may influence diagnostic accuracy but does not preclude evaluation of interpretive strategies. However, the one-slide-per-case design reduced the workload of participants and allowed them to interpret more images representing a variety of tissue characteristics. This study also diverged from clinical practice in the distribution of diagnostic categories among the cases participants interpreted. Atypia and DCIS cases were over-sampled compared to actual clinical prevalence, with the purpose of better understanding the interpretation of these diagnostically difficult non-invasive cases. Previous research shows that atypia and DCIS cases are more likely to be over-interpreted or misinterpreted, so it is crucial to understand interpretive behaviors on these diagnoses [18]. A possible limitation was the participants' prior inexperience with the digital format or digital viewers. Although the field of pathology has begun to incorporate digital WSI, most US pathologists are still inexperienced with software for digital WSI interpretation, making it difficult to dissociate the relative contributions of experience with the digital format versus expertise in breast pathology on drilling versus scanning. Similarly, some variation between pathologists may be attributable to participants using their own computer monitors; identical monitors might therefore have standardized the pathologists' experience in viewing digital WSI. However, identical monitors do not reflect actual clinical practice, where monitors vary at the level of the practice and, often, between pathologists at the same facility.

Limitations aside, this study is a timely and unique investigation of pathologists' interpretive strategies with digital media. Strengths of this study include the large sample size of breast biopsy cases (N = 240) representing a full spectrum of diagnostic categories from benign and atypia to DCIS and invasive cancer, interpreted and diagnosed by three expert pathologists to define diagnostic accuracy. Another strength is the large number of practicing pathologists (N = 87) from across the USA, whereas previous studies in the literature recruited medical students and residents in small numbers (4 to 11 pathologists). The use of a web-based viewer allowed participants to use their own computers in their own time, which is as close to the real-world practice of digital pathology as possible. Furthermore, the order of the 60 cases was random for each participant, which allowed us to see a learning curve for the digital slide viewer without case biases attributable to interpretive difficulty or severity of diagnosis.

Conclusions

We identified two distinct interpretive strategies as pathologists viewed digital whole slides of breast biopsy specimens: scanning, where the pathologist pans at a constant zoom level, and drilling, where the pathologist zooms in and out repeatedly. Our analysis of pathologist characteristics indicated that scanning was more common among women and older pathologists. Facility size, defined as the number of pathologists who worked in the same facility as the participant, was also a significant predictor of scanning percentage, with participants from smaller facilities scanning more. One possible explanation for this correlation is that participants from larger facilities could share cases with their colleagues, obtain second opinions, and learn from each other, giving them more experience interpreting breast biopsies. Those who reported less confidence in their interpretation of breast tissue tended to spend more time scanning, but the correlation was not statistically significant.

Regarding accuracy and efficiency, we found that scanning is associated with longer interpretation time on average, yet scanners and drillers had similar levels of accuracy compared to the consensus reference diagnoses. Through our unique study design that randomized the order of cases, we also observed that scanning may be more common at the beginning of a pathologist's experience interpreting cases in the digital format, while a more balanced strategy of both scanning and drilling is adopted by the end of the 60 cases. We found that average zoom level, maximum zoom level, and zoom level variance for an interpretation increased from under-interpretation to concordance, and from concordance to over-interpretation. In other words, when participants under-interpreted a case, they used lower magnifications and changed the zoom level less than in concordant interpretations. Similarly, when they over-interpreted a case, they used higher magnifications and changed the zoom level more than in concordant interpretations. This trend held for all diagnostic categories of breast tissue.

In conclusion, this study demonstrates that pathologists employ two different search strategies and that these strategies can be explained by a pathologist's demographics, perceptions of breast pathology, and prior experience in viewing the digital format. The interpretive strategy can affect the diagnostic outcome and the efficiency of the diagnostic process. These findings motivate further research in medical decision-making and computerized decision support systems as digital pathology is adopted more widely.

Acknowledgements Research reported in this publication was supported by the National Cancer Institute awards R01 CA172343, R01 CA140560, and KO5 CA104699. The content is solely the responsibility of the authors and does not necessarily represent the views of the National Cancer Institute or the National Institutes of Health. We thank Ventana Medical Systems, Inc. (Tucson, AZ, USA), a member of the Roche Group, for the use of the iScan Coreo Au™ whole slide imaging system, and HD View SL for the source code used to build our digital viewer. For a full description of HD View SL, please see http://hdviewsl.codeplex.com/.

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

References

1. Irshad H, Veillard A, Roux L et al.: Methods for nuclei detection, segmentation, and classification in digital histopathology: a review—current status and future potential. IEEE Rev Biomed Eng 7:97–114, 2014. doi: 10.1109/RBME.2013.2295804

2. Yin F, Han G, Bui MM et al.: Educational value of digital whole slides accompanying published online pathology journal articles: a multi-institutional study. Arch Pathol Lab Med 140(7):694–697, 2016. doi: 10.5858/arpa.2015-0366-OA

3. Saco A, Bombi JA, Garcia A et al.: Current status of whole-slide imaging in education. Pathobiology 83(2-3):79–88, 2016. doi: 10.1159/000442391

4. Kumar RK, Freeman B, Velan GM et al.: Integrating histology and histopathology teaching in practical classes using virtual slides. Anat Rec - Part B New Anat 289(4):128–133, 2006. doi: 10.1002/ar.b.20105

5. Bruch LA, De Young BR, Kreiter CD et al.: Competency assessment of residents in surgical pathology using virtual microscopy. Hum Pathol 40(8):1122–1128, 2009. doi: 10.1016/j.humpath.2009.04.009

6. Gutman D, Cobb J, Somanna D: Cancer Digital Slide Archive: an informatics resource to support integrated in silico analysis of TCGA pathology data. J Am Med Inform Assoc 20(6):1091–1098, 2013. doi: 10.1136/amiajnl-2012-001469

7. Al-Janabi S, Huisman A, Van Diest PJ: Digital pathology: current status and future perspectives. Histopathology 61(1):1–9, 2012. doi: 10.1111/j.1365-2559.2011.03814.x

8. Pantanowitz L, Valenstein PN, Evans AJ et al.: Review of the current state of whole slide imaging in pathology. J Pathol Inform 2:36, 2011. doi: 10.4103/2153-3539.83746

9. Brunyé TT, Carney PA, Allison KH et al.: Eye movements as an index of pathologist visual expertise: a pilot study. PLoS One 9(8):e103447, 2014. doi: 10.1371/journal.pone.0103447

10. Bahlmann C, Patel A, Johnson J et al.: Automated detection of diagnostically relevant regions in H&E stained digital pathology slides. Proc SPIE Med Imaging 8315:831504, 2012. doi: 10.1117/12.912484

11. Drew T, Vo ML, Olwal A et al.: Scanners and drillers: characterizing expert visual search through volumetric images. J Vis 13(10):3, 2013. doi: 10.1167/13.10.3

12. Tourassi G, Voisin S, Paquit V et al.: Investigating the link between radiologists' gaze, diagnostic decision, and image content. J Am Med Inform Assoc 20(6):1067–1075, 2013. doi: 10.1136/amiajnl-2012-001503

13. Krupinski EA, Graham AR, Weinstein RS: Characterizing the development of visual search expertise in pathology residents viewing whole slide images. Hum Pathol 44(3):357–364, 2013. doi: 10.1016/j.humpath.2012.05.024

14. Crowley RS, Naus GJ, Stewart J et al.: Development of visual diagnostic expertise in pathology: an information-processing study. J Am Med Inform Assoc 10(1):39–51, 2003. doi: 10.1197/jamia.M1123

15. Krupinski EA, Weinstein RS: Changes in visual search patterns of pathology residents as they gain experience. In: Proceedings of SPIE:79660P, 2011. doi: 10.1117/12.877735

16. Velez N, Jukic D, Ho J: Evaluation of 2 whole-slide imaging applications in dermatopathology. Hum Pathol 39(9):1341–1349, 2008. doi: 10.1016/j.humpath.2008.01.006

17. Wen G, Drew T, Wolfe JM et al.: Computational assessment of visual search strategies in volumetric medical images. J Med Imaging 3(1):015501, 2016. doi: 10.1117/1.JMI.3.1.015501

18. Elmore JG, Longton GM, Carney PA et al.: Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313(11):1122–1132, 2015. doi: 10.1001/jama.2015.1405

19. Oster NV, Carney PA, Allison KH et al.: Development of a diagnostic test set to assess agreement in breast pathology: practical application of the Guidelines for Reporting Reliability and Agreement Studies (GRRAS). BMC Womens Health 13:3, 2013. doi: 10.1186/1472-6874-13-3

20. Nagarkar DB, Mercan E, Weaver DL et al.: Region of interest identification and diagnostic agreement in breast pathology. Mod Pathol 29(9):1004–1011, 2016. doi: 10.1038/modpathol.2016.85

21. Elmore J, Longton G, Pepe M et al.: A randomized study comparing digital imaging to traditional glass slide microscopy for breast biopsy and cancer diagnosis. J Pathol Inform 8:12, 2017. doi: 10.4103/2153-3539.201920

22. Elias SM, Smith WL, Barney CE: Age as a moderator of attitude towards technology in the workplace: work motivation and overall job satisfaction. Behav Inf Technol 31(5):453–467, 2012. doi: 10.1080/0144929X.2010.513419

23. Newton T, Slade P, Butler NM et al.: Personality and performance on a simple visual search task. Pers Individ Dif 13(3):381–382, 1992. doi: 10.1016/0191-8869(92)90119-A

24. Wu S, Zhong S, Liu Y: Deep residual learning for image steganalysis. Multimed Tools Appl, published online 15 February 2017. doi: 10.1007/s11042-017-4440-4

25. Miglioretti DL, Gard CC, Carney PA et al.: When radiologists perform best: the learning curve in screening mammogram interpretation. Radiology 253(3):632–640, 2009. doi: 10.1148/radiol.2533090070

26. Chun MM, Wolfe JM: Just say no: how are visual searches terminated when there is no target present? Cogn Psychol 30(1):39–78, 1996. doi: 10.1006/cogp.1996.0002

27. Miyake A, Friedman NP, Emerson MJ et al.: The unity and diversity of executive functions and their contributions to complex ‘frontal lobe’ tasks: a latent variable analysis. Cogn Psychol 41(1):49–100, 2000. doi: 10.1006/cogp.1999.0734

28. Turner ML, Engle RW: Is working memory capacity task dependent? J Mem Lang 28(2):127–154, 1989. doi: 10.1016/0749-596X(89)90040-5

29. Treanor D, Lim CH, Magee D et al.: Tracking with virtual slides: a tool to study diagnostic error in histopathology. Histopathology 55(1):37–45, 2009. doi: 10.1111/j.1365-2559.2009.03325.x

30. Mello-Thoms C, Mello CAB, Medvedeva O et al.: Perceptual analysis of the reading of dermatopathology virtual slides by pathology residents. Arch Pathol Lab Med 136(5):551–562, 2012. doi: 10.5858/arpa.2010-0697-OA

31. Krupinski EA, Tillack AA, Richter L et al.: Eye-movement study and human performance using telepathology virtual slides: implications for medical education and differences with experience. Hum Pathol 37(12):1543–1556, 2006. doi: 10.1016/j.humpath.2006.08.024

J Digit Imaging