Interobserver Agreement for the PI-RADS v2 Validation

Original Article

Rev. Argent. Radiol. 2019;83(2): 49-55

Andrés Labra W.1 Claudio Silva Fuente-Alba1 Giancarlo Schiappacasse F.1 Daniela Barahona Z.1 Velimir Skoknic B.2

1 Radiologist at Facultad de Medicina Clínica Alemana de Santiago, Universidad del Desarrollo, Vitacura, Chile.
2 Radiology Resident at Facultad de Medicina Clínica Alemana de Santiago, Universidad del Desarrollo, Vitacura, Chile.

Abstract

Purpose: To evaluate interobserver variability in the use of PI-RADS version 2.0 (PI-RADS v2) among experienced and less experienced readers.

Materials and Methods: Retrospective study to assess inter-reader agreement. Between January 2015 and December 2016, 1656 subjects underwent multiparametric magnetic resonance imaging (mp-MRI) of the prostate at our institution. The percent distribution of reports by PI-RADS v2 category was estimated and, based on these data, 150 cases were selected using a randomization schedule stratified by the percent distribution of each category. The selected cases were anonymized and presented to three readers with five, four and two years of experience in reading mp-MRI, and over one year of experience with PI-RADS v2; each reader read the cases individually. The data obtained were analyzed independently by a fourth investigator.

Results: The weighted kappa value for the three readers was 0.69 (95% CI: 0.64 to 0.75). The highest agreement was between the two most experienced readers, reaching a value of 0.72 (95% CI: 0.69 to 0.76). Concordance for the distinction between PI-RADS scores that determine follow-up or an intervention based on clinical factors (1–2–3) and scores that determine active management (4–5) was 0.70 (95% CI: 0.59 to 0.78).

Discussion: Substantial agreement between radiologists was demonstrated using PI-RADS v2 for the detection of suspicious lesions on mp-MRI, with agreement being higher between the two most experienced readers. However, the comparison between the least experienced reader and the most experienced readers also showed substantial agreement. Interobserver agreement values for PI-RADS ≥4 were similar to those reported in the literature.

Conclusions: In our center, PI-RADS v2, used by radiologists specialized in abdominal imaging and prostate studies, has demonstrated a high level of agreement in the interpretation of prostate mp-MRI, in line with literature reports.

Keywords: prostate, magnetic resonance imaging, interobserver variation, PI-RADS

Introduction

Multiparametric magnetic resonance imaging (mp-MRI) and targeted biopsy have improved the detection of clinically significant prostate cancer while reducing the diagnosis of insignificant or low-risk cancer.1 However, to achieve a greater impact, mp-MRI requires a high level of experience in its interpretation, which has led to significant interobserver variability. This may be partly due to non-standardized criteria for the diagnosis of abnormal findings on mp-MRI.2

In 2015, as a result of a joint effort of the American College of Radiology (ACR), the European Society of Urogenital Radiology (ESUR), and the AdMeTech Foundation, the second version of the “Prostate Imaging Reporting and Data System” (PI-RADS v2) was published, with guidelines intended to improve standardization in the interpretation of prostate mp-MRI.3 PI-RADS v2 provides guidance for the acquisition and interpretation of prostate mp-MRI and proposes a simplified five-point scoring scale to indicate the probability of clinically significant cancer. It was created to improve on the interobserver agreement obtained with PI-RADS v1, which had significant limitations due to variability in interpretation and an unclear scale in which separate scores were summed (scale from 3 to 15), making it difficult to identify clinically significant cancer.4 PI-RADS v2 includes several changes, which are described below:

(a) Introduction of the concept of a dominant sequence according to the location of the lesion. In the peripheral zone, diffusion-weighted imaging (DWI) with its ADC map is the dominant sequence, while in the transition zone the dominant sequence is T2-weighted imaging.

(b) Dynamic contrast material-enhanced (DCE) imaging should be reported as positive when there is early focal enhancement and as negative when there is no early focal enhancement or the enhancement is heterogeneous or diffuse. The analysis of enhancement curve types described in the original version of PI-RADS is no longer needed.

(c) A positive DCE may increase the overall score by one point, but only if it makes a clinically significant difference, as when the PI-RADS score increases from 3 to 4.

(d) An overall score on a scale of 1 to 5 is assigned according to the revised rules in PI-RADS v2.
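As a rough illustration of changes (a) to (d), the scoring logic can be sketched in a few lines of Python. This is a simplified sketch, not the full PI-RADS v2 rule set: the function name `overall_category` and its parameters are hypothetical, the DCE upgrade is applied only in the peripheral zone (as in the PI-RADS v2 document), and other rules of the full system, such as the influence of DWI on transition-zone lesions, are deliberately left out.

```python
def overall_category(zone: str, t2_score: int, dwi_score: int, dce_positive: bool) -> int:
    """Simplified overall PI-RADS v2 category from per-sequence scores (1 to 5).

    Hypothetical helper for illustration only; it covers the dominant-sequence
    idea and the DCE upgrade from 3 to 4, and nothing else.
    """
    for name, score in (("t2_score", t2_score), ("dwi_score", dwi_score)):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"{name} must be an integer from 1 to 5, got {score!r}")

    if zone == "peripheral":
        # (a) DWI with its ADC map is the dominant sequence in the peripheral zone.
        category = dwi_score
        # (b), (c) A positive DCE matters only when it moves an indeterminate 3 to a 4.
        if category == 3 and dce_positive:
            category = 4
        return category

    if zone == "transition":
        # (a) T2-weighted imaging is the dominant sequence in the transition zone.
        # Simplification: secondary-sequence upgrades are not modeled here.
        return t2_score

    raise ValueError(f"zone must be 'peripheral' or 'transition', got {zone!r}")


if __name__ == "__main__":
    # Indeterminate peripheral-zone lesion with early focal enhancement.
    print(overall_category("peripheral", t2_score=2, dwi_score=3, dce_positive=True))
    # Same lesion without early focal enhancement.
    print(overall_category("peripheral", t2_score=2, dwi_score=3, dce_positive=False))
```

The two calls differ only in the DCE finding, which is the one situation where DCE changes the overall category in this sketch.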

The aim of this study is to assess interobserver variability in the use of PI-RADS version 2.0. Adequate implementation of this system may help clinicians estimate the risk of clinically significant prostate cancer and, therefore, adopt an appropriate biopsy approach, either systematic or targeted to a lesion reported as significant on mp-MRI.5

Materials and Methods

This was a retrospective study for the assessment of inter-reader agreement. It was approved by the Institutional Review Board, which, given the nature of the study, granted a waiver of informed consent.

Between January 2015 and December 2016, 1656 subjects underwent a prostate mp-MRI at our institution. The scans were performed using a 3 Tesla MRI scanner (Magnetom Skyra, Siemens Healthcare, Erlangen, Germany), software version Numaris/4. A 30-channel phased-array surface pelvic coil was used, with IV administration of scopolamine 10 mg immediately prior to the scan. The study protocol included multiplanar sagittal, coronal and axial turbo-spin-echo (TSE) T2-weighted sequences (TR/TE: 4780/90, FOV 18 cm, 3/0 mm, 320/272), axial TSE T1-weighted imaging, diffusion-weighted (DW) imaging (3/0 mm, b-values 0, 50, 500, 1000 and 1600), apparent diffusion coefficient (ADC) map and dynamic contrast material-enhanced (DCE) imaging (temporal resolution 7 sec, 3 mm).

The percent distribution of the PI-RADS v2 categories reported in these subjects was estimated. Based on this information, 150 cases were selected using a randomization schedule stratified by the percent distribution of PI-RADS categories during 2015 and 2016. There were no exclusion criteria.

The selected cases were anonymized and presented to three readers with five, four and two years' experience in mp-MRI reading, plus over one year of experience in the use of PI-RADS v2. All images acquired in each study were available to every reader. The same cases were presented to each reader, but in a different random order. Each reader used all sequences to assess each case and assign a PI-RADS category.

The cases were read individually in five independent sessions of 30 cases on the Impax 6.5.2.657 (Agfa®) platform, and the data obtained were analyzed separately by a fourth independent investigator. At the time of the individual reading, neither the prostate-specific antigen (PSA) values nor the data on the clinical indication for the mp-MRI were available.

Inter-reader agreement was assessed using Cohen's weighted kappa statistic, with categories interpreted according to Landis and Koch recommendations (0.00 Poor; 0.01-0.20 Slight; 0.21-0.40 Fair; 0.41-0.60 Moderate; 0.61-0.80 Substantial; 0.81-1.00 Almost Perfect). Qualitative characteristics are described by percent distribution.

The statistical analysis was performed using Stata software (version 14.0, StataCorp, College Station, Texas). A statistically significant difference was defined as p < 0.05, and 95% confidence intervals were estimated accordingly.
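To make the agreement statistic concrete, the following is a minimal pure-Python sketch of Cohen's weighted kappa for two readers, together with the Landis and Koch labels listed above. It is an illustration only and not the Stata procedure used in the study: the function names are hypothetical, the linear weighting scheme is an assumption (the text does not state which weights were used), and the readings in the usage example are invented.

```python
def weighted_kappa(reader_a, reader_b, categories=(1, 2, 3, 4, 5), weights="linear"):
    """Cohen's weighted kappa for two readers scoring the same cases.

    weights="linear" penalizes a disagreement in proportion to its distance;
    weights="quadratic" penalizes it by the squared distance.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must score the same non-empty set of cases")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(reader_a)

    # Observed joint proportions of (reader_a category, reader_b category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(reader_a, reader_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_dis = sum(disagreement(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(disagreement(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return 1.0 - observed_dis / expected_dis


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch bands quoted in the text."""
    if kappa <= 0.0:
        return "Poor"
    for upper, label in ((0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")):
        if kappa <= upper:
            return label
    return "Almost Perfect"


if __name__ == "__main__":
    # Invented PI-RADS categories for eight cases, for illustration only.
    reader_1 = [2, 2, 3, 4, 5, 2, 3, 4]
    reader_2 = [2, 3, 3, 4, 4, 2, 2, 5]
    kappa = weighted_kappa(reader_1, reader_2)
    print(f"weighted kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

With weighting, a disagreement between adjacent categories (for example 4 versus 5) lowers kappa less than a disagreement between distant categories (for example 1 versus 5), which suits an ordinal scale such as PI-RADS.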

Results

The median age of the subjects was 60 years (interquartile range [IQR]: 55-67). The 150 selected cases comprised 79 PI-RADS 2, 28 PI-RADS 3, 24 PI-RADS 4 and 19 PI-RADS 5 (stratified according to the percent distribution in the total population). All examinations were considered diagnostic by all three readers.

The weighted kappa for the three readers was 0.69 (95% CI: 0.64 to 0.75). The greatest agreement was between the two most experienced readers, with a value of 0.72 (95% CI: 0.69 to 0.76) (Figure 1). The agreement between the least experienced reader and each of the most experienced readers individually was 0.680 with reader 1 (95% CI: 0.61 to 0.71) and 0.675 with reader 2 (95% CI: 0.60 to 0.70). All these values fall into the category of substantial agreement according to the Landis and Koch classification. Figures 2 and 3 show representative cases of agreement among the three readers in cases of PI-RADS 2 and PI-RADS 5, respectively.

Fig. 1 Chart (titled "Agreement between experienced readers, Weighted Kappa = 0.72"; axes: Reader 1 vs. Reader 2) showing the distribution of interpretations by the experienced readers (five and four years' experience, respectively), with a weighted kappa of 0.72 (substantial agreement).

Fig. 2 Fifty-six-year-old patient. PSA 5.7 ng/ml. (a) T2-WI FSE, (b,c) DWI and ADC map, (d) DCE. Signs of hyperplasia in the transition zone with adenomatous nodules. PI-RADS 2, with agreement among all three readers.

Fig. 3 Seventy-three-year-old patient. PSA 7.8 ng/ml. (a) FSE T2-weighted image. Hypointense lesion of 17 mm in the posteromedial zone of the right lobe of the prostate, with extension to the contralateral side (b). Diffusion restriction on DWI and low signal on ADC map (c), with significant enhancement on DCE imaging (d). MRI/US fusion-guided biopsy shows adenocarcinoma Gleason 4 + 5 in the peripheral zone. In this case, there was agreement among all readers, who reported PI-RADS 5.

Concordance for the distinction between PI-RADS scores that are an indication for follow-up (1-2-3) and those that are an indication for active management (4-5) was 0.70 (95% CI: 0.59 to 0.78) (Figure 4). In this scenario, agreement between the most experienced readers was 0.71 (95% CI: 0.58 to 0.84) (Figure 4), and agreement between the least experienced reader and each of the most experienced readers individually was 0.697 with reader 1 (95% CI: 0.57 to 0.82) and 0.679 with reader 2 (95% CI: 0.55 to 0.81), again falling into the category of substantial agreement. Figure 5 shows a representative case of disagreement between the less experienced reader and the two most experienced readers.
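The dichotomized comparison above (PI-RADS 1-2-3 versus 4-5) can be sketched as follows. This is an illustrative example under stated assumptions rather than the analysis actually run in Stata: the helper names `dichotomize` and `binary_kappa` are hypothetical, the unweighted Cohen's kappa is used because only two categories remain, and the eight readings are invented.

```python
def dichotomize(scores, threshold=4):
    """True for scores indicating active management (>= threshold), else False."""
    return [score >= threshold for score in scores]


def binary_kappa(calls_a, calls_b):
    """Unweighted Cohen's kappa for two readers' binary calls on the same cases."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both readers must score the same non-empty set of cases")
    n = len(calls_a)
    # Proportion of cases on which the two readers make the same call.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance from each reader's rate of positive calls.
    rate_a = sum(calls_a) / n
    rate_b = sum(calls_b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1:
        raise ValueError("kappa is undefined when both readers make a single call")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented PI-RADS categories for eight cases, for illustration only.
    reader_1 = [2, 2, 3, 4, 5, 3, 2, 4]
    reader_2 = [2, 3, 3, 4, 4, 4, 2, 5]
    kappa = binary_kappa(dichotomize(reader_1), dichotomize(reader_2))
    print(f"kappa for PI-RADS 1-3 vs 4-5 = {kappa:.2f}")  # 0.75 for these toy data
```

In this toy set, the readers disagree on the exact category in four cases, but only one of those disagreements (a 3 versus a 4) crosses the management threshold, so it is the only one that counts against the dichotomized kappa.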

Discussion

This study showed substantial agreement among three radiologists using PI-RADS v2 for the detection of suspicious lesions on mp-MRI, with agreement being higher between the two most experienced readers. However, the comparison between the least experienced reader and the two most experienced readers also showed substantial agreement, whether with either of the other two readers individually or with both as a whole. Such levels of agreement are similar to those reported in the literature. The inclusion of a less experienced reader in this study is important, as it lends further validity to the clinical application of PI-RADS. Interobserver agreement values for PI-RADS ≥4 were similar to those reported in the literature, which further supports the reliability of this method for the detection of significant cancer.6-9

The results of this study indicate that readers with different levels of experience may use PI-RADS v2 to detect clinically significant suspicious lesions with high interobserver agreement. This is highly useful, as clinicians have more information available to decide on adequate and reliable management based on the findings reported on mp-MRI, and can define whether to indicate a systematic transrectal biopsy or a biopsy targeted to suspicious lesions, the latter with either cognitive targeting or MRI/ultrasonography (MRI/US) fusion guidance.3,5,8

Unlike the studies conducted by Rosenkrantz et al.6 and Girometti et al.9, in which histopathological agreement was analyzed together with interobserver agreement, this study included neither histopathological correlation nor long-term follow-up of the patients' progress. These are clear limitations of our study; such assessments would provide valuable data about interobserver agreement in the setting of diagnostic validity. Another important aspect is that the study was conducted at a single center using a single scanner and protocol; therefore, greater interobserver variability cannot be ruled out when different institutions and protocols are compared. Future studies are needed to validate PI-RADS v2 worldwide, especially prospective studies with a wider risk distribution.

Conclusion

In conclusion, in our center, PI-RADS version 2.0, used by radiologists specialized in abdominal imaging and prostate studies, has demonstrated a high level of agreement in the interpretation of prostate mp-MRI. PI-RADS v2 represents a further step in the standardization of the interpretation of prostate mp-MRI by readers with different levels of experience.

Fig. 4 Diagram (titled "Frequency of significant vs. non-significant results"; axes: Reader 1 vs. Reader 2, each classified as non-significant or significant) showing the distribution and concordance of the experienced readers' interpretations in terms of whether results are non-significant (PI-RADS 1-2-3) or significant (PI-RADS 4-5).


Ethical responsibilities

Protection of human subjects and animals. The authors declare that no experiments were performed on humans or animals for this investigation.

Confidentiality of data. The authors declare that they have followed the protocols of their work center on the publication of patient data.

Right to privacy and informed consent. The authors declare that no patient data appear in this article.

Conflicts of interest

The authors declare no conflicts of interest.


Fig. 5 Sixty-year-old patient. PSA 5.1 ng/ml. (a) FSE T2-weighted image. Small hypointense 5-mm lesions in the bilateral peripheral zone (b). Minimal representation on DWI sequence and on ADC map (c), with no focal enhancement on DCE imaging (d). Systematic biopsy shows adenocarcinoma Gleason 3 + 3 in the peripheral zone of both sextants. In this case, there was agreement between the most experienced readers, who reported PI-RADS 3, and disagreement with the least experienced reader, who reported PI-RADS 2.


References

1. Park SY, Jung DC, Oh YT, Cho NH, Choi YD, Rha KH, et al. Prostate Cancer: PI-RADS Version 2 Helps Preoperatively Predict Clinically Significant Cancers. Radiology 2016;280(1):108-116.

2. Dickinson L, Ahmed HU, Allen C, Barentsz JO, Carey B, Futterer JJ, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging 2013;37(1):48-58.

3. Weinreb JC, Barentsz JO, Choyke PL, Cornud F, Haider MA, Macura KJ, et al. PI-RADS Prostate Imaging - Reporting and Data System: 2015, Version 2. Eur Urol 2016;69(1):16-40.

4. Barentsz JO, Richenberg J, Clements R, et al. ESUR prostate MR guidelines 2012. Eur Radiol 2012;22:746-757.

5. Zhao C, Gao G, Fang D, Li F, Yang X, Wang H, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging 2016;40(5):885-888.

6. Rosenkrantz AB, Ginocchio LA, Cornfeld D, Froemming AT, Gupta RT, Turkbey B, et al. Interobserver Reproducibility of the PI-RADS Version 2 Lexicon: A Multicenter Study of Six Experienced Prostate Radiologists. Radiology 2016;280(3):793-804.

7. Muller BG, Shih JH, Sankineni S, Marko J, Rais-Bahrami S, George AK, et al. Prostate Cancer: Interobserver Agreement and Accuracy with the Revised Prostate Imaging Reporting and Data System at Multiparametric MR Imaging. Radiology 2015;277(3):741-750.

8. Chen F, Cen S, Palmer S. Application of Prostate Imaging Reporting and Data System Version 2 (PI-RADS v2): Interobserver Agreement and Positive Predictive Value for Localization of Intermediate- and High-Grade Prostate Cancers on Multiparametric Magnetic Resonance Imaging. Acad Radiol 2017;24(9):1101-1106.

9. Girometti R, Giannarini G, Greco F, et al. Interreader agreement of PI-RADS v. 2 in assessing prostate cancer with multiparametric MRI: A study using whole-mount histology as the standard of reference. J Magn Reson Imaging 2019;49(2):546-555.