Context: Objective: Method: Results: arXiv:1812.08815v3 ...

78
1812.08815 *correspondence: [email protected] Formal Methods in Dependable Systems Engineering:A Survey of Professionals from Europe and North America * Preprint , compiled September 23, 2020 Mario Gleirscher 1 and Diego Marmsoler 2 1 Department of Computer Science, University of York,, Deramore Lane, Heslington, York YO10 5GH, United Kingdom 2 Institut f¨ ur Informatik, Technical University of Munich,, Boltzmannstraße 3, 85748 Garching, Germany Abstract Context: Formal methods (FMs) have been around for a while, still being unclear how to leverage their benefits, overcome their challenges, and set new directions for their improvement towards a more successful transfer into practice. Objective: We study the use of formal methods in mission-critical software domains, examining industrial and academic views. Method: We perform a cross-sectional on-line survey. Results: Our results indicate an increased intent to apply FMs in industry, suggesting a positively perceived usefulness. But the results also indicate a negatively perceived ease of use. Scalability, skills, and education seem to be among the key challenges to support this intent. Conclusions: We present the largest study of this kind so far (N = 216), and our observations provide valuable insights, highlighting directions for future theoretical and empirical research of formal methods. Our findings are strongly coherent with earlier observations by Austin and Parkin (1993). Keywords formal methods · empirical research · on-line survey · usage · usefulness · practical challenges · research transfer · software engineering education & training Acronyms CMMI Capability Maturity Model Integration DI respondents with decreased us- age intent EOU ease of use FM formal method GQM goal-question-metric HQ head quarter ICT information and communica- tion technology II respondents with increased usage intent IS information system LE less experienced respondents M respondents with some motiva- tions to use FMs MbE model-based engineering ME more experienced respondents NP non-practitioners P practitioners PEOU perceived ease of use PU perceived usefulness RQ research question SE software engineering SMT satisfiability modulo theory TAM technology acceptance model TLD top-level domain NM respondents without any moti- vations to use FMs UFM Use of FMs in mission- critical SE U usefulness 1 * Funding: Partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under the Grant no. 381212925. Conflict of Interest: The authors declare that they have no conflict of interest. This is a post- peer-review, pre-copyedit version of an article published in Empirical Software Engineering. The final authenticated version is available online at: doi:10.1007/s10664-020-09836-5 arXiv:1812.08815v3 [cs.SE] 22 Sep 2020

Transcript of Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Page 1: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

1812

.088

15

*correspondence: [email protected]

FormalMethods in Dependable Systems Engineering: ASurvey of Professionals from Europe and North America∗

Preprint, compiled September 23, 2020

Mario Gleirscher1 and Diego Marmsoler2

1Department of Computer Science, University of York,, Deramore Lane, Heslington, York YO10 5GH,United Kingdom

2Institut fur Informatik, Technical University of Munich,, Boltzmannstraße 3, 85748 Garching, Germany

Abstract

Context: Formal methods (FMs) have been around for a while, still being unclear howto leverage their benefits, overcome their challenges, and set new directions for theirimprovement towards a more successful transfer into practice. Objective: We study theuse of formal methods in mission-critical software domains, examining industrial andacademic views. Method: We perform a cross-sectional on-line survey. Results: Ourresults indicate an increased intent to apply FMs in industry, suggesting a positivelyperceived usefulness. But the results also indicate a negatively perceived ease of use.Scalability, skills, and education seem to be among the key challenges to support thisintent. Conclusions: We present the largest study of this kind so far (N = 216), and ourobservations provide valuable insights, highlighting directions for future theoretical andempirical research of formal methods. Our findings are strongly coherent with earlierobservations by Austin and Parkin (1993).

Keywords formal methods · empirical research · on-line survey · usage · usefulness ·practical challenges · research transfer · software engineering education & training

AcronymsCMMI Capability Maturity Model

Integration

DI respondents with decreased us-age intent

EOU ease of use

FM formal method

GQM goal-question-metric

HQ head quarter

ICT information and communica-tion technology

II respondents with increased usageintent

IS information system

LE less experienced respondents

M respondents with some motiva-tions to use FMs

MbE model-based engineering

ME more experienced respondents

NP non-practitioners

P practitioners

PEOU perceived ease of use

PU perceived usefulness

RQ research question

SE software engineering

SMT satisfiability modulo theory

TAM technology acceptance model

TLD top-level domain

NM respondents without any moti-vations to use FMs

UFM Use of FMs in mission-critical SE

U usefulness

1* Funding: Partly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) underthe Grant no. 381212925. Conflict of Interest: The authors declare that they have no conflict of interest. This is a post-peer-review, pre-copyedit version of an article published in Empirical Software Engineering. The final authenticatedversion is available online at: doi:10.1007/s10664-020-09836-5

arX

iv:1

812.

0881

5v3

[cs

.SE

] 2

2 Se

p 20

20

Page 2: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 2

1 Motivation and Challenges

Over the past decades, many software errors have been deployed in the field and some of these errors hada clearly intolerable impact.2 Cost savings from reducing such impact have been the motivation of formalmethods (FMs) as a first-class approach to error prevention, detection, and removal (Holloway, 1997).

In university courses on software engineering, we learned that FMs are among the best we have to designand assure correct systems. The question “Why are FMs not used more widely?” (Knight et al., 1997)is hence more than justified. With a Twitter poll,3 which emerged from our coffee spot discussions, wesolicited opinions on a timely paraphrase of a statement argued by Holloway (1997): “FMs should be acornerstone of dependability and security of highly distributed and adaptive automation.” What can a tinyopportunity sample of 22 respondents from our social network tell? Not much, well, (i) 55% agrees, i.e.,seem to attribute importance to this role of FMs, (ii) 14% disagrees, i.e., oppose that view, (iii) 32% justdon’t know. Why should and how could FMs be a cornerstone?

Since the beginning of software engineering (SE) there has been a debate on the usefulness of FMs toimprove SE. In the 1970s and 1980s, several SE and FM researchers had started to examine this usefulnessand to identify error possibilities despite the rigour in FMs (Gerhart and Yelowitz, 1976), with the aim ofresponding to critical observations of practitioners (Jackson, 1987).

Hall (1990) and Bowen and Hinchey (1995a) illuminate 14 myths (e.g. “formal methods are unneces-sary”), providing their insights on when FMs are best used and highlighting that FMs can be overkill insome cases but are highly recommended in others. The transfer of FMs into SE practice is by far notstraightforward. Knight et al. (1997) examine reasons for the low adoption of FMs in practice. Barrocaand McDermid (1992) ask: “To what extent should FMs form part of the [safety-critical SE] method?”

Glass (2002, pp. 148–149, 165–166) and Parnas (2010) observe that “many [SE] researchers advocaterather than investigate” by assuming the need for more methodologies. Glass summarises that FMs weresupposed to help represent firm requirements concisely and support rigorous inspections4 and testing.He observes that changing requirements has become an established practice even in critical domains,and inspections, even if based on FMs, are insufficient for complete error removal. In line with Barrocaand McDermid (1992, p. 591), he notes that FMs have occasionally been sold as to make error removalcomplete, but there is no silver bullet (Glass, 2002, pp. 108–109). Bad communication between theoristsand practitioners sustains the issue that FMs are taught but rarely applied (Glass, 2002; Holloway andButler, 1996, pp. 68–70). Parnas (2010) compares alternative paradigms in FM research (e.g. axiomaticvs. relational calculi) and points to challenges of FM adoption (e.g. valid simple abstractions).

In contrast, Miller et al. (2010) draw positive conclusions from recent applications of model checking andhighlight lessons learned. In his keynote, O’Hearn (2018) conveys positive experiences in scaling FMsthrough adequate tool support for continuous reasoning in agile projects (see, e.g. Chudnov et al., 2018).Many researchers (see, e.g. Aichernig and Maibaum, 2003) have been working on the improvement ofFMs towards their successful transfer. Boulanger (2012) and Gnesi and Margaria (2013) summarisepromising industry-ready FMs and present larger case studies.

Have software errors been overlooked because of hidden inconsistencies that can be detected when prop-erly formalised? Are such errors compelling arguments for the wider use of FMs? Strong evidence forthe ease of use of FMs and their efficacy and usefulness is scarce and largely anecdotal, rarely drawnfrom comparative studies (e.g. Pfleeger and Hatton, 1997; Sobel and Clarkson, 2002), often primarilyconducted in research labs (e.g. Chudnov et al., 2018; Galloway et al., 1998 and many others). In late re-sponse to Holloway and Butler’s request for empirical data (Holloway and Butler, 1996), Graydon (2015)

2See anecdotal evidence (grey literature, press articles) on software-related incidents, for example, by Kaner andPels (1998, 2018), Charette (2018) and Neumann (2018).

3See https://twitter.com/MarioGleirscher/status/889737625178976256.4For example, walking through development artefacts in a structured and moderated discussion group and with

bug pattern checklists (Fagan, 1976).

Page 3: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 3

still observes a lack of evidence for the effectiveness of FMs in assurance argumentation for safety-criticalsystems, suggesting empirical studies to examine hypotheses and collect evidence.

FMs have many potentials but SE research has reached a stage of maturity where strong empirical evi-dence is crucial for research progress and transfer. Jeffery et al. (2015) identify questions and metrics forFM productivity assessment, supporting FM research transfer.

Contributions. We contribute to SE and FM research (1) by presenting results of the largest cross-sectional survey of FM use among SE researchers and practitioners to this date, (2) by answering researchquestions about the past and intended use of FMs and the perception of systematically mapped FM chal-lenges, (3) by relating our findings to the perceived ease of use and usefulness of FMs using a simplifiedvariant of the technology acceptance model for evaluating engineering methods and techniques, and (4) byproviding a research design for repetitive (e.g. longitudinal) FM studies.

Overview. The next section introduces important terms. Section 3 relates our work to existing research.In Section 4, we explain our research design. We describe our data and answer our research questionsin Section 5. In Section 6, we summarise and interpret our findings in the light of existing evidence andwith respect to threats to validity. Section 7 highlights our conclusions and potential follow-up work.

2 Background and Terminology

By formal methods, we refer to explicit mathematical models and sound logical reasoning about crit-ical properties (Rushby, 1994)—such as reliability, safety, security, more generally, dependability andperformance—of electrical, electronic, and programmable electronic or software systems in mission- orproperty-critical application domains. Model checking, theorem proving, abstract interpretation, asser-tion checking, and formal contracts are examples of FMs. By use of FMs, we refer to their application inthe development and analysis of critical systems and to substantially integrating FMs with the used pro-gramming methodologies (e.g. structured development, model-based engineering (MbE), assertion-basedprogramming, test-driven development), notations (e.g. UML, SysML), and tools.

Tool and Method Evaluation. In the following, we give an overview of several evaluation approachesand explain in Section 4.2 which approach we take.

The widely used technology acceptance model (TAM; Davis, 1989) is a psychological test that allows theassessment of end-user IT based on the two constructs perceived ease of use (PEOU, i.e., positive andnegative experiences while using an IT system) and perceived usefulness (PU, i.e., positive experiences ofaccomplishing a task using an IT system compared to not using this system for accomplishing the sametask).

Complementary to TAM, Basili (1985) proposes the goal-question-metric (GQM) approach to methodand tool evaluation. While GQM serves as a good basis for quantitative follow-up studies, we followthe user-focused TAM. Maturity models according to the Capability Maturity Model Integration (SEI,2010) do not fit our purposes because they focus on engineering process improvement beyond particulardevelopment techniques. Poston and Sexton (1992) present tool survey guidelines based on technology-focused classification and selection criteria with a very limited view on tool usefulness and usability.Miyoshi and Azuma (1993) evaluate ease of use of development environments (i.e., specification andmodelling tools) using metrics from the ISO/IEC 9126 quality model.

From comparing two models of predicting an individual’s intention to use a tool, Mathieson (1991) sup-ports TAM’s validity and convenience but indicates its limits in providing enough information on users’opinions. For software methods and programming techniques, Murphy et al. (1999) show how surveys,case studies, and experiments can be used to compensate for this lack of information about usefulnessand usability.

Page 4: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 4

Because FMs are by definition based on a formal language and usually supported by tools, it is reasonableto adopt the TAM for the assessment of FMs. Unfortunately, the body of literature on the evaluation ofFMs in TAM style is very small. However, Riemenschneider et al. (2002) apply TAM to methods (e.g.UML-based architecture design), concluding that “if a methodology is not regarded as useful by develop-ers, its prospects for successful deployment may be severely undermined.” According to their approach,FM usage intentions would be driven by (1) an organisational mandate to use FMs, (2) the compatibilityof FMs with how engineers perform their work, and (3) the opinions of developers’ coworkers and super-visors toward using FMs. Overall, the application of TAM to FMs allows causal reasoning from FM useracceptance towards intention of FM use.

Specialising the approach in Riemenschneider et al. (2002), ease of use (EOU) of a FM characterises thetype and amount of effort a user is likely to spend to learn, adopt, and apply this FM. Usefulness (U)determines how fit a FM is for its purpose, that is, how well it supports the engineer to accomplish anappropriate task. If EOU and U are measured by a survey whose data points are user perceptions then wetalk of perceived ease of use (PEOU) and perceived usefulness (PU). Together, PEOU and PU form theuser acceptance of a FM and, by support of Mathieson (1991) and Riemenschneider et al. (2002), canpredict the intention to use this FM.

Whereas TAM is a model based on the two user-focused constructs PEOU and PU, Kitchenham et al.(1997) propose a meta-evaluation approach called DESMET for tools and methods based on multipleperformance indicators (e.g. with TAM as one of the indicators).

3 RelatedWork

Table 1 shows a systematic map (Petersen et al., 2008) of 36 studies on FM research evaluation andtransfer. For each study, we estimate5 the authors’ attitude against or in favour of FMs, the motivation ofthe study, the approach followed, and the type of result obtained. Most of these works present personalexperiences, opinions, case studies, or literature summaries. In contrast, the work presented in this paperfocuses on the analysis of experience from a wide range of practitioners and experts. However, we foundfour similar studies.

Austin and Parkin (1993) sought to explain the low acceptance of FMs in industry around 1992. Usinga questionnaire similar to ours with only open questions, they evaluated 111 responses from a sample ofsize 444, using a sampling method similar to ours (then using different channels). Responses from FMusers are distinguished from general responses. Their questions examine benefits, limitations, barriers,suggestions to overcome those barriers, personal reasons for or against the use of FMs, and ways ofassessing FMs.

In a second study in 2001, Snook and Harrison (2001) conduct interviews with representatives of fivecompanies to discover the main issues involved in FM use, in particular, issues of understandability andthe difficulty of creating and utilising formal specifications.

A similar, though more comprehensive interview study was performed by Woodcock et al. (2009) in2009. They assess the state of the art of the application of FMs, using questionnaires to collect data on 62industrial projects.

Liebel et al. (2016, pp. 102–103) assess effects on and shortcomings of the adoption of MbE in embeddedSE including a discussion of FM adoption. The authors observe a lack of tool support, bad reputation, andrigid development processes as obstacles to FM adoption. Their data suggests a need of FM adoption.30% of the responses from industry declare the need for FMs as a reason to adopt MbE. Moreover,responses indicate that MbE adoption has a positive effect on FM adoption. One limitation of their studyis the small number of responses from FM users.

5This estimate is based on opinions and attitudes expressed by the original authors and, where unavailable, on ourown interpretation when reading the studies.

Page 5: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 5

Table 1: Overview of related work on FM use and adoption, grouped by primary focus and motivation

Study A Motivation Support E C R

SurveysAustin and Parkin (1993) = LoEv Interviews • •

Snook and Harrison (2001) = LoEv Interviews •

Oliveira (2004) = Edu./Train. Course websites •

Woodcock et al. (2009)a = LoEv Interviews •

Davis et al. (2013) + TechTx Interviews • •

Liebel et al. (2016) + LoEv Online questionnaire •

Ferrari et al. (2019) + TechTx Literature study •

Literature Studies and SummariesWing (1990) + SotA O/E • •

Bloomfield et al. (1991) = SotA •

Fraser et al. (1994) = TechTx •

Heitmeyer (1998) = TechTx •

Gleirscher et al. (2019) + TechTx SWOT analysis • •

Expert Opinions and Experience ReportsJackson (1987) = TechTx •

Bjorner (1987) = TechTx •

Barroca and McDermid (1992) = SotA Multiple cases •

Bowen and Hinchey (1995a) + Hyp. Testing •

Bowen and Hinchey (1995b) + TechTx •

Hinchey and Bowen (1996) – TechTx •

Heisel (1996) + TechTx •

Holloway and Butler (1996) + LoEv •

Lai (1996) + TechTx •

Bowen and Hinchey (2005) + Hyp. Testing Literature study •

Parnas (2010) = TechTx • •

Case Studies and ExperimentsGerhart and Yelowitz (1976) = LoEv Multiple cases, O/E • • •

Hall (1990) + Hyp. Testing O/E •

Craigen et al. (1995)b + SotA Multiple cases, O/E •

Knight et al. (1997) = TechTx Field experiment •

Pfleeger and Hatton (1997) = Hyp. Testing Effect analysis •

Galloway et al. (1998) + TechTx Single case in lab • •

Sobel and Clarkson (2002) = Hyp. Testing Lab experiment •

Miller et al. (2010) = TechTx Multiple cases, O/E •

Klein et al. (2018) + TechTx •

Chudnov et al. (2018) = TechTx •

aSee also Bicarregui et al. (2009), bsee also Craigen (1995) and Craigen et al. (1993); (A)ttitude,(E)valuation/analysis, (C)hallenges, (R)ecommendations, +/=/– . . . positive/neutral/negative, LoEv . . . lack of em-pirical evidence, Hyp. Testing . . . hypotheses testing, Edu./Train. . . . education/training, O/E . . . opinion/experiencereport, SotA . . . state of the art, SWOT . . . strengths, weaknesses, opportunities, and threats, TechTx . . . technologytransfer

While these studies focus on the elicitation of the state of the art and the state of practice, the main focusof our study is to compare the current FM adoption or use with the intention to adopt and use FMs inthe future. To the best of our knowledge, our study offers the largest set of data points investigating theuse of FMs in SE, so far. In Section 6.3, we provide a further discussion of how our findings relate tothe findings of these studies, particularly to the works of Austin and Parkin (1993) and Woodcock et al.(2009).

Page 6: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 6

Table 2: Concepts and scales for the construct “Use of FMs in mission-critical SE” (UFM)

Concept Id. Description [Scale] Point [Question]

Measured twice . . .Domain C1 Application domains of FMs [MC among domains] Past [Q1], Intent [Q8]Role C2 Role in using FMs [MC among roles] Past [Q4], Intent [Q9]Use C3 Use of FMs [experience level/relative frequency per FM class] Past [Q5/Q6], Intent

[Q10/Q11]Purpose C4 Purpose of using FMs [absolute/relative frequency per pur-

pose]Past [Q7], Intent [Q12]

Measured once . . .Experience C5 Level of FM experience [duration ranges in years] Single [Q2]Motivation C6 Motivation to use FMs [degree per motivational factor] Single [Q3]Obstacles C7 Difficulty of obstacles to using FMs [degree per challenge] Single [Q13]

MC. . . multiple-choice

4 ResearchMethod

In this section, we describe our research design, our survey instrument, and our procedure for data col-lection and analysis. For this research, we follow the guidelines of Kitchenham and Pfleeger (2008) forself-administered surveys and use our experience from a previous more general survey (Gleirscher andNyokabi, 2018).

4.1 Research Goal and Questions

The questions in Section 1 have led to this survey on the use, usage intent, and challenges of FMs. Ourinterest is devoted to the following research questions (RQs):

RQ1 In which typical domains, for which purposes, in which roles, and to what extent have FMs beenused?

RQ2 Which relationships can we observe between past experience in using FMs and the intent to useFMs?

RQ3 How difficult do study participants perceive widely known FM challenges to be?

RQ4 What can we say about the perceived ease of use and the perceived usefulness of FMs?

4.2 Construct and Link to Research Questions

Table 2 lists the (C)oncepts that constitute the construct Use of FMs in mission-critical SE (UFM), thecorresponding scales, the points of measurement, and references to (Q)uestions from the questionnaire.

Measuring Past and Intended Use. For RQ1 (UFM), we examine potential application domains forFMs (C1), roles when using FMs (C2), motivations and purposes of using FMs (C6, C4), and the extentof UFM at the general (C5) and specific (C3) experience level of our study participants when using FMs.

For RQ2, we compare the past (UFMp) and intended use (UFMi) of FMs regarding the domain (C1),role (C2), FM class (C3), and purpose (C4). We measure UFMi by relative frequency (Table 4) withrespect to a participants’ current situation, FM class, and purpose of use. Using a relative instead ofan absolute frequency scale slightly reduces the burden on respondents to make detailed and, hence,uncertain predictions of UFMi.

For RQ3, we measure the perception of difficulty of several obstacles (C7) known from the literature andfrom our experience.

Page 7: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 7

Method Evaluation and TAM-style Interpretation. We follow DESMET (Kitchenham et al., 1997)and Murphy et al. (1999) insofar as we combine a qualitative survey (i.e., FM evaluation by SE practi-tioners and researchers) and a qualitative effects analysis based on the past and intent measurements forC4 (i.e., subjective assessment of effects of FMs by asking SE practitioners and researchers).

We assume UFM is, nowadays, to a large extent implying the use of the tools automating the corre-sponding FMs. This assumption is justified inasmuch as for all FMs referred to in this survey, tools areavailable. In fact, in the past two decades (the period most survey respondents could have possibly usedFMs), the development of a FM has mostly gone hand in hand with the development of its supportingtools.

For RQ4, we associate our findings from RQ2 and RQ3 with PEOU and PU. Whereas TAM predictsUFMi of a specific tool by measuring PEOU and PU, we directly interrogate past (like in Mohagheghiet al., 2012, Figure 2) and intended use of classes of FMs. We measure UFMi (C1, C2, C3, C4) in moredetail than TAM. Our approach relates to TAM for methods (Riemenschneider et al., 2002, Table 2) inas-much as we collect data for PEOU through asking about potential obstacles to the further use of FMs (C7)based on experience with past FM use (UFMp). For this, respondents are asked to rate the difficulty ofseveral known challenges to be tackled in typical FM applications. Furthermore, UFMi is known to becorrelated with PU. We then interpret the answers to RQ3 to examine the PEOU and, furthermore, inter-pret the answers to RQ2 to reason about PU. In Section 4.4, we discuss our questionnaire including thequestions for measuring the sub-constructs.

4.3 Study Participants and Population

Our target group for this survey includes persons with (1) an educational background in engineering andthe sciences related to critical computer-based or software-intensive systems, preferably having gainedtheir first degree, or (2) a practical engineering background in a reasonably critical system or productdomain involving software practice. We use (study or survey) participant and respondent as synonyms.We talk of FM users to refer to the part of the population that has already used FMs in one or anotherway. See Appendix A.1 and Table 8 for a more fine-grained analysis of the population.

4.4 Survey Instrument: On-line Questionnaire

Table 3 summarises the questionnaire we use to measure UFM (Table 2). The scales used for encodingthe answers are described in Table 4.

Although we do not collect personal data, respondents could leave us their email address if they want toreceive our results. We expect participants to spend about 8 to 15 minutes to complete the questionnaire.However, we thought it to be unnecessary in our case to instrument the questionnaire or our tooling toallow us to determine the time spent for submitting complete data points.

Face and Content Validity. We derived answer options from the literature, our own experience withFMs, SE research training, discussions with other SE researchers and colleagues from industry, pilotresponses, and coding of open answers. Particularly, the classification of FM methods (C3; Q5, Q6, Q10,Q11) and the list of obstacles or challenges (C7; Q13) were derived from our own training, literatureknowledge prior to this study, and experience as well as from occasional personal discussions with SEexperts from academia and industry. Most questions are half-open, allowing respondents to go beyondgiven answer options. We treat degree and relative frequency as 3-level Likert-type scales.

For each question, we provide “do not know” (dnk)-options to include participants without previousknowledge of FMs in any academic or practical context. If participants are not able to provide an answerthey can choose, e.g. “do not know”, “not yet used”, “no experience”, or “not at all”, and proceed. Thisway, we reduce bias by forced response. We indicate dnk-answers whenever we exclude them. Ourquestionnaire tool (Section 4.6) supports us with getting complete data points, reducing the effort to dealwith missing answers.

Page 8: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 8

Table 3: Summary of questions from the questionnaireId. Question or Question Template Scale (see Table 4) Sec. Fig.

Q1 In which application domains (C1) in industry or academiahave you mainly used FMs?

MC among domains 5.2 2

Q2 How many years of FM experience (including the study ofFMs, C5) have you gained?

Duration range in years 5.2 3

Q3 Which have been your motivations (C6) to use FMs? Degree per motivationalfactor

5.2 4

Q4 In which roles (C2) have you used FMs? MC among roles 5.3 5Q5 Describe your level of experience (C3) for 〈class of formal

description techniques〉.Level of experience perclass

5.3 6

Q6 Describe your level of experience (C3) for 〈class of formalreasoning techniques〉.

Level of experience perclass

5.3 7

Q7 I have mainly used FMs for (C4) ... Absolute frequency perpurpose

5.3 8

Q8 In which domains (C1) in industry or academia do you intendto use FMs?

MC among domains 5.4 9

Q9 In which roles (C2) would (or do) you intend to use FMs? MC among roles 5.4 10Q10 I (would) intend to use (C3) 〈class of formal description

techniques〉 〈this〉 often.Relative frequency perclass

5.4 11

Q11 I (would) intend to use (C3) 〈class of formal reasoningtechniques〉 〈this〉 often.

Relative frequency perclass

5.4 12

Q12 I (would) intend to use FMs for (C4) 〈purpose〉. Relative frequency perpurpose

5.4 13

Q13 For any use of FMs in my future activities, I consider〈obstacle〉 (C7) as 〈that〉 difficult.

Degree of difficulty perobstacle

5.5 16

MC. . . multiple-choice

Table 4: Scales used in the questionnaire

Name Values Type

degree ofmotivation

“no motivation”, “moderate motivation”, “strong motivation (or requirement)” L3

degree ofdifficulty

“not as an issue.”, “as a moderate challenge.”, “as a tough challenge.”, “I don’tknow.”

L3

experience level(duration-based)

“I do not have any knowledge of or experience in FMs.”, “less than 3 years”,“3 to 7 years”, “8 to 15 years”, “16 to 25 years”, “more than 25 years”

O

experience level(task-based)

“no experience or no knowledge”, “studied in (university) course”, “applied inlab, experiments, case studies”, “applied once in engineering practice”, “appliedseveral times in engineering practice”

O

frequency(absolute)

“not at all.”, “once.”, “in 2 to 5 separate tasks.”, “in more than 5 separate tasks.” O

frequency(relative)

“no more or not at all.”, “less often than in the past.”, “as often as in the past.”,“more often than in the past.”, “I don’t know.”

L3

choice single/multiple: (ch)ecked, (un)checked Nbold. . . express lack of knowledge or indecision; (N)ominal, (O)rdinal, Ln . . . Likert-type scale with n values

Page 9: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 9

Channel Type Examples & References

General panels SurveyCircle, www.surveycircle.comLinkedIn groups E.g. on ARP 4754, DO-178, FME, ISO 26262Mailing lists E.g. system safety (U Bielefeld, formerly U York)Newsletters BCS FACS; GI RE, SWT, TAVPersonal pages E.g. Facebook, Twitter, LinkedIn, XingResearchGate Q&A forums on www.researchgate.netXing groups E.g. Safety Engineering, RE

Table 5: Channels used for sampling

4.5 Data Collection: Sampling Procedure

We could not find an open, non-commercial panel of engineers. Large-scale panel services are eithercommercial (e.g. Decision Analyst, 2018) or they do not allow the sampling of software engineers (e.g.Leiner, 2014). Hence, we opt for a mixture of opportunity, volunteer, and cluster-based sampling. Todraw a diverse sample of potential FM users, we

1. advertise our survey on various on-line discussion channels,

2. invite software practitioners and researchers from our social networks, and

3. ask these people to disseminate our survey.

We examine C5, C1, C2, and C3 from Table 2 to check how well our sample covers the given conceptcategories. The better the coverage of these categories the wider is the range of analyses possible fromour data set. Less covered categories might indicate inappropriate concepts as well as the case that oursample just does not touch this fraction of the target population. Under the assumption that the sample isdrawn from the target population in a uniformly random fashion, we would be able to draw conclusionsabout the constitution of the target population. However, as noted, this assumption was in our case notcontrollable.

4.6 Data Evaluation and Analysis

For RQ1, we summarise the data and apply descriptive statistics for categorical and ordinal variablesin Section 5.3. We answer RQ2 by comparison of the data for the past and future views regarding thedomain (C1), role (C2), FM class (C3), and purpose (C4) in Section 5.4. Then, in Section 5.5, we answerRQ3 by

• describing the challenge difficulty ratings after associating one of (1) domain, (2) motivationalfactor, (3) role, (4) purpose, and (5) FM class with challenge (C7) and

• distinguishing (1) more experienced (ME, > 3 years) from less experienced respondents (LE,≤ 3 years), (2) practitioners (P, practised at least once) from non-practitioners (NP, not usedor only in course or lab), (3) motivated (M, moderately or strongly motivated by at least onespecified factor) from unmotivated respondents (NM, no motivating factor specified), (4) re-spondents’ past and future views, and (5) respondents with increased usage intent (II) from oneswith decreased usage intent (DI).

We apply association analysis between these categorical and ordinal variables, using pairs of matri-ces (e.g. Figure 17). We answer RQ4 by arguing from results for RQ1, 2, and 3.

Half-Open and Open Questions. We code open answers in additional text fields as follows: If we cansubsume an open answer into one of the given options, we add a corresponding rating (if necessary). Ifwe cannot do this then we introduce a new category “Other” and estimate the rating. Finally, we cluster

Page 10: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 10

0

5

10

15

W30/0

7/1

7

W34/0

8/1

7

W38/0

9/1

7

W42/1

0/1

7

W45/1

1/1

7

W49/1

2/1

7

W01/0

1/1

8

W05/0

2/1

8

W09/0

3/1

8

W14/0

4/1

8

W18/0

4/1

8

W22/0

5/1

8

W26/0

6/1

8

W30/0

7/1

8

W34/0

8/1

8

W38/0

9/1

8

W42/1

0/1

8

W45/1

1/1

8

W49/1

2/1

8

W00/0

1/1

9

W04/0

2/1

9

W08/0

3/1

9

W13/0

4/1

9

W17/0

4/1

9

W21/0

5/1

9

Responses

Figure 1: Distribution of responses over time

the added answers and split the “Other” category (if necessary). For Q13, we performed the latter stepcombined with independent coding (Neuendorf, 2016) to confirm that the understanding of the challengecategories is consistent among the authors of the present study. For MC questions, we eliminate thechoice of “I do/have not. . . ” options from the data if ordinary answer options where also checked.

Tooling. We use Google Forms (Google, 2018) for implementing our questionnaire (Appendix A.12)and for data collection (Section 4.5) and storage. For statistical analysis and data visualisation (Sec-tion 4.6), we use GNU R (The R Project, 2018) (with the packages likert, gplots, and ggplot2 andsome helpers from the “Cookbook for R” and the “Stack Exchange Stats” community6). Content anal-ysis and coding takes place in a spreadsheet application. A draft of Appendix A has been archived inGleirscher and Marmsoler (2018).

5 Execution, Results, and Analysis

In this section, we summarise the responses to the questions in Table 3 and answer the RQs 1, 2, and 3 asexplained in Section 4.1. To answer RQ1, we describe the sample in Section 5.2 and discuss some facetsof FM use in Section 5.3. For RQ2, we summarise data about past use and usage intent in Section 5.4.For RQ3, we analyse further data in Section 5.5.

5.1 Survey Execution

For data collection, we (1) advertised our survey on the channels in Table 5 and (2) personally invited >30 persons. The sampling period lasted from August 2017 til March 2019. In this period, we repeatedstep 1 up to three times to increase the number of participants. Figure 1 summarises the distribution ofresponses. The channels in Table 5 particularly cover the European and North American areas.

5.2 Description of the Sample (answering RQ 1)

A size estimation of the channels in Table 5 yields around 65K channel memberships (for some channelswe make a best guess but, e.g. for LinkedIn the counts are given). Assuming participants are, on average,member of at least three of the channels, we could have reached up to 20K real persons. Given a recentestimate of worldwide 23 million SE practitioners (Evans Data, 2018) and assuming that at least 1% aremission-critical SE practitioners, our population might comprise at least 230K persons, possibly around38K in the US and 61K in Europe.7 We received N = 216 responses resulting in an estimated response

6See http://www.cookbook-r.com and https://stats.stackexchange.com.7An estimation in Gleirscher et al. (2019) suggests that about 5% of the overall ICT/IS developer population are

embedded systems practitioners in critical and non-critical domains. Moreover, Evans Data (2018) and Wikipediacontributors (2018) describe data from 2016 and 2017, suggesting that 3.87 million (19%) SE practitioners live inthe US and about 13.3 million (39%) in Europe, the Middle East, and Africa. According to an analysis of data fromStack Overflow by ATOMICO (2019), there is a “software engineering talent pool” of about 6.1 million in Europe.

Page 11: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 11

otherMilitary systems not in the above domains

Process automationIndustrial machinery

I have not used FMs in any academic or industrial domain.Device industry

PlatformsBusiness information

Critical infrastructuresSupportive

Transportation

20 40 60 80

N = 216

Figure 2: (Q1) In which application domains in industry or academia have you mainly used FMs? (MC)

rate between 1 and 2% and a population coverage of at most 0.1% globally and 0.2% in the US and inEurope. About 40% of our respondents provided their email addresses, the majority from the US, UK,Germany, France, and a sixth from other EU and non-EU countries.

In the following, we summarise the responses to the questions about the application domain (Q1), thelevel of experience (Q2), and the motivations (Q3) of a FM user.

Guide to the Figures. For Likert-type ordered scales, we use centred diverging stacked bar charts (see,e.g. Figure 4) as recommended by Robbins and Heiberger (2011). The horizontal bars in each lineshow the answer fractions according to the legend at the bottom and are annotated with the percentagesof the left-most, middle, and right-most answer options. These bars are aligned by the midpoint ofthe middle group (for 3- and 5-level scales) or by the boundary between the two central groups (for 4-level scales). Bar labels often abbreviate the corresponding answer options in the questionnaire. Thequestionnaire copy in Appendix A.12 contains short definitions, explanations, and examples to clarify theanswer options. For sake of brevity, we do not repeat this information here. “M” denotes the median,“CI” the 95% confidence interval for the median calculated according to Campbell and Gardner (1988),“X” the number of excluded data points per answer option, and “NA” the number of invalid data points.

Q1: Application Domain. For each domain, Figure 2 shows the number of participants having experi-ence in that domain.8 Note that 180 of the respondents do have experience with applying FM in differentindustrial contexts, while only 36 have not applied FMs to any application domain. Medical healthcareis an example where participants could have checked more than one answer category because medicaldevices would belong to “device industry” and emergency management IT would belong to “critical in-frastructures”. See Appendix A.12 for more information about the answer categories.

Q2: FM Experience. Figure 3 depicts participants’ years of experience in using FMs, showing that thesample covers all experience levels. However, the fraction of respondents with no experience (i.e., cate-gory “0”) is comparatively low. According to Section 4.6, one third of the participants can be consideredLEs with up to three years of experience, and two thirds can be considered MEs with at least three yearsof experience (29 of those with even more than 25 years). A further analysis of the study participants’experience profile is available from Table 8 in Appendix A.1 on page 35.

Q3: Motivation. Figure 4 suggests that regulatory authorities play a subordinate role in triggering theuse of FMs. In contrast, intrinsic motivation (in terms of private interest) seems to be the major factor forusing FMs. For 9 respondents, none of the given factors was motivating at all. The 88 open responsesfor this question could either be subsumed in at least one of the given categories (65 in “Own (private)

8MC entails that the sum of answers can exceed N.

Page 12: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 12

0 1 − 3 3 − 7 8 − 15 16 − 25 > 25

Fre

quen

cy

0

10

20

30

40

50 N = 216

Figure 3: (Q2) How many years of FM experience (including the study of FMs) have you gained?

31%

23%

34%

27%

54%

52%

52%

50%

48%

35%

32%

19%

17%

12%

20%

29%

31%

40%

26%

31%

37%

Regulatory authorities (X=0, M=no,CI[no,moderate], NA=0)

Customers / scientific community (X=0,M=moderate, CI[moderate,moderate], NA=0)

Employer / research collaborators (X=0,M=moderate, CI[moderate,moderate], NA=0)

Superior / principal investigator (X=0, M=no,CI[no,moderate], NA=0)

Study or research program (X=0, M=moderate,CI[moderate,strong motivation], NA=0)

Own interest (X=0, M=moderate, CI[moderate,strongmotivation], NA=0)

On behalf of an FM (X=0, M=no, CI[no,moderate],NA=0)

100 50 0 50 100Percentage

Degree of motivation: no moderate strong motivation

Figure 4: (Q3) Which have been your motivations to use FMs?

interest”, 11 in other categories) or be declared as a comment (3) or not a further motivation (9). Hence,coding did not require an additional answer category to Q3.

5.3 Facets of Formal Methods Use (answering RQ 1)

In the following, we summarise the responses to the questions about the role of a user (Q4), use inspecification (Q5), use in analysis (Q6), and the underlying purpose (Q7) of such use.

Q4: Role. Figure 5 shows in which roles the respondents applied FMs. An analysis of the MC answersshows that 72% of the participants used FMs in an academic environment, as a researcher, lecturer, orstudent. 50% of the participants applied FMs in practice, as an engineer or consultant (see also Gleirscherand Marmsoler, 2018).

Q5: Use in Specification. The degree of usage of FMs for specification is depicted in Figure 6. Thereis an almost balanced proportion between theoretical and practical experience with the use of variousspecification techniques. Only the use of FMs for the description of dynamical systems seems to beremarkably low.

Q6: Use in Analysis. The use of FMs for analysis is depicted in Figure 7. Similar to specificationtechniques, we observe an almost balanced proportion between theoretical and practical experience withthe usage of various analysis techniques. Outstanding is the use of assertion checking techniques, such ascontracts. As expected from the observations for Q5, the use of FMs in computational engineering, suchas algebraic reasoning about differential equations, is again exceptionally low.

Page 13: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 13

I have not used FMs in any specific role.

Stakeholder of an FM tool or service provider

Consulting or managing practitioner in industry

External consultant

Researcher in industry

Engineering practitioner in industry

Lecturer, teacher, trainer, or coach

Bachelor, master, or PhD student

Researcher in academia

20 40 60 80 100

N = 216

Figure 5: (Q4) In which roles have you used FMs? (MC)

43%

46%

46%

72%

34%

34%

26%

15%

23%

19%

28%

13%

Predicative, relational, or algebraic (X=0,M=lab, CI[course,lab], NA=0)

Modal and temporal logic specification (X=0,M=lab, CI[course,lab], NA=0)

Process models (X=0, M=lab, CI[course,lab],NA=0)

Dynamical systems (X=0, M=course,CI[course,course], NA=0)

100 50 0 50 100Percentage

Experience: none course lab practiced practiced > 1

Figure 6: (Q5) Describe your level of experience with each of the following classes of formal descriptiontechniques.

36%

50%

45%

50%

56%

51%

56%

60%

66%

43%

36%

30%

29%

26%

24%

22%

21%

20%

22%

15%

25%

20%

18%

25%

21%

19%

13%

Abstract interpretation (X=0, M=course,CI[course,lab], NA=0)

Assertion checking (X=0, M=lab, CI[lab,lab],NA=0)

Process calculi (X=0, M=course,CI[course,course], NA=0)

Model checking, SMV (X=0, M=lab,CI[course,lab], NA=0)

Constraint solving (X=0, M=course,CI[course,lab], NA=0)

Generic theorem proving (X=0, M=course,CI[course,lab], NA=0)

Computational engineering, simulation (X=0,M=course, CI[course,course], NA=0)

Symbolic execution (X=0, M=course,CI[course,lab], NA=0)

Consistency checking (X=0, M=lab,CI[course,lab], NA=47)

100 50 0 50 100Percentage

Experience: none course lab practiced practiced > 1

Figure 7: (Q6) Describe your level of experience with each of the following classes of formal reasoningtechniques.

Page 14: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 14

37%

37%

43%

45%

69%

63%

63%

57%

55%

31%

Clarification (X=0, M=2 − 5, CI[1,2 − 5],NA=0)

Specification (X=0, M=2 − 5, CI[2 − 5,2 −5], NA=0)

Inspection (X=0, M=2 − 5, CI[1,2 − 5],NA=0)

Synthesis (X=0, M=0, CI[0,1], NA=0)

Assurance (X=0, M=2 − 5, CI[2 − 5,2 − 5],NA=0)

100 50 0 50 100Percentage

in 0 1 2 − 5 > 5 separate tasks

Figure 8: (Q7) I have mainly used FMs for ...

Business information

Platforms

Supportive

Critical infrastructures

Device industry

Military systems not in the above domains

other

Industrial machinery

Transportation

Process automation

I have not used FMs in any academic or industri...

I would not or do not intend to use FMs in any...

0 20 40 60 80 100 120 140

Application Domain (past)Application Domain (future)

Figure 9: Number of respondents using FMs by domain (past vs. intent)

Q7: Purpose. Figure 8 depicts the participants’ purposes to apply FMs. It seems that the respondentsemploy FMs mainly for assurance, specification, and inspection. Synthesis, on the other hand, to themseems to be only a subordinate purpose in the use of FMs.

5.4 Past Use versus Usage Intent (answering RQ 2)

We investigate the usage intent of FMs across various domains and roles as well as the participants’ intentto use various FMs and their intended purpose to use FMs.

Application Domain. Figure 9 compares the respondents’ past domains of FM application with theirintended domains (see Q8). This figure reveals two insights into the participants’ intentions to use FMs:(i) Fewer participants do not want to apply FMs in the future (19) than participants that have not usedFMs (36, see yellow bars). Ten participants fall into both categories, they have not used FMs and donot intend to use FMs. (ii) The intended application of FMs outperforms the current application of FMsacross all domains. Hence, there is a tendency to increase the use of FMs across all application domains.

Role. Figure 10 compares the participants’ roles in which they applied FMs in the past with their in-tended role to apply FMs in the future (see Q9). Similar to the results for the application domain, weobserve that some participants, who have not applied FMs in any role so far, intend to apply such meth-ods in the future. However, the comparison reveals that academic disciplines (i.e., researcher and lecturer)seem to be stable. There is only a small difference between the number of participants who applied FMs

Page 15: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 15

Bachelor, master, or PhD studentConsulting or managing practitioner in industry

External consultantResearcher in academia

Researcher in industryLecturer, teacher, trainer, or coach

Stakeholder of an FM tool or service providerEngineering practitioner in industry

I have not used FMs in any specific roleI do not or would not intend to use FMs in any ...

0 20 40 60 80 100

Role (past)Role (future)

Figure 10: Number of respondents applying FMs by role (past vs. intent)

28%

33%

35%

44%

72%

67%

65%

56%

Predicative, relational, or algebraic specification (X=26,M=equally, CI[equally,equally], NA=0)

Modal or temporal logic specification (X=27, M=equally,CI[equally,equally], NA=0)

Process models (X=27, M=equally, CI[equally,equally], NA=0)

Dynamical system models (X=39, M=equally,CI[equally,equally], NA=0)

100 50 0 50 100Percentage

Relative frequency: no more less equally more often

Figure 11: (Q10) I (would) intend to use ...

in academic domains in the past and the number of participants who want to apply such methods to thesedomains in the future.

In contrast, there is a significant increase in the number of participants aiming to apply FMs, across allindustrial roles.

Furthermore, the diagram shows a strong contrast between past and indented use in the category “Bach-elor, master, or PhD student.” We can see several reasons for this difference. From the respondents who“used FMs as a student,” many (i) might not be able to “use FMs as a student” anymore because of havinggraduated, (ii) did not find FMs or the way FMs were taught as helpful, or (iii) moved into a businessdomain with no foreseeable demand for the application of FMs.

Q10: Intended use for Specification. Figure 11 depicts the respondents’ intended future use of variousFMs for system specification (i.e., formal description techniques). The figure shows an almost equalamount of participants aiming to decrease (i.e., “no more” and “less”) and increase (i.e., “more often”)their use of FMs for specification. Only dynamical system models again seem to be an exception: moreparticipants want to decrease their use of this technology, compared to participants who want to increaseit.

Q11: Intended use for Analysis. The respondents’ intended use of FMs for the analysis of specifica-tions (i.e., formal reasoning techniques) is depicted in Figure 12. Except for process calculi, we observea general tendency of the participants to increase their future FM use.

Q12: Intended Purpose. Figure 13 indicates why respondents intend to apply FMs. Again, there is atendency of the participants to increase FM use across all listed purposes.

Page 16: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 16

19%

25%

26%

27%

28%

31%

32%

38%

42%

81%

75%

74%

73%

72%

69%

68%

62%

58%

Abstract interpretation (X=37, M=equally,CI[equally,equally], NA=0)

Assertion checking (X=17, M=equally, CI[equally,moreoften], NA=0)

Process calculi (X=46, M=equally, CI[equally,equally],NA=0)

Model checking, SMV (X=21, M=equally, CI[equally,moreoften], NA=0)

Constraint solving (X=23, M=equally, CI[equally,moreoften], NA=0)

Generic theorem proving (X=38, M=equally, CI[equally,moreoften], NA=0)

Simulation (X=28, M=equally, CI[equally,equally], NA=0)

Symbolic execution (X=26, M=equally, CI[equally,moreoften], NA=0)

Consistency checking (X=23, M=equally, CI[equally,moreoften], NA=47)

100 50 0 50 100Percentage

Relative frequency: no more less equally more often

Figure 12: (Q11) I (would) intend to use ...

18%

18%

20%

22%

30%

82%

82%

80%

78%

70%

Clarification (X=22, M=equally, CI[equally,moreoften], NA=0)

Specification (X=12, M=equally, CI[equally,moreoften], NA=0)

Inspection (X=20, M=equally, CI[equally,more often],NA=0)

Synthesis (X=34, M=equally, CI[equally,more often],NA=0)

Assurance (X=18, M=equally, CI[equally,more often],NA=0)

100 50 0 50 100Percentage

Relative frequency: no more less equally more often

Figure 13: (Q12) I (would) intend to use FMs for ...

Q7 and Q12: Comparison of Code- and Model-based FMs. In the following, we regard practition-ers with experience level “applied several times in engineering practice” or “applied once in engineeringpractice” and frequency “applied in 2 to 5 separate tasks” or “applied in more than 5 separate tasks”(see Table 4). We compare users of code-based FMs (CBs; including “abstract interpretation”, “asser-tion checking”, “symbolic execution”, “consistency checking”; with N=128) with users of model-basedFMs (MBs; including “process calculi”, “model checking”, “theorem proving”, and “simulation”; withN=114). While some of the FM classes can be seen as both, code- and model-based, we made a choicebased on our experience but left out “constraint solving” because it is a fundamental technique intensivelyapplied in both.

The comparison of past and future use for code-based (top half of Figure 21 in Appendix A.4) and model-based FMs (bottom half of Figure 21), for example, in inspection (e.g. error detection, bug finding) showsthe following:

• CBs show slightly more frequently an increased intent (the “more often” group) than MBs; forboth sub-groups, respondents with 2 to 5 and with more than 5 past uses.

• MBs show slightly more frequently a decreased intent (the “no more” group) than CBs.

Looking at assurance (e.g. proof, error removal) shows the following:

Page 17: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 17

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

19

23

34

33

19

7

6

11

34

15

0

10

13

24

23

15

10

2

13

26

11

0

19

22

33

24

18

12

4

15

33

18

2

6

5

12

10

7

5

2

8

14

11

1

13

11

26

20

13

8

4

12

26

9

0

24

24

43

35

20

14

4

14

38

16

8

13

13

21

18

13

7

4

11

18

8

1

16

17

34

22

19

10

4

13

33

11

1

14

16

29

20

14

9

4

7

25

8

1

15

12

26

22

13

6

4

7

24

12

0

10

8

17

14

7

5

2

6

17

14

3

20

14

33

19

17

9

2

12

31

14

1

18

13

32

24

12

12

2

6

25

7

4

Business information

Critical infrastructures

Device industry

I have not used FMsin any academic or

industrial domain

Industrial machinery

Military systems not inthe above domains

Other

Platforms

Process automation

Supportive

Transportation

Abs

trac

t int

erpr

etat

ion

Ass

ertio

n ch

ecki

ng

Com

puta

tiona

len

gine

erin

g, s

imul

atio

n

Con

sist

ency

che

ckin

g

Con

stra

int s

olvi

ng

Dyn

amic

al s

yste

ms

Gen

eric

theo

rem

pro

ving

Mod

al a

nd te

mpo

ral l

ogic

spec

ifica

tion

Mod

el c

heck

ing,

SM

V

Pre

dica

tive,

rela

tiona

l, or

alg

ebra

icsp

ecifi

catio

n

Pro

cess

cal

culi

Pro

cess

mod

els

Sym

bolic

exe

cutio

n

FM Class

App

licat

ion

Dom

ain

Figure 14: Approximation (likelihood) of practised use (UFMp) by FM class and application domain

• MBs show slightly more frequently an increased intent than CBs when looking at respondentswho have used FMs more than 5 times. However, MBs indicate slightly less frequently anincreased intent than CBs when looking at respondents with 2 to 5 uses.

• CBs indicate more dnks after 2 to 5 uses and slightly more frequently a decreased intent after 5uses in comparison with MBs.

Q1, Q5, and Q6: Practised FM Classes by Application Domain. We asked respondents about theiruse of each FM class independent of the application domain and about their general use of FMs in eachsuch domain. Hence, we can only approximate past usage per FM class and application domain assum-ing that the overall usage per respondent is uniformly distributed among the specified FM classes anddomains. For that, we interpret (and count) each respondent who specifies a domain in combination with“applied once in engineering practice” or “applied several times in engineering practice” for an FM classas a practitioner who has used (UFMp) or, respectively, wants to use (UFMi) FMs of that class in thatdomain. More generally, we count a respondent who specifies n domains, say d1 to dn, in combinationwith “applied once in engineering practice” or “applied several times in engineering practice” for m FMclasses, say c1 to cm, as a practitioner who has used (UFMp) or, respectively, wants to use (UFMi) FMsof the classes c1 to cm in the domains d1 to dn. Figures 14 and 15 show these approximations for UFMpand UFMi.

5.5 Perception of Challenges (answering RQ 3)

Table 6 lists the FM challenges subject to discussion, their background, and literature referring to them.We apply the procedure described in Section 4.6.

Page 18: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 18

Table 6: Feedback on given and additional challenges (see Appendix A.8 for a full list of references)

Challenge Name & Description Src. Supported by (oldest,newest)

Findings for RQ3 (Section 5.5)

Scalability: Useful in handlinglarge and technologicallyheterogeneous systems

Q 7 studies, e.g. Hall(1990) and Miller et al.(2010)

toughest in Figure 16; by Ps more thanby NPs; when using FMs for assuranceand clarification; independent of FMclass

Skills & Education: Methodsknown (little misconception);trained and experienced usersavailable

Q 13 studies, e.g. Bjorner(1987), Bicarreguiet al. (2009)

2nd toughest; agreed by LEs and MEs;largely independent of FM class;comparatively small tough-proportionsby Ms

Transfer of Proofs: Relationbetween models and reality (e.g.code), handling incompletespecifications

Q 8 studies, e.g. Jackson(1987) and Parnas(2010)

Agreed by LEs and MEs; top-rated byDIs and NMs; largely independent ofFM class

Reusability: Parametric proofs,reusable specifications andverification results

Q Barroca andMcDermid (1992) andBowen and Hinchey(1995b)

Top-rated by tool provider stakeholdersand lectures

Abstraction: Useful and correct(automated) abstractions fromirrelevant detail (forcomprehension and validation)

Q 12 studies, e.g.Jackson (1987), Milleret al. (2010), andParnas (2010)

Varies notably across FM classes

Tools & Automation: Usefulnotations and trustworthy tools(for manipulation, checking,collaboration, documentation)

Q 16 studies, e.g. Bjorner(1987) and O’Hearn(2018)

Top-rated by DIs; but comparativelysmall tough-proportions frompractitioners

Maintainability: Stable proofs,easily modifiable specifications,and adaptable verification results

Q Barroca andMcDermid (1992),Knight et al. (1997),and Parnas (2010)

Comparatively small tough-proportionsfrom practitioners

Resources: Sufficient resources,good cost-benefit ratio (despiteadoption, training, licenses)

4R 11 studies, e.g. Hall(1990) and Woodcocket al. (2009)

No detailed data was collected:Because these challenges werementioned several times each,we classify them to be at leastof moderate difficulty.

Process Compatibility:Integration into existing process,method culture, standards, andregulations

6R 12 studies, e.g. Bjorner(1987) and O’Hearn(2018)

Practicality & Reputation:Benefit awareness and sufficientempirical evidence for benefits

7R 6 studies, e.g. Lai andLeung (1995) andParnas (2010)

Src.. . . source, Q . . . in questionnaire, nR . . . additionally raised by n Respondents

Page 19: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 19

●●

21

24

24

38

30

20

21

35

18

1

0

25

27

32

39

32

22

26

46

21

1

1

20

21

26

33

31

22

28

42

24

0

1

16

25

22

33

28

19

24

38

21

0

1

22

34

34

40

33

24

22

43

22

0

0

37

39

40

60

51

35

41

61

37

0

3

21

18

19

25

23

14

21

29

17

1

0

35

41

45

58

50

38

40

66

34

1

0

37

49

50

68

59

38

41

70

39

1

1

31

35

37

47

37

27

30

47

27

1

1

23

31

30

42

36

25

26

49

25

1

3

38

46

47

59

51

35

39

64

36

1

2

29

39

33

48

38

26

32

46

29

1

2

Business information

Critical infrastructures

Device industry

I would not or do notintend to use FMs in any

academic or industrialdomain

Industrial machinery

Military systems not inthe above domains

Other

Platforms

Process automation

Supportive

Transportation

Abs

trac

t int

erpr

etat

ion

Ass

ertio

n ch

ecki

ng

Com

puta

tiona

len

gine

erin

g, s

imul

atio

n

Con

sist

ency

che

ckin

g

Con

stra

int s

olvi

ng

Dyn

amic

al s

yste

ms

Gen

eric

theo

rem

pro

ving

Mod

al a

nd te

mpo

ral l

ogic

spec

ifica

tion

Mod

el c

heck

ing,

SM

V

Pre

dica

tive,

rela

tiona

l, or

alg

ebra

icsp

ecifi

catio

n

Pro

cess

cal

culi

Pro

cess

mod

els

Sym

bolic

exe

cutio

n

FM Class

App

licat

ion

Dom

ain

Figure 15: Approximation (likelihood) of increased usage intent (UFMi) by FM class and applicationdomain

9%

11%

12%

13%

18%

12%

17%

67%

52%

49%

44%

42%

38%

37%

23%

37%

39%

43%

39%

50%

46%

Scalability (X=26, M=tough, CI[tough,tough], NA=0)

Proper abstractions from irrelevant details (X=36,M=moderate, CI[moderate,tough], NA=0)

Maintainability of verification results (X=36,M=moderate, CI[moderate,tough], NA=0)

Reusability of verification results (X=37, M=moderate,CI[moderate,tough], NA=0)

Transfer of verification results from (X=31,M=moderate, CI[moderate,tough], NA=0)

Automation or tool support (X=27, M=moderate,CI[moderate,tough], NA=0)

Skills and education (X=23, M=tough, CI[tough,tough],NA=0)

100 50 0 50 100Percentage

Degree of difficulty: not an issue moderate tough

Figure 16: (Q13) For any use of FMs in my future activities, I consider 〈obstacle〉 as [not an|a moderate|atough] issue.

General Ranking (Q13). Figure 16 shows the respondents’ ratings of all challenges. Most of thembelieve that scalability will be the toughest challenge and maintainability is considered the least difficultof all rated obstacles. For reuse of proof results, proper abstractions, and tool support, the participantsdistribute more uniformly across moderate and high difficulty.

In the following, we compare specific groups of respondents by how they perceive the difficulty of thevarious challenges. We group respondents according to the criteria in Section 4.6 and according to the

Page 20: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 20

role, motivating factor, FM class, and purpose they specified. Appendix A.6 provides some backgroundmaterial for the following association analyses.

Less Experienced (LE) versus More Experienced (ME) Respondents (Q2). The comparison of thedifficulty ratings of LEs with the ratings of MEs shows that (i) LEs less often perceive the given challengesas tough, (ii) MEs significantly more often rate scalability as tough, (iii) both groups show the closestagreement on transfer of verification results and skills and education.

Non-Practitioners (NP) versus Practitioners (P) by Past Purpose (Q7). The perception of skills andeducation and scalability as the most difficult challenges is largely independent of the purpose, again Psattributing more significance to scalability. Scalability, the forerunner in Figure 16, exhibits the mosttough-ratings from NPs in synthesis and from Ps in assurance and clarification (see the top half of Fig-ure 22 in Appendix A.6).

Decreased Intent (DI) versus Increased Intent (II) by Purpose (Q12). The comparison of the dif-ficulty ratings of respondents with no or decreased intent to use FMs for a specific purpose and of re-spondents with equal or increased intent shows: (i) Scalability and skills and education, both forerunnersin Figure 16, show the most tough-ratings from IIs for assurance (67%) and inspection (66%) and fromDIs for synthesis (53%). (ii) The trend in Figure 16 is more clearly observable from IIs than from DIs,where transfer of verification results and automation and tool support seem to be tougher than skills andeducation.

Non-Practitioners (NP) versus Practitioners (P) by FM Class (Q5, Q6). The top half of Figure 17shows for NPs, the trend in Figure 16 is largely independent of the FM class, except for consistencychecking and logic leading with tough proportions of 49%.

The bottom half of Figure 17 shows for Ps, difficulty ratings across FM classes vary more: The foremostchallenges in Figure 16 received the most tough-ratings from users of process models, dynamical systems,process calculi, model checking, and theorem proving. Difficulty ratings of users are often centred onmoderate or tough, proper abstraction and skills and education show a comparatively wide variety acrossFM classes.

The histograms in the lower right corners in Figure 17 indicate that (i) NPs’ difficulty ratings vary lessthan Ps’ ratings, (ii) NPs’ ratings are more independent from the FM classes, and (iii) NPs’ difficultyratings are lower on average than Ps’ ratings. Appendix A.6 contains several such association matriceswith more detailed data in the matrix cells.

Decreased Intent (DI) versus Increased Intent (II) by FM Class (Q10, Q11). The trend in Figure 16is supported by many tough ratings (48%) for transfer of verification results from DIs in consistencychecking. However, DIs in process calculi provide comparatively many tough-ratings (39%) for the gen-erally low-ranked automation and tool support. Assertion checking exhibits comparatively low tough-proportions across all challenges whereas process calculi exhibit comparatively high tough-ratings. Mir-roring the trend in Figure 16, IIs show less variance than DIs across all FM classes.

Unmotivated (NM) versus Motivated (M) respondents by Motivating Factor (Q3). Respondentswith moderate to strong motivation to use FMs more likely identify the given challenges as moderate totough, regardless of the motivating factor. The trend in Figure 16 seems explainable by many tough rat-ings from respondents motivated by regulatory authorities (69%), not motivated by tool providers (56%),and not motivated by superiors/principal investigators (56%, see Figure 24 in Appendix A.6). NMs’tough-ratings are notably lower than Ms’ tough-ratings.

Past and Future Views by Role (Q4, Q9). Although participants show role-based discrepancies be-tween their past and intended use of FMs (Figure 10), the perception of difficulty of the rated challengesseems to be largely similar, following the trend in Figure 16. The high ranking of scalability (and

Page 21: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 21

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Consistency checking

Symbolic execution

Computational engineering, simulation

Generic theorem proving

Constraint solving

Model checking, SMV

Process calculi

Assertion checking

Abstract interpretation

Dynamical systems

Process models

Modal and temporal logic specification

Predicative, relational, or algebraicspecification

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

15

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across FMs (users not practicing FMs, past)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Consistency checking

Symbolic execution

Computational engineering, simulation

Generic theorem proving

Constraint solving

Model checking, SMV

Process calculi

Assertion checking

Abstract interpretation

Dynamical systems

Process models

Modal and temporal logic specification

Predicative, relational, or algebraicspecification

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

010

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across FMs (users practicing FMs, past)

Figure 17: Difficulty of challenges (cols): NPs (top) compared to Ps (bottom) by class of used FM (rows).Legend: In each cell of an association matrix, both the solid vertical line and the colour (gradient fromred to white) represent the tough proportions (from 0 to 100%), with the dotted vertical line marking the50% margin. The histogram (to the lower right corner of each matrix) counts the combinations (cells) ineach 5%-band of tough ratings. E.g. ∼70% of “process calculi” users perceive “scalability” as a toughchallenge.

Page 22: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 22

reusability of verification results) is supported by many tough-ratings from tool provider stakeholders forthe past view and many from lecturers for the future view. Respondents not having used FMs or notplanning to use FMs exhibit the lowest tough-ratings but also the highest fractions of dnk-answers.

Past and Future Views by Domain (Q1, Q8). The trend in Figure 16 is underpinned by highest tough-proportions for respondents from the transportation, military systems, industrial machinery, and support-ive domains.

6 Discussion

In this section, we discuss and interpret our findings, relate them to existing evidence, outline generalfeedback on the questionnaire, and critically assess the validity of our study.

6.1 Findings and their Interpretation

The following (F)indings are based on the data summarised and analysed in the Sections 5.2 to 5.4. Allfindings are then collected in Table 7 on page 25.

Findings for RQ 1. (F1) Regulatory authorities with their norms, codes, or policies represent only aminor motivating factor to use FMs. Intrinsic motivation (maybe market-triggered) seems to be stronger.This finding is consistent with what we know from the literature survey in Gleirscher et al. (2019): FMsare not formally required by corresponding standards today, not even for the highest safety integritylevels. If regulatory authorities change their recommendations to requirements, then this might spike as amotivating factor.

(F2) The low fraction of respondents with no experience in Figure 3 may have been caused (1) by ourchoice of expert channels in Table 5 where the likelihood of encountering FM users is probably higherthan in more generic SE channels (e.g. Stack Overflow) and (2) by the fact that SE students will usuallyhave an FM course or some lectures about FMs such that they would choose “1–3 years” in Q2 and“studied in course” in Q5.

(F3) We observe the least use of FMs in computational engineering and for reasoning about dynamicalsystems, for example, reasoning about the correctness of algorithms, and their implementation in embed-ded software, controlling such systems. One explanation for this is that our sample mainly comprisessoftware and systems engineers who will work less intensively with such FMs than, for example, me-chanical or control engineers. Another explanation is that such FMs are still less widely known, less welldeveloped, or less well supported by tools than FMs focusing on the reasoning about pure software.

Findings for RQ 2. (F4) It seems that in all given domains (Figure 9, except for other) respondentsintend to increase their future use of FMs. Moreover, we observe that this tendency is independent ofthe particular FM class (except process calculi) or purpose. The data also suggest that the use of FMs byteachers and researchers is saturated. This saturation indicates a stable intent to teach FMs, to performresearch in FMs, or to otherwise use FMs in teaching or research. However, there is an increased intentto apply FMs in industrial contexts in the future. One explanation could be that engineers have alreadywanted to use FMs but have not had the opportunity or were not told or permitted to do so. Anotherexplanation for an increased intent of FM non-users could be due to some bias when answering questionsabout whether someone would do (e.g. try out) something.

(F5) Our data suggest that experience in using a certain FM class is positively associated with the intent touse this FM class in the future. To investigate this suspicion, we analysed the intended use of a FM classbased on the experience of participants in using this class (also by association analysis as described inSection 4.6). We observe that the more experience one has with using a specific FM class, the more likelythey will apply it in the future (see the two charts in the Appendices A.3 and A.5). No experience witha specific FM class correlates with a low intent to use that class. Participants not having used FMs and,

Page 23: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 23

Bachelor, master, or PhD studentConsulting or managing practitioner in

External consultantResearcher in academia

Researcher in industryLecturer, teacher, trainer, or coach

Stakeholder of an FM toolEngineering practitioner in industry

I have not used FMs

Number of non−practitioners (MC)

0 5 10 15 20 25

Figure 18: The past role profile of the 46 non-practitioners (out of 216 respondents) helps to explainfinding F8

hence, unfamiliar with them might not have had the need in the first place. Only little experience with acertain FM class significantly increases the intent to apply it again in the future. Similar observations canbe made for the use of FMs in general for a specific purpose.

(F6) The differences in past and intended use between code- and model-based FMs (Section 5.4), forexample, when looking at inspection and assurance, are marginal. Moreover, we cannot find a signif-icant difference or a trend between these two categories of FMs when considering different purposes,experience levels, and usage frequencies.

(F7) The approximation in the Figures 14 and 15 allows the, albeit vague, interpretation of the numbersas the likelihood that respondents have used (Figure 14) or want to use (Figure 15) a particular FM ina particular domain. Assuming this model, Figure 15 indicates the highest likelihoods of an increasedUFMi for methods such as “assertion checking”, “constraint solving”, “model checking”, and “symbolicexecution” in domains such as “transportation”, “critical infrastructures”, and the “device industry”.

Findings for RQ 3. (F8) Scalability and skills and education lead the challenge ranking, independentof the domain, FM class, motivating factor, and purpose. Practitioners see scalability as more problematicthan non-practitioners, whereas non-practitioners perceive skills and education as more problematic thanpractitioners. Figure 18 may explain the latter by showing a high fraction of students among the 46non-practitioners.

(F9) Maintainability of proof results or other verification artefacts was found to be the least difficult chal-lenge. However, in the lower half of Figure 17, the challenge column “maintainability” shows relativelylow frequencies for “modal and temporal logic” and “model checking” (possibly because of the highlevel of automation) whereas “theorem proving” (possibly because of a low level of automation) and“constraint solving” (possibly because of being too versatile or generic for the present purpose) showthe highest frequencies of tough ratings. See Figure 26 in the Appendix A.6 for more details. (F10)Reusability of proof results was rated as tough by several practitioner groups.

(F11) FM users with decreased usage intent rate tool deficiencies as their top obstacle to FM adoption.(F12) Furthermore, our respondents raised three additional challenges (i.e., resources, process compati-bility, and practicability & reputation) which we cross-validated with the literature (see highlighted rowsin Table 6). The fact that these obstacles were mentioned several times in addition to the given obstacles

Page 24: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 24

justifies them to be highly relevant and at least moderate. However, our data does not allow to rank themmore precisely.

(F13) Challenges are perceived as moderate or tough, largely similarly between the pairs of groups wedistinguish in Section 4.6.

(F14) With 72% of tough ratings for scalability, process calculi (e.g. ACP, CCS, CSP) perform in themidfield despite their high reputation as compositional methods. Scalability of process models (e.g. Petrinets, Mealy machines, labelled transition systems, Markov models) is also ranked in the middle field oftough challenges. The ranking of these models, however, is unsurprising in the light of the difficult scal-ability of model checking, a frequently used verification technique for process models and the leader inthis ranking (cf. Figure 17). One explanation for the high number of tough-ratings from NPs in synthesiscould be that NPs might either not associate FMs with synthesis in general, or because automated synthe-sis of sophisticated artefacts is known to be an unsolved problem in many cases, independent of the useof FMs.

6.2 Relationship to TAM for Methods (answering RQ 4)

In analogy to the reasoning in Davis (1989), an increased positive experience with practically applyingFMs forms a high degree of PU (Section 2). Davis (1989, pp. 329, 331) observed that current and intendedusage are significantly correlated with PU, less with PEOU. In fact, F4 suggests an increased intent touse FMs in the future. Moreover, F5 suggests a positive association of the degree of experience withUFMi, that is, more experience increases the intent. (F15) Because the use of FMs is not mandatory formost respondents, a likely explanation for an increased intent (UFMi) is that our respondents perceive theusefulness of FMs to be more positive than negative.

Inspired by Riemenschneider et al. (2002), in the last paragraph of Section 4.2, we justify the use ofchallenge scales to collect data for PEOU and PU. We justify the validity of the FM-specific challengescale using the studies in Table 1. The column “supported by” in Table 6 indicates studies discussingthe corresponding challenges. From these discussions, we infer that tackling these challenges contributesto an increased EOU and U. First, the studies suggest that FMs are easier to use if users have sufficientskills and education, if the methods scale to large systems, if mature tools and automation are available,and if proofs are easily maintainable and reusable. Second, the studies suggest that FMs are more usefulif they are compatible with the process, if their cost-benefit ratio is low, if their abstractions are correctand expressive, and if proofs can be correctly transferred to reality. Hence, these challenges representFM-specific substrata (Davis, 1989, p. 325) of EOU and U for FMs. Moreover, a high degree of PEOUcorresponds to an increased positive user experience with FMs which translates to a low proportion oftough ratings for the obstacles measured in Q13. However, from F13, we observe that respondents ratemost challenges as moderate to tough, largely independent of other variables (F8).

(F16) Overall, it thus seems that our respondents perceive the ease of use of FMs to be more negativethan positive. According to Table 6, many of the surveyed studies discuss skills & education (12 studies)and tools & automation (16) as important challenges. Moreover, Figure 16 suggests that conceptualdifficulties (possibly, from a lack of education and training, from difficulties in FM teaching, from a lackof FM students) seem to be at least as responsible for the negative ease of use as the lower ranked tooldeficiencies. Indeed, in a recent discussion of “push-button verifiers”, O’Hearn (2018) highlights thatboth conceptual expertise and tool deficiencies are still significant bottlenecks. However, an investigationof respondents’ experiences with FM tools in comparison to their experiences with FM concepts goesbeyond the possibilities of the data collected for this study.

6.3 Relationship to Existing Evidence

Our systematic map shows that our list of challenges is completely backed by substantial literature (seeTable 6) raising and discussing these challenges. (F17) However, the fact that maintainability andreusability were least covered by our literature is, on the one hand, in line with F9 but, on the otherhand, not with F10 and typical cultures of reuse in practice.

Page 25: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 25

Table 7: Summary of findings per research question

RQ1: In which typical domains, for which purposes, in which roles, and to what extent have FMs beenused?F1 Intrinsic motivation to use FMs is stronger than norms or codes of regulatory authorities.F2 The fraction of respondents with no experience at all is comparatively low.F3 Respondents use FMs the least in computational engineering and for dynamical systems.

RQ2: Which relationships can we observe between past experience in using FMs and intent to use FMs?F4 Increased intent to use FMs observable across all application domains.F5 Amount of experience is positively associated with the strength of usage intent.F6 The responses do not show any significant differences between code- and model-based FMs.F7 Respondents show high likelihoods of an increased intent to use FMs such as “model checking” or “asser-

tion checking” in areas such as “transportation” or “critical infrastructures”.

RQ3: How difficult do study participants perceive widely known FM challenges?F8 Scalability and skills & education lead the challenge difficulty ranking.F9 Maintainability of proof results is found to be the least worrying challenge.F10 Reusability of proof results is rated as tough by several practitioner groups.F11 FM users with decreased usage intent rate tool deficiencies as their top obstacle.F12 Respondents identified resources, process compatibility, and reputation as further obstacles.F13 All considered challenges are generally perceived as moderate or tough.F14 Among the FM classes, process models are most positively associated with tough scalability.

RQ4: What can we say about the perceived ease of use and the perceived usefulness of FMs?F15 Respondents perceive the usefulness of FMs as mainly positive and intend to increase their use.F16 Respondents perceive the ease of use of FMs as mainly negative.

Relationship to Existing Evidence (from the literature):F17 Proof maintainability and reusability are least covered by the literature.F18 We repeat Austin and Parkin (1993), excluding benefit analysis but with a broader sample and more de-

tailed questions.

Beyond the general findings about FM benefits in Austin and Parkin (1993), we steered our half-openquestionnaire towards a refined classification of responses, comparing past with intended use, and inter-rogating recently perceived obstacles among a methodologically and geographically more diverse sample.Their sample mainly covers Z and VDM users in the UK. Our questionnaire has less focus on representa-tion and methodology and excludes both questions on benefits and on suggestions to overcome obstacles.Regarding the latter, Austin and Parkin (1993) mention the improvement of education and standardisation,the preparation of case studies, and the definition of FM effectiveness metrics.

(F18) The report of Austin and Parkin (1993) from the National Physical Laboratory archive was unfor-tunately no more available to us. We finally managed to get access to a paper copy provided by a friendlycolleague. This, however, only happened after conducting this survey. Anyway, we found that our conclu-sions are nearly identical to Austin and Parkin’s. The data from Figure 16 and Table 6 confirms that manyof the obstacles (i.e., limitations and barriers) they identified back in 1991/2 remain (e.g. understandingthe notation and the underlying mathematics, resistance to process changes), some have been lightly ad-dressed (e.g. lack of cost/benefit evidence) and some have been more strongly addressed (e.g. lack ofexpressiveness, lack of appropriate tools). Not mentioned in Austin and Parkin (1993) is scalability, ratedby our respondents to be the toughest obstacle.

F5 is in line with other observations in Bicarregui et al. (2009) and Woodcock et al. (2009) that the re-peated use of a FM results in lower overheads (i.e., an experienced effort or cost reduction and improvederror removal), up to an order of magnitude less than its first use (Miller et al., 2010). Finally, our studygeneralises the main findings about barriers in Davis et al. (2013) to several geographies and applica-tion domains, however, using an on-line questionnaire instead of interviews and not asking for barriermitigations.

Page 26: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 26

6.4 Threats to Validity

We assess our research design with regard to four common criteria (Shull et al., 2008; Wohlin et al.,2012). Per threat ( ), we estimate its criticality (minor or major), describe it, and discuss its partial (◦) orfull (X) mitigation.

6.4.1 Construct Validity

Why would the construct (Section 4.2) appropriately represent the phenomenon?

maj : Inappropriate questions and conceptual misalignment / To support face validity, we applied ourown experience from FM use to develop a core set of questions. For the design of our questionnaire,we use feedback from colleagues, from respondents we personally know, and from the general feedbackon the survey to improve and support content validity. A positive comparison with the questionnaire inAustin and Parkin (1993) finally confirms the appropriateness of our questions. However, we might haveneeded additional questions to check for conceptual alignment, for example, to more precisely determinewhether the respondents’ understanding of FMs and of the use or application of FMs closely matchesours. However, from 18 respondents giving feedback on our questionnaire, only one commented on thedefinition and one on the classification of FMs. That suggests that many respondents did not have or werenot aware of misunderstandings worth mentioning. ◦

min : Questionnaire limited for measurement of PEOU (e.g. per FM class) and PU / We avoid derivingconclusions specific to a FM or a corresponding tool from our data. X

min : Bias by omitted scale values (e.g. FM class, domain, purpose) / Respondents are encouraged toprovide open answers to all questions, helping us to check scale completeness. Between 8% and 40% ofthe respondents made use of the text field “Other.” Our systematic map confirms that we have not listedunknown challenges in Q13. We identified three additional challenges via open answers and the litera-ture. We believe to have achieved good criterion validity through questions and scales for distinguishingimportant sub-groups (see Section 4.6) of our population. X

min : Educational background asked indirectly / We approximate what we need to know by using datafrom Q1, Q3, Q4, and Q5. X

6.4.2 Internal Validity

Why would the procedure in Section 4 lead to reasonable and justified results?

min : Incomplete data points / After the 47th response, feedback from colleagues and respondentsresulted in an extension of Q3 with the option “on behalf of FM tool provider” (Figure 4) and of Q6 andQ11 with the option “consistency checking” (Figure 7). The enhancement of 169 complete data points to216 maintained all trends. X

min : Duplicate & invalid answers / To identify intentional misconduct, we checked for timestampanomalies and for duplicate or meaningless phrases in open answers. Voluntarily provided email ad-dresses (90/220) indicate only 4 double participants. We remove these 4 data points from our data set.

Google Forms includes data points only if all mandatory questions are answered and the submit button ispressed. We also performed a consistency check of MC questions and corrected 5 data points where “Ido/have not. . . ” was combined with other checked options. X

min : Inter- vs. intra-UFM inference / Our study design is not suitable for “inter-UFM predictions”, forexample, to predict that (dis)satisfied model checking practitioners have an increased (a decreased) intentto use theorem proving. However, the argumentation in the Sections 4.2 and 6.2 aims at “intra-UFMpredictions”, that is, inferring an increased or decreased intent to use model checking from the quantityand quality of past experience in using model checking. Such predictions may inherit possible limitationsof TAM studies. ◦

Page 27: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 27

6.4.3 External Validity

Why would the procedure in Section 4 lead to similar results with more general populations?

maj : Low response rate / We believe our estimates in Section 5.2 to be sensible. We tried to (i) im-prove targeting by repetitively advertising on multiple appropriate channels, (ii) spot unreliable contactinformation, (iii) provide incentive (study results via email), (iv) keep the questionnaire short and com-prehensible, (v) avoid forced answers, and (vi) allow lack of topic knowledge. Some uncertainties remain,for example, lack of sympathy, personal motivation, and interest, or strong loyalty, and high expectationsin the outcome, or intentional bias. However, from an estimated population of around 100K (i.e., therounded sum of 38K and 61K), the minimum sample size for 95% confidence intervals with continuousscale error margins of less than 7% is 196, consistent with the ballpark figure in Gleirscher et al. (2019,p. 117:29). Our sample (N=216) exceeds this number. The 95% confidence intervals for the Likert itemsshow that the margin of error for the median sometimes deviates by one category (e.g. Figure 4).

In this first study, we aim at understanding common perceptions, such as “FMs are not practically useful”or “FMs are difficult to apply”. Because these statements address FMs as a whole, we believe suchlocal errors do not affect our general conclusions. However, the response rate (1 to 2%) and populationcoverage (0.1%, cf. Section 5.1) were too low to avoid such errors and refute specific null-hypotheses,such as “FM m is effective for role r and purpose p in domain d” (by the FM community) or “FM m isdifficult to apply for role r and purpose p in domain d” (by SE practitioners), with satisfactory statisticalpower. X

maj : Bias towards specific groups (Shull et al., 2008, p. 181) / We distributed our questionnaire overgeneral SE channels. We mix opportunity (only 5 to 10% chain referral), volunteer, and cluster-basedsampling. Selection bias, a problem in snowball sampling (Biernacki and Waldorf, 1981), is limited bygood visibility and accessibility of the target population in these channels (Section 5.2) as well as little useand control of referral chains among respondents. Our sample includes 50% practitioners according toSection 4.6, ≈ 21% NPs (incl. laypersons), and ≈ 31% pure academics. A bias towards FM experts (Fig-ure 3) does not harm our PEOU discussion led by practitioners but shapes our PU discussion. Regardingapplication domains, our conclusions cannot be generalised to, e.g. critical IT systems in the finance ande-voting sectors. ◦

min : Non-response / We decided not to enforce responses or provide incentives. Still, our data suggeststhat our advertisement stimulated responses from FM-critical minds. ◦

min : Lack of FM knowledge / 11 to 18% of our respondents did not know specific challenges (Fig-ure 16). For RQ1 (Figures 2 to 16), dnk-data points have no influence because the findings of RQ1directly describe and interpret the status quo of UFMp. For test purposes, we included dnk-data points inthe analyses of RQ2 and RQ3 (Figures 11 to 16), with no relevant influence. X

min : Geographical background missing / Respondents were not required to own a Google account toavoid tracking and to increase anonymity and the response rate. The limited geographical knowledgeabout our sample constrains the generalisability of our conclusions, e.g. to geographies such as China,India, or Brazil. ◦

6.4.4 Reliability

Why would a repetition of the procedure in Section 4 with different samples from the same populationlead to the same results?

maj : Internal consistency / All 7 items for the concept “obstacle to FM effectiveness” (C7) show goodinternal consistency for our sample with a Cronbach α = 0.84, the PEOU-part of C7 consisting of 5 itemsshows an α = 0.79 (Shull et al., 2008). The other concepts are not measured with multiple items. ◦

maj : Change of proportions / The limited sample and the low response rate make it hard to mitigatethis risk. However, we compared the first (til 4.8.2018, N1 = 114) and second (from 5.8.2018, N2 = 102)half of our sample to simulate a repetition of our survey with the same questionnaire. A two-sided Mann-

Page 28: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 28

Whitney U test for difference does not show a significant difference between these two groups (e.g. forQ13 and Q4). Only for the Q3 item “On behalf of FM tool provider,” a p = 0.07 indicates a potentialdifference. The addition of that item only after the 47th respondent might explain this difference. ◦

7 Conclusions

We conducted an on-line survey of mission-critical software engineering practitioners and researchers toexamine how formal methods have been used, how these professionals intend to use them, and how theyperceive challenges in using them. This study aims to contribute to the body of knowledge of the softwareengineering and formal methods communities.

Overall Findings. From the evidence we gathered for the use of formal methods, we make the follow-ing observations:

• Intrinsic motivation is stronger than the regulatory one.• Despite the challenges, our respondents show an increased intent to use FMs in industrial con-

texts.• Past experience is correlated with usage intent.• All challenges were rated either moderately or highly difficult, with scalability, skills, and edu-

cation leading. Experienced respondents rate challenges as highly difficult more often than lessexperienced respondents.

• From the literature and the responses, we identified three additional challenges: sufficient re-sources, process compatibility, good practicality/reputation.

• The negative responses to the questions about obstacles to FM effectiveness suggest that the easeof use of FMs is perceived more negative than positive.

• Gaining experience and confidence in the application of a FM seems to play a role in developinga positive perception of usefulness of that FM.

Barroca and McDermid (1992) present evidence to show that FMs can be used in industry effectivelyand more widely. Their observation from 1992 is that FM use had been limited, benefits were clearbut limitations were subtle. In response to Barroca and McDermid’s finding “FMs are both oversold andunder-used”, our insights from the analysis of RQ 2 and 3 lead us to conclude that today FMs are probablymore underused than oversold. However, our data also suggests that these methods still need substantialimprovement and support in several areas in order for their benefits to be better utilised.

General Feedback on the Survey. The questionnaire seems to be well-received by the participants.One of them found it an “interesting set of questions.” This impression is confirmed by another partici-pant:

“Well chosen questions which do not leave me guessing. Relevant to future FM research andpractice.”

Another respondent noted:

“Thank you very much for this survey. It is very constructive and important. It handles most ofthe issues encountered by any practitioner and user of FMs.”

Only one participant found the questionnaire difficult for FM beginners.

Implications Towards a Research Agenda. In the spirit of Jeffery et al. (2015) and complementing thesuggestions from the SWOT analysis in Gleirscher et al. (2019), we want to make another step in settingout an agenda for future FM research.

Page 29: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 29

To address scalability, we need more research on how compositional methods (e.g. automated assume-guarantee reasoning, Cofer et al., 2012; automated assertion checking, Leino, 2017) can be better lever-aged in practical settings. To address skills and education, we need an enhanced and up-to-date FM bodyof knowledge (FMBoK; Oliveira et al., 2018). From his survey of “FMs courses in European highereducation”, Oliveira (2004) observes that (i) “model-oriented specification”, “formalising distribution,concurrency and mobility”, and “logical foundations of formal methods” showed to be the topic areasmost frequently taught by FM lecturers, and (ii) Z, B, SML, CSP, and Haskell showed to be the mostpopular formal notations and languages taught in these courses. A comparison of the current state withOliveira’s observations can help to evaluate and revise current FM curricula (e.g. for undergraduate SEas suggested in Davis et al., 2013) and to derive recommendations for improved FM courses fosteringgood modelling, composition, and refinement skills in SE practice. To address controllable abstractions,we need semantics workbenches for underpinning domain-specific languages with formal semantics. Webelieve that further steps in theory integration and unification (Gleirscher et al., 2019) can help establishproof hierarchies and, hence, reusability and proof transfer.

To address process compatibility, we need more research in continuous reasoning (e.g. Chudnov et al.,2018; O’Hearn, 2018), a revival of activities, possibly even regulations, in tool integration and model datainterchange, and guidance on how to update engineering development processes. To address reputation,we need to provide more incentives for practitioners to use FMs and take recent progress in FM researchinto account when changing current software processes, policies, regulations, and standards. This in-cludes convincing practitioners to invest in the support of large-scale studies for monitoring FM use inindustry. Cost-savings analyses of FM applications (e.g. Jeffery et al., 2015) supported by strong em-pirical designs (i.e., controlled field experiments) can help to collect the necessary evidence for decisionmaking, successful knowledge transfer, and for implementing this vision.

This survey underpins and enhances the analysis of strengths and weaknesses of FMs in Gleirscher et al.(2019) and can be a guide (1) for consulting and managing practitioners when considering the introductionof FMs into a engineering organisation, (2) for research managers when shaping a grant programme forFM experimentation and transfer, and (3) for associate editors when organising a journal special sectionon applied FM research.

Future Work. Our survey is another important step in the research of effectively applying FM-basedtechnologies in practice. To put it with the words of one of our participants: “[A] closed questionnaire isjust a start.”

Hence, we aim at a follow-up study (i) to find out which particular FM (and tool) is used in which domainfor which particular purpose and role (e.g. was SMT solving used for model checking in certification orfor task scheduling at run-time?), (ii) to measure where particular techniques work well (e.g. which typesof formal contracts work well in control software requirements management in a DO-178C context?),(iii) to measure key indicators for successful use of FMs, (iv) to identify management techniques neededto accommodate the changes in working practices, and, finally, (v) to provide guidance to future projectswishing to adopt FMs.

In a next survey, we like to ask about typical FM benefits, about suggestions for barrier mitigation (Daviset al., 2013), pose more specific questions on scalability and useful abstraction, the geographical9 andeducational background, and for conceptual alignment. Further analysis of obstacles, benefits, and usageintent could also benefit from a more fine-grained distinction between FMs directly applied to programcode and FMs focusing on more abstract models. We would also like to change from 3-level to 5-levelLikert-type scales to receive fine-granular responses. Our research design accounts for repeatability,hence, allowing us to go for a longitudinal study.

The research design, and even our current data set, allows the derivation of the usage intent (UFMi) foreach FM class, application domain, and obstacle. These UFMi values could be used to analyse whethera particular FM might be (1) underused (i.e., domains with an increased usage intent indicate a potential

9According to https://en.wikipedia.org/wiki/United_Nations_geoscheme.

Page 30: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 30

for more applicability) or (2) oversold (i.e., domains with a lower usage intent and were obstacles areperceived as being particularly tough and, hence, FMs as being less effective).

Acknowledgements. It is our pleasure to thank all survey participants for their time spent and their valuableresponses, and all channel moderators for forwarding our postings. We are much obliged to Jim Woodcock, who hasled previous studies in our direction, and supported us to critically reflect our work and relate it to existing evidence.He connected us with John Fitzgerald, who made his paper copy of Austin and Parkin (1993) available to us suchthat we were able to complete our investigation. We are grateful to John Fitzgerald and also to John McDermid forhelpful feedback and for encouraging us to do further research in this direction. We would like to spend sinceregratitude to Krzysztof Brzezinski, Louis Brabant, and Emmanuel Eze for pointing us to several related works.

References

Aichernig, Bernhard K. and Tom Maibaum, eds. (Nov. 18, 2003). Formal Methods at the Crossroads. From Panaceato Foundational Support. Springer Berlin Heidelberg. isbn: 3-540-20527-6.

ATOMICO (2019). The State of European Tech 2019. Section 6.4. url: https://web.archive.org/web/20191220234928/http://2019.stateofeuropeantech.com/chapter/people/article/strong-talent-base/.

Austin, Stephen and Graeme Parkin (Mar. 1993). Formal methods: A survey. Tech. rep. Teddington, Middlesex, UK:National Physical Laboratory.

Barroca, Leonor M. and John A. McDermid (1992). “Formal methods: Use and relevance for the development ofsafety-critical systems”. In: Comp. J. 35.6, pp. 579–99. doi: 10.1093/comjnl/35.6.579.

Basili, Victor R. (July 1, 1985). Quantitative evaluation of software methodology. Tech. rep. TR-1519. Universityof Maryland. url: https://drum.lib.umd.edu/bitstream/handle/1903/7520/Quantitative+Evaluation.pdf?sequence=1 (visited on 05/30/2019).

Bicarregui, J. C. et al. (2009). “Industrial Practice in Formal Methods: A Review”. In: FM 2009: Formal Methods.Ed. by Ana Cavalcanti and Dennis R. Dams. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 810–813. isbn:978-3-642-05089-3.

Biernacki, Patrick and Dan Waldorf (1981). “Snowball Sampling: Problems and Techniques of Chain Referral Sam-pling”. In: Sociological Methods & Research 10.2, pp. 141–163. doi: 10.1177/004912418101000205.

Bjorner, D. (1987). “On the Use of Formal Methods in Software Development”. In: Proceedings of the 9th Inter-national Conference on Software Engineering. ICSE ’87. Monterey, California, USA: IEEE Computer SocietyPress, pp. 17–29. isbn: 0-89791-216-0. doi: 10.5555/41765.41768. url: http://dl.acm.org/citation.cfm?id=41765.41768.

Bloomfield, RE, PKD Froome, and BQ Monahan (1991). “Formal methods in the production and assessment ofsafety critical software”. In: Reliability Engineering & System Safety 32.1-2, pp. 51–66.

Boulanger, Jean-Louis (July 11, 2012). Industrial Use of Formal Methods: Formal Verification. Wiley-ISTE. 298 pp.isbn: 9781848213630.

Bowen, J. P. and M. G. Hinchey (July 1995a). “Seven more myths of formal methods”. In: IEEE Software 12.4,pp. 34–41. issn: 0740-7459. doi: 10.1109/52.391826.

— (Apr. 1995b). “Ten commandments of formal methods”. In: Computer 28.4, pp. 56–63. issn: 0018-9162. doi:10.1109/2.375178.

Bowen, Jonathan P. and Michael G. Hinchey (2005). “Ten Commandments Revisited: A Ten-year Perspective onthe Industrial Application of Formal Methods”. In: Proceedings of the 10th International Workshop on FormalMethods for Industrial Critical Systems. FMICS ’05. Lisbon, Portugal: ACM, pp. 8–16. isbn: 1-59593-148-1.doi: 10.1145/1081180.1081183.

Campbell, Michael J. and Martin J. Gardner (1988). “Calculating confidence intervals for some non-parametric anal-yses”. In: British Medical Journal 296, p. 1454.

Charette, Robert N. (June 2018). Fiat Chrysler Is Being Sued Over a Software Flaw. IEEE. url: https://web.archive . org / web / 20180629231601 / https : / / spectrum . ieee . org / riskfactor / computing /software/court-allows-lawsuit-to-proceed-against-fiat-chrysler-over-software-flaw.

Chudnov, Andrey et al. (2018). “Continuous Formal Verification of Amazon s2n”. In: Computer Aided Verification.Springer International Publishing, pp. 430–446. doi: 10.1007/978-3-319-96142-2_26.

Cofer, Darren D. et al. (2012). “Compositional Verification of Architectural Models”. In: NASA Formal Methods -4th International Symposium, NFM 2012, Norfolk, VA, USA, April 3-5, 2012. Proceedings, pp. 126–140. doi:10.1007/978-3-642-28891-3\_13.

Page 31: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 31

Craigen, D., S. Gerhart, and T. Ralston (Feb. 1995). “Formal methods reality check: industrial usage”. In: IEEETransactions on Software Engineering 21.2, pp. 90–98. issn: 0098-5589. doi: 10.1109/32.345825.

Craigen, Dan (1995). “Formal methods technology transfer: Impediments and innovation (abstract)”. In: CONCUR’95: Concurrency Theory: 6th International Conference Philadelphia, PA, USA, August 21–24, 1995 Proceed-ings. Ed. by Insup Lee and Scott A. Smolka. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 328–332. isbn:978-3-540-44738-2. doi: 10.1007/3-540-60218-6_24.

Craigen, Dan, Susan Gerhart, and Ted Ralston (1993). “An International Survey of Industrial Applications of FormalMethods”. In: Z User Workshop, London 1992: Proceedings of the Seventh Annual Z User Meeting, London14–15 December 1992. Ed. by J. P. Bowen and J. E. Nicholls. London: Springer London, pp. 1–5. isbn: 978-1-4471-3556-2. doi: 10.1007/978-1-4471-3556-2_1.

Davis, Fred D. (Sept. 1989). “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of InformationTechnology”. In: MIS Quarterly 13.3, pp. 319–40.

Davis, Jennifer A. et al. (2013). “Study on the Barriers to the Industrial Adoption of Formal Methods”. In: FormalMethods for Industrial Critical Systems. Springer Berlin Heidelberg, pp. 63–77. doi: 10.1007/978-3-642-41010-9_5.

Decision Analyst (Aug. 2018). Technology Advisory Board. Decision Analyst, Inc. url: https://web.archive.org/web/20191214142906/https://www.decisionanalyst.com/online/acop/.

Evans Data (2018). Global Developer Population and Demographic Study. Tech. rep. Volume 1. Evans Data Corpo-ration. url: https://web.archive.org/web/20191015060004/https://evansdata.com/reports/viewRelease.php?reportID=9.

Fagan, M. E. (1976). “Design and Code Inspections to reduce Errors in Program Development”. In: IBM SystemsJournal 15.3, pp. 182–211. doi: 10.1147/sj.153.0182.

Ferrari, Alessio et al. (Jan. 2019). Survey on Formal Methods and Tools in Railways Technical Report on the activitiesperformed within ASTRail, Deliverable D4.1. doi: 10.5281/zenodo.2535023.

Fraser, Martin D., Kuldeep Kumar, and Vijay K. Vaishnavi (1994). “Strategies for incorporating formal specificationsin software development”. In: Communications of the ACM 37.10, pp. 74–86. doi: 10.1145/194313.194399.

Galloway, Andy J., Trevor J. Cockram, and John A. McDermid (Sept. 1998). “Experiences with the Applicationof Discrete Formal Methods to the Development of Engine Control Software”. In: IFAC Proceedings Volumes31.32, pp. 49–56. doi: 10.1016/S1474-6670(17)36335-8.

Gerhart, Susan L. and Lawrence Yelowitz (1976). “Observations of Fallibility in Applications of Modern Program-ming Methodologies”. In: IEEE Trans. Software Eng. 2.3, pp. 195–207. doi: 10.1109/TSE.1976.233815.

Glass, Robert L. (Oct. 28, 2002). Facts and Fallacies of Software Engineering. Pearson Education (US). isbn: 978-0321117427.

Gleirscher, Mario and Diego Marmsoler (Nov. 14, 2018). Electronic Supplementary Material for “Formal Methods:Oversold? Underused? A Survey”. Zenodo. doi: 10.5281/zenodo.1487596.

Gleirscher, Mario and Anne Nyokabi (2018). System Safety Practice: An Interrogation of Practitioners about TheirActivities, Challenges, and Views with a Focus on the European Region. Tech. rep. York, UK: Department ofComputer Science, University of York, UK. arXiv: 1812.08452 [cs.SE].

Gleirscher, Mario, Simon Foster, and Jim Woodcock (2019). “New Opportunities for Integrated Formal Methods”. In:ACM Computing Surveys 52 (6), 117:1–117:36. issn: 0360-0300. doi: 10.1145/3357231. arXiv: 1812.10103[cs.SE].

Gnesi, Stefania and Tiziana Margaria (2013). Formal Methods for Industrial Critical Systems: A Survey of Applica-tions. Wiley-IEEE Press. isbn: 9781118459898.

Google (Aug. 2018). Google Forms Service. Google, Inc. url: http://forms.google.com.Graydon, P.J. (June 2015). “Formal Assurance Arguments: A Solution in Search of a Problem?” In: Dependable

Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on, pp. 517–528. doi:10.1109/DSN.2015.28.

Hall, Anthony (1990). “Seven Myths of Formal Methods”. In: IEEE Software 7.5, pp. 11–19. doi: 10.1109/52.57887.

Heisel, Maritta (Jan. 1, 1996). “A Pragmatic Approach to Formal Specification”. In: Object-Oriented BehavioralSpecifications. Springer. isbn: 978-0-7923-9778-6. doi: 10.1007/978-0-585-27524-6_4.

Heitmeyer, Constance L. (1998). “On the Need for ’Practical’ Formal Methods”. In: Proceedings of the 5th Inter-national Symposium on Formal Techniques in Real-Time Fault Tolerant Systems (FTRTFT). Vol. LICS 1486.Lyngby, DenmarkLyngby, Denmark, pp. 18–26.

Hinchey, Michael G. and Jonathan P. Bowen (Apr. 1996). “To formalize or not to formalize?” In: IEEE Computer29.4, pp. 18–19.

Holloway, C. M. (Oct. 1997). “Why engineers should consider formal methods”. In: 16th DASC. AIAA/IEEE DigitalAvionics Systems Conference. Reflections to the Future. Proceedings. Vol. 1, pp. 16–22. doi: 10.1109/DASC.1997.635021.

Page 32: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 32

Holloway, C. M. and R. W. Butler (1996). “Impediments to industrial use of formal methods”. In: Computer 29.4,pp. 25–26. doi: 10.1109/MC.1996.488298.

Jackson, Michael (1987). “Power and Limitations of Formal Methods for Software Fabrication”. In: Journal of In-formation Technology 2.2, pp. 72–76. doi: 10.1177/026839628700200204.

Jeffery, Ross et al. (Apr. 2015). “An empirical research agenda for understanding formal methods productivity”. In:Information and Software Technology 60, pp. 102–112. doi: 10.1016/j.infsof.2014.11.005.

Kaner, Cem and David Pels (Aug. 1998). Bad Software. Wiley. isbn: 978-0471318262.— (Aug. 2018). Bad Software: Website. url: https://web.archive.org/web/20191210042547/http:

//badsoftware.com/.Kitchenham, B., S. Linkman, and D. Law (1997). “DESMET: a methodology for evaluating software engineering

methods and tools”. In: Computing & Control Engineering Journal 8.3, pp. 120–126. doi: 10.1049/cce:19970304.

Kitchenham, Barbara A. and Shari L. Pfleeger (2008). “Guide to Advanced Empirical Software Engineering”. In:Springer. Chap. Personal Opinion Surveys, pp. 63–92.

Klein, Gerwin et al. (Sept. 2018). “Formally verified software in the real world”. In: Communications of the ACM61.10, pp. 68–77. doi: 10.1145/3230627.

Knight, John C. et al. (1997). “Why Are Formal Methods Not Used More Widely?” In: Fourth NASA Formal MethodsWorkshop, pp. 1–12.

Lai, R. (Jan. 1, 1996). “How could research on testing of communicating systems become more industrially relevant?”In: Springer, pp. 3–13. doi: 10.1007/978-0-387-35062-2_1.

Lai, Richard and Wilfred Leung (1995). “Industrial and Academic Protocol Testing: The Gap and the Means ofConvergence”. In: Computer Networks and ISDN Systems 27.4, pp. 537–547. doi: 10.1016/0169-7552(93)E0110-Z.

Leiner, D. J. (2014). SoSci Survey. Tech. rep. url: https://web.archive.org/web/20191202015133/https://www.soscisurvey.de/.

Leino, K. Rustan M. (2017). “Accessible Software Verification with Dafny”. In: IEEE Software 34.6, pp. 94–97. doi:10.1109/ms.2017.4121212.

Liebel, Grischa et al. (2016). “Model-based engineering in the embedded systems domain: an industrial survey onthe state-of-practice”. In: Software & Systems Modeling 17.1, pp. 91–113. doi: 10.1007/s10270-016-0523-3.

Mathieson, Kieran (1991). “Predicting User Intentions: Comparing the Technology Acceptance Model with the The-ory of Planned Behavior”. In: Information Systems Research 2.3, pp. 173–191. doi: 10.1287/isre.2.3.173.

Miller, Steven P., Michael W. Whalen, and Darren D. Cofer (Feb. 2010). “Software model checking takes off”. In:Communications of the ACM 53.2, pp. 58–64. doi: 10.1145/1646353.1646372.

Miyoshi, T. and M. Azuma (1993). “An empirical study of evaluating software development environment quality”.In: IEEE Transactions on Software Engineering 19.5, pp. 425–435. doi: 10.1109/32.232010.

Mohagheghi, Parastoo et al. (Jan. 2012). “An empirical study of the state of the practice and acceptance of model-driven engineering in four industrial cases”. In: Empirical Software Engineering 18.1, pp. 89–116. doi: 10.1007/s10664-012-9196-x.

Murphy, G. C., R. J. Walker, and E. L. A. Banlassad (1999). “Evaluating emerging software development technolo-gies: lessons learned from assessing aspect-oriented programming”. In: IEEE Transactions on Software Engi-neering 25.4, pp. 438–455. doi: 10.1109/32.799936.

Neuendorf, Kimberly A. (Aug. 2016). The Content Analysis Guidebook. 2nd. Sage. isbn: 9781412979474.Neumann, Peter G. (May 2018). “Risks to the Public”. In: ACM SIGSOFT Software Engineering Notes 43.2, pp. 8–

11. doi: 10.1145/3203094.3203102.O’Hearn, Peter W. (2018). “Continuous Reasoning”. In: Proceedings of the 33rd Annual ACM/IEEE Symposium on

Logic in Computer Science - LICS’18. ACM Press. doi: 10.1145/3209108.3209109.Oliveira, Jose Nuno (2004). “A Survey of Formal Methods Courses in European Higher Education”. In: Teaching

Formal Methods. Springer Berlin Heidelberg, pp. 235–248. doi: 10.1007/978-3-540-30472-2_16.Oliveira, Jose Nuno et al. (Aug. 2018). Formal Methods Body of Knowledge (FMBoK). url: https : / / web .

archive.org/web/20200109111534/https://formalmethods.wikia.org/wiki/FMBoK.Parnas, David Lorge (2010). “Really Rethinking ’Formal Methods’”. In: IEEE Computer 43.1, pp. 28–34. doi: 10.

1109/mc.2010.22.Petersen, Kai et al. (2008). “Systematic Mapping Studies in Software Engineering”. In: 12th International Conference

on Evaluation and Assessment in Software Engineering, EASE 2008, University of Bari, Italy, 26-27 June 2008.doi: 10.14236/ewic/ease2008.8.

Pfleeger, S. L. and L. Hatton (1997). “Investigating the influence of formal methods”. In: Computer 30.2, pp. 33–43.doi: 10.1109/2.566148.

Poston, R. M. and M. P. Sexton (1992). “Evaluating and selecting testing tools”. In: IEEE Software 9.3, pp. 33–42.doi: 10.1109/52.136165.

Page 33: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 33

Riemenschneider, C. K., B. C. Hardgrave, and F. D. Davis (2002). “Explaining software developer acceptance ofmethodologies: a comparison of five theoretical models”. In: IEEE Transactions on Software Engineering 28.12,pp. 1135–1145. doi: 10.1109/tse.2002.1158287.

Robbins, Naomi B. and Richard M. Heiberger (2011). “Plotting Likert and Other Rating Scales”. In: Joint StatisticalMeeting, pp. 1058–66.

Rushby, John (1994). “Critical system properties: Survey and taxonomy”. In: Reliability Engineering & System Safety43.2, pp. 189–219. doi: 10.1016/0951-8320(94)90065-5.

SEI (2010). CMMI for Development. Tech. rep. CMU/SEI-2010-TR-033. CMU.Shull, Forrest, Janice Singer, and Dag I. K. Sjøberg, eds. (Oct. 2008). Guide to Advanced Empirical Software Engi-

neering. London: Springer.Snook, C and R Harrison (2001). “Practitioners’ views on the use of formal methods: an industrial survey by struc-

tured interview”. In: Information and Software Technology 43.4, pp. 275 –283. issn: 0950-5849. doi: 10.1016/S0950-5849(00)00166-X.

Sobel, A.E.K. and M.R. Clarkson (Mar. 2002). “Formal methods application: an empirical tale of software develop-ment”. In: IEEE Transactions on Software Engineering 28.3, pp. 308–320. doi: 10.1109/32.991322.

The R Project (Aug. 2018). R. The R Project. url: https://web.archive.org/web/20200109063512/http://www.r-project.org/.

Wikipedia contributors (2018). Software engineering demographics — Wikipedia, The Free Encyclopedia. url:https://en.wikipedia.org/w/index.php?title=Software_engineering_demographics\&oldid=823840899 (visited on 01/09/2020).

Wing, J. M. (1990). “A specifier’s introduction to formal methods”. In: Computer 23.9, pp. 8–22. doi: 10.1109/2.58215.

Wohlin, Claes et al. (June 2012). Experimentation in Software Engineering. Springer. isbn: 9783642290435.Woodcock, Jim et al. (Oct. 2009). “Formal Methods: Practice and Experience”. In: ACM Comput. Surv. 41.4, 19:1–

19:36. issn: 0360-0300. doi: 10.1145/1592434.1592436.

Page 34: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 34

A SupplementaryMaterial for “FormalMethods in Dependable SoftwareEngineering: A Survey”

In the following, we provide additional material to the survey, including

1. a more detailed analysis of responses to certain questions (Appendix A.1),2. further visualizations of the collected data (Appendices A.2 to A.6),3. more details on our analysis of related work (Table 9 in Appendix A.7),4. more details on the mapping from studies to challenges (Appendix A.8),5. a comprehensive table of open answers (Appendix A.9),6. a copy of the advertisement flyer (Appendix A.10),7. a screenshot of the Twitter poll (Appendix A.11), and8. a copy of the whole questionnaire (Appendix A.12).

A.1 Data for Analysis of RQ1 and Estimation of External Validity

Based on the responses for question Q1, the Table 8 provides an overview of categories of respondentsreferred to in our analysis (particularly, in Section 5.2 and Figure 10) along with the corresponding countsbased on the sample from 31.3.2019 with N = 220.

For question Q1, by practitioner, we mean “practitioner in dependable or mission-critical software engi-neering.” To include respondents from all areas of the population or at any study stage, we generalize“practitioner” by the term “user”. Below, for the questions Q5 and Q6, we then refer to “formal methoduser” and “FM-non-user”.

Page 35: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 35

Table 8: Overview of categories of respondents for Q1, Q3, Q4, Q5, and Q6

Category of respondents to . . . Description Count Fraction

. . . Question Q4:Respondents with academiceducational background (AEB)

Researchers in academia; bachelor, master, orPhD students; lecturers, teachers, trainers,coaches

156 72%

Academics with pure transferexperience

AEB cut with researchers in industry,consulting and managing practitioners, andexternal consultants; without engineeringpractitioners in industry and without toolprovider stakeholders

35 16%

Academics with practical experience AEB cut with engineering practitioners inindustry

41 21%

Academics with experience in transferand practice

AEB cut with researchers in industry,consulting and managing practitioners, andexternal consultants; cut with engineeringpractitioners in industry and without toolprovider stakeholders

31 14%

Practitioners incl. transfer practitionersand industrial consultants, all withacademic background

AEB cut with researchers in industry,consulting and managing practitioners, externalconsultants, and engineering practitioners inindustry

86 40%

Pure academics AEB without respondents specifying additionalroles

66 31%

Respondents not specifying aneducational background (NEB)

The complement of AEB 60 27%

Respondents not specifying aneducational background and beingresearchers in industry

NEB intersected with researchers in industry 13 6%

Consultants NEB intersected with consulting or managingpractitioners and external consultants

23 11%

Pure practitioners NEB cut with engineering practitioners inindustry

23 11%

Tool provider stakeholders notspecifying an educational background

NEB cut with stakeholder of an FM tool orservice provider

5 2.3%

Non-academic FM non-users NEB cut with “I have not used FMs in anyspecific role.”

19 9%

Practitioners incl. industrialconsultants

Consulting and managing practitioners,external consultants, and engineeringpractitioners in industry

108 50%

FM users (all) Respondents having used FMs in one oranother way and context in the past

212 98%

FM users (beyond students) Excl. “only-in-course” respondents 202 93.5%

. . . Question Q1:FM non-users Respondents who chose “I have not used FMs

in any academic or industrial domain.”36 17%

. . . Question Q3:Respondents with no motivation Respondents who selected “no” for all

motivating factors9 4%

. . . Questions Q5 and Q6:Non-practitioners including FMnon-users

Respondents who chose “no experience or noknowledge”, “studied in (university) course” or“applied in lab, experiments, case studies” forall FM techniques. This group includeslaypersons.

46 21%

Sample (N) All valid responses 216 100%

Page 36: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 36

DE DE/IT DE/US EU EU/US EU/US/AUS FR UK

Estimated geographical reachability of population via survey channels0

24

68

10

12

AR AT BE DE DK ES FI FR IE IT JP NG NL SE UK US ZA

Geographical distribution of respondents by email address (TLD, company HQ, if provided)

05

10

15

20

Figure 19: Geographical analysis of the sample. Legend: top-level domain (TLD), head quarter (HQ)

A.2 Geographical Analysis of the Sample

Figure 19 shows geographical aspects of the sample for this study.

A.3 Usage Intent (UFMi) by Purpose (for Analysis of RQ2)

The comparison in Figure 20 (and in the figures of the Appendices A.4 and A.5) contains two columns.

The left column describes for each purpose (e.g. specification) how often (e.g. in 2 to 5 separate tasks)respondents have used FMs in the past (UFMp).

The right column describes for each purpose the usage intent (UFMi) depending on how often respondentshave used FMs in the past (UFMp). The horizontal bars representing the UFMp frequency categories arelisted in descending order by the overall size of both UFMi groups “more often” and “dnk”. We choseto keep dnk-answers visible despite the readability inconvenience caused by the dnks influencing theordering. However, in the majority of cases the largest group of respondents intending to increase FMuse in the future is visible first or near the top.

A.4 Code-based vs. Model-based FMs for Assurance vs. Inspection

The data for this comparison is summarised in Figure 21.

Page 37: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 37

0

20

40

60Clarification (N=216)

0

20

40

60Specification (N=216)

0

20

40

60Inspection (N=216)

0

30

60

90

120Synthesis (N=216)

0

20

40

60

0 1 2 − 5> 5 separate tasksrq2_comparison_P4_F5_purpose (past)

Assurance (N=216)

7%

11%

6%

45%

59%

54%

47%

42%

34%

35%

47%

13%0

1

2 − 5

> 5 separate tasks

13%

12%

4%

46%

58%

56%

43%

40%

29%

31%

53%

15%0

1

2 − 5

> 5 separate tasks

11%

5%

34%

16%

52%

48%

48%

44%

37%

47%

18%

41%

0

1

2 − 5

> 5 separate tasks

3%

16%

6%

39%

55%

53%

44%

41%

41%

32%

50%

20%0

1

2 − 5

> 5 separate tasks

6%

40%

6%

36%

66%

49%

44%

40%

28%

11%

50%

24%

0

1

2 − 5

> 5 separate tasks

rq2_comparison_P4_F5_purpose (future)

no more less equally more often dnk

Figure 20: Comparison of past and future usage intent by purpose

Page 38: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 38

0

10

20

30

40Clarification (N=128)

0

20

40

Specification (N=128)

01020304050Inspection (N=128)

0

20

40

60Synthesis (N=128)

0

20

40

0 1 2 − 5> 5 separate tasksrq2_comparison_P4_F5_purpose_cbfm (past)

Assurance (N=128)

6%

11%

7%

38%

59%

50%

48%

45%

35%

39%

46%

17%0

1

2 − 5

> 5 separate tasks

7%

7%

47%

6%

60%

54%

42%

40%

33%

39%

11%

55%

0

1

2 − 5

> 5 separate tasks

5%

29%

6%

7%

59%

50%

48%

40%

36%

21%

46%

53%

0

1

2 − 5

> 5 separate tasks

0%

0%

19%

36%

57%

57%

44%

41%

43%

43%

37%

24%0

1

2 − 5

> 5 separate tasks

5%

32%

22%

7%

72%

56%

56%

38%

23%

12%

22%

55%

0

1

2 − 5

> 5 separate tasks

rq2_comparison_P4_F5_purpose_cbfm (future)

no more less equally more often dnk

0

10

20

30

40Clarification (N=114)

0

10

20

30

40

50Specification (N=114)

0

10

20

30

40Inspection (N=114)

0

10

20

30

40Synthesis (N=114)

0

20

40

0 1 2 − 5> 5 separate tasksrq2_comparison_P4_F5_purpose_mbfm (past)

Assurance (N=114)

0%

5%

9%

35%

67%

55%

45%

45%

33%

39%

45%

20%0

1

2 − 5

> 5 separate tasks

0%

10%

38%

4%

60%

56%

44%

43%

40%

33%

19%

53%

0

1

2 − 5

> 5 separate tasks

11%

6%

0%

4%

57%

50%

50%

43%

32%

44%

50%

53%

0

1

2 − 5

> 5 separate tasks

14%

0%

5%

31%

54%

50%

48%

44%

32%

50%

48%

24%0

1

2 − 5

> 5 separate tasks

5%

4%

50%

38%

65%

43%

40%

38%

30%

53%

10%

25%

0

1

2 − 5

> 5 separate tasks

rq2_comparison_P4_F5_purpose_mbfm (future)

no more less equally more often dnk

Figure 21: Comparison of past and future usage for code-based (top half) and model-based FMs (bottomhalf) by purpose

Page 39: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 39

A.5 Usage Intent (UFMi) by FM Class (for Analysis of RQ2)

0

20

40Predicative, relational,

or algebraicspecification (N=216)

0

20

40

60Modal and temporal logicspecification (N=216)

0

20

40

60Process models (N=216)

0

20

40

60

80Dynamical systems(N=216)

0

20

40

60Abstract interpretation

(N=216)

0

20

40

60Assertion checking

(N=216)

0

20

40

60Process calculi (N=216)

0

20

40Model checking, SMV

(N=216)

0

20

40

Constraint solving(N=216)

0

20

40

60Generic theorem proving(N=216)

0

25

50

75Computational

engineering, simulation(N=216)

0

20

40

60Symbolic execution(N=216)

0

10

20

30

40

50

none course lab practicedpracticed > 1rq2_comparison_P2_P3_F3_F4_usage (past)

Consistency checking(N=169)

15%20%6%

40%63%

52%35%34%34%32%

33%45%60%26%5%none

course

labpracticed

practiced > 1

25%23%46%14%9%

48%38%37%36%35%

27%38%17%50%56%

none

courselab

practicedpracticed > 1

34%26%

63%19%

12%

47%36%

33%33%

29%

19%38%

4%47%

59%none

courselab

practiced

practiced > 1

0%29%10%52%35%

58%57%50%35%34%

42%14%40%13%31%

nonecourse

labpracticed

practiced > 1

45%5%

28%16%16%

46%46%40%39%37%

9%49%32%45%47%

none

courselab

practiced

practiced > 1

11%

24%23%

45%4%

51%

50%50%

48%39%

38%

26%27%

6%57%

nonecourse

labpracticed

practiced > 1

47%10%29%31%36%

40%39%38%35%21%

13%52%33%33%43%

none

courselab

practiced

practiced > 1

4%26%23%15%48%

57%53%48%45%33%

39%21%29%40%20%none

courselab

practicedpracticed > 1

12%11%9%

26%53%

65%62%60%52%35%

24%26%31%22%12%none

course

labpracticed

practiced > 1

0%3%

35%25%54%

61%57%52%45%33%

39%40%13%29%13%none

courselab

practicedpracticed > 1

17%10%26%39%7%

55%45%41%37%33%

28%45%33%24%60%

nonecourse

lab

practiced

practiced > 1

10%14%16%14%46%

66%64%48%45%38%

24%23%36%41%15%none

courselab

practicedpracticed > 1

17%

23%

4%

4%

57%

63%

62%

57%

48%

35%

20%

15%

38%

48%

8%none

course

lab

practiced

practiced > 1

rq2_comparison_P2_P3_F3_F4_usage (future)

no more less equally more often dnk

Page 40: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 40

A.6 Data for the Analysis of RQ3

Figure 22 and the following figures in this section show pairs of matrices, so-called “heatmaps”, usefulfor association analysis between categorical and ordinal variables. The cells in the matrices representcombinations of the scales, each cell containing data about the mode and median of “degree of difficulty”ratings, their proportion of tough ratings, and the actual numbers of data points. Both the colour gradi-ent (red to white) and the solid vertical lines in the cells represent the tough proportions (left = 0 to right= 100%), with the dotted vertical line signifying the 50% margin.

Page 41: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 41

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Assurance

Synthesis

Inspection

Specification

Clarification

0.38, 21/55,mod:tough,med:tough

0.25, 14/55,mod:dnk,

med:tough

0.22, 12/55,mod:dnk,

med:tough

0.22, 12/55,mod:dnk,

med:tough

0.35, 19/55,mod:tough,med:tough

0.29, 16/55,mod:moderate,

med:tough

0.4, 22/55,mod:tough,med:tough

0.55, 63/115,mod:tough,med:tough

0.35, 40/115,mod:tough,med:tough

0.26, 30/115,mod:moderate,

med:tough

0.29, 33/115,mod:moderate,

med:tough

0.37, 43/115,mod:tough,med:tough

0.3, 34/115,mod:moderate,

med:tough

0.46, 53/115,mod:tough,med:tough

0.46, 28/61,mod:tough,med:tough

0.3, 18/61,mod:tough,med:tough

0.31, 19/61,mod:dnk,

med:tough

0.28, 17/61,mod:dnk,

med:tough

0.31, 19/61,mod:tough,med:tough

0.28, 17/61,mod:moderate,

med:tough

0.44, 27/61,mod:tough,med:tough

0.44, 21/48,mod:tough,med:tough

0.19, 9/48,mod:dnk,

med:tough

0.31, 15/48,mod:dnk,

med:tough

0.23, 11/48,mod:dnk,

med:tough

0.35, 17/48,mod:tough,med:tough

0.27, 13/48,mod:dnk,

med:tough

0.38, 18/48,mod:tough,med:tough

0.42, 29/69,mod:tough,med:tough

0.23, 16/69,mod:dnk,

med:tough

0.3, 21/69,mod:dnk,

med:tough

0.25, 17/69,mod:dnk,

med:tough

0.36, 25/69,mod:tough,med:tough

0.35, 24/69,mod:tough,med:tough

0.45, 31/69,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

06

Key/Histogram of Toughs#

Cel

ls

Comparison of Challenge Difficulty across Purposes(users not practicing FMs, past)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Assurance

Synthesis

Inspection

Specification

Clarification

0.66, 107/161,mod:tough,med:tough

0.41, 66/161,mod:tough,med:tough

0.34, 55/161,mod:moderate,med:moderate

0.4, 64/161,mod:tough,med:tough

0.44, 71/161,mod:tough,med:tough

0.35, 56/161,mod:moderate,med:moderate

0.49, 79/161,mod:tough,med:tough

0.64, 65/101,mod:tough,med:tough

0.4, 40/101,mod:moderate,med:moderate

0.37, 37/101,mod:moderate,med:moderate

0.43, 43/101,mod:tough,

med:moderate

0.47, 47/101,mod:tough,med:tough

0.38, 38/101,mod:moderate,med:moderate

0.48, 48/101,mod:tough,

med:moderate

0.65, 100/155,mod:tough,med:tough

0.4, 62/155,mod:tough,med:tough

0.31, 48/155,mod:moderate,med:moderate

0.38, 59/155,mod:tough,

med:moderate

0.46, 71/155,mod:tough,med:tough

0.35, 55/155,mod:moderate,med:moderate

0.48, 74/155,mod:tough,med:tough

0.64, 107/168,mod:tough,med:tough

0.42, 71/168,mod:tough,med:tough

0.31, 52/168,mod:moderate,med:moderate

0.39, 65/168,mod:tough,med:tough

0.43, 73/168,mod:tough,med:tough

0.35, 59/168,mod:moderate,med:moderate

0.49, 83/168,mod:tough,med:tough

0.67, 99/147,mod:tough,med:tough

0.44, 64/147,mod:tough,med:tough

0.31, 46/147,mod:moderate,med:moderate

0.4, 59/147,mod:tough,

med:moderate

0.44, 65/147,mod:tough,med:tough

0.33, 48/147,mod:moderate,med:moderate

0.48, 70/147,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

04

8

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Purposes(users practicing FMs, past)

Figure 22: Comparison of challenge difficulty across purposes (UFMp)

Page 42: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 42

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Assurance

Synthesis

Inspection

Specification

Clarification

0.41, 16/39,mod:tough,med:tough

0.28, 11/39,mod:dnk,

med:tough

0.18, 7/39,mod:dnk,

med:tough

0.15, 6/39,mod:dnk,

med:tough

0.31, 12/39,mod:dnk,

med:tough

0.28, 11/39,mod:dnk,

med:tough

0.26, 10/39,mod:not an issue,

med:tough

0.52, 28/54,mod:tough,med:tough

0.2, 11/54,mod:moderate,med:moderate

0.2, 11/54,mod:not an issue,

med:moderate

0.24, 13/54,mod:dnk,

med:tough

0.33, 18/54,mod:tough,med:tough

0.3, 16/54,mod:moderate,med:moderate

0.44, 24/54,mod:tough,med:tough

0.42, 15/36,mod:tough,med:tough

0.22, 8/36,mod:not an issue,

med:moderate

0.14, 5/36,mod:not an issue,

med:moderate

0.17, 6/36,mod:dnk,

med:tough

0.31, 11/36,mod:tough,med:tough

0.22, 8/36,mod:dnk,

med:tough

0.33, 12/36,mod:tough,med:tough

0.46, 17/37,mod:tough,med:tough

0.19, 7/37,mod:moderate,med:moderate

0.19, 7/37,mod:dnk,

med:tough

0.19, 7/37,mod:dnk,

med:tough

0.41, 15/37,mod:tough,med:tough

0.35, 13/37,mod:tough,med:tough

0.32, 12/37,mod:tough,med:tough

0.47, 20/43,mod:tough,med:tough

0.21, 9/43,mod:dnk,

med:tough

0.19, 8/43,mod:dnk,

med:tough

0.12, 5/43,mod:dnk,

med:tough

0.37, 16/43,mod:tough,med:tough

0.3, 13/43,mod:tough,med:tough

0.37, 16/43,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

48

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Purposes(respondents with no or decreased intent, future)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Assurance

Synthesis

Inspection

Specification

Clarification

0.67, 107/159,mod:tough,med:tough

0.4, 64/159,mod:moderate,med:moderate

0.33, 53/159,mod:moderate,med:moderate

0.4, 63/159,mod:tough,

med:moderate

0.45, 72/159,mod:tough,med:tough

0.34, 54/159,mod:moderate,med:moderate

0.51, 81/159,mod:tough,med:tough

0.63, 81/128,mod:tough,med:tough

0.45, 58/128,mod:tough,med:tough

0.35, 45/128,mod:moderate,med:moderate

0.4, 51/128,mod:tough,med:tough

0.45, 58/128,mod:tough,med:tough

0.37, 47/128,mod:moderate,med:moderate

0.46, 59/128,mod:tough,med:tough

0.66, 105/160,mod:tough,med:tough

0.42, 67/160,mod:tough,med:tough

0.35, 56/160,mod:moderate,med:moderate

0.4, 64/160,mod:tough,

med:moderate

0.44, 71/160,mod:tough,med:tough

0.37, 59/160,mod:moderate,med:moderate

0.51, 81/160,mod:tough,med:tough

0.63, 106/167,mod:tough,med:tough

0.42, 70/167,mod:tough,med:tough

0.34, 57/167,mod:moderate,med:moderate

0.39, 65/167,mod:tough,

med:moderate

0.43, 72/167,mod:tough,med:tough

0.33, 55/167,mod:moderate,med:moderate

0.5, 84/167,mod:tough,med:tough

0.64, 97/151,mod:tough,med:tough

0.43, 65/151,mod:tough,med:tough

0.35, 53/151,mod:moderate,med:moderate

0.42, 63/151,mod:tough,

med:moderate

0.45, 68/151,mod:tough,med:tough

0.34, 52/151,mod:moderate,med:moderate

0.51, 77/151,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

04

8

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Purposes(respondents with same or increased intent, future)

Figure 23: Comparison of challenge difficulty across purposes (UFMi)

Page 43: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 43

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

On behalf of an FM tool

Own interest

Study or research program

Superior / principal investigator

Employer / research collaborators

Customers / scientific community

Regulatory authorities

0.56, 63/112,mod:tough,med:tough

0.37, 41/112,mod:tough,med:tough

0.29, 33/112,mod:moderate,

med:tough

0.35, 39/112,mod:tough,med:tough

0.43, 48/112,mod:tough,med:tough

0.32, 36/112,mod:moderate,med:moderate

0.54, 60/112,mod:tough,med:tough

0.42, 21/50,mod:tough,med:tough

0.34, 17/50,mod:tough,med:tough

0.32, 16/50,mod:tough,med:tough

0.18, 9/50,mod:moderate,med:moderate

0.3, 15/50,mod:tough,med:tough

0.32, 16/50,mod:tough,med:tough

0.44, 22/50,mod:tough,med:tough

0.42, 28/66,mod:tough,med:tough

0.27, 18/66,mod:dnk,

med:tough

0.27, 18/66,mod:moderate,

med:tough

0.23, 15/66,mod:moderate,

med:tough

0.29, 19/66,mod:tough,med:tough

0.27, 18/66,mod:moderate,med:moderate

0.45, 30/66,mod:tough,med:tough

0.56, 66/117,mod:tough,med:tough

0.32, 37/117,mod:tough,med:tough

0.26, 30/117,mod:moderate,med:moderate

0.29, 34/117,mod:moderate,

med:tough

0.4, 47/117,mod:tough,med:tough

0.34, 40/117,mod:moderate,

med:tough

0.47, 55/117,mod:tough,med:tough

0.45, 33/73,mod:tough,med:tough

0.26, 19/73,mod:dnk,

med:tough

0.27, 20/73,mod:dnk,

med:tough

0.29, 21/73,mod:dnk,

med:tough

0.3, 22/73,mod:tough,med:tough

0.3, 22/73,mod:dnk,

med:tough

0.44, 32/73,mod:tough,med:tough

0.36, 21/59,mod:tough,med:tough

0.24, 14/59,mod:dnk,

med:tough

0.24, 14/59,mod:dnk,

med:tough

0.2, 12/59,mod:dnk,

med:tough

0.25, 15/59,mod:dnk,

med:tough

0.22, 13/59,mod:moderate,

med:tough

0.41, 24/59,mod:tough,med:tough

0.5, 56/112,mod:tough,med:tough

0.33, 37/112,mod:tough,med:tough

0.23, 26/112,mod:moderate,med:moderate

0.29, 32/112,mod:moderate,

med:tough

0.41, 46/112,mod:tough,med:tough

0.27, 30/112,mod:moderate,med:moderate

0.43, 48/112,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

6

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Motivations(users without motivation)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

On behalf of an FM tool

Own interest

Study or research program

Superior / principal investigator

Employer / research collaborators

Customers / scientific community

Regulatory authorities

0.62, 65/104,mod:tough,med:tough

0.38, 39/104,mod:tough,med:tough

0.33, 34/104,mod:moderate,med:moderate

0.36, 37/104,mod:moderate,med:moderate

0.4, 42/104,mod:tough,med:tough

0.35, 36/104,mod:moderate,med:moderate

0.39, 41/104,mod:moderate,med:moderate

0.64, 107/166,mod:tough,med:tough

0.38, 63/166,mod:moderate,

med:tough

0.31, 51/166,mod:moderate,med:moderate

0.4, 67/166,mod:tough,med:tough

0.45, 75/166,mod:tough,med:tough

0.34, 56/166,mod:moderate,med:moderate

0.48, 79/166,mod:tough,med:tough

0.67, 100/150,mod:tough,med:tough

0.41, 62/150,mod:tough,med:tough

0.33, 49/150,mod:moderate,med:moderate

0.41, 61/150,mod:tough,med:tough

0.47, 71/150,mod:tough,med:tough

0.36, 54/150,mod:moderate,med:moderate

0.47, 71/150,mod:tough,med:tough

0.63, 62/99,mod:tough,med:tough

0.43, 43/99,mod:tough,med:tough

0.37, 37/99,mod:moderate,med:moderate

0.42, 42/99,mod:tough,med:tough

0.43, 43/99,mod:tough,

med:moderate

0.32, 32/99,mod:moderate,med:moderate

0.46, 46/99,mod:tough,

med:moderate

0.66, 95/143,mod:tough,med:tough

0.43, 61/143,mod:tough,med:tough

0.33, 47/143,mod:moderate,med:moderate

0.38, 55/143,mod:tough,

med:moderate

0.48, 68/143,mod:tough,med:tough

0.35, 50/143,mod:moderate,med:moderate

0.48, 69/143,mod:tough,med:tough

0.68, 107/157,mod:tough,med:tough

0.42, 66/157,mod:tough,med:tough

0.34, 53/157,mod:moderate,med:moderate

0.41, 64/157,mod:tough,

med:moderate

0.48, 75/157,mod:tough,med:tough

0.38, 59/157,mod:moderate,med:moderate

0.49, 77/157,mod:tough,med:tough

0.69, 72/104,mod:tough,med:tough

0.41, 43/104,mod:tough,med:tough

0.39, 41/104,mod:moderate,med:moderate

0.42, 44/104,mod:tough,med:tough

0.42, 44/104,mod:tough,

med:moderate

0.4, 42/104,mod:moderate,med:moderate

0.51, 53/104,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

010

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Motivations(users with at least moderate motivation)

Figure 24: Comparison of challenge difficulty across motivations

Page 44: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 44

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Consistency checking

Symbolic execution

Computational engineering, simulation

Generic theorem proving

Constraint solving

Model checking, SMV

Process calculi

Assertion checking

Abstract interpretation

Dynamical systems

Process models

Modal and temporal logic specification

Predicative, relational, or algebraicspecification

0.61, 67/109,mod:tough,med:tough

0.39, 42/109,mod:tough,med:tough

0.29, 32/109,mod:moderate,

med:tough

0.31, 34/109,mod:moderate,

med:tough

0.39, 43/109,mod:tough,med:tough

0.31, 34/109,mod:moderate,med:moderate

0.49, 53/109,mod:tough,med:tough

0.58, 88/153,mod:tough,med:tough

0.32, 49/153,mod:moderate,

med:tough

0.28, 43/153,mod:moderate,med:moderate

0.31, 48/153,mod:moderate,

med:tough

0.39, 59/153,mod:tough,med:tough

0.32, 49/153,mod:moderate,med:moderate

0.42, 64/153,mod:tough,med:tough

0.58, 100/172,mod:tough,med:tough

0.38, 65/172,mod:tough,med:tough

0.29, 50/172,mod:moderate,med:moderate

0.33, 56/172,mod:moderate,

med:tough

0.38, 66/172,mod:tough,med:tough

0.32, 55/172,mod:moderate,med:moderate

0.45, 77/172,mod:tough,med:tough

0.55, 93/168,mod:tough,med:tough

0.35, 58/168,mod:tough,med:tough

0.28, 47/168,mod:moderate,med:moderate

0.3, 51/168,mod:moderate,

med:tough

0.41, 69/168,mod:tough,med:tough

0.33, 55/168,mod:moderate,med:moderate

0.45, 75/168,mod:tough,med:tough

0.57, 94/164,mod:tough,med:tough

0.34, 55/164,mod:moderate,

med:tough

0.27, 45/164,mod:moderate,med:moderate

0.32, 52/164,mod:moderate,

med:tough

0.42, 69/164,mod:tough,med:tough

0.35, 57/164,mod:moderate,med:moderate

0.48, 79/164,mod:tough,med:tough

0.53, 80/151,mod:tough,med:tough

0.32, 48/151,mod:moderate,

med:tough

0.29, 44/151,mod:moderate,

med:tough

0.3, 46/151,mod:moderate,

med:tough

0.4, 60/151,mod:tough,med:tough

0.33, 50/151,mod:moderate,med:moderate

0.46, 70/151,mod:tough,med:tough

0.56, 96/171,mod:tough,med:tough

0.36, 61/171,mod:tough,med:tough

0.3, 52/171,mod:moderate,

med:tough

0.35, 59/171,mod:tough,med:tough

0.4, 68/171,mod:tough,med:tough

0.33, 56/171,mod:moderate,med:moderate

0.5, 85/171,mod:tough,med:tough

0.59, 73/124,mod:tough,med:tough

0.34, 42/124,mod:tough,med:tough

0.31, 38/124,mod:moderate,med:moderate

0.34, 42/124,mod:tough,med:tough

0.42, 52/124,mod:tough,med:tough

0.34, 42/124,mod:moderate,med:moderate

0.44, 54/124,mod:tough,med:tough

0.57, 91/160,mod:tough,med:tough

0.34, 54/160,mod:tough,med:tough

0.31, 49/160,mod:moderate,

med:tough

0.34, 55/160,mod:tough,med:tough

0.44, 71/160,mod:tough,med:tough

0.33, 53/160,mod:moderate,med:moderate

0.43, 69/160,mod:tough,med:tough

0.57, 104/184,mod:tough,med:tough

0.37, 68/184,mod:tough,med:tough

0.31, 57/184,mod:moderate,med:moderate

0.34, 62/184,mod:tough,med:tough

0.4, 74/184,mod:tough,med:tough

0.33, 61/184,mod:moderate,med:moderate

0.48, 88/184,mod:tough,med:tough

0.53, 75/142,mod:tough,med:tough

0.38, 54/142,mod:tough,med:tough

0.34, 48/142,mod:tough,med:tough

0.35, 49/142,mod:tough,med:tough

0.42, 59/142,mod:tough,med:tough

0.33, 47/142,mod:moderate,med:moderate

0.42, 59/142,mod:tough,med:tough

0.55, 87/159,mod:tough,med:tough

0.39, 62/159,mod:tough,med:tough

0.33, 52/159,mod:tough,med:tough

0.31, 49/159,mod:moderate,

med:tough

0.4, 63/159,mod:tough,med:tough

0.37, 59/159,mod:moderate,

med:tough

0.49, 78/159,mod:tough,med:tough

0.58, 82/142,mod:tough,med:tough

0.32, 45/142,mod:moderate,

med:tough

0.28, 40/142,mod:moderate,med:moderate

0.3, 43/142,mod:moderate,

med:tough

0.39, 55/142,mod:tough,med:tough

0.34, 48/142,mod:moderate,

med:tough

0.46, 65/142,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

15

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across FMs (users not practicing FMs, past)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Consistency checking

Symbolic execution

Computational engineering, simulation

Generic theorem proving

Constraint solving

Model checking, SMV

Process calculi

Assertion checking

Abstract interpretation

Dynamical systems

Process models

Modal and temporal logic specification

Predicative, relational, or algebraicspecification

0.6, 36/60,mod:tough,med:tough

0.38, 23/60,mod:tough,med:tough

0.32, 19/60,mod:moderate,med:moderate

0.43, 26/60,mod:tough,

med:moderate

0.5, 30/60,mod:tough,med:tough

0.3, 18/60,mod:moderate,med:moderate

0.52, 31/60,mod:tough,med:tough

0.63, 40/63,mod:tough,med:tough

0.49, 31/63,mod:tough,med:tough

0.38, 24/63,mod:moderate,med:moderate

0.44, 28/63,mod:tough,med:tough

0.49, 31/63,mod:tough,med:tough

0.37, 23/63,mod:moderate,med:moderate

0.59, 37/63,mod:tough,med:tough

0.64, 28/44,mod:tough,med:tough

0.34, 15/44,mod:moderate,

med:tough

0.39, 17/44,mod:moderate,med:moderate

0.45, 20/44,mod:tough,med:tough

0.55, 24/44,mod:tough,med:tough

0.39, 17/44,mod:moderate,med:moderate

0.55, 24/44,mod:tough,med:tough

0.73, 35/48,mod:tough,med:tough

0.46, 22/48,mod:tough,

med:moderate

0.42, 20/48,mod:moderate,med:moderate

0.52, 25/48,mod:tough,med:tough

0.44, 21/48,mod:tough,

med:moderate

0.35, 17/48,mod:moderate,med:moderate

0.54, 26/48,mod:tough,med:tough

0.65, 34/52,mod:tough,med:tough

0.48, 25/52,mod:tough,med:tough

0.42, 22/52,mod:tough,

med:moderate

0.46, 24/52,mod:tough,med:tough

0.4, 21/52,mod:tough,

med:moderate

0.29, 15/52,mod:moderate,med:moderate

0.42, 22/52,mod:moderate,med:moderate

0.74, 48/65,mod:tough,med:tough

0.49, 32/65,mod:tough,med:tough

0.35, 23/65,mod:moderate,med:moderate

0.46, 30/65,mod:tough,med:tough

0.46, 30/65,mod:tough,med:tough

0.34, 22/65,mod:moderate,med:moderate

0.48, 31/65,mod:tough,med:tough

0.71, 32/45,mod:tough,med:tough

0.42, 19/45,mod:moderate,med:moderate

0.33, 15/45,mod:moderate,med:moderate

0.38, 17/45,mod:moderate,med:moderate

0.49, 22/45,mod:tough,med:tough

0.36, 16/45,mod:moderate,med:moderate

0.36, 16/45,mod:moderate,med:moderate

0.6, 55/92,mod:tough,med:tough

0.41, 38/92,mod:tough,med:tough

0.32, 29/92,mod:moderate,med:moderate

0.37, 34/92,mod:tough,

med:moderate

0.41, 38/92,mod:tough,med:tough

0.33, 30/92,mod:moderate,med:moderate

0.51, 47/92,mod:tough,med:tough

0.66, 37/56,mod:tough,med:tough

0.46, 26/56,mod:tough,

med:moderate

0.32, 18/56,mod:moderate,med:moderate

0.38, 21/56,mod:tough,

med:moderate

0.34, 19/56,mod:moderate,med:moderate

0.34, 19/56,mod:moderate,med:moderate

0.57, 32/56,mod:tough,med:tough

0.75, 24/32,mod:tough,med:tough

0.38, 12/32,mod:moderate,med:moderate

0.31, 10/32,mod:moderate,med:moderate

0.44, 14/32,mod:tough,

med:moderate

0.5, 16/32,mod:tough,med:tough

0.34, 11/32,mod:moderate,med:moderate

0.41, 13/32,mod:moderate,med:moderate

0.72, 53/74,mod:tough,med:tough

0.35, 26/74,mod:moderate,med:moderate

0.26, 19/74,mod:moderate,med:moderate

0.36, 27/74,mod:tough,med:tough

0.42, 31/74,mod:tough,

med:moderate

0.34, 25/74,mod:moderate,med:moderate

0.57, 42/74,mod:tough,med:tough

0.72, 41/57,mod:tough,med:tough

0.32, 18/57,mod:moderate,med:moderate

0.26, 15/57,mod:moderate,med:moderate

0.47, 27/57,mod:tough,

med:moderate

0.47, 27/57,mod:tough,

med:moderate

0.23, 13/57,mod:moderate,med:moderate

0.4, 23/57,mod:moderate,med:moderate

0.62, 46/74,mod:tough,med:tough

0.47, 35/74,mod:tough,med:tough

0.36, 27/74,mod:moderate,med:moderate

0.45, 33/74,mod:tough,med:tough

0.47, 35/74,mod:tough,med:tough

0.32, 24/74,mod:moderate,med:moderate

0.49, 36/74,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

010

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across FMs (users practicing FMs, past)

Figure 25: Comparison of challenge difficulty across FM classes (UFMp)

Page 45: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 45

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Consistency checking

Symbolic execution

Simulation

Generic theorem proving

Constraint solving

Model checking, SMV

Process calculi

Assertion checking

Abstract interpretation

Dynamical system models

Process models

Modal or temporal logic specification

Predicative, relational, or algebraicspecification

0.48, 19/40,mod:tough,med:tough

0.2, 8/40,mod:moderate,med:moderate

0.18, 7/40,mod:dnk,

med:tough

0.28, 11/40,mod:dnk,

med:tough

0.48, 19/40,mod:tough,med:tough

0.3, 12/40,mod:tough,med:tough

0.4, 16/40,mod:tough,med:tough

0.52, 26/50,mod:tough,med:tough

0.2, 10/50,mod:moderate,med:moderate

0.22, 11/50,mod:dnk,

med:tough

0.24, 12/50,mod:dnk,

med:tough

0.4, 20/50,mod:tough,med:tough

0.28, 14/50,mod:moderate,

med:tough

0.36, 18/50,mod:tough,med:tough

0.5, 29/58,mod:tough,med:tough

0.26, 15/58,mod:moderate,med:moderate

0.16, 9/58,mod:moderate,med:moderate

0.31, 18/58,mod:tough,med:tough

0.38, 22/58,mod:tough,med:tough

0.24, 14/58,mod:moderate,med:moderate

0.4, 23/58,mod:tough,med:tough

0.57, 38/67,mod:tough,med:tough

0.25, 17/67,mod:moderate,med:moderate

0.21, 14/67,mod:not an issue,

med:moderate

0.21, 14/67,mod:dnk,

med:tough

0.37, 25/67,mod:tough,med:tough

0.31, 21/67,mod:moderate,med:moderate

0.42, 28/67,mod:tough,med:tough

0.44, 24/55,mod:tough,med:tough

0.24, 13/55,mod:moderate,

med:tough

0.18, 10/55,mod:moderate,med:moderate

0.25, 14/55,mod:dnk,

med:tough

0.31, 17/55,mod:tough,med:tough

0.31, 17/55,mod:moderate,med:moderate

0.4, 22/55,mod:tough,med:tough

0.47, 23/49,mod:tough,med:tough

0.18, 9/49,mod:moderate,med:moderate

0.2, 10/49,mod:dnk,

med:tough

0.24, 12/49,mod:dnk,

med:tough

0.33, 16/49,mod:tough,med:tough

0.31, 15/49,mod:tough,med:tough

0.39, 19/49,mod:tough,med:tough

0.53, 38/72,mod:tough,med:tough

0.29, 21/72,mod:moderate,

med:tough

0.26, 19/72,mod:moderate,med:moderate

0.26, 19/72,mod:dnk,

med:tough

0.38, 27/72,mod:tough,med:tough

0.39, 28/72,mod:tough,med:tough

0.46, 33/72,mod:tough,med:tough

0.37, 14/38,mod:tough,med:tough

0.29, 11/38,mod:tough,med:tough

0.24, 9/38,mod:not an issue,

med:tough

0.18, 7/38,mod:dnk,

med:tough

0.29, 11/38,mod:dnk,

med:tough

0.24, 9/38,mod:not an issue,

med:moderate

0.26, 10/38,mod:dnk,

med:tough

0.44, 25/57,mod:tough,med:tough

0.26, 15/57,mod:moderate,

med:tough

0.23, 13/57,mod:dnk,

med:tough

0.23, 13/57,mod:dnk,

med:tough

0.39, 22/57,mod:tough,med:tough

0.25, 14/57,mod:moderate,med:moderate

0.39, 22/57,mod:tough,med:tough

0.49, 38/78,mod:tough,med:tough

0.31, 24/78,mod:moderate,med:moderate

0.24, 19/78,mod:moderate,med:moderate

0.33, 26/78,mod:tough,med:tough

0.4, 31/78,mod:tough,med:tough

0.27, 21/78,mod:moderate,med:moderate

0.41, 32/78,mod:tough,med:tough

0.53, 35/66,mod:tough,med:tough

0.27, 18/66,mod:moderate,med:moderate

0.24, 16/66,mod:moderate,med:moderate

0.33, 22/66,mod:tough,med:tough

0.41, 27/66,mod:tough,med:tough

0.29, 19/66,mod:moderate,med:moderate

0.39, 26/66,mod:tough,med:tough

0.52, 27/52,mod:tough,med:tough

0.21, 11/52,mod:dnk,

med:tough

0.27, 14/52,mod:tough,med:tough

0.31, 16/52,mod:dnk,

med:tough

0.35, 18/52,mod:tough,med:tough

0.31, 16/52,mod:moderate,

med:tough

0.44, 23/52,mod:tough,med:tough

0.48, 30/63,mod:tough,med:tough

0.29, 18/63,mod:tough,med:tough

0.22, 14/63,mod:not an issue,

med:moderate

0.29, 18/63,mod:tough,med:tough

0.35, 22/63,mod:tough,med:tough

0.27, 17/63,mod:moderate,med:moderate

0.33, 21/63,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

15

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across FMs(users with no or decreased intent, future)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Consistency checking

Symbolic execution

Simulation

Generic theorem proving

Constraint solving

Model checking, SMV

Process calculi

Assertion checking

Abstract interpretation

Dynamical system models

Process models

Modal or temporal logic specification

Predicative, relational, or algebraicspecification

0.66, 70/106,mod:tough,med:tough

0.42, 45/106,mod:tough,med:tough

0.33, 35/106,mod:moderate,med:moderate

0.37, 39/106,mod:moderate,med:moderate

0.42, 45/106,mod:tough,

med:moderate

0.29, 31/106,mod:moderate,med:moderate

0.52, 55/106,mod:tough,med:tough

0.63, 88/140,mod:tough,med:tough

0.44, 61/140,mod:tough,med:tough

0.33, 46/140,mod:moderate,med:moderate

0.39, 54/140,mod:tough,

med:moderate

0.44, 61/140,mod:tough,med:tough

0.34, 48/140,mod:moderate,med:moderate

0.49, 68/140,mod:tough,med:tough

0.64, 83/130,mod:tough,med:tough

0.4, 52/130,mod:tough,med:tough

0.37, 48/130,mod:moderate,med:moderate

0.36, 47/130,mod:moderate,med:moderate

0.43, 56/130,mod:tough,med:tough

0.37, 48/130,mod:moderate,med:moderate

0.46, 60/130,mod:tough,med:tough

0.59, 66/111,mod:tough,med:tough

0.44, 49/111,mod:tough,med:tough

0.35, 39/111,mod:moderate,med:moderate

0.45, 50/111,mod:tough,med:tough

0.45, 50/111,mod:tough,med:tough

0.32, 36/111,mod:moderate,med:moderate

0.51, 57/111,mod:tough,med:tough

0.68, 94/138,mod:tough,med:tough

0.44, 61/138,mod:tough,med:tough

0.33, 46/138,mod:moderate,med:moderate

0.41, 56/138,mod:tough,med:tough

0.44, 61/138,mod:tough,med:tough

0.34, 47/138,mod:moderate,med:moderate

0.49, 68/138,mod:tough,med:tough

0.66, 97/146,mod:tough,med:tough

0.45, 65/146,mod:tough,med:tough

0.35, 51/146,mod:moderate,med:moderate

0.41, 60/146,mod:tough,med:tough

0.46, 67/146,mod:tough,med:tough

0.36, 53/146,mod:moderate,med:moderate

0.51, 74/146,mod:tough,med:tough

0.68, 67/98,mod:tough,med:tough

0.43, 42/98,mod:moderate,med:moderate

0.32, 31/98,mod:moderate,med:moderate

0.41, 40/98,mod:tough,

med:moderate

0.46, 45/98,mod:tough,med:tough

0.31, 30/98,mod:moderate,med:moderate

0.5, 49/98,mod:tough,med:tough

0.65, 105/161,mod:tough,med:tough

0.39, 63/161,mod:moderate,

med:tough

0.3, 49/161,mod:moderate,med:moderate

0.39, 62/161,mod:tough,med:tough

0.46, 74/161,mod:tough,med:tough

0.35, 57/161,mod:moderate,med:moderate

0.52, 84/161,mod:tough,med:tough

0.68, 83/122,mod:tough,med:tough

0.44, 54/122,mod:tough,med:tough

0.32, 39/122,mod:moderate,med:moderate

0.39, 48/122,mod:tough,

med:moderate

0.4, 49/122,mod:moderate,med:moderate

0.36, 44/122,mod:moderate,med:moderate

0.5, 61/122,mod:tough,med:tough

0.69, 68/99,mod:tough,med:tough

0.37, 37/99,mod:moderate,

med:tough

0.33, 33/99,mod:moderate,med:moderate

0.35, 35/99,mod:moderate,med:moderate

0.4, 40/99,mod:moderate,med:moderate

0.39, 39/99,mod:moderate,med:moderate

0.46, 46/99,mod:tough,med:tough

0.63, 78/123,mod:tough,med:tough

0.41, 50/123,mod:tough,med:tough

0.33, 41/123,mod:moderate,med:moderate

0.37, 46/123,mod:moderate,med:moderate

0.44, 54/123,mod:tough,med:tough

0.35, 43/123,mod:moderate,med:moderate

0.51, 63/123,mod:tough,med:tough

0.63, 86/137,mod:tough,med:tough

0.42, 58/137,mod:tough,med:tough

0.3, 41/137,mod:moderate,med:moderate

0.38, 52/137,mod:moderate,med:moderate

0.45, 62/137,mod:tough,med:tough

0.34, 46/137,mod:moderate,med:moderate

0.5, 68/137,mod:tough,med:tough

0.68, 86/127,mod:tough,med:tough

0.42, 53/127,mod:moderate,

med:tough

0.34, 43/127,mod:moderate,med:moderate

0.38, 48/127,mod:moderate,med:moderate

0.46, 58/127,mod:tough,med:tough

0.34, 43/127,mod:moderate,med:moderate

0.53, 67/127,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

015

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across FMs(users with same or increased intent, future)

Figure 26: Comparison of challenge difficulty across FM classes (UFMi)

Page 46: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 46

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

I have not used FMs in

Engineering practitioner in industry

Stakeholder of an FM tool or

Lecturer, teacher, trainer, or coach

Researcher in industry

Researcher in academia

External consultant

Consulting or managing practitioner inindustry

Bachelor, master, or PhD student

0.21, 4/19,mod:dnk,

med:tough

0.11, 2/19,mod:dnk,med:dnk

0.26, 5/19,mod:dnk,

med:tough

0.16, 3/19,mod:dnk,

med:tough

0.32, 6/19,mod:dnk,

med:tough

0.32, 6/19,mod:dnk,

med:tough

0.42, 8/19,mod:tough,med:tough

0.58, 38/66,mod:tough,med:tough

0.44, 29/66,mod:tough,med:tough

0.26, 17/66,mod:moderate,med:moderate

0.32, 21/66,mod:moderate,med:moderate

0.33, 22/66,mod:moderate,med:moderate

0.29, 19/66,mod:moderate,med:moderate

0.45, 30/66,mod:tough,med:tough

0.75, 18/24,mod:tough,med:tough

0.38, 9/24,mod:moderate,med:moderate

0.33, 8/24,mod:moderate,med:moderate

0.54, 13/24,mod:tough,med:tough

0.38, 9/24,mod:moderate,med:moderate

0.21, 5/24,mod:moderate,med:moderate

0.46, 11/24,mod:tough,

med:moderate

0.71, 63/89,mod:tough,med:tough

0.43, 38/89,mod:moderate,med:moderate

0.38, 34/89,mod:moderate,med:moderate

0.51, 45/89,mod:tough,med:tough

0.47, 42/89,mod:tough,med:tough

0.39, 35/89,mod:moderate,med:moderate

0.53, 47/89,mod:tough,med:tough

0.71, 40/56,mod:tough,med:tough

0.46, 26/56,mod:tough,med:tough

0.38, 21/56,mod:moderate,med:moderate

0.41, 23/56,mod:tough,

med:moderate

0.39, 22/56,mod:tough,med:tough

0.34, 19/56,mod:moderate,med:moderate

0.46, 26/56,mod:tough,med:tough

0.76, 81/107,mod:tough,med:tough

0.49, 52/107,mod:tough,med:tough

0.41, 44/107,mod:tough,

med:moderate

0.47, 50/107,mod:tough,med:tough

0.54, 58/107,mod:tough,med:tough

0.38, 41/107,mod:moderate,med:moderate

0.5, 54/107,mod:tough,med:tough

0.65, 32/49,mod:tough,med:tough

0.49, 24/49,mod:tough,med:tough

0.35, 17/49,mod:moderate,med:moderate

0.47, 23/49,mod:tough,med:tough

0.53, 26/49,mod:tough,med:tough

0.39, 19/49,mod:moderate,med:moderate

0.45, 22/49,mod:moderate,med:moderate

0.57, 28/49,mod:tough,med:tough

0.39, 19/49,mod:moderate,med:moderate

0.33, 16/49,mod:moderate,med:moderate

0.39, 19/49,mod:tough,

med:moderate

0.39, 19/49,mod:moderate,med:moderate

0.33, 16/49,mod:moderate,med:moderate

0.47, 23/49,mod:tough,med:tough

0.63, 66/105,mod:tough,med:tough

0.37, 39/105,mod:moderate,med:moderate

0.3, 32/105,mod:moderate,med:moderate

0.38, 40/105,mod:tough,med:tough

0.41, 43/105,mod:tough,med:tough

0.31, 33/105,mod:moderate,med:moderate

0.48, 50/105,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

10

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Roles (Past)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

I do not or would not

Engineering practitioner in industry

Researcher in industry

Stakeholder of an FM tool or

Researcher in academia

Lecturer, teacher, trainer, or coach

External consultant

Consulting or managing practitioner inindustry

Bachelor, master, or PhD student

0.4, 8/20,mod:tough,med:tough

0.2, 4/20,mod:dnk,

med:tough

0.2, 4/20,mod:not an issue,

med:tough

0.1, 2/20,mod:dnk,

med:tough

0.3, 6/20,mod:dnk,

med:tough

0.3, 6/20,mod:dnk,

med:tough

0.3, 6/20,mod:not an issue,

med:tough

0.58, 55/95,mod:tough,med:tough

0.36, 34/95,mod:tough,med:tough

0.32, 30/95,mod:moderate,med:moderate

0.33, 31/95,mod:moderate,med:moderate

0.35, 33/95,mod:moderate,med:moderate

0.35, 33/95,mod:moderate,med:moderate

0.51, 48/95,mod:tough,med:tough

0.69, 62/90,mod:tough,med:tough

0.46, 41/90,mod:tough,med:tough

0.36, 32/90,mod:moderate,med:moderate

0.36, 32/90,mod:moderate,med:moderate

0.39, 35/90,mod:moderate,med:moderate

0.36, 32/90,mod:moderate,med:moderate

0.46, 41/90,mod:tough,med:tough

0.72, 28/39,mod:tough,med:tough

0.54, 21/39,mod:tough,med:tough

0.36, 14/39,mod:moderate,med:moderate

0.44, 17/39,mod:tough,med:tough

0.38, 15/39,mod:moderate,med:moderate

0.26, 10/39,mod:moderate,med:moderate

0.41, 16/39,mod:moderate,med:moderate

0.7, 79/113,mod:tough,med:tough

0.48, 54/113,mod:tough,med:tough

0.35, 39/113,mod:moderate,med:moderate

0.42, 47/113,mod:tough,med:tough

0.49, 55/113,mod:tough,med:tough

0.36, 41/113,mod:moderate,med:moderate

0.47, 53/113,mod:tough,med:tough

0.72, 64/89,mod:tough,med:tough

0.52, 46/89,mod:tough,med:tough

0.4, 36/89,mod:tough,

med:moderate

0.45, 40/89,mod:tough,med:tough

0.48, 43/89,mod:tough,med:tough

0.4, 36/89,mod:moderate,med:moderate

0.52, 46/89,mod:tough,med:tough

0.69, 61/89,mod:tough,med:tough

0.46, 41/89,mod:tough,med:tough

0.37, 33/89,mod:moderate,med:moderate

0.43, 38/89,mod:tough,med:tough

0.46, 41/89,mod:tough,med:tough

0.34, 30/89,mod:moderate,med:moderate

0.51, 45/89,mod:tough,med:tough

0.56, 49/87,mod:tough,med:tough

0.43, 37/87,mod:tough,med:tough

0.33, 29/87,mod:moderate,med:moderate

0.36, 31/87,mod:moderate,med:moderate

0.36, 31/87,mod:moderate,med:moderate

0.3, 26/87,mod:moderate,med:moderate

0.55, 48/87,mod:tough,med:tough

0.54, 26/48,mod:tough,med:tough

0.44, 21/48,mod:tough,med:tough

0.35, 17/48,mod:moderate,med:moderate

0.38, 18/48,mod:moderate,

med:tough

0.44, 21/48,mod:tough,med:tough

0.33, 16/48,mod:moderate,med:moderate

0.52, 25/48,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

010

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Roles (Future)

Figure 27: Comparison of challenge difficulty across roles

Page 47: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 47

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

I have not used FMs in

Process automation

Transportation

Industrial machinery

Other

Military systems not in the above

Device industry

Critical infrastructures

Supportive

Platforms

Business information

0.28, 10/36,mod:dnk,

med:tough

0.17, 6/36,mod:dnk,

med:tough

0.31, 11/36,mod:dnk,

med:tough

0.28, 10/36,mod:dnk,

med:tough

0.31, 11/36,mod:dnk,

med:tough

0.25, 9/36,mod:dnk,

med:tough

0.39, 14/36,mod:tough,med:tough

0.67, 20/30,mod:tough,med:tough

0.47, 14/30,mod:tough,med:tough

0.33, 10/30,mod:moderate,med:moderate

0.4, 12/30,mod:moderate,med:moderate

0.43, 13/30,mod:moderate,med:moderate

0.4, 12/30,mod:moderate,med:moderate

0.6, 18/30,mod:tough,med:tough

0.71, 55/78,mod:tough,med:tough

0.46, 36/78,mod:tough,med:tough

0.33, 26/78,mod:moderate,med:moderate

0.36, 28/78,mod:tough,

med:moderate

0.45, 35/78,mod:tough,med:tough

0.42, 33/78,mod:moderate,med:moderate

0.5, 39/78,mod:tough,med:tough

0.71, 22/31,mod:tough,med:tough

0.35, 11/31,mod:moderate,med:moderate

0.32, 10/31,mod:moderate,med:moderate

0.35, 11/31,mod:moderate,med:moderate

0.42, 13/31,mod:tough,

med:moderate

0.35, 11/31,mod:moderate,med:moderate

0.52, 16/31,mod:tough,med:tough

0.75, 6/8,mod:tough,med:tough

0.38, 3/8,mod:moderate,med:moderate

0.5, 4/8,mod:tough,med:tough

0.5, 4/8,mod:tough,med:tough

0.62, 5/8,mod:tough,med:tough

0.5, 4/8,mod:tough,med:tough

0.62, 5/8,mod:tough,med:tough

0.78, 14/18,mod:tough,med:tough

0.28, 5/18,mod:moderate,med:moderate

0.39, 7/18,mod:tough,

med:moderate

0.33, 6/18,mod:moderate,med:moderate

0.5, 9/18,mod:tough,med:tough

0.28, 5/18,mod:moderate,med:moderate

0.44, 8/18,mod:moderate,med:moderate

0.59, 23/39,mod:tough,med:tough

0.38, 15/39,mod:moderate,med:moderate

0.33, 13/39,mod:moderate,med:moderate

0.44, 17/39,mod:tough,

med:moderate

0.41, 16/39,mod:moderate,med:moderate

0.36, 14/39,mod:moderate,med:moderate

0.46, 18/39,mod:tough,med:tough

0.68, 43/63,mod:tough,med:tough

0.44, 28/63,mod:tough,

med:moderate

0.3, 19/63,mod:moderate,med:moderate

0.41, 26/63,mod:moderate,med:moderate

0.44, 28/63,mod:moderate,med:moderate

0.38, 24/63,mod:moderate,med:moderate

0.41, 26/63,mod:moderate,med:moderate

0.71, 55/78,mod:tough,med:tough

0.45, 35/78,mod:tough,

med:moderate

0.35, 27/78,mod:moderate,med:moderate

0.42, 33/78,mod:tough,med:tough

0.44, 34/78,mod:tough,med:tough

0.35, 27/78,mod:moderate,med:moderate

0.44, 34/78,mod:tough,med:tough

0.6, 24/40,mod:tough,med:tough

0.4, 16/40,mod:moderate,med:moderate

0.4, 16/40,mod:moderate,med:moderate

0.48, 19/40,mod:tough,med:tough

0.45, 18/40,mod:tough,med:tough

0.28, 11/40,mod:moderate,med:moderate

0.48, 19/40,mod:tough,med:tough

0.6, 25/42,mod:tough,med:tough

0.4, 17/42,mod:moderate,med:moderate

0.45, 19/42,mod:tough,

med:moderate

0.52, 22/42,mod:tough,med:tough

0.38, 16/42,mod:moderate,med:moderate

0.4, 17/42,mod:moderate,med:moderate

0.6, 25/42,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell0

10

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Domains (Past)

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

nt

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

lts

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

I would not or do not

Other

Process automation

Transportation

Industrial machinery

Military systems not in the above

Device industry

Critical infrastructures

Supportive

Platforms

Business information

0.42, 8/19,mod:tough,med:tough

0.26, 5/19,mod:dnk,

med:tough

0.21, 4/19,mod:not an issue,

med:tough

0.16, 3/19,mod:not an issue,

med:tough

0.26, 5/19,mod:dnk,

med:tough

0.32, 6/19,mod:dnk,

med:tough

0.37, 7/19,mod:tough,med:tough

0.5, 1/2,mod:moderate,med:moderate

0.5, 1/2,mod:moderate,med:moderate

0.5, 1/2,mod:not an issue,

med:moderate

0.5, 1/2,mod:not an issue,

med:moderate

0.5, 1/2,mod:moderate,med:moderate

0, 0/2,mod:moderate,med:moderate

1, 2/2,mod:tough,med:tough

0.64, 57/89,mod:tough,med:tough

0.47, 42/89,mod:tough,med:tough

0.34, 30/89,mod:moderate,med:moderate

0.4, 36/89,mod:tough,

med:moderate

0.44, 39/89,mod:tough,

med:moderate

0.35, 31/89,mod:moderate,med:moderate

0.54, 48/89,mod:tough,med:tough

0.64, 90/141,mod:tough,med:tough

0.38, 53/141,mod:moderate,

med:tough

0.29, 41/141,mod:moderate,med:moderate

0.37, 52/141,mod:tough,med:tough

0.41, 58/141,mod:tough,med:tough

0.32, 45/141,mod:moderate,med:moderate

0.5, 71/141,mod:tough,med:tough

0.65, 60/92,mod:tough,med:tough

0.43, 40/92,mod:tough,med:tough

0.32, 29/92,mod:moderate,med:moderate

0.45, 41/92,mod:tough,med:tough

0.46, 42/92,mod:tough,med:tough

0.35, 32/92,mod:moderate,med:moderate

0.52, 48/92,mod:tough,med:tough

0.63, 50/79,mod:tough,med:tough

0.44, 35/79,mod:tough,med:tough

0.35, 28/79,mod:moderate,med:moderate

0.43, 34/79,mod:tough,

med:moderate

0.41, 32/79,mod:moderate,med:moderate

0.33, 26/79,mod:moderate,med:moderate

0.48, 38/79,mod:tough,med:tough

0.61, 70/114,mod:tough,med:tough

0.41, 47/114,mod:tough,med:tough

0.32, 37/114,mod:moderate,med:moderate

0.44, 50/114,mod:tough,med:tough

0.48, 55/114,mod:tough,med:tough

0.34, 39/114,mod:moderate,med:moderate

0.54, 61/114,mod:tough,med:tough

0.62, 84/135,mod:tough,med:tough

0.38, 51/135,mod:moderate,

med:tough

0.36, 49/135,mod:moderate,med:moderate

0.48, 65/135,mod:tough,med:tough

0.47, 64/135,mod:tough,med:tough

0.33, 44/135,mod:moderate,med:moderate

0.49, 66/135,mod:tough,med:tough

0.68, 69/101,mod:tough,med:tough

0.47, 47/101,mod:tough,med:tough

0.31, 31/101,mod:moderate,med:moderate

0.4, 40/101,mod:tough,med:tough

0.44, 44/101,mod:tough,med:tough

0.34, 34/101,mod:moderate,med:moderate

0.48, 48/101,mod:tough,med:tough

0.64, 61/96,mod:tough,med:tough

0.45, 43/96,mod:tough,med:tough

0.35, 34/96,mod:moderate,med:moderate

0.44, 42/96,mod:tough,med:tough

0.42, 40/96,mod:tough,med:tough

0.28, 27/96,mod:moderate,med:moderate

0.45, 43/96,mod:tough,med:tough

0.64, 48/75,mod:tough,med:tough

0.45, 34/75,mod:tough,med:tough

0.37, 28/75,mod:moderate,med:moderate

0.51, 38/75,mod:tough,med:tough

0.47, 35/75,mod:tough,med:tough

0.37, 28/75,mod:moderate,med:moderate

0.49, 37/75,mod:tough,med:tough

0 0.2 0.4 0.6 0.8 1

Tough−proportion of all ratings per cell

010

Key/Histogram of Toughs

# C

ells

Comparison of Challenge Difficulty across Domains (Future)

Figure 28: Comparison of challenge difficulty across domains

Page 48: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 48

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

ntde

tails

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

ltsfr

om m

odel

s to

rea

lity

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Comparison of Challenge Difficulty as rated by Expert Users(> 3 years of experience)

% o

f exp

ert u

sers

rat

ing

chal

leng

e as

'tou

gh'

0.0

0.2

0.4

0.6

0.8

1.0

Sca

labi

lity

Pro

per

abst

ract

ions

from

irre

leva

ntde

tails

Mai

ntai

nabi

lity

ofve

rific

atio

n re

sults

Reu

sabi

lity

ofve

rific

atio

n re

sults

Tran

sfer

of

verif

icat

ion

resu

ltsfr

om m

odel

s to

rea

lity

Aut

omat

ion

or to

olsu

ppor

t

Ski

lls a

nd e

duca

tion

Comparison of Challenge Difficulty as rated by non−Expert Users(<= 3 years of experience)

% o

f non

−ex

pert

s ra

ting

chal

leng

e as

'tou

gh'

0.0

0.2

0.4

0.6

0.8

1.0

Figure 29: Comparison of expert and non-expert users by their perception of challenge difficulty

A.7 Details on the Systematic Map

Table 9 contains the data we collected from the literature for the systematic map.

Page 49: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

49

Table 9: Details for the classification of related work

Reference Motivation Approach Result Relation List of Obstacles

Gerhart andYelowitz(1976)

Pinpoint fallibilityin the use of FMs

Evaluation of methodologies,error classification andanalysis, using severalexamples of fundamentalalgorithms

Identification of three classesof errors: specification errors,systematic constructionerrors, proved program errors.Discussion of potential causesof errors of each class.

Observations containobstacles,recommendations theiralleviation

Formality gap, appropriateabstraction and structuring,lack of skills and education

Jackson (1987) Generic evaluation Expert opinion FMs have inherent limitations Aim of method transfer,design of applicable FMs

Inherent informality offormalisation, difficult tocommunicate to customers,lack ofexpressiveness/freedom ofexpression, lack ofmethodology

Bjorner (1987) Proposes a softwareDevelopmentmethod based onFM

Personal opinion, experience Identify 3 main challenges:education, hiring, and toolsupport.

Challenges could beconsidered obstacles

Lack of skills and education,lack of tools,changeability/compatibilitywith existing process (methodculture)

Hall (1990) Present and test FMmyths.

They evaluate FMs on onelarger case study ( 50.000lines of Objective C Code)where they use Z (550 Zschemas to define 280operations) to develop aCASE tool.

Formal methods are powerfultools which must be betterunderstood by developers atlarge.

Rejection of commonHypothesis about FMs.Evaluation of a singleFM by means of a casestudy.

Myth 1: improper abstraction;transfer of v.results; (myth2-4: skills and directededucation;) myth 5:time-budget restrictions; myth6: improper abstraction, toolsupport (usability); myth 7:scalability

Wing (1990) FM adoption Literature review, summary,and analysis

Overview/taxonomy of FMs;analysis of limitations

Justifies the FMclassification used in ourquestionnaire

Proper abstraction (formalitygap, neglected environmentalassumptions)

Bloomfieldet al. (1991)

Introduction toformal methods

Reference to technologytransfer, existing case studies,and tool support

At the present time, formalmethods are good for thedescription of sequentialproperties of systems, and forcommunication protocols,although they do not yetaddress temporal propertiesand concurrency particularlywell.

Investigation of state ofthe art, no comparisonwith future.

Handling of incomplete specs,lack of verification tools,costly training, changingmanagement style

Continued on next page

Page 50: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

50

Reference Motivation Approach Result Relation List of Obstacles

Austin andParkin (1993)

Lack of acceptancein industry

Literature survey andquestionnaire

Identify obstacles and suggestto improve education andstandardisation and toperform case studies anddefine metrics.

Our questionnaire hasless focus onrepresentation andmethodology andexcludes questions onbenefits, suggestions.Their sample mainlycovers Z/VDM users inthe UK. Our analysis ofpast use is moreelaborate.

Math, tools, lack ofcost/benefit evidence, changeresistance

Craigen et al.(1993)

Determine state ofthe art about the useof FM in practice

Analysis of 12 case studies FMs are beginning to be usedseriously and successfully byindustry

Study of FMs inindustrial practice

Stated as recommendations:scalability, lack of toolsupport, lack ofskills/education, transfer ofverif. obl./results from/tocode; resource constraints

Fraser et al.(1994)

Lack of FMadoption,improvement of RE

Discuss benefits and problemsof FM adoption, literaturestudy

Present a two-dimensionalframework for assessingstrategies for incorporatingformal specifications insoftware development

Add suggestions on FMintroduction

Lack of method and toolsupport, lack ofskills/training, not suitable forrequirements prototyping,lack of cost/benefit evidence

Bowen andHinchey(1995a)

Re-examine Hall’smyths, introduce 7new Myths.

Argumentation andmentioning of case studies(although no reference to thestudies are given).

More real links betweenindustry and academia arerequired, and the successfuluse of formal methods mustbe better publicized.

Rejection of commonHypothesis about FMs.Evaluation withreference to case studies.

Time-budget-restrictions, lackof tool support, lack ofintegration in current process,scalability (multi-tech)

Bowen andHinchey(1995b)

Identify maximsthat may help in theapplication offormal methods inan industrial setting.

Based on observations (byourselves and others) on anumber of recently completedand in-progress projects

10 Hypotheses on how toimprove the success of FMusage

Investigation of FMusage; Argumentationand Examples.

Stated as commandments:lack of tool support(documentation guidelines),proper abstraction, budgetrestrictions (bad cost-benefitratio), lack of experts (skillsand education); compatibilitywith current process (lack ofquality culture); lack ofreuseability

Continued on next page

Page 51: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

51

Reference Motivation Approach Result Relation List of Obstacles

Lai and Leung(1995)

Investigate reasonswhy academicmethods are notadapted by industry

Personal experience Provide 5 reasons:practicality, too academic,Education, Resistance tochange, difficulty to re-invest

The reasons can beconsidered as obstacles

Practicality (being tooacademic): scalability, skillsand education, resistance tochange (compat with existingprocess); budget restrictions

Heisel (1996) In practice FMs arenot widely used

Personal Opinion/Observation Proposes a pragmaticapproach to FM

Method tries toovercome obstacles

Lack of skills/education, lackof experts, improperabstraction (low useability,low correctness); budgetrestrictions; compatibilitywith existing process

Hinchey andBowen (1996)

Identify reasons forindustry’sreluctance to takeformal methods toheart.

Experience in editing acollection of essays on theindustrial application of FM.

Misconception of Myths,Standards, Tools, andEducation are obstacles to useFM in industry.

They identify obstacles.However, they to notprovide evidence againstor in favor of them.

Wrong skills and education,lack of clarification (badreputation/misconception);lack of standards/regulation;lack of tools;

Holloway andButler (1996)

FM adoption Position statement, experiencereport, expert opinion

List of impediments to FMadoption

Do not measure usageintent but highlight lackof transfer efforts

Inadequate tools andexamples, inadequate transfer

Lai (1996) Academic methods(FM) not used incommunicationindustry

Personal opinions; critizeresearch transfer and suggestimprovements that might alsobe helpful for FM transfer topractice

8 Reasons why academianeeds to do industry research,7 catalysts, 12 industryrelevant factors

Reasons can beconsidered obstacles

Lack of empirical evidence,lack of skills/education,scalability, properabstractions; compatibilitywith existing process, lack oftool support

Pfleeger andHatton (1997)

Investigateeffectiveness ofFMs

Case study and effectanalysis: Comparison ofchange requests for FM-basedand non-FM-based codefragments as a result ofpostdelivery problems causedby these fragments.

FM have a positive effect oncode quality

Empirical evidence forFM effectiveness shownfor FMs in design phasebut not in more general,however only one system

Cost-benefit (fault removaleffectiveness)

Knight et al.(1997)

FMs are notaccepted byindustry

Elementary Field Experiment Evaluation of Z, PVS, andstate-charts

Evaluation of FMs inpractice

Integration in existing process(tools, methods,environments, people); properand useful abstractions (incl.usability/comprehensibility);tool support (groupdevelopment, collaborativeeng.); evolution andspec/proof maintainability;budget constraints

Continued on next page

Page 52: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

52

Reference Motivation Approach Result Relation List of Obstacles

Galloway et al.(1998)

Lack of FMadoption,improvement ofrequirements

Single case study in the lab Applied discrete FMs (e.g. Z,PVS, and state-charts) forspecifying and aggregatingrequirements of aircraftengine control systems

Interesting discussion ofFM weaknesses in a veryrelevant applicationcontext

Inadequate abstraction(difficult to integrate discreteand continuous abstractions);lack of training; lack ofinterest from engineers; FMsare perceived as too expensiveto apply

Heitmeyer(1998)

FM tools are usefulbut not used inindustry sincepeople are notskilled enough

Reference to a few casestudies

Provide a set of guidelineshow to make FM more usable

Usability as obstacle Tool support, properabstraction, processchangeability/compatibility

Snook andHarrison(2001)

Lack of empiricalinvestigation on theuse of FM

5 Structured Interviews Improved quality of softwarewith little or no additionallifecycle costs

Empirical investigationof the benefits of FM inindustry

Lack of skills/education,improper abstractions(understandability); transferof verif.results (models toreality)

Bowen andHinchey(2005)

Re-examination oftheir 1995commandments 10Years later

Personal experience,reference to literature

empirically validatecommandments with littleconclusion

They investigate the useof FM in industry

Tool support still an issuedespite some case studies

Bicarreguiet al. (2009)

No recent study onthe industrial use ofFMs.

Structured Questionnaire on62 industrial projects

4 challenges where identified Empirical Studyidentifying obstacles

Lack of tool support, lack ofempirical evidence; lack ofexperience (skills/education),budget restrictions

Woodcocket al. (2009)

State of the art ofindustrial use of FM(extension ofBicarregui et al.,2009)

Structured Questionnaire on62 industrial projects (claimedto be most comprehensivereview ever made of formalmethods application inindustry)

Identify several challenges Empirical Studyidentifying obstacles

Budget/resource constraints(high entry costs,cost-benefit), lack of toolsupport (automation)

Miller et al.(2010)

FMs are not widelyused in Industry

3 Case studies about the useof Model Checking inIndustry

Model Checking can beeffectively used to find errorsearly in the developmentprocess for many classes ofmodels

They investigate theapplicability of one FMin industry

Lessons from case studies:scalability and properabstraction (useability)

Continued on next page

Page 53: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

53

Reference Motivation Approach Result Relation List of Obstacles

Parnas (2010) FMs are not widelyused in industry

Argumentation/PersonalExperience

Provides reasons why FMsare not used and suggestionsfor improvement

Provides Obstacles Proper abstraction, lack ofempirical evidence, lack oftool support;maintainability/transfer(refinement,step-by-step)

Mohagheghiet al. (2012)

Adoption of MbE inSE

Tool evaluations based onTAM (Riemenschneider et al.,2002), interviews, survey

Evaluation of PEOU, PU,current, and future use

Also use IVs current andfuture use; little focus onFM; but MDE issometimes based on FM,MDE adoption canimprove FM adoption;

Lack of training, lack ofmaturity, broken tool chains,high cost of adoption (toolintegration)

Davis et al.(2013)

Identify barriers toFM adoption andsuggestions forbarrier mitigation

Interviews with 31practitioners from the USaerospace domain

Top barriers: education, tools,work environment; topmitigations: education, toolintegration, evidence of FMbenefits; occasionalnon-barriers: evidence onsavings, FM complexity,training/skills

Similar researchquestions, open-endedinterview questions;restricted to one domainand geography

Education, tools,environment, engineering,certification, misconceptions,scalability, evidence ofbenefits, cost

Liebel et al.(2016)

Adoption of MbE(incl. FM) inembedded SE

Online survey on needs,positive/negative effects, andshortcomings of MDEadoption

SotA and challengeassessment: FMs not usedwidely; their data suggests aneed of FM adoption; 30% ofthe responses from industrydeclare the need for FMs as areason to adopt MDE; medianof responses suggests thatMbE adoption has a positiveeffect on FM adoption;

Few FM users asparticipants

Lack of tool support, badreputation, rigid developmentprocesses

Ferrari et al.(2019)

Lack of FMadoption in railwaydomain

Review of FM literature, FMprojects, and FM toolsaccording toDESMET (Kitchenham et al.,1997); survey amongpractitioners

UML dominates as the MbElanguage, many FMs andFM-based tools are used, Bdominates as the FM; toolranking/selection matrix

Analyse maturity of FMsfrom literature review;evaluate rele-vance/quality/maturity ofFM and FM tool featuresfrom subjectiveassessment of surveyrespondents;

Difficulty to learn, lack of toolqualifications, lack ofexpressiveness

Continued on next page

Page 54: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

54

Reference Motivation Approach Result Relation List of Obstacles

Klein et al.(2018)

FM adoption Large case study,measurement of proof effort

FMs can scale to real systems,mixed assurance levels arepossible

Evidence for scalabilitycontradicting the beliefof our responses

Incompleteness of theoremsfor abstracting from allhardware features

Page 55: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 55

A.8 Mapping of Studies to Challenges for RQ3

In addition to Table 6 in Section 5.5, Table 10 provides the complete lists of surveyed studies mapped tothe corresponding challenges.

Table 10: Mapping of studies to challenge names (with the number of studies in parentheses)

Challenge Name Supported by

Scalability (7) Bowen and Hinchey (1995a), Craigen et al. (1995, 1993), Hall (1990), Lai (1996), Laiand Leung (1995), and Miller et al. (2010)

Skills & Education(13)

Barroca and McDermid (1992), Bicarregui et al. (2009), Bjorner (1987), Bowen andHinchey (1995b), Craigen et al. (1995, 1993), Galloway et al. (1998), Hall (1990),Heisel (1996), Hinchey and Bowen (1996), Lai (1996), Lai and Leung (1995), andSnook and Harrison (2001)

Transfer of Proofs(8)

Barroca and McDermid (1992), Bloomfield et al. (1991), Craigen et al. (1995, 1993),Hall (1990), Jackson (1987), Parnas (2010), and Snook and Harrison (2001)

Reusability (2) Barroca and McDermid (1992) and Bowen and Hinchey (1995b)

Abstraction (12) Barroca and McDermid (1992), Bowen and Hinchey (1995b), Galloway et al. (1998),Hall (1990), Heisel (1996), Heitmeyer (1998), Jackson (1987), Knight et al. (1997), Lai(1996), Miller et al. (2010), Parnas (2010), and Snook and Harrison (2001)

Tools & Automation(16)

Bicarregui et al. (2009), Bjorner (1987), Bloomfield et al. (1991), Bowen and Hinchey(1995a,b), Bowen and Hinchey (2005), Craigen et al. (1995, 1993), Hall (1990),Heitmeyer (1998), Hinchey and Bowen (1996), Knight et al. (1997), Lai (1996),O’Hearn (2018), Parnas (2010), and Woodcock et al. (2009)

Maintainability (3) Barroca and McDermid (1992), Knight et al. (1997), and Parnas (2010)

Resources (11) Bicarregui et al. (2009), Bloomfield et al. (1991), Bowen and Hinchey (1995a,b),Craigen et al. (1995, 1993), Hall (1990), Heisel (1996), Knight et al. (1997), Lai andLeung (1995), and Woodcock et al. (2009)

ProcessCompatibility (12)

Bjorner (1987), Bloomfield et al. (1991), Bowen and Hinchey (1995a,b), Craigen et al.(1995), Heisel (1996), Heitmeyer (1998), Hinchey and Bowen (1996), Knight et al.(1997), Lai (1996), Lai and Leung (1995), and O’Hearn (2018)

Practicality &Reputation (6)

Bicarregui et al. (2009), Galloway et al. (1998), Glass (2002), Lai (1996), Lai andLeung (1995), and Parnas (2010)

A.9 All Answers to Open Questions

Table 11 provides all answers to the open questions of our questionnaire. We fixed a small number ofspelling mistakes in the responses during the preparation of this table.

Page 56: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

56

Table 11: Open answers to the questions Q3, Q6, Q7, Q11, Q12, and Q13; R. . . respondent

R Date D3o Further motivationsto use FMs

P3o Other FMsused

P4o Used FMs forother purp.

F4o Future use ofother FMs

F5o Future use forother purp.

O1o Further obstacles to FMuse

General Feedback

2 2017/08/14 none none none none none cool4 2017/08/14 Science shall be best and

great.No, that appearapproximatelycomplete to me.

That is pretty all. No No In real operative situations arealways hidden obstacles as wellas those listed already. Imag-ine real and nominal definitionswere really first order logic.Add operational definition andstay sound.

Abstraction, reduction, exemplifi-cation, representation, knowledge,experience, skill and the art ofanalysing the taxonomies underselection are necessary or in need.

8 2017/08/18 Scientific curiosity9 2017/08/18 Test-Generation for small

problems, like MCDCcoverage

high license costs

19 2017/08/25 I just liked logic.Actually never caredabout applying it, butresearch positions are inFM so well :-)

goes completely against anyexisting software developmentprocess: FM require that youwrite specifications before im-plementing. I don’t believe any-more that anybody would dothis any time.

21 2017/08/28 Error elimination22 2017/08/28 Completeness, accuracy Survey not suitable for beginners

(like me) with the FMs.28 2017/09/12 Budgetary restrictions32 2017/10/16 environment34 2017/10/20 The cost-benefit relationship is

not slanted in FM’s favour: sub-stantial improvements in soft-ware engineering will not comefrom application of FM butthrough process improvement,better training and education,and better control of require-ments. Formal methods haveuses for verifying critical com-ponents but they will not solvethe big problems in software en-gineering.

Continued on next page

Page 57: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

57

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

35 2017/10/25 Supporting the design andconstruction of reliableand dependable systems

Many dialects ofPetri nets anAutomata

36 2017/10/25 Improved level ofconfidence versus moretraditional means

reverse engineering The main obstacle to adoptionby industry is the ”use” of engi-neers not able to handle abstrac-tion, leading to poor results.

37 2017/10/25 The need to achieve anddemonstrate the highestpossible integrity ofsystems

None None None None Perceived cost and difficulty ofuse requiring specialist knowl-edge

38 2017/10/25 No. None None There are no specific barriers(apart from not having neededpractice with them) apart from Ifind my Formalised Methods (tostrict procedures) gets the taskdone well enough.

I make a clear distinction betweenthe mathematically Formal Meth-ods and the procedure based For-malised Methods.

39 2017/10/25 B method40 2017/10/25 Cost and quality of

finished productSPARK Adaprogramminglanguage, Z, CSP

Broken market/market-for-lemons in software quality

43 2017/10/25 They are the only way toguarantee certainproperties of software andits documentation

Specification:TLA+, Z

Fault and failureanalysis

44 2017/10/25 none none no No45 2017/10/25 Clearly thinking about the

problem and correctness47 2017/10/25 I work with ’time

triggered’ systems.These can be mod-elled effectively(semi-formally)without a com-plete mathematicalmodel. Many ofour customers thinkthis is far moreadvanced than theyrequire ...

No customer demand.

Continued on next page

Page 58: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

58

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

48 2017/10/25 ”Bugs”, software defects,are semanticinconsistencies in code.Formal methodsacknowledge thenecessary semantics andhelp prevent defects.

Currentlyintegrating a degreeof formal methodsinto softwarerequirements anddesign modellingmethod

Drive constructionof good code. Prop-erties of a goodproof are also prop-erties of good code:minimal steps,logical progressionof steps, ... Writingcode to be prove-able (even if neverexplicitly proven)yields better code.

I don’t understandthis question.

None that I canthink of now

The software industry’s culturalpredisposition to informal ap-proaches (”But that’s not theway we’ve always done it!”)

49 2017/10/26 Combining traditionaland Formal methodsbased Software V & V

50 2017/10/26 I love logic and maths. my experience ismostly as aresearcher andteacher, somethingfor which the abovedoes not allow meto tick a box

to make exam ques-tions :-)

54 2017/11/04 Belief it helps constructsystems with much lesserrors

Most of practitioners don’t anddon’t want to know formalmethods. So there is a strongneed for ”hidden” use of formalmethods, like compilers.

FM have strong potentials butare also difficult to use in indus-try practices. Transferring verygood academia results into indus-try practices is challenging.

56 2017/11/06 no58 2017/11/23 To reduce costs in system

development.Reduce costs.

59 2017/11/23 simplifies sometimesthings, because ofenforcement of asystematic approach

60 2017/11/23 The beauty ofmathematics

61 2017/11/23 research program ignorant persons64 2017/11/23 Elegance and precision VDM, Z, Event-B Interesting set of questions65 2017/11/23 Maintenance cost and

reliability66 2017/11/23 I consider it the

foundation of principledsoftware engineering

Unifying Theoriesof Programming

rapid prototypingcalculation tools

lack of time

67 2017/11/23 Higher level of trust pseudo-occam,SPARK-ada

Not very user friendly, too ab-stract and diverse syntax be-tween different FMs, productiv-ity

Continued on next page

Page 59: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

59

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

68 2017/11/23 Increasing engineeringreliability

The TRIOspecificationlanguage

UML with for-mal application-dependent tailoring.

71 2017/11/23 Teaching how to buildhigh-quality software

72 2017/11/23 Engineers will not be able to ap-ply formal methods.

73 2017/11/23 Achieving correct,fault-free software

In lectures andteaching FMs.

76 2017/11/25 no77 2017/11/26 Thesis - tool verification79 2017/11/29 productivity of the

*whole* process80 2017/11/30 Soundness checking of

automation83 2017/12/06 Well chosen questions which do

not leave me guessing. Relevant tofuture FM research and practice.

84 2017/12/06 None - impractical formost systems

85 2017/12/06 Identified as industrialbest practice

87 2017/12/06 Desire to develop systemsthat I have solid evidenceperform correctly underall scenarios.

Aligning academic researchwith industrial need. Gettingtool developers to invest in FM.

90 2017/12/19 business and researchdifferentiation

91 2018/01/02 Lack of tools and wide range ofmethods in existence.

93 2018/03/16 I don’t know I haven’t I haven’t I don’t know Academia I don’t know94 2018/07/30 Niche market, outperform

companies using”traditional” means

96 2018/07/31 verify protocols97 2018/08/01 I believe that software

engineering should havethe same mathematicalunderpinning as regularengineering (I majored inengineering)

Continued on next page

Page 60: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

60

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

98 2018/08/01 To increase quality andreliability

100 2018/08/01 needed - - - - - -102 2018/08/02 no none no no105 2018/08/02 See Andy Galloway,

Trevor Cockram, andJohn McDermid.Experiences with theapplication of discreteformal methods to thedevelopment of enginecontrol ...

Independentreviewer of FMspecifications

None

107 2018/08/02 Pushing back theboundaries of knowledge.Making programs whichare correct byconstruction.

FermaT ProgramTransformations

108 2018/08/03 critical infrastructuresand technologyrequirements

specification of hy-brid communicatingsystems

future technologicaldesign methodolo-gies

109 2018/08/03 No further motivations touse FMs

111 2018/08/03 A candidate supplieroffered FMs in a tenderresponse to address anassurance requirement

reviewingconsistent set ofrequirementsexpressed using Z,OBJ, etc

I need to provide assurance thatis understandable to those whoneed to be assured

Formal Methods are perceived tobe expensive to use, and they canbe, but this can be offset by thebenefits; there does not seem to bemuch work published on this thatcould be used to persuade the Cus-tomer and the budget holder...

113 2018/08/03 Formal is the only way toobtain rigorousverification.

114 2018/08/04 Strategic facilitation inlarge enterprises, whereformal modelling canreveal gaps in the clientsunderstanding of theirown ecosystems.

Z, VDM, AXES Requirements engi-neering

Protective Analysis(PAN)

115 2018/08/04 As per my interpretation,FM is a consistent alogical way to describe aSystem/Product

Current corporations do notconsider valuable FMs duringproduct development, as a con-sequence, Engineers will noteven try to use it

Looks really interesting and I willdo some research from my side tolearn a little bit more

Continued on next page

Page 61: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

61

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

117 2018/08/05 To change the way theworld produces software

Security and safetyanalysis

No Safety and securityanalysis of Systems

Awareness of commercial po-tential

Closed questionnaire is just a start

118 2018/08/05 Improve general systemreliability

TRIO , timedAutomata, B

Requirement engi-neering

Research on inno-vative formalisms,foundational re-search

122 2018/08/06 I have developed varioussafety-related gas sensorsand other measurementdevices. I was invited tobecome an assessor ofthese systems by acertifying body. I wasshocked by what Idiscovered when Iscrutinised the methodsand approaches used bythe embedded softwaredevelopers whose workwas being certified. Iresolved to improve myown knowledge andpractice so that I coulddeal authoritatively withsome of theunsatisfactory situationsthat arose during variouscertifications.

I have great diffi-culty in persuadinganyone to take aninterest in FM. Irepresented theUK on a Europeancommittee con-cerned with safetyrelated gas systems.The ignorance andlack of interest inFM was startling- the German rep-resentatives wereparticularly hostile.So I only use FM inspecial cases withspecific customers.There is muchprejudice out thereagainst FM.

See answer above. We have topublicise some successes in or-der to lift the veil of ignoranceabout FM.

Continued on next page

Page 62: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

62

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

123 2018/08/06 We provide customers anFM tool

Most tools have many limita-tions and do not support widedomains by themselves

126 2018/08/07 Quality requirements Complexity129 2018/08/07 Scientific curiosity. Formalising inter-

national standards131 2018/08/07 It is a hard problem.133 2018/08/07 Improving confidence in

software134 2018/08/07 Formal methods are the

backbone of the sciencein ”Computing Science”.

136 2018/08/07 To model complexphenomena in socialsciences

137 2018/08/07 No138 2018/08/07 Protocol analysis Protocol analysis141 2018/08/08 The lists essentially

contains formalverificationapproaches. Formalmethods aremathematicallygroundedapproaches todesign/analyseartefacts, e.g. the Bmethod is a formalmethod to devisesoftwarecomponents. I usethe B method in mycurrent job.

System-level analy-sis. Software de-sign.

System analysis.Software design.

Continued on next page

Page 63: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

63

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

145 2018/08/08 Curiosity None146 2018/08/08 Interest / Research As testing tools. Industry mind set147 2018/08/08 gain confidence in the

algorithms we werereasoning about.

None. I am convinced that onlyFMs can help us from shitty andbuggy software.

150 2018/08/09 The robustness andpreciseness that theformal methods disciplinecan provide via itsspecifications andverifications techniques

Thank you very much for this sur-vey. It is very constructive andimportant. It handles most of theissues encountered by any practi-tioner and user of formal methods.

151 2018/08/09 I only encountered FMs as a topicin my studies once. They do notplay any role in my profession atthe moment.

153 2018/08/09 Quality154 2018/08/09 Main part of business

model of our startupSorry for not dis-closing our FutureExtent of FormalMethods Use

Hypes (eg AI) that direct deci-sion makers in other directions

Sorry for not disclosing our FutureExtent of Formal Methods Use

157 2018/08/10 I think FM is misplaced in ourstudy program SE, especially itbeing mandatory. It is very rarelyused in industry (for a reason) andthere are so many more importantthings to learn in order to becomethe Software Engineers of tomor-row (eg. Entrepreneurship).

160 2018/08/10 I am a PhD studentusing FM for formalverification. I planon continuing to dothis...

161 2018/08/11 Time constraints, financial con-straints, refactoring efforts

167 2018/08/13 They are fun and solve a”real” problem

168 2018/08/13 Abstract stateMachines, PetriNets, LTL, CTL,SAT and SMTsolvers

Continued on next page

Page 64: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

64

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

169 2018/08/13 Strong (potential)guarantees unobtainableby other methods

Requirement fal-sification (throughsimulation), dy-namic/online assur-ance (monitoringand reasoning)

Lack of appreciation and under-standing from other (CS and en-gineering) communities

Nice questions overall. In thosewith relative answers (e.g.,more/less frequent than before), itwould be nice to have my previousanswers on the same page (tocheck what I answered exactly).

171 2018/08/13 I have developed bothVDM andRely/Guarantee concepts

VDM,Rely/Guarantee,Separation Logic

Rely/Guarantee education

174 2018/08/13 model-based testing176 2018/08/14 Petri Nets very strange you put Petri nets to

Section P2 as a description tech-nique, and process calculi to Sec-tion P3 as a reasoning technique.In fact, Petri nets offer much morereasoning techniques than processcalculi do

178 2018/08/15 Software Projects often have alarge existing code base. Theassumptions that hold in thatcode base need to be thor-oughly analyzed first. More-over, I am missing automaticmapping from source code toformal logic models / descrip-tions.

I’ve missed questions towards whypeople do not use formal methodsin a given domain. What are theobstacles that they are currentlyfacing?

179 2018/08/15 Interest in reliable criticalsystems

Changed focus of interest at mywork

182 2018/08/16 I believe in doing thingsright or doing it not at all.

code generationbased on allegories

The way academia communi-cates FMs to engineers in prac-tice.

183 2018/08/16 Interest186 2018/08/17 Teaching FM Combinations of

formal and semi-formal methods

187 2018/08/19 FM = only method ableto solve the problem

193 2018/08/26 Z Karnaugh Maps ?for circuit design

195 2018/09/04 business in FM mcrl2197 2018/09/24 FMs have been in

existence since 1967.they have never made itinto the mainstream.

CSP Finding synchro-nization bugs

Continued on next page

Page 65: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint

–Formal

Methods

Use

65

No. Date Q3 Further motivations touse FMs

Q6 Other FMs used Q7 Used FMs forother purp.

Q11 Future use ofother FMs

Q12 Future use forother purp.

Q13 Further obstacles to FMuse

General Feedback

198 2018/09/24 need for rigorousspecification andverification

199 2018/09/24 State of the art in controlengineering andautomation

204 2018/09/26 Move to management level207 2018/10/05 Only reasonable way to

get my results208 2018/10/05 I never felt tempted to use

formal methodsTo clearly defineconcepts

213 2018/12/16 Regulatory authority ok215 2019/02/04 Functional safety

standards and theirapplication

Modellinglanguages

rather semi-FMs,are more practicaland easy to apply

support test casegeneration andvalidation

customers ability to make useof them as well in the specifica-tion and development process ofcritical systems

217 2019/03/20 UML218 2019/03/22 increase confidence in

system security221 2020/04/20 Static analyzer e.g.

PVS Studio1. The tools don’t even work. 2.The tools aren’t cost effective.3. We would have to train all theprogrammers, the QA team, thesystems engineering team, and(importantly) the company ex-ecutives.

Page 66: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 66

A.10 Copy of the Advertisement Flyer

A.11 Screenshot of the Twitter Poll

A.12 Copy of the Questionnaire

The PDF export of our on-line questionnaire on the next page corresponds to the questionnaire weused for the sample taken until 31.3.2019 with N = 220. Since 26.5.2019, an extended version of thequestionnaire had been available online at

https://goo.gl/forms/FnKNQtTmI3A6BekM2.

We crafted this questionnaire using Google Forms (Google, 2018). We use numbered identifiers for eachquestion category, demographic questions are prefixed with a “D”, questions about past FM use (UFMp)with a “P”, about future or intended FM use (UFMi) with an “F”, questions about obstacles with an “O”.Open questions are suffixed by an “o”.

Page 67: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 67

Use of Formal MethodsDear participant, thank you for your interest in this 8-10min survey on the use of formal methods (FMs).

This survey does NOT require previous knowledge in FMs or in their actual application in a practical context. However, this survey targets persons with an educational background in engineering and sciences OR with a practical engineering background in a reasonably critical systems or product domain.

By "FMs", we refer to explicit mathematical models and sound formal logical reasoning about critical properties---such as reliability, safety, availability, data privacy or, more generally, dependability and security---of electrical, electronic, and programmable electronic or software systems in critical application domains. FMs include, for example, formal specification, theorem proving, model checking, formal contracts, SMT solving, process algebras.

By "use of FMs", we refer to the application of FMs to engineered systems in the context of education, research, and, particularly, the field of industrial practice and by using formal languages together with manual or automated tool-based techniques.

This survey is anonymous. However, you can provide your email address if you are interested in receiving our final results afterwards.

The underlying study is conducted by Mario Gleirscher at University of York and Diego Marmsoler at Technical University of Munich.

* Erforderlich

Demographic Questions

Page 68: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 68

D1. In which application domain(s) in industry or academia (if any) have youmainly used FMs? *Wählen Sie alle zutreffenden Antworten aus.

I have not used FMs in any academic or industrial domain.

Critical infrastructures (e.g. telecom, energy, road/air/naval/rail traffic, smartbuildings or cities)

Process automation (e.g. chemical process plants, power plants, warehouselogistics, production lines)

Industrial machinery (e.g. stationary robotics, production machines)

Transportation (e.g. automotive, utility vehicles, naval, aeronautics, trainsystems, freight logistics, cable cars, mobile robotics, drones/UAVs)

Device industry (e.g. medical, health-care, semi-conductors, consumerelectronics)

Military systems not in the above domains (e.g. for command, control,surveillance)

Business information (e.g. database applications, banking, finance, ERP, PLM,web services, cloud apps)

Platforms (e.g. operating systems, middle-ware, firmware, drivers, databasesystems, libraries)

Supportive (e.g. CASE tools, checking or verification tools, CAD/CAM systems)

Sonstiges:

1.

D2. How many years of FM experience (including the study of FMs) have yougained? *Markieren Sie nur ein Oval.

I do not have any knowledge of or experience in FMs.

less than 3 years

3 to 7 years

8 to 15 years

16 to 25 years

more than 25 years

2.

Page 69: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 69

D3. Which have been your motivations (if any) to use FMs? *Markieren Sie nur ein Oval pro Zeile.

nomotivation

moderatemotivation

strong motivation (orrequirement)

Regulatory authoritiesCustomers / scientificcommunityEmployer / researchcollaboratorsSuperior(s) / principalinvestigator(s)Study or researchprogramOwn (private) interestOn behalf of an FM toolor service provider

3.

D3o. Which have been your furthermotivations to use FMs (if any)?

4.

Past and Current Use of Formal MethodsThe following questions aim at your EXPERIENCE with the use of FMs in your PAST and CURRENT activities and projects.

NOTE: If you are not able to say anything about past or current use, please, choose the corresponding "not yet used...", "no experience...", or "not at all" options and proceed to the next page!

P1. In which role(s) have you used FMs? *Wählen Sie alle zutreffenden Antworten aus.

I have not used FMs in any specific role.

Engineering practitioner in industry (e.g. programmer)

Consulting or managing practitioner in industry (e.g. architect, requirements orsystems engineer)

External consultant (e.g. external requirements or systems engineer)

Researcher in industry

Researcher in academia

Lecturer, teacher, trainer, or coach

Bachelor, master, or PhD student

Stakeholder of an FM tool or service provider

Sonstiges:

5.

Experience in Formal Methods Use

Page 70: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 70

P2. Describe your level of experience with each of the following classes offormal description techniques? *Markieren Sie nur ein Oval pro Zeile.

noexperience

or noknowledge

studied in(university)

course

applied inlab,

experiments,case studies

appliedonce in

engineeringpractice

appliedseveraltimes in

engineeringpractice

Predicative,relational, oralgebraicspecificationModal andtemporal logicspecificationProcess models(e.g. Petri nets,Mealy machines,LTS, Markovprocesses)Dynamicalsystems (i.e.differentialequations)

6.

Page 71: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 71

P3. Describe your level of experience with each of the following classes offormal reasoning techniques? *Markieren Sie nur ein Oval pro Zeile.

noexperience

or noknowledge

studied in(university)

lectures

applied inlab,

experiments,case studies

appliedonce in

engineeringpractice

appliedseveraltimes in

engineeringpractice

AbstractinterpretationAssertionchecking (e.g. forpre/postspecification,contracts)Process calculi(e.g. CSP, CCS,pi, mu, hybrid)Model checking,SMV (of e.g.temporal orprobabilisticproperties)Constraint (SAT,SMT) solving(e.g. for staticcode analysis),optimisationtechniquesGeneric (first-order, HOL)theorem proving(using e.g. termrewriting,functionalprogramming)Computationalengineering,simulation (usinge.g. differentialcalculus,numericalmethods)Symbolicexecution (e.g.scenario testing,model animation)Consistencychecking (e.g.syntax or bugpattern checking)

7.

Page 72: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 72

P3o. List other FMs you have experience with (if any):8.

Purposes of Formal Methods Use

P4. I have mainly used FMs for ... *Markieren Sie nur ein Oval pro Zeile.

... notat all.

...once.

... in 2 to 5separate tasks.

... in more than 5separate tasks.

... clarification (i.e.explicit description foranalyzing a problem)... specification (e.g.contracts,documentation andcommunication ofrequirements anddesign)... inspection (i.e. errordetection, e.g. non-conformance checking,model-based testing)... synthesis (e.g.transformation,compilation)... assurance (e.g. errorremoval, propertyverification, refinementor equivalence proofs,argumentation)

9.

P4o. I have used FMs for otherpurposes (if any):

10.

Intended Future Use of Formal MethodsThe following questions aim at your INTENT to use FMs in your FUTURE activities and projects.

NOTE: Your intend to use FMs will also be interpreted as the "mere possibility of FM usage in the corresponding ways" according to and based on your responses.

Future Extent of Formal Methods Use

Page 73: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 73

F1. In which application domain(s) in industry or academia (if any) would (ordo) you intend or recommend to use FMs? *Wählen Sie alle zutreffenden Antworten aus.

I would not or do not intend (or recommend) to use FMs in any academic orindustrial domain.

Critical infrastructures (e.g. telecom, energy, road/air/naval/rail traffic, smartbuildings or cities)

Process automation (e.g. chemical process plants, power plants, warehouselogistics, production lines)

Industrial machinery (e.g. stationary robotics, production machines)

Transportation (e.g. automotive, utility vehicles, naval, aeronautics, trainsystems, freight logistics, cable cars, mobile robotics, drones/UAVs)

Device industry (e.g. medical, health-care, semi-conductors, consumerelectronics)

Military systems not in the above domains (e.g. for command, control,surveillance)

Business information (e.g. database applications, banking, finance, ERP, PLM,web services, cloud apps)

Platforms (e.g. operating systems, middle-ware, firmware, drivers, databasesystems, libraries)

Supportive (e.g. CASE tools, checking or verification tools, CAD/CAM systems)

Sonstiges:

11.

F2. In which role(s) would (or do) you intend to use FMs? *Wählen Sie alle zutreffenden Antworten aus.

I do not or would not intend to use FMs in any specific role.

Engineering practitioner in industry (e.g. programmer, test or verificationengineer)

Consulting or managing practitioner in industry (e.g. architect, requirements orsystems engineer)

External consultant (e.g. external requirements or systems engineer)

Researcher in industry

Researcher in academia

Lecturer, teacher, trainer, or coach

Bachelor, master, or PhD student

Stakeholder of an FM tool or service provider

Sonstiges:

12.

Page 74: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 74

F3. I (would) intend to use ... *Markieren Sie nur ein Oval pro Zeile.

... nomore or

not at all.

... lessoften than

in thepast.

... asoften as

in thepast.

... moreoften thanin the past.

I don'tknow.

... predicative,relational, oralgebraicspecification... modal or temporallogic specification... process models(e.g. Petri nets,Mealy machines,LTS, Markovprocesses)... dynamical systemmodels (i.e.differential equations)

13.

Page 75: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 75

F4. I (would) intend to use ... *Markieren Sie nur ein Oval pro Zeile.

... nomore or

not at all.

... lessoften than

in thepast.

... asoften as

in thepast.

... moreoften thanin the past.

I don'tknow.

... abstractinterpretation... assertion checking(e.g. for pre/postspecification,contracts)... process calculi(e.g. CSP, CCS, pi,mu, hybrid)... model checking,SMV (of e.g.temporal orprobabilisticproperties)... constraint (SAT,SMT) solving (e.g. forstatic code analysis),optimisationtechniques... generic (first-order,HOL) theoremproving (using e.g.term rewriting,functionalprogramming)... simulation (i.e.computationalengineering usinge.g. differentialcalculus, numericalmethods)... symbolic execution(e.g. scenario testing,model animation)... consistencychecking (e.g. syntaxor bug patternchecking)

14.

F4o. I (would) intend to use other FMs,semi-FMs (i.e. without formal semanticsand proof system), or highly systematicprocedure (if any, please, provide somedetails):

15.

Future Purposes of Formal Methods Use

Page 76: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 76

F5. I (would) intend to use FMs for ... *Markieren Sie nur ein Oval pro Zeile.

... nomore or

not at all.

... lessoften than

in thepast.

... asoften as

in thepast.

... moreoften thanin the past.

I don'tknow.

... clarification (i.e.explicit description foranalyzing a problem)... specification (e.g.contracts,documentation andcommunication ofrequirements anddesign)... inspection (i.e.error detection, e.g.non-conformancechecking, model-based testing)... synthesis (e.g.transformation,compilation)... assurance (e.g.error removal,property verification,refinement orequivalence proofs,argumentation)

16.

F5o. I (would) intend to use FMs forother purposes (if any):

17.

Potential Obstacles to the Intended Use of FormalMethods

Page 77: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 77

O1. For any potential use of FMs in my future activities and projects, I consider... *Markieren Sie nur ein Oval pro Zeile.

... not asan issue.

... as amoderatechallenge.

... as a toughchallenge.

I don'tknow.

... scalability (e.g.towards large orheterogeneoussystems)... proper (automated)abstractions fromirrelevant details... maintainability ofverification results (e.g.stable proofs)... reusability ofverification results (e.g.parametric proofs)... transfer ofverification results frommodels to reality... automation or toolsupport (incl. notations,DSLs, IDEs)... skills and education(e.g. methods knownand ready to use)

18.

O1o. Which further obstacles (if any)would potentially hinder you to useFMs as intended?

19.

Thank you for your participation!

For SurveyCircle users (www.surveycircle.com): The Survey Code is: XXXX-XXXX-XXXX-XXXX

Please, feel free to provide us any feedback on the questionnaire or on itstopic:

20.

Page 78: Context: Objective: Method: Results: arXiv:1812.08815v3 ...

Preprint – FormalMethods Use 78

Bereitgestellt von

If you are interested in our results youcan provide us your email addressbelow:

21.