Proceedings of the 16th AIS SIGSAND Symposium
Cincinnati, OH, May 19-20, 2017

Welcome! On behalf of the Department of Operations, Business Analytics & Information Systems at the Lindner College of Business, welcome to the University of Cincinnati and the 2017 SIGSAND Symposium. This symposium is a forum for scholars, practitioners, and doctoral students interested in systems analysis and design (SAND). Our objective is to promote and develop high-quality research on all issues related to SAND. Many of the presentations in our program represent early research and research-in-progress papers. As a developmental outlet designed to facilitate feedback, we invite you to consider yourself an 'informal discussant' of every paper and actively engage in discussions. We look forward to your participation!

We hope you benefit from the symposium and that you enjoy your time in Cincinnati. The city and university have much to offer, and there is room in our program for you to explore. Should you need any recommendations or help, please do not hesitate to reach out. Below are some helpful tips, maps, and notes about accessibility.

Binny Samuel, Roman Lukyanenko & Arturo Castellanos
SIGSAND Officers
[email protected], [email protected] & [email protected]

Program Notes Internet Access

Friday: open network at Sharonville Convention Center
Saturday: open network (UCGuest) at UC or eduroam

Parking Note for Saturday, May 20

Please pick up a complimentary parking validation ticket during Saturday’s program if you park in the Campus Green Garage on Saturday May 20. You MUST exit the garage by 3 PM or pay an hourly rate, due to a soccer game.

Wayfinding (click on the links below)

Interactive map of SIGSAND locations
Map of Sharonville Convention Center
Map of Over-the-Rhine (OTR) Gateway District
Map of Cincinnati Downtown
Using the Streetcar


Program Schedule

1 The three keynotes are in conjunction with the UC Center for Business Analytics “Analytics Summit 2017”; see http://business.uc.edu/academics/centers/analytics-center/events/analytics-summit-2017.html

Thursday May 18

Welcome Dinner 6:00 – 8:00 pm Taft's Ale House, 1429 Race St, Cincinnati, OH 45202

Friday May 19

Shuttle Departs for Convention Center

7:15 am

Origin: Fairfield Inn & Suites by Marriott Cincinnati Uptown/University Area, 2500 S Market St, Cincinnati, OH 45219 (meet in hotel lobby if utilizing shuttle) Destination: Sharonville Convention Center, 11413 Chester Rd, Cincinnati, OH 45246

Breakfast & Registration

7:45 – 8:30 am Sharonville Convention Center (SCC), Northern Lights Ballroom

Welcome 8:30 – 8:35 am SCC 107-108

SIGSAND Officers

Session 1a Chair: Carson Woo

8:35 – 9:05 am SCC 107-108

Developing Test Cases Using ER Based Approach; Palash Bera and Abhimanyu Gupta

Keynote 1 9:15 – 10:15 am

SCC Northern Lights Ballroom

Algorithms to Live By; Brian Christian, Author

Break 10:15 – 10:30 am SCC Northern Lights Ballroom; Coffee and Drinks Available

Session 1b Chair: Carson Woo

10:30 – 11:00 am SCC 107-108

Integrating Scientific Research: Theory and Design of Discovering Similar Constructs; James Endicott, Kai R. Larsen, Roman Lukyanenko, and Chih How Bong

Break 11:00 – 11:15 am SCC Northern Lights Ballroom; Coffee and Drinks Available

Session 2 Chair: Roger Chiang

11:20 am –12:20 pm SCC 107-108

Conceptual Modeling Research in Information Systems: What we now know and what we still do not know; Mohammad Ali Jabbari Sabegh, Roman Lukyanenko, Jan Recker, Binny Samuel, and Arturo Castellanos

Lunch & Keynote 12:25 – 1:40 pm

SCC Northern Lights Ballroom

PFF Player Grades ....the story of why not all data sets are created equal; Neil Hornsby, Founder, Pro Football Focus

Session 3 Chair: Dinesh Batra

1:40 – 2:40 pm SCC 107-108

What Makes a Good Crowd? Rethinking the Relationship between Recruitment Strategies and Data Quality in Crowdsourcing; Shawn Ogunseye and Jeffrey Parsons

Repurposing User-Generated Electronic Documentation: Lessons from Case Management in Foster Care; Arturo Castellanos, Alfred Castillo, Roman Lukyanenko, and Monica Chiarini Tremblay


Break 2:40 – 3:00 pm SCC Northern Lights Ballroom; Refreshments Available

Business Meeting / Keynote

3:00 – 4:00 pm

SCC 107-108 / Northern Lights Ballroom

Annual SIGSAND Business Meeting /

Consumer Analytics in an Ambiguous World; Stephan Chase, former VP Consumer Analytics, Marriott International

Break 4:00 – 4:30 pm SCC 107-108

Shuttle Departs Convention Center for Restaurant

4:30 pm

Origin: Sharonville Convention Center, 11413 Chester Rd, Cincinnati, OH 45246

Destination: 99 Restaurant, 11974 Lebanon Rd, Cincinnati, OH 45241

Dinner 5:00 pm 99 Restaurant, 11974 Lebanon Rd, Cincinnati, OH 45241

Shuttle Departs for Hotel

Upon completion of dinner

Origin: 99 Restaurant, 11974 Lebanon Rd, Cincinnati, OH 45241

Destination: Fairfield Inn & Suites by Marriott Cincinnati Uptown/University Area, 2500 S Market St, Cincinnati, OH 45219

Saturday May 20

Lindner College of Business, Lindner Hall, 2925 Campus Green Dr, Cincinnati, OH 45221. All meetings will take place in Lindner Hall (LH) Room 608. (An informal shuttle to Lindner Hall can be arranged upon request.)

Check-In 9:00 – 9:15 am LH 608 SIGSAND Officers; Refreshments Available

Session 4

Chair: Jeff Parsons

9:15 – 10:15 am LH 608

Developing a Dependency Descriptive Entity Relationship Diagram (DDERD); Andrew Harrison, Narayan S. Umanath

Conceptual Data Models and Narratives: A Tool to Help the Tool; Merete Hvalshagen, Binny M. Samuel, and Roman Lukyanenko

Break 10:15 – 10:45 am LH 608; Refreshments Available

Session 5 Chair: Vijay Khatri

10:45 – 11:45 am LH 608

Is Microservices a Viable Technology for Business Application Development? An Organizational Theory Based Rationale; Padmal Vitharana and Hemant Jain

Getting an Old Dog to Learn New Tricks: The Role of Collective Ownership in ISD; Salman Nazir

Closing Thanks 11:45 – 12:00 pm LH 608

Michael Fry, Department Head and Professor of Operations, Business Analytics and Information Systems

Lunch 12:00 – 1:00 pm LH 608


Proceedings - Table of Contents

Page

Keynote Abstracts and Speaker Bios ... 1
Developing Test Cases Using ER Based Approach; Palash Bera and Abhimanyu Gupta ... 2
Integrating Scientific Research: Theory and Design of Discovering Similar Constructs; James Endicott, Kai R. Larsen, Roman Lukyanenko, and Chih How Bong ... 7
Conceptual Modeling Research in Information Systems: What we now know and what we still do not know; Mohammad Ali Jabbari Sabegh, Roman Lukyanenko, Jan Recker, Binny Samuel, and Arturo Castellanos ... 14
What Makes a Good Crowd? Rethinking the Relationship between Recruitment Strategies and Data Quality in Crowdsourcing; Shawn Ogunseye and Jeffrey Parsons ... 24
Repurposing User-Generated Electronic Documentation: Lessons from Case Management in Foster Care; Arturo Castellanos, Alfred Castillo, Roman Lukyanenko, and Monica Chiarini Tremblay ... 31
Developing a Dependency Descriptive Entity Relationship Diagram (DDERD); Andrew Harrison and Narayan S. Umanath ... 38
Conceptual Data Models and Narratives: A Tool to Help the Tool; Merete Hvalshagen, Binny M. Samuel, and Roman Lukyanenko ... 45
Is Microservices a Viable Technology for Business Application Development? An Organizational Theory Based Rationale; Padmal Vitharana and Hemant Jain ... 54
Getting an Old Dog to Learn New Tricks: The Role of Collective Ownership in ISD; Salman Nazir ... 61


Keynotes presented by UC Center for Business Analytics Analytics Summit 2017

Keynote 1, 9:15 AM - 10:15 AM
Title: Algorithms to Live By: The Computer Science of Human Decisions
Abstract: Many of the decisions we face in our everyday lives run deeply parallel to some of the canonical problems in computer science and operations research. Brian Christian will discuss both how we can leverage insights from these fields to develop better intuitions in our own thinking and how our human values and principles might translate into an era of increasingly automated decision-making.
Brian Christian
Brian is the co-author of Algorithms to Live By: The Computer Science of Human Decisions, a #1 Audible bestseller, an Amazon best science book of the year, and an MIT Technology Review best book of the year. He is also the author of The Most Human Human, which was named a Wall Street Journal bestseller and a New Yorker favorite book of the year.
_____________________________________________________________________________

Keynote 2, 12:40 - 1:40 PM
Title: PFF Player Grades ....the story of why not all data sets are created equal
Neil Hornsby: Pro Football Focus
Neil Hornsby founded Pro Football Focus (PFF) in 2006 and developed a patent-pending grading methodology to objectively assess and rank individual player performance. Neil is responsible for partnerships with 27 NFL teams and many college football programs as well as major media networks. He is constantly developing new insights into the game and player performance to help teams win more football games.
_____________________________________________________________________________

Keynote 3, 3:00 - 4:00 PM
Title: Consumer Analytics in an Ambiguous World
Abstract: Despite dizzying advances being made across many forms of analytics, the election of 2016 offered political analysts a lesson in humility. Stephan Chase explains the challenges posed by consumer analytics and offers, by way of a critique of industry's obsession with Millennials, guideposts for employing consumer analytics to inform corporate strategy.
Stephan Chase: Chase Intel
A former VP of Consumer Analytics at Marriott International, Stephan is a strategic and analytical leader with a proven track record in driving short- and long-term revenue through the creation and application of customer-focused analytics.


Developing Test Cases Using ER Based Approach

Palash Bera Saint Louis University

[email protected]

Abhimanyu Gupta Saint Louis University

[email protected]

ABSTRACT

Software test design is primarily a manual process performed by software testers. Consequently, the test cases developed in this process are unstructured and subjective in nature. Further, application requirements change frequently, forcing testers to redevelop test cases constantly. To address this issue, we propose an action triad method based on the ER modeling technique. We suggest a set of steps that can be followed to convert application requirements into a set of triads. When these triads are fed into an optimization engine, a set of standardized and structured test cases is obtained. The test cases are useful because they are generated automatically from the triads. In addition to saving the time needed to create test cases manually, a major advantage of the method is that it handles requirement changes efficiently: when changes in the requirements are updated in the triads, a new set of test cases is regenerated.

Keywords

Software Test Design, Test Case, ER Model

INTRODUCTION

A key objective of software testing is to determine whether the software products satisfy the business requirements that guided their design and development (Board, 2015). Traditionally, the complete set of functionalities of the application under test is broken down into test cases with steps and expected results (Board, 2015). This process is termed test design and is often performed manually. There is no structured method for converting business requirements into a test design. Thus, this manual exercise often leads to a large number of repetitions and missed requirements in the test cases that are created. These problems are further aggravated because the business requirements of the software change very frequently, forcing testers to adjust the test design constantly and consequently change the test cases. In this paper, we propose a method based on the ER modeling technique that can help develop automated test cases. In the next section, the test case development process is discussed, followed by the introduction of the proposed method. The method is then demonstrated using a small case study. The final section is the discussion, where the benefits of this method are described.

TEST CASE DEVELOPMENT

Test design is the process of transforming test objectives into test cases (Board, 2015). A test case is a set of inputs, execution conditions, and expected results used to verify that a software program complies with specific requirements (IEEE, 1990). Kaner (2000) notes that designing good test cases is a complex art, as the process of test creation is subjective and based on testers' domain knowledge. There is also no clear method for writing test cases. For example, he asks whether, if an application has 20 variables, we should create one test case combining all variables or multiple test cases, one for each variable. In agile development, creating test cases is particularly challenging as the application changes frequently during the development process.

ACTION TRIAD METHOD

The Approach

To develop automated test cases, we propose a method that converts business requirements into conceptual models. Conceptual models are used for documenting the features of the domain that need to be reflected in information systems (Dobing & Parsons, 2008). We use a modified version of ER modeling as the conceptual modeling technique. ER models (Teorey, Yang, & Fry, 1986) describe domain concepts using entities and relationships, and the technique is popular in practice (Dobing & Parsons, 2006). Traditional ER models can be used for modeling the domain concepts of an organization, but they cannot be used directly for modeling software applications. We therefore adapted the ER modeling technique and termed the result the Action Triad Method.

State diagrams as conceptual models have also been used for test design. A state diagram depicts the states that a system can assume and shows the events that cause and/or result from a change from one state to another (Board, 2015). Test cases are derived from state diagrams by identifying valid and invalid state transitions (Board, 2015). However, there are two main challenges in using such models. First, developing such models is difficult due to the complexity of state diagrams. As the application gets complex, the number of states and transitions grows rapidly, creating an explosion of states (a phenomenon called state explosion (Valmari, 1998)). Second, creating optimized test cases from these models is challenging because of the large number of states and the frequent changes in the functionality of the application.

The philosophy we adopt for developing the Action Triad Method is that software testing can be considered a game in which a tester "pokes" an application under test using various input actions and data combinations and then compares the observed behavior of the application with its expected behavior. Thus the application should be modeled in such a way that it can be poked with different combinations of values in test cases. These test cases can be created using the attribute and instance concepts of the ER model. However, if all possible combinations of values are considered, a very large number of test cases will be created. Therefore, an optimization component should be used to derive the minimum number of test cases that cover the maximum number of value combinations. This approach is shown in Figure 1.

Figure 1. Approach to generate automated test cases

The Action Triad concepts

The ER concepts of entities and relationships are modified and used in the “action triad” method. In this method, entities are replaced by concepts (agent or object), relationships by functions, and attributes by dimensions. Concepts and functions have dimensions.

Action is the focus of this method and is modeled as a relationship between two concepts. Because actions are performed by specific agents on other agents or objects, the model is described as a set of action triads consisting of Agent-Action-Concept. Accordingly, we define an action triad as <x, y, z>, where x is an agent, y is an action performed by the agent, and z is the concept (agent or object) on which the action is performed. The concepts of the action triad are described in Table 1.

Concepts in Action Triad | Definition
Agent | An agent is an entity that can interact with objects or other agents (Wooldridge, 2000).
Function | A function represents activities that are performed by agents.
Object | An object represents non-agents in the domain with which the agents act. Objects can be tangible (e.g. Phone) or intangible (e.g. Web site).
Dimension | Dimensions describe the objects or agents in measurable form.
Instance | Dimensions have instances that are generally expressed in text or numbers.

Table 1. Action triad concepts

A key feature of this method is to instantiate each dimension with additional concepts that are relevant to software testing: instances, scenarios, expected results, and requirement IDs. A dimension can have multiple values as instances. Each instance can have a positive or a negative scenario. A positive scenario means that the action with the specific instance can be performed successfully. If the user cannot perform the action successfully with a specific instance, then the scenario is negative. For example, if the password is null, the null instance is considered a negative scenario because the login action (of which password is a dimension) cannot be completed successfully. Expected results indicate the outcome when an action is taken using a specific instance (e.g. a null password should result in an incomplete login). Requirement IDs correspond to the details of the instances mentioned in the functional requirements document.

Guidelines for applying Action Triad Method

A set of guidelines is proposed so that application requirements can be captured in action triads. These guidelines are:

• Decompose an application to sets of action triads.

• Each action triad is unique for an application under testing.

• Functions must have dimensions, but agents and objects need not.

• Each dimension must have an instance with at least one positive scenario.

Once these guidelines are implemented, the dimensions of functions end up as the steps of test cases, with instances as the values used in those steps.
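To make the guidelines concrete, the sketch below shows one possible way to capture triads, dimensions, and instances in code and to check the guidelines automatically, much as the tool described later does. The class and function names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: one way to represent action triads and check the
# guidelines above. Names and structure are assumptions, not the authors' tool.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    value: str                 # e.g. "John", "Blank"
    scenario: str              # "Positive" or "Negative"
    expected_result: str = ""
    req_ids: List[str] = field(default_factory=list)


@dataclass
class Dimension:
    name: str                  # e.g. "LoginUsername"
    instances: List[Instance] = field(default_factory=list)


@dataclass
class ActionTriad:
    agent: str                 # x: the agent performing the action
    function: str              # y: the action/function, e.g. "Login"
    concept: str               # z: the agent or object acted upon
    dimensions: List[Dimension] = field(default_factory=list)


def check_guidelines(triads: List[ActionTriad]) -> List[str]:
    """Return violations of the guidelines listed above."""
    errors = []
    seen = set()
    for t in triads:
        key = (t.agent, t.function, t.concept)
        if key in seen:                      # each triad must be unique
            errors.append(f"Duplicate triad {key}")
        seen.add(key)
        if not t.dimensions:                 # functions must have dimensions
            errors.append(f"Function '{t.function}' has no dimensions")
        for d in t.dimensions:
            if not any(i.scenario == "Positive" for i in d.instances):
                errors.append(f"Dimension '{d.name}' lacks a positive instance")
    return errors
```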

A CASE STUDY

To illustrate the application of the method, a small case study is used. In this case study, a user logs in and logs out of an application. The description of the requirements is provided in Table 2. The objective of this case study is to create automated test cases to test the functionalities as described in Table 2.


Description | Requirement ID
The user needs to provide a valid username and password to successfully login to the application. | 1.1
After the user logs into the application, then she is taken to the browse application page where the user can click on different reports. | 1.2
On clicking the logout button the user gets an alert to logout. If the user clicks on yes then she is logged out otherwise on clicking cancel the user is taken back to the browse application page. | 1.3
A user has a valid username (John) and a valid password (1234!). | 1.4

Table 2. Requirements Example

The screen shots of the interface of the application are shown in Figure 2.

Figure 2. Login and Logout functions of an application

To apply the action triad method, the application description is decomposed into two action triads: <User, Login, Application> and <User, Logout, Application>. The dimensions of the functions and the agents are identified next and are shown in ER representations in Figure 3. The dimensions of the two functions, login and logout, are the actions that users can perform in the application. Note that the number of triads does not depend on the number of application screens.

Figure 3. Action triads in ER representations

The dimensions are further elaborated in Table 3 with instances, scenarios, expected results, and requirement IDs. The details of this information are obtained from Table 2 and Figure 2. The number of instances used for the dimensions of functions will depend on the requirements documentation. If more instances are used in the triads, then more test cases may be generated. A mix of positive and negative scenarios for the instances is recommended so that the application can be tested properly.

Note that the username and password appear as dimensions of both the user agent and the login function. In the former case, the instances of these dimensions have positive scenarios, where the instances are the actual username and password assigned to the user. In the latter case, the instances are the ones that a tester will input into the application to test it. Some of these instances will have negative scenarios, such as a blank password.

Dimension | Instance | Scenario | Expected Result | Req ID
UserUsername | John | Positive | |
UserPassword | 1234! | Positive | |
LoginUsername | John | Positive | Username John should be displayed | 1.1, 1.4
LoginUsername | Blank | Negative | No username is displayed | 1.1
LoginPassword | 1234! | Positive | Encrypted password is displayed | 1.1, 1.4
LoginPassword | Blank | Negative | No password is displayed | 1.1
LoginPassword | Password | Negative | Encrypted password is displayed | 1.1
LoginTrigger | Login | Positive | On successful login the user is taken to the browse page | 1.2
LoginUsername | JohnInvalid | Negative | Username JohnInvalid should be displayed | 1.1
LogoutTrigger | Logout | Positive | On clicking the logout, the user is taken to the alert | 1.3
LogoutConfirmTrigger | Yes | Positive | User is logged out of the application | 1.3
LogoutConfirmTrigger | Cancel | Negative | User is not logged out of the application and taken back to the browse application page | 1.3

Table 3. Details of the dimensions of the Login and Logout functions

A tool for the Action Triad Method has been developed that facilitates the input of the triad concepts. The triads (Figure 3) and the information listed in Table 3 can be entered in the tool. A screen shot of the tool is shown in Figure 4. The tool checks that the guidelines have been followed. For example, if the user forgets to create an instance of a dimension, it will prompt an error.



Figure 4. Tool to input the concepts of Action Triad

Figure 5. Dimension interface of the tool

Once the triad information is obtained, the tool uses a statistical pairwise optimization engine to create test cases with specific steps. The engine optimizes the dimensions and their instances to create test cases. The dimensions (column 1 of Table 3) become test case step descriptions, and the instances (column 2 of Table 3) become the values of these descriptions. Because all the instances will not fit into one test case, the optimization engine creates multiple test cases, ensuring that each pair of instances is covered in at least one test case. To make the test case step descriptions readable, specific user actions (e.g. enter or click) are used in the step descriptions. In the tool, each dimension can be classified as text, dropdown, or button (Figure 5). These keywords are linked to the action words that users perform to test the application. For example, if text is selected as the dimension type, then the keyword "Enter" will be used in the test case step description. Thus a test case step description could be "Enter John as LoginUsername", where John is an instance of the dimension LoginUsername whose type is text. Similarly, when the dimension type is button, the keyword "Click" is used in the step description.
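To illustrate the pairwise idea described above, the following sketch greedily selects value combinations until every pair of instances from different dimensions appears in at least one test case, and maps dimension types to action keywords for the step descriptions. It is a minimal stand-in for the statistical pairwise optimization engine, not the engine itself; the dimension data simply echo the case study.

```python
# Illustrative greedy pairwise sketch, not the authors' optimization engine.
from itertools import combinations, product

# Dimension -> (type, candidate instances); values follow the case study.
dims = {
    "LoginUsername": ("text",   ["John", "Blank", "JohnInvalid"]),
    "LoginPassword": ("text",   ["1234!", "Blank", "Password"]),
    "LoginTrigger":  ("button", ["Login"]),
}
ACTION = {"text": "Enter", "dropdown": "Select", "button": "Click"}

# Every pair of instances from two different dimensions must be covered.
uncovered = {
    ((d1, v1), (d2, v2))
    for d1, d2 in combinations(dims, 2)
    for v1 in dims[d1][1]
    for v2 in dims[d2][1]
}

test_cases = []
while uncovered:
    # Greedily pick the full combination that covers the most uncovered pairs.
    best = max(
        product(*[[(d, v) for v in vals] for d, (_, vals) in dims.items()]),
        key=lambda combo: sum(p in uncovered for p in combinations(combo, 2)),
    )
    uncovered -= set(combinations(best, 2))
    steps = [f'{ACTION[dims[d][0]]} "{v}" as {d}' for d, v in best]
    test_cases.append(steps)

for i, steps in enumerate(test_cases, 1):
    print(f"Test case {i}:", "; ".join(steps))
```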

For the case study described here, the action triad method generated 7 test cases (2 positive and 5 negative) for the two triads. Table 4 shows two test cases generated by the action triad method. Each test case has a description, step number, step description, expected results, and traceability. The step description contains specific actions with specific values (e.g. Enter "John" as LoginUsername). Each test case is different, as different combinations of instances are used. If one of the instances has a negative scenario, then the test case is considered negative, meaning the actions mentioned in the test case should not execute successfully. If no instances have negative scenarios, then the test case is considered positive, meaning the actions mentioned in the test case should execute successfully.

Each test case starts with the instances of the dimensions (e.g. UserPassword and UserUsername) of the entities (e.g. user). This step is considered a prerequisite, i.e., a condition that must hold before the test case can be run. A prerequisite step does not have expected results or traceability.

Each test case can have only one negative instance scenario. This is because, from a tester's perspective, if a test case has two or more negative instances (e.g. the username is null and the password is null), then it is not possible to identify the exact cause of the test failure (e.g. whether the test failed because the password was incorrect or because the username was incorrect).


Table 4. Sample test cases created from the Action Triad method

DISCUSSION

A key activity in the test design phase is to create test cases that represent the different functionalities of the application. In this paper, a method is suggested to translate the application requirements into representations similar to ER models. These representations can then be converted into structured, automated test cases.

The action triad method has several advantages. First, the test cases are obtained in a standardized format. Testers benefit from standardized test cases because they are represented consistently. Second, test case creation is a time-consuming effort, and an automated process of test creation can save human effort and time. The number of test cases generated depends on the triads and their dimensions and instances; triads with a large number of dimensions and instances may create a large number of test cases. Third, it is common in software development for requirements to change frequently. When requirements change, manual test cases are updated by the testers. In the action triad method, the test cases themselves do not need to be maintained; only the triads need to be updated. For example, in the case study used here, the requirements might change so that the username now requires a minimum length of 8 characters. This change would require creating a few more instances in the login triad (e.g. a username with 7 characters as a negative scenario and a username with 8 characters as a positive scenario; the current valid username "John" also needs to be updated). When the new instances are added to the login triad, a new set of test cases is generated by the tool. Thus the method can handle requirement changes whenever necessary.

A complex application will have a large number of states and transitions. Thus modeling such an application using state diagrams is challenging, and so is deriving test cases from the diagrams. Alternatively, we suggest using the ER-based Action Triad Method, as there is no need to model transitions; only the higher-level state changes are modeled using triads. Decomposing an application into a set of triads still requires modeling skill on the part of modelers. However, once the triads are identified, the methodology helps to create the model elements consistently.

REFERENCES

Board, I. S. T. Q. (2015). Standard Glossary of Terms Used in Software Testing, Version 3.1.
Dobing, B., & Parsons, J. (2006). How UML is used. Communications of the ACM, 49(5), 109-113. doi:10.1145/1125944.1125949
Dobing, B., & Parsons, J. (2008). Dimensions of UML Diagram Use: A Survey of Practitioners. Journal of Database Management, 19, 1-18.
IEEE. (1990). IEEE Standard 610, IEEE Standards Collection: Software Engineering.
Kaner, C. (2000). Architectures of Test Automation. Paper presented at STAR West, San Jose, California.
Teorey, T. J., Yang, D., & Fry, J. P. (1986). A Logical Design Methodology for Relational Databases Using the Extended Entity-Relationship Model. ACM Computing Surveys, 18(2), 197-222.
Valmari, A. (1998). The State Explosion Problem. Lectures on Petri Nets: Advances in Petri Nets, Berlin-Heidelberg.
Wooldridge, M. (2000). Reasoning about Rational Agents. Massachusetts: The MIT Press.


Integrating Scientific Research: Theory and Design of Discovering Similar Constructs

James Endicott University of Colorado

[email protected]

Kai R. Larsen University of Colorado

[email protected]

Roman Lukyanenko University of Saskatchewan

[email protected]

Chih How Bong University of Malaysia Sarawak

[email protected]

ABSTRACT

Assessing the similarity of proposed theoretical constructs to each other and to those previously known and studied is imperative in theoretical research. In this paper we turn to theories of similarity judgement from cognitive psychology to understand the process of establishing similarity between constructs. Then, guided by these theories, we develop an integrated method for automatic detection of similar constructs. We apply the method to constructs from leading IS journals, a major journal in psychology, and the interdisciplinary overlap between the IS and psychology constructs. Our paper contributes to the methodology of research, design science research, behavioral IS research, text mining and information retrieval theory and practice, IS research on ontology alignment and schema matching, as well as cognitive theories of similarity in psychology.

Keywords

Construct Similarity, Information Integration, Cognitive Similarity, Cognitive Psychology, Nomological Net, Schema Matching.

INTRODUCTION

Research on information integration is typically viewed in the context of information systems (IS) development or IS use. Major streams of information integration research include ontology alignment and database schema mapping (Doan and Halevy 2005, Evermann 2008b, Rahm and Bernstein 2001, Sekhavat and Parsons 2012) as well as search and retrieval (Ingwersen and Järvelin 2006, Passant 2007).

Recently, the IS community has begun to encourage the application of theories and methods born in the IS context to other scientific disciplines (Beath et al. 2013, Goes 2013). Thus, Parsons and Wand (2012) applied classification principles to the context of scientific knowledge representation. We follow this example to extend findings originally considered in the context of information integration to the problem of integrating the scientific body of knowledge.

Motivated by these efforts, in this early research we employ theories of cognitive similarity that have shown promise in IS ontology alignment and schema matching research to develop a framework for understanding the process of establishing similarity between scientific constructs.

A scientific construct is “a conceptual term used to describe a phenomenon of theoretical interest that cannot be observed directly” (Hinkin 2005, p. 162). It could be a psychological construct (representing mental states) or a social one (representing collective intentionality) (Searle 1995). To illustrate, consider an example of a psychological construct, emotional trust, “defined as the extent to which one feels secure and comfortable about relying on the trustee” (Komiak and Benbasat 2006, p. 943). An example of a social construct is corporate social responsibility that can be defined as “actions that appear to further some social good, beyond the interests of the firm and that which is required by law” (El Ghoul et al. 2011, p. 2388, McWilliams and Siegel 2001, p. 117). In both cases, identification of similar constructs is required for literature review, to establish contribution novelty, to build a nomological network, and to argue for the application of research findings beyond a specific research context. Indeed, identification of similar (or related) constructs may suggest additional antecedents and outcomes of the construct of interest, and thus can enrich the researcher’s theory. Theoretical constructs are considered “building blocks of science” (Osigweh 1989, p. 591). Yet we continue to lack theoretical understanding of how similar constructs are determined. This process is also highly contingent on the knowledge, diligence and expertise of the researcher, with little tool support for doing this.


Guided by the theories of cognitive similarity, we develop an integrated method for automatic detection of similar constructs. An ability to automatically determine construct similarity can broadly support research activities, including conducting literature reviews, establishing contribution novelty, generalizing findings beyond the specific context, and potentially enabling automatic construction of a nomological net. The ability to automatically connect related constructs can increase the interdisciplinarity of research and assist in building bridges across disciplines. Finally, this research further advances the area of information integration research and is potentially applicable to data integration in corporate as well as social media contexts.

SIMILARITY IN IS

Similarity-based approaches have been used in IS within at least three perspectives that are relevant to our research: 1) statistical techniques to develop and test theories (e.g., factor analysis); 2) topical analyses of IS grounded in natural language processing (e.g., Delen and Crossland 2008, Larsen et al. 2008, Larsen and Monarchi 2004, Sidorova et al. 2008); and 3) cognitive similarity to support data integration. Only within the last approach are cognitive theories of similarity explicitly examined.

In contrast to statistical and NLP similarity research in IS, cognitive psychology examines the underlying mechanisms of similarity perception and judgement in humans and other animals (Hahn 2014, Medin et al. 1993). Theories of cognitive similarity have recently been applied to the problem of information integration. Initial work was exploratory and empirical (i.e., trying to understand what theories may be applicable and what factors database developers perceive as important when integrating database schemata) (Evermann 2008a, b, 2009, 2010, Lukyanenko and Evermann 2011). Lukyanenko and Evermann (2011) proposed to improve algorithms for automatic database schema matching by considering theories of cognitive similarity. The paper identified several promising theories, including spatial, featural, transformational, and structural theories (see below), and suggested additional focus on the latter. Evermann (2012), Nasir et al. (2013) and Raad and Evermann (2015) extended this work by applying the structural theory of similarity to the problem of ontology alignment. This work motivates our application of cognitive similarity theories to text similarity and the problem of theoretical construct matching.

THEORIES OF COGNITIVE SIMILARITY

Cognitive psychology—a major reference discipline of IS—considers similarity to be one of the most fundamental mental processes. A multitude of similarity theories have been proposed, evaluated, and applied in real life as computer algorithms or decision aids, providing a solid foundation for our work on construct similarity. Prior psychology research has suggested four major theoretical accounts of similarity: spatial, featural, transformational, and structural similarity (Barsalou 2014, Hahn 2014, Larkey and Markman 2005, Schwering 2005). Below, we briefly review each theory.

One of the earliest accounts of similarity, spatial similarity (also known as geometric), upholds the assumption that real-world phenomena are mapped by humans into a multidimensional mental space (Shepard 1962a, b).

Motivated by the shortcomings of spatial models, Tversky (1977) proposed featural similarity theory. Tversky’s contrast model of similarity describes objects as sets of attributes or features. Thus, given two objects, similarity depends on the sum of features they have in common and the number of distinct features not shared by both objects. Unlike spatial theory, the key to similarity in the featural model is the presence and absence of features, where unique features (i.e., not present in both objects) decrease their similarity.
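For reference, Tversky's contrast model is commonly written as follows, where A and B are the feature sets of the two objects, f is a salience measure over feature sets, and the non-negative weights balance common and distinctive features (this is the standard textbook form, not notation taken from this paper):

$$ S(a, b) \;=\; \theta\, f(A \cap B) \;-\; \alpha\, f(A \setminus B) \;-\; \beta\, f(B \setminus A), \qquad \theta, \alpha, \beta \ge 0 $$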

Whereas featural models treat objects as sets of features (Gati and Tversky 1982) and spatial models are based on feature dimensions (Shepard 1962a), evidence shows that similarity judgement is also sensitive to the roles features play. Structural similarity theory argues that for many types of comparisons (e.g., analogical, non-literal), similarity depends on the coherence of structure rather than featural overlap.

Whereas for the theories reviewed above the units of similarity were object features (their dimensions and interrelations), a fourth theory, transformational theory, focuses on the number of steps and the effort it takes to transform one object into another (Hahn et al. 2003, Imai 1977). For example, the string sequence XXXOOO is more similar to OOOXXX than to OXOXXO because the former requires only a single operation (a mirror transformation) (Imai 1977, Larkey and Markman 2005).
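As a rough stand-in for the transformational account, the sketch below scores two strings by counting insert, delete, and substitute operations (Levenshtein distance). This is only an illustration: plain edit distance cannot capture holistic operations such as the single mirror move in the example above, and it is not the model of Hahn et al. (2003) or any algorithm used in this paper.

```python
# Simple stand-in for transformational similarity: count the edit operations
# (insert, delete, substitute) needed to turn one string into another.
# Note: this cannot represent holistic moves such as a mirror transformation.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def transform_similarity(a: str, b: str) -> float:
    """Map the transformation count onto a 0..1 similarity score."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


# Example with two hypothetical construct names.
print(transform_similarity("perceived usefulness", "perceived ease of use"))
```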

IMPLEMENTING SIMILARITY THEORIES

The four major similarity theories have been tested against each other. Each theory has been shown to have unique advantages with respect to modeling the similarity process, but each also has its own limitations. Taken together, these theories, one might claim, represent major mental processes involved in similarity judgment. We thus conclude that it is important to consider all major similarity theories in order to ensure that much of the essence of the complex similarity process is captured in a method.

We propose that an automatic method for evaluating construct similarity should instantiate each quadrant of the similarity framework. Moreover, we argue that the method should additionally combine the four similarity theories into a single hybrid algorithm.


EVALUATING THE PROPOSED METHOD

To evaluate the proposed method for automatic identification of similar constructs, we first instantiate each cognitive similarity theory into a computer algorithm. We established instantiation validity (Lukyanenko et al. 2014) of each algorithm by ensuring that its features and behavior can be traced to the constructs and assumptions of the underlying theories (Arazy et al. 2010, Lukyanenko et al. 2015).

Specifically, we chose Latent Dirichlet Allocation (LDA) (Niekler and Jähnichen 2012), Latent Semantic Analysis (LSA) (Deerwester et al. 1990), Google Trigram Model (GTM; Islam et al. 2012) and Mihalcea (Mihalcea et al. 2006) algorithms to instantiate spatial, featural, structural and transformational similarity theories, respectively.
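As a rough illustration of what one such instantiation can look like in code (not the specific LDA, LSA, GTM, or Mihalcea implementations evaluated in this study), the sketch below builds an LSA-style latent space over a toy corpus with scikit-learn and scores two construct definitions by cosine similarity.

```python
# Rough LSA-style sketch with scikit-learn; illustrative only, not the
# LDA/LSA/GTM/Mihalcea pipelines evaluated in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the extent to which one feels secure and comfortable relying on the trustee",
    "actions that appear to further some social good beyond the interests of the firm",
    "the degree to which a person believes that using a system would enhance performance",
]

# TF-IDF term space reduced to a low-dimensional latent semantic space.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(corpus)
lsa = TruncatedSVD(n_components=2, random_state=0)
vectors = lsa.fit_transform(X)

# Cosine similarity between the first two construct definitions.
score = cosine_similarity(vectors[0:1], vectors[1:2])[0, 0]
print(f"LSA cosine similarity: {score:.3f}")
```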

We then apply each algorithm to the texts that represent constructs from leading IS journals, a major journal in psychology, and the interdisciplinary overlap between the IS and psychology constructs.

For the IS corpus, we use articles from Management Information Systems Quarterly (MISQ) and Information Systems Research (ISR) published from 1990 through 2009. The psychology dataset uses articles published in the Journal of Applied Psychology (JAP) between 2002 and 2003. The interdisciplinary dataset sits at the intersection of IS and psychology and uses the union of articles appearing in either of these datasets.

For each of these data sources, we manually extracted the text contained within the paragraphs of the articles, with the names and definitions of constructs appearing in them, as well as items that are used as prompts in surveys. Additionally, a gold standard was created by human experts that identifies pools, where two constructs in the same pool were defined to represent the same latent construct. All construct, article, and gold standard information were kindly provided by the Human Behavior Project (HBP) at the University of Colorado, Boulder. The process used by HBP to collect, curate, and classify constructs is described in detail in Larsen and Bong (2016), Appendix B.

For each data source, we used three kinds of text to calculate text similarity. First, we used the construct name given by the article's author. Second, we used the definition of the construct present in the article. Finally, we used an aggregation of the items within the text. This aggregation was performed through a pairwise comparison between the text of each item pair for two constructs. Then, we averaged the two highest-scoring item pairs and assigned that average as the similarity score between the constructs. In order to facilitate calculation of this aggregate score, only constructs that contained at least two items were included in the study.
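The aggregation rule just described can be sketched as follows: compare every item pair across two constructs with some sentence-similarity function, then average the two highest-scoring pairs. The `similarity` argument is a placeholder for whichever algorithm is being evaluated; the function name is an assumption for illustration.

```python
# Sketch of the item-aggregation rule described above. `similarity` stands in
# for whichever text-similarity algorithm is being evaluated.
from itertools import product
from typing import Callable, List


def aggregate_similarity(items_a: List[str], items_b: List[str],
                         similarity: Callable[[str, str], float]) -> float:
    """Average the two highest-scoring item pairs between two constructs."""
    if len(items_a) < 2 or len(items_b) < 2:
        raise ValueError("constructs need at least two items each")
    scores = sorted(
        (similarity(a, b) for a, b in product(items_a, items_b)),
        reverse=True,
    )
    return sum(scores[:2]) / 2.0
```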

RESULTS

We compared the similarity algorithms, chosen based on the four quadrants through use of the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve. A ROC curve plots the true positive rate by the false positive rate of a predictive algorithm, where the correct labels are defined to be what human evaluators assigned during the creation of the gold standard. The AUC then integrates the area under the curve created by the ROC. For the task of accurately classifying constructs, a ROC curve is a better measure to use than picking a specific cutoff point, because different researchers would likely have different thresholds for determining when an algorithm is producing too many false positives.
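Equivalently, up to ties, the AUC can be read as the probability that a randomly chosen same-pool (positive) pair receives a higher similarity score than a randomly chosen different-pool (negative) pair:

$$ \mathrm{AUC} \;=\; \int_{0}^{1} \mathrm{TPR}\; d\,\mathrm{FPR} \;=\; \Pr\!\left(s^{+} > s^{-}\right) $$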

Because we have no a priori reason to believe that the distributions of similarity scores follow a normal distribution, bootstrapping was used to estimate the median and variance of the algorithms' performance. Bootstrapping creates subsamples of the original data set by selecting rows at random with replacement, so that the subsample contains the same number of rows as the original data set but duplicates and omissions are allowed. In this study, stratified random sampling was used because the number of pairs that are in the same pool is highly imbalanced compared to those that are not. For each of 20,001 subsamples, the AUC was calculated.
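A compact sketch of this kind of procedure, under stated assumptions: resample the (score, same-pool label) pairs with replacement within each label stratum, recompute the AUC on each resample with scikit-learn's roc_auc_score, and take quantiles for a confidence interval. This approximates the analysis described; it is not the authors' code.

```python
# Sketch of stratified bootstrap confidence intervals for AUC; an
# approximation of the procedure described, not the authors' exact code.
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(scores, labels, n_boot=20001, alpha=0.05, seed=0):
    """Resample within each label stratum, recompute AUC, return median and CI."""
    rng = np.random.default_rng(seed)
    scores, labels = np.asarray(scores), np.asarray(labels)
    pos, neg = np.where(labels == 1)[0], np.where(labels == 0)[0]
    aucs = []
    for _ in range(n_boot):
        idx = np.concatenate([
            rng.choice(pos, size=len(pos), replace=True),   # same-pool pairs
            rng.choice(neg, size=len(neg), replace=True),   # different-pool pairs
        ])
        aucs.append(roc_auc_score(labels[idx], scores[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(np.median(aucs)), (float(lo), float(hi))
```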

Initially, Wilcoxon signed-rank tests (Desmar 2005) were used to compare the AUCs as a test on paired nonparametric rank data is appropriate for AUCs (Fang et al. 2013). This analysis resulted in all group differences being significant beyond the p < 0.001 level. However, these p-values being so small is merely an artifact of how poorly p-values work when using large sample sizes as documented by Lin et al. (2013). To address this problem, Lin suggests using confidence intervals as is done in other fields such as econometrics. Based on this, 95% confidence intervals are reported for AUCs in appendix B and comparisons of algorithms are performed based on testing whether the 95% confidence interval of the difference between two algorithms includes 0.

For each of the data sources and types of text, 95% confidence intervals were estimated using bootstrapping. Additionally, a 95% confidence interval was estimated on the difference between each algorithm and the best performing algorithm for the data source, as well as between each algorithm and the best performing algorithm for the task (where a task is a given type of text for a given data source). This provided us with the information necessary to examine whether the best performing algorithm was significantly different from the others. The results of that examination are reported in Figure 1, where the top performing algorithm on each task is coded in green, and algorithms that are not significantly different from it are coded in yellow. Similarly, the best performing algorithm for each data source and the algorithms that cannot be distinguished from it are bolded.


                              |------- IS --------|------ Psych ------|------ Inter ------
Theory           Algorithm    | Name   Def   Agg  | Name   Def   Agg  | Name   Def   Agg
Spatial          LDA-SD       | 0.683  0.704 0.730| 0.877  0.784 0.816| 0.744  0.680 0.848
                 LDA-IR       | 0.698  0.712 0.738| 0.894  0.805 0.818| 0.772  0.731 0.867
Featural         LSA-Self     | 0.662  0.657 0.798| 0.856  0.835 0.824| 0.749  0.789 0.900
                 LSA-Bus      | 0.689  0.670 0.761| 0.757  0.800 0.798| 0.771  0.878 0.854
                 LSA-News     | 0.703  0.674 0.758| 0.779  0.849 0.826| 0.660  0.890 0.876
                 LDA-Cos      | 0.713  0.678 0.732| 0.909  0.837 0.828| 0.792  0.841 0.883
Structural       GTM          | 0.711  0.667 0.792| 0.723  0.765 0.847| 0.698  0.786 0.853
Transformational Mi           | 0.677  0.683 0.808| 0.749  0.789 0.879| 0.715  0.834 0.914
Hybrid           CID1         | 0.698  0.687 0.828| 0.823  0.807 0.809| 0.812  0.851 0.883

Figure 1. AUC Scores for Each Similarity Algorithm

The first result to note is that many of the algorithms performed at the excellent (0.8-0.9) discrimination level, and three reached the "outstanding" (>0.9) discrimination level identified by Hosmer (2013). Performance on the Psych data source is particularly notable in this regard, because 19 of the 27 algorithm-text type combinations reached the excellent level, including all but one of the algorithms based on item aggregates. Also worth noting is that the Hybrid algorithm, CID1, performed at the excellent level in 7 of 9 tasks. Finally, aggregates produced higher AUCs than names in 22/27 tasks and definitions in 21/27 tasks.

When examining the data sources, IS had a clear performance leader in CID1 Aggregate, which outperformed all other algorithms significantly, with a difference of at least 0.106 over each name- and definition-based algorithm and at least 0.015 over the other aggregate-based algorithms. Results were less striking within the Psych data source, where most algorithms performed better, leaving less room for variance in performance. While LDA-Cos appeared to be the top performer, it could not be distinguished from LDA-SD or LDA-IR for names; LSA-News for definitions; or GTM and Mi for aggregates. That said, at best, the algorithms that could have outperformed LDA-Cos would only have done so by less than 0.01 in all cases except Mi Aggregate, where the confidence interval on their difference was (-0.095, +0.033). Within the Interdisciplinary data source, we found results similar to those in the IS data source, except that Mi Aggregate was the unquestionable top performer.

The static algorithms, spatial and featural, appear to play a key role in comparing the similarity of definitions. Across all three data sources, the top performing text algorithm was never a structural or a transformational algorithm. A potential explanation for this phenomenon is that similarity between definitions may not operate at either the word or sentence level but instead at a higher paragraph level. If this is the case, then examining structural or transformational changes at the sentence level (e.g. word order within sentences) is less important than changes at the paragraph level (e.g. order of sentences within a paragraph). As such, algorithms which perform well when comparing sentences may be inappropriate when comparing paragraphs.

Along a similar vein, it appears that transformational algorithms do not perform well when comparing the similarity of names. In no instance did a transformational algorithm produce a top performing result for a name. This could reflect how little of a word retains its original state once any transformation is made: each mental step involved in a transformation is a larger one because it modifies a greater proportion of the meaning than a similar change to a longer text would. This would also explain why transformational algorithms were at their strongest when working with aggregates. The aggregates were analyzed by comparing each possible pair of items, and the average of a subset of those comparisons was then calculated. As a result, even replacing an item completely would only impact 1/n of the scores feeding into the similarity comparison, so smaller transformations that only altered some text within an item would be comparatively minuscule.

The last interesting comparisons to be made are between the different data sources used. The most striking difference between the data sources is what text types are most predictive of whether or not constructs are synonymous. In the IS data source, aggregates performed well, but both names and definitions were found to be unacceptable more often than not. In the Psych data source, on the other hand, aggregates performed well, even better than in the IS data source; names and definitions also showed excellent results for about half of the algorithms. While there are many possible interpretations for these Psych findings, the one that warrants the most attention, if not caution, is that they may indicate that IS, as a discipline, is poor at naming and defining the constructs we use in a way that correlates with how they are operationalized in survey research, at least relative to a mature discipline, such as psychology. If this is the case, there is a strong need in our field to refine the process of construct development so that when a construct is operationalized, it will be done using language that reflects the definition of the construct.

DISCUSSION AND IMPLICATIONS

Our paper is the first step in the research on construct similarity and makes several important contributions to the methodology of research, design science, behavioral information systems and psychology research, text mining and information retrieval theory and practice, as well as theoretical research on similarity in cognitive psychology and research on information integration in IS.

Considering the results we obtained, we confidently conclude that the algorithms we chose based on the four similarity theories are able to sufficiently approximate the complex similarity judgement by human experts (with ability to discriminate between true and false positive rates as high as 90%). This suggests that our automatic method can be a reliable support tool for researchers engaged in construct similarity-related tasks.


Based on the analysis of the three datasets, the results offer strong evidence for the utility of the proposed method, calling for instantiation of the four similarity quadrants to automatically detect construct similarity. As is evident from the results above, every theoretical quadrant of the framework contains at least one algorithm that is highly effective at detecting construct synonymy, and every quadrant contains at least one algorithm that outperforms all of the other algorithms. Further, in no case does a single algorithm outperform the other algorithms more than twice. This is consistent with the empirical findings in psychology (discussed earlier), wherein no given theory has been consistently superior in all tasks.

Our results carry important implications for the enterprise of integrating sciences. Theoretical constructs are basic building blocks of social sciences (Osigweh 1989), and identifying their interrelationships is fundamental for building a cumulative body of research knowledge (Bagozzi and Fornell 1982, Osigweh 1989). In this paper, we attempted to unpack the black box of construct similarity evaluations routinely performed by researchers, with the aim of making a theoretical contribution. Similarity judgments are ubiquitous in the process of conducting construct research. However, to date, no theory for how such judgments were performed has existed. We contributed by grounding construct similarity evaluations in fundamental theories of cognitive similarity from psychology. We reviewed four major theories (spatial, featural, structural, and transformational) that provide a general theoretical foundation for understanding how humans approach construct similarity evaluations. Unlike current statistical methods, our method does not require response data in order to have an assessment of similarity. This means that we can offer an a priori, proactive assessment that can be used before the costly and time-intensive process of data collection and gathering.

The findings of this study further contribute to cognitive psychology. To our knowledge, this is the first attempt at a concurrent implementation and comparison of the major similarity theories in the context of texts. The results further validate a key design feature of our method, namely implementing multiple algorithms rather than a single one, and motivate future work on combining algorithms and their respective theories. Our work answers a recent call in the design science research community to instantiate multiple design artifacts (in our case, algorithms) – a notion known as artifact sampling (Lukyanenko et al. 2016). The effectiveness of all algorithms suggests that the traditional approach taken in psychology, where only a single theory was in focus, may have neglected the complex nature of the similarity judgment.

The findings of this study further contribute to research on information integration, including work on schema matching and ontology alignment. Recent research has begun to consider and apply theories of cognitive similarity to ontology alignment and database schema matching (Evermann 2008a, b, 2009, 2010, Lukyanenko and Evermann 2011). However, in each case, only a single theory was used to inform design and evaluation (Evermann 2012, Nasir et al. 2013, Raad and Evermann 2015). Our findings show significant promise for developing methods that leverage the advantages of each similarity theory to produce a more effective mechanism for ontology and database integration.

We believe our contribution to information integration research is timely. Traditional information integration research has been concerned with finding similar elements in highly structured data sets (Batini et al. 1986). With the proliferation of online content-producing technologies, ordinary people increasingly generate vast amounts of information collectively known as user generated content (Daugherty et al. 2008, Kane et al. 2014, Levina and Arriaga 2014). These can be blogs, social media posts, forum and discussion board messages, product reviews, and online chat conversations. It is estimated that the volume of user generated content has surpassed the volume of data in corporate databases (Vallente 2014). Our work aims to provide both theoretical and practical insight into the problem of integrating unstructured information. Indeed, researchers continue to call for novel approaches to structuring user generated content to make it more consistent and usable in organizational analysis (Lukyanenko et al. 2017). As finding similarity precedes the development of structure (Goldstone 1994), our work has strong potential to contribute to efforts to make user generated content more usable. In the future, we hope to extend our work to the area of user generated content (specifically, crowdsourcing) to address the many information integration challenges in this new and exciting domain.

References

Arazy O, Kumar N, Shapira B (2010) A theory-driven design framework for social recommender systems. J. Assoc. Inf. Syst. 11(9):455–490.

Bagozzi RP, Fornell C (1982) Theoretical concepts, measurements, and meaning. Second Gener. Multivar. Anal. (Praeger, New York, NY), 5–23.

Barsalou LW (2014) Cognitive Psychology: An Overview for Cognitive Scientists (Taylor & Francis).

Batini C, Lenzerini M, Navathe SB (1986) A comparative analysis of methodologies for database schema integration. Comput. Surv. 18(4):323–364.

Beath C, Berente N, Gallivan MJ, Lyytinen K (2013) Expanding the frontiers of information systems research: Introduction to the special issue. J. Assoc. Inf. Syst. 14(4):5.

Burton-Jones A, Volkoff O (2017) How can we develop contextualized theories of effective use? A demonstration in the context of community-care electronic health records. Inf. Syst. Res. Forthcoming:1–40.


Daugherty T, Eastin M, Bright L (2008) Exploring Consumer Motivations for Creating User-Generated Content. J. Interact. Advert. 8(2):16–25.

Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6):391.

Delen D, Crossland MD (2008) Seeding the survey and analysis of research literature with text mining. Expert Syst. Appl. 34(3):1707–1720.

Doan A, Halevy AY (2005) Semantic-integration research in the database community - A brief survey. Ai Mag. 26(1):83–94.

El Ghoul S, Guedhami O, Kwok CC, Mishra DR (2011) Does corporate social responsibility affect the cost of capital? J. Bank. Finance 35(9):2388–2406.

Evermann J (2008a) An Exploratory Study of Database Integration Processes. IEEE Trans. Knowl. Data Eng. 20(1):99–115.

Evermann J (2008b) Theories of meaning in schema matching: A review. J. Database Manag. 19(3):55–82.

Evermann J (2009) Theories of meaning in schema matching: An exploratory study. Inf. Syst. 34(1):28–44.

Evermann J (2010) Contextual Factors in Database Integration — A Delphi Study. Parsons J, Saeki M, Shoval P, Woo C, Wand Y, eds. Conceptual Modeling – ER 2010. (Springer Berlin / Heidelberg), 274–287.

Evermann J (2012) Applying Cognitive Principles of Similarity to Data Integration–The Case of SIAM. AMCIS 2012 Proc.

Gati I, Tversky A (1982) Representations of qualitative and quantitative dimensions. J. Exp. Psychol. Hum. Percept. Perform. 8(2):325.

Goes PB (2013) EDITOR’S COMMENTS: Information Systems Research and Behavioral Economics. Manag. Inf. Syst. Q. 37(3):iii–viii.

Goldstone RL (1994) The role of similarity in categorization: Providing a groundwork. Cognition 52(2):125–157.

Hahn U (2014) Similarity. Wiley Interdiscip. Rev. Cogn. Sci. 271–280.

Hahn U, Chater N, Richardson LB (2003) Similarity as transformation. Cognition 87(1):1–32.

Hinkin TR (2005) Scale development principles and practices. Res. Organ. Found. Methods Inq. 161–179.

Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression (John Wiley & Sons, Hoboken, NJ).

Imai S (1977) Pattern similarity and cognitive transformations. Acta Psychol. (Amst.) 41(6):433–447.

Ingwersen P, Järvelin K (2006) The turn: Integration of information seeking and retrieval in context (Springer Science & Business Media).

Islam A, Milios E, Kešelj V (2012) Text similarity using google tri-grams. Adv. Artif. Intell. (Springer), 312–317.

Kane GC, Alavi M, Labianca GJ, Borgatti S (2014) What’s different about social media networks? A framework and research agenda. MIS Q. 38(1):274–304.

Komiak SYX, Benbasat I (2006) The effects of personalization and familiarity on trust and adoption of recommendation agents. MIS Q. 30(4):941–960.

Larkey LB, Markman AB (2005) Processes of Similarity Judgment. Cogn. Sci. 29(6):1061–1076.

Larsen KR, Monarchi DE (2004) A Mathematical Approach to Categorization and Labeling of Qualitative Data: the Latent Categorization Method. Sociol. Methodol. 34(1):349–392.

Larsen KR, Monarchi DE, Hovorka DS, Bailey CN (2008) Analyzing unstructured text data: Using latent categorization to identify intellectual communities in information systems. Decis. Support Syst. 45(4):884–896.

Levina N, Arriaga M (2014) Distinction and Status Production on User-Generated Content Platforms: Using Bourdieu’s Theory of Cultural Production to Understand Social Dynamics in Online Fields. Inf. Syst. Res. 25(3):468–488.

Lukyanenko R, Evermann J (2011) A survey of cognitive theories to support data integration. AMCIS 2011 Proc. (Detroit, USA), 1–15.

Lukyanenko R, Evermann J, Parsons J (2014) Instantiation Validity in IS Design Research. DESRIST 2014 LNCS 8463. (Springer), 321–328.

Lukyanenko R, Evermann J, Parsons J (2015) Guidelines for Establishing Instantiation Validity in IT Artifacts: A Survey of IS Research. DESRIST 2015 LNCS 9073. (Springer, Berlin / Heidelberg).

Lukyanenko R, Parsons J, Wiersma YF, Wachinger G, Huber B, Meldt R (2017) Representing Crowd Knowledge: Guidelines for Conceptual Modeling of User-generated Content. J. Assoc. Inf. Syst. 34(1):1–42.

Lukyanenko R, Samuel BM, Evermann J, Parsons J (2016) Toward Artifact Sampling in IS Design Research. Workshop Inf. Technol. Syst. 1–10.

McWilliams A, Siegel D (2001) Corporate social responsibility: A theory of the firm perspective. Acad. Manage. Rev. 26(1):117–127.

Medin DL, Goldstone RL, Gentner D (1993) Respects for similarity. Psychol. Rev. 100(2):254.

Mihalcea R, Corley C, Strapparava C (2006) Corpus-based and knowledge-based measures of text semantic similarity. 775–780.

Nasir M, Hoeber O, Evermann J (2013) Supporting ontology alignment tasks with edge bundling. (ACM), 11.


Niekler A, Jähnichen P (2012) Matching results of latent dirichlet allocation for text. 317–322.

Osigweh CAB (1989) Concept fallibility in organizational science. Acad. Manage. Rev. 14(4):579–594.

Parsons J, Wand Y (2012) Extending Classification Principles from Information Modeling to Other Disciplines. J. Assoc. Inf. Syst. 14(5):2.

Passant A (2007) Using Ontologies to Strengthen Folksonomies and Enrich Information Retrieval in Weblogs. Int. Conf. Weblogs Soc. Media ICWSM.

Raad E, Evermann J (2015) The role of analogy in ontology alignment: A study on LISA. Cogn. Syst. Res. 33:1–16.

Rahm E, Bernstein PA (2001) A survey of approaches to automatic schema matching. VLDB J. 10(4):334–350.

Schwering A (2005) Hybrid Model for Semantic Similarity Measurement. Meersman R, Tari Z, eds. On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE. (Springer Berlin / Heidelberg), 1449–1465.

Searle JR (1995) The construction of social reality (Simon and Schuster).

Sekhavat YA, Parsons J (2012) Semantic Schema Mapping Using Property Precedence Relations. Semantic Comput. ICSC 2012 IEEE Sixth Int. Conf. On. (IEEE), 210–217.

Shepard R (1962a) The analysis of proximities: Multidimensional scaling with an unknown distance function. I. Psychometrika 27(2):125–140.

Shepard R (1962b) The analysis of proximities: Multidimensional scaling with an unknown distance function. II. Psychometrika 27(3):219–246.

Sidorova A, Evangelopoulos N, Valacich JS, Ramakrishnan T (2008) Uncovering the intellectual core of the information systems discipline. MIS Q.:467–482.

Tversky A (1977) Features of similarity. Psychol. Rev. 84(4):327–352.

Vallente D (2014) Information Explosion & Cloud Storage


Conceptual Modeling Research in Information Systems: What we now know and what we still do not know

Mohammad Ali Jabbari Sabegh, Queensland University of Technology, [email protected]

Roman Lukyanenko, University of Saskatchewan, [email protected]

Jan Recker, Queensland University of Technology, [email protected]

Binny Samuel, University of Cincinnati, [email protected]

Arturo Castellanos, Baruch College (CUNY), [email protected]

ABSTRACT

Much of the conceptual modeling research of recent times has been guided by a seminal research agenda developed by Wand and Weber (2002), which identified twenty-two research opportunities. In this paper, we explore whether existing research has provided sufficient answers to these questions. Our findings from a review of the literature show a dialectic: several of the opportunities noted in 2002 have been addressed substantially, while others have been entirely neglected. We also found several path-breaking studies that addressed problems not spotted by the initial framework. To stimulate a forward-looking wave of conceptual modeling research, we provide a new framework that draws the attention of conceptual modeling research to the interplay between digital representations and outcomes.

Keywords

Conceptual modeling, research opportunities, literature review, research agenda

INTRODUCTION

Conceptual modeling has long been regarded as a niche topic of interest to the community of scholars interested in systems analysis and design. A seminal event in the history of conceptual modeling research that brought the topic into the mainstream of information systems (IS) research was the publication of a research framework and agenda by Wand and Weber (2002). This publication stimulated studies addressing how to create high-quality conceptual models that better facilitate developing, implementing, using, and maintaining more valuable IS.

Wand and Weber (2002) proposed several research opportunities based on four main concepts of conceptual modeling research: conceptual modeling grammar, method, script, and context. Fifteen years after its publication, we evaluate whether the original research questions by Wand and Weber (2002), or the answers provided to them, have been sufficient.

In this paper, therefore, we pursue a two-fold objective. First, we examine the published research on conceptual modeling since the publication of the Wand and Weber (2002) paper. We synthesize relevant studies on conceptual modeling that in our view contribute to and shape our understanding of the conceptual modeling discipline, and then identify the remaining gaps in the field that need further investigation. Second, we ask whether the framework by Wand and Weber (2002) remains ideal to this day or whether a new agenda should be set. In addressing both objectives, our paper provides a comprehensive retrospective on conceptual modeling research, as well as substantive generative directions for future research. In this abbreviated paper, we highlight some aspects of both perspectives.

REVIEW PROCEDURES

Our literature review method involved four steps (Webster and Watson 2002; Paré, Trudel, Jaana and Kitsiou 2015).

1. We selected our sample: we considered studies published in the AIS basket of eight journals (Liu and Myers 2011), as representative of mainstream high-quality research in IS, and the Journal of Database Management, because this journal has traditionally been one of the leading substantive journals publishing studies on conceptual modeling.

2. We performed a full-text search in all papers in the selected journals using relevant keywords since 2002. We retrieved 3,546 papers by October 2016. We then excluded all papers that used the term conceptual model to refer to a theory or research framework, which reduced the total to 105 relevant papers. The summary of the keywords, search results, and distribution of the papers is omitted from this paper to conserve space.

3. We developed and applied a coding scheme (available from the authors) to categorize each paper along multiple broad dimensions: focus and goal, the prominent conceptual modeling element addressed (building on the classifications used in Wand and Weber 2002), the research method used, and the evidence obtained, if any.

4. To ensure a reliable application of the coding scheme, one of us coded all 105 papers whilst a second author independently coded a random subset of 30 papers. Their initial inter-coder agreement was 62% (a simple way to compute such agreement is sketched after this list). The two authors then discussed disagreements, updated the coding criteria and instructions, and independently revised the coding over two more rounds until 100% agreement was reached. The first author then revised the coding of the remaining 75 articles.
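
As a side note on step 4, agreement statistics of this kind can be computed in a few lines. The sketch below uses hypothetical codes (not the authors' data) to show raw percent agreement alongside Cohen's kappa, which corrects for chance agreement.

```python
# Hypothetical illustration of the inter-coder check in step 4: two coders
# assign each paper to one conceptual-modeling element, and we compute raw
# percent agreement and chance-corrected agreement (Cohen's kappa).
from collections import Counter

coder_1 = ["grammar", "method", "script", "context", "script", "grammar", "other", "method"]
coder_2 = ["grammar", "script", "script", "context", "script", "method", "other", "method"]

n = len(coder_1)
observed = sum(a == b for a, b in zip(coder_1, coder_2)) / n

# Expected chance agreement from each coder's marginal category frequencies.
c1, c2 = Counter(coder_1), Counter(coder_2)
expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)

kappa = (observed - expected) / (1 - expected)
print(f"percent agreement = {observed:.0%}, Cohen's kappa = {kappa:.2f}")
```

In practice one might prefer an established implementation (e.g., sklearn.metrics.cohen_kappa_score) and report the statistic for each coding round.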

ANALYSIS OF THE LITERATURE

In this section, we present a brief overview of the findings from our analysis with respect to the four main conceptual modeling categories. First, we offer three general observations: (a) 46% of the reviewed studies were published in the Journal of Database Management; (b) more than 37% of the published studies concentrated on more than one element of conceptual modeling (e.g., grammar and script) together; and (c) UML and ERD were the most popular grammars investigated in the literature. Table 1 summarizes papers on the research opportunities proposed by Wand and Weber (2002), aggregated by focus and differentiated by type of contribution.

Element / Focus: number of papers (breakdown by type of contribution)

Conceptual Modeling Grammar
  Evaluating ontologies based on empirical testing of their predictions: 0
  Evaluating grammars for ontological expressiveness: 6 (1 Empirical, 5 Non-Empirical)
  Assigning ontological meaning to constructs of design grammars and generating ontologically motivated modeling rules: 8 (5 Empirical, 3 Non-Empirical)
  Resolving outstanding ontological problems that impact conceptual modeling (e.g., nature of the part-of relationship): 2 (1 Empirical, 1 Non-Empirical)
  Empirically testing predicted strengths and weaknesses in new and existing grammars based on their ontological expressiveness: 0
  Determining which combinations of grammars best support users who undertake conceptual-modeling work: 0
  Empirically testing the predicted implications of construct deficit and overload in grammars: 10 (10 Empirical)
  Grammar-Other: 6 (3 Empirical, 3 Non-Empirical)

Conceptual Modeling Method
  Evaluating how well different methods allow users to elicit and model critical domain knowledge: 9 (8 Empirical, 1 Non-Empirical)
  Developing procedures to assist users of a grammar in identifying and classifying phenomena according to the grammar's constructs: 19 (8 Empirical, 11 Non-Empirical)
  Determining the beliefs and values that underlie different methods and evaluating the consequences of these beliefs and values for practice: 1 (1 Empirical)
  Method-Other: 9 (6 Empirical, 3 Non-Empirical)

Conceptual Modeling Script
  Evaluating competing scripts generated via the same grammar to describe some phenomenon: 21 (21 Empirical)
  Evaluating competing scripts generated via different grammars to describe the same phenomenon: 3 (3 Empirical)
  Evaluating different combinations of scripts to determine which combination best supports the task at hand: 0
  Developing theory to predict and understand how humans use scripts to accomplish various tasks: 4 (3 Empirical, 1 Non-Empirical)
  Scripts-Other: 9 (3 Empirical, 6 Non-Empirical)

Conceptual-Modeling Context – Individual Differences
  Development of knowledge-based tools to support conceptual modeling: 5 (5 Empirical)
  Predicting which cognitive and personality variables bear on a user's ability to undertake conceptual-modeling work: 17 (11 Empirical, 6 Non-Empirical)
  Predicting and testing empirically which social skills affect the outcomes of conceptual modeling tasks: 1 (1 Non-Empirical)
  Individual difference factor-Other: 1 (1 Empirical)

Conceptual-Modeling Context – Task
  Evaluating the strengths and weaknesses of conceptual modeling grammars, methods, and scripts in the context of different tasks: 4 (3 Empirical, 1 Non-Empirical)
  Task factors-Other: 5 (3 Empirical, 2 Non-Empirical)

Conceptual-Modeling Context – Social Agenda
  Understanding which values and beliefs underlie conceptual-modeling work in practice: 3 (3 Empirical)
  Determining the costs and benefits of adopting different values and beliefs when undertaking conceptual-modeling work: 0
  Articulating detailed conceptual-modeling procedures that are congruent with different beliefs and values: 0
  Understanding how existing conceptual modeling grammars and methods facilitate conceptual-modeling work under different values and beliefs: 0
  Social Agenda factors-Other: 2 (1 Empirical, 1 Non-Empirical)

Other
  Papers that did not match the research framework by Wand and Weber (2002): 21 (2 Empirical, 19 Non-Empirical)

Table 1. Papers on Conceptual Modeling Elements by Type of Contribution

Conceptual Modeling Grammars

The dominant findings of research on modeling grammars were that any one grammar has some level of construct deficit (Recker, Rosemann, Indulska and Green 2009; Irwin and Turk 2005), and that the diagrams created with the ontologically motivated rules lead to better domain understanding (Bera, Krasnoperova and Wand 2010). However, there were also some contrary findings about the effect of construct overload on clarity and usefulness of conceptual models (e.g., Shanks et al. (2008) vs. Bowen et al. (2009)).

Six studies did not fall within any of the grammar-related research opportunities. Three main themes emerged from these studies. First, researchers highlighted the importance of factors other than ontological elements (Figl, Mendling and Strembeck 2012; Clarke, Burton-Jones and Weber 2016); second, some studies examined factors affecting usage behavior of grammars (Dobing and Parsons 2008; Recker 2010); and third, research addressed the complexity of grammars and the difficulties in learning how to use them (VanderMeer and Dutta 2009).

Conceptual Modeling Methods

Research on modeling methods mostly focused on developing rules and methods (Poels, Maes, Gailly and Paemeleire 2011; Poels 2011) to assist users of grammars (Parsons and Wand 2008) and on reducing the variety of developed models (Hadar and Soffer 2006). The dominant idea emerging from research in this category was that cognitive principles and ontological guidelines can assist users of grammars (Bera et al., 2010).

Nine studies did not fall within any of the method-related research opportunities. One argument of this line of work was that ontological guidelines, per se, cannot sufficiently cover the problems of conceptual modeling (Clarke et al., 2016). In addition, researchers suggested methods to overcome problems such as information loss (Lukyanenko, Parsons and Wiersma 2014) and to improve the quality of mappings and transformations of conceptual schemas to designed platforms (An, Hu and Song 2010; Pardillo, Mazón and Trujillo 2011).

Conceptual Modeling Scripts

Four main themes emerged from studies on scripts. First, evaluations of scripts developed using the same grammar based on ontological factors; this stream of work has argued that ontological clarity improves the performance of model users (e.g., Parsons 2011; Bowen, O'Farrell and Rohde 2006), although some studies provided contradictory results (Bowen et al., 2009; Bera, Burton-Jones and Wand 2014). Second, evaluations of different scripts developed using different grammars (Figl et al., 2012; Khatri, Vessey, Ramesh, Clay and Park 2006); the dominant finding of this stream of research is that notational deficits in some grammars increase cognitive load. The third theme was that using high-quality information in different formats improves users' performance (Burton-Jones and Meso 2008). The fourth theme was that following ontological guidelines to develop models decreases both developers' and model readers' cognitive difficulties (Bera et al., 2010; Bera 2012).

Nine papers did not fall within any of the categories on modeling scripts. Two main outcomes emerged from these studies: first, sets of quality measures that relate to modeling scripts (Siau 2004; Krogstie, Sindre and Jørgensen 2006); second, the use of different types of additional information in support of modeling scripts (Burton-Jones and Meso 2008; Gemino and Parker 2009). The most notable unanswered opportunity in this category concerns the lack of empirical investigations on the use of multiple models.


Conceptual Modeling Context

Individual Difference Factors

The main themes arising from this stream of research were, first, the importance of the use of collected and learned knowledge in conceptual modeling (Purao, Storey and Han 2003; Koschmider, Song and Reijers 2010); second, aspects of traceability of the system (Pardillo et al., 2011; Loucopoulos and Kadir 2008); third, the importance of cognitive and personality variables (Davern, Shaft and Te'eni 2012; Browne and Parsons 2012); and fourth, the relevance of support from managerial teams (Bandara, Gable and Rosemann 2005) in conceptual-modeling work.

Task Factors

The main foci were the effects of differences in task settings (Recker 2010), the purpose of conceptual modeling (Green and Rosemann 2004; Recker, Indulska, Rosemann and Green 2010), and different stakeholders involved in conceptual modeling (Green and Rosemann 2004).

Several researchers identified the availability of tools for different tasks as an important factor in conceptual modeling (Bandara et al., 2005; Recker 2012). Other important task-related factors identified were domain tangibility (Soffer and Hadar 2007), the choice of modeling grammar depending on the task (Bandara et al., 2005), and task complexity in general (VanderMeer and Dutta 2009).

Social Agenda Factors

One of the main arguments that emerged from studies in this classification was that the definitions of success may differ by the unit of analysis (e.g., developer, project, organization) and that the relationship among these definitions is complex (Hadar and Soffer 2006; Larsen, Niederman, Limayem and Chan 2009). Another study revealed that modeling conventions play an important role in the process of conceptual modeling (Recker 2010).

Two studies examined opportunities in addition to those proposed by Wand and Weber (2002). The first emerging idea was to use knowledge from social networks in order to improve the quality of conceptual models (Koschmider et al., 2010). The second emerging idea concerned environmental considerations during conceptual modeling (Zhang, Liu and Li 2011).

Articles that did not match the research framework by Wand and Weber (2002)

Twenty-one papers did not fall within any of the categories of the research framework proposed by Wand and Weber (2002). We identified four additional main streams from these studies: first, multidimensional conceptual modeling (Trujillo, Luján-Mora and Song 2004; Garrigós, Pardillo, Mazón, Zubcoff, Trujillo and Romero 2012); second, the quality of knowledge engineering (Chua, Storey and Chiang 2012); third, a complementary role of ontologies (Fonseca and Martin 2007); and fourth, different aspects of model-driven architecture engineering, such as security features (Fernández-Medina, Trujillo and Piattini 2007; D'aubeterre, Singh and Iyer 2008) or software configuration and design patterns (Dreiling, Rosemann, Van Der Aalst, Heuser and Schulz 2006; Vergara, Linero and Moreno 2007).

GUIDING THE NEXT WAVE OF CONCEPTUAL MODELING RESEARCH: A NEW FRAMEWORK

Based on our literature review, we believe that Wand and Weber's (2002) framework was useful and necessary at its time. It organized key aspects of conceptual modeling research, helped the field progress, and assisted in establishing conceptual modeling's place as a core research stream in IS. The volume of literature published since 2002 also suggests that the framework served its purpose of guiding the community of researchers.

However, in our own use of the framework for research and for the purpose of this literature review, we identified several reasons why we believe that a new framework may be more suitable to guide the next wave of conceptual modeling research than simply following up on the outstanding research opportunities we identified above. Our main reasons are the following:

1. Wand and Weber’s framework is script-centric; it places the creation of modeling scripts (via grammars, methods and in a context) at the core of modeling activity. This, for example, makes it difficult to accommodate cases where the modeling activity does not give prominence to modeling scripts.

2. The framework is focused on supporting IS development (via modeling). While IS development is a major part of IS, the existing framework precludes consideration of the use of existing IS, interaction with the data provided through an IS (e.g., business analytics), or indeed any impacts that stem from the use of IS (i.e., outcomes).

3. The framework is shaped by the tacit assumption that modeling is typically undertaken by professional IS analysts who are knowledgeable in appropriate methods and grammars. Recently, however, the proliferation of content-producing technologies that support the creation of digital representations by ordinary people (e.g., Twitter's hashtags) raises questions about modeling performed by ordinary people, which may be more creative and spontaneous than the traditional process (Lukyanenko, Parsons, Wiersma, Wachinger, Huber and Meldt 2017; Chang 2010; Ramesh and Browne 1999).

4. Consistent with the decades of conceptual modeling research preceding the framework, in which many modeling grammars and approaches had been proposed, the framework emphasized the evaluation of existing grammars, potentially to the neglect of the design of entirely novel modeling artifacts or approaches. The dramatic changes to the information technology landscape, however, call for revisiting traditional design assumptions and suggest the development of novel conceptual modeling methods, grammars, and scripts. An already debated instance in this context is the use of conceptual modeling for agile development (Erickson, Lyytinen and Siau 2005; Lukyanenko, Parsons and Samuel 2015), to name just one example.

5. The framework is technology-agnostic. With the steady availability of design automation tools (Orlikowski 1993) and the increasing prevalence of technologies with inherent agency even without human intervention, the modeling of domains, existing or future, is no longer necessarily a function of human conceptualization or behavior alone. Mining techniques that automatically construct process models from event logs are a case in point (van der Aalst 2011). This calls for consideration of direct technology support, enablement, or even embodiment of conceptual modeling.

6. The framework is static and does not explicitly consider feedback resulting from the creation and use of models. This makes it difficult to accommodate multi-stage study designs involving modeling phenomena, such as action design research (Akhigbe and Lessard 2016; Sein, Henfridsson, Purao, Rossi and Lindgren 2011).

In sum, while the Wand and Weber (2002) framework remains reflective of existing practice and has been useful to the academic discourse up to this day, it under-represents the ever-widening spectrum of phenomena that can be supported by conceptual modeling. Therefore, in what follows we propose a new framework with the objective of capturing both the traditional as well as emerging opportunities.

Key to the new framework is the view that a digital representation of reality, which lies at the core of conceptual modeling research, is becoming a major societal force as information technology increasingly entwines with all human activities (Leonardi 2011). Representations can be either formal or informal conceptualizations of user views and information requirements, the structure and behavior of information systems, personal, social, and business processes, and existing information records. Representations can take the form of diagrams (e.g., ER diagrams), but can also include narratives, images, and other multimedia forms. From a cognitive perspective, the representations we refer to are external representations (Zhang 1997): artifacts that exist outside of any one individual's mind and contain knowledge and structure about a domain.

As human reliance on IS for daily functions grows, people routinely reason and act based on their perceptions of representations of reality stored in digital systems and increasingly shun direct and traditional interactions. Floridi (2012) calls this ongoing process the "enveloping" of society by an ever-increasing digital layer. We believe conceptual modeling research brings an important array of theories, tools, methods, and objects of research to develop, support, and interpret modern digital representations. While representations are a research object for many scientific disciplines (Hoyningen-Huene 2013), the IS conceptual modeling community has unique expertise in investigating representations in the context of information technology. We thus propose a new research agenda of investigating representations to support the development and use of information and information technologies. This agenda remains cognizant of, and incorporates, all issues related to conceptual modeling scripts, grammars, methods, and context that Wand and Weber's framework stipulated, but is substantially broader, as it explicitly recognizes the role of the conceptual modeling community in supporting a wide range of human interactions with information technologies. At the same time, it retains the core of the traditional framework, as the issue of representation constituted a major part of research on conceptual modeling scripts, grammars, and methods (Browne and Parsons 2012; Burton-Jones and Grange 2013; Kent 1978; Rai 2017; Wand and Weber 1995). Figure 1 shows our view of this framework.

To illustrate the applicability of our new framework, consider several research directions that follow from it:

1. While Wand and Weber's framework was script-centric, our new framework does not insist on this emphasis, which makes it easier to accommodate emerging forms of representations. As the digital envelope expands, much of this process is spontaneous and highly creative, and novel forms of representation are born through it. Thus, many successful systems (e.g., Facebook, Twitter) may not implement traditional modeling (e.g., ER diagrams) or use traditional storage technology (e.g., relational databases), and their success paves the way for novel modeling paradigms (e.g., agile modeling, NoSQL databases). Many of these emerging systems explicitly proceed without a modeling script, or use modeling in a different way (e.g., for feasibility analysis or data interpretation) (Storey and Song 2017). The new framework calls for investigating novel representational approaches and the assumptions made when no script is involved (e.g., Kaur and Rani 2013; Lukyanenko and Parsons 2013); see the brief sketch after this list for one concrete illustration.

2. While it remains important to study effective and appropriate representations to support the development of new IS, with the growth of digital content, novel needs are emerging. Repurposing data for unanticipated insights is at the heart of the increasing growth of data mining, business analytics, and applied artificial intelligence (Rai 2017; Chen, Chiang and Storey 2012). Here, representations remain critical, but their role changes: they no longer guide IS development, but are needed to integrate, visualize, and interpret massive volumes of heterogeneous data to make informed decisions. Further, different assumptions made when assembling information for the analytics process may result in different model performance and predictive power, and thus may result in different actions taken.

3. In moving beyond conceptual modeling scripts, our new framework enables exciting new synergies between conceptual modeling research and other research streams that may be affected by the assumptions behind, and the quality of, the representations. This includes studies that investigate the impact of new representations by ordinary users on information quality, effective use, adoption, and, more generally, IS success (Lukyanenko et al., 2014; Burton-Jones and Grange 2013; Lukyanenko and Parsons 2014).

4. As our new framework does not insist on the traditional modeling process, it supports the emerging practice of information production by ordinary people. Currently, very little is known about these more spontaneous kinds of models, paving the way to an exciting new direction for conceptual modeling research (Lukyanenko et al., 2017; Recker 2015).

5. Our new framework explicitly recognizes the need for ongoing design innovation in response to technological change. For example, the requirements of open information environments, where controls over information production are considerably weaker than in traditional corporate settings, motivate the search for novel approaches to conceptual modeling that are more adaptable, flexible, and open (Chen 2006; Liddle and Embley 2007; Parsons and Wand 2014). Likewise, the blooming practice of machine learning and business analytics may require new forms of representations of data.

6. The new framework proposes feedback loops as part of the research agenda. We explicitly recognize that antecedents can influence other antecedents. For example, ontological assumptions could influence grammars, or a creator's capabilities may influence the method employed in appropriating a grammar. Next, outcomes can have feedback loops to other outcomes: using a representation for communication about a domain could lead to better domain understanding (Geiger 2010; Power 2011; Hoffer, Ramesh and Topi 2012; Anglim, Milton, Rajapakse and Weber 2009). Lastly, outcomes can also impact the antecedents to representations. For example, a lack of effective use, adoption, or quality could lead to a change in the creator's capabilities, as the creator may learn or realize a better way to create future representations to mitigate the issues. Explicit modeling of feedback in the new framework should provide impetus for more research of this type.
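
As a concrete, purely illustrative example of direction 1, the sketch below contrasts a script-driven relational design, where an ER-style model is fixed up front, with a schema-on-read, document-style record of the kind many NoSQL systems accept without any prior conceptual-modeling script. All table, field, and value names are our own assumptions, not drawn from any of the systems cited above.

```python
# Hypothetical contrast for research direction 1: a script-driven relational
# design fixes the schema up front, while a document-style record acquires
# structure without any prior conceptual-modeling script.
import json
import sqlite3

# Traditional route: an ER-style script is translated into a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observation (
        id INTEGER PRIMARY KEY,
        species TEXT NOT NULL,
        observed_at TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO observation (species, observed_at) VALUES (?, ?)",
             ("blue jay", "2017-05-19"))

# Schema-on-read route: contributors add whatever attributes they consider
# relevant; no modeling script constrains the record in advance.
free_form_record = {
    "what": "a noisy blue bird",
    "where": {"lat": 39.13, "lon": -84.52},
    "tags": ["backyard", "feeder"],
}
print(json.dumps(free_form_record, indent=2))
```

The point is not the specific technologies but that the second record only gains structure when it is read and interpreted, which is precisely the kind of representational practice the new framework brings into scope.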

Figure 1. A New Research Framework to support future conceptual modeling research

[Figure 1 depicts Antecedents (methods, grammars, principles, rules; assumptions, e.g., ontological; creator's capabilities, skills, motivation; agency: material/social/imbricated or human/technological/socio-technical; context) feeding into a Representation (of reality, data, process, or system; it can be a script, notes, language, narrative, videos, pictures, or the IS itself), which in turn leads to Outcomes (domain understanding, communication; IS success: information quality, effective use, adoption; actions taken, e.g., decisions, interventions). The figure also labels a design focus and an evaluation/behavioral focus.]


CONCLUSION

Conceptual modeling as a research field has matured into an established research area of IS. Perhaps it is not regarded in the same manner as research on technology adoption and the business value of technology, but conceptual modeling stands as a cornerstone of the research discipline.

Yet, the standing and reputation of conceptual modeling within the discipline is not stable. Like any other field, conceptual modeling research is rightfully under constant scrutiny in terms of its validity, applicability, relevance, and utility in our ever-changing world. To cement its place as a research field within IS and surrounding disciplines, it will be important to constantly review and revise our own research efforts on conceptual modeling.

To that end, in this paper we have taken two important steps. We examined the influence and consequences of a seminal research framework in the field, and we provided a new research framework that we believe offers a reinvigorating and exciting new perspective on conceptual modeling research challenges and opportunities. In doing so, we have created new pathways to research on conceptual modeling that (a) both relax and challenge our own assumptions about what conceptual modeling is, and (b) move our research efforts towards the fringes of the conceptual modeling paradigm, to areas where we are required to explore unknown territory rather than confirm known principles. Our new framework makes an important step in this direction by drawing attention to significant new opportunities for the conceptual modeling community and substantially expanding our view of what counts as conceptual modeling research. It also stands to bring different research communities that deal with digital representations (e.g., information quality and conceptual modeling) into closer contact, promising more opportunities for cross-pollination of ideas and interdisciplinary collaboration. Our new framework strongly suggests that conceptual modeling research both impacts and is impacted by a broad range of issues related to information and information technology.

In following the agenda set by our work, we may find that conceptual modeling has its limits. But we will certainly increase our confidence in where, how, and why conceptual modeling is effective and useful – and we may discover that conceptual modeling has premises and promises that we are yet to foresee.

REFERENCES

[1] Y. Wand, R. Weber, Research commentary: information systems and conceptual modeling—a research agenda, Information Systems Research, 13 (2002) 363-376.

[2] J. Webster, R.T. Watson, Analyzing the past to prepare for the future: Writing a literature review, MIS Quarterly, 26 (2002) xiii-xxiii.
[3] G. Paré, M.-C. Trudel, M. Jaana, S. Kitsiou, Synthesizing information systems knowledge: A typology of literature reviews, Information & Management, 52 (2015) 183-199.
[4] F. Liu, M.D. Myers, An analysis of the AIS basket of top journals, Journal of Systems and Information Technology, 13 (2011) 5-24.
[5] J. Recker, M. Rosemann, M. Indulska, P. Green, Business process modeling-a comparative analysis, Journal of the Association for Information Systems, 10 (2009) 333-363.
[6] G. Irwin, D. Turk, An ontological analysis of use case modeling grammar, Journal of the Association for Information Systems, 6 (2005) 1-36.
[7] P. Bera, A. Krasnoperova, Y. Wand, Using ontology languages for conceptual modeling, Journal of Database Management, 21 (2010) 1-28.
[8] G. Shanks, E. Tansley, J. Nuredini, D. Tobin, R. Weber, Representing part-whole relations in conceptual modeling: an empirical evaluation, MIS Quarterly, 32 (2008) 553-573.
[9] P.L. Bowen, R.A. O'Farrell, F.H. Rohde, An empirical investigation of end-user query development: the effects of improved model expressiveness vs. complexity, Information Systems Research, 20 (2009) 565-584.
[10] K. Figl, J. Mendling, M. Strembeck, The influence of notational deficiencies on process model comprehension, Journal of the Association for Information Systems, 14 (2012) 312-338.
[11] R. Clarke, A. Burton-Jones, R. Weber, On the Ontological Quality and Logical Quality of Conceptual-Modeling Grammars: The Need for a Dual Perspective, Information Systems Research, 27 (2016) 365-382.
[12] B. Dobing, J. Parsons, Dimensions of UML diagram use: a survey of practitioners, Journal of Database Management, 19 (2008) 1-18.
[13] J. Recker, Continued use of process modeling grammars: the impact of individual difference factors, European Journal of Information Systems, 19 (2010) 76-92.
[14] D. VanderMeer, K. Dutta, Applying learner-centered design principles to UML sequence diagrams, Journal of Database Management, 20 (2009) 25-47.
[15] G. Poels, A. Maes, F. Gailly, R. Paemeleire, The pragmatic quality of Resources-Events-Agents diagrams: an experimental evaluation, Information Systems Journal, 21 (2011) 63-89.
[16] G. Poels, Understanding business domain models: The effect of recognizing resource-event-agent conceptual modeling structures, Journal of Database Management, 22 (2011) 69-101.


[17] J. Parsons, Y. Wand, Using cognitive principles to guide classification in information systems modeling, MIS Quarterly, 30 (2008) 839-868.
[18] I. Hadar, P. Soffer, Variations in conceptual modeling: classification and ontological analysis, Journal of the Association for Information Systems, 7 (2006) 568-592.
[19] R. Lukyanenko, J. Parsons, Y.F. Wiersma, The IQ of the Crowd: Understanding and Improving Information Quality in Structured User-Generated Content, Information Systems Research, 25 (2014) 669-689.
[20] Y. An, X. Hu, I.-Y. Song, Maintaining mappings between conceptual models and relational schemas, Journal of Database Management, 21 (2010) 36-68.
[21] J. Pardillo, J.-N. Mazón, J. Trujillo, An MDA Approach and QVT Transformations for the Integrated Development of Goal-Oriented Data Warehouses and Data Marts, Journal of Database Management (JDM), 22 (2011) 43-68.
[22] J. Parsons, An experimental study of the effects of representing property precedence on the comprehension of conceptual schemas, Journal of the Association for Information Systems, 12 (2011) 401-422.
[23] P.L. Bowen, R.A. O'Farrell, F.H. Rohde, Analysis of competing data structures: Does ontological clarity produce better end user query performance, Journal of the Association for Information Systems, 7 (2006) 514-544.
[24] P. Bera, A. Burton-Jones, Y. Wand, How Semantics and Pragmatics Interact in Understanding Conceptual Models, Information Systems Research, 25 (2014) 401-419.
[25] V. Khatri, I. Vessey, V. Ramesh, P. Clay, S.-J. Park, Understanding conceptual schemas: Exploring the role of application and IS domain knowledge, Information Systems Research, 17 (2006) 81-99.
[26] A. Burton-Jones, P.N. Meso, The effects of decomposition quality and multiple forms of information on novices' understanding of a domain from a conceptual model, Journal of the Association for Information Systems, 9 (2008) 748-802.
[27] P. Bera, Analyzing the Cognitive Difficulties for Developing and Using UML Class Diagrams for Domain Understanding, Journal of Database Management, 23 (2012) 1-29.
[28] K. Siau, Informational and computational equivalence in comparing information modeling methods, Journal of Database Management, 15 (2004) 73-86.
[29] J. Krogstie, G. Sindre, H. Jørgensen, Process models representing knowledge for action: a revised quality framework, European Journal of Information Systems, 15 (2006) 91-102.
[30] A. Gemino, D. Parker, Use case diagrams in support of use case modeling: Deriving understanding from the picture, Journal of Database Management, 20 (2009) 1-24.
[31] S. Purao, V.C. Storey, T. Han, Improving analysis pattern reuse in conceptual design: Augmenting automated processes with supervised learning, Information Systems Research, 14 (2003) 269-290.
[32] A. Koschmider, M. Song, H.A. Reijers, Social software for business process modeling, Journal of Information Technology, 25 (2010) 308-322.
[33] P. Loucopoulos, W. Kadir, BROOD: business rules-driven object oriented design, Journal of Database Management, 19 (2008) 41-73.
[34] M. Davern, T. Shaft, D. Te'eni, Cognition matters: Enduring questions in cognitive IS research, Journal of the Association for Information Systems, 13 (2012) 273-314.
[35] G.J. Browne, J. Parsons, More Enduring Questions in Cognitive IS Research, Journal of the Association for Information Systems, 13 (2012) 1000-1011.
[36] W. Bandara, G.G. Gable, M. Rosemann, Factors and measures of business process modelling: model building through a multiple case study, European Journal of Information Systems, 14 (2005) 347-360.
[37] P. Green, M. Rosemann, Applying ontologies to business and systems modelling techniques and perspectives: Lessons learned, Journal of Database Management, 15 (2004) 105-117.
[38] J. Recker, M. Indulska, M. Rosemann, P. Green, The ontological deficiencies of process modeling in practice, European Journal of Information Systems, 19 (2010) 501-525.
[39] J. Recker, "Modeling with tools is easier, believe me"—The effects of tool functionality on modeling grammar usage beliefs, Information Systems, 37 (2012) 213-226.
[40] P. Soffer, I. Hadar, Applying ontology-based rules to conceptual modeling: a reflection on modeling decision making, European Journal of Information Systems, 16 (2007) 599-611.
[41] T.J. Larsen, F. Niederman, M. Limayem, J. Chan, The role of modelling in achieving information systems success: UML to the rescue?, Information Systems Journal, 19 (2009) 83-117.
[42] H. Zhang, L. Liu, T. Li, Designing IT systems according to environmental settings: A strategic analysis framework, The Journal of Strategic Information Systems, 20 (2011) 80-95.
[43] J. Trujillo, S. Luján-Mora, I.-Y. Song, Applying UML and XML for designing and interchanging information for data warehouses and OLAP applications, Journal of Database Management, 15 (2004) 41-72.
[44] I. Garrigós, J. Pardillo, J.-N. Mazón, J. Zubcoff, J. Trujillo, R. Romero, A Conceptual modeling personalization framework for OLAP, Journal of Database Management, 23 (2012) 1-16.
[45] C.E.H. Chua, V.C. Storey, R.H. Chiang, Knowledge representation: a conceptual modeling approach, Journal of Database Management, 23 (2012) 1-30.


[46] F. Fonseca, J. Martin, Learning the differences between ontologies and conceptual schemas through ontology-driven information systems, Journal of the Association for Information Systems, 8 (2007) 129-142.
[47] E. Fernández-Medina, J. Trujillo, M. Piattini, Model-driven multidimensional modeling of secure data warehouses, European Journal of Information Systems, 16 (2007) 374-389.
[48] F. D'aubeterre, R. Singh, L. Iyer, Secure activity resource coordination: empirical evidence of enhanced security awareness in designing secure business processes, European Journal of Information Systems, 17 (2008) 528-542.
[49] A. Dreiling, M. Rosemann, W. Van Der Aalst, L. Heuser, K. Schulz, Model-based software configuration: patterns and languages, European Journal of Information Systems, 15 (2006) 583-600.
[50] N.M. Vergara, J.M.T. Linero, A.V. Moreno, Model-driven component adaptation in the context of Web Engineering, European Journal of Information Systems, 16 (2007) 448-459.
[51] R. Lukyanenko, J. Parsons, Y.F. Wiersma, G. Wachinger, B. Huber, R. Meldt, Guidelines for Conceptual Modeling of User-generated Content, Journal of the Association for Information Systems, 18 (2017) forthcoming.
[52] H.C. Chang, A new perspective on Twitter hashtag use: Diffusion of innovation theory, Proceedings of the American Society for Information Science and Technology, 47 (2010) 1-4.
[53] V. Ramesh, G.J. Browne, Expressing causal relationships in conceptual database schemas, Journal of Systems and Software, 45 (1999) 225-232.
[54] J. Erickson, K. Lyytinen, K. Siau, Agile modeling, agile software development, and extreme programming: the state of research, Journal of Database Management, 16 (2005) 88-100.
[55] R. Lukyanenko, J. Parsons, B. Samuel, Do We Need an Instance-Based Conceptual Modeling Grammar?, in: Symposium on Research in Systems Analysis and Design (AIS SIGSAND 2015), 2015.
[56] W.J. Orlikowski, CASE tools as organizational change: Investigating incremental and radical changes in systems development, MIS Quarterly, 17 (1993) 309-340.
[57] W.M.P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes, Springer, Heidelberg, Germany, 2011.
[58] O. Akhigbe, L. Lessard, Situating requirements engineering methods within design science research, in: Breakthroughs and Emerging Insights from Ongoing Design Science Projects: Research-in-progress papers and poster presentations from the 11th International Conference on Design Science Research in Information Systems and Technology (DESRIST 2016), St. John, Canada, 23-25 May, 2016.
[59] M.K. Sein, O. Henfridsson, S. Purao, M. Rossi, R. Lindgren, Action design research, MIS Quarterly, (2011) 37-56.
[60] P.M. Leonardi, When flexible routines meet flexible technologies: Affordance, constraint, and the imbrication of human and material agencies, MIS Quarterly, 30 (2011) 147-167.
[61] J. Zhang, The Nature of External Representations in Problem Solving, Cognitive Science, 21 (1997) 179-217.
[62] L. Floridi, The road to the philosophy of information, in: Luciano Floridi's Philosophy of Technology, Springer, 2012, pp. 245-271.
[63] P. Hoyningen-Huene, Systematicity: The nature of science, Oxford University Press, Oxford, UK, 2013.
[64] A. Burton-Jones, C. Grange, From use to effective use: A representation theory perspective, Information Systems Research, 24 (2013) 632-658.
[65] W. Kent, Data and Reality: Basic assumptions in data processing reconsidered, North-Holland Pub. Co., Amsterdam, 1978.
[66] A. Rai, Diversity of design science contributions, MIS Quarterly, 41 (2017) iii–xviii.
[67] Y. Wand, R. Weber, On the deep structure of information systems, Information Systems Journal, 5 (1995) 203-223.
[68] V.C. Storey, I.-Y. Song, Big data technologies and management: What conceptual modeling can do, Data & Knowledge Engineering, (2017).
[69] K. Kaur, R. Rani, Modeling and querying data in NoSQL databases, in: 2013 IEEE International Conference on Big Data, 2013, pp. 1-7.
[70] R. Lukyanenko, J. Parsons, Is traditional conceptual modeling becoming obsolete?, in: International Conference on Conceptual Modeling, Springer, 2013, pp. 61-73.
[71] H. Chen, R.H. Chiang, V.C. Storey, Business intelligence and analytics: From big data to big impact, MIS Quarterly, 36 (2012) 1165-1188.
[72] R. Lukyanenko, J. Parsons, Using Field Experimentation to Understand Information Quality in User-generated Content, in: CodeCon@MIT, 2014.
[73] J. Recker, Research on conceptual modelling: less known knowns and more unknown unknowns, please, in: 11th Asia-Pacific Conference on Conceptual Modelling, Australian Computer Society, 2015, pp. 3-7.
[74] P.P. Chen, Suggested research directions for a new frontier–active conceptual modeling, in: International Conference on Conceptual Modeling, Springer, 2006, pp. 1-4.
[75] S.W. Liddle, D.W. Embley, A common core for active conceptual modeling for learning from surprises, in: International Workshop on Active Conceptual Modeling of Learning, Springer, 2007, pp. 47-56.
[76] J. Parsons, Y. Wand, A foundation for open information environments, in: European Conference on Information Systems, Tel Aviv, Israel, 2014.


[77] J. Geiger, Why Do We Model?, Information Management (1521-2912), 20 (2010) 32-33.
[78] D.A.N. Power, Data Modeling: The Lingua Franca of a Successful MDM Effort, Information Management (1521-2912), 21 (2011) 28-29.
[79] J.A. Hoffer, V. Ramesh, H. Topi, Modern Database Management, 11th ed., Prentice Hall, Boston, MA, 2012.
[80] B. Anglim, S.K. Milton, J. Rajapakse, R. Weber, Current trends and future directions in the practice of high-level data modeling: An empirical study, in: ECIS, 2009, pp. 122-133.


What Makes a Good Crowd? Rethinking the Relationship between Recruitment Strategies and Data Quality in Crowdsourcing

Shawn Ogunseye, Memorial University of Newfoundland, [email protected]

Jeffrey Parsons, Memorial University of Newfoundland, [email protected]

ABSTRACT

Conventional wisdom dictates that the quality of data collected in a crowdsourcing project is positively related to how knowledgeable the contributors are. Consequently, numerous crowdsourcing projects implement crowd recruitment strategies that reflect this reasoning. In this paper, we use classification theory to explore the effect of crowd recruitment strategies on the quality of crowdsourced data. As these strategies are based on knowledge, we consider how a contributor's knowledge may affect the quality of the data he or she provides. We also build on previous research by considering relevant dimensions of data quality beyond accuracy, and predict the effects of available recruitment strategies on these dimensions of data quality.

Keywords

Data quality, Crowdsourcing, Citizen science, Crowd recruitment

INTRODUCTION

Crowdsourcing implies “outsourcing a task to a ‘crowd’, rather than to a designated ‘agent’ (an organization, informal or formal team, or individual), such as a contractor, in the form of an open call” (Afuah & Tucci, 2012; Howe, 2006). Through crowdsourcing, information technology (IT) has been successfully used by organizations and individuals to engage large groups in many ways (Castriotta & Di Guardo, 2011; Hosseini, Phalp, Taylor, & Ali, 2014; Tarrell et al., 2013; Tripathi, Tahmasbi, Khazanchi, & Najjar, 2014). Examples include harnessing collective intelligence for decision-making and acquiring distributed or independent knowledge about a variety of interests such as traffic monitoring and bird watching (Bonney et al., 2009; Buecheler, Sieg, Füchslin, & Pfeifer, 2010; Cohn, 2008; Malone, Laubacher, & Dellarocas, 2010). However, to successfully leverage the crowd for such purposes, crowdsourcers¹ of crowdsourcing projects must make a “design decision” about who will perform the task (Lyon & Pacuit, 2013; Malone et al., 2010). The primary decision in recruitment is whether to recruit only knowledgeable contributors or allow everyone to participate (Budescu & Chen, 2014; Lukyanenko, Parsons, & Wiersma, 2014). Understanding the consequences of these crowd recruitment options on the success of crowdsourcing projects will enable crowdsourcers to make informed staffing decisions.

The limited literature in this area has mainly assumed implicitly that expert crowds are better and therefore provides guidance on strategies to ensure the knowledge of contributors (Aspinall, 2010; Budescu & Chen, 2014; Du, Hong, Wang, Wang, & Fan, 2017; Wang & Strong, 1996; Wiggins, Newman, Stevenson, & Crowston, 2011). Some studies have compared the quality of data provided by two groups of contributors – experts and novices – in a crowdsourcing or citizen science context; however, they have addressed only the accuracy dimension of data quality² (see Austen, Bindemann, Griffiths, & Roberts, 2016; Crall et al., 2011). Moreover, contributor knowledge is not binary; that is, contributors are not just either experts or non-experts, but possess knowledge at some level along a continuum (Collins & Evans, 2007). Furthermore, these crowd recruitment studies and others have indicated that expert contributors do not provide more accurate data than novices, thereby posing a challenge to the benefits of strategies that “chase after experts in the crowd” (Surowiecki, 2005, p. XIX). Increasing our understanding of the impact of knowledge-based crowd selection strategies on the quality of crowdsourced data stands to affect the success of data crowdsourcing projects (Wang & Strong, 1996; Wiggins et al., 2011). Chiefly, we seek to understand what is gained and what is lost in terms of data quality when crowdsourcers make the choice to recruit only expert contributors or to open up their crowdsourcing projects to everyone. Our stance is that while citizen science studies may be used for surveillance or monitoring (Wiersma, 2010), novel discoveries are possible and desirable. In addition, collected data may have more uses than initially anticipated during the design of the study (Parsons & Wand, 2014).

¹ Crowdsourcer is a term used by Estellés-Arolas & González-Ladrón-De-Guevara (2012) and refers to the decision makers in a crowdsourcing project, who may be the sponsor or data consumers (see Parsons and Wand 2013).

² Data quality is measured based on its intrinsic quality (e.g., accuracy, reputation of its provider), contextual quality (e.g., completeness, timeliness, or ability to provide context for the data), and its representational quality (e.g., the format and meaning of the data) (Nelson, Todd, & Wixom, 2005; Wang & Strong, 1996). Like these papers, we consider data and information to be similar and interchangeable, but distinguish contributions submitted by the crowd as data and the overall data collected by the citizen science system as information. All our references to data quality therefore encompass intrinsic, contextual, and representational data quality.

Our analysis will be applicable to the broader sphere of crowdsourcing and other crowd-facing information systems; however, we focus on citizen science systems –“collaborations between scientists and volunteers, particularly (but not exclusively) to expand opportunities for scientific data collection and to provide access to scientific information for community members” (“Defining Citizen Science — Citizen Science Central,” n.d.). Further discussions of the quality of crowdsourced data and crowd recruitment in this paper will therefore mostly align with the characteristics of citizen science crowdsourcing.

Crowd Recruitment Strategies

Crowd recruitment strategies comprise the decisions crowdsourcers make about who they will let participate in their project to increase the chance of collecting reliable and high quality data (Geiger, Seedorf, Schulze, Nickerson, & Schader, 2011). Although participation in crowdsourcing projects is voluntary, crowd recruitment strategies consist of preselection measures explicitly or implicitly implemented by crowdsourcers to choose which volunteers get to participate. The central decision is whether to recruit only knowledgeable contributors or allow everyone to participate (Budescu & Chen, 2014; Lukyanenko et al., 2014). Besides recruitment decisions, crowdsourcers make other design decisions about the goal of the project (for examples see Bonney et al., 2009; Cooper, Dickinson, Phillips, & Bonney, 2007; Wiggins & Crowston, 2012), motivation to contribute (or why the crowd will participate) (for examples see Nov, Arazy, & Anderson, 2011; Raddick et al., 2009; Rotman et al., 2012), and how the system is designed (Lukyanenko et al., 2014). Whereas there is ample literature that can guide management on the latter decisions, more understanding of crowd recruitment strategies and their impact on crowdsourced data in citizen science is needed. In this paper, we focus on the impact of strategies that target contributors’ prior experience, training, and the disparate levels of contributor knowledge (Austen et al., 2016; Crall et al., 2011). Understanding crowd recruitment strategies and their impact on crowdsourced data will not only clarify the pros and cons of recruitment decisions for data quality, but also guide crowdsourcers’ other design decisions. This paper therefore continues the discussion toward providing a better understanding of how the quality of crowdsourced data is affected by recruiting a crowd of knowledgeable contributors or an undefined crowd (Crall et al., 2011).

Citizen Science Data Quality

Crowdsourcers recruit crowds for their projects taking into account concerns for data quality. They may seek to implement measures to ensure data quality before, during or after data collection (Wiggins et al., 2011). For instance, to ensure data quality before data collection, crowdsourcers may train potential participants to attain a desired level of knowledge required specifically to perform the citizen science task. During data collection, participants’ input may be algorithmically compared to “known states” or against an existing knowledge base for validation (Wiggins et al., 2011). After data collection, experts may subject participants’ submissions to review before acceptance. A typical example is e-bird (www.ebird.org), which allows everyone to participate, but employs a team of experts to “sift through … the observations [of other contributors and] validate them” (Gura, 2013). A commonality in these strategies, as evidenced in many sampled citizen science projects, is that they are undergirded by the assumption that there is “value of expertise in ensuring data quality” (Wiggins et al., 2011, p. 17).

Wang & Strong (1996) defined data quality in terms of whether “data … are fit for use by data consumers”. They identified several data quality dimensions – attributes that represent aspects of data quality and are pertinent in establishing it. According to Nelson, Todd, and Wixom (2005), Wang and Strong (1996), and Wixom and Todd (2005), the dimensions of data quality empirically determined to be pertinent to consumers are: (1) Accuracy – the notion that the data provided is correct and objective; (2) Completeness – the degree to which all the data and their states that may be relevant to the consumer are captured by the contributed data; (3) Context-awareness (Currency) – the degree to which data “precisely reflects the current state of the world that it represents”; (4) Format – the degree to which the data contributed is interpretable and can be understood.

Contributed data is accurate when it is factually correct with respect to the entity it represents. However, when data incorrectly identifies one entity as another, the eventual dataset is not only inaccurate but also incomplete. In the same vein, contextual data about the current state of entities under study can provide significant insights (sometimes unanticipated) to consumers about the phenomenon.

Consistent with previous studies, we contend that, for data to be of sufficient quality to meet anticipated and unanticipated uses, all these dimensions must be considered. We therefore consider high quality data as giving contextual information where available, representing the available instances of phenomena more completely, being accurate, and being easy for the consumer to understand. In this study, we define low quality crowdsourced data as data collected through crowdsourcing that does not fulfill all the necessary dimensions of data quality explicated herein. Conversely, we consider a crowdsourced dataset to be of high quality if it covers all the relevant dimensions of data quality. Accordingly, recruitment strategies that lead to the collection of a low quality dataset are considered less desirable, while those that give the crowdsourcer the opportunity to collect data that meets all the quality dimensions addressed herein are considered ideal.

Contributor Knowledge in Citizen Science

Contributors to citizen science projects may have different levels of measurable knowledge about a phenomenon under study (Collins & Evans 2007). Therefore, we propose that participants in citizen science projects may contribute data based on one or more of these types of knowledge.

Domain Knowledge (DK): we refer to prior domain knowledge as the knowledge participants have about the phenomenon under study. This knowledge may have been acquired through some training and is usually broad, covering more than just the phenomena being studied in a citizen science project.

Task Knowledge (TK): some citizen science projects train potential participants on the task to be performed in the project and assess their knowledge based on the training. We refer to this type of training as task training. In this case, participants do not necessarily have prior knowledge of the domain of study.

No Knowledge (NK): It is also possible for potential participants to have no knowledge of a domain of scientific inquiry. Such a participant would be referred to as a novice.

Domain and Task Knowledge (DKTK): Although it is difficult to claim absolute ignorance (Kloos & Sloutsky, 2008), it is plausible to have a mixture of some level of domain or task knowledge. In this case, different combinations of task and domain knowledge are possible; for example, a contributor may be highly knowledgeable about insects (high DK), but not have the knowhow to perform a citizen science task of classifying bees (low TK). Conversely, they may have low DK and high TK and many other possible variations of both knowledge types.

The different crowd recruitment strategies employed in citizen science emphasize one or more of these types of knowledge. In the next section, we explore theoretical perspectives on the likely consequences of these types of knowledge and, concomitantly, recruitment strategies, on the quality of crowdsourced data collected in citizen science projects.

THEORETICAL FOUNDATION

Since recruitment decisions are centered on the relevance of contributor knowledge to achieving high quality crowdsourced data, we must understand how knowledge affects data quality. Furthermore, the eventual quality of information collected in a citizen science project is determined by the quality of data contributed by the individual crowd members. We therefore take a microscopic view of data quality and its dimensions and how an individual contributor’s knowledge determines the quality of data he or she contributes. Humans identify or classify by matching attributes of newly observed entities to known attributes of similar entities. In a microscopic view of data quality, the dimensions of quality become more granular: accuracy is viewed as the correct identification of the attributes of instances. Completeness is the capacity of contributors to consider all relevant attributes of a phenomenon that may be useful in classifying it and not just the diagnostic ones. Context-awareness refers to the capacity of contributors to identify attributes external to the entity under study, but in interaction with it. Format will imply their capacity to report the entity either using its attributes or the determined class of the entity based on those attributes.

Classification theory provides a useful foundation for understanding and exploring the interaction between knowledge and data quality. Classification (or categorization) is a fundamental human capability. We classify to make efficient use of our cognitive resources by organizing our existing knowledge about phenomena mainly through their similarities, allowing us in turn to make predictions about new instances and events (Best, Yim, & Sloutsky, 2013; Parsons & Wand, 2008, 2013). Classes are therefore useful abstractions of the similarities of the classified phenomena. Classification theory and its relevance to information system (IS) analysis and design have been extensively discussed (see Parsons, 1996; Wand, Monarchi, Parsons, & Woo, 1995).

To classify instances of phenomena, humans learn to manage limited cognitive resources by paying selective attention to only relevant features that aid in identifying instances of the class, while irrelevant features (those not useful for predicting class membership) can be safely ignored. Although selective attention leads to efficient learning, especially when making connections between instances with very sparse similarities and dense dissimilarities, it has costs. The primary cost of selective attention is a learned inattention to features that are not “diagnostic” in the present context (Colner & Rehder, 2009; Hoffman & Rehder, 2010). If these features, however, become diagnostic in another context, then the ability to make predictions and transfer knowledge is lost. We consider two perspectives on selective attention in literature.


First, the tendency for selective attention and classification occurs naturally in humans as we acquire knowledge about entities in our world. Nonetheless, the absence of this tendency is “a developmental default” (Gelman, 1988; Gelman & Coley, 1990; Gelman & Markman, 1986; Kloos & Sloutsky, 2008). It forms with development to aid classification as a mechanism for coping with the deluge of information around us. For this reason, the capacity to classify is a distinguishing factor between adults and infants. For example, experiments conducted by Best et al. (2013), comparing the ability of infants and adults to selectively attend to attributes of instances based on prior or current knowledge, show that infants do not have the capacity for selective attention. Infants reason about classes by observing all the features of individual instances (Gelman & Markman, 1986). We contend they are naturally comparable to novices in the domain of a distributed knowledge crowdsourcing project. Like infants, novices also lack prior knowledge. Infants can, therefore, help us understand how novices and less knowledgeable contributors – people with incomplete knowledge – perceive instances (Keil, 1989; Kloos & Sloutsky, 2008).

Conversely, the tendency of adults to selectively attend to attributes of phenomena about which they have knowledge can help us understand knowledgeable contributors in crowdsourcing projects. Knowledge of the domain or subject of research of a citizen science project will help contributors identify observed instances (Harnad, 2005). We predict that this knowledge will lead experts to focus on relevant features; thus, we expect them to be less likely to attend to non-diagnostic attributes than novices. We therefore make the following proposition:

Proposition 1: Crowdsourced data collected through recruitment strategies that emphasize high contributor domain knowledge will contain fewer contextual properties and will be less complete and less accurate (lower quality data) than data collected through strategies that do not impose a domain knowledge requirement.

In other words, we predict that any recruitment strategy that restricts participation in a citizen science project based on domain knowledge, risks collecting lower quality data than one that does not, as participants’ domain knowledge increases their tendency to ignore attributes that may otherwise have resulted in higher quality data. We make this argument considering the role a contributor’s knowledge plays in his or her ability to consider observable attributes, and not just diagnostic ones, and consequently provide accurate classification (that will eventually lead to a more complete dataset for the citizen science project).

Second, Hoffman and Rehder (2010) showed the need to differentiate supervised classification – engendered by some form of explicit training (e.g., by a teacher) with sufficient feedback to improve the classifier’s skill – from unsupervised classification – classification formed without explicit training (self-taught). They argued that the latter “may involve less rule-based processing” and, consequently, more attentiveness to attributes. They emphasized the tendency of supervised learning (i.e., training with feedback to ensure that the person learns) to lead to the formation of rules (if these were not already explicitly taught) and to selective attention to diagnostic attributes. They explained that “expert classification involves the same sort of attention optimization that characterizes supervised classification learning” (p. 336), which is due to extensive training and the type of task. We therefore contend that contributors trained in the citizen science task to be performed will show more selective attention than those who have not been trained. That is, if contributors are trained to perform a specific citizen science task, their tendency to selectively attend only to attributes that fit their training, and to ignore changes to other aspects of the phenomenon under study, increases compared to those who have not been trained.

Proposition 2: Crowdsourced data collected through recruitment strategies that emphasize training will show a higher level of incompleteness, lack of context, and inaccuracy than data collected through strategies that do not.

In other words, we predict that strategies that include training of participants also increase their tendency to prioritize some dimensions of data quality over others and, in some instances, to ignore attributes of entities that are not considered in the training received. This will lead to lower quality data. We also predict that the effect of training on contributors will be similar regardless of the level of domain knowledge they possess before the training. Nonetheless, we expect that, even though domain knowledge can itself be polarizing as expressed in Proposition 1, it will mitigate the effect of task knowledge. This implies that the higher the level of domain knowledge a contributor has, the lower their tendency to fixate only on attributes that are congruent with their training.

DISCUSSION AND CONCLUSION

The quality of information gathered through citizen science is pertinent to stakeholders. In addition, there is value in ensuring high quality data, ranging from reliability for decision-making and predictions to the capacity for multiple and perhaps even unanticipated uses. The literature suggests that the level of knowledge of the crowd we recruit correlates with the quality of data we get. However, from the theoretical perspectives explicated here, the correlation may not necessarily be positive for all dimensions of data quality. In fact, classification theory suggests that a contributor’s high level of knowledge may be detrimental to their ability to provide quality data along some dimensions. We posit that restrictive recruiting strategies lead to crowds that minimize the contextual characteristics and differences in the non-diagnostic attributes of entities when these differences exist, focusing instead on commonalities in diagnostic attributes. On the other hand, ideal recruitment strategies will lead to a good crowd – one that is sensitive to similarities as well as differences between instances of phenomena, considering all their attributes.

For this reason, crowd recruitment strategies may support or deter novel discoveries and the usefulness of data. Even strategies that include using experts to filter collected data may be sub-optimal for data quality, especially because there is a tendency for people to only permit data they consider congruent with their existing knowledge, a phenomenon termed “cognitive disfluency” (Owen, Halberstadt, Carr, & Winkielman, 2016). Therefore, as the recruitment strategy may inform the design of citizen science systems, it may correspondingly determine the system’s ability to acquire and access accurate, complete, and context-aware data, especially unanticipated or atypical data (Lukyanenko et al., 2014; Parsons & Wand, 2014). Additionally, research objectives may not be fully formed at the time of a project’s commissioning (Lukyanenko et al., 2016; Newman et al., 2012). Therefore, recruitment choices that affect the data collected will also affect the capacity of the project to support changes in its goals, limiting its ability to accommodate and engender novel discoveries.

We are currently designing experiments to test the propositions developed in this paper.

ACKNOWLEDGMENTS

This research was supported by a research grant from the Social Sciences and Humanities Research Council of Canada.

REFERENCES

1. Afuah, A., & Tucci, C. L. (2012). Crowdsourcing as a solution to distant search. Academy of Management Review, 37(3), 355–375.

2. Aspinall, W. (2010). A route to more tractable expert advice. Nature, 463(7279), 294–295.

3. Austen, G. E., Bindemann, M., Griffiths, R. A., & Roberts, D. L. (2016). Species identification by experts and non-experts: comparing images from field guides. Scientific Reports, 6. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5028888/

4. Best, C. A., Yim, H., & Sloutsky, V. M. (2013). The cost of selective attention in category learning: Developmental differences between adults and infants. Journal of Experimental Child Psychology, 116(2), 105–119.

5. Bonney, R., Cooper, C. B., Dickinson, J., Kelling, S., Phillips, T., Rosenberg, K. V., & Shirk, J. (2009). Citizen science: a developing tool for expanding science knowledge and scientific literacy. BioScience, 59(11), 977–984.

6. Budescu, D. V., & Chen, E. (2014). Identifying expertise to extract the wisdom of crowds. Management Science, 61(2), 267–280.

7. Buecheler, T., Sieg, J. H., Füchslin, R. M., & Pfeifer, R. (2010). Crowdsourcing, Open Innovation and Collective Intelligence in the Scientific Method-A Research Agenda and Operational Framework. In ALIFE (pp. 679–686).

8. Castriotta, M., & Di Guardo, M. C. (2011). Open Innovation and Crowdsourcing: The Case of Mulino Bianco. In Information Technology and Innovation Trends in Organizations (pp. 407–414). Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-3-7908-2632-6_46

9. Cohn, J. P. (2008). Citizen science: Can volunteers do real research? BioScience, 58(3), 192–197.

10. Collins, H., & Evans, R. (2007). Rethinking expertise. Chicago, IL: University of Chicago Press.

11. Colner, B., & Rehder, B. (2009). A new theory of classification and feature inference learning: An exemplar fragment model. In Proceedings of the 31st Annual Conference of the Cognitive Science Society (pp. 371–376). Retrieved from http://csjarchive.cogsci.rpi.edu/proceedings/2009/papers/68/paper68.pdf

12. Cooper, C., Dickinson, J., Phillips, T., & Bonney, R. (2007). Citizen science as a tool for conservation in residential ecosystems. Ecology and Society, 12(2). Retrieved from http://www.ecologyandsociety.org/vol12/iss2/art11/main.html

13. Crall, A. W., Newman, G. J., Stohlgren, T. J., Holfelder, K. A., Graham, J., & Waller, D. M. (2011). Assessing citizen science data quality: an invasive species case study. Conservation Letters, 4(6), 433–442. https://doi.org/10.1111/j.1755-263X.2011.00196.x

14. Defining Citizen Science — Citizen Science Central. (n.d.). Retrieved March 7, 2017, from http://www.birds.cornell.edu/citscitoolkit/about/definition

15. Du, Q., Hong, H., Wang, G. A., Wang, P., & Fan, W. (2017). CrowdIQ: A New Opinion Aggregation Model. In Proceedings of the 50th Hawaii International Conference on System Sciences. Retrieved from http://scholarspace.manoa.hawaii.edu/handle/10125/41365

16. Estellés-Arolas, E., & González-Ladrón-De-Guevara, F. (2012). Towards an integrated crowdsourcing definition. Journal of Information Science, 38(2), 189–200.


17. Geiger, D., Seedorf, S., Schulze, T., Nickerson, R. C., & Schader, M. (2011). Managing the Crowd: Towards a Taxonomy of Crowdsourcing Processes. In AMCIS. Retrieved from http://aisel.aisnet.org/cgi/viewcontent.cgi?article=1396&context=amcis2011_submissions

18. Gelman, S. A. (1988). The development of induction within natural kind and artifact categories. Cognitive Psychology, 20(1), 65–95.

19. Gelman, S. A., & Coley, J. D. (1990). The importance of knowing a dodo is a bird: Categories and inferences in 2-year-old children. Developmental Psychology, 26(5), 796.

20. Gelman, S. A., & Markman, E. M. (1986). Categories and induction in young children. Cognition, 23(3), 183–209.

21. Gura, T. (2013). Citizen science: Amateur experts. Nature, 496(7444), 259–261. https://doi.org/10.1038/nj7444-259a

22. Harnad, S. (2005). To cognize is to categorize: Cognition is categorization. Handbook of Categorization in Cognitive Science, 20–45.

23. Hoffman, A. B., & Rehder, B. (2010). The costs of supervised classification: The effect of learning task on conceptual flexibility. Journal of Experimental Psychology: General, 139(2), 319.

24. Hosseini, M., Phalp, K., Taylor, J., & Ali, R. (2014). The four pillars of crowdsourcing: A reference model. In Research Challenges in Information Science (RCIS), 2014 IEEE Eighth International Conference on (pp. 1–12). IEEE. Retrieved from http://ieeexplore.ieee.org/abstract/document/6861072/

25. Howe, J. (2006). The rise of crowdsourcing. Wired Magazine, 14(6), 1–4.

26. Keil, F. C. (1989). Concepts, kinds, and conceptual development. Cambridge, MA: MIT Press.

27. Kloos, H., & Sloutsky, V. M. (2008). What’s behind different kinds of kinds: Effects of statistical density on learning and representation of categories. Journal of Experimental Psychology: General, 137(1), 52.

28. Lukyanenko, R., Parsons, J., & Wiersma, Y. F. (2014). The IQ of the crowd: understanding and improving information quality in structured user-generated content. Information Systems Research, 25(4), 669–689.

29. Lukyanenko, R., Parsons, J., Wiersma, Y., Sieber, R., & Maddah, M. (2016). Participatory Design for User-generated Content: Understanding the challenges and moving forward. Scandinavian Journal of Information Systems, 28(1), 37–70.

30. Lyon, A., & Pacuit, E. (2013). The wisdom of crowds: methods of human judgement aggregation. In Handbook of Human Computation (pp. 599–614). Springer.

31. Malone, T. W., Laubacher, R., & Dellarocas, C. (2010). The collective intelligence genome. MIT Sloan Management Review, 51(3), 21.

32. Nelson, R. R., Todd, P. A., & Wixom, B. H. (2005). Antecedents of information and system quality: an empirical examination within the context of data warehousing. Journal of Management Information Systems, 21(4), 199–235.

33. Newman, G., Wiggins, A., Crall, A., Graham, E., Newman, S., & Crowston, K. (2012). The future of citizen science: emerging technologies and shifting paradigms. Frontiers in Ecology and the Environment, 10(6), 298–304.

34. Nov, O., Arazy, O., & Anderson, D. (2011). Dusting for science: motivation and participation of digital citizen science volunteers. In Proceedings of the 2011 iConference (pp. 68–74). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1940771

35. Owen, H. E., Halberstadt, J., Carr, E. W., & Winkielman, P. (2016). Johnny Depp, Reconsidered: How Category-Relative Processing Fluency Determines the Appeal of Gender Ambiguity. PLOS ONE, 11(2), e0146328. https://doi.org/10.1371/journal.pone.0146328

36. Parsons, J. (1996). An information model based on classification theory. Management Science, 42(10), 1437–1453.

37. Parsons, J., & Wand, Y. (2008). Using cognitive principles to guide classification in information systems modeling. Mis Quarterly, 839–868.

38. Parsons, J., & Wand, Y. (2013). Extending classification principles from information modeling to other disciplines. Journal of the Association for Information Systems, 14(5), 245.

39. Parsons, J., & Wand, Y. (2014). A foundation for open information environments. Retrieved from http://aisel.aisnet.org/ecis2014/proceedings/track17/7/

40. Raddick, M. J., Bracey, G., Gay, P. L., Lintott, C. J., Murray, P., Schawinski, K., … Vandenberg, J. (2009). Galaxy zoo: Exploring the motivations of citizen science volunteers. arXiv Preprint arXiv:0909.2925. Retrieved from https://arxiv.org/abs/0909.2925

41. Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C., … Jacobs, D. (2012). Dynamic changes in motivation in collaborative citizen-science projects. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (pp. 217–226). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2145238

42. Surowiecki, J. (2005). The wisdom of crowds. Anchor.


43. Tarrell, A., Tahmasbi, N., Kocsis, D., Tripathi, A., Pedersen, J., Xiong, J., … de Vreede, G.-J. (2013). Crowdsourcing: A snapshot of published research. Retrieved from http://aisel.aisnet.org/amcis2013/EndUserIS/GeneralPresentations/2/

44. Tripathi, A., Tahmasbi, N., Khazanchi, D., & Najjar, L. (2014). Crowdsourcing typology: a review of is research and organizations. Proceedings of the Midwest Association for Information Systems (MWAIS). Retrieved from http://aisel.aisnet.org/cgi/viewcontent.cgi?article=1018&context=mwais2014

45. Wand, Y., Monarchi, D. E., Parsons, J., & Woo, C. C. (1995). Theoretical foundations for conceptual modelling in information systems development. Decision Support Systems, 15(4), 285–304.

46. Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33.

47. Wiersma, Y. (2010). Birding 2.0: citizen science and effective monitoring in the Web 2.0 world. Avian Conservation and Ecology, 5(2).

48. Wiggins, A., & Crowston, K. (2012). Goals and tasks: Two typologies of citizen science projects. In System Science (HICSS), 2012 45th Hawaii International Conference on (pp. 3426–3435). IEEE.

49. Wiggins, A., Newman, G., Stevenson, R. D., & Crowston, K. (2011). Mechanisms for Data Quality and Validation in Citizen Science. In 2011 IEEE Seventh International Conference on e-Science Workshops (pp. 14–19). https://doi.org/10.1109/eScienceW.2011.27

50. Wixom, B. H., & Todd, P. A. (2005). A theoretical integration of user satisfaction and technology acceptance. Information Systems Research, 16(1), 85–102.


Repurposing organizational electronic documentation: Lessons from Case Management in Foster Care1

Arturo Castellanos Baruch College (CUNY)

[email protected]

Alfred Castillo Florida International University

[email protected]

Roman Lukyanenko University of Saskatchewan

[email protected]

Monica Chiarini Tremblay Florida International University

[email protected]

1 Authors are listed alphabetically and each contributed equally to the paper

ABSTRACT

Data collected by organizations is typically used for tactical purposes – solving a business need. In this study we show the relationship between inferential utility and institutional practices in repurposing unstructured electronic documentation. Our aim is to (1) understand the underpinnings of unstructured-data-entry formats in the data collected by an organization; and (2) study the impact unstructured-data-entry formats have on solving a task or tactical need. We study this phenomenon in the context of case management in foster care. Our findings have important implications for both theory and practice. Unstructured data accounts for more than 80% of organizational data. Our research analyzes the implications of different unstructured data-entry formats when capturing user input.

Keywords

Unstructured data, Stylometry, Institutional theory, Text Mining, Systems Analysis and Design

INTRODUCTION

In the course of normal business, organizations generate electronic documentation describing daily operations and transactions. In many cases, the data collected is used for purely tactical purposes. Once accumulated, these organizational data become strategic resources. These data can be aggregated to identify trends, plan, improve processes, support decision-making, or solve additional tasks through repurposing. The problem, however, is that while some of these data are in a structured and consistent form, much organizational documentation is not. We use the term unstructured data to refer to clinical documentation, progress notes, business reports, or any document that is kept as unstructured free text – with no predefined structure. Unstructured data-entry refers to how these data are entered into a system (e.g., forms, templates or free-text fields).

One of the challenges of utilizing unstructured data is the inherent flexibility in how these data are entered and captured in an information system (e.g., free-form text versus templates). This flexibility may partially explain its popularity among data users. Users may deviate from the deep structure (“the meaning”) of the system by capturing information in a field that was not originally intended for it (Boudreau and Robey 2005; Wand and Weber 1995; DeSanctis and Poole 1994; Burton-Jones and Grange 2012). For example, in a study of an electronic patient record, Berg and Goorman (1999) found that although physicians were able to successfully enter coded complaints, diagnoses, blood pressure results, and medication, many complained that the system was too “rigid” to capture the core reason for the patient’s visit. To overcome this limitation, physicians started to use a text field labeled “conclusion” to enter such information and regarded it as a central field for subsequent patient visits (Berg and Goorman 1999; Berg 2001). These practices become institutionalized and turn into norms in the organization.

IDC estimates that more than 90% of the enterprise data generated is unstructured (Gantz and Reinsel 2011). Despite the pervasiveness of unstructured data in organizations, traditional IS research offers limited guidance in understanding the implications of unstructured data-entry formats for decision-making – the alignment between the information needs of data consumers and those of data contributors. A better understanding of unstructured data collection is becoming increasingly important. Among other factors motivating our work is the ongoing practice whereby organizations repurpose data for business insight. This is possible due to increasing computational power and the availability of sophisticated analytical tools. For unstructured data, additional insight can be uncovered with knowledge discovery (KDD) techniques that utilize ontologies, natural language processing, and semantics.

We see examples of this in related research. Tremblay, Berndt, Luther, Foulis and French (2009) analyzed unstructured progress notes to predict falls in the elderly. Sørlie, Perou, Tibshirani, Aas, Geisler, Johnsen, Hastie, Eisen, van de Rijn and Jeffrey (2001) classified breast carcinomas based on variations in gene expression patterns and then correlated tumor characteristics with clinical outcomes. Larsen and Bong (2016) identified intellectual communities in the field of information systems and detected discordant naming practices of constructs (e.g., using the same term to refer to different phenomena or different terms to refer to the same phenomenon).

In general, as people specialize they become more comfortable using domain-specific language, which provides the ability to infer unobservable attributes (higher inferential utility). The ability to predict attributes of instances of a class increases as the scope of the class decreases. For example, knowing that an individual is taking Adderall allows us to make more inferences about that individual than if we only knew the individual was taking medication. This manuscript extends the literature by demonstrating the relationship between inferential utility and institutional practices when repurposing unstructured electronic documentation. Our aim is to (1) demonstrate the implications unstructured-data-entry formats have for the data collected by the organization; and (2) study the impact unstructured-data-entry formats have on solving a tactical need (e.g., identifying cases of psychotropic drug use). We study this phenomenon in the context of case management in foster care.

SAFEKIDS

SafeKids is a Florida non-profit corporation created by advocacy communities to oversee several FCMAs that provide full case management services. Many of these cases involve children at risk of abuse and/or neglect. Failure to identify at-risk clients is highly problematic, because the consequences can include serious adverse events—including death. Since data are often encoded in free-text form (e.g., reports, encounter notes, case notes, progress notes), we study the impact of different data-entry formats, in particular when the goal is to repurpose these notes and use them to solve a different tactical need. We do so with a case study in which the tactical need is to identify children that are taking psychotropic medicines by analyzing the child’s case notes.

In previous research, Castillo et al. (Castillo, Castellanos and Tremblay 2014) hypothesized that by using these home-visit notes, which contained the child’s record and behavior (e.g., signs of abuse and neglect, aggressive behavior), they could identify children taking psychotropic medication by training Statistical Text Mining (STM) classification models (Luther, Berndt, Finch, Richardson, Hickling and Hickam 2011). An interesting result was that models trained on individual FCMAs’ data had varying levels of classification accuracy. This led us to ponder: if all agencies are not equal, did the writing style of each FCMA have an effect on the accuracy of our classification model?

PROPOSITION DEVELOPMENT

Organizational activity (social and non-social) can become a pattern that is repeated by individuals in the organization. Institutions are organized and established by procedures that guide the actions of individuals (Jepperson 1991). The rules, norms, and meanings arise in interaction and are preserved and modified by the behavior of individuals over time (Giddens 1979; Sewell Jr 1992).

In the absence of contextual change, actors are more likely to replicate scripted behavior, making these institutions persistent (Hughes 1936; Barley and Tolbert 1997). Yet behavior can evolve over time as a result of changing regulations and norms (e.g., when solving an emergent tactical purpose or wicked problems). The process of standardizing procedures among members of a population from these pillars is referred to as institutional isomorphism, which is triggered by coercive, normative, and mimetic forces (DiMaggio and Powell 1983). Institutional isomorphism constrains the ways in which individuals perform their daily activities and cultivates expectations regarding the style of knowledge representation and production (DiMaggio and Powell 1983). The concept of institutional isomorphism in organizational behavior theory leads to our first proposition:

Proposition 1 (homogeneity): Data collected using unstructured-data-entry formats become homogeneous within organizational units.

This data collection homogeneity suggests the potential for organizations to adopt standard practices in how they collect and use information to solve a tactical need. The effectiveness of their decision-making is tied to the information at hand for such a tactical purpose. Institutional features of organizational environments, however, can shape the actions actors take (e.g., the level of detail – specificity or focus – at which they input information into the IS). This notion of institutional factors in reporting leads to our second proposition:

Proposition 2 (HC-LC): Institutional factors can establish data entry practices that result in highly cohesive (HC) (similar within the same organizational unit) and loosely coupled (LC) (different across organizational units) data collection.

Proposition 2 states that notes from one organizational unit are similar to one another and different from notes from another organizational unit. More important for the organization is finding a way to assess the effectiveness of these unstructured notes in solving a tactical purpose.


We turn to theories from psychology to discuss the tradeoff between generalization and specification in data collection practices by individuals in the organizations. According to psychology, categories support vital functions of an organism via cognitive economy and inductive inference (Lakoff 1987; Roach, Lloyd, Wiles and Rosch 1978; Smith and Medin 1981; Smith 1988; Parsons 1996). Cognitive economy is achieved by maximally abstracting from individual differences among objects and then grouping objects in categories of larger scope (Smith and Medin 1981; Fodor 1998; Murphy 2004). These categories improve the ability of a person to accurately predict features of instances of a category.

The trade-off between these competing functions is considered one of the defining mechanisms of human cognition and behavior (Roach et al., 1978; Corter and Gluck 1992). According to cognitive theories and theories of classification, categories (which can be represented as a class in the IS) provide cognitive economy and inferential utility, enabling humans to efficiently store and retrieve information about phenomena of interest (Roach et al., 1978; Parsons 1996).

Lukyanenko, Parsons and Wiersma (2014) suggest that in a free-form data entry task, non-experts will classify more accurately at a general level than at a more specific level. When structured data are collected, the level of specificity is fixed at the time of system design. Users entering unstructured data, on the other hand, can adjust their level of specificity—by being more or less detailed. Since specificity results from expertise, unstructured data collection can capture expertise better, which in turn may lead to better performance by providing relevant information to support decision-making (e.g., when repurposing existing data). We suggest that organizations can foster effective unstructured-data-entry practices that result in richer data collection. We do so through the following propositions:

Proposition 3 (Inferential utility and repurposing): Unstructured-data-entry formats can help shape effective data-entry practices in solving well-defined needs.

Proposition 3a: Higher levels of specificity in the unstructured data collected lead to increased inferential utility.

Proposition 3b: Higher levels of specificity in the unstructured data collected facilitate repurposing data for other tasks.

The goal of the proposed design propositions is to understand the dynamics of electronic documentation in order to design more effective information systems (Walls, Widmeyer and El Sawy 1992; Eisenhardt 1989). The propositions enable designers to reflect on the effect of institutional practices in user generated electronic documentation. In the following sections we evaluate the propositions and provide a discussion, conclusions, and areas for future research.

EVALUATION OF PROPOSITIONS

We evaluate our propositions in the context of case management in a foster care organization—from three FCMAs. To evaluate proposition 1 we use text mining techniques to discover and extract knowledge from unstructured data and derive our predictive models (Hearst 1999). To evaluate proposition 2, we use a particular application of text mining named Stylometry. To evaluate proposition 3, we use text-mining techniques to analyze whether there are any significant lexical, syntactic, or semantic differences in the text authored by different organizational units.

Proposition 1: Homogeneity of Data

To evaluate proposition one, we use an inductive (classification) text mining technique. First, an expert case manager provides a gold standard with labeled instances. Case notes are labeled “Yes” (uses psychotropic medication) or “No” (no use of psychotropic medication), depending on whether the child is taking psychotropic medication or not. We create individual models for each FCMA and we evaluate the performance of the predictive models using commonly accepted metrics: recall, precision, and F-measure (see Table 1).

Table 1. Evaluation Metrics

We create individual models for each FCMA (A, B, and C) and we evaluate each within its own organizational unit (intra) and across organizational units (inter) (see Figure 1). For each organizational unit, we assign a random sample into a training set containing 70% of the cases and a test set containing the remaining 30% of the data. We use SAS Text Miner 9.4 to evaluate the performance of each of the models and all the permutation comparisons across organizational units.

Figure 1. Intra and Inter-Agency Data Mining Process
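To make the intra- and inter-FCMA evaluation concrete, the following is a minimal sketch of how such a comparison could be scripted. It is illustrative only: the study used SAS Text Miner 9.4, whereas the sketch uses scikit-learn, and the file name case_notes.csv, the column names (note_text, uses_psychotropic, fcma), and the choice of a TF-IDF/logistic-regression model are assumptions, not details taken from the study.

# Minimal sketch of the intra-/inter-FCMA evaluation described above.
# Illustrative only: scikit-learn stands in for SAS Text Miner 9.4, and the
# CSV columns (note_text, uses_psychotropic with "Yes"/"No" labels, fcma)
# are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

notes = pd.read_csv("case_notes.csv")  # hypothetical input file
agencies = ["A", "B", "C"]

# 70/30 split within each FCMA, mirroring the design described in the paper.
splits = {
    a: train_test_split(
        notes[notes.fcma == a].note_text,
        notes[notes.fcma == a].uses_psychotropic,
        test_size=0.30, random_state=42)
    for a in agencies
}

for train_agency in agencies:
    X_tr, _, y_tr, _ = splits[train_agency]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    for test_agency in agencies:  # intra-FCMA when equal, inter-FCMA otherwise
        _, X_te, _, y_te = splits[test_agency]
        pred = model.predict(X_te)
        print(train_agency, "->", test_agency,
              "P=%.3f R=%.3f F=%.3f" % (
                  precision_score(y_te, pred, pos_label="Yes"),
                  recall_score(y_te, pred, pos_label="Yes"),
                  f1_score(y_te, pred, pos_label="Yes")))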

There is no standard definition of what a substantial difference in F-measure improvement should be. In the field of information retrieval, a 5% performance improvement is considered substantial (Adomavicius, Sankaranarayanan, Sen and Tuzhilin 2005; Sparck Jones 1974). The z-test for proportions evaluates the statistical difference between two population proportions p1 and p2 (Kachigan 1986; Fleiss, Levin and Paik 2013). To test the difference between proportions we compute the following:
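A conventional pooled form of the two-proportion z-test, consistent with this description (stated here for completeness; the exact form used in the original analysis may differ slightly), is:

$$ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, $$

where \hat{p}_1 = x_1/n_1 and \hat{p}_2 = x_2/n_2 are the two observed proportions (e.g., the precision or recall of two models), n_1 and n_2 are the corresponding numbers of test cases, and \hat{p} is the pooled proportion.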

We evaluated each FCMA by comparing its performance when tested with data from the same organizational unit (intra-FCMA) against models that use data from other organizational units (inter-FCMA). We highlight in bold any statistically significant differences in precision and recall using a z-test for proportions (two-tailed test at the 95% confidence level). We consider a difference in F-measure substantial if it exceeds 0.05 and the difference in precision or recall is statistically significant (determined using the z-test for proportions and marked in bold with a * symbol) (Adomavicius et al., 2005).

Table 2. Results in difference between proportions for Precision (P) and Recall (R)

Table 2 shows that the differences in F-measure are substantial in five out of the six pairs. The results show that two of the agencies (FCMA A and FCMA C) consistently perform better in classifying cases of psychotropic drug use. This shows that unstructured data-entry formats may result in differences in how information is collected across organizational units. Institutional theory helps explain how institutional factors shape the practices of individuals in different organizational units, and how these practices can become stable over time and be adopted by other individuals, making them persistent. This supports our first proposition that data collected using unstructured-data-entry formats become homogeneous within organizational units.

Proposition 2: Highly Cohesive-Loosely Coupled

Some researchers have argued that an author’s style comprises a limited number of distinctive features inherent to the author, neglecting the content/context-dependency of the writing (De Vel, Anderson, Corney and Mohay 2001). Stylometric analysis is an application of text mining that uncovers metadata from documents and allows for statistical comparison of these metadata as a proxy for “style”. We use SAS Text Miner 9.4 to predict, based on the text of a case note, the FCMA to which that note belongs.

Our data consist of all the case notes from the three agencies, assigned to mutually exclusive training and test sets. We train a classification model on the case note text with the authoring FCMA as the target variable (i.e., FCMA A, FCMA B, or FCMA C).
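As an illustration of this setup, the sketch below trains a single classifier whose target is the authoring agency rather than medication use. As before, scikit-learn stands in for the SAS Text Miner workflow, and the case_notes.csv layout and model choice are illustrative assumptions.

# Illustrative authorship-attribution sketch: predict the authoring FCMA
# from case note text (a stand-in for the SAS Text Miner workflow).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

notes = pd.read_csv("case_notes.csv")  # hypothetical input, as in the earlier sketch
X_train, X_test, y_train, y_test = train_test_split(
    notes.note_text, notes.fcma, test_size=0.30, stratify=notes.fcma, random_state=42)

stylometry_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
stylometry_model.fit(X_train, y_train)
print(classification_report(y_test, stylometry_model.predict(X_test)))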

The results of the analysis show that, by analyzing a particular case note, we can predict with a high degree of certainty the authoring FCMA of that case note (see Table 3). These results show that each organization has its own style, which is consistently used by its caseworkers. Based on these results we can confirm proposition 2, that institutional factors establish data entry practices that result in data that is highly cohesive (similar within the same organizational unit) and loosely coupled (different across organizational units).

Table 3. Case Distribution across Agencies

Our results show that despite organizations having established reporting guidelines, employees adopt new guidelines that become norms over time. This is reflected in how different organizational units are consistent in the way they encode home-visit notes. We also introduce the idea of organizational stylometry. To our knowledge, the use of stylometry at the population level (where many individuals contribute to a body of text) has yet to be explored.

Proposition 3: Inferential Utility and Repurposing

Computers understand very little of the meaning of human language. Sparck Jones (1972) adopted a statistical interpretation of the concept of specificity as a function of term use rather than having to do with the accuracy of the concept representation.

Human language is subtle, with many unquantifiable yet salient qualities. Users with different levels of expertise tend to produce information that differs in quality and level of abstraction. For example, within the category “taking medication”, a conceptual hierarchy can be the following: (a) medication, (b) psychotropic medication, (c) Lisdexamfetamine, (d) Vyvanse, going from the most general (a) to the most specific (d). Knowing a child is taking Vyvanse (d) gives more information than just knowing the child is taking medication (a).
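As a toy illustration of this point, the hierarchy can be represented as a simple lookup in which more specific terms license more inferences; the attribute names below are illustrative assumptions, not values taken from the study.

# Toy illustration: more specific recorded terms allow more attributes to be inferred.
hierarchy = {
    "medication":              {"is_medication": True},
    "psychotropic medication": {"is_medication": True, "is_psychotropic": True},
    "Lisdexamfetamine":        {"is_medication": True, "is_psychotropic": True,
                                "active_ingredient": "lisdexamfetamine"},
    "Vyvanse":                 {"is_medication": True, "is_psychotropic": True,
                                "active_ingredient": "lisdexamfetamine",
                                "brand": "Vyvanse"},
}

def inferential_utility(term):
    """Count how many attributes a recorded term lets us infer (0 if unknown)."""
    return len(hierarchy.get(term, {}))

print(inferential_utility("medication"))  # 1 inferable attribute
print(inferential_utility("Vyvanse"))     # 4 inferable attributes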

We assess language use (in terms of the structure and meaning of the case notes) by including or excluding natural language processing (NLP) features. We then assess whether there are any differences in the prediction accuracy of the models. For the text analysis we use SAS Text Miner 9.4, which has built-in text parsing and text filtering features that use NLP. We use three different models: the first without removing any NLP features, the second without part-of-speech (POS) and noun group (NG) features, and the third with only POS and NG features.

The results of the analysis show that higher levels of specificity in the data collected lead to increased inferential utility, which can ultimately help the organization solve unanticipated tasks using these data. We evaluate proposition 3 by comparing the performance of the different predictive models, using precision, recall, and F-measure, when varying language features (see Table 4).

Table 4. Prediction Results with Features Disabled
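A rough analogue of this three-way feature comparison can be sketched outside SAS Text Miner; the version below uses spaCy for POS tagging and noun-chunk extraction and scikit-learn for the classifier, with the same hypothetical case_notes.csv columns as before. It illustrates the design of the ablation rather than reproducing the study's exact pipeline.

# Illustrative ablation over linguistic feature sets (a stand-in for the SAS
# Text Miner configurations described above). Assumes:
#   pip install spacy scikit-learn pandas
#   python -m spacy download en_core_web_sm
import pandas as pd
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

nlp = spacy.load("en_core_web_sm")
notes = pd.read_csv("case_notes.csv")  # hypothetical input file

def to_features(text, mode):
    """Re-encode a note as a token string for the chosen feature configuration."""
    doc = nlp(text)
    tokens = [t.lower_ for t in doc if not t.is_space]
    pos_tags = [t.pos_ for t in doc if not t.is_space]
    noun_groups = [c.text.lower().replace(" ", "_") for c in doc.noun_chunks]
    if mode == "all_features":   # tokens plus POS tags plus noun groups
        return " ".join(tokens + pos_tags + noun_groups)
    if mode == "no_pos_ng":      # plain tokens only
        return " ".join(tokens)
    return " ".join(pos_tags + noun_groups)  # "pos_ng_only"

for mode in ["all_features", "no_pos_ng", "pos_ng_only"]:
    X = TfidfVectorizer().fit_transform(notes.note_text.apply(to_features, mode=mode))
    scores = cross_val_score(LogisticRegression(max_iter=1000), X,
                             notes.uses_psychotropic, cv=5, scoring="f1_macro")
    print(mode, round(scores.mean(), 3))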

Similarly to the analysis for proposition one, we use a z-test for proportions on precision and recall to statistically compare results from the different models.

Based on the results in the previous sections, and in line with the literature on cognitive psychology (specifically on categorization and inference), we argue that text with higher content specificity can then be abstracted for use in unanticipated applications.

IMPLICATIONS FOR RESEARCH AND PRACTICE

Our findings have important implications for both theory and practice. Unstructured data accounts for more than 80% of organizational data. Our research analyzes the implications of different unstructured data-entry formats for capturing user input. First, our research suggests that the data-entry formats of an information system can reveal the existence of different organizational styles across organizational units. Second, our research suggests that the flexibility of free-form data entry motivates individuals to stay true to their organizational unit's reporting expectations. This highlights the trade-off between different data-entry formats and the data collected by the organization.

Our results connect the level of specificity to solving unanticipated and wicked problems. The results of this study can be generalized to other domains and can provide insight into effective system design, specifically the effects of designs that are more or less flexible. In a fully structured scenario, the user is guided by the interface on what needs to be reported. In a semi-structured scenario, pre-established templates guide data entry but allow for some deviation, so the user can input something not related to a particular template. In an unstructured scenario (e.g., free-form), the individual has the liberty to enter data freely, with what to enter typically shaped by the organization (e.g., through business processes and training).

Our research encourages experts to be as specific as they can while allowing non-experts to input information at a more general level. Higher specificity, however, requires greater expertise and may therefore hinder contributions from non-experts. Future work should examine how these different data-entry formats may preclude the collection of valuable information (leading to information loss) when both novices and experts contribute to the system. Previous research has shown that limiting data entry to experts can preclude the input of valuable information from non-experts and can lead to data accuracy problems (Lukyanenko et al., 2014).

Our results are consistent with psychology research in that the level of specificity of the information limits the applications for which that data can be used. In an environment where individuals have a similar level of expertise based on their background and training, it is preferable for them to be more specific when they enter data into the system (e.g., stating that the child is taking 5 mg of Adderall provides more information than just saying the child is taking medication). A practical implication is that, depending on whether the readers of the text are non-experts or experts, the individual writing the text can choose to contribute beyond what he or she believes the reader requires. This allows for increased inferential utility, which can prove beneficial when dealing with unanticipated uses of the data.

Our study also provides guidance on the implications of how the data-entry formats of a system are designed and on what organizations would like to capture from their users. To the best of our knowledge, authorship analysis has previously been done only at the individual level. We extend this analysis to authorship identification at the group level (e.g., identifying the authoring organizational unit of a body of text). Organizations can use this to assess the consistency of data-entry practices, and it can be extended with analytical techniques to derive the dimensions or categories these documents fall into, or metrics that relate to the reliability of the data.

This study is not without limitations. Psychotropic medication was attributed to the foster home and not to the child: if a foster home had multiple children and one was taking psychotropic medication, all of these children would appear as taking psychotropic medication, and vice versa. This introduces bias into the classification models. However, it does not undermine the goal of our work, which is to understand the relationship between data-entry practices and repurposing data. Future work should focus on providing a method to evaluate when using data in the aggregate is justified, as opposed to highlighting meaningful segments for separate analysis.


CONCLUSIONS

Our research shows the implications of specific data-entry formats for the data available to the organization to address tactical needs. We present propositions derived from theories of philosophy and human cognition to understand the relation between institutions, organizational styles, and effective data-entry practices. As reported in our case study, allowing some degree of freedom can prove beneficial in addressing tactical needs if effective data collection strategies are put in place by the organization. By adopting such practices, organizations can leverage their data to address needs that may not have been anticipated at the time of the system's development. Moreover, it allows the organization to adapt such information to a different context.

In this paper, we do not argue in favor of unstructured notes over structured notes. Rather, we argue that for certain applications, although structured information has the advantages of consistency and ease of integration, it may hinder user input. Institutional theories and psychology allow us to bridge practices in specific settings with broader organizational, cultural, and societal contexts (King, Gurbaxani, Kraemer, McFarlan, Raman and Yap 1994; Orlikowski and Barley 2001).

REFERENCES

[1] M.-C. Boudreau, D. Robey, Enacting integrated information technology: A human agency perspective, Organization Science, 16 (2005) 3-18.
[2] Y. Wand, R. Weber, On the deep structure of information systems, Information Systems Journal, 5 (1995) 203-223.
[3] G. DeSanctis, M.S. Poole, Capturing the complexity in advanced technology use: Adaptive structuration theory, Organization Science, 5 (1994) 121-147.
[4] A. Burton-Jones, C. Grange, From use to effective use: a representation theory perspective, Information Systems Research, 24 (2012) 632-658.
[5] M. Berg, E. Goorman, The contextual nature of medical information, International Journal of Medical Informatics, 56 (1999) 51-60.
[6] M. Berg, Implementing information systems in health care organizations: myths and challenges, International Journal of Medical Informatics, 64 (2001) 143-156.
[7] J. Gantz, D. Reinsel, Extracting value from chaos, IDC iView, 1142 (2011) 1-12.
[8] M.C. Tremblay, D.J. Berndt, S.L. Luther, P.R. Foulis, D.D. French, Identifying fall-related injuries: Text mining the electronic medical record, Information Technology and Management, 10 (2009) 253-265.
[9] T. Sørlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications, Proceedings of the National Academy of Sciences, 98 (2001) 10869-10874.

[10] K. Larsen, C.H. Bong, A tool for addressing construct identity in literature reviews and meta-analyses, MIS Quarterly, (2016).
[11] A. Castillo, A. Castellanos, M.C. Tremblay, Improving Case Management via Statistical Text Mining in a Foster Care Organization, in: Advancing the Impact of Design Science: Moving from Theory to Practice, Springer, 2014, pp. 312-320.
[12] S. Luther, D. Berndt, D. Finch, M. Richardson, E. Hickling, D. Hickam, Using statistical text mining to supplement the development of an ontology, Journal of Biomedical Informatics, 44 (2011) S86-S93.
[13] R.L. Jepperson, Institutions, institutional effects, and institutionalism, The New Institutionalism in Organizational Analysis, 6 (1991) 143-163.
[14] A. Giddens, Central Problems in Social Theory: Action, Structure, and Contradiction in Social Analysis, Univ of California Press, 1979.
[15] W.H. Sewell Jr, A theory of structure: Duality, agency, and transformation, American Journal of Sociology, (1992) 1-29.
[16] E.C. Hughes, The ecological aspect of institutions, American Sociological Review, 1 (1936) 180-189.
[17] S.R. Barley, P.S. Tolbert, Institutionalization and structuration: Studying the links between action and institution, Organization Studies, 18 (1997) 93-117.
[18] P.J. DiMaggio, W.W. Powell, The iron cage revisited: Institutional isomorphism and collective rationality in organizational fields, American Sociological Review, (1983) 147-160.
[19] G. Lakoff, Women, Fire, and Dangerous Things, University of Chicago Press, Chicago, (1987).
[20] E. Roach, B.B. Lloyd, J. Wiles, E. Rosch, Principles of categorization, (1978).
[21] E.E. Smith, D.L. Medin, Categories and Concepts, Harvard University Press, 1981.
[22] E.E. Smith, Concepts and thought, The Psychology of Human Thought, (1988) 19.
[23] J. Parsons, An Information Model Based on Classification Theory, Management Science, 42 (1996) 1437-1453.
[24] J.A. Fodor, Concepts: Where Cognitive Science Went Wrong, Clarendon Press, 1998.
[25] G.L. Murphy, The Big Book of Concepts, MIT Press, 2004.
[26] J. Corter, M. Gluck, Explaining basic categories: Feature predictability and information, Psychological Bulletin, 111 (1992) 291-303.
[27] R. Lukyanenko, J. Parsons, Y.F. Wiersma, The IQ of the Crowd: Understanding and Improving Information Quality in Structured User-Generated Content, Information Systems Research, 25 (2014) 669-689.
[28] J.G. Walls, G.R. Widmeyer, O.A. El Sawy, Building an information system design theory for vigilant EIS, Information Systems Research, 3 (1992) 36-59.
[29] K.M. Eisenhardt, Building theories from case study research, Academy of Management Review, 14 (1989) 532-550.


[30] M.A. Hearst, Untangling text data mining, in: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, Association for Computational Linguistics, 1999, pp. 3-10.
[31] G. Adomavicius, R. Sankaranarayanan, S. Sen, A. Tuzhilin, Incorporating contextual information in recommender systems using a multidimensional approach, ACM Transactions on Information Systems (TOIS), 23 (2005) 103-145.
[32] K. Sparck Jones, Automatic indexing, Journal of Documentation, 30 (1974) 393-432.
[33] S.K. Kachigan, Statistical Analysis: An Interdisciplinary Introduction to Univariate & Multivariate Methods, Radius Press, 1986.

[34] J.L. Fleiss, B. Levin, M.C. Paik, Statistical Methods for Rates and Proportions, John Wiley & Sons, 2013.
[35] O. De Vel, A. Anderson, M. Corney, G. Mohay, Mining e-mail content for author identification forensics, ACM SIGMOD Record, 30 (2001) 55-64.
[36] K. Sparck Jones, A statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, 28 (1972) 11-21.
[37] J.L. King, V. Gurbaxani, K.L. Kraemer, F.W. McFarlan, K. Raman, C.-S. Yap, Institutional factors in information technology innovation, Information Systems Research, 5 (1994) 139-169.
[38] W.J. Orlikowski, S.R. Barley, Technology and institutions: what can research on information technology and research on organizations learn from each other?, MIS Quarterly, 25 (2001) 145-165.


Developing a Dependency Descriptive Entity Relationship Diagram (DDERD)

Andrew Harrison, University of Cincinnati, [email protected]
Narayan S. Umanath, University of Cincinnati, [email protected]

ABSTRACT

The conventional approach to database design follows a process from conceptual modeling to logical modeling and ultimately to physical design. Typically, the conceptual model is built as an ERD from business rules, and the data model is not validated through the normalization process until the logical layer of design. We provide an ER grammar and modeling method, referred to as the Dependency Descriptive Entity Relationship Diagram (DDERD), that transfers database validation processes into the conceptual layer of design. Specifically, the DDERD includes the functions of the Directed-Arc Model and Dependency Diagrams, which are traditional tools used to normalize a model during the logical layer of design. The DDERD extends the conceptual design process to include the validation of relationships between entities and between attributes. This approach provides a better understanding of conceptual semantics for novice database designers and leads to a faster, more accurate database design.

Keywords

Data, Database, Design, Modeling, ERD, Conceptual Model

INTRODUCTION

In this paper, we propose a new modeling grammar and method that combines an Entity Relationship Diagram (ERD), a directed-arc model, and a dependency diagram. We name this model a Dependency Descriptive Entity Relationship Diagram (DDERD). The purpose of the DDERD model is to help designers better conceptualize the underlying business rules and to improve the revision process of database design. The design of databases emerged with the advent of the ERD (Chen, 1976), and over time a conventional "top-down" approach has dominated recommendations for database design methods (Batini et al., 1986; Batini et al., 1989; Riccardi, 2003; Date, 2004; Gillenson, 2005; Post, 2005; Watson, 2005; Frost and Van Slyke, 2006; Kroenke, 2006; Rob and Coronel, 2006; Elmasri and Navathe, 2007; Hoffer et al., 2007; Kifer et al., 2007; Mannino, 2007; Pratt and Adamski, 2008; Ullman and Widom, 2008; Teorey et al., 2011; Thalheim, 2013; Umanath and Scamell, 2015). In this conventional approach, business rules are derived from process experts, a conceptual model is developed using ERDs, the logical schema is mapped, and then normalization rules are applied and the model is revised accordingly (Philip, 2007). Despite the maturation of design principles and relatively standard grammars and approaches to database design, significant challenges remain for database designers in applying good modeling and implementation practices.

In particular, non-expert data modelers generally have a low level of competence, finding mathematical frameworks and data modeling practices overly complex or abstract (Mendling et al., 2010). Many data modeling approaches focus on improving the accuracy of data modeling without considering the processes through which the models are developed, making them ill-suited for practice (Moody, 2005). Thus, good database design practices are difficult and often fall into disuse within firms (Batra and Marakas, 1995). We believe that these problems may stem from inconsistencies between the semantics and pragmatics associated with data models (Bera et al., 2014). While most experts take a holistic approach to database development, novices design databases differently and resolve errors through an iterative process of improvement (Batra and Davis, 1992). Furthermore, novices tend to struggle when designing complex elements like high-degree relationships, inheritance, and normalization (Mendling et al., 2010). Consequently, novice modelers must often repeat steps and revise the conceptual model after a litany of issues is discovered during logical modeling.

We propose a new modeling grammar and method that may diminish these challenges by combining conceptual and logical modeling concepts in a pictorial form. Pictorial approaches are easier to digest for non-expert database developers, who frequently make errors during the design process (Rob and Coronel, 2007). Lessons from software design have indicated that when designing complex systems, it is less costly to resolve errors early in design and "fail fast" (Shore, 2004; Fowler, 2005). Similarly, competent conceptual modeling detects and corrects errors early in the development process, when changes cost less, consume less time, and require less effort to fix (Wand and Weber, 2002). In combining these models, we intend to: (1) make logical modeling more accessible to novice database modelers, and (2) improve the accuracy and pace of iterative database model development.

DDERD Modeling Grammar

Conceptual modeling grammar provides instructions for how to use symbols to represent the real-world domain. These grammars provide rules such as using constructs for entities and relationships, and other rules which dictate that entities can only be connected through relationships (Batra and Davis, 1992). We build on common ER grammar where an entity is represented as a rectangle and relationships are represented with diamonds. The degree of the relationship is assessed by the number of entities connected through it. Deletion rules associated with a relationship are labeled (R-Restrict, C-Cascade, N-Set Null, D-Set Default) alongside the lines connecting the entities. Attributes are represented as circles connected to the entities they describe. Solid circles represent mandatory attributes while optional attributes are represented by unfilled circles.

For denoting cardinality and participation we use Min/Max notation rather than Chen's notation (Chen, 1976) or crow's foot notation (Everest, 1986). In comparison to other forms of notation, Min/Max offers greater precision for business rules and results in fewer semantic integrity constraints that cannot be represented in pictorial form (Umanath and Scamell, 2015). Furthermore, look-across semantics are a frequent source of error (Feinerer and Salzer, 2007), whereas Min/Max notation is easy to interpret. For example, Min/Max notation is superior for representing the relationship between a building and an apartment where each building contains between five and eight apartments. We use this modeling grammar to create an ERD that is densely packed with integrity constraints and reflects the dependencies and interrelationships between attributes. When developed using a systematic method, we believe this type of modeling grammar will improve the revision process for novice database developers.
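For readers less familiar with Min/Max pairs, the sketch below expresses the building/apartment rule from the text as data and checks instance counts against it. The class and function names are illustrative, and a look-here reading of the (min, max) pair is assumed (the pair is attached to the entity whose participation it restricts); conventions vary across textbooks.

```python
# Illustrative encoding of a Min/Max pair (look-here reading assumed): the pair
# attached to an entity bounds how many relationship instances it participates in.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Participation:
    entity: str
    min_card: int             # minimum participation
    max_card: Optional[int]   # maximum participation; None means "many"

building_side = Participation("BUILDING", 5, 8)    # each building has 5 to 8 apartments
apartment_side = Participation("APARTMENT", 1, 1)  # each apartment belongs to exactly one building

def satisfies(p: Participation, observed_count: int) -> bool:
    """Check an observed participation count against the (min, max) pair."""
    within_max = p.max_card is None or observed_count <= p.max_card
    return observed_count >= p.min_card and within_max

print(satisfies(building_side, 4))  # False: only four apartments violates "at least five"
print(satisfies(building_side, 6))  # True
```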

Our additions to this standard modeling grammar are the representations of foreign keys and of semantically obvious functional dependencies. These features replicate the functionality of the directed-arc model and the dependency diagram, respectively. Each relationship (represented as a diamond) requires a foreign key, with the same attribute type and field length, on the child side of the relationship, with its name written in italics. The inclusion of foreign keys makes complex relationships like gerunds and identifying relationships (e.g., weak entities) easier to understand and model correctly. In a conventional database modeling approach, foreign keys are not included until the logical model, but we incorporate the identification of foreign keys into the DDERD to aid in the semantic interpretation of complex relationships.

We also include a means to denote any semantically obvious functional dependencies. A functional dependency is a relationship between two attributes where one attribute determines the other (Housel et al., 1979; Connolly and Begg, 2005) and is the basis of normalization from second (2NF) through Boyce-Codd normal forms (BCNF). Semantically obvious functional dependencies are those that can be described by, or derived from, business rules (Umanath and Scamell, 2015), and would not require an instance diagram to interpret. Desirable functional dependencies, and the partial and transitive dependencies that violate 2NF, 3NF, and BCNF, are displayed in the DDERD.

In contrast, the mathematical meaning of functional dependencies is often difficult for novices to interpret (Philip, 2007). Therefore, we suggest modelers focus on the conceptual meaning of the functional dependency as a means to identify other semantically obvious functional dependencies. To do this, modelers should ask: "If given the value of attribute1, would I know the value of attribute2 with absolute certainty?" Thus, the modeler is conceptually analyzing whether a functional dependency exists where attribute1 → attribute2. For example, given the value of a Patient_ID, one knows the patient's last name with absolute certainty, because Patient_ID → Last_Name. Research suggests that normalization can be done effectively during the conceptual layer of design (Ling, 1985; Hussain et al., 2005), and the DDERD grammar is designed to facilitate that approach. As displayed in Figure 1, if semantically obvious functional dependencies exist, they are mapped onto the DDERD with a dashed line and arrow. The stem of the arrow is the determinant(s) and the arrowhead points to the dependent. This grammar can be used to identify and resolve any undesirable functional dependencies.
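As a complement to the semantic judgement described above, the hypothetical sketch below checks a candidate functional dependency against instance data: attribute1 → attribute2 holds in the data only if every value of attribute1 maps to a single value of attribute2. The column names and rows are invented for illustration; instance data can refute, but never prove, a semantically obvious dependency.

```python
# Hypothetical data-level check of a candidate functional dependency:
# determinant -> dependent holds in the data only if each determinant value
# maps to a single dependent value.
import pandas as pd

def holds_fd(df: pd.DataFrame, determinant: str, dependent: str) -> bool:
    return bool((df.groupby(determinant)[dependent].nunique() <= 1).all())

patients = pd.DataFrame({
    "Patient_ID": [1, 1, 2, 3],
    "Last_Name":  ["Smith", "Smith", "Jones", "Lee"],
    "Visit_Date": ["2017-01-05", "2017-02-10", "2017-01-05", "2017-03-01"],
})

print(holds_fd(patients, "Patient_ID", "Last_Name"))  # True: Patient_ID -> Last_Name
print(holds_fd(patients, "Visit_Date", "Last_Name"))  # False: a date does not determine a name
```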

Figure 1. Example of DDERD Modeling Grammar


DDERD MODELING METHOD

The method details which procedures to use with the grammar to effectively model the domain of interest. We apply a database design process based on the recommendations of Umanath and Scamell (2015). This process begins with soliciting business rules during requirements specification and analyzing these business rules for important objects and relationships (Rob and Coronel, 2007). Next, a conceptual model is built and revised. After the conceptual model, a series of logical models (e.g., the Directed-Arc Model, the Dependency Diagram, and the Information-Preserving Logical Model) are used to validate and finalize the database design (Umanath and Scamell, 2015). The directed-arc model is a relational schema useful for identifying candidate keys and foreign keys, while the dependency diagram can be used to normalize the model. Errors found through the directed-arc model and dependency diagram can result in significant changes to attributes and relations. Our method deviates from these recommendations by integrating the Directed-Arc Model and the Dependency Diagram into the conceptual model. These changes shift the majority of normalization to the conceptual model rather than the logical model.

We believe that conceptual modeling is a more appropriate time to analyze semantically obvious functional dependencies and relationships between entities. The semantically obvious functional dependencies and relationships between entities would then illustrate the business rules being modeled. Our reason for emphasizing normalization in the conceptual model rather than the logical model is that normalization largely deals with conceptual issues. Violations of 1NF through BCNF are resolved by decomposing a single relation into multiple relations. Conceptually, this means that what was originally conceived as a single entity should be modeled as more than one entity with a relationship between them.

We believe these model misspecifications can be addressed during the conceptual phase of design, when revision is less costly (Wand and Weber, 2002). Within the DDERD, these conceptual modeling errors will manifest as a single attribute with multiple arrows (i.e., semantically obvious functional dependencies) pointed at it. The attribute "PRESCRIPTION.Producer" in Figure 1 indicates that this type of error exists. When encountering evidence of model misspecification, the modeler should consider how that attribute is related to the associated entity and will often uncover that an entity is missing from the diagram. In this example, the modeler would recognize that the producer manufactures a prescription drug, which is a different entity than a prescription given to a patient by a doctor. This realization will improve the modeler's conceptual understanding of the processes and will likely elicit other germane attributes, entities, and relationships. Graphical conceptual models are particularly well-suited to eliciting this type of conceptual improvement to models (Rob and Coronel, 2007).

Furthermore, relationships between entities have conceptual meaning and are directly attributed to the business processes being mapped (Valacich et al., 2015). While the conventional approach indicates that these issues should be resolved at the logical layer of modeling, after a primary key has been defined (Umanath and Scamell, 2015), we believe many problems associated with incorrect or missing relationships could be better addressed within a conceptual model that includes foreign key attributes. We believe that the explicit inclusion of foreign keys will help modelers understand complex relationships that may occur in situations that require gerund or weak entities. In these scenarios, a weak or gerund entity depends on another entity to exist. Mathematically, this means that the gerund or weak entity contains a composite primary key with one or more of the primary key's components being a foreign key. However, these concepts are difficult for novice modelers to understand, and therefore a pictorial conceptual approach may be more useful. Consequently, we believe it will be easier to conceptually understand how entities are related if foreign keys are explicitly modeled for each entity. When a suitable foreign key does not exist, or does not meet the grain of the child entity, the relationship between entities is misspecified. Accordingly, when modelers encounter foreign key errors in the DDERD model, we ask them to reflect on the nature of the conceptual relationship between the entities. This typically means that there is a missing entity that acts as a bridge between the two entities. When used properly, we believe the DDERD approach can yield a faster, easier revision process for novice database designers to create accurate conceptual data models.

Figure 2. Comparison of Conventional and DDERD Approaches

BENEFITS OF DDERD MODELING

The practical benefits of this approach are best elucidated with an example that illustrates the shortcomings of the conventional approach and the benefits of the DDERD method described above. In this example, we consider a scenario where a patient is given some medicine. A novice database designer will typically convert the nouns ("patient" and "medicine") into entities with a relationship based on the verb ("is given") between them.

Figure 3. Example of Model Specification with Example Data

However, the ambiguous business requirements illustrate a common modeling mistake made by novices. The conceptual model assigns attributes from two distinct constructs ("Prescription" and "Drug_Dosage") to the same entity. This issue can occur when business requirements are unclear or the database modeler lacks the skill to correctly specify the model. In this scenario, medicine can alternately be interpreted as a single dose of medicine, a prescription for medicine, or a type of medication. The same modeling problem arises when modeling most complex systems. For example, a scholarship may refer to a scholarship payment, a scholarship fund, or a scholarship award.

Frequently, this type of conceptual error is not rectified until the logical layer, when the dependencies between attributes are considered. At that point, the database designer will typically decompose the table with little thought as to the conceptual meaning of that change. In this example, a transitive dependency (i.e., Name → Producer) violates 3NF and would be decomposed into a separate relation with Name as the candidate key. Furthermore, novice modelers may even attempt to decompose an unnecessary intermediate key relation with the attributes (Rx_#, Pat_#, Drug_name, Date) to resolve transitive dependencies caused by using Rx_# as a surrogate key. These solutions would be mathematically correct, but they illustrate a poor understanding of the associated semantic and pragmatic issues. We posit that developing a strong conceptual understanding of what these changes mean as business rules is critical for novice modelers and can be stimulated through the assessment of functional dependencies within the ERD. In situations where undesirable functional dependencies exist, there can be significant alterations to the model, and understanding the conceptual meaning of these changes may draw attention to other pertinent attributes or relationships.

Figure 4. Conventional Resolution to Model Misspecification with Example Data
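As a rough, data-level illustration of the conventional resolution in Figure 4, the sketch below splits the transitive dependency Name → Producer into its own relation. The rows, producer values, and exact column names are hypothetical and only approximate the attributes mentioned in the text.

```python
# Hypothetical rows illustrating the conventional 3NF resolution: the transitive
# dependency Drug_name -> Producer is split out into its own relation.
import pandas as pd

prescription = pd.DataFrame({
    "Rx_#":      [101, 102, 103],
    "Pat_#":     [1, 2, 1],
    "Drug_name": ["Adderall", "Vyvanse", "Adderall"],
    "Producer":  ["Maker_X", "Maker_Y", "Maker_X"],   # depends on Drug_name, not on Rx_#
    "Date":      ["2017-01-05", "2017-01-07", "2017-02-10"],
})

# DRUG(Drug_name, Producer) with Drug_name as candidate key ...
drug = prescription[["Drug_name", "Producer"]].drop_duplicates().reset_index(drop=True)
# ... and PRESCRIPTION keeps Drug_name as a foreign key into DRUG.
prescription_3nf = prescription.drop(columns=["Producer"])

print(drug)
print(prescription_3nf)
```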

In this example, the database modeler had conceived "medicine" as a prescription and only finds a problem when examining the dependency diagram and noticing a transitive dependency violating 3NF. Had the modeler considered the functional dependencies between attributes before moving on to the logical layer of design, it would have become obvious that a problem existed. Then, the violations of 3NF could be addressed while the model is still being conceptually developed, leading to a greater understanding of the semantic meaning of the entities, attributes, and relationships. Here, it is likely that this modeling error would have stimulated additional conceptualization about the differences between a prescription, dosage instructions, and a drug. These types of conceptual revisions are not intuitive for novice modelers and would not be apparent when revising the model solely using the algebraic principles of normalization. We believe that stimulating thought about the conceptual distinctions between misspecified entities addresses both the semantics of modeling and the pragmatic concerns about how people learn data modeling (Bera et al., 2014). We also posit that this semantic and pragmatic approach would kindle additional model revisions and result in more accurate data models. For example, if a defining distinction between a "prescription" and a "drug" is that a prescription is given by a physician, the modeler would not only separate the two entities that were incorrectly combined, but would also recognize that the prescription would have a relationship with a physician construct whereas the drug itself would not. Stimulating this type of pragmatic conceptual analysis would result in a greater understanding of the business requirements by the modeler and, ultimately, a more accurate model (Wand and Weber, 2002). Consequently, we believe that the model accuracy would be as good as, or better than, that of a model developed using a conventional approach, but the revision process would occur more rapidly and would present a more accessible approach for novices to model confusing business requirements.

Figure 5. DDERD Resolution to Model Misspecification with Example Data

CONCLUSION AND FUTURE RESEARCH

We have presented a new DDERD modeling grammar and method intended to improve the design process for novice database designers. We believe that the DDERD approach takes into account pragmatic factors (Bera et al., 2014) and improves upon the strategies novice database developers use to create their models (Batra and Davis, 1992; Mendling et al., 2010). The major benefit of this approach is that it provides a faster way of clarifying inaccurate or ambiguous business requirements and translating them into accurate models by applying a "fail fast" approach to database design and data modeling. This position contrasts with the conventional assumption that, because the pictorial model is equivalent to the mathematical functional dependencies, the model will be designed correctly initially (Chen, 1976). Decades of research have indicated that while a conceptual model does maintain functional equivalency with the logical model, it is rarely modeled correctly during initial design (Batra and Marakas, 1995; Bera et al., 2014; Philip, 2007; Mendling et al., 2010; Umanath and Scamell, 2015). Consequently, we believe that a database design process that plans for, and optimizes, revision provides a more prudent approach. We believe the benefits of this approach will be particularly valuable to novice database modelers, who have been found to use an iterative approach to database design (Batra and Davis, 1992) and do not fully comprehend the mathematical underpinnings of the relational model (Mendling et al., 2010). We do not expect the same benefits for expert designers, who make fewer modeling mistakes and need fewer iterations before coalescing on an appropriate relational model.

The main potential drawback of this approach is that it shifts cognitive load from the logical layer of modeling to the conceptual model. While we assert that people will typically learn concepts faster when using a pictorial approach (Rob and Coronel, 2007), the approach will increase the cognitive load associated with conceptual modeling. However, we expect that the overall effort required to complete the database design will be reduced by removing the need for much of the most difficult mathematical modeling. Our perspective is built upon the assumptions that the database will be designed incorrectly initially and will go through a series of revisions before being modeled correctly, and that the cost of these revisions is reduced when potential problems are fixed earlier in the design process (Shore, 2004; Fowler, 2005).

In the future, we intend to present the conventional approach and the DDERD approach to multiple sets of novice database designers to empirically evaluate their efficacy when developing database solutions (Siau and Rossi, 2011). To test our propositions, we intend to use two sections of a course and teach database design using the conventional approach in one section and the DDERD approach in the other. We will use data from various homework assignments to assess the amount of effort, measured in the number of revisions and the amount of time spent, for each set of students. We will determine the accuracy of the designs by counting the number of modeling mistakes for each group. We will then compare the results to determine if any significant differences exist.

One potential limitation of this method is that students may communicate across sections. Additionally, the DDERD approach was developed for novice database developers and may not be as useful for data modelers with significant experience. Experts already consider holistic aspects of their model, such as the conceptual ramifications of changes made in the logical phase of design (Batra and Davis, 1992), and may not find as much utility in the DDERD approach as novice modelers would. One related potential limitation is that conceptual models using the DDERD become increasingly complex for difficult domains. Thus, the scalability of the DDERD approach when used on simple and complex systems will need to be considered in future work. To evaluate these potential limitations and the general effectiveness of the DDERD modeling approach for improving database modeling and design, we believe the proposed grammar and method should be empirically tested in a variety of contexts.

REFERENCES

1. Batini, C., Lenzerini, M., and Navathe, S. B. (1986) A comparative analysis of methodologies for database schema integration. ACM computing surveys (CSUR), 18, 4, 323-364.

2. Batini, C., Ceri, S., and Navathe, S. (1989) Entity Relationship Approach. Elsevier Science Publishers BV (North Holland).

3. Batra, D., and Davis, J. G. (1992) Conceptual data modelling in database design: similarities and differences between expert and novice designers. International Journal of Man-Machine Studies, 37, 1, 83-101.

4. Batra, D., and Marakas, G. M. (1995) Conceptual data modelling in theory and practice. European Journal of Information Systems, 4, 3, 185-193.

5. Bera, P., Burton-Jones, A., and Wand, Y. (2014) Research note—How semantics and pragmatics interact in understanding conceptual models. Information Systems Research, 25, 2, 401-419.

6. Chen, P. P. S. (1976) The entity-relationship model—toward a unified view of data. ACM Transactions on Database Systems (TODS), 1, 1, 9-36.

7. Connolly, T., and Begg, C. (2005) Database systems: A practical guide to design, implementation, and management. Reading, MA: Addison Wesley.

8. Date, C. J. (2004) An introduction to database systems. Reading, MA: Addison-Wesley.

9. Elmasri, R., and Navathe, S. (2007) Fundamentals of database systems. Reading, MA: Addison-Wesley.

10. Feinerer, I., and Salzer, G. (2007) Consistency and minimality of UML class specifications with multiplicities and uniqueness constraints, Proceedings of the First Joint IEEE/IFIP Symposium on Theoretical Aspects of Software Engineering (TASE), 411-420.

11. Fowler, M. (2005). The state of design [software design]. IEEE Software, 22(6), 12-13.

12. Frost, R., and Van Slyke C. (2006) Database design and development: A visual approach. Upper Saddle River, NJ: Prentice Hall.

13. Gillenson, M. L. (2005) Fundamentals of database management systems. Danvers, MA: Wiley.

14. Hoffer, J. A., Prescott, M. B., and McFadden, F. R. (2007) Modern database management. Reading, MA: Addison-Wesley.

15. Housel, B. C., Waddle, V., and Yao, S. B. (1979) The functional dependency model for logical database design, Proceedings of the Fifth International Conference on Very Large Data Bases, 194-208.

16. Hussain, T., Shamail, S., and Awais, M. M. (2003) Eliminating process of normalization in relational database design, Proceedings of INMIC 2004, 645-649.

17. Kifer, M., Bernstein, A., and Lewis, P. (2006) Database systems: An application-oriented approach. Reading, MA: Addison-Wesley.

18. Kroenke, D.M. (2006) Database processing: Fundamentals, design, and implementation. Upper Saddle River, NJ: Prentice Hall.

19. Ling, T. W. (1985) A Normal Form For Entity-Relationship Diagrams. ER, 1985, 24-35.

20. Mannino, M.V. (2007) Database: Design, application development, and administration. New York: McGraw-Hill.

21. Mendling, J., Reijers, H. A., and van der Aalst, W. M. (2010) Seven process modeling guidelines (7PMG). Information and Software Technology, 52, 2, 127-136.

22. Moody, D. L. (2005) Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions. Data and Knowledge Engineering, 55, 3, 243-276.

23. Philip, G. C. (2007) Teaching database modeling and design: areas of confusion and helpful hints. Journal of Information Technology Education, 6, 481-497.

24. Post, G. V. (2005) Database management systems. New York: McGraw-Hill.

25. Pratt, P., and Adamski, J. (2008) Concepts of database management. Boston: Course Technology.

26. Riccardi, G. (2003). Database management with web site development applications. Reading, MA: Addison Wesley.

27. Rob, P., and Coronel, C. (2006) Database systems: Design, implementation, and management. Boston: Course Technology.

28. Shore, J. (2004). Fail fast [software debugging]. IEEE Software, 21(5), 21-25.

29. Siau, K., and Rossi, M. (2011) Evaluation techniques for systems analysis and design modelling methods–a review and comparative analysis. Information Systems Journal, 21, 3, 249-268.

30. Teorey, T. J., Lightstone, S. S., Nadeau, T., and Jagadish, H. V. (2011) Database modeling and design: logical design. Elsevier.


31. Thalheim, B. (2013) Entity-relationship modeling: foundations of database technology. Springer Science and Business Media.

32. Ullman, J., and Widom, J. (2008) A first course in database systems. Upper Saddle River, NJ: Prentice Hall.

33. Umanath, N. and Scamell, R. (2015) Data modeling and database design, 2nd ed., Boston: Course Technology.

34. Valacich, J. S., George, J. F., and Hoffer, J. A. (2015) Essentials of systems analysis and design. Pearson Education.

35. Wand, Y., and Weber, R. (2002) Research commentary: information systems and conceptual modeling—a research agenda. Information Systems Research, 13(4), 363-376.

36. Watson, R. T. (2005) Data management: Databases and organizations. Danvers, MA: Wiley.


Conceptual Data Models and Narratives: A Tool to Help the Tool

Merete Hvalshagen, University of Dayton, [email protected]
Binny M. Samuel, University of Cincinnati, [email protected]
Roman Lukyanenko, University of Saskatchewan, [email protected]

ABSTRACT

We aim to tackle an important challenge related to the use of conceptual models: providing additional support to non-technical users. Specifically, we build on prior research in psychology arguing that providing concrete examples of generalized propositions can improve problem understanding and performance. We propose combining conceptual models with narratives containing specific statements about the application domain. We conducted a laboratory experiment that supports our general claim that supplementing conceptual data models with narratives increases understanding of those models (specifically, of cardinality constraints). We then propose a strategy for finding effective applications of narratives. The paper concludes by outlining directions for future research.

Keywords

Conceptual data models, narratives, cardinality constraints, pragmatic quality, artifact sampling.

INTRODUCTION

Conceptual data models are fundamental objects for communication and information sharing among stakeholders. While they may be designed by information systems (IS) professionals (e.g., systems analysts), they are frequently used by people with non-technical backgrounds (e.g., business users) for the purposes of creation, communication, and verification (Dobing and Parsons 2006). This introduces the challenge of ensuring that non-technical users, who may not be familiar with the techniques and methods employed to create conceptual models, are able to work with them effectively. In fact, practitioner surveys indicate that understandability for non-technical stakeholders is a major concern when organizations are considering adopting conceptual data modeling tools (Davies, Green, Rosemann, Indulska and Gallo 2006).

BACKGROUND

For this study, we first look at why conceptual data models can be difficult to understand (Batra, Hoffer and Bostrom 1990; Topi and Ramesh 2002), and second at how narratives can aid the comprehension of such models, in particular for non-technical business users.

Pragmatic Quality in Conceptual Data Models

The first challenge of comprehending conceptual data models is that such models usually require specific IS training in order to be understood (Khatri, Ramesh, Vessey and Clay 2006). This makes such models particularly inaccessible to business users, who commonly lack such training.

While training business users is possible, the cost of training, combined with possibly poor retention due to infrequent exposure to such models, could make training less desirable. This study instead suggests aiding the comprehension of conceptual data models through narratives that describe features of the depicted domain in "plain language". Prior research has shown that narratives are often employed in the systems development life cycle (Alexander and Maiden 2004) because narratives are an understandable and effective way of sharing knowledge (Schank and Abelson 1995).

We believe pairing conceptual data models with narratives written in “plain language” can make these models more accessible to a broader audience, especially people without much IS training. If so, the combined representation could enhance the pragmatic quality of the domain representations, where pragmatic quality is the extent to which the representation is understood by all relevant stakeholders (Lindland, Sindre and Solvberg 1994).

Norman’s Gulf of Evaluation

Pairing conceptual data models with narratives could also address the second challenge in interpreting conceptual data models: resolving abstractions. When an IS professional creates a conceptual data model, many details of the domain are "abstracted" away in order to create a general and simplified model. However, for a business user to apply the conceptual model to gain an understanding of the domain, this process of abstraction must be reversed. This means relating the abstract elements of the model to real-world instances and examples. This process of applying an abstract representation to recreate a concrete state of affairs can be challenging and is referred to as the gulf of evaluation (Norman 1986).

Figure 1: Norman’s Gulf of Evaluation. Adapted from Norman (1986).

Interpreting conceptual data models becomes problematic when users struggle to invoke suitable real-world instances (Goldstone et al. 2008; Nathan 2012). Pairing conceptual data models with narratives could help users make the transition between an abstract model and a concrete application domain.

Comprehending Cardinality Constraints

When interpreting conceptual data models, users often struggle in particular with comprehending cardinality constraints (Dunn, Gerard and Grabski 2011; Gemino and Wand 2005). Cardinality constraints often represent important business rules of the domain, for example "Can a customer have multiple accounts?" While prior research has underlined the importance of comprehending cardinality constraints (Currim and Ram 2012; Rosca, Greenspan and Wild 2002), it has not thoroughly considered how well individuals understand the semantics of the business rules expressed through these constraints.
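To make the business-rule reading of a cardinality constraint concrete, the hypothetical sketch below checks the rule behind "Can a customer have multiple accounts?" against instance data; the table and rule are illustrative and are not part of the study's materials.

```python
# Hypothetical instance data for the business rule behind a cardinality
# constraint: "Can a customer have multiple accounts?"
import pandas as pd

accounts = pd.DataFrame({
    "account_id":  [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 3],
})

accounts_per_customer = accounts.groupby("customer_id").size()

max_accounts = 1   # the rule "a customer may have at most one account"
violations = accounts_per_customer[accounts_per_customer > max_accounts]
print(violations)  # customer 1 holds two accounts, so a maximum cardinality of 1 would be violated
```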

THEORY AND RESEARCH MODEL

Narratives are possibly the oldest, and often the most effective, way of sharing knowledge (Schank and Abelson 1995; Read and Miller 1995). Because narratives are effective, they have been adopted throughout the systems development lifecycle for a range of different purposes (Alexander and Maiden 2004). For this research, we focus on two characteristics in particular. First, narratives can be understood by "everyone". When paired with conceptual data models, narratives could therefore enhance the pragmatic quality of the combined domain representation. Second, narratives provide context in terms of instances (i.e., examples), something that abstract conceptual data models lack. The next section investigates effective ways of combining a conceptual data model with narratives.

Cognitive Load Theory

A stream of research where the main goal is to design understandable material for learning and instruction (Paas, Renkl and Sweller 2003) has been based on Sweller’s Cognitive Load Theory (henceforth CLT) (Sweller 1994; van Merriënboer and Sweller 2005). CLT has also gained popularity as a theoretical foundation for IS research (Browne and Parsons 2012).

The foundation of CLT is that the capacity of our working memory is limited, which impedes our ability to process complex material. If the demands imposed by the incoming information are beyond our cognitive processing capacity, a state of cognitive overload occurs and there will be a breakdown in comprehension (Sweller 2010). To avoid such a breakdown, one should carefully select what information to include in learning material, in order to deliberately manage the sources of cognitive complexity in the material.

There are three sources of cognitive complexity in material: intrinsic, extraneous, and germane. Intrinsic complexity is an inherent characteristic of the subject matter, for example the complexity of a specific math function. Extraneous and germane complexity stem from extra information we choose to include in the learning material, for example to better illuminate the subject matter. Examples of extra information are definitions, explanations, examples, illustrations, etc. If the extra material helps a person to better comprehend the subject matter, it is called germane. However, if the extra information is irrelevant, confusing, redundant, etc., it is called extraneous.

Cognitive load, regardless of the source, is additive: total cognitive load is the sum of the loads imposed by intrinsic, germane, and extraneous information sources (Sweller 2006). The focus of this research is on regulating the amount and type of germane information in the material. Adding germane content to information material will increase the overall cognitive load, as such material will need to be processed. However, this increase in cognitive load should have an overall positive effect on comprehension as long as a) the material is truly germane to understanding, b) the person has spare cognitive capacity, and c) he/she applies this capacity to process it.

Combining Abstract Representations with Concrete Examples

Prior CLT research has shown that combining abstract concepts with instances (i.e., concrete examples) often leads to a better understanding of the subject matter (Sweller 2006; Paas and van Gog 2006; Renkl and Atkinson 2010). Concrete examples are effective because they support the creation of (cognitive) knowledge schemas (Kalyuga 2006). Knowledge schemas are representations stored in long-term memory that are fundamental to most skilled performances like reading, playing chess, or solving mathematical problems (Ericsson 2006).

Pairing abstract principles with concrete examples supports the creation of knowledge schemas in different ways. First, concrete examples serve as ready-made solutions that novices can emulate and copy (Atkinson, Derry, Renkl and Wortham 2000). Second, examples help novices build their own knowledge schemas through repeated exposure and analogy (Renkl 2005). Lastly, concrete examples encourage knowledge elaboration, where new knowledge is intertwined with one's prior knowledge (Kalyuga 2009).

Hypothesis 1: Narrative as Supplement

In this study, we take advantage of this principle by pairing conceptual data models with concrete narratives. The conceptual data model will serve as the abstract representation of the application domain (Wand et al. 1995), while the narrative will serve as the concrete, exemplified version (Hvalshagen 2011). Combining these two representations should therefore bring about a better understanding of the application domain.

To construct a narrative that serves as a concrete example of cardinality constraints (the focus of this paper), the content of the narrative could focus on explaining and illustrating the different cardinality constraints in the conceptual data model. For example, the narrative could concretely explain that a SALES AGENT might not have any CUSTOMERs signed up (see the highlighted cardinality constraint in Figure 2). The narrative could state: "Lillian has just started out as a sales agent in her district. She therefore has no customers signed up yet."

Figure 2: A simple Conceptual Data Model as an Entity-Relationship Diagram (Chen 1976) using Crow’s Foot Notation.

Prior research has suggested that people struggle more with understanding participation constraints (=minimum cardinality constraints) than with connectivity constraints (=maximum cardinality constraints) (Bodart, Patel, Sim and Weber 2001; Burton-Jones, Clarke, Lazarenko and Weber 2012; Wand, Storey and Weber 1999). However, we had difficulty finding studies that have explored this difference empirically. For our study, we therefore test these two types of cardinality constraints separately. This leads us to our first hypothesis.

Hypothesis 1: Providing narratives as explanatory supplements will help subjects to better understand both connectivity and participation cardinality constraints depicted in the conceptual data model.

Hypothesis 2: Narrative as Problem Scenario

As an alternative to an explanatory supplement, the narrative description can be offered in the form of an assessment task. Research has shown that solving concrete problems is sometimes easier than solving equivalent abstract ones (Baranes, Perry and Stigler 1989; Koedinger, Alibali and Nathan 2008; Koedinger and Nathan 2004). This is because concrete problems allow one to apply existing knowledge to make sense of the situation (Markovits and Vachon 1990). For example, making sense of logical statements like "P implies Q; P is true. Is Q true?" is often easier if the problem is stated in concrete terms like "If it rains, the street will be wet; it rains. Will the street be wet?"

We would therefore like to know if cardinality questions stated as narrative problem scenarios are easier to assess than abstract questions about cardinality. For example, a narrative problem scenario could state: "Matthew is trying to sign up a new customer, his mom's friend Mary. If Mary agrees to become a customer, does she also have to sign up for a subscription plan?" Hence, in addition to employing narratives as explanatory supplements, we are also interested in investigating the effect of narratives in the form of problem scenarios. This leads us to our second hypothesis.

Hypothesis 2: Providing narratives as problem scenarios will help subjects to better understand both connectivity and participation constraints depicted in the conceptual data model.

Combining the Two Types of Narratives

Would combining the two types of narratives lead to better understanding than providing either one alone? To answer that, we need to consider the four different alternatives of employing narratives with conceptual data models:

A. Providing no narrative
B. Narrative as explanatory supplement alone
C. Narrative as problem scenario alone
D. Providing both types of narratives together.


Figure 3: Summary of hypotheses.

With these four different alternatives, we believe that providing one narrative is better than providing no narrative, and that providing a combination of the two narratives is better than providing one narrative alone, see Figure 3.

We believe that combining the two types of narratives (Alternative D) would lead to better performance than either narrative alone (Alternative B or C) due to a better fit between the format of the problem representation (= diagram and narrative) and the format of the task (= problem scenarios) (Vessey 1991; 2006). In particular, it should be easier for subjects to assess cardinality constraints through concrete problem scenarios if they also have access to an explanatory narrative containing similar concrete scenarios. This means that combining narrative problem scenarios with narrative supplements should result in higher performance than providing a narrative supplement alone or providing a narrative problem scenario alone. This leads us to our third and fourth hypotheses.

Hypothesis 3: Combining a narrative supplement with narrative problem scenarios will lead to a better understanding of both connectivity and participation constraints compared to providing a narrative supplement alone.

Hypothesis 4: Combining a narrative supplement with narrative problem scenarios will lead to a better understanding of both connectivity and participation constraints compared to providing narrative problem scenarios alone.

RESEARCH METHOD

To test the hypotheses stipulated above, we designed an experiment. The purpose of the experiment was to assess if both narrative types, narrative as an explanatory supplement and narrative as a problem scenario, had a positive effect on comprehension of cardinality constraints (H1 and H2). We also wanted to assess if combining these two types of narratives would lead to better performance than providing either type of narrative by itself (H3 and H4).

Experimental Design

This experiment had three experimental conditions:

A. Type of cardinality constraint: Connectivity (= maximum cardinality) and participation (= minimum cardinality).

B. Cardinality constraints explained in the narrative supplement: Cardinality constraints depicted in the ER diagram were either explained in the supplementary narrative (= narrative-explained) or not explained in the supplementary narrative (= unexplained).

C. Formulation of assessment task: The questions evaluating comprehension were formulated either as abstract questions or as narrative problem scenarios.

Participants

The study participants were 60 undergraduate business students drawn from a mandatory business course. Average conceptual modeling experience and database experience were low.

Experimental Materials

The experimental material covered one application domain, “Home Selling”, which depicted a business organization for home selling of household products. It consisted of one ER diagram, one narrative supplement to the ER diagram, and two sets of cardinality verification questions, see Figure 4. The material for assessing comprehension of the cardinality constraints contained one set of 16 abstract questions and one set of 16 narrative problem scenarios.


Figure 4: Excerpt from experimental material

Experimental Procedure

The study had four phases. First, participants completed a demographics survey. Second, the participants received a brief introduction to cardinality constraints in conceptual modeling. Third, participants completed a training task on comprehending cardinality constraints similar to the one that was given for the experiment. Fourth, participants received the experimental material and completed the assessment.

Assessing Performance

Comprehension was assessed by awarding one point for each correct answer, and no points for incorrect answers. Scores were calculated in terms of “percentage correct”, with separate performance scores for the 16 abstract questions and for the 16 narrative problem scenarios.
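As a minimal illustration of this scoring scheme (the answer data below are made up; only the "one point per correct answer, expressed as percentage correct" rule comes from the study):

```python
def percentage_correct(answers, answer_key):
    """Award one point per correct answer and express the score as percentage correct."""
    points = sum(1 for given, correct in zip(answers, answer_key) if given == correct)
    return 100.0 * points / len(answer_key)

# Hypothetical responses to one 16-question set (14 of 16 correct).
answer_key = ["yes"] * 8 + ["no"] * 8
responses  = ["yes"] * 8 + ["no"] * 6 + ["yes"] * 2
print(percentage_correct(responses, answer_key))  # 87.5
```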

ANALYSIS AND RESULTS

The following section presents the results of our analysis.

Hypothesis 1: The Effect of Narrative Descriptions as Supplements When Abstract Problem Formulations Were Given

Hypothesis 1 stated that a narrative description as an explanatory supplement would help subjects to better comprehend cardinality constraints compared to when no narrative was provided (given that abstract problem formulations were used to assess comprehension). To test this, we performed a paired samples t-test comparing the average scores for narrative-explained constraints with the scores for unexplained constraints.

CONNECTIVITY      M       Difference   SD      p         d
Explained         87.1%   19.6%        23.2%   < 0.000   0.68
Unexplained       67.5%                27.0%

PARTICIPATION     M       Difference   SD      p         d
Explained         68.8%   18.8%        29.0%   0.001     0.45
Unexplained       50.0%                29.5%

Table 1: The effect of narrative as an explanatory supplement when given abstract problem formulation.

For connectivity constraints, we found that subjects scored significantly higher (19.6%) on narrative-explained constraints compared to unexplained constraints, see Table 1. Similarly, for participation constraints, we also found that subjects scored significantly higher (18.8%) on narrative-explained constraints compared to unexplained constraints. Hypothesis 1 was therefore supported.
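For readers who want to reproduce this kind of analysis, the sketch below shows one common way to compute a paired samples t-test and an effect size from per-subject percentage scores. The data here are invented, and the paper does not report which variant of Cohen's d was used, so the effect-size formula shown is only one standard choice and may differ from the authors' computation.

```python
import numpy as np
from scipy import stats

# Hypothetical per-subject percentage scores for narrative-explained and
# unexplained connectivity constraints (the real study had 60 participants).
explained   = np.array([100.0, 87.5, 75.0, 100.0, 87.5, 62.5, 100.0, 87.5])
unexplained = np.array([ 75.0, 62.5, 50.0,  87.5, 75.0, 50.0,  62.5, 75.0])

# Paired samples t-test on the within-subject difference.
result = stats.ttest_rel(explained, unexplained)

# One common paired-samples effect size: mean difference over the SD of the
# differences; other definitions of Cohen's d exist.
diff = explained - unexplained
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}, d = {cohens_d:.2f}")
```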

Hypothesis 2: The Effect of Narrative Problem Scenarios When No Narrative Supplements Were Given

Hypothesis 2 stated that subjects would perform better on questions using narrative problem formulations than on questions using abstract problem formulations when no narrative supplement to the ER diagram is given. To test this, we performed a paired samples t-test comparing the average scores for questions with narrative problem formulations with the scores for questions with abstract formulations when no narrative supplement was given.

1 M = Mean, SD = Standard Deviation, p = Calculated Probability, d = Cohen’s d


CONNECTIVITY                  M       Difference   SD      p         d
Narrative Problem Scenario    72.5%   5.0%         25.5%   0.193     0.17
Abstract Formulation          67.5%                27.0%

PARTICIPATION                 M       Difference   SD      p         d
Narrative Problem Scenario    69.6%   19.6%        24.4%   < 0.000   0.69
Abstract Formulation          50.0%                29.5%

Table 2: The effect of narratives as problem scenarios when no narrative supplement was given.

For connectivity constraints, we found that subjects scored higher (5.0%) on questions formulated as narrative problem scenarios compared to abstract formulations; however, this difference was not significant, see Table 2. For participation constraints, we found that subjects did score significantly higher (19.6%) on narrative problem scenarios compared to abstract formulations. Hypothesis 2 was therefore supported for participation constraints but not for connectivity constraints.

Hypothesis 3: The Effect of Combining Narrative Supplements with Narrative Problem Formulations Compared to Narrative Supplements Alone

Hypothesis 3 stated that subjects would perform better with a combination of narrative supplements and narrative problem formulations compared to using a narrative supplement alone. To test this, we performed a paired samples t-test comparing the average scores for the combined use of narratives with the average scores obtained when the narrative was provided only in the form of a supplement (that is, paired with abstract problem formulations).

CONNECTIVITY                  M       Difference   SD      p         d
Combined                      87.1%   0.0%         20.3%   1.000     0.00
Narrative Supplement Alone    87.1%                23.2%

PARTICIPATION                 M       Difference   SD      p         d
Combined                      82.9%   14.2%        21.8%   0.001     0.46
Narrative Supplement Alone    68.8%                29.0%

Table 3: The effect of combining two types of narratives compared to narrative supplement alone.

For connectivity constraints, we found no difference between using a combination of the two types of narratives and using only a narrative as a supplement, see Table 3. However, for participation constraints, we found that subjects did score significantly higher (14.2%) when a combination of the two types of narratives was provided compared to when only a narrative as a supplement was provided.

Hypothesis 3 was therefore supported for participation constraints but not for connectivity constraints.

Hypothesis 4: The Effect of Combining Narrative Supplements with Narrative Problem Formulations Compared to Narrative Problem Formulations Alone

Hypothesis 4 stated that subjects would perform better with a combination of narrative supplements and narrative problem formulations than with narrative problem formulations alone. To test this, we performed a paired samples t-test comparing the average scores obtained for the combined use of narratives with the average scores obtained when the narrative was provided only in the form of a problem formulation (that is, no narrative supplement to the diagram was available).

For connectivity constraints, we found that subjects scored significantly higher (14.6%) when a combination of the two types of narratives was provided compared to when only narratives as problem formulations were provided, see Table 4. Similarly, for participation constraints, we found that subjects scored significantly higher (13.3%) when a combination of the two types of narratives was provided compared to when only narratives as problem formulations were provided. Hypothesis 4 was therefore supported.

CONNECTIVITY                        M       Difference   SD      p         d
Combined                            87.1%   14.6%        20.3%   < 0.000   0.55
Narrative Problem Scenario Alone    72.5%                25.5%

PARTICIPATION                       M       Difference   SD      p         d
Combined                            82.9%   13.3%        21.8%   < 0.000   0.50
Narrative Problem Scenario Alone    69.6%                24.4%

Table 4: The effect of combining two types of narratives compared to narrative problem scenarios alone.

Summary of Analysis

Table 5 provides an overview of the findings in this study. The next section discusses these findings in more detail.

H1: Narrative as an explanatory supplement will enhance understanding of cardinality constraints. (Supported)

H2: Narratives as problem scenarios will enhance understanding of cardinality constraints. (Supported for participation constraints)

H3: Combining the two types of narrative will lead to better performance than a narrative as an explanatory supplement alone. (Supported for participation constraints)

H4: Combining the two types of narrative will lead to better performance than narrative problem formulations alone. (Supported)


Table 5: Summary of hypotheses and results.

DISCUSSION

The analysis showed that providing narratives together with ER diagrams generally had a positive effect on comprehension of cardinality constraints, especially for participation constraints. Providing a combination of the two types of narratives had a particularly positive effect on the comprehension of participation constraints. In summary, providing a narrative (of either type) was generally better than providing no narrative, and providing the two narratives in combination was generally better than providing only one narrative, see Figure 5.

Figure 5: Performance comparison between no narrative, either type of narrative, and combined narratives.

Given that this study has established the efficacy of narratives in promoting domain understanding, one important question for future research is the design of narratives. If narratives are to be used as an IS representation (Rai 2017), specific consideration on how a narrative should be designed is imperative. Hvalshagen (2011) conducted a literature review on the components of narratives and elucidated several key elements that comprise a narrative (see Table 6).

1) Real(-like) People: A narrative is about particular, named characters.
   Example: Snow White, the wicked queen.

2) Specific Places: The events of a narrative happen at specific places.
   Example: The king’s castle, the cottage of the seven dwarfs.

3) Concrete Objects: A narrative involves concrete objects, that is, instances or exemplars of objects rather than object categories.
   Example: The magical mirror, the poisoned apple.

4) Historic Timeframe: The events of a narrative happen within a particular timeframe.
   Example: From the birth of Snow White until the breaking of the evil spell.

5) Evidence of Consciousness: The people in a narrative display evidence of consciousness (e.g., thinking, feeling, reflecting, deciding).
   Example: The wicked queen fears that Snow White is prettier than she is; the prince wants to rescue Snow White.

6) Thematic Storyline: A narrative follows a thematic storyline that causally connects actions and events, and has some overall moral or point.
   Example: Goal: get rid of Snow White. Storyline: trick her into eating the poisoned apple. Point/moral: vanity, jealousy, and selfish desire are dangerous.

Table 6: Components of narratives. Adapted from Hvalshagen (2011)

Hvalshagen (2011) serves as a starting point for possible ways to operationalize (i.e., instantiate) a narrative. However, there could be many combinations of the aforementioned components, and unearthing the right combination could become quite unwieldy; this is known as the instantiation validity challenge (Lukyanenko, Evermann and Parsons 2014). We therefore believe that artifact sampling, i.e., creating multiple instantiations of narratives to account for the possible variations predicted by theory (Lukyanenko, Samuel, Evermann and Parsons 2016), is apt to consider with respect to the design of narratives.

By using artifact sampling, we hope to evaluate whether one combination of narrative components (e.g., real people and specific places) is more effective for domain understanding compared to another set of narrative components (e.g., real people, specific places, and concrete objects). Varying the degree and combinations of narrative components could be applied both to narratives as explanatory supplements as well as narratives as problem scenarios. Together, this could yield hundreds, if not thousands of different combinations of narratives.

Furthermore, research on how to improve comprehension, e.g., through worked examples, has concluded that multiple examples of each concept are usually more effective than a single example (Renkl and Atkinson 2010; Cooper and Sweller 1987). Hence, one could also evaluate the multiplicity (e.g., 0, 1, or more occurrences) of these narrative components to see if multiple occurrences of a component produce some sort of inflection point in the efficacy of a narrative for a given task. One could also evaluate whether these combinations still necessitate an ER diagram, or whether a narrative alone could suffice.
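To give a rough sense of the design space that artifact sampling would have to cover, the short sketch below counts possible narrative designs under two simple assumptions of ours: (i) each of the six components in Table 6 is either included or excluded, and (ii) following the multiplicity idea above, each component can instead occur 0, 1, or many times.

```python
from itertools import combinations

components = ["Real(-like) People", "Specific Places", "Concrete Objects",
              "Historic Timeframe", "Evidence of Consciousness", "Thematic Storyline"]

# (i) Simply include or exclude each component: 2^6 - 1 = 63 non-empty designs.
subsets = [c for r in range(1, len(components) + 1)
           for c in combinations(components, r)]
print(len(subsets))              # 63

# (ii) Allow each component to occur 0, 1, or "many" times: 3^6 = 729 designs,
# and applying the same variations to both narrative roles (explanatory
# supplement and problem scenario) roughly doubles that again.
print(3 ** len(components))      # 729
print(2 * 3 ** len(components))  # 1458
```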

In summary, our results indicate that there is a potentially different impact of narratives on the different types of cardinality constraints. Artifact sampling, we hope, can be used to gain deeper insight into the reasons behind these varying results.

The second major research direction we believe is necessary is a greater variety of tasks for evaluating domain understanding. Prior research in this area has, for example, characterized understanding tasks as elaborative reconstruction (Bodart et al., 2001), read-to-do (Khatri et al., 2006), and surface vs. deep levels of understanding (Saghafi and Wand 2014). However, beyond


understanding, there exist several secondary outcomes of interest. For example, do narratives improve communication effectiveness? Or, do narratives improve the overall IS success measured by information quality, effective use, or adoption (Burton-Jones and Grange 2012; Lukyanenko, Parsons and Wiersma 2014; Burton-Jones and Volkoff 2017)? One could even connect domain understanding to other downstream activities such as decision making or improved analytics (Howson 2013; Laursen and Thorlund 2016).

CONCLUSION

The aim of this research was to find a way to aid non-technical business users to better understand conceptual data models, focusing on cardinality constraints as a first step. In general, we found that narratives as a support tool to conceptual data models significantly improved the understanding of the cardinality constraints in conceptual data models. Future research could explore this phenomenon further, adding to our knowledge of what in particular makes narratives an effective representation for IS.

REFERENCES

[1] B. Dobing, J. Parson, How UML is Used, Communication of the ACM, 49 (2006) 109-113. [2] I. Davies, P. Green, M. Rosemann, M. Indulska, S. Gallo, How do practitioners use conceptual modeling in practice?, Data & Knowledge Engineering, 58 (2006) 358-380. [3] D. Batra, J.A. Hoffer, R.P. Bostrom, Comparing Representations with Relational and EER Models, Communication of the ACM, 33 (1990) 126-139. [4] H. Topi, V. Ramesh, Human Factors Research on Data Modeling: A Review of Prior Research, An Extended Framework and Future Research Directions, Journal of Database Management, 13 (2002) 3-19. [5] V. Khatri, V. Ramesh, I. Vessey, P. Clay, Comprehension of Conceptual Schemas: Exploring the Role of Application and IS Domain Knowledge, Information Systems Research, 17 (2006) 81–99. [6] I. Alexander, N. Maiden, Scenarios, Stories, Use Cases: Through the Systems Development Life-Cycle, John Wiley & Sons, 2004. [7] R.C. Schank, R.P. Abelson, Knowledge and Memory: The Real Story, in: R.S. Wyer (Ed.) Knowledge and Memory: The Real Story, Lawrence Erlbaum Associates, Hillsdale, NJ, 1995, pp. 1-85. [8] O.I. Lindland, G. Sindre, A. Solvberg, Understanding quality in conceptual modeling, IEEE software, 11 (1994) 42-49. [9] D.A. Norman, Cognitive Engineering, in: D.A. Norman, S. Draper (Eds.) User Centered System Design: New Perspectives on Human-Computer Interaction, Lawrence Erlbaum Associates, 1986, pp. 31-61. [10] C.L. Dunn, G.J. Gerard, S.V. Grabski, Diagrammatic Attention Management and the Effect of Conceptual

Model Structure on Cardinality Validation, Journal of the Association for Information Systems, 12 (2011) 585. [11] A. Gemino, Y. Wand, Complexity and clarity in conceptual modeling: comparison of mandatory and optional properties, Data & Knowledge Engineering, 55 (2005) 301-326. [12] F. Currim, S. Ram, Modeling spatial and temporal set-based constraints during conceptual database design, Information Systems Research, 23 (2012) 109-128. [13] D. Rosca, S. Greenspan, C. Wild, Enterprise modeling and decision-support for automating the business rules lifecycle, Automated Software Engineering, 9 (2002) 361-404. [14] S.J. Read, L.C. Miller, Stories are Fundamental to Meaning and Memory: For Social Creatures, Could It Be Otherwise?, in: R.S. Wyer (Ed.) Knowledge and Memory: The Real Story, Lawrence Erlbaum Associates, Hillsdale, NJ, 1995, pp. 139-152. [15] I. Alexander, N. Maiden, Scenarios, Stories, and Use Cases: The Modern Basis for System Development, IEE Computing & Control Engineering 15 (2004) 24-29. [16] F. Paas, A. Renkl, J. Sweller, Cognitive Load Theory and Instructional Design: Recent Developments, Educational Psychologist, 38 (2003) 1-4. [17] J. Sweller, Cognitive Load Theory, Learning Difficulty, and Instructional Design, Learning and Instruction, 4 (1994) 295-312. [18] J.J.G. van Merriënboer, J. Sweller, Cognitive Load Theory and Complex Learning: Recent Developments and Future Directions, Educational Psychology Review, 17 (2005) 147-177. [19] G.J. Browne, J. Parsons, More enduring questions in cognitive IS research, Journal of the Association for Information Systems, 13 (2012) 1000. [20] J. Sweller, Cognitive Load Theory: Recent Theoretical Advances, in: J.L. Plass, R. Moreno, R. Brünken (Eds.) Cognitive Load Theory, Cambridge University Press, 2010, pp. 29-47. [21] J. Sweller, The Worked Example Effect and Human Cognition, Learning and Instruction, 16 (2006) 165-169. [22] F. Paas, T. van Gog, Optimizing Worked Example Instruction: Different Ways to Increase Germane Cognitive Load, Learning and Instruction, 16 (2006) 87-91. [23] A. Renkl, R.K. Atkinson, Learning from Worked-Out Examples and Problem Solving, in: J.L. Plass, R. Moreno, R. Brünken (Eds.) Cognitive Load Theory, Cambridge University Press, 2010, pp. 91-108. [24] S. Kalyuga, Instructing and Testing for Expertise: A Cognitive Load Perspective, in: A.V. Mittel (Ed.) Focus on Educational Pscyhology, Nova Science, 2006, pp. 53-104. [25] K.A. Ericsson, An Introduction to Cambridge Handbook of Expertise and Expert Performance: Its Development, Organization, and Content, in: K.A. Ericsson, N. Charness, P.J. Feltovich, R.R. Hoffman (Eds.) Expertise and Expert Performance, Cambridge University Press, New York, USA, 2006, pp. 3-19.


[26] R.K. Atkinson, S.J. Derry, A. Renkl, D. Wortham, Learning from Examples: Instructional Principles from the Worked Examples Research, Review of Educational Research, 70 (2000) 181-214. [27] A. Renkl, The worked-out-example principle in multimedia learning, The Cambridge handbook of multimedia learning, (2005) 229-245. [28] S. Kalyuga, Knowledge Elaboration: A Cognitive Load Perspective, Learning and Instruction, 19 (2009) 402-410. [29] M. Hvalshagen, Harnessing the Power of Narratives to Understand User Requirements, in: Department of Operations and Decision Sciences, Indiana University, Kelley School of Business, 2011. [30] P.P.-S. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems, 1 (1976) 9-36. [31] F. Bodart, A. Patel, M. Sim, R. Weber, Should Optional Properties Be Used in Conceptual Modeling? A Theory and Three Empirical Tests, Information Systems Research, 12 (2001) 384-405. [32] A. Burton-Jones, R. Clarke, K. Lazarenko, R. Weber, Is use of optional attributes and associations in conceptual modeling always problematic? Theory and empirical tests, (2012). [33] Y. Wand, V.C. Storey, R. Weber, An Ontology Analysis of the Relationship Construct in Conceptual Modeling, ACM Transactions on Database Systems, 24 (1999) 494-528. [34] R. Baranes, M. Perry, J.W. Stigler, Activation of real-world knowledge in the solution of word problems, Cognition and Instruction, 6 (1989) 287-318. [35] K.R. Koedinger, M.W. Alibali, M.J. Nathan, Trade-Offs Between Grounded and Abstract Representations: Evidence From Algebra Problem Solving, Cognitive Science, 32 (2008) 366-397. [36] K.R. Koedinger, M.J. Nathan, The Real Story behind Story Problems: Effects of Representations on Quantitative Reasoning, The Journal of the Learning Sciences, 13 (2004) 129-164. [37] H. Markovits, R. Vachon, Conditional Reasoning, Representation, and Level of Abstraction, Developmental Psychology, 26 (1990) 942-951. [38] I. Vessey, Cognitive Fit: A Theory-Based Analysis of the Graphs vs. Tables Literature, Decision Sciences, 22 (1991) 219-240.

[39] I. Vessey, The Theory of Cognitive Fit: One Aspect of a General Theory of Problem Solving?, in: P. Zhang, D. Galletta (Eds.) Human-Computer Interaction and Management Information Systems: Foundations, M.E. Sharpe Inc Armonk, NY, 2006. [40] A. Rai, Diversity of Design Science Contributions, MIS Quarterly, 41 (2017) iii–xviii. [41] R. Lukyanenko, J. Evermann, J. Parsons, Instantiation validity in IS design research, in: International Conference on Design Science Research in Information Systems, Springer, 2014, pp. 321-328. [42] R. Lukyanenko, B.M. Samuel, J. Evermann, J. Parsons, Toward Artifact Sampling in IS Design Research, in: Workshop on Information Technology and Systems, Dublin, Ireland, 2016. [43] G. Cooper, J. Sweller, Effects of Schema Acquisition and Rule Automation on Mathematical Problem-Solving Transfer, Journal of Educational Psychology, 79 (1987) 347-362. [44] A. Saghafi, Y. Wand, Do ontological guidelines improve understandability of conceptual models? a meta-analysis of empirical work, in: System Sciences (HICSS), 2014 47th Hawaii International Conference on, IEEE, 2014, pp. 4609-4618. [45] A. Burton-Jones, C. Grange, From use to effective use: a representation theory perspective, Information Systems Research, 24 (2012) 632-658. [46] R. Lukyanenko, J. Parsons, Y.F. Wiersma, The IQ of the crowd: understanding and improving information quality in structured user-generated content, Information Systems Research, 25 (2014) 669-689. [47] A. Burton-Jones, O. Volkoff, How can we develop contextualized theories of effective use? A demonstration in the context of community-care electronic health records, Information Systems Research, Forthcoming (2017) 1–40. [48] C. Howson, Successful business intelligence: Unlock the value of BI & big data, McGraw-Hill Education Group, 2013. [49] G.H. Laursen, J. Thorlund, Business analytics for managers: Taking business intelligence beyond reporting, John Wiley & Sons, 2016.


Is Microservices a Viable Technology for Business Application Development? An Organizational Theory Based Rationale

Padmal Vitharana Whitman School of Management

Syracuse University [email protected]

Hemant Jain College of Business

The University of Tennessee at Chattanooga [email protected]

ABSTRACT

The ability to deploy quickly, focus on a single function around a business capability, and encompass their own data sources are often identified as key advantages of microservices. While the practitioner community is beginning to make the case for microservices, there is apprehension among some scholars, who view microservices as merely another unsubstantiated approach to software development. To address this gap in our understanding of the applicability of microservices for business application development, we draw from organizational theory to demonstrate that the technically desirable characteristics of microservices coincide with organizationally desirable characteristics.

Keywords

Microservices, Application development, Organizational theory

INTRODUCTION

The practitioner community has provided ample anecdotal evidence of the benefits of microservices and their ability to overcome inadequacies in existing software development approaches. Microservices represent a single function around a business capability, encompass their own data resources, and are quickly deployable (Daya et al. 2016). These characteristics, among others, hold the key to practitioners’ widespread acceptance of microservices as the next best thing in software development.

The academic community, however, needs more convincing to jump on the microservices bandwagon. While many academics acknowledge microservices’ potential to address some of the current challenges (O’Connor et al. 2016; Rahman and Gao 2015), some view microservices as just another approach to software development, with a new set of problems of its own (Dragoni et al. 2016). The scarcity of publications on the topic in scholarly journals offers further evidence of this hesitancy toward widespread acceptance of microservices.

Therefore, a sound rationale based on theoretical underpinnings is warranted to demonstrate that satisfying the technical needs of software development overlaps with satisfying the organizational needs of the firm. We argue that organizational theory provides a fitting lens to make this case. Thus, our research aims to address this gap by drawing from organizational theory.

REVIEW OF LITERATURE

In the early seventies, structured design techniques based on principles of modularization aimed to reduce inherent complexities in software. These techniques refer to program design considerations for making “coding, debugging and modifications easier, faster, and less expensive by reducing complexity” (Stevens et al. 1974, p. 115). The concepts of cohesion and coupling stood at the heart of structured design.

One of the criticisms of structured design is that it mainly focused on coding and little credence was given to the domain context. Object-orientation emerged to remedy this drawback by designing systems using a set of interacting objects. Relying on fundamental concepts such as abstraction, encapsulation, inheritance, and polymorphism, object-oriented design employs objects as the building blocks of software systems.

The next radical transformation in application development ensued with the introduction of software components, which refer to a piece of executable software with a published interface (Hopkins 2000). Component-based development emerged from the realization that an industrial revolution of the software industry, building an integrated whole from independent parts, requires separating the design of software artifacts (e.g., components) from application assembly using those artifacts.

Subsequently, web services emerged as the preferred approach to software development. The service-oriented architecture (SOA) elucidates the use of autonomous web services to compose an application via an orchestration mechanism. While the benefits of SOA are well recognized,


practitioners have raised some concerns about its viability. These include a lack of consensus on how to do SOA well, a lack of guidance regarding service granularity, and a lack of understanding of how to split a bigger, complex service into smaller, more manageable ones (Daya et al. 2016; Newman 2015; Wolff 2017). Table 1 provides a brief overview of the evolution of application development.

More recently, advocates have started to prescribe microservices as a specific, and more importantly a better, approach than conventional SOA (Fernández Garcés 2016; Newman 2015). The microservices approach to software development offers guidance in terms of the granularity of the business functionality each microservice represents, a single area of responsibility (i.e., bounded context) that facilitates the assignment of ownership to a team, the integration of service logic with corresponding data, and, in essence, how to do SOA well. Some industry experts observe that the greatest benefit of adopting the new paradigm manifests from its ability to align microservice-based system design with organizational goals (Daya et al. 2016; Newman 2015). The excitement among industry analysts stems from the many technical characteristics exhibited by microservices that are far more favorable to software development than their traditional counterparts. These technical characteristics include the following (see footnote 1; a minimal code sketch illustrating several of them follows the list):

(1) Uses a standard interface across various hardware and software platforms.

(2) Independently and quickly deployable.

(3) Supports a single function around a business capability.

(4) Single responsibility center (ownership, accountability, autonomy)

(5) Contains own data source.

(6) Ease of defect management (locate a reported bug in design or code, make corrections, and management including version control).

(7) Ease of requirements change (locate corresponding design or code, make revisions, and management including version control).

(8) Ease of Testing (code, database).

(9) Scalability (microservices can be scaled up independent of one another).

(10) Resilience/fault-tolerance (if one microservice breaks down, then the rest of app can still function).

(11) Technology innovation / heterogeneity (ease of use of new technologies, programming languages, etc.).

(12) Ease of monitoring (performance, failure).

(13) Support more process variations (combine with other microservices to support a bigger function or different process).

1 Bogner and Zimmermann 2016; Daya et al. 2015; Dragoni et al. 2016; Killalea 2016; Nadareishvili et al. 2017; Newman 2015; O’Connor et al. 2016; Rahman and Gao 2015; Thönes 2015; Wolff 2017.
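As an entirely hypothetical sketch of how several of these characteristics look in practice, the following minimal "customer" service, built only on the Python standard library, illustrates a single function around a business capability, a service that contains its own data source, a standard HTTP/JSON interface, and a simple monitoring endpoint. None of this code comes from the paper or from any particular microservices framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# The service contains its own data source (here just an in-memory store);
# no other service reads or writes this data directly.
CUSTOMERS = {1: {"id": 1, "name": "Mary"}, 2: {"id": 2, "name": "Tom"}}

class CustomerService(BaseHTTPRequestHandler):
    """A single function around one business capability: managing customers."""

    def _send_json(self, payload, status=200):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/health":        # ease of monitoring
            self._send_json({"status": "up"})
        elif self.path == "/customers":   # standard HTTP/JSON interface
            self._send_json(list(CUSTOMERS.values()))
        else:
            self._send_json({"error": "not found"}, status=404)

if __name__ == "__main__":
    # Independently deployable: the service runs on its own, on its own port,
    # and can be scaled or replaced without touching any other service.
    HTTPServer(("0.0.0.0", 8080), CustomerService).serve_forever()
```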

While there is agreement that microservices could deliver what is promised in terms of desired technical characteristics, some in academia question whether the microservices paradigm would offer significant benefits to the organization. For many in academic circles, the key question remains: do microservices offer the firm business value beyond what is achieved by realizing the desired technical characteristics of software development?

THEORETICAL FRAMEWORK

We argue that microservices offer the firm business value when the desirable technical characteristics of microservices coincide with desirable organizational characteristics. There is an expansive volume of management literature on what characteristics make organizations effective, ranging from instituting effective organizational strategy, leadership, and workforce to implementing efficient business processes. One such stream in the management literature focuses on high performance organizations (HPO). André de Waal and his colleagues have done extensive work on HPOs for more than two decades. In one publication, de Waal (2007) identifies several dimensions that potentially make up an HPO: (1) Organizational design; (2) Strategy; (3) Process management; (4) Technology; (5) Leadership; (6) Individuals and roles; (7) Culture; and (8) External orientation. These dimensions and associated characteristics of an HPO emerged from an in-depth study conducted by the same author (de Waal 2006).

Our theorizing effort involved mapping the desirable technical characteristics of microservices to the organizational characteristics of HPOs identified by de Waal. In each dimension, we selected the characteristics that are salient to software systems and their support for the organization. In this mapping exercise, we map how technically desirable characteristics of microservices support the IT capabilities required for each relevant HPO characteristic (see Table 2).

IMPLICATIONS AND FUTURE RESEARCH

Our work has key implications for research and practice. We drew from the organizational theory literature to demonstrate that technically desirable characteristics of microservices coincide with organizationally desirable characteristics of the firm. By doing so, we bring to the fore the emerging microservices paradigm not only as an effective approach to application development, but also as one that affords firms the means to attain organizationally desirable characteristics.


Table 1. The Evolution of Application Development

Structured design
  Issues addressed: Reduce complexity in code. Identify modules; increase cohesion within each module and reduce coupling between modules. The system consists of a set of interacting modules.
  Outstanding concerns: Focus is on code. No clear guidelines on how to match business aspects with code modules. No link between modules and corresponding operations.

Object-orientation
  Issues addressed: Focus on business aspects; link to operations. Identify classes representing business aspects; object classes contain data and operations (methods). The system consists of a set of interacting objects.
  Outstanding concerns: Tight link between object design and system design. Objects are so fine grained that they do not correspond to meaningful business functions. Challenges to scaling. Challenges to reuse (via inheritance).

Component-based development
  Issues addressed: Separate component fabrication from application assembly. Identify domain components representing higher-level business functions, implement them, and assemble components to build applications for the domain.
  Outstanding concerns: The system is not independently deployable. Components do not always correspond to a single business functionality. Challenges to assigning ownership of a component to a team. Data are stored in a repository external to the component. Challenges to monitoring the component (e.g., performance). Weak resilience (if one component breaks, the entire system breaks). No capability to combine components in real time to provide greater functionality.

Web services
  Issues addressed: Separate web service fabrication from application assembly. An application or a web service can invoke functionality of another web service without having to integrate it. Web services represent a business activity (but granularity remains an issue), can be monitored (e.g., performance), offer resilience (if one web service fails, the system can still function), and offer the capability to combine web services in real time to provide greater functionality.
  Outstanding concerns: Due to granularity issues, challenges to assigning ownership of web services to a team. Data are stored in a repository external to the service (hence changes to a web service might require a change to an external data repository); hence, not always independently deployable. No clear understanding of how to map web services to business functionality.


Table 2. Mapping Organizational Characteristics with Technical Characteristics of Microservices.

1. Organizational design characteristics
  Organizational characteristics: (1) Stimulate cross-functional and cross-organizational collaboration. (2) Simplify and flatten the organization by reducing boundaries and barriers between and around units. (3) Foster organization-wide sharing of information, knowledge and best practices. (4) Constantly realign the business with changing internal and external circumstances.
  Technical characteristics: (a) Supports a single function around a business capability. (b) Uses a standard interface across various hardware and software platforms. (c) Independently and quickly deployable. (d) Ease of requirements change. (e) Scalability. (f) Technology innovation and heterogeneity. (g) Support for more process variations. (h) Ease of testing.
  Supporting rationale: (a) Because each microservice encompasses a single function, it promotes the creation of teams and subsequent cross-functional collaboration (#1) among the users, IT professionals and developers responsible for that business service and the corresponding microservice. Furthermore, it helps reduce barriers between units (#2) (by bringing together stakeholders across functional areas) and enables realignment of team membership with changing circumstances (#4). (b) This allows for easy organization-wide sharing of information, knowledge and best practices (#3) across units because microservices can be easily integrated across different hardware and software platforms without requiring gateway software. (c)-(h) A microservice-based architecture facilitates the alignment of the business with changing circumstances (#4) because microservices are independently and quickly deployable, can be replaced with a more appropriate service to account for requirements changes, are scalable, adopt technology innovations, support more process variations, and can be easily tested.

2. Strategy characteristics
  Organizational characteristics: (1) Set clear, ambitious, measurable and achievable goals. (2) Align strategy, goals, and objectives with the demands of the external environment and build robust, resilient and adaptive plans to achieve these.
  Technical characteristics: (a) Ease of monitoring. (b) Independently and quickly deployable. (c) Ease of requirements change. (d) Scalability. (e) Technology innovation and heterogeneity. (f) Support for more process variations. (g) Resilience/fault tolerance.
  Supporting rationale: (a) Microservices can be easily designed to measure and monitor, in real time, the goals that have been set (#1). (b)-(f) A microservice-based architecture facilitates the alignment of business strategy, goals, and objectives with the demands of the external environment (#2). (g) A microservice-based architecture offers greater resilience (#2) in that if one microservice fails, the rest of the application can still function.

3. Process characteristics
  Organizational characteristics: (1) Design a good and fair reward and incentive structure. (2) Continuously innovate products, processes and services. (3) Continuously simplify and improve all the organization’s processes.
  Technical characteristics: (a) Supports a single function around a business capability. (b) Single responsibility center. (c) Ease of monitoring. (d) Independently and quickly deployable. (e) Ease of requirements change. (f) Technology innovation and heterogeneity. (g) Support for more process variations.
  Supporting rationale: (a)-(c) These help managers create an effective incentive structure (#1) that allows them to track the performance of the microservice and the corresponding business service and link it with the performance of the responsible team. (d)-(f) These more readily support the organization’s need to continuously innovate products, processes, and services (#2). (g) By mixing and matching microservices, organizations can more readily simplify and improve processes (#3).

4. Technology characteristics
  Organizational characteristics: (1) Implement flexible ICT systems throughout the organization.
  Technical characteristics: (a) Uses a standard interface across various hardware and software platforms. (b) Independently and quickly deployable. (c) Ease of defect management. (d) Ease of requirements change. (e) Scalability. (f) Support for more process variations. (g) Contains own data source.
  Supporting rationale: (a)-(g) Since microservices use standard interfaces across various hardware and software platforms, are independently and quickly deployable, can more easily cope with requirements changes, ease defect management, are scalable, support more process variations, and contain their own data sources, they result in flexible ICT systems throughout the organization (#1).

5. Leadership characteristics
  Organizational characteristics: (1) Stimulate change and improvement. (2) Hold people responsible for results and be decisive about nonperformers.
  Technical characteristics: (a) Independently and quickly deployable. (b) Ease of requirements change. (c) Scalability. (d) Technology innovation and heterogeneity. (e) Support for more process variations. (f) Single responsibility center.
  Supporting rationale: (a)-(e) Microservices support the implementation of change and improvements (#1). (f) This allows managers to hold individuals and teams responsible (#2) for the results of each business service supported by a corresponding microservice.

6. Individuals and roles characteristics
  Organizational characteristics: (1) Engage and involve the workforce.
  Technical characteristics: (a) Supports a single function around a business capability. (b) Single responsibility center.
  Supporting rationale: (a)-(b) These allow managers to create teams with the respective stakeholders, thereby facilitating their engagement and involvement (#1) in the business service.

7. Culture characteristics
  Organizational characteristics: (1) Empower people and give them freedom to decide and act. (2) Develop and maintain a performance-driven culture. (3) Create a shared identity and a sense of community.
  Technical characteristics: (a) Supports a single function around a business capability. (b) Single responsibility center. (c) Contains own data source. (d) Uses a standard interface across various hardware and software platforms.
  Supporting rationale: (a)-(c) Microservice-based development empowers users, IT professionals, and developers and gives them the freedom to decide and act (#1). (a)-(b) Because each microservice consists of a single function encompassing a responsibility center, it allows managers to develop and maintain a performance-driven culture (#2), for the results of each business service can be easily monitored. (d) This allows easier sharing of information across hardware and software platforms, thus creating a shared identity and a sense of community (#3) among users.

8. External orientation characteristics
  Organizational characteristics: (1) Continuously strive to enhance customer value creation. (2) Maintain good and long-term relationships with all stakeholders. (3) Monitor the environment consequently and respond adequately. (4) Grow through partnerships and be part of a value-creating network.
  Technical characteristics: (a) Independently and quickly deployable. (b) Ease of defect management. (c) Resilience/fault tolerance. (d) Technology innovation/heterogeneity. (e) Uses a standard interface across various hardware and software platforms. (f) Ease of monitoring.
  Supporting rationale: (a)-(d) Microservice-based development supports organizations in continuously striving to enhance customer value creation (#1) because microservices are independently and quickly deployable, ease defect management, are more resilient/fault tolerant, and enable technology innovation/heterogeneity. (e) This assists organizations in maintaining good and long-term relationships with all stakeholders such as suppliers (#2) and in growing through partnerships and being part of a value-creating network (#4) by easily integrating systems and information. (f) This helps organizations monitor the business services corresponding to microservices and respond adequately (#3).


Realizing microservices’ potential to add value to the firm at a much broader organizational level is crucial in promoting their adoption.

Increasing microservices’ credibility among academics facilitates further research on the topic. Questions such as how to leverage an existing set of software assets (e.g., components) to construct a suite of microservices are certain to spark interest from scholars. Along these lines, metrics that distinguish an optimally designed suite of microservices from a sub-optimally designed one would be useful to managers in their quest to deliver the effective software application solutions that are the mainstay of high performance organizations. In the next stage of our research, we plan to conduct an empirical study with a group of practitioners to validate the mapping between microservices’ technical characteristics and organizationally desirable characteristics.

ACKNOWLEDGMENTS

This research was funded by grants from the Robert H. Brethen Operations Management Institute and the Earl V. Snyder Innovation Management Center at the Whitman School of Management, Syracuse University.

REFERENCES

1. Bogner, J. and Zimmermann, A. (2016) Towards Integrating Microservices with Adaptable Enterprise Architecture. Proceedings of the 20th IEEE Enterprise Distributed Object Computing Workshop.

2. Daya, S., van Duy, N., et al. (2016) Microservices from Theory to Practice: Creating Applications in IBM Bluemix Using the Microservices Approach. IBM Redbooks.

3. de Waal, A. A. (2007) The Characteristics of a High Performance Organization. Business Strategy Series, 8(3), 179 – 185.

4. de Waal, A. A. (2006) The Characteristics of a High Performance Organisation. https://ssrn.com/abstract= 931873.

5. Dragoni, N., Mazzara, M., Giallorenzo, S., Montesi, F., Luch Lafuente, A., Mustafin, R. and Safina, L. (2016) Microservices: Yesterday, Today, and Tomorrow. https://arxiv.org/pdf/1606.04036.pdf.

6. Fernández Garcés, L. (2016) Design of an Application Using Microservices Architecture and its Deployment in the Cloud. Universidad Politecnica de Madrid.

7. Fichman, R. G. and Kemerer, C. F. (1992). Object-Oriented and Conventional Analysis and Design Methodologies. Computer, 25(10), 22–39.

8. Hopkins, J. (2000). Component Primer. Communications of the ACM, 43(10), 27–30.

9. Killalea, T. (2016). The Hidden Dividends of Microservices. Communications of the ACM, 59(8), 42–45.

10. Nadareishvili, I., Mitra, R., McLarty, M., and Amundsen, M. (2016). Microservice Architecture: Aligning Principles, Practices, and Culture, Sebastopol, CA: O’Reilly.

11. Newman, S. (2015). Building Microservices: Designing Fine-Grained Systems, Sebastopol, CA: O’Reilly.

12. O’Connor, R. V., Elger, P., and Clarke, P. M. (2016). Exploring the impact of situational context – A case study of a software development process for a microservices architecture. Proceedings of the IEEE/ACM International Conference on Software and System Processes.

13. Rahman, M., and Gao, J. (2015). A Reusable Automated Acceptance Testing Architecture for Microservices in Behavior-Driven Development. Proceedings of the IEEE Symposium on Service-Oriented System Engineering.

14. Stevens, W. P., Myers, G. J., and Constantine, L. L. (1974). Structured Design. IBM Systems Journal, 13(2), 115–139.

15. Thönes, J. (2015). Microservices. IEEE Software, 32(1), 115–139.

16. Wolff, E. (2017). Microservices: Flexible Software Architecture, New York: Addison-Wesley.


Getting an Old Dog to Learn New Tricks: The Role of Collective Ownership in ISD

Salman Nazir West Virginia University

[email protected]

ABSTRACT

The software development process is a highly collaborative and knowledge-intensive process. Due to its complex nature, software development teams often fail to accomplish their goals of completing development projects under time, money and functionality constraints. Although IS research has spent quite a bit of energy understanding this complex process, we still struggle to achieve software development goals on time, on budget and with the promised functionality. This paper suggests that, in order to achieve these goals, software development teams need to cultivate a climate of collective ownership in the team. Specifically, this paper investigates the factors that promote a sense of collective ownership which, in turn, would help achieve performance goals such as on-time completion, on-budget completion and software functionality.

Keywords

Collective ownership, Software development, Knowledge management, Task Interdependence

INTRODUCTION

Software development is a very knowledge-intensive, collaborative and complex process. It is often strained by issues such as changing user requirements, cost overruns and poor team collaboration (Ozer and Vogel, 2015). It is no surprise, therefore, that the software development process often delivers software that is over budget, late, or completed with fewer features than originally planned (Ozer and Vogel, 2015). The reasons for these shortcomings range widely, from misconstruing user requirements to planning errors to team collaboration issues. The IS community has spent quite a bit of energy trying to understand these shortcomings, and the IS literature suggests several areas of improvement. For instance, a plethora of techniques such as software-process diversity (Ramasubbu, Bharadwaj and Tayi, 2015), eXtreme Programming, Scrum (Ozer and Vogel, 2015), computer-aided software engineering (Hardgrave, Davis and Riemenschneider, 2003) and software process tailoring (Xu and Ramesh, 2007) have been suggested. Although these techniques are important in correcting the above-mentioned issues, since software development is essentially a team process, it is crucial for us to understand how the team can be motivated to mindfully take responsibility for every aspect of the complex software development process.

This paper suggests that the concept of collective ownership plays a significant role in the success of software development projects. We suggest that when teams have a sense of collective ownership of the project, they are better equipped to handle all facets of the software development process. Collective ownership is akin to the concept of mindfulness at the team level in the sense that it creates a heightened level of responsibility in the team, which allows better management of all problematic aspects of development. Drawing on past literature, we investigate the antecedents of collective ownership in ISD teams. The literature guides us to suggest that teams which develop an intimate knowledge of the project requirements, engage for collaboration, and have task interdependence will have higher levels of collective ownership. This higher collective ownership will translate into performance benefits such as on-time completion, on-budget completion and software functionality.

COLLECTIVE OWNERSHIP

The concept of collective ownership has garnered significant attention in several agile development methodologies. It encourages all team members on an ISD project to take responsibility for all aspects of the software code (Beck, 2000). Although it is important for agile projects, collective ownership should also play an important role in ISD projects that do not specifically use agile methodologies. Since agile methodologies such as Scrum and eXtreme Programming are still not the default development approach in most projects and organizations, most ISD projects follow traditional approaches to software development. Hence, it is important to investigate how collective ownership can improve the results of projects that follow traditional development methodologies. At its core, collective ownership is an organic approach to managing code (Maruping, Zhang and Venkatesh, 2009). The goal in collective ownership is to find and fix any


coding errors by leveraging the expertise of the entire team rather than individual team members (Beck, 2000). Thus, any developer on the team is allowed to make changes to the code at any time. This may involve adding code, fixing code, or improving code via refactoring (Beck, 2000; Maruping et al., 2009). With collective ownership, developers on a team have the responsibility of checking inputs of their colleagues, in addition to their own inputs. This dual responsibility on each team member has the potential to produce code that is better in quality and caters closely to users’ requirements.

Collective ownership in purely agile methodologies is largely limited to the management of code. In this paper, we suggest that collective ownership should encompass more than the code of the project. Significant benefits can be reaped if developers are allowed to take responsibility for every aspect of the software project. This may include the user requirements and the process used for development, as well as the code of the project. Hence, we use this broader definition of collective ownership, which encompasses any and all aspects of the project.

Key Facilitators of Collective Ownership

Collective ownership in any project is essentially related to the degree of proactive collaboration and responsibility that team members are willing to accept (Beck, 2000). Research on information systems development suggests that projects whose members have a high level of shared understanding and engage in effective collaboration are better equipped to achieve project goals and objectives (Gregory and Keil, 2014). Moreover, research also suggests that collective ownership is more likely to develop under highly interdependent task conditions (Pierce and Jussila, 2010).

Key Facilitator | Definition | References
Shared Understanding | The process of establishing a common social basis for collaboration among team members such that they develop a single, shared mindset about project requirements. | Gregory and Keil (2014)
Engaging for Collaboration | The process of specifically focusing on collaboration with team members. | Gregory and Keil (2014)
Task Interdependence | The degree to which team members work on tasks that have reciprocal dependence on each other. | Pierce and Jussila (2010)

Following this literature, we suggest that shared understanding of requirements, engaging for collaboration and task interdependence are three key facilitators of collective ownership.

RESEARCH MODEL

As shown in Figure 1, we suggest that shared understanding of requirements, engaging for collaboration and task interdependence each have a positive effect on the degree of collective ownership that team members develop in a software development project.

A high level of shared understanding is important because it provides a common social basis for collaboration among team members. When team members go through the exercise of jointly developing a shared understanding of project requirements, they are more likely to go through a socialization process that facilitates the development of the psychology of "us" and "ours" (Pierce and Jussila, 2010). This process enables the team members and stakeholders to develop a single, shared mindset, which is key to fostering a sense of ownership toward a particular project.

While developing a shared understanding is important, engaging for collaboration is the exercise that specifically focuses on actually collaborating with one another (Gregory and Keil, 2014). This concept has its roots in ideas of collectivist cooperation based on stewardship theory and goal alignment (Sundaramurthy and Lewis, 2003). As project stakeholders engage in collaboration, they make more of the team's decisions and become part of the project management decision-making process. This engagement helps build commitment in team members, which is crucial for nurturing collective ownership (Gregory and Keil, 2014).


Finally, task interdependence is also needed to attain high levels of collective ownership among project stakeholders. As team members work on tasks that are highly interdependent, they communicate more often, exchange knowledge and ideas, jointly plan courses of action and solve problems together (Pierce and Jussila, 2010). This has the overall effect of developing a sense of collective ownership whereby the project stakeholders start thinking of the project as their collective product.

Figure 1. Role of Collective Ownership in Software Project Performance (Shared Understanding, Engaging for Collaboration and Task Interdependence → Collective Ownership → On-Time Completion, On-Budget Completion and Software Functionality)

The three constructs of shared understanding, engaging for collaboration and task interdependence lead to collective ownership. We expect that collective ownership, in turn, should lead to the attainment of performance goals such as on-time completion, on-budget completion and software functionality (as shown in Figure 1 above). Because collective ownership creates a sense of shared responsibility among project stakeholders, team members are more likely to engage in mindful management of the entire development process. Equipped with a better understanding of the project deliverables and requirements, team members will be better able to correct any code-related or feature-related issues with the project (Maruping et al., 2009). Overall, the team members' mindful management of project outcomes makes the team more efficient in every aspect of the project, as it enables them to make course corrections before problems cause major time, budget or feature-related setbacks.
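To make the hypothesized relationships concrete, the sketch below shows one way the model in Figure 1 could be examined once team-level scores are available. It is illustrative only: the variable names (e.g., shared_understanding, on_time_completion) are hypothetical placeholders, the data are synthetic, and the paper does not prescribe an estimation technique; two simple OLS regressions stand in for whatever structural model the completed study might employ.

# Illustrative only: synthetic team-level data and simple regressions standing in
# for a test of the research model in Figure 1. All names are hypothetical placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_teams = 120

# Hypothetical team-level antecedents (e.g., means of 7-point Likert scales).
teams = pd.DataFrame({
    "shared_understanding": rng.normal(5.0, 0.8, n_teams),
    "engaging_for_collaboration": rng.normal(4.8, 0.9, n_teams),
    "task_interdependence": rng.normal(4.5, 1.0, n_teams),
})

# Simulate the hypothesized structure: antecedents -> collective ownership -> performance.
teams["collective_ownership"] = (
    0.4 * teams["shared_understanding"]
    + 0.3 * teams["engaging_for_collaboration"]
    + 0.2 * teams["task_interdependence"]
    + rng.normal(0, 0.5, n_teams)
)
teams["on_time_completion"] = 0.5 * teams["collective_ownership"] + rng.normal(0, 0.5, n_teams)

# Stage 1: antecedents of collective ownership.
antecedent_model = smf.ols(
    "collective_ownership ~ shared_understanding + engaging_for_collaboration + task_interdependence",
    data=teams,
).fit()

# Stage 2: collective ownership and one performance outcome.
outcome_model = smf.ols("on_time_completion ~ collective_ownership", data=teams).fit()

print(antecedent_model.summary())
print(outcome_model.summary())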

RESEARCH METHOD

This study will be conducted in two phases. The first phase will use a focus group of project managers who work with software development teams. The objective of this exercise is to develop a measure for operationalizing the collective ownership construct. In the second phase, after the measure has been developed, survey questionnaires will be mailed to project managers of ISD teams. The sample will be drawn from Fortune 1000 North American firms. Data will be collected from each team member and subsequently combined to form team-level measures for collective ownership, shared understanding, engaging for collaboration and task interdependence.
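Because individual responses must be combined into team-level measures, an aggregation step is implied by this design. The sketch below illustrates one conventional approach, namely mean aggregation after a within-team agreement check (an r_wg-style index); the column names, the seven-point response scale and the 0.70 cutoff are assumptions made for illustration, not details of the planned study.

# Illustrative only: aggregating individual survey responses to team-level measures.
# Column names, the 7-point scale and the agreement cutoff are assumptions.
import numpy as np
import pandas as pd

def rwg(scores: pd.Series, n_options: int = 7) -> float:
    """Within-group agreement: 1 - (observed variance / variance of a uniform null)."""
    expected_var = (n_options ** 2 - 1) / 12.0  # variance of a discrete uniform distribution
    observed_var = scores.var(ddof=1)
    return float(1.0 - min(observed_var / expected_var, 1.0))

# Hypothetical individual-level responses (one row per team member).
responses = pd.DataFrame({
    "team_id": [1, 1, 1, 2, 2, 2, 2],
    "collective_ownership": [6, 6, 5, 3, 4, 3, 4],
    "shared_understanding": [5, 6, 6, 4, 4, 3, 4],
})

constructs = ["collective_ownership", "shared_understanding"]

# Team-level means, retained only where within-team agreement is acceptable (e.g., r_wg >= 0.70).
team_means = responses.groupby("team_id")[constructs].mean()
agreement = responses.groupby("team_id")[constructs].agg(rwg)
team_level = team_means.where(agreement >= 0.70)

print(team_level)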

EXPECTED CONTRIBUTIONS

Two major contributions are expected from this paper. First, the paper will enhance our understanding of the collective ownership construct through an investigation of its antecedents. Although studies have investigated collective ownership (Maruping et al., 2009), none have attempted to explore its antecedents, and the literature has lagged in identifying the factors that are prerequisites for teams to achieve optimal software development performance. Filling this gap is expected to be a major contribution. The second contribution is the operationalization of the collective ownership construct, expected to be achieved through instrument development. This would advance IS research by allowing us to explore further questions, such as how collective ownership helps or hinders project performance under changing environmental conditions.

REFERENCES

1. Beck, K. (2000) Extreme Programming Explained. Addison-Wesley, Reading, MA.

2. Gregory, R.W. and Keil, M. (2014) Blending Bureaucratic and Collaborative Management Styles to Achieve Control Ambidexterity in IS Projects, European Journal of Information Systems, 23, 3, 343-356.


3. Hardgrave, B.C., Davis, F.D. and Riemenschneider, C.K. (2003) Investigating Determinants of Software Developers' Intentions to Follow Methodologies, Journal of Management Information Systems, 20, 1, 123-151.

4. Larman, C. (2003) Agile and Iterative Development: A Manager’s Guide. Addison-Wesley, Reading, MA.

5. Maruping, L.M., Zhang, X. and Venkatesh, V. (2009) Role of Collective Ownership and Coding Standards in Coordinating Expertise in Software Project Teams, European Journal of Information Systems, 18, 355-371.

6. Xu, P. and Ramesh, B. (2007) Software Process Tailoring: An Empirical Investigation, Journal of Management Information Systems, 24, 2, 293-328.

7. Ozer, M. and Vogel, D. (2015) Contextualized Relationship between Knowledge Sharing and Performance in Software Development, Journal of Management Information Systems, 32, 2, 134-161.

8. Pierce, J. and Jussila, I. (2010) Collective Psychological Ownership Within the Work and Organizational Context: Construct Introduction and Elaboration, Journal of Organizational Behavior, 31, 810-834.

9. Ramasubbu, N., Bharadwaj, A. and Tayi, G. (2015) Software Process Diversity: Conceptualization, Measurement, and Analysis of Impact on Project Performance, MIS Quarterly, 39, 4, 787-807.

10. Salas, E., Sims, D. E. and Burke, C.S. (2005) Is there a ‘Big Five’ in teamwork? Small Group Research, 36, 5, 555–599.
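11. Sundaramurthy, C. and Lewis, M. (2003) Control and Collaboration: Paradoxes of Governance, Academy of Management Review, 28, 3, 397-415.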
