A Novel Kuhnian Ontology for Epistemic Classification of STM Scholarly Articles

Khalid M. Saqr and Abdelrahman Elsharawy

SOPHIO Research Inc., 267 11th avenue, New York, NY 10001 (P), U.S.A

Corresponding author: [email protected]

ABSTRACT

Thomas Kuhn proposed his paradigmatic view of scientific discovery five decades ago. The

concept of paradigm has not only explained the progress of science, but has also become the

central epistemic concept among STM scientists. Here, we adopt the principles of Kuhnian

philosophy to construct a novel ontology aimed at classifying and evaluating the impact of STM

scholarly articles. First, we explain how the Kuhnian cycle of science describes research at

different epistemic stages. Second, we show how the Kuhnian cycle could be reconstructed

into modular ontologies which classify scholarly articles according to their contribution to

paradigm-centred knowledge. The proposed ontology and its scenarios are discussed. To the

best of the authors’ knowledge, this is the first attempt to create an ontology for describing

scholarly articles based on the Kuhnian paradigmatic view of science.

Keywords: Thomas Kuhn, Scholarly publishing, ontology, science, paradigm

INTRODUCTION

Kuhnian Paradigmatic View of Science

Thomas Kuhn’s seminal work The Structure of Scientific Revolutions1 described the

basic cycle through which natural science and its embedded human knowledge progress. The

cycle is shown in figure 1 and is explained with evident examples in numerous works2-4.

Figure 1. The Kuhnian cycle

The cycle begins with a pre-science/pre-paradigm stage, in which the elements of a scientific

field are recognized and partially described. The pre-paradigm stage defines the main questions

proposed in a certain field and outlines the framework through which such questions could be

investigated. Then, when these questions are addressed through sufficient formal analyses, the

community realizes a paradigm which leads directly to the normal science stage. This paradigm

could be formed by one or more scientific achievement(s) which are acknowledged by the

community to describe the fundamentals of this particular field. The iterative research

endeavours within normal science stage define the paradigm more precisely and establish its

primary elements. These elements form what is called a model, which describes and defines

the accepted methodologies, induction criteria, correlation models and data/information

representations of this particular paradigm5. The scientific community often reaches a

consensus regarding such a model, which is directly reflected in the scholarly publishing industry

through the peer-review and editorial policies of journals. Although this procedure brings

rigour to science, in many scenarios it constitutes an obstacle to its progress6-8.

When a specific paradigm faces one or more challenges that it cannot handle, it goes

into a model drift stage. To address the challenges facing the paradigm, the model must drift

from normal science. If such drift propagates, even slightly from its past normal science

achievements, the whole paradigm is said to be in a model drift stage. Then, the route of model

drift leads to model crisis. In the latter stage, the field’s paradigm becomes unable to lead to

verifiable knowledge to overcome the challenges facing the model. In this stage, the paradigm

is shattered by too many anomalies that have been revealed during the model drift stage. A

model crisis can be viewed as evidence of the paradigm’s failure. This stage is characterized

by criticism of the existing model, its elements, and limitations. When one or more new models

(or model elements) emerge from the crisis, the paradigm undergoes a model revolution9. This

stage is always characterized by communal debates among scientists regarding the new model

(elements). The new contributions (in terms of models or model elements) then drive a series of

revisions, critical analyses and reformulations of the paradigm, and the model keeps drifting

from its original formalism. The result of a model revolution is the emergence of a paradigm

shift. The new model resulting from the revolution produces sufficient achievements to cover

the challenges posed by the old paradigm. Hence, a new paradigm emerges. The paradigm shift

is the process through which scientists’ understanding changes and new concepts, correlation

models and formalisms are formed. The new paradigm, then, routes to normal science and

closes the cycle, as schematically shown in figure 1.
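
The stage sequence described above can be sketched as a small cyclic transition map. This is only an illustrative reading of the cycle, not part of the K-Ontology itself; the names `Stage`, `NEXT_STAGE` and `walk` are hypothetical and introduced here for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the Kuhnian cycle as named in the text."""
    PRE_SCIENCE = "pre-science"
    NORMAL_SCIENCE = "normal science"
    MODEL_DRIFT = "model drift"
    MODEL_CRISIS = "model crisis"
    MODEL_REVOLUTION = "model revolution"
    PARADIGM_SHIFT = "paradigm shift"


# Assumed single successor per stage, following the order given in the text.
# Pre-science is an entry point only: no stage leads back to it.
NEXT_STAGE = {
    Stage.PRE_SCIENCE: Stage.NORMAL_SCIENCE,
    Stage.NORMAL_SCIENCE: Stage.MODEL_DRIFT,
    Stage.MODEL_DRIFT: Stage.MODEL_CRISIS,
    Stage.MODEL_CRISIS: Stage.MODEL_REVOLUTION,
    Stage.MODEL_REVOLUTION: Stage.PARADIGM_SHIFT,
    Stage.PARADIGM_SHIFT: Stage.NORMAL_SCIENCE,  # the new paradigm closes the cycle
}


def walk(start: Stage, steps: int) -> list:
    """Return the stages visited when following the cycle for `steps` transitions."""
    path = [start]
    for _ in range(steps):
        path.append(NEXT_STAGE[path[-1]])
    return path


if __name__ == "__main__":
    for stage in walk(Stage.PRE_SCIENCE, 6):
        print(stage.value)
```

Starting from the pre-science stage, six transitions return the field to normal science under a new paradigm, which is the closure of the cycle described in the text.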

Review of literature and problem statement

Few efforts have been undertaken in the past two decades to formulate ontology-based

classification systems for scholarly articles. Constantin et al10 proposed the document components

ontology (DoCO) to provide a general-purpose structured vocabulary of document elements to

describe document parts. Their work focused on the tagging and annotation of different

document parts based on predetermined key dictionary vocabulary. Elizarov et al11 proposed

a system of services for the automatic processing of collections of scientific documents that are

part of digital libraries. Their work aimed mainly at checking the validity of documents for

compliance with manuscript guidelines, converting the documents into required formats and

generating their metadata. Senderov et al12 proposed OpenBiodiv-O, an ontology that serves as

the basis of the OpenBiodiv Knowledge Management System. A key feature of their ontology

is that it is an ontology of the scientific process of biological taxonomy and not of any particular

state of knowledge. Jaradeh et al13 and Auer et al14 argued that the document-centric workflow

of science has already become unsustainable and suggested replacing it with knowledge

graphs. Ehrlinger and Wöß15 defined a knowledge graph as

a graph that “acquires and integrates information in an ontology and applies a

reasoner to derive new knowledge”. The concept of the knowledge graph is

revolutionary and has great potential. However, it is very unlikely that the transition from

document-based scholarly publishing to such a new concept would be immediate and radical.

Therefore, the authors believe that it is imperative to develop the core concepts required by AI

to facilitate knowledge mining from scholarly articles. To the best of the authors’ knowledge,

this is the first attempt to construct an ontology for STM research based on Kuhnian

philosophy. The novel ontology presented here is called K-Ontology and its primary function

is to serve as a representation platform through which STM scholarly articles can be classified

based on their role in the Kuhnian cycle of their field.

Hypothesis, objectives, scope and limitations

Our work is focused on STM research, paradigms and scholarly articles. The common

framework of STM research historically combines elements from empiricism, logical

positivism and scientific realism. Knowledge formalism in STM research requires verification,

generalization, reasoning and hypothesis testing and validation. We argue that the progress of

thoughts and ideas in all STM disciplines is similar when viewed through the Kuhnian cycle. Therefore,

we argue that any STM scholarly article can be classified according to its role in the

Kuhnian cycle. Our objective is to develop an ontology that can enable such classification. The

scope of our work is focused on the contemporary formal structure of STM scholarly articles,

their metadata, and writing concepts. Besides being limited to STM articles, the novel ontology

proposed herein is limited to the consensual logic of reporting STM research output which

includes controlled experiments, mathematical formalism, statistical analyses, reasoning, and

scientific logic (i.e. induction, deduction, verification, validation, etc.).

RECONSTRUCTING KUHNIAN CYCLE INTO MODULAR ONTOLOGIES

The first principle we propose here is to consider the Kuhnian cycle, in its original form,

as an ontology for STM research. Such ontology combines definitions, classes, objects and

correlations which describe the Kuhnian paradigmatic view of scientific progress. Figure 2

shows the general K-Ontology map, which divides the Kuhnian cycle into three interconnected

modular ontologies. The principles for reconstructing the Kuhnian cycle into the modular

ontologies shown in figure 2 are:

1- Any STM research article has the potential to state, affirm, extend or criticize an

existing paradigm.

2- Model drift and crisis are always associated with unanswered questions, new

arguments, uncertain observations and/or improved/new methods.

3- Model revolution is always associated with criticism of the existing paradigm.

4- Paradigm change is always associated with new concepts, model correlations, the

emergence of new methods, new variables and new arguments.

Figure 2. Paradigm map of the novel K-Ontology showing the three modular ontologies which

can produce representations of the current status of a specific STM research article in the view

of Kuhnian cycle.

The modular ontologies aim principally at deconstructing the content of an STM

scholarly article to categorize its status in the Kuhnian cycle. For this purpose, we have

developed a set of reasoning scenarios to measure the role of a particular paper in the paradigm

of its field of research. These scenarios are schematically plotted in figure 3. Such scenarios

explain how the novelty and impact of a particular paper can be assessed based on its role in

the Kuhnian cycle. The scenarios also suggest that the invention of new methods and the

production of new observations contribute neither linearly nor immediately to paradigm shifts.

The total number of these scenarios is expressed as $C^{12}_{3} = \frac{12!}{3! \times 9!} = 220$ possible scenarios. Each

of these scenarios must combine one element of each set (i.e. a method, an observation and a

conclusion regarding the model in question) to be a valid scenario. The modular ontologies

presented in the following sections aim at providing a framework for classifying articles

according to these scenarios.
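
As a quick arithmetic check of the combinatorial counts used in this paper, the following minimal sketch evaluates the binomial coefficient n!/(3!×(n−3)!) for n = 12 (the general count above) and for n = 8 and n = 7 (the counts used for the modular ontologies in the following sections). The helper name `n_choose_3` is hypothetical and used only for illustration.

```python
from math import comb, factorial


def n_choose_3(n: int) -> int:
    """Number of ways to pick 3 elements out of n: n! / (3! * (n - 3)!)."""
    return factorial(n) // (factorial(3) * factorial(n - 3))


if __name__ == "__main__":
    for n in (12, 8, 7):
        # Cross-check the explicit factorial formula against math.comb.
        assert n_choose_3(n) == comb(n, 3)
        print(f"C({n}, 3) = {n_choose_3(n)}")
    # prints C(12, 3) = 220, C(8, 3) = 56 and C(7, 3) = 35
```

These are counts of unconstrained three-element selections; only the combinations that take one element from each set are valid scenarios.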

Figure 3. Scenarios to categorize the paradigmatic contribution of a specific paper in the K-Ontology

Formalism modular ontology

The function of the formalism modular ontology is to build the formalism of established

paradigms from scholarly articles. A class-level map representation of this modular ontology

is schematically shown in figure 4. This modular ontology has three different class types. The

first type is for the extrinsic components of the ontology (correlation model, previous work and

key arguments). The second type is for the intrinsic components (method, observation and

theory). The third type is for the objects of the first two types. The logical relations are

mathematical (∈); causal (leads to, limits), which require a causality detection algorithm16-18;

and syntactic (verifies, comparison), which require parsing and taxonomy algorithms19, 20. This

ontology focuses on the scenarios where a certain article affirms or extends an established

model. The number of such scenarios is calculated as $C^{8}_{3} = \frac{8!}{3! \times 5!} = 56$ scenarios. There are

18 valid scenarios of this ontology, which are detailed in supplementary table 1.

Figure 4. Schematic map of the formalism modular ontology

Model modular ontology

The function of this modular ontology is to detect model drift and crisis from articles.

A class-level map representation of this modular ontology is schematically shown in figure 5.

This ontology also contains three class types, similar to the formalism ontology. It allocates one

type to the model of the existing paradigm and theory. The second type is allocated to the new and

previously conclusive arguments, and the third is allocated to the objects of the first two class

types. This ontology identifies a model by the observations, methods and correlation models

made throughout all relevant previous work. This ontology focuses on the scenarios where a

certain article questions or criticizes the validity of an established model in the face of new

observation or correlation. The number of such scenarios is calculated as $C^{7}_{3} = \frac{7!}{3! \times 4!} = 35$

scenarios. There are 12 valid scenarios which are detailed in supplementary table 1.

Figure 5. Schematic map of the model ontology. (a) new observation that criticizes or updates

observations reported in previous work (b) new method that provides a different approach for

solving one of the field’s problems that has already been solved in previous work (c) new

correlation model that is different from existing correlation models in previous work

Paradigm shift modular ontology

This is the primary modular ontology of K-Ontology. The function of this modular ontology is

to clearly differentiate articles that drive paradigm shifts. A class-level schematic map is

shown in figure 6. This ontology has one class type for new paradigm and another class type

for the objects of the first type. We argue that a new paradigm can only be defined by at least

one new correlation model and one new theoretical argument. New observations are also

associated with new paradigms; however, they are not necessary to define one. It is possible,

especially in theoretical fields such as mathematics and computer science, to reach a new

paradigm through existing methods and observations. This ontology focuses on

the scenarios where a certain article develops, introduces or verifies a new paradigm. The number

of such scenarios is calculated as $C^{8}_{3} = \frac{8!}{3! \times 5!} = 56$ scenarios. There are 18 valid scenarios of

this ontology, which are detailed in supplementary table 1.
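
The defining criterion stated above (at least one new correlation model and at least one new theoretical argument, with new observations being optional) can be written as a simple predicate. This is a minimal sketch under the assumption that an article's contributions have already been extracted and counted; the class `ArticleContributions` and its field names are hypothetical and are not part of the published ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleContributions:
    """Hypothetical per-article counts of newly introduced objects."""
    new_correlation_models: int = 0
    new_theoretical_arguments: int = 0
    new_observations: int = 0  # associated with new paradigms, but not required


def may_define_new_paradigm(c: ArticleContributions) -> bool:
    """Apply the stated criterion: at least one new correlation model
    and at least one new theoretical argument."""
    return c.new_correlation_models >= 1 and c.new_theoretical_arguments >= 1


if __name__ == "__main__":
    theoretical = ArticleContributions(new_correlation_models=1,
                                       new_theoretical_arguments=1)
    empirical = ArticleContributions(new_observations=5)
    print(may_define_new_paradigm(theoretical))  # True
    print(may_define_new_paradigm(empirical))    # False
```

The first case corresponds to the theoretical-field situation mentioned above, where a new paradigm is reached without any new observation.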

Figure 6. Schematic map of the paradigm shift ontology

DISCUSSION

Scientists have been communicating their ideas, theories, methods and achievements

through scholarly articles since the 17th century. The epistemic framework of STM research,

as manifested in empiricism, logical positivism and scientific realism, depends on rigorous

measures, criteria and procedures to present and understand scientific knowledge. These

measures, criteria and procedures are essential elements of an STM scholarly article, regardless

of its topic and discipline. The first step of any research in natural sciences is reviewing the

literature and previous work of relevance and importance to the research topic. Therefore, what

researchers need the most is to discover relevant literature as early as possible in their research

process. The principal elements of the circulation of scholarly articles among scientific communities

have radically changed with the emergence of the World Wide Web and rapidly developing

communications technology. The global scientific output, in terms of published peer-reviewed

articles, doubles every nine years1, with an annual total increase of STM papers exceeding 2.5

million articles2. In the meantime, natural science, both pure and applied, faces new challenges

every day. Climate change, viral pandemics, poverty, and sustainability are just a few examples

of such challenges. The scientific community must therefore find a way to exploit as much as

possible of the potential of what is being published in peer-reviewed journals.

Scientists often look for the best (i.e. most rigorous) journals to read and to publish their

research in. Impact Factor is the primary method of evaluating the scientific merit of scholarly

journals, with new trends towards h-index and its other citation-based derivatives. In addition

to the severe criticism levelled at Impact Factor21-25, it is now generally accepted among

STM scientists that not all citations are equal26, 27. The discourse on science disciplinarity plays an

important role in this regard. Citation-based bibliometric analysis and evaluation often overlook

disciplinarity, either by neglecting it altogether or by normalizing the indices of one article/journal by

the total value of indices from its discipline. Disciplinary classification is typically performed

based on keyword matching and metadata tags and annotations. Therefore, paradigms are

only viewed within the limits of such classification.

1 http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html

2 Ware, M. and Mabe, M., The STM Report: an overview of scientific and scholarly journal publishing, 4th edition, 2015, International Association of Scientific, Technical and Medical Publishers

The proposed K-Ontology unfolds research articles to reveal their role in the Kuhnian

cycle. If it becomes possible to envision the scientific merit of an article, or collection of

articles, based on the Kuhnian cycle, most of the problems of the citation-based bibliometric

scoring could be resolved. Identification and conception of an existing paradigm via well-

defined class types, classes and objects would empower scientists to identify the paradigm-

setting articles which they need for posing their problem statements. Correct understanding of

the concept of models and its differentiation from the more inclusive concept of paradigm

would provide a means of evaluating an article’s contribution based on the different scenarios. The

strict criteria set for identifying new paradigms would open new frontiers of endeavour in all

STM fields of research. The K-Ontology yields a total of 48 valid

scenarios, which in principle describe 48 different levels of scientific merit. It must be noted,

however, that the reference for these levels is the Kuhnian cycle.

CONCLUSION AND FUTURE WORK

This is the first Kuhnian ontology to empower epistemic and paradigmatic classification of

STM scholarly articles. We propose K-Ontology, which has three modular ontologies to detect

and classify existing paradigms, model revolutions and paradigm shifts. The proposed ontology

has 48 valid scenarios which can be detected by deducing the mathematical, logical and

syntactic correlations within and between the three modular ontologies. Our research at Sophio

Inc. focuses on implementing this ontology in a tool that can be used to conduct

epistemic evaluation of STM articles.

REFERENCES

1. Kuhn TS. The Structure of Scientific Revolutions. University of Chicago Press, 1970.

2. Arthur WB. The Nature of Technology: What It Is and How It Evolves. Free Press, 2009.

3. Richards RJ and Daston L. Kuhn's 'Structure of Scientific Revolutions' at Fifty:

Reflections on a Science Classic. University of Chicago Press, 2016.

4. Fuller S. Kuhn Vs Popper: The Struggle for the Soul of Science. Icon Books, 2006.

5. Giere RN. Philosophy of Science Naturalized. Philosophy of Science. 1985; 52: 331-56.

6. Osterloh M and Kieser A. Double-Blind Peer Review: How to Slaughter a Sacred Cow.

In: Welpe IM, Wollersheim J, Ringelhan S and Osterloh M, (eds.). Incentives and

Performance: Governance of Research Organizations. Cham: Springer International

Publishing, 2015, p. 307-21.

7. Jelicic M and Merckelbach H. Peer-review: Let's imitate the lawyers! Cortex. 2002; 38:

406-7.

8. Csiszar A. Peer review: Troubled from the start. Nature. 2016; 532: 306-8.

9. Fairchild D. Revolution and Cause: An Investigation of an Explanatory Concept. The

New Scholasticism. 1976; 50: 277-92.

10. Constantin A, Peroni S, Pettifer S, Shotton D and Vitali F. The document components

ontology (DoCO). Semantic Web. 2016; 7: 167-81.

11. Elizarov A, Khaydarov S and Lipachev E. Scientific documents ontologies for semantic

representation of digital libraries. 2017 Second Russia and Pacific Conference on

Computer Technology and Applications (RPC). IEEE, 2017, p. 1-5.

12. Senderov V, Simov K, Franz N, et al. OpenBiodiv-O: ontology of the OpenBiodiv

knowledge management system. Journal of Biomedical Semantics. 2018; 9: 5.

13. Jaradeh MY, Auer S, Prinz M, Kovtun V, Kismihók G and Stocker M. Open research

knowledge graph: Towards machine actionability in scholarly communication. arXiv

preprint arXiv:1901.10816. 2019.

14. Auer S, Kovtun V, Prinz M, Kasprzik A, Stocker M and Vidal ME. Towards a knowledge

graph for science. Proceedings of the 8th International Conference on Web Intelligence,

Mining and Semantics. 2018, p. 1-6.

15. Ehrlinger L and Wöß W. Towards a Definition of Knowledge Graphs. Joint Proceedings

of the Posters and Demos Track of 12th International Conference on Semantic Systems

- SEMANTiCS2016 and 1st International Workshop on Semantic Change & Evolving

Semantics (SuCCESS16). Leipzig, Germany, 2016.

16. Girju R and Moldovan DI. Text mining for causal relations. FLAIRS conference. 2002,

p. 360-4.

17. Hlaváčková-Schindler K, Paluš M, Vejmelka M and Bhattacharya J. Causality detection

based on information-theoretic approaches in time series analysis. Physics Reports. 2007;

441: 1-46.

18. Dasgupta T, Saha R, Dey L and Naskar A. Automatic extraction of causal relations from

text using linguistically informed deep neural networks. Proceedings of the 19th Annual

SIGdial Meeting on Discourse and Dialogue. 2018, p. 306-16.

19. Miller S, Fox H, Ramshaw L and Weischedel R. A novel use of statistical parsing to

extract information from text. 1st Meeting of the North American Chapter of the

Association for Computational Linguistics. 2000.

20. Paukkeri M-S, García-Plaza AP, Fresno V, Unanue RM and Honkela T. Learning a

taxonomy from a set of text documents. Applied Soft Computing. 2012; 12: 1138-48.

21. Brumback RA. Worshiping false idols: the impact factor dilemma. Journal of child

neurology. 2008; 23: 365-7.

22. Ortner HM. The impact factor and other performance measures – much used with little

knowledge about. International Journal of Refractory Metals and Hard Materials. 2010;

28: 559-66.

23. Vanclay JK. Impact factor: Outdated artefact or stepping-stone to journal certification?

Scientometrics. 2012; 92: 211-38.

24. Coleman A. Assessing the value of a journal beyond the impact factor. Journal of the

American Society for Information Science and Technology. 2007; 58: 1148-61.

25. Hicks D, Wouters P, Waltman L, De Rijcke S and Rafols I. Bibliometrics: the Leiden

Manifesto for research metrics. Nature. 2015; 520: 429-31.

26. Zhu X, Turney P, Lemire D and Vellino A. Measuring academic influence: Not all

citations are equal. Journal of the Association for Information Science and Technology.

2015; 66: 408-27.

27. Glänzel W and Thijs B. Does co-authorship inflate the share of self-citations?

Scientometrics. 2004; 61: 395-404.

Supplementary table 1. Detailed possible scenarios of the modular ontologies

Formalism ontology scenarios:

M1 N1 P1, M1 N1 P2, M1 N2 P1, M1 N2 P2, M1 N3 P1, M1 N3 P2,
M2 N1 P1, M2 N1 P2, M2 N2 P1, M2 N2 P2, M2 N3 P1, M2 N3 P2,
M3 N1 P1, M3 N1 P2, M3 N2 P1, M3 N2 P2, M3 N3 P1, M3 N3 P2

Model ontology scenarios:

M1 N2 P3, M1 N2 P4, M1 N3 P3, M1 N3 P4,
M2 N2 P3, M2 N2 P4, M2 N3 P3, M2 N3 P4,
M3 N2 P3, M3 N2 P4, M3 N3 P3, M3 N3 P4

Paradigm shift ontology scenarios:

M1 N1 P5, M1 N1 P6, M1 N2 P5, M1 N2 P6, M1 N3 P5, M1 N3 P6,
M2 N1 P5, M2 N1 P6, M2 N2 P5, M2 N2 P6, M2 N3 P5, M2 N3 P6,
M3 N1 P5, M3 N1 P6, M3 N2 P5, M3 N2 P6, M3 N3 P5, M3 N3 P6
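
The scenario labels in supplementary table 1 follow a regular pattern, so they can be regenerated as Cartesian products of index ranges. The sketch below is an illustration only: the index ranges are read off the table, the labels M, N and P are treated as opaque because the table does not define them, and the names `MODULES` and `scenarios` are hypothetical.

```python
from itertools import product

# Index ranges read off supplementary table 1 (assumption: the table is
# complete and each module is a full Cartesian product of these ranges).
MODULES = {
    "formalism": {"M": (1, 2, 3), "N": (1, 2, 3), "P": (1, 2)},
    "model": {"M": (1, 2, 3), "N": (2, 3), "P": (3, 4)},
    "paradigm shift": {"M": (1, 2, 3), "N": (1, 2, 3), "P": (5, 6)},
}


def scenarios(ranges: dict) -> list:
    """List scenario labels such as 'M1 N2 P3' in table order."""
    return [
        f"M{m} N{n} P{p}"
        for m, n, p in product(ranges["M"], ranges["N"], ranges["P"])
    ]


counts = {name: len(scenarios(ranges)) for name, ranges in MODULES.items()}

if __name__ == "__main__":
    for name, count in counts.items():
        print(f"{name}: {count} valid scenarios")
    print("total:", sum(counts.values()))
```

Under these assumptions the three modules give 18, 12 and 18 labels, which sum to the 48 valid scenarios reported in the discussion.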
