A Novel Kuhnian Ontology for Epistemic Classification of STM
Scholarly Articles
Khalid M. Saqr and Abdelrahman Elsharawy
SOPHIO Research Inc., 267 11th avenue, New York, NY 10001 (P), U.S.A
Corresponding author: [email protected]
ABSTRACT
Thomas Kuhn proposed his paradigmatic view of scientific discovery five decades ago. The
concept of paradigm has not only explained the progress of science, but has also become the
central epistemic concept among scientific, technical and medical (STM) scientists. Here, we adopt the principles of Kuhnian
philosophy to construct a novel ontology that aims at classifying and evaluating the impact of STM
scholarly articles. First, we explain how the Kuhnian cycle of science describes research at
different epistemic stages. Second, we show how the Kuhnian cycle could be reconstructed
into modular ontologies which classify scholarly articles according to their contribution to
paradigm-centred knowledge. The proposed ontology and its scenarios are discussed. To the
best of the authors’ knowledge, this is the first attempt to create an ontology for describing
scholarly articles based on the Kuhnian paradigmatic view of science.
Keywords: Thomas Kuhn, Scholarly publishing, ontology, science, paradigm
INTRODUCTION
Kuhnian Paradigmatic View of Science
Thomas Kuhn’s seminal work The Structure of Scientific Revolutions1 described the
basic cycle through which natural science and its embedded human knowledge progress. The
cycle is shown in figure 1 and is explained with evident examples in numerous works2-4.
Figure 1. The Kuhnian cycle
The cycle begins with the pre-science/pre-paradigm stage, where the elements of a scientific
field are recognized and partially described. The pre-paradigm stage defines the main questions
posed in a certain field and outlines the framework through which such questions could be
investigated. Then, when these questions are addressed through sufficient formal analyses, the
community realizes a paradigm which leads directly to the normal science stage. This paradigm
could be formed by one or more scientific achievement(s) which are acknowledged by the
community to describe the fundamentals of this particular field. The iterative research
endeavours within the normal science stage define the paradigm more precisely and establish its
primary elements. These elements form what is called a model, which describes and defines
the accepted methodologies, induction criteria, correlation models and data/information
representations of this particular paradigm5. The scientific community often reaches a
consensus regarding such a model, and this consensus is directly reflected in the scholarly publishing industry
through the peer-review and editorial policies of journals. Although this procedure brings rigour
to science, in many scenarios it constitutes an obstacle to its progress6-8.
When a specific paradigm faces one or more challenges that it cannot handle, it goes
into a model drift stage. To address the challenges facing the paradigm, the model must drift
from normal science. If such drift propagates, even slightly, away from its past normal science
achievements, the whole paradigm is said to be in a model drift stage. Then, the route of model
drift leads to model crisis. In the latter stage, the field’s paradigm becomes unable to lead to
verifiable knowledge to overcome the challenges facing the model. In this stage, the paradigm
is shattered by too many anomalies that have been revealed during the model drift stage. A
model crisis can be viewed as evidence of the paradigm’s failure. This stage is characterized
by criticism of the existing model, its elements, and limitations. When one or more new models
(or model elements) emerge from the crisis, the paradigm undergoes a model revolution9. This
stage is always characterized by communal debates among scientists regarding the new model
(elements). The new contributions (in terms of models or model elements) then drive a series of
revisions, critical analyses and reformulations of the paradigm, and the model keeps drifting
from its original formalism. The result of a model revolution is the emergence of a paradigm
shift. The new model resulting from the revolution produces sufficient achievements to cover
the challenges posed by the old paradigm. Hence, a new paradigm emerges. The paradigm shift
is the process through which scientists’ understanding changes and new concepts, correlation
models and formalisms are formed. The new paradigm, then, leads to normal science and
closes the cycle, as schematically shown in figure 1.
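The stage order described above can be written down as a small transition table. The following is only an illustrative sketch: the names `Stage`, `NEXT_STAGE` and `walk` are hypothetical and are not part of the K-Ontology, and the single-successor simplification is an assumption made for brevity.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the Kuhnian cycle as named in the text (illustrative only)."""
    PRE_SCIENCE = "pre-science"
    NORMAL_SCIENCE = "normal science"
    MODEL_DRIFT = "model drift"
    MODEL_CRISIS = "model crisis"
    MODEL_REVOLUTION = "model revolution"
    PARADIGM_SHIFT = "paradigm shift"


# Assumed single successor of each stage, following the order given in the text.
NEXT_STAGE = {
    Stage.PRE_SCIENCE: Stage.NORMAL_SCIENCE,
    Stage.NORMAL_SCIENCE: Stage.MODEL_DRIFT,
    Stage.MODEL_DRIFT: Stage.MODEL_CRISIS,
    Stage.MODEL_CRISIS: Stage.MODEL_REVOLUTION,
    Stage.MODEL_REVOLUTION: Stage.PARADIGM_SHIFT,
    Stage.PARADIGM_SHIFT: Stage.NORMAL_SCIENCE,  # the new paradigm closes the cycle
}


def walk(start, steps):
    """Return the stages visited from `start` (inclusive) after `steps` transitions."""
    path = [start]
    for _ in range(steps):
        path.append(NEXT_STAGE[path[-1]])
    return path


if __name__ == "__main__":
    print(" -> ".join(stage.value for stage in walk(Stage.PRE_SCIENCE, 6)))
```

Note that the pre-science stage is entered only once: no transition leads back to it, whereas normal science is reached both from pre-science and from a paradigm shift.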
Review of literature and problem statement
Few efforts have been undertaken in the past two decades to formulate ontology-based
classification systems for scholarly articles. Constantin et al10 proposed a document component
ontology (DoCO) to provide a general-purpose structured vocabulary of document elements to
describe document parts. Their work focused on the tagging and annotation of different
document parts based on a predetermined key dictionary vocabulary. Elizarov et al11 proposed
a system of services for the automatic processing of collections of scientific documents that are
part of digital libraries. Their work aimed mainly at checking the validity of documents for
compliance with manuscript guidelines, converting the documents into required formats and
generating their metadata. Senderov et al12 proposed OpenBiodiv-O, an ontology that serves as
the basis of the OpenBiodiv Knowledge Management System. A key feature of their ontology
is that it is an ontology of the scientific process of biological taxonomy and not of any particular
state of knowledge. Jaradeh et al13 and Auer et al14 argued that the document-centric workflow
of science has already become unsustainable and suggested replacing it with knowledge
graphs. Ehrlinger and Wöß15 defined a knowledge graph as a graph that “acquires and
integrates information in an ontology and applies a reasoner to derive new knowledge”.
The concept of the knowledge graph is revolutionary and has great potential. However, it is very unlikely that the transition from
document-based scholarly publishing to such a new concept would be immediate and radical.
Therefore, the authors believe that it is imperative to develop the core concepts required by AI
to facilitate knowledge mining from scholarly articles. To the best of the authors’ knowledge,
this is the first attempt to construct an ontology for STM research based on Kuhnian
philosophy. The novel ontology presented here is called K-Ontology and its primary function
is to serve as a representation platform through which STM scholarly articles can be classified
based on their role in the Kuhnian cycle of their field.
Hypothesis, objectives, scope and limitations
Our work is focused on STM research, paradigms and scholarly articles. The common
framework of STM research historically combines elements from empiricism, logical
positivism and scientific realism. Knowledge formalism in STM research requires verification,
generalization, reasoning and hypothesis testing and validation. We argue that the progress of
thoughts and ideas in all STM disciplines is similar when viewed through the Kuhnian cycle. Therefore,
we argue that any STM scholarly article could be classified according to its role in the
Kuhnian cycle. Our objective is to develop an ontology that can enable such classification. The
scope of our work is focused on the contemporary formal structure of STM scholarly articles,
their metadata, and writing concepts. Besides being limited to STM articles, the novel ontology
proposed herein is limited to the consensual logic of reporting STM research output which
includes controlled experiments, mathematical formalism, statistical analyses, reasoning, and
scientific logic (i.e. induction, deduction, verification, validation, etc.).
RECONSTRUCTING KUHNIAN CYCLE INTO MODULAR ONTOLOGIES
The first principle we propose here is to consider the Kuhnian cycle, in its original form,
as an ontology for STM research. Such an ontology combines definitions, classes, objects and
correlations which describe the Kuhnian paradigmatic view of scientific progress. Figure 2
shows the general K-Ontology map, which divides the Kuhnian cycle into three interconnected
modular ontologies. The principles for reconstructing the Kuhnian cycle into the modular
ontologies shown in figure 2 are:
1- Any STM research article has the potential to state, affirm, extend or criticize an
existing paradigm.
2- Model drift and crisis are always associated with unanswered questions, new
arguments, uncertain observations and/or improved/new methods.
3- Model revolution is always associated with criticism of the existing paradigm.
4- Paradigm change is always associated with new concepts, model correlations, the
emergence of new methods, new variables and new arguments.
Figure 2. Paradigm map of the novel K-Ontology showing the three modular ontologies which
can produce representations of the current status of a specific STM research article in view
of the Kuhnian cycle.
The modular ontologies aim principally at deconstructing the content of an STM
scholarly article to categorize its status in the Kuhnian cycle. For this purpose, we have
developed a set of reasoning scenarios to measure the role of a particular paper in the paradigm
of its field of research. These scenarios are schematically plotted in figure 3. Such scenarios
explain how the novelty and impact of a particular paper can be assessed based on its role in
the Kuhnian cycle. The scenarios also suggest that the invention of new methods and the
production of new observations do not contribute linearly or immediately to paradigm shifts.
The total number of these scenarios is expressed as $C^{12}_{3} = \frac{12!}{3! \times 9!} = 220$ possible scenarios. Each
of these scenarios must combine one element of each set (i.e. a method, an observation and a
conclusion regarding the model in question) to be a valid scenario. The modular ontologies
presented in the following sections aim at providing a framework for classifying articles
according to these scenarios.
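The count of 220 and the one-element-per-set rule can be checked with a short enumeration. This is a sketch under a stated assumption: the twelve elements are taken to be the labels used in supplementary table 1 (M1 to M3, N1 to N3 and P1 to P6), and the helper name `is_one_per_set` is hypothetical.

```python
from itertools import combinations
from math import comb

# Assumed element pool: the labels that appear in supplementary table 1.
ELEMENTS = (
    [f"M{i}" for i in range(1, 4)]
    + [f"N{i}" for i in range(1, 4)]
    + [f"P{i}" for i in range(1, 7)]
)


def is_one_per_set(triple):
    """True when the triple holds exactly one M, one N and one P element."""
    return sorted(label[0] for label in triple) == ["M", "N", "P"]


# Every unordered choice of 3 elements out of the 12.
all_triples = list(combinations(ELEMENTS, 3))
# Triples that respect the one-element-per-set rule.
one_per_set = [t for t in all_triples if is_one_per_set(t)]

if __name__ == "__main__":
    print(len(all_triples), comb(12, 3))
    print(len(one_per_set))
```

Under this assumed split, the one-per-set rule alone retains 3 × 3 × 6 = 54 triples. The smaller number of valid scenarios reported for the modular ontologies follows from the additional restriction visible in supplementary table 1, where the model ontology column lists only N2 and N3.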
Figure 3. Scenarios to categorize the paradigmatic contribution of a specific paper in the K-Ontology
Formalism modular ontology
The function of the formalism modular ontology is to build the formalism of established
paradigms from scholarly articles. A class-level map representation of this modular ontology
is schematically shown in figure 4. This modular ontology has three different class types. The
first type is for the extrinsic components of the ontology (correlation model, previous work and
key arguments). The second type is for the intrinsic components (method, observation and
theory). The third type is for the objects of the first two types. The logical relations are
mathematical (∈), causal (leads to, limits), which require a causality detection algorithm16-18,
and syntactic (verifies, comparison), which require parsing and taxonomy algorithms19, 20. This
ontology focuses on the scenarios where a certain article affirms or extends an established
model. The number of such scenarios is calculated as $C^{8}_{3} = \frac{8!}{3! \times 5!} = 56$ scenarios. There are
18 valid scenarios of this ontology, which are detailed in supplementary table 1.
Figure 4. Schematic map of the formalism modular ontology
Model modular ontology
The function of this modular ontology is to detect model drift and crisis from articles.
A class-level map representation of this modular ontology is schematically shown in figure 5.
This ontology also contains three class types, similar to the formalism ontology. It allocates one
type to the model of the existing paradigm and theory. The second type is allocated to the new and
previously conclusive arguments, and the third is allocated to the objects of the first two class
types. This ontology identifies a model by the observations, methods and correlation models
made throughout all relevant previous work. This ontology focuses on the scenarios where a
certain article questions or criticizes the validity of an established model in the face of new
observation or correlation. The number of such scenarios is calculated as $C^{7}_{3} = \frac{7!}{3! \times 4!} = 35$
scenarios. There are 12 valid scenarios, which are detailed in supplementary table 1.
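The relation between the 35 unconstrained combinations and the 12 valid scenarios can be reproduced as follows. This is a minimal sketch: the three label sets are copied from the model ontology column of supplementary table 1, and the variable names are hypothetical.

```python
from itertools import product
from math import comb

# Labels listed in the model ontology column of supplementary table 1.
M_SET = ["M1", "M2", "M3"]
N_SET = ["N2", "N3"]
P_SET = ["P3", "P4"]

# Pool of 3 + 2 + 2 = 7 elements.
pool_size = len(M_SET) + len(N_SET) + len(P_SET)
# Any 3 of the 7 elements, ignoring which set they come from.
unconstrained = comb(pool_size, 3)
# Exactly one element from each set, in table order.
valid = list(product(M_SET, N_SET, P_SET))

if __name__ == "__main__":
    print(unconstrained, len(valid))
    for scenario in valid:
        print(" ".join(scenario))
```

The Cartesian product yields the triples in the same order as the table column, starting with M1 N2 P3 and ending with M3 N3 P4.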
Figure 5. Schematic map of the model ontology. (a) new observation that criticizes or updates
observations reported in previous work (b) new method that provides a different approach for
solving one of the field’s problems that has already been solved in previous work (c) new
correlation model that is different from existing correlation models in previous work
Paradigm shift modular ontology
This is the primary modular ontology of K-Ontology. The function of this modular ontology is
to clearly differentiate articles which drive paradigm shifts. A class-level schematic map is
shown in figure 6. This ontology has one class type for new paradigm and another class type
for the objects of the first type. We argue that a new paradigm can only be defined by at least
one new correlation model and one new theoretical argument. New observations are also
associated with new paradigms; however, they are not necessary to define one. It is possible,
especially in theoretical fields such as mathematics and computer science, to reach a new
paradigm through existing methods and observations. This ontology focuses on
the scenarios where a certain article develops, introduces or verifies a new paradigm. The number
of such scenarios is calculated as $C^{8}_{3} = \frac{8!}{3! \times 5!} = 56$ scenarios. There are 18 valid scenarios of
this ontology, which are detailed in supplementary table 1.
Figure 6. Schematic map of the paradigm shift ontology
DISCUSSION
Scientists have been communicating their ideas, theories, methods and achievements
through scholarly articles since the 17th century. The epistemic framework of STM research,
as manifested in empiricism, logical positivism and scientific realism, depends on rigorous
measures, criteria and procedures to present and understand scientific knowledge. These
measures, criteria and procedures are essential elements of an STM scholarly article, regardless
of its topic and discipline. The first step of any research in natural sciences is reviewing the
literature and previous work of relevance and importance to the research topic. Therefore, what
researchers need the most is to discover relevant literature as early as possible in their research
process. The principal elements of the circulation of scholarly articles among scientific communities
have radically changed with the emergence of the World Wide Web and rapidly developing
communications technology. The global scientific output, in terms of published peer-reviewed
articles, doubles every nine years (footnote 1), with an annual total increase of STM papers exceeding 2.5
million articles (footnote 2). In the meantime, natural science, both pure and applied, faces new challenges
every day. Climate change, viral pandemics, poverty, and sustainability are just a few examples
of such challenges. Now, the scientific community must find a way to exploit
the full potential of what is being published in peer-reviewed journals.
Scientists often look for the best (i.e. most rigorous) journals to read and to publish their
research in. Impact Factor is the primary method of evaluating the scientific merit of scholarly
journals, with new trends towards the h-index and its other citation-based derivatives. In addition
to the severe criticism that has been levelled at Impact Factor21-25, it is now generally accepted among
STM scientists that not all citations are equal26, 27. The discourse on science disciplinarity plays an
important role in this regard. Citation-based bibliometric analysis and evaluation often overlook
disciplines, either by neglecting them altogether or by normalizing the indices of one article/journal by
the total value of indices from its discipline. Disciplinary classification is always performed
based on keyword matching and metadata tags and annotations. Therefore, paradigms are
only viewed within the limits of such classification.
Footnote 1: http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html
Footnote 2: Ware, M. and Mabe, M., The STM Report: an overview of scientific and scholarly journal publishing, 4th edition, 2015, International Association of Scientific, Technical and Medical Publishers
The proposed K-Ontology unfolds research articles to reveal their role in the Kuhnian
cycle. If it becomes possible to envision the scientific merit of an article, or collection of
articles, based on the Kuhnian cycle, most of the problems of the citation-based bibliometric
scoring could be resolved. Identification and conception of an existing paradigm via well-defined
class types, classes and objects would empower scientists to identify the paradigm-setting
articles which they need for posing their problem statements. Correct understanding of
the concept of models and its differentiation from the more inclusive concept of paradigm
would provide means of evaluating an article’s contribution based on the different scenarios. The
strict criteria set for identifying new paradigms would provide new frontiers of endeavour in all
STM fields of research. The K-Ontology yields a total of 48 valid scenarios (18 + 12 + 18
across the three modular ontologies), which principally describe 48 different levels of scientific merit. It must be noted,
however, that the reference for these levels is the Kuhnian cycle.
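As an illustration of how the 48 scenarios partition across the three modular ontologies, the following sketch rebuilds the scenario sets from the labels in supplementary table 1 and looks up the ontology to which a given triple belongs. The names `SCENARIOS` and `classify` are hypothetical, and the lookup is only a reading of the table, not the reasoning procedure of the K-Ontology itself.

```python
from itertools import product

M = ["M1", "M2", "M3"]

# Scenario sets rebuilt from the three columns of supplementary table 1.
SCENARIOS = {
    "formalism": set(product(M, ["N1", "N2", "N3"], ["P1", "P2"])),
    "model": set(product(M, ["N2", "N3"], ["P3", "P4"])),
    "paradigm shift": set(product(M, ["N1", "N2", "N3"], ["P5", "P6"])),
}


def classify(triple):
    """Return the modular ontology whose column lists `triple`, or None."""
    for name, scenarios in SCENARIOS.items():
        if triple in scenarios:
            return name
    return None


# Sum of the three column sizes.
total = sum(len(s) for s in SCENARIOS.values())

if __name__ == "__main__":
    print(total)
    print(classify(("M2", "N3", "P4")))
    print(classify(("M1", "N1", "P3")))
```

The last lookup returns `None` because the model ontology column contains no scenario with N1, which is why the total is 48 and not 54.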
CONCLUSION AND FUTURE WORK
This is the first Kuhnian ontology to empower epistemic and paradigmatic classification of
STM scholarly articles. We propose K-Ontology which has three modular ontologies to detect
and classify existing paradigms, model revolutions and paradigm shifts. The proposed ontology
has 48 valid scenarios which can be detected by deducing the mathematical, logical and
syntactic correlations within and between the three modular ontologies. Our research at Sophio
Inc. focuses on implementing this ontology in a tool that can be used to conduct
epistemic evaluation of STM articles.
REFERENCES
1. Kuhn TS. The Structure of Scientific Revolutions. University of Chicago Press, 1970.
2. Arthur WB. The Nature of Technology: What It Is and How It Evolves. Free Press, 2009.
3. Richards RJ and Daston L. Kuhn's 'Structure of Scientific Revolutions' at Fifty:
Reflections on a Science Classic. University of Chicago Press, 2016.
4. Fuller S. Kuhn Vs Popper: The Struggle for the Soul of Science. Icon Books, 2006.
5. Giere RN. Philosophy of Science Naturalized. Philosophy of Science. 1985; 52: 331-56.
6. Osterloh M and Kieser A. Double-Blind Peer Review: How to Slaughter a Sacred Cow.
In: Welpe IM, Wollersheim J, Ringelhan S and Osterloh M, (eds.). Incentives and
Performance: Governance of Research Organizations. Cham: Springer International
Publishing, 2015, p. 307-21.
7. Jelicic M and Merckelbach H. Peer-review: Let's imitate the lawyers! Cortex. 2002; 38:
406-7.
8. Csiszar A. Peer review: Troubled from the start. Nature. 2016; 532: 306-8.
9. Fairchild D. Revolution and Cause: An Investigation of an Explanatory Concept. The
New Scholasticism. 1976; 50: 277-92.
10. Constantin A, Peroni S, Pettifer S, Shotton D and Vitali F. The document components
ontology (DoCO). Semantic web. 2016; 7: 167-81.
11. Elizarov A, Khaydarov S and Lipachev E. Scientific documents ontologies for semantic
representation of digital libraries. 2017 Second Russia and Pacific Conference on
Computer Technology and Applications (RPC). IEEE, 2017, p. 1-5.
12. Senderov V, Simov K, Franz N, et al. OpenBiodiv-O: ontology of the OpenBiodiv
knowledge management system. Journal of Biomedical Semantics. 2018; 9: 5.
13. Jaradeh MY, Auer S, Prinz M, Kovtun V, Kismihók G and Stocker M. Open research
knowledge graph: Towards machine actionability in scholarly communication. arXiv
preprint arXiv:1901.10816. 2019.
14. Auer S, Kovtun V, Prinz M, Kasprzik A, Stocker M and Vidal ME. Towards a knowledge
graph for science. Proceedings of the 8th International Conference on Web Intelligence,
Mining and Semantics. 2018, p. 1-6.
15. Ehrlinger L and Wöß W. Towards a Definition of Knowledge Graphs. Joint Proceedings
of the Posters and Demos Track of 12th International Conference on Semantic Systems
- SEMANTiCS2016 and 1st International Workshop on Semantic Change & Evolving
Semantics (SuCCESS16). Leipzig, Germany, 2016.
16. Girju R and Moldovan DI. Text mining for causal relations. FLAIRS conference. 2002,
p. 360-4.
17. Hlaváčková-Schindler K, Paluš M, Vejmelka M and Bhattacharya J. Causality detection
based on information-theoretic approaches in time series analysis. Physics Reports. 2007;
441: 1-46.
18. Dasgupta T, Saha R, Dey L and Naskar A. Automatic extraction of causal relations from
text using linguistically informed deep neural networks. Proceedings of the 19th Annual
SIGdial Meeting on Discourse and Dialogue. 2018, p. 306-16.
19. Miller S, Fox H, Ramshaw L and Weischedel R. A novel use of statistical parsing to
extract information from text. 1st Meeting of the North American Chapter of the
Association for Computational Linguistics. 2000.
20. Paukkeri M-S, García-Plaza AP, Fresno V, Unanue RM and Honkela T. Learning a
taxonomy from a set of text documents. Applied Soft Computing. 2012; 12: 1138-48.
21. Brumback RA. Worshiping false idols: the impact factor dilemma. Journal of child
neurology. 2008; 23: 365-7.
22. Ortner HM. The impact factor and other performance measures – much used with little
knowledge about. International Journal of Refractory Metals and Hard Materials. 2010;
28: 559-66.
23. Vanclay JK. Impact factor: Outdated artefact or stepping-stone to journal certification?
Scientometrics. 2012; 92: 211-38.
24. Coleman A. Assessing the value of a journal beyond the impact factor. Journal of the
American Society for Information Science and Technology. 2007; 58: 1148-61.
25. Hicks D, Wouters P, Waltman L, De Rijcke S and Rafols I. Bibliometrics: the Leiden
Manifesto for research metrics. Nature. 2015; 520: 429-31.
26. Zhu X, Turney P, Lemire D and Vellino A. Measuring academic influence: Not all
citations are equal. Journal of the Association for Information Science and Technology.
2015; 66: 408-27.
27. Glänzel W and Thijs B. Does co-authorship inflate the share of self-citations?
Scientometrics. 2004; 61: 395-404.
Supplementary table 1. Detailed possible scenarios of the modular ontologies
Formalism ontology scenarios    Model ontology scenarios    Paradigm shift ontology scenarios
M1 N1 P1                        M1 N2 P3                    M1 N1 P5
M1 N1 P2                        M1 N2 P4                    M1 N1 P6
M1 N2 P1                        M1 N3 P3                    M1 N2 P5
M1 N2 P2                        M1 N3 P4                    M1 N2 P6
M1 N3 P1                        M2 N2 P3                    M1 N3 P5
M1 N3 P2                        M2 N2 P4                    M1 N3 P6
M2 N1 P1                        M2 N3 P3                    M2 N1 P5
M2 N1 P2                        M2 N3 P4                    M2 N1 P6
M2 N2 P1                        M3 N2 P3                    M2 N2 P5
M2 N2 P2                        M3 N2 P4                    M2 N2 P6
M2 N3 P1                        M3 N3 P3                    M2 N3 P5
M2 N3 P2                        M3 N3 P4                    M2 N3 P6
M3 N1 P1                        -                           M3 N1 P5
M3 N1 P2                        -                           M3 N1 P6
M3 N2 P1                        -                           M3 N2 P5
M3 N2 P2                        -                           M3 N2 P6
M3 N3 P1                        -                           M3 N3 P5
M3 N3 P2                        -                           M3 N3 P6