E N I D - Roma, September 7th-9th, 2011

Upcoming concepts in a specific scientific discipline: an analysis based on a categorisation of the related terminology

Marianne Hörlesberger, Beatrix Wepner, Edgar Schiebel

Ivana Roche, Christine Louala, Claire François, Nathalie Antonot, Dominique Besagni

Georg Vorlaufer


Summary

• Objective
• Data collection
• Methodology
  – Statistical evaluation
  – Diffusion model
  – Field indicators
• Discussion & Perspectives


Objective

Produce a methodology for analysing the evolution of a specific scientific domain by studying its terminology, extracted from the related specialized international literature


Data collection

• Data source: PASCAL database
• Period: 2001 to 2011
• Corpus: 19,090 bibliographic references
• Field selection: 136 identified fields reduced to 19 final fields, through examination of classification categories and analysis by scientific experts
• The 19 final fields represent the whole Tribology domain (without loss of information)

[Diagram: the Tribology domain comprises TRIBOLOGY itself, fields applying Tribology issues, and fields facilitating new issues in Tribology]


Statistical evaluation of the fields (1/2)

Selection, among the 19 defined fields, of the most dynamic ones, i.e. those showing a steady and consistent development over time, by studying the evolution of their annual productivity:

• by defining 2 growth indexes giving a straightforward comparison of the two endpoints of the period under observation, namely 2001 to 2011 and 2001 to 2005

• by calculating the average annual growth rate, which takes yearly changes into account

• by introducing the Sharpe Ratio, which, unlike the two growth indexes, takes into account the homogeneity or stability of the growth

Production of a composite indicator from the rankings obtained on each indicator, equally weighted, giving a meta-ranking
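
The slides do not give the formulas behind these four indicators, so the following is only a minimal sketch under stated assumptions: growth indexes are taken as simple end/start ratios, the Sharpe Ratio as mean yearly growth divided by its standard deviation (no risk-free rate), and the composite indicator as the plain sum of the four ranks. The field names and yearly counts are invented for illustration.

```python
from statistics import mean, stdev

def growth_index(counts, start, end):
    """Endpoint comparison: productivity in `end` relative to `start`."""
    return counts[end] / counts[start]

def yearly_rates(counts):
    """Year-on-year relative changes of the annual productivity."""
    years = sorted(counts)
    return [counts[b] / counts[a] - 1 for a, b in zip(years, years[1:])]

def sharpe_ratio(counts):
    """Assumed form: mean yearly growth divided by its standard deviation."""
    rates = yearly_rates(counts)
    return mean(rates) / stdev(rates)

def rank_desc(scores):
    """Rank 1 goes to the highest score; tied scores share a rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {field: ordered.index(s) + 1 for field, s in scores.items()}

def composite_indicator(productivity):
    """Sum of the four ranks, equally weighted (lower means more dynamic)."""
    indicators = [
        {f: growth_index(c, 2001, 2011) for f, c in productivity.items()},
        {f: growth_index(c, 2001, 2005) for f, c in productivity.items()},
        {f: mean(yearly_rates(c)) for f, c in productivity.items()},
        {f: sharpe_ratio(c) for f, c in productivity.items()},
    ]
    ranks = [rank_desc(ind) for ind in indicators]
    return {f: sum(r[f] for r in ranks) for f in productivity}

def series(values, first_year=2001):
    """Helper: map a list of yearly counts to {year: count}."""
    return {first_year + i: v for i, v in enumerate(values)}

# Hypothetical yearly reference counts for three invented fields.
productivity = {
    "steady": series([10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25]),
    "erratic": series([10, 20, 10, 20, 12, 25, 10, 22, 11, 24, 12]),
    "flat": series([20, 19, 21, 20, 19, 21, 20, 19, 21, 20, 18]),
}
scores = composite_indicator(productivity)
meta_ranking = sorted(scores, key=scores.get)  # most dynamic field first
```

In this invented data the "erratic" field has the highest average yearly growth but a low Sharpe Ratio, which is the kind of unstable growth the stability indicator is meant to penalize.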


Statistical evaluation of the fields (2/2)

The analysis of these results by a scientific expert did not lead to the removal of any field

Field ID | Field name | Composite indicator | Meta-ranking
13 | General mechanical engineering and machine design | 16 | 1
9 | Polymers | 24 | 2
10 | Metals: production techniques and joining | 27 | 3
4 | Mechanics of solids. Solid Earth physics | 29 | 4
11 | Corrosion | 36 | 5
6 | Chemistry | 38 | 6
5 | Material science | 38 | 7
15 | Drives | 38 | 8
7 | Energy. Electric power engineering | 39 | 9
14 | Machine components. Friction, wear, lubrication | 39 | 10
2 | Metrology | 40 | 11
17 | Precision engineering | 40 | 12
12 | Metals: mechanical properties, tribology | 43 | 13
18 | Buildings. Public works. Transportation | 46 | 14
3 | Condensed matter | 46 | 15
19 | Biological and medical sciences | 50 | 16
16 | Engines. Pumps. Steel design | 53 | 17
1 | General physics | 58 | 18
8 | Electronics. Information theory | 60 | 19


Complementary assessment by clustering

[Figure: clustering result, with the cluster labels “Tribological properties of materials”, “Industrial applications” and “Lubricants”]


Diffusion model – A heuristic approach (1/2)

In-depth analysis of the evolution of the fields, to evaluate the status of a term in a given field by measuring its degree of diffusion

Stage 1: unusual terms index few publications; new terms, and imported terms that are well known in other fields, form a set of strongly exotic terms
Stage 2: established terms can begin to diffuse to other fields; the number of occurrences begins to grow
Stage 3: cross-section terms, highly established, show a broad diffusion in other fields. This is the stage of highest maturity, with a strongly growing number of occurrences
Stage 4: a new paradigm emerges and the number of publications about the “old” invention declines

[Figure: number of bibliographic references (number of articles) over time, with Stage 1, Stage 2, Stage 3 and Stage 4 marked along the time axis]


Diffusion model – A heuristic approach (2/2)

• Definition of Home Technology terms (HT terms)
  – keywords that are specific to a field occur with a higher probability in that field than in the others
  – the probability is defined as a relative term frequency (rtfField): the frequency of a term in a field divided by the number of bibliographic references in this field
  – for each term, after calculating its rtfField in each field, the field with the highest probability is declared to be its Home Technology

In fine: each field gets the list of its HT terms, and each term gets a Home Technology
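
The Home Technology assignment can be sketched as follows. This is an illustration only: it assumes the "frequency of a term in a field" is the number of references of that field indexed with the term, and it breaks ties by taking the first field encountered, which the slides do not specify. The field names and keyword sets are invented.

```python
from collections import Counter

def relative_term_frequencies(refs_by_field):
    """refs_by_field maps a field to a list of keyword sets, one per reference.

    Returns {field: {term: rtfField}}, where rtfField is the number of
    references indexed with the term divided by the number of references.
    """
    rtf = {}
    for field, refs in refs_by_field.items():
        counts = Counter(term for ref in refs for term in set(ref))
        rtf[field] = {term: n / len(refs) for term, n in counts.items()}
    return rtf

def home_technology(rtf):
    """Assign each term to the field where its rtfField is highest.

    Ties go to the first field in iteration order (an arbitrary choice).
    """
    terms = {t for freqs in rtf.values() for t in freqs}
    return {t: max(rtf, key=lambda f: rtf[f].get(t, 0.0)) for t in terms}

# Invented toy corpus: two fields, a handful of references each.
refs_by_field = {
    "Polymers": [{"wear", "copolymer"}, {"copolymer"}, {"friction"}],
    "Drives": [{"wear", "ball screw"}, {"ball screw", "friction"}],
}
rtf = relative_term_frequencies(refs_by_field)
home = home_technology(rtf)

# List of HT terms per field, as described in the "In fine" remark.
ht_terms = {f: sorted(t for t, h in home.items() if h == f) for f in rtf}
```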

• Use of the Gini index as a measure of the dispersion of a term in a scientific domain:
  – Gini index = 0 means a completely uniform distribution and indicates that the term occurs in all 19 considered fields of the Tribology domain
  – Gini index = 1 tells us that the term is limited to the single field where it appears
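
A possible way to compute such an index is sketched below. Two points are assumptions, not taken from the slides: the index is computed over the term's rtfField values in the 19 fields, and the classical Gini coefficient is rescaled by n/(n-1) so that a term present in a single field reaches exactly 1 (the unscaled coefficient would stop at 18/19).

```python
def gini(values):
    """Normalized Gini index of non-negative values.

    0 means the values are uniform across all fields,
    1 means a single field carries the whole mass.
    """
    n = len(values)
    total = sum(values)
    if n < 2 or total <= 0:
        raise ValueError("need at least two fields and a non-zero total")
    xs = sorted(values)
    # Classical formula on sorted data: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    raw = 2 * weighted / (n * total) - (n + 1) / n
    # Assumed rescaling so that the maximum is 1 instead of (n-1)/n.
    return raw * n / (n - 1)

# Hypothetical rtfField profiles of three terms over 19 fields.
uniform_term = [0.02] * 19
single_field_term = [0.0] * 18 + [0.3]
mixed_term = [0.0] * 10 + [0.01] * 8 + [0.2]
```

With these profiles, `gini(uniform_term)` is 0 and `gini(single_field_term)` is 1 up to floating-point rounding, and `gini(mixed_term)` falls in between.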


Categorizing the HT terms

For each field, its HT terms are classified into 4 categories:

• the terms occurring once
  – analyzed with a diachronic approach that distinguishes the “old” concepts, appearing at the beginning of the studied period, from the very “new” ones, appearing in its latest years

• the remaining terms whose Gini index is lower than or equal to a fixed threshold, considered as cross-section

• the remaining terms whose relative term frequency in the field (rtfField) is greater than a fixed threshold, considered as established

• the last terms, considered as unusual
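
The four-way classification reads as a simple decision cascade, sketched below. The threshold values `gini_max` and `rtf_min` and the cut-off year `recent_from` are hypothetical placeholders: the slides only say that the thresholds are fixed, without giving their values.

```python
def categorize(occurrences, gini_index, rtf_field, gini_max=0.5, rtf_min=0.01):
    """Classify one HT term of a field; the order of the tests matters."""
    if occurrences == 1:
        return "mono-occurrential"
    if gini_index <= gini_max:      # widely spread over the fields
        return "cross-section"
    if rtf_field > rtf_min:         # frequent in its home field
        return "established"
    return "unusual"

def is_new(year, recent_from=2010):
    """Diachronic split of mono-occurrential terms (assumed cut-off year)."""
    return year >= recent_from

# Invented examples: (occurrences, Gini index, rtfField).
examples = {
    "term A": (1, 0.95, 0.004),
    "term B": (12, 0.30, 0.002),
    "term C": (12, 0.80, 0.050),
    "term D": (3, 0.80, 0.001),
}
categories = {name: categorize(*args) for name, args in examples.items()}
```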


Results: HT terms occurring once by field

Mono-occurrential HT terms by field

ID number & Name of fields | Total Nb | Appearing in 2010-2011: Number | Appearing in 2010-2011: %
01-General physics | 214 | 10 | 5%
02-Metrology | 129 | 22 | 17%
03-Condensed matter | 232 | 22 | 9%
04-Mechanics of solids | 89 | 20 | 22%
05-Material sciences | 176 | 3 | 2%
06-Chemistry | 253 | 40 | 16%
07-Energy. Electrical power engineering | 247 | 33 | 13%
08-Electronics. Information theory | 136 | 7 | 5%
09-Polymers | 199 | 20 | 10%
10-Metals: production techniques and joining | 164 | 10 | 6%
11-Corrosion | 133 | 17 | 13%
12-Metals: mechanical properties, tribology | 98 | 4 | 4%
13-General mechanical engineering and machine design | 86 | 8 | 9%
14-Machine components. Friction, wear, lubrication | 162 | 11 | 7%
15-Drives | 56 | 8 | 14%
16-Engines. Pumps. Steel design | 109 | 4 | 4%
17-Precision engineering | 128 | 20 | 16%
18-Buildings. Public works. Transportation | 337 | 25 | 7%
19-Biological and medical sciences | 253 | 22 | 9%

Three fields have the highest rates of HT terms occurring once in the last two years of the considered period: “Mechanics of solids”, “Metrology” and “Precision engineering”


Results: HT terms occurring once in « Mechanics of solids » by year

[Bar chart: number of HT terms occurring once, by year]

Year | Number of terms
2001 | 10
2002 | 2
2003 | 11
2004 | 6
2005 | 11
2006 | 11
2007 | 9
2008 | 4
2009 | 5
2010 | 19
2011 | 1


HT terms of « Precision engineering » by category

HT terms in the field by category

Mono-occurrential | Year | Unusual | Established | Cross-section
Electronic component | 2011 | Spherical bearing | Silicon | Assembly
Thermodynamic analysis | 2011 | Knudsen number | Rough surface | Quality
Atomization | 2010 | Involute tooth | Microelectromechanical device | Manufacturing
Castor oil | 2010 | Limit cycle | Gas bearing | Hostile environment
Cyclic copolymer | 2010 | Differential pressure | Wafer | Array
Deformation mode | 2010 | Dissimilar materials | Externally pressurized gas lubrication | Aspect ratio
Elasticity theory | 2010 | Forming limit curve | Micromachine | Manufacturing process
Linear machine | 2010 | Stochastic process | Single crystal | Design
Nanodot | 2010 | Ultrasonic machining | Precision engineering | Flexibility
Nanoplatelet | 2010 | Ball screw | Polycrystal | Thin sheet
Non destructive method | 2010 | Turbo pump | Electrical discharge machining | Suspension
Olefin copolymer | 2010 | Chemical etching | Electromechanical device | Surface cleaning
Plate electrode | 2010 | Potassium | Relative humidity | Production process
SAN | 2010 | Gaussian distribution | Mechanical polishing | Starting
Selective etching | 2010 | Copper alloy | Stiction | Positioning
Sheet electrode | 2010 | Axial speed | Diamond tool | Reproducibility
Shock wave | 2010 | Surface fatigue | Polysilicon | GaP
Stepping motor | 2010 | Adhesion work | Chemical polishing | Process control
Synchronous motor | 2010 | Crystal face | Hydrostatic bearings | Repeatability
Water corrosion | 2010 | Damper | Theoretical model | Plate


Fields’ terminology indicators (1/3)

To characterize the terminology used in each field, we define a set of indicators.

Indicators dealing with strictly local characteristics of the field terminology:

• diversity, defined as the ratio between the number of indexing keywords of the field's bibliographic references and its productivity. The higher this value, the more diverse the terms used

• specificity, corresponding to the proportion of the field's HT terms with respect to the number of indexing keywords of its bibliographic references. The lower this value, the larger the share of the field terminology imported from other fields
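
Both local indicators are plain ratios, as the sketch below shows. It assumes that "number of indexing keywords" means the number of distinct keywords used in the field and that "productivity" is the field's number of bibliographic references; the sample keywords are invented.

```python
def diversity(keywords, n_references):
    """Distinct indexing keywords of the field per bibliographic reference."""
    return len(keywords) / n_references

def specificity(ht_terms, keywords):
    """Share of the field's distinct keywords that are its own HT terms."""
    return len(ht_terms) / len(keywords)

# Invented field: 4 distinct keywords over 2 references, 1 of them an HT term.
field_keywords = {"wear", "friction", "ball screw", "lubricant"}
field_ht_terms = {"ball screw"}
field_diversity = diversity(field_keywords, n_references=2)
field_specificity = specificity(field_ht_terms, field_keywords)
```

Here three quarters of the field's keywords have their Home Technology elsewhere, which corresponds to a low specificity.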


Fields’ terminology indicators (2/3)

Indicators taking into account the field’s term exchanges with all the other fields:

• singularity, defined as the proportion of the field's HT terms without any occurrence in the other fields. The lower this value, the fewer of these so-called lonesome terms, i.e. HT terms that are neither exported nor imported and occur exclusively in the field

• normalized trade balance, giving a measure of the balance between the HT terms exported by the field and the terms it imports from other fields. The values range from -1 (the field exports none of its HT terms) to 1 (the field imports no term); a zero value means a trade balance in perfect equilibrium
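
The slides give the range of the normalized trade balance but not its formula. The sketch below assumes the usual form (E - I) / (E + I), with E the number of exported HT terms and I the number of imported terms, which matches the stated endpoints. An HT term counts as exported when it occurs in at least one other field. Field names and keywords are invented.

```python
def exchange_sets(field, keywords_by_field, home):
    """Split a field's keywords into HT, exported, imported and lonesome terms."""
    own = keywords_by_field[field]
    elsewhere = set().union(
        *(kw for f, kw in keywords_by_field.items() if f != field)
    )
    ht = {t for t in own if home[t] == field}
    exported = ht & elsewhere            # HT terms also used by other fields
    imported = {t for t in own if home[t] != field}
    lonesome = ht - elsewhere            # HT terms used nowhere else
    return ht, exported, imported, lonesome

def singularity(field, keywords_by_field, home):
    ht, _, _, lonesome = exchange_sets(field, keywords_by_field, home)
    return len(lonesome) / len(ht)

def trade_balance(field, keywords_by_field, home):
    """Assumed formula: (exported - imported) / (exported + imported)."""
    _, exported, imported, _ = exchange_sets(field, keywords_by_field, home)
    total = len(exported) + len(imported)
    return (len(exported) - len(imported)) / total if total else 0.0

# Invented toy domain with three fields.
keywords_by_field = {
    "A": {"a1", "a2", "b1"},
    "B": {"b1", "b2", "a1"},
    "C": {"a1", "c1"},
}
home = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
```

In this toy domain, field "C" imports one term and exports none, so its balance is -1, while field "A" exports one term and imports one, giving a balance of 0.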


Fields’ terminology indicators (3/3)

• diffusion capacity, given by the average of the Gini indexes of all the HT terms of the field. An average Gini index equal to 0 means a completely uniform distribution of all HT terms and indicates that each HT term occurs in all the other fields. An average Gini index of 1 tells us that all HT terms are very specific and occur only in this field

• impact, defined as a Hirsch index. We say that a field has an h-index of X if X of its HT terms are each imported by at least X other fields. These X keywords form the h-core list of the field. By analogy, this Hirsch index can help appraise the influence of the considered field on the others
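
As an illustration of these two indicators, the sketch below computes a field's h-index from the number of other fields importing each of its HT terms, and its diffusion capacity as the mean of the per-term Gini indexes. The input values are invented, and the Gini indexes are assumed to have been computed beforehand.

```python
from statistics import mean

def h_index(import_counts):
    """Largest X such that X HT terms are each imported by at least X fields."""
    ranked = sorted(import_counts, reverse=True)
    # In descending order the test c >= i holds for a prefix only,
    # so counting the positions where it holds gives the h-index.
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def diffusion_capacity(gini_by_term):
    """Average Gini index over all HT terms of the field."""
    return mean(gini_by_term.values())

# Invented field: number of importing fields for each of its HT terms.
imports_per_term = [5, 3, 3, 1]
field_h = h_index(imports_per_term)

# With 19 fields in total, a term can be imported by at most 18 others,
# which caps the h-index at 18.
saturated_h = h_index([18] * 25)

field_diffusion = diffusion_capacity({"term A": 0.5, "term B": 1.0})
```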


Results: Trade balance indicator

ID number & Field name | Balance | Rank
06-Chemistry | -0.34 | 1
18-Buildings. Public works. Transportation | -0.49 | 2
11-Corrosion | -0.52 | 3
16-Engines. Pumps. Steel design | -0.55 | 4
17-Precision engineering | -0.57 | 5
09-Polymers | -0.68 | 6
07-Energy. Electrical power engineering | -0.70 | 7
13-General mechanical engineering and machine design | -0.71 | 8
19-Biological and medical sciences | -0.72 | 9
08-Electronics. Information theory | -0.76 | 10
02-Metrology | -0.78 | 11
03-Condensed matter | -0.78 | 12
10-Metals: production techniques and joining | -0.79 | 13
15-Drives | -0.79 | 14
01-General physics | -0.80 | 15
12-Metals: mechanical properties, tribology | -0.85 | 16
04-Mechanics of solids | -0.86 | 17
05-Material sciences | -0.89 | 18
14-Machine components. Friction, wear, lubrication | -0.99 | 19

range = [-1, 1]

- all the values of the trade balance indicator are negative: this means that all fields import more terms than they export
- the calculation of the number of exported terms does not consider how many times each term is exported: when the number of exportations of each term is taken into account in the calculation, half of the fields get a positive value of the trade balance


Results: h-index

ID number & Field name | h-index | Rank
04-Mechanics of solids | 18 | 1
09-Polymers | 18 | 1
10-Metals: production techniques and joining | 18 | 1
11-Corrosion | 18 | 1
13-General mechanical engineering and machine design | 18 | 1
15-Drives | 18 | 1
16-Engines. Pumps. Steel design | 18 | 1
17-Precision engineering | 18 | 1
02-Metrology | 17 | 9
03-Condensed matter | 17 | 9
06-Chemistry | 17 | 9
08-Electronics. Information theory | 17 | 9
12-Metals: mechanical properties, tribology | 17 | 9
18-Buildings. Public works. Transportation | 17 | 9
19-Biological and medical sciences | 17 | 9
05-Material sciences | 16 | 16
07-Energy. Electrical power engineering | 16 | 16
01-General physics | 15 | 18
14-Machine components. Friction, wear, lubrication | 6 | 19

- eight fields get the maximum possible value of the h-index, namely 18: this means that they export 18 of their HT terms to all the other 18 fields considered in the study
- the value of the average Gini index allows us to qualify this observation: for instance, “Precision engineering” has an average Gini index of 0.66, meaning that this field exports its terms more uniformly than “Polymers”, whose average Gini index is 0.84


Results: Singularity indicator

ID number & Field name | Singularity | Rank
18-Buildings. Public works. Transportation | 0.175 | 1
19-Biological and medical sciences | 0.150 | 2
07-Energy. Electrical power engineering | 0.050 | 3
09-Polymers | 0.046 | 4
14-Machine components. Friction, wear, lubrication | 0.039 | 5
05-Material sciences | 0.037 | 6
01-General physics | 0.034 | 7
03-Condensed matter | 0.031 | 8
08-Electronics. Information theory | 0.024 | 9
10-Metals: production techniques and joining | 0.022 | 10
12-Metals: mechanical properties, tribology | 0.016 | 11
06-Chemistry | 0.015 | 12
04-Mechanics of solids | 0.010 | 13
15-Drives | 0.007 | 14
02-Metrology | 0.006 | 15
13-General mechanical engineering and machine design | 0.004 | 16
16-Engines. Pumps. Steel design | 0.003 | 17
17-Precision engineering | 0.003 | 18
11-Corrosion | 0.002 | 19

range = [0, 1]

- the singularity indicator gives the proportion of lonesome HT terms
- the high values of the singularity indicator obtained by “Buildings. Public works. Transportation” and “Biological and medical sciences” indicate their rather “independent” character in the context of this study: these fields are definitely located downstream in our defined Tribology domain and are thus considered applied fields


Discussion & Perspectives

• The results are available on a web server to facilitate their assessment by scientific experts from the AC2T, providing
  – a direct link from a term to the related bibliographic reference(s), thus allowing the contextualization of the terminological information

• The AC2T in-depth analysis of the results: a “virtuous circle”
  – contributes to a better understanding of the evolution of the studied domain and of its relationships in a multidisciplinary context
  – allows verifying whether our approaches are complementary
  – helps assess and improve our methodology and generates new developments based on real needs

• Some improvements are under study:
  – developing a step of assisted terminological extraction prior to indexing, in order to better represent the very recent concepts not yet introduced in our terminological reference tables
  – extending the diachronic analysis to the terms occurring more than once in a field
  – adding an a priori categorization of the terms known to belong, without doubt, to the “core” field
  – adding a stop list of contextual “general science” terminology


Thank you

[ivana.roche; christine.louala; nathalie.antonot; claire.francois; dominique.besagni]@inist.fr
[marianne.horlesberger; beatrix.wepner; edgar.schiebel]@ait.ac.at
