April 21, 2023
Literature Search
http://www.pds.ewi.tudelft.nl/~iosup/Courses/2012_aiosup_lit_search.ppt
IN3305
Alexandru Iosup. Initial slides by Tomas Klos. Course manager: Peter van Nieuwenhuizen.
Parallel and Distributed Systems Group
http://www.pds.ewi.tudelft.nl/
Literature Surveys: At the Core of Innovation
Given a problem (topic of interest), answer questions about it:
• What solutions exist?
• What is the most influential solution?
• What is the rate of innovation in the field?
• Where and how can I innovate?
Answer them by surveying (understanding, interpreting, and summarizing) the body of related (scientific) knowledge.
IN3305’s study goal: “kennismaken met wetenschappelijke literatuur” (getting acquainted with scientific literature)
Innovation is a Vital Competitive Tool
• Innovation = novel application of knowledge
• Innovation favors small (but efficient) countries
• High-tech companies tend to be more innovation-intensive
Source: Economist Intelligence Unit, A new ranking of the world’s most innovative countries, April 2009, http://graphics.eiu.com/PDF/Cisco_Innovation_Complete.pdf
What is Novel? The Overwhelming Growth of Knowledge
“When 12 men founded the Royal Society in 1660, it was possible for an educated person to encompass all of scientific knowledge. […] In the last 50 years, such has been the pace of scientific advance that even the best scientists cannot keep up with discoveries at frontiers outside their own field.” (Tony Blair, PM speech, May 2002)
[Figure: number of publications, 1993-1997 vs. 1997-2001. Data: King, The scientific impact of nations, Nature’04.]
The “Size” of a Research Topic
• Grid Computing
  • Billions of $ in research investment
  • 2,500 PhDs (my est.)
  • Over 15,000 scientific publications (my est.) in 15 years
  • Several surveys of 100-200 articles each
• Grid Scheduling
  • Conferences: Grid, CCGrid, HPDC, SC, IPDPS, ICDCS, …
  • Journals: TPDS, CCPE, FGCS, JoGC, …
• Peer-to-Peer Search Methods
  • Survey of over 300 articles after 5 years of research
How to Talk About Books You Haven’t Read
• “There is more than one way not to read”
  • Not opening the book
• You cannot read everything
  • How many books can a librarian read?
  • How many books can you read? Let’s estimate
• Librarians can talk about every book in the library (every book out of millions)
There exists a system to (not) read
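A back-of-the-envelope version of the “let’s estimate” question. The reading rate, reading lifetime, and library size below are illustrative assumptions, not figures from the lecture; change them to see how little of a large library one person can cover.

```python
# Back-of-the-envelope estimate: how much of a library can one person read?
# All three inputs are illustrative assumptions, not data from the lecture.
BOOKS_PER_WEEK = 1          # assumed reading rate
READING_YEARS = 60          # assumed length of a reading life
LIBRARY_SIZE = 1_000_000    # assumed size of a large library ("millions" of books)


def lifetime_books(books_per_week: float, years: int) -> float:
    """Total books read at a constant weekly rate over a number of years."""
    return books_per_week * 52 * years


def fraction_of_library(read: float, library_size: int) -> float:
    """Share of the library's holdings covered by the books read."""
    return read / library_size


read = lifetime_books(BOOKS_PER_WEEK, READING_YEARS)
share = fraction_of_library(read, LIBRARY_SIZE)
print(f"{read:.0f} books, i.e. {share:.2%} of the library")
# prints: 3120 books, i.e. 0.31% of the library
```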
Outline
1. From the IN3305 study goals:
   1. “kennismaken met wetenschappelijke literatuur” (getting acquainted with scientific literature)
   2. To read or not to read?
   3. What is “scientific literature”? (input and output)
   4. Measuring and assessing quality
   5. Useful sites and tools
   6. On gaming the citation indices (unethical)
   7. Conclusion
Literature = input
• Citations
  • Place your work in context
  • Give credit to previous work
  • Support your arguments
  • Show your marginal contribution
  • Prevent plagiarism
• Read what you cite! (prevent superfluous citing)
This does NOT mean:
• “You should read everything”
• “You should read only what you cite”
Literature = input
Sources: peer-reviewed
• Textbook/monograph: for teaching and background
  • Complete treatment of a topic
  • Citing a textbook? Mention the chapter or page number
• Journal article
  • More space and detail, more thorough than a conference paper
  • Sometimes old news at publication date (lag)
• Paper in an edited volume
  • Multiple papers, review of the state of the art
  • Cite individual papers
• Paper in conference proceedings
  • Recent results
  • Quality depends on the conference; who publishes the proceedings?
Sources: not peer-reviewed
• Working papers, preprints
  • Up-to-date, spread ideas
  • “Open access”
  • Computing Research Repository (CoRR): http://arxiv.org/corr/home
• Websites
• ‘Personal communication’
Literature = output
• Publish to conferences and journals
• Peer review (for conferences, journals):
  • (Double-)blind review
  • Accept, with/without (major) revisions
  • Reject
• Acceptance rate, e.g., 25% (not bad)
  • (Nature: 10% of articles are reviewed)
• Time to print: up to 1.5 years for journals, 3-6 months for conferences
• Measuring scientific output: “scientometrics”
Q What do you think about this situation?
Quality?
• Reputation: ACM, IEEE, Springer, Elsevier, MIT/Princeton/Oxford/… University Press
• SCIgen, an automatic CS paper generator (http://pdos.csail.mit.edu/scigen/): a generated paper was accepted (without review) for the 2005 World Multi-Conference on Systemics, Cybernetics and Informatics (another one: an Elsevier journal!)
Scientometrics
• Scientometrics: “measuring and analyzing science”
• Bibliometrics: “study or measurement of texts and information”
• Citation analysis
  • Which papers cite a paper? Which papers does a paper cite?
  • Authority of countries, research groups, individual authors, journals/conferences, individual papers
Q What is a citation?
• “Publish or perish”: quality vs. quantity
  • (“80% of all published papers are not cited”)
Q Conference or journal? Which conference or journal?
Citation Databases
• Commercial
  • Science Citation Index (Web of Science / Institute for Scientific Information, ISI)
  • Scopus (Elsevier)
• Free
  • Google Scholar: better coverage than ISI
  • CiteSeer (computer science)
  • ArNetMiner (computer science)
  • RePEc (economics)
• More: en.wikipedia.org/wiki/List_of_academic_databases_and_search_engines
Comparing Countries
[Figure: citation rate per paper (normalized) vs. citation intensity = #citations / GDP. Data: King, The scientific impact of nations, Nature’04.]
Comparing Groups or Individuals [1/3]
• An idea: the Google PageRank principle
  • Web: network of sites, linking to each other
  • Science: network of papers, citing each other
[Figure: the World Wide Web’s links network vs. the academic citations network, over time]
Q What do you think about this approach?
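The PageRank idea can be made concrete with a small power-iteration sketch. This is the generic PageRank algorithm applied to a made-up citation graph (the papers `new`, `mid1`, `mid2`, `old` are hypothetical), using the commonly quoted damping factor 0.85; it is not how Google Scholar or any citation index actually ranks papers.

```python
# Minimal PageRank by power iteration over a citation graph.
# An edge p -> q means "paper p cites paper q", so rank flows to cited papers.
# The graph below is hypothetical; damping = 0.85 is an assumed, common default.


def pagerank(cites: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    papers = sorted(set(cites) | {q for refs in cites.values() for q in refs})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing ("dangling") spread their rank evenly.
        dangling = sum(rank[p] for p in papers if not cites.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        # Every other paper splits its rank evenly over the papers it cites.
        for p, refs in cites.items():
            for q in refs:
                new[q] += damping * rank[p] / len(refs)
        rank = new
    return rank


# Hypothetical network: "old" is cited by everyone, "new" by nobody.
citations = {
    "new": ["mid1", "mid2", "old"],
    "mid1": ["old"],
    "mid2": ["old", "mid1"],
    "old": [],
}
for paper, score in sorted(pagerank(citations).items(), key=lambda kv: -kv[1]):
    print(f"{paper:5s} {score:.3f}")
```

In this toy graph `old` ranks highest because every other paper cites it, while `new`, which nobody cites yet, ranks lowest.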
Comparing Groups or Individuals [2/3]
• Journals: Journal Impact Factor
• Personal: h-index (Hirsch, 2005): a scientist has index h if h of his/her N papers have at least h citations each, and the other (N − h) papers have no more than h citations each.
• g-index (Egghe, 2006): the highest number g such that the g most cited articles together received at least g² citations.
• Extensions: e-index; group evaluation
Q What about conferences?
Q Really, what is a citation?
Q (unethical) How to abuse citation indices?
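The two personal indices can be computed directly from a list of per-paper citation counts. This is a minimal sketch of the definitions above; the citation counts are hypothetical, and the g-index here is capped at the number of papers (one common convention; Egghe’s original allows padding with uncited “fictitious” papers).

```python
# h-index (Hirsch, 2005) and g-index (Egghe, 2006) from a list of
# per-paper citation counts, following the definitions on the slide.


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(ranked, start=1):
        if c >= i:
            h = i
        else:
            break
    return h


def g_index(citations: list[int]) -> int:
    """Largest g such that the g most cited papers together have >= g**2 citations.

    Assumption: g is capped at the number of papers.
    """
    ranked = sorted(citations, reverse=True)
    g, total = 0, 0
    for i, c in enumerate(ranked, start=1):
        total += c
        if total >= i * i:
            g = i
    return g


papers = [25, 8, 5, 3, 3, 1, 0]   # hypothetical citation counts
print(h_index(papers), g_index(papers))   # prints: 3 6
```

The g-index rewards the one highly cited paper (25 citations) that the h-index ignores once it exceeds h.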
Journal Impact Factor (JIF)
• Many journals have no impact factor
• The JIF of a journal for a given year is the average number of citations received in that year by the papers the journal published in the 2 previous years.
• For journal x and year 2010:
$$\mathrm{JIF}(x, 2010) = \frac{\text{number of citations in 2010 to papers in journal } x \text{ from the period 2008--2009}}{\text{total number of papers in journal } x \text{ in the period 2008--2009}}$$
• What does an average value mean?
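The formula above can be sketched in a few lines. This follows the slide’s definition only; the `Paper` record and the journal data are hypothetical, and the official JIF additionally restricts the denominator to “citable items”, which this sketch ignores.

```python
# Two-year Journal Impact Factor, following the slide's definition:
# citations received in year Y by papers the journal published in Y-2 and Y-1,
# divided by the number of papers it published in those two years.
from dataclasses import dataclass


@dataclass
class Paper:                          # hypothetical record type
    year: int                         # publication year
    citations_by_year: dict[int, int]  # year -> citations received that year


def impact_factor(papers: list[Paper], year: int) -> float:
    window = [p for p in papers if year - 2 <= p.year <= year - 1]
    if not window:
        raise ValueError("no papers published in the two-year window")
    cites = sum(p.citations_by_year.get(year, 0) for p in window)
    return cites / len(window)


# Hypothetical journal x: a few papers and the citations they received.
journal_x = [
    Paper(2008, {2009: 3, 2010: 4}),
    Paper(2008, {2010: 0}),
    Paper(2009, {2010: 10}),
    Paper(2009, {2010: 1}),
    Paper(2010, {2010: 2}),   # published in 2010: outside the window
]
print(f"JIF(x, 2010) = {impact_factor(journal_x, 2010):.2f}")   # 15 / 4 = 3.75
```

In this made-up data a single paper supplies 10 of the 15 citations, which is exactly why an average value says little about a typical paper in the journal.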
Journal Impact Factors, 2004
[Figure: 2004 impact factors of science journals (source: ISI), JIF (log scale, 0.001-100) vs. journal rank (0-5,000); marked level: ≥1 citation/publication (last 2 years)]
• Highest JIF ~30
• Very high JIF ≥15
CS Impact Factors, 2005
[Figure: 2005 impact factors of CS journals (source: ISI), JIF (log scale) vs. journal rank (0-300), compared with all science journals]
• CS: highest JIF ~8; very high JIF ≥2
• All: highest JIF ~30; very high JIF ≥15
Q What do you think about this situation?
Comparing Groups or Individuals [3/3]
For Computer Science:
• Conference proceedings are to be preferred to journals
• ISI Web of Science and Elsevier Scopus are not good impact indicators: poor, albeit improving, coverage
• Google Scholar is a better impact indicator than ISI WoS and Elsevier Scopus; ArNetMiner is reasonable
• DBLP is a good, selective source, but has no citation links
• Expert knowledge is required to select the best topical conferences and journals (regardless of their acceptance ratios and impact factors)
Q Problems with this approach?
Method To Find Sources
• Browse:
  • Google Scholar: http://scholar.google.com/
  • DBLP: http://dblp.uni-trier.de/
  • Others: TU Delft library tools
• Study an author using Publish or Perish
• Look at author homepages
• Follow links and citations (forward and backward)
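Following citations backward (a paper’s reference list) and forward (its “cited by” list) is a breadth-first search over the citation graph. A minimal sketch, assuming you have already collected both lists by hand into two dictionaries; the paper names and the `max_depth` cut-off are hypothetical.

```python
# "Snowballing": follow citations backward (the references a paper cites)
# and forward (the papers that cite it), breadth-first, up to a fixed depth.
# The dictionaries stand in for data collected by hand, e.g. from reference
# lists and Google Scholar's "cited by" pages.
from collections import deque


def snowball(seeds, references, cited_by, max_depth=2):
    """Return {paper: depth at which it was first reached}."""
    found = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        depth = found[paper]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        # Backward (references) and forward (citing papers) neighbours.
        for nxt in references.get(paper, []) + cited_by.get(paper, []):
            if nxt not in found:
                found[nxt] = depth + 1
                queue.append(nxt)
    return found


# Hypothetical data around a starting survey paper.
references = {"survey": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
cited_by = {"survey": ["D"], "D": ["E"], "C": ["A", "B", "F"]}
print(snowball(["survey"], references, cited_by, max_depth=2))
```

The depth limit keeps the reading list finite: here `F` is only reachable in three steps and is not returned.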
Google Scholar
• “Cited by”
• Relevant authors
• TU Delft SFX linking
• Import into BibTeX
Google Scholar at Work
From home: use VPN!
DBLP
• “lists more than one million articles” (April 2008)
• Indexes:
  • Authors
  • Conferences
  • Journals
  • Series
  • Subjects
• Now also “Faceted search”, “CompleteSearch”
DBLP at Work
TU Delft Library
• Search: http://www.library.tudelft.nl/ws/search/
  • e.g. “information by subject” → computer science
• TUlib: “how to find and use scientific information”, http://www.library.tudelft.nl/tulib/
Harzing’s Publish or Perish
• Uses Google Scholar data
• Calculates many indices:
  • Number of citations (also per year / article / author / …)
  • Hirsch’s h-index
  • Zhang’s e-index (excess citations in the h-index set)
  • Egghe’s g-index
  • …
• Similar online tool: ArNetMiner
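Two of the listed indicators, sketched under stated assumptions: the e-index uses Zhang’s definition e² = (sum of citations of the h most cited papers) − h², and citations per year counts the publication year as year 1 (an assumed convention; Publish or Perish’s exact age convention may differ). This is not Publish or Perish’s code, and the author profile is hypothetical.

```python
import math


def h_index(citations: list[int]) -> int:
    """Hirsch's h-index: largest h with h papers cited at least h times each."""
    ranked = sorted(citations, reverse=True)
    # ranked is non-increasing, so the papers with c >= rank form a prefix.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def e_index(citations: list[int]) -> float:
    """Zhang's e-index: sqrt of the h-core's citations beyond the h*h needed for h."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return math.sqrt(sum(ranked[:h]) - h * h)


def citations_per_year(pub_year: int, citations: int, current_year: int) -> float:
    """Average citations per year; assumption: the publication year counts as year 1."""
    return citations / (current_year - pub_year + 1)


# Hypothetical author profile: (publication year, citations) per paper.
profile = [(2005, 40), (2007, 12), (2009, 9), (2010, 3), (2011, 0)]
cites = [c for _, c in profile]
print("h =", h_index(cites))          # 3
print(f"e = {e_index(cites):.2f}")    # sqrt(40 + 12 + 9 - 9) = sqrt(52)
for year, c in profile:
    print(year, citations_per_year(year, c, current_year=2012))
```

Two authors with the same h can differ widely in e, which is the “excess” the e-index is meant to expose.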
Publish or Perish (http://www.harzing.com/pop.htm)
Unethical! How to Game the Citation System?
[Figure: (part of) the collaboration graph]
All authors with Erdős number 1
Note: The h-index was “invented” almost a decade after Erdős died.
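An Erdős number is the shortest-path distance from Erdős in the co-authorship graph, so a breadth-first search computes it. A minimal sketch; apart from Erdős, the author names and papers are hypothetical.

```python
from collections import deque
from itertools import combinations


def coauthor_graph(papers):
    """Undirected co-authorship graph: two authors are linked if they share a paper."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def collaboration_distance(graph, source):
    """Breadth-first search from `source`; unreachable authors are left out."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        author = queue.popleft()
        for coauthor in graph[author]:
            if coauthor not in dist:
                dist[coauthor] = dist[author] + 1
                queue.append(coauthor)
    return dist


# Hypothetical papers, each given as its list of authors.
papers = [["Erdős", "Alice"], ["Alice", "Bob", "Carol"], ["Carol", "Dave"], ["Eve", "Frank"]]
erdos_numbers = collaboration_distance(coauthor_graph(papers), "Erdős")
print(erdos_numbers)
```

Authors outside Erdős’s component (here Eve and Frank) have no finite Erdős number and do not appear in the result.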
Collaboration Graph Degree Distribution
[Figure: degree distribution of the collaboration graph, with Erdős marked]
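The degree distribution counts how many authors have exactly k distinct co-authors. A minimal sketch on a hypothetical set of papers in which “Erdős” plays the role of the highly connected hub.

```python
from collections import Counter
from itertools import combinations


def coauthor_graph(papers):
    """Undirected co-authorship graph: two authors are linked if they share a paper."""
    graph = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set())
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_distribution(graph):
    """Map degree k -> number of authors with exactly k distinct co-authors."""
    return Counter(len(coauthors) for coauthors in graph.values())


# Hypothetical papers: one hub author plus a few small collaborations.
papers = [["Erdős", "A"], ["Erdős", "B"], ["Erdős", "C"], ["Erdős", "D"], ["A", "B"], ["E", "F"]]
dist = degree_distribution(coauthor_graph(papers))
for k in sorted(dist):
    print(f"degree {k}: {dist[k]} author(s)")
```

Repeated collaborations between the same pair are stored once, because the adjacency sets deduplicate co-authors.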
Collaboration Graph: Connected Components Distribution
[Figure: distribution of connected-component sizes, with the giant component marked]
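Component sizes, and thus the giant component, can be found by repeatedly running a graph search from any not-yet-visited author. A minimal sketch using iterative depth-first search on a hypothetical, symmetric adjacency list.

```python
def component_sizes(graph):
    """Sizes of the connected components of an undirected graph, largest first."""
    seen = set()
    sizes = []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search collects one whole component.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical symmetric adjacency lists: a larger cluster, a pair, an isolated author.
graph = {
    "A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"},
    "E": {"F"}, "F": {"E"},
    "G": set(),
}
sizes = component_sizes(graph)
print(sizes, f"giant component: {sizes[0] / len(graph):.0%} of authors")
```

Each author is pushed at most once, so the whole pass runs in time linear in the number of authors plus links.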
Interested?
• Mark Newman answers: “Who is the best connected scientist?”
• Other references
  • Erdős Number Project: http://www.oakland.edu/enp/, http://harveycohen.net/erdos/ (Jerry Grossman and Smarty)
• Kevin Bacon Oracle: is Kevin Bacon the center of the Hollywood movie industry? (Or Sean Connery? Or Christopher Lee?) http://oracleofbacon.org/
More on the (Unethical) Gaming of the Citation Indices
• Self-cite, self-cite, self-cite
• Journals asking submitters to cite the journal’s papers
• Program committee members and reviewers asking for their own work to be cited (when not necessary)
• Not citing old work because it’s old: “killing” old results allows you to republish them later
• Work on a popular topic: more people, more citations, more chances
• (Google Scholar only) Blog, tweet, and post on Facebook daily about your papers. Ask your friends to re-post.
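To see how much self-citation can inflate an index, and what “weighting out” such citations does, one can recompute the h-index with and without them. A minimal sketch; the papers are hypothetical, and counting a citation as a self-citation whenever the citing and cited papers share at least one author is an assumed definition (others exist).

```python
def h_index(citations):
    """Hirsch's h-index: largest h with h papers cited at least h times each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def citation_counts(papers, exclude_self=False):
    """Citations per paper, optionally dropping self-citations.

    Assumption: a citation is a self-citation if the citing paper shares
    at least one author with the cited paper.
    """
    counts = []
    for authors, citing_author_sets in papers:
        n = 0
        for citing in citing_author_sets:
            if exclude_self and citing & authors:
                continue
            n += 1
        counts.append(n)
    return counts


# Hypothetical record: (authors of the paper, author sets of the papers citing it).
papers = [
    ({"X"}, [{"X"}, {"X", "Y"}, {"Z"}, {"W"}]),
    ({"X", "Y"}, [{"X"}, {"Y"}, {"Z"}]),
    ({"X"}, [{"X"}, {"X"}, {"V"}]),
]
print("with self-citations:   ", h_index(citation_counts(papers)))
print("without self-citations:", h_index(citation_counts(papers, exclude_self=True)))
```

Here the h-index drops from 3 to 1 once the author’s own citations are removed.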
How to Talk About Books You Haven’t Read
There exists a system to (not) read
1. Know where to find sources
   • Trustworthy: DBLP, ACM DL, Google Scholar
   • Less trustworthy: CoRR, …
2. Know how to find good sources
   • Number of citations: Google Scholar + others
   • h-index: Publish or Perish (the program)
   • Try to avoid or weight out citation cliques
3. Select from the good sources
Questions?