Pubmed dataset visualisation pecha kucha

PUBMED dataset visualisation. George Gkotsis, Knowledge Media Institute, The Open University. 21.5 million citations, 10.8 million authors.

description

Pubmed dataset visualisation pecha kucha for the WebScience 2014 conference

Transcript of Pubmed dataset visualisation pecha kucha

Page 1: Pubmed dataset visualisation pecha kucha

PUBMED dataset visualisation

George Gkotsis

Knowledge Media Institute, The Open University

21.5 million citations, 10.8 million authors

Page 2: Pubmed dataset visualisation pecha kucha

Visualisation

• XKCD-style

• Infographic-style vertical scrolling

Page 3: Pubmed dataset visualisation pecha kucha

Data analysis

1. Co-authorship network

2. Academic retention and productivity

3. Terminology & evolution

Page 4: Pubmed dataset visualisation pecha kucha

1. Co-authorship network

• For each year, a co-authorship graph is constructed

• Visualise graph properties:
  – Nodes
  – Edges
  – Clustering coefficient
  – Entropy

Rowe & Strohmaier, WWW2014
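
The per-year graph construction described on this slide could be sketched with NetworkX roughly as follows. This is a minimal illustration, not the code behind the talk: the `papers` records are hypothetical toy data, the helper names are made up for the example, and because the slide's entropy formula did not survive in the transcript, the entropy here is assumed to be the Shannon entropy of the degree distribution.

```python
from collections import Counter
from itertools import combinations
import math

import networkx as nx

# Hypothetical toy records: (publication year, list of author names).
papers = [
    (2000, ["A", "B", "C"]),
    (2000, ["C", "D"]),
    (2001, ["A", "B"]),
]


def yearly_graphs(records):
    """Build one undirected co-authorship graph per publication year."""
    graphs = {}
    for year, authors in records:
        g = graphs.setdefault(year, nx.Graph())
        g.add_nodes_from(authors)
        # Every pair of authors on the same paper shares an edge.
        g.add_edges_from(combinations(sorted(set(authors)), 2))
    return graphs


def degree_entropy(g):
    """Shannon entropy (bits) of the degree distribution.

    Assumption: the slide's own entropy definition is not available,
    so this is only one plausible choice.
    """
    counts = Counter(degree for _, degree in g.degree())
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def graph_properties(g):
    """Collect the four properties listed on the slide for one graph."""
    return {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "clustering": nx.average_clustering(g),
        "entropy": degree_entropy(g),
    }


properties = {
    year: graph_properties(g)
    for year, g in sorted(yearly_graphs(papers).items())
}
```

Each entry of `properties` holds one year's values, which can then be plotted as a time series per property.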

Page 5: Pubmed dataset visualisation pecha kucha

1. Co-authorship network (cont.)

Page 6: Pubmed dataset visualisation pecha kucha

2. Academic throughput and retention

• Researcher profile, 4 attributes:

[1] Year of first publication
[2] Year of last publication
[3] Number of publications
[4] Duration of research activity ([2] - [1])

1966 - 2001
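
The four profile attributes above could be derived with a single Pandas group-by, sketched below. The `records` table, its column names, and the author names are hypothetical; the real PubMed data would first need to be flattened into one row per (author, publication year) pair.

```python
import pandas as pd

# Hypothetical toy data: one row per (author, publication year) pair.
records = pd.DataFrame({
    "author": ["Smith J", "Smith J", "Smith J", "Lee K", "Lee K"],
    "year": [1990, 1995, 2004, 1998, 1998],
})

# [1] first year, [2] last year, [3] number of publications per author.
profiles = records.groupby("author")["year"].agg(
    first_year="min",
    last_year="max",
    n_publications="count",
)

# [4] duration of research activity = [2] - [1].
profiles["duration"] = profiles["last_year"] - profiles["first_year"]
```

An author whose publications all fall in a single year gets a duration of 0 under this definition.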

Page 7: Pubmed dataset visualisation pecha kucha

2. Academic throughput and retention (cont.)

Page 8: Pubmed dataset visualisation pecha kucha

3. Terminology & evolution

w: 1-gram word-term
M: 5-year titles' corpus
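
The formula that used w and M is not preserved in this transcript, so the sketch below only illustrates the two definitions: it groups titles into 5-year corpora (M) and computes the relative frequency of each single-word term (w) within its corpus. The measure itself, the toy `titles`, the window origin, and the whitespace tokeniser (a stand-in for NLTK tokenisation) are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (publication year, article title).
titles = [
    (1990, "Gene expression in mice"),
    (1993, "Gene therapy trials"),
    (1996, "Protein folding and gene regulation"),
    (1998, "Protein structure"),
]


def window_start(year, size=5, origin=1990):
    """Return the first year of the fixed-size window containing `year`."""
    return origin + ((year - origin) // size) * size


def relative_frequencies(records, size=5, origin=1990):
    """Map each window start to {term: share of all tokens in that window}."""
    counts = defaultdict(Counter)
    for year, title in records:
        # Lower-cased whitespace split; a real pipeline might use NLTK here.
        counts[window_start(year, size, origin)].update(title.lower().split())
    freqs = {}
    for start, counter in counts.items():
        total = sum(counter.values())
        freqs[start] = {term: n / total for term, n in counter.items()}
    return freqs


freqs = relative_frequencies(titles)
```

Comparing `freqs[start][w]` across consecutive windows gives a simple view of how a term's share of title vocabulary changes over time.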

Page 9: Pubmed dataset visualisation pecha kucha

3. Terminology & evolution (cont.)

Page 10: Pubmed dataset visualisation pecha kucha

Development

• Pandas: data analysis and manipulation

• NetworkX: network analysis

• NLTK: natural language processing

• Matplotlib: plotting