GO-Infosheet


The Grid Observatory Initiative develops a scientific view of the dynamics and usage of globalized IT systems by monitoring and analyzing the EGI grid.

The overall goal is to create a full-fledged digital curation process, with its four components: preservation, validation, indexation and knowledge building.

As the largest non-profit globalized system worldwide and with demanding scientific users, the EGI infrastructure is one of the most exciting artificial complex systems.

With extensive monitoring facilities already in place, it offers an unprecedented opportunity to observe and to understand the computing practices within the e-Science community.

Grid and cloud share a common paradigm: they are globalized at a large scale. As such, the data collected and the knowledge built from analyzing EGI concern cloud modeling as well. Ongoing work integrates monitoring data from the StratusLab cloud.

The Grid Observatory is an open collaboration, keen to foster dialog and partnerships with others in the relevant areas of computer science and engineering. The Laboratoire de Recherche en Informatique and the Laboratoire de l’Accélérateur Linéaire, from CNRS and University Paris-Sud, together with Imperial College London, operate data production. The initiative is supported by France-Grilles, INRIA and CNRS.

A trove of experimental data: www.grid-observatory.org

 

The first role of the Grid Observatory is to preserve the monitoring data, normally discarded after operational usage, and to make them available to the wider scientific community. Through its web portal, the Grid Observatory offers public access to a repository of grid traces to observe e-Science practice and infrastructure.

• EGI provides an accessible approximation of the current and future requirements of e-Science users.

• Grid status and middleware activity are recorded. These can be explored for a wide range of motivations, from operational usage, e.g. improving performance, to scientific research, e.g. testing classification methods for fault detection.

The Grid Observatory follows Tim Berners-Lee’s recommendation for Raw Data Now. It exemplifies the Big Data challenges: semantic organization, provenance, interoperability, and next-generation analytics. Emerging technologies such as Linked Open Data will be explored to further address those challenges.

The  Green  Computing  Observatory      

The Grid Observatory offers extensive traces of energy consumption. Because green IT is becoming an increasingly urgent need and also because there was no existing EGI monitoring tool, this action has its own name: the Green Computing Observatory.

• The traces integrate motherboard-level monitoring with information on computing, networking, storage, and cooling.

• Acquisition exploits the de facto standards IPMI and Ganglia.

• Integration is based on an ontology of IT system measurements, including virtual machines, developed by University Picardie Jules Verne.

 

From  applied  to  fundamental  research  

Research exploiting the monitoring data should demonstrate verifiable and positive impact on production systems.

• Beyond-power-law and non-stationary behavior are pervasive. With sequential testing, segmentation and adaptive on-line clustering, we advanced fault detection and parsimonious model building.

• Efficient autonomic policies must combine a priori knowledge and on-line adaptation, but reference interpretations are most often missing. Data-driven topic modeling in the spirit of text mining, and heterogeneous data integration with Statistical Relational Learning help to build intelligible representations.
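As an illustration of what sequential testing on a monitoring trace can look like, here is a minimal sketch of a one-sided CUSUM detector. It is not the method used in the publications listed in this sheet; the function name, the parameters (`target_mean`, `slack`, `threshold`) and the sample trace are all hypothetical and chosen only for the example.

```python
def cusum_alarm(samples, target_mean, slack, threshold):
    """Return the index of the first sample at which a one-sided CUSUM
    statistic exceeds `threshold`, or None if it never does.

    All parameters are illustrative assumptions, not values taken from
    the Grid Observatory traces.
    """
    stat = 0.0
    for i, x in enumerate(samples):
        # Accumulate deviations above target_mean + slack; never go below zero.
        stat = max(0.0, stat + (x - target_mean - slack))
        if stat > threshold:
            return i
    return None


# Hypothetical failure counts per time window: stable, then an upward shift.
trace = [1, 0, 2, 1, 1, 0, 1, 5, 6, 5, 7]
alarm_index = cusum_alarm(trace, target_mean=1.0, slack=0.5, threshold=6.0)
print(alarm_index)
```

The statistic stays near zero while the trace fluctuates around its usual level and grows quickly once the level shifts, so the alarm is raised shortly after the change rather than on isolated spikes.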


Digital  curation  

The overall goal of the Grid Observatory is to create a full-fledged digital curation process, with its four components.

Establishing and developing a long-term repository of digital assets for current and future references.

The Grid Observatory has operated since October 2008. It continuously records and publishes various traces. An essential achievement is to cover the complete scope of grid middleware and user activity, beyond particular aspects such as job lifecycle or failure events, including for instance the logging of the Information System (BDII).

Providing  digital  asset  search  and  retrieval  facilities  to  scientific  communities  through  a  gateway.  

The middleware traces are currently made available only in raw format, on a weekly basis. Much remains to be done in the direction of a more semantic organization. The Green Computing Observatory data are organized along an XML schema associated with the measurement ontology. All are available through the Grid Observatory portal.

Tackling  the  good  data  creation  and  management  issues,  and  interoperability,  through  formal  ontology  building.  

The Grid Observatory most often builds on EGI and gLite monitoring, and thus benefits from their collective effort of middleware development and EMI standardization. The Green Computing Observatory builds on IPMI and Ganglia. Calibration of IPMI measurements is made possible by PDU (Power Distribution Unit) measurements. The Green Computing Observatory participates in the COST action IC0804 - Energy efficiency in large scale distributed systems.
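One simple way to picture such a calibration is a linear fit of PDU readings against IPMI readings taken at the same instants. The sketch below assumes a linear relation and a PDU outlet that feeds a single machine; both are assumptions made for the example, as are the function name and the sample values, and none of them describes the procedure actually used by the Green Computing Observatory.

```python
def fit_linear_calibration(ipmi_watts, pdu_watts):
    """Least-squares fit of pdu ~ gain * ipmi + offset.

    Returns (gain, offset). Assumes paired readings taken at the same
    instants and at least two distinct IPMI values.
    """
    n = len(ipmi_watts)
    mean_x = sum(ipmi_watts) / n
    mean_y = sum(pdu_watts) / n
    sxx = sum((x - mean_x) ** 2 for x in ipmi_watts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ipmi_watts, pdu_watts))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


# Hypothetical paired readings, in watts.
ipmi = [100.0, 150.0, 200.0, 250.0]
pdu = [115.0, 170.0, 225.0, 280.0]
gain, offset = fit_linear_calibration(ipmi, pdu)


def calibrate(ipmi_reading):
    # Map a raw IPMI reading onto the PDU scale using the fitted line.
    return gain * ipmi_reading + offset
```

With a fit of this kind, later IPMI readings can be corrected without a PDU measurement for every sample.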

Adding  value  to  data  by  generating  new  sources  of  information  and  knowledge  through  semantic,  statistical  and  Machine  Learning  based  inference.  

The general framework for the Grid Observatory is to turn it into a social intelligence system to pool scientific and engineering expertise, in order to build gradually more integrated models of the European e-infrastructures, and to define and validate autonomic-oriented policies addressing their operational challenges.

More  information:    

• The  Green  Computing  Observatory:  a  data  curation  approach  for  green  IT.  9th  IEEE  Int.  Conf.  on  Dependable,  Autonomic  and  Secure  Computing.    

• The  Grid  Observatory.  11th  IEEE/ACM  Int.  Symp.  on  Cluster,  Cloud  and  Grid  Computing.  

Towards  Open  Linked  Data  

*** Data are accessible on the web through the portal; the only protection implemented is against malicious usage.

All formats are machine readable and open: ASCII, XML, SQL, LDIF. RDF and Linked RDF are the next step.

   

Selected  contributions  from  the  Grid  Observatory  initiative  and  its  users  

Fault  detection  and  diagnosis,  smart  probing.  

Distributed  Monitoring  with  Collaborative  Prediction.  12th  IEEE/ACM  Int.  Symp.  on  Cluster,  Cloud  and  Grid  Computing.  

Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming. 15th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining.

Optimization of jobs submission on the EGEE production grid: modeling faults using workload. Journal of Grid Computing, 8(2).

Grid  models  

Characterizing e-science file access behavior via latent Dirichlet allocation. 4th IEEE/ACM Int. Conf. on Utility and Cloud Computing.

Towards non-stationary Grid models. Journal of Grid Computing, 9(4).

Autonomic  Quality  of  Service  and  Green  Computing  

Multiobjective reinforcement learning for responsive grids. Journal of Grid Computing, 8(3).

Autonomic policy adaptation using decentralized online clustering. 7th IEEE/ACM Int. Conf. on Autonomic Computing.