VO‐Neural / Data Mining Exploration

G. Longo – M. Brescia & Project Team

OACN INAF – Osservatorio Astronomico di Capodimonte
Dipartimento di Fisica – Università degli Studi di Napoli Federico II
California Institute of Technology

Workshop finale dei Progetti GRID del PON "Ricerca" 2000‐2006 – Avviso 1575 – Catania, February 10‐12, 2009

Astronomical data rate

[Figure: astronomical data rate, logarithmic scale]

Astronomical computational rate

[Figure: hours of computer time per night, logarithmic scale]

GRID (a solution)


• Exploration on datasets
• Dimensional reduction
• Classification
• Regression
• Clustering
• Forecasting
• Filtering


In 2006, a group of astronomers, computer scientists, engineers and physicists started to explore a possible joint effort to create a data mining toolset, based on GRID infrastructure and VO standards, for worldwide users who want to share data, methods and discoveries.


• Object Oriented Programming
• Internal VO standards and protocols
• Java language (generic for DMM)
• User/Session Registry DB (MySQL)
• Web‐based User I/O
• Web Application and Web Service Technology
• Plugin Modularity (easy to be integrated/modified)
• Hardware independent through GRID driver
• Data conversion and manipulation support

Architecture:
• MVC (Model‐View‐Controller);
Technology:
• Struts 2.0 (building infrastructure tool);
• Java Servlet & JSP (dynamic context‐dependent web page generation);
Features:
• User GUI deployment and I/O management;
• interaction with internal components through standard protocol (XML);
• Local User/Session data virtualization through Virtual File Store;

Architecture:
• It depends on the environment choice;
• In S.Co.P.E., DR is a component running on the GRID UI;
Technology (in S.Co.P.E.):
• GRID Software (middleware gLite);
Features:
• Storage Device(s) + Execution Environment = Deployment Environment;
• Different Deployment Environments can be better suited to a specific task (e.g. an MLP TEST is unlikely to be a computing intensive task, so there is no need to incur GRID latency times);
• Dynamic Driver Loading => Driver Plugins;
• Drivers are available to the Framework WS and to the Plugins;
• Also used to convert file formats (standard or DMM dependent);
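The driver idea above (pluggable deployment environments, chosen per task) can be illustrated with a minimal Java sketch. Every name in it (`Driver`, `DriverRegistry`, `chooseEnvironment`, the "GRID" and "LOCAL" labels) is hypothetical and not taken from the project code; treating training as the compute‐intensive case is also an assumption. Dynamic loading is approximated here by lazily invoked factories.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names come from the project code base.
class DriverSketch {

    /** One deployment environment (storage + execution) behind a uniform interface. */
    interface Driver {
        String environment();
        String submit(String taskName);
    }

    static class GridDriver implements Driver {
        public String environment() { return "GRID"; }
        public String submit(String taskName) { return taskName + " queued on GRID"; }
    }

    static class LocalDriver implements Driver {
        public String environment() { return "LOCAL"; }
        public String submit(String taskName) { return taskName + " run on LOCAL"; }
    }

    /** Registry that creates drivers on demand by name (stand-in for dynamic loading). */
    static class DriverRegistry {
        private final Map<String, Supplier<Driver>> factories = new HashMap<>();

        void register(String name, Supplier<Driver> factory) {
            factories.put(name, factory);
        }

        Driver load(String name) {
            Supplier<Driver> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("No driver registered for " + name);
            }
            return factory.get();
        }
    }

    /** Light tasks stay local to avoid GRID latency; heavy ones go to the GRID. */
    static String chooseEnvironment(boolean computeIntensive) {
        return computeIntensive ? "GRID" : "LOCAL";
    }

    public static void main(String[] args) {
        DriverRegistry registry = new DriverRegistry();
        registry.register("GRID", GridDriver::new);
        registry.register("LOCAL", LocalDriver::new);
        System.out.println(registry.load(chooseEnvironment(false)).submit("MLP test"));
        System.out.println(registry.load(chooseEnvironment(true)).submit("MLP training"));
    }
}
```

Keeping the choice of environment outside the drivers means a new environment only needs a new `Driver` implementation and one `register` call.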

Architecture:
• data mining class hierarchy for functionality implementation;
Technology:
• available model packages and libraries;
• custom ad hoc model design and development;
• custom wrappers for internal standardization;
Features:
• modularity;
• fast third‐party application integration;
• functionality specialization;
• multi‐language programming support;
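A class hierarchy with custom wrappers, as listed above, might look roughly like the following Java sketch. The class names (`DMModel`, `RegressionModel`, `WrappedRegression`) are illustrative assumptions, and the "external routine" is just a `DoubleUnaryOperator` standing in for a third‐party library call.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: class names are illustrative, not the project's own.
class ModelHierarchySketch {

    /** Root of the hierarchy: every model exposes the same uniform call. */
    static abstract class DMModel {
        abstract String functionality();
        abstract double[] apply(double[] inputs);
    }

    /** Functionality specialization: all regression models share this label. */
    static abstract class RegressionModel extends DMModel {
        String functionality() { return "Regression"; }
    }

    /** Custom wrapper: adapts an external routine to the internal interface. */
    static class WrappedRegression extends RegressionModel {
        private final DoubleUnaryOperator external;

        WrappedRegression(DoubleUnaryOperator external) {
            this.external = external;
        }

        double[] apply(double[] inputs) {
            double[] out = new double[inputs.length];
            for (int i = 0; i < inputs.length; i++) {
                out[i] = external.applyAsDouble(inputs[i]);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // The lambda plays the role of a third-party library function.
        DMModel model = new WrappedRegression(x -> 2.0 * x + 1.0);
        double[] y = model.apply(new double[] {0.0, 1.5});
        System.out.println(model.functionality() + ": " + y[0] + ", " + y[1]);
    }
}
```

The caller only sees `DMModel`, so swapping the wrapped routine does not affect the rest of the framework.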

Architecture:
• JDBC;
Technology (in S.Co.P.E.):
• MySQL and JDBC API;
Features:
• management of user information (registration, authentication, working sessions, experiments and files) and their relationships;
• storage and management of information about three different file categories: “supported”, “exotic” and “custom” (datasets, model configuration and intermediate data);

Architecture:
• RESTful Web Service (client‐server apps with resources addressable with HTTP methods);
• DM models control interface through Plugin SDK;
Technology:
• Web container: Apache Tomcat;
• Java Servlet for web service;
Features:
• Internal resource representation through “contextual” VOTables;
• Experiment configuration and execution;
• user authentication and working session management;
• experiment data & workflow trigger and supervision;
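To make "resources addressable with HTTP methods" concrete, here is a minimal Java sketch of method‐based dispatch on a resource path. It deliberately avoids the Servlet API so that it stays self‐contained; the `/experiments/{id}` path layout, the class name `ExperimentResource` and the status strings are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the path layout and names are assumptions, not the real service.
class RestSketch {

    /** In-memory store of experiments addressed as /experiments/{id}. */
    static class ExperimentResource {
        private static final String PREFIX = "/experiments/";
        private final Map<String, String> experiments = new LinkedHashMap<>();

        /** Maps an HTTP method and path to an action; returns a status line (plus body on GET). */
        String handle(String method, String path, String body) {
            if (!path.startsWith(PREFIX) || path.length() == PREFIX.length()) {
                return "404 Not Found";
            }
            String id = path.substring(PREFIX.length());
            switch (method) {
                case "PUT": {
                    boolean existed = experiments.containsKey(id);
                    experiments.put(id, body);
                    return existed ? "200 OK" : "201 Created";
                }
                case "GET":
                    return experiments.containsKey(id)
                            ? "200 OK " + experiments.get(id)
                            : "404 Not Found";
                case "DELETE":
                    if (!experiments.containsKey(id)) {
                        return "404 Not Found";
                    }
                    experiments.remove(id);
                    return "200 OK";
                default:
                    return "405 Method Not Allowed";
            }
        }
    }

    public static void main(String[] args) {
        ExperimentResource resource = new ExperimentResource();
        System.out.println(resource.handle("PUT", "/experiments/mlp-1", "<VOTABLE/>"));
        System.out.println(resource.handle("GET", "/experiments/mlp-1", null));
        System.out.println(resource.handle("DELETE", "/experiments/mlp-1", null));
        System.out.println(resource.handle("GET", "/experiments/mlp-1", null));
    }
}
```

In a servlet container the same dispatch would typically live in the `doGet`, `doPut` and `doDelete` methods of a servlet class.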


Suite target proposal

VONEURAL/DAME ORGANIZATION CHART

G. Longo, Principal Investigator
M. Brescia, Project Manager
O. Laurino, Project Engineer

Working groups: Data Mining Modeling; Infrastructure; Science & Education; Software Engineering

Team members: A. Corazza, R. D’Abrusco, S. Cavuoti, C. Donalek, G. d’Angelo, N. Deniskina, A. Nocella, D. Capozzi, E. De Filippis, A. Staiano, M. Garofalo, F. Manna, M. Fiore, B. Skordovski, R. Tagliaferri

XP – eXtreme Programming

Design & Documentation process:
1. Statement of work & Project Plan
2. Project Design Description
3. SW Requirement Specifications
4. Software Design Description
5. Implementation
6. Test Procedures
7. Technical Reports
8. Test Reports
9. User & Maintenance Manuals

A simple user can upload and build datasets, configure the available data mining models, execute different experiments in service mode, and load graphical views of partial/final results.

Don't consider yourself a simple user? Ok, so you think of yourself as a Developer. Or at least a scientist who wants to upload and use your own application (and possibly share it with others).

Be honest, you don’t trust someone else’s application. So you want to extend our framework?

DM Models Development
• Download our DM Models library;
• Add new low level/DM shared libraries and the related new wrapper;
• Extend the DM class hierarchy;

Model/Driver Plugin Development
• Download our SDK;
• Implement and test the DMPlugin abstract class;
• Provide a method to produce the plugin description and submit it for registration;

The same applies if you want to develop a new driver for a specific environment or storage system. Just implement the Driver Plugin Interface and register it;
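The plugin steps above (implement the abstract class, produce a description, submit for registration) can be sketched in Java as follows. Only the name `DMPlugin` comes from the slide; its methods, the description format, `RowCountPlugin` and `PluginRegistry` are hypothetical, since the SDK itself is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the real DMPlugin API is not shown on the slides.
class PluginSketch {

    /** Assumed shape of the SDK base class: a name, a run method, a description. */
    static abstract class DMPlugin {
        abstract String name();
        abstract String run(String datasetName);

        /** Description handed over at registration (the format is an assumption). */
        String describe() {
            return "name=" + name() + "; class=" + getClass().getSimpleName();
        }
    }

    /** Minimal plugin used to exercise the contract before submitting it. */
    static class RowCountPlugin extends DMPlugin {
        String name() { return "row-count"; }
        String run(String datasetName) { return "counted rows of " + datasetName; }
    }

    /** Stand-in for the framework-side registration step. */
    static class PluginRegistry {
        private final List<String> descriptions = new ArrayList<>();

        void submit(DMPlugin plugin) {
            descriptions.add(plugin.describe());
        }

        List<String> registered() {
            return descriptions;
        }
    }

    public static void main(String[] args) {
        DMPlugin plugin = new RowCountPlugin();
        System.out.println(plugin.run("catalogue.votable"));

        PluginRegistry registry = new PluginRegistry();
        registry.submit(plugin);
        System.out.println(registry.registered());
    }
}
```

A driver plugin would follow the same pattern, with an interface for the environment in place of the abstract model class.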


Mining the SDSS Archive I. Photometric redshifts in the nearby Universe, R. D’Abrusco et al. (The Astrophysical Journal, 663: 752‐764, 2007 July 10)

astro‐ph/0805.0156v1; to appear soon in MNRAS (R. D’Abrusco et al.)

Cavuoti 2008, Thesis (VONeural website, voneural.na.infn.it)

In this Conference poster session:
A web application for photometric redshifts evaluation, Omar Laurino et al.