Data Management & Data Visualization


  • Data Management & Data Visualization

  • Outline

    • Teaching team

    • Course goal and organization

    • Final exam

    • Experience from the past

  • Teaching Team

    • Data Management

    – Prof. Andrea Maurino (lead professor)

    – Dr. Anisa Rula (assistant professor)

    – Dr. Vincenzo Cutrona (laboratory)

    • Data Visualization

    – Dr. Federico Cabitza (associate professor)

    – Mister X

  • Course organization

    • Main topic: DATA LIFE CYCLE

    – Data science is not only BIG DATA

    [Diagram: Data management; Data visualization and storytelling; Machine learning and decision models; Statistical modelling]

  • Data management

    Data life cycle

    – Capture: download, SQL queries, API, (web) scraper

    – Store: Hadoop, database management system (relational or NoSQL)

    – Process: batch, stream

    – Enrich: quality, integration

    – Analyze: query languages (Spark, SQL, NoSQL); Python, R, …

    – Use: Tableau
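The life-cycle steps above (capture, enrich for quality, store, analyze) can be sketched end to end with Python's standard library. This is a minimal illustration, not course material: the in-memory CSV (standing in for a downloaded file), the `city`/`temp` columns, and the cleaning rule (drop rows whose value is missing or non-numeric) are all hypothetical. SQLite stands in for the DBMS and a SQL `GROUP BY` for a descriptive query.

```python
import csv
import io
import sqlite3

# Capture: a hypothetical CSV payload standing in for a downloaded file.
RAW_CSV = """city,temp
Milano,21.5
Roma,
Milano,23.5
Roma,27.0
Napoli,not_available
"""


def capture(text: str) -> list[dict[str, str]]:
    """Parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows: list[dict[str, str]]) -> list[tuple[str, float]]:
    """Quality step: keep only rows whose temperature parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["city"], float(row["temp"])))
        except (TypeError, ValueError):
            continue  # skip missing or malformed values
    return cleaned


def store(rows: list[tuple[str, float]]) -> sqlite3.Connection:
    """Store step: load the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (city TEXT, temp REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def analyze(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Analyze step: a descriptive SQL query (average temperature per city)."""
    return conn.execute(
        "SELECT city, AVG(temp) FROM readings GROUP BY city ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    connection = store(clean(capture(RAW_CSV)))
    for city, avg_temp in analyze(connection):
        print(city, avg_temp)
```

Each step is a separate function, so replacing the in-memory string with an API call or a scraper, or SQLite with a NoSQL store, would only touch the corresponding function.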

  • Data Visualization

  • Textbooks

    • Davy Cielen, Arno D. B. Meysman, and Mohamed Ali. Introducing Data Science. Manning, 2016.

    • Harrison. Next Generation Databases. Apress, 2015.

    • Rezzani. Big Data Analytics. Apogeo, 2017.

    • There is no need to buy these textbooks, but we will draw inspiration and material from them.

  • Exam

    • A written exam (40% of the overall grade)

    • One common project (60% of the overall grade)

    – Data management: find an issue → discover datasets, select them, acquire, clean the data, (integrate), store, query (descriptive, not predictive)

    – Data visualization: preliminary exploration, storytelling (of a subset of the data)

    • Both the written score and the project score remain valid for one academic year

    – The two parts of the exam can be taken separately
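As a worked example of the 40/60 weighting, the sketch below combines the two components. It assumes both scores are on a 0 to 30 scale and returns the unrounded weighted value; the slide states neither the scale nor any rounding rule, and the function name `final_grade` is hypothetical.

```python
def final_grade(written: float, project: float) -> float:
    """Weighted overall grade: 40% written exam, 60% project.

    Assumption: both inputs are on a 0-30 scale. No rounding is applied
    because the slide does not state a rounding rule.
    """
    if not (0 <= written <= 30 and 0 <= project <= 30):
        raise ValueError("scores are assumed to be on a 0-30 scale")
    return 0.4 * written + 0.6 * project


# 0.4 * 25 + 0.6 * 28 = 10 + 16.8 = 26.8
print(final_grade(25, 28))
```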

  • Minimum requirements for the project

    • The project must be pre-approved by the teacher

    • From 1 to a maximum of 3 students

    • At least 2 of the 3 Vs

    • The final report, code and data must be shared with the teacher via Google Drive by the day of the written exam

    PICK TWO!

    – Volume (at least 2 GB of data)

    – Velocity (real-time collection and analysis)

    – Variety (at least 2 different data sources with different format descriptions)

    Social listening
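The "pick two" rule and the team-size limit can be expressed as a simple check. In this sketch the `ProjectProfile` fields, the source names and both function names are hypothetical; "2 GB" is read as 2 × 10^9 bytes (the slide does not say whether GB or GiB is meant), and velocity is reduced to a yes/no flag for real-time collection and analysis. Pre-approval and the Google Drive deadline are not modelled.

```python
from dataclasses import dataclass

# Assumption: "2gb" is read as 2 * 10**9 bytes; the slide does not specify.
MIN_VOLUME_BYTES = 2 * 10**9


@dataclass
class ProjectProfile:
    """Hypothetical summary of a proposed project."""
    total_bytes: int                 # size of the collected data
    real_time: bool                  # collected and analyzed in real time?
    source_formats: dict[str, str]   # source name -> format description
    team_size: int


def vs_satisfied(p: ProjectProfile) -> list[str]:
    """Return which of the three Vs the project covers."""
    covered = []
    if p.total_bytes >= MIN_VOLUME_BYTES:
        covered.append("Volume")
    if p.real_time:
        covered.append("Velocity")
    # Variety: at least 2 sources whose format descriptions differ.
    if len(set(p.source_formats.values())) >= 2:
        covered.append("Variety")
    return covered


def meets_minimum(p: ProjectProfile) -> bool:
    """At least 2 of the 3 Vs and a team of 1 to 3 students."""
    return len(vs_satisfied(p)) >= 2 and 1 <= p.team_size <= 3


# Usage with hypothetical values: 500 MB, real-time, JSON + CSV sources.
example = ProjectProfile(
    total_bytes=500 * 10**6,
    real_time=True,
    source_formats={"source_a": "JSON", "source_b": "CSV"},
    team_size=2,
)
print(vs_satisfied(example))   # ['Velocity', 'Variety']
print(meets_minimum(example))  # True
```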

  • Spotywhy

    • http://www.infodata.ilsole24ore.com/2018/06/28/si-costruisce-lartista-musicale-successo-chiedilo-spotify/

  • Social Listening

    • https://www.infodata.ilsole24ore.com/2018/07/08/spagna-portogallo-finita-3-3-sui-social-cosa-successo-2/

  • August

    • What, when and where Italians tweet in August

  • Other examples

    • Trains, Auditel, Emmy Awards, ATP Cincinnati, e-sports, Trivago & Booking, cryptocurrencies, stocks, Amazon, medial data…

    • Where can I find datasets to get some ideas?

    – Open data portals

    – Kaggle

    – Ask the teacher!

  • Virtual lab

    • Microsoft Azure

    – One virtual PC with 8 GB RAM, Intel i7, 1 TB HDD

    – Some low-cost PCs for collecting data

  • Last year (86 students)

    Metric          | DM written score | DM project score | DM score | Data viz score | Final grade
    % of students   | 87.21%           | 81.40%           | 80.23%   | 80.23%         | 79.07%
    Average grade   | 25.14            | 28.33            | 27.32    | 27.32          | 27.84

    Known issues

    – Exam procedure (project)

    – Virtual lab

    – Teaching (both DM and DV)

    – Sharing lectures

    Student evaluation, [F9101Q003] DATA MANAGEMENT AND VISUALIZATION (course code and name)

    Criterion                                        | Attending students (Freq)
    Prior knowledge                                  | 1.24
    Teaching material                                | 1.2
    Clarity of exam procedure                        | 1.19
    Adherence to the timetable                       | 2.22
    Stimulating students' interest                   | 2.09
    Presentation                                     | 1.81
    Usefulness of supplementary teaching             | 1.35
    Consistency with the declared course description | 1.94
    Teacher availability                             | 2.26
    Interest in the subject                          | 2.57
    Overall satisfaction                             | 1.48
    Teaching effectiveness                           | 2.02
    Organizational aspects                           | 1.64

    Non-attending students (Non_Freq): 1.48, 1.29, 1.24, 2.13, 2.24, 1.38, 2.13, 1.26