Data Analytics in the domain of
Smart Cities and e-Government
Jose Aguilar
CEMISID, Dpto. de Computación, Facultad de Ingeniería
Outline
• Smart cities and ICT
• e-Government
• Introduction to Data Analytics
• Neighbor concepts:
• Business intelligence,
• Big data,
• Mining problems
• Case study
Goal
This tutorial analyzes how data analytics is transforming cities and government. We will review applications of data analytics that support smart cities, and discuss decision-making processes based on data analytics through which citizens, policy makers and businesses can work together to manage the life of the city. Additionally, we will discuss the transformation of the public service provision model and the use of data analytics to enable new forms of governance.
In the last twenty years, the most innovative industries, such as aerospace and automotive, began to develop varying degrees of automation in their activities and products.
Now it is the turn of everyday human environments, with a high level of spatiotemporal integration of technologies in community settings (home, school, etc.).
Smart cities and ICT
e-Government
Introduction to Data Analytics
Neighbor concepts
Case study
We live in a society where our relationship with "hard technologies" is constant.
We phone one another, run dishwashers at home, watch TV, work with computers in our offices, drive vehicles, etc.
A long list of everyday objects has been incorporated into our lives, almost without our realizing it…
Smart cities can benefit from information and communications technology (ICT); however, they need sophisticated mechanisms and appropriate software technologies to collect, store, analyze and visualize the data from the city environment and the citizens.
• Urban environments are among the main data generators worldwide.
• In the context of smart cities, there is an abundance of data that can be mined by applying data analytics techniques.
Motivation
Motivation
• There are different data sources,
– traditional information held by public institutions (data about the traffic, the health care, etc.),
– the information generated by the citizens (using their smartphones, etc.),
– sensor systems in the city (camera networks, etc.),
– Internet of Things, etc.
• Data is stored and analyzed to define services that the world needs.
Data is the new gold: it has become one of the most precious treasures because it can generate information and knowledge.
Why?
Urban Mobility issues:
Environment and CO2 emissions
Urban congestion:
As of 2008, half of the world population lives in cities
2020 climate and energy package
Accidents and safety
Freight distribution
Financial issues
Quality of life
Essential systems on which a city is based
Water system
Infrastructure systems (health, education, etc.)
Productive system of entrepreneurship
Transportation system
Population
Energy system
Communication system
CITIZEN INFRASTRUCTURE
ADMINISTRATION
Transformation is possible by harmonic integration of «infrastructure»,
«citizen» and «administration»
A “city” that uses information and communications technologies to make the critical infrastructure components
and services of a city — administration, education, healthcare, public safety, real estate, transportation, and utilities— more
aware, interactive, and efficient. (Forrester Research)
Definitions of Smart City
SMART CITY GOALS
• Achieve a sustainable development
• Increase the quality of life of its citizens
- Valuing usage above ownership
- Focusing on non-monetary values
- Having wider opportunities for work and study
- Overcoming restrictions of time and place
- Being both a consumer and a producer
• Achieve a sustainable development
- Managing the lifecycles of cities
- Improving economic performance over the entire Lifecycle
- Enhancing city competitiveness
“A smart sustainable city is an innovative city that uses information and communication technologies (ICTs) and other means to improve quality of life, efficiency of urban operation and services, and competitiveness, while ensuring that it meets the needs of present and future generations with respect to economic, social and environmental aspects.”
by ITU-T Focus Group on Smart Sustainable Cities
Smart cities use information and communication technologies (ICT) to be more intelligent and efficient in the use of resources, resulting in cost and energy savings, improved service delivery and quality of life, and reduced environmental footprint, all supporting innovation and the low-carbon economy.
by Boyd Cohen
RELATION BETWEEN OTHER
CONCEPTS AND THE SMART CITY MODEL
Many global initiatives related to enhancing the
capacities of cities to respond to the demands of
the future
smart cities, cities of knowledge, ...
IEEE initiative about “Smart Cities”
"Smart Cities" includes
• Smart Energy (renewable generation & storage, energy efficiency in buildings, intelligent lighting, smart grids, irrigation remote control, etc.)
• Smart Waste Management (recycling of waste, residual management, recovery of organic waste)
• Smart Living
• Smart Building & Home
• Smart Transportation/Mobility
• Smart Education (e-Education)
• Smart Governance (e-Governance)
• Smart Medical Facility (e-Medical)
• Smart Communications
• Smart Economy (innovation centre, job-search resource centres, e-commerce)
• Smart Environment (environmental information and alerts, containers with sensors, monitoring of distribution networks)
• Smart People
http://es.slideshare.net/giselledellamea/smart-city-ciudades-sostenibles-e-inteligentes
Smart Energy
Smart Public Services
Smart Public Safety
Smart Home / Office / Building
Smart Education
Smart Healthcare
Smart Transportation
Specific Initiatives
The solution set working on a common infrastructure turns into initiatives which vary by industry.
Smart Health, Smart Public Services, Smart Building, Smart Transportation
• Smart Care Management
• Connected Health
• Smart Medicine Supply
• Mobile Health
• Remote Healthcare Management
• Smart Citizen Services
• Smart Tax Administration
• Smart Customs, Immigration, Border Management
• Smart Crime Prevention
• Smart Emergency Response
• Smart Financial Management
• Energy Optimization
• Asset Management
• Facility Management
• Video Surveillance
• Recycling and Power Generation
• Automatic Fault Detection Diagnosis
• Supervisory Control
• Audio / Video Distribution Management
Smart Education
• Smart Classroom
• Performance Man.
• Asset Management
Smart Governance
• Participation
• Transparency: Open Data, e-municipality
• Public and social services
Smart People
• Digital education
• Creativity
Energy: generation, networking, efficiency
Environment
Buildings and infrastructure: efficiency in buildings, urban planning
Mobility: logistics, mobility and intermodality, vehicles and alternative fuels
Government and social services: e-gov, tourist destinations
Health: management, accessibility
ICT
Materials and sensors
Security
Impact of Transformation
Within 20 years, a smart city of 5 million can drive:
+$ Revenue
+% Growth
+% Energy Efficiency
+K New Jobs
Improved city management
Continuous Economic Growth
Enhanced Quality of Life
Sustainable Urbanization
Different looks of SMART CITY
Smart Buildings
Smart Infrastructure
Smart Water Management
Cyber-Security and Resilience
EM System
Climate Change Adaptation
Integrated Management
Open Data
Remote communication services for Education & Healthcare
Different looks of SMART CITY
FEMS example:
• Visualization system interconnected with various production information, such as monitoring and control on a real-time basis
Source: Toshiba Group
Energy System
Energy System
Source: Hitachi Group
Shared use of neighborhood facilities
Ubiquitous City: services
Source [Lee, Han, Leem and Yigitcanlar]
The SMART CITY standard: Dissecting ISO 37120
Sl No. | Title | Core Indicators
1 | The long road to zero waste cities | Solid waste collection and its recycling.
2 | Economic indicators in the new smart city standard | City’s unemployment rate and population living in poverty.
3 | Why education may be the most important smart city indicator of all? | Students completing primary and secondary education, student/teacher ratio.
4 | Does your city's air quality measure up to the new smart city standard? | Particulate matter (PM2.5, PM10) concentration and greenhouse gas emissions measured in tonnes per capita.
5 | How debt, spending and tax collections add up in new smart city standard? | Debt service ratio.
6 | Fire and emergency response indicators: how safe is your city? | Number of firefighters, fire-related deaths and natural-disaster-related deaths.
The SMART CITY standard: Dissecting ISO 37120
Sl No. | Title | Core Indicators
7 | How voting, women and corruption figure in the smart city standard | Voter participation in last municipal election and women as a percentage of total elected to city-level office.
8 | How healthy is your city? | Average life expectancy, no. of in-patient hospital beds, no. of physicians, mortality rate.
9 | How fun is YOUR city? | None
10 | How safe is your city? | Number of police officers and homicides.
11 | The homeless challenge cities face | City population living in slums.
12 | What the new smart city standard says about energy? | Residential electrical use per capita (kWh / year), city population with authorized electrical service, energy consumption of public buildings per year (kWh / m2) and energy derived from renewable sources.
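Several of the core indicators listed for ISO 37120 are simple ratios, such as women as a percentage of elected officials or residential electrical use per capita. The following is a minimal sketch of how such ratios could be computed. The helper names `percentage` and `per_capita` and all input figures are hypothetical, and the exact measurement rules of the standard are not reproduced here.

```python
def percentage(part: float, total: float) -> float:
    """Share of `part` in `total`, expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def per_capita(quantity: float, population: int) -> float:
    """Quantity divided by the number of inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return quantity / population


# Hypothetical city figures, for illustration only.
women_share = percentage(part=12, total=40)              # women among elected officials
electricity_kwh = per_capita(1_500_000_000, 1_000_000)   # residential kWh / year per inhabitant
```

Keeping each indicator as a small pure function makes it easy to recompute the whole indicator set whenever the underlying city data is refreshed.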
European SmartCities Project
Benchmark for Smart Cities
Transportation time:
Maximum travel time of 30 minutes in small and medium-size cities and 45 minutes in metropolitan areas; high-frequency mass transport within 800 meters (10-15 minute walking distance)
Footpath:
Continuous unobstructed footpath at least 2 meters wide on either side of every street
Bicycle tracks:
Dedicated and physically segregated bicycle tracks on all streets with a carriageway wider than 10 meters
Additional infrastructure:
95% of residences should have retail outlets, parks, primary schools and recreational areas within 400 meters walking distance
Water Management:
100% household
100% of households should be connected to the waste water network
100% of households are covered by a daily door-step solid waste collection system
No water logging incidents in a year
Electricity Supply:
100% metering of electricity supply
100% recovery of cost
100% of the city has wi-fi connectivity and 100 Mbps internet speed
Medical Facility:
30 minutes emergency response time for patients
Geospatial Information System (GIS) Services:
Integration of Disaster Rescue Information Map Navigation
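Benchmarks of this kind lend themselves to an automated check. The sketch below is illustrative only: it encodes a few of the thresholds stated on this slide (travel time, footpath width, amenities within walking distance, emergency response time, water logging), while the `CityMetrics` record, its field names and the sample values are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CityMetrics:
    """Hypothetical record of measured values for one city."""
    max_travel_time_min: float
    is_metropolitan: bool
    footpath_width_m: float
    residences_near_amenities_pct: float
    emergency_response_min: float
    water_logging_incidents_per_year: int


def check_benchmarks(m: CityMetrics) -> Dict[str, bool]:
    """Compare measured values with the thresholds listed on the slide."""
    # 45 minutes in metropolitan areas, 30 minutes in small and medium-size cities.
    travel_limit = 45 if m.is_metropolitan else 30
    return {
        "travel_time": m.max_travel_time_min <= travel_limit,
        "footpath": m.footpath_width_m >= 2.0,
        "amenities": m.residences_near_amenities_pct >= 95.0,
        "emergency_response": m.emergency_response_min <= 30,
        "water_logging": m.water_logging_incidents_per_year == 0,
    }


def failed_benchmarks(m: CityMetrics) -> List[str]:
    """Names of the benchmarks the city does not meet, in alphabetical order."""
    return sorted(name for name, ok in check_benchmarks(m).items() if not ok)


# Made-up values for a medium-size city.
sample = CityMetrics(40, False, 1.5, 96.0, 25, 1)
print(failed_benchmarks(sample))
```

Returning one boolean per benchmark, rather than a single pass/fail flag, lets a city dashboard show exactly which targets are missed.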
The 5 SMARTEST CITIES in EU
No. 1: COPENHAGEN
• Led the Siemens Green City Index for Europe.
• One of the lowest carbon footprints per capita in the world (less than two tons per capita).
• Aspires to achieve carbon neutrality by 2025.
• All new buildings to be carbon neutral (green building).
• Approximately 40% of all commutes are made by bike.
• The city also recently collaborated with MIT to develop a smart bike equipped with sensors that provide real-time information not only to the rider but also to administrators, for open data aggregation on air contamination and traffic congestion.
No. 2: AMSTERDAM
• 67% of all trips are made by cycling or walking.
• The first bike sharing project in the world took place in Amsterdam decades ago.
• At present, 40 smart city projects, ranging from smart parking to the development of home energy storage for integration with a smart grid.
No. 3: VIENNA
• The “Citizen Solar Power Plant” is being developed with a goal of obtaining 50% of the city's energy from renewable sources by 2030.
• Testing a range of electric mobility solutions, including expanding the charging network from 103 to 440 stations by 2015.
• Residents share vehicles with neighbors.
No. 4: BARCELONA
• Bike-sharing project with more than 6,000 bikes.
• Using various sensors, from noise and air contamination to traffic congestion and even waste management.
• Life expectancy in Barcelona is among the highest of cities (approx. 83 years).
No. 5: PARIS
• The city has more than 20,000 bikes for sharing.
• 5% reduction in vehicle congestion in the city.
• The city partnered with Bolloré to create one of the world’s first and most expansive EV car sharing programs.
• Autolib’ will soon have 3,000 EVs in its car sharing fleet.
• Paris’ ecosystem was rated 11th best in the world.
Source: A report prepared by Boyd Cohen
Combination of Technology
The smart city concept is founded on a set of solutions which combine today’s standalone technologies:
Big Data, Business Intelligence, Data Analytics, M2M, Reporting, Internet of Things, Cloud Computing, UCC, Broadband Mobile, Building Automation, Next Gen Device, Wireless Sensor Netw., IT Security, E-cards, E-government
A Smart City, as a “system of systems”, generates vast amounts of data on energy, environment, transport and socio-economic conditions, among others.
• The management and analysis of such data is a new challenge; meeting it can help answer questions in governance, planning, etc., and support decision making.
• Currently, there is a lack of data analytics frameworks for urban decision makers.
• In particular, the field of smart cities based on data analytics is broad, complex and rapidly evolving.
• The complexity of smart city data analytics is due to:
– the requirements of cross-thematic applications (energy, transport, etc.),
– multiple sources and types of data (unstructured, semi-structured or structured),
– the integrity of the data.
• Some questions about a smart city that data analytics can help answer are:
– How do people behave in certain places?
– How can areas of dense traffic be predicted?
– How do people reach and leave certain sites during a special event?
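As an illustration of the traffic question, a very simple baseline predicts dense areas from historical averages. The sketch below is not a method proposed in this tutorial: the function names, the `(area, hour, vehicle_count)` record layout, the threshold and the sample observations are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(observations):
    """Average vehicle count per (area, hour).

    `observations` is an iterable of (area, hour, vehicle_count) tuples.
    """
    buckets = defaultdict(list)
    for area, hour, count in observations:
        buckets[(area, hour)].append(count)
    return {key: mean(values) for key, values in buckets.items()}


def predict_dense_areas(observations, hour, threshold):
    """Areas whose historical average at `hour` reaches `threshold`.

    Naive baseline: it assumes future traffic resembles the past average.
    """
    profile = hourly_profile(observations)
    return sorted(
        area for (area, h), avg in profile.items()
        if h == hour and avg >= threshold
    )


# Made-up historical counts.
history = [
    ("center", 8, 900), ("center", 8, 1100),
    ("harbor", 8, 300), ("campus", 8, 1000),
    ("center", 14, 400),
]
print(predict_dense_areas(history, hour=8, threshold=800))
```

A real deployment would likely replace the historical mean with a forecasting model and account for weekdays, weather and special events.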
• Data analytics has a crucial role in the smart city, since it acts as the platform for discovering information and knowledge, understanding how the city is functioning, etc.
• A smart city predicts, integrates, etc., specific incidents or events, with the aim of improving quality of life or informing citizens.
• For that, it needs to:
– Be instrumented to allow the collection of data about city life;
– Have mechanisms for aggregating data from different sources;
– Have mechanisms for representing the data;
– Have detailed, real-time knowledge available about the city;
– Have automated city functions that are delivered reliably and effectively.
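The aggregation and representation requirements can be illustrated with a common record type into which heterogeneous sources are mapped. This is a minimal sketch under stated assumptions: the `CityEvent` schema, the two adapter functions and the input field names (`metric`, `station`, `category`, `district`, and so on) are hypothetical and do not describe any particular city platform.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CityEvent:
    """Common representation shared by all data sources (assumed schema)."""
    source: str
    kind: str
    location: str
    timestamp: datetime


def from_sensor(reading: dict) -> CityEvent:
    """Map a hypothetical sensor reading to the common representation."""
    return CityEvent("sensor", reading["metric"], reading["station"],
                     datetime.fromisoformat(reading["time"]))


def from_citizen_report(report: dict) -> CityEvent:
    """Map a hypothetical citizen report to the common representation."""
    return CityEvent("citizen", report["category"], report["district"],
                     datetime.fromisoformat(report["reported_at"]))


def events_by_location(events: Iterable[CityEvent]) -> Dict[str, List[CityEvent]]:
    """Group events from all sources by location, ordered by time."""
    grouped: Dict[str, List[CityEvent]] = defaultdict(list)
    for event in events:
        grouped[event.location].append(event)
    for items in grouped.values():
        items.sort(key=lambda e: e.timestamp)
    return dict(grouped)


# Made-up inputs from two different sources.
events = [
    from_citizen_report({"category": "noise", "district": "north",
                         "reported_at": "2024-01-01T08:05:00"}),
    from_sensor({"metric": "noise", "station": "north",
                 "time": "2024-01-01T08:00:00"}),
]
grouped = events_by_location(events)
```

Mapping every source into one schema first keeps the later analysis steps independent of how each source delivers its data.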
Integrated management for smart sustainable cities
Organization of the resource management standards under the smart city environment:
Integrated smart city management standards
• Resources integration standards
– Observation Process Metadata Standard
– Observation Metadata Standard
– Model Metadata Standard
– Node Metadata Standard
– Event Metadata Standard
• Fusion processing technical specifications
– Technical Specification for Resource and Toponym Matching
– Technical Specification for Resource and Map Fusion
• Management service technical specifications
– Data Service Interface Specification
– Model Service Interface Specification
– Event Service Interface Specification
From the IT point of view, a city or urban area is a concentration of entities (individuals and organizations: families, businesses, schools, institutions ...) that produce, consume, process and store information.
Classic transmitters (channels):
• Oral transmission
• Advertisements on posts, in the street, in the store
• Loudspeakers, bells, sirens
• Mail
• Telephone, fax
• Newspapers, books, magazines
• Radio
• TV
Recent transmitters (channels):
• Digital television
• Internet: email, chat, Twitter ...
• Internet phone (Skype)
• Google, search engines
• Wikipedia
• Cell phone
• SMS messages
Produce, Consume, Process, Store
Much information is generated by humans
• The information is heterogeneous: text, pictures, videos, …
• It is a good complement to the information collected by sensors and smart devices
• Many applications are a combination of social networks, smart devices and the cloud:
– Recommendation systems for books, games, ...
– Assistance systems for traffic and travel
Produce, Consume, Process, Store
My data, my results
A lot of data!
A lot of capacity!
My data, my results
Other data, results for someone
A lot of data! This data could come from or be on the Web
My data, my results
Other data, results for someone
A lot of capacity!
Part of my computing can be done outside, in the cloud
Produce, Consume, Process, Store (data, results)
Classic storage:
• Schools
• Libraries
• Newspapers
• Filing of documents
• Stored magazines, books, etc.
Recent storage:
• Hard drives, DVDs, CDs, ...
• Digital song files, movies, photos
• Wikipedia
• YouTube
• Web repositories
E-Governance is the application of ICT for delivering government services, exchange of information, communication transactions, integration of various stand-alone systems and services, as well as back office processes and interactions within the entire government framework.
“E-Government is an ongoing process of transformation of Government towards the provision of government services (information, transactions) through electronic means, including access to government information and the completion of government transactions on an ‘anywhere, anytime’ basis.” (PricewaterhouseCoopers)
• e-Gov uses ICTs to improve the efficiency and effectiveness of its services.
• ICT is used, among other things, to deliver information that is better targeted or tailored to citizens, and to increase citizen participation in both service delivery and policy making.
The development of an efficient and effective e-government is a prerequisite for the development of Smart Cities.
e-Government is the transformation of government to provide efficient, convenient and transparent services to citizens and businesses through ICTs.
Particularly, e-Government (e-Gov) has
transformed interactions between
governments, citizens and other
members in the society.
e-Gov (also known as Internet government,
online government, etc.) consists of the
digital interactions between a
government and citizens, government
and businesses, government and
employees, among others.
e-Government is not about computers & websites
but about citizens & businesses!
e-Government is not about translating processes
but about transforming processes !
Brief history of e-Government: from automation to transformation
Costs/benefits of public sector IT
Computerisation: databases and back office automation
Benefit realisation
eGov 1.0: Online Service Delivery
eGov 2.0: Transformational Government
Transformational Government
Citizen-centric business model
• Lower cost
• Happier customers
• Higher policy impact
• Empowered citizens
[Figure: four columns, each labeled Business, Customers, Channels, Technology]
Governance
• Way govt. works
• Sharing of information
• Service delivery
Government service delivery
Govt., citizens, business
e-Government Domains
Government To Citizen (G2C)
ICT to enable citizen convenience and participation
e.g. SARS e-Filing, DoL Unemployment Fund submission (U-Filing), DHA Identity Tracking, Info Portals (Gateway, web sites)
Government To Government (G2G)
ICT to improve internal efficiency and administration of Government
e.g. Fin Accounting (BAS), Logistics, HR (PERSAL), Crime Administration (CAS), Population Register (NPR), Social Pension (SOCPEN), Health Admin (HIS, PaaB, Pharm), Education (NSC, ANA), Transport (eNatis)
Government To Business (G2B)
ICT to serve private business, industry and trade
e.g. CIPC Companies Register, Electronic Payments, SARS Customs
Costs/benefits of public sector IT
Computerisation: databases and back office automation
eGov 1.0: Online Service Delivery
eGov 2.0: Transformational Government
Benefit realisation
Fragmented, Interoperable, Integrated, Citizen-focused, Citizen-enabled
Transformation, Automation
PC, Mainframe, Internet, Cloud
“Governments are shifting from a government-centric paradigm to a citizen-centric paradigm”
Rethinking e-government services: user-centric approaches, OECD, 2009
E-Government | Transformational Government
Government-centric | Citizen-centric
Supply push | Demand pull
Government as sole provider of citizen services | Government also as convener of multiple competitive sources of citizen services
Unconnected vertical business silos | New virtual business layer, built around citizen needs, operates horizontally across government
“Identity” is owned and managed by government | “Identity” is owned and managed by the citizen
Public data locked away within government | Public data available freely for reuse by all
Citizen as recipient or consumer of services | Citizen as owner and co-creator of services
Online services | Multi-channel service integration
IT as capital investment | IT as a service
Producer-led | Brand-led
Bolting technology onto the existing business model of government | Focusing first on the business changes needed to unlock benefits for citizens, and only then on the technology
E-Government Evolution
Vertical axis: Delivering Value To Citizens. Horizontal axis: Complexity of Implementation and Technology.
• Web Presence: agency web sites provide citizens with information on rules and procedures.
• Limited Interactions: intranets link departments, allowing for email contact, access to online databases and downloadable forms.
• Transactions: electronic delivery of services, automated. Applications include issue of certificates and renewal of licenses.
• Transformation: joined-up government. All stages of transactions, including payments, are electronic. Applications include government portals. New models of service delivery with public-private partnerships.
Is e-Gov always based on Internet?
NO !
The following forms are also e-Government
• Telephone, Fax, Mobile
• CCTV, Tracking Systems, RFID, Biometrics
• Smartcards
• Non-online e-Voting
• TV & Radio-based delivery of public services
What do leading nations aim for in eGov?
• Interactive Public Services
• Public Procurement
• Public Internet Access Points
• Broadband Connectivity
• Interoperability
• Culture & Tourism
• Secure G2G Communications
E-Government IT Infrastructure
Multiple access channels: email, telephone, Internet-enabled device, kiosk, Internet site, interactive TV
Portal infrastructure: website, local govt. portals, private sector portals
Life events
Common web services (Government Gateway):
• Registration and enrolment
• Authentication
• Secure e-mail
• Rules engine
• Circumstances and personalisation
• Payments
• Notifications
• Appointments
Inter-operable departmental systems
E-Government Interoperability Framework
E-Government IT Infrastructure
• Government Backbone Network (GNET)
• Government Office Automation (GOA)
• Departmental network
• Central Internet Gateway (CIG)
• Certification Authority (CA)
• Government Communication Network (GCN)
• Mail Service
• Central Cyber Government Office (CCGO)
European Interoperability Framework v 2.0
Examples of e-Services – G2C
• Birth Certificate
• Health Care
• School Admission
• Scholarships
• e-Learning
• Examination Results
• Employment Services
• Vehicle Registration
• Driver’s License
• Passport/Visa
• Agriculture
• Land Record
• Property Registration
• Marriage Certificates
• Taxes
• Utility Services
• Municipality Services
• Pensions
• Insurance
• Health Care
• Death Certificate
• A single government portal that spans ministries and agencies and links to all other public websites.
• Local content production in key ministries and processes for regular updating.
• Computerized and web-enabled key processes.
• Legal and technical bases for transactions through the portal.
• Capacity for civil servants to facilitate such transactions.
Elements of E-Government
Wide variety of channels to access
• Public Information Kiosks
• Public Computer Facilities
• Community Cyber Points
Applications
• Submission of tax return
• Renewal of driving and vehicle
licenses
• Registration as a voter
• Payment of Government fees
• Tourist information
Applications
Electronic procurement
• Electronic Tendering System
• Electronic Marketplace
• Electronic Product Catalogue
Principle # 1: e-Government is about Transformation
From | To
Department Centric Approach | Customer Centric Approach
Process Orientation | Service Orientation
Output-Based Assessment | Outcome-Based Assessment
Departmental View | Integrated View
Principle # 2: e-Government requires a Holistic Approach
7 Areas of Management:
• Program Management
• Process Reform Management
• Resource Management
• Procurement Management
• Technology Management
• Knowledge Management
• Change Management
Principle # 2: e-Government requires a Holistic Approach
e-Government: People, Process, Technology, Resources
Principle # 2: e-Government requires a Holistic Approach
E-GOVERNMENT:
• E-government policy and strategy
• Government-wide process reengineering and change management
• Strategic applications such as a unified citizens database
• Prioritized multi-year ICT investment program
SOCIETAL APPLICATIONS FUND:
• Low-cost technology solutions
• Scalable social and business models
• Local content industry promotion, multimedia
INFORMATION INFRASTRUCTURE & ACCESS:
• Telecom & Internet policies & regulation
• Rural access subsidy scheme
• Telecenters
HUMAN RESOURCES:
• Specialized ICT education and training
• Use of ICT in education
LEADERSHIP, POLICY & INSTITUTIONS:
• Overall vision, e-laws
• ICT Agency
• CIOs in different ministries
• ICT industry promotion
•Lack of Process Models
•Status Quo-ism
•Poor Legal Frameworks
•Complex Procurement
1 PROCESS
•Lack of Political Will
•Official Apathy
•Shortage of Champions
•Lack of Skills in Govt
2 PEOPLE
•Lack of Architectures
•Lack of Standards
•Poor Communication
Infrastructure
•Hardware-approach
3 TECHNOLOGY
•Budget Constraints
•Disinterest of Pvt Sector
•Lack of Project Mgt Skills
4 RESOURCES
Principle # 3: e-Government requires us
to overcome A Number of Challenges
Principle # 4: e-Government necessitates
Change Management
• Senders & Receivers of communications must be in Sync
• Assess the levels of resistance & comfort
• Authority for change must be sufficient & continuous
• Value systems in the organization should support Change Management
• Change should be of the right quantum
• The ‘right’ answer is not enough
• Change is a process and not an event.
Principle # 4: e-Government necessitates
Change Management
1. Awareness of Change
2. Desire to Change
3. Knowledge of Skills
4. Ability to apply Knowledge
5. Reinforcement to Sustain Change
Leadership & Vision
•Policy Formulation
•Committing Resources
•Taking hard decisions
Program Development
•Preparing Roadmaps
•Prioritization
•Frameworks, Guidelines
Program Management
•Monitoring Progress
•Inter-agency Collaboration
•Funds Management
•Capacity Management
Project Development
•Conceptualization
•Architecture
•Definition (RFP, SLA…)
Project Management
•Bid Process Management
•Project Monitoring
•Quality Assurance
Principle # 5: e-Government
necessitates Capacity Building
• Law & Policy-making
– e-Government can be a catalyst for legal reform
– Wider & faster dissemination of laws
– Faster & better formulation of policies
• Better Regulation
– Registration & Licensing – speedier
– Taxation – better revenues
– Environmental Regulations – better compliance
– Transportation & Police – more transparency
• More efficient Services to Citizens & Businesses
– Better Image
– Cost-cutting
– Better targeting of benefits
– Control of corruption
– Improved accountability of politicians and civil servants.
Benefits to Government
Benefits to Citizens
• Cost and time-savings
• Certainty in getting services
• Higher penetration due to automation
• Increased participation of citizens in government decisions and actions.
• Better quality of life
• Ease of access of information
• Added convenience – multiple delivery channels
• Possibility of self-service
Benefits to Business
• Increased velocity of business
– E.g., TradeNet of Singapore
• Ease of doing business with Government
– e-Procurement
• Better Investment climate
• Transparency
PROCESS RE-ENGINEERING: technology is only a tool, not a
panacea
Again, data analytics underpins their progress.
• Examples of possible applications linking data analytics and e-government:
– E-participation: the ability of all citizens to communicate with one another and with the agencies or groups that represent them.
– Online Direct Democracy: giving citizens decision-making power on social issues through online collective decision making.
– Public Talent in Use: smart cities should use a crowdsourcing approach to support problem solving. The crowdsourcing model develops the collective intelligence of online communities.
– New Notions of Public Services: in a smart city, the citizens are partners (the prosumer era). The key idea is that the pursuit of public ends is the responsibility of everybody.
– Constellation of active agencies and groups, where governance and coordination can be constituted dynamically from the bottom up. In a smart city, the decentralization of governance is one of the main aspects.
– Government Cloud Data: governments should move to cloud computing to enable transparency and collaboration. Cloud computing makes it possible to cover the whole city with e-government solutions.
It’s a Snake It’s a Spear
It’s a Bridge
It’s a Tree Trunk
It’s a Blanket
It’s a Python
Blindfolded Men Describe Elephant
Blindfolded Men Describe Business Analytics
It’s Data Warehousing
It’s BD
It’s Statistics
It’s Mathematical Models
It’s BI
It’s Executive Dashboards
It’s Computer Science
It is the science that examines raw data in order to seek knowledge,
draw conclusions, and generate information, among other things.
It is used in many areas:
• Industry to make better business decisions
• Science to verify existing models or theories.
...
Data is the new oil economy
Data analysis is not simply data mining
Data mining navigates through large datasets using
sophisticated software to identify patterns.
Data analysis focuses on inference to draw a conclusion
based on what is known, to discover hidden relationships
and establish
Data analytics is the science of collecting, storing, extracting, cleansing, transforming, aggregating and
analyzing data, with the purpose of discovering information and knowledge.
• Data analytics has been used in different fields: finance, education, industry, etc.
• Analytics uses descriptive, identification and predictive models in order to produce knowledge from data, to be used to guide decision making.
• The high degree of datification embedded in a Smart City demands new tools and mechanisms for data manipulation and representation that facilitate the extraction of meaningful knowledge.
• Approach to de-synthesizing data, informational, and/or factual elements to answer research questions
• Method of putting together facts and figures to solve a research problem
• Systematic process of utilizing data to address research questions
• Breaking down research issues through utilizing controlled data and factual information
Utilizing Data to Increase Shareholder Value
Data =
Big and Small
Internal and External
Structured and Non-structured
Traditional and “New”
“Free” and Purchased
• With vast amounts of data now available, companies in almost every industry are focused on exploiting data for competitive advantage.
• In the past, firms could employ teams of statisticians, modelers, and analysts to explore datasets manually, but the volume and variety of data have far outstripped the capacity of manual analysis.
• At the same time, computers have become far more powerful, networking has become ubiquitous, and algorithms have been developed that can connect datasets to enable broader and deeper analyses than previously possible.
• The convergence of these phenomena has given rise to the increasingly widespread business application of data science principles and data mining techniques.
The Ubiquity of Data Opportunities
The Ubiquity of Data Opportunities
• Data mining is used for general customer relationship management to analyze customer behavior in order to manage attrition and maximize expected customer value.
• The finance industry uses data mining for credit scoring and trading, and in operations via fraud detection and workforce management.
• Major retailers from Walmart to Amazon apply data mining throughout their businesses, from marketing to supply-chain management.
• Many firms have differentiated themselves strategically with data science, sometimes to the point of evolving into data mining companies.
The primary goals of DA are to help you view business problems from a data perspective and understand principles of extracting
useful knowledge from data.
Data can’t “talk”
An analysis contains some aspects of scientific
reasoning/argument:
* Define
* Interpret
* Evaluate
* Illustrate
* Discuss
* Explain
* Clarify
* Compare
* Contrast
Goal of an analysis:
* To explain cause-and-effect phenomena
* To relate research to real-world events
* To predict/forecast real-world
phenomena based on research
* To find answers to a particular problem
* To draw conclusions about real-world events
based on the problem
* To learn a lesson from the problem
An analysis must have four elements:
* Data/information (what)
* Scientific reasoning/argument (what?
who? where? how? what happens?)
* Finding (what results?)
* Lesson/conclusion (so what? so how?
therefore,…)
Basic guide to data analysis:
* “Analyse” NOT “narrate”
* Go back to research flowchart
* Break down into research objectives and
research questions
* Identify phenomena to be investigated
* Visualise the “expected” answers
* Validate the answers with data
* Don’t state anything that is not supported by
data
Shoppers   Age     Number
Male       Old     6
Male       Young   4
Female     Old     10
Female     Young   15
More female shoppers than male shoppers
More young female shoppers than young male shoppers
Young male shoppers are not interested in shopping at the shopping complex
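The shopper table can be checked against the three statements with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the names `counts` and `total` are assumptions, not part of the original slides. It shows that the first two statements follow directly from the counts, while the third is a statement about motivation that the table does not measure.

```python
# Shopper counts from the table: (gender, age group) -> number of shoppers
counts = {
    ("Male", "Old"): 6,
    ("Male", "Young"): 4,
    ("Female", "Old"): 10,
    ("Female", "Young"): 15,
}


def total(gender):
    """Sum the counts over both age groups for one gender."""
    return sum(n for (g, _age), n in counts.items() if g == gender)


# Statement 1: more female shoppers than male shoppers
claim_1 = total("Female") > total("Male")

# Statement 2: more young female shoppers than young male shoppers
claim_2 = counts[("Female", "Young")] > counts[("Male", "Young")]

print("Female:", total("Female"), "Male:", total("Male"), "->", claim_1)
print("Young female vs. young male ->", claim_2)

# Statement 3 ("young male shoppers are not interested") is about motivation.
# No column in the table measures interest, so the counts alone cannot
# confirm it; it is an interpretation, not a finding.
```

Separating what the data supports from what is merely plausible is the same discipline the slide on "Basic guide to data analysis" asks for.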
• When analyzing:
* Be objective
* Accurate
* True
• Separate facts and opinion
• Avoid “wrong” reasoning/argument. E.g. mistakes in interpretation.
• The success of analytics can only be measured in terms of how well it helps the organization achieve its strategic objectives
• So a manager’s role is to:
– Identify business goals
– Collect the data necessary to measure performance towards goals
– Analyze the data
– Draw conclusions based on the information generated
Managing using Analytics
• Data science involves principles, processes, and techniques for understanding phenomena via the (automated) analysis of data.
• Data science operates in the context of various other closely related, data-related processes in the organization.
• We can distinguish data science from other aspects of data processing that are gaining increasing attention in business.
Data science
• Data-driven decision-making (DDD) refers to the practice of basing decisions on the analysis of data, rather than purely on intuition.
– For example, a marketer could select advertisements based purely on long experience in the field and an eye for what will work.
– Or the marketer could base the selection on the analysis of data regarding how consumers react to different ads.
– The marketer could also use a combination of these approaches.
• DDD is not an all-or-nothing practice, and different firms engage in DDD to greater or lesser degrees.
Data-driven decision-making
~ Exploratory Methods ~
These methods often involve calculating averages and percentages and displaying the information on a graph. Although exploratory methods may provide many pieces of information, they may not answer specific questions or make definite statements about a problem.
~ Confirmatory Methods ~
These methods are used to draw conclusions from a survey and its statistics by answering specific questions. For example, using a confirmatory method, a statistician can say, “The price of oil leaving Saudi Arabia has been increasing and will continue to increase.”
Neither of these methods should be overlooked. Both should be used extensively to analyze the results of a statistical activity, so that specific conclusions can be reached with credibility and accuracy.
Analyzing the Data
Quantitative and qualitative methods produce different types of data
– Quantitative data produces numerical values
– Qualitative data produces narratives
But for both quantitative and qualitative data, the same analytical strategies are
used for data interpretation
Descriptive
– Questions: What happened? What’s happening? What exactly is the problem? What actions are needed?
– Enablers: ad hoc reports, dashboards, data warehousing, alerts
– Outcomes: well-defined business problems and opportunities
Predictive
– Questions: Why is this happening? What will happen next? Why will it happen?
– Enablers: data mining, text mining, web/media mining, forecasting
– Outcomes: accurate projections of future states and conditions
Prescriptive
– Questions: What should I do? Why should I do it? What’s the best that can happen? What if we try this?
– Enablers: optimization, simulation, decision modeling, randomized testing
– Outcomes: best possible business decisions and transactions
Analytics and Types of Questions
Basic analytical strategies:
Describing
Factoring
Clustering
Comparing
Classification
Finding commonalities
Finding covariation
Ruling out rival
explanations
• Narrative (e.g. laws, arts)
• Descriptive (e.g. social sciences)
• Statistical/mathematical (pure/applied sciences)
• Audio-Optical (e.g. telecommunication)
• Others
Most research analyses, arguably, adopt the first
three.
The second and third are, arguably, most popular
in pure, applied, and social sciences
Categories of data analysis
• Privacy
• Security
• Basing decisions on incomplete data
• Basing decisions on inaccurate data
• Using only data that supports our gut decisions
• Drawing the wrong conclusion from the data
– Stock prices example
Dangers in Analytics
• Data mining
• Statistical analysis
• Predictive analysis
• Correlation
• Regression
• Forecasting
• Process Modeling
• Optimization
• Simulation
Analytic Tools
Two main categories:
* Descriptive statistics
* Inferential statistics
• Use summary measures to describe the central tendency of a distribution (mean, mode, median)
• For dispersion (variability), use standard deviation, variance, and range to show how spread out the data are about the mean.
• Count (frequencies)
• Percentage
• Mean (sum of all values ÷ number of values)
• Mode (most frequent value)
• Median (middle value)
• Range
• Standard deviation
• Variance
• Ranking
Basic Descriptive Statistics
Basic Descriptive Statistics
Frequency Distributions
To what extent did you increase your skills in
putting together a household budget?
A lot Some A little Not at all
Women (N=30) 14 9 5 2
Basic Descriptive Statistics
Percentage Distributions
To what extent did you increase your skills in
putting together a household budget?
A lot Some A little Not at all
Women (N=30) 46% 30% 17% 7%
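The step from the frequency distribution to the percentage distribution is a single division per category. The sketch below (variable names are illustrative assumptions) recomputes the shares from the counts 14, 9, 5 and 2; the whole-number percentages on the slide are rounded versions of these values.

```python
# Frequency distribution from the household-budget question (women, N=30)
responses = {"A lot": 14, "Some": 9, "A little": 5, "Not at all": 2}

n = sum(responses.values())  # total number of respondents

# Percentage distribution: each count divided by the total, times 100
percentages = {label: 100 * count / n for label, count in responses.items()}

for label, pct in percentages.items():
    print(f"{label}: {responses[label]} of {n} = {pct:.1f}%")
```

Reporting N alongside the percentages, as the slide does, lets a reader recover the underlying counts.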
Basic Descriptive Statistics
40 50 55 94 100 100 100 (Mean = 77)
40 92 93 94 95 96 98 (Mean ≈ 87)
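A short sketch with Python's standard `statistics` module makes the point of this slide concrete: the two score lists have different means but the same median, so a single summary number can hide how the values are distributed. The names `group_a` and `group_b` are illustrative assumptions.

```python
import statistics

# The two score lists shown on the slide
group_a = [40, 50, 55, 94, 100, 100, 100]
group_b = [40, 92, 93, 94, 95, 96, 98]

for name, scores in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(scores)        # sum of values / number of values
    median = statistics.median(scores)    # middle value of the sorted list
    value_range = max(scores) - min(scores)
    stdev = statistics.stdev(scores)      # sample standard deviation
    print(f"Group {name}: mean={mean:.1f} median={median} "
          f"range={value_range} stdev={stdev:.1f}")
```

Looking at the median and a dispersion measure next to the mean is a cheap guard against reading too much into one average.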
Basic Descriptive Statistics
Two different bar graphs are made from the same survey of favorite foods:
Basic Descriptive Statistics
The same information can be accurately presented in a non-misleading way:
[Pie chart: Favorite Foods. Pizza 33%, Hot Dogs 34%, Hamburgers 33%]
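One common way a bar graph of these shares becomes misleading is a vertical axis that does not start at zero. The bar graphs themselves are not preserved in this transcript, so the baseline of 32 used below is purely a hypothetical assumption for illustration, as are the function names.

```python
# Survey shares from the slide (percent of respondents)
shares = {"Pizza": 33, "Hot Dogs": 34, "Hamburgers": 33}


def bar_heights(values, baseline):
    """Visible bar height when the vertical axis starts at `baseline`."""
    return {label: value - baseline for label, value in values.items()}


def apparent_ratio(values, baseline, a, b):
    """How many times taller bar `a` looks than bar `b` on that axis."""
    heights = bar_heights(values, baseline)
    return heights[a] / heights[b]


# Axis starting at 0: Hot Dogs looks about 3% taller than Pizza
honest = apparent_ratio(shares, 0, "Hot Dogs", "Pizza")

# Hypothetical axis starting at 32: Hot Dogs looks twice as tall as Pizza
truncated = apparent_ratio(shares, 32, "Hot Dogs", "Pizza")

print(f"axis from 0:  {honest:.2f}x")
print(f"axis from 32: {truncated:.2f}x")
```

A pie chart, or a bar graph with a zero baseline, keeps the visual proportions equal to the numerical ones.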
Basic Descriptive Statistics
A simple glance at this graph will
make us conclude that smoking
is the leading cause of death
among Americans. However, an
in-depth analysis of this graph
will easily tell us that it is greatly
misleading.
A person who smokes has died from a heart disease. What was
his cause of death?
Basic Descriptive Statistics
[Bar chart: Percentage of Smokers in Each Cause of Death. Categories: AIDS, Alcohol, Motor Vehicle, Fires, Homicide, Illicit Drugs, Suicide, Cancer, Heart Disease. Vertical axis: Percent of Smokers, 0% to 120%]
Basic Descriptive Statistics
[Chart: Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1. Horizontal axis: Years, 1973 to 1979. Vertical axis: Price Per Barrel, $0.00 to $16.00]
Basic Descriptive Statistics
[Line graph: Price Per Barrel of Light Crude Oil Leaving Saudi Arabia on Jan. 1. Horizontal axis: Years, 1973 to 1979. Vertical axis: Prices Per Barrel, $0.00 to $16.00]
- Another adequate way of fixing the graph, showing the gradual
increase in the oil prices effectively through a line graph.
Basic Descriptive Statistics
Graphing comparisons
[Bar chart: Satisfaction with Services. Horizontal axis: Clinic Name (A to E). Vertical axis: Satisfaction Score, 0 to 40]
Basic Descriptive Statistics
[Bar chart: Satisfaction with Services. Horizontal axis: Clinic (A to E). Vertical axis: Satisfaction Score, 0 to 16. Series: Staff, Advice, Facility]
A main aspect of data analytics is the capability of measuring and mining urban data.
• A lot of city spatio-temporal data about many of our activities (human mobility data, etc.) is available in various forms.
• A variety of data mining and visualization techniques exist to interpret such data.
• In particular, smart cities need reality mining, which concerns pervasive sensing in social systems using ubiquitous technology (for example, smartphones).
Using reality data mining and network analysis, we are able to:
– Produce models and methods, using urban data with different spatio-temporal scales.
– Develop services that ensure equity, fairness and a better quality of city life.
– Explore the notion of the city as a laboratory for innovation.
– Enhance mobility for city populations.
– Produce new forms of urban governance and organization.
Real-time analytics is very important in a smart city, in order to create a
catalogue of the behaviors in a city.
• Collecting the right data (historical perspective)
• Developing a Data Warehouse (all data in one place)
• Having staff to analyze the data
• Managers who understand the business & embrace managing by the numbers
What does it take to succeed using this
technique?
Big data, data mining tools, business
intelligence platforms, open data, the Internet of
Things (IoT), and ubiquitous sensor networks,
among others, are essential parts of a data
analytics infrastructure.
Smart cities and ICT
e-Government
Introduction to Data Analytics
Neighbor concepts: BD
Case study
90% of the world’s data has been generated since 2010. Big data combines data from humans and
computers. Every day, we create 2.5 quintillion bytes of data.
• Big data refers to a collection of data so large, complex and rapidly changing that it becomes difficult to process with traditional data processing systems.
• Big data work includes the capture, curation, storage, search, sharing, transfer, analysis and visualization of the data.
Social media and networks (all of us are generating data)
Scientific instruments (collecting all sorts of data)
Mobile devices (tracking all objects all the time)
Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
Who’s Generating Big Data
Where Is This “Big Data” Coming From?
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 2+ billion people on the Web by end 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009… 200M by 2014
The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming
data
Type of Data
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
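The "scan once" constraint on streaming data changes how even simple statistics are computed: values cannot be stored and revisited. A minimal sketch, assuming a hypothetical stream of sensor readings, uses Welford's online algorithm to keep a running mean and variance in constant memory. The class name, the generator and the sample values are illustrative assumptions, not something taken from the slides.

```python
class RunningStats:
    """One-pass (online) mean and sample variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the summary; the value is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the values seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream():
    """Hypothetical stream: a generator can only be consumed once."""
    yield from [21.0, 21.5, 22.0, 21.5, 24.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

The same pattern (a small summary updated per item) underlies many stream-processing operators, since memory use does not grow with the length of the stream.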
What to do with these data?
• Aggregation and Statistics
– Data warehouse and OLAP
• Indexing, Searching, and Querying
– Keyword based search
– Pattern matching (XML/RDF)
• Knowledge discovery
– Data Mining
– Statistical Modeling
Big Data Is New
Big Data Is Only About Massive Data Volume
Big Data Means Hadoop
Big Data Needs A Data Warehouse
Big Data Means Unstructured Data
Big Data Is for Social Media & Sentiment Analysis
The Myth About Big Data
• No single standard definition…
“Big Data” is data whose scale, diversity, and
complexity require new architecture,
techniques, algorithms, and analytics to manage
it and extract value and hidden knowledge from
it…
- Government
- In 2012, the Obama administration announced the Big Data Research
and Development Initiative
- 84 different big data programs spread across six departments
- Private Sector
- Walmart handles more than 1 million customer transactions every hour,
which are imported into databases estimated to contain more than
2.5 petabytes of data
- Facebook handles 40 billion photos from its user base.
- Falcon Credit Card Fraud Detection System protects 2.1 billion active
accounts worldwide
- Science
- Large Synoptic Survey Telescope will generate
140 terabytes of data every 5 days.
- Large Hadron Collider: 13 petabytes of data produced in 2010
- Medical computation, like decoding the human genome
- Social science revolution
- New way of science (microscope example)
Importance of Big Data
Data Analysis prediction for US 2012 Election
Drew Linzer, June 2012
332 for Obama,
206 for Romney
Nate Silver, FiveThirtyEight blog
Predicted Obama had an 86% chance of winning
Predicted all 50 states correctly
Sam Wang, the Princeton Election Consortium
Put the probability of Obama's re-election
at more than 98%
The media continued reporting the race as very
tight
Usage Example in Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
What’s driving Big Data
Big Data Exploration: Find, visualize, understand all big data to improve decision making
Enhanced 360° View of the Customer: Extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
Security/Intelligence Extension: Lower risk, detect fraud and monitor cyber security in real time
Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
Operations Analysis: Analyze a variety of machine data for improved business results
The 5 Key Big Data Use Cases
• Big data is more real-time in nature than traditional DW applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared nothing, massively parallel processing, scale out architectures are well-suited for big data apps
Value of Big Data Analytics
Volume: 12+ terabytes of Tweets created daily.
Velocity: 5+ million trade events per second.
Variety: 100’s of different types of data.
Veracity: Only 1 in 3 decision makers trust their information.
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
1-Scale (Volume)
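The volume figures on this slide imply a yearly growth rate, which can be worked out as a compound-growth calculation. The sketch below assumes smooth exponential growth between the two endpoints (an assumption; the slide only gives the endpoints) and derives the annual rate and the doubling time from them.

```python
import math

# Endpoints taken from the slide
start_year, end_year = 2009, 2020
start_zb, end_zb = 0.8, 35.0

growth_factor = end_zb / start_zb   # about 44x, as quoted on the slide
years = end_year - start_year       # 11 years

# Under constant exponential growth: end = start * (1 + r) ** years
annual_rate = growth_factor ** (1 / years) - 1

# Time for the volume to double at that rate
doubling_time = math.log(2) / math.log(1 + annual_rate)

print(f"growth factor:  {growth_factor:.2f}x")
print(f"annual growth:  {annual_rate:.1%}")
print(f"doubling time:  {doubling_time:.2f} years")
```

Under that assumption the stored volume grows by roughly 40% per year, which means it doubles about every two years.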
• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be generating/collecting many types of data
To extract knowledge, all these types of data need to be linked together
2-Complexity (Variety)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions mean missed opportunities
• Examples
– E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you
– Healthcare monitoring: sensors monitor your activities and body; any abnormal measurements require immediate reaction
3-Speed (Velocity)
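The healthcare-monitoring example can be sketched as an online check that reacts to each reading as it arrives instead of waiting for a batch. Everything specific here is a hypothetical assumption for illustration: the function name, the window of five readings, the tolerance of 20 units and the sample heart-rate values.

```python
from collections import deque


def monitor(readings, window=5, tolerance=20.0):
    """Yield (index, value) for readings that deviate from the mean of the
    previous `window` readings by more than `tolerance`."""
    recent = deque(maxlen=window)  # keeps only the latest `window` values
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > tolerance:
                yield i, value     # abnormal measurement: react immediately
        recent.append(value)


# Hypothetical heart-rate readings (beats per minute) arriving one by one
heart_rate = [72, 74, 71, 73, 75, 74, 120, 76, 73]

alerts = list(monitor(heart_rate))
print(alerts)
```

Because the function is a generator, an alert is produced as soon as the abnormal reading is seen, which is the property the velocity dimension asks for; a real monitoring system would need clinically validated thresholds rather than the fixed tolerance assumed here.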
Challenges in Handling Big Data
• The bottleneck is in technology
– New architecture, algorithms, techniques are needed
• Also in technical skills
– Experts in using the new technology and dealing with big data
Big Data Integration is Multidisciplinary
Less than 10% of the Big Data world is genuinely relational
Meaningful data integration in the real, messy, schema-less
and complex Big Data world of databases and the semantic web
using multidisciplinary and multi-technology methods
The Billion Triple Challenge
The Web of data contains 31 billion RDF triples, of which 446 million
are RDF links, 13 billion government data, 6 billion
geographic data, 4.6 billion publication and media data, 3 billion
life science data
BTC 2011, Sindice 2011
The Linked Open Data Ripper
Mapping, Ranking, Visualization, Key Matching, Snappiness
Demonstrate the Value of Semantics: let data integration drive
DBMS technology
Large volumes of heterogeneous data, like link data and RDF
Other Challenges in Big Data
Big Data Technology
Hadoop
• Low cost, reliable scale-out architecture
• Distributed computing
• Proven success in Fortune 500 companies
• Exploding interest
NoSQL Databases
• Huge horizontal scaling and high availability
• Highly optimized for retrieval and appending
• Types
– Document stores
– Key-value stores
– Graph databases
Analytic RDBMS
• Optimized for bulk-load and fast aggregate query workloads
• Types
– Column-oriented
– MPP
– In-memory
Main Big Data Technologies
• Hadoop Distributed File System (HDFS)
– Massive redundant storage across a commodity
cluster
• MapReduce
– Map: distribute a computational problem
across a cluster
– Reduce: Master node collects the answers to
all the sub-problems and combines them
• Many distros available
Hadoop Core Components
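As a rough illustration of the Map and Reduce steps listed above, the following single-process Python sketch counts words. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample splits are assumptions made for this example, and a real cluster would run the map and reduce tasks on different nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)

# Hypothetical input splits; on a cluster each split would go to a different mapper.
splits = ["smart city data", "city data analytics", "data"]

intermediate = [pair for split in splits for pair in map_phase(split)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(word_counts)
```

The grouping step is written out explicitly here only to make the data flow visible; in Hadoop it is handled by the framework.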
Major Hadoop Utilities
• Apache Hive: SQL-like language and metadata repository
• Apache Pig: High-level language for expressing data analysis programs
• Apache HBase: The Hadoop database. Random, real-time read/write access
• Sqoop: Integrating Hadoop with RDBMS
• Oozie: Server-based workflow engine for Hadoop activities
• Hue: Browser-based desktop interface for interacting with Hadoop
• Flume: Distributed service for collecting and aggregating log and event data
• Apache Whirr: Library for running Hadoop in the cloud
• Apache Zookeeper: Highly reliable distributed coordination service
Hadoop & Databases
HadoopDB – A Hybrid Approach [Abouzeid et al., VLDB 2009]
• An architectural hybrid of MapReduce and DBMS technologies
• Uses the fault tolerance and scale of a MapReduce framework such as Hadoop
• Leverages the advanced data processing techniques of an RDBMS
• Exposes a declarative interface to the user
• Goal: get the best of both worlds
Architecture of HadoopDB
EDBT 2011 Tutorial
Implementation of Big Data: MapReduce
Raw input: <key, value> → MAP → intermediate pairs <K1, V1>, <K2, V2>, <K3, V3> → REDUCE
MapReduce Advantages
• Automatic parallelization:
– Depending on the size of the raw input data, instantiate multiple MAP tasks
– Similarly, depending on the number of intermediate <key, value> partitions, instantiate multiple REDUCE tasks
• Run-time:
– Data partitioning
– Task scheduling
– Handling machine failures
– Managing inter-machine communication
• Completely transparent to the programmer/analyst/user
MapReduce Model
• Google MapReduce (2004)
– Jeffrey Dean et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
• Apache Hadoop (2005)
– http://hadoop.apache.org/
– http://developer.yahoo.com/hadoop/tutorial/
• Apache Hadoop 2.0 (2012)
– Vinod Kumar Vavilapalli et al. Apache Hadoop YARN: Yet Another Resource Negotiator, SOCC 2013.
– Separation between resource management and computation model.
Google MapReduce
(Figure: execution overview.) The user program (1) forks a master and several workers. The master (2) assigns map tasks and reduce tasks to the workers. In the map phase, map workers (3) read the input splits (Split 0, Split 1, Split 2) and (4) write intermediate files to their local disks. In the reduce phase, reduce workers (5) read those files remotely and (6) write the output files (Output File 0, Output File 1).
Mapper: split, read, emit intermediate key-value pairs
Reducer: repartition, emit final output
Two-way Joins and MapReduce
• MapReduce works with a single data source (table):
– <key K, value V>
• How can the MR framework be used to compute the join:
– R(A, B) ⋈ S(B, C)
• Simple extension (proposed independently by multiple researchers):
– <a, b> from R is mapped as: <b, [R, a]>
– <b, c> from S is mapped as: <b, [S, c]>
• During the reduce phase:
– Join the key-value pairs with the same key but different relations
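The tagging scheme above can be sketched in a few lines of single-process Python. This is only an illustration under stated assumptions: the sample relations `R` and `S` and the helper names (`map_r`, `map_s`, `reduce_join`) are hypothetical, and the grouping loop stands in for the shuffle that a MapReduce framework would perform.

```python
from collections import defaultdict

# Hypothetical sample relations: R(A, B) and S(B, C).
R = [("a1", "b1"), ("a2", "b1"), ("a3", "b2")]
S = [("b1", "c1"), ("b2", "c2"), ("b3", "c3")]

def map_r(row):
    a, b = row
    return (b, ("R", a))   # <a, b> from R becomes <b, [R, a]>

def map_s(row):
    b, c = row
    return (b, ("S", c))   # <b, c> from S becomes <b, [S, c]>

def reduce_join(b, tagged_values):
    """Pair every R value with every S value that shares the join key b."""
    r_side = [v for tag, v in tagged_values if tag == "R"]
    s_side = [v for tag, v in tagged_values if tag == "S"]
    return [(a, b, c) for a in r_side for c in s_side]

# Shuffle: group the tagged values by join key.
groups = defaultdict(list)
for key, value in [map_r(r) for r in R] + [map_s(s) for s in S]:
    groups[key].append(value)

joined = sorted(t for b, vals in groups.items() for t in reduce_join(b, vals))
print(joined)
```

Keys that appear in only one relation (here `b3`) produce no output, which matches the semantics of an inner join.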
Beyond Two-way Joins?
• How to generalize to:
– R(A, B) ⋈ S(B, C) ⋈ T(C, D)
in MapReduce?
Challenges (1)
• Big Data Models and Algorithms
– Foundational Models
– Algorithms and Programming Techniques
– Analytics and Metrics
– Representation Formats for Multimedia Big Data
• Big Data Architectures
– Big Data as a Service
– Cloud Computing Techniques for Big Data
– Big Data Open Platforms
– Big Data in Mobile and Pervasive Computing
• Big Data Search and Mining
– Algorithms and Systems for Big Data Search
– Distributed and Peer-to-peer Search
– Machine Learning Based on Big Data
– Visualization Analytics for Big Data
Challenges (2)
• Big Data Management
– Big Data Persistence and Preservation
– Big Data Quality and Provenance Control
– Management Issues of Social Network Big Data
• Big Data Protection, Integrity and Privacy
– Models and Languages for Big Data Protection
– Privacy Preserving Big Data Analytics
– Big Data Encryption
• Security Applications of Big Data
– Anomaly Detection in Very Large Scale Systems
– Collaborative Threat Detection using Big Data Analytics
• In general, data analytics and Big Data imply:
– A distributed data architecture, where data can be stored and analyzed in real time.
– A high-performance computing capability embedded in the architecture to filter and analyze data.
– A set of services for operational and policy decision making, based on data analytics.
• Big Data and data analytics make it possible:
– To better understand the city: What? Where? Who? How?
– To anticipate:
  • In the short term: traffic congestion, risks due to weather events …
  • In the long term: infrastructure needs, school needs …
– To bring real, clear, and understandable indicators (scoreboards) to the attention of the mayor.
Pentaho Big Data Analytics
Complete Big Data Analytics & Visual Data Management
• Data ingestion, manipulation, and integration
• Enterprise and ad hoc reporting
• Data discovery and visualization
• Predictive analytics
• Data sources: Hadoop, NoSQL, analytic databases, relational
Smart cities and ICT
e-Government
Introduction to Data Analytics
Neighbor concepts: BI
Case study
Business Intelligence (BI)
• Software that searches vast amounts of data to derive information for improved decision making
• A management decision support framework that empowers business users to understand data, resulting in actionable insights that improve the business.
Business Intelligence enables the business to make intelligent, fact-based decisions
1. Aggregate Data: Database, Data Mart, Data Warehouse, ETL Tools, Integration Tools
2. Present Data: Reporting Tools, Dashboards, Static Reports, Mobile Reporting, OLAP Cubes
3. Enrich Data: Add Context to Create Information, Descriptive Statistics, Benchmarks, Variance to Plan or LY
4. Inform a Decision: Decisions are Fact-based and Data-driven
The Evolution of Business Intelligence
(Figure: timeline over the 1990s, 2000s, and 2010s, with axes labelled "scale".)
https://www.google.de/search?q=evolution+of+business+intelligence&newwindow=1&tbm=isch&tbo=u&source=univ&sa=X&ei=gEGoU5KXBuTb4QSGsoH4BQ&ved=0CDsQsAQ&biw=1366&bih=64
Business Intelligence Process
(source Heinz, 2014)
Decision levels (pyramid figure):
• Strategic: senior management
• Tactical: managers
• Operational: operating personnel
Other figure labels: Business Intelligence, ERP; Strategy, Day to day
Why do companies need BI?
(Figure: competitive advantage plotted against sophistication of intelligence, with Tactical / Strategic BI at the high end and Operational BI at the low end. Levels, highest first:)
• Optimization: What’s the best that can happen?
• Predictive modeling: What will happen next?
• Forecasting/extrapolation: What if these trends continue?
• Statistical analysis: Why is this happening?
• Alerts: What actions are needed?
• Query/drill down: Where exactly is the problem?
• Ad hoc reports: How many, how often, where?
• Standard reports: What happened?
source Heinz, 2014
Business Intelligence components (figure labels):
• Data Warehouse and Data Marts
• Report Warehouse and Document Mart
• Data Analysis and Data Mining
• Business Modeling
• Knowledge Management
• Project Management
• “Actionable” Information
• Decision Making
Originally a term coined by the Gartner Group in 1993, Business Intelligence (BI) is a broad range of software and solutions aimed at collecting, consolidating, analyzing, and providing access to information that allows users across the business to make better decisions.
The technology includes software for database query and analysis, multidimensional databases or OLAP tools, data warehousing and data mining, and web-enabled reporting capabilities.
It is applied across disciplines, but especially in Customer Relationship Management (CRM), Supply Chain Management (SCM), and Enterprise Resource Planning (ERP).
It provides better, faster, and more accessible reports.
Core Capabilities of BI
Benefits of Business Intelligence
• Improve Management Processes
– planning, controlling, measuring and/or changing, resulting in increased revenues and reduced costs
• Improve Operational Processes
– fraud detection, order processing, purchasing, and so on, resulting in increased revenues and reduced costs
• Predict the Future
Stages in Business Intelligence
Implement a successful Business Intelligence strategy…
1. JUSTIFY the need for Business Intelligence
2. SELECT: determine approach and acquire software
3. IMPLEMENT: execute based upon selection
4. ADOPTION: focus on user adoption to ensure success!
5. EXPANSION: expand to new areas within your organization
Justify BI: Business Intelligence Benefit OPPORTUNITY
» Customer/product profitability
» More competitive pricing
» Improved customer loyalty
» Integration of sales, delivery, billing and AR
» Real-time views across business processes
» Real-time alerts to operational problems
» Trend analysis on inventory & AR
» Real-time information for direct customer interaction
» Executive dashboards
» Consistent use of KPIs
» Real-time access to data
» Fewer silos between apps
» Reduced data entry
» Reduced report development costs
» Reduced error processing
» More efficient administrative and processes
BI Selection
Other considerations:
• Real time vs. nightly refresh
• Other data sources
• Speed of implementation
• Source system upgrades (JDE or PS)
• Impact on production system
• Cross-module reporting
Options compared in the figure (value over time, on top of the ERP):
• Direct Connect with Adapters
• Enterprise BI with Pre-Built Data Warehouse
• Enterprise BI No Data Warehouse
BI Implementation
YOU HAVE TO HAVE A PLAN
Requirements → Expectations → Execution → Rewards
Requirements
A successful BI implementation involves:
• Gathering requirements
• Training
• Planning
• Project management
Expectations
Don’t set the expectation that EVERY issue will be solved
Execution: define, identify, deliver
• Define the path
• Identify the team
• Keep deliverables set to 4–6 weeks
Rewards: tie to results
The upside potential in cost savings FAR outweighs the acquisition cost
BI Adoption
• An executive sponsor must help drive BI
• BI should be accessible to ALL levels of the organization
• Training
• Establish a BI Center of Excellence
• Ensure the BI solution is rolled out to the entire end-user community
Data Warehousing
A data warehouse is a complete record of an organization's data, going beyond the transactional and operational information, stored in a database designed to facilitate efficient analysis and dissemination of data.
Why Data Warehouse?
• Missing data: decision support requires historical data, which operational DBs do not usually have
• Data consolidation: decision support requires consolidation (aggregation, summary) of data from heterogeneous sources
• Data quality: different sources typically use inconsistent data representations, codes, and formats that must be reconciled
• Large Database
• Subject-Oriented
• Integrated
• Time-Variant
• Nonvolatile
• User-Friendly Interface
Data Warehouse
Data Warehouse System
Sources: Operational DB, Other DB, External DB
→ Data Warehouse
→ Reporting, Data Mining, KM, Expert
Warehouse architecture (figure): Sources → Integration → Warehouse (with Metadata) → Query & Analysis → Clients
Data Mart
The large amounts of data in a data warehouse are sometimes subdivided into smaller logical units (data marts)
• Mini data warehouses
• Hold subsets of data from the data warehouse
• Data focuses on a specific aspect of the company
Data Mart
ETL (Extract, Transform, Load)
Capture / Extract: obtains a subset of the source data to load into the DW
Source Heinz 2013
Cleanse: uses pattern recognition and AI technologies to improve data quality
Source Heinz 2013
Transform: converts data from the format of the source relational databases to the DW format
Source Heinz 2013
Load/Index: loads the transformed data into the DW and creates indexes
Source Heinz 2013
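To make the four ETL steps concrete, here is a minimal Python sketch that runs them over a tiny in-memory CSV export and loads the result into an SQLite table. Everything specific in it is an assumption for illustration: the column names, the sample rows, the table name `sale_fact`, and the cleansing rule, which is a simple numeric check rather than the pattern recognition or AI techniques mentioned above.

```python
import csv
import io
import sqlite3

# Hypothetical operational export; the column names and values are made up.
RAW = """store,sale_date,amount
NYC,1997-07-01,12
 sfo ,1997-07-02,8
LA,1997-07-03,not available
"""

def extract(text):
    """Capture/Extract: read the subset of source rows we need."""
    return list(csv.DictReader(io.StringIO(text)))

def cleanse(rows):
    """Cleanse: normalise codes and drop rows whose amount is not numeric."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if amount.isdigit():
            clean.append({"store": row["store"].strip().lower(),
                          "sale_date": row["sale_date"].strip(),
                          "amount": amount})
    return clean

def transform(rows):
    """Transform: convert to the warehouse layout (tuples with typed values)."""
    return [(r["store"], r["sale_date"], int(r["amount"])) for r in rows]

def load(conn, facts):
    """Load/Index: insert the facts and create an index for later queries."""
    conn.execute("CREATE TABLE sale_fact (store TEXT, sale_date TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sale_fact VALUES (?, ?, ?)", facts)
    conn.execute("CREATE INDEX idx_sale_store ON sale_fact (store)")

conn = sqlite3.connect(":memory:")
load(conn, transform(cleanse(extract(RAW))))
loaded = conn.execute("SELECT store, amount FROM sale_fact ORDER BY store").fetchall()
print(loaded)
```

Keeping each step as a separate function mirrors the slide sequence and makes it easy to swap in a more capable cleansing stage later.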
The five styles of BI
1. Enterprise reporting
2. Cube analysis
3. Ad hoc querying and analysis
4. Statistical analysis and data mining
5. Report delivery and alerting
Dimensional models
Dimensional modeling is a logical design technique commonly used for data warehouses. It seeks to present data in a standard architecture that gives end users high-performance access.
The model is based on star schemas, with fact tables and dimension tables (e.g. cubes).
Multidimensionality: the ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions).
Star Schemas
• A star schema is a common organization for data at a warehouse. It consists of:
1. Fact table: a very large accumulation of facts such as sales.
– Often “insert-only.”
2. Dimension tables: smaller, generally static information about the entities involved in the facts.
customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

product
prodId  name  price
p1      bolt  10
p2      nut   5

store
storeId  city
c1       nyc
c2       sfo
c3       la

sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
105      3/8/97  111     p1      c3       5    50
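The customer, product, store, and sale tables above can be loaded into an in-memory SQLite database to show how a star join works. This is a sketch rather than a reference design: the column types, the foreign-key declarations, and the final query (total amount per store city and product name) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);
-- Fact table: one row per sale, with foreign keys into each dimension.
CREATE TABLE sale (
    orderId TEXT, date TEXT,
    custId INTEGER REFERENCES customer(custId),
    prodId TEXT REFERENCES product(prodId),
    storeId TEXT REFERENCES store(storeId),
    qty INTEGER, amt INTEGER
);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    (53, "joe", "10 main", "sfo"),
    (81, "fred", "12 main", "sfo"),
    (111, "sally", "80 willow", "la")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("p1", "bolt", 10), ("p2", "nut", 5)])
conn.executemany("INSERT INTO store VALUES (?, ?)",
                 [("c1", "nyc"), ("c2", "sfo"), ("c3", "la")])
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("o100", "1/7/97", 53, "p1", "c1", 1, 12),
    ("o102", "2/7/97", 53, "p2", "c1", 2, 11),
    ("105", "3/8/97", 111, "p1", "c3", 5, 50)])

# A typical star join: total amount per store city and product name.
totals = conn.execute("""
    SELECT store.city, product.name, SUM(sale.amt)
    FROM sale
    JOIN store   ON sale.storeId = store.storeId
    JOIN product ON sale.prodId  = product.prodId
    GROUP BY store.city, product.name
    ORDER BY store.city, product.name
""").fetchall()
print(totals)
```

The fact table stays narrow (keys plus measures), while descriptive attributes such as the store city are looked up in the dimension tables only when a query needs them.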
Fact table view:
sale
prodId  storeId  amt
p1      c1       12
p2      c1       11
p1      c3       50
p2      c2       8

Multi-dimensional cube (dimensions = 2):
      c1   c2   c3
p1    12        50
p2    11   8

Cube
A subset of highly interrelated data that is organized to allow users to combine any attributes in a cube (e.g., stores, products, customers, suppliers) with any metrics in the cube (e.g., sales, profit, units, age) to create various two-dimensional views, or slices, that can be displayed on a computer screen
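The step from the fact table view to the two-dimensional cube can be sketched as a simple pivot. The code below uses the four fact rows from the example; the variable names and the printed layout are assumptions, and a real OLAP engine would store and index the cells very differently.

```python
from collections import defaultdict

# Fact table rows from the example: (prodId, storeId, amt).
facts = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]

# Build the cube: one cell per (product, store) pair, summing the measure.
cube = defaultdict(int)
for prod, store, amt in facts:
    cube[(prod, store)] += amt

products = sorted({p for p, _ in cube})
stores = sorted({s for _, s in cube})

# Print the two-dimensional view; empty cells mean "no sales recorded".
print("     " + " ".join(f"{s:>4}" for s in stores))
for p in products:
    cells = [cube.get((p, s), "") for s in stores]
    print(f"{p:<5}" + " ".join(f"{c:>4}" for c in cells))
```

Using `cube.get` when printing avoids creating zero-valued cells for combinations that never occurred, so the cube stays sparse.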
3-D Cube
Fact table view:
sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube (dimensions = 3):
day 1
      c1   c2   c3
p1    12        50
p2    11   8
day 2
      c1   c2   c3
p1    44   4
p2
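With a third dimension, two common operations are slicing (fixing one dimension) and rolling up (aggregating a dimension away). The sketch below applies both to the six fact rows of the 3-D example; the function names `slice_by_day` and `roll_up_days` are hypothetical and chosen only for readability.

```python
from collections import defaultdict

# Fact table rows from the example: (prodId, storeId, date, amt).
facts = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# The 3-D cube maps (product, store, day) to the measure.
cube = {(p, s, d): amt for p, s, d, amt in facts}

def slice_by_day(cube, day):
    """Slice: fix the day dimension, leaving a 2-D product x store view."""
    return {(p, s): amt for (p, s, d), amt in cube.items() if d == day}

def roll_up_days(cube):
    """Roll-up: aggregate the day dimension away, summing the measure."""
    totals = defaultdict(int)
    for (p, s, _), amt in cube.items():
        totals[(p, s)] += amt
    return dict(totals)

day2 = slice_by_day(cube, 2)
all_days = roll_up_days(cube)
print(day2)
print(all_days)
```

Slicing on day 2 returns only the two cells recorded for that day, while the roll-up merges both days into a single product by store view.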
MEASURE
The word “measure” is
exactly what it means: a
number that we want to
analyze, what we want to
measure in our analysis
In this example:
410 is the number of
packages delivered
DIMENSION
The business attribute that
“describes” the measure.
In this example:
We find that the 410
measure has important
context, it represents the
intersection of:
- Route
- Source
- Time
MEASURE CONTEXT
Specifically, the 410 packages
are related to:
Non-Ground / Air
Eastern Hemisphere / Australia
2nd Half / 4th Quarter on
November 27, 1999
HIERARCHY
Literally, from the highest
level “grain” to the most
detailed grain. Think:
Year/Qtr/Month/Week/Day
In this example:
The Source dimension can
be drilled-down into
increasing levels of detail.
Each time we do this, the
cube recalculates all
measures at the
intersections.
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
OLAP
• On-Line Analytical Processing
– Drill-Down
– Consolidation
– Slicing and Dicing
OLAP operations (figure): roll up, drill down, slice and dice, pivot (rotate)
Limitations of dimensionality
– The multidimensional database can take up significantly more computer storage room than a summarized relational database
– Multidimensional products cost significantly more than standard relational products
– Database loading consumes significant system resources and time, depending on data volume and the number of dimensions
– Interfaces and maintenance are more complex in multidimensional databases than in relational databases
OLAP versus OLTP
– OLTP concentrates on processing repetitive transactions in large quantities and conducting simple manipulations
– OLAP involves examining many data items in complex relationships
– OLAP may analyze relationships and look for patterns, trends, and exceptions
– OLAP is a direct decision support method
Aggregates
• Operators: sum, count, max, min, median, avg
• “Having” clause
• Using dimension hierarchy
– average by region (within store)
– maximum by month (within date)
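A short SQLite sketch can show several aggregate operators and a HAVING clause working together. It borrows the six-row sale fact table used in the tutorial's aggregate examples; the threshold of 20 in the HAVING clause is an arbitrary assumption, and median is left out to keep the query portable, since it is not a standard SQL aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)])

# Several aggregate operators per store; HAVING filters the groups afterwards.
busy_stores = conn.execute("""
    SELECT storeId, COUNT(*), SUM(amt), MAX(amt), MIN(amt), AVG(amt)
    FROM sale
    GROUP BY storeId
    HAVING SUM(amt) > 20
    ORDER BY storeId
""").fetchall()
print(busy_stores)
```

WHERE filters individual rows before grouping, whereas HAVING filters whole groups after the aggregates have been computed, which is why store c2 (total 12) drops out here.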
Aggregates
sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81
Aggregates
sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4
• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
Result:
date  sum
1     81
2     48
Aggregates
sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4
• Add up amounts by day, product
• In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId
Result:
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
Rollup moves from this result to the totals by day; drill-down moves from the totals by day to this result.
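The three aggregate queries on the sale table can be checked with Python's built-in `sqlite3` module. This is a minimal sketch: the `query` helper is hypothetical, ORDER BY clauses are added only to make the output order deterministic, and the last query selects `prodId` so that each result row is labelled with its product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)])

def query(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Total for a single day.
total_day1 = query("SELECT SUM(amt) FROM sale WHERE date = 1")

# Rolled-up view: one total per day.
by_day = query("SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date")

# Drilled-down view: one total per day and product.
by_day_product = query(
    "SELECT prodId, date, SUM(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId")

print(total_day1, by_day, by_day_product)
```

Adding a column to the GROUP BY list drills down to a finer grain; removing one rolls the result back up.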
Business intelligence and analytics (BI&A)
• Complete solutions: Pentaho, JasperReports, SpagoBI, BIRT
• ETL tools: Clover, Enhydra Octopus
• OLAP developments: Mondrian, JPivot
• Dashboards: JetSpeed, JBoss Portal
Smart cities and ICT
e-Government
Introduction to Data Analytics
Neighbor concepts: MP
Case study
• Discovery of useful, possibly unexpected, patterns in data
• Non-trivial extraction of implicit, previously unknown and potentially useful information from data
• Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
What is Data Mining?
Data mining is a collection of techniques for efficient automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases.
The patterns must be actionable so that they may be used in an enterprise’s decision making.
Valid: The patterns hold in general.
Novel: We did not know the pattern beforehand.
Useful: We can devise actions from the patterns.
Understandable: We can interpret and comprehend
the patterns.
… discover valid, novel, potentially useful, and
ultimately understandable patterns in data.
Examples
• amazon.com uses associations. Recommendations to customers are based on past purchases and on what other customers are purchasing.
• “Just for Feet”, a US retailer, has about 200 stores, each carrying up to 6000 shoe styles, each style in several sizes. Data mining is used to find the right shoes to stock in the right store.
• More examples in case studies to be discussed later.
Data Mining vs. KDD
• Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.
• Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.
Knowledge Discovery Process
– Data mining: the core of the knowledge discovery process.
(Figure: Databases → Data Cleaning and Data Integration → Preprocessed Data → Selection and Data Transformations → Task-relevant Data → Data Mining → Knowledge Interpretation)
DATA MINING draws on:
• Similarity Measures
• Hierarchical Clustering
• IR Systems
• Imprecise Queries
• Textual Data
• Web Search Engines
• Bayes Theorem
• Regression Analysis
• EM Algorithm
• K-Means Clustering
• Time Series Analysis
• Neural Networks
• Decision Tree Algorithms
• Algorithm Design Techniques
• Algorithm Analysis
• Data Structures
• Relational Data Model
• SQL
• Association Rule Algorithms
• Data Warehousing
• Scalability Techniques
Data Mining Process
Successful data mining involves carefully determining the aims and selecting appropriate data. The following steps should normally be followed:
1. Requirements analysis
2. Data selection and collection
3. Cleaning and preparing data
4. Data mining exploration and validation
5. Implementing, evaluating and monitoring
6. Results visualisation
Requirements Analysis
The enterprise decision makers need to formulate goals that
the data mining process is expected to achieve. The business problem must be clearly defined. One cannot use data mining without a good idea of what kind of outcomes the enterprise is looking for.
If objectives have been clearly defined, it is easier to evaluate
the results of the project.
Preprocessing
• A data mining process would normally involve preprocessing
• Often data mining applications use data warehousing
• One approach is to pre-mine the data, warehouse it, then carry out data mining
• The process is usually iterative and can take years of effort for a large project
• Preprocessing is very important although often considered too mundane to be taken seriously
• Preprocessing may also be needed after the data warehouse phase
• Data reduction may be needed to transform very high-dimensional data into lower-dimensional data
Data Selection and Collection
Find the best source databases for the data that is required.
If the enterprise has implemented a data warehouse, then most of the data could be available there. Otherwise source OLTP systems need to be identified and required information extracted and stored in some temporary system.
In some cases, only a sample of the data available may be required.
Cleaning and Preparing Data
This may not be an onerous task if a data warehouse containing the required data exists, since most of this must have already been done when data was loaded in the warehouse.
Otherwise this task can be very resource intensive, perhaps more than 50% of effort in a data mining project is spent on this step. Essentially a data store that integrates data from a number of databases may need to be created. When integrating data, one often encounters problems like identifying data, dealing with missing data, data conflicts and ambiguity. An ETL (extraction, transformation and loading) tool may be used to overcome these problems.
Exploration and Validation
Assuming that the user has access to one or more data
mining tools, a data mining model may be constructed based on the enterprise’s needs. It may be possible to take a sample of data and apply a number of relevant techniques. For each technique the results should be evaluated and their significance interpreted.
This is likely to be an iterative process which should lead to selection of one or more techniques that are suitable for further exploration, testing and validation.
Implementing, Evaluating and Monitoring
Once a model has been selected and validated, the model can be implemented for use by the decision makers. This may involve software development for generating reports or for results visualisation and explanation for managers. If more than one technique is available for the given data mining task, it is necessary to evaluate the results and choose the best. This may involve checking the accuracy and effectiveness of each technique. Regular monitoring of the performance of the techniques that have been implemented is required. Every enterprise evolves with time and so must the data mining system. Monitoring may from time to time lead to the refinement of the tools and techniques that have been implemented.
CRISP Data Mining Model
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
• Collaborative Filter [Predictive]
Data Mining Tasks
• Association analysis
• Classification and prediction
• Cluster analysis
• Web data mining
• Search engines
• Data warehouse and OLAP
• Others, for example, sequential patterns and time-series analysis
Association Analysis
• Association analysis involves discovery of relationships or correlations among a set of items.
• Example: discovering that personal loans are repaid with 80% confidence when the borrower owns their home.
• The classical example is the one where a store discovered that people buying nappies tend also to buy beer.
• Association rules are often written as X → Y, meaning that whenever X appears Y also tends to appear. X and Y may each be a collection of attributes.
Association Rules
• Identify dependencies in the data:
  – X makes Y likely
• Indicate significance of each dependency
• Bayesian methods
Uses:
• Targeted marketing
Technologies:
• AIS, SETM, Hugin, TETRAD II
“Find groups of items commonly purchased together”
– People who purchase fish are extraordinarily likely to purchase wine
– People who purchase turkey are extraordinarily likely to purchase cranberries
Date/Time/Register   Fish  Turkey  Cranberries  Wine  …
12/6 13:15 2         N     Y       Y            Y     …
12/6 13:16 3         Y     N       N            Y     …
Confidence and Support
We prune the set of all possible association rules using two interestingness measures:
• Confidence of a rule:
  – X => Y has confidence c if P(Y|X) = c
• Support of a rule:
  – X => Y has support s if P(XY) = s
We can also define
• Support of a co-occurrence XY:
  – XY has support s if P(XY) = s
• Example rule: {Pen} => {Milk}
  Support: 75%, Confidence: 75%
• Another example: {Ink} => {Pen}
  Support: 75% (Ink and Pen appear together in 3 of the 4 transactions), Confidence: 100%
TID CID Date Item Qty
111 201 5/1/99 Pen 2
111 201 5/1/99 Ink 1
111 201 5/1/99 Milk 3
111 201 5/1/99 Juice 6
112 105 6/3/99 Pen 1
112 105 6/3/99 Ink 1
112 105 6/3/99 Milk 1
113 106 6/5/99 Pen 1
113 106 6/5/99 Milk 1
114 201 7/1/99 Pen 2
114 201 7/1/99 Ink 2
114 201 7/1/99 Juice 4
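A minimal sketch of how support and confidence could be computed from the purchases table above, using the definitions P(XY) and P(Y|X). Only the TID and Item columns are used; the function names support and confidence are illustrative, not part of any library.

```python
from collections import defaultdict

# (TID, item) pairs copied from the purchases table above.
rows = [
    (111, "Pen"), (111, "Ink"), (111, "Milk"), (111, "Juice"),
    (112, "Pen"), (112, "Ink"), (112, "Milk"),
    (113, "Pen"), (113, "Milk"),
    (114, "Pen"), (114, "Ink"), (114, "Juice"),
]

# Group the items by transaction id so each basket is a set of items.
baskets = defaultdict(set)
for tid, item in rows:
    baskets[tid].add(item)


def support(itemset):
    """Fraction of transactions that contain every item in itemset, i.e. P(XY)."""
    hits = sum(1 for items in baskets.values() if itemset <= items)
    return hits / len(baskets)


def confidence(x, y):
    """P(Y|X), computed as support(X and Y) / support(X)."""
    return support(x | y) / support(x)


for x, y in [({"Pen"}, {"Milk"}), ({"Ink"}, {"Pen"})]:
    print(sorted(x), "=>", sorted(y),
          "support:", support(x | y),
          "confidence:", confidence(x, y))
```

Pruning then amounts to keeping only the rules whose support and confidence exceed thresholds chosen by the analyst.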
Sequential Associations
• Find event sequences that are unusually likely
• Requires “training” event list, known “interesting” events
• Must be robust in the face of additional “noise” events
Uses:
• Failure analysis and prediction
Technologies:
• Dynamic programming (Dynamic time warping)
• “Custom” algorithms
“Find common sequences of warnings/faults within 10 minute periods”
– Warn 2 on Switch C preceded by Fault 21 on Switch B
– Fault 17 on any switch preceded by Warn 2 on any switch
Time Switch Event
21:10 B Fault 21
21:11 A Warn 2
21:13 C Warn 2
21:20 A Fault 17
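A naive sketch of the idea, assuming we simply count ordered pairs of events that occur within a 10-minute window in the log above. It ignores the switch and is not the dynamic-programming approach named on the slide; the name ordered_pairs is illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta

# (time, switch, event) rows copied from the event log above.
log = [
    ("21:10", "B", "Fault 21"),
    ("21:11", "A", "Warn 2"),
    ("21:13", "C", "Warn 2"),
    ("21:20", "A", "Fault 17"),
]

WINDOW = timedelta(minutes=10)


def ordered_pairs(events, window=WINDOW):
    """Count (earlier event, later event) pairs that occur within `window`."""
    parsed = sorted((datetime.strptime(t, "%H:%M"), sw, ev) for t, sw, ev in events)
    counts = Counter()
    for i, (t1, _, e1) in enumerate(parsed):
        for t2, _, e2 in parsed[i + 1:]:
            if t2 - t1 > window:
                break  # events are sorted, so later ones are even further away
            counts[(e1, e2)] += 1
    return counts


pairs = ordered_pairs(log)
for (first, second), n in pairs.most_common():
    print(f"{second} preceded by {first}: {n}")
```

On a real log, pairs that occur far more often than chance would suggest are the candidates for "interesting" sequences.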
Classification and Prediction
A set of training objects, each with a number of attribute values, is given to the classifier. The classifier formulates rules for each class in the training set so that the rules may be used to classify new objects. Some techniques do not require training data.
Classification may be used for predicting the class label of data objects. A number of techniques are available, including decision trees and neural networks.
• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
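The training/test division can be sketched as follows. This is only an illustration under stated assumptions: the records are hypothetical, the 30% test fraction is arbitrary, and the "model" is a trivial majority-class baseline standing in for a real classifier.

```python
import random
from collections import Counter


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and cut it into (train, test)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority(train):
    """Baseline 'model': always predict the most common class in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predicted_label, test):
    """Share of test records whose true class equals the predicted label."""
    return sum(label == predicted_label for _, label in test) / len(test)


# Hypothetical records: (attributes, class label).
records = [({"age": a}, "Yes" if a >= 30 else "No") for a in range(18, 48)]

train, test = train_test_split(records)
model = fit_majority(train)          # built from the training set only
print("test accuracy:", accuracy(model, test))  # validated on unseen records
```

The key point is that the test records play no part in building the model, so the accuracy measured on them estimates performance on previously unseen records.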
Example application: telemarketing
Classification
• Decision trees are one approach to classification.
• Other approaches include:
  – Linear Discriminant Analysis
  – k-nearest neighbor methods
  – Logistic regression
  – Neural networks
  – Support Vector Machines
• Training database:
  – Two predictor attributes: Age and Car-type (Sport, Minivan and Truck)
  – Age is ordered, Car-type is a categorical attribute
  – Class label indicates whether the person bought the product
  – Dependent attribute is categorical
Age Car Class
20 M Yes
30 M Yes
25 T No
30 S Yes
40 S Yes
20 T No
30 M Yes
25 M Yes
40 M Yes
20 S No
Types of Variables
• Numerical: Domain is ordered and can be represented on the real line (e.g., age, income)
• Nominal or categorical: Domain is a finite set without any natural ordering (e.g., occupation, marital status, race)
• Ordinal: Domain is ordered, but absolute differences between values are unknown (e.g., preference scale, severity of an injury)
Decision Trees
• A decision tree T encodes d (a classifier or regression function) in the form of a tree.
• A node t in T without children is called a leaf node. Otherwise t is called an internal node.
[Figure: decision tree and the corresponding partition of the Age (0 to 60) by Car Type space]
Age
  < 30: Car Type
    Minivan: YES
    Sports, Truck: NO
  >= 30: YES
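A minimal sketch of the tree as code, assuming it reads: Age >= 30 gives YES; Age < 30 gives YES for a Minivan and NO for Sports or Truck. The car codes M, S and T follow the training database shown earlier; the function name classify is illustrative.

```python
# (Age, Car, Class) rows copied from the training database shown earlier.
training = [
    (20, "M", "Yes"), (30, "M", "Yes"), (25, "T", "No"), (30, "S", "Yes"),
    (40, "S", "Yes"), (20, "T", "No"), (30, "M", "Yes"), (25, "M", "Yes"),
    (40, "M", "Yes"), (20, "S", "No"),
]


def classify(age, car):
    """Walk the tree: the root (internal node) tests Age, its child tests Car Type."""
    if age >= 30:
        return "Yes"                      # leaf node
    return "Yes" if car == "M" else "No"  # leaf nodes under Car Type


correct = sum(classify(age, car) == label for age, car, label in training)
accuracy = correct / len(training)
print(f"{correct} of {len(training)} training records classified correctly")
```

Accuracy on the training records says nothing about unseen records; a separate test set is still needed for that.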
Cluster Analysis
Similar to classification in that the aim is to build clusters
such that each of them is similar within itself but is
dissimilar to others. Clustering does not rely on class-
labeled data objects.
Based on the principle of maximizing the intracluster
similarity and minimizing the intercluster similarity.
• Output: (k) groups of records called clusters, such that the records within a group are more similar to each other than to records in other groups
  – Representative points for each cluster
  – Labeling of each record with its cluster number
  – Other description of each cluster
• This is unsupervised learning: no record labels are given to learn from
• Usage:
  – Exploratory data mining
  – Preprocessing step (e.g., outlier detection)
[Figure: axes labeled age, income and education]
• Example input database: Two numerical variables
• How many groups are here?
Age Salary
20 40
25 50
24 45
23 50
40 80
45 85
42 87
35 82
70 30
[Figure: scatter plot “Customer Demographics” of the customers above; x-axis: Age (0 to 80), y-axis: Salary in $10K (0 to 100)]
• Requirements: Need to define “similarity” between records
• Important: Use the “right” similarity (distance) function
– Scale or normalize all attributes. Example: seconds, hours, days
– Assign different weights to reflect importance of the attribute
– Choose appropriate measure (e.g., L1, L2)
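The three points above (scaling, weighting, choice of measure) can be sketched as follows. Min-max scaling is one possible normalization, chosen here as an assumption; the helper names min_max_scale, l1 and l2 are illustrative.

```python
def min_max_scale(rows):
    """Rescale every attribute (column) to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def l1(a, b, weights=None):
    """Weighted L1 (Manhattan) distance."""
    weights = weights or [1.0] * len(a)
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def l2(a, b, weights=None):
    """Weighted L2 (Euclidean) distance."""
    weights = weights or [1.0] * len(a)
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


# Hypothetical records: (duration in seconds, duration in days).
records = [(3600, 1), (7200, 30), (1800, 365)]
scaled = min_max_scale(records)
print(l1(scaled[0], scaled[1]), l2(scaled[0], scaled[1]))
# Give the second attribute twice the importance of the first.
print(l2(scaled[0], scaled[1], weights=[1.0, 2.0]))
```

Without scaling, the attribute with the largest numeric range (here, seconds) would dominate either distance.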
Similarity Measures
• Determine similarity between two objects.
• Similarity characteristics:
• Alternatively, a distance measure expresses how unlike or dissimilar objects are.
Distance Measures
• Measure dissimilarity between objects
Approaches
• Centroid-based: Assume we have k clusters, guess at the centers, assign points to the nearest center, e.g., K-means; over time, centroids shift
• Hierarchical: Assume there is one cluster per point, and repeatedly merge nearby clusters using some distance threshold
Scalability: Do this with the fewest number of passes over the data, ideally sequentially
K-means Clustering Algorithm
• Choose k initial means
• Assign each point to the cluster with the closest mean
• Compute new mean for each cluster
• Iterate until the k means stabilize
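The four steps above can be sketched in plain Python. As assumptions for illustration, the points are the Age/Salary customers from the earlier example, k = 3, the initial means are three of those customers picked by hand, and the attributes are left unscaled.

```python
import math

# (Age, Salary) rows from the customer example shown earlier.
points = [(20, 40), (25, 50), (24, 45), (23, 50),
          (40, 80), (45, 85), (42, 87), (35, 82), (70, 30)]


def kmeans(points, initial_means, max_iter=100):
    """Plain k-means: assign to the closest mean, recompute means, repeat."""
    means = [tuple(float(v) for v in m) for m in initial_means]
    clusters = []
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: math.dist(p, means[i]))
            clusters[nearest].append(p)
        # Compute the new mean of each cluster (keep the old mean if it is empty).
        new_means = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else means[i]
            for i, members in enumerate(clusters)
        ]
        if new_means == means:  # the k means have stabilized
            break
        means = new_means
    return means, clusters


# Choose k = 3 initial means (hand-picked rows, an assumption).
means, clusters = kmeans(points, [(20, 40), (40, 80), (70, 30)])
for m, members in zip(means, clusters):
    print(m, members)
```

Different initial means can lead to different final clusters, so in practice the algorithm is often run several times.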
Other Types of Mining
• Text mining: application of data mining to textual documents
– cluster Web pages to find related pages
– cluster pages a user has visited to organize their visit history
– classify Web pages automatically into a Web directory
• Graph Mining:
– Deal with graph data
Mining Text Data: An Introduction
Data Mining / Knowledge Discovery
Structured Data | Multimedia | Free Text | Hypertext
Structured Data:
HomeLoan (
  Loanee: Frank Rizzo
  Lender: MWF
  Agency: Lake View
  Amount: $200,000
  Term: 15 years
)
Free Text:
Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial.
Hypertext:
<a href>Frank Rizzo</a> Bought <a href>this home</a> from <a href>Lake View Real Estate</a> In <b>1992</b>.
<p>...Loans($200K,[map],...)
Text mining is about knowledge discovery from large collections of unstructured text.
• It’s not the same as data mining, which is more about discovering patterns in structured data stored in databases.
• Similar techniques are sometimes used; however, text mining has many additional constraints caused by the unstructured nature of the text and the use of natural language.
• Information extraction (IE) is a major component of text mining.
• IE is about extracting facts and structured information from unstructured text.
Reasons for Text Mining
[Figure: bar chart, y-axis “Percentage” (0 to 90), bars “Collections of Text” and “Structured Data”]
Who is in the text analysis arena?
Data Analysis
Computational Linguistics
Search & DB
Knowledge Rep. & Reasoning / Tagging
Documents → Feature Extraction → Token Sets
Document:
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or …
Token set:
nation – 5
civil – 1
war – 2
men – 2
died – 4
people – 5
Liberty – 1
God – 1
…
Loses all order-specific information!
Severely limits context!
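A minimal sketch of this kind of feature extraction, assuming tokens are simply lower-cased runs of letters. It runs only on the excerpt quoted on the slide, so its counts differ from the slide's, which presumably cover the full speech; the function name token_counts is illustrative.

```python
import re
from collections import Counter

# Only the excerpt quoted above, not the full speech.
text = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal. Now we are engaged in a "
    "great civil war, testing whether that nation, or"
)


def token_counts(document):
    """Turn a document into a bag of words: token -> number of occurrences."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return Counter(tokens)


counts = token_counts(text)
for word in ("nation", "civil", "war", "men", "liberty"):
    print(word, counts[word])
```

The resulting Counter keeps no word order, which is exactly the loss of order-specific information and context noted above.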
Two Mining Phases
• Knowledge Discovery: Extraction of codified information (features)
• Information Distillation: Analysis of the feature distribution
Text mining stages
• Document selection and filtering (IR techniques)
• Document pre-processing (NLP techniques)
• Document processing (NLP / ML / statistical techniques)
Basic Measures for Text Retrieval
• Precision: the percentage of retrieved documents that are in fact relevant to the query (i.e., “correct” responses)
• Recall: the percentage of documents that are relevant to the query and were, in fact, retrieved
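$$\text{precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|}$$
$$\text{recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|}$$
[Figure: Venn diagram within All Documents showing the Relevant set, the Retrieved set and their overlap, Relevant & Retrieved]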
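Both measures reduce to set operations, as this small sketch shows. The document ids are hypothetical and the function name precision_recall is illustrative.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query, given two collections of ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # documents that are both relevant and retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 relevant documents, 3 retrieved, 2 of them relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d3", "d4", "d5"}
p, r = precision_recall(relevant, retrieved)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

Retrieving every document would push recall to 1 while precision collapses, which is why the two measures are reported together.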
Semantic mining
• Semantic Data Mining
• Semantic Web Mining
• Ontology Mining
• Text Mining
Mining
Given: transaction data tables, relational databases, text documents, Web pages, ..., and one or more domain ontologies
Find: a classification model, a set of semantic patterns
Semantic Web Mining
• Integration of two areas:
  • Semantic Web
  • Web Mining
Combination of Semantic Web and Web Mining
• Improve Web Mining using the Semantic Web
• Improve the Semantic Web using Web Mining
The Semantic Web is expressed in formats such as OWL, RDF and XML. These are the resources that will be mined to extract knowledge from the Semantic Web.
Web Mining
• Discovers Local and Global Structure
• Structured Data
• Goals
  • Improvement of site design
  • Generate dynamic recommendations
  • Improve marketing
• Main Areas
  • Web Content Mining
  • Web Structure Mining
  • Web Usage Mining
Content Mining
• Type of Text Mining
• Uses Tags
• Detect co-occurrences
• Event detection
• Reconstruction of page content
• Relations in a domain
Web Usage Mining
• Request by Visitors
• Additional Structure
• Unintended Relationships
Web Structure Mining
• Web pages as a whole
  – Uses hyperlinks
  – Identify relevance
• Single Pages
  – Five types of Web Pages
    • Head Pages
    • Navigation Pages
    • Content Pages
    • Look up Pages
    • Personal Pages
Ontology Mining
• Ontology Learning
– Learn structures of Ontologies
• Instance Learning
– Populates the Ontologies
Ontology Mining
Rule Extraction: extracts rules from an ontology and/or a group of documents, either to update an existing ontology or to create a new one.
Ontology Integration: seeks a shared vocabulary among several ontologies.
Ontology Linking: establishes relationships between entities of different ontologies in order to extract information, view maps, make changes, build relationships or rules, etc.
Ontology Fusion: merges information from several ontologies in order to standardize knowledge.
Ontology Alignment: identifies similar concepts between ontologies.
Mining to Learn Ontologies
Filling the Ontologies
Use Ontology to Mine
Open data
• In general, the availability of open data is considered crucial to improving the functioning of cities.
• The convergence of smart cities and open data initiatives is fast unfolding across a number of cities.
• Open data is the way to master information and turn challenges into opportunities.
  – Allow for better decisions.
  – Stimulate innovation.
  – Foster greater collaboration.
  – Promote predictive analytics.
  – Become more effective, efficient, and equitable.
http://www.meltinfo.com/ppt/ibm-big-data
Smart Data Analytics for Smart Cities
Clustering large masses of urban data into a compact format allows analysts to present information about the entire dataset (without omissions or deletions) while reorganising the data and making it manageable.
How can DA potentially contribute to smart cities?
• DA can help reduce emissions and bring down pollution.
• Parking problems can be better managed.
• The environment will be cooler and greener, with less energy being consumed.
5 ways DA can build better governments
1. Raw data needs to become useful knowledge
2. Governments must shift from a culture of secrecy to openness
3. Websites should be user-friendly
4. Distribute stuff that computers can use (a.k.a. machine-readable data)
5. Governments will need to open up, in order to be seen as legitimate
Five Ways the Government Wants to Use DA
Do Not Pay Portal
Continuous Evaluation of Insider Threats
Helping Students Learn
Doing Away with Fee-For-Service in Healthcare
Tracking Illegal Activity on the Deep Web
eGovernance Big Data Analytics Platform
Next conferences
WITFOR 2016, World Information Technology Forum, September 12th - 14th, 2016
Metropolis: an Emerging Serious Game in a Smart City
Ciudades Ubicuas y Ciudades Emergentes: Las nuevas Ciudades Inteligentes (Ubiquitous Cities and Emerging Cities: The New Smart Cities), June 27, 2016
Thanks / Gracias / Merci beaucoup / Obrigado / Danke / Wakupeman
www.ing.ula.ve/~aguilar
“If you seek different results, then do not always do the same thing”
A. Einstein