Big data for Vet and Active labour market policy decision makers
Big Data for Labour Market Information · 2019. 11. 18. · Big Data for Labour Market Information...
Transcript of Big Data for Labour Market Information · 2019. 11. 18. · Big Data for Labour Market Information...
Big Data for Labour Market Information
Session 7
Architecture: solutions for real-time LMI
(based on KDD)
Alessandro Vaccarino – Fabio Mercorio
Big Data for Labour Market Information – focus on data from online job vacancies – training workshop
Milan, 21-22 November 2019
1. Goal & context2. Challenges
1. The functional architecture
2. Why use micro-services
3. The Team and the pipeline design
4. How handle infrastructure costs
2
Topics
5
Challenges
• Handle a huge amount of near real time data
• Data coming from web Need to detect and reduce noise
• Multi language environment
• Need to relate to classification standards
• Find a way to summarize and present a wide and complex
scenario
1. Goal & context
2. Challenges
1. Stakeholders2. The functional architecture
3. Why use micro-services
4. The Team and the pipeline design
7
Topics
8
Stakeholders
Project
Leader
Key
Users
Domain
Experts
End
Users
• Lead the project with the steering committee
• Define the scope of the project
• Define key organizations
• Maintain relations with stakeholders
• Provide advice
9
Project leader
• Define requirements
• Monitor quality of the project
• Provide input to the development of the project
• Manage the source landscaping
• Validate overall data flow and methodology
10
Key Users
• International Country Experts
• Provide the knowledge and expertise
• Execute the landscaping
• Understand the language/terms of their
context
• Evaluate the accuracy of the results
• Test the product
• Provide feedback
11
Domain Experts
• Decision Makers and Business Users
o (Visual) Explore dataset, analysis and aggregate data
o Define new analysis processes
o Produce Data storytelling
o Make decisions by exploring data
• Data Scientists
o Apply new machine learning models and AI techniques
o Extract new insights from the data
o Apply advanced data modelling to the dataset
• Data Analysts
o Interprets data and turns it into information
o Identifying patterns and trends
o Extract and analyze aggregate data
o Publish and share their analysis
12
End Users
1. Goal & context
2. Challenges
1. Stakeholders
2. The functional architecture3. Why use micro-services
4. The Team and the pipeline design
13
Topics
Overall Data Flow
Data
Ingestion
Pre-Processing Information
Extraction
ETL Presentation
Area
Ingestion Processing Front end
Conceptual architectureData ingestion Data processing Data analysis
Visual
interface
Data lab
Data
Supply
Mo
nit
or
an
d s
ched
ule
r
Cra
wle
r
Data
qu
ality
Data
pro
cess
ing a
nd
cla
ssif
icati
on
ET
L
Dir
ect
acc
ess
Scr
ap
er
Backup
Labour Market
Analysts
Interactive Data
Analytics
Web Scraper
Web Crawler
Direct
Access
Pre-Processing
Information
Extraction and
Classification
Data Management
and Presentation
Employment
Agencies and
Public Employment
Services
Job Portals
Newspaper,
Companies
University Job
Placement
Classified Ads Sites
Job Vacancies
Classified on ISCO
Recognised NUTs
Other dimension
(contract, sector,
education, …)
Document
store
DW
Logical view
DataIngestion
DataProcessing
Modelling, Machine
Learning, AI
Data visualization
Data storage & archiving
System and process monitoring
Automation & management
Input Output
UnstructuredData
Dashboard andinteractive report
Machine to machine
Web App
Physical view
Modelling, Machine
Learning, AI
DataIngestion
DataProcessing
Data visualization
Data storage & archiving
System and process monitoring
Automation & management
Input Output
UnstructuredData
Dashboard andinteractive report
Machine to machine
Web App
Technology view
- Micro-services
- Componentization
- Component specialization
- Small applications
- Portability
- Reuse
- Maintenance
- Scale Out
- Performance
Key design projects
20
Key components
• Data ingestion: collect raw data from OJV in both
structured and unstructured (raw text) formats
• Data processing: classify data through machine
learning techniques
• Data analysis: extract information from data and
make it available through visualization
• Backup: store data in a safe environment to
allow warm and cold restore
Infrastructure Challenges
• parallel ingestion
• high performance
at a glance
• High memory
• storage
•
• Scalable
Big Data Flow
Infrastructure
challenges
Components
by definition
Quality
requirementsMicro-services
design
01010101000101010010101010010101
01010101000100101010100101
01010101000100101010100101
01010101000101010010101010010101
0101010100010010101010010101010101000100101010100101
010101010001010101010010010101010001010101010010
1. Goal & context
2. Challenges
1. Stakeholders
2. The functional architecture
3. Why use micro-services4. The Team and the pipeline design
23
Topics
Microservices
25
Context
Manutability Monitoring Scability
Updates Onboarding
26
Pre-Processing Microservices
Language
Detector
Spam
Filter
Deduplication
component
N-gram
component
Tokenizer
StemmerNo-Vacancy
Filter
Text Cleaner Merge Vacancy
TF-IDF
TransformerDocument2Vec
StopWords
Removers
27
Classification Microservices
Skills
Classifier
Occupation
Classifier
Education
Requirements
Classifier
Industry
Classifier
WorkingHours
Detector
Contract
Detector
Locations
Detector
Dates
Extractor
Salary
Extractor
Experience
Extractor
1. Services on request
2. Network access
3. Resource pooling
1. Governance
4. Quick elasticity
5. Measurement of services
1. Data Quality
2. Performance
6. Portability (on-premises and different cloud services)
7. Polyglot
1. Computer programming languages
2. Technologies
28
Technology requirements
1. Goal & context
2. Challenges
1. Stakeholders
2. The functional architecture
3. Why use micro-services
4. The Team and the pipeline design
29
Topics
1. Cloud Architects
2. Software Architects and Developers
3. Big Data Engineers
4. Data Scientists
5. Domain & Ontology Experts
30
The team
31
Organizations
Cloud
InfrastructureService
Team
Components Micro-service
Service
ExecutionDefine
DesignDeploy
Develop
Organize around business services
Language Detector
Occupation Classifier
Salary Extractor Skills Classifier