Posted on 24 May 2020
Asian Food Agribusiness Conference
Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University
11 June 2019
Presented by Asst. Prof. Krung Sinapiromsaran
Big Data Analytic Tools for Agrifood Industry on Open Data Sources
Agenda
● 1st Industrial Revolution → 4th Industrial Revolution
● Big Data and Advanced Analytics
● Big Data Sources & Characteristics
● Big Data Technologies & Solutions
● Data Science Concepts and Tools
Industrial Revolution
Industrial Revolution
● 1st Industrial Revolution (1750–1840): Steam engine, railroads & mechanical production
● 2nd Industrial Revolution (1840–1910): Electricity and assembly lines for mass production
● 3rd Industrial Revolution (1910–2000): Computers, IT, automation and the Internet
● 4th Industrial Revolution (2000–now): IoT, cloud computing, cyber-physical systems
Industrial Revolution (Agrifood)
● 1st Industrial Revolution (1750–1840): Steam engine, railroads & mechanical production
   → Mechanical drills for planting seeds
   → Fodder crops & stall-fed livestock
● 2nd Industrial Revolution (1840–1910): Electricity and assembly lines for mass production
   → Switch from natural fertiliser to commercially produced chemical fertilisers
   → Raising animals confined in crowded indoor facilities
● 3rd Industrial Revolution (1910–2000): Computers, IT, automation and the Internet
   → Twin-rotor system
   → Genetically modified crops
   → Satellite use in farming
● 4th Industrial Revolution (2000–now): IoT, cloud computing, cyber-physical systems
   → Deployment of IoT
   → Enhanced analytics
   → Use of in-field sensors and drones → “precision agriculture”
Technologies Used in Industry 4.0
● Autonomous robots
● Simulation
● Horizontal and vertical system integration
● Industrial Internet of Things
● Cybersecurity
● Additive manufacturing
● Augmented reality
● Big data analytics
Technologies
Source: https://www.gartner.com/smarterwithgartner/5-trends-emerge-in-gartner-hype-cycle-for-emerging-technologies-2018/
The Emerging Technologies Hype Cycle
Source: http://www.mauriziogalluzzo.it/wp-content/uploads/2014/08/hype_TheEconomist.pdf
The new era for Agrifood
✔ Shorten value chains: various agrifood companies are trying to remove steps from the value chain, e.g. direct-to-consumer delivery and meal kits
✔ Utilize technology to improve crop efficiency: use of drones and autonomous robots
✔ Bio-chemicals and bio-energy: reduce the ecological footprint by developing biologically produced agrochemicals, bio-materials, and bio-energy
✔ Food technology and artificial meat: developing “sustainable protein”
✔ Contained and vertical farming: indoor farming, raising animals in crowded indoor facilities
Big Data and Advanced Analytics
Big Data and Advanced Analytics
Opportunity | Industry challenge | Application
Innovation | High need for innovation | Building a “data innovation engine”; holistic optimization
Optimize farming operations | Increase quality and quantity of food over 20-30 years | “Precision agriculture” based on measuring and optimizing granular field operations
Increase supply chain transparency | Little foresight into crop volumes | Increasing forecasting accuracy with real-time data collection and analysis; lowering response times and risks
Improve downstream operations | High-volume production but low operational efficiency | “Operations big-data toolbox”: production optimization
Infrastructure challenge | Poor infrastructure in emerging markets | Advanced analytics to identify key bottlenecks in infrastructure; infrastructure network optimization
Anticipate waste | Enormous amounts of residential (food) waste | Granular data collection of waste streams in households
Source: McKinsey & Company, “How big data will revolutionize the global food chain”
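The table lists “increasing forecasting accuracy with real-time data collection and analysis” as an application. One simple way to update a crop-volume forecast every time a new observation arrives is simple exponential smoothing. This is a minimal sketch only: the method is our illustrative choice (the McKinsey source does not prescribe it), and the function name `smooth_forecast`, the smoothing factor `alpha`, and the weekly volumes are hypothetical.

```python
def smooth_forecast(series, alpha):
    """Simple exponential smoothing: return the forecast for the next period.

    series: observations in time order (e.g. weekly crop volumes).
    alpha:  smoothing factor in (0, 1]; larger values react faster to new data.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # seed the smoothed level with the first observation
    for value in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1.0 - alpha) * level
    return level


# Hypothetical weekly harvest volumes in tonnes.
weekly_tonnes = [52.0, 48.0, 55.0, 60.0]
print(smooth_forecast(weekly_tonnes, alpha=0.5))  # 56.25
```

Because each update needs only the previous level and the newest value, the same recurrence can run on a real-time data stream without storing the full history.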
Characteristics of Big Data
Source: http://i0.wp.com/blog.agro-know.com/wp-content/uploads/2015/06/3-Vs-of-big-data.png?resize=700%2C710
Characteristics of Big Data
Source: https://www.pinterest.com/pin/68117013088528571/
Data → Information → Knowledge
Source: https://www.pinterest.com/pin/68117013088528571/
✔ Data
   ✔ Kept in a DBMS
   ✔ Operational data store
   ✔ Accessed via SQL
✔ Information
   ✔ Statistics
   ✔ Data warehouse, data cube
   ✔ Accessed via OLAP
✔ Knowledge
   ✔ Interesting patterns
   ✔ In many forms: regression, rule, tree, network
   ✔ Emphasis on visualization
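The step from data (rows in a DBMS, accessed via SQL) to information (aggregated statistics, as in a data cube) can be illustrated with SQL aggregation. This is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in DBMS and hypothetical yield records; the table name `harvest` and the helper `summarize` are illustrative. SQLite has no native data-cube operators, so a roll-up is emulated by grouping on fewer dimensions.

```python
import sqlite3

# Hypothetical raw records ("data"): (region, crop, year, yield in t/ha).
RECORDS = [
    ("north", "rice", 2019, 4.25),
    ("north", "rice", 2020, 4.75),
    ("south", "rice", 2020, 3.75),
    ("north", "cassava", 2020, 21.5),
]


def build_store():
    """Load the raw records into an in-memory SQLite database (the DBMS layer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE harvest (region TEXT, crop TEXT, year INTEGER, yield_t_ha REAL)"
    )
    conn.executemany("INSERT INTO harvest VALUES (?, ?, ?, ?)", RECORDS)
    return conn


def summarize(conn, dims):
    """Aggregate yields along the given dimensions ("information").

    dims must be trusted column names (never user input), because they are
    interpolated into the SQL text. Grouping by fewer dimensions is a roll-up.
    """
    cols = ", ".join(dims)
    query = (
        f"SELECT {cols}, COUNT(*), AVG(yield_t_ha) FROM harvest "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()


conn = build_store()
print(summarize(conn, ["crop", "year"]))
# [('cassava', 2020, 1, 21.5), ('rice', 2019, 1, 4.25), ('rice', 2020, 2, 4.25)]
print(summarize(conn, ["crop"]))
# [('cassava', 1, 21.5), ('rice', 3, 4.25)]
```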
Rationale, Theories, Assumptions
✔ Knowledge is hidden in vast amounts of data.
✔ Extracting it automatically needs a new science: data science.
✔ Data science requires three components:
   ✔ Math and statistics
   ✔ Computer science/information technology
   ✔ Domain/business knowledge
[Diagram: Data Science at the intersection of Math and Statistics, Computer Science/IT, and Business/Domain Knowledge; the pairwise overlaps are labeled Machine Learning, Traditional Research, and Software Development]
Current Issues with Effective Decision-Making
1) 57%: Varieties of data "silos"
2) 44%: Processing time to analyze "large" datasets
3) 40%: Need more skilled analytic persons
4) 34%: "Big data" concept is not in the vision of managers
5) 33%: Unstructured content is difficult to interpret
6) 24%: High cost of storing and analyzing large datasets
7) 17%: "Big data" is too complex to collect and store
Source: Capgemini and the Economist Intelligence Unit. The Deciding Factor: Big Data and Decision-making, 2012.
Big Data Value Chain
Big Data Assets:
➢ Internal DBMS
➢ Data warehouse
➢ Sensor data
➢ Social network data
➢ Satellite data
➢ Open data sources
Big Data Capability:
➢ IT management
➢ Data cube
➢ Dashboard
➢ Hadoop clusters
➢ MapReduce on Spark
➢ Real-time processing
Big Data Analytics:
➢ Data preprocessing on Hadoop via Hive/Pig
➢ Statistical analysis on Spark (MapReduce)
➢ Data mining
➢ Machine learning
➢ Deep learning
Big Data Value:
➢ Learn from experience via BI (business intelligence)
➢ Descriptive analytics: scorecards
➢ Predictive analytics: predict the future
➢ Prescriptive analytics: make optimal decisions
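The capability and analytics layers of the value chain mention MapReduce (on Hadoop or Spark). The programming model itself can be sketched in plain Python as three phases: map, shuffle (group by key), and reduce. This is a single-process illustration under stated assumptions: the word-count task, the function names, and the text records are hypothetical, and a real Hadoop or Spark job distributes each phase across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (key, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on a list of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


# Hypothetical short text records, e.g. from social network data.
records = ["rice price up", "rice yield down", "Price stable"]
print(map_reduce(records))
# {'down': 1, 'price': 2, 'rice': 2, 'stable': 1, 'up': 1, 'yield': 1}
```

The map and reduce functions see only one record or one key at a time, which is what lets a framework run many copies of them in parallel.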
Big Data Technologies & Solutions
Hadoop Ecosystem
Source: https://opensource.com/life/14/8/intro-apache-hadoop-big-data
Hadoop Ecosystem by timeline
Source: https://www.cloudera.com
Hadoop Ecosystem by tasks
Source: https://savvycomsoftware.com/what-you-need-to-know-about-hadoop-and-its-ecosystem/
Big Data Landscape 2012
Source: http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
Big Data Landscape 2018
Data Science
Data Science Diagram
[Diagram: Data Science at the intersection of Math and Statistics, Computer Science/IT, and Business/Domain Knowledge; the pairwise overlaps are labeled Machine Learning, Traditional Research, and Software Development]
Data Life-Cycle
Source: Microsoft and Celent, How Big is Big Data: Big Data Usage and Attitudes among North American Financial Services Firms, March 2013.
1) Business Understanding: Ask relevant questions, define objectives
2) Data Mining: Gather and scrape data
3) Data Cleaning: Fix inconsistencies, anomalies, and missing values
4) Data Exploration: Form hypotheses and visualize data
5) Feature Engineering: Extract important features and construct more meaningful ones
6) Predictive Modeling: Train machine learning models and evaluate their performance
7) Data Visualization: Communicate findings using plots and interactive visualizations
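Steps 3, 5, and 6 of the life-cycle can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions: the field records are hypothetical, mean imputation is just one possible cleaning strategy, the engineered feature (total water = rainfall + irrigation) is an invented example, and the model is ordinary least-squares linear regression evaluated by mean absolute error on a held-out record.

```python
from statistics import mean

# Hypothetical field records: (rainfall_mm, irrigation_mm, yield_t_ha); None marks a gap.
RAW = [
    (800.0, 200.0, 3.0),
    (900.0, None, 3.4),
    (1000.0, 500.0, 4.0),
    (None, 300.0, 3.6),
    (1200.0, 800.0, 5.0),
    (700.0, 100.0, None),  # missing target: dropped during cleaning
]


def clean(rows):
    """Step 3: drop rows without a target, fill missing inputs with the column mean."""
    rows = [r for r in rows if r[2] is not None]
    rain_mean = mean(r[0] for r in rows if r[0] is not None)
    irr_mean = mean(r[1] for r in rows if r[1] is not None)
    return [
        (
            r[0] if r[0] is not None else rain_mean,
            r[1] if r[1] is not None else irr_mean,
            r[2],
        )
        for r in rows
    ]


def engineer(rows):
    """Step 5: construct one more meaningful feature, total water received."""
    return [(rain + irr, y) for rain, irr, y in rows]


def fit_line(pairs):
    """Step 6: ordinary least squares for y = a + b * x; returns (a, b)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mae(model, pairs):
    """Step 6 (evaluation): mean absolute error of the fitted line."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in pairs)


data = engineer(clean(RAW))
train, test = data[:-1], data[-1:]  # hold out the last record for evaluation
model = fit_line(train)
print("intercept, slope:", model)
print("hold-out MAE:", mae(model, test))
```

With real data, the hold-out set would be larger and chosen at random, and steps 4 and 7 (exploration and visualization) would typically use a plotting library.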
Importance of Understanding ML
Source: https://www.kdnuggets.com/2018/11/machine-learning-model-understandable-poll-results.html
Source: https://www.kdnuggets.com/2018/04/poll-analytics-data-science-ml-applied-2017.html
Current Data Science Software
Hadoop 1.0 and Hadoop 2.0
Source: https://opensource.com/life/14/8/intro-apache-hadoop-big-data
Visual Programming
Sources: https://hackernoon.com/top-3-most-popular-programming-languages-in-2018-and-their-annual-salaries-51b4a7354e06 and https://s4scoding.com/mit-app-inventor-2-introduction-to-android-app-development/visual-programming-language-blocks/
● Computer science programming languages: JavaScript (#1 in 2018), Java, Python, C#, C++, C, Ruby
● Visual programming languages: Scratch, mBlock, …
● Visual analytic languages: SAS EM, IBM SPSS Modeler, RapidMiner, Orange
How to Apply Data Science Software
● Interactive data visualization
● Visual programming
● Python modules and add-ons
● Open source and free
Source: Orange Software, https://orange.biolab.si/
Data Science Software
● Data widget categories:
   ● Information extraction
   ● Data management from input → output
   ● Transformation
● Visualize widget categories:
   ● Univariate visualization
   ● Bivariate visualization
   ● Multivariate visualization
● Model widget categories:
   ● Regression & classification
● Evaluate widget categories:
   ● Test & Score, including ROC
● Unsupervised widget categories:
   ● Clustering analysis
   ● Principal component analysis
   ● And more.
Source: Orange Software, https://orange.biolab.si/
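The evaluation category lists Test & Score including ROC. What an ROC curve and its area (AUC) measure can be sketched in plain Python: sweep a decision threshold over the classifier's scores, record the false-positive and true-positive rates at each threshold, and integrate with the trapezoidal rule. This is an illustrative sketch, not Orange's implementation; the labels, scores, and function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low.

    labels: 1 for positive, 0 for negative; scores: higher means "more positive".
    A sample is predicted positive when its score is >= the threshold, so tied
    scores move together (a diagonal step on the curve).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical classifier output on six test samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(round(auc(points), 3))  # 0.889
```

The AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties counted as half): here 8 of the 9 positive/negative pairs are ordered correctly, giving 8/9.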
Analysing the Tweets dataset
Source: Orange Software, https://orange.biolab.si/
Questions
Comments