Guilherme J. M. Rosa Department of Animal Sciences
Big Data, Graphical Modeling, and Causal Inference in Livestock Production
Feeding the World
• Currently 7.2 billion people in the world.
• Expected increase to about 9 billion by 2050, mostly in developing countries.
• World food production will need to increase by 60 percent, and food production in the developing world will need to double.
• Productivity, profitability, product quality, environmental footprint (land, water and energy use, greenhouse gas emissions, etc.)
Genotype x Environment
Example
Ribeiro S, Eler JP, Pedrosa VB, Rosa GJM, Ferraz JBS and Balieiro JCC. Genotype x environment interaction for weaning weight in Nellore cattle using reaction norm analysis. Livestock Science 176: 40–46, 2015.
Genotype x Environment
• Nucleus herd vs. commercial settings environment
• Environmental diversity within countries/macro-regions
• Globalization of breeding
• Increasing importance of South America, Africa and Southeastern Asia
• Global poverty and ecological footprint
Precision Livestock Production
• Animal-level data
  - Production indexes, well-being monitoring
  - Pattern recognition (e.g. early detection of health issues)
  - Predictive analytics (e.g. prediction of animal future performance)
• Farm-level data
  - Efficiency of management protocols and product administration
  - Genetics-environment interaction
  - Informed decision-making
Different Sources of Data and Information
Sensors: Prediction of behavior in lactating dairy cows
João Dórea
Animal-level Data: High-Throughput, Real-Time Phenotyping
- 3 behaviors/activities: resting, eating and ruminating
- Hidden Markov model
- The data were analyzed separately for each axis: X, Y, and Z
- When the probability of a state at time t was greater than 50%, the observation was classified into that state (one of the 3 possible states)
- The total time in each state (predicted values) was compared to the observed values
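The classification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model used in the study: the Gaussian emission model, the means, standard deviations, transition probabilities and the toy readings are all hypothetical, and one reading is assumed to represent one minute.

```python
import math

STATES = ["resting", "eating", "ruminating"]

def gaussian_pdf(x, mean, sd):
    """Normal density, used here as an assumed emission model."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_states(obs, start, trans, means, sds):
    """Scaled forward-backward pass: P(state at t | all observations)."""
    n, k = len(obs), len(start)
    emis = [[gaussian_pdf(x, means[s], sds[s]) for s in range(k)] for x in obs]
    # Forward pass, normalizing each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            row = [start[s] * emis[0][s] for s in range(k)]
        else:
            row = [emis[t][s] * sum(alpha[t - 1][r] * trans[r][s] for r in range(k))
                   for s in range(k)]
        c = sum(row)
        scale.append(c)
        alpha.append([v / c for v in row])
    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for r in range(k):
            beta[t][r] = sum(trans[r][s] * emis[t + 1][s] * beta[t + 1][s]
                             for s in range(k)) / scale[t + 1]
    # Combine and renormalize so each row is a probability distribution.
    post = []
    for t in range(n):
        row = [alpha[t][s] * beta[t][s] for s in range(k)]
        total = sum(row)
        post.append([v / total for v in row])
    return post

def classify(post, threshold=0.5):
    """Assign a state only when its posterior probability exceeds the threshold."""
    labels = []
    for row in post:
        best = max(range(len(row)), key=lambda s: row[s])
        labels.append(STATES[best] if row[best] > threshold else None)
    return labels

def minutes_per_state(labels, minutes_per_obs=1.0):
    """Total predicted time per state (unclassified readings are skipped)."""
    totals = {s: 0.0 for s in STATES}
    for lab in labels:
        if lab is not None:
            totals[lab] += minutes_per_obs
    return totals

# Hypothetical single-axis readings and parameters, for illustration only.
obs = [0.1, 0.0, 0.2, 1.1, 0.9, 1.0, 2.1, 1.9, 2.0]
start = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
means, sds = [0.0, 1.0, 2.0], [0.3, 0.3, 0.3]

labels = classify(posterior_states(obs, start, trans, means, sds))
print(minutes_per_state(labels))
```

The predicted totals from `minutes_per_state` are the quantities that would then be compared with the observed time per behavior.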
Accelerometer to predict feeding behavior
X-axis: 3-state probabilities
Feeding Behavior
Axis    Measure      Eating time  Ruminating time  Resting time
X-axis  pred, min    45           47               42
X-axis  obs, min     45           51               40
X-axis  Accuracy, %  100          92               95
Y-axis  pred, min    35           76               25
Y-axis  obs, min     40           45               51
Y-axis  Accuracy, %  89           32               48
Z-axis  pred, min    54           72               10
Z-axis  obs, min     45           51               40
Z-axis  Accuracy, %  80           58               24
Computer Vision: Tilapia filet quality
• Data from more than 3000 fish
• Dorsal and lateral pictures
• Carcass weight and yield
Image Processing
• Image recognition and segmentation
Original image
Output segmentation
Pig weight and leg/back score
• Data from 700 pigs
• Weight across different ages
• Leg and back scores
Arthur Fernandes
Prediction: Linear model
The use of artificial neural network to estimate feed intake in lactating cows through mid-infrared spectra of milk samples
Milk Mid-infrared Spectra
[Figure: milk sample, mid-infrared (MIR) spectroscopy, dry matter intake]
João Dórea
Objective: use Fourier transform MIR spectra of milk samples to estimate dry matter intake (DMI) in lactating Holstein cows
- MIR recorded for 599 milk samples from 189 lactating cows
- Individual DMI recorded with electronic feeding gates
- One-hidden-layer ANN model compared with partial least squares (PLS) regression
- Cross-validation used to assess predictive ability (predictive mean squared error, PMSE)
Results: ANN PMSE decreased as the number of neurons increased up to 15 (PMSE = 4.13, 3.61, 3.30 and 3.38 kg²/d² for 5, 10, 15 and 20 neurons, respectively); the PLS model (7 factors) resulted in a higher PMSE of 4.41 kg²/d²
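To make the PMSE criterion concrete, here is a minimal cross-validation sketch in Python. It is an illustration only: a straight-line least-squares fit on a single predictor stands in for the ANN and PLS models, the fold assignment is an assumption (the slides do not describe how folds were formed), and the data are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def kfold_pmse(xs, ys, k=5):
    """Mean squared prediction error over held-out folds.

    Record i goes to fold i % k (an assumed, simplistic fold design).
    """
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Errors are computed only on records the model did not see.
        squared_error += sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in test)
    return squared_error / n

# Hypothetical data: one spectral summary (x) and intake in kg/d (y).
xs = [float(i) for i in range(20)]
ys = [2.0 + 3.0 * x + (0.5 if i % 3 == 0 else -0.25) for i, x in enumerate(xs)]
print(kfold_pmse(xs, ys, k=5))  # in squared units of y, e.g. kg²/d²
```

Because the 599 samples come from 189 cows, a real analysis might prefer to keep all samples from one cow in the same fold; whether the study did so is not stated here.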
Gianola D, Okut H, Weigel KA and Rosa GJM. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12:87, 2011.
Pérez-Rodríguez P, Gianola D, Weigel KA, Rosa GJM and Crossa J. An R package for fitting Bayesian regularized neural networks with applications in animal breeding. Journal of Animal Science 91: 3522-3531, 2013.
High-Throughput Genotyping
Predicting complex quantitative traits with Bayesian neural networks
Mixed Models
• Used extensively in animal breeding, with multiple traits and huge numbers of records and animals in the pedigree
• However, environmental effects are coalesced into contemporary groups
• As such, individual effects of specific factors are not investigated: there are no issues with collinearity, but also no insight into direct, indirect and total effects, etc.
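$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \qquad \begin{cases} \mathbf{G} = \mathbf{G}_0 \otimes \mathbf{A} \\ \boldsymbol{\Sigma} = \mathbf{R} \otimes \mathbf{I} \end{cases}$$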
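As a worked illustration of the Kronecker structure, consider two traits. The sketch assumes that records are ordered by trait and then by animal, and that A is the pedigree-based relationship matrix, as is usual in these models; the variance-component symbols are introduced here for illustration and do not appear on the slide.

```latex
\mathbf{G} = \mathbf{G}_0 \otimes \mathbf{A}
= \begin{bmatrix} \sigma^2_{u_1} & \sigma_{u_{12}} \\ \sigma_{u_{12}} & \sigma^2_{u_2} \end{bmatrix} \otimes \mathbf{A}
= \begin{bmatrix} \sigma^2_{u_1}\mathbf{A} & \sigma_{u_{12}}\mathbf{A} \\ \sigma_{u_{12}}\mathbf{A} & \sigma^2_{u_2}\mathbf{A} \end{bmatrix},
\qquad
\boldsymbol{\Sigma} = \mathbf{R} \otimes \mathbf{I}
= \begin{bmatrix} \sigma^2_{e_1}\mathbf{I} & \sigma_{e_{12}}\mathbf{I} \\ \sigma_{e_{12}}\mathbf{I} & \sigma^2_{e_2}\mathbf{I} \end{bmatrix}
```

Each block of G scales the same relationship matrix by a between-trait (co)variance, while the residual blocks are diagonal, so residuals are correlated only between traits measured on the same animal.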
Farm-level Data: Historic data across farms
• Confinamento Monte Alegre (CMA): http://www.cma.agr.br
• Feedlot capacity: 16,000 head
• Annual output: around 50,000 head
• TGC software: http://www.gestaoagropecuaria.com.br/produtos/tgc/
• 80 variables (input, output, economics, etc.)
Decision Tree
• Decision tree: decision support tool that uses a tree-like graph or model of decisions and their possible consequences
• Decision trees are commonly used in operations research, such as decision analysis, to help identify a strategy most likely to reach a goal
• Popular tool in machine learning
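In the machine learning sense, a tree is grown by repeatedly choosing the split that best separates the outcome. The sketch below shows that single step for a continuous outcome such as income per head, using the reduction in squared error as the criterion. It is a minimal illustration, not the software used for the analyses here; the variable names and values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, features, target):
    """Return (cost, feature, threshold) minimizing within-group squared error."""
    best = None
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [row[target] for row in rows if row[feature] <= threshold]
            right = [row[target] for row in rows if row[feature] > threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, feature, threshold)
    return best

# Hypothetical feedlot lots: days on feed, entry weight (kg), net income (US$/head).
lots = [
    {"days_on_feed": 80, "entry_weight_kg": 350, "net_income": 50},
    {"days_on_feed": 85, "entry_weight_kg": 420, "net_income": 55},
    {"days_on_feed": 90, "entry_weight_kg": 360, "net_income": 52},
    {"days_on_feed": 110, "entry_weight_kg": 410, "net_income": 120},
    {"days_on_feed": 115, "entry_weight_kg": 355, "net_income": 125},
    {"days_on_feed": 120, "entry_weight_kg": 415, "net_income": 118},
]

cost, feature, threshold = best_split(lots, ["days_on_feed", "entry_weight_kg"], "net_income")
print(feature, threshold)
```

A full tree applies the same search recursively to each resulting group until a stopping rule, such as a minimum group size, is met.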
Decision Tree: US$/head – Income (feed cost)
Decision Tree: US$/head – Net Income
[Figure: data schema linking company, owner, farm and outcome records. Recoverable fields: CPF/CNPJ, ID, owner code, region, city, state, lat/long, size, Insc estadual, farm traits; outcome variables iAge, iFat, month, year, slaughter weight, quality, amount typified; technology, sales team, technician, year, pNutrition, tNutrition, wNutrition, season.]
Beef Production and Quality
Vera Cardoso
Amount of carcasses typified per region
Total: 23,056,869 carcasses (≅ 25% of Brazilian production, 2014-2016)
Carcasses by year: 2014: 2,229,702; 2015: 1,924,149; 2016: 1,808,955
Figure panels (years 2014-2016):
- Number of carcasses by age (x-axis: years, 0 to 8)
- Carcasses slaughtered by month (x-axis: Feb to Dec)
- Number of carcasses by quality (desirable, acceptable, undesirable)
- Carcasses slaughtered by weight (x-axis: weight in @)
- Number of carcasses by iFat (x-axis: fat index, 1 to 5)
Preliminary results
• Data from two sources: JBS S.A. (81,053 farms) and DSM Produtos Nutricionais (22,223 farms). After merging, the final dataset comprised information from 7,248 farms and 1,571,023 carcasses slaughtered in the years 2014-2016.
• Outcome variables: body weight at slaughter (BWS), carcass fat index (FI), age at slaughter (AS)
• Covariates: farm, AS, season, animal category (steer, bull, cull bull, heifer and cow), frequent technical consulting (FTC), regional sales team (RST), type of feedlot premix (no feedlot premix – NFP, finishing grazing cattle – FGC, feedlot without additives – FWA, and feedlot with additives – FA)
Ferreira et al. Big data analysis of beef production and quality: an example with the Brazilian cattle industry. ASAS meeting, Baltimore, MD, July 8-12, 2017 (to appear)
• Results:
  – Use of FA premix decreased AS and increased BWS and FI in comparison to NFP and FWA
  – Adopting FTC increased BWS and FI, and reduced AS
  – Bulls presented greater BWS and lower AS, but lower FI, in comparison with steers
  – Differences in BWS were observed for different RST and seasons
  – AS was reduced and BWS and FI increased in the rainy seasons of 2014-2016
  – Combining FTC and FA increased BWS by 27.4 kg and reduced AS by approximately 10 months in comparison with FWA and non-FTC, suggesting that this approach might be favorable for production
Location of Iowa Select Finishing Farms throughout Iowa
Pig Production
Tiago Fragoso
Pig Production
• The data set contains 503 farms divided into 3 production types:
  – Finishing [428 farms]
  – Nursery [25 farms]
  – Gilt Development Unit (GDU) [50 farms]
Traits
• Data set contains 140+ traits divided into 3 groups:
  – Performance: days on feed (DOF), average daily gain (ADG), mortality, pigs produced, total weight produced, feed conversion (FC), profit
  – Carcass traits: backfat, loin depth, % very light, % light, % target, % heavy, % very heavy, carcass weight (CW), live weight (LW), % yield (CW/LW)
  – Utilization: fill days, location days, % utilization, number of turns, number of loads
Example – Mortality (%)
• Reference values:
  – Excellent: E < 2.5%
  – Regular: 2.5% < R < 5%
  – Bad: B > 5%
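The reference values translate directly into a small classification rule. The sketch below is illustrative only; the function name is hypothetical, and because the slide uses strict inequalities, assigning exactly 2.5% and exactly 5% to "Regular" is an assumption.

```python
def mortality_class(mortality_pct):
    """Map a farm's mortality (%) to the reference categories.

    Assumption: the boundary values 2.5 and 5.0, which the reference
    values leave unassigned, are treated as "Regular".
    """
    if mortality_pct < 2.5:
        return "Excellent"
    if mortality_pct <= 5.0:
        return "Regular"
    return "Bad"

# Hypothetical farm-level mortality values.
for value in (1.0, 3.0, 7.5):
    print(value, mortality_class(value))
```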
Type of Farm
Feed Mill and Sire Line
Precipitation
Temperature
Soil
Geographical location, soil and weather condition