Big Data, Graphical Modeling, and Causal Inference in ...

Guilherme J. M. Rosa Department of Animal Sciences Big Data, Graphical Modeling, and Causal Inference in Livestock Production


Page 1: Big Data, Graphical Modeling, and Causal Inference in ...

Guilherme J. M. Rosa Department of Animal Sciences

Big Data, Graphical Modeling, and Causal Inference in Livestock Production

Page 2: Big Data, Graphical Modeling, and Causal Inference in ...

Feeding the World

•  Currently 7.2 billion people in the world.

•  Expected increase to about 9 billion by 2050, mostly in developing countries.

•  World food production will need to increase by 60 percent, and food production in the developing world will need to double.

•  Productivity, profitability, product quality, environmental footprint (land, water and energy use, greenhouse gas emissions, etc.)

Page 3: Big Data, Graphical Modeling, and Causal Inference in ...

Genotype x Environment

Page 4: Big Data, Graphical Modeling, and Causal Inference in ...

Example

Ribeiro S, Eler JP, Pedrosa VB, Rosa GJM, Ferraz JBS and Balieiro JCC. Genotype x environment interaction for weaning weight in Nellore cattle using reaction norm analysis. Livestock Science 176: 40–46, 2015.

Page 5: Big Data, Graphical Modeling, and Causal Inference in ...

Genotype x Environment

•  Nucleus herd vs. commercial settings environment

•  Environmental diversity within countries/macro-regions

•  Globalization of breeding

•  Increasing importance of South America, Africa and Southeastern Asia

•  Global poverty and ecological footprint

Page 6: Big Data, Graphical Modeling, and Causal Inference in ...

Precision Livestock Production

Page 7: Big Data, Graphical Modeling, and Causal Inference in ...

Different Sources of Data and Information

•  Animal-level data
-  Production indexes, well-being monitoring
-  Pattern recognition (e.g. early detection of health issues)
-  Predictive analytics (e.g. prediction of animal future performance)

•  Farm-level data
-  Efficiency of management protocols and product administration
-  Genetics-environment interaction
-  Informed decision-making

Page 8: Big Data, Graphical Modeling, and Causal Inference in ...

Animal-level Data: High-Throughput, Real-Time Phenotyping

Sensors: Prediction of behavior in lactating dairy cows

João Dórea

Page 9: Big Data, Graphical Modeling, and Causal Inference in ...

Accelerometer to predict feeding behavior

-  3 behaviors/activities: resting, eating and ruminating
-  Hidden Markov model
-  The data were analyzed for each axis: X, Y, and Z
-  When the probability of a state at time t was greater than 50%, the time point was classified into that state (one of the 3 possible states)
-  The total time of each state (predicted values) was compared to the observed values
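The classification rule described on this slide can be sketched as follows. This is only an illustration: the state probabilities are assumed to have already been produced by a fitted hidden Markov model, the numbers are made up, and the names `classify` and `total_time` are hypothetical rather than taken from the original analysis.

```python
STATES = ("resting", "eating", "ruminating")


def classify(probabilities, threshold=0.5):
    """Return the state whose probability exceeds the threshold, else None."""
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return STATES[best] if probabilities[best] > threshold else None


def total_time(prob_series, minutes_per_step=1.0, threshold=0.5):
    """Sum the predicted time spent in each state over a series of time points."""
    totals = {state: 0.0 for state in STATES}
    for probs in prob_series:
        state = classify(probs, threshold)
        if state is not None:  # time points with no state above 50% are skipped
            totals[state] += minutes_per_step
    return totals


# Hypothetical state probabilities (resting, eating, ruminating) at 5 time points.
series = [
    (0.80, 0.15, 0.05),
    (0.60, 0.30, 0.10),
    (0.20, 0.70, 0.10),
    (0.40, 0.35, 0.25),  # no state above 50%: left unclassified
    (0.10, 0.10, 0.80),
]
predicted = total_time(series)
print(predicted)
```

The predicted totals per state are the quantities that would then be compared with the observed minutes for each accelerometer axis.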

Page 10: Big Data, Graphical Modeling, and Causal Inference in ...

X-axis: 3-state probabilities

Page 11: Big Data, Graphical Modeling, and Causal Inference in ...

Feeding Behavior

                  Eating time   Ruminating time   Resting time
X-axis
  pred, min            45              47               42
  obs, min             45              51               40
  Accuracy, %         100              92               95
Y-axis
  pred, min            35              76               25
  obs, min             40              45               51
  Accuracy, %          89              32               48
Z-axis
  pred, min            54              72               10
  obs, min             45              51               40
  Accuracy, %          80              58               24

Page 12: Big Data, Graphical Modeling, and Causal Inference in ...

Computer Vision: Tilapia filet quality

•  Data from more than 3000 fish

•  Dorsal and lateral pictures

•  Carcass weight and yield

Page 13: Big Data, Graphical Modeling, and Causal Inference in ...

Image Processing

•  Image recognition and segmentation

Original image

Output segmentation

Page 14: Big Data, Graphical Modeling, and Causal Inference in ...

Pig weight and leg/back score

•  Data from 700 pigs

•  Weight across different ages

•  Leg and back scores

Arthur Fernandes

Page 15: Big Data, Graphical Modeling, and Causal Inference in ...

Prediction: Linear model

Page 16: Big Data, Graphical Modeling, and Causal Inference in ...

The use of an artificial neural network to estimate feed intake in lactating cows through mid-infrared spectra of milk samples

Milk Mid-infrared Spectra

[Diagram: milk sample → mid-infrared (MIR) spectroscopy → dry matter intake]

João Dórea

Page 17: Big Data, Graphical Modeling, and Causal Inference in ...

Objective: use of Fourier transform MIR of milk samples to estimate dry matter intake (DMI) in lactating Holstein cows

-  MIR recorded for 599 milk samples from 189 lactating cows
-  Individual DMI recorded with electronic feeding gates
-  One-hidden-layer ANN model compared with partial least squares (PLS) regression
-  Cross-validation used to assess predictive ability (prediction mean squared error, PMSE)

Results: ANN PMSE decreased as the number of neurons increased up to 15 (PMSE = 4.13, 3.61, 3.30 and 3.38 kg²/d² for 5, 10, 15 and 20 neurons, respectively); the PLS model (7 factors) resulted in a higher PMSE of 4.41 kg²/d²
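To make the PMSE criterion concrete, here is a minimal k-fold cross-validation sketch. It is not the ANN or PLS model from the study: a one-predictor least-squares line stands in for the model to keep the example short, the data are simulated, and the function names, fold count and seed are assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kfold_pmse(xs, ys, k=5, seed=0):
    """Prediction mean squared error estimated by k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test sets
    squared_error = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Errors are computed only on records the model did not see.
        squared_error += sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in test)
    return squared_error / len(xs)


# Simulated data: a hypothetical spectral summary (x) and intake in kg/d (y).
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [20 + 0.5 * xi + rng.gauss(0, 1.5) for xi in x]
pmse = kfold_pmse(x, y, k=5)
print(round(pmse, 2))
```

Competing models are compared by running the same folds through each of them and preferring the one with the lower PMSE, which is how the ANN and PLS figures above should be read.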

Page 18: Big Data, Graphical Modeling, and Causal Inference in ...

High-Throughput Genotyping

Predicting complex quantitative traits with Bayesian neural networks

Gianola D, Okut H, Weigel KA and Rosa GJM. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12: 87, 2011.

Pérez-Rodríguez P, Gianola D, Weigel KA, Rosa GJM and Crossa J. An R package for fitting Bayesian regularized neural networks with applications in animal breeding. Journal of Animal Science 91: 3522–3531, 2013.

Page 20: Big Data, Graphical Modeling, and Causal Inference in ...

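Farm-level Data: Historic data across farms

Mixed Models

•  Used extensively in animal breeding, with multiple traits and huge numbers of records and animals in the pedigree

•  However, environmental effects are coalesced into contemporary groups

•  As such, individual effects of specific factors are not investigated, collinearity issues are not addressed, and there is no insight into direct, indirect and total effects, etc.

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}, \qquad \begin{cases} \mathbf{G} = \mathbf{G}_0 \otimes \mathbf{A} \\ \boldsymbol{\Sigma} = \mathbf{R} \otimes \mathbf{I} \end{cases}$$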
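The Kronecker structure G = G0 ⊗ A in the mixed model can be illustrated with a tiny example. The sketch below assumes two traits and two animals, with made-up values for the trait covariance matrix G0 and the relationship matrix A; the helper `kron` is written out by hand so that no external library is needed.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Hypothetical genetic (co)variances between two traits.
G0 = [[1.0, 0.5],
      [0.5, 2.0]]

# Hypothetical additive relationship matrix for two animals.
A = [[1.0, 0.5],
     [0.5, 1.0]]

# Covariance of the random effects u, ordered by trait and then by animal.
G = kron(G0, A)
for row in G:
    print(row)
```

Each 2 x 2 block of G is the relationship matrix A scaled by one element of G0, which is why the same pedigree information is reused across all traits.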

Page 21: Big Data, Graphical Modeling, and Causal Inference in ...

•  Confinamento Monte Alegre (CMA): http://www.cma.agr.br

•  Feedlot capacity: 16,000 head

•  Annual output: around 50,000 head

•  TGC software: http://www.gestaoagropecuaria.com.br/produtos/tgc/

•  80 variables (input, output, economics, etc.)

Page 22: Big Data, Graphical Modeling, and Causal Inference in ...

Decision Tree

•  Decision tree: decision support tool that uses a tree-like graph or model of decisions and their possible consequences

•  Decision trees are commonly used in operations research, such as decision analysis, to help identify a strategy most likely to reach a goal

•  Popular tool in machine learning
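In the machine-learning sense, a regression tree is typically grown by repeatedly choosing the split of a predictor that makes the outcome most homogeneous within the two resulting groups. The sketch below shows a single such split using squared error, as in CART-style trees. The slides do not state which algorithm was used, and the variable names and numbers are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on xs that minimizes the within-group SSE of ys."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Hypothetical feedlot records: days on feed and net income per head.
days_on_feed = [80, 85, 90, 95, 110, 115, 120, 125]
net_income = [40, 42, 41, 43, 60, 62, 61, 63]
threshold, cost = best_split(days_on_feed, net_income)
print(threshold, cost)
```

A full tree applies the same search recursively within each group and across all candidate predictors, stopping when groups become too small or the improvement is negligible.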

Page 23: Big Data, Graphical Modeling, and Causal Inference in ...

Decision Tree: US$/head – Income (feed cost)

Page 24: Big Data, Graphical Modeling, and Causal Inference in ...

Decision Tree: US$/head – Net Income

Page 25: Big Data, Graphical Modeling, and Causal Inference in ...

Beef Production and Quality

Vera Cardoso

[Figure: database schema linking company, farm, owner and outcome tables. Field names recoverable from the diagram: Company, Owner, CPF/CNPJ, Farm, Region, ID, outcome, iAge, iFat, Month, Year, g, Slaughter, Weight, Quality, Amount typified, farm traits, Farm Owner, Insc. estadual, Size, City, State, Latitude/Longitude, State ID, Owner Code, Technology, Sales team, Technician, pNutrition, tNutrition, wNutrition, Season]

Page 26: Big Data, Graphical Modeling, and Causal Inference in ...

[Figure: Amount of carcasses typified per region. Total: 23,056,869 carcasses (≅25% of Brazilian production, 2014-2016)]

[Figure: Number of carcasses by age (x-axis: years, 0 to 8)]

[Figure: Carcasses slaughtered by month (x-axis: Feb to Dec; years 2014-2016)]

Carcasses by year: 2014: 2,229,702; 2015: 1,924,149; 2016: 1,808,955

[Figure: Number of carcasses by quality (Desirable, Acceptable, Undesirable)]

[Figure: Carcasses slaughtered by weight (x-axis: weight in @)]

[Figure: Number of carcasses by iFat (x-axis: fat index, 1 to 5)]

Page 27: Big Data, Graphical Modeling, and Causal Inference in ...

Preliminary results

•  Data from two sources: JBS S.A. (81,053 farms) and DSM Produtos Nutricionais (22,223 farms). After merging, the final dataset comprised information from 7,248 farms and 1,571,023 carcasses slaughtered in the years 2014-2016.

•  Outcome variables: body weight at slaughter (BWS), carcass fat index (FI), age at slaughter (AS)

•  Covariates: farm, AS, season, animal category (steer, bull, cull bull, heifer and cow), frequent technical consulting (FTC), regional sales team (RST), type of feedlot premix (no feedlot premix – NFP, finishing grazing cattle – FGC, feedlot without additives – FWA, and feedlot with additives – FA)

Ferreira et al. Big data analysis of beef production and quality: an example with the Brazilian cattle industry. ASAS meeting, Baltimore, MD, July 8-12, 2017 (to appear)
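The merging step mentioned above keeps only farms present in both sources. A minimal sketch of such an inner join is given below. The key name `farm_id`, the field names and the records are all hypothetical, since the slides do not say which identifier was used to match farms.

```python
def inner_join(left, right, key):
    """Keep rows of `left` whose key also appears in `right`, combining fields."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Hypothetical slaughter records (one row per carcass).
slaughter = [
    {"farm_id": "A", "bws_kg": 520},
    {"farm_id": "B", "bws_kg": 495},
    {"farm_id": "C", "bws_kg": 540},
]

# Hypothetical nutrition records (one row per farm).
nutrition = [
    {"farm_id": "A", "premix": "FA"},
    {"farm_id": "C", "premix": "NFP"},
    {"farm_id": "D", "premix": "FWA"},
]

merged = inner_join(slaughter, nutrition, "farm_id")
print(merged)
```

Farms that appear in only one source drop out, which is consistent with the merged data set containing far fewer farms than either original source.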

Page 28: Big Data, Graphical Modeling, and Causal Inference in ...

Preliminary results

•  Results:
–  Use of FA premix decreased AS and increased BWS and FI in comparison to NFP and FWA
–  Adopting FTC increased BWS and FI, and reduced AS
–  Bulls presented greater BWS and lower AS, but lower FI, in comparison with steers
–  Differences in BWS were observed for different RST and seasons
–  AS was reduced and BWS and FI increased in the rainy seasons of 2014-2016
–  Combining FTC and FA increased BWS by 27.4 kg and reduced AS by approximately 10 months in comparison with FWA and non-FTC, suggesting that this approach might be favorable for production

Page 29: Big Data, Graphical Modeling, and Causal Inference in ...

Pig Production

Tiago Fragoso

[Map: Location of Iowa Select Finishing Farms throughout Iowa]

Page 30: Big Data, Graphical Modeling, and Causal Inference in ...

Pig Production

•  The data set contains 503 farms divided into 3 production types:
–  Finishing [428 farms]
–  Nursery [25 farms]
–  Gilt Development Unit (GDU) [50 farms]

Page 31: Big Data, Graphical Modeling, and Causal Inference in ...

Traits

•  Data set contains 140+ traits divided into 3 groups:
–  Performance: days on feed (DOF), average daily gain (ADG), mortality, pigs produced, total weight produced, feed conversion (FC), profit
–  Carcass traits: backfat, loin depth, % very light, % light, % target, % heavy, % very heavy, carcass weight (CW), live weight (LW), % yield (CW/LW)
–  Utilization: fill days, location days, % utilization, number of turns, number of loads

Page 32: Big Data, Graphical Modeling, and Causal Inference in ...

Example – Mortality (%)

•  Reference values:
–  Excellent: E < 2.5%, Regular: 2.5% < R < 5%, Bad: B > 5%
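The reference values translate directly into a small classification rule. In the sketch below, the function names and the farm counts are hypothetical, and the handling of values exactly equal to 2.5% or 5% is an assumption, because the slide leaves those boundaries unspecified.

```python
def mortality_pct(deaths, pigs_placed):
    """Mortality as a percentage of the pigs placed on the farm."""
    return 100.0 * deaths / pigs_placed


def mortality_class(pct):
    """Map a mortality percentage to the reference categories on the slide."""
    if pct < 2.5:
        return "Excellent"
    if pct < 5.0:  # assumption: exactly 2.5% is treated as Regular
        return "Regular"
    return "Bad"  # assumption: exactly 5% is treated as Bad


# Hypothetical farms: (deaths, pigs placed).
farms = {"farm_1": (20, 1000), "farm_2": (36, 1200), "farm_3": (70, 1000)}
labels = {name: mortality_class(mortality_pct(d, n)) for name, (d, n) in farms.items()}
print(labels)
```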

Page 33: Big Data, Graphical Modeling, and Causal Inference in ...

Type of Farm

Page 34: Big Data, Graphical Modeling, and Causal Inference in ...

Feed Mill and Sire Line

Page 35: Big Data, Graphical Modeling, and Causal Inference in ...

Geographical location, soil and weather condition

[Figure panels: Precipitation, Temperature, Soil]
