Post on 18-Jan-2016
Computational Science as an enabler for sustainable FEW Systems
Baskar Ganapathysubramanian
Iowa State University
NSF FEW Workshop: Oct 12-13, 2015, ISU
Computational Science and Engineering Group
What do we do:
1) Algorithm design and software implementation
2) Application driven research: Curiosity driven group
Overview of research activities related to Plant Sciences
Feature extraction: Data for crop models
Spatial coverage (Dimensions of field)
Temporal coverage (Crop Cycle)
Data for validation/input/calibration
Data deluge due to sensor advances and data collection improvements
Heterogeneous, multi length and time scale data
Noisy, gappy data
Need to extract traits used for various downstream tasks
Have to do this in an automated, high throughput, and efficient way
Similar issues faced by other disciplines: Astronomy, Particle physics, Driverless automobiles, security and defense applications
Machine learning approaches very promising
Machine Learning
Goal of ML is to generalize beyond training data
Pattern recognition, perception and control tasks
Very difficult to manually encode all features
From opsrules.com
MNIST dataset
TIMIT dataset
Breakthrough in learning algorithms. Prominent examples include ‘deep networks’
NVIDIA cuDNN website
More data, Better computing infrastructure
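The goal stated above, generalizing beyond the training data, is usually checked by holding some data out of training. The following is a minimal sketch of that idea, not anything from the talk: the synthetic data, the polynomial model, and the function name `holdout_errors` are all assumptions made for illustration.

```python
import numpy as np

def holdout_errors(degree, seed=0):
    """Fit a polynomial on training points and score it on held-out points.

    Hypothetical example: the data are synthetic (a noisy sine curve),
    chosen only to illustrate training error versus held-out error.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=60)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.1, size=60)

    # First 40 points are used for fitting, the last 20 are never seen in training.
    x_train, y_train = x[:40], y[:40]
    x_test, y_test = x[40:], y[40:]

    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in (1, 5):
    train_mse, test_mse = holdout_errors(degree)
    print(f"degree {degree}: train MSE {train_mse:.4f}, held-out MSE {test_mse:.4f}")
```

The held-out error, not the training error, is the quantity that indicates whether a model generalizes.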
Learning feature labels in scenes: Convolution networks
From Le Cun group, Hinton group, Ng group
Machine Learning Examples
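The basic operation inside a convolution network can be sketched in a few lines. This is an illustrative toy under stated assumptions, not code from the groups cited above: the image, the hand-written edge kernel, and the helper names `conv2d_valid` and `relu` are hypothetical. In a real network the kernel weights are learned from data rather than written by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image (no padding, stride 1).

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

# Toy 6x6 image: dark left half, bright right half (one vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written 3x3 kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```

The feature map is nonzero only where the window straddles the edge, which is the sense in which a convolution layer localizes a feature in a scene.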
From Le Cun group, Hinton group, Ng group
Machine Learning Examples
Learning a hierarchy of features: Feature extraction using auto-encoders, sparse encoders, Deep Belief Networks, Deep Neural Networks
Basic hypothesis: Use high throughput phenotyping to enable extraction of detailed characteristics of tassels.
Challenges: Identification of tassel locations, followed by extraction of tassel features from close to a million images!
ML: Agricultural Examples
P. Schnable
Basic hypothesis: Use high throughput phenotyping to understand features affecting (a)biotic stress tolerance
A. Singh
Figure: Standard Area Diagram (panels 1 to 5)
Example Application: Iron Deficiency Chlorosis (IDC)
IDC: Inability of plants to absorb iron from soil
Current methods are visual:
- Time consuming
- Labor intensive
- Reliability/consistency issues
ML tools for rapid identification. Deploy as apps
ML: Agricultural Examples
S. Sarkar
ML for Yield Prediction
Goal:
1) Collect and curate a dataset of economic, agricultural, meteorological, and crop management traits that is used to make predictions.
2) Develop and deploy a suite of statistical and ML tools on the data.
3) Create a workflow that will enable the larger community to utilize data and test methods.
Yield forecasting: Combination of knowledge-based computer programs (that simulate plant-weather-soil-management interactions) along with soil and environment data and targeted surveys.
D. Hayes
Companies such as Climate Corp and other big data firms may now be able to beat the USDA at yield forecasting, leading to detrimental asymmetric markets.
A publicly available high quality yield prediction tool will enable the producers to make informed decisions thereby ensuring a symmetrical market.
S. Sarkar
D. Nettleton
D. Attinger
M. Gilbert
Simple physiological model of adult maize plant.
Validated in field by Matthew Gilbert (UC Davis)
Several field-testable traits: stomatal conductance, root, stem, leaf conductance.
Input: Hourly weather data.
Outputs: Water use, Photosynthetic yield
Optimization: Trait identification for productivity
Software engineering
Code optimization
Integrate with parallel optimization framework
Deploy on HPC systems
Optimization: Trait identification for productivity
Pareto front with more than 3 million configurations tested. Ran on XSEDE TACC and local HPC resources (unpublished, 2015).
Explored traits that perform under well irrigated vs drought conditions.
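A Pareto front is the set of configurations that no other configuration beats on every objective at once. The sketch below shows one simple way to extract it. It is an illustration only: the choice of objectives (minimize water use, maximize yield, borrowed from the model outputs listed earlier), the function name `pareto_mask`, and the five sample values are assumptions, not the study's data or code.

```python
import numpy as np

def pareto_mask(water_use, crop_yield):
    """Return a boolean mask marking non-dominated configurations.

    Assumed objectives (hypothetical): lower water use is better,
    higher yield is better. Configuration j dominates i if it is no
    worse on both objectives and strictly better on at least one.
    """
    water_use = np.asarray(water_use, dtype=float)
    crop_yield = np.asarray(crop_yield, dtype=float)
    mask = np.ones(len(water_use), dtype=bool)
    for i in range(len(water_use)):
        no_worse = (water_use <= water_use[i]) & (crop_yield >= crop_yield[i])
        strictly_better = (water_use < water_use[i]) | (crop_yield > crop_yield[i])
        mask[i] = not np.any(no_worse & strictly_better)
    return mask

# Made-up values for five configurations, for illustration only.
water = [1.0, 2.0, 3.0, 2.0, 4.0]
yields = [1.0, 3.0, 4.0, 2.0, 3.5]
front = pareto_mask(water, yields)
print(front)
```

This pairwise check costs O(n^2) comparisons, so at the scale of millions of configurations a sort-based or parallel variant would likely be needed.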
Concluding Observations
1) Leverage (rapid) machine learning developments
2) Learn from progress/best practices in other fields
3) Fast ML models as surrogate models for exploration, uncertainty quantification
4) Visualization and data management become important
5) Data exchange/sharing/interoperability protocols have to be set.
6) Critical to incorporate software engineering practices into the workflow (code reuse, modularity).
7) Need sustained support for software development and maintenance
8) Need to be ready for next generation cyber infrastructure
9) Community based approach?
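Observation 3 above can be made concrete with a small sketch: fit a cheap surrogate to a handful of runs of a slow model, then use the surrogate for Monte Carlo uncertainty quantification. Everything here is assumed for illustration: `expensive_model` is a hypothetical stand-in for a real simulation, and a polynomial fit stands in for a fast ML model.

```python
import numpy as np

def expensive_model(x):
    """Hypothetical stand-in for a slow simulation with one input."""
    return np.exp(-x) * np.sin(4.0 * x)

# Run the "expensive" model at only a dozen design points.
x_train = np.linspace(0.0, 1.0, 12)
y_train = expensive_model(x_train)

# Cheap surrogate: a degree-6 polynomial fitted by least squares.
coeffs = np.polyfit(x_train, y_train, deg=6)

def surrogate(x):
    """Evaluate the fitted polynomial surrogate."""
    return np.polyval(coeffs, x)

# Uncertainty quantification: propagate an uncertain input (assumed
# uniform on [0, 1]) through the surrogate with many cheap evaluations.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=100_000)
outputs = surrogate(samples)
mean_est, std_est = outputs.mean(), outputs.std()
print(f"surrogate mean {mean_est:.3f}, std {std_est:.3f}")
```

The surrogate should always be checked against the original model on points it was not fitted to before its statistics are trusted.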