
1/23/2018


Big Data in Water: Opportunities and Challenges for Machine Learning 

Vipin Kumar

Department of Computer Science and Engineering

University of Minnesota

[email protected]

www.cs.umn.edu/~kumar

Headwaters Lecture - 2018

2018 Water Resources Assembly and Research Symposium: Headwaters Lecture

Water: A Grand Societal Challenge of the 21st Century

Shrinking Lake Mead

Droughts in Southern California

Harmful Algal Bloom in Lake Erie

Floods due to Hurricane Harvey


Big Data in Water

Satellite Imagery

Weather/Climate Models

Hydrological Models

IoT for Water

Golden Age of Data Science

• Hugely successful in commercial applications

Case Study: Monitoring Global Surface Water Dynamics

Cedo Caka Lake in Tibet, 1984

Cedo Caka Lake in Tibet, 2011

Aral Sea in 1989

Aral Sea in 2014

Impact of Climate Change

Impact of Human Actions

Early Warning Systems

Great Flood of Mississippi River, 1993


Quantifying water stocks and flow

Global projections of water risks (red)

Integrating with hydrological models


Satellite Big Data


A vegetation index measures the surface "greenness", a proxy for total biomass. This vegetation time series captures the temporal dynamics around the site of the China National Convention Center.
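The slide does not say which vegetation index it plots. NDVI (Normalized Difference Vegetation Index) is one widely used choice, so the sketch below assumes NDVI computed from red and near-infrared (NIR) reflectance; the reflectance values are hypothetical and only mimic a site where vegetation is cleared.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Values near +1 indicate dense green vegetation; values near zero or
    below indicate bare soil, built surfaces, or water.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Avoid dividing by zero where both bands are 0 (e.g. no-data pixels); those become NaN.
    safe = np.where(denom != 0, denom, 1.0)
    return np.where(denom != 0, (nir - red) / safe, np.nan)

# Hypothetical reflectances for one location on four dates
nir = [0.45, 0.50, 0.20, 0.18]
red = [0.05, 0.06, 0.15, 0.16]
print(np.round(ndvi(nir, red), 2))
```

A sharp, lasting drop in such a series is the kind of temporal signal a construction site would leave in the vegetation time series.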

Data         Type           Coverage  Spatial Res.  Temporal Res.  Spectral Bands  Duration        Availability
LANDSAT      Multispectral  Global    30 m          16 days        7               1972 - present  Public
Hyperion     Hyperspectral  Regional  30 m          16 days        220             2001 - present  Public
Sentinel-1   Radar          Global    5 m           12 days        -               2014 - present  Public
Sentinel-2   Multispectral  Global    10 m          6 days         13              2015 - present  Public
QuickBird    Multispectral  Global    2.16 m        2 to 12 days   4               2001 - 2014     Private

MODIS covers ~5 billion locations globally at 250m resolution, daily since Feb 2000.

[Figure: space-time data cube of grid cells with longitude, latitude, and time axes]
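The longitude × latitude × time structure can be pictured as a 3-D array indexed by (time, row, column), where each grid cell yields its own time series. The sketch below uses a small random array as a stand-in for MODIS data (its shape, dtype and the helper `cell_time_series` are assumptions), plus a back-of-the-envelope size for one global daily layer at the slide's ~5 billion locations, assuming 2 bytes (int16) per value.

```python
import numpy as np

# Hypothetical cube: 365 daily layers over a 100 x 200 patch of grid cells.
rng = np.random.default_rng(0)
cube = rng.random((365, 100, 200), dtype=np.float32)  # axes: (time, latitude, longitude)

def cell_time_series(cube, row, col):
    """Return the full time series stored at one grid cell."""
    return cube[:, row, col]

series = cell_time_series(cube, row=42, col=17)
print(series.shape)  # (365,)

# Rough volume of one global daily layer, assuming 2 bytes per value.
locations = 5_000_000_000
bytes_per_day = locations * 2
print(bytes_per_day / 1e9, "GB per band per day")  # 10.0 GB per band per day
```

At roughly 10 GB per band per day, nearly two decades of daily layers quickly reach the petabyte scale that motivates the "big data" framing.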

• SWBD (SRTM Water Body Dataset), Feb 2000

• Google-JRC water body product (1984 - 2015)

Challenges for Traditional Big Data Methods

• Challenge 1: Heterogeneity in space and time

- Water and land bodies look different in different regions of the world

- The same water body can look different at different points in time

Great Bitter Lake, Egypt; Lake Tana, Ethiopia; Lake Abbe, Africa

Mar Chiquita Lake, Argentina in 2000 (left) and 2012 (right)

• Challenge 2: Data Quality

- Clouds, shadows, atmospheric disturbances

- Incorrect labels

- Missing data (no labels)

Poyang Lake, China (pink shows missing data)

Method Innovations for Monitoring Water

• Ensemble Learning Methods for Handling Heterogeneity in Data [1, 2]

- Learn an ensemble of classifiers to distinguish between different pairs of positive modes (water: P1, P2, P3) and negative modes (land: N1, N2, N3)

• Using Physics-Guided Labeling to Handle Poor Data Quality [3, 4]

- Use elevation information (e.g., Elevation A > B > C > D) to constrain labels to be physically consistent

[1] Karpatne et al., SDM 2015

[2] Karpatne et al., ICDM 2015

[3] Khandelwal et al., ICDM 2015

[4] Mithal et al. (PhD Dissertation)
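The slide names these two ideas without giving their algorithms. The sketch below illustrates each under hypothetical simplifications; these are not the methods of the cited papers. (1) Every (water mode, land mode) pair gets a nearest-centroid rule, and a pixel is called water if some water mode beats every land mode. (2) Assuming all cells drain to one water body, water must occupy the lowest cells, so noisy labels are replaced by the split along the elevation order that disagrees with the fewest of them. The mode centroids, function names and single-basin assumption are all illustrative.

```python
import numpy as np

def pairwise_mode_ensemble(x, pos_modes, neg_modes):
    """Label rows of x as water (True) or land (False).

    Each (water mode Pi, land mode Nj) pair gets a nearest-centroid rule
    f_ij(x) = ||x - n_j||^2 - ||x - p_i||^2, positive when x is closer to Pi.
    The ensemble says "water" if max_i min_j f_ij(x) > 0.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    # Squared distances to each mode: (n_samples, n_pos) and (n_samples, n_neg)
    d_pos = ((x[:, None, :] - pos_modes[None, :, :]) ** 2).sum(axis=2)
    d_neg = ((x[:, None, :] - neg_modes[None, :, :]) ** 2).sum(axis=2)
    # f[s, i, j] = d_neg[s, j] - d_pos[s, i]
    f = d_neg[:, None, :] - d_pos[:, :, None]
    return f.min(axis=2).max(axis=1) > 0


def elevation_consistent_labels(elevation, noisy_water):
    """Make labels physically consistent with elevation.

    Assumes all cells drain to a single water body, so if a cell is water,
    every lower cell is water too. Chooses the water/land split along the
    elevation ordering that disagrees with the fewest noisy labels.
    """
    order = np.argsort(np.asarray(elevation), kind="stable")
    y = np.asarray(noisy_water, dtype=bool)[order]
    # land_below[k]: land labels among the k lowest cells
    land_below = np.concatenate(([0], np.cumsum(~y)))
    # water_above[k]: water labels among the remaining, higher cells
    water_above = np.concatenate(([0], np.cumsum(y[::-1])))[::-1]
    k = int(np.argmin(land_below + water_above))
    labels = np.zeros(len(y), dtype=bool)
    labels[order[:k]] = True
    return labels


pos = np.array([[0.0, 0.0], [10.0, 0.0]])  # two hypothetical water modes
neg = np.array([[5.0, 0.0], [0.0, 10.0]])  # two hypothetical land modes
print(pairwise_mode_ensemble([[0.5, 0.0], [5.0, 0.5]], pos, neg))  # [ True False]

# The cell at elevation 2 was mislabeled as land (e.g. under cloud); the split repairs it.
print(elevation_consistent_labels([4, 1, 6, 2, 5, 3],
                                  [True, True, False, False, False, True]))
```

The max-min combination lets each water mode be judged only against the land modes, so a lake that looks unusual globally can still be recognized through the mode it resembles.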

A Global Surface Water Monitoring System (http://z.umn.edu/monitoringwater)

• Maps the dynamics of all major surface water bodies (surface area > 2.5 km²), shown as blue dots

Key Highlights:

• Detects melting of glacial lakes

• Maps changes in river morphology

• Identifies reservoir construction

• Finds relationships between surface water and precipitation/groundwater

Showing Surface Water Dynamics

Don Martin Dam, Mexico

Surface area of water around Don Martin Dam across time

Annual Landsat Time‐lapse of this region (Courtesy: Google Earth Engine)


Regions of Change in South America

Red Dots (Water Gain): regions larger than 2.5 km² that have changed from land to water in the last 15 years

Green Dots (Water Loss): regions larger than 2.5 km² that have changed from water to land in the last 15 years
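The slides do not describe how a region is assigned to Water Gain or Water Loss. One simple reading, sketched below as a hypothetical rule, compares how often each location is mapped as water in the first and last few years of the record; the window length, threshold and function name are assumptions.

```python
import numpy as np

def classify_change(water_by_year, window=3, threshold=0.5):
    """Label each location 'gain', 'loss' or 'stable'.

    water_by_year: (years, locations) boolean array, True where the location
    was mapped as water that year. Compares the water fraction in the first
    and last `window` years against `threshold`.
    """
    w = np.asarray(water_by_year, dtype=float)
    early = w[:window].mean(axis=0)
    late = w[-window:].mean(axis=0)
    labels = np.full(w.shape[1], "stable", dtype=object)
    labels[(early < threshold) & (late >= threshold)] = "gain"  # land -> water (red dots)
    labels[(early >= threshold) & (late < threshold)] = "loss"  # water -> land (green dots)
    return labels

# Six years, three hypothetical locations: new water, drying, permanent lake
water = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 1],
                  [1, 0, 1], [1, 0, 1], [1, 0, 1]], dtype=bool)
print(classify_change(water))
```

Averaging over several years keeps a single flooded or drought year from being reported as a lasting change.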

Example time series of a Water Gain region

Example time series of a Water Loss region

Examples of Change: Shrinking Water Bodies

Aggregate dynamics of all green dots shown on left

(Green dots show regions changing from water to land in last 15 years)

Annual Time‐lapse of an example green dot


September 2013

November 2015

November 2015

Examples of Change: Melting Glacial Lakes in Tibet

Water Gain regions (red dots) show glacial lakes expanding as they are fed by melting

Red polygons show regions changing from land to water 

Aggregate dynamics of all red regions in Tibet


Examples of Change: River Meandering

Adjacent Water Gain (red) and Water Loss (green) regions all along the river indicate the displacement of water from the green dots to the red dots.

Zoomed‐in View

Example time series of a Water Gain region 

Example time series of a Water Loss region 

Time-lapse of region 1

Time-lapse of region 2

Examples of Change: Shrinking Island


Examples of Change: Dam Construction


Construction of Chubetsu Dam, Japan

Dam construction is characterized by a sudden and persistent increase in surface area.
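"Sudden and persistent" can be turned into a simple test on a surface-area time series. The sketch below is a hypothetical step detector, not the UMN method: it picks the split with the largest jump in mean area, then requires the jump to be large (`min_ratio`) and every later value to stay above every earlier one. Parameter names and defaults are assumptions.

```python
import numpy as np

def detect_step_increase(area, min_ratio=2.0, min_segment=3):
    """Return the index where a sudden, persistent increase starts, or None.

    Tries every split with at least `min_segment` points on each side and
    keeps the one with the largest jump in mean area. The split is accepted
    only if (a) the mean after is at least `min_ratio` times the mean before
    (sudden) and (b) every value after exceeds every value before (persistent).
    """
    area = np.asarray(area, dtype=float)
    best_t, best_jump = None, -np.inf
    for t in range(min_segment, len(area) - min_segment + 1):
        jump = area[t:].mean() - area[:t].mean()
        if jump > best_jump:
            best_t, best_jump = t, jump
    if best_t is None:
        return None
    before, after = area[:best_t], area[best_t:]
    sudden = after.mean() >= min_ratio * max(before.mean(), 1e-9)
    persistent = after.min() > before.max()
    return best_t if (sudden and persistent) else None

# Hypothetical reservoir area (km²): filling starts at index 4
print(detect_step_increase([1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1]))
```

The persistence check is what separates a new reservoir from a one-off flood, which raises the area suddenly but not permanently.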


Global Reservoir and Dam (GRanD) Database

• A data curation initiative by the Global Water System Project (GWSP)

• Finds 61 dams constructed after 2001

UMN Approach:

• Finds 701 dams constructed after 2001

Dams reported by GRanD since 2001: 35

Comparison of Dam Detections with GRanD

Dams only reported by GRanD: 5

Dams reported by both UMN and GRanD: 30

Dams only reported by UMN: 671
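The three counts partition the two sets of detections, so the totals can be checked directly. Writing U for the dams found by the UMN approach and G for the GRanD dams reported since 2001:

```latex
\begin{aligned}
|U| &= |U \cap G| + |U \setminus G| = 30 + 671 = 701 \\
|G| &= |U \cap G| + |G \setminus U| = 30 + 5 = 35 \\
\frac{|U \cap G|}{|G|} &= \frac{30}{35} \approx 0.857
\end{aligned}
```

The UMN total matches the 701 dams reported for the UMN approach, the GRanD total matches the 35 GRanD dams since 2001, and about 86% of those GRanD dams are also detected by the UMN approach.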


Relationship between Groundwater and Surface Water Area Dynamics

• GRACE land data:

– Obtained from http://grace.jpl.nasa.gov

– Available at 1° spatial resolution, monthly since 2002

– Preprocessing:

  • Average of GFZ, CSR, and JPL versions computed

  • Prescribed grid scaling factors applied

• Surface Water Area Dynamics:

– Number of MODIS water pixels counted for every 1° grid cell every month (to match the resolution of GRACE)

– Preprocessing:

  • Grid cells with fewer than 50 MODIS water pixels ignored

  • Data spatially smoothed using a 3° × 3° window
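The preprocessing steps above can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' pipeline: it assumes a regular grid where each 1° cell is an exact `block × block` tile of fine water pixels, and it reads "fewer than 50 MODIS water pixels" as a cell that never reaches 50 in any month. All function names are hypothetical.

```python
import numpy as np

def grace_average(gfz, csr, jpl, scale):
    """Average the GFZ, CSR and JPL solutions, then apply grid scaling factors.

    Each solution: (months, lat, lon); scale: (lat, lon), broadcast over months.
    """
    return (np.asarray(gfz) + np.asarray(csr) + np.asarray(jpl)) / 3.0 * scale

def water_pixels_per_cell(water, block):
    """Count fine water pixels in each coarse cell.

    water: (months, H, W) boolean; block: fine pixels per coarse-cell side.
    Assumes H and W are multiples of `block`.
    """
    t, h, w = water.shape
    blocks = water.reshape(t, h // block, block, w // block, block)
    return blocks.sum(axis=(2, 4)).astype(float)

def drop_sparse_cells(counts, min_pixels=50):
    """Set a cell to NaN if it never reaches `min_pixels` water pixels (assumed rule)."""
    counts = counts.copy()
    keep = counts.max(axis=0) >= min_pixels
    counts[:, ~keep] = np.nan
    return counts

def smooth_3x3(counts):
    """NaN-aware mean over each cell's 3 x 3 neighbourhood (3° x 3° for 1° cells)."""
    t, h, w = counts.shape
    padded = np.pad(counts, [(0, 0), (1, 1), (1, 1)], constant_values=np.nan)
    windows = np.stack([padded[:, i:i + h, j:j + w] for i in range(3) for j in range(3)])
    valid = ~np.isnan(windows)
    total = np.where(valid, windows, 0.0).sum(axis=0)
    n = valid.sum(axis=0)
    return np.where(n > 0, total / np.maximum(n, 1), np.nan)

# Toy example: 2 months, 4 x 4 fine pixels, 2 x 2 fine pixels per coarse cell
water = np.zeros((2, 4, 4), dtype=bool)
water[:, :2, :2] = True
water[0, 2, 2] = True
counts = water_pixels_per_cell(water, block=2)
smoothed = smooth_3x3(drop_sparse_cells(counts, min_pixels=2))
```

Real MODIS pixels do not tile 1° cells evenly because of the sinusoidal projection, so practical code would bin pixels by their center coordinates rather than reshape.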


Correlations with GRACE

• Most regions show strong positive correlations between surface water dynamics and GRACE measurements

GRACE: Gravity Recovery and Climate Experiment

• Measures changes in total water mass (surface + groundwater) at ~100 km resolution

Examples of Positive Correlations (1)

Correlation: 0.902

Blue: surface area time series; Red: GRACE data
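The slides do not state which correlation measure produced values such as 0.902, or whether seasonal cycles were removed first. Assuming a plain Pearson correlation over the months where both series are available, a minimal sketch is shown below; the function name and example values are hypothetical.

```python
import numpy as np

def masked_pearson(surface_area, grace):
    """Pearson correlation over months where both series are present (non-NaN)."""
    a = np.asarray(surface_area, dtype=float)
    g = np.asarray(grace, dtype=float)
    ok = ~np.isnan(a) & ~np.isnan(g)
    return float(np.corrcoef(a[ok], g[ok])[0, 1])

# Hypothetical monthly series with one missing surface-area value
area = [120.0, 135.0, np.nan, 150.0, 128.0]
grace = [-2.0, 1.5, 3.0, 4.0, -0.5]
print(round(masked_pearson(area, grace), 3))
```

Dropping months where either series is missing keeps the two records aligned in time before correlating.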


Negative Correlations in Indus Basin: Over‐consumption of groundwater?

• Surface water area increases due to rice (paddy) farming and widening of the Indus River

• GRACE shows a decrease due to depletion of groundwater for agriculture

Negative Correlations in Bangladesh and Thailand


Can we produce daily surface water extent maps at high spatial resolution?

• Challenge: no single sensor is both daily and high resolution

- MODIS (500m resolution, daily)

- LANDSAT (30m, every 16 days)

- Sentinel-2 (10m, every 5-10 days)

• Solution: ORBIT (Ordering Based Information Transfer) across space and time

Kajakai Reservoir, Afghanistan

Extent at coarse resolution (500m)

Extent at high resolution (30m) created using our approach
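The slides do not spell out ORBIT's algorithm. The sketch below only illustrates the ordering idea its name and DEM inputs suggest, under a strong simplifying assumption: within each coarse 500m pixel, water fills the lowest-elevation fine cells first, so a coarse water fraction plus a fine DEM determines a fine-resolution extent. The function names, the fraction input and the block layout are hypothetical.

```python
import numpy as np

def downscale_by_elevation(fraction, dem):
    """Mark the lowest cells of one coarse pixel as water.

    fraction: water fraction of the coarse pixel (0..1).
    dem: 2-D elevations of the fine cells inside that pixel.
    """
    fraction = min(max(float(fraction), 0.0), 1.0)
    flat = np.asarray(dem, dtype=float).ravel()
    n_water = int(round(fraction * flat.size))
    order = np.argsort(flat, kind="stable")  # lowest elevation first
    water = np.zeros(flat.size, dtype=bool)
    water[order[:n_water]] = True
    return water.reshape(np.shape(dem))

def downscale_map(fractions, dem, block):
    """Apply downscale_by_elevation to every coarse pixel of a fraction map.

    Assumes each coarse pixel covers an exact `block x block` tile of the DEM.
    """
    h, w = fractions.shape
    out = np.zeros(dem.shape, dtype=bool)
    for i in range(h):
        for j in range(w):
            tile = (slice(i * block, (i + 1) * block), slice(j * block, (j + 1) * block))
            out[tile] = downscale_by_elevation(fractions[i, j], dem[tile])
    return out

# Hypothetical 2 coarse pixels, each covering 2 x 2 fine DEM cells
dem = np.array([[3.0, 1.0, 9.0, 9.0],
                [4.0, 2.0, 9.0, 9.0]])
print(downscale_map(np.array([[0.5, 0.0]]), dem, block=2))
```

Ordering by elevation gives the daily coarse signal a physically plausible place to go at fine resolution, which is the kind of information transfer the slide's title refers to.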


Daily surface water mapping at 30m: Lake Mead, USA

Background: LANDSAT 7 image of Dec 13, 2000

Surface Extent at 500m created from MODIS data on Dec 13, 2000


Surface Extent at 30m created from MODIS 500m data on Dec 13, 2000 by the ORBIT approach using USGS 30m DEM data


MODIS 500m pixel grid in cyan color

Daily surface water mapping at 10m: Richland Chambers Reservoir, USA

(Background image: Sentinel-2 image of Apr 02, 2016)

Surface Extent at 500m created from MODIS data on Apr 02, 2016

Surface Extent at 10m created from MODIS 500m data on Apr 02, 2016 by the ORBIT approach using USGS 10m DEM data


Other Applications of Big Data in Water

Hydrological Models for Streamflow

Digital Twin of Anacostia Watershed

Modeling Lake Water Quality

Collaboration: USGS

Cover Crop Mapping

Leakage Detection using smart meters

Collaboration: Northeastern University

Land-Water Interaction

Hybrid Physics-Data Models

IoT for Water

Collaboration: D.C. Water

Collaboration: University of Minnesota

Team Members

University of Minnesota: Arindam Banerjee, Snigdhansu Chatterjee, Michael Steinbach, Jeff Peterson, David Mulla

Northeastern University: Auroop Ganguly, Ed Beighley

University of Wisconsin: Paul Hansen, Hilary Dugan

USGS: Jordan Read

UCLA: Dennis Lettenmaier

University of Maryland: Charon Birkett

Ankush Khandelwal, Anuj Karpatne, Xiaowei Jia