Talk 1


Transcript of Talk 1

Page 1: Talk 1

Analysis of Blue Sky Model Outputs and Realistic Smoke Monitor Data using R/R-Studio

Jimin Kim

Mazama Science

August 4th 2015


Page 2: Talk 1

Table of Contents

• Introduction

• Analysis of Gridded Smoke Model Outputs in R

• Analysis of Unraveled Smoke Data in R

• Conclusion

• Q/A


Page 3: Talk 1

Introduction

• Wildfires during the fire season are a serious problem in Washington State because of its abundant forests and often very dry conditions.

• PWFSL (Pacific Wildland Fire Sciences Laboratory) researches wildfire events by analyzing monitoring data and atmospheric model output.

• However, until now there have been no effective software tools that let PWFSL conduct this analysis easily and efficiently, especially when comparing model output with monitoring data.

• R/R-Studio is well suited to this analysis task:

– It is designed for reading in and analyzing large, real-world data sets.

– Its wide range of high-quality public packages lets users perform advanced statistical analysis on the data.


Page 4: Talk 1

Analysis of Gridded Model Outputs in R

• Atmospheric model outputs are produced for different map domains and resolutions (e.g., PNW-1.33km, CANSAC-2km).

• The model output is in ‘Gridded’ format and is originally stored as a NetCDF file with a somewhat awkward structure.

• Using R, one can restructure the data into a more sensible layout that is easier to handle.

[Diagram: raw Bluesky model output restructured into the new data format. Labels shown: 2D Lon, 2D Lat, PM2.5, ModelRun, DeltaLon, DeltaLat, Time]
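
As a rough illustration of what a restructured gridded data set could look like, here is a minimal sketch. It is written in Python rather than R purely for illustration, and it is not the structure used in the talk: the field names, the `axis` and `build_grid` helpers, the integer time index, and all values are assumptions. The only parts taken from the slide are the labels ModelRun, DeltaLon, DeltaLat, Time, and PM2.5.

```python
def axis(start, delta, n):
    """Regularly spaced coordinate axis from an origin, a spacing, and a count."""
    return [start + k * delta for k in range(n)]


def build_grid(model_run, lon0, lat0, delta_lon, delta_lat, pm25):
    """Bundle one model run into a single dictionary (hypothetical layout).

    pm25 is indexed as pm25[t][j][i]: time step t, latitude row j,
    longitude column i.
    """
    n_lat = len(pm25[0])
    n_lon = len(pm25[0][0])
    return {
        "model_run": model_run,
        "delta_lon": delta_lon,
        "delta_lat": delta_lat,
        "lon": axis(lon0, delta_lon, n_lon),
        "lat": axis(lat0, delta_lat, n_lat),
        "time": list(range(len(pm25))),  # assumed: integer time-step index
        "pm25": pm25,
    }


# Made-up values: 2 time steps on a grid of 2 latitudes x 3 longitudes.
toy_pm25 = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]],
]
grid = build_grid("toy-run", -125.0, 45.0, 0.5, 0.5, toy_pm25)
```

Keeping the coordinate axes next to the values means a cell can be looked up by position, for example `grid["pm25"][1][1][2]` for the last time step, second latitude, third longitude.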

Page 5: Talk 1

Analysis of Gridded Model Outputs in R

• Combined with the maps package in R, the restructured data makes it easy to plot the PM2.5 distribution on a map.

• The example shows the maximum PM2.5 ever reached in the PNW-1.33km domain.


Page 6: Talk 1

Analysis of Gridded Model Outputs in R

• One can also subset the data by defining an x-range and a y-range.

• The built-in R function ‘apply’ lets users pass an arbitrary function such as ‘mean’ to collapse the time axis and obtain a particular summary.
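
The two operations above, subsetting by coordinate range and collapsing the time axis with a chosen function, can be sketched as follows. The sketch uses Python rather than R for illustration only. The `subset` and `collapse_time` helpers, the `pm25[t][j][i]` layout, and the toy values are assumptions, not the talk's code. In R, the collapsing role is played by the built-in `apply`.

```python
from statistics import mean


def subset(lon, lat, pm25, x_range, y_range):
    """Keep only cells whose lon/lat fall inside the inclusive ranges."""
    cols = [i for i, x in enumerate(lon) if x_range[0] <= x <= x_range[1]]
    rows = [j for j, y in enumerate(lat) if y_range[0] <= y <= y_range[1]]
    sub = [[[step[j][i] for i in cols] for j in rows] for step in pm25]
    return [lon[i] for i in cols], [lat[j] for j in rows], sub


def collapse_time(pm25, fun):
    """Apply fun to each cell's time series; returns a 2D lat-by-lon grid."""
    n_lat, n_lon = len(pm25[0]), len(pm25[0][0])
    return [
        [fun([step[j][i] for step in pm25]) for i in range(n_lon)]
        for j in range(n_lat)
    ]


# Made-up values: pm25[t][j][i] with 2 time steps, 2 latitudes, 3 longitudes.
lon = [-125.0, -124.5, -124.0]
lat = [45.0, 45.5]
pm25 = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]],
]

sub_lon, sub_lat, sub_pm25 = subset(lon, lat, pm25, (-124.6, -124.0), (45.0, 45.0))
mean_grid = collapse_time(sub_pm25, mean)  # time-averaged PM2.5 per cell
max_grid = collapse_time(sub_pm25, max)    # peak PM2.5 per cell
```

Because `collapse_time` takes the summary function as an argument, switching from a mean to a maximum (as in the maximum-PM2.5 map) changes only that argument.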


Page 7: Talk 1

Analysis of Unraveled Smoke Data in R

• Unlike the ‘Gridded’ format, the ‘Unraveled’ data format lets users pick a target point on the map and a radius around that point, or any arbitrarily shaped region.

• In return for giving up the grid, ‘Unraveled’ data can be much more efficient for local smoke analysis.

• We can store both ‘Unraveled’ model output and monitor data in the same data structure, which makes the two easy to compare.

[Diagram: monitoring data and unraveled model output both feed the new data format, which has two parts: ‘Data’ (time t by Monitor IDs) and ‘Meta’ (Id by Attributes: Lon, Lat, Monitor ID…)]
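
To make the point-and-radius idea concrete, here is a minimal sketch of an unraveled data set as a pair of ‘data’ and ‘meta’ tables, with a helper that keeps only the locations within a given distance of a target point. It uses Python rather than R for illustration only. The dictionary layout, the `within_radius` helper, the location IDs, the coordinates, and the use of the haversine great-circle formula are all assumptions and are not taken from the talk.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (spherical Earth)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


# Hypothetical unraveled data set: one time series per location ID in
# "data", and the matching attributes per ID in "meta". Values are made up.
unraveled = {
    "data": {
        "A": [5.0, 7.0, 9.0],
        "B": [6.0, 8.0, 10.0],
        "C": [1.0, 1.0, 2.0],
    },
    "meta": {
        "A": {"lon": -120.5, "lat": 47.5},
        "B": {"lon": -120.6, "lat": 47.5},
        "C": {"lon": -117.4, "lat": 47.7},
    },
}


def within_radius(ds, lon, lat, radius_km):
    """Return a new data set with only the locations inside the radius."""
    keep = [
        key for key, m in ds["meta"].items()
        if haversine_km(lon, lat, m["lon"], m["lat"]) <= radius_km
    ]
    return {
        "data": {key: ds["data"][key] for key in keep},
        "meta": {key: ds["meta"][key] for key in keep},
    }


local = within_radius(unraveled, -120.5, 47.5, 20.0)
```

Since monitors and model grid cells are both just entries keyed by ID, the same helper would work on either kind of data set.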

Page 8: Talk 1

Analysis of Unraveled Smoke Data in R

• The example demonstrates the case of a local fire in Washington State.

• Once the raw data is restructured as a list of ‘data’ and ‘meta’, it is compatible with all the functions designed to handle unraveled data.

• Using the ‘Rolling Mean’ and ‘Apply’ functions further extends the functionality to different types of statistical analysis.

• This makes the comparison between model outputs and actual monitoring data much easier and more effective.
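
A rolling mean followed by a model-versus-monitor comparison can be sketched as below. It uses Python rather than R for illustration only. The trailing (right-aligned) window, the `rolling_mean` helper, the simple difference used as a comparison, and the numbers are all assumptions. The talk does not say how its ‘Rolling Mean’ function aligns its window or which comparison statistics were used.

```python
def rolling_mean(values, width):
    """Trailing moving average; positions without a full window give None."""
    out = []
    for i in range(len(values)):
        if i + 1 < width:
            out.append(None)
        else:
            window = values[i + 1 - width : i + 1]
            out.append(sum(window) / width)
    return out


# Made-up hourly PM2.5 series at one location.
monitor = [4.0, 8.0, 12.0, 16.0, 20.0]
model = [6.0, 6.0, 12.0, 18.0, 24.0]

smooth_monitor = rolling_mean(monitor, 3)
smooth_model = rolling_mean(model, 3)

# Model minus monitor, wherever both smoothed values exist.
bias = [
    m - o
    for m, o in zip(smooth_model, smooth_monitor)
    if m is not None and o is not None
]
```

Smoothing both series with the same window before differencing keeps the comparison like-for-like, which relies on the model output and the monitoring data sharing one structure.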


Page 9: Talk 1

Conclusion

• By restructuring the raw data into a workable format and using the wide array of packages available in R, it is possible to perform smoke analysis with model outputs and monitoring data much more easily and efficiently.

• Unifying the data structure for ‘Unraveled’ model output and monitoring data lets users easily compare the two.

• Equipped with the wide range of robust functionality in R/R-Studio, PWFSL has effective tools at its disposal to improve its computational model and can thus be better prepared for fire seasons.


Page 10: Talk 1

Q/A
