NKS meets the Grid and e-Science NKS2003 Boston June 29 2003 Geoffrey Fox Community Grids Lab,...

NKS meets the Grid and e-Science

NKS2003 Boston

June 29 2003

Geoffrey Fox
Community Grids Lab, Indiana University

[email protected]
http://www.grid2002.org

Moore’s Law for Sensors; Data Deluge
• e-Science Drivers
– Science will be deluged with data from accelerators (LHC 10 petabytes/year), satellites (InSAR for earthquakes), telescopes, sensors, video surveillance …
– Scientific research is distributed across the world
• Grid Technology aims to integrate distributed data, people, computers (simulation or better data-mining)
– Commercial interest from “utility computing” etc.
• NKS can provide the underlying modeling approach?

The total area of astronomical telescopes in m², and CCDs measured in Gigapixels, over the last 25 years. The number of pixels and the data double every year.

SERVOGrid Caricature

[Figure: SERVOGrid Caricature. Labels: Sensor Nets, Streaming Data; Repositories, Federated Databases; Database (twice); Analysis, Visualization, Exploration (Mathematica); Linked NKS Models (with each other and with data)]

Simulations for Chaotic Earth Systems

Earth systems are now thought to be chaotic:
- Many scales in space and time
- Most dynamics are fundamentally unobservable
- Includes stochastic processes (random forcings)

Examples include:
- Weather and climate
- Earthquakes and other crustal processes
- Plate tectonics and mantle convection
- Geodynamo

Two possible approaches to forecasting and prediction:

Deterministic: Solve differential equations with initial conditions, boundary conditions and fixed parameters. The critical problem is that many (or all) of these are unknown in nature. There is a data deluge, but it is the wrong data for PDEs. This is doomed as an approach to earthquake forecasting, even with the “Earth Simulator follow-on” 2009 Petaflop supercomputer.

Pattern Informatics and Complexity: The focus is on studying the manifold of all possible space-time patterns that a system can display, then using either pure observation or phenomenological dynamics constrained by the data we can actually observe.

Successful examples of Pattern-based Forecasting include weather and El Nino forecasting.

Models of Processes with Many Scales in Length and Time

Statistical Dynamics of an Earthquake Fault: The Burridge-Knopoff Slider Block Model

R. Burridge and L. Knopoff, Bull. Seism. Soc. Am., 57, 341 (1967)

The nearest-neighbor BK model was the first slider block model.

Sticking points on the fault are represented by blocks having uniform loader spring constant K_L (= kp in figure at right).

Each block is connected to its 2d nearest neighbors (d = spatial dimension) by springs having constant K_C (= kc at right).

A friction law prevents the blocks from sliding until sufficient force (stress) builds up. A simulated earthquake begins when the force on a block due to the plate motion reaches a stress threshold F.

The avalanche of failing blocks, triggered by stress transfer from sliding blocks, represents an earthquake.
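The slider block description above can be sketched as a small cellular automaton. This is a minimal illustration, not the Burridge-Knopoff equations of motion and not the code used in the talk. It assumes a 1D ring of blocks (d = 1), a residual stress of zero after a block slips, and that a failing block passes the fraction K_C / (K_L + 2d K_C) of its stress to each neighbor, which is a common simplification in cellular automaton versions of the model. The function name and all parameter values are hypothetical.

```python
import random


def run_slider_blocks(n_blocks=64, k_l=1.0, k_c=4.0, threshold=1.0,
                      n_events=200, seed=0):
    """Cellular-automaton slider-block sketch on a 1D ring (d = 1).

    Returns a list with the avalanche size (number of block failures)
    of each simulated earthquake.
    """
    rng = random.Random(seed)
    # Assumed stress-transfer rule: fraction passed to EACH of the 2d = 2 neighbors.
    alpha = k_c / (k_l + 2 * k_c)
    stress = [rng.uniform(0.0, threshold) for _ in range(n_blocks)]
    sizes = []
    for _ in range(n_events):
        # Plate motion: load every block equally until the most stressed one
        # reaches the threshold F.
        i_max = max(range(n_blocks), key=stress.__getitem__)
        gap = threshold - stress[i_max]
        stress = [s + gap for s in stress]
        stress[i_max] = threshold  # guard against floating-point rounding
        unstable = [i for i, s in enumerate(stress) if s >= threshold]
        size = 0
        while unstable:
            i = unstable.pop()
            if stress[i] < threshold:
                continue  # already relaxed earlier in this avalanche
            released = stress[i]
            stress[i] = 0.0  # assumed residual stress after sliding
            size += 1
            for j in ((i - 1) % n_blocks, (i + 1) % n_blocks):
                stress[j] += alpha * released
                if stress[j] >= threshold:
                    unstable.append(j)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    sizes = run_slider_blocks()
    print("largest avalanche:", max(sizes), "blocks")
```

Because 2 * alpha is less than 1 whenever K_L is positive, every failure dissipates some stress, so each avalanche terminates. A histogram of the returned sizes is the simplest way to look at the event statistics.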

Earthquake work from John Rundle (University of California, Davis) as part of SERVOGrid (Solid Earth Research Virtual Observatory Grid), led by JPL

CA Model for Earthquakes

Fault Network Model for Southern California

Dynamics of Earthquakes from Numerical Simulations of CAs

“Virtual California” is a Cellular Automaton Model (J.B. Rundle et al., Phys. Rev. E, 61, 2418, 2000; P.B. Rundle et al., Phys. Rev. Lett., 87, 148501, 2001)

Example of one of the large earthquakes that occur during a simulation.

CFF Stress: Time vs. Space

Buildup of Coulomb Failure Function (CFF) stress over time and space. Horizontal lines are earthquakes.

Historic Earthquakes: Last 200 Years

The historic record of earthquakes over the last 200 years is shown at left.

The model fault system used for the simulations.

[Figure: San Andreas Fault. Axes: Space (Fault Segments) vs. Time (Years)]

A representation of the fault friction encoded via data assimilation of historic events

Friction Model

Large Event

Positively correlated: (red - red) & (blue - blue). Negatively correlated: (red - blue). Uncorrelated: (red - green) & (blue - green).

J.B. Rundle et al., Phys. Rev. E, 61, 2000, and AGU Monograph “GeoComplexity & the Physics of Earthquakes”

Method: Correlation operator methods are used to compute the characteristic basis patterns, or eigenpatterns, of the earthquake activity. These eigenpatterns represent the characteristic modes of correlation and anticorrelation of earthquake activity. The corresponding eigenvalues, or eigenprobabilities, are a measure of the contribution of given eigenpatterns to the overall activity during the time period of interest.

Space-time Patterns in Earthquake Simulations

The eigenpatterns represent the normal modes of the earthquake activity time series
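A minimal sketch of the eigenpattern idea, assuming the activity is available as a matrix with one time series per site. It builds the site-by-site correlation matrix, diagonalizes it, and normalizes the eigenvalues so they sum to one, which is one plausible reading of "eigenprobabilities". The function name, the normalization, and the treatment of silent sites are assumptions for illustration; the correlation operator used in the cited work may differ in detail.

```python
import numpy as np


def eigenpatterns(activity):
    """Return (patterns, probabilities) for an activity matrix.

    activity: array of shape (n_times, n_sites), one time series per site.
    patterns: array (n_sites, n_sites); column k is the k-th eigenpattern.
    probabilities: eigenvalues normalized to sum to 1, largest first.
    """
    a = np.asarray(activity, dtype=float)
    a = a - a.mean(axis=0)           # remove each site's mean activity
    std = a.std(axis=0)
    std[std == 0.0] = 1.0            # silent sites: avoid division by zero
    z = a / std
    corr = z.T @ z / a.shape[0]      # site-by-site correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    return eigvecs[:, order], eigvals / eigvals.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for 215 site time series (not real seismicity).
    synthetic = rng.poisson(2.0, size=(400, 215))
    patterns, probs = eigenpatterns(synthetic)
    print(patterns.shape, probs[:3])
```

Positive and negative entries within one column of `patterns` mark sites whose activity is correlated or anticorrelated in that mode.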

215 sites = 215 time series

[Figure axis: Time]

“Pattern Informatics” Method is Somewhat Like Quantum Mechanics!

Earthquake activity over a period of time can be represented by a state vector ψ(x,t), which can be written as a sum over KL eigenfunctions. Differences in state vectors have been found to represent a probability measure for future activity. The method analyzes the shifting patterns of earthquakes through time.

Plot of log10 P(x), potential for large earthquakes (M ≥ 5), ~2000 to 2010

How to generate an earthquake forecast (~2000 to 2010)

1. Spatially coarse grain (tile) the region with boxes 0.1° x 0.1° on a side (~3000 boxes, ~2000 with at least one earthquake from 1932 to 2000). This scale is approximately the size of a M ~ 6 earthquake, although the method seems to be sensitive down to a level of M 5.

2. ψ1(x) = Temporal average of activity from 1932 to 1990 for large earthquakes

3. ψ2(x) = Temporal average of activity from 1932 to 2000 for large earthquakes

4. Δψ(x) = ψ2(x) - ψ1(x) = Change in average activity, 1990 to 2000, for large earthquakes

5. P(x) = {Δψ(x)}² - <{Δψ(x)}²> = Increase in probability for a large earthquake. Symbol <> represents spatial average.

6. Color code the result. From retrospective studies, we find that P(x) measures not only the average change in activity of large events during 1990-2000 (triangles at right), but also indicates locations for future activity for the period ~ 2000 to 2010.

(JB Rundle, KF Tiampo, W. Klein, JSS Martins, PNAS, v99, Suppl. 1, 2514-2521, Feb 19, 2002; KF Tiampo, JB Rundle, S. McGinnis, S. Gross and W. Klein, Europhys. Lett., 60, 481-487, 2002)
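The forecast recipe above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the catalog is a list of hypothetical (year, latitude, longitude) tuples that already contains only large events, "activity" is taken to be the number of events per year in a box, and the spatial average runs over boxes with at least one event. The function name and the sample catalog are invented for the example; the published method includes normalization steps that are not reproduced here.

```python
import math
from collections import Counter


def pattern_informatics(events, t0=1932, t1=1990, t2=2000, box=0.1):
    """Return {box_index: P} following steps 1-5 of the recipe.

    events: iterable of (year, lat, lon) tuples for large earthquakes.
    box:    tile size in degrees (step 1).
    """
    def tile(lat, lon):
        return (math.floor(lat / box), math.floor(lon / box))

    count_1, count_2 = Counter(), Counter()
    for year, lat, lon in events:
        if t0 <= year < t2:
            count_2[tile(lat, lon)] += 1      # contributes to the t0..t2 average
            if year < t1:
                count_1[tile(lat, lon)] += 1  # contributes to the t0..t1 average

    if not count_2:
        return {}

    # Steps 2-4: temporal averages (events per year) and their difference.
    delta = {b: count_2[b] / (t2 - t0) - count_1[b] / (t1 - t0)
             for b in count_2}
    # Step 5: squared change minus its spatial average (over active boxes).
    mean_sq = sum(d * d for d in delta.values()) / len(delta)
    return {b: d * d - mean_sq for b, d in delta.items()}


if __name__ == "__main__":
    # Invented catalog: one box becomes more active in the 1990s.
    catalog = [(1950, 34.05, -118.25), (1995, 34.05, -118.25),
               (1996, 34.05, -118.25), (1960, 36.55, -121.05)]
    scores = pattern_informatics(catalog)
    hottest = max(scores, key=scores.get)
    print("box with largest P:", hottest)
```

Step 6 (color coding) would then map the returned values, or their log10 where positive, onto the tiled region.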

Patterns in Nature: ENSO (El Nino Southern Oscillation) and The Pacific Decadal Oscillation

ENSO is the leading principal component of equatorial sea surface temperature variability. The Pacific Decadal Oscillation (PDO) Index is the leading principal component of North Pacific monthly sea surface temperature variability (poleward of 20°N for the 1900-93 period).

ENSO events are now being forecast using Karhunen-Loeve (KL) analysis, also called Empirical Orthogonal Function (EOF) analysis or Principal Component Analysis (PCA).

At right are shown typical wintertime sea surface temperature (colors), sea level pressure (contours) and surface windstress (arrows) anomaly patterns of ENSO & PDO.

Differences between ENSO & PDO:

1. 20th century PDO "events" persist for 20-to-30 years, while typical ENSO events persist for 6 to 18 months.

2. The climatic fingerprints of the PDO are most visible in the North Pacific/North American sector, while secondary signatures exist in the tropics; the opposite is true for ENSO.

http://tao.atmos.washington.edu/pdo/

Pattern Informatics is Interesting
• But it is “only” qualitative, as are many fields of science where “real theory” is too complex
– Note Computational Fluid Dynamics for aircraft is, in contrast, quantitative
– Earth Science, Strong Interactions in particle physics, and most biology don’t have quantitative practical models.
• Suggest we combine a new way of looking at things (NKS) and data-deluged science to make “Pattern Complexity”
– Data is a function of space and time and will give both dynamics and boundary conditions (latter is “old science” view of data)
• We need Mathematica to become a Grid Service and
• Combine NKS with pattern informatics and data assimilation

[Figure: diagram with labels: InfoGrid; MultiScale; Parallel Computing; Experiments; GeoInformatics; Workflow Integration; NKS Approach; General Complex Systems; Simulations; Load Balancing Algorithms; Integrated IDE; Sensors/Satellites; Other Fields; X-Complexity; Infrastructure; e-Science; Grid; Computer Science; Modeling; Geology; Clusters; Visualization; Field; Complex; Fluids; Stock Market; Grid Portals; Databases; BioComplexity]

SERVOGrid Complexity Computing Environment CCE

[Figure: SERVOGrid Complexity Computing Environment (CCE). Labels: Users; CCE Control Portal Aggregation; Middle Tier with XML Interfaces; Database Service; Sensor Service; Compute Service; Parallel Simulation Service; Exploration Service; Application Service-1; XML Meta-data Service; Complexity (NKS Model) Simulation Service; Database]

Approach
• Build on e-Science methodology and Grid technology
• Geocomplexity and Biocomplexity applications with multi-scale models, scalable parallelism, data assimilation as key issues
– Data and NKS driven models
• Use existing code/database technology (SQL/Fortran/C++) linked to “Application Web/OGSA services”
– XML specification of models, computational steering, scale supported at “Web Service” level as don’t need “high performance” here
– Allows use of Semantic Grid technology
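To make the idea of an XML specification of models concrete, here is a minimal sketch of how a service might read such a description before launching existing simulation code. The XML vocabulary (`model`, `parameter`, `steering`, `datasource` and their attributes) is entirely hypothetical and invented for this example; the slides do not define a schema, and this is not an OGSA or Web Service interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical model description; element and attribute names are assumptions.
MODEL_SPEC = """
<model name="slider-block" kind="cellular-automaton">
  <parameter name="n_blocks" value="64"/>
  <parameter name="k_l" value="1.0"/>
  <parameter name="k_c" value="4.0"/>
  <steering checkpoint_every="100"/>
  <datasource service="sensor-service" region="southern-california"/>
</model>
"""


def parse_model_spec(xml_text):
    """Turn the hypothetical XML model description into a plain dict."""
    root = ET.fromstring(xml_text)
    parameters = {p.get("name"): float(p.get("value"))
                  for p in root.findall("parameter")}
    steering = root.find("steering")
    return {
        "name": root.get("name"),
        "kind": root.get("kind"),
        "parameters": parameters,
        "steering": dict(steering.attrib) if steering is not None else {},
        "datasources": [dict(d.attrib) for d in root.findall("datasource")],
    }


if __name__ == "__main__":
    spec = parse_model_spec(MODEL_SPEC)
    print(spec["name"], spec["parameters"])
```

Parsing of this kind is cheap, which fits the point above that the specification layer does not need high performance; only the simulation it configures does.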

SERVOGrid (Complexity) Computing Model: Grid Data Assimilation

[Figure: SERVOGrid (Complexity) Computing Model. Labels: HPC Simulation; NKS Models; several Data Filters; Application WS; WS linking to user and other WS (data sources); OGSA-DAI Grid Services; Other Grid and Web Services; Analysis; Control; Visualize; Grid]

Distributed filters massage data for simulation.

This type of Grid integrates with parallel computing.

Multiple HPC facilities, but only one is used at a time.

Many simultaneous data sources and sinks.

Data Assimilation
• Data assimilation implies one is solving some optimization problem which might have Kalman Filter like structure
• As discussed by DAO at Earth Science meeting, one will become more and more dominated by the data (N_obs much larger than number of simulation points).
• Natural approach is to form for each local (position, time) patch the “important” data combinations so that optimization doesn’t waste time on large error or insensitive data.
• Data reduction done in natural distributed fashion NOT on HPC machine as distributed computing most cost effective if calculations essentially independent
– Filter functions must be transmitted from HPC machine

\[
\min_{\text{Theoretical Unknowns}} \; \sum_{i=1}^{N_{obs}}
\frac{\left( \text{Data}_i(\text{position}, \text{time}) - \text{Simulated\_Value}_i \right)^2}{\text{Error}_i^2}
\]

Distributed Filtering

[Figure: Distributed Filtering. An HPC Machine and a Distributed Machine. Geographically distributed sensor patches pass through Data Filters: N_obs (local patch 1) to N_filtered (local patch 1), N_obs (local patch 2) to N_filtered (local patch 2). Labels on the link: Send needed Filter; Receive filtered data]

N_obs (local patch) >> N_filtered (local patch) ≈ Number_of_Unknowns (local patch)

In the simplest approach, filtered data are obtained by linear transformations on the original data, based on a Singular Value Decomposition of the least squares matrix.

Factorize matrix to product of local patches
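A minimal sketch of the SVD-based filter for a single local patch, assuming a linear relation between the unknowns and the observations (a design matrix) and known observational errors. The HPC side computes the filter, the sensor side applies it to reduce N_obs raw values to as many filtered values as there are unknowns, and the HPC side then recovers the weighted least-squares solution from the filtered data alone. The function names, the linear model, and the full-rank assumption are illustrative choices, not taken from the slides, and the factorization across several patches is not shown.

```python
import numpy as np


def build_filter(design, sigma):
    """HPC side: compute the filter for one local patch.

    design: (n_obs, n_unknowns) sensitivity of each observation to the unknowns.
    sigma:  (n_obs,) observational errors.
    Assumes the weighted design matrix has full column rank.
    """
    weighted = design / sigma[:, None]
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    filt = u.T / sigma[None, :]   # fold the 1/sigma weighting into the filter
    return filt, s, vt


def apply_filter(filt, data):
    """Sensor side: reduce n_obs raw values to n_unknowns filtered values."""
    return filt @ data


def solve_from_filtered(filtered, s, vt):
    """HPC side: weighted least-squares unknowns from the filtered data."""
    return vt.T @ (filtered / s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_obs, n_unknowns = 1000, 5
    design = rng.normal(size=(n_obs, n_unknowns))   # synthetic patch
    sigma = rng.uniform(0.5, 2.0, size=n_obs)
    truth = np.arange(1.0, n_unknowns + 1.0)
    data = design @ truth + sigma * rng.normal(size=n_obs)

    filt, s, vt = build_filter(design, sigma)
    filtered = apply_filter(filt, data)             # 1000 values -> 5 values
    print(filtered.shape, solve_from_filtered(filtered, s, vt))
```

Only the small filter matrix travels to the sensor side and only the filtered vector travels back, which is the data reduction the slide describes.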
