Community Grids Lab at Pervasive Technology Labs Geoffrey Fox [email protected].
NKS meets the Grid and e-Science NKS2003 Boston June 29 2003 Geoffrey Fox Community Grids Lab,...
NKS meets the Grid and e-Science
NKS2003, Boston
June 29 2003
Geoffrey Fox, Community Grids Lab, Indiana University
[email protected] http://www.grid2002.org
Moore’s Law for Sensors; Data Deluge
• e-Science Drivers
– Science will be deluged with data from accelerators (LHC 10 petabytes/year), satellites (InSAR for earthquakes), telescopes, sensors, video surveillance …
– Scientific research is distributed across the world
• Grid Technology aims to integrate distributed data, people, computers (simulation or better data-mining)
– Commercial interest from “utility computing” etc.
• NKS can provide the underlying modeling approach?
The total area of astronomical telescopes in m², and CCDs measured in Gigapixels, over the last 25 years. The number of pixels and the data double every year.
SERVOGrid Caricature
[Diagram labels: Sensor Nets, Streaming Data; Repositories, Federated Databases; Database (x2); Analysis, Visualization, Exploration (Mathematica); Linked NKS Models (with each other and with data)]
Simulations for Chaotic Earth Systems
Earth systems are now thought to be chaotic:
- Many scales in space and time
- Most dynamics are fundamentally unobservable
- Includes stochastic processes (random forcings)
Examples include:
- Weather and climate
- Earthquakes and other crustal processes
- Plate tectonics and mantle convection
- Geodynamo
Two possible approaches to forecasting and prediction:
Deterministic: Solve differential equations with initial conditions, boundary conditions and fixed parameters. The critical problem is that many (or all) of these are unknown in nature. There is a data deluge, but it is the wrong data for PDEs. This approach is doomed for earthquake forecasting, even with an “Earth Simulator follow-on” petaflop supercomputer in 2009.
Pattern Informatics and Complexity: The focus is on studying the manifold of all possible space-time patterns that a system can display, and then forecasting with either pure observation or phenomenological dynamics constrained by the data we can actually observe.
Successful examples of Pattern-based Forecasting include weather and El Nino forecasting.
Models of Processes with Many Scales in Length and Time
Statistical Dynamics of an Earthquake Fault: The Burridge-Knopoff Slider Block Model
R. Burridge and L. Knopoff, Bull. Seism. Soc. Am., 57, 341 (1967)
The nearest-neighbor BK model was the first slider block model.
Sticking points on the fault are represented by blocks with a uniform loader spring constant KL (= kp in the figure at right).
Each block is connected to its 2d nearest neighbors (d = spatial dimension) by springs with constant KC (= kc at right).
A friction law prevents the blocks from sliding until sufficient force (stress) builds up. A simulated earthquake begins when the force on a block due to the plate motion reaches a stress threshold F.
The avalanche of failing blocks, triggered by stress transfer from sliding blocks, represents an earthquake.
Earthquake work from John Rundle (University of California, Davis) as part of SERVOGrid (Solid Earth Research Virtual Observatory Grid), led by JPL
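The slider block description above can be illustrated with a small cellular-automaton sketch. This is not the original Burridge-Knopoff equations of motion and not the Virtual California code. It assumes a simplified threshold rule in one dimension (d = 1): a block that reaches the threshold drops its stress to zero and passes a fraction KC / (KL + 2 KC) of that stress to each neighbor. The function name, parameter names, and default values are hypothetical.

```python
import random

def run_slider_blocks(n_blocks=50, k_loader=1.0, k_coupling=2.0,
                      threshold=1.0, n_events=200, seed=0):
    """Return avalanche sizes (block failures per simulated earthquake)."""
    rng = random.Random(seed)
    # Assumed stress-transfer fraction to each of the 2d = 2 neighbors.
    alpha = k_coupling / (k_loader + 2 * k_coupling)
    stress = [rng.uniform(0.0, threshold) for _ in range(n_blocks)]
    sizes = []
    for _ in range(n_events):
        # Plate loading: raise every block by the same amount until the
        # most stressed block sits exactly at the threshold F.
        i_max = max(range(n_blocks), key=stress.__getitem__)
        gap = threshold - stress[i_max]
        stress = [s + gap for s in stress]
        stress[i_max] = threshold  # avoid floating-point shortfall
        # Relaxation: failing blocks transfer stress and may trigger others.
        size = 0
        unstable = [i_max]
        while unstable:
            i = unstable.pop()
            if stress[i] < threshold:
                continue  # already relaxed by an earlier failure
            drop = stress[i]
            stress[i] = 0.0
            size += 1
            for j in (i - 1, i + 1):
                if 0 <= j < n_blocks:  # open boundaries lose stress
                    stress[j] += alpha * drop
                    if stress[j] >= threshold:
                        unstable.append(j)
        sizes.append(size)
    return sizes

if __name__ == "__main__":
    sizes = run_slider_blocks()
    print("largest avalanche:", max(sizes), "blocks")
```

Because alpha is below 1/2 whenever KL is positive, every failure dissipates stress and each avalanche ends after a finite number of failures.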
CA Model for Earthquakes
Fault Network Model for Southern California
Dynamics of Earthquakes from Numerical Simulations of CAs
“Virtual California” is a Cellular Automaton Model (J.B.R. et al., Phys. Rev. E, 61, 2418, 2000; P.B. Rundle et al., Phys. Rev. Lett., 87, 148501, 2001)
Example of one of the large earthquakes that occur during a simulation.
CFF Stress: Time vs. Space
Buildup of Coulomb Failure Function (CFF) stress over time and space. Horizontal lines are earthquakes.
Historic Earthquakes: Last 200 Years
The historic record of earthquakes over the last 200 years is shown at left.
The model fault system used for the simulations.
[Figure labels: San Andreas Fault; axes: Space (Fault Segments), Time (Years)]
A representation of the fault friction encoded via data assimilation of historic events
Friction Model
Large Event
Positively correlated: (red - red) & (blue - blue). Negatively correlated: (red - blue). Uncorrelated: (red - green) & (blue - green).
JBR et al, Phys. Rev. E., v 61, 2000, & AGU Monograph “GeoComplexity & the Physics of Earthquakes”
Method: Correlation operator methods are used to compute the characteristic basis patterns, or eigenpatterns, of the earthquake activity. These eigenpatterns represent the characteristic modes of correlation and anticorrelation of earthquake activity. The corresponding eigenvalues, or eigenprobabilities, are a measure of the contribution of given eigenpatterns to the overall activity during the time period of interest.
Space-time Patterns in Earthquake Simulations
The eigenpatterns represent the normal modes of the earthquake activity time series
215 sites = 215 time series
[Figure axis: Time]
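As a rough illustration of the eigenpattern idea above, the following sketch diagonalizes the equal-time correlation matrix of a set of site time series with NumPy. It is a minimal Karhunen-Loeve (PCA) example under stated assumptions, not the authors' correlation operator code. The function name `eigenpatterns` and the input layout (rows are times, columns are sites) are hypothetical.

```python
import numpy as np

def eigenpatterns(activity):
    """activity: array of shape (n_times, n_sites), one time series per site.

    Returns (eigenprobabilities, patterns), where patterns[:, k] is the
    k-th eigenpattern, sorted by decreasing eigenvalue.
    """
    x = np.asarray(activity, dtype=float)
    x = x - x.mean(axis=0)            # remove each site's time average
    std = x.std(axis=0)
    std[std == 0] = 1.0               # leave constant series unscaled
    z = x / std
    corr = z.T @ z / z.shape[0]       # (n_sites, n_sites) correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    # Normalized eigenvalues play the role of "eigenprobabilities".
    return eigvals / eigvals.sum(), eigvecs[:, order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    common = rng.normal(size=500)
    noise = 0.1 * rng.normal(size=(500, 4))
    # Sites 0 and 1 move together, site 2 moves against them, site 3 is noise.
    data = np.column_stack([common, common, -common, np.zeros(500)]) + noise
    probs, patterns = eigenpatterns(data)
    print("leading eigenprobability:", round(float(probs[0]), 3))
    print("leading eigenpattern:", np.round(patterns[:, 0], 2))
```

In the leading pattern, components with the same sign correspond to positively correlated sites and components with opposite signs to anticorrelated sites.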
“Pattern Informatics” Method is Somewhat Like Quantum Mechanics!
Earthquake activity over a period of time can be represented by a state vector ψ(x,t), which can be written as a sum over KL eigenfunctions. Differences in state vectors have been found to represent a probability measure for future activity. The method analyzes the shifting patterns of earthquakes through time.
Plot of Log10 P(x), potential for large earthquakes, M 5, ~ 2000 to 2010
How to generate an earthquake forecast (~2000 to 2010)
1. Spatially coarse grain (tile) the region with boxes 0.1° x 0.1° on a side (~3000 boxes, ~2000 with at least one earthquake from 1932 to 2000). This scale is approximately the size of an M ~ 6 earthquake, although the method seems to be sensitive down to a level of M 5.
2. ψ1(x): Temporal average of activity from 1932 to 1990 for large earthquakes
3. ψ2(x): Temporal average of activity from 1932 to 2000 for large earthquakes
4. Δψ(x) = ψ2(x) - ψ1(x): Change in average activity, 1990 to 2000, for large earthquakes
5. P(x) = {Δψ(x)}^2 - <{Δψ(x)}^2>: Increase in probability for a large earthquake. The symbol <> represents a spatial average.
6. Color code the result. From retrospective studies, we find that P(x) measures not only the average change in activity of large events during 1990-2000 (triangles at right), but also indicates locations for future activity for the period ~ 2000 to 2010.
(JB Rundle, KF Tiampo, W. Klein, JSS Martins, PNAS, v99, Suppl. 1, 2514-2521, Feb 19, 2002; KF Tiampo, JB Rundle, S. McGinnis, S. Gross and W. Klein, Europhys. Lett., 60, 481-487, 2002)
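The numbered steps above can be sketched in a few lines of Python. This is a simplified reading of the slide, not the published algorithm. It assumes that "activity" means events per year in each box, that the spatial average runs only over boxes containing at least one event, and that events arrive as (year, longitude, latitude) tuples already restricted to large earthquakes. All function names are hypothetical.

```python
import math
from collections import Counter

def box_of(lon, lat, size=0.1):
    """Index of the size x size degree box containing a point (step 1)."""
    return (math.floor(lon / size), math.floor(lat / size))

def mean_activity(events, t_start, t_end, boxes, size=0.1):
    """Events per year in each box over [t_start, t_end)."""
    counts = Counter(box_of(lon, lat, size)
                     for year, lon, lat in events
                     if t_start <= year < t_end)
    span = t_end - t_start
    return {b: counts[b] / span for b in boxes}

def pattern_informatics(events, t0=1932, t1=1990, t2=2000, size=0.1):
    """Return {box: P(x)} following steps 1 to 5."""
    boxes = sorted({box_of(lon, lat, size) for _, lon, lat in events})
    if not boxes:
        return {}
    avg1 = mean_activity(events, t0, t1, boxes, size)    # step 2
    avg2 = mean_activity(events, t0, t2, boxes, size)    # step 3
    change = {b: avg2[b] - avg1[b] for b in boxes}       # step 4
    mean_sq = sum(v * v for v in change.values()) / len(boxes)
    return {b: change[b] ** 2 - mean_sq for b in boxes}  # step 5

if __name__ == "__main__":
    # Made-up catalog: one box becomes active in the 1990s, one stays quiet.
    events = [(1940, -118.05, 34.05), (1995, -118.05, 34.05),
              (1996, -118.05, 34.05), (1997, -118.05, 34.05),
              (1950, -117.05, 33.05), (1960, -117.05, 33.05)]
    for box, p in pattern_informatics(events).items():
        print(box, f"{p:+.2e}")
```

Boxes with positive P(x) are those whose change in activity is larger than the regional average change. Step 6 (color coding) would map these values onto a plot.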
Patterns in Nature: ENSO (El Nino Southern Oscillation) and The Pacific Decadal Oscillation
ENSO is the leading principal component of equatorial sea surface temperature variability. The Pacific Decadal Oscillation (PDO) Index is the leading principal component of North Pacific monthly sea surface temperature variability (poleward of 20N for the 1900-93 period).
ENSOs are now being forecast using Karhunen-Loeve (KL) Analysis, also called Empirical Orthogonal Function (EOF), or Principal Component Analysis (PCA)
At right are shown typical wintertime sea surface temperature (colors), sea level pressure (contours) and surface windstress (arrows) anomaly patterns of ENSO & PDO.
Differences between ENSO & PDO:
1. 20th century PDO "events" persist for 20-to-30 years, while typical ENSO events persist for 6 to 18 months.
2. The climatic fingerprints of the PDO are most visible in the North Pacific/North American sector, while secondary signatures exist in the tropics - the opposite is true for ENSO
http://tao.atmos.washington.edu/pdo/
Pattern Informatics is Interesting
• But it is “only” qualitative, as are many fields of science where “real theory” is too complex
– Note Computational Fluid Dynamics for aircraft is in contrast quantitative
– Earth Science, Strong Interactions in particle physics, most biology don’t have quantitative practical models.
• Suggest we combine a new way of looking at things (NKS) and data-deluged science to make “Pattern Complexity”
– Data is a function of space and time and will give both dynamics and boundary conditions (latter is “old science” view of data)
• We need Mathematica to become a Grid Service and
• Combine NKS with pattern informatics and data assimilation
[Diagram labels: X-Complexity; NKS Approach; General Complex Systems; Other Fields; GeoInformatics; BioComplexity; Geology; Complex Fluids; Stock Market; Field; Modeling; Simulations; Experiments; Sensors/Satellites; MultiScale; Parallel Computing; Load Balancing Algorithms; Clusters; Computer Science; Infrastructure; e-Science; Grid (x2); InfoGrid; Grid Portals; Workflow Integration; Integrated IDE; Databases; Visualization]
SERVOGrid Complexity Computing Environment CCE
[Diagram labels: Users; CCE Control Portal Aggregation; Middle Tier with XML Interfaces; Database Service; Sensor Service; Compute Service; Parallel Simulation Service; Exploration Service; Application Service-1; Application Service-2; Application Service-3; Database; XML Meta-data Service; Complexity (NKS Model) Simulation Service]
Approach
• Build on e-Science methodology and Grid technology
• Geocomplexity and Biocomplexity applications with multi-scale models, scalable parallelism, data assimilation as key issues
– Data and NKS driven models
• Use existing code/database technology (SQL/Fortran/C++) linked to “Application Web/OGSA services”
– XML specification of models, computational steering, scale supported at “Web Service” level as don’t need “high performance” here
– Allows use of Semantic Grid technology
SERVOGrid (Complexity) Computing Model
[Diagram labels: NKS Models; WS linking to user and other WS (data sources); Application WS; HPC Simulation; Data Filter (x5); Distributed Filters massage data for simulation; Other Grid and Web Services; Analysis, Control, Visualize; Grid; OGSA-DAI Grid Services]
This type of Grid integrates with parallel computing
Multiple HPC facilities but only use one at a time
Many simultaneous data sources and sinks
Grid Data Assimilation
Data Assimilation
• Data assimilation implies one is solving some optimization problem which might have Kalman Filter like structure
• As discussed by DAO at Earth Science meeting, one will become more and more dominated by the data (Nobs much larger than number of simulation points).
• Natural approach is to form for each local (position, time) patch the “important” data combinations so that optimization doesn’t waste time on large error or insensitive data.
• Data reduction done in natural distributed fashion NOT on HPC machine as distributed computing most cost effective if calculations essentially independent
– Filter functions must be transmitted from HPC machine
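\[
\min_{\text{Theoretical Unknowns}} \; \sum_{i=1}^{N_{obs}} \frac{\left( \mathrm{Data}_i(\mathrm{position}, \mathrm{time}) - \mathrm{Simulated\_Value}_i \right)^2}{\mathrm{Error}_i^2}
\]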
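For the special case where the simulated values depend linearly on the unknowns, the minimization above reduces to weighted least squares. The sketch below shows that case only. It assumes a hypothetical design matrix `design` with Simulated_Value = design @ unknowns, which the slides do not specify. A real assimilation system would typically be nonlinear and possibly Kalman-filter-like.

```python
import numpy as np

def fit_unknowns(design, data, error):
    """Minimize sum_i ((data_i - (design @ u)_i) / error_i)**2 over u."""
    w = 1.0 / np.asarray(error, dtype=float)
    # Dividing each row by its error turns the problem into ordinary
    # least squares on the scaled system.
    a = np.asarray(design, dtype=float) * w[:, None]
    b = np.asarray(data, dtype=float) * w
    u, *_ = np.linalg.lstsq(a, b, rcond=None)
    return u

if __name__ == "__main__":
    times = np.linspace(0.0, 1.0, 20)
    design = np.column_stack([np.ones_like(times), times])  # offset + slope
    data = 2.0 + 3.0 * times
    error = np.ones_like(times)
    data[5] += 100.0   # one bad observation ...
    error[5] = 1e6     # ... flagged with a very large error bar
    print(fit_unknowns(design, data, error))
```

The observation with a large error contributes almost nothing to the fit, which is the sense in which the optimization should not "waste time on large error or insensitive data".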
Distributed Filtering
[Diagram: an HPC Machine and a Distributed Machine. Geographically distributed sensor patches each pass through a Data Filter, which reduces Nobs(local patch 1) to Nfiltered(local patch 1), Nobs(local patch 2) to Nfiltered(local patch 2), and so on. The HPC machine sends the needed filter and receives the filtered data.]
Nobs(local patch) >> Nfiltered(local patch) ≈ Number_of_Unknowns(local patch)
In the simplest approach, filtered data are obtained by linear transformations on the original data, based on a Singular Value Decomposition of the least squares matrix.
Factorize the matrix into a product of local patches.
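The SVD-based filtering idea can be sketched for a single local patch as follows. This is one plausible reading of the slide under a linear-model assumption, not the SERVOGrid implementation. The function names and the split into "build", "apply", and "solve" steps are hypothetical. The point is that the weighted least squares solution depends on the raw data only through the projection onto the left singular vectors, so only about Number_of_Unknowns values per patch need to travel back to the HPC machine.

```python
import numpy as np

def build_filter(design, error, rel_tol=1e-10):
    """On the HPC side: return (filter_matrix, reduced_design) for a patch."""
    w = 1.0 / np.asarray(error, dtype=float)
    a = np.asarray(design, dtype=float) * w[:, None]   # error-weighted matrix
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    keep = s > rel_tol * s[0]                  # drop insensitive combinations
    filter_matrix = u[:, keep].T * w           # shape (n_filtered, n_obs)
    reduced_design = s[keep, None] * vt[keep]  # shape (n_filtered, n_unknowns)
    return filter_matrix, reduced_design

def apply_filter(filter_matrix, data):
    """Next to the sensors: n_obs raw values in, n_filtered values out."""
    return filter_matrix @ np.asarray(data, dtype=float)

def solve_reduced(reduced_design, filtered):
    """Back on the HPC side: solve the small system for the unknowns."""
    u, *_ = np.linalg.lstsq(reduced_design, filtered, rcond=None)
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_obs, n_unknowns = 1000, 3
    design = rng.normal(size=(n_obs, n_unknowns))
    error = rng.uniform(0.5, 2.0, size=n_obs)
    data = design @ np.array([1.0, -2.0, 0.5]) + error * rng.normal(size=n_obs)
    filt, reduced = build_filter(design, error)
    filtered = apply_filter(filt, data)
    print("values sent back:", filtered.size, "of", n_obs)
    print("estimated unknowns:", np.round(solve_reduced(reduced, filtered), 2))
```

In this sketch the reduced problem gives the same estimate as solving the full weighted least squares problem, while the data sent across the network shrinks from Nobs to Nfiltered values.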