UAH GRIDS Center Middleware Testing
Sandra Redman
Information Technology and Systems Center
and
Information Technology Research Center
National Space Science and Technology Center
256-961-7806
[email protected]@msfc.nasa.gov
www.itsc.uah.edu
“…drowning in data but starving for knowledge”
User Community
Information
Data glut affects business, medicine, military, science
How do we leverage data to make BETTER decisions?
Data Mining
• Automated discovery of patterns, anomalies from vast observational data sets
• Derived knowledge for decision making, predictions and disaster response
http://datamining.itsc.uah.edu
Mining Environment: When, Where, Who and Why?
WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly
WHERE
• User Workstation
• Data Mining Center
• GRID
WHO
• End Users
• Domain Experts
• Mining Experts
WHY
• Event
• Relationship
• Association
• Corroboration
• Collaboration
Algorithm Development and Mining (ADaM)
ADaM consists of:
• a data mining engine
• an extensible set of core functional applications to aid researchers in defining and performing data mining operations on spatial data sets
• data mining modules as Open Grid Services Architecture (OGSA) services
ADaM Engine Architecture
Data flow: Data → Translated Data → Preprocessed Data → Patterns/Models → Results
Input
• HDF, HDF-EOS, GIF, PIP-2, SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp, US Rain, Landsat, ASCII Grass, Vectors (ASCII Text), Intergraph Raster, Others...
Processing: Preprocessing
• Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
• Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
• Image Processing: Cropping, Inversion, Thresholding
• Others...
Processing: Analysis
• Clustering: K Means, Isodata, Maximum
• Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
• Image Analysis: Boundary Detection, Concurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
• Genetic Algorithms
• Neural Networks
• Others...
Output
• GIF Images, HDF-EOS, HDF Raster Images, HDF SDS, Polygons (ASCII, DXF), SSM/I MSFC Brightness Temp, TIFF Images, Others...
NMI Testing
ADaM Feature Subset Selection application chosen for testing
• Supervised pattern classification is an important technique in many domains
• Feature subset selection is the process of choosing a subset of the features from the original data set in order to maximize classifier accuracy
• It improves both the runtime and the accuracy of a supervised pattern classifier by eliminating noisy, irrelevant or redundant attributes or features from the data set
• Both processor- and data-intensive
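To make the idea concrete, here is a minimal sketch of feature subset selection by exhaustive search. It is an illustration only, not the ADaM implementation: the function names, the use of a minimum-distance (nearest class mean) classifier as the scorer, and the tiny data set in the usage note are all assumptions made for this example. Scoring every subset is what makes the problem processor-intensive, since the number of subsets doubles with each added feature.

```python
from itertools import combinations


def nearest_mean_accuracy(train, test, features):
    """Accuracy of a minimum-distance (nearest class mean) classifier
    that only looks at the given feature indices."""
    # Accumulate per-class sums over the chosen features.
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[label] = counts.get(label, 0) + 1
    means = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Choose the class whose mean is closest in squared distance.
        return min(
            means,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(features, means[c])),
        )

    hits = sum(1 for x, label in test if predict(x) == label)
    return hits / len(test)


def select_features(train, test, n_features):
    """Score every non-empty feature subset and return (best_subset, accuracy)."""
    best_subset, best_acc = None, -1.0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            acc = nearest_mean_accuracy(train, test, subset)
            if acc > best_acc:  # strict '>' keeps the smaller subset on ties
                best_subset, best_acc = subset, acc
    return best_subset, best_acc
```

As a usage example with made-up data, feature 0 separates the two classes while feature 1 is noise, so dropping feature 1 raises accuracy:

```python
train = [([0.0, 5.0], 0), ([1.0, -5.0], 0), ([4.0, -4.0], 1), ([5.0, 6.0], 1)]
test = [([1.0, 9.0], 0), ([4.0, -9.0], 1), ([0.5, 0.0], 0), ([4.5, 1.0], 1)]
print(select_features(train, test, 2))  # ((0,), 1.0)
```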
Parallel Version of Cloud Extraction
[Diagram: GOES Image → Laplacian Filter, Sobel Horizontal Filter, Sobel Vertical Filter (in parallel) → Energy Computation (four blocks) → Classifier → Cloud Image]
• GOES images can be used to recognize cumulus cloud fields
• Cumulus clouds are small and do not show up well in 4km resolution IR channels
• Detection of cumulus cloud fields in GOES can be accomplished by using texture features or edge detectors
• Three edge detection filters are used together to detect cumulus clouds, an approach that lends itself to implementation on a parallel cluster
[Images: GOES Image; Cumulus Cloud Mask]
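The filter-then-energy branches described on this slide can be sketched as follows. This is a rough illustration under stated assumptions, not the ADaM code: the 3x3 kernels are the standard Laplacian and Sobel masks, "energy" is assumed here to mean the mean squared filter response, and all function names are hypothetical. The classifier stage is omitted. A thread pool stands in for the cluster; on a real parallel cluster each branch would run as a separate process or on a separate node.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 edge-detection masks.
KERNELS = {
    "laplacian": [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
    "sobel_horizontal": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "sobel_vertical": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
}


def filter_image(image, kernel):
    """Apply a 3x3 kernel to every interior pixel (border pixels are skipped)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out.append([
            sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
                for i in range(3) for j in range(3))
            for c in range(1, cols - 1)
        ])
    return out


def energy(response):
    """Mean squared filter response (an assumed definition of 'energy')."""
    values = [v * v for row in response for v in row]
    return sum(values) / len(values)


def branch(image, kernel):
    """One independent branch of the diagram: filter, then energy."""
    return energy(filter_image(image, kernel))


def edge_energies(image):
    """Run the three branches concurrently and collect one energy per filter."""
    with ThreadPoolExecutor(max_workers=len(KERNELS)) as pool:
        futures = {name: pool.submit(branch, image, kernel)
                   for name, kernel in KERNELS.items()}
        return {name: future.result() for name, future in futures.items()}
```

Because the three branches share only the read-only input image, they need no coordination until their results are combined, which is why the design maps naturally onto separate cluster nodes.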
Feature Subset Selection Application
• Application ported to Linux
• Support Vector Machine downloaded and tested
• Developed application scripts
• Modified for Globus environment by writing a simple Globus RSL file
• Ran each combination of tools on a different node on the grid
• Globus used to execute jobs on different machines
• Experimented with both real and synthetic data
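As a sketch of how one job per tool combination might be prepared, the snippet below builds a simple RSL string for each combination and assigns the combinations to nodes round-robin. Everything specific here is hypothetical: the script path, the tool names, the host names, and the assumption that a combination is a (selector, classifier) pair. Only the RSL attributes `executable`, `arguments`, `count` and `stdout` are taken from the standard Globus RSL syntax. Actual submission through Globus is left out.

```python
from itertools import product


def make_rsl(executable, arguments, stdout):
    """Build a single-job RSL string from an executable, arguments and stdout file."""
    args = " ".join('"%s"' % a for a in arguments)
    return '&(executable="%s")(arguments=%s)(count=1)(stdout="%s")' % (
        executable, args, stdout)


def plan_jobs(selectors, classifiers, nodes,
              executable="/home/user/fss/run_fss.sh"):  # hypothetical path
    """Pair every (selector, classifier) combination with a node, round-robin."""
    jobs = []
    for i, (sel, clf) in enumerate(product(selectors, classifiers)):
        node = nodes[i % len(nodes)]
        rsl = make_rsl(executable, [sel, clf], "%s_%s.out" % (sel, clf))
        jobs.append((node, rsl))
    return jobs


if __name__ == "__main__":
    # Hypothetical tool and host names, for illustration only.
    for node, rsl in plan_jobs(["forward", "backward"], ["svm"],
                               ["node1.example.edu", "node2.example.edu"]):
        print(node, rsl)
```

Since the combinations do not depend on one another, each job can run on its own machine and the results can be compared afterwards.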
[Diagram labels: three Grid Mining Agent / Grid Processor pairs; Satellite Data Archive X; Satellite Data Archive Y]
Components used in testing
• Globus Toolkit - the “de facto standard,” an open source software toolkit and set of libraries for building grid applications; provides resource management, scheduling, information services and file transfer
• GSI-OpenSSH - a modified version of OpenSSH that adds support for GSI authentication, providing a single sign-on remote login capability for the Grid
• Condor-G - workload management system for compute-intensive jobs; provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring and resource management
• Network Weather Service - monitors and dynamically forecasts the performance that various network and computational resources can deliver over a given time interval
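To illustrate what "dynamically forecasts" can mean for a monitored resource, here is a small sketch that replays a measurement history through a few simple predictors and uses whichever has had the lowest error so far. It illustrates the general idea only. It is not the Network Weather Service API or its forecasting code, and the predictors, names and sample values are assumptions chosen for the example.

```python
def last_value(history):
    """Predict that the next measurement equals the most recent one."""
    return history[-1]


def running_mean(history):
    """Predict the mean of all measurements seen so far."""
    return sum(history) / len(history)


def sliding_mean(history, window=3):
    """Predict the mean of the most recent `window` measurements."""
    recent = history[-window:]
    return sum(recent) / len(recent)


FORECASTERS = {
    "last_value": last_value,
    "running_mean": running_mean,
    "sliding_mean": sliding_mean,
}


def forecast(measurements):
    """Return (name of best predictor so far, its prediction for the next value)."""
    errors = {name: 0.0 for name in FORECASTERS}
    # Replay the history: at each step, predict from the past and score the miss.
    for t in range(1, len(measurements)):
        past = measurements[:t]
        for name, predictor in FORECASTERS.items():
            errors[name] += abs(predictor(past) - measurements[t])
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](measurements)
```

For a steadily rising series such as `[1, 2, 3, 4, 5]` the last-value predictor has the smallest accumulated error, while for a series that alternates between 0 and 10 the running mean does better, so the chosen predictor changes with the behaviour of the resource.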
Some Lessons Learned
• Component testing went well
  - Globus documentation improved, installation trouble-free, application port straightforward
  - No problems encountered during Condor-G installation, but found a problem with Condor-G under Red Hat Linux 7.3 when using nss_ldap. Developer provided a workaround: start the name service caching daemon (nscd)
  - GSI-OpenSSH installed, but Kerberos authentication did not work since Linux was not compiled with the PAM option (undocumented)
  - Network Weather Service installed, but learned we are more interested in MDS
Some Lessons Learned
• NMI Testbed process working well
  - Answers found through NMI discussion lists from developers and other users
• Have to “sell” the grid concept to developers, administrators, users
• NMI work proven helpful in other grid work
  - TeraGrid
  - ISS Space-based Science Operations Grid
  - CEOS Grid
• Need more components!