Earth Observing Data Systems in the Internet Era


HIGHLIGHT

Distributed Earth Observing Data Systems - An Example: The ESIP Federation

Distributed Earth observing data systems are benefiting from the existence of the Internet. As stated above, though, the very success of distribution of data and relatively easy access parallels the clogging of the Internet highways. Coupled with the fact that remote sensing satellites, both of the commercial variety and government-sponsored Earth observing systems, are producing ever higher data rates and data volumes at centers, this implies that the problem will get more acute. For example, commercial hyperspectral remote sensing (HSRS) satellites will be utilizing hundreds of channels with resolution of tens of meters, producing hundreds of gigabytes of data for even small regional coverage. Earth observing satellites with global coverage, such as NASA's Earth Observing System Terra satellite and Landsat 7 (both to be launched in 1999), will be producing data at rates approaching 1 terabyte per day. Even existing "smaller" missions, like the joint NASA/NASDA Tropical Rainfall Measuring Mission (TRMM), are producing data at rates of tens of gigabytes per day. Current data holdings at NASA's and NOAA's data centers exceed many terabytes. The future era of advanced or high technology remote sensing sensor systems will be producing a wealth of data rich in information. The challenge is to harvest a non-negligible proportion of these data holdings for the benefit of science and society at both the national and global levels.

NASA's Earth Science Enterprise (ESE) is building a data system to serve its present and future data holdings. Termed the Earth Observing System Data and Information System (EOSDIS), it relies on a central information management system (IMS) distributed to a variety of Distributed Active Archive Centers (DAACs), which produce, store, and distribute data sets according to specific focused science areas (Asrar and Greenstone, 1995). Although EOSDIS contains both distributed and centralized aspects, it was designed at a time when a centralized approach was deemed to be more effective. The advent of the Internet has changed the overall situation. As such, in response to the 1995 recommendations of the National Research Council (NRC), NASA has augmented the existing EOSDIS by a federation of information and data providers, called the Earth Science Information Partners (ESIPs). Specifically, the 1995 NRC study of the U.S. Global Change Research Program (USGCRP) and NASA's Mission to Planet Earth (the present ESE) recommended that NASA augment its current EOSDIS, which relies on a core architecture, with a federation of Earth science information partners (BSD, 1995). In particular, the NRC recommended that "the responsibility for product generation and publication and for user services should be transferred to a federation of partners selected through a competitive process open to all." NASA responded by funding the ESIP federation (NASA, 1997), consisting of 12 Earth science-based ESIPs (also known as ESIP-2's) and 12 value-added, or extended communities, ESIPs (also known as ESIP-3's). Subsequently, the current NASA DAACs, which are responsible for producing, archiving, and distributing NASA ESE data products, also joined the federation (as ESIP-1's). In the context of the ESIP federation (herein termed the Federation), the DAACs continue to provide baseline services of low-level data production, archiving, and distribution.

The selected ESIPs are drawn from academia, government, and the private sector. They are charged with distributing and archiving baseline data and information (ESIP-1's); creating specialized scientific products for the Earth science and global change research communities (ESIP-2's); and developing innovative, practical applications of Earth science data for the broader community by producing value-added products (ESIP-3's). Federation members are expected to use information technology that is advanced, scalable, and evolutionary. Descriptions of the individual ESIPs and the overall federation can be found at http://www.ceosr.gmu.edu/~esipfed.

One may examine how the Internet will facilitate the Federation. Its members, the individual ESIP systems, are expected to serve individual user communities with diverse needs and required data and information products. A centralized approach cannot be designed to serve diverse and ever-changing needs. Only a federated approach that relies on specialized data systems that respond to and are in close contact with their user communities can hope to succeed. Moreover, a distributed federation consisting of at least 33 members (it is expected that the membership of the Federation will only increase with time), serving hundreds, and more likely thousands, of diverse users, cannot rely on a specialized, dedicated network. Because of their specialized communities' needs, individual ESIPs and clusters of ESIPs need to be designed to address the complex queries emanating from scientific and applications data exploration, analysis, and data mining tasks. Moreover, due to the distributed nature of the Federation and the requirement for its members to interoperate, support for data, query, and function exchanges is important. Although the basic Internet can serve some basic data access and query functions, the more advanced data exchange and interoperable functionalities require quality of service (QoS) and higher bandwidths that will likely exist in both I2 and NGI. This brings up many implementation and performance issues and trade-offs.

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING

The Federation is following a distributed approach within its own boundaries, forming a "federation of federations," wherein clusters of ESIPs and other related partners are coming together for more efficient and rapid responses to diverse user communities' needs. What we stated above will become even more acute as those clusters or "mini-Federations" are forming.

Federation Interoperability Options

The interoperability options for Federation partners, as specified by the original NASA cooperative agreement notice (CAN) under which the ESIPs were funded, are shown in Figure 1. These options include both catalogue search protocols as well as data search and access solutions.

To enable those options, a variety of interoperability modes, which themselves imply different implementations, are clearly available.

It is clear from the above discussion that the Federation has many options open. At the lowest end, Federation partners could choose the minimum interoperability afforded by data advertisement or directory services such as GCMD, up to catalogue services as provided by the Oak Ridge National Laboratory's DAAC MERCURY system and the CEOS Interoperability Protocol (CIP). The latter has more extended services as well. If data access and exchanges are needed, the Distributed Oceanographic Data System (DODS) or part of the Seasonal to Interannual ESIP (see below) could be utilized. If full functionality transparency is needed, highly interoperable solutions will need to be found. As clusters form, it is clear that different cluster requirements (driven by their combined user communities) will drive the interoperability designs. These will also determine whether the existing Internet can serve these needs (likely if only advertisement services and rudimentary catalogue services are needed) or whether I2 will be required (see below). To a large extent, which options are going to be adopted depends on what the Federation and its clusters are attempting to accomplish. It is likely that more "tight" interoperability options will be pursued by individual clusters of ESIPs which are naturally working together, whereas minimal solutions such as Global Change Master Directory (GCMD) directory and advertisement search options are likely to be adopted by the Federation as a whole.


Below are the 5 SWIL interoperability options specified in the CAN. They permit the ESIP to be automatically searched and queried from remote clients as if part of a larger whole (i.e., a "Federation").

Version 0 (V0) - Provides search and order tools for accessing more than 700 Earth science data and services held at NASA DAACs and NOAA centers.

EOSDIS Core System (ECS) - EOSDIS' infrastructure, to provide [...] architecture for a variety of users.

CEOS Catalog Interoperability Protocol (CIP) - Z39.50 [...] serves as the base protocol for CIP (for search control, ordering, and authentication). CIP provides distributed search, retrieval, order, and other services with access to V0. [...] CIP to FGDC Clearinghouse system [...].

Federal Geographic Data Committee (FGDC) Clearinghouse - A mechanism for distributed search and retrieval of digital geospatial data from multiple sites using a common search vocabulary.

Figure 1. Interoperability in the WP-Federation (from the CAN)

Modes of ESIP Interoperability

Below is the hierarchy of interoperability modes or options. Clearly, these are not the only choices. They are listed roughly from total openness, or no structure, to total transparency. Each step requires additional coordination between ESIPs.

WWW - There could be a page listing all the ESIPs, and each ESIP could provide a link to that page.

GCMD - Each ESIP would submit a description of its data sets to the Global Change Master Directory (GCMD) and allow ESIP data to become compliant with the Federal Geographic Data Committee (FGDC) requirements. It would make ESIP data much easier to find.

Common front page - There could be an HTML page that acts as the "front" to all ESIPs. It would explain the federation and guide users to which ESIP to visit. This option would require more communication between the ESIPs and the maintainers of this common page, because it would contain more information than the WWW level of federation.

Inventory Level Search - This would allow users or another system to know what data are held at another ESIP. There are several different levels of support for this type of [...]

Place orders at other ESIPs - The user (system) could place orders for data from within the interface of another ESIP.

Import data from other ESIPs - ESIPs could freely import data from other ESIPs for interactive sessions with users (systems).

Transparency - Each ESIP would work seamlessly with each other, so the user or another system is not aware of any distinction between ESIPs.

Figure 2. Spectrum of Degree of Interoperability for ESIPs
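The ordering in Figure 2 can be captured as a small ordered type. The sketch below is only an illustration: the names are ours, and it assumes (the figure does not say so explicitly) that each mode subsumes the looser ones and that a cluster can interoperate only at the level of its least capable member.

```python
from enum import IntEnum


class InteropMode(IntEnum):
    """Interoperability modes from Figure 2, ordered loosest to tightest."""
    WWW = 1                # shared page listing all ESIPs
    GCMD = 2               # data set descriptions submitted to the GCMD
    COMMON_FRONT_PAGE = 3  # one HTML front page guiding users to ESIPs
    INVENTORY_SEARCH = 4   # remote systems can see what another ESIP holds
    PLACE_ORDERS = 5       # orders placed from another ESIP's interface
    IMPORT_DATA = 6        # ESIPs import each other's data interactively
    TRANSPARENCY = 7       # ESIPs appear to the user as one system


def supports(implemented, required):
    """Assumption for illustration: a mode subsumes all looser modes."""
    return implemented >= required


def cluster_mode(member_modes):
    """Assumption: a cluster works at its least capable member's level."""
    return min(member_modes)
```

For example, `cluster_mode([InteropMode.TRANSPARENCY, InteropMode.GCMD])` evaluates to `InteropMode.GCMD` under these assumptions.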


A Specific Distributed System in the Federation

The Seasonal to Interannual Earth Science Information Partner (SIESIP) is one of the 12 ESIP-2 systems. SIESIP is described in detail elsewhere (Kafatos et al., 1998; Kafatos, 1999). Here we concentrate on some interoperability and query aspects for this ESIP that might have general relevance for the Federation.

The SIESIP project focuses on serving the data and information product needs of seasonal to interannual (S-I) scientists, and associated process studies specialists and interdisciplinary scientists. It can also serve as an appropriate prototype for the Federation due to its distributed nature and representative functionality. A distributed architecture of three sites forms the mini-federation: George Mason University (GMU), the Center for Ocean, Land, Atmosphere Studies (COLA), and the NASA Goddard DAAC. The SIESIP architecture hides the distributed implementation details from the user and yet supports many functionalities, including online data search, data analysis, and data orders. In addition, SIESIP supports many complex queries for diverse user communities, including content-based queries through a multiresolution representation of data using statistical summaries of important geophysical parameters (Li et al., 1998). An extended range of operations triggers different types of data transfers among the three sites, which use different hardware and software resources for their local implementations. This provides for autonomy and enhancement of local capabilities, constituting a mini-federation that could serve as a prototype of the general Federation. Queries under SIESIP can range from simple data orders to involved data mining tasks that may require on-the-fly integration of physically dispersed data.

The relationships between the three main partners (Figure 3) extend from metadata exchanges (thin arrows) and data exchanges (medium arrows) to full functional exchanges (thick arrow). The interoperable operations and current SIESIP consortium capabilities (such as working prototypes at GMU; see Kafatos et al., 1997) are to be enhanced to serve the specific Earth science S-I community and to provide an innovative information technology query engine and implementation of a working distributed system.

Figure 3. SIESIP as a Prototype for the Earth Science Federation

At the heart of the query engine and system analysis is the Grid Analysis and Display System (GrADS) (Doty et al., 1997). A variety of data products are distributed among the members of the mini-federation and accessible via the WWW (Kyle et al., [...]). One particularly useful data collection is the Climatology Interdisciplinary Data Collection, available on four CDs, which can be ordered/requested from the Goddard DAAC or from GMU (see also http://siesip.gmu.edu/data.html).

Images of associated parameters for each plot (either multiple GIF displays or animations) can be supplied on demand via the Internet. Moreover, correlation coefficients, means, standard deviations, and other statistically derived parameters obtained from the content-based browsing can form a set of new metadata for the system (Li et al., 1998; Kafatos et al., 1998).

The SIESIP functionality and queries allow online data search, analysis, and order. Searches can be performed based on regular metadata or based on data contents via the WWW. Since SIESIP supports data pyramids of different resolutions (Li et al., 1998), a specific resolution could be selected for browsing. With the multiple resolution data, users can start from low resolution images and drill down to higher resolution ones. In this way, users can browse data covering a large spatial and temporal range and then focus on a small interesting range based on the previous browsing result. Because users use data of growing resolution each time they get closer to the target data, the data volume does not increase rapidly.


PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING May 1999 545


Therefore, data mining on a large volume of data can be performed at a reasonable speed. This functional support allows quick queries of potentially large data volumes using the current Internet. In order to support these diverse types of functions, SIESIP search is designed around three types/phases of queries that focus on catalogue search, analysis and content-based search, and data ordering, as follows.

Phase 1: Using the metadata and browse images provided by the SIESIP system, the user browses the data holdings.

Phase 2: The user gets a quick estimate of the type and quality of data found in phase 1. Analytical tools are applied, including statistical functions and visualization algorithms available via the WWW through SIESIP. The SIESIP interface incorporates a spectrum of statistical data mining algorithms. We have also begun to implement tools for finding positive correlations, providing a realistic, human-aided data mining capability. The use of analysis tools such as GrADS to aid the search is also incorporated in this phase.

Phase 3: The user has located the data sets of interest and is ready to order. If the data are available through SIESIP, it will handle the data order; otherwise, an order will be issued to the appropriate data provider on behalf of the user, or necessary information will be forwarded to the user for this task.

The three-phase data search and order functions are an integral part of the SIESIP consortium. Each node (GMU, COLA, and GDAAC) performs one or more of these search and order functions. The SIESIP consortium is, therefore, an IT implementation of the needed functionalities of the distributed system to serve our communities (http://www.siesip.gmu.edu).
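The three phases and the phase 3 fallback can be sketched as a small routing table. This is an illustration only: the assignment of phases and holdings to GMU, COLA, and GDAAC below is hypothetical (the text says only that each node performs one or more of the functions), and the data set names are made up.

```python
from dataclasses import dataclass, field

PHASES = ("catalogue_search", "analysis", "data_order")


@dataclass
class Node:
    name: str
    phases: set                                  # phases this node serves
    holdings: set = field(default_factory=set)   # data set ids held locally


def route(nodes, phase):
    """Return the names of nodes able to serve a query of the given phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    return [n.name for n in nodes if phase in n.phases]


def place_order(nodes, dataset_id, external_provider):
    """Phase 3: order locally if an ordering node holds the data, else forward."""
    for n in nodes:
        if "data_order" in n.phases and dataset_id in n.holdings:
            return ("local", n.name)
    return ("forwarded", external_provider)


# Hypothetical assignment of phases and holdings to the three sites.
NODES = [
    Node("GMU", {"catalogue_search", "analysis"}, {"sst_monthly"}),
    Node("COLA", {"analysis"}),
    Node("GDAAC", {"catalogue_search", "data_order"}, {"precip_daily"}),
]
```

With this made-up configuration, `route(NODES, "analysis")` names GMU and COLA, and an order for a data set that no ordering node holds is forwarded to the external provider, mirroring the "otherwise" branch of phase 3.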

The architecture of the SIESIP mini-federation is designed to support the queries in all three phases in a modular fashion. Specifically, there exist three types of servers, each serving queries in one phase (see Figure 4). In this architecture diagram, the disk associated with a server is located in the same physical location. That is, the communication between the server and the disk associated with it does not go through the Internet. A "Metadata Server" is for phase 1 (or metadata) queries, a "GrADS Server" is for phase 2 (or analysis) queries, and a "Data Order Server" is for phase 3 queries (or data set requests). There is also an experimental "Data Pyramid Server" which will be responsible for data mining queries. The Data Pyramid stores low-resolution data as well as some pre-computed statistics for fast processing of data mining (content-based) queries. An implementation of the SIESIP architecture can have many distributed servers, as well as user interface drivers, located at different physical locations, while a physical system may host a number of different servers and interface drivers. As such, SIESIP covers a number of different federated services.
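To make the data pyramid idea concrete, here is a minimal sketch, not the SIESIP implementation. It assumes a square grid whose side is a power of two, summarizes each 2x2 block as (min, mean, max), and answers a content-based query ("which cells exceed a threshold?") by descending only into blocks whose pre-computed maximum exceeds the threshold. All function names are ours.

```python
def coarsen(level):
    """Halve resolution: each cell summarizes a 2x2 block as (min, mean, max)."""
    out = []
    for r in range(0, len(level), 2):
        row = []
        for c in range(0, len(level[0]), 2):
            block = [level[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)]
            lo = min(b[0] for b in block)
            mean = sum(b[1] for b in block) / 4.0
            hi = max(b[2] for b in block)
            row.append((lo, mean, hi))
        out.append(row)
    return out


def build_pyramid(grid):
    """Return levels from full resolution (index 0) to a single summary cell.

    Assumes a square grid whose side is a power of two.
    """
    pyramid = [[[(v, v, v) for v in row] for row in grid]]
    while len(pyramid[-1]) > 1:
        pyramid.append(coarsen(pyramid[-1]))
    return pyramid


def find_cells_above(pyramid, threshold):
    """Return (hits, visited): full-resolution cells above threshold, and the
    number of cells inspected while drilling down from the coarsest level."""
    frontier = [(0, 0)]
    visited = 0
    for lvl in range(len(pyramid) - 1, 0, -1):
        nxt = []
        for r, c in frontier:
            visited += 1
            if pyramid[lvl][r][c][2] > threshold:  # pre-computed block maximum
                nxt.extend((2 * r + dr, 2 * c + dc)
                           for dr in (0, 1) for dc in (0, 1))
        frontier = nxt
    hits = []
    for r, c in frontier:
        visited += 1
        if pyramid[0][r][c][2] > threshold:
            hits.append((r, c))
    return hits, visited
```

The `visited` count shows how the pre-computed statistics let a query skip blocks that cannot contain a match, which is the property the text relies on when it says that data volume does not grow rapidly during drill-down.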

Figure 4. Conceptual Architecture of a SIESIP Site

The system supports all functionalities through the same interface. A GUI applet based on the Java Swing package will appear in a web page. By using the window-like GUI with menu lists, folders, and more, users can search metadata, make spatial and temporal selections, and submit analysis or order queries.

Implementation Options and Results

In this section we examine some of the technologies available for implementing an interoperability layer for a single ESIP consisting of more than one site (such as in SIESIP) or a cluster of ESIPs in the overall Federation. In doing so, the growing power of the WWW becomes most important. As the WWW continues to grow, it starts to assert its role as an essential, easy-to-use mechanism for delivering information of different modalities in a geographically distributed system. In addition to its ease of use and wide availability, the Web lends itself to the client-server architectural model.

To make data sets available online for a cluster of ESIPs or a mini-federation, data sets and related metadata should be transferred to the system from the original data sources or data producers. For example, metadata ingest at the data product level could be achieved through HTML forms and CGI programs. Data providers could then provide a minimum amount of information regarding the data products by filling in appropriate Web forms. The system operator would then run another program to insert the information caught by the CGI program into the DBMS. The increasing popularity of XML might provide an easy metadata protocol that would minimize anything the data provider would have to do. To ingest metadata, one should consider whether the exchanges are asynchronous, which is often necessary to access data stored in nearline tape libraries; or whether the data are online, in which case the metadata will include the Uniform Resource Locators and a simple FTP (or HTTP) transfer to the client can be utilized.
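The ingest path described above (a provider-supplied XML record, inserted into a DBMS, with online data carrying a URL and nearline data requiring an asynchronous request) might look roughly like the following. This is a sketch under stated assumptions: the XML element names, the table layout, and the example URL are all hypothetical, and SQLite stands in for whatever DBMS a site actually uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record layout; the element names are illustrative only.
SAMPLE = """<dataset>
  <id>precip_monthly</id>
  <title>Monthly precipitation</title>
  <storage>online</storage>
  <url>ftp://example.org/precip_monthly.dat</url>
</dataset>"""


def parse_record(xml_text):
    """Parse one metadata record and check the fields this sketch requires."""
    root = ET.fromstring(xml_text)
    rec = {child.tag: (child.text or "").strip() for child in root}
    for key in ("id", "title", "storage"):
        if not rec.get(key):
            raise ValueError(f"missing field: {key}")
    if rec["storage"] == "online" and not rec.get("url"):
        raise ValueError("online data sets need a URL")
    return rec


def ingest(conn, rec):
    """Insert (or update) the record in the metadata table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets "
        "(id TEXT PRIMARY KEY, title TEXT, storage TEXT, url TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO datasets VALUES (?, ?, ?, ?)",
        (rec["id"], rec["title"], rec["storage"], rec.get("url")),
    )
    conn.commit()


def access_plan(rec):
    """Online data can be fetched directly; nearline data needs an async request."""
    if rec["storage"] == "online":
        return ("direct", rec["url"])
    return ("asynchronous", None)
```

A caller would run `ingest(sqlite3.connect(":memory:"), parse_record(SAMPLE))` and then use `access_plan` to decide between a direct FTP/HTTP transfer and a queued request.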

Recent developments in information technology have resulted in a number of distributed object architectures that provide the framework required for building and using client/server applications that use distributed objects. The framework also supports a large number of servers and applications running concurrently. Many such frameworks provide natural mechanisms for interoperability, for example, the Common Object Request Broker Architecture (CORBA) and Remote Method Invocation (RMI). CORBA is a product of an industry consortium called the Object Management Group (OMG). It is a set of specifications for providing interoperability and portability to distributed object-oriented applications. CORBA-compliant applications can communicate with each other regardless of location, implementation language, underlying operating system, and hardware systems. The RMI specification is a new API that lets one create objects whose methods can be invoked from a different Java Virtual Machine (JVM). The JVM may be running in the same physical machine or a remote server. Thus, RMI basically provides the capability for calling methods on remote objects.
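The idea of calling a method on a remote object can be illustrated without Java. The sketch below deliberately swaps in Python's standard-library XML-RPC modules as a stand-in; it is neither RMI nor CORBA, and the `GridStats` class and its `mean` method are hypothetical. What it shares with those frameworks is the shape of the interaction: the client invokes what looks like a local method, and the work is done in another process or machine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class GridStats:
    """Hypothetical server-side object exposing one remotely callable method."""

    def mean(self, values):
        return sum(values) / len(values)


def start_server():
    """Serve a GridStats instance on a free local port; return (server, port)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(GridStats())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def remote_mean(port, values):
    """Client side: the call looks local but executes in the server."""
    with ServerProxy(f"http://127.0.0.1:{port}") as proxy:
        return proxy.mean(values)
```

Every such call pays for argument marshalling and a network round trip, which is the kind of per-call overhead the measurements later in this section set out to quantify for CORBA and RMI.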

We have chosen to examine the above two options for interoperability tests (although performed for our own ESIP, their applicability is not limited to it and has Federation-wide importance). We have also chosen to consider more basic techniques such as FTP and sockets. This selection was based on the potential for creating low-overhead protocols that may be suitable for a simple baseline class of Federation applications. These can also be used to implement more sophisticated interoperability standards such as Z39.50 and its CIP profile (see above).

In order to assess the impact of using one technique versus the other, we have experimentally studied the performance of CORBA and RMI, as well as lightweight but more primitive solutions such as sockets and FTP. These technologies were tested over 10 Mbps LANs as well as over the Internet. The tested scenarios considered up to 16 clients and two servers. The transferred object sizes ranged from 2KB to 10MB. Table 1 summarizes the testing conditions and the measurements that were collected.
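A measurement harness of the kind such a study needs can be sketched in a few lines. The version below is not the authors' test code: it times raw socket transfers over a local socket pair (so it measures neither a LAN nor the Internet), and the function names and repeat count are our own choices. It reports the median and maximum time per object size.

```python
import socket
import statistics
import threading
import time


def _send_all(sock, payload):
    sock.sendall(payload)


def time_transfer(size_bytes):
    """Time one transfer of size_bytes over a local socket pair, in seconds."""
    sender, receiver = socket.socketpair()
    payload = b"x" * size_bytes
    worker = threading.Thread(target=_send_all, args=(sender, payload))
    start = time.perf_counter()
    worker.start()
    received = 0
    while received < size_bytes:
        chunk = receiver.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    elapsed = time.perf_counter() - start
    worker.join()
    sender.close()
    receiver.close()
    if received != size_bytes:
        raise IOError("short transfer")
    return elapsed


def benchmark(sizes, repeats=5):
    """Return {size: (median_seconds, max_seconds)} over repeated transfers."""
    results = {}
    for size in sizes:
        samples = [time_transfer(size) for _ in range(repeats)]
        results[size] = (statistics.median(samples), max(samples))
    return results
```

Swapping `time_transfer` for a function that performs the same transfer through another mechanism (an FTP fetch, a remote method call) would give comparable numbers per technique, provided the payload sizes and repeat counts are held fixed.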

Figures 5 and 6 show some of the characteristics of these different techniques. All of the compared methods seem to have similar behavior up to 256KB messages. Beyond that point, the overhead associated with CORBA and RMI becomes quite clear. Table 2 summarizes the results and indicates a number of important facts. CORBA seems to be four times slower than RMI, 10 times slower than sockets, and 40 times slower than FTP in a LAN environment when requested objects are significant (2 MB in this experiment). This might be a typical size for a subset of a remote sensing (e.g., MODIS) data set. Due to the large overhead over the Internet, the performance gap between these technologies becomes smaller, less than one order of magnitude. However, the performance difference remains high between these techniques. In these experiments CORBA was not tested over the Internet. However, as understood from the LAN experiment, the performance of CORBA is much lower than that of RMI. It is possible to infer that CORBA's performance is perhaps three to four times worse than RMI's. This is only an estimate based on the fact that RMI was five times better in the LAN experiment and that the common overhead of the Internet environment is likely to bring the gap closer. FTP seems to be optimized to a system's parameters and therefore was able to perform better than sockets, which are not optimized by the operating system developers. Furthermore, managing a lot of sockets has both a performance penalty and scalability limitations.
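For context on the measured response times, it helps to know the floor set by the link itself. The short calculation below gives the theoretical minimum transfer time on the 10 Mbps LAN for object sizes in the tested range. It assumes binary units for the object sizes and ignores all protocol overhead, so real transfers can only be slower.

```python
def min_transfer_seconds(size_bytes, link_bits_per_second):
    """Lower bound on transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / link_bits_per_second


LAN_BPS = 10_000_000         # the 10 Mbps LAN used in the tests
KB, MB = 1024, 1024 * 1024   # assumption: binary units for object sizes


def lan_lower_bounds(sizes=(2 * KB, 256 * KB, 2 * MB, 10 * MB)):
    """Map each object size in bytes to its minimum LAN transfer time in seconds."""
    return {size: min_transfer_seconds(size, LAN_BPS) for size in sizes}
```

Under these assumptions a 2 MB object needs at least about 1.7 seconds and a 10 MB object at least about 8.4 seconds on the LAN. Dividing a measured response time by the corresponding bound gives a rough overhead factor for each technique.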

Conclusions

The above results indicate that in the new emerging era of I2 and NGI, data transfers of single or a few remote sensing images of tens to hundreds of megabytes (such as a MODIS Terra "granule" or a hyperspectral image) will require dedicated networks and high bandwidths as in I2 and NGI. Even in supposedly high bandwidth systems, Quality of Service (QoS) will be most important, where QoS refers to the realized communication bandwidth between two distributed sites. In today's Internet,

CONTINUED ON PAGE 562


Table 1.

Data Object Sizes                2KB to 10MB
Implementations                  CORBA, RMI, Socket, FTP
No. of Clients/No. of Servers    1-16/2
Network                          LAN (10 Mbps) and Internet

Figure 5. Response Time Over the LAN
[Plot: System Response Time (LAN): single user. Response time versus file size for CORBA, RMI, Socket, and FTP.]

Figure 6. Response Time Over the Internet
[Plot: System Response Time (Internet): single user. Response time versus file size (2K to 10M) for RMI, Socket, and FTP.]

Table 2.
[Table layout not recoverable. Surviving entries, in source order: Max 480.04; Median 69.49; 114.34; Median 30.02; Max 34.4, 4428.83; Median, Max: 11.04, 23.25, 183.06, 341.31.]