A Framework for Network RTK Data
Processing Based on Grid Computing
Deming Yin
A thesis submitted in partial fulfilment of the requirements for the degree of
Master by Research
Faculty of Science and Technology Queensland University of Technology
Brisbane, Queensland, 4001, AUSTRALIA
January, 2009
ABSTRACT
Real-Time Kinematic (RTK) positioning is a technique used to provide precise
positioning services at centimetre accuracy level in the context of Global Navigation
Satellite Systems (GNSS). While a Network-based RTK (NRTK) system involves
multiple continuously operating reference stations (CORS), the simplest form of an
NRTK system is single-base RTK. In Australia there are several NRTK services
operating in different states and over 1000 single-base RTK systems to support
precise positioning applications for surveying, mining, agriculture, and civil
construction in regional areas. Additionally, future-generation GNSS constellations
with multiple frequencies, including modernised GPS, Galileo, GLONASS, and
Compass, are either under development or will become fully operational in the next
decade.
A trend in the future development of RTK systems is to make use of the various
isolated network and single-base RTK systems, together with multiple GNSS
constellations, for extended service coverage and improved performance. Several
computational challenges have been identified for future NRTK services including:
challenges have been identified for future NRTK services including:
• Multiple GNSS constellations and multiple frequencies
• Large scale, wide area NRTK services with a network of networks
• Complex computation algorithms and processes
• A greater part of the positioning process shifting from the user end to the
network centre, with the ability to cope with hundreds of simultaneous users’
requests (reverse RTK)
The four challenges faced by future NRTK systems impose two major requirements
on NRTK data processing: expandable computing power and scalable data
sharing/transfer capability. This research explores new approaches
to address these future NRTK challenges and requirements using the Grid
Computing facility, in particular for large data processing burdens and complex
computation algorithms. A Grid Computing based NRTK framework is proposed in
this research: a layered framework consisting of 1) a client layer in the form of a
Grid portal; 2) a service layer; and 3) an execution layer. A user’s request is passed
through these layers and scheduled to different Grid nodes in the network
infrastructure.
A proof-of-concept demonstration for the proposed framework is performed in a
five-node Grid environment at QUT and on Grid Australia. The Networked
Transport of RTCM via Internet Protocol (Ntrip) open source software is adopted to
download real-time RTCM data from multiple reference stations through the
Internet, followed by job scheduling and simplified RTK computing. The system
performance has been analysed and the results have preliminarily demonstrated the
concepts and functionality of the new NRTK framework based on Grid Computing,
whilst some aspects of the performance of the system are yet to be improved in
future work.
ACKNOWLEDGEMENT
Studying abroad has been a long and at times arduous journey, and one that could not
have been completed without the support of many people.
Firstly, I would like to express my sincere gratitude to my supervisors Professor
Mark Looi and Associate Professor Yanming Feng. They brought me into the area of
GNSS positioning, and provided various kinds of resources to facilitate my research.
During the research process, they also made sure that I was on the right track and
supervised the progress of my milestones. And most importantly, they checked my
thesis with great patience and gave me invaluable suggestions.
I also wish to thank Dr Charles Wang for his support in a variety of ways. I really
appreciate the time he spent commenting on my various reports and thesis, often
giving me detailed and constructive feedback. I offer my thanks to PhD candidate
Bofeng Li for his answers to the NRTK problems which confused me a lot at the
early stage.
Computational resources and services used in this work were provided by the High
Performance Computing (HPC) and Research Support Group, Queensland University
of Technology (QUT). Special thanks are also offered to Mr Ashley Wright from
HPC for his untiring assistance with the Grid Australia node Auriga. I am fortunate
to have had access to such an excellent facility and staff. I also gratefully
acknowledge QUT and the Faculty of Information Technology for awarding me
a scholarship to pursue my studies abroad.
Last but not least, I want to extend my deepest appreciation from the bottom of my
heart to my family and my friends, for believing that I could achieve anything I set
my mind to, for encouraging me to take the more challenging path, and for their
unwavering support during my study and my life so far.
STATEMENT OF AUTHORSHIP
“The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the
best of my knowledge and belief, the thesis contains no material previously
published or written by another person except where due reference is made.”
Signature __________________
Deming Yin
Date __________________
TABLE OF CONTENTS
1. Introduction
1.1 Background of Research
1.2 Statements of Research Problems
1.3 Research Aims and Objectives
1.4 Structure of Thesis
2. Review of NRTK and Grid Computing
2.1 Overview of GNSS
2.1.1 GNSS Systems
2.1.2 Real Time Kinematic (RTK) Positioning
2.2 Overview of Grid Computing
2.2.1 Grid Computing
2.2.2 Security
2.2.3 Web Service
2.2.4 Job Scheduling
2.2.5 Globus
2.2.6 Grid Computing Applications
2.3 Relationship between NRTK and Grid Computing
2.4 Summary
3. Design of Framework for NRTK Data Processing Based on Grid Computing
3.1 Requirements of NRTK Data Processing
3.2 Architectural Studies of Grid Computing Based NRTK
3.3 Framework Design
3.4 Design of Different Layers
3.4.1 Client Layer
3.4.2 Service Layer
3.4.3 Execution Layer
3.5 Summary
4. Demonstration of Framework – A Simplified Data Processing Experiment
4.1 Overview of a Proof-of-Concept Demonstration
4.1.1 Introduction to Simplified Data Processing Demonstration
4.1.2 Evaluation of Open Source Ntrip Client
4.1.3 Demonstration Environment
4.2 Requirements and Design of the Proof-of-Concept Demonstration
4.3 Implementation of Demonstration
4.3.1 Globus Configuration
4.3.2 Submitting a Job
4.3.3 Processing Procedure
4.4 Results Analysis and Discussions
4.4.1 Performance Evaluation
4.4.2 Multiple Mountpoints on Each Grid Node
4.4.3 Computation Task Submission
4.4.4 Different Computation Tasks on Each Grid Node
4.4.5 Additional Tests
4.4.6 Discussion
4.5 Result From Lightweight Grid Tool JPPF
4.6 Summary
5. Concluding Remarks and Future Work
5.1 Concluding Remarks
5.2 Future Work
Appendix A: Installation Process of Globus
Appendix B: Web Service for Submitting a Job
References
LIST OF TABLES
Table 1.1 GNSS Present and Future
Table 2.1 Background Processing and Time Requirements
Table 3.1 Background Processing and Time Requirements
Table 3.2 Data Amount of Backing-up Service
Table 3.3 User-System Table
Table 4.1 Open Source Ntrip Client Information
Table 4.2 RTCM 2.x Message
Table 4.3 Configuration of Grid Nodes in Laboratory
Table 4.4 Specifications of the Grid Node at QUT
Table 4.5 Basic Parameters of Grid Nodes
Table 4.6 Brief Installation Process of Globus
Table 4.7 Valid Proxy Generation
Table 4.8 Submitting a Job in Globus
Table 4.9 Job Description in XML Format
Table 4.10 Job Description with File Stage-In and Stage-Out
Table 4.11 Time Factor of Simple Job (Unit: Second)
Table 4.12 Time Factor of a Complex Job (Unit: Second)
Table 4.13 Time Factor of Data Sharing (Unit: Second)
Table 4.14 Time Factor of Data Sharing (Unit: Second)
LIST OF FIGURES
Figure 1.1 SunPOZ Current Network Coverage (Higgins 2008)
Figure 1.2 Planned Distribution of New AuScope GNSS Stations (Rizos 2007)
Figure 2.1 Constellations of GPS (Wikipedia 2007)
Figure 2.2 Fundamental Principle of Positioning
Figure 2.3 RTK Positioning Illustrations
Figure 2.4 Illustration of Single Base RTK and NRTK
Figure 2.5 System Architecture of the Virtual Reference Station (Wanninger 2006)
Figure 2.6 Simplified Grid Topology
Figure 2.7 Layered Grid Architecture vs. the Internet Protocol Architecture
Figure 2.8 Basic Classifications of Grid Computing Organizations
Figure 2.9 Globus GT4 Infrastructures
Figure 2.10 Topology of Grid
Figure 2.11 OGSA Platform Architecture
Figure 2.12 Simplified Scenario of Typical Secured Communication
Figure 2.13 Web Service Architecture
Figure 2.14 Globus Components (Sotomayor and Childers 2006)
Figure 2.15 Schematic View of GT4.0 Components (Foster 2005)
Figure 3.1 Data Sharing Service Among Different Sites
Figure 3.2 NRTK Architecture
Figure 3.3 Grid Computing based NRTK
Figure 3.4 Different Organisation Being Responsible for Different Component
Figure 3.5 Execution Layer Network Topology
Figure 3.6 Service Decomposition and Composition
Figure 3.7 Background Processing Service Decomposition
Figure 3.8 A Simplified Network Infrastructure of NRTK Data Processing
Figure 3.9 Simple CA Solution for Security
Figure 4.1 Ntrip Data Flow Process
Figure 4.2 Grid Environment in a Laboratory at QUT
Figure 4.3 Design of Demonstration
Figure 4.4 Ntrip Data Processing Procedure
Figure 4.5 Job Submission Procedure
Figure 4.6 Calculating Positions on Auriga
Figure 4.7 Calculating Ionospheric Bias on Auriga
Figure 4.8 JPPF Architecture (JPPF 2008)
LIST OF ACRONYMS
ARGN: Australian Regional GNSS Network
CORS: Continuously Operating Reference Stations
DoD: Department of Defence
EDGE: Enhanced Data rates for GSM Evolution
GNSS: Global Navigation Satellite Systems
GPRS: General Packet Radio Services
GPS: NAVSTAR Global Positioning System
GSM: Global System for Mobile Communication
IGS: International GNSS Service
NCRIS: National Collaborative Research Infrastructure Strategy
NPC: Network Processing Centre
Ntrip: Networked Transport of RTCM via Internet Protocol
NRTK: Network RTK
RINEX: Receiver Independent Exchange Format
RTCM: Radio Technical Commission for Maritime Services
RTK: Real Time Kinematic
SAPOS: Satellite Positioning Service of the German National Survey
SV: Space Vehicle
TCAR: Three Carrier Ambiguity Resolution
UMTS: Universal Mobile Telecommunication Service
WS GRAM: Web Service Grid Resource Allocation and Management
1. Introduction
1.1 Background of Research
The research problem of this thesis arose from a Cooperative Research Centre for
Spatial Information (CRCSI) project which faces several challenges for future GNSS
precise positioning services. The CRCSI project 1.04, titled “Delivering Precise
Positioning Services in Regional Areas”, includes several organizations and industry
partners including: Queensland University of Technology (QUT), The University of
New South Wales (UNSW), Queensland Department of Natural Resources and
Water (NRW), Geoscience Australia, Ergon Energy, Leica Geosystems, Trimble
Navigation, etc (CRCSI 2007).
Project 1.04’s main goal is to develop an extension of GNSS precise positioning
services into regional areas to facilitate the adoption of such services in agriculture,
mining, utilities, construction, tourism, defence and environmental protection. With
some similar service networks having been established in areas of high population
density, such as the SunPOZ network for South East Queensland as shown in Figure
1.1, the project aims to extend the current service to sparsely populated regional
areas where the available technology may be characterised as “thin infrastructure”. In
order to achieve this, several research areas have been identified and have to be
explored.
Figure 1.1 SunPOZ Current Network Coverage (Higgins 2008)
Project 1.04’s first research task is to investigate the technical framework for the
current and future GNSS network architecture and operations. This will examine
alternative approaches for precise positioning network architecture beyond the
solutions currently offered by the major commercial suppliers. It will also address
challenges and benefits of next generation GNSS such as GPS-III and Galileo in
terms of all level GNSS services at local, regional and global scales.
The second task of Project 1.04 is to investigate appropriate communications
infrastructure for both the provider’s reference stations and the client’s rover
receivers, when working in remote and sparsely populated regional areas.
The third part of Project 1.04 is to develop a software-based research platform to
investigate and develop the next generation precise positioning services.
This research is closely related to the third part of Project 1.04 and approaches the
software-based research platform by utilising Grid Computing.
1.2 Statements of Research Problems
Over the next decade, new GNSS systems (Galileo, GLONASS, and others) will be
deployed, offering worldwide positioning services with substantially enhanced
capabilities compared to the current GPS system. However, technical challenges
have been identified for future GNSS systems. Firstly, the number of available
satellites will double or triple compared to the current situation (GPS only), and
even quadruple if the proposed Chinese Compass is also considered. Additionally, a
major change in the broadcast frequencies is proposed: the positioning service will
be upgraded from the current dual-frequency basis to a triple- or multiple-frequency
basis. Table 1.1 provides a comparison between current and future GNSS systems.
As more satellites and L-band signals become available, the reference stations in a
Continuously Operating Reference Station (CORS) network need the ability to deal
with increasing amounts of data and greater complexity of data interoperability.
Table 1.1 GNSS Present and Future

System    Satellites          Frequency Bands
          Present  Future     Present  Future
GPS       31       31+        L1, L2   L1, L2, L5
GLONASS   10       24+        G1, G2   G1, G2, G3
Galileo   1        30+        L1       E5A, E5B, L1, E6
Compass   1        35+        E1, E2   E1, E2, E6, L5
As the number of base stations increases in each region, there is a tendency for
different RTK networks in different regions to connect to each other to provide wider
area positioning services. For example, the number of reference stations will grow
from 30 to over 100 in GPSnet (GPSnet 2008) of Victoria, from 11 to 30 in SydNET
(SydNET 2008) of New South Wales, and from 7 to over 20 in SunPOZ of
Queensland. AuScope (AuScope 2008), an organisation for
a National Earth Science Infrastructure Program 2007-2011 funded by National
Collaborative Research Infrastructure Strategy (NCRIS), will establish more than
100 new Australian Regional GNSS Network (ARGN) quality stations throughout
Australia.
Due to increasingly larger ground network scales and updates to GNSS systems, the
computations used to generate differential corrections in the Network Processing
Centre (NPC) will be much more complicated and heavier than those used currently.
Firstly,
more rapid tropospheric and ionospheric model updates, and the adoption of Three
Carrier Ambiguity Resolution (TCAR) will bring additional complexity to the
positioning processes (Feng and Rizos 2005). Secondly, as the purpose of the CRC
project is to deliver precise positioning services in regional areas, the baselines
between stations in NRTK will expand from 70 km to 150-200 km (Feng and Li
2008). Thirdly, background processing, referring to time-consuming processes such
as preparing GNSS orbital polynomial functions, estimating tropospheric grid
models and generating ionospheric grid maps, requires substantial extra computational
capability. As a result, a new solution is needed for these extremely time-consuming
computations (e.g., huge-dimensional matrix calculation) to provide high-quality
real-time positioning services.
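To illustrate why such computations become heavy, consider the least-squares estimation at the core of most network processing: solving the n × n normal equations costs on the order of n³ operations, so a network several times larger multiplies this cost many-fold. A minimal sketch follows; the matrix below is random, standing in for a real design matrix of GNSS observations, and the function name is illustrative only:

```python
import numpy as np

# Least-squares estimate x_hat = (A^T A)^(-1) A^T b. The normal-equation
# solve is O(n^3) in the number of unknowns n, which is why larger networks
# with more stations, satellites and frequencies demand far more computing power.
def normal_equations_solve(A, b):
    N = A.T @ A   # n x n normal matrix
    u = A.T @ b   # right-hand side
    return np.linalg.solve(N, u)

rng = np.random.default_rng(0)
m, n = 200, 50                       # m observations, n unknowns (illustrative sizes)
A = rng.standard_normal((m, n))      # stand-in design matrix
x_true = rng.standard_normal(n)
b = A @ x_true                       # noise-free synthetic observations
x_hat = normal_equations_solve(A, b)
print(np.allclose(x_hat, x_true))    # exact data, so the estimate matches
```

With real observations the right-hand side carries noise and the unknowns number in the thousands for a wide-area network, which motivates distributing this load across Grid nodes.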
Another problem is that a greater part of the positioning process may shift from user
terminals to the network centre, which needs the ability to cope with hundreds of
simultaneous users’ requests. Conventionally, most of the positioning process is done
at the user end. As more of the positioning process is handled at the network centre
or servers, also called reverse RTK, real-time positioning services face a significant
challenge (Lim and Rizos 2007).
To summarize, there are four main aspects that will challenge the computation of
NRTK data processing.
• Multiple GNSS constellations and multiple frequencies
• Large scale, wide area NRTK services with more reference stations
• Increasingly complex computation algorithms for network-based processing
• A large part of the positioning computation process shifting from the user end
to the network centre, with the ability to cope with hundreds of simultaneous
users’ requests (reverse RTK)
In addition to computational requirements, data sharing should also be a significant
post-processing requirement for NRTK. Data sharing among processing centres and
servers sitting in different CORS networks and IGS reference stations/servers will
become very frequent as CORS networks become more extensive. Such data sharing
will bring substantial benefits to nation-wide geodetic activities, for example the
backup of one state’s data in another state’s processing centre in case of an
earthquake.
This research proposes a Grid Computing based framework to address the problems
mentioned above. As the whole project evolves and approaches full Australian
coverage at a later stage, as shown in Figure 1.2, the NPCs of every state may be
integrated into a whole. This network infrastructure will act as the hardware base for
the Grid Computing environment. The corresponding software architecture is also
discussed in detail in this thesis.
Figure 1.2 Planned Distribution of New AuScope GNSS Stations (Rizos 2007)
It is important to note that although the research questions come from the CRCSI
project, the work does not reflect the official position and viewpoints of the project
team of QUT, NRW and UNSW. Instead, the research provides an independent
perspective to the problems.
1.3 Research Aims and Objectives
The aim of this research is to explore the use of Grid Computing as a solution that
provides a high-performance computing platform, thus meeting the requirement of
providing real-time or near real-time corrections to users. The objectives of this
research are outlined below:
• To analyse the time requirements of large scale NRTK.
At the very first stage, the timeliness requirements of large-scale NRTK will be
analysed based on the existing research problems. These requirements will serve as
the basis of the framework to be designed. They are also generic enough to be used
by other approaches to large-scale NRTK problems.
• To propose a Grid Computing-based framework to process data from
multiple reference stations.
This is the main objective of this research effort. A robust, reusable and compact
framework will be developed to cope with the proposed research problems. Many
factors, such as time, computational capability, and the heterogeneous environment,
should be considered during the design process.
• To implement the framework to demonstrate the idea of Grid.
A full implementation of the above framework for Network RTK services (data
collection, correction generation, positioning, etc) involves significant
development efforts and is beyond the scope of this research. A simplified data
processing experiment is instead used as a proof-of-concept demonstration for
the proposed framework based on Grid Computing.
1.4 Structure of Thesis
Apart from a review of existing methods and solutions, this research effort consists
of four tasks, which are conducted in four phases: 1) Examine timeliness
requirements of large scale NRTK; 2) Build a Grid Computing environment for
NRTK data processing using Globus; 3) Reconstruct reference station algorithms
adapted to Grid Computing; 4) Test simplified NRTK positioning cases in the Grid
Computing environment.
The whole thesis is divided into five chapters. The first chapter gives some
background of the research topic and research problems. The second chapter is
dedicated to an overview of NRTK and Grid Computing. The third chapter presents
the requirements of data processing and design of NRTK data processing based on
Grid Computing. The fourth chapter discusses how the Ntrip data processing
experiment is conducted and presents the test results. The final chapter offers
concluding remarks and a brief outline of future work.
2. Review of NRTK and Grid Computing
2.1 Overview of GNSS
This section provides a general overview of GNSS. Topics such as different GNSS
systems, GNSS applications, and various Real Time Kinematic (RTK) principles and
concepts are covered.
2.1.1 GNSS Systems
GNSS Navigation and Positioning
Firstly, we outline the concept of navigation. Navigation is simply about how we can
go from point A to point B. GNSS stands for Global Navigation Satellite
System. Generally speaking, GNSS tells us where we are and how to get to our
destination if we have a GNSS receiver. A GNSS allows receivers to determine their
states (longitude, latitude, altitude, velocity and time) with positioning accuracy of a
few to tens of metres using time signals transmitted along a line of sight by radio
from the satellites. Receivers on the ground with a fixed position can also be used as
reference stations for precise positioning services and scientific purposes (Misra and
Enge 2006).
There are several existing and planned GNSS navigation systems, such as
NAVSTAR Global Positioning System (GPS) of U.S., GLONASS of Russia, Galileo
of Europe and Compass of China. GPS is the first of the new generation of
navigation satellite systems to become operational and is likely to remain the only
fully operational system at least until 2010. GPS cost the U.S. government US$10
billion to develop and costs about US$500 million annually to operate and maintain
(Misra and Enge 2001; Misra and Enge 2006).
GPS consists of three segments: the space segment, the control segment and the user segment.
The Department of Defence (DoD) is responsible for both the Space and Control
Segments. The Space segment shown in Figure 2.1 is composed of satellites in the
sky, which are also called space vehicles (SVs). The baseline constellation comprises
24 satellites deployed in nearly circular orbits with a radius of 26,560 km, a period of
approximately twelve hours, and stationary ground tracks. There are several kinds of
satellites: Block I, Block II, Block IIA, Block IIR (the ‘R’ stands for replenishment)
and Block IIR-M. Block IIR-M and the future GPS III satellites (or Block IIF/Block
III) will transmit additional frequencies. Early sailors relied on stars in the sky to
judge their direction and had to rely on experience when it was rainy or cloudy,
whereas satellites, as man-made stars, function around the clock throughout the year.
Thanks to satellites, it is now much easier to determine where we are on the Earth.
Figure 2.1 Constellations of GPS (Wikipedia 2007)
The Control segment consists of Master Control Station (MCS), monitor stations and
ground antennas. At the heart of the Control Segment is the Master Control Station,
located at the Schriever (formerly named Falcon) Air Force Base near Colorado
Springs, Colorado. The Master Control Station operates the system and provides
command and control functions. The specific functions of the Control Segment are:
• to monitor satellite orbits,
• to monitor and maintain satellite health,
• to maintain GPS Time,
• to predict satellite ephemerides and clock parameters,
• to update satellite navigation messages,
• to command small manoeuvres of satellites to maintain orbit, and relocations
to compensate for failures, as needed
The User segment consists of user receivers. The success of GPS in large-scale civil
use is attributable almost entirely to the revolution in integrated circuits, which has
made receivers compact, light, and an order of magnitude less expensive than was
thought possible twenty years ago. Much higher capabilities and lower receiver
prices have been achieved since the first-generation receivers were designed for
precise positioning.
GPS satellites transmit two kinds of codes: the C/A code for civilian use and the
P(Y) code for military use. Signals are transmitted from the satellites to users on two
frequencies, L1 and L2. Receivers can obtain the C/A code, P(Y) code and carrier
phase measurements from the L1 frequency, and the P(Y) code and carrier phase
measurements from the L2 frequency. Two further civilian signals will become
available in the future: the L2C code on the L2 frequency and the future L5
frequency. As most GNSS systems, except Galileo, were originally developed for
military use, there are always some encrypted codes or frequencies that cannot be
accessed by civilian users.
The fundamental positioning principle is shown in Figure 2.2. Basically, at least four satellites are needed to determine the three-dimensional (x, y, z) position of the receiver together with the receiver clock offset. A variety of GNSS positioning errors affect this process, such as satellite clock and ephemeris errors, ionospheric delay, tropospheric delay, multipath and receiver noise. This is why the RTK concept was introduced: to reduce these positioning errors and provide a better positioning service.
Figure 2.2 Fundamental Principle of Positioning
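The four-unknown solution described above can be sketched numerically. This is a toy illustration under simplifying assumptions, not an operational solver: the satellite geometry, the error-free measurement model and the fixed Gauss-Newton iteration count are all illustrative choices.

```python
import numpy as np

def solve_position(sat_pos, pseudoranges, iters=10):
    """Gauss-Newton solution for receiver position and clock bias.

    sat_pos: (n, 3) satellite positions in metres (n >= 4)
    pseudoranges: (n,) measured pseudoranges in metres
    Returns [X, Y, Z, c*dt].
    """
    x = np.zeros(4)  # start at the Earth's centre with zero clock bias
    for _ in range(iters):
        rho = np.linalg.norm(sat_pos - x[:3], axis=1)      # geometric ranges
        predicted = rho + x[3]                              # add receiver clock term
        # Jacobian: unit vectors from satellites towards the receiver, plus a clock column
        H = np.hstack([(x[:3] - sat_pos) / rho[:, None],
                       np.ones((len(rho), 1))])
        dx, *_ = np.linalg.lstsq(H, pseudoranges - predicted, rcond=None)
        x = x + dx
    return x
```

With perfect, error-free pseudoranges this converges to the true position; with real measurements, the residual errors listed above remain, which is exactly what motivates RTK.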
GNSS Applications
Applications of GNSS exist in both military and civilian fields. GPS and GLONASS
were first developed and adopted for dedicated military use, but they are now
becoming more and more popular for civilian usage. The Galileo and Compass
systems under development are designed for dual use, although some services may
be accessed by authorised users only.
The civil applications of GNSS-based positioning and navigation may be divided
roughly into (Misra and Enge 2006)
• mass market, such as vehicle navigation
• specialized applications such as aviation, transport and space navigation
• professional level, such as high-precision (millimetre-to-centimetre-level)
positioning
The mass market has the largest number of user receivers and accounts for about 90% of the whole GNSS application market, but the accuracy requirements of these consumers are not very strict, usually one to ten metres. More specialised applications, such as safety-of-life transport, liability-critical applications and space navigation, require more accurate positioning results and a more reliable service, also referred to as the integrity and high availability of the GNSS positioning service. The third category is professional-level, high-precision (millimetre-to-centimetre-level) positioning, which is the main concern of this research.
2.1.2 Real Time Kinematic (RTK) Positioning
Basic GNSS Positioning Methods
Two types of tracking ranges are available from GNSS systems to support various positioning methods: code ranges and phase ranges. Code tracking provides estimates of the instantaneous ranges to the satellites, mainly used for single-point positioning or navigation; because the code measurements taken at an instant from different satellites contain measurement errors, they are referred to as pseudoranges. Carrier phase tracking provides measurements of the received carrier phase relative to the phase of a duplicate sinusoidal signal generated by the receiver clock. The carrier phase gives a precise measurement of the change in the satellite-user pseudorange over a time interval, and an estimate of its instantaneous rate of change, or Doppler frequency (Misra and Enge 2006).
RTK Positioning Concept and Principles
Based on the basic positioning methods, RTK was developed to reduce the
positioning errors, and thus provides high-precision (centimetre-level) positioning in
real time through the use of GNSS and communication links.
Positioning errors are similar for users located close to each other. In RTK, a stationary reference station transmits corrections via a communication link to user receivers located within a certain area, usually less than 20 km across, as Figure 2.3 illustrates.
Figure 2.3 RTK Positioning Illustrations
RTK, which is used for kinematic surveying, relies primarily on precise carrier phase
measurements. The most critical part in RTK is rapid ambiguity resolution (AR).
Hence the base stations must be deployed in a dense enough pattern to model
distance-dependent errors to such an accuracy that residual double-differenced
carrier phase observable errors can be ignored in the context of such rapid AR (Rizos
2003), which implies a much higher deployment cost. RTK positioning has therefore been extended from a single-base to a multi-base technique, NRTK.
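The double-differencing at the heart of RTK can be sketched as follows. The dictionary representation and the choice of the lowest-numbered satellite as reference are illustrative assumptions; in practice the highest-elevation satellite is usually chosen as the reference.

```python
def double_difference(phi_base, phi_rover):
    """Form double-differenced observables relative to a reference satellite.

    phi_base, phi_rover: dict sat_id -> carrier phase observation (metres)
    Returns dict sat_id -> DD observable for each non-reference satellite.
    """
    sats = sorted(phi_base)
    ref = sats[0]
    # Single differences between receivers: the receiver clock errors cancel.
    sd = {s: phi_rover[s] - phi_base[s] for s in sats}
    # Double differences between satellites: the satellite clock errors cancel too.
    return {s: sd[s] - sd[ref] for s in sats[1:]}
```

What remains in the double-differenced observable is the geometric term, the integer ambiguity and the residual distance-dependent biases, which is why rapid AR hinges on how well those residual biases are modelled.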
Network-based RTK
One significant drawback of the single-base RTK approach is that the maximum
distance between a reference and a rover receiver is limited, for instance, 10 to 20
kilometres (Rizos 2003), in order to be able to rapidly and reliably resolve the carrier
phase ambiguities. This limitation is caused by distance-dependent biases such as
orbit error, and ionospheric and tropospheric signal refraction. These errors,
however, can be accurately modelled using the measurements of an array of GNSS
reference stations surrounding the rover site. Thus, RTK positioning is extended
from a single base to a multi-base technique. NRTK allows the baselines between reference stations to increase from 10-20 km up to 70 km or even longer. In this way,
NRTK also reduces the number of reference stations needed, which saves significant
cost to the operators and service providers (Wanninger 2006).
NRTK positioning has become a success story in more and more states and countries in recent years. Commercial models have been developed in which users pay to access the reference data they need. At the same time, NRTK cuts costs dramatically by reducing the number of reference stations: according to data from some projects (Wanninger 2006), the required density can drop from about 30 reference stations per 10,000 square kilometres for single-base RTK to 5 to 10 per 10,000 square kilometres for NRTK. An example of the coverage of single-base RTK compared with NRTK is given in Figure 2.4, in which the circles represent the coverage of single-base RTK; coverage increases substantially once single-base RTK systems are linked together to form an NRTK network.
Figure 2.4 Illustration of Single Base RTK and NRTK
Three major steps are used in NRTK (Wanninger 2006):

• fixing the ambiguities among the baselines of the reference network,
• estimating correction model coefficients (modelling the distance-dependent biases), and
• computing an optimum set of reference observations from the observations of a selected master reference station.
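As a rough illustration of the second step, the sketch below models a distance-dependent bias as a plane fitted over the reference network. The planar model and the station coordinates are illustrative assumptions; operational networks use considerably more elaborate correction models.

```python
import numpy as np

def fit_correction_plane(station_xy, corrections):
    """Least-squares fit of c(x, y) = a0 + a1*x + a2*y through the
    corrections observed at three or more reference stations."""
    A = np.column_stack([np.ones(len(station_xy)), station_xy])
    coeffs, *_ = np.linalg.lstsq(A, corrections, rcond=None)
    return coeffs

def interpolate_correction(coeffs, xy):
    """Evaluate the fitted correction plane at an arbitrary (rover) position."""
    return coeffs[0] + coeffs[1] * xy[0] + coeffs[2] * xy[1]
```

A rover anywhere inside the network then receives an interpolated correction rather than the raw correction of its nearest station, which is what lets baselines grow beyond the single-base limit.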
VRS
One of the most popular NRTK concepts is the Virtual Reference Station (VRS), first introduced in the German reference station network SAPOS (Satellite Positioning Service of the German National Survey). The approach takes its name from the fact that observations for a "virtual", non-existent station are created from the real observations of a multiple-reference-station network. This eliminates or reduces systematic errors in the reference station data, allowing a greater distance separation from the reference station for RTK positioning while increasing the reliability of the system and reducing the initialisation time (Retscher 2002; Wanninger 2003). The system architecture of the virtual reference station concept is depicted in Figure 2.5 (Wanninger 2006): a virtual reference station is formed from several reference stations surrounding the user to facilitate the positioning service.
Figure 2.5 System Architecture of the Virtual Reference Station (Wanninger 2006)
Other important concepts relating to NRTK include Continuously Operating
Reference Stations (CORS), Radio Technical Commission for Maritime Services
(RTCM), and Networked Transport of RTCM via Internet Protocol (Ntrip). A brief
introduction to each of them is given below.
CORS
CORS are groups of permanently operating reference stations that form the infrastructure for NRTK. CORS networks can be divided into several categories: global/continental, national, regional and local. Representative examples are the continental CORS run by the International GNSS Service (IGS) and the American National CORS maintained by the National Geodetic Survey (NGS 2006). There are also several CORS networks in Australia: GPSnet in Victoria, SydNET in New South Wales and SunPOZ in Queensland. Most CORS data can be retrieved free of charge using client software, which will be introduced later.
RTCM
RTCM here refers to the data format defined in the recommended standards of the Radio Technical Commission for Maritime Services, usually RTCM 3.x for NRTK (RTCM 2007). An RTCM observation data stream contains quite a few message types. For example, message type 1 carries DGPS data (pseudorange corrections and range-rate corrections), type 3 carries reference station coordinates, and types 18 and 19 carry RTK data (uncorrected carrier phase and uncorrected pseudorange measurements). The RTCM specification can be purchased from the Radio Technical Commission for Maritime Services.
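For illustration, the sketch below pulls the message number out of a single RTCM 3.x transport frame. The frame begins with a 0xD3 preamble byte, followed by 6 reserved bits and a 10-bit payload length; the message number occupies the first 12 bits of the payload. The trailing 24-bit CRC (CRC-24Q) check is omitted here for brevity.

```python
def parse_rtcm3_frame(frame: bytes):
    """Extract the payload and 12-bit message number from one RTCM 3.x frame.

    Frame layout: 0xD3 preamble (1 byte), 6 reserved bits plus a 10-bit
    payload length (2 bytes), the payload, then a 24-bit CRC (not checked).
    """
    if frame[0] != 0xD3:
        raise ValueError("missing RTCM3 preamble 0xD3")
    length = ((frame[1] & 0x03) << 8) | frame[2]         # 10-bit payload length
    payload = frame[3:3 + length]
    msg_number = (payload[0] << 4) | (payload[1] >> 4)   # first 12 bits of payload
    return msg_number, payload
```

A real decoder would verify the CRC-24Q before trusting the payload and then dispatch on the message number to a type-specific field parser.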
Ntrip
Ntrip is an application-level protocol for streaming GNSS data over the Internet. It is a generic, stateless protocol based on the Hypertext Transfer Protocol HTTP/1.1, with HTTP objects extended to carry GNSS data streams (Lenz 2004).
Ntrip is an RTCM standard designed for disseminating differential correction data
(e.g. in the RTCM-104 format) or other kinds of GNSS streaming data to stationary
or mobile users over the Internet, allowing simultaneous PC, Laptop, PDA, or
receiver connections to a broadcasting host. Ntrip supports wireless Internet access
through Mobile IP Networks like Global System for Mobile Communication (GSM),
General Packet Radio Services (GPRS), Enhanced Data rates for GSM Evolution
(EDGE), or Universal Mobile Telecommunication Service (UMTS).
Ntrip is implemented in three system software components: NtripClients,
NtripServers and NtripCasters. The NtripCaster is the actual HTTP server program
whereas NtripClient and NtripServer are acting as HTTP clients (FACC 2007).
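Because Ntrip builds on HTTP, an NtripClient's initial request is ordinary text. Below is a hedged sketch of constructing a version 1.0 request; the mountpoint and credentials are placeholders, and a real client would send these bytes over a TCP socket to the caster and expect an "ICY 200 OK" reply before the correction stream begins.

```python
import base64

def ntrip_request(mountpoint, username, password):
    """Build an Ntrip 1.0 request for a caster mountpoint."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return (f"GET /{mountpoint} HTTP/1.0\r\n"
            f"User-Agent: NTRIP ExampleClient/1.0\r\n"
            f"Authorization: Basic {token}\r\n"
            "\r\n").encode()
```

Requesting the root path instead of a mountpoint returns the caster's source table, which lists the available streams.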
To summarise, CORS provides the base infrastructure for NRTK, while RTCM 3.x and Ntrip define, respectively, the data format and the transport protocol used for data transmission in NRTK.
Real Time Kinematic and near Real Time Kinematic
The difference between Real Time Kinematic and near Real Time Kinematic must be stated here to make the later parts of the thesis clearer: a Real Time Kinematic service must be delivered within 1 second, whereas near Real Time Kinematic may take several seconds.
2.2 Overview of Grid Computing
This section provides a brief review for various Grid computing concepts and
applications.
2.2.1 Grid Computing
A concept closely related to Grid Computing is Peer-to-Peer (P2P) Computing: the sharing of resources between computers, including processing power, knowledge, disk storage and information from distributed databases (Kamath 2001). In the last several years, about half a billion dollars have been invested in companies developing P2P systems (Loo 2007), interest driven by the success of several high-profile P2P applications such as the Napster and Oxford anti-cancer projects (Loo 2003). The possibility of using P2P Computing in our framework is explored in Chapter 3, which designs NRTK data processing based on Grid Computing.
The word Grid is used by analogy with the electric power grid, which provides pervasive electric power and has fundamentally changed our way of living. In 1998, the Grid was defined as follows: "A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Foster and Kesselman 2004). This definition concentrates on computational capability: the aggregated resources of a Grid can exceed those of the largest single computer in the world. Later definitions broaden the focus to coordinated resource sharing and problem solving in multi-institutional Virtual Organizations (VO), where a Virtual Organization is defined as a dynamic set of individuals and/or institutions defined around a set of resource-sharing rules and conditions (Foster and Kesselman 2004). A simplified Grid topology consisting of three Grid nodes is shown in Figure 2.6. The grid nodes can be computer servers or clusters located in different places, connected by various kinds of communication links, such as optical fibre or twisted pair.
Figure 2.6 Simplified Grid Topology
In the context of this research project, a VO comprises grid nodes sitting in the processing centres of different states; the computing resources of these VOs are combined to form the Grid's strong computing capability.
Grid Computing has evolved as an important field in the computer industry by
differentiating itself from distributed computing with an increased focus on resource
sharing, coordination, and high-performance orientation. Grid Computing is trying to
solve the problems associated with resource sharing among a set of individuals or
groups. Figure 2.7 illustrates a layered grid architecture and its relationship to the
Internet Protocol architecture.
Figure 2.7 Layered Grid Architecture vs. the Internet Protocol Architecture
In addition to resource sharing and the formation of virtual organizations, open standards are a key underpinning. Open standards should be used throughout the grid implementation, accommodating other standards-based protocols and frameworks to provide interoperable environments (Joseph and Fellenstein 2004). Grid Computing environments must therefore be constructed upon the following foundations (Joseph and Fellenstein 2004):

• coordinated resources, and
• open standard protocols and frameworks.
There are many organizations in the world striving to achieve new and innovative
Grid Computing environments. Figure 2.8 depicts the Grid Computing organizations
(Joseph and Fellenstein 2004).
Figure 2.8 Basic Classifications of Grid Computing Organizations
Successful adoption of Grid Computing requires adequate infrastructure, security and other key components. The Globus project is a multi-institutional research effort to create a basic infrastructure and high-level services for a computational grid. The Globus GT4 middleware, core and high-level services provide a wide variety of capabilities; Figure 2.9 illustrates the Globus infrastructure based on GT4.
Figure 2.9 Globus GT4 Infrastructures
In the early stages of grid application development, numerous middleware solutions were developed to construct Grid Computing environments. Today, with the emergence of grid service-oriented technologies, including increasingly popular XML-based solutions, valuable solutions are simpler to achieve, and grid middleware is becoming more sophisticated at an aggressive rate. Figure 2.10 shows the topology of these middleware topics.
Figure 2.10 Topology of Grid

The emergence of the Open Grid Services Architecture (OGSA) in 2002 led to a true
community standard with multiple implementations, including, in particular, the
OGSA-based GT 3.0, released in 2003. Building on and significantly extending GT2
concepts and technologies, OGSA firmly aligns Grid Computing with broad industry
initiatives in Service-Oriented Architecture (SOA) and web services. OGSA
architecture is shown in Figure 2.11. A fundamental OGSA concept is that of the
Grid service: a Web service that implements standard interfaces, behaviours, and
conventions that collectively allow for services that can be transient (i.e., can be
created and destroyed) and stateful (i.e., we can distinguish one service instance from
another).
Figure 2.11 OGSA Platform Architecture
The foundational Open Grid Services Infrastructure (OGSI) specification defines the
interfaces, behaviours, and conventions that control how Grid services can be
created, destroyed, named, monitored, and so forth. OGSI defines a set of building
blocks that can then be used to implement a variety of resource layer and collective
layer interfaces and behaviours (Foster and Kesselman 2004).
The WS-Resource Framework (WSRF) is the latest web service standard. WSRF is also known as the stateful web service model, where state refers to the resources the web service needs. OGSI was the model used in Globus Toolkit 3.0, and it is now being replaced by WSRF, WS-Security and the broader set of Web services standards.
2.2.2 Security
Security is a critical issue for Grid Computing: grid nodes must first trust each other before their computing resources, such as CPU cycles and memory, can be shared in a Grid Computing environment. Privacy, integrity and authentication are regarded as the three pillars of secure communication. Privacy ensures the communication between sender and receiver remains confidential, and Public Key Infrastructure (PKI) is the most popular mechanism for implementing it. Integrity ensures the data has not been modified in transit. Authentication verifies that information was indeed sent by the declared sender; the digital signature is the mature technology for realising authentication. A typical secured communication scenario is shown in Figure 2.12, which indicates the role each security pillar plays.
Figure 2.12 Simplified Scenario of Typical Secured Communication
In the first step, the sender encrypts the data with the receiver's public key and sends it to the receiver, who decrypts it with his own private key. These two steps use the PKI mechanism to ensure the privacy of the transferred data. At the same time, the sender also sends a digest, a summary of the original data produced by a hashing algorithm. The final step is for the receiver to generate a digest of the received data using the same algorithm and compare it with the received digest. If the two are identical, the data was not altered by anyone during transfer; otherwise the integrity of the data is compromised and the received data must be discarded. In fact, the digest itself is encrypted with the sender's private key, and the receiver must use the sender's public key to decrypt it, a step omitted from Figure 2.12. Since only the sender holds his private key, a successfully decrypted digest means the sender cannot deny having sent the data. To some extent, this process also guarantees the authentication of the data transfer.
Another important concept in computer security, although not generally considered a 'pillar' of secure communication, is authorization (Sotomayor and Childers 2006), which is used to grant privileges to users. In a Grid Computing environment, a Certificate Authority (CA) is responsible for issuing certificates to users as part of this authorisation process.
2.2.3 Web Service
Figure 2.13 Web Service Architecture
Figure 2.13 depicts the classic web service architecture, which is made up of a service requester, a service broker and a service provider. The service provider registers its service with the service broker using a technology such as UDDI; when the service requester asks for a service, the broker returns the corresponding service description (WSDL) address, and the requester then binds to the provider through SOAP. The whole process is known as SOAP/WSDL/UDDI, after the three key technologies used.
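The publish/find part of this cycle can be caricatured with a toy in-memory broker. This is a deliberate simplification: real brokers implement the UDDI protocol and return full WSDL documents rather than bare URLs, and binding happens via SOAP messages.

```python
class ServiceBroker:
    """Toy registry illustrating the broker's publish and find roles."""

    def __init__(self):
        self._registry = {}

    def publish(self, service_name, wsdl_url):
        """Provider registers a service description address with the broker."""
        self._registry[service_name] = wsdl_url

    def find(self, service_name):
        """Requester looks up a service; returns None if it is unknown."""
        return self._registry.get(service_name)
```

In the NRTK framework, a processing-centre service would publish itself in this way so that clients can discover it before binding.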
2.2.4 Job Scheduling

Job scheduling is one of the most important topics in Grid Computing: a central task of a Grid is to distribute users' requests across different Grid nodes, which is the essence of job scheduling. Job scheduling has been studied extensively over the past 50 years, including intelligent scheduling algorithms and meta-scheduling. Classic scheduling models deal with job shop, flow shop, open shop, cycle shop and online scheduling, among others; some new scheduling models have emerged from applications in computer science, while others come from the operations research and management community (Leung 2004).

For data-intensive projects, data location must also be taken into consideration when scheduling jobs. Scheduling must sometimes account for other localised factors as well, such as the distribution of computing resources, so it is rarely easy to devise an optimal job scheduling policy.
2.2.5 Globus
Globus (Globus 2008) is the most popular Grid Computing toolkit, developed by the Globus team at the University of Chicago and Argonne National Laboratory. At this stage, Globus is the de facto standard for Grid Computing, and the latest version is Globus Toolkit 4 (GT4).

Globus's kernel components include GSI for security, GridFTP for data management, WS GRAM (Feller, Foster et al. 2007) for execution management, Index for information services and the Python/C/Java Core for the common runtime, as Figure 2.14 depicts. Most Globus components, such as WS GRAM and the Common Runtime, are built upon web services. Some, however, are not web service based, such as GridFTP, which uses a Globus-specific protocol that has since become an industry standard.
Figure 2.14 Globus Components (Sotomayor and Childers 2006)
The security component underpins the other components and comprises various authentication, authorisation and delegation solutions. Data management is designed for high-throughput data transmission. Execution management takes care of job scheduling and maximises the performance and efficiency of job execution; local adapters are normally needed to achieve this, although the Fork adapter provided with Globus can be used as a simple solution. The function of information services is to monitor and discover resources and services on Grids, with the Monitoring and Discovery System being one such solution. The common runtime consists of web-service-based client solutions, such as the C Runtime and Java Runtime, and serves as the Globus interface for both users and programmers.
A schematic view of the GT4.0 components is given in Figure 2.15. Requests from the client end, written in Java, C or Python, are forwarded to the server end and authenticated by WS-Security with GSI. At the server end, client requests are processed by the corresponding components, implemented either as Java services in Apache Axis plus GT libraries and handlers, or as C services using GT libraries and handlers.
Figure 2.15 Schematic View of GT4.0 Components (Foster 2005)
In the next chapter we demonstrate how to utilise the Globus Toolkit as the basis of our framework for NRTK data processing. The Globus Toolkit will provide the security infrastructure for the whole Grid environment and will be able to schedule jobs through web services.

2.2.6 Grid Computing Applications

Grid Computing has been successfully applied in many fields. SAP AG has modified its flagship product to be Grid-based, including applications that form part of SAP's core product suite, such as Workforce Management (WFM), Customer Relationship Management (CRM) and Supply Chain Management (SCM), which dynamically adjust the number of worker processes used to meet computational demands.
Grid is also extensively used in high-performance data capture, the network for earthquake engineering simulation, the Earth System Grid (ESG), the Open Science Grid, the Biomedical Informatics Research Network (Ellisman and Peltier 2004) and federated computing for high-energy physics. In the famous Large Hadron Collider (LHC) project (CERN 2008) operated by the European Organisation for Nuclear Research (CERN), the Grid facility helps transfer the large amount of experimental output data (roughly 15 Petabytes) to tens of countries around the world for round-the-clock analysis. In the biomedical and biochemical industry, large computations for experiments are carried out in Grid environments. A single-sign-on portal based on Grid computing provides Geoscience community researchers with access to a myriad of computational and data resources (R.Fraser, T.Rankine et al. 2007).

2.3 Relationship between NRTK and Grid Computing

The concepts of NRTK and Grid Computing are very similar from a high-level perspective. Just as there are many reference stations in NRTK, there are many Grid nodes in Grid Computing: each reference station, or each server in a Network Processing Centre, can be mapped to one Grid node. Grid Computing will provide the high computational capability that NRTK needs by utilising the computational resources of every Grid node. Globus will be installed in each Grid node to collect data or carry out computing tasks.

As far as concrete computational tasks are concerned, Grid Computing will mainly take charge of background processing to support the overall real-time positioning service. As Table 2.1 shows, most background processing need only be conducted every several minutes or hours. In this way, the computational capability of the Grid can be used to run these background processes and send the results back to a server, which is responsible for integrating all the results together to facilitate the real-time processing service.

Table 2.1 Background Processing and Time Requirements

Background processing | Time requirement of updating
Orbital interpolation | Every 2 hours for each satellite
Zenith Tropospheric Delay (ZTD) estimation | Every 5 minutes
Ionospheric grid map generation | Every 30 to 60 seconds
2.4 Summary

This chapter has given the background knowledge of NRTK and Grid Computing, which is the basis of this research topic. A review of GNSS was presented first, followed by Real-Time Kinematic (RTK) positioning. In the RTK positioning review, the concept of NRTK was introduced along with related topics such as CORS, RTCM and Ntrip. The second part of the chapter summarised the Grid Computing literature; critical aspects such as Virtual Organisations, security, web services, job scheduling and the de facto toolkit Globus were briefly presented.

This chapter has been dedicated to the background knowledge review. The design of the new framework for NRTK data processing is presented in the next chapter.
3. Design of Framework for NRTK Data Processing Based on Grid Computing
This chapter provides the design of the grid computing based framework for NRTK.
As stated in the statements of research problems (see Chapter 1), there are mainly
four aspects that will bring challenges to the computation of NRTK.
• Multiple GNSS constellations and multiple frequencies
• Large scale, wide area NRTK services with more reference stations
• Complex computation algorithms for network-based processing
• Greater part of positioning processes shifting from the user end to the network centre, with the ability to cope with hundreds of simultaneous users' requests (reverse RTK)
Section 3.1 of this chapter provides the grid computing framework design
requirements that were identified based on the four challenges faced by future NRTK
systems. Section 3.2 provides a detailed architecture design of the current NRTK
system as well as the general grid computing architecture. Section 3.3 outlines the
developed framework for this research. Each of the different layers in the framework
is presented and discussed.
3.1 Requirements of NRTK Data Processing
There are two main requirements for NRTK data processing, expandable computing
power and scalable data sharing/transferring capability.
Computational requirement
As far as concrete computational tasks are concerned, Grid Computing will mainly take charge of background processing to support the overall real-time positioning service. Background processing refers to time-consuming processes such as preparing GNSS orbital polynomial functions, updating reference station coordinates, estimating tropospheric grid models and generating ionospheric grid maps. These corrections vary slowly and can be updated with different delays, for instance up to 30 seconds for ionospheric grids and a few minutes for tropospheric corrections, as Table 3.1 shows. In this way, the computational capability of the Grid can be used to run these background processes and send the results back to a server, which is responsible for integrating all the results to facilitate the real-time processing service.
Table 3.1 Background Processing and Time Requirements

Background processing | Time requirement of updating
Orbital interpolation | Every 2 hours for each satellite
Zenith Tropospheric Delay (ZTD) estimation | Every 5 minutes
Ionospheric grid map generation | Every 30 to 60 seconds
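The update cadences in Table 3.1 can be expressed as a trivial due-task check that a coordinating server might run once per epoch. The task names and the fixed 30-second ionospheric interval are illustrative assumptions taken from the table's lower bound.

```python
# Update intervals in seconds, following Table 3.1.
INTERVALS = {
    "orbital_interpolation": 7200,   # every 2 hours per satellite
    "ztd_estimation": 300,           # every 5 minutes
    "ionospheric_grid": 30,          # every 30 to 60 seconds (lower bound used)
}

def tasks_due(t_seconds):
    """Return the background tasks due at epoch t (seconds since service start)."""
    return [name for name, period in INTERVALS.items() if t_seconds % period == 0]
```

Each due task would then be dispatched to a Grid node, with the server merging the returned products into the real-time correction stream.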
Data sharing/transferring capability
In addition to the computational requirements, NRTK also has data sharing and transfer requirements. Data sharing among the processing centres of different CORS networks and IGS reference stations will become very frequent as each CORS network's coverage grows. Such data sharing will bring many benefits to nation-wide geodetic activities, for example backing up one state's data in another state's processing centre in case of earthquakes.
Figure 3.1 Data Sharing Service Among Different Sites
As shown in Figure 3.1, sites in different states and places are connected to form a single Virtual Organisation in the Grid Computing environment by trusting each other. Different sites can share intermediate positioning results with one another: for example, SydNET can ask GPSnet and SunPOZ to send data to it, in which case SydNET acts as the Grid server while GPSnet and SunPOZ act as Grid nodes. In fact, every site can serve as either server or node.
One obvious data transfer application is the backup service. Since it is never wise to put all the eggs in one basket, data should be backed up in several places on a daily, weekly or monthly basis, depending on requirements, to prevent disastrous losses in emergencies. The amount of data generated by one reference station can be 1-2 KB per second, about 75 MB per day. The data volumes accrued during the backup process are shown in Table 3.2.
Table 3.2 Data Amount of Backing-up Service

Stations | Per day | Per week | Per month | Per year
10 stations | ~750MB | ~5.25GB | ~160GB | ~2TB*
20 stations | ~1.5GB | ~10.5GB | ~320GB | ~4TB
50 stations | ~3.75GB | ~26.25GB | ~800GB | ~10TB
100 stations | ~7.5GB | ~52.5GB | ~1.6TB* | ~20TB

* 1TB=1024GB, 1GB=1024MB, 1MB=1024KB
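The per-day figures can be reproduced with a one-line conversion. The 1 KB/s rate and binary megabytes are assumptions; at that rate one station logs about 84 MB per day, consistent in order of magnitude with the ~75 MB quoted above.

```python
def daily_volume_mb(stations, rate_kb_per_s=1.0):
    """Raw logging volume per day, in binary megabytes, for a station network."""
    return stations * rate_kb_per_s * 86400 / 1024
```

Scaling the rate towards 2 KB/s doubles every entry, which is why the backup framework must be sized for the upper end of the range.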
Hundreds of reference stations are being deployed around Australia by different
bodies, such as IGS, government departments and private owners. Even a single
CORS network, such as GPSnet, is planning to set up about 100 reference stations
within a few years. The data volume will also increase considerably once GLONASS
and Galileo become fully operational. These changes create a strong requirement for
a consolidated and collaborative data sharing framework among different sites.

Both the critical computational requirements and the large-scale data sharing services
need an integrated collaborative framework to support the whole process, which can be
provided by a well-structured Grid RTK data processing framework.
3.2 Architectural Studies of Grid Computing Based NRTK
The concept of NRTK coincides with Grid Computing to some extent: both are
network architectures, and both offer new, more efficient solutions to emerging
problems.
NRTK can be divided into three parts, as Figure 3.2 illustrates. Part One consists of
reference stations generating data streams at a rate of 1-2 kbps. Part Two is the
Network Processing Centre, which is critical for processing and computing the
received data (from Part One) and forwarding the results to users (Part Three). Part
Three consists of users receiving correction data from the Network Processing Centre
through mobile or wireless networks, such as GSM, GPRS, EDGE, or UMTS. To some
extent the last part of the NRTK architecture is transparent in our project.
(Figure: Part One, the reference stations; Part Two, the Network Processing Centre
running the network processing application on toolkits and middleware such as
Globus; Part Three, the rover users.)
Figure 3.2 NRTK Architecture
3.3 Framework Design
Taking all of the requirements and architectural studies into consideration, the
framework for NRTK data processing based on Grid Computing is designed with
three layers: the Client layer, the Service layer, and the Execution layer. The overall
framework is depicted in Figure 3.3.
(Figure: the Client layer, a Grid portal in Java/C/Python, sends requests through a
task pipe to the Service layer, which provides security, web services and job
scheduling and dispatches work to the Grid nodes of the Execution layer; results
flow back to the client.)
Figure 3.3 Grid Computing based NRTK
Client layer
The Client layer is responsible for interacting directly with the user. When a
request comes in, the Client layer receives the user's approximate position and returns
the accurate position once the result is ready. The user receives not only
the accurate position but also value-added services, such as commercial
services around this position, which can be provided by merchants in the area
(Lim and Rizos 2007). Some products built on the same commercial idea already
exist, such as Google Earth and Google Maps.
Service layer
The request is then forwarded to the Service layer, where the functions that fulfil
the Client's requests are implemented. A Network RTK project is usually so large
that no single organisation can complete all the work; normally, different components
of the whole process are undertaken by different universities and companies. For
example, one organisation may be responsible for the positioning component and
another for integrity monitoring, as depicted in Figure 3.4.
(Figure: Organisation A runs the positioning component and Organisation B the
integrity monitoring, each drawing data through Ntrip clients from Ntrip casters
and an IGS ephemeris server.)
Figure 3.4 Different Organisation Being Responsible for Different Component
Various methods can be used here to implement the requirements, such as the
conventional VRS-construction method or the newly proposed server-based NRTK
framework (Lim and Rizos 2008). As substantial computational capability is needed
in this layer to calculate the VRS for hundreds of users simultaneously, the
computational tasks are distributed to different Grid nodes, such as an NPC in
Queensland or an IGS centre in Germany, through the GT4 scheduler.
To make full use of the computational resources of all Grid nodes, the
whole algorithm should be modularised carefully. For example, if there are six Grid
nodes, the whole algorithm can be modularised as:
1. Data collection and conversion from each station
2. Network-based Ambiguity Resolution
3. Network-based ionospheric grid generations
4. Network-based tropospheric grid generations
5. GNSS orbital corrections
6. Integrity monitoring
If there are only three Grid nodes, however, the computation tasks may be shared
among the nodes, for example one node for the tropospheric bias calculation, one for
the ionospheric bias calculation, and one for the remaining bias calculations, so that
each module is mapped to a Grid node.
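The mapping of modules onto however many Grid nodes are available can be sketched as a simple round-robin assignment. This is only an illustrative scheme, not the GT4 scheduler's actual policy; the module names follow the six-item list above.

```python
# The six processing modules from the text.
MODULES = [
    "data collection and conversion",
    "network-based ambiguity resolution",
    "network-based ionospheric grid generation",
    "network-based tropospheric grid generation",
    "GNSS orbital corrections",
    "integrity monitoring",
]

def assign_modules(modules, node_count):
    """Map modules to Grid node indices round-robin: with six nodes the
    mapping is one-to-one, with three nodes each node runs two modules."""
    plan = {n: [] for n in range(node_count)}
    for i, module in enumerate(modules):
        plan[i % node_count].append(module)
    return plan
```

With three nodes, `assign_modules(MODULES, 3)` gives each node two modules, matching the sharing described above.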
Execution layer
The Execution layer consists of all the Grid nodes, the place where the jobs are
actually done. This layer has to deal with the network topology, security and some
other heterogeneity-related issues. As far as network topology is concerned, the fact
that most reference stations are already connected to servers sitting in different NPCs
(Network Processing Centres) provides a consolidated basis on which to construct a
Grid Computing environment. All we have to do is deal with the interconnection of
the servers and their communication. As illustrated in Figure 3.5, the network will
connect the CORS processing centres of different states, such as GPSnet of Victoria
and SydNET of NSW, together with IGS worldwide stations.
Figure 3.5 Execution Layer Network Topology
Finally, the results are collected from all Grid nodes involved in the processing and
returned to the Service layer, which responds to the Client's request.

The whole framework is loosely coupled, as each layer exposes only its interface to
the other layers. If one layer needs to change, only that layer has to be modified; the
other layers are not affected. Service composition and aggregation will also be used
to reuse some of the basic services, which can then form more meaningful and more
extensive services for the Client layer.
3.4 Design of Different Layers
3.4.1 Client Layer
The Client layer can be designed as a Grid portal, which can be implemented in C,
Java, or Python. The Client layer invokes the endpoints of appropriate Web Services
to pass requests with the necessary parameters to the next layer. After the Service
layer finishes the request, the Client layer collects the results and sends a response if
needed.
According to the client's requirements and budget, different levels of Single Sign On
(SSO) interface or portal can be designed for the Client layer. With a small budget,
the simplest form of SSO can be adopted, as depicted in Table 3.3. Every person,
shown as 'User' in the table, first registers in the portal system and receives a user-id
and password. Originally, every user has a separate username and password pair for
each system, such as the positioning service or the data sharing service; these pairs
are then mapped to the user-id in the portal. The final mapping relationships should
look like Table 3.3. In this way, once the user logs on to the portal, logging on to the
other subsystems is not required.
Table 3.3 User-System Table

User  System  Username  Password
001   A       System A  1234
001   B       System B  2345
001   C       System C  AAAA
001   D       System D  CCCC
001   ...     ...       ...
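The user-system mapping of Table 3.3 can be sketched as a lookup table. The sketch below is illustrative only; the user-ids, usernames and passwords are the hypothetical sample values from the table, and a real portal would keep the store encrypted rather than in a dictionary.

```python
# Hypothetical credential map from Table 3.3: (portal user-id, system)
# maps to that subsystem's (username, password) pair.
CREDENTIAL_MAP = {
    ("001", "A"): ("System A", "1234"),
    ("001", "B"): ("System B", "2345"),
    ("001", "C"): ("System C", "AAAA"),
    ("001", "D"): ("System D", "CCCC"),
}

def subsystem_login(user_id, system):
    """After one portal login, look up the stored credential pair for a
    subsystem so the user is not prompted again."""
    try:
        return CREDENTIAL_MAP[(user_id, system)]
    except KeyError:
        raise PermissionError(
            "no mapping for user %s on system %s" % (user_id, system))
```

Once user 001 has logged on to the portal, `subsystem_login("001", "A")` yields the pair needed to reach system A transparently.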
Of course, if the client has sufficient budget and time, a more complex but more
robust portal solution can be implemented. For example, Kerberos (Wikipedia 2008)
can be adopted as the authentication protocol. In that case, one or several central
servers must be maintained to store the user-system mapping, along with a third
party to verify the users' and servers' identities.
3.4.2 Service Layer
A web service is a set of functions implemented in traditional languages, such as C
or Java, or in scripting languages. eXtensible Markup Language (XML) is used
to expose a web service for outside invocation, so web services are platform
independent, which is an important feature for Grid Computing given its
heterogeneous environment.
WS-Resource Framework (WSRF) and WS-Resource Transfer (WS-RT) are the
latest web service standards. WSRF is also known as the stateful web service
standard, where state stands for the resources the web service needs. WS-RT
combines elements of the original WSRF standard with the WS-Management
standards to enable easier exchange of resource information and objects between
different components. All the web services developed in this project will run in the
Globus Toolkit, and as only WSRF is supported at this stage, WSRF will be the first
standard we comply with. At a later stage, we may use delegation or XSLT to
migrate WSRF web services to WS-RT compliance if WS-RT is supported by future
versions of the Globus Toolkit (Cafaro 2007).
The Service layer receives requests from the Client layer; the corresponding web
service is then invoked to fulfil the user's request. Several core services are described
below.
Real Time Positioning Service This is the most important service for the NRTK
mission. As it is complex, it has to be decomposed into several smaller services
according to the NRTK positioning algorithm mentioned in the Network RTK part of
Section 2.1.2:

• The first service determines and reconfirms ambiguities among the baselines of
the reference network.

• The second service generates high-rate ranges and updates the correction models
for the distance-dependent biases from all stations.

• The third service computes a user position with observations from the user and a
selected set of reference stations.

• The fourth service, designed as a parallel service to the third, computes a user
position with user observations, correction grids, and high-rate ranges from the
nearest reference station.
The Real Time Positioning service can be invoked by operators in any NPC or
server to provide services to their user group. Of course, this service can also be
invoked by a central server if necessary. Figure 3.6 illustrates the idea of service
decomposition and composition.
Figure 3.6 Service Decomposition and Composition
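The decomposition and composition idea can be sketched as a chain of functions, one per sub-service. The bodies below are stubs, not the real positioning algorithms; every name and return value is illustrative only.

```python
def resolve_network_ambiguities(network_obs):
    # Stub for the first service: confirm ambiguities over the
    # reference-network baselines (real logic omitted).
    return {"ambiguities": "fixed", "baselines": len(network_obs)}

def generate_corrections(network_obs, ambiguities):
    # Stub for the second service: high-rate ranges and correction models
    # for the distance-dependent biases.
    return {"corrections": "grid", "based_on": ambiguities["ambiguities"]}

def compute_user_position(user_obs, corrections):
    # Stub for the third/fourth services: a user position from user
    # observations plus the network corrections.
    return {"position": (0.0, 0.0, 0.0), "used": corrections["corrections"]}

def real_time_positioning(network_obs, user_obs):
    """Composition: the decomposed sub-services invoked in sequence to
    form the overall Real Time Positioning service."""
    amb = resolve_network_ambiguities(network_obs)
    cor = generate_corrections(network_obs, amb)
    return compute_user_position(user_obs, cor)
```

The composed `real_time_positioning` is what the Client layer would see as a single service, while each stage can be scheduled onto a different Grid node.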
Background processing service As defined in 3.1, background processing is
performed after the fact for time-consuming tasks. It processes past data sets
and generates slowly varying corrections. Three major smaller services include:

• Orbital Interpolation service: convert the IGS predicted orbits in SP3 format to
polynomial functions (Feng and Zheng 2005). The process is conducted every 2
hours for each satellite.

• Zenith Tropospheric Delay (ZTD) estimation, based on ambiguity-resolved
double-differenced phase measurements and zero-differenced phase
measurements for each reference station, which are updated every 5 minutes.

• Ionospheric grid map generation, a network adjustment from the smoothed TEC
solution from each station and DD ionospheric solutions, updated every 30 to 60
seconds.

Service decomposition and composition including the background processing service
is shown in Figure 3.7.
Figure 3.7 Background Processing Service Decomposition
Data sharing service The Data sharing service is provided for data sharing
among different processing centres or reference stations. The service is all about
collaboration, such as the sharing of intermediate results, background processing
outputs and solutions for nation-wide real-time, near real-time, and post-mission
services. It is also possible that, if appropriately configured, each facility could
theoretically provide cross-border redundancy to other CORS networks in the case
of disruption to a particular service (Hale 2007).

Data transferring service Basically the data transferring service is based on the File
Transfer Protocol (FTP). As the magnitude of data transferred among different
processing centres can reach terabytes (TB), the GridFTP protocol from the Globus
Toolkit will be adopted to keep the whole process efficient and reliable.

3.4.3 Execution Layer

The execution layer is the most fundamental and important layer in the NRTK data
processing framework. It has to deal with the network topology, security and some
other heterogeneity-related issues.
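As a concrete sketch of the GridFTP-based data transferring service described in Section 3.4.2, the snippet below assembles a globus-url-copy command line for a server-to-server transfer. The host names and paths are hypothetical; the -p flag requests parallel data streams, and the default GridFTP port 2811 is assumed.

```python
def gridftp_copy_command(src_host, src_path, dst_host, dst_path, parallel=4):
    """Assemble a globus-url-copy invocation for a third-party (server to
    server) GridFTP transfer with parallel data streams."""
    src = "gsiftp://%s:2811%s" % (src_host, src_path)
    dst = "gsiftp://%s:2811%s" % (dst_host, dst_path)
    return ["globus-url-copy", "-p", str(parallel), src, dst]
```

A nightly backup script could build and run one such command per archived data file, for example copying a day's observations from one state's processing centre to another's.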
The first issue with the network infrastructure is the network topology. As
illustrated in Figure 3.8, the network will connect the CORS processing centres of
different states, such as GPSnet of Victoria and SydNET of NSW, together with IGS
worldwide stations.
Figure 3.8 A Simplified Network Infrastructure of NRTK Data Processing
The second issue that the design of the network infrastructure has to solve is security.
In the early experimental stage, a simple Certificate Authority (CA) could be adopted
to implement the authentication and authorisation process, as shown in Figure 3.9. One
site will be chosen to act as the simple CA server, and a backup server should also be
considered, as the CA server is the most critical component in this simple CA
security solution. Each site or user has to apply for a user certificate from the simple
CA, through X.509 or other security protocols (Foster, Kesselman et al. 1998).
Figure 3.9 Simple CA Solution for Security
For a more robust solution, a third-party CA has to be introduced to assist the
authentication and authorisation process. MyProxy would be a good third-party CA
centre to choose. Users have to apply for their certificates from the MyProxy CA
centre, and they have to maintain a connection with the centre during a
communication session so that the server can authenticate the users' privileges.
3.5 Summary
This chapter focused on how the layered framework for NRTK data processing is
designed. The high-level framework was introduced first, followed by details of
each individual layer. The framework is designed on the basis of Grid Computing,
namely the Globus Toolkit: the user's request is forwarded through each layer to the
Grid environment and scheduled onto the Grid nodes through Globus.

Chapter 4 demonstrates the framework of Grid Computing for use in Network RTK
data processing and positioning services. The implementation procedures are
described in detail and the results are used to justify the power of Grid Computing.
4. Demonstration of Framework – A Simplified Data Processing Experiment
The implementation of the above framework for full Network RTK services (data
collection, correction generation, positioning, etc.) involves significant development
effort. Thus, a simplified data processing demonstration is used as a proof of
concept for the proposed framework based on the Grid Computing idea. Some open
source software packages are evaluated and used during the demonstration.
First, an overview of the simplified data processing demonstration is presented.
This includes the introduction of the open source Ntrip client (for data gathering and
processing) and an overview of the demonstration environments. The implementation
of the framework is then presented, including the Globus configuration, job
submission, and details of the Ntrip data processing procedures. The last section of
the chapter presents the result analysis and discussion, where different scenarios are
tested and evaluated against various performance measurement factors.
4.1 Overview of a Proof-of-Concept Demonstration
4.1.1 Introduction to Simplified Data Processing Demonstration
The first step of the simplified data processing demonstration is the collection of data
through the open source Ntrip Client software; the concept of Ntrip was introduced
in Chapter 2, and its data flow is briefly described below.

Ntrip is composed of three components: Ntrip Server, Ntrip Caster and Ntrip Client.
The Ntrip Server is responsible for collecting GPS data from various GPS receivers
and reference stations, which is then transferred to an Ntrip Caster, such as EUREF-IP
(www.euref-ip.net:80) or IGS-IP (www.igs-ip.net:80). Every Ntrip Caster address
contains an IP address and a port number. Finally, the Ntrip Client accesses the data
from the Ntrip Caster through mountpoints, which stand for different reference
stations. The overall Ntrip data flow is shown in Figure 4.1 (Weber G. 2005).
(Figure: reference stations feed Ntrip Servers, which stream data to Ntrip Casters
such as www.euref-ip.net:80 and www.igs-ip.net:80; Ntrip Clients access the
streams through mountpoints such as ACOR0, BOCH0 and ZIM20.)
Figure 4.1 Ntrip Data Flow Process
For this demonstration, Ntrip is used to collect data from reference stations.
First, commands are sent from the grid server to grid nodes, requesting data
download from one or multiple mountpoints through the Ntrip Client software. Once
the data has been downloaded, computing jobs are launched for data processing and
analysis. Finally, the results are returned to the server.
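The client end of this data flow can be sketched by assembling an Ntrip 1.0 request, which is essentially a plain HTTP GET naming the mountpoint, optionally with Basic authentication. The user-agent string and credentials below are illustrative; a real client would send this over a TCP socket to the caster's host and port and read the RTCM stream from the response.

```python
import base64

def ntrip_request(mountpoint, user=None, password=None):
    """Assemble an Ntrip 1.0 client request for one caster mountpoint
    (e.g. ACOR0 on www.euref-ip.net:80); credentials are optional."""
    lines = [
        "GET /%s HTTP/1.0" % mountpoint,
        "User-Agent: NTRIP SimpleClient/0.1",  # illustrative agent string
    ]
    if user is not None:
        token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
        lines.append("Authorization: Basic " + token)
    # Ntrip 1.0 requests end with a blank line, like plain HTTP/1.0.
    return "\r\n".join(lines) + "\r\n\r\n"
```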
4.1.2 Evaluation of Open Source Ntrip Client
Open source Ntrip clients are used to obtain the RTCM data in this research
project. There are quite a few open source Ntrip clients provided by the Federal
Agency for Cartography and Geodesy (BKG) of Germany (FACC 2007). Their
tabulated information is given below in Table 4.1.

As Linux was chosen as our operating system, the Ntrip clients that only run under
Windows are excluded. After considering version (maturity), complexity, and a
series of functionality and stability tests, we chose NtripLinuxClient and RTCM 2.x
Decoder as our combined option. NtripLinuxClient takes care of downloading
RTCM 2.x raw data from the Ntrip Caster, and RTCM 2.x Decoder decodes the raw
data to the format illustrated in Table 4.2.
Table 4.1 Open Source Ntrip Client Information

Name                    Operating System  Version  Type  Size    Data format
NtripLinuxClient        Linux             1.2.7    ZIP   ~7 K    RTCM 2.x raw data
NtripPerlClient         Linux             0.6      ZIP   ~15 K   RTCM 2.x raw data
RTCM 2.x Decoder        Linux             1.1      ZIP   17 K    Decoded RTCM 2.x data
Ntrip Client            Linux             1.24     ZIP   17 K    Converts RTCM 3.x to RINEX (fails to function)
GNSS Internet Radio     Windows           1.4.11   EXE   ~680 K  RINEX
BKG Ntrip Client (BNC)  Windows/Linux     1.x      ZIP   ~4 MB   RINEX
GNSS Surfer             Windows           1.06c    ZIP   17 K    Untested
Table 4.2 RTCM 2.x Message

Type01: 4 0 186 2640.0 9.660 0.124
Type01: 5 0 18 2640.0 17.740 0.122
Type01: 6 0 236 2640.0 6.280 0.126
Type01: 9 0 233 2640.0 17.000 0.128
Type01: 14 0 117 2640.0 16.680 0.120
Type01: 22 0 230 2640.0 -33.180 0.060
Type01: 24 0 211 2640.0 8.260 0.116
Type01: 30 0 63 2640.0 18.680 0.120
Type03: 4033461.020 23537.680 4924318.170
Z-Count: 2640.00 0.000000
Type18/19: 0 4 1387 146639 23819816.580 0.000 23819820.460
-306300.343 -306299.597 0 0
Type18/19: 0 5 1387 146639 20033021.480 0.000 20033024.440
-469646.405 -1613658.205 0 0
The explanations for the message types are given as follows:
(1) Records beginning with "Type01:" contain information derived from message
type 1. Parameters 1 to 6 are to be interpreted as:
1 SVPRN
2 User Differential Range Error (UDRE):
0 ... <= 1 m
1 ... > 1 m <= 4 m
2 ... > 4 m <= 8 m
3 ... > 8 m
3 Issue of Data (IOD)
4 Z-Count
5 Pseudo-Range Correction [m]
6 Pseudo-Range Rate Correction [m/s]
(2) Records beginning with "Type03:" contain information derived from message
type 3. Parameters 1 to 3 are to be interpreted as the X coordinate, Y coordinate,
and Z coordinate respectively.
(3) Records beginning with "Type18/19:" contain information derived from
message types 18 and 19. Parameters 1 to 11 are to be interpreted as:
1 Station ID
2 SVPRN
3 GPSWeek
4 GPS WeekSec
5 CA Code on L1
6 P Code on L1
7 P Code on L2
8 L1
9 L2
10 SNR1
11 SNR2
(4) Records beginning with "Z-Count:" contain epoch time information.
Columns 1 to 2 are to be interpreted as:
1 Epoch time
2 Clock error
Only ‘Type 18/19’ records are used in this demonstration; they are converted to
RINEX format as the input of the positioning algorithm.
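The Type 18/19 layout listed above can be sketched as a small parser. The field names are ours, taken from the 11-parameter list; the sample record is the one shown in Table 4.2 (joined onto one line).

```python
# Field order follows the 11-parameter Type 18/19 description above.
FIELDS = ["station_id", "svprn", "gps_week", "gps_week_sec",
          "ca_l1", "p_l1", "p_l2", "l1", "l2", "snr1", "snr2"]

def parse_type18_19(record):
    """Parse one decoded 'Type18/19:' record into a field dictionary."""
    if not record.startswith("Type18/19:"):
        raise ValueError("not a Type18/19 record")
    values = record.split()[1:]
    if len(values) != len(FIELDS):
        raise ValueError("expected 11 parameters, got %d" % len(values))
    parsed = dict(zip(FIELDS, (float(v) for v in values)))
    for key in ("station_id", "svprn", "gps_week"):
        parsed[key] = int(parsed[key])  # these fields are integral
    return parsed
```

Records parsed this way carry everything the subsequent RINEX conversion needs: the epoch (GPS week and seconds), the satellite, and the code and carrier observables.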
4.1.3 Demonstration Environment
The first stage of the demonstration is conducted in a laboratory in the Faculty of
Information Technology, Queensland University of Technology. The hardware
environment is a bus-type topology computer network made up of five grid nodes,
namely five PCs, and a 10/100 Mbps Techworks 5-Port Ethernet Switch. The
configuration of each PC is shown in Table 4.3.
Table 4.3 Configuration of Grid Nodes in Laboratory CPU 2A GHZ, Intel Pentium 4 (400MHZ/266MHZ)
L1 Cache 8K/L2 Cache 512K
RAM 512M (DDR266)/ Cache RAM 512 KB
Ethernet 10/100 Mbps Intel 82562 Ethernet Device
Hard Disk 40G (ST340016A) *2
Operating System Debian
Other Software Core set of software in Debian
As data from different base stations must be downloaded in real time from the
Internet with Ntrip in this demonstration, the ability to access the Internet becomes a
critical issue. There are two ways to access the Internet on the QUT campus, one
wired and one wireless, but both require logging in through a secure window in
order to reach public websites outside the QUT intranet. As no Graphical User
Interface (GUI) is installed on the Debian systems of the Grid nodes, the traditional
way of logging in is unavailable. One laptop was therefore added to the Grid
environment to solve this problem: its wireless network connection is shared in the
Grid environment, while the Internet Protocol (IP) address of its local network card
is set to ‘192.168.0.1’ so that it is in charge of transferring data to the other Grid
nodes. The environment is shown in Figure 4.2.
Figure 4.2 Grid Environment in a Laboratory at QUT
In the second stage, we make use of Grid Australia, which has one node in the High
Performance Computing Centre (HPC) at QUT. From this node we can submit jobs
to several other grid nodes on which we have quota. The grid node at QUT is an
IBM eCluster 1350 machine, a twenty-two (22) processor cluster. The specifications
are shown in Table 4.4.
Table 4.4 Specifications of the Grid Node at QUT CPU 2x 3.4GHz 64bit Intel Xeon processors.
20x 2.4GHz 64bit AMD Opteron processors
Peak Rating 75 Gflop (approx.)
RAM 44 Gigabytes of main memory (11 x 4GB)
Hard Disk 1.5 Terabytes of disk storage
Operating System RedHat Linux Operating System
Other Software a very large range of research software
4.2 Requirements and Design of the Proof-of-Concept Demonstration
The requirements of this demonstration are quite clear and are divided into three
parts:
1. downloading GNSS RTCM data via Ntrip from mountpoints in real-time
2. submitting various computing jobs through Globus
3. processing the data collected in real-time or near real-time (background
process)
The design of this demonstration is similar to the design of the framework for NRTK
based on Grid Computing, as it is a demonstration for proof-of-concept for the whole
framework. The layered high-level architecture is depicted in Figure 4.3.
(Figure: a Grid server dispatches tasks such as real-time positioning, ionospheric
bias calculation, orbital interpolation, ZTD estimation and ionospheric grid map
generation to Grid nodes 1 to n, each of which downloads data from mountpoints
such as ACOR0.)
Figure 4.3 Design of Demonstration
One Grid server (which can in fact be any Grid node in the Grid) takes charge of
sending commands to each Grid node and collecting the results. If the
demonstration were applied in industry, there should be at least two Grid servers for
fault tolerance. The number of Grid nodes should be dynamic and configurable, and
is tested as an indicator of scalability in the later part of this demonstration. Each
Grid node can download RTCM data from one or many mountpoints and perform
different kinds of tasks, such as calculating positions, ionospheric bias estimation,
data file comparison and validation, and so on.

This layered architecture ensures that the Grid server can interact with Grid nodes
flexibly. It also makes the demonstration easy to implement from a technical
standpoint, as the functional algorithms can be modularised and a change in one
layer has little effect on the other layers.
4.3 Implementation of Demonstration
4.3.1 Globus Configuration
Globus is installed on every grid node and on the server to act as the security basis.
There are three types of users of Globus, as shown in Table 4.5. The first type is the
user ‘root’, who can do anything. The second type is the Globus administrative user,
normally named ‘globus’, who performs tasks such as installing Globus and
assigning certificates. The third type is the ordinary Globus user, who can execute
various Globus commands after obtaining a user certificate from the server.
Table 4.5 Basic Parameters of Grid Nodes Server Grid node 1 Grid node 2
IP 192.168.0.1 192.168.0.2 192.168.0.3
Hostname Grid01 Grid02 Grid03
OS Debian 3.1 Debian 3.1 Debian 3.1
Users Root/globus/ade Root/globus/ade Root/globus/ade
The installation process of Globus is quite complex and it is briefly described in
Table 4.6.
Table 4.6 Brief Installation Process of Globus Pre-requisites Zlib/Jdk/Ant/gcc/g++/tar/sed/make/perl/sudo/postgres/libiodbc2/libiodbc2-dev
Building the Toolkit
1. Add a non-privileged user named ‘globus’, which will be used to perform
administrative tasks such as starting and stopping the container, deploying
services, etc.
2. Setup java environment.
3. Build and install the toolkit
Setting up security (different in Server and other grid nodes)
1. SimpleCA (Install SimpleCA on the server, for other grid nodes, just
trust this SimpleCA)
2. Make the machine trust the new CA
3. set up hostcert and sign the certificate using the SimpleCA
4. set up usercert and sign the certificate as user ‘globus’
5. create a grid-mapfile as ‘root’ for authorization
6. test and verify of CA
Setting up GridFTP
1. add the gridftp service to xinetd.d
2. add the gsiftp service to /etc/services
3. reload xinetd service
Starting the webservices container
1. setup an /etc/init.d entry for the webservices container
2. create an /etc/init.d script to call the globus user’s start-stop script
3. use one of the sample clients/services to interact with the container
Configuring RFT
1. Configure the system to allow TCP/IP connections to postgres, as well as
adding a trust entry for our current host
2. create the ‘rftDatabase’ as the user ‘globus’
3. try an RFT transfer
Setting up WS GRAM
1. Setup sudo so the user ‘globus’ can start jobs as a different user
2. Test WS GRAM command ‘globusrun-ws’
Details about the commands used in the installation process of Globus are described
in Appendix A.
4.3.2 Submitting a Job
Before submitting any job using Globus, a valid proxy has to be generated for user
from Globus CA server in order to make the user be trusted in the grid system. The
command to get the proxy is ‘grid-proxy-init -verify -debug’ and then user has to
provide his grid password. Table 4.7 shows the scenario of generating a valid proxy
for ‘Deming Yin’ in the grid node ‘Auriga’ at HPC of QUT.
Table 4.7 Valid Proxy Generation -bash-3.00$ grid-proxy-init -verify -debug
User Cert File: /home/n6390544/.globus/usercert.pem
User Key File: /home/n6390544/.globus/userkey.pem
Trusted CA Cert Dir: /opt/vdt/globus/TRUSTED_CA
Output File: /tmp/x509up_u1206
Your identity: /C=AU/O=APACGrid/OU=QUT/CN=Deming Yin
Enter GRID pass phrase for this identity:
Creating proxy ....++++++++++++
..++++++++++++
Done
Proxy Verify OK
Your proxy is valid until: Wed Jun 4 05:50:13 2008
There are several ways to submit a job in Globus. The simplest is the command-line
tool ‘globusrun-ws’. Several formats of job description can be executed from the
command line in the form ‘globusrun-ws -submit -c <job command>’. Table
4.8 shows the scenario of submitting a dummy job on the grid node ‘Auriga’ at the
HPC of QUT.
Table 4.8 Submitting a Job in Globus -bash-3.00$ globusrun-ws -submit -c /bin/true
Submitting job...Done.
Job ID: uuid:c0629636-3141-11dd-80d0-224466880045
Termination time: 06/04/2008 07:50 GMT
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Every job submitted to Globus is first assigned a job id. Each job can be in one of
quite a few states, such as unsubmitted, active, pending, suspended, failed, clean-up,
done, and stage-in and stage-out where applicable. The various resources and
delegations used are destroyed at the end of every job submission. Different
strategies can be defined to decide what should be done if a job is suspended, failed,
and so on; these can be implemented in a web service client written in C or Java with
the job submission function.
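One such strategy can be sketched as a simple decision over the job states listed above. The state names follow the text; the actions and the retry limit are illustrative assumptions, not Globus behaviour.

```python
# Hypothetical policy: which states warrant a resubmission attempt,
# and which indicate the job is finished.
RETRYABLE = {"failed", "suspended"}
TERMINAL = {"done"}

def next_action(state, attempts, max_attempts=3):
    """Decide what a submission client could do after a state notification."""
    if state in TERMINAL:
        return "collect-results"
    if state in RETRYABLE:
        return "resubmit" if attempts < max_attempts else "report-error"
    # unsubmitted / pending / active / stage-in / stage-out / clean-up
    return "wait"
```

A client subscribing to job state notifications would call `next_action` on each update and either resubmit the job description, keep waiting, or fetch the results.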
The second job description format is XML. One example is as depicted in Table 4.9.
Table 4.9 Job Description in XML Format <job>
<executable>gpstk1.5/bin/PRSolve</executable>
<directory>${GLOBUS_USER_HOME}</directory>
<argument>-o </argument>
<argument>bahr1620.08o</argument>
<argument>-n </argument>
<argument>bahr1620.08n</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
It can be executed using ‘globusrun-ws -submit -f a.xml’.
We can also specify file stage-in and stage-out in the job description, as shown in
Table 4.10.
Table 4.10 Job Description with File Stage-In and Stage-Out <job>
<executable>gpstk1.5/bin/PRSolve</executable>
<directory>${GLOBUS_USER_HOME}</directory>
<argument>-o </argument>
<argument>bahr1620.08o</argument>
<argument>-n </argument>
<argument>bahr1620.08n</argument>
<stdout>${GLOBUS_USER_HOME}/stdout</stdout>
<stderr>${GLOBUS_USER_HOME}/stderr</stderr>
<fileStageIn>
<transfer>
<sourceUrl>gsiftp://ng2.ivec.org:2811/stdout</sourceUrl>
<destinationUrl>file:///home/n6390544/</destinationUrl>
</transfer>
</fileStageIn>
<fileStageOut>
<transfer>
<sourceUrl>file:///home/n6390544/gpstk1.5/bin/bahr1620.08o</sourceUrl>
<destinationUrl>gsiftp://job.submitting.host:2811/tmp/
</destinationUrl>
</transfer>
</fileStageOut>
<fileCleanUp>
<deletion>
<file>file:///${GLOBUS_USER_HOME}/my_echo</file>
</deletion>
</fileCleanUp>
</job>
The other way is to submit jobs in C or Java using WS GRAM in Globus, which is
frequently used in this research project. The C or Java programs invoke interfaces
from C WS Core or Java WS Core, both of which belong to the Common Runtime
Components of Globus. The Common Runtime Components provide GT4 web and
pre-web services with a set of libraries and tools that allow these services to be
platform independent, to build on various abstraction layers (threading, IO) and to
leverage functionality lower in the web services stack (WSRF, WSN, etc.).
Basically, we have to write a web service client in C or Java with the job submission
function.
4.3.3 Processing Procedure
There are three main processing procedures in this simplified demonstration:
real-time RTCM data downloading, job submission, and the positioning process.
The overall procedure is described first, followed by details of each individual
procedure.
After commands are sent from the server to each grid node, the open source Ntrip
clients are used to download RTCM data from Ntrip Casters. In the first step, the
NtripLinuxClient software downloads RTCM data from the Internet in real time,
while the LinuxRtcmDecoder software decodes the downloaded RTCM data stream
into RINEX format. The data processing procedure is depicted in Figure 4.4, which
uses one grid node as an example. In the second step, computing programs from the
GPS Toolkit (UTA 2008) perform calculation tasks on each grid node. This is a
simplified computing task intended to represent the network-based RTK processing
services discussed in Section 3.3. Specifically, the position of each mountpoint,
namely a reference station, is calculated by the PRSolve or rinexpvt functions of the
GPS Toolkit.
(Figure: a mountpoint such as ACOR0 on www.euref-ip.net:80 feeds the Ntrip
client, which writes the RTCM data that the Ntrip decoder converts into RINEX
data.)
Figure 4.4 Ntrip Data Processing Procedure
Two types of data are required for the data processing and computation tasks:
observation data and navigation data. The observation data from multiple reference
stations, at a 1 Hz update rate, is obtained from the real-time Ntrip downloading
process. The navigation data is retrieved from an FTP site every two hours. The
whole data downloading process is controlled by Linux shell scripts, and all the data
are saved into files: every few seconds for observation data and every two hours for
navigation data.
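A minimal sketch of such a controlling script is given below; the FTP URL and file names are placeholders, not the actual addresses used in the demonstration:

```shell
#!/bin/sh
# Controlling-loop sketch: refresh navigation data from an FTP site every
# two hours while observation data is written continuously by the Ntrip
# client. The URL and file names are placeholders.
NAV_INTERVAL=$((2 * 60 * 60))   # navigation refresh period: 2 hours
OBS_INTERVAL=5                  # observation file rotation: every few seconds
last_nav=0

while true; do
    now=$(date +%s)
    # Refresh the broadcast navigation file when 2 hours have elapsed.
    if [ $((now - last_nav)) -ge "$NAV_INTERVAL" ]; then
        wget -q -O nav.rnx "ftp://example-ftp-site/nav/current.rnx"
        last_nav=$now
    fi
    # The Ntrip client writes observation data continuously; here the
    # current observation file would be closed and a new one opened.
    sleep "$OBS_INTERVAL"
done
```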
[Figure 4.5: three Debian grid nodes connected through a switch, each running
Globus and GPS Toolkit. Node 192.168.0.1 hosts the SimpleCA, while nodes
192.168.0.2 and 192.168.0.3 hold user certificates; jobs are exchanged between the
nodes.]
Figure 4.5 Job Submission Procedure
As depicted in Figure 4.5, Globus acts as the middleware lying between the
operating system and the application software. One grid node serves as the server,
which acts as the certificate authority. The other grid nodes have to apply for their
user certificates from the certificate authority the first time, and obtain a valid proxy
each time before submitting a job. In this way, the whole security infrastructure is
built, and every node can submit jobs to be executed on other grid nodes. The
common structure of the job submission algorithm is:
Step 1. Importing necessary classes and libraries
Step 2. Loading the job description
Step 3. Setting the security attributes
Step 4. Creating the factory client handle
Step 5. Querying for factory resource properties
Step 6. Creating the notification consumer
Step 7. Creating the job resource
Step 8. Subscribing for job state notifications
Step 9. Releasing any state holds (if necessary)
Step 10. Destroying resources
One example algorithm written in C is given in Appendix B.
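For comparison with the C client, the same submission can also be performed from the command line with the standard Globus tools (both commands appear in Appendix A); the host name and job description file name here are illustrative:

```shell
#!/bin/sh
# Command-line equivalent of the C submission client: obtain a valid
# proxy, then submit a job described in job.xml to the factory service on
# a remote node. The host address and job file name are illustrative.
grid-proxy-init        # a valid proxy is required before submitting

globusrun-ws -submit \
    -F https://192.168.0.1:8443/wsrf/services/ManagedJobFactoryService \
    -f job.xml         # XML job description
echo "exit status: $?"
```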
The last procedure is the positioning process, which is carried out by programs from
the GPS Toolkit. The observation and navigation files acquired from the real-time
Ntrip data downloading procedure are used as the input to the positioning algorithm
of the GPS Toolkit program. The positioning algorithm calculates the position
epoch-by-epoch (i.e. second-by-second), and saves the results and the final receiver-
autonomous integrity monitoring (RAIM) solutions for all the files.
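A representative invocation is sketched below; the exact option names vary between GPS Toolkit releases, so the flags and file names shown are assumptions:

```shell
#!/bin/sh
# Per-file positioning sketch: run the GPS Toolkit pseudorange solver on
# one observation/navigation file pair. Option names are assumptions and
# differ between GPSTk releases.
OBS=ACOR077C.08O      # RINEX observation file from the Ntrip procedure
NAV=brdc0770.08n      # broadcast navigation file from the FTP procedure

# PRSolve computes an epoch-by-epoch (second-by-second) position and a
# final RAIM solution, written here to a log file named after the input.
./PRSolve --obs "$OBS" --nav "$NAV" --log "${OBS%.*}.log"
```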
4.4 Results Analysis and Discussions
Tests of each procedure were conducted in the last stage of the demonstration. First,
real-time data was retrieved with Ntrip from multiple mountpoints through
configuration on each Grid node. Second, jobs were submitted to the Grid. Third, a
different computation task was performed on each Grid node. Some additional tests
were also performed: different numbers of Grid nodes were added to the Grid
Computing environment to test the scalability of the Grid, and a certain amount of
data was transferred to test the data sharing ability of the Grid. Finally, the overall
performance of the demonstration was evaluated using timing metrics, and the
results were analysed and discussed.
4.4.1 Performance Evaluation
In order to present the results of each test, some performance evaluation metrics are
defined first. Different kinds of jobs were submitted in the Grid environments of
both the laboratory and Grid Australia. Simple jobs, such as a dummy job with no
data input or output, need the least time and are the most stable in the whole
demonstration. Jobs written in a job description language, such as XML, take a little
longer for the Grid to process; one reason is that it takes time to parse the structure
of the job description file, although this kind of job can carry more information than
a simple job. The most time-consuming jobs are those with file stage-in and
stage-out, which involve large data input and output. This effect is more obvious in
the laboratory, as all communication must pass through a simple five-port switch.
The job types are described in detail below.
• Simple job: a job that does not involve computation, e.g. a dummy job such as
/bin/date.
• Simple job with job description: a simple job submitted in the form of the XML-
based Job Submission Description Language (JSDL).
• Complex job: a job with file stage-in or stage-out and computation.
In each run, 5 jobs are submitted, and the time from the first submission to the last
completion is measured at the client and then divided by 5 to obtain the average
per-job time. The Linux time command reports the real (elapsed) time and the
processor time used by the program; the processor time is divided into User CPU
time and Sys CPU time. The time types (given in seconds) are described below.
• Elapsed time: elapsed time from beginning to end of the program
• User CPU time: time used by the program and the library subroutines that it calls
• Sys CPU time: time used by system calls invoked by the program (directly or
indirectly)
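The per-job averaging described above can be reproduced in a few lines of shell; the total below is an illustrative figure chosen so that the average matches the first row of Table 4.11:

```shell
#!/bin/sh
# Average per-job time: total wall-clock time from first submission to
# last completion, divided by the number of jobs in the run.
JOBS=5
TOTAL=14.745   # illustrative total elapsed time for 5 simple jobs (s)

# awk performs the floating-point division that plain sh lacks.
avg=$(awk -v t="$TOTAL" -v n="$JOBS" 'BEGIN { printf "%.3f", t / n }')
echo "average per-job elapsed time: ${avg}s"   # prints 2.949s here
```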
4.4.2 Multiple Mountpoints on Each Grid Node
On each Grid node, RTCM data from multiple mountpoints is downloaded and
saved to separate files named according to the RINEX naming convention (Gurtner
and Estey 2007), such as ACOR077C.08O.
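As an illustration of that convention, the short file name encodes a 4-character station code, a 3-digit day of year, an hourly session letter, a 2-digit year, and the data type. A small sketch assuming this layout:

```shell
#!/bin/sh
# Build a RINEX v2 short file name: 4-char station, 3-digit day of year,
# hourly session letter (A = 00h ... X = 23h), 2-digit year, and 'O' for
# observation data, following Gurtner and Estey (2007).
rinex_name() {
    station=$1; doy=$2; hour=$3; yy=$4
    letters=ABCDEFGHIJKLMNOPQRSTUVWX
    session=$(printf '%s' "$letters" | cut -c $((hour + 1)))
    printf '%s%03d%s.%02dO' "$station" "$doy" "$session" "$yy"
}

# Example: station ACOR, day 77, hour 2, year 2008
rinex_name ACOR 77 2 8    # prints ACOR077C.08O
```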
The result indicates that every Grid node is capable of retrieving real-time data from
multiple mountpoints. Some observations on this result are summarised as follows:
• the hardware configuration of each Grid node is adequate
• the amount of data is relatively small
• the Internet speed within the campus network is relatively high
Since every Grid node can be utilised if needed, in this procedure the Grid will
outperform a single PC by roughly a factor equal to the number of Grid nodes.
4.4.3 Computation Task Submission
Some simple jobs were submitted to the Grid in the test of the second procedure. As
the execution time of a simple job is negligible, the elapsed time of a simple job can
be treated as the time needed to submit a job to the Grid. The time factors of simple
jobs are summarised in Table 4.11. Due to communication costs, submission
appears slightly slower in Grid Australia than in the laboratory.
Table 4.11 Time Factor of Simple Job (Unit: Second)

Environment        Job Type                         Elapsed  User CPU  Sys CPU  STD*
                                                    time     time      time
Laboratory         Simple job                       2.949    0.324     0.018    0.231
(PC network)       Simple job with job description  2.999    0.32      0.022    0.301
HPC@QUT            Simple job                       3.810    0.43      0.022    0.823
(Grid Australia,   Simple job with job description  3.464    0.434     0.016    0.567
Cluster)

*STD: the standard deviation of the elapsed time based on the samples of 5 tests.
4.4.4 Different Computation Tasks on Each Grid Node
The second test is to execute different computation tasks on each Grid node, such as
calculating positions, ionospheric bias estimation, and data file comparison and
validation. Various kinds of tasks were executed on each Grid node to test the
robustness of the Grid. The results prove that the Grid can perform every kind of
task that can be done on a PC, without any compromise of functionality. Some
examples are shown in Figure 4.6 and Figure 4.7 respectively.
Figure 4.6 Calculating Positions on Auriga
Figure 4.6 shows the scenario of using the ‘PRSolve’ function to calculate the
position of one reference station. The observation file and navigation file are the
input parameters, and pseudorange position measurement is adopted. Figure 4.7
shows an example of calculating the ionospheric bias using ‘IonoBias’. Two days of
observation data and navigation data are used as the input, and the output is
displayed directly on the screen.
Figure 4.7 Calculating Ionospheric Bias on Auriga
The time factors of a complex job with different kinds of stage-in and stage-out are
given in Table 4.12. As there was no quota to transfer data in Grid Australia, this test
and the data sharing test were not conducted there. In one of the test cases, used to
calculate a position, 140 KB of observation data is staged out from the Grid server to
a Grid node, and 22.2 KB of result data in a log file is staged back in to the Grid
server. It should be noted that a real-world NRTK module always takes longer to
finish (especially the post-processing), for example several minutes, than the user
CPU time in this experiment, which is less than 1 second.
Table 4.12 Time Factor of a Complex Job (Unit: Second)

Environment     Stage-in  Stage-out  Elapsed  User CPU  Sys CPU  STD*
                                     time     time      time
Laboratory      None      140KB      5.5628   0.508     0.034    0.439
(PC network)    None      26.4MB     10.538   0.57      0.00     0.422
                None      172MB      25.844   0.58      0.005    0.435
                22.2KB    140KB      7.9188   0.536     0.036    0.399

*STD: the standard deviation of the elapsed time based on the samples of 5 tests.
As can be seen from Table 4.12, a high time cost occurs even when only a small
amount of data is staged in or out. There is therefore a strong case for downloading
the needed data locally on each Grid node, rather than staging it out from the server
as in the centralised solution.
4.4.5 Additional Tests
Different number of Grid nodes
One of the additional tests is to add different numbers of Grid nodes to the Grid
Computing environment to test the scalability of the Grid. As there are only 5 Grid
nodes in total, the time factor of this test does not differ much as more Grid nodes
are connected to the Grid. Some observations from this test are given below:
• It always takes some effort to add an additional node
• The whole problem needs to be modularised sensibly to make full use of all the
Grid nodes
• Sometimes more Grid nodes do not necessarily reduce the time cost, but they
certainly improve the problem-solving capability
Data sharing
Data sharing
Data sharing is also tested in this demonstration, where 699 MB of data is
transferred in the Grid using the GridFTP protocol. The result is shown in Table 4.13.
Table 4.13 Time Factor of Data Sharing (Unit: Second)

Environment     Method    Elapsed   User CPU  Sys CPU
                          time      time      time
Laboratory      Local     46.2145   0.195     3.48
(PC network)    GridFTP   67.0565   0.105     0.005
From the time factors of the different kinds of jobs in Tables 4.11, 4.12 and 4.13,
several conclusions can be drawn:
• Near real-time results with 2-3 s processing time can be achieved without data
stage-in and stage-out
• The data needed should be downloaded locally on each Grid node
• Data transfer in the Grid is highly efficient, with performance similar to
operation on a local disk
4.4.6 Discussion
The results shown in this chapter indicate that the Grid Computing based
framework developed for this research offers high throughput and high efficiency.
However, it has difficulty achieving real-time processing within 1 s. Several reasons
are given below:
• The fundamental processes (security and communication) of the current Grid
network are very time consuming. In the Grid environment, because Grid nodes
can come from different virtual organisations, the nodes have to establish mutual
trust through the authentication process before a job can be submitted and
executed.
• Application execution time is, to some extent, fixed. Executing an application or
algorithm requires a certain amount of time, regardless of whether it runs on a
single PC or on the Grid.
From the time cost analysis shown in Section 4.4.4, improvements to the framework
and implemented architecture have to be made if real-time requirements are to be
met. For example, the security mode can be simplified, or the authentication time
eliminated, if all the Grid nodes sit within one organisation; this can reduce the
overall time cost considerably. Another option is to use Grid Computing only for the
background processing services described in Section 3.3, where delays of a few
seconds are not an issue. Additionally, if the task can be divided into a large number
of small tasks, a lightweight Grid tool such as the Java Parallel Processing
Framework (JPPF) (JPPF 2008), or a multi-level scheduling strategy such as
FALKON (Raicu, Zhao et al. 2007) or Condor glide-ins, can be utilised to achieve a
relatively smaller time cost instead of using fork or full-featured Local Resource
Managers (LRMs) such as Condor, Portable Batch System (PBS), Load Sharing
Facility (LSF), and Sun Grid Engine (SGE).
4.5 Result From Lightweight Grid Tool JPPF
This section demonstrates how to achieve better result utilising lightweight Grid tool-
Java Parallel Processing Framework (JPPF). JPPF is an open source Grid Computing
platform written in Java that makes it easy to run applications in parallel, and speed
up their execution by orders of magnitude (JPPF 2008). JPPF’s architecture is
divided into three layers, Client Application, JPPF Driver and Grid nodes, which is
depicted in Figure 4.8.
Figure 4.8 JPPF Architecture (JPPF 2008)
In JPPF terminology, the basic unit of execution is called a task: the smallest
self-contained piece of code that can be executed remotely. Clients submit their jobs
through Client Applications to the JPPF Driver. The JPPF Driver is the Grid server,
which manages the task queue and takes care of security and other fundamental
issues. Tasks are distributed to the different Grid nodes for execution, and the results
are finally collected back through the JPPF Driver to the Clients.
Data sharing is tested using JPPF, where 699 MB of data is transferred in the Grid
using the GridFTP protocol.
Table 4.14 Time Factor of Data Sharing (Unit: Second)

Environment     Method    Elapsed   User CPU  Sys CPU
                          time      time      time
Laboratory      Local     46.2145   0.195     3.48
(PC network)    GridFTP   57.0334   0.104     0.005
As can be seen in Table 4.14, the test result is improved to some extent by using the
lightweight Grid tool JPPF.
4.6 Summary
This chapter has provided a proof-of-concept framework demonstration with the
simplified Network RTK processes. Open source software packages, such as the
Ntrip client and GPS Toolkit, were utilised to facilitate the demonstration tasks,
such as downloading real-time RTCM data through Ntrip from the Internet and
performing GPS-related computing tasks. The design and implementation of the
demonstration were also described in detail, and different types of tests were
conducted in the later stage to verify the functionality and performance of the Grid.
The next chapter will give a summary of the whole thesis and research project, and
some recommendations for future research directions.
5. Concluding Remarks and Future Work
5.1 Concluding Remarks
The objective of this research was to design a framework for NRTK data processing
based on Grid Computing. One of the very first steps taken in this research was to
identify the current challenges of NRTK data processing. As described in Chapter 1,
the challenges include a much larger volume of data brought by an increasing
number of GNSS constellations, transmission frequencies, and ground reference
stations. Additionally, higher computational capabilities are required due to more
complex algorithms and a large volume of users whose positions may be computed at
the server ends. Review of these critical issues in Chapter 2 has provided a good
basis for the framework.
Next, a layered framework was designed to address these computing challenges in
NRTK data processing. The scalable, distributed platform can cope with the higher
computational capability requirements by scheduling tasks to various grid nodes,
which are RTK network centres or servers. At the same time, the processing
algorithms are modularised into, for instance, real-time positioning services and
background processing services, so different grid nodes or servers can handle
different parts of the whole processing task to meet different time requirements.
Data location is also taken into account in this framework, with a preference for
collecting and managing data locally on each grid node.
Although it is not possible to fully demonstrate the performance potential of the
designed framework for network-based RTK services within the research period, a
proof-of-concept system with a simplified demonstration has been performed in the
laboratory at QUT and Grid Australia. In the demonstration, real-time data was
downloaded from Ntrip casters using evaluated open source Ntrip clients. The
downloaded RTCM data was then converted to RINEX format and saved to a file on
local disk. The second part of the demonstration was sending commands from the
Grid server to other Grid nodes through Globus. In this procedure, jobs were
submitted both with the simple function provided by Globus and with a customised
function. The last part of the demonstration was the position calculation processing
using a few programs from the open source GPS Toolkit. Various computation tasks
were performed and different numbers of Grid nodes were added in the Grid
environment to test the scalability of the Grid.
Results from the demonstration revealed that near real-time results with 2-3 s
processing time can be achieved by utilising the Grid. This means that the proposed
framework may be used for background processing in network-based RTK
processing, which updates network-based differential corrections at intervals of tens
of seconds to minutes. Different kinds of time costs, such as security
(authentication), communication, and application execution, have been analysed,
and suggestions were given as recommendations to improve the time performance
under certain circumstances.
5.2 Future Work
Based on the work conducted and the results achieved during this master's research
period, the author has identified several areas of improvement that can be made to
the overall NRTK data processing framework, as follows:
• Improving the scheduling algorithm. Scheduling is the core component of Grid
Computing and determines the performance of the whole Grid. Due to the time
limitations of the master's project, only existing algorithms were implemented in
the demonstration. Improving the scheduling algorithm would speed up the
whole distributed computation considerably, making real-time positioning more
feasible.
• Refining the NRTK data processing framework. At this stage of the research,
each organisation still seems busy constructing its own CORS network, so the
opportunity to apply the framework to a real-world application has not been
available. More importantly, the RTK algorithms are still under development. In
one to two years' time, when the CRC-SI project has been completed and
real-time RTK processing software platforms are ready, it will be time to fully
implement and test the proposed Grid Computing framework for the Network
RTK positioning service. In this way, more accurate requirements can be
obtained and the framework can be refined during its deployment.
• Improving the time performance under certain circumstances. As mentioned in
the discussion of the demonstration results, lightweight multi-level schedulers
can be utilised to achieve a much lower time cost if the task can be divided into
a large number of small tasks. The security cost can also be reduced
significantly if most of the Grid nodes sit within one organisation.
Appendix A: Installation Process of Globus
This appendix shows a full installation of Globus Toolkit 4 on a Debian 3.1 machine.
Tips of Unix
1. Unix has different shells, such as Bourne Shell (bash) and C shell (csh). For
example, the command prompt of bash looks like ‘root@grid01:/usr/java#’,
while that of csh looks like ‘grid01 %’.
2. User privileges in Unix are strictly controlled, so it is necessary to execute the
commands as the same user as shown in this appendix when installing Globus.
3. Some useful commands:
cat: concatenate files and print on the standard output
vim: create or edit file
scp: remote secure file copy from one machine to another
Pre-requisites
Utility tools:
zlib/gcc/g++/tar/sed/make/perl/sudo/postgres/libiodbc2/libiodbc2-dev
use the following commands to check the tools:
grid01 % dpkg --list | grep zlib
grid01 % gcc --version
grid01 % g++ --version
grid01 % tar --version
grid01 % sed --version
grid01 % make --version
grid01 % perl --version
grid01 % sudo -V
grid01 % dpkg --list | grep postgres
grid01 % dpkg --list | grep psql
If any tool is not installed, use ‘apt-get install’ command to install it.
root@grid01:/usr/local# apt-get install postgresql
root@grid01:/root# apt-get install libiodbc2 libiodbc2-dev
Compulsory software: JDK/Ant
Install java:
root@grid01:/usr/java# ./jdk-1_5_0_14-linux-i586.bin
Install Ant:
root@grid01:/usr/local# tar xzf apache-ant-1.7.0-bin.tar.gz
root@grid01:/usr/local# ls apache-ant-1.7.0
Building the Toolkit
1. Add a non-privileged user named ‘globus’, which will be used to perform
administrative tasks such as starting and stopping the container, deploying
services, etc.
root@grid01:~# adduser globus
root@grid01:/etc/init.d# mkdir /usr/local/globus-4.0.1/
root@grid01:/etc/init.d# chown globus:globus /usr/local/globus-4.0.1/
2. Setup java environment.
globus@grid01:~/gt4.0.1-all-source-installer$ export ANT_HOME=/usr/local/apache-ant-1.7.0
globus@grid01:~/gt4.0.1-all-source-installer$ export JAVA_HOME=/usr/java/jdk1.5.0_14/
globus@grid01:~/gt4.0.1-all-source-installer$ export PATH=$ANT_HOME/bin:$JAVA_HOME/bin:$PATH
globus@grid01:~/gt4.0.1-all-source-installer$ ./configure --prefix=/usr/local/globus-4.0.1/ --with-iodbc=/usr/lib
Note:
The machine I am installing on doesn't have access to a scheduler. If it did, I
would have specified one of the wsgram scheduler options, like
--enable-wsgram-condor, --enable-wsgram-lsf, or --enable-wsgram-pbs.
3. Build and install the toolkit
globus@grid01:~/gt4.0.1-all-source-installer$ make | tee installer.log
globus@grid01:~/gt4.0.1-all-source-installer$ make install
Setting up security (different in Server and other grid nodes)
1. SimpleCA (Install SimpleCA on the server, for other grid nodes, just trust this
SimpleCA)
Install SimpleCA on the server
globus@grid01:~$ export GLOBUS_LOCATION=/usr/local/globus-4.0.1
globus@grid01:~$ source $GLOBUS_LOCATION/etc/globus-user-env.sh
globus@grid01:~$ $GLOBUS_LOCATION/setup/globus/setup-simple-ca
check installation result:
globus@grid01:~$ ls ~/.globus/
globus@grid01:~$ ls ~/.globus/simpleCA/
Other grid nodes:
globus@grid02:~$ scp grid01:.globus/simpleCA/globus_simple_ca_6240356a_setup-0.19.tar.gz .
globus@grid02:~$ export GLOBUS_LOCATION=/usr/local/globus-4.0.1
globus@grid02:~$ $GLOBUS_LOCATION/sbin/gpt-build globus_simple_ca_6240356a_setup-0.19.tar.gz
globus@grid02:~$ $GLOBUS_LOCATION/sbin/gpt-postinstall
2. Make the machine trust the new CA
root@grid01:~# export GLOBUS_LOCATION=/usr/local/globus-4.0.1
root@grid01:~# $GLOBUS_LOCATION/setup/globus_simple_ca_6240356a_setup/setup-gsi -default
check configuration results:
root@grid01:~# ls /etc/grid-security/
root@grid01:~# ls /etc/grid-security/certificates/
3. set up hostcert and sign the certificate using the SimpleCA
root@grid01:~# source $GLOBUS_LOCATION/etc/globus-user-env.sh
root@grid01:~# grid-cert-request -host `hostname`
globus@grid01:~$ grid-ca-sign -in /etc/grid-security/hostcert_request.pem -out hostsigned.pem
root@grid01:~# cp ~globus/hostsigned.pem /etc/grid-security/hostcert.pem
root@grid01:/etc/grid-security# cp hostcert.pem containercert.pem
root@grid01:/etc/grid-security# cp hostkey.pem containerkey.pem
root@grid01:/etc/grid-security# chown globus:globus container*.pem
root@grid01:/etc/grid-security# ls -l *.pem
4. set up usercert for the real Globus user ‘ade’ and sign the certificate as user
‘globus’
grid01 % setenv GLOBUS_LOCATION /usr/local/globus-4.0.1/
grid01 % source $GLOBUS_LOCATION/etc/globus-user-env.csh
grid01 % grid-cert-request
grid01 % cat /home/ade/.globus/usercert_request.pem | mail globus@grid01
globus@grid01:~$ grid-ca-sign -in request.pem -out signed.pem
globus@grid01:~$ cat signed.pem | mail ade@grid01
grid01 % cp signed.pem ~/.globus/usercert.pem
grid01 % ls -l ~/.globus/
5. create a grid-mapfile as ‘root’ for authorization
root@grid01:/etc/grid-security# $GLOBUS_LOCATION/sbin/grid-mapfile-add-entry -dn "/O=Grid/OU=GlobusTest/OU=simpleCA-grid01.debianGridDomain/OU=debianGridDomain/CN=ade" -ln ade
6. test and verify of CA
grid01 % grid-proxy-init -verify -debug
Setting up GridFTP
1. add the gridftp service to xinetd.d
root@grid01:/etc/grid-security# vim /etc/xinetd.d/gridftp
root@grid01:/etc/grid-security# cat /etc/xinetd.d/gridftp
service gsiftp
{
instances = 100
socket_type = stream
wait = no
user = root
env += GLOBUS_LOCATION=/usr/local/globus-4.0.1
env += LD_LIBRARY_PATH=/usr/local/globus-4.0.1/lib
server = /usr/local/globus-4.0.1/sbin/globus-gridftp-server
server_args = -i
log_on_success += DURATION
nice = 10
disable = no
}
2. add the gsiftp service to /etc/services
root@grid01:/etc/grid-security# vim /etc/services
root@grid01:/etc/grid-security# tail /etc/services
vboxd 20012/udp
binkp 24554/tcp # binkp fidonet protocol
asp 27374/tcp # Address Search Protocol
asp 27374/udp
dircproxy 57000/tcp # Detachable IRC Proxy
tfido 60177/tcp # fidonet EMSI over telnet
fido 60179/tcp # fidonet EMSI over TCP
# Local services
gsiftp 2811/tcp
3. reload xinetd service
root@grid01:/etc/grid-security# /etc/init.d/xinetd reload
Reloading internet superserver configuration: xinetd.
root@grid01:/etc/grid-security# netstat -an | grep 2811
tcp 0 0 0.0.0.0:2811 0.0.0.0:* LISTEN
test the configuration:
grid01 % grid-proxy-init -verify -debug
grid01 % globus-url-copy gsiftp://debiangrid001/etc/group
file:///tmp/ade.test.copy
grid01 % diff /tmp/ade.test.copy /etc/group
Starting the webservices container
1. setup an /etc/init.d entry for the webservices container
globus@grid01:~$ vim $GLOBUS_LOCATION/start-stop
globus@grid01:~$ cat $GLOBUS_LOCATION/start-stop
#! /bin/sh
set -e
export GLOBUS_LOCATION=/usr/local/globus-4.0.1
export JAVA_HOME=/usr/java/jdk1.5.0_14/
export ANT_HOME=/usr/local/apache-ant-1.7.0
export GLOBUS_OPTIONS="-Xms256M -Xmx512M"
. $GLOBUS_LOCATION/etc/globus-user-env.sh
cd $GLOBUS_LOCATION
case "$1" in
start)
$GLOBUS_LOCATION/sbin/globus-start-container-detached -p 8443
;;
stop)
$GLOBUS_LOCATION/sbin/globus-stop-container-detached
;;
*)
echo "Usage: globus {start|stop}" >&2
exit 1
;;
esac
exit 0
globus@grid01:~$ chmod +x $GLOBUS_LOCATION/start-stop
2. create an /etc/init.d script to call the globus user’s start-stop script
root@grid01:~# vim /etc/init.d/globus-4.0.1
root@grid01:~# cat /etc/init.d/globus-4.0.1
#!/bin/sh -e
case "$1" in
start)
su - globus /usr/local/globus-4.0.1/start-stop start
;;
stop)
su - globus /usr/local/globus-4.0.1/start-stop stop
;;
restart)
$0 stop
sleep 1
$0 start
;;
*)
printf "Usage: $0 {start|stop|restart}\n" >&2
exit 1
;;
esac
exit 0
root@grid01:~# chmod +x /etc/init.d/globus-4.0.1
root@grid01:~# /etc/init.d/globus-4.0.1 start
3. use one of the sample clients/services to interact with the container
grid01 % setenv JAVA_HOME /usr/java/jdk1.5.0_14/
grid01 % setenv ANT_HOME /usr/local/apache-ant-1.7.0/
grid01 % setenv PATH $ANT_HOME/bin:$JAVA_HOME/bin:$PATH
grid01 % counter-client -s
https://debiangrid001:8443/wsrf/services/CounterService
Got notification with value: 3
Counter has value: 3
Got notification with value: 13
Configuring RFT
1. Configure the system to allow TCP/IP connections to postgres, as well as adding
a trust entry for our current host
root@grid01:~# vim /var/lib/postgres/postmaster.conf
root@grid01:~# grep POSTMASTER /var/lib/postgres/postmaster.conf
POSTMASTER_OPTIONS="-i"
root@grid01:~# vim /var/lib/postgres/data/pg_hba.conf
root@grid01:~# grep rftDatabase /var/lib/postgres/data/pg_hba.conf
host rftDatabase "globus" "192.168.0.100" 255.255.255.255 md5
root@grid01:~# /etc/init.d/postgresql restart
Stopping PostgreSQL database server: postmaster.
Starting PostgreSQL database server: postmaster.
2. create the ‘rftDatabase’ as the user ‘globus’
root@grid01:~# su postgres -c "createuser -P globus"
Enter password for new user: *****
Enter it again: *****
Shall the new user be allowed to create databases? (y/n) y
Shall the new user be allowed to create more new users? (y/n) n
CREATE USER
globus@grid01:~$ createdb rftDatabase
CREATE DATABASE
globus@grid01:~$ psql -d rftDatabase -f
$GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema.sql
globus@grid01:~$ vim $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml
globus@grid01:~$ grep -C 3 password $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml
</parameter>
<parameter>
<name>
password
</name>
<value>
*****
test configuration:
root@grid01:~# /etc/init.d/globus-4.0.1 restart
Stopping Globus container. PID: 29985
Starting Globus container. PID: 8620
root@grid01:~# head /usr/local/globus-4.0.1/var/container.log
3. try an RFT transfer
grid01 % cp /usr/local/globus-4.0.1/share/globus_wsrf_rft_test/transfer.xfr rft.xfr
grid01 % vim rft.xfr
grid01 % cat rft.xfr
true
16000
16000
false
1
true
1
null
null
false
10
gsiftp://debiangrid001.debiangriddomain:2811/etc/group
gsiftp://debiangrid001.debiangriddomain:2811/tmp/rftTest_Done.tmp
grid01 % rft -h debiangrid001 -f rft.xfr
grid01 % diff /etc/group /tmp/rftTest_Done.tmp
Setting up WS GRAM
1. Setup sudo so the user ‘globus’ can start jobs as a different user
root@grid01:~# visudo
root@grid01:~# cat /etc/sudoers
globus ALL=(ade) NOPASSWD: /usr/local/globus-4.0.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.1/libexec/globus-job-manager-script.pl *
globus ALL=(ade) NOPASSWD: /usr/local/globus-4.0.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.1/libexec/globus-gram-local-proxy-tool *
2. Test WS GRAM command ‘globusrun-ws’
grid01 % globusrun-ws -submit -c /bin/true
grid01 % echo $?
0
grid01 % globusrun-ws -submit -c /bin/false
grid01 % echo $?
1
Appendix B: Web Service for Submitting a Job
//0. Importing necessary classes and libraries
SubmittingJobInC()
{
//1. Loading the job description
const char * file = "job.xml";
globus_soap_message_handle_t message;
wsgram_CreateManagedJobInputType input;
xsd_QName element; /* declaration added: holds the deserialized element name */
globus_soap_message_handle_init_from_file(&message, file);
globus_soap_message_deserialize_element_unknown(message, &element);
if(strcmp(element.local, "job") == 0)
{
wsgram_JobDescriptionType * jd;
input.choice_value.type = wsgram_CreateManagedJobInputType_job;
jd = &input.choice_value.value.job;
wsgram_JobDescriptionType_deserialize(&element, jd, message, 0);
}
else if(strcmp(element.local, "multiJob") == 0)
{
wsgram_MultiJobDescriptionType * mjd;
input.choice_value.type = wsgram_CreateManagedJobInputType_multiJob;
mjd = &input.choice_value.value.multiJob;
wsgram_MultiJobDescriptionType_deserialize(&element, mjd, message, 0);
}
xsd_QName_destroy_contents(&element);
globus_soap_message_handle_destroy(message);
//2. Setting the security attributes
globus_soap_message_attr_t message_attr;
globus_soap_message_attr_init(&message_attr);
/*
* Set authentication mode to host authorization: other possibilities are
* GLOBUS_SOAP_MESSAGE_AUTHZ_HOST_IDENTITY or
* GLOBUS_SOAP_MESSAGE_AUTHZ_HOST_SELF.
*/
globus_soap_message_attr_set(
message_attr,
GLOBUS_SOAP_MESSAGE_AUTHZ_METHOD_KEY,
NULL,
NULL,
(void *) GLOBUS_SOAP_MESSAGE_AUTHZ_HOST);
/*
* Set message protection level. GLOBUS_SOAP_MESSAGE_AUTH_PROTECTION_PRIVACY
* for encryption.
*/
globus_soap_message_attr_set(
message_attr,
GLOBUS_SOAP_MESSAGE_AUTH_PROTECTION_KEY,
NULL,
NULL,
(void *) GLOBUS_SOAP_MESSAGE_AUTH_PROTECTION_PRIVACY);
//3. Creating the factory client handle
ManagedJobFactoryService_client_handle_t factory_handle;
result = ManagedJobFactoryService_client_init(
&factory_handle,
message_attr,
NULL);
//4. Querying for factory resource properties
/*
* localResourceManager, or other resource property names as defined in the
* WSDL
*/
xsd_QName property_name =
{
"http://www.globus.org/namespaces/2004/10/gram/job",
"localResourceManager"
};
wsrp_GetResourcePropertyResponseType * property_response;
int fault_type;
xsd_any * fault;
ManagedJobFactoryPortType_GetResourceProperty(
factory_handle,
endpoint,
&property_name,
&property_response,
(ManagedJobFactoryPortType_GetResourceProperty_fault_t *) &fault_type,
&fault);
//5. Creating the notification consumer
globus_service_engine_t engine;
wsa_EndpointReferenceType consumer_reference;
globus_service_engine_init(&engine, NULL, NULL, NULL, NULL, NULL);
globus_notification_create_consumer(
&consumer_reference,
engine,
notify_callback,
NULL);
//6. Creating the job resource
/*
* You can set input.InitialTerminationTime to be a timeout if interested.
* The xsd_dateTime type is a struct tm pointer.
*/
time_t term_time = time(NULL);
globus_uuid_t uuid;
wsa_AttributedURI * job_id;
wsa_EndpointReferenceType * factory_epr;
xsd_any * reference_property;
wsgram_CreateManagedJobOutputType * output = NULL;
xsd_QName factory_reference_id_qname =
{
"http://www.globus.org/namespaces/2004/10/gram/job",
"ResourceID"
};
term_time += 60 * 60; /* 1 hour later */
xsd_dateTime_copy(&input.InitialTerminationTime, gmtime(&term_time));
/*
* Set unique JobID. This is used to reliably create jobs and check for status.
*/
globus_uuid_create(&uuid);
wsa_AttributedURI_init(&job_id);
job_id->base_value = globus_common_create_string("uuid:%s", uuid.text);
/* Subscribe to notifications at create time */
wsnt_SubscribeType_init(&input.Subscribe);
wsa_EndpointReferenceType_copy_contents(
&input.Subscribe->ConsumerReference,
&consumer_reference);
xsd_any_init(&input.Subscribe->TopicExpression.any);
input.Subscribe->TopicExpression.any->any_info =
&xsd_QName_contents_info;
xsd_QName_copy(
(xsd_QName **) &input.Subscribe->TopicExpression.any->value,
&ManagedJobPortType_state_rp_qname);
xsd_anyURI_copy_cstr(
&input.Subscribe->TopicExpression._Dialect,
"http://docs.oasis-open.org/wsn/2004/06/TopicExpression/Simple");
xsd_boolean_init(&input.Subscribe->UseNotify);
*(input.Subscribe->UseNotify) = GLOBUS_TRUE;
/* Construct the EPR of the job factory */
wsa_EndpointReferenceType_init(&factory_epr);
wsa_AttributedURI_init_contents(&factory_epr->Address);
xsd_anyURI_init_contents_cstr(&factory_epr->Address.base_value,
globus_common_create_string(
"https://%s:%hu/wsrf/services/%s",
"192.168.0.1",
(unsigned short) 8443,
"ManagedJobFactoryService"));
wsa_ReferencePropertiesType_init_contents(&factory_epr->ReferenceProperties);
reference_property = xsd_any_array_push(
&factory_epr->ReferenceProperties.any);
reference_property->any_info = &xsd_string_info;
xsd_QName_copy(
&reference_property->element,
&factory_reference_id_qname);
xsd_string_copy_cstr(
(xsd_string **) &reference_property->value,
"Fork");
/* Submit the request to the service container */
ManagedJobFactoryPortType_createManagedJob_epr(
factory_handle,
factory_epr,
&input,
&output,
(ManagedJobFactoryPortType_createManagedJob_fault_t *) &fault_type,
&fault);
//7. Subscribing for job state notifications
ManagedJobService_client_handle_t job_handle;
wsnt_SubscribeType subscribe_input;
wsnt_SubscribeResponseType * subscribe_response;
wsnt_SubscribeType_init_contents(&subscribe_input);
ManagedJobService_client_init(
&job_handle,
message_attr,
NULL);
ManagedJobPortType_Subscribe_epr(
job_handle,
output->managedJobEndpoint,
&subscribe_input,
&subscribe_response,
(ManagedJobPortType_Subscribe_fault_t *) &fault_type,
&fault);
//8. Releasing any state holds (if necessary)
wsgram_ReleaseInputType release;
wsgram_ReleaseOutputType * release_response = NULL;
wsgram_ReleaseInputType_init_contents(&release);
ManagedJobPortType_release_epr(
job_handle,
output->managedJobEndpoint,
&release,
&release_response,
(ManagedJobPortType_release_fault_t *) &fault_type,
&fault);
//9. Destroying resources
/* destroy subscription resource */
SubscriptionManagerService_client_handle_t subscription_handle;
wsnt_DestroyType destroy;
wsnt_DestroyResponseType * destroy_response = NULL;
wsnt_DestroyType_init_contents(&destroy);
SubscriptionManagerService_client_init(
&subscription_handle,
message_attr,
NULL);
/* if subscription done at job creation time, use
* output->subscriptionEndpoint in place of
* subscribe_response->SubscriptionReference,
*/
SubscriptionManager_Destroy_epr(
subscription_handle,
subscribe_response->SubscriptionReference,
&destroy,
&destroy_response,
(SubscriptionManager_Destroy_fault_t *) &fault_type,
&fault);
/* destroy the job resource */
ManagedJobPortType_Destroy_epr(
job_handle,
output->managedJobEndpoint,
&destroy,
&destroy_response,
(ManagedJobPortType_Destroy_fault_t *) &fault_type,
&fault);
}
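Two of the values constructed above with Globus helpers, the factory endpoint address built via globus_common_create_string() and the InitialTerminationTime set one hour ahead, can be reproduced in plain standard C. The sketch below is illustrative only: make_factory_url and term_time_utc are hypothetical helper names (not part of the Globus API), and the host, port, and service name are the same placeholders used in the listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: builds the factory EPR address in the same
 * "https://host:port/wsrf/services/Service" form as the listing above. */
void make_factory_url(char *buf, size_t len,
                      const char *host, unsigned short port,
                      const char *service)
{
    snprintf(buf, len, "https://%s:%hu/wsrf/services/%s", host, port, service);
}

/* Hypothetical helper: returns the UTC broken-down time one hour after
 * 'now', mirroring the InitialTerminationTime step (term_time += 60 * 60). */
struct tm *term_time_utc(time_t now)
{
    now += 60 * 60; /* 1 hour later, as in the listing */
    return gmtime(&now);
}
```

For example, make_factory_url(buf, sizeof buf, "192.168.0.1", 8443, "ManagedJobFactoryService") yields https://192.168.0.1:8443/wsrf/services/ManagedJobFactoryService, the address written into the factory EPR above.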
References

AuScope. (2008). "AuScope." from http://www.auscope.org.au/.
Cafaro, D. A. (2007). "Migrating from WSRF to WSRT." Retrieved 20 May 2008, from http://www.ibm.com/developerworks/grid/library/gr-wsrfwsrt/.
CERN. (2008). "LHC Computing Grid." Retrieved 3 May 2008, from http://public.web.cern.ch/public/en/LHC/Computing-en.html.
CRCSI. (2007). "Delivering Precise Positioning Services in Regional Areas." from http://www.crcsi.com.au/pages/project.aspx?projectid
Ellisman, M. and S. Peltier (2004). Medical Data Federation: The Biomedical Informatics Research Network. In The Grid: Blueprint for a New Computing Infrastructure.
FACC. (2007). "NTRIP." from http://igs.bkg.bund.de/.
Feller, M., I. Foster and S. Martin (2007). GT4 GRAM: A Functionality and Performance Study. TeraGrid 2007 Conference, Madison, WI, USA.
Feng, Y. and B. Li (2008). An Overview of Three Carrier Ambiguity Resolutions: Problems, Models, Methods and Performance Analysis Using Semi-Generated Triple Frequency GPS Data. Proceedings of ION GNSS 2008, Savannah, Georgia.
Feng, Y. and C. Rizos (2005). Three Carrier Approaches for Future Global, Regional and Local GNSS Positioning Services: Concepts and Performance Perspectives. Proceedings of ION GNSS 2005, Long Beach, CA.
Feng, Y. and Y. Zheng (2005). "Efficient Interpolations to GPS Orbits for Precise Wide Area Applications." GPS Solutions 9(4): 273-282.
Foster, I. (2005). "A Globus Primer." Retrieved 3 May 2008, from http://www.globus.org/.
Foster, I. and C. Kesselman (2004). The Grid: Blueprint for a New Computing Infrastructure. 2nd ed., Elsevier.
Foster, I., C. Kesselman, G. Tsudik and S. Tuecke (1998). "A Security Architecture for Computational Grids." 5th ACM Conference on Computer and Communications Security.
Globus. (2008). "The Globus Alliance." Retrieved 3 May 2008, from http://www.globus.org/.
GPSnet. (2008). "GPSnet." from http://www.land.vic.gov.au/GPSnet.
Gurtner, W. and L. Estey (2007). "RINEX: The Receiver Independent Exchange Format Version 2.11." from ftp://ftp.unibe.ch/aiub/rinex/.
Higgins, M. (2008). Legal Traceability of GNSS Measurements in Australia. Integrating Generations, FIG Working Week 2008. Stockholm, Sweden.
Joseph, J. and C. Fellenstein (2004). Grid Computing. Upper Saddle River, NJ, Prentice Hall Professional Technical Reference.
JPPF. (2008). "Java Parallel Processing Framework." Retrieved 25 Dec 2008, from http://www.jppf.org/.
Kamath, C. (2001). "The Role of Parallel and Distributed Processing in Data Mining." IEEE Computer Society, Spring (Newsletter of the Technical Committee on Distributed Processing).
Lenz, E. (2004). Networked Transport of RTCM via Internet Protocol (NTRIP) – Application and Benefit in Modern Surveying Systems. FIG Working Week. Athens, Greece.
Leung, J. Y.-T. (2004). Handbook of Scheduling: Algorithms, Models, and Performance. London, Chapman & Hall/CRC.
Lim, S. and C. Rizos (2007). A New Framework for Server-Based and Thin-Client GNSS Operations for High Accuracy Applications in Surveying and Navigation. ION GNSS 20th International Technical Meeting of the Satellite Division. Fort Worth, TX, USA.
Lim, S. and C. Rizos (2008). System Architecture for Server-Based Network-RTK Using Multiple GNSS.
Loo, A. W.-S. (2003). "The Future of Peer-to-Peer Computing." Communications of the ACM 46(9): 57-61.
Loo, A. W.-S. (2007). Peer-to-Peer Computing: Building Supercomputers with Web Technologies. First Edition, Springer.
Misra, P. and P. Enge (2001). Global Positioning System: Signals, Measurements, and Performance. First Edition. Lincoln, Massachusetts, USA, Ganga-Jamuna Press.
Misra, P. and P. Enge (2006). Global Positioning System: Signals, Measurements, and Performance. Second Edition. Lincoln, Massachusetts, USA, Ganga-Jamuna Press.
NGS. (2006). "What Is CORS?" from http://www.ngs.noaa.gov/CORS/cors-data.html.
Fraser, R., T. Rankine and R. Woodcock (2007). "Service Oriented Grid Architecture for Geosciences Community." Proceedings of the Fifth Australasian Symposium on ACSW Frontiers 68 (Fifth Australasian Symposium on Grid Computing and e-Research (AusGrid 2007)).
Raicu, I., Y. Zhao, C. Dumitrescu, I. Foster and M. Wilde (2007). Falkon: A Fast and Light-weight tasK executiON Framework. SC07. Reno, Nevada, USA.
Retscher, G. (2002). "Accuracy Performance of Virtual Reference Station (VRS) Networks." Journal of Global Positioning Systems 1.
Rizos, C. (2003). "Network RTK Research and Implementation – A Geodetic Perspective." Journal of Global Positioning Systems 1(2): 144-150.
Rizos, C. (2007). The International GNSS Service: In the Service of Geoscience and the Geospatial Industry. International Global Navigation Satellite Systems Society IGNSS Symposium 2007. The University of New South Wales, Sydney, Australia.
RTCM. (2007). "The Radio Technical Commission for Maritime Services." from http://www.rtcm.org/.
Sotomayor, B. and L. Childers (2006). Globus Toolkit 4: Programming Java Services. 1st ed., Elsevier.
SydNET. (2008). "SydNET." from http://sydnet.lands.nsw.gov.au/images/MetroNETCoverage.jpg.
UTA. (2008). "GPS Toolkit." from http://www.gpstk.org/bin/view/Documentation/WebHo
Wanninger, L. (2003). "Virtual Reference Stations (VRS)." GPS on the Web.
Wanninger, L. (2006). "Introduction to Network RTK." from http://www.network-rtk.info/intro/introduction.html.
Weber, G., D. D. and H. Gebhard (2005). Networked Transport of RTCM via Internet Protocol (NTRIP) – IP-Streaming for Real-Time GNSS Applications. ION GNSS 18th International Technical Meeting of the Satellite Division. Long Beach, CA.
Wikipedia. (2007). "Global Positioning System." from http://en.wikipedia.org/wiki/Global_Positioning_System.
Wikipedia. (2008). "Kerberos (protocol)." from http://en.wikipedia.org/wiki/Kerberos_%28protocol%