Major Application Areas in Cyberspace
Joel Crichlow, Ph.D.
Areas
◦ Distributed File Systems
◦ Distributed Database Systems
◦ Distributed Computation Systems
◦ Distributed Real-Time Systems
◦ Distributed Multimedia Systems
◦ Distributed Operating Systems
Distributed File Systems
Structure
◦ Client-Server
◦ Peer-to-Peer
Issues
◦ Unit of Access
  ◦ File
  ◦ Page/Block
  ◦ Record
  ◦ Word/Byte
Distributed File Systems
Issues
◦ Division of Labor
  ◦ Clients maintain own file system
  ◦ Server maintains a global file system
  ◦ All file commands are channeled to the server
  ◦ Use mounting to combine local file systems with global file system
Distributed File Systems
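Client maintains file system
Maps local textual names onto global FIDs
Figure: client and server tables
Client (users 1, 2 and 3), name table:
  Filename   FID
  Friends    100179
  Foes       428761
Server, FID table:
  FID        Page map entry
  100179     (points to the file map below)
File map (page map for FID 100179):
  Page   Block
  0      4
  1      7
  2      3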
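A minimal Python sketch of the lookup in the figure, assuming page-level access. The FIDs and page/block numbers come from the figure; the names `client_directory`, `server_file_maps` and `resolve` are hypothetical. The figure shows a file map only for FID 100179, so a request for `Foes` would fail here with a `KeyError`.

```python
# Client side: local textual name -> global file identifier (FID)
client_directory = {
    "Friends": 100179,
    "Foes": 428761,
}

# Server side: FID -> file map (logical page number -> physical block number)
server_file_maps = {
    100179: {0: 4, 1: 7, 2: 3},
}


def resolve(name: str, page: int) -> int:
    """Translate a (filename, page) request into a physical block number."""
    fid = client_directory[name]       # step 1: textual name -> FID, done at the client
    file_map = server_file_maps[fid]   # step 2: FID -> file map, done at the server
    return file_map[page]              # step 3: logical page -> physical block


print(resolve("Friends", 1))  # 7
```

Keeping the textual names at the client and only FIDs at the server is what lets each client choose its own names for the same global file.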
Distributed File Systems
◦ Clients maintain own file system
◦ Global file naming is done at the server level
◦ If the file server provides automatic backup and recovery facilities, then files can be classified as recoverable, robust or ordinary
◦ The unit of access available to the client determines how much data must be stored at the server to map the client’s logical request onto the physical address
Distributed File Systems
Use mounting to combine local file systems with global file system
Server 1 has a directory ‘play’; server 2 has a directory ‘work’. Client 1 places ‘play’ and ‘work’ at the same level; client 2 places ‘work’ in a sub-directory of ‘play’.
Figure: the two server directories and the two client views
Server 1: play, containing playhard and playeasy
Server 2: work, containing workhard and workeasy
Client 1: play and work side by side at the same level
Client 2: play, containing playhard and playeasy, with work (containing workhard and workeasy) mounted in a sub-directory of play
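A mount table can be modelled as a map from client path prefixes to (server, remote directory) pairs, resolved by the longest matching prefix. This is a hypothetical sketch: the table layout and `resolve_mount` are illustrative, and because the text only says that client 2 places ‘work’ in a sub-directory of ‘play’, the choice of `playhard` as that sub-directory is an assumption.

```python
# Client path prefix -> (server, directory on that server)
CLIENT1_MOUNTS = {
    "/play": ("server1", "/play"),
    "/work": ("server2", "/work"),
}
# Assumption: client 2 mounts 'work' under 'playhard'.
CLIENT2_MOUNTS = {
    "/play": ("server1", "/play"),
    "/play/playhard/work": ("server2", "/work"),
}


def resolve_mount(mounts: dict[str, tuple[str, str]], path: str) -> tuple[str, str]:
    """Map a client path to (server, server-side path) via the longest matching mount point."""
    best = None
    for prefix in mounts:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise FileNotFoundError(path)
    server, remote_dir = mounts[best]
    return server, remote_dir + path[len(best):]
```

Both clients reach the same server files, but under different local path names, which is exactly what mounting allows.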
Google File System (GFS)
The latest version is called Colossus.
Two of the key issues addressed by the designers were:
(a) the frequency of component failures;
(b) the management of very large data sets.
Google File System (GFS)
GFS runs on thousands of storage machines built from inexpensive commodity parts, and it is accessed by a comparable number of client machines.
Failure is viewed as the norm rather than the exception.
The system must constantly monitor itself to detect, tolerate and recover from failures.
Google File System (GFS)
The system supports millions of files of any size, but multi-GB files are common. Many of the accesses to these files are large streaming reads, in which individual operations commonly read 1 MB or more.
Google File System (GFS)
There are many large sequential writes that append multiple KB to MB of data to files.
Multiple clients can append atomically to the same file concurrently.
There are also small reads of a few KB at any offset and small writes to arbitrary positions in a file.
Google File System (GFS)
The GFS architecture comprises a single master, multiple chunkservers and multiple clients. Files are divided into fixed-size blocks called chunks; the current chunk size is 64 MB.
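Because chunks have a fixed size, a client can work out which chunk holds a given byte offset with integer arithmetic alone. A minimal sketch, assuming the 64 MB chunk size above; `chunk_index` and `chunks_for_range` are hypothetical names, not part of any GFS interface.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated above


def chunk_index(offset: int) -> int:
    """Index of the chunk that holds the given byte offset."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> list[tuple[int, int, int]]:
    """Split a byte range into (chunk index, offset within chunk, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        within = offset % CHUNK_SIZE
        count = min(CHUNK_SIZE - within, end - offset)  # stop at the chunk boundary
        pieces.append((chunk_index(offset), within, count))
        offset += count
    return pieces
```

A read that crosses a chunk boundary becomes one piece per chunk, each of which may be served by a different chunkserver.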
Google File System (GFS)
The master keeps itself informed of the current state of the system by periodically sending Heartbeat messages to each chunkserver.
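A rough sketch of how a master could turn periodic Heartbeat replies into failure detection: it records when each chunkserver last answered and suspects any server that has been silent for longer than a timeout. The class name and the 30-second default timeout are assumptions for illustration, not values taken from GFS.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Hypothetical sketch of chunkserver liveness tracking at the master."""

    def __init__(self, timeout_s: float = 30.0) -> None:  # assumed timeout value
        self.timeout_s = timeout_s
        self.last_reply: dict[str, float] = {}

    def record_reply(self, server: str, now: Optional[float] = None) -> None:
        """Called when a chunkserver answers a Heartbeat message."""
        self.last_reply[server] = time.monotonic() if now is None else now

    def suspected_failed(self, now: Optional[float] = None) -> list[str]:
        """Chunkservers that have not answered within the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_reply.items() if now - t > self.timeout_s)
```

The optional `now` argument makes the logic testable without waiting; in normal use it defaults to the monotonic clock.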
The GFS client provides the interface for applications to use the file system.
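The client's role can be sketched as follows, in the spirit of the published GFS design: it converts a byte offset into a chunk index, asks the master for that chunk's handle and replica locations, and caches the reply so that later accesses to the same chunk do not involve the master. This is a hypothetical sketch; `FakeMaster`, `find_chunk`, `Client` and `locate` are illustrative names, and the actual data transfer from a chunkserver is left out.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks

ChunkInfo = tuple[str, list[str]]  # (chunk handle, replica chunkserver addresses)


class FakeMaster:
    """Stand-in for the master: answers (file name, chunk index) lookups."""

    def __init__(self, table: dict[tuple[str, int], ChunkInfo]) -> None:
        self.table = table
        self.lookups = 0  # how often clients had to contact the master

    def find_chunk(self, filename: str, index: int) -> ChunkInfo:
        self.lookups += 1
        return self.table[(filename, index)]


class Client:
    """Translates (file, byte offset) into a chunk location, caching master replies."""

    def __init__(self, master: FakeMaster) -> None:
        self.master = master
        self.cache: dict[tuple[str, int], ChunkInfo] = {}

    def locate(self, filename: str, offset: int) -> tuple[str, list[str], int]:
        key = (filename, offset // CHUNK_SIZE)
        if key not in self.cache:  # only ask the master on a cache miss
            self.cache[key] = self.master.find_chunk(*key)
        handle, replicas = self.cache[key]
        # The data itself would then be read directly from one of the replicas.
        return handle, replicas, offset % CHUNK_SIZE
```

Keeping the master out of the data path in this way is what lets a single master serve many clients.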
Distributed Database Systems
Distribution Problem and Pattern
◦ Volume and Activity
◦ Number of Participating Hosts
◦ Storage Facilities
◦ Communication Load
◦ Replication and Partitioning
Distributed Database Systems
Queries and Updates
Phases
Query phases
◦ Copy identification
◦ Query decomposition
◦ Response composition
Update phases
◦ Copy identification
◦ Pessimistic/Optimistic approach
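One common reading of the two update approaches, sketched for a data item with several copies: the pessimistic version locks every copy before reading and holds the locks until all copies are written, while the optimistic version computes without locks and then checks, at commit time, that no copy's version number has changed. This is an illustrative sketch with hypothetical names (`Copy`, `pessimistic_update`, `optimistic_update`); a real distributed database would lock and validate copies at remote sites through messages rather than with in-process locks.

```python
import threading
from typing import Callable


class Copy:
    """One copy of a replicated data item."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


def _lock_all(copies: list[Copy]) -> None:
    for c in copies:  # always lock in list order to avoid deadlock between updaters
        c.lock.acquire()


def _unlock_all(copies: list[Copy]) -> None:
    for c in reversed(copies):
        c.lock.release()


def pessimistic_update(copies: list[Copy], f: Callable[[int], int]) -> None:
    """Lock every copy first; hold the locks while computing and writing."""
    _lock_all(copies)
    try:
        new_value = f(copies[0].value)
        for c in copies:
            c.value, c.version = new_value, c.version + 1
    finally:
        _unlock_all(copies)


def optimistic_update(copies: list[Copy], f: Callable[[int], int]) -> bool:
    """Compute without locks, then validate versions and commit; False means a conflict."""
    seen_version = copies[0].version
    new_value = f(copies[0].value)  # no locks held during the computation
    _lock_all(copies)
    try:
        if any(c.version != seen_version for c in copies):
            return False  # another update got in first: the caller re-reads and retries
        for c in copies:
            c.value, c.version = new_value, c.version + 1
        return True
    finally:
        _unlock_all(copies)
```

The pessimistic version never has to retry but blocks other updaters for the whole update; the optimistic version holds locks only briefly and pays for that with occasional aborts.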
Distributed Database Systems
Queries
Example query: What are the names of suppliers in NY who supply screws at a unit price of less than $1.00?
Supplier relation
  S#    Name   City
  100   JOHN   POS
  200   DOE    NY
Unit price relation
  S#    P#     Price
  100   1011   $0.50
  100   1300   $1.50
  200   1123   $0.60
  200   1246   $0.70
Parts relation
  P#     Pname   Quant.
  1011   Bolt    400
  1123   Nut     400
  1246   Screw   600
  1300   Nail    500
Dictionary
  Relat.   Locat.   #Tups   T-size
  Sup.     Site 1   800     10
  Part     Site 2   1500    10
  Price    Site 3   10000   3
Figure: Site 1 (Supplier relation), Site 2 (Parts relation) and Site 3 (Unit price relation), with the site at which the query is made.
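The example query can be evaluated in the query phases listed earlier. In this sketch, which uses only the sample tuples from the tables (the dictionary indicates that the full relations are much larger), each site first restricts its own relation (query decomposition) and the query site then joins the small results (response composition). The function name is hypothetical.

```python
# Sample tuples from the tables above.
SUPPLIER = [(100, "JOHN", "POS"), (200, "DOE", "NY")]           # Site 1: (S#, Name, City)
PARTS = [(1011, "Bolt", 400), (1123, "Nut", 400),
         (1246, "Screw", 600), (1300, "Nail", 500)]              # Site 2: (P#, Pname, Quant.)
UNIT_PRICE = [(100, 1011, 0.50), (100, 1300, 1.50),
              (200, 1123, 0.60), (200, 1246, 0.70)]              # Site 3: (S#, P#, Price)


def answer_query() -> list[str]:
    # Query decomposition: each site restricts its own relation locally.
    ny_suppliers = {s: name for s, name, city in SUPPLIER if city == "NY"}   # Site 1
    screws = {p for p, pname, _ in PARTS if pname == "Screw"}               # Site 2
    cheap = [(s, p) for s, p, price in UNIT_PRICE if price < 1.00]          # Site 3
    # Response composition: join the reduced results at the query site.
    return sorted({ny_suppliers[s] for s, p in cheap if s in ny_suppliers and p in screws})


print(answer_query())  # ['DOE']
```

Filtering at each site before shipping anything means only a few tuples cross the network, instead of whole relations of up to 10000 tuples.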
Distributed Database Systems
Updates
◦ Integrity
◦ Concurrency
◦ Replication
Big Data management
◦ Handles very large amounts of data distributed over many servers
◦ Highly available service with no single point of failure
◦ Key-value store
◦ Different levels of consistency
◦ Automatic replication of data to multiple nodes
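The list above names "different levels of consistency" and "automatic replication" without saying how they are achieved. One widely used technique, assumed here rather than taken from the text, is quorum replication as in Dynamo-style key-value stores: each key has N replicas, a write waits for W of them and a read consults R of them. The sketch is hypothetical and deliberately simulates the worst case (writes reach the first W replicas, reads reach the last R); real systems also propagate writes to the remaining replicas in the background, which is omitted.

```python
class QuorumStore:
    """Hypothetical sketch: N replicas, writes acknowledged by W, reads consult R."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("need 1 <= R <= N and 1 <= W <= N")
        self.replicas: list[dict[str, tuple[int, str]]] = [{} for _ in range(n)]
        self.r, self.w = r, w
        self.next_version = 1

    def put(self, key: str, value: str) -> None:
        # Worst case: only the first W replicas receive the write in time.
        for replica in self.replicas[: self.w]:
            replica[key] = (self.next_version, value)
        self.next_version += 1

    def get(self, key: str):
        # Worst case: the read reaches the last R replicas; the newest version wins.
        answers = [rep[key] for rep in self.replicas[-self.r:] if key in rep]
        return max(answers, key=lambda a: a[0])[1] if answers else None

    def is_strongly_consistent(self) -> bool:
        # Read and write quorums are guaranteed to overlap when R + W > N.
        return self.r + self.w > len(self.replicas)
```

With N = 3, choosing R = W = 2 always returns the latest write, while R = W = 1 is faster but can return stale or missing data, which is the trade-off behind offering several consistency levels.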
Google BigTable
Google’s NoSQL distributed data management system.
BigTable is a sparse map, or (key, value) store, distributed over multiple servers.
It is designed to scale to clusters comprising thousands of commodity servers storing petabytes of data.
Google BigTable
The data or values stored in BigTable are treated as uninterpreted strings. The BigTable key is three-dimensional: it contains a row key, a column key and a timestamp. The mapping therefore takes the form:
(row key, column key, timestamp) → value
Figure: a BigTable row, with its columns grouped into column families (Col Family 0, 1 and 2) and each cell versioned by timestamp.
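A minimal Python model of the three-part key, assuming column keys written as `family:qualifier` (the convention in the BigTable paper) and integer timestamps. `SparseTable` and its methods are hypothetical, the row `com.cnn.www` is only an illustrative key, and the partitioning of rows across servers is not modelled. Values are kept as uninterpreted byte strings, matching the text.

```python
from collections import defaultdict
from typing import Optional


class SparseTable:
    """Hypothetical model of the map (row key, column key, timestamp) -> value."""

    def __init__(self) -> None:
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self.rows: defaultdict = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self.rows[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Newest value at or before `timestamp` (newest overall if timestamp is None)."""
        versions = self.rows.get(row, {}).get(column, {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("com.cnn.www", "contents:", 3, b"<html>...v3")
table.put("com.cnn.www", "contents:", 5, b"<html>...v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
```

Only cells that have been written take up space, which is what makes the map sparse: a row may use a handful of columns out of a very large column space.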
Distributed Computation Systems
Networked computers cooperate in the execution of a computationally intensive program.
◦ The Network Platform
◦ Algorithm Design and Implementation
◦ Languages, Standards and Tools
Distributed Computation Systems
The Network Platform
◦ Cluster Computing
◦ The Internet
◦ The Lambdagrid
Algorithm Design and Implementation
◦ Control parallelism
◦ Data parallelism
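Control and data parallelism differ in what is split across workers. In this sketch, data parallelism runs the same function on different partitions of the data, and control parallelism runs different functions at the same time. It uses Python's `concurrent.futures` threads only to show the structure; the function names are hypothetical, and CPU-bound work in CPython would normally use `ProcessPoolExecutor` or separate machines to gain real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk: list[int]) -> int:
    return sum(x * x for x in chunk)


def data_parallel_sum_of_squares(data: list[int], workers: int = 4) -> int:
    # Data parallelism: the SAME operation runs on different partitions of the data.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


def control_parallel_stats(data: list[int]) -> tuple[int, int, int]:
    # Control parallelism: DIFFERENT operations run concurrently on the same data.
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_min = pool.submit(min, data)
        f_max = pool.submit(max, data)
        f_sum = pool.submit(sum, data)
        return f_min.result(), f_max.result(), f_sum.result()
```

Data parallelism scales with the size of the data, while control parallelism is limited by the number of distinct tasks the program contains.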
Distributed Computation Systems
Languages, Standards and Tools
◦ PVM
◦ MPI
◦ DCE
◦ CORBA
◦ Globus Toolkit
Distributed Computation
Tasks (T) interact with each other in a PVM running context. PVM uses network protocols (N) for communication among the computers.
Figure: tasks (T) on several computers, each computer running PVM over network protocols (N).
Distributed applications use MIDDLEWARE tools to interoperate over a network of heterogeneous computers.
Figure: on each computer, distributed applications run above a MIDDLEWARE layer, which runs above the host OS and network service; the computers are connected by a network.
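The task interaction described above follows a message-passing pattern. This sketch imitates it on one machine with threads and `queue.Queue` inboxes: a master task deals out work messages and collects tagged results. It is hypothetical and is not PVM's API; with PVM or MPI the tasks would be separate processes on different computers, and the library would carry the messages over the network.

```python
import queue
import threading

WORK, RESULT = 1, 2  # hypothetical message tags


class Mailboxes:
    """Hypothetical message-passing layer: one inbox per task id."""

    def __init__(self, n_tasks: int) -> None:
        self.inboxes = [queue.Queue() for _ in range(n_tasks)]

    def send(self, dest: int, sender: int, tag: int, payload) -> None:
        self.inboxes[dest].put((sender, tag, payload))

    def recv(self, me: int):
        return self.inboxes[me].get()  # blocks until a message arrives


def worker(mb: Mailboxes, me: int) -> None:
    sender, tag, numbers = mb.recv(me)
    if tag == WORK:
        mb.send(sender, me, RESULT, sum(numbers))


def master(data: list[int], n_workers: int = 3) -> int:
    mb = Mailboxes(n_workers + 1)  # task 0 is the master, tasks 1..n are workers
    tasks = [threading.Thread(target=worker, args=(mb, i)) for i in range(1, n_workers + 1)]
    for t in tasks:
        t.start()
    for i in range(1, n_workers + 1):
        mb.send(i, 0, WORK, data[i - 1::n_workers])  # deal out the data round-robin
    total = 0
    for _ in range(n_workers):
        _, tag, partial = mb.recv(0)
        if tag == RESULT:
            total += partial
    for t in tasks:
        t.join()
    return total
```

The tasks share nothing but messages, which is why the same structure carries over to computers that communicate only through a network.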
XSEDE
The Extreme Science and Engineering Discovery Environment (XSEDE) tightly integrates supercomputing resources, storage and scientific instruments across geographically dispersed major research centers.
The interconnection network includes a backbone of hubs with an inter-hub transmission capacity of 40 Gbps.
Border routers, which are the interfaces between the grid and the sites, are linked to the hubs.
Each site has up to 10 Gbps of dedicated transmission capacity.
XSEDE
The XSEDE interconnection network is hierarchical.
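The capacities above can be turned into rough transfer times. This sketch assumes ideal links (no protocol overhead or contention) and decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bit/s); the function name is hypothetical.

```python
def transfer_seconds(num_bytes: float, link_gbps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and contention."""
    return num_bytes * 8 / (link_gbps * 1e9)


SITE_LINK_GBPS = 10  # dedicated capacity per site, from the text
HUB_LINK_GBPS = 40   # inter-hub capacity, from the text

print(transfer_seconds(1e12, SITE_LINK_GBPS))  # 1 TB over a site link: 800.0 s
print(HUB_LINK_GBPS // SITE_LINK_GBPS)         # sites that can fill one inter-hub link: 4
```

The ratio shows why the hierarchy works: an inter-hub link can carry the full dedicated rate of four sites at once.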
Distributed Real-Time Systems
◦ Environment
◦ Geographic Range
◦ Communication Traffic
◦ Computer Processing
Distributed Real-Time Systems
Computer Processing
Distributed real-time processing may be hierarchical, involving a low-level network of sensors feeding data to data-aggregation nodes, which in turn feed high-level servers.
Figure: servers at the top, a data-aggregation network in the middle and a network of sensors at the bottom.
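The point of the aggregation layer is that it forwards compact summaries rather than every raw reading. A minimal sketch, assuming each aggregation node summarizes its sensors' readings as count, minimum, maximum and mean, and a server merges those summaries; the summary fields and function names are hypothetical choices, not taken from the text.

```python
from statistics import mean


def summarize(readings: list[float]) -> dict[str, float]:
    """What an aggregation node forwards upward instead of the raw readings."""
    return {"count": len(readings), "min": min(readings),
            "max": max(readings), "mean": mean(readings)}


def merge(summaries: list[dict[str, float]]) -> dict[str, float]:
    """How a server combines summaries from several aggregation nodes."""
    count = sum(s["count"] for s in summaries)
    return {
        "count": count,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        # weight each node's mean by the number of readings it covers
        "mean": sum(s["mean"] * s["count"] for s in summaries) / count,
    }
```

Each level sends a fixed-size summary upward, so traffic to the servers does not grow with the number of sensors.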
Distributed Multimedia Systems
The Signals
◦ Audio: CD-quality stereo audio requires 1.411 Mbps (uncompressed).
◦ Video: discrete images are flashed at a rate of 50 or more images per second.
◦ The images in video can be represented as a sequence of frames (a frame is a rectangular grid of pixels).
◦ Twenty-four bits per pixel with a frame of 1024 × 768 pixels is illustrative of present high-resolution technology.
◦ Transmitting at 25 frames per second would require a transmission capacity in excess of 400 Mbps (1024 × 768 × 24 × 25 ≈ 472 Mbps).
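The bit rates above follow from simple products. The CD parameters used here (44.1 kHz sampling, 16 bits per sample, two channels) are the standard CD audio format and are not stated in the text; the video figures are those given above. The function names are hypothetical.

```python
def audio_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    return sample_rate_hz * bits_per_sample * channels


def video_bps(width: int, height: int, bits_per_pixel: int, frames_per_second: int) -> int:
    return width * height * bits_per_pixel * frames_per_second


print(audio_bps(44_100, 16, 2))      # 1411200, about 1.411 Mbps
print(video_bps(1024, 768, 24, 25))  # 471859200, about 472 Mbps
```

The uncompressed video stream is more than 300 times the audio rate, which is why video dominates the design of multimedia networks and storage.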
Peer-to-Peer Multimedia Systems
Media On Demand (MOD)
◦ Video on Demand (VOD) or On Demand (OD)
Distributed Multimedia
Media On Demand (MOD)
A MOD server maintains a digital repository of videos that home users can access over communication networks and view immediately.
Figure: a MOD server connected through a network to several homes.
Distributed Multimedia
Media On Demand (MOD)
Massive storage must be arranged as a hierarchical structure.
Figure: client viewers connected over a network to servers; each server has a storage hierarchy of RAM, magnetic disk, optical disk and magnetic tape.
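A storage hierarchy is usually paired with caching: popular titles are served from RAM, and others are fetched from the slower tiers and promoted. This sketch assumes a least-recently-used (LRU) RAM cache in front of magnetic disk, optical disk and magnetic tape; the class, the eviction policy and the capacities are hypothetical, and a real MOD server would stream blocks rather than whole titles.

```python
from collections import OrderedDict


class TieredStore:
    """Hypothetical sketch: an LRU cache in RAM in front of slower storage tiers."""

    def __init__(self, ram_capacity: int, tiers: list[tuple[str, dict[str, bytes]]]) -> None:
        self.ram: "OrderedDict[str, bytes]" = OrderedDict()  # title -> data, oldest first
        self.ram_capacity = ram_capacity
        self.tiers = tiers  # (tier name, contents), fastest tier first

    def fetch(self, title: str) -> tuple[str, bytes]:
        """Return (where the title was found, its data), promoting it into RAM."""
        if title in self.ram:
            self.ram.move_to_end(title)  # mark as most recently used
            return "RAM", self.ram[title]
        for name, store in self.tiers:
            if title in store:
                data = store[title]
                self.ram[title] = data
                if len(self.ram) > self.ram_capacity:
                    self.ram.popitem(last=False)  # evict the least recently used title
                return name, data
        raise KeyError(title)
```

The slow, cheap tiers hold the bulk of the repository, while the small, fast tier absorbs repeated requests for the same titles.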
Distributed Operating Systems
◦ Network Operating System
◦ Distributed Operating System
◦ Issues
◦ Threads
Network Operating System
Network operating system of agents and different local operating systems
Figure: agents, each running on its own local OS, connected by a network.
Distributed Operating Systems
Homogeneous network-wide operating system
Issues
◦ Fundamental OS problems
◦ Data integrity
◦ Fail-soft operation
◦ Security
◦ Performance
◦ Scalability
Threads
Windows NT family
The Windows NT family comprises a series of releases of operating systems that support distributed system applications.
The NT architecture comprises a number of layers:
◦ Hardware Abstraction Layer (HAL)
◦ Kernel
◦ Executive
◦ Subsystems
Windows NT family Architecture
Mach microkernel
Mach is a distributed operating system project whose kernel has been used in several Unix-like operating systems and in Mac OS X.
Conclusion
We looked at:
◦ Distributed File Systems
◦ Distributed Database Systems
◦ Distributed Computation Systems
◦ Distributed Real-Time Systems
◦ Distributed Multimedia Systems
◦ Distributed Operating Systems