Major Application Areas in Cyberspace Joel Crichlow, Ph.D.

Post on 22-Dec-2015

Major Application Areas in Cyberspace

Joel Crichlow, Ph.D

Areas
◦ Distributed File Systems
◦ Distributed Database Systems
◦ Distributed Computation Systems
◦ Distributed Real-Time Systems
◦ Distributed Multimedia Systems
◦ Distributed Operating Systems

Distributed File Systems

Structure
◦ Client-Server
◦ Peer-to-Peer

Issues
◦ Unit of Access
  ◦ File
  ◦ Page/Block
  ◦ Record
  ◦ Word/Byte

Distributed File Systems

Issues
◦ Division of Labor
  ◦ Clients maintain own file system
  ◦ Server maintains a global file system
  ◦ All file commands are channeled to the server
  ◦ Use mounting to combine local file systems with global file system

Distributed File Systems

Client maintains file system

Maps local textual names onto global FIDs

Client (one directory per user: User 1, User 2, User 3)

Filename   FID
Friends    100179
Foes       428761

Server

FID        Page map entry
100179     points to the file map below

File map

Page   Block
0      4
1      7
2      3

Distributed File Systems
◦ Clients maintain own file system
◦ Global file naming is done at the server level
◦ If the file server provides automatic backup and recovery facilities, then files can be classified as recoverable, robust or ordinary
◦ The unit of access available to the client determines how much data is stored at the server for mapping the client’s logical request onto the physical address
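The two-step mapping (local name to FID at the client, then page to block at the server) can be sketched in Python. This is only an illustration: the dictionary names and the `resolve` function are hypothetical, and the page-to-block values are read from the file map on the slide.

```python
# Client side: a user's local textual names map onto global file IDs (FIDs).
client_names = {"Friends": 100179, "Foes": 428761}

# Server side: each FID has a page map from logical page to physical block.
# Only FID 100179 has a page map shown on the slide.
server_page_maps = {100179: {0: 4, 1: 7, 2: 3}}


def resolve(filename, page):
    """Translate (local filename, page number) into (FID, physical block)."""
    fid = client_names[filename]          # lookup done by the client
    block = server_page_maps[fid][page]   # lookup done by the server
    return fid, block


print(resolve("Friends", 1))  # (100179, 7)
```

With a page as the unit of access, the server has to keep one map entry per page; a coarser unit such as the whole file would need less mapping data at the server.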

Distributed File Systems

Use mounting to combine local file systems with global file system

Server 1 has a directory ‘play’; server 2 has a directory ‘work’. Client 1 places ‘play’ and ‘work’ at the same level; client 2 places ‘work’ in a sub-directory of ‘play’.

Server 1:
play
  playhard
  playeasy

Server 2:
work
  workhard
  workeasy

Client 1:
work
  workhard
  workeasy
play
  playhard
  playeasy

Client 2:
play
  playhard
  playeasy
  work
    workhard
    workeasy
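A minimal sketch of how the two clients could resolve paths through their own mount tables is given here. The mount-table layout, the `server1:play` style labels and the `resolve` function are assumptions made for illustration, not part of any particular file system.

```python
# Each client chooses its own mount points for the same two server directories.
CLIENT1_MOUNTS = {"/play": "server1:play", "/work": "server2:work"}
CLIENT2_MOUNTS = {"/play": "server1:play", "/play/work": "server2:work"}


def resolve(mounts, path):
    """Return (server directory, remaining path) via the longest matching mount."""
    best = None
    for mount_point in mounts:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise FileNotFoundError(path)
    return mounts[best], path[len(best):].lstrip("/")


print(resolve(CLIENT1_MOUNTS, "/work/workhard"))       # ('server2:work', 'workhard')
print(resolve(CLIENT2_MOUNTS, "/play/work/workhard"))  # ('server2:work', 'workhard')
print(resolve(CLIENT2_MOUNTS, "/play/playeasy"))       # ('server1:play', 'playeasy')
```

The same remote file is reached through a different local path on each client, which is the point of the example: mounting gives every client its own view of the global file system.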

Google File System (GFS)
◦ The latest version is called Colossus.
◦ Two of the key issues addressed by the designers were:
  (a) the frequency of component failures
  (b) the management of very large data sets
◦ GFS runs on thousands of storage machines built from inexpensive commodity parts, and it is accessed by an equivalent number of client machines.
◦ Failure is viewed as the norm rather than the exception.
◦ The system must constantly monitor itself to detect, tolerate and recover from failure.
◦ The system supports millions of files of any size, but multi-GB files are common.
◦ Many of the accesses to these files are large streaming reads of 1 MB or more.
◦ There are many large sequential writes that append multiple KB to MB of data to files.
◦ Multiple clients can append atomically to the same file concurrently.
◦ There are also small reads of a few KB at any offset and small writes to arbitrary positions in a file.
◦ The GFS architecture comprises a single master, multiple chunkservers and multiple clients.
◦ Files are divided into fixed-size blocks called chunks of 64 MB (current size).
◦ The master keeps informed of the current state of the system by exchanging HeartBeat messages periodically with each chunkserver.
◦ The GFS client provides the interface for applications to use the file system.
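Because chunks have a fixed size, a byte range in a file maps onto chunk indexes by integer division. The sketch below shows that arithmetic only; `chunk_span` is a hypothetical helper, not part of the GFS client interface.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size quoted in the slides


def chunk_span(offset, length):
    """Return the chunk indexes touched by `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))


MB = 1024 * 1024
print(chunk_span(0, 1 * MB))           # [0]
print(chunk_span(64 * MB - 10, 20))    # [0, 1]  (read straddles a chunk boundary)
print(chunk_span(200 * MB, 1))         # [3]
```

A 1 MB streaming read usually falls inside a single chunk, so a client needs to ask the master for chunk locations far less often than it reads data.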

Distributed Database Systems

Distribution Problem and Pattern
◦ Volume and Activity
◦ Number of Participating Hosts
◦ Storage Facilities
◦ Communication Load
◦ Replication and Partitioning

Distributed Database Systems

Queries and Updates

Phases

Query phases
◦ Copy identification phase
◦ Query decomposition
◦ Response composition

Update phases
◦ Copy Identification
◦ Pessimistic/Optimistic approach

Distributed Database Systems

Queries

What are the names of suppliers in NY who supply screws at a unit price of less than $1.00?

Supplier relation

S#    Name   City
100   JOHN   POS
200   DOE    NY

Unit price relation

S#    P#     Price
100   1011   $0.50
100   1300   $1.50
200   1123   $0.60
200   1246   $0.70

Parts relation

P#     Pname   Quant.
1011   Bolt    400
1123   Nut     400
1246   Screw   600
1300   Nail    500

Dictionary

Relat.   Locat.   #Tups   T-size
Sup.     Site 1   800     10
Part     Site 2   1500    10
Price    Site 3   10000   3

Figure: three sites connected by a network. Site 1 holds the Supplier relation, Site 2 the Parts relation and Site 3 the Unit price relation. One site is marked "Query is made here".
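The query can be worked through by decomposing it into one sub-query per site and composing the partial results. The Python sketch below uses the sample tuples from the three relations; the order in which the sites are visited is an assumption, and a real system would choose it using the sizes recorded in the dictionary.

```python
suppliers = [  # Site 1: (S#, Name, City)
    (100, "JOHN", "POS"),
    (200, "DOE", "NY"),
]
parts = [  # Site 2: (P#, Pname, Quant.)
    (1011, "Bolt", 400),
    (1123, "Nut", 400),
    (1246, "Screw", 600),
    (1300, "Nail", 500),
]
prices = [  # Site 3: (S#, P#, Price in dollars)
    (100, 1011, 0.50),
    (100, 1300, 1.50),
    (200, 1123, 0.60),
    (200, 1246, 0.70),
]

# Sub-query at Site 1: suppliers located in NY.
ny = {s: name for s, name, city in suppliers if city == "NY"}
# Sub-query at Site 2: part numbers of screws.
screws = {p for p, pname, _ in parts if pname == "Screw"}
# Sub-query at Site 3: suppliers selling one of those parts below $1.00.
cheap = {s for s, p, price in prices if p in screws and price < 1.00}
# Response composition: keep only the NY suppliers and return their names.
answer = sorted(ny[s] for s in cheap if s in ny)

print(answer)  # ['DOE']
```

Only the small intermediate sets (`ny`, `screws`, `cheap`) would need to cross the network, rather than whole relations.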

Distributed Database Systems

Updates
◦ Integrity
◦ Concurrency
◦ Replication

Big Data management
◦ Handles very large amounts of data distributed over many servers
◦ Highly available service with no single point of failure
◦ Key-value store
◦ Different levels of consistency
◦ Automatic replication of data to multiple nodes

Google BigTable
◦ Google’s NoSQL distributed data management system.
◦ BigTable is a sparse map or (key, value) store distributed over multiple servers.
◦ It is designed to scale to clusters of thousands of commodity servers storing petabytes of data.

Google BigTable
◦ The data or values stored in BigTable are treated as uninterpreted strings.
◦ The BigTable key is three-dimensional: it contains a row key, a column key and a timestamp.
◦ Therefore the mapping takes the form:

(row key, column key, timestamp) → value

Figure: a row whose columns are grouped into column families (Col Family 0, Col Family 1, Col Family 2), with the timestamp as a further dimension of each cell.
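The three-part key can be illustrated with a toy in-memory map in Python. This is a sketch of the data model only: the `SparseTable` class, its methods and the sample keys are hypothetical and say nothing about how BigTable stores or distributes data.

```python
import time


class SparseTable:
    """Toy map from (row key, column key, timestamp) to an uninterpreted byte string."""

    def __init__(self):
        self._cells = {}  # only cells that were written exist: the map is sparse

    def put(self, row, column, value, timestamp=None):
        ts = time.time_ns() if timestamp is None else timestamp
        self._cells[(row, column, ts)] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column and (timestamp is None or ts <= timestamp)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]


table = SparseTable()
table.put("row-a", "family0:col0", b"v1", timestamp=1)
table.put("row-a", "family0:col0", b"v2", timestamp=2)
print(table.get("row-a", "family0:col0"))               # b'v2'
print(table.get("row-a", "family0:col0", timestamp=1))  # b'v1'
print(table.get("row-b", "family0:col0"))               # None
```

Keeping the timestamp in the key means an update adds a new version instead of overwriting the old one.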

Distributed Computation Systems

Networked computers cooperate in the execution of a computationally intensive program
◦ The Network Platform
◦ Algorithm Design and Implementation
◦ Languages, Standards and Tools

Distributed Computation Systems

The Network Platform
◦ Cluster Computing
◦ The Internet
◦ The Lambdagrid

Algorithm Design and Implementation
◦ Control parallelism
◦ Data parallelism
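The difference between the two forms of parallelism can be sketched on a single machine. In the Python example below, threads stand in for networked computers, so it illustrates the structure of each approach rather than any speed-up; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square_sum(chunk):
    """The one operation that every worker applies to its share of the data."""
    return sum(x * x for x in chunk)


def data_parallel(values, workers=4):
    """Data parallelism: partition the data, run the same function on each part."""
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(square_sum, chunks))


def control_parallel(values):
    """Control parallelism: run different functions concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lowest = pool.submit(min, values)
        highest = pool.submit(max, values)
        return lowest.result(), highest.result()


print(data_parallel(list(range(10))))  # 285
print(control_parallel([3, 1, 2]))     # (1, 3)
```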

Distributed Computation Systems

Languages, Standards and Tools
◦ PVM
◦ MPI
◦ DCE
◦ CORBA
◦ Globus Toolkit

Distributed Computation

Tasks (T) interact with each other in a PVM running context. PVM uses network protocols (N) for communication among the computers.

Figure: tasks (T) running on top of PVM on each computer, with the PVM instances linked through network protocols (N).

Distributed applications use MIDDLEWARE tools to interoperate over a network of heterogeneous computers.

Figure: on each host, three layers (distributed applications, MIDDLEWARE, host OS and network service), with the hosts connected by a network.

XSEDE
◦ The Extreme Science and Engineering Discovery Environment (XSEDE) tightly integrates supercomputing resources, storage and scientific instruments across geographically dispersed major research centers.
◦ The interconnection network includes a backbone of hubs with an inter-hub transmission capacity of 40 Gbps.
◦ Border routers are linked to the hubs and serve as the interfaces between the grid and the sites.
◦ Each site has up to 10 Gbps of dedicated transmission capacity.
◦ The XSEDE interconnection network is hierarchical.

Distributed Real-Time Systems

◦ Environment
◦ Geographic Range
◦ Communication Traffic
◦ Computer Processing

Distributed Real-Time Systems

Computer Processing

Distributed real-time processing may be hierarchical, involving a low-level network of sensors feeding data to data-aggregation nodes, which in turn feed high-level servers.

Figure: three tiers, from top to bottom: servers, data aggregation network, network of sensors.
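The three tiers can be sketched as two small functions, one for an aggregation node and one for a server. This is a simplified illustration with hypothetical names; it ignores timing constraints, which are the defining concern of a real-time system.

```python
from statistics import mean


def aggregate(readings):
    """Aggregation node: reduce raw sensor readings to a compact summary."""
    return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


def server_view(summaries):
    """Server: combine the summaries received from several aggregation nodes."""
    total = sum(s["count"] for s in summaries)
    weighted_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    return {
        "count": total,
        "mean": weighted_mean,
        "max": max(s["max"] for s in summaries),
    }


node_a = aggregate([1.0, 3.0])  # readings from two sensors
node_b = aggregate([5.0])       # readings from one sensor
print(server_view([node_a, node_b]))  # {'count': 3, 'mean': 3.0, 'max': 5.0}
```

Each tier forwards a summary instead of raw readings, which keeps the traffic reaching the servers small.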

Distributed Multimedia Systems

The Signals
◦ Stereo-quality CD audio requires up to 1.411 Mbps.
◦ Video: flash discrete images at a rate of 50 or more images per second
◦ The images in video can be represented as a sequence of frames (a frame is a rectangular grid of pixels)
◦ Twenty-four bits per pixel with a frame of 1024 × 768 pixels is illustrative of present high-resolution technology
◦ Transmitting such frames uncompressed at 25 frames per second would require transmission capacity in excess of 400 Mbps
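Both figures follow from multiplying out the signal parameters. In the short Python check below, the video parameters come from the slide, while the audio parameters (44.1 kHz sampling, 16 bits per sample, two channels) are the standard audio CD values, assumed here because the slide does not list them.

```python
def audio_bitrate(sample_rate_hz=44_100, bits_per_sample=16, channels=2):
    """Uncompressed audio bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def video_bitrate(width=1024, height=768, bits_per_pixel=24, fps=25):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * fps


print(audio_bitrate() / 1e6)  # 1.4112   (Mbps)
print(video_bitrate() / 1e6)  # 471.8592 (Mbps)
```

The video result of about 472 Mbps is consistent with the "in excess of 400 Mbps" figure.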

Peer-to-Peer Multimedia Systems

Media On Demand (MOD)
◦ Video on Demand (VOD) or On Demand (OD)

Distributed Multimedia

Media On Demand (MOD)

The MOD server maintains a digital repository of videos which home users, via communication networks, can access and view immediately.

Figure: a MOD server connected through a network to several homes.

Distributed Multimedia

Media On Demand (MOD)

Massive storage must be arranged as a hierarchical structure.

Figure: client viewers reach the servers through a network; each server's storage hierarchy runs from RAM through magnetic disk and optical disk to magnetic tape.

Distributed Operating Systems

◦ Network Operating System
◦ Distributed Operating System
◦ Issues
◦ Threads

Network Operating System

Network operating system of agents and different local operating systems

Figure: agents, each running on top of its own local OS, communicate over the network.

Distributed Operating Systems

Homogeneous network-wide operating system

Issues
◦ Fundamental OS problems
◦ Data integrity
◦ Fail-Soft operation
◦ Security
◦ Performance
◦ Scalability

Threads

Windows NT family

The Windows NT family comprises a series of releases of operating systems that support distributed system applications.

The NT architecture comprises a number of layers
◦ Hardware Abstraction Layer (HAL)
◦ Kernel
◦ Executive
◦ Subsystems

Windows NT family Architecture

Mach microkernel

Mach is a distributed operating system project that has seen its kernel used in several Unix-like operating systems and in the Mac OS X operating system.

Conclusion

We looked at:

Distributed File Systems

Distributed Database Systems

Distributed Computation Systems

Distributed Real-Time Systems

Distributed Multimedia Systems

Distributed Operating Systems