Data Processing Architectures
![Page 1: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/1.jpg)
Data Structure and Storage
The modern world has a false sense of superiority because it relies on the mass of knowledge that it can use, but what is important is the extent to
which knowledge is organized and mastered
Goethe, 1810
![Page 2: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/2.jpg)
Bytes

| Abbreviation | Prefix | Factor |
|---|---|---|
| k | kilo | 10^3 |
| M | mega | 10^6 |
| G | giga | 10^9 |
| T | tera | 10^12 |
| P | peta | 10^15 |
| E | exa | 10^18 |
| Z | zetta | 10^21 |
| Y | yotta | 10^24 |
![Page 3: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/3.jpg)
Market
• 2012
  • Digital universe estimated at 2.8 ZB
  • Doubling every two years
• 2020
  • 5.2 TB per person
![Page 4: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/4.jpg)
Data Structures
• The goal is to minimize disk accesses
• Disks are relatively slow compared to main memory
  • Writing a letter compared to a telephone call
• Disks are a bottleneck
• Appropriate data structures can reduce disk accesses
![Page 5: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/5.jpg)
Database access
![Page 6: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/6.jpg)
Disks
• Data stored on tracks on a surface
• A disk drive can have multiple surfaces
• Rotational delay
  • Waiting for the physical storage location of the data to appear under the read/write head
  • Around 4 msec for a magnetic disk
  • Set by the manufacturer
• Access arm delay
  • Moving the read/write head to the track on which the storage location can be found
  • Around 9 msec for a magnetic disk
![Page 7: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/7.jpg)
Disks
![Page 8: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/8.jpg)
Minimizing data access times
• Rotational delay is fixed by the manufacturer
• Access arm delay can be reduced by storing files on
  • The same track
  • The same track on each surface
    • A cylinder
![Page 9: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/9.jpg)
Clustering
• Records that are often retrieved together should be stored together
• Intra-file clustering
  • Records within the one file
    • A sequential file
• Inter-file clustering
  • Records in different files
    • A nation and its stocks
![Page 10: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/10.jpg)
Disk manager
• Manages physical I/O
• Sees the disk as a collection of pages
• Has a directory of each page on a disk
• Retrieves, replaces, and manages free pages
![Page 11: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/11.jpg)
File manager
• Manages the storage of files
• Sees the disk as a collection of stored files
• Each file has a unique identifier
• Each record within a file has a unique record identifier
![Page 12: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/12.jpg)
File manager's tasks
• Create a file
• Delete a file
• Retrieve a record from a file
• Update a record in a file
• Add a new record to a file
• Delete a record from a file
![Page 13: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/13.jpg)
Sequential retrieval
• Consider a file of 10,000 records each occupying 1 page
• Queries that require processing all records will require 10,000 accesses
  • e.g., Find all items of type 'E'
• Many disk accesses are wasted if few records meet the condition
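To put rough numbers on the cost of a sequential scan, the sketch below reuses the delays quoted on the Disks slide (about 4 msec rotational delay and 9 msec access arm delay). Treating every page read as a separate random access is a simplifying assumption, and the figure of 50 matching records is hypothetical.

```python
# Timing figures come from the "Disks" slide; one random access per page
# is an assumption made to keep the arithmetic simple.
ROTATIONAL_DELAY_MS = 4   # average wait for the data to rotate under the head
ACCESS_ARM_DELAY_MS = 9   # average time to move the head to the right track

def scan_time_seconds(pages):
    """Estimated time to read `pages` pages, one random access per page."""
    per_access_ms = ROTATIONAL_DELAY_MS + ACCESS_ARM_DELAY_MS
    return pages * per_access_ms / 1000

full_scan = scan_time_seconds(10_000)   # every record in the file is read
selective = scan_time_seconds(50)       # hypothetical: only 50 records match

print(f"full scan: {full_scan:.1f} s, matching pages only: {selective:.2f} s")
```

Under these assumptions the full scan takes on the order of two minutes, while reading only the matching pages takes well under a second.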
![Page 14: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/14.jpg)
Indexing
• An index is a small file that has data for one field of a file
• Indexes reduce disk accesses
![Page 15: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/15.jpg)
Querying with an index
• Read the index into memory
• Search the index to find records meeting the condition
• Access only those records containing required data
• Disk accesses are substantially reduced when the query involves few records
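A minimal sketch of these steps, assuming a hypothetical file of 10,000 one-record pages in which only a few items have type 'E'. The function names and the in-memory list standing in for the disk are illustrative, and the cost of reading the index itself is not counted.

```python
from collections import defaultdict

# Hypothetical file: 10,000 records, one record per page.
file_pages = [{"itemcode": i, "type": "E" if i % 500 == 0 else "A"}
              for i in range(10_000)]

def build_index(pages, field):
    """Map each value of `field` to the page numbers that hold it."""
    index = defaultdict(list)
    for page_no, record in enumerate(pages):
        index[record[field]].append(page_no)
    return index

def scan_query(pages, field, value):
    """Sequential retrieval: every page is read."""
    hits = [r for r in pages if r[field] == value]
    return hits, len(pages)                  # (records, page accesses)

def index_query(pages, index, value):
    """Indexed retrieval: only the pages listed in the index are read."""
    page_nos = index.get(value, [])
    return [pages[n] for n in page_nos], len(page_nos)

type_index = build_index(file_pages, "type")
scan_hits, scan_reads = scan_query(file_pages, "type", "E")
index_hits, index_reads = index_query(file_pages, type_index, "E")
print(f"scan: {scan_reads} page reads, index: {index_reads} page reads")
```

Both queries return the same records; the difference is only in how many pages had to be touched.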
![Page 16: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/16.jpg)
Maintaining an index
• Adding a record requires at least two disk accesses
  • Update the file
  • Update the index
• Trade-off
  • Faster queries
  • Slower maintenance
![Page 17: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/17.jpg)
Using indexes
• Sequential processing of a portion of a file
  • Find all items with a type code in the range 'E' to 'K'
• Direct processing
  • Find all items with a type code of 'E' or 'N'
• Existence testing
  • Determining whether a record meeting the criteria exists without having to retrieve it
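The three uses can be sketched against a small sorted index held in memory. The index contents and the function names below are hypothetical; Python's standard `bisect` module does the searching.

```python
import bisect

# Hypothetical index on type code: sorted (type code, disk address) pairs.
index = sorted([("C", "d7"), ("E", "d1"), ("E", "d4"), ("G", "d2"),
                ("K", "d5"), ("N", "d3"), ("T", "d6")])
keys = [key for key, _ in index]

def range_lookup(low, high):
    """Sequential processing of a portion of the file: low <= key <= high."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [addr for _, addr in index[start:end]]

def direct_lookup(*wanted):
    """Direct processing: addresses for specific key values."""
    return [addr for key in wanted for addr in range_lookup(key, key)]

def exists(key):
    """Existence testing: answered from the index alone, no record is read."""
    pos = bisect.bisect_left(keys, key)
    return pos < len(keys) and keys[pos] == key

print(range_lookup("E", "K"))
print(direct_lookup("E", "N"))
print(exists("G"), exists("H"))
```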
![Page 18: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/18.jpg)
Multiple indexes
• Find red items of type 'C'
  • Both indexes can be searched to identify records to retrieve
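One way to picture searching both indexes is as a set intersection of the disk addresses each index returns. The index contents below are made up for illustration.

```python
# Hypothetical indexes: each maps a field value to a set of disk addresses.
color_index = {"red": {"d2", "d3", "d9"}, "blue": {"d1"}, "green": {"d4"}}
type_index = {"A": {"d2", "d4"}, "C": {"d1", "d3", "d9"}}

def addresses_matching(color, item_type):
    """Addresses that appear in both indexes; only these records are read."""
    return color_index.get(color, set()) & type_index.get(item_type, set())

red_type_c = addresses_matching("red", "C")
print(sorted(red_type_c))
```

Only the addresses in the intersection need to be fetched from disk.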
![Page 19: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/19.jpg)
Multiple indexes
• Indexes are also called inverted lists
  • A file of record locations rather than data
• Trade-off
  • Faster retrieval
  • Slower maintenance
![Page 20: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/20.jpg)
B-tree
• A form of inverted list
• Frequently used for relational systems
• Basis of IBM’s VSAM underlying DB2
• Supports sequential and direct accessing
• Has two parts
  • Sequence set
  • Index set
![Page 21: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/21.jpg)
B-tree
• Sequence set is a single level index with pointers to records
• Index set is a tree-structured index to the sequence set
![Page 22: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/22.jpg)
B+ tree
• The combination of index set (the B-tree) and the sequence set is called a B+ tree
• The number of data values and pointers for any given node is not restricted
• Free space is set aside to permit rapid expansion of a file
• Tradeoffs
  • Fast retrieval when pages are packed with data values and pointers
  • Slow updates when pages are packed with data values and pointers
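The relationship between the index set and the sequence set can be sketched with a static, two-level structure. This is a simplification under stated assumptions: a real B+ tree has a multi-level, balanced index set and splits pages as records are added, none of which is modeled here. All keys and addresses are hypothetical.

```python
import bisect

# Sequence set: leaf pages in key order, each holding (key, disk address) pairs.
sequence_set = [
    [(5, "d5"), (12, "d12"), (18, "d18")],
    [(23, "d23"), (31, "d31")],
    [(40, "d40"), (47, "d47"), (52, "d52")],
]
# Index set (a single level in this sketch): the lowest key of each leaf page.
index_set = [page[0][0] for page in sequence_set]

def find_leaf(key):
    """Use the index set to pick the one leaf page that could hold `key`."""
    return max(bisect.bisect_right(index_set, key) - 1, 0)

def direct_access(key):
    """Direct access: index set first, then a search within one leaf page."""
    for k, addr in sequence_set[find_leaf(key)]:
        if k == key:
            return addr
    return None

def sequential_access(low, high):
    """Sequential access: walk the sequence set starting at the leaf for `low`."""
    result = []
    for page in sequence_set[find_leaf(low):]:
        for k, addr in page:
            if k > high:
                return result
            if k >= low:
                result.append(addr)
    return result

print(direct_access(31), sequential_access(18, 40))
```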
![Page 23: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/23.jpg)
Hash for internal memory
• Hash maps are available in most programming languages
  • Also known as lookup tables
  • A key-value pair

International dialing codes

| Key | Value |
|---|---|
| Afghanistan | 93 |
| Albania | 355 |
| Algeria | 213 |
| American Samoa | 1684 |
| … | … |
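In Python the built-in `dict` is a hash map, so the dialing-code table from the slide can be written directly as key-value pairs. Only the four rows shown on the slide are included; the `"Atlantis"` key is a made-up example of a missing entry.

```python
# The four key-value pairs shown on the slide.
dialing_codes = {
    "Afghanistan": 93,
    "Albania": 355,
    "Algeria": 213,
    "American Samoa": 1684,
}

albania_code = dialing_codes["Albania"]       # lookup by key
missing = dialing_codes.get("Atlantis")       # None when the key is absent

print(albania_code, missing)
```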
![Page 24: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/24.jpg)
Bit map indexes
• Uses a single bit, rather than multiple bytes, to indicate the specific value of a field
  • Color can have only three values, so use three bits

| Itemcode | Color: Red | Color: Green | Color: Blue | Code: A | Code: N | Disk address |
|---|---|---|---|---|---|---|
| 1001 | 0 | 0 | 1 | 0 | 1 | d1 |
| 1002 | 1 | 0 | 0 | 1 | 0 | d2 |
| 1003 | 1 | 0 | 0 | 1 | 0 | d3 |
| 1004 | 0 | 1 | 0 | 1 | 0 | d4 |
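The table above can be mimicked with one integer per field value, where bit i describes row i. This is only a sketch of the idea: the row data follows the slide, while the function names and the choice of Python integers as bitmaps are assumptions.

```python
# Rows from the slide: (itemcode, disk address), plus each row's color and code.
rows = [(1001, "d1"), (1002, "d2"), (1003, "d3"), (1004, "d4")]
colors = ["Blue", "Red", "Red", "Green"]
codes = ["N", "A", "A", "A"]

def build_bitmaps(values):
    """One integer bitmap per distinct value; bit i is set when row i has it."""
    bitmaps = {}
    for i, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

def addresses(bitmap):
    """Disk addresses of the rows whose bit is set."""
    return [addr for i, (_, addr) in enumerate(rows) if (bitmap >> i) & 1]

color_bitmaps = build_bitmaps(colors)
code_bitmaps = build_bitmaps(codes)

# "Red items with code A" becomes a single bitwise AND.
red_and_a = color_bitmaps["Red"] & code_bitmaps["A"]
print(addresses(red_and_a))
```

Combining conditions with bitwise AND or OR is what makes this kind of index quick to evaluate for low-cardinality fields.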
![Page 25: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/25.jpg)
Bit map indexes
• A bit map index saves space and time compared to a standard index

| Itemcode | Color CHAR(8) | Code CHAR(1) | Disk address |
|---|---|---|---|
| 1001 | Blue | N | d1 |
| 1002 | Red | A | d2 |
| 1003 | Red | A | d3 |
| 1004 | Green | A | d4 |
![Page 26: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/26.jpg)
Join indexes
• Speed up joins by creating an index for the primary key and foreign key pair

nation index

| natcode | Disk address |
|---|---|
| UK | d1 |
| USA | d2 |

stock index

| natcode | Disk address |
|---|---|
| UK | d101 |
| UK | d102 |
| UK | d103 |
| USA | d104 |
| USA | d105 |

join index

| nation disk address | stock disk address |
|---|---|
| d1 | d101 |
| d1 | d102 |
| d1 | d103 |
| d2 | d104 |
| d2 | d105 |
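The join index on this slide can be derived from the nation and stock indexes by pairing addresses that share a `natcode`. The sketch below uses the slide's values; the function name is hypothetical.

```python
# Index contents as shown on the slide: (natcode, disk address) pairs.
nation_index = [("UK", "d1"), ("USA", "d2")]
stock_index = [("UK", "d101"), ("UK", "d102"), ("UK", "d103"),
               ("USA", "d104"), ("USA", "d105")]

def build_join_index(primary_key_index, foreign_key_index):
    """Pair each foreign-key row's address with the matching primary-key address."""
    pk_address = dict(primary_key_index)
    return [(pk_address[key], addr)
            for key, addr in foreign_key_index
            if key in pk_address]

join_index = build_join_index(nation_index, stock_index)
for nation_addr, stock_addr in join_index:
    print(nation_addr, stock_addr)
```

With the join index precomputed, a join only has to follow address pairs rather than compare key values at query time.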
![Page 27: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/27.jpg)
Data coding standards
• ASCII
• UNICODE
![Page 28: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/28.jpg)
ASCII
• Each alphabetic, numeric, or special character is represented by a 7-bit code
• 128 possible characters
• ASCII code usually occupies one byte
![Page 29: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/29.jpg)
UNICODE
• A unique binary code for every character, no matter what the platform, program, or language
• Currently contains 34,168 distinct characters derived from 24 supported language scripts
• Covers the principal written languages
![Page 30: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/30.jpg)
UNICODE
• Two encoding forms
  • A default 16-bit form
  • An 8-bit form called UTF-8 for ease of use with existing ASCII-based systems
• The default encoding of HTML and XML
• The basis of global software
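A short sketch of how the encoding forms differ in practice, using Python's built-in string encoders. The three sample characters are arbitrary choices; the point is that UTF-8 keeps ASCII characters at one byte and uses more bytes for others, while the 16-bit form uses two bytes for each of these characters.

```python
def encoded_sizes(ch):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    return ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

# "A", e with acute accent, and the euro sign, written as escapes.
samples = ["A", "\u00e9", "\u20ac"]
for ch in samples:
    code_point, utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{code_point:04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# An ASCII character has the same single byte in ASCII and in UTF-8.
ascii_bytes = "A".encode("ascii")
utf8_bytes = "A".encode("utf-8")
```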
![Page 31: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/31.jpg)
Comma-separated values (CSV)
• A text file
• Records separated by line breaks
  • Typically, all records have the same set of fields in the same sequence
  • First record can be a header
• Each record consists of fields separated by some other character or string
  • Usually a comma or tab
• Strings usually enclosed in quotes
• Can import into and export from MySQL
![Page 32: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/32.jpg)
CSV

Header
"shrcode","shrfirm","shrprice","shrqty","shrdiv","shrpe"

Data
"AR","Abyssinian Ruby",31.82,22010,1.32,13
"BE","Burmese Elephant",0.07,154713,0.01,3
"BS","Bolivian Sheep",12.75,231678,1.78,11
"CS","Canadian Sugar",52.78,4716,2.50,15
"FC","Freedonia Copper",27.50,10529,1.84,16
"ILZ","Indian Lead & Zinc",37.75,6390,3.00,12
"NG","Nigerian Geese",35.00,12323,1.68,10
"PT","Patagonian Tea",55.25,12635,2.50,10
"ROF","Royal Ostrich Farms",33.75,1234923,3.00,6
"SLG","Sri Lankan Gold",50.37,32868,2.68,16
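As a rough illustration of reading such a file, the sketch below parses the header and three of the data records with Python's standard `csv` module. The text is held in a string rather than a file, and `load_shares` is a hypothetical helper that converts the numeric fields, since the CSV reader returns every field as a string.

```python
import csv
import io

# Header plus three of the records from the slide.
csv_text = '''"shrcode","shrfirm","shrprice","shrqty","shrdiv","shrpe"
"AR","Abyssinian Ruby",31.82,22010,1.32,13
"BE","Burmese Elephant",0.07,154713,0.01,3
"ILZ","Indian Lead & Zinc",37.75,6390,3.00,12
'''

def load_shares(text):
    """Parse CSV text into dictionaries keyed by the header names."""
    shares = []
    for row in csv.DictReader(io.StringIO(text)):
        row["shrprice"] = float(row["shrprice"])
        row["shrqty"] = int(row["shrqty"])
        row["shrdiv"] = float(row["shrdiv"])
        row["shrpe"] = int(row["shrpe"])
        shares.append(row)
    return shares

shares = load_shares(csv_text)
print(shares[0]["shrfirm"], shares[0]["shrprice"])
```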
![Page 33: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/33.jpg)
JavaScript object notation (JSON)
• A language independent data exchange format
• A collection of name/value pairs
• An ordered list of values
• Parsers available for most common languages
• Extensions available to import to and export from MySQL
![Page 34: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/34.jpg)
JSON data types
• Number
  • Double precision floating-point
• String
  • A sequence of zero or more Unicode characters in double quotes, with backslash escaping of special characters
• Object
• Array
• Null
  • Empty
![Page 35: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/35.jpg)
JSON object
• An unordered set of name/value pairs
  • Each name is separated from its value by a colon (:)
  • Enclosed in curly braces
![Page 36: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/36.jpg)
JSON array
• An ordered collection of values
  • Enclosed in square brackets
![Page 37: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/37.jpg)
JSON

{
  "shares": [
    {
      "shrcode": "FC",
      "shrdiv": 1.84,
      "shrfirm": "Freedonia Copper",
      "shrpe": 16,
      "shrprice": 27.5,
      "shrqty": 10529
    },
    {
      "shrcode": "PT",
      "shrdiv": 2.5,
      "shrfirm": "Patagonian Tea",
      "shrpe": 10,
      "shrprice": 55.25,
      "shrqty": 12635
    }
  ]
}

Callouts in the original figure: Array, Object, Value
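A small sketch of parsing a document of this shape with Python's standard `json` module. The text below is a shortened, hypothetical variant built from two shares that appear on the CSV slide; objects become dictionaries and arrays become lists.

```python
import json

# Shortened document in the same shape as the slide's example.
json_text = """
{ "shares": [
    { "shrcode": "FC", "shrfirm": "Freedonia Copper", "shrprice": 27.5, "shrqty": 10529 },
    { "shrcode": "AR", "shrfirm": "Abyssinian Ruby", "shrprice": 31.82, "shrqty": 22010 }
] }
"""

data = json.loads(json_text)            # object -> dict, array -> list
share_codes = [share["shrcode"] for share in data["shares"]]
first_price = data["shares"][0]["shrprice"]

# Serializing and parsing again gives back an equal structure.
round_trip = json.loads(json.dumps(data))

print(share_codes, first_price)
```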
![Page 38: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/38.jpg)
The evolution of hard drives
A history of the hard drive in pictures
![Page 39: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/39.jpg)
Data storage devices
• What the data storage device will be used for
  • On-line data
    • Access speed
    • Capacity
  • Back-up files
    • Security against data loss
  • Archival data
    • Long-term storage
![Page 40: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/40.jpg)
Key variables
• Data volume
• Data volatility
• Access speed
• Storage cost
• Medium reliability
• Legal standing of stored data
![Page 41: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/41.jpg)
Magnetic technology
• The major form of data storage
• A mature and widely used technology
• Strong magnetic fields can erase data
• Magnetization decays with time
![Page 42: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/42.jpg)
Hard disk drive (HDD)
• Sealed, permanently mounted
• Highly reliable
• Access times of 4-10 msec
• Transfer rates as high as 1,300 Mbytes per second
• Capacities in Tbytes
![Page 43: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/43.jpg)
Hard disk drive (HDD)
• HDD unit shipments and sales revenues are declining, though production (exabytes per year) is growing
![Page 44: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/44.jpg)
A disk storage unit
![Page 45: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/45.jpg)
RAID
• Redundant arrays of inexpensive or independent drives
• Exploits economies of scale of disk manufacturing for the personal computer market
• Can also give greater security
• Increases a system's fault tolerance
• Not a replacement for regular backup
![Page 46: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/46.jpg)
Parity
• A bit added to the end of a binary code that indicates whether the number of bits in the string with the value one is even or odd
• Parity is used for detecting and correcting errors

| Data | Number of one bits | Even parity | Odd parity |
|---|---|---|---|
| 0001100 | 2 | 00011000 | 00011001 |
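The parity calculation in the table can be sketched as follows, with bit strings represented as Python text for readability. The function names are hypothetical. Note that a single parity bit on its own only signals that an odd number of bits has changed; locating the damaged data needs extra information, such as knowing which drive failed.

```python
def add_parity(bits, even=True):
    """Append a parity bit so the total count of 1s is even (or odd)."""
    ones = bits.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return bits + str(parity)

def has_valid_parity(code, even=True):
    """Check whether the count of 1s matches the chosen parity scheme."""
    ones = code.count("1")
    return ones % 2 == (0 if even else 1)

even_code = add_parity("0001100")               # slide's even-parity example
odd_code = add_parity("0001100", even=False)    # slide's odd-parity example

corrupted = "00011010"                          # one bit of even_code flipped
print(even_code, odd_code, has_valid_parity(corrupted))
```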
![Page 47: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/47.jpg)
Mirroring
![Page 48: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/48.jpg)
Mirroring
• Write
  • Identical copies of a file are written to each drive in an array
• Read
  • Alternate pages are read simultaneously from each drive
  • Pages put together in memory
  • Access time is reduced by a factor of approximately the number of disks in the array
• Read error
  • Read required page from another drive
• Tradeoffs
  • Reduced access time
  • Greater security
  • More disk space
![Page 49: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/49.jpg)
Striping
![Page 50: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/50.jpg)
Striping
• Three drive model
• Write
  • Half of file to first drive
  • Half of file to second drive
  • Parity bit to third drive
• Read
  • Portions from each drive are put together in memory
• Read error
  • Lost bits are reconstructed from third drive’s parity data
• Tradeoffs
  • Increased data security
  • Requires less storage capacity than mirroring
  • Not as fast as mirroring
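A minimal sketch of the three-drive model, assuming the parity data is the bitwise XOR of the two halves (the usual way parity is computed in parity-based RAID, though the slide does not spell it out). The function names and the in-memory byte strings standing in for drives are hypothetical.

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def write_striped(data):
    """Split data in two halves and compute the parity block for a third drive."""
    half = (len(data) + 1) // 2
    first = data[:half]
    second = data[half:].ljust(half, b"\0")   # pad so both halves match in length
    return first, second, xor_bytes(first, second)

def rebuild(surviving_half, parity):
    """Recover the lost half from the surviving half and the parity block."""
    return xor_bytes(surviving_half, parity)

drive1, drive2, drive3 = write_striped(b"DATABASE")

# Simulate losing drive 1, then drive 2.
recovered_first = rebuild(drive2, drive3)
recovered_second = rebuild(drive1, drive3)
print(recovered_first + recovered_second)
```

Because XOR is its own inverse, either half can be rebuilt from the other half plus the parity block.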
![Page 51: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/51.jpg)
RAID levels
• All levels, except 0, have common features
  • The operating system sees a set of physical drives as one logical drive
  • Data are distributed across physical drives
  • Parity is used for data recovery
![Page 52: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/52.jpg)
RAID levels
• Level 0
  • Data spread across multiple drives
  • No data recovery when a drive fails
• Level 1
  • Mirroring
  • Critical non-stop applications
• Level 3
  • Striping
• Level 5
  • A variation of striping
  • Parity data is spread across drives
  • Requires less capacity than level 1
  • Higher I/O rates than level 3
![Page 53: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/53.jpg)
RAID 5
![Page 54: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/54.jpg)
Magnetic technology
• Removable magnetic disk
• Magnetic tape
• Magnetic tape cartridge
• Mass storage
![Page 55: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/55.jpg)
Solid State
• Arrays of memory chips
  • ~30 cents per Gbyte
  • Magnetic disk is ~5 cents per Gbyte
• Prices for SSD are decreasing much faster than HDD prices
• Faster
• Less energy
• More reliable
• Handhelds and laptops
![Page 56: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/56.jpg)
Flash drive
• Small
• Removable
• Solid state
• USB connector
• Up to 1 Tbyte capacity
• Around 25 cents per Gbyte for smaller capacity drives
• About 50 cents per Gbyte for larger capacity drives
![Page 57: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/57.jpg)
Optical technology
• A more recent development than magnetic
• Use a laser for reading and writing data
• High storage densities
• Low cost
• Direct access
• Long storage life
• Not susceptible to head crashes
![Page 58: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/58.jpg)
Digital Versatile Disc (DVD)
• DVD drives have transfer rates of around 2.76 Mbytes/sec and access times of 150 msec
• Read-only versions
  • DVD-Video (movies)
  • DVD-ROM (software)
  • DVD-Audio (songs)
• DVD-R
  • Recordable (write once, read many)
• DVD-RAM
  • Erasable (write many, read many)
![Page 59: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/59.jpg)
Blu-ray Disc
• Capacity of 25 to 50 Gbytes
• 20 layer version can store 500 Gbytes
• Versions for BD-ROM, BD-R, BD-RE
![Page 60: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/60.jpg)
Storage life
![Page 61: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/61.jpg)
Merit of data storage devices
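| Device | Access speed | Volume | Volatility | Cost per megabyte | Reliability | Legal standing |
|---|---|---|---|---|---|---|
| Solid state | *** | * | *** | * | ** | * |
| Fixed disk | *** | *** | *** | ** | ** | * |
| RAID | *** | *** | *** | ** | *** | * |
| Removable disk | ** | ** | *** | ** | ** | * |
| Tape | * | ** | * | *** | ** | * |
| Cartridge | ** | *** | * | *** | ** | * |
| Mass storage | ** | *** | * | *** | ** | * |
| SAN | *** | *** | *** | ** | *** | * |
| Optical-ROM | * | *** | * | *** | *** | *** |
| Optical-R | * | *** | * | *** | *** | ** |
| Optical-RW | * | *** | ** | *** | *** | * |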
![Page 62: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/62.jpg)
Data compression
• Encoding digital data so it requires less storage space and thus less network bandwidth
• Lossless
  • File can be restored to original state
• Lossy
  • File cannot be restored to original state
  • Used for graphics, video, and audio files
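Lossless compression can be demonstrated with Python's standard `zlib` module: the data shrinks and decompresses back to exactly the original bytes. The sample text is arbitrary and deliberately repetitive, so the ratio seen here is not representative of real files.

```python
import zlib

# Highly repetitive sample data compresses very well; real data varies.
original = b"Data Processing Architectures " * 100

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes (ratio {ratio:.1f})")
```

A lossy scheme, by contrast, would give no way to get `original` back exactly.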
![Page 63: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/63.jpg)
Recent developments
• Declining cost of main-memory
• Multicore processors
• Can get massive performance improvements for business analytics
• SAP HANA is a product of these recent developments
  • Overcomes the disk bottleneck
![Page 64: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/64.jpg)
Rethinking metrics
• Traditional
  • Cost per TB
• New
  • Cost per TB per second
• Main memory is roughly 10 times more expensive but 1000 times faster
• Total cost of ownership lower as well
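The effect of the new metric can be worked through with the slide's rough ratios. The sketch assumes that "cost per TB per second" means cost per TB divided by relative speed, and it uses relative units (disk = 1) rather than real prices.

```python
# Relative figures only: disk is the baseline, memory uses the slide's ratios.
disk = {"cost_per_tb": 1.0, "relative_speed": 1.0}
memory = {"cost_per_tb": 10.0, "relative_speed": 1000.0}

def cost_per_tb_per_speed(medium):
    """Assumed reading of the metric: cost per TB divided by relative speed."""
    return medium["cost_per_tb"] / medium["relative_speed"]

traditional_ratio = memory["cost_per_tb"] / disk["cost_per_tb"]
new_metric_ratio = cost_per_tb_per_speed(disk) / cost_per_tb_per_speed(memory)

print(f"memory costs {traditional_ratio:.0f}x more per TB, "
      f"but disk costs {new_metric_ratio:.0f}x more per TB per unit of speed")
```

Under this reading, main memory looks 10 times worse on the traditional metric and roughly 100 times better on the new one.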
![Page 65: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/65.jpg)
Eliminating disk-based database
• Faster and cheaper architecture
• Some firms will have a need for disk for some years because of database size
• Transition as main memory becomes cheaper
![Page 66: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/66.jpg)
Columnar and row-based data storage
• A table can be stored as a series of rows or columns
• Row-storage typically good for transactions
• Column-storage typically good for business analytics
• In-memory facilitates either approach
  • And so can disk
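The two layouts can be sketched with plain Python structures, using three share records from the CSV slide. The variable names are hypothetical, and the point is only which layout keeps the needed values together for each kind of work.

```python
# Row storage: each record's fields are kept together.
rows = [
    {"shrcode": "AR", "shrprice": 31.82, "shrqty": 22010},
    {"shrcode": "BE", "shrprice": 0.07, "shrqty": 154713},
    {"shrcode": "BS", "shrprice": 12.75, "shrqty": 231678},
]

# Column storage: each field's values are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Transaction-style access: fetch one complete record (natural in row storage).
record = rows[1]

# Analytics-style access: aggregate one field (natural in column storage).
total_qty = sum(columns["shrqty"])

print(record, total_qty)
```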
![Page 67: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/67.jpg)
In-memory systems
• In-memory can achieve compression ratios of 5 to 10
  • Helps reduce cost of the transition
• SQL with some extensions
• SAP has showcased a 250 TB system with 250 nodes
![Page 69: Data Processing Architectures](https://reader035.fdocuments.net/reader035/viewer/2022081604/568135d5550346895d9d4162/html5/thumbnails/69.jpg)
Key points
• Disk drives are relatively slow compared to main memory
• Storage devices vary on several parameters
• SSD gradually replacing HDD
• Select a storage device based on storage and retrieval goals
• In-memory database is a recent and growing development