IS303 Architectural Analysis: SMU SIS Personal Notes


Week 1 Objectives

- What is Architecture? (Structure, Perspectives & Views)

What is Architecture?

A system of inter-related pieces

- Structure of parts
- Changes in one part impact the others
- Relationships more important than the pieces

“The fundamental organization of a system embodied in its components, their relationships to each other and to the environment and the principles guiding its design and evolution.”

- IEEE Std. 1471 definition

Environment: Complex, Uncertain, Poorly structured

Quality Attributes: Non-functional aspects dealing with system behaviour

System Architect: Sole owner of designing the architecture and solving the problems with it. Design is iterative


Functionality

More concerned with: multiple machines (distribution); global accessibility (bandwidth, latency, language); integration with other systems (version compatibility & variations)

Less concerned with: widget selection; data formats; algorithm selection

Constraints

- Design & implementation decisions that have already been made
- Constraints flowing from one architectural implementation to another (data, network, interfaces)

Quality Attributes

- Performance (response & throughput)
  o Sensitivity of latency requirements (controls, web pages, reporting)
  o Typical loads, peak-hour loads and spike loads
  o Response time vs. throughput vs. scalability

- Availability & reliability (How often does it work? What happens during a failure?)
  o How much tolerance for service outages (recovery time and criticality of safety)
  o Disaster-scale events

- Modifiability (How easy to change? Porting, protocols and adding of features)
- Security, ease of use and time to market

Quality Attributes by SEI


Concerns: Parameters by which the attributes of a system are judged, specified and measured. Requirements are expressed as concerns.

Attribute-specific factors: Properties of the system and environment that have an impact on the concerns. Factors can be internal or external, depending on whether they arise from the system's internal properties or from its environment.

- Performance factors
- Dependability impairments (aspects of the system leading to a lack of dependability)
- Security factors (aspects contributing to security, including environmental and internal features)
- Safety impairments (aspects contributing to a lack of safety. Hazards are system states that may lead to mishaps, which are unplanned events with undesirable consequences)

Methods: How concerns are addressed

Performance

Smith’s Definition

Performance refers to responsiveness. The time required to respond to specific events or the number of events processed in a given time interval. This characterizes the timeliness of service delivered by the system.


Performance is not Speed

Poor performance is not salvaged just by using better processors. For many systems, faster alone is not enough to achieve timeliness, since execution speed is only one factor.

The objective of “fast computing” is to reduce the response time for a group of services; real-time computing is concerned with the individual timing requirements of each service. “Predictability, not speed, is foremost in real-time system design.” Performance engineering is concerned with predictable performance, whether worst-case or average-case.

The Scheduling Problem

How to allocate shared resources when multiple demands need to be carried out on this same set of resources

Performance Concerns

- Criteria to evaluate the schedule
- Timing constraints for responding to these events

Latency: Time taken to respond to an event
  o Precedence: specification for partial/total ordering of event responses
  o Jitter: variation in the computed result from cycle to cycle
  o Criticality: importance

Throughput: Number of event responses over a given observation interval. A processing rate alone is not enough (also specify observation intervals), as you need to look at processing patterns within the period.

Capacity: Maximum achievable throughput under ideal workload conditions (bandwidth, in megabits/s). A response time accompanied by a throughput requirement, so as not to violate latency requirements*

Modes: A mode is characterized by the state of demand placed on the system (the configuration of resources to satisfy demands). Reduced capacity and overload are commonly experienced modes.

*Utilization is the percentage of time a resource is busy. Schedulable utilization is the maximum utilization achievable by a system while meeting timing requirements

Performance Factors

- Behavioural patterns and intensity, Resource usage and software descriptions, Jobs and operations; these characterize system demand

- Execution environment, numbers and types of machines characterizing the system

Performance Methods

- Synthesis and analysis drawing on the queuing, scheduling theories and formal methods used to understand relationships between factors and concerns


Factors affecting Performance

Demand

1. Arrival pattern: periodic or aperiodic patterns of how events come in
2. Execution time: time requirements for responding to each event. Worst and best cases help define boundary-case behaviour

System

Resources used to execute event responses

- Resource types: CPU, memory, I/O, backplane bus, network, data object
- Software services: manage system resources and usually reside in the OS
- Resource allocation

Real-time OS: Small, fast proprietary kernels; Real-time extensions of commercial OS; Research oriented OS

- Context switch times and the priority of OS services
- Interrupt latency
- Time during which interrupts are disabled
- Virtual memory and bounds on execution of system calls

The resource allocation policy resolves contention for shared resources and influences the performance of the system.


Methods

Synthesis: Methods to synthesize real-time methodologies, intended to augment rather than supplant other engineering methodologies

Analysis: 2 schools of thought in analysing system performance in queuing & scheduling analysis

Queuing Theory

Models systems as one or more service facilities performing services for a stream of arriving customers: a server plus a queue for waiting customers. Concerned with average-case, aggregate behaviour, which makes it good for performance capacity planning and for management information systems.
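Queuing analysis can be made concrete with the simplest such model, the M/M/1 queue: a single server with random (Poisson) arrivals and exponential service times. This specific model and the rates below are my own illustration, not from the notes; a minimal sketch in Java:

public class MM1Queue {
    public static void main(String[] args) {
        double lambda = 8.0;   // arrival rate: 8 customers/second (illustrative)
        double mu = 10.0;      // service rate: 10 customers/second (illustrative)
        double rho = lambda / mu;                  // utilization of the server = 0.8
        double avgResponse = 1.0 / (mu - lambda);  // mean time in system = 0.5 s
        double avgInSystem = rho / (1.0 - rho);    // mean number in system = 4
        System.out.printf("utilization=%.2f response=%.2fs in-system=%.1f%n",
                rho, avgResponse, avgInSystem);
    }
}

At 80% utilization the average customer already spends five service times (0.5 s vs. 0.1 s) in the system, which is why average-case queuing results matter for capacity planning.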

Scheduling Theory

Rooted in job-shop scheduling, it is applicable to performance analysis of real-time systems and offers valuable intuition. Computing utilization bounds and response times is key, allowing comparison against a theoretically derived bound: timing requirements are guaranteed as long as utilization is kept beneath a specific bound.
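One classic utilization bound (my example; the notes do not name a specific result) is the Liu & Layland rate-monotonic bound: n periodic tasks scheduled with fixed rate-monotonic priorities always meet their deadlines if total utilization stays at or below n(2^(1/n) - 1). A small Java check, with an illustrative task set:

public class RmBoundCheck {
    public static void main(String[] args) {
        // {execution time, period} pairs, in the same time unit (illustrative values)
        double[][] tasks = { {1, 4}, {2, 8}, {1, 10} };
        double u = 0.0;
        for (double[] t : tasks) u += t[0] / t[1];         // total utilization = 0.6
        int n = tasks.length;
        double bound = n * (Math.pow(2.0, 1.0 / n) - 1.0); // ~0.78 for n = 3
        System.out.printf("U=%.3f bound=%.3f guaranteed=%b%n", u, bound, u <= bound);
    }
}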

Dependability

Property of a system such that reliance can justifiably be placed on the service it delivers.

- Availability: readiness for usage
- Reliability: continuity of service
- Safety: non-occurrence of catastrophic consequences on the environment
- Confidentiality: non-occurrence of unauthorized disclosure of information
- Integrity: non-occurrence of improper alterations of information
- Maintainability: aptitude to undergo repairs and evolution


Availability

Availability is measured as the limiting probability that the system is functioning at time t.

Reliability

Ability to continue operating over time, measured by MTTF, the expected life of the system

Maintainability

Aptitude to undergo repair and evolution. MTTR (mean time to repair) gives a quantitative measure of maintainability but doesn’t tell the entire story. Built-in diagnostics can reduce MTTR at the possible cost of extra memory, run time or development.
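The measures above combine into the standard steady-state availability formula (a textbook identity, not spelled out in the notes):

A = MTTF / (MTTF + MTTR)

For example, an MTTF of 1000 hours with an MTTR of 2 hours gives A = 1000/1002, roughly 99.8% availability. Halving MTTR improves A about as much as doubling MTTF, which is why built-in diagnostics can pay for themselves.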

Safety

In terms of dependability, safety is the absence of catastrophic consequences on the environment.

Confidentiality

Non-occurrence of unauthorized disclosure of information

Integrity

Non-occurrence of improper alteration of information

Impairments to Dependability

1. Failures

Domain Failures

- Value failure: improper values are computed, inconsistent with proper execution of the system
- Timing failure: service is delivered too early or too late
- Halting failure: service is no longer delivered


Perception of Failures

- Consistent failure: all users have the same view of the failure
- Inconsistent failure: some users have a different perception of the failure. Hardest to detect

Consequence on Environment

A system that can only fail in a benign manner is termed fail-safe

2. Errors

A system state that risks leading to failure if not corrected; 3 factors determine whether it will lead to failure:

i. Redundancy (designed or inherent in the system)
ii. System activity (the error may go away before damage is caused)
iii. What the user deems acceptable behaviour (in data transmission there is a notion of an acceptable error rate)

3. Faults

The hypothesised cause of an error, classified according to:

Cause
- Physical: occurs because of adverse physical phenomena (e.g. lightning)
- Human-made: human imperfection, like poor design, manufacture or misuse

Nature
- Accidental: created by chance
- Intentional: deliberate, with or without malice

Phase of creation
- Can be created at development time or arise during operational time

Boundary
- Internal: parts of the internal state of the system which, when invoked, will produce an error
- External: induced from outside the system, e.g. by radiation

Persistence
- Temporary: the fault disappears over time. Transient faults come from the external physical environment; intermittent faults are internal


Week 2

UML Distilled: Sequence Diagrams

1. Interaction Styles (Control)

Participants: The term used in UML 2. Where objects of classes are shown, follow the object : Class notation

Found Message: Comes from undetermined source, no participant

Centralized Control: One participant does all the processing while the others supply data

Distributed Control: Processing is split across participants with each handling a little of the algorithm

Distribution helps to localize effects of CHANGE and introduce POLYMORPHISM rather than conditional logic
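A minimal Java sketch of that last point (the classes and pricing rules are illustrative, not from the reading): instead of one controller switching on a type code, each participant carries its piece of the algorithm, so a change to a pricing rule is localized in one class.

interface Product {
    double priceFor(int quantity); // each product handles its own pricing rule
}

class RegularProduct implements Product {
    private final double unitPrice;
    RegularProduct(double unitPrice) { this.unitPrice = unitPrice; }
    public double priceFor(int quantity) { return unitPrice * quantity; }
}

class BulkDiscountProduct implements Product {
    private final double unitPrice;
    BulkDiscountProduct(double unitPrice) { this.unitPrice = unitPrice; }
    public double priceFor(int quantity) {
        double base = unitPrice * quantity;
        return quantity > 10 ? base * 0.9 : base; // the discount rule lives with the product
    }
}

class Order {
    // no conditional logic here: polymorphism distributes the control
    double totalFor(Product p, int quantity) { return p.priceFor(quantity); }
}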


2. Creating & Deleting Participants

2 Types of Deletion: Self-deletion (the normal sort) and External Deletion

In garbage-collected environments, deletion is not done directly, but the X is still useful to indicate when the object is no longer needed and ready to be collected.

3. Interaction Frames


Useful for delimiting a portion of the sequence diagram. Operators and guards control the logic: only fragments whose guard is true are executed.

Asynchronous Message: Stick arrowhead showing no need for a response

Synchronous Message: Filled arrowhead showing the need for a response

UML Distilled: Deployment Diagrams

System’s physical layout showing which pieces of software run on what pieces of hardware


Node: Something capable of hosting some software and connected by communication pathways. 2 forms

Device: Hardware like a computer or simpler hardware connected to a system

Execution Environment: Software hosting or containing other software

Nodes contain artefacts (physical manifestations of software, namely files such as executables, data files, configuration files, HTML documents etc.). Listing an artefact within a node shows that it is deployed there at runtime.

Wikipedia: TCP

TCP provides reliable, ordered, error-checked delivery of a stream of octets between programs running on computers connected to a local area network, intranet or the public Internet. It resides at the transport layer.

- Provides a communications service between the application program and the IP layer
- TCP handles breaking requests into IP-sized chunks (each a sequence of octets containing a header and body)
- Due to network congestion, traffic load-balancing and unpredictable network behaviour, packets can be lost
- TCP detects these problems, requests retransmission of lost data and rearranges out-of-order data; these actions also help to reduce congestion by minimizing the occurrence of other problems
- Abstracts application communication from network details
- Optimized for accurate rather than timely delivery; RTP (typically over UDP) is more suitable for real-time applications like VoIP (Voice over IP)

A. Segment Structure

TCP accepts data from a data stream, divides it into chunks and adds a TCP header, creating a TCP segment. The TCP segment is then encapsulated into an IP datagram and exchanged with peers.

“Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module [e.g. IP] to transmit each segment to the destination TCP.[5] “

A TCP segment consists of a segment header and a data section. The TCP header contains 10 mandatory fields and an optional extension field (Options).

The data section follows the header. Its contents are the payload data carried for the application.


Source Port (16 bits)

Destination Port (16 bits)

Sequence Number (32 bits), which has a dual role:

- If the SYN flag is set (1), this is the initial sequence number. The sequence number of the actual first data byte, and the acknowledged number in the corresponding ACK, are then this sequence number plus 1.
- If the SYN flag is clear (0), this is the accumulated sequence number of the first data byte of this segment for the current session.

Acknowledgement Number (32 bits)

If the ACK flag is set then the value of this field is the next sequence number that the receiver is expecting. This acknowledges receipt of all prior bytes (if any). The first ACK sent by each end acknowledges the other end's initial sequence number itself, but no data.

Data Offset (4 bits)

Specifies the size of the TCP header in 32-bit words. The minimum size header is 5 words and the maximum is 15 words thus giving the minimum size of 20 bytes and maximum of 60 bytes, allowing for up to 40 bytes of options in the header. This field gets its name from the fact that it is also the offset from the start of the TCP segment to the actual data.

Reserved (3 bits)

For future use and to be set to zero

Flags (9 bits), a.k.a. control bits: 9 one-bit flags

1. NS (1 bit): ECN-nonce concealment protection (added to the header by RFC 3540)
2. CWR (1 bit): Congestion Window Reduced flag, set by the sending host to indicate that it received a TCP segment with the ECE flag set and has responded with the congestion control mechanism (added to the header by RFC 3168)
3. ECE (1 bit): ECN-Echo indicates:
   a. if the SYN flag is set (1), that the TCP peer is ECN capable
   b. if the SYN flag is clear (0), that a packet with the Congestion Experienced flag set in its IP header was received during normal transmission (added to the header by RFC 3168)
4. URG (1 bit): indicates that the Urgent pointer field is significant
5. ACK (1 bit): indicates that the Acknowledgment field is significant. All packets after the initial SYN packet sent by the client should have this flag set
6. PSH (1 bit): Push function. Asks to push the buffered data to the receiving application
7. RST (1 bit): Reset the connection
8. SYN (1 bit): Synchronize sequence numbers. Only the first packet sent from each end should have this flag set. Some other flags change meaning based on this flag; some are only valid when it is set, others only when it is clear
9. FIN (1 bit): No more data from sender

Window Size (16 bits)


Size of the receive window, which specifies the number of window size units (by default, bytes) (beyond the sequence number in the acknowledgment field) that the sender of this segment is currently willing to receive (see Flow control and Window Scaling)

Checksum (16 bits)

Used for error-checking of the header and data

Urgent Pointer (16-bits)

If the URG flag is set, then this 16-bit field is an offset from the sequence number indicating the last urgent data byte.

Options (Variable 0-320 bits divisible by 32)

The length of this field is determined by the data offset field. Options have up to three fields: Option-Kind (1 byte), Option-Length (1 byte), Option-Data (variable).

- The Option-Kind field indicates the type of option and is the only field that is not optional. Depending on what kind of option is being dealt with, the next two fields may be set
- The Option-Length field indicates the total length of the option
- The Option-Data field contains the value of the option, if applicable

Padding

TCP header padding is used to ensure that the TCP header ends and data begins on a 32 bit boundary. The padding is composed of zeros
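The fixed header layout above can be made concrete with a short parsing sketch (my illustration; the field offsets follow the description above, and the class and method names are hypothetical):

import java.nio.ByteBuffer;

public class TcpHeaderSketch {
    // reads the 10 mandatory fields of a TCP header, in the order listed above
    static void parse(ByteBuffer b) {
        int srcPort  = b.getShort() & 0xFFFF;     // Source Port (16 bits)
        int dstPort  = b.getShort() & 0xFFFF;     // Destination Port (16 bits)
        long seq     = b.getInt() & 0xFFFFFFFFL;  // Sequence Number (32 bits)
        long ack     = b.getInt() & 0xFFFFFFFFL;  // Acknowledgement Number (32 bits)
        int offFlags = b.getShort() & 0xFFFF;     // Data Offset (4) + Reserved (3) + Flags (9)
        int offsetWords = offFlags >>> 12;        // header length in 32-bit words, 5..15
        boolean syn  = (offFlags & 0x0002) != 0;  // SYN is bit 1, FIN bit 0, ACK bit 4
        int window   = b.getShort() & 0xFFFF;     // Window Size (16 bits)
        int checksum = b.getShort() & 0xFFFF;     // Checksum (16 bits)
        int urgent   = b.getShort() & 0xFFFF;     // Urgent Pointer (16 bits)
        int optionsLen = offsetWords * 4 - 20;    // 0..40 bytes of Options precede the data
        System.out.printf("%d->%d seq=%d ack=%d syn=%b win=%d options=%dB%n",
                srcPort, dstPort, seq, ack, syn, window, optionsLen);
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(20);   // a hand-built minimal header
        b.putShort((short) 80).putShort((short) 12345);
        b.putInt(1).putInt(0);
        b.putShort((short) ((5 << 12) | 0x0002)); // data offset = 5 words, SYN set
        b.putShort((short) 65535).putShort((short) 0).putShort((short) 0);
        b.flip();
        parse(b);
    }
}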

B. Protocol Operation

3 phases:

1. Connection establishment: multi-step handshake process
2. Data transfer
3. Connection termination: closes established virtual circuits and releases allocated resources

State: Description

LISTEN: (Server) Waiting for a connection request from any remote TCP and port
SYN-SENT: (Client) Waiting for a matching connection request after having sent a request out
SYN-RECEIVED: (Server) Waiting for a confirming acknowledgement after having both received and sent a connection request
ESTABLISHED: (Server & Client) An open connection; data received can be delivered to the user
FIN-WAIT-1: (Server & Client) Waiting for a connection termination request from the remote TCP, or an acknowledgement of the connection termination request previously sent
FIN-WAIT-2: (Server & Client) Waiting for a connection termination request from the remote TCP
CLOSE-WAIT: (Server & Client) Waiting for a connection termination request from the local user
CLOSING: (Server & Client) Waiting for a connection termination request acknowledgement from the remote TCP
LAST-ACK: (Server & Client) Waiting for an acknowledgement of the connection termination request previously sent to the remote TCP
TIME-WAIT: (Server & Client) Waiting the maximum time needed to be sure the remote TCP received the acknowledgement of its connection termination request
CLOSED: (Server & Client) No connection state at all


Connection Termination

C. Data Transfer

There are a few key features that set TCP apart from User Datagram Protocol:

Ordered data transfer — the destination host rearranges according to sequence number[2]

Retransmission of lost packets — any cumulative stream not acknowledged is retransmitted[2]

Error-free data transfer[14]

Flow control — limits the rate at which a sender transfers data, to guarantee reliable delivery. The receiver continually hints to the sender how much data can be received (controlled by the sliding window). When the receiving host's buffer fills, the next acknowledgment contains a 0 in the window size, stopping transfer and allowing the data in the buffer to be processed.[2]

Congestion control [2]

Reliable Transmission

TCP primarily uses a cumulative acknowledgment scheme, where the receiver sends an acknowledgment signifying that the receiver has received all data preceding the acknowledged sequence number.

The sender sets the sequence number field to the sequence number of the first payload byte in the segment's data field, and the receiver sends an acknowledgment specifying the sequence number of the next byte they expect to receive.

For example, if a sending computer sends a packet containing four payload bytes with a sequence number field of 100, then the sequence numbers of the four payload bytes are 100, 101, 102 and 103. When this packet arrives at the receiving computer, it would send back an acknowledgment number of 104 since that is the sequence number of the next byte it expects to receive in the next packet.

Error Detection

Sequence numbers allow receivers to discard duplicate packets and properly sequence reordered packets.

Acknowledgments allow senders to determine when to retransmit lost packets.


Wikipedia: IP

The Internet Protocol (IP) is the principal communications protocol in the Internet protocol suite for relaying datagrams across network boundaries. Its routing function enables internetworking, and essentially establishes the Internet.

IP, as the primary protocol in the Internet layer of the Internet protocol suite, has the task of delivering packets from the source host to the destination host solely based on the IP addresses in the packet headers.

IP defines packet structures that encapsulate the data to be delivered. It also defines addressing methods that are used to label the datagram with source and destination information.

A. Datagram Construction

2 components: a header and a payload.

- The IP header is tagged with the source IP address, the destination IP address, and other metadata needed to route and deliver the datagram

- Payload is the data that is transported.

B. IP Addressing & Routing

Assignment of IP addresses and associated parameters to host interfaces. The address space is divided into networks and sub-networks, involving the designation of network or routing prefixes.

- IP routing is performed by all hosts, but most importantly by routers, which transport packets across network boundaries
- Routers communicate with one another via specially designed routing protocols, either interior gateway protocols or exterior gateway protocols, as needed for the topology of the network

IP provides only best-effort delivery, and its service is characterized as unreliable; it is a connectionless protocol. Routing is dynamic: each packet is independent, and the network maintains no state based on the path of prior packets. Improper sequencing can occur when some packets are routed on a different path to their destination.


Lesson Notes

1. Internet Model

1. Application layer (user interface services and support services)

Applications create user data and communicate this data to other applications on another or the same host. The communications partners are often called peers. This is where the higher-level protocols such as SMTP, FTP, SSH, HTTP, etc. operate.

2. Transport layer (process-to-process)

The transport layer constitutes the networking regime between two network processes, on either the same or different hosts and on either the local network or remote networks separated by routers. Processes are addressed via "ports," and the transport layer header contains the port numbers.

UDP is the basic transport layer protocol, providing communication between processes via port addresses in the header. Also, some OSI session layer services such as flow control, error correction, and connection establishment and teardown protocols belong at the transport layer. In the Internet protocol suite, TCP provides flow control, connection establishment, and reliable transmission of data.

3. Network layer

The internet layer has the task of exchanging datagrams across network boundaries. It provides a uniform networking interface that hides the actual topology (layout) of the underlying network connections. It is therefore also referred to as the layer that establishes inter-networking; indeed, it defines and establishes the Internet. This layer defines the addressing and routing structures used for the TCP/IP protocol suite. The primary protocol in this scope is the Internet Protocol, which defines IP addresses. Its function in routing is to transport datagrams to the next IP router that has the connectivity to a network closer to the final data destination.

4. Link layer


This layer defines the networking methods within the scope of the local network link on which hosts communicate without intervening routers. It describes the protocols used to describe the local network topology and the interfaces needed to effect transmission of Internet layer datagrams to next-neighbor hosts.

2. Network Diagram


– IP is the address;

• It determines where the packets go

• Much of replication – both for load balancing and fault tolerance – will depend on this underlying behavior.

– TCP provides some reliability, and therefore requires a time-out

• Time-outs come up all the time in fault tolerance; computers cannot distinguish failure & silence.

Week 3 - Hardware Architecture & Load Balancing

Readings (1) Leaky Abstractions

IP: Unreliable nature

TCP: Reliable transmission that is organized and accurate (not garbled or corrupted)

Leaky Abstraction

TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can't quite protect you from.

All non-trivial abstractions, to some degree, are leaky

Abstractions fail. Sometimes a little, sometimes a lot: there is leakage.

E.g. in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify "where a=b and b=c and a=c" than if you only specify "where a=b and b=c" even though the result set is the same.


The problem with abstractions is that, because they encapsulate away detail and structure, when leakages happen, developers who have only ever worked above the abstraction are unable to resolve the resulting bugs and problems.

Readings (2) Little Man Computer

Characteristics of the LMC

CAN’T remember anything

CAN’T multitask

CAN’T understand anything more complicated than “Go there now” or “1+1”

Components of the LMC

100 mailboxes: Storing of values

1 accumulator: Only number LMC currently remembers

1 input box: User input

1 output box: User output

1 instruction counter: Necessary for working with loops and conditionals

Commands

Command | Instruction | Description
ADD XX | ADD | Adds the value stored in mailbox XX to the accumulator
SUB XX | SUBTRACT | Subtracts the value stored in mailbox XX from the accumulator
STA XX | STORE | Stores the accumulator value in mailbox XX
LDA XX | LOAD | Loads mailbox XX's value into the accumulator
BRA XX | BRANCH | Branches to a specific line of code; XX is the next line executed
BRZ XX | BRANCH (IF ZERO) | Branches to line XX IF the accumulator value is zero
BRP XX | BRANCH (IF POSITIVE) | Branches to line XX IF the accumulator value is positive
INP | INPUT | Asks for user input and places the value in the accumulator
OUT | OUTPUT | Outputs the value in the accumulator
HLT | HALT | Stops working


Boxes can be labelled. Labelling allows a more “englishified” version of looping, as an alternative to specifying a line number in BRA XX.

LMC in Pseudo-code


LMC Translating Code


Concurrency

Assuming mailbox 00 contains the value 1, the code can be broken into the following:

L1 LDA X    Load from box X
L2 BRP POS  Branch to POS if positive
L3 HLT      Stop the program, since we don’t want POS to happen
L4 SUB 00   Subtract the value in box 00, which is 1
L5 STA X    Store the value into box X

Since the code is “multi-threaded”, each thread runs at the same time, but the LMC is incapable of multi-tasking, so it hops from thread to thread as it goes. This means that it can switch at any time: at line 1, 2, 3 or 4.

If it switches after line 5 consistently, all is well; the value in box X is always updated correctly before the next thread loads X. The final value of x is 0.

If it switches at any other time, you get a problem. The value of box X is not updated before the switch; the LMC’s values go haywire!

Question: Switching at lines 1, 2, 3 or 4 all give inaccurate results. What result would they give?
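The same race, sketched in Java (my translation of the LMC program above; with just two threads the switch window is tiny, so the wrong answer may take many runs to appear):

public class LmcRace {
    static int x = 2; // "box X", decremented once by each of two threads

    public static void main(String[] args) throws InterruptedException {
        Runnable decrement = () -> {
            int loaded = x;          // L1: LDA X
            if (loaded > 0) {        // L2: BRP
                loaded = loaded - 1; // L4: SUB 00
                x = loaded;          // L5: STA X (a switch before this line loses an update)
            }
        };
        Thread t1 = new Thread(decrement);
        Thread t2 = new Thread(decrement);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("x = " + x); // expected 0; prints 1 when both load before either stores
    }
}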


Readings (3) Von Neumann Architecture

A stored-program computer in which an instruction fetch and a data operation cannot occur concurrently, since they share a common bus.

- Includes by design an instruction set that can store a set of instructions in memory; this had to be done manually in the past
- Allows for self-modifying code, which was important until the emergence of index registers and indirect addressing, due to the need for programs to increment or modify the address portion of instructions
  o Self-modifying code: code that alters its own instructions during execution, usually to reduce instruction path length and improve performance, or simply to reduce similarly repetitive code
  o Absolute addressing: the address parameter itself, with no modifications
  o PC-relative instruction: next instruction address + an offset parameter, for referencing code before and after the instruction. Useful in connection with jumps to nearby instructions
  o Indirect addressing: the contents of register “reg”. The effect is to transfer control to the instruction whose address is in the specified register
  o Index register: a processor register for modifying operand addresses during runtime, usually added to or subtracted from an address to give an effective address. In early systems without indirect addressing, these operations had to be done by modifying the instruction address, requiring several program steps and more computer memory

This model evolved: memory-mapped I/O allows input and output devices to be treated the same as memory, and a single system bus could be used to provide a modular system at lower cost. But this streamlining led to the Von Neumann bottleneck: limited throughput (data transfer rate) between the CPU and memory, compared to the amount of memory.


1. Program & data memory cannot be accessed simultaneously
2. Throughput is smaller than the rate at which the CPU can work
3. The CPU is forced to wait for needed data to be transferred to/from memory
4. Effective processing speed is limited when the CPU must perform minimal processing on large amounts of data
5. Severity increases as CPU speeds and memory sizes continue to improve

Resolving the Bottleneck

Caching: Storing copies of data from frequently used main-memory locations brings the average latency of memory accesses closer to cache latency than to main-memory latency.

Modified Harvard Architecture: Cache and path separation, where the contents of instruction memory can still be accessed as if they were data. The word width, timing and implementation of data and program memory may differ. Two simultaneous memory fetches give greater and more predictable memory bandwidth.

Branch predictor algorithms: For conditionals, it is not known whether a conditional jump will be taken until the condition has been calculated in the execution stage of the pipeline. The branch predictor guesses whether the jump is likely to happen and speculatively executes the branch most likely to be taken.

Limited CPU stack / scratchpad memory: High-speed internal memory for temporary storage of calculations, data and other work in progress. Simplifies the caching logic and guarantees the unit works without main-memory contention.

Lesson Notes: Hardware & Architecture

1. Registers

On-CPU storage:

- The only storage directly changed by the arithmetic and control units
- Size in bits (how many bits for your laptop’s CPU?)


Modern CPUs have several dozen:

- Instruction
- Program counter
- Memory address
- Memory register
- General purpose (used by applications)

2. Load Balancing

Problem: More requests coming in than the LMC can handle

Solution: Have clusters of Little Men

• Hardware:

– Individual CPU cores execute very simple instructions, very fast, one at a time

• 1 addition ~ 3 instructions

• Allocate an array ~ a lot of instructions!

– Registers are the only operational memory

– Speed of memory: Registers > RAM >>>>>>> Disk


Week 4 - Parallelism

Readings: Parallel Computing

Software is traditionally written for serial computation: it runs on a single CPU, and the problem is broken into a discrete series of instructions that are executed one after another, with only one executing at any point in time.

1. Parallel Computing

Simultaneous use of multiple computing resources to solve a computational problem: running on multiple processors, discrete parts of the problem are solved concurrently under an overall control/coordination mechanism.

The problem should be able to be:

- Broken down into discrete parts capable of being solved simultaneously
- Executing multiple program instructions at any point in time
- Solved in less time with multiple compute resources than with a single one


Why parallel computing?

- Saves time/money (parallel computers can be built from cheap commodity components)
- Enables solving larger problems (complex problems impossible within a single computer's limited memory)
- Enables concurrency
- Use of non-local resources
- Limits to serial computing

LLNL (Lawrence Livermore National Laboratory) Parallel Computer

Compute node: Each node is a multi-processor parallel computer in itself

Infiniband switch: Multiple compute nodes networked together

Special-purpose nodes: Multi-processor nodes meant for other purposes

2. Parallel Computer Architectures

Shared Memory: All processors share the same memory.

- Uniform Memory Access (UMA): identical processors with equal access, and equal access times, to memory
- Non-Uniform Memory Access (NUMA): not all processors have equal access; access across links is slower

Advantages: user-friendly programming perspective to memory; fast and uniform data sharing due to the proximity of memory to the CPUs
Disadvantages: lack of scalability between memory and CPUs; the programmer is responsible for ensuring correct access to global memory; expense

Distributed Memory: Requires a communication network for inter-processor memory access.

Advantages: memory is scalable with the number of processors; no memory interference or overhead in keeping cache coherency; cost-effectiveness
Disadvantages: the programmer is responsible for data communication between processors; it is difficult to map existing data structures to this memory organization

Hybrid Distributed-Shared Memory: Used by the current fastest computers, with a mix of the advantages and disadvantages above. GPUs perform computationally intensive kernels with local on-node data; MPI (a message-passing model) is used for communications.

3. Designing Parallel Programs

3.1. Partitioning

Breaking problems into discrete chunks that can be distributed to multiple tasks, a.k.a. decomposition.

Domain Decomposition

The data associated with the problem is decomposed, and each parallel task works on a portion of the data.

Functional Decomposition

Good for problems that can be split into different tasks (especially the independent sequential ones)


3.2. Communications

This is dependent on the type of problem being solved.

Embarrassingly Parallel: Problems that do not need any inter-task communications (e.g. converting image pixels)

Factors to Consider

1. Cost of communications
   a. Inter-task communication always implies overhead
   b. Machine cycles and resources that could be used for computation are instead used to package and transmit data
   c. Requires synchronization between tasks, resulting in some "waiting" instead of doing work
   d. Competing traffic can saturate the network and aggravate performance issues
2. Latency vs. bandwidth
   a. Latency: time taken to send a message from point A to point B
   b. Bandwidth: amount of data that can be communicated per unit time
3. Visibility of communications
   a. Message-passing model: communications are explicit and visible, under the programmer's control
   b. Data-parallel model: communications are transparent; inter-task communications are not exactly known
4. Synchronous vs. asynchronous
   a. Synchronous communication requires some sort of "handshaking"
   b. Referred to as blocking, since other work must wait until the communication is done
   c. Asynchronous communications allow independent data transfers; non-blocking
   d. Interleaving computation with communication is the greatest benefit
5. Scope of communications
   a. Knowing which tasks must communicate with each other is critical during design
   b. Point-to-point: 2 tasks, with one acting as sender/producer of data and the other as receiver
   c. Collective: data sharing across more than 2 tasks
6. Efficiency of communications
   a. The programmer has a choice of factors that can affect communications performance
   b. Which implementation model should be used? Performance varies
   c. What type of communication operations should be used? Asynchronous is often faster
   d. Network media: some platforms have more than one network for communications


3.3. Synchronization

Barrier:
- Implies all tasks are involved
- Each task performs its work until the barrier is reached
- When the LAST task completes, all tasks are synchronized

Lock / Semaphore:
- Involves any number of tasks
- Used to serialize access to global data or a section of code
- Only ONE task may own the lock at a time; others can attempt to own the lock but have to wait until the current lock owner releases it

Synchronous Communication:
- Involves only the tasks executing a communication operation
- When a task performs a communication operation, some form of coordination with the other task(s) is needed
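A minimal Java sketch of the barrier column above (java.util.concurrent provides CyclicBarrier; the task count and printouts are illustrative):

import java.util.concurrent.CyclicBarrier;

public class BarrierDemo {
    public static void main(String[] args) {
        // the barrier action runs once, when the LAST of the 3 tasks arrives
        CyclicBarrier barrier = new CyclicBarrier(3,
                () -> System.out.println("all tasks synchronized"));
        for (int i = 0; i < 3; i++) {
            final int id = i;
            new Thread(() -> {
                try {
                    System.out.println("task " + id + " working");
                    barrier.await(); // each task performs until the barrier is reached
                    System.out.println("task " + id + " past barrier");
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }
    }
}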

3.4. Data Dependencies

A dependence exists between program statements when the order of their execution affects the program results.

Data dependence results from multiple uses of the same location(s) in storage by different tasks.

The value of A(J-1) must be computed before the value of A(J), therefore A(J) exhibits a data dependency on A(J-1). Parallelism is inhibited.

If Task 2 has A(J) and task 1 has A(J-1), computing the correct value of A(J) necessitates:

o Distributed memory architecture - task 2 must obtain the value of A(J-1) from task 1 after task 1 finishes its computation

o Shared memory architecture - task 2 must read A(J-1) after task 1 updates it

Parallelism is inhibited. The value of Y is dependent on:

o Distributed memory architecture - if or when the value of X is communicated between the tasks.

o Shared memory architecture - which task last stores the value of X.

Although all data dependencies are important to identify when designing parallel programs, loop carried dependencies are particularly important since loops are possibly the most common target of parallelization efforts.
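In code form, the loop-carried case looks like this (a hypothetical Java loop; each iteration needs the previous iteration's result, so the iterations cannot simply be split across tasks):

import java.util.Arrays;

public class LoopDependence {
    public static void main(String[] args) {
        int n = 8;
        double[] a = new double[n];
        a[0] = 1.0;
        for (int j = 1; j < n; j++) {
            a[j] = a[j - 1] * 2.0; // A(J) needs A(J-1) first: parallelism inhibited
        }
        System.out.println(Arrays.toString(a));
    }
}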

Handling Data Dependencies

Distributed memory architectures: communicate required data at synchronization points.

Shared memory architectures: synchronize read/write operations between tasks.


3.5. Load Balancing

- Distributing approximately equal amounts of work amongst tasks so that all tasks are kept busy all the time
- Important where the slowest task is the performance bottleneck

How to achieve load balance?

Equal Partitioning

1. For arrays/matrices, equally distribute the data set across tasks
2. For loop iterations where the work is similar, equally split the iterations
3. For machines of varying performance, use an analysis tool to detect load imbalances

Dynamic Work Assignment

Some problems will have load imbalances even if data is evenly distributed, like sparse arrays where some tasks have data while others are mostly zeroes

Scheduler Task Pool Approach: As each task completes, it queues to get new work

An algorithm to detect and handle load imbalances may be necessary where they occur dynamically within code
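A shared-memory sketch of the scheduler/task-pool approach in Java (pool size and job count are illustrative): a fixed pool of workers pulls jobs from a common queue, so faster workers automatically take on more of the work.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TaskPool {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // 4 "workers"
        for (int job = 0; job < 20; job++) {
            final int j = job;
            // as each worker completes, it takes the next queued job
            pool.submit(() -> System.out.println(
                    Thread.currentThread().getName() + " ran job " + j));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}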

3.6. Granularity

Qualitative measure of the ratio of computation to communication; periods of computation are separated from periods of communication by synchronization events.

Fine Grained Parallelism

- Small amounts of computational work between communication events
- Low computation-to-communication ratio
- Facilitates load balancing
- Implies high communication overhead and less opportunity for performance enhancement
- If too fine, the overhead of communication and synchronization can exceed the computation

Coarse Grained Parallelism

- Large sets of computational work between communication/synchronization events
- Advantageous where communication and synchronization overheads are high
- Fine-grained parallelism can help reduce overheads due to load imbalance


3.7. I/O

- I/O operations are generally regarded as inhibitors to parallelism.
- I/O operations require an order of magnitude (or greater) more time than memory operations.
- Parallel I/O systems may be immature or not available for all platforms.
- In an environment where all tasks see the same file space, write operations can result in file overwriting.
- Read operations can be affected by the file server's ability to handle multiple read requests simultaneously.
- I/O that must be conducted over the network (NFS, non-local) can cause severe bottlenecks and even crash file servers.

4. Parallel Examples

Array Processing

This example demonstrates calculations on 2-dimensional array elements, with the computation on each array element being independent of the other array elements.

The serial program calculates one element at a time in sequential order.

Serial code could be of the form:

do j = 1,n
  do i = 1,n
    a(i,j) = fcn(i,j)
  end do
end do

The calculation of elements is independent of one another - leads to an embarrassingly parallel situation.

The problem should be computationally intensive.


Array Processing Parallel Solution 1

Arrays elements are distributed so that each processor owns a portion of an array (subarray).

Independent calculation of array elements ensures there is no need for communication between tasks.

Distribution scheme is chosen by other criteria, e.g. unit stride (stride of 1) through the subarrays. Unit stride maximizes cache/memory usage.

Since it is desirable to have unit stride through the subarrays, the choice of a distribution scheme depends on the programming language. See the Block-Cyclic Distributions diagram for the options.

After the array is distributed, each task executes the portion of the loop corresponding to the data it owns. For example, with Fortran block distribution:

do j = mystart, myend
  do i = 1,n
    a(i,j) = fcn(i,j)
  end do
end do

Notice that only the outer loop variables are different from the serial solution.

One Possible Solution:

- Implement as a Single Program Multiple Data (SPMD) model.
- The master process initializes the array, sends info to worker processes and receives results.
- Each worker process receives info, performs its share of the computation and sends results to the master.
- Using the Fortran storage scheme, perform block distribution of the array.
- Pseudo code solution: red highlights changes for parallelism.

find out if I am MASTER or WORKER

if I am MASTER
  initialize the array
  send each WORKER info on part of array it owns
  send each WORKER its portion of initial array
  receive from each WORKER results

else if I am WORKER
  receive from MASTER info on part of array I own
  receive from MASTER my portion of initial array

  # calculate my portion of array
  do j = my first column, my last column
    do i = 1,n
      a(i,j) = fcn(i,j)
    end do
  end do

  send MASTER results

endif

Example MPI Program in C: mpi_array.c
Example MPI Program in Fortran: mpi_array.f

Array Processing Parallel Solution 2: Pool of Tasks

The previous array solution demonstrated static load balancing:
  o Each task has a fixed amount of work to do
  o There may be significant idle time for faster or more lightly loaded processors; the slowest task determines overall performance

Static load balancing is not usually a major concern if all tasks are performing the same amount of work on identical machines. If you have a load balance problem (some tasks work faster than others), you may benefit from using a "pool of tasks" scheme.

Pool of Tasks Scheme:

Two processes are employed

Master Process:

o Holds a pool of tasks for worker processes to do
o Sends a worker a task when requested
o Collects results from workers

Worker Process: repeatedly does the following

o Gets a task from the master process
o Performs the computation
o Sends results to the master

Worker processes do not know before runtime which portion of array they will handle or how many tasks they will perform.

Dynamic load balancing occurs at run time: the faster tasks will get more work to do. Pseudo code solution: red highlights changes for parallelism.


find out if I am MASTER or WORKER

if I am MASTER
  do until no more jobs
    if request send to WORKER next job
    else receive results from WORKER
  end do

else if I am WORKER
  do until no more jobs
    request job from MASTER
    receive from MASTER next job
    calculate array element: a(i,j) = fcn(i,j)
    send results to MASTER
  end do

endif

Discussion:

In the above pool of tasks example, each task calculated an individual array element as a job. The computation to communication ratio is finely granular.

Finely granular solutions incur more communication overhead in order to reduce task idle time.

A more optimal solution might be to distribute more work with each job. The "right" amount of work is problem dependent.

Parallel Examples


PI Calculation

The value of PI can be calculated in a number of ways. Consider the following method of approximating PI:

1. Inscribe a circle in a square
2. Randomly generate points in the square
3. Determine the number of points in the square that are also in the circle
4. Let r be the number of points in the circle divided by the number of points in the square
5. PI ~ 4 r
6. Note that the more points generated, the better the approximation

Serial pseudo code for this procedure:

npoints = 10000
circle_count = 0

do j = 1,npoints
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle then
    circle_count = circle_count + 1
end do

PI = 4.0*circle_count/npoints

Note that most of the time in running this program would be spent executing the loop

Leads to an embarrassingly parallel solution

o Computationally intensive
o Minimal communication
o Minimal I/O


PI Calculation: Parallel Solution

Parallel strategy: break the loop into portions that can be executed by the tasks.

For the task of approximating PI:
  o Each task executes its portion of the loop a number of times.
  o Each task can do its work without requiring any information from the other tasks (there are no data dependencies).
  o Uses the SPMD model. One task acts as master and collects the results.

Pseudo code solution: red highlights changes for parallelism.

npoints = 10000
circle_count = 0

p = number of tasks
num = npoints/p

find out if I am MASTER or WORKER

do j = 1,num
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle then
    circle_count = circle_count + 1
end do

if I am MASTER
  receive from WORKERS their circle_counts
  compute PI (use MASTER and WORKER calculations)
else if I am WORKER
  send to MASTER circle_count
endif

Example MPI Program in C: mpi_pi_reduce.c


Example MPI Program in Fortran: mpi_pi_reduce.f
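For comparison, a shared-memory version of the same strategy in Java (the notes' examples use MPI; this sketch uses a parallel stream instead, so there is no explicit master/worker messaging):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class ParallelPi {
    public static void main(String[] args) {
        int npoints = 10_000_000;
        LongAdder circleCount = new LongAdder(); // safe to update from many threads
        IntStream.range(0, npoints).parallel().forEach(i -> {
            double x = ThreadLocalRandom.current().nextDouble();
            double y = ThreadLocalRandom.current().nextDouble();
            if (x * x + y * y <= 1.0) circleCount.increment(); // point inside the circle
        });
        System.out.println("PI ~ " + 4.0 * circleCount.sum() / npoints);
    }
}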

Parallel Examples

Simple Heat Equation

Most problems in parallel computing require communication among the tasks. A number of common problems require communication with "neighbor" tasks.

The heat equation describes the temperature change over time, given initial temperature distribution and boundary conditions.

A finite differencing scheme is employed to solve the heat equation numerically on a square region.

The initial temperature is zero on the boundaries and high in the middle.

The boundary temperature is held at zero. For the fully explicit problem, a time-stepping algorithm is used. The elements of a 2-dimensional array represent the temperature at points on the square.

The calculation of an element is dependent upon neighbor element values.

A serial program would contain code like:

do iy = 2, ny - 1
  do ix = 2, nx - 1
    u2(ix, iy) = u1(ix, iy) +
      cx * (u1(ix+1,iy) + u1(ix-1,iy) - 2.*u1(ix,iy)) +
      cy * (u1(ix,iy+1) + u1(ix,iy-1) - 2.*u1(ix,iy))
  end do
end do


Simple Heat Equation: Parallel Solution

I. Implement as an SPMD model
II. The entire array is partitioned and distributed as subarrays to all tasks. Each task owns a portion of the total array.
III. Determine data dependencies
  o interior elements belonging to a task are independent of other tasks
  o border elements are dependent upon a neighbor task's data, necessitating communication
IV. Master process sends initial info to workers, and then waits to collect results from all workers
V. Worker process calculates the solution within a specified number of time steps, communicating as necessary with neighbor processes
VI. Pseudo code solution: red highlights changes for parallelism.

find out if I am MASTER or WORKER

if I am MASTER
  initialize array
  send each WORKER starting info and subarray
  receive results from each WORKER

else if I am WORKER
  receive from MASTER starting info and subarray

  do t = 1, nsteps
    update time
    send neighbors my border info
    receive from neighbors their border info
    update my portion of solution array
  end do

  send MASTER results

endif

VII. Example MPI Program in C: mpi_heat2D.c
VIII. Example MPI Program in Fortran: mpi_heat2D.f


Lesson Notes

1. Levels of Parallelism

2. Large Parallelism

Practical Problems:
• Launching masters/workers
• Allocating tasks to workers
• Tracking workers
• Handling master failures
• Handling worker failures (common)
• Dealing with network failures
• Getting data to workers
• Getting data between workers

Performance Limitations:
• Data size >>> memory space
• Network is limited, and therefore slow
• The world's current fastest computer won through network innovations, not processors or memory
• Put processing with, or near, data; keep data local with its processing
• Stragglers: in a large cluster there is always some machine which is slow; stragglers can take up 30% of response time

3. Map Reduce


• Creates master & workers

• Tracks and restarts workers as needed

• Passes data between workers

• Ensures data locality

• Deals with stragglers
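A single-process sketch of the map -> group -> reduce flow (word count is the classic example; a real MapReduce runtime would distribute each phase across workers and handle the failures listed above):

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    public static void main(String[] args) {
        List<String> docs = List.of("the quick fox", "the lazy dog", "the fox");
        Map<String, Long> counts = docs.stream()
                .flatMap(d -> Arrays.stream(d.split("\\s+")))  // map: emit each word
                .collect(Collectors.groupingBy(w -> w,         // shuffle: group by key
                        Collectors.counting()));               // reduce: count per key
        System.out.println(counts); // e.g. {the=3, fox=2, quick=1, lazy=1, dog=1}
    }
}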


Week 5 - Concurrency

Reading (1) Thread

- The smallest sequence of programmed instructions that can be managed independently by the OS scheduler
- A thread is a lightweight process
- A thread is contained in a process along with other threads; resources like memory are shared within a process
- On a single processor, multiplexing is used for time-division; context switching happens to switch between tasks

1. Threads vs. Processes

- Processes are independent; threads are subsets of processes
- Processes carry more state information; threads within a process share it
- Threads share address spaces
- Processes interact only through OS-provided inter-process communication mechanisms
- Context switching between threads is faster

2. Multithreading

Allows processing across multiple cores and multiple CPUs in a cluster of machines, lending itself to truly concurrent processing of tasks. Race conditions need to be watched for.

“Race conditions are behaviors of software systems where output is dependent on sequence of uncontrollable events and becomes a bug when events don’t happen in the intended order. 2 signals racing each other to influence output first.”

Threads may require mutually exclusive operations to prevent concurrent modification of common data.

Preemptive Multitasking

Allows the OS to determine when a context switch should occur. An inappropriate switch may occur, leading to lock convoys, priority inversion and other effects that can be avoided with cooperative multithreading.

Cooperative Multithreading

Relies on threads to relinquish control once they’re at a stopping point; creates problems if a thread is waiting for a resource to be available

3. Processes, Kernel Threads, User Threads & Fibers

Process

Heaviest unit of kernel scheduling; owns resources allocated by the operating system, such as memory, file handles, sockets, device handles and windows. Address spaces and file resources are not shared except through explicit methods like inheriting file handles or mapping to the same file in a shared way.

Typically preemptively multitasked

Kernel Thread

Lightest unit of scheduling; at least one exists within each process. Preemptively multitasked if the OS process scheduler is preemptive


Do not own resources except for a stack, a copy of the registers (including the program counter) and thread-local storage.

User Threads

Sometimes implemented in user space libraries where the kernel is unaware of them. Green threads are user threads implemented by virtual machines.

Generally fast to create and manage, but unable to take advantage of multiprocessing, and blocked if their associated kernel threads all block, even when some user threads are ready to run.

Fibers

An even lighter unit of scheduling that is cooperatively scheduled: it must explicitly yield to allow another fiber to run. A fiber can be scheduled to run in any thread in the same process. It is a system-level construct similar to coroutines (a language-level construct).

4. Thread & Fiber Issues

4.1. Concurrency & Data Structures

- Sharing an address space allows tight coupling and exchange of data without IPC overhead
  o Inter-process communication is a set of methods for exchanging data amongst multiple threads in one or more processes
  o In Java, such mechanisms include pipes and sockets
- Prone to race conditions on data that requires more than one CPU instruction to update
- Synchronization primitives like mutexes can lock data structures against concurrent access
  o Mutex: mutual exclusion

4.2. I/O and Scheduling

For user thread/fiber implementations existing entirely in user space, context switches are extremely efficient, requiring no interaction with the kernel.

- Problems happen when user-space threads perform blocking system calls
- A solution is to implement a synchronous interface that uses non-blocking I/O internally

Reading (2) Concurrency

- Property of a system where several computations are executing simultaneously and potentially interacting with one another
- Concurrent use of shared resources can be a source of indeterminacy, leading to issues like deadlock and starvation
  o Deadlock: a situation where 2 or more competing actions are each waiting for the other to finish, and thus neither ever does
  o Starvation: a case in multitasking where a process is perpetually denied resources, so that its task can never be completed


Reading (3) Context Switch

The process of storing and restoring the state (context) of a process so that execution can be resumed from the same point at a later time; usually computationally intensive, requiring actions like:

- Saving and loading registers and memory maps
- Updating tables and lists

Reading (4) Java Concurrency

Process: A self-contained execution environment that communicates via IPC resources like pipes and sockets. The ProcessBuilder object is used to create new processes.

Thread: Exists within a process. All applications start with 1 MAIN thread.

1. Thread Objects

Either instantiate a Thread to directly control thread creation and management, or pass the application's tasks to an executor to abstract thread management away.

1.1. Defining & Starting a Thread

Implement Runnable
- More general
- Flexible: the task class is free to subclass a class other than Thread

Subclass Thread
- Thread itself implements Runnable, though its run method does nothing
- Has methods useful for managing the thread and checking its status
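A minimal sketch of the two idioms (class names are illustrative, not from the notes):

public class HelloRunnable implements Runnable {
    public void run() {
        System.out.println("Hello from a Runnable!");
    }

    public static void main(String[] args) {
        // Pass the Runnable to a Thread constructor, then start the thread.
        (new Thread(new HelloRunnable())).start();
    }
}

class HelloThread extends Thread {
    public void run() {
        System.out.println("Hello from a Thread subclass!");
    }

    public static void main(String[] args) {
        (new HelloThread()).start();
    }
}

The Runnable idiom is generally preferred, since the task class remains free to extend something other than Thread.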

1.2. Pausing Execution with Sleep

Thread.sleep allows the current thread to suspend execution for a specified period, making the processor available for other parts of the application.

1.3. Interrupts

An interrupt indicates to a thread that it should stop its current work and do something else.

Thread.interrupted(): Checks whether the current thread has been interrupted. Throwing an InterruptedException helps centralize interrupt handling into a catch clause.

Interrupt Status Flag

Thread.interrupt sets this flag; the flag is cleared when the static Thread.interrupted method is invoked. The non-static isInterrupted method, used by one thread to query the interrupt status of another, does not change the flag.
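A small sketch of the interrupt idiom (timings and names are illustrative): the worker polls the status flag, and the sleep call lets it exit promptly via InterruptedException.

public class InterruptDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Thread.sleep(1000);   // exits early with InterruptedException
                        System.out.println("working...");
                    } catch (InterruptedException e) {
                        return;               // interrupt received: stop the task
                    }
                }
            }
        });
        worker.start();
        Thread.sleep(3000);
        worker.interrupt();                   // sets the interrupt status flag
    }
}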


1.4. Join

Allows one thread to wait for the completion of another; overloads allow us to specify a maximum waiting period. Like sleep, join responds to an interrupt by exiting with an InterruptedException.
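A short sketch of join and its timed overload (the 500 ms bound is illustrative):

public class JoinDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(2000);   // simulate 2 s of work
                } catch (InterruptedException e) {}
            }
        });
        worker.start();
        worker.join(500);                 // wait at most 500 ms
        System.out.println(worker.isAlive() ? "still running" : "finished");
        worker.join();                    // wait until the worker terminates
        System.out.println("worker finished");
    }
}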

2. Synchronization

Threads communicate by sharing access to fields and the objects that reference fields refer to.

- Extremely efficient form of communication
- Makes 2 kinds of errors possible: thread interference and memory consistency errors

Synchronization prevents these errors, but can lead to thread contention, where 2 or more threads try to access the same resource at the same time, causing one or more of them to execute more slowly (or even be suspended).

2.1. Thread Interference

Interleaving: 2 operations running in different threads and acting on the same data can overlap, corrupting each other's updates.
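The classic Counter illustration (a minimal sketch): c++ and c-- each decompose into read, add/subtract, write steps, so two threads can interleave and lose an update.

class Counter {
    private int c = 0;

    public void increment() {
        c++;    // three steps (read c, add 1, write c) that can interleave
    }

    public void decrement() {
        c--;
    }

    public int value() {
        return c;
    }
}

If thread A calls increment() while thread B calls decrement(), both may read c = 0; A writes 1, B then writes -1, and A's increment is lost.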

2.2. Memory Consistency Errors

Occur when different threads have inconsistent views of what should be the same data. A happens-before relationship is a guarantee that memory writes by one specific statement are visible to another specific statement.

2 actions that create a happens-before relationship:

1. Thread.start: Every statement with a happens-before relationship to the start call also has a happens-before relationship with every statement executed by the new thread

2. Thread.join: All statements executed by the terminated thread have a happens-before relationship with all statements following the successful join, so the effects of the code in the joined thread are visible to the thread that performed the join

2.3. Synchronized Methods

1. Not possible to interleave: When 1 thread executes a synchronized method, all other threads invoking synchronized methods on the same object are blocked until the first thread is done with the object
2. Happens-before: When the thread exits the method, a happens-before relationship is established with any subsequent invocation, guaranteeing that state changes are visible to all threads
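Applying this to the Counter sketched earlier makes interference impossible (a minimal fix, not the only one):

class SynchronizedCounter {
    private int c = 0;

    public synchronized void increment() {
        c++;    // guarded by this object's intrinsic lock
    }

    public synchronized void decrement() {
        c--;
    }

    public synchronized int value() {
        return c;
    }
}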

2.4. Intrinsic Locks & Synchronization

Every object has an intrinsic lock associated with it, which enforces exclusive access to its state and establishes the happens-before relationships essential to visibility.

When a thread releases an intrinsic lock, a happens-before relation is established between the action and any subsequent acquisition of the same lock.

Synchronized Statements

Unlike synchronized methods, synchronized statements must specify the object that provides the intrinsic lock.


This is used to synchronize changes to fields within the object while avoiding synchronized invocations of other objects' methods (which can create liveness problems).
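A sketch of fine-grained locking with synchronized statements (field and lock names are illustrative): two independent fields each get a private lock object, so updates to one never block updates to the other.

public class MsLunch {
    private long c1 = 0;
    private long c2 = 0;
    private final Object lock1 = new Object();
    private final Object lock2 = new Object();

    public void inc1() {
        synchronized (lock1) {
            c1++;
        }
    }

    public void inc2() {
        synchronized (lock2) {   // independent of lock1: inc1 and inc2 can interleave
            c2++;
        }
    }
}

This only works because c1 and c2 are never used together; interleaving their updates must actually be safe.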

Reentrant Synchronization

Allows a thread to acquire a lock it already owns; this happens when synchronized code, directly or indirectly, invokes a method whose synchronized code uses the same lock. Without reentrant synchronization, many precautions would have to be taken to avoid a thread blocking itself.

2.5. Atomic Access

An atomic action effectively occurs all at once; it cannot stop in the middle, and it happens either completely or not at all. Side effects of the action are visible only upon its completion.

- Reads and writes are atomic for reference variables and for most primitive variables (all types except long and double)
- Reads and writes are atomic for all variables declared volatile (including long and double variables)
o Any write to a volatile variable establishes a happens-before relationship with subsequent reads of that variable
o Changes to the variable are always visible to other threads, along with the side effects of the code that led up to the change

Simple atomic variable access is more efficient than synchronized access, but requires more care from the programmer to avoid memory consistency errors.
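A small sketch of volatile used for visibility (names are illustrative; the busy-wait is for demonstration only):

public class VolatileFlag {
    private volatile boolean done = false;

    public void finish() {
        done = true;              // volatile write: happens-before later reads
    }

    public void awaitDone() {
        while (!done) {           // volatile read: guaranteed to see the write
            Thread.yield();       // busy-wait, acceptable only in a sketch
        }
    }
}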

3. Liveness

A concurrent application's ability to execute in a timely manner.

3.1. Deadlock

Where 2 or more threads are blocked forever, waiting for each other. In the bowing example below, both threads are likely to block when attempting to invoke bowBack, and neither block ever ends, since each thread is waiting for the other to exit its synchronized method.

3.2. Starvation

A thread cannot gain regular access to shared resources monopolized by "greedy" threads, and hence it is unable to make progress.

3.3. Livelock

Threads often act in response to each other. If the other thread's action is also a response to the action of the first, livelock happens: both threads are too busy responding to each other to resume work.


4. Guarded Blocks

A guarded block polls a condition until the block can proceed; rather than busy-looping, the thread should wait. Object provides a notifyAll() method to inform all waiting threads that something important has happened.

- wait(): Tells the calling thread to give up the monitor and sleep until another thread enters the same monitor and calls notify()
- notify(): Wakes up the first thread that called wait() on the same object
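The standard guarded-block idiom (the joy flag is illustrative): wait inside a loop that re-tests the condition, since a wakeup may be for a different condition.

public class GuardedHolder {
    private boolean joy = false;

    public synchronized void guardedJoy() {
        while (!joy) {             // re-test the guard after every wakeup
            try {
                wait();            // releases the intrinsic lock while waiting
            } catch (InterruptedException e) {}
        }
        System.out.println("Joy has been achieved!");
    }

    public synchronized void notifyJoy() {
        joy = true;
        notifyAll();               // wake every thread waiting on this object
    }
}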

5. Immutable Objects

An object is immutable if its state cannot change after construction. Since immutable objects cannot change state, they cannot be corrupted by thread interference or observed in an inconsistent state.

5.1. Strategy to define Immutable Objects

- Don't provide "setter" methods (methods that modify fields or objects referred to by fields).
- Make all fields final and private.
- Don't allow subclasses to override methods. The simplest way to do this is to declare the class as final. A more sophisticated approach is to make the constructor private and construct instances in factory methods.
- If the instance fields include references to mutable objects, don't allow those objects to be changed:
o Don't provide methods that modify the mutable objects.
o Don't share references to the mutable objects. Never store references to external, mutable objects passed to the constructor; if necessary, create copies, and store references to the copies. Similarly, create copies of your internal mutable objects when necessary to avoid returning the originals in your methods.
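A minimal class following the strategy above (the class itself is illustrative):

public final class ImmutablePoint {              // final: no subclassing
    private final int x;                         // private, final fields
    private final int y;

    public ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }              // getters only, no setters
    public int getY() { return y; }

    public ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);   // "mutation" returns a new instance
    }
}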

6. High Level Concurrency

- Lock Objects: Locking idioms that simplify concurrent applications
- Executors: Launching and managing threads
- Concurrent Collections: Manage large collections of data & reduce the need for synchronization
- Atomic Variables: Features helping to minimize memory consistency errors
- ThreadLocalRandom: Efficient generation of pseudorandom numbers from multiple threads

6.1. Lock Objects

ReentrantLock implements a simple version of the Lock interface that works much like the implicit intrinsic lock; its key advantage is the ability to back out of an attempt to acquire a lock (tryLock backs out immediately if the lock is not available; lockInterruptibly backs out if another thread sends an interrupt before the lock is acquired).

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.Random;

public class Safelock {
    static class Friend {
        private final String name;
        private final Lock lock = new ReentrantLock();

        public Friend(String name) {
            this.name = name;
        }

        public String getName() {
            return this.name;
        }

        public boolean impendingBow(Friend bower) {
            Boolean myLock = false;
            Boolean yourLock = false;
            try {
                // Try to grab both locks; back out if either is unavailable.
                myLock = lock.tryLock();
                yourLock = bower.lock.tryLock();
            } finally {
                if (! (myLock && yourLock)) {
                    if (myLock) {
                        lock.unlock();
                    }
                    if (yourLock) {
                        bower.lock.unlock();
                    }
                }
            }
            return myLock && yourLock;
        }

        public void bow(Friend bower) {
            if (impendingBow(bower)) {
                try {
                    System.out.format("%s: %s has"
                        + " bowed to me!%n", this.name, bower.getName());
                    bower.bowBack(this);
                } finally {
                    lock.unlock();
                    bower.lock.unlock();
                }
            } else {
                System.out.format("%s: %s started"
                    + " to bow to me, but saw that"
                    + " I was already bowing to"
                    + " him.%n", this.name, bower.getName());
            }
        }

        public void bowBack(Friend bower) {
            System.out.format("%s: %s has"
                + " bowed back to me!%n", this.name, bower.getName());
        }
    }

    static class BowLoop implements Runnable {
        private Friend bower;
        private Friend bowee;

        public BowLoop(Friend bower, Friend bowee) {
            this.bower = bower;
            this.bowee = bowee;
        }

        public void run() {
            Random random = new Random();
            for (;;) {
                try {
                    Thread.sleep(random.nextInt(10));
                } catch (InterruptedException e) {}
                bowee.bow(bower);
            }
        }
    }

    public static void main(String[] args) {
        final Friend alphonse = new Friend("Alphonse");
        final Friend gaston = new Friend("Gaston");
        new Thread(new BowLoop(alphonse, gaston)).start();
        new Thread(new BowLoop(gaston, alphonse)).start();
    }
}

6.2. Executors

Separate thread management and creation from the rest of the application.

6.2.1. Executor Interfaces

- Executor: Simple interface supporting the launching of new tasks


- ExecutorService: Subinterface of Executor adding features that manage the lifecycle of both individual tasks and the executor itself

- ScheduledExecutorService: Subinterface of ExecutorService supporting future and/or periodic execution of tasks

Executor

executor.execute(r) does the same thing as (new Thread(r)).start(), but an executor is more likely to use an existing worker thread to run r, or to place r in a queue until a worker thread becomes available.
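A two-task sketch of the substitution (the pool size is chosen arbitrarily):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecuteDemo {
    public static void main(String[] args) {
        Runnable r = new Runnable() {
            public void run() {
                System.out.println("task on " + Thread.currentThread().getName());
            }
        };
        (new Thread(r)).start();        // creates a fresh thread for the task
        ExecutorService pool = Executors.newFixedThreadPool(4);
        pool.execute(r);                // likely reuses a pooled worker thread
        pool.shutdown();
    }
}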

ExecutorService

Supplements execute with the more versatile submit method, which accepts Runnable and Callable objects; Callables allow tasks to return values via Future objects. Methods are also provided to manage the shutdown of the executor, but tasks should be written to handle interrupts correctly.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableFutures {

    private static final int NTHREDS = 10;

    // MyCallable is not defined in the original notes; assumed here to be
    // some task that computes and returns a long, for example:
    static class MyCallable implements Callable<Long> {
        public Long call() {
            long sum = 0;
            for (long i = 0; i <= 100; i++) {
                sum += i;
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
        List<Future<Long>> list = new ArrayList<Future<Long>>();
        for (int i = 0; i < 20000; i++) {
            Callable<Long> worker = new MyCallable();
            Future<Long> submit = executor.submit(worker);
            list.add(submit);
        }
        long sum = 0;
        System.out.println(list.size());
        // now retrieve the result
        for (Future<Long> future : list) {
            try {
                sum += future.get();   // blocks until this task has completed
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
        System.out.println(sum);
        executor.shutdown();
    }
}

Results can only be retrieved via Future.get once the computation is done; get blocks where necessary.

ScheduledExecutorService

Supplements ExecutorService with a schedule method that executes a Runnable or Callable after a specified delay. Tasks can also be executed repeatedly at defined intervals with scheduleAtFixedRate or scheduleWithFixedDelay.

import static java.util.concurrent.TimeUnit.*;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;

class BeeperControl {
    private final ScheduledExecutorService scheduler =
        Executors.newScheduledThreadPool(1);

    public void beepForAnHour() {
        final Runnable beeper = new Runnable() {
            public void run() {
                System.out.println("beep");
            }
        };
        // Beep every 10 seconds, starting after a 10-second delay.
        final ScheduledFuture<?> beeperHandle =
            scheduler.scheduleAtFixedRate(beeper, 10, 10, SECONDS);
        // Cancel the beeping after one hour.
        scheduler.schedule(new Runnable() {
            public void run() {
                beeperHandle.cancel(true);
            }
        }, 60 * 60, SECONDS);
    }
}

ScheduledFuture<?> schedule(Runnable command, long delay, TimeUnit unit)

Creates and executes a one-shot action that becomes enabled after the given delay.

Parameters:

command - the task to execute

delay - the time from now to delay execution

unit - the time unit of the delay parameter

Returns:

a ScheduledFuture representing pending completion of the task, whose get() method will return null upon completion

6.2.2. Thread Pools

Most executor implementations use thread pools consisting of worker threads; a worker thread exists separately from the Runnable and Callable tasks it executes and is usually used to execute multiple tasks.


- Minimize the overhead of thread creation (allocating and deallocating many thread objects creates significant memory management overhead)

- Allow the application to degrade gracefully: the system handles as many requests as it can sustain, not as fast as they come in

Core Pool Size

ThreadPoolExecutor will automatically adjust the pool size (see getPoolSize()) according to the bounds set by corePoolSize (see getCorePoolSize()) and maximumPoolSize (see getMaximumPoolSize()). When a new task is submitted in method execute(java.lang.Runnable), and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle.

On-demand Construction

By default, even core threads are initially created and started only when new tasks arrive, but this can be overridden dynamically using method prestartCoreThread() or prestartAllCoreThreads(). You probably want to prestart threads if you construct the pool with a non-empty queue.

Creating New Threads

By supplying a different ThreadFactory, you can alter the thread's name, thread group, priority, daemon status, etc. If a ThreadFactory fails to create a thread when asked, by returning null from newThread, the executor will continue, but might not be able to execute any tasks.

Keep-Alive Times

Means of reducing resource consumption when the pool is not being actively used. If the pool becomes more active later, new threads will be constructed.

Queuing

Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed.

This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.

6.2.3. Fork/Join

ForkJoinPool is an extension of the AbstractExecutorService class that implements a work-stealing algorithm: idle worker threads can execute tasks queued to other, still-busy worker threads.

Performing the blur is accomplished by working through the source array one pixel at a time. Each pixel is averaged with its surrounding pixels (the red, green, and blue components are averaged), and the result is placed in the destination array. Since an image is a large array, this process can take a long time. You can take advantage of concurrent processing on multiprocessor systems by implementing the algorithm using the fork/join framework. Here is one possible implementation:

public class ForkBlur extends RecursiveAction {
    private int[] mSource;
    private int mStart;
    private int mLength;
    private int[] mDestination;

    // Processing window size; should be odd.
    private int mBlurWidth = 15;

    public ForkBlur(int[] src, int start, int length, int[] dst) {
        mSource = src;
        mStart = start;
        mLength = length;
        mDestination = dst;
    }

    protected void computeDirectly() {
        int sidePixels = (mBlurWidth - 1) / 2;
        for (int index = mStart; index < mStart + mLength; index++) {
            // Calculate average.
            float rt = 0, gt = 0, bt = 0;
            for (int mi = -sidePixels; mi <= sidePixels; mi++) {
                int mindex = Math.min(Math.max(mi + index, 0), mSource.length - 1);
                int pixel = mSource[mindex];
                rt += (float) ((pixel & 0x00ff0000) >> 16) / mBlurWidth;
                gt += (float) ((pixel & 0x0000ff00) >> 8) / mBlurWidth;
                bt += (float) ((pixel & 0x000000ff) >> 0) / mBlurWidth;
            }

            // Reassemble destination pixel.
            int dpixel = (0xff000000)
                    | (((int) rt) << 16)
                    | (((int) gt) << 8)
                    | (((int) bt) << 0);
            mDestination[index] = dpixel;
        }
    }

...

If the previous methods are in a subclass of the RecursiveAction class, then setting up the task to run in a ForkJoinPool is straightforward, and involves the following steps:

1. Create a task that represents all of the work to be done.

   // source image pixels are in src
   // destination image pixels are in dst
   ForkBlur fb = new ForkBlur(src, 0, src.length, dst);

2. Create the ForkJoinPool that will run the task.

   ForkJoinPool pool = new ForkJoinPool();

3. Run the task.

   pool.invoke(fb);

6.3. Concurrent Collections

Help avoid memory consistency errors by defining a happens-before relationship between an operation that adds an object to the collection and subsequent operations that access or remove that object.


BlockingQueue defines a first-in-first-out data structure that blocks or times out when you attempt to add to a full queue, or retrieve from an empty queue.

ConcurrentMap is a subinterface of java.util.Map that defines useful atomic operations. These operations remove or replace a key-value pair only if the key is present, or add a key-value pair only if the key is absent. Making these operations atomic helps avoid synchronization. The standard general-purpose implementation of ConcurrentMap is ConcurrentHashMap, which is a concurrent analog of HashMap.

ConcurrentNavigableMap is a subinterface of ConcurrentMap that supports approximate matches. The standard general-purpose implementation of ConcurrentNavigableMap is ConcurrentSkipListMap, which is a concurrent analog of TreeMap.
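A small producer/consumer sketch using the BlockingQueue described above (capacity and counts are illustrative): put blocks when the queue is full, take blocks when it is empty.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(10);

        new Thread(new Runnable() {      // producer
            public void run() {
                try {
                    for (int i = 0; i < 100; i++) {
                        queue.put(i);                      // blocks if full
                    }
                } catch (InterruptedException e) {}
            }
        }).start();

        new Thread(new Runnable() {      // consumer
            public void run() {
                try {
                    for (int i = 0; i < 100; i++) {
                        System.out.println(queue.take());  // blocks if empty
                    }
                } catch (InterruptedException e) {}
            }
        }).start();
    }
}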

6.4. Atomic Variables

Classes in java.util.concurrent.atomic support atomic operations on single variables; their get and set methods work like reads and writes on volatile variables. A set has a happens-before relationship with any subsequent get on the same variable.

import java.util.concurrent.atomic.AtomicInteger;

class AtomicCounter {
    private AtomicInteger c = new AtomicInteger(0);

    public void increment() {
        c.incrementAndGet();
    }

    public void decrement() {
        c.decrementAndGet();
    }

    public int value() {
        return c.get();
    }
}

6.5. Concurrent Random Numbers

ThreadLocalRandom is a convenience class for applications expecting to use random numbers from multiple threads or ForkJoinTasks.

int r = ThreadLocalRandom.current().nextInt(4, 77);

Lesson Notes


import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RWDictionary {
    // Data is a placeholder type from the original example.
    private final Map<String, Data> m = new TreeMap<String, Data>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();

    public Data get(String key) {
        r.lock();
        try { return m.get(key); }
        finally { r.unlock(); }
    }

    public String[] allKeys() {
        r.lock();
        // toArray(new String[0]) yields a String[]; a bare toArray() returns Object[]
        try { return m.keySet().toArray(new String[0]); }
        finally { r.unlock(); }
    }

    public Data put(String key, Data value) {
        w.lock();
        try { return m.put(key, value); }
        finally { w.unlock(); }
    }

    public void clear() {
        w.lock();
        try { m.clear(); }
        finally { w.unlock(); }
    }
}

Lock Policies

Where multiple threads are waiting for a lock, the acquisition policy can have significant impact on response times and efficiency

Non-Fair (default)

When continuously contended, it may indefinitely postpone one or more reader/writer threads, but it will normally have higher throughput than a fair lock: threads attempt to acquire the lock in no guaranteed order and may jump the queue.

Fair

Keep a FIFO queue of requests with no jumping of queue.

For reentrant read-write locks

When the currently held lock is released either the longest-waiting single writer thread will be assigned the write lock, or if there is a group of reader threads waiting longer than all waiting writer threads, that group will be assigned the read lock.

A thread that tries to acquire a fair write lock (non-reentrantly) will block unless both the read lock and write lock are free (which implies there are no waiting threads). (Note that the non-blocking ReentrantReadWriteLock.ReadLock.tryLock() and ReentrantReadWriteLock.WriteLock.tryLock() methods do not honor this fair setting and will acquire the lock if it is possible, regardless of waiting threads.)

Write-Policy

- Writes have priority over reads (they are put in front of reads in the queue)


- Alternately think of two queues, with the write queue always served first.

Granularity: Row vs. Table

Row-level locking

Advantages:
- Fewer lock conflicts when different sessions access different rows
- Fewer changes for rollbacks
- Possible to lock a single row for a long time

Disadvantages:
- Requires more memory than page-level or table-level locks
- Slower than page-level or table-level locks when used on a large part of the table, because you must acquire many more locks
- Slower than other locks if you often do GROUP BY operations on a large part of the data or if you must scan the entire table frequently

Table Locks

Suitable for the following cases:

- Most statements for the table are reads
- Statements for the table are a mix of reads and writes, where the writes are updates or deletes for a single row that can be fetched with one key read:

  UPDATE tbl_name SET column=value WHERE unique_key_col=key_value;
  DELETE FROM tbl_name WHERE unique_key_col=key_value;

- SELECT combined with concurrent INSERT statements, and very few UPDATE or DELETE statements
- Many scans or GROUP BY operations on the entire table without any writers


Week 6 - Transactions

Reading (1) Wikipedia ACID

Atomicity: All or nothing. If part of the transaction fails, the entire transaction fails and the database is left unchanged.

Consistency: Any transaction will bring the DB from one valid state to another; even programming errors cannot violate any defined rules.

Isolation: Concurrent execution of transactions results in a system state that would be obtained if the transactions were executed serially.

Durability: A committed transaction remains committed even in the event of a crash, power loss or errors.

ACID Failures

Example: an integrity constraint requires that the value in A and the value in B must sum to 100. (Constraints ensure accuracy and consistency in an RDBMS.)

Atomicity Failure

After subtracting 10 from A, the transaction is unable to modify B. If the DB kept A's new value, atomicity and the constraint would be violated: a partial failure.

Consistency Failure

Assume a transaction attempts to subtract 10 from A without touching B. If the DB recorded this (so A + B = 90), the constraint would be violated; the entire transaction must be cancelled and rolled back to the previous state.

Isolation Failure

Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from B to A. Combined, there are four actions:

T1 subtracts 10 from A.
T1 adds 10 to B.
T2 subtracts 10 from B.
T2 adds 10 to A.

If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if T1 fails half-way through. The database eliminates T1's effects, and T2 sees only valid data.

By interleaving the transactions, the actual order of actions might be:

T1 subtracts 10 from A.

T2 subtracts 10 from B.

T2 adds 10 to A.

T1 adds 10 to B.

Again, consider what happens if T1 fails halfway through. By the time T1 fails, T2 has already modified A; A cannot be restored to the value it had before T1 without leaving an invalid database. This is known as a write-write failure, because two transactions attempted to write to the same data field. In a typical system, the problem would be resolved by reverting to the last known good state, canceling the failed transaction T1, and restarting the interrupted transaction T2 from the good state.

Durability Failure

The transaction completes at runtime and its changes are queued in the disk buffer, waiting to be written to disk. Power fails and the changes are lost, while the user assumes the change has already been made.

Locking vs. Multi-versioning

Non-trivial transactions usually require a large number of locks, which result in substantial overhead and may block other transactions. If B attempts to modify data that A is already working on, B must wait; 2-phase locking is used to guarantee full isolation (2PL).

2-Phase Locking (2PL)

Expanding Phase: Locks are acquired and none released

Shrinking Phase: Locks released and none acquired

Only when a transaction has entered the ready state in all of its processes is it ready to be committed.

Distributed Transactions

A distributed DB, where no single node is responsible for all the data affecting a transaction, presents additional complications like network connection failures and node failures. The 2-phase commit protocol provides atomicity for distributed transactions.

2-Phase Commit (2PC)

Specialized type of consensus protocol that coordinates all processes participating in a distributed atomic transaction on whether to commit or abort the transaction

Commit-Request Phase: The coordinator prepares all of the transaction's participating processes, each of which votes to commit (if its local portion executed properly) or abort (if a problem was detected with the local portion).


Commit Phase: The coordinator decides whether to commit (only when ALL have voted commit) or abort, and notifies all cohorts of the result so they can take the needed actions with their local transactional resources and their respective portions of the transaction's other output (where applicable).

Assumptions

1. One node is the master site, designated the coordinator; the others are designated cohorts
2. There is stable storage at each node with a write-ahead log (provides durability and atomicity by writing modifications to a log, including redo and undo information)
3. No node crashes forever, and data in the log is never lost or corrupted in a crash
4. Any 2 nodes can communicate with each other

Disadvantage

2PC is a blocking protocol. If the coordinator fails permanently, some cohorts will never resolve their transactions: after a cohort sends an agreement message, it blocks until a commit or rollback is received.

Reading (2) Isolation [Database Systems]

Concurrency control comprises the underlying mechanisms of a DBMS by which isolation is handled and related correctness is guaranteed. Constraining DB access operations does reduce performance (rates of execution), so these mechanisms attempt to provide the best performance possible under the constraints.

The serializability property is often compromised for better performance, where this is possible without harming correctness.

1. Isolation Levels

The trade-off is locking overhead: higher isolation levels increase the possibility of deadlock, while relaxing isolation may introduce bugs that are difficult to find.

1.1. Serializable

The highest isolation level. Requires read and write locks to be held until the END of the transaction. Range locks are acquired when a SELECT query uses a ranged WHERE clause, to avoid phantom reads.

For non-lock-based concurrency control, no locks are acquired; but if a write collision is detected amongst several concurrent transactions, only 1 of them is allowed to commit.

Snapshot isolation


Guarantees that all reads see a consistent snapshot (normally the last values committed as at the start of the transaction). The transaction commits only if its updates do not conflict with any concurrent updates made since that snapshot.

Implemented via MVCC (multiversion concurrency control), which allows better performance than serializability and avoids most of the anomalies attributed to lower levels. MVCC works by letting each connected user see a snapshot of the database; when an update is required, a new version marked as the latest is written elsewhere and the older versions are marked obsolete.

1.2. Repeatable Reads

Read and write locks are maintained to the end of the transaction, but not range locks; phantom reads can occur.

1.3. Read Committed

Write locks are held until the end of the transaction, but read locks are released as soon as the SELECT is performed (non-repeatable reads can occur, along with phantom reads).

Any data read is committed at the moment it is read. The reader is restricted from seeing any intermediate, uncommitted (dirty) data; however, the data is free to change after it has been read.

1.4. Read Uncommitted

The lowest isolation level. Dirty reads are allowed, so one transaction may see not-yet-committed changes made by other transactions.

2. Read Phenomena

2.1. Dirty Read (Uncommitted Dependency)

Happens when a transaction is allowed to read a row modified by another running transaction whose changes have not yet been committed; related to non-repeatable reads.

2.2. Non-Repeatable Reads

A row is retrieved twice and the values differ between reads. Happens when read locks are not acquired while performing a SELECT, or when the locks are released as soon as the SELECT is done.

In MVCC, non-repeatable reads may happen when a transaction affected by a commit conflict is not rolled back.


At SERIALIZABLE and REPEATABLE READ, the DBMS returns the old value; at READ COMMITTED and READ UNCOMMITTED, the DB returns the updated value.

Serial Scheduling: Delay T2 until T1 is committed or rolled back.

MVCC: T1 and T2 are both allowed to continue, but T1 works on an older snapshot that is later validated against the committed schedule; if a conflict is detected, T1 rolls back with a serialization failure.

At the REPEATABLE READ level, Query 2 would be blocked until Query 1's transaction was committed or rolled back.

MVCC

At the SERIALIZABLE level, both SELECT queries see a snapshot of the database taken at the start of T1, hence returning the same data; but if T1 also attempts an update, a serialization failure occurs.

At READ COMMITTED, since there is no promise of serializability, each query sees the data committed before that query started, and hence T1 does not need to be retried.

2.3. Phantom Reads

Where 2 identical queries are executed and the collection of rows returned by the second differs from the first; occurs where range locks are not acquired.

In SERIALIZABLE mode, Query 1 locks all records in the range 10 to 30, so Query 2 remains blocked; at REPEATABLE READ and below, the range is not locked, so new records can be inserted and will appear in the repeated query.

3. Isolation Levels, Read Phenomena and Locks

3.1. Isolation Levels vs. Read Phenomena

                   Dirty Reads   Non-Repeatable Reads   Phantom Reads
Read Uncommitted   May occur     May occur              May occur
Read Committed     -             May occur              May occur
Repeatable Read    -             -                      May occur
Serializable       -             -                      -

3.2. Isolation Levels vs. Lock Duration

C: locks are held until the transaction commits. Otherwise, locks are held only for the duration of the currently executing statement.

                   Write   Read   Range Operation
Read Uncommitted   -       -      -
Read Committed     C       -      -
Repeatable Read    C       C      -
Serializable       C       C      C

Reading (3) Introduction to Concurrency Control

1. Collisions

Collisions happen when 2 activities (which may or may not be full-fledged transactions) attempt to change the same entities within a system of record.

Dirty Read

Activity 1 (A1) reads an entity from the system of record and then updates the system of record but does not commit the change (for example, the change hasn’t been finalized). Activity 2 (A2) reads the entity, unknowingly making a copy of the uncommitted version. A1 rolls back (aborts) the changes, restoring the entity to the original state that A1 found it in. A2 now has a version of the entity that was never committed and therefore is not considered to have actually existed.

Non-Repeatable Read

A1 reads an entity from the system of record, making a copy of it. A2 deletes the entity from the system of record. A1 now has a copy of an entity that does not officially exist.

Phantom Read

A1 retrieves a collection of entities from the system of record, making copies of them, based on some sort of search criteria such as "all customers with first name Bill." A2 then creates new entities which would have met the search criteria (for example, inserts "Bill Klassen" into the database), saving them to the system of record. If A1 reapplies the search criteria, it gets a different result set.

2. Locking Strategies

2.1. Pessimistic Locking

The entity is locked for the whole time it is in application memory, preventing other users from working with it.

- Write Lock: Prevents others from writing, reading or deleting the entity


- Read Lock: Allows others to read the entity but not write or delete it

The scope of a lock can be the whole database, a table, a page or a row.

2.2. Optimistic Locking

Assumes that collisions are infrequent; instead of preventing them, we detect and then resolve collisions.

1. A read lock is secured on the data
2. The object is read into memory and the lock is released
3. The object is manipulated
4. A write lock is obtained once the update is ready
5. The original source is re-read to check for a collision (update and unlock if there is none)
6. The collision is resolved if it occurred
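A sketch of steps 4 to 6 in SQL, using a hypothetical version column as the mark (table and values are illustrative):

-- Steps 1-2: read the row and remember its mark
SELECT balance, version FROM account WHERE id = 42;   -- suppose version = 7

-- Steps 4-5: update only if nobody changed the row in the meantime
UPDATE account
SET balance = 90, version = version + 1
WHERE id = 42 AND version = 7;

-- If 0 rows were affected, another activity updated the row first:
-- a collision, to be resolved by one of the strategies below.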

Strategies for Resolving Collisions

Mark source with Unique Identifier

Source data is marked with a unique value each time it is updated. At the point of update, the mark is checked for changes:

- Datetime stamps (use the DB's timestamp, since not all machines have the same clock sync)
- Incremental counters
- UserIDs (only if everyone has a unique ID, is logged into only 1 machine, and the application ensures just one copy in memory)
- Values generated by a globally unique surrogate key generator


Retain an original Copy

The source data as originally retrieved is retained and compared against the current source to detect a collision; the retained copy must contain enough substance for this comparison.

3. Collision Resolution Strategies

1. Give up
2. Display the problem and let the user decide
3. Merge the changes
4. Log the problem so others can make sense of it in future
5. Ignore the collision and overwrite

4. A Locking Strategy

Table Type                               Examples                                       Suggested Locking Strategy
Live - High Volume                       Account                                        Optimistic (first choice), Pessimistic (second choice)
Live - Low Volume                        Customer, Insurance Policy                     Pessimistic (first choice), Optimistic (second choice)
Log (typically append only)              AccessLog, AccountHistory, TransactionRecord  Overly Optimistic
Lookup/Reference (typically read only)   State, PaymentType                            Overly Optimistic

Reading (4) MySQL SET TRANSACTION

GLOBAL: applies to all subsequent sessions. SESSION: applies to all subsequent transactions within the current session. With neither keyword: applies only to the next transaction.
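The corresponding MySQL statements (the isolation levels chosen are just examples):

SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- all subsequent sessions
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- rest of this session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;            -- next transaction only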


InnoDB Repeatable Read

The default transaction isolation level; it differs from READ COMMITTED as described below. All consistent reads within the same transaction read the snapshot established by the first read, so these non-locking SELECT statements are consistent with respect to each other.

Locking for reads and writes depends on whether the statement uses a unique index with a unique search condition. For a unique index with a unique search condition, InnoDB locks only the index record found, not the gap before it.

For other search conditions, InnoDB locks the index range scanned, using gap locks or next-key locks to block insertions by other sessions into the gaps covered by the range.

InnoDB Read Committed

Every consistent read, even within the same transaction, sets and reads its own fresh snapshot.

Consistent Reading

“A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: if you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database”

Repeatable Read: All consistent reads use the first snapshot established in the transaction; get a fresher snapshot by committing and issuing new queries.

Read Committed: Each consistent read has its own snapshot, so use this mode to get the freshest data.

Read Uncommitted: SELECT statements are performed in a non-locking fashion, so a possibly earlier version of a row may be used; such reads are not consistent, leading to "dirty reads".

Serializable: InnoDB implicitly converts all SELECTs to SELECT ... LOCK IN SHARE MODE if autocommit is disabled. If autocommit is enabled, the SELECT is its own transaction; it is therefore read-only and can be serialized if performed as a consistent, non-locking read. (Disable autocommit to force a plain SELECT to block if other transactions have modified the selected rows.)

Advantages & Disadvantages

It is important to remember that InnoDB actually locks index entries, not rows. During the execution of a statement, InnoDB must lock every entry in the index that it traverses to find the rows it is modifying. It must do this to prevent deadlocks and maintain the isolation level.

In REPEATABLE READ, every lock acquired during a transaction is held for the duration of the transaction. In READ COMMITTED, the locks that do not match the scan are released after the statement completes.

InnoDB does not release the lock heap memory after releasing the locks, but the number of locks held is much lower. In READ COMMITTED, once the UPDATE statement completes, other transactions are free to update rows that they would not have been able to update under REPEATABLE READ.

Consistent Read Views

REPEATABLE READ MODE

Within a transaction, the same snapshot is used for the duration, until commit. An UPDATE also creates gap locks that prevent rows from being inserted into the scanned range until the transaction commits or rolls back.

There is no possibility of additional rows changing, since the gap (after 100, in the reading's example) is already locked.

Non-Repeatable Reads

READ COMMITTED MODE

A read view is created at the start of each STATEMENT, even within the same transaction; each read view lasts only as long as the statement's execution.

Consecutive executions of the same statement can therefore exhibit the 'phantom row' problem. Gap locks are NEVER created in this mode, so SELECT ... FOR UPDATE will NEVER prevent other transactions from inserting new rows into the range.

This means possibly updating more rows than you intended to.


Reading (5) Oracle Multi-Version Concurrency Control

Oracle automatically provides read consistency to a query so that all the data that the query sees comes from a single point in time (statement-level read consistency).

Oracle can also provide read consistency to all of the queries in a transaction (transaction-level read consistency).

As a query enters the execution stage, the current system change number (SCN) is determined. In Figure 13-1, this system change number is 10023. As data blocks are read on behalf of the query, only blocks written as of the observed SCN are used. Blocks with changed data (more recent SCNs) are reconstructed from data in the rollback segments, and the reconstructed data is returned for the query. Therefore, each query returns all committed data with respect to the SCN recorded at the time that query execution began. Changes of other transactions that occur during a query's execution are not observed, guaranteeing that consistent data is returned for each query.

Statement-Level Read Consistency

Every query sees data from a single point in time: the time the query began. Dirty data and changes committed by other transactions during query execution are not seen.

A consistent result set is provided for EVERY query, guaranteeing data consistency; this applies to SELECT statements and to the queries implicit in INSERT, UPDATE and DELETE, each of which is a query in itself.

Transaction-Level Read Consistency

Serializable Mode

All data accesses reflect the state of the DB at the time the transaction began, so all data is consistent to a single point in time. This consistently produces repeatable reads and does not expose a query to phantoms.

Deadlocks

A deadlock can occur when two or more users are waiting for data locked by each other. Deadlocks prevent some transactions from continuing to work. Figure 13-3 is a hypothetical illustration of two transactions in a deadlock.

Figure 13-3 Two Transactions in a Deadlock


Deadlock Detection

Oracle automatically detects deadlock situations and resolves them by rolling back one of the statements involved in the deadlock, thereby releasing one set of the conflicting row locks.

A corresponding message also is returned to the transaction that undergoes statement-level rollback. The statement rolled back is the one belonging to the transaction that detects the deadlock. Usually, the signalled transaction should be rolled back explicitly, but it can retry the rolled-back statement after waiting.

Note: In distributed transactions, local deadlocks are detected by analyzing wait data, and global deadlocks are detected by a time out. Once detected, non-distributed and distributed deadlocks are handled by the database and application in the same way.

Deadlocks most often occur when transactions explicitly override the default locking of Oracle. Because Oracle itself does no lock escalation and does not use read locks for queries, but does use row-level locking (rather than page-level locking), deadlocks occur infrequently in Oracle.

See Also: "Explicit (Manual) Data Locking" for more information about manually acquiring locks

Avoid Deadlocks

Multitable deadlocks can usually be avoided if transactions accessing the same tables lock those tables in the same order, either through implicit or explicit locks. For example, all application developers might follow the rule that when both a master and detail table are updated, the master table is locked first and then the detail table. If such rules are properly designed and then followed in all applications, deadlocks are very unlikely to occur.

When you know you will require a sequence of locks for one transaction, consider acquiring the most exclusive (least compatible) lock first.


Types of Locks

Oracle automatically uses different types of locks to control concurrent access to data and to prevent destructive interaction between users. Oracle automatically locks a resource on behalf of a transaction to prevent other transactions from doing something also requiring exclusive access to the same resource. The lock is released automatically when some event occurs so that the transaction no longer requires the resource.

Throughout its operation, Oracle automatically acquires different types of locks at different levels of restrictiveness depending on the resource being locked and the operation being performed.

Oracle locks fall into one of three general categories:

- DML locks (data locks): Protect data; for example, table locks lock entire tables, row locks lock selected rows.
- DDL locks (dictionary locks): Protect the structure of schema objects, for example the definitions of tables and views.
- Internal locks and latches: Protect internal database structures such as datafiles; entirely automatic.

The following sections discuss DML locks, DDL locks, and internal locks.

DML Locks

The purpose of a DML lock (data lock) is to guarantee the integrity of data being accessed concurrently by multiple users. DML locks prevent destructive interference of simultaneous conflicting DML or DDL operations. DML statements automatically acquire both table-level locks and row-level locks.

Note: The acronym in parentheses after each type of lock or lock mode is the abbreviation used in the Locks Monitor of Enterprise Manager. Enterprise Manager might display TM for any table lock, rather than indicate the mode of table lock (such as RS or SRX).

Row Locks (TX)

Row-level locks are primarily used to prevent two transactions from modifying the same row. When a transaction needs to modify a row, a row lock is acquired.

There is no limit to the number of row locks held by a statement or transaction, and Oracle does not escalate locks from the row level to a coarser granularity. Row locking provides the finest grain locking possible and so provides the best possible concurrency and throughput.


The combination of multiversion concurrency control and row-level locking means that users contend for data only when accessing the same rows, specifically:

Readers of data do not wait for writers of the same data rows.

Writers of data do not wait for readers of the same data rows unless SELECT ... FOR UPDATE is used, which specifically requests a lock for the reader.

Writers only wait for other writers if they attempt to update the same rows at the same time.

Note: Readers of data may have to wait for writers of the same data blocks in some very special cases of pending distributed transactions.

A transaction acquires an exclusive row lock for each individual row modified by one of the following statements: INSERT, UPDATE, DELETE, and SELECT with the FOR UPDATE clause.

A modified row is always locked exclusively so that other transactions cannot modify the row until the transaction holding the lock is committed or rolled back. However, if the transaction dies due to instance failure, block-level recovery makes a row available before the entire transaction is recovered. Row locks are always acquired automatically by Oracle as a result of the statements listed previously.

If a transaction obtains a row lock for a row, the transaction also acquires a table lock for the corresponding table. The table lock prevents conflicting DDL operations that would override data changes in a current transaction.

See Also: "DDL Locks"

Table Locks (TM)

Table-level locks are primarily used to do concurrency control with concurrent DDL operations, such as preventing a table from being dropped in the middle of a DML operation. When a DDL or DML statement is on a table, a table lock is acquired. Table locks do not affect concurrency of DML operations. For partitioned tables, table locks can be acquired at both the table and the subpartition level.

A transaction acquires a table lock when a table is modified in the following DML statements: INSERT, UPDATE, DELETE, SELECT with the FOR UPDATE clause, and LOCK TABLE. These DML operations require table locks for two purposes: to reserve DML access to the table on behalf of a transaction and to prevent DDL operations that would conflict with the transaction. Any table lock prevents the acquisition of an exclusive DDL lock on the same table and thereby prevents DDL operations that require such locks. For example, a table cannot be altered or dropped if an uncommitted transaction holds a table lock for it.

A table lock can be held in any of several modes: row share (RS), row exclusive (RX), share (S), share row exclusive (SRX), and exclusive (X). The restrictiveness of a table lock's mode determines the modes in which other table locks on the same table can be obtained and held.


Lesson Notes


Week 9 - Fault Tolerance

Typical Failure Modes

1. Code: Bugs
2. Data: Consistency errors
3. Systems: Hardware failures, software hangs
4. Environment: Differing data from redundant input sources

Common to embedded control systems


Availability & Fault Tolerance

- Availability: readiness to give service
- Availability % = Uptime / Total Time
o Total Time = Uptime + Total Downtime
o Downtime = Time to Detect + Time to Restart
o Open question: how should scheduled maintenance be counted?
- Fault Tolerance = working correctly despite errors
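A quick worked example with assumed numbers: if a system is up for 8,750 of the 8,760 hours in a year, Availability % = 8750 / 8760 ≈ 99.89%, and the 10 hours of downtime are split between time to detect and time to restart.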


Client Based

Load-Balancer

DNS: Domain name system mapping meaningful URL names (Uniform Resource Locator) to I.P. addresses


System

IP

Virtual I.P


Restart

Week 10 - Replication

Reading (1) Replication in Computing

Sharing of information to ensure consistency between redundant resources, improving reliability, fault-tolerance or accessibility.

Data Replication: Same data on multiple storage devices

Computation Replication: Same computing task executed many times

- Typically replicated in space (across devices)
- Or replicated in time (executed many times on one device)


- Often linked via the scheduling algorithm

Active Replication: Performed by processing the same request at every replica

Passive Replication: Processing the request in a single replica and then transfer the results to all replicas

Master-Slave: Primary-backup scheme, predominant in high-availability clusters. The master replica is designated to process all requests; a single master makes it easier to achieve consistency within the group.

Multi-Master: Any replica processes and distributes the new state. Distributed concurrency control is needed such as a distributed lock manager

Load balancing differs in that it focuses on distributing different computations across machines, allowing a single computation to be dropped in case of failure. Sometimes, data replication is still needed to distribute the data across machines.

Backups vs. Replicas: A backup saves a copy of the data unchanged for a long period of time, while replicas undergo frequent updates and quickly lose any historical state.

Replication Models in Distributed Systems

Transactional Replication

Used for replicating a database or some other transactional storage structure. One-copy serializability is employed, and ACID properties are guaranteed.

State Machine Replication

Assumes the replicated process is a deterministic automaton and that atomic broadcast is possible. Based on the distributed consensus model and similar to transactional replication. Often implemented as a replicated log using the Paxos algorithm, as in Google's Chubby system and the Keyspace datastore.

Virtual Synchrony

Used when a group of processes cooperates to replicate in-memory data and to coordinate actions. A process group is defined for this, and a joining process is provided a checkpoint containing the current state of the data.

Processes can send multicasts that are seen by all members to communicate current state of data


It is simpler: especially if you write to just one node, fallback and recovery are rather easy. Even if everything is automated, simpler setups mean fewer software bugs.

Handling write load: If your application is write-intensive, a master-N-slaves configuration saturates much faster because every node has to handle the full write load. Especially considering that MySQL replication is single-threaded, it may not be long before the slaves are unable to keep up.

Waste of cache memory: If you have the same data on all the slaves, you will likely have the same data cached on each of them. You can partially improve this by load partitioning, but it will not be perfect; for example, all of the write load goes to all nodes, pulling the corresponding data into every cache. In this example, with 16GB boxes and say 12GB allocated to the MySQL database cache, you get 12GB of effective cache in a master-N-slaves configuration, compared to 36GB of effective cache on 3 master-master pairs.

Waste of disk: Disk is cheap, but IO-bound workloads may need a fast disk array, which is not so cheap, so having less data per node becomes important.

More time to clone: If replication breaks, you may need more time to re-clone the node (or restore the database from backup) compared to the smaller datasets of multiple master-master pairs.


Week 11 - Big Distributed Data


Partitioning Problem

1. Google Big Table

CONSISTENT, AVAILABLE, PARTITION-TOLERANT


Notes:

Note that "locking" granularity is small (a single row) and the transactions are very fast (single-row reads/writes). Thus contention is likely to be low and blocking times short.

Replication is fast partly because it is inherently conflict free – no one else can edit the data at the same time; and partly because GFS is heavily optimized for this

*BigTable uses Chubby, a distributed lock system to ensure only one server is hosting each range of rows (called a tablet)


2. Google Megastore

Replicas need to track if they are up to date or not. If they are, the read is done locally. If not, the read forces them to catch up. Writes always force a catch up.

3 consistency modes for each read

Inconsistent: whatever is available locally now

Snapshot: most recently applied transaction

Current: forces all commits to be applied and gets most current data

*(file read/write, network communication, cluster distribution)

** The synch groups vs. asynch groups distinction is one example. Another is that the API does not allow joins; the programmer must implement them, thus exposing when a difficult/expensive operation is entailed.


3. Amazon Dynamo

Local Fault detection

If A can’t replicate to B, then A thinks B is down and periodically retries

Membership info and fault info is “gossiped” around the cluster

*W = number of nodes MUST write to

R = number of nodes read from (if possible)

N = number of nodes in cluster


4. Facebook Cassandra

http://www.datastax.com/dev/blog/your-ideal-performance-consistency-tradeoff

More on Cassandra: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance

Paxos

Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.[1]

Consensus protocols are the basis for the state machine approach to distributed computing, as suggested by Leslie Lamport[2] and surveyed by Fred Schneider.[3] The state machine approach is a technique for converting an algorithm into a fault-tolerant, distributed implementation. Ad-hoc techniques may leave important cases of failures unresolved. The principled approach proposed by Lamport et al. ensures all cases are handled safely.

The Paxos protocol was first published in 1989 and named after a fictional legislative consensus system used on the Paxos island in Greece.[4] It was later published as a journal article in 1998.[5]

The Paxos family of protocols includes a spectrum of trade-offs between the number of processors, number of message delays before learning the agreed value, the activity level of individual participants, number of messages sent, and types of failures. Although no deterministic fault-tolerant consensus protocol can guarantee progress in an asynchronous network (a result proved in a paper by Fischer, Lynch and Paterson[6]), Paxos guarantees safety (consistency), and the conditions that could prevent it from making progress are difficult to provoke.[5][7][8][9][10]


Paxos is usually used where durability is required (for example, to replicate a file or a database), in which the amount of durable state could be large. The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive. There is also a mechanism to drop a permanently failed replica or to add a new replica.