Apache Apex Fault Tolerance and Processing Semantics
Thomas Weise, Architect & Co-founder, PPMC member
Pramod Immaneni, Architect, PPMC member
March 24th, 2016
Apache Apex Features
• In-memory Stream Processing
• Partitioning and Scaling out
• Windowing (temporal boundary)
• Reliability
ᵒ Stateful
ᵒ Automatic Recovery
ᵒ Processing Guarantees
• Operability
• Compute Locality
• Dynamic updates
Apex Platform Overview
Native Hadoop Integration
• YARN is the resource manager
• HDFS used for storing any persistent state
Streaming Windows
• Application window
• Sliding window and tumbling window
• Checkpoint window
• No artificial latency
Fault Tolerance
• Operator state is checkpointed to a persistent store
ᵒ Automatically performed by the engine, no additional coding needed
ᵒ Asynchronous and distributed
ᵒ In case of failure, operators are restarted from checkpointed state
• Automatic detection and recovery of failed containers
ᵒ Heartbeat mechanism
ᵒ YARN process status notification
• Buffering to enable replay of data from the recovery point
ᵒ Fast, incremental recovery, spike handling
• Application master state checkpointed
ᵒ Snapshot of physical (and logical) plan
ᵒ Execution layer change log
Checkpointing Operator State
• Save the state of an operator so that it can be recovered on failure
• Pluggable storage handler
• Default implementation
ᵒ Serialization with Kryo
ᵒ All non-transient fields serialized
ᵒ Serialized state written to HDFS
ᵒ Writes are asynchronous and non-blocking
• Possible to implement custom handlers for an alternative approach to extracting state or a different storage backend (such as an IMDG)
• Applies to operators that rely on previous state for computation
ᵒ Operators can be marked @Stateless to skip checkpointing
• Checkpoint frequency is tunable (30 s by default)
ᵒ Aligned with streaming windows for consistent state
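The "all non-transient fields serialized" behavior can be illustrated with a minimal sketch. This is not Apex code: plain Java serialization stands in for Kryo, and the operator class is invented for the example, but the checkpoint/recovery contract is the same: non-transient fields survive recovery, transient fields do not.

```java
import java.io.*;

// Sketch of the default checkpointing behavior, using plain Java
// serialization as a stand-in for Kryo: every non-transient field of
// the operator is part of the checkpoint, transient fields are not.
public class CheckpointSketch {
    static class CountingOperator implements Serializable {
        long count;                       // checkpointed
        transient StringBuilder scratch = // not saved; null after recovery
            new StringBuilder();

        void process(String tuple) {
            count++;
            scratch.append(tuple);
        }
    }

    public static void main(String[] args) throws Exception {
        CountingOperator op = new CountingOperator();
        op.process("a");
        op.process("b");

        // Checkpoint: serialize the operator (Apex writes this to HDFS).
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject(op);
        oos.flush();

        // Recovery: deserialize; count survives, the transient field does not.
        CountingOperator restored = (CountingOperator) new ObjectInputStream(
            new ByteArrayInputStream(buf.toByteArray())).readObject();
        System.out.println(restored.count + " " + (restored.scratch == null));
        // 2 true
    }
}
```

An operator marked @Stateless would simply skip this serialization step entirely, since it has no state worth restoring.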
Buffer Server
• In-memory PubSub
• Stores results emitted by an operator until committed
• Handles backpressure / spillover to local disk
• Ordering, idempotency

[Diagram: Operator 1 in Container 1 on Node 1 publishes to the Buffer Server, which Operator 2 in Container 2 on Node 2 subscribes to]
Application Master State
• Snapshot state on plan change
ᵒ Serialize physical plan (includes logical plan)
ᵒ Infrequent, expensive operation
• WAL (write-ahead log) for state changes
ᵒ Execution layer changes
ᵒ Container, operator state, property changes
• Containers locate the master through DFS
ᵒ AM can fail and restart; other containers need to find it
ᵒ Work preserving restart
• Recovery
ᵒ YARN restarts the application master
ᵒ Apex restores state from the snapshot and replays the log
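The snapshot-plus-WAL pattern described above can be sketched in a few lines. This is an illustration of the recovery idea, not Apex's actual plan serialization; the map keys and values are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of snapshot-plus-WAL recovery: an infrequent snapshot captures
// the whole state, a write-ahead log records each subsequent change, and
// recovery is the snapshot plus the replayed log entries in order.
public class WalSketch {
    public static void main(String[] args) {
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("op1.container", "c1");

        List<String[]> wal = new ArrayList<>(); // {key, value} change log
        wal.add(new String[]{"op1.container", "c2"}); // container restarted
        wal.add(new String[]{"op2.container", "c3"}); // new operator deployed

        // Recovery: start from the snapshot and replay the log in order.
        Map<String, String> recovered = new HashMap<>(snapshot);
        for (String[] change : wal) {
            recovered.put(change[0], change[1]);
        }
        System.out.println(recovered.get("op1.container") + " " + recovered.size());
        // c2 2
    }
}
```

Logging each change is cheap, so the expensive full snapshot only has to happen when the plan itself changes, which matches the "infrequent, expensive operation" note above.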
Failure Scenarios
• Container process fails
ᵒ NM detects
ᵒ In case of the AM (Apex Application Master), YARN launches a replacement container (for attempt count < max)
• Node Manager process fails
ᵒ RM detects NM failure and notifies AM
• Machine fails
ᵒ RM detects NM/AM failure and recovers or notifies AM
• RM fails – RM HA option
• Entire YARN cluster down – stateful restart of Apex application
[Diagram: failure and recovery on a YARN cluster – Resource Manager, Node Managers, and the Apex AM, with numbered recovery steps]
Failure Scenarios (continued)
[Diagram: an operator summing the tuple stream … EW2, 1, 3, BW2, EW1, 4, 2, 1, BW1; after a failure the stream is replayed from the checkpoint and the windowed sums are recomputed to the same values]
Processing Guarantees

At-least-once
• On recovery, data will be replayed from a previous checkpoint
ᵒ No messages lost
ᵒ Default, suitable for most applications
• Can be used to ensure data is written once to a store
ᵒ Transactions with meta information, rewinding output, feedback from an external entity, idempotent operations

At-most-once
• On recovery, the latest data is made available to the operator
ᵒ Useful in use cases where some data loss is acceptable and the latest data is sufficient

Exactly-once
• At-least-once + idempotency + transactional mechanisms (operator logic) to achieve end-to-end exactly-once behavior
End-to-End Exactly Once
• Becomes important when writing to external systems
• Data should not be duplicated or lost in the external system, even in case of application failures
• Common external systems
ᵒ Databases
ᵒ Files
ᵒ Message queues
• Platform support for at-least-once is a must so that no data is lost
• Data duplication must still be avoided when data is replayed from a checkpoint
ᵒ Operators implement the logic, dependent on the external system
ᵒ Aided by platform features such as stateful checkpointing and windowing
• Three different mechanisms with implementations are explained on the next slides
Files
• Streaming data is being written to a file on a continuous basis
• Failure at a random point results in a file with an unknown amount of data
• Operator works with the platform to ensure exactly-once
ᵒ Platform responsibility
• Restores state and restarts the operator from an earlier checkpoint
• Replays data from the exact point after the checkpoint
ᵒ Operator responsibility
• Replayed data doesn't get duplicated in the file
• Accomplished by keeping track of the file offset as state
ᵒ Details on the next slide
• Implemented in operator AbstractFileOutputOperator in the apache/incubator-apex-malhar GitHub repository
• Example application: AtomicFileOutputApp
Exactly Once Strategy
[Diagram: file contents with a saved offset marking the checkpointed position]
• Operator saves the file offset during checkpoint
• File contents are flushed before checkpoint to ensure there is no pending data in the buffer
• On recovery, the platform restores the file offset value from the checkpoint
• Operator truncates the file to the offset
• Starts writing data again
• Ensures no data is duplicated or lost
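The truncate-to-offset idea can be sketched with plain file I/O. This is not the AbstractFileOutputOperator code, only a minimal simulation of the recovery sequence it describes: checkpoint the offset, crash mid-write, truncate back to the offset, then let replayed data be written once.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of truncate-on-recovery: the operator checkpoints the
// byte offset it has written up to; on recovery it truncates the file
// back to that offset before replayed data is written again.
public class FileRecoverySketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("apex-output", ".dat");

        long checkpointedOffset;
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.writeBytes("window1;");               // data up to the checkpoint
            checkpointedOffset = out.getFilePointer(); // saved as operator state
            out.writeBytes("win");                    // partial write before the crash
        }

        // Recovery: restore the offset from the checkpoint, truncate the
        // file to it, then the platform replays data from that point.
        try (RandomAccessFile out = new RandomAccessFile(file.toFile(), "rw")) {
            out.setLength(checkpointedOffset);        // drop the partial window
            out.seek(checkpointedOffset);
            out.writeBytes("window2;");               // replayed data, written once
        }

        String contents = new String(Files.readAllBytes(file));
        System.out.println(contents);                 // window1;window2;
        Files.delete(file);
    }
}
```

Flushing before the checkpoint matters here: if buffered data were not on disk when the offset was saved, the saved offset would point past the real end of the file and the truncate would not line up with the replay point.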
Transactional Databases
• Use of streaming windows
• For exactly-once in failure scenarios
ᵒ Operator uses transactions
ᵒ Stores the window id in a separate table in the database
ᵒ Details on the next slide
• Implemented in operator AbstractJdbcTransactionableOutputOperator in the apache/incubator-apex-malhar GitHub repository
• An example application streaming data in from Kafka and writing to a JDBC database is available
Exactly Once Strategy
[Diagram: a data table holding the tuples of each window (d11 d12 d13, d21 d22 d23, …) and a meta table holding (op-id, window id), both updated in the same transaction; the checkpoint advances from window wn to wn+1]
• Data in a window is written out in a single transaction
• Window id is also written to a meta table as part of the same transaction
• Operator reads the window id from meta table on recovery
• Ignores data for windows less than the recovered window id and writes new data
• Partial window data before failure will not appear in data table as transaction was not committed
• Assumes idempotency for replay
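The meta-table mechanism above can be sketched without a real database. The class and method names here are invented for illustration (this is not the Malhar API), and in-memory collections stand in for JDBC tables; the point is that the window's data and its window id commit atomically, so replayed windows at or below the recovered id can be ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the meta-table strategy: commit a window's tuples together
// with its window id; on replay, skip any window already committed.
public class JdbcWindowSketch {
    static final List<String> dataTable = new ArrayList<>();
    static final Map<String, Long> metaTable = new HashMap<>(); // op-id -> last committed window

    static void commitWindow(String opId, long windowId, List<String> tuples) {
        long committed = metaTable.getOrDefault(opId, -1L);
        if (windowId <= committed) {
            return; // replayed window is already in the database, skip it
        }
        dataTable.addAll(tuples);       // in a real DB, this line and the
        metaTable.put(opId, windowId);  // next are one transaction
    }

    public static void main(String[] args) {
        commitWindow("op1", 1, List.of("d11", "d12", "d13"));
        commitWindow("op1", 2, List.of("d21", "d22", "d23"));
        // Failure and recovery: windows 1 and 2 are replayed, window 3 is new.
        commitWindow("op1", 1, List.of("d11", "d12", "d13")); // ignored
        commitWindow("op1", 2, List.of("d21", "d22", "d23")); // ignored
        commitWindow("op1", 3, List.of("d31"));
        System.out.println(dataTable.size() + " rows, last window " + metaTable.get("op1"));
        // 7 rows, last window 3
    }
}
```

A partially written window never shows up because its transaction was not committed, which is exactly why the slide notes that partial window data before a failure will not appear in the data table.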
Stateful Message Queue
• Data is being sent to a stateful message queue like Apache Kafka
• On failure, data already sent to the message queue should not be re-sent
• Exactly-once strategy
ᵒ Sends a monotonically increasing key along with the data
ᵒ On recovery, the operator asks the message queue for the last sent message
• Gets the recovery key from that message
ᵒ Ignores all replayed data with a key less than or equal to the recovered key
ᵒ If the key is not monotonically increasing, data can be sorted on the key at the end of the window before being sent to the message queue
• Implemented in operator AbstractExactlyOnceKafkaOutputOperator in the apache/incubator-apex-malhar GitHub repository
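The monotonic-key strategy can be sketched with an in-memory queue standing in for Kafka. This is an illustration, not the AbstractExactlyOnceKafkaOutputOperator code; the names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the monotonic-key strategy: each message carries a
// monotonically increasing key; after recovery the operator reads the
// key of the last message in the queue and drops replayed messages
// whose key is less than or equal to it.
public class KafkaKeySketch {
    static final List<long[]> queue = new ArrayList<>(); // {key, value} pairs

    static void send(long recoveredKey, long key, long value) {
        if (key <= recoveredKey) {
            return; // replayed message is already in the queue
        }
        queue.add(new long[]{key, value});
    }

    public static void main(String[] args) {
        // Before the failure, keys 1..3 were sent.
        for (long k = 1; k <= 3; k++) queue.add(new long[]{k, k * 10});
        // Recovery: ask the queue for the last sent message's key.
        long recoveredKey = queue.get(queue.size() - 1)[0]; // 3
        // Replay from the checkpoint re-sends keys 2..5.
        for (long k = 2; k <= 5; k++) send(recoveredKey, k, k * 10);
        System.out.println(queue.size()); // 5: keys 1..5 appear exactly once
    }
}
```

The comparison only works if keys grow monotonically within the window, which is why the slide suggests sorting on the key at the end of the window when that ordering is not natural.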
Resources
• Subscribe - http://apex.incubator.apache.org/community.html
• Download - http://apex.incubator.apache.org/downloads.html
• Apex website - http://apex.incubator.apache.org/
• Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
• Facebook - https://www.facebook.com/ApacheApex/
• Meetup - http://www.meetup.com/topics/apache-apex
Q&A