Real-time Platform for Second Look Business Use Case Using Spark and Kafka: Spark Summit East talk...

Real-time Platform for Second Look Use Case using Spark and Kafka Ivy Lu Capital One


Page 1: Real-time Platform for Second Look Business Use Case Using Spark and Kafka: Spark Summit East talk by Ivy Lu

Real-time Platform for Second Look Use Case using Spark and Kafka

Ivy Lu, Capital One

Page 2

How closely do you look at credit card statements?

• If your answer is "not closely enough," then you probably aren't alone!
• Recent research from Capital One reveals some of the real costs of these unexpected charges.

Page 3

Take a Second Look

• Fraud vs. non-fraud but unexpected charges
• Types of charges you may be missing
• Customers are incorrectly charged $150 on average per year

Recurring Transactions: spikes in monthly recurring bills

Duplicate Charges: multiple swipes at the same merchant

Generous Tips: tip higher than average tipping behavior
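The slide names duplicate charges ("multiple swipes at the same merchant") as an alert type but does not say how they are detected. As an illustration only, here is a minimal sketch assuming a hypothetical rule: flag a charge when the same account was charged the same amount at the same merchant within a short window (10 minutes here, an arbitrary choice). The `Txn` record, `find_duplicate_swipes`, and the window size are all hypothetical, not Capital One's actual logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Txn:
    # Hypothetical transaction record; field names are illustrative only.
    txn_id: str
    account_id: str
    merchant: str
    amount_cents: int
    ts: datetime


def find_duplicate_swipes(txns, window=timedelta(minutes=10)):
    """Hypothetical rule: flag a transaction if an earlier one on the same
    account, at the same merchant, for the same amount, happened within `window`."""
    last_seen = {}  # (account, merchant, amount) -> timestamp of most recent match
    flagged = []
    for t in sorted(txns, key=lambda t: t.ts):
        key = (t.account_id, t.merchant, t.amount_cents)
        prev = last_seen.get(key)
        if prev is not None and t.ts - prev <= window:
            flagged.append(t.txn_id)
        last_seen[key] = t.ts
    return flagged


# Usage: two swipes two minutes apart, then a third one hours later.
t0 = datetime(2016, 1, 1, 12, 0)
txns = [
    Txn("a", "acct1", "Cafe", 450, t0),
    Txn("b", "acct1", "Cafe", 450, t0 + timedelta(minutes=2)),
    Txn("c", "acct1", "Cafe", 450, t0 + timedelta(hours=3)),
]
flagged = find_duplicate_swipes(txns)
```

Only the second swipe is flagged; the charge three hours later falls outside the window and is treated as a separate purchase.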

Page 4

Page 5

Email and Mobile UI

Page 6

Second Look Program (initial phase)

• Launch:
  – Email: August 2015
  – Mobile push notification: January 2016
• Coverage:
  – Auto-enrollment for credit card customers
  – Tens of thousands of alerts sent per day
  – Tens of thousands of customers reached per day
  – Several different types of alerts

Page 7

Real-Time Pipeline (current phase)

Page 8

Microservices

• Distributed
• Decoupled jobs

Page 9

Real-Time + Batch Data

• Batch data
  – High volume
  – Relatively slow
• Real-time data
  – Medium-low volume
  – Fast
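One way to read this split: the slow, high-volume batch side computes per-customer baselines from history, and the fast real-time side compares each new charge against them. The sketch below assumes that reading and applies it to the "generous tips" alert. The record layout `(account_id, pre_tip_cents, tip_cents)`, both function names, and the 10-percentage-point margin are hypothetical; the talk does not describe the actual rule.

```python
from statistics import mean


def build_tip_baseline(history):
    """Batch side (hypothetical): average tip rate per account, computed from
    historical (account_id, pre_tip_cents, tip_cents) records."""
    rates = {}
    for account_id, pre_tip, tip in history:
        if pre_tip > 0:
            rates.setdefault(account_id, []).append(tip / pre_tip)
    return {acct: mean(rs) for acct, rs in rates.items()}


def is_generous_tip(baseline, account_id, pre_tip_cents, tip_cents, margin=0.10):
    """Real-time side (hypothetical): flag a tip whose rate exceeds the
    account's batch baseline by more than `margin`."""
    if pre_tip_cents <= 0 or account_id not in baseline:
        return False  # no baseline for this account yet: do not alert
    return tip_cents / pre_tip_cents > baseline[account_id] + margin


# Usage: the baseline is rebuilt periodically by the batch job;
# each incoming tipped transaction is checked as it arrives.
history = [("acct1", 5000, 900), ("acct1", 2000, 400), ("acct1", 10000, 1800)]
baseline = build_tip_baseline(history)                 # acct1 averages about 18.7%
usual = is_generous_tip(baseline, "acct1", 4000, 720)  # 18% tip
large = is_generous_tip(baseline, "acct1", 4000, 1600) # 40% tip
```

The design choice this illustrates is that the expensive aggregation never sits on the real-time path; the stream only does a dictionary lookup and one comparison per event.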

Page 10

Deduplication

• Causes of duplication
  – At-least-once delivery at the data source
  – At-least-once processing in the Spark and Kafka jobs
• Deduplicate at the database
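Because both the source and the Spark/Kafka jobs deliver at least once, the same record can arrive more than once, so deduplicating at the database amounts to making the write idempotent. A minimal sketch, using SQLite purely as a stand-in (the talk does not name the database): a primary key on a hypothetical `(txn_id, alert_type)` pair plus SQLite's `INSERT OR IGNORE` turns a replayed batch into a no-op. The table and column names are assumptions.

```python
import sqlite3


def make_store():
    # In-memory SQLite as a stand-in for the pipeline's real database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alerts ("
        " txn_id TEXT, account_id TEXT, alert_type TEXT,"
        " PRIMARY KEY (txn_id, alert_type))"  # hypothetical dedup key
    )
    return conn


def write_alerts(conn, alerts):
    """Insert alerts idempotently: rows whose key already exists are skipped."""
    with conn:  # the connection context manager commits on success
        conn.executemany(
            "INSERT OR IGNORE INTO alerts (txn_id, account_id, alert_type) VALUES (?, ?, ?)",
            alerts,
        )


conn = make_store()
batch = [("t1", "acct1", "duplicate"), ("t2", "acct1", "tip")]
write_alerts(conn, batch)
write_alerts(conn, batch)  # simulate at-least-once redelivery of the same batch
count = conn.execute("SELECT COUNT(*) FROM alerts").fetchone()[0]
```

After the redelivery the table still holds two rows, one per alert.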

Page 11

Checkpointing

• Goal: zero data loss (at-least-once semantics)
• Spark checkpoints vs. Kafka offsets
• Connect to Kafka using Spark's Direct Stream approach and store the offsets back in ZooKeeper

Ref: http://aseigneurin.github.io/2016/05/07/spark-kafka-achieving-zero-data-loss.html
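The ordering that yields at-least-once semantics is: process a batch first, then persist its offsets. If the job dies in between, it restarts from the last saved offsets and replays that batch, so nothing is lost but some records are processed twice, which is why deduplication at the database is still needed. The sketch below simulates this in plain Python: a list stands in for one Kafka partition and a dict-backed `OffsetStore` stands in for ZooKeeper; both, along with `run_batch`, are hypothetical. In an actual Spark Direct Stream job, the per-batch offset ranges are typically obtained through `HasOffsetRanges` on the batch's RDD.

```python
class OffsetStore:
    """Hypothetical stand-in for ZooKeeper: (topic, partition) -> next offset to read."""

    def __init__(self):
        self._offsets = {}

    def get(self, topic, partition):
        return self._offsets.get((topic, partition), 0)

    def commit(self, topic, partition, next_offset):
        self._offsets[(topic, partition)] = next_offset


def run_batch(log, store, topic, partition, sink, batch_size=2, crash_before_commit=False):
    """Read one batch starting at the stored offset, process it, then commit."""
    start = store.get(topic, partition)
    end = min(start + batch_size, len(log))
    for record in log[start:end]:
        sink.append(record)  # 1. process (e.g. write alerts downstream)
    if crash_before_commit:
        raise RuntimeError("simulated crash before offset commit")
    store.commit(topic, partition, end)  # 2. only then save the offsets
    return end - start


log = ["m0", "m1", "m2", "m3"]
store = OffsetStore()
sink = []
run_batch(log, store, "txns", 0, sink)  # processes m0, m1; commits offset 2
try:
    run_batch(log, store, "txns", 0, sink, crash_before_commit=True)  # m2, m3 processed, no commit
except RuntimeError:
    pass  # the stored offset is still 2
run_batch(log, store, "txns", 0, sink)  # restart: replays m2, m3; commits offset 4
```

Every record reaches the sink (no loss), and `m2` and `m3` reach it twice (duplicates), which is exactly the at-least-once trade-off.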

Page 12

Social Media Feedback

"Thank you @CapitalOne for your 'take a second look' email! You saved me money!!!"

@josedunham

“@CapitalOne Wow love the email I got about a possible fraudulent charge. A restaurant added on a tip without my permission and you caught it”

@ryan_babypro98

“Gotta love when a #CreditCard Company lets you know when there are higher charges then normal on your account thanks @CapitalOne YourTheBest”

@AnnButlerDesign

“Thank you @CapitalOne for the catching of an over charge that I otherwise May not had noticed. Must give credit where credit is due. Rock on”

@Jasonmjarrett

Page 13

Word Cloud of email feedback

Page 14

Thank you