Near Real-Time Data Analysis With FlyData



This document describes our products. FlyData makes it easy to load data into Amazon Redshift automatically and continuously. See our website (http://flydata.com/) for more information.

Transcript of Near Real-Time Data Analysis With FlyData

Page 1: Near Real-Time Data Analysis With FlyData

Near Real-Time Data Analysis With FlyData

Move your data on the fly!

Page 2: Near Real-Time Data Analysis With FlyData

FlyData

Cloud based big data integration

Page 3: Near Real-Time Data Analysis With FlyData

Difficulty of loading data to Redshift: differences between traditional DBs and Redshift

MySQL, PostgreSQL, Oracle, etc.: transactional RDB; data is written with SQL INSERT (synchronous)

Amazon Redshift: data warehouse; data is loaded by bulk upload (asynchronous), but how?

Page 4: Near Real-Time Data Analysis With FlyData

Difficulty of loading data to Redshift: differences between traditional DBs and Redshift

MySQL, PostgreSQL, Oracle, etc.: transactional RDB; data is written with SQL INSERT (synchronous)

Amazon Redshift: data warehouse; data is loaded through FlyData (asynchronous)

Page 5: Near Real-Time Data Analysis With FlyData

Process of upload to Redshift

1. Data extraction (E)

2. Transform data (T)

3. Upload TSV file to S3

4. Run COPY command to load data from S3 to Redshift (L)

5. Error Handling

[Diagram: log files on the client server → data extraction (E) and transform (T) → TSV on S3 → load to DB (L) → Amazon Redshift, with error handling]
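The transform and load steps above can be sketched in a few lines of Python. This is only an illustration of the manual process, not FlyData's implementation: the function names, the table name, the bucket, and the IAM role are hypothetical, and the S3 upload and error handling steps are left out because they need network access.

```python
import csv
import io


def to_tsv(records, columns):
    """Transform step (T): serialize dict records to tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for rec in records:
        # Missing keys become empty fields so every row has the same width.
        writer.writerow([rec.get(col, "") for col in columns])
    return buf.getvalue()


def build_copy_sql(table, s3_uri, iam_role):
    """Load step (L): build a Redshift COPY statement for a TSV file on S3."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "DELIMITER '\\t';"
    )


# Hypothetical example data and resource names.
records = [{"user_id": 1, "event": "login"}, {"user_id": 2}]
tsv = to_tsv(records, ["user_id", "event"])
sql = build_copy_sql(
    "events",
    "s3://example-bucket/events.tsv",
    "arn:aws:iam::123456789012:role/ExampleRole",
)
```

The TSV text would be uploaded to the S3 location named in the statement before the COPY is run against the cluster.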

Page 6: Near Real-Time Data Analysis With FlyData

FlyData: Near Real-Time Upload To Redshift

Manage all with FlyData

[Diagram: client server → FlyData → Amazon Redshift]

Page 7: Near Real-Time Data Analysis With FlyData

FlyData Features

Page 8: Near Real-Time Data Analysis With FlyData

FlyData – A service for Amazon Redshift

1. Continuous Loading

2. Flexible JSON format Support

3. Query Scheduling and Management

4. All-in-One package for Amazon Redshift

Page 9: Near Real-Time Data Analysis With FlyData

Continuous Loading

• Near real-time data: sends data to Redshift periodically, every 5 minutes

• Scaling: FlyData can handle large amounts of data (100GB+ per day) for many tables, while optimizing appropriately with scheduled COPY commands

• Error handling
  – Retry and notifications
  – Even when Redshift is in its maintenance window
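As an illustration of the "retry and notifications" idea, here is a minimal Python sketch of a load wrapper with exponential backoff. It is not FlyData's actual retry policy: `load_batch` and `notify` are hypothetical callables supplied by the caller, and the attempt count and delays are arbitrary.

```python
import time


def load_with_retry(load_batch, notify, max_attempts=5, base_delay=1.0,
                    sleep=time.sleep):
    """Call load_batch() until it succeeds, backing off between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return load_batch()
        except Exception as exc:  # a real loader would catch narrower errors
            if attempt == max_attempts:
                # Out of attempts: send a notification, then re-raise.
                notify(f"load failed after {attempt} attempts: {exc}")
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A load that fails while the cluster is unavailable (for example during a maintenance window) would simply be retried later; the `sleep` parameter exists so the delay can be replaced in tests.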

Page 10: Near Real-Time Data Analysis With FlyData

Nested JSON and Apache Log Formats

• Support for nested JSON logs and Apache log formats, not yet offered by AWS

• Dynamic column creation
  – Brings flexibility to tables
  – Less need to predefine table schema

• Smooth handling of nested data
  – Auto-creation of parent-child table relationships
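One way to picture dynamic column creation is a function that compares an incoming record with the columns a table already has and emits `ALTER TABLE ... ADD COLUMN` statements for the rest. The sketch below is an assumption about how this could work, not FlyData's code; the type mapping in `infer_type` is hypothetical, and identifiers are not quoted or validated.

```python
def infer_type(value):
    """Map a Python value to a column type (hypothetical mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    return "VARCHAR(256)"


def missing_column_ddl(table, existing_columns, record):
    """Return ALTER TABLE statements for record keys the table lacks."""
    statements = []
    for key, value in record.items():
        if key not in existing_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {key} {infer_type(value)};"
            )
    return statements


# Hypothetical table and record.
ddl = missing_column_ddl(
    "events", {"user_id"}, {"user_id": 1, "score": 9.5, "vip": True}
)
```

Here `ddl` holds one statement for `score` and one for `vip`; `user_id` already exists and is skipped.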

Page 11: Near Real-Time Data Analysis With FlyData

Example of auto-creating tables from JSON Logs

Your JSON logs:

Get stored in Redshift as:

Page 12: Near Real-Time Data Analysis With FlyData

Flexible JSON format Support

• Your JSON log can be loaded into Redshift directly!

• Automatic creation of tables and columns for Redshift from your JSON log

• Nested JSON support
  – Handles structure by creating parent-child table relations with foreign keys
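The parent-child idea can be sketched as follows. This is a simplified illustration, not FlyData's algorithm: it assumes every nested value is a list of flat JSON objects and that the parent record carries an `id` field, and the table naming scheme (`parent_child`) and the `split_nested` function are hypothetical.

```python
import json


def split_nested(record, table, parent_id_column="id"):
    """Split one JSON object into rows for a parent table and child tables."""
    parent_row = {}
    tables = {table: [parent_row]}
    for key, value in record.items():
        if isinstance(value, list):
            # Each list becomes a child table whose rows carry the parent's
            # id as a foreign key column.
            child_table = f"{table}_{key}"
            fk_column = f"{table}_{parent_id_column}"
            tables[child_table] = [
                {fk_column: record[parent_id_column], **item}
                for item in value  # each item is assumed to be a flat object
            ]
        else:
            parent_row[key] = value
    return tables


# Hypothetical log line.
log = json.loads(
    '{"id": 7, "user": "alice",'
    ' "items": [{"sku": "a1", "qty": 2}, {"sku": "b2", "qty": 1}]}'
)
tables = split_nested(log, "orders")
```

In this example `tables` contains one row for an `orders` table and two rows for an `orders_items` table, each tagged with `orders_id` so the rows can be joined back to the parent.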

Page 13: Near Real-Time Data Analysis With FlyData

Query Scheduling and Management

• Stored SQL management on web console

• Mail notifications and downloads for queries that take a long time to run

• Periodic query scheduling (under development)
  – Time-scheduled query processing
  – Running maintenance tasks

Page 14: Near Real-Time Data Analysis With FlyData

All in One package for Amazon Redshift

• We are an Amazon Redshift partner
  – Officially listed on https://aws.amazon.com/redshift/partners/

• Complete technical support for FlyData & Redshift

• As a Reseller Partner, we can provide Amazon Redshift under a flexible pricing schedule

Page 15: Near Real-Time Data Analysis With FlyData

FlyData Sync

Page 16: Near Real-Time Data Analysis With FlyData

FlyData Sync

• Released in January 2014

• Enables synchronization from an RDBMS to Redshift (currently supports MySQL)

• Just another feature of FlyData for Redshift
  – Easy setup through web/command-line interface
  – One-line install command

• Supports INSERT / DELETE / UPDATE statements

Page 17: Near Real-Time Data Analysis With FlyData

FlyData Sync for MySQL

[Diagram labels: Customer Data Center or Cloud; FlyData Client; binlog access; Replication; Read Replica is optional; scalable data servers; Amazon S3; Load Controller; Load Optimization for Redshift; Amazon Redshift]

Page 18: Near Real-Time Data Analysis With FlyData

FlyData Sync Requirements

• Support currently limited to MySQL

• FlyData module must be installed on a data server with access to MySQL transaction logs

• Supported MySQL DB engines: InnoDB and MyISAM

• Transaction log format: ROW
  – --binlog-format=ROW

• Synced table must have a primary key set

• For data types not supported on Redshift:
  – MySQL's "binary", "varbinary" switched to "VARCHAR", etc.
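The requirements above lend themselves to a simple pre-flight check. The Python sketch below is a hypothetical helper, not part of FlyData: it takes values you would look up yourself (for example the result of MySQL's `SHOW VARIABLES LIKE 'binlog_format'`, the table's storage engine, and its primary key columns) and lists which requirements are not met.

```python
SUPPORTED_ENGINES = {"InnoDB", "MyISAM"}


def sync_problems(binlog_format, engine, primary_key_columns):
    """Return a list of unmet requirements; an empty list means all clear."""
    problems = []
    if binlog_format.upper() != "ROW":
        problems.append("binlog format must be ROW")
    if engine not in SUPPORTED_ENGINES:
        problems.append(f"unsupported engine: {engine}")
    if not primary_key_columns:
        problems.append("table has no primary key")
    return problems


# Hypothetical table settings.
ok = sync_problems("ROW", "InnoDB", ["id"])
bad = sync_problems("MIXED", "MEMORY", [])
```

With these inputs `ok` is empty and `bad` lists all three problems.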

Page 19: Near Real-Time Data Analysis With FlyData

Use Case: Game Analytics

• Multi-platform game titles

FlyData client module makes it easy to manage

• Basic Log Format: JSON

Makes analytics flexible and reduces data

• Large amounts of data in popular titles (200GB / day)
  – Large amounts of data are concentrated in a specific table
  – Hard to load in real time (due to Redshift restrictions)

FlyData can handle it!

Page 20: Near Real-Time Data Analysis With FlyData

Contact Information

• [email protected]
• Toll Free: 1-855-427-9787
• http://flydata.com

We are an official data integration partner of Amazon Redshift

Page 21: Near Real-Time Data Analysis With FlyData

FlyData Autoload: Use Cases

Move your data on the fly!

Page 22: Near Real-Time Data Analysis With FlyData

Gaming

Page 23: Near Real-Time Data Analysis With FlyData

Real-time analytics for gaming client

• Case
  – Client is a leading mobile gaming company in Japan with multiple released game titles
  – Previously, a large amount of data was stored in a MySQL cluster
  – MySQL often went down because of the large amount of data. Repair took weeks of man-hours every time this happened.
  – Historical analysis over multiple years was simply impossible, given the data size.

• Solution
  – Implemented FlyData Enterprise with JSON logs across multiple titles
  – Outputs user activity by application into JSON log files
  – Data is automatically fed to Amazon Redshift

• Result
  – Engineering time is saved and real-time BI insights can be fed back into the application development cycle
  – Client saves 2 weeks of man-hours every month, with added insight into user behavior. As a result, the client continues to steadily grow its user base and its bottom line.

Page 24: Near Real-Time Data Analysis With FlyData

AdTech

Page 25: Near Real-Time Data Analysis With FlyData

Data Analytics on Online Ad Effectiveness

• Case
  – Client is an online advertisement startup in the US with display ads shown across multiple websites
  – User activity, from the duration of engagement to the position of the cursor, is all logged to measure viewer engagement
  – Client needs to save large amounts of data and be able to query that data in real time. This data is then used to generate Ad Performance Reports.
  – Their initial option, Hadoop, turned out to be too costly in terms of engineering time. The learning curve for the team was steep, for both query generation and maintenance of their Hadoop clusters.

• Solution
  – Implemented FlyData Enterprise using "Extended" Apache logs
  – Outputs all user activity in Apache logs with additional information appended, such as key-value pair information for URL parameters and custom variables
  – Data is automatically fed to Amazon Redshift in the appropriate columns. When appropriate columns do not exist, the columns are added on the fly. This allows for added flexibility in table schema design.
  – Customer can now know the real-time effectiveness of their online advertisements through Ad Performance Reports
  – The client's internal BI team can quickly analyze which ads are working and which are not, in real time, and can gain insight or optimize for the best-performing ads

• Result
  – With a more cost-effective solution than Hadoop, the client was able to increase revenue by steadily increasing the quality of ads based on data gathered by FlyData and analyzed in Amazon Redshift.
  – Client has implemented a scalable backend reporting system that can handle multi-TB sized ad campaigns.

Page 26: Near Real-Time Data Analysis With FlyData

Digital Media

Page 27: Near Real-Time Data Analysis With FlyData

Faster Feedback, Faster Development Cycles

• Case
  – Client is a digital media startup in the US that has a website with rapid growth in user access, becoming one of the most "Like"d pages on Facebook (over 10 million)
  – User activity logs are carefully analyzed and assessed both for the website content and for the user experience
  – Used log data to perform funnel analysis on customer conversion rates
  – Client received user activity from its site as JSON objects, before storing it in MongoDB
  – Given the nature of the queries they wanted to run, MongoDB became very slow as their user base grew

• Solution
  – Implemented FlyData Enterprise using nested JSON logs
  – Outputs all user activity as a JSON log file
  – FlyData automatically uploads the data into Redshift, so the BI team (= App Development team) can simply query their user activity logs
  – Client can now quickly perform funnel analysis on customer data

• Result
  – Query speed dramatically improved. Queries that took 20 minutes before now take less than a minute, while keeping the flexibility of JSON.
  – Faster development cycles (Build-Measure-Learn cycles) were achieved.

Page 28: Near Real-Time Data Analysis With FlyData

Contact Information

• [email protected]
• Toll Free: 1-855-427-9787
• http://flydata.com

We are an official data integration partner of Amazon Redshift