hive@king Threshing data


Mattias Andersson, BI Developer, matte@king.com

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.

Agenda

  • A short history of King
  • Why do we use hive at King?
  • I will discuss hive from an analytics and data warehouse user perspective
  • Keep it short

This is King
Bragging warning!

Thomas Hartwig (CTO), Patrik Stymne (Architect), Sebastian Knutsson (Chief Product Officer), Riccardo Zacconi (CEO), Lars Markgren (GM Sweden)

Founded in 2003 by a bunch of ex-Spray guys

+ in London, Malmö, Bucharest, San Fran, Malta & Barcelona.

A European developer with its heart in Sthlm
Silicontull

We create & publish casual games

2003-2010

At King we have a process of taking luck/risk out of the equation.

King has been around for ten years this year. We started out focusing on web games on king.com. Today we have more than 10 million daily active users there.

We use that as a testing ground for the pure game mechanics first. We carefully measure and tweak with very small teams. Just two or three people.

This is how the original Candy Crush game came about in mid 2011.

200+ casual games
The foundation for our crusade on Facebook

We have developed more than 150 casual games since 2003, so it is no accident that we are doing well right now.

Fucked by Facebook (FBF Index)

[Chart: Facebook unique visitors vs. Yahoo Games US unique visitors, 2004-2010, axis up to 500m, with "Fall of 2010" marked]

Facebook, fall of 2012
Industry experts: King missed the train, it's too late now, Zynga and Wooga own the market

Provocative start: 'it's too late, the market is very competitive, virality is dead and marketing is expensive'. This is what we were told in March 2011.

King's response?

It is never too late to disrupt an industry

What do people want to know?

How did King do it? How do you make money? How do you market the games?

How do you make the games? Well-substantiated claims. On mobile, the majority of downloads come directly through the app stores.

April 2011: Bubble Saga on Facebook
2011
The Saga format

Bubble Saga was a hit
No. 7 on Facebook after 4 months
Daily Active Users (DAU)
2.4 million DAU!

April 2011

Bubble Witch Saga

Daily Active Uniques (DAU)
Explosive growth: from 0 to 6 million daily players in 4 months

Oct 2011-2012
1 year growth: from 220,000 DAU to 8,500,000!

Mobile: July 2012

Mobile: July 2012 - now

Also #1 top grossing app in Sweden since February

How we succeeded technically speaking
Our platform

Tech choices:

  • Application: 96 servers (java)
  • MySQL: 59 servers
  • Memcache: 24 servers
  • Hadoop cluster: 20 servers

How it all works from a BI perspective

  • MySQL shards hold the user state; they are off limits for BI
  • The game logs events whenever something interesting has happened
  • Logs are rolled hourly to a central log server, where we fetch the data
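The event logging and hourly rolling described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tab-separated event layout, the field names and the `events-YYYY-MM-DD-HH.log` file naming are hypothetical and are not taken from King's actual setup.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(epoch_seconds):
    """Return the hourly log file an event falls into (hypothetical naming scheme)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return ts.strftime("events-%Y-%m-%d-%H.log")


def format_event(epoch_seconds, user_id, event_type, payload):
    """Serialise one game event as a tab-separated line (assumed layout)."""
    return "\t".join([str(epoch_seconds), str(user_id), event_type, payload])


def roll_hourly(events):
    """Group event lines by the hour in which they were logged."""
    files = defaultdict(list)
    for epoch_seconds, user_id, event_type, payload in events:
        line = format_event(epoch_seconds, user_id, event_type, payload)
        files[hour_bucket(epoch_seconds)].append(line)
    return dict(files)


if __name__ == "__main__":
    sample = [
        (1356998400, 42, "level_start", "level=1"),       # 2013-01-01 00:00:00 UTC
        (1356998460, 42, "level_complete", "level=1"),    # one minute later
        (1357002000, 7, "purchase", "item=extra_moves"),  # 2013-01-01 01:00:00 UTC
    ]
    for name, lines in sorted(roll_hourly(sample).items()):
        print(name, len(lines))
```

One file per hour maps naturally onto one table partition per hour on the warehouse side, which keeps each fetch from the log server small and repeatable.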

Big data, bigger metadata
Metadata

We are on our way
Are we Big Data?

The most important success factor for hive
Hive connectivity

  • Hue: Web interface to hive. Easy to use, so it is a great first encounter.
  • ODBC: Enables us to pull data from hive into QlikView/R/Excel.
  • Command line interface: The default/advanced interface.

Scumbag hive: Different interfaces use different escape sequences/variable substitution.

This is what sold it to me
Hive programmability
Hive custom transform

    from (
      from dual
      map a using 'seq 1 5' as sequence int
      sort by sequence
    ) map_out
    reduce sequence
    using 'awk "{sum+=$0\; print sum}"' as cumulative int;

Output:

    1
    3
    6
    10
    15

Scumbag hive: Really easy to make something horribly unmaintainable. Perl/xslt/wget in one hql-statement.
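To see what the custom transform above computes without a cluster, here is a small Python sketch of the same three stages. It is a stand-in, not the original scripts: `range(1, 6)` plays the role of `seq 1 5`, `sorted` plays the role of `sort by sequence`, and `cumulative_sum` (a hypothetical name) mirrors the awk reducer. Hive transform scripts read rows from standard input and write rows to standard output, so the reducer is written over an iterable of lines.

```python
def cumulative_sum(lines):
    """Reducer logic: emit the running total of the integers seen so far."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank rows
        total += int(line)
        yield str(total)


def run_pipeline():
    """Local stand-in for the map, sort and reduce stages of the query."""
    mapped = [str(n) for n in range(1, 6)]  # stands in for `seq 1 5`
    sorted_rows = sorted(mapped, key=int)   # stands in for `sort by sequence`
    return list(cumulative_sum(sorted_rows))


if __name__ == "__main__":
    # Used as a real transform script, the input would be sys.stdin instead.
    print("\n".join(run_pipeline()))
```

Run locally, this prints 1, 3, 6, 10 and 15 on separate lines, matching the output on the slide.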

Map as a double entendre
Hive complexity
Map datatype

    create table if not exists test2 (test map<string, map<string, string>>)
    ROW FORMAT DELIMITED
    STORED AS TEXTFILE;

    select test["test"]["x"] from test2;

Scumbag hive: For hive tables in textfile format there is no syntax to declare map/array separators beyond the first level; \004, \005, \006 and \007 are hardcoded.
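A small Python sketch of what those hardcoded separators mean for a textfile row holding a map of maps. It assumes Hive's default delimiters, with \002 and \003 separating the entries and key/value pairs of the outer map and \004 and \005 doing the same for the inner map; the helper names are hypothetical.

```python
# Assumed default separators for a map-of-maps column in a textfile table.
ENTRY_SEP, KV_SEP = "\x02", "\x03"              # outer map: entries, key/value
INNER_ENTRY_SEP, INNER_KV_SEP = "\x04", "\x05"  # inner map: entries, key/value


def encode_nested_map(nested):
    """Serialise {key: {inner_key: value}} the way the text row would store it."""
    outer = []
    for key, inner in nested.items():
        inner_text = INNER_ENTRY_SEP.join(
            k + INNER_KV_SEP + v for k, v in inner.items()
        )
        outer.append(key + KV_SEP + inner_text)
    return ENTRY_SEP.join(outer)


def decode_nested_map(field):
    """Parse one column value back into a dict of dicts."""
    result = {}
    for entry in filter(None, field.split(ENTRY_SEP)):
        key, _, inner_text = entry.partition(KV_SEP)
        result[key] = dict(
            pair.split(INNER_KV_SEP, 1)
            for pair in inner_text.split(INNER_ENTRY_SEP)
            if INNER_KV_SEP in pair
        )
    return result


if __name__ == "__main__":
    row = encode_nested_map({"test": {"x": "1", "y": "2"}})
    print(repr(row))
    print(decode_nested_map(row)["test"]["x"])
```

Under these assumptions, whatever writes the files has to emit the control characters itself, because the table definition cannot name them.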

It's complicated
So why did we choose to use hive?

Pros

  • SQL is easy to learn
  • Supports custom mapreduce jobs
  • ODBC connection for QlikView
  • Hue for lightweight access
  • Development is moving fast
  • Open source

Cons

  • High latency
  • Lots of moving parts
  • Not free from bugs

The end.