Snowplow: evolve your analytics stack with your business
Uploaded by yalisassoon · Category: Business
Transcript of Snowplow: evolve your analytics stack with your business
Page 1
Snowplow: evolve your analytics stack with your business
Snowplow Meetup San Francisco, Feb 2017
Page 2
Our businesses are constantly evolving…
• Our digital products (apps and platforms) are constantly developing
• The questions we ask of our data are constantly changing
• It is critical that our analytics stack can evolve with our business
Page 3
Self-describing data + Event data modeling = an analytics stack that evolves with your business
How Snowplow users evolve their analytics stacks with their business
Page 4
Self-describing data: Overview
Page 5
Event data varies widely by company
Page 6
As a Snowplow user, you can define your own events and entities
Events:
• Build castle • Form alliance • Declare war
• View product • Buy product • Deliver product

Entities (contexts):
• Player • Game • Level • Currency
• Product • Customer • Basket • Delivery van
Page 7
You then define a schema for each event and entity
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a fighter context",
  "self": {
    "vendor": "com.ufc",
    "name": "fighter_context",
    "format": "jsonschema",
    "version": "1-0-1"
  },
  "type": "object",
  "properties": {
    "FirstName": { "type": "string" },
    "LastName": { "type": "string" },
    "Nickname": { "type": "string" },
    "FacebookProfile": { "type": "string" },
    "TwitterName": { "type": "string" },
    "GooglePlusProfile": { "type": "string" },
    "HeightFormat": { "type": "string" },
    "HeightCm": { "type": ["integer", "null"] },
    "Weight": { "type": ["integer", "null"] },
    "WeightKg": { "type": ["integer", "null"] },
    "Record": { "type": "string", "pattern": "^[0-9]+-[0-9]+-[0-9]+$" },
    "Striking": { "type": ["number", "null"], "maxdecimal": 15 },
    "Takedowns": { "type": ["number", "null"], "maxdecimal": 15 },
    "Submissions": { "type": ["number", "null"], "maxdecimal": 15 },
    "LastFightUrl": { "type": "string" },
    "LastFightEventText": { "type": "string" },
    "NextFightUrl": { "type": "string" },
    "NextFightEventText": { "type": "string" },
    "LastFightDate": { "type": "string", "format": "timestamp" }
  },
  "additionalProperties": false
}
Upload the schema to Iglu
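As a sketch of the convention above, the schema's "self" block maps directly onto both the Iglu URI used to reference the schema and the path where a static Iglu registry hosts it. The helper functions below are illustrative, not part of any Snowplow library:

```python
# Sketch: derive the Iglu URI and registry path from a schema's "self" block.
# The vendor/name/format/version layout follows the Iglu convention shown on
# the slide; the function names here are illustrative assumptions.

def iglu_uri(self_block):
    """Build an Iglu schema URI like iglu:com.ufc/fighter_context/jsonschema/1-0-1."""
    return "iglu:{vendor}/{name}/{format}/{version}".format(**self_block)

def iglu_path(self_block):
    """Path under which a static Iglu registry would host this schema."""
    return "schemas/{vendor}/{name}/{format}/{version}".format(**self_block)

fighter_self = {
    "vendor": "com.ufc",
    "name": "fighter_context",
    "format": "jsonschema",
    "version": "1-0-1",
}

print(iglu_uri(fighter_self))   # iglu:com.ufc/fighter_context/jsonschema/1-0-1
print(iglu_path(fighter_self))  # schemas/com.ufc/fighter_context/jsonschema/1-0-1
```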
Page 8
Then send data into Snowplow as self-describing JSONs
1. Validation → 2. Dimension widening → 3. Data modeling
{
  "schema": "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
  "data": {
    "timestamp": "2016-11-16 19:53:21",
    "location": "Berlin",
    "temperature": 3,
    "units": "Centigrade"
  }
}
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for a temperature measure event",
  "self": {
    "vendor": "com.israel365",
    "name": "temperature_measure",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "timestamp": { "type": "string" },
    "location": { "type": "string" },
    …
  },
  …
}
Event
Schema reference
Schema
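Putting the two pieces together: a tracked event is just the payload wrapped in an envelope whose "schema" key carries the Iglu reference. A minimal sketch (the self_describing helper is illustrative, not a Snowplow API):

```python
import json

# Sketch: wrap an event payload in a self-describing JSON envelope, as shown
# on the slide. The schema URI references the temperature_measure schema.

def self_describing(schema_uri, data):
    """Pair a payload with the Iglu URI of the schema that describes it."""
    return {"schema": schema_uri, "data": data}

event = self_describing(
    "iglu:com.israel365/temperature_measure/jsonschema/1-0-0",
    {
        "timestamp": "2016-11-16 19:53:21",
        "location": "Berlin",
        "temperature": 3,
        "units": "Centigrade",
    },
)

print(json.dumps(event, indent=2))
```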
Page 9
The schemas can then be used in a number of ways
• Validate the data (important for data quality)
• Load the data into tidy tables in your data warehouse
• Make it easy / safe to write downstream data processing applications (e.g. for real-time users)
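To make the validation step concrete, here is a toy check of an event payload against the declared property types. A real pipeline runs a full JSON Schema validator; this sketch covers only the "type" keyword:

```python
# Toy illustration of the validation step: check an event's "data" payload
# against the declared property types. This is a sketch, not how Snowplow's
# validator is implemented.

TYPE_MAP = {"string": str, "integer": int, "number": (int, float)}

def validate(data, schema):
    """Return a list of validation failures (an empty list means the event is good)."""
    failures = []
    for field, rules in schema["properties"].items():
        if field in data and not isinstance(data[field], TYPE_MAP[rules["type"]]):
            failures.append(f"{field}: expected {rules['type']}")
    return failures

schema = {"properties": {"timestamp": {"type": "string"},
                         "location": {"type": "string"},
                         "temperature": {"type": "integer"}}}

good = {"timestamp": "2016-11-16 19:53:21", "location": "Berlin", "temperature": 3}
bad = {"timestamp": "2016-11-16 19:53:21", "location": "Berlin", "temperature": "3"}

print(validate(good, schema))  # []
print(validate(bad, schema))   # ['temperature: expected integer']
```

Events that fail validation can be routed to a bad-rows stream instead of silently corrupting the warehouse, which is why the slide calls this step important for data quality.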
Page 10
Event data modeling: Overview
Page 11
What is event data modeling?
1. Validation → 2. Dimension widening → 3. Data modeling
Event data modeling is the process of using business logic to aggregate over event-level data, producing 'modeled' data that is easier to query.
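A minimal sketch of that definition: event-level rows aggregated into a per-user modeled table. The event fields and the aggregation rules below are illustrative assumptions, not a prescribed Snowplow model:

```python
from collections import defaultdict

# Sketch of the data modeling step: roll event-level rows up into one
# "modeled" row per user (event count and distinct session count).

events = [
    {"user": "alice", "session": "s1", "event": "view_product"},
    {"user": "alice", "session": "s1", "event": "buy_product"},
    {"user": "alice", "session": "s2", "event": "view_product"},
    {"user": "bob",   "session": "s3", "event": "view_product"},
]

def model_users(events):
    """Aggregate event-level data into one row per user."""
    users = defaultdict(lambda: {"events": 0, "sessions": set()})
    for e in events:
        row = users[e["user"]]
        row["events"] += 1
        row["sessions"].add(e["session"])
    return {u: {"events": r["events"], "sessions": len(r["sessions"])}
            for u, r in users.items()}

print(model_users(events))
# {'alice': {'events': 3, 'sessions': 2}, 'bob': {'events': 1, 'sessions': 1}}
```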
Page 12
Unmodeled data (event 1, …, event n): immutable, unopinionated, hard to consume, not contentious.
Modeled data (users, sessions, funnels, …): mutable, opinionated, easy to consume, may be contentious.
Page 13
In general, event data modeling is performed on the complete event stream
• Late arriving events can change the way you understand earlier arriving events
• If we change our data models, we have the flexibility to recompute historical data based on the new model
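For example, recomputing over the complete stream lets a changed sessionization rule apply to history as well as to new events. The timestamps and the inactivity-gap rule below are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Sketch of recomputing a data model over the full event stream: the same
# raw events are re-sessionized under a different inactivity threshold.

timestamps = ["2017-02-01 10:00", "2017-02-01 10:05", "2017-02-01 10:50"]
events = [datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps]

def count_sessions(events, gap_minutes):
    """A new session starts whenever the gap since the previous event exceeds the threshold."""
    sessions = 1
    for prev, cur in zip(events, events[1:]):
        if cur - prev > timedelta(minutes=gap_minutes):
            sessions += 1
    return sessions

print(count_sessions(events, 30))  # 2: the 45-minute gap splits the stream
print(count_sessions(events, 60))  # 1: under the new rule it is one session
```

Because the raw events are immutable, both numbers can be produced at any time; only the modeled table changes.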
Page 14
The evolving event data pipeline
Page 15
How do we handle pipeline evolution?
PUSH FACTORS: what is being tracked will change over time.
PULL FACTORS: what questions are being asked of the data will change over time.
Businesses are not static, so event pipelines should not be either.
Sources: Web · Apps · Servers · Comms channels · Push · Smart car / home · …
Pipeline: Collection → Processing
Destinations: Data warehouse (data exploration, predictive modeling) · Real-time dashboards · Real-time, data-driven applications (RT bidder, voucher, personalization, …)
Page 16
Push example: new source of event data
• If data is self-describing, it is easy to add additional sources
• Self-describing data is good for managing bad data and pipeline evolution

"I'm an email send event, and I have information about the recipient (email address, customer ID) and the email (id, tags, variation)."
Page 17
Pull example: new business question
The cycle: Question? → Answer → Insight → (new question)
Page 18
Answering the question: 3 possibilities

1. Existing data model supports the answer
• Possible to answer the question with existing modeled data

2. Need to update the data model
• Data collected already supports the answer
• Additional computation required in the data modeling step (additional logic)

3. Need to update the data model and data collection
• Need to extend event tracking
• Need to update data models to incorporate the additional data (and potentially additional logic)
Page 19
Self-describing data and the ability to recompute data models are essential to enable pipeline evolution
Self-describing data enables:
• Updating existing events and entities in a backward-compatible way, e.g. adding optional new fields
• Updating existing events and entities in a backward-incompatible way, e.g. changing field types, removing fields, adding compulsory fields
• Adding new event and entity types

Recomputing data models on the entire data set enables:
• Adding new columns to existing derived tables, e.g. a new audience segmentation
• Changing the way existing derived tables are generated, e.g. changing sessionization logic
• Creating new derived tables
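A toy illustration of the backward-compatible case: a 1-0-1 schema revision adds an optional "units" field, so events captured under 1-0-0 still conform. The field names and the conforms check are illustrative assumptions:

```python
# Sketch of backward-compatible schema evolution: the new version adds an
# optional field but keeps the same required set, so old events remain valid.

schema_1_0_0 = {"required": ["timestamp", "temperature"],
                "properties": ["timestamp", "temperature"]}
schema_1_0_1 = {"required": ["timestamp", "temperature"],
                "properties": ["timestamp", "temperature", "units"]}

def conforms(data, schema):
    """Toy check: every required field is present and no field is unknown."""
    return (all(f in data for f in schema["required"])
            and all(f in schema["properties"] for f in data))

old_event = {"timestamp": "2016-11-16 19:53:21", "temperature": 3}
new_event = {"timestamp": "2016-11-16 19:53:21", "temperature": 3, "units": "Centigrade"}

print(conforms(old_event, schema_1_0_1))  # True: old data still validates
print(conforms(new_event, schema_1_0_0))  # False: "units" is unknown to 1-0-0
```

The second result is why backward-incompatible changes warrant a major version bump: data written against the new schema cannot be read back through the old one.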
Page 20
Questions?