Strata NY: Best Practices for Publishing Data

FIND AND UNDERSTAND DATA October, 2012 Hjalmar Gislason, founder & CEO - [email protected] Best Practices for Publishing Data

A presentation by Hjalmar Gislason, founder and CEO of DataMarket at the Strata Conference in New York, October 2012

Transcript of Strata NY: Best Practices for Publishing Data

Page 1: Strata NY: Best Practices for Publishing Data

FIND AND UNDERSTAND DATA

Best Practices for Publishing Data

October 2012

Hjalmar Gislason, founder & CEO - [email protected]

Page 2: Strata NY: Best Practices for Publishing Data

Hjalmar Gislason

Founder and CEO

Twitter: @datamarket

Slides: http://blog.datamarket.com/

Page 4: Strata NY: Best Practices for Publishing Data
Page 5: Strata NY: Best Practices for Publishing Data

Heavy Data Consumers

Providers of Data Delivery Technology

Page 6: Strata NY: Best Practices for Publishing Data

| BEST PRACTICES for PUBLISHING DATA | Hjalmar Gislason, [email protected] | October 2012

Computers Humans

Page 7: Strata NY: Best Practices for Publishing Data

Computers

• Structure

Humans

• Understand and use

Page 9: Strata NY: Best Practices for Publishing Data

Publishing for Computers

1. Simple formats
2. Indexes, unique IDs and meta-data
3. FAQs and feedback channels

Page 10: Strata NY: Best Practices for Publishing Data

"Don't anthropomorphize computers - they hate it."

- Unknown

Simple Formats

Page 11: Strata NY: Best Practices for Publishing Data

Simple Formats

Page 12: Strata NY: Best Practices for Publishing Data

Simple Formats: Tim Berners-Lee’s Five Stars

Page 13: Strata NY: Best Practices for Publishing Data

Simple Formats: You lost me at “Semantics”

Page 16: Strata NY: Best Practices for Publishing Data

Indexes, unique ids and meta-data

Page 18: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

• Must: Unique ID, Title, Last updated
• Should: Meta-data
• Why?
  • No need for scraping
  • Less load on your end
  • Ensures full coverage
  • Ensures content removal and updates
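
A minimal sketch of why such an index removes the need for scraping. It assumes the publisher exposes one record per dataset with `id`, `title` and `last_updated` (hypothetical field names, not a DataMarket format). A consumer can then compare two snapshots of the index to see what was added, updated or withdrawn:

```python
from typing import Dict, List, NamedTuple


class IndexEntry(NamedTuple):
    """One record in a publisher's dataset index (hypothetical layout)."""
    id: str
    title: str
    last_updated: str  # ISO 8601 date, so plain string comparison orders correctly


def diff_index(old: List[IndexEntry], new: List[IndexEntry]) -> Dict[str, List[str]]:
    """Compare two snapshots of the index and report what changed, by ID."""
    old_by_id = {entry.id: entry for entry in old}
    new_by_id = {entry.id: entry for entry in new}

    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    updated = sorted(
        dataset_id
        for dataset_id in new_by_id.keys() & old_by_id.keys()
        if new_by_id[dataset_id].last_updated > old_by_id[dataset_id].last_updated
    )
    return {"added": added, "updated": updated, "removed": removed}


# Made-up snapshots, for illustration only.
last_week = [
    IndexEntry("ds-1", "Population", "2012-01-01"),
    IndexEntry("ds-2", "GDP", "2012-03-01"),
    IndexEntry("ds-3", "Discontinued series", "2011-01-01"),
]
today = [
    IndexEntry("ds-1", "Population", "2012-01-01"),
    IndexEntry("ds-2", "GDP", "2012-09-15"),
    IndexEntry("ds-4", "Unemployment", "2012-10-01"),
]

print(diff_index(last_week, today))
```

Only the datasets listed under `added` and `updated` need to be fetched again, which is where the lower load on the publisher comes from. The `removed` list is how withdrawn content gets noticed.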

Page 19: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

• Hard to emphasize enough!
• Unique IDs for everything: Datasets, columns, entities, ...
• Why?
  • Continuity: A small change for a man = giant leap for a computer
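
The continuity point can be illustrated with a small sketch. The rows, IDs and values below are made up, and `join_by_key` is a hypothetical helper. When a publisher tidies up a label between two releases, matching on the label silently drops the row, while matching on a stable ID does not:

```python
def join_by_key(old_rows, new_rows, key):
    """Pair up rows from two releases that share the same value for `key`."""
    new_lookup = {row[key]: row for row in new_rows}
    return [
        (row, new_lookup[row[key]])
        for row in old_rows
        if row[key] in new_lookup
    ]


# Two releases of the same table; one label was reworded in the second.
release_a = [
    {"id": "c-01", "name": "Czech Rep.", "value": 10},
    {"id": "c-02", "name": "Iceland", "value": 20},
]
release_b = [
    {"id": "c-01", "name": "Czech Republic", "value": 11},
    {"id": "c-02", "name": "Iceland", "value": 21},
]

matched_by_name = join_by_key(release_a, release_b, "name")
matched_by_id = join_by_key(release_a, release_b, "id")

print(len(matched_by_name), "rows matched by name")
print(len(matched_by_id), "rows matched by id")
```

To a human reader the two labels are obviously the same entity. To the consuming program they are unrelated strings unless an ID ties them together.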

Page 20: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

• Any relevant contextual information
  • URL(s), descriptions, methodology, next updated, authors, keywords, units, license information, ...
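
One way such a meta-data record could be published in machine-readable form, as a sketch only. The field names follow the list above, but the exact schema is an assumption rather than a standard or a DataMarket format:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    """Hypothetical meta-data record; the first three fields are the 'must' items."""
    id: str
    title: str
    last_updated: str
    next_update: str = ""
    urls: List[str] = field(default_factory=list)
    description: str = ""
    methodology: str = ""
    authors: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    units: str = ""
    license: str = ""


def to_json(meta: DatasetMetadata) -> str:
    """Serialize the record so that consumers do not have to parse HTML pages."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


# Placeholder values, for illustration only.
example = DatasetMetadata(
    id="ds-1",
    title="Example dataset",
    last_updated="2012-10-01",
    keywords=["example", "demo"],
    units="persons",
)
print(to_json(example))
```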

Page 21: Strata NY: Best Practices for Publishing Data

FAQs and feedback channels

#1 reason for not publishing data:

“There are errors in the data and I don't want others to discover them”

Page 22: Strata NY: Best Practices for Publishing Data

FAQs and feedback channels

#1 reason for not publishing data:

“There are errors in the data and I do want others to discover them”

Page 24: Strata NY: Best Practices for Publishing Data

FAQs and feedback channels

Page 25: Strata NY: Best Practices for Publishing Data

Publishing for Computers

1. Simple formats
2. Indexes, unique IDs and meta-data
3. FAQs and feedback channels

Page 26: Strata NY: Best Practices for Publishing Data

Computers

• Structure

Humans

• Understand and use

Page 27: Strata NY: Best Practices for Publishing Data

Publishing for Humans

1. Search / Discovery
2. Visualization
3. Download

Page 28: Strata NY: Best Practices for Publishing Data

Search / Discovery

• Requirements differ from web/text search
  • A lot less textual content to base the search on
  • Synonyms, dictionaries, autocomplete
  • But (hopefully) good meta-data = facets and filtering
• Give people ways to browse
  • Categories vs. tags vs. search
  • Serendipity: Random, related, interesting...
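
A minimal sketch of how meta-data turns into facets and filtering. It assumes each dataset carries list-valued meta-data fields such as `keywords` and `source`. The catalog entries and helper names below are hypothetical:

```python
from collections import Counter


def facet_counts(datasets, field):
    """Count how many datasets carry each value of a list-valued meta-data field."""
    counts = Counter()
    for dataset in datasets:
        counts.update(dataset.get(field, []))
    return counts


def filter_by_facets(datasets, **selected):
    """Keep only the datasets whose meta-data contains every selected facet value."""
    return [
        dataset
        for dataset in datasets
        if all(value in dataset.get(name, []) for name, value in selected.items())
    ]


# Made-up catalog entries, for illustration only.
catalog = [
    {"id": "ds-1", "keywords": ["population", "annual"], "source": ["Agency A"]},
    {"id": "ds-2", "keywords": ["gdp", "annual"], "source": ["Agency A"]},
    {"id": "ds-3", "keywords": ["population", "monthly"], "source": ["Agency B"]},
]

print(facet_counts(catalog, "keywords"))
print([d["id"] for d in filter_by_facets(catalog, keywords="population", source="Agency A")])
```

The counts give users something to browse even when there is little free text to search, and each selected facet narrows the result set.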

Page 30: Strata NY: Best Practices for Publishing Data

Visualize

Page 32: Strata NY: Best Practices for Publishing Data

109 columns x 340 lines = 37,060 cells

Page 34: Strata NY: Best Practices for Publishing Data
Page 36: Strata NY: Best Practices for Publishing Data

Visualize

• What you should offer depends on the data
• Statistical data
  • Focus on the most common charts and get them right
  • Do NOT invent new visualizations or chart types
• Use standards-compatible technologies
  • No Flash!
  • Charting and visualization libraries

Page 39: Strata NY: Best Practices for Publishing Data

Download

• Make it easy to use your data outside your tools
  • Play nicely with those providing functionality beyond what you can offer: Tableau, R, SAS, MATLAB, Mathematica, SPSS, ...
• Provide downloads in the formats most commonly used by your users:
  • Raw data: Excel, CSV, feeds (R, Excel live feeds, APIs)
  • Charts and visualizations: Bitmap, vector, PPT, embeds?
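
As a sketch of the simplest of these download formats, the helper below writes rows to CSV using only the Python standard library. The function name `to_csv` and the sample rows are assumptions for illustration:

```python
import csv
import io


def to_csv(rows, columns):
    """Render a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows, for illustration only.
rows = [
    {"year": 2011, "value": 10},
    {"year": 2012, "value": 12},
]
csv_text = to_csv(rows, ["year", "value"])
print(csv_text)

# A consumer can read the same text back without any custom parsing.
for record in csv.DictReader(io.StringIO(csv_text)):
    print(record["year"], record["value"])
```

A plain CSV with a header line like this opens directly in Excel and can be read by tools such as R without any custom parsing.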

Page 40: Strata NY: Best Practices for Publishing Data

Computers

• Structure
  • Simple formats
  • Indexes, unique IDs and meta-data
  • FAQs and feedback channels

Humans

• Understand and use
  • Search / Discovery
  • Visualization
  • Download

Page 41: Strata NY: Best Practices for Publishing Data

FIND AND UNDERSTAND DATA

Twitter: @datamarket · Facebook: DataMarket · E-mail: [email protected]

Hjalmar Gislason, founder & CEO