How to publish CSV on the Web or why standards are important ODI Friday lecture 26.11.2015 J. Umbrich, S. Neumaier


How to publish CSV on the Web

or

why standards are important

ODI Friday lecture 26.11.2015

J. Umbrich, S. Neumaier

CSV

Prominent OD Format

good or bad

CSV?

Example 1

bad

Example 2

bad

Example 3

bad

Example 4

goodish

Example 5

goodish

Example 6

good

CSV on the Web: for humans and machines

CSV on the Web

publishing CSV for humans and machines

What is CSV? (see RFC4180)

RFC4180: https://tools.ietf.org/html/rfc4180

COMMA-SEPARATED VALUES

1. Each record is separated by a line break
2. The last record may or may not have an ending line break
3. There might be an optional header line
4. Within the header and each record, there may be one or more fields, separated by commas
5. Each field may or may not be enclosed in double quotes
6. Fields containing line breaks, double quotes, and commas should be enclosed in double quotes
7. If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped
8. File extension: .csv, MIME type: text/csv
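The quoting rules above can be tried out with Python's standard `csv` module, whose default dialect uses a comma delimiter, double-quote quoting, doubled quotes as the escape, and CRLF line ends. This is a minimal sketch, not a conformance test for RFC 4180; the sample rows are a cut-down version of the mumok exhibition file shown in Example 6, and the helper names `to_csv` and `from_csv` are made up for illustration.

```python
import csv
import io

# Sample rows: one field contains a comma, one contains double quotes.
rows = [
    ["exhibition_id", "city", "title", "location"],
    ["3", "Rom", "I LOVE POP", "Chiostro del Bramante, Rom"],
    ["32", "Köln", '"Kunstwelten im Dialog"', "Museum Ludwig, Köln"],
]


def to_csv(records):
    """Serialise rows with the csv module's default dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # ',' delimiter, '"' quote char, CRLF line ends
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


text = to_csv(rows)
print(text)
```

Only the fields that need it are quoted on output, and embedded double quotes are doubled, so parsing the text again gives back the original rows.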

On the Web, “CSV” more often means Character-Separated Values files!

Most CSV parsers cater for this by using heuristics to identify the so-called CSV dialect.
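As an illustration of such a heuristic, Python's standard library ships `csv.Sniffer`, which guesses a dialect from a text sample. The sketch below restricts the guess to three candidate delimiters; the sample text and the helper name `detect_dialect` are invented for this example. Sniffing is a heuristic: it can raise `csv.Error` or guess wrongly on irregular files.

```python
import csv
import io

# Hypothetical sample using ';' as separator.
SAMPLE = "name;city;year\nAnna;Wien;2013\nBert;Graz;2014\nCarl;Linz;2015\n"


def detect_dialect(sample, candidates=";,\t"):
    """Guess the dialect of a text sample, restricted to candidate delimiters."""
    return csv.Sniffer().sniff(sample, delimiters=candidates)


dialect = detect_dialect(SAMPLE)
rows = list(csv.reader(io.StringIO(SAMPLE), dialect))
print(repr(dialect.delimiter), rows[0])
```

Restricting the candidates keeps the sniffer from picking a letter or digit that happens to occur equally often on every line.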

“CSV” in the wild

separator: , ; \t
line ending: \n \r \r\n
quote chars: " '

“CSV” on data.gv.at: mime-types, CSV dialects

Analysing “CSV” from data.gv.at

CSV-related files: 1809
Parsable CSV files: 1482
Detected a header: 1294

Delimiter:
';' 1471
',' 9
None 2

Number of comment lines:
0 1376
1 39
>1 67

Example 1


Beilage zum Rechnungsabschluss 2013;;;Nr. 4b;;;;;;;;;NACHWEIS ÜBER DEN SCHULDENDIENST;;;;;;;;;;;;Laut Voranschlag 2013 (inkl. Umbuchungen im Laufe des Jahres) waren für die Abwicklung des Schuldendienstes vorgesehen:;;;;;;;; für Verzinsung;;;8.038.400,00;� Zinsreserve gesamt ;;;12.800.000,00;� für Tilgung;;;29.915.700,00;� Gesamtvoranschlag;;;50.754.100,00;�;;;;Diesem Kredit steht die Jahresvorschreibung von;;;49.665.135,01;�gegenüber, sodass sich beim gesamten Schuldendienst;;;;eine E i n s p a r u n g von ;;;1.088.964,99; �ergibt. ;;;;;;;;


• ; as separator
• Not well-formed table
• empty lines, empty cells
• headers for column?

Example 2

• Multiple tables in one file
• Headers?

Example 3


• Adding computation to the table

• Meaning of values?

Example 4

• All rows have the same length
• Header available
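The "all rows have the same length" criterion is easy to check mechanically. A minimal sketch, assuming a `;`-separated file already read into a string; the two sample texts and the function names are invented for illustration, and header detection is not covered here.

```python
import csv
import io
from collections import Counter


def row_length_profile(text, delimiter=";"):
    """Count how many rows have each number of fields (blank lines count as 0)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return Counter(len(row) for row in reader)


def is_well_formed(text, delimiter=";"):
    """True if the file is non-empty and every row has the same number of fields."""
    profile = row_length_profile(text, delimiter)
    return len(profile) == 1 and 0 not in profile


# Hypothetical inputs: a regular table, and a report-style export with a
# title line and an empty line in front of the actual table.
regular = "id;name\n1;Anna\n2;Bert\n"
ragged = "Report 2013;;;\n\nid;name\n1;Anna\n"

print(is_well_formed(regular), is_well_formed(ragged))
```

The profile is also useful on its own: several distinct row lengths in one file often hint at title lines, blank lines, or multiple tables, as in Examples 1 and 2.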

Example 4

>> curl -I http://www.wolfsberg.at/fileadmin/user_upload/Downloads/Haushalt2015.csv
HTTP/1.1 200 OK
Date: Fri, 27 Nov 2015 08:35:13 GMT
Server: Apache
Last-Modified: Fri, 20 Feb 2015 08:20:48 GMT
ETag: "2800c44-2b354-50f80bad05800"
Accept-Ranges: bytes
Content-Length: 176980
Vary: Accept-Encoding
Content-Type: text/plain

• HTTP response header: Content-Type is text/plain instead of text/csv
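A publisher or harvester could check the Content-Type value automatically. The sketch below works on the header value only (no network access) and uses the standard library's `email.message.Message` to split the media type from its parameters; the function names are invented for this example. The two values tested are the ones from Example 4 and Example 6 in this talk.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, parameters)."""
    message = Message()
    message["Content-Type"] = header_value
    # get_params() lists the media type first, followed by the parameters.
    params = dict(message.get_params()[1:])
    return message.get_content_type(), params


def served_as_csv(header_value):
    """True if the media type is text/csv."""
    media_type, _ = parse_content_type(header_value)
    return media_type == "text/csv"


print(served_as_csv("text/plain"))
print(parse_content_type("text/csv; charset=utf-8; header=present"))
```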

Example 5

• All rows have the same length
• Header available
• Comment rows

Example 5:

• Sex = 1,2,3 ?
• 20133112 ? -> 2013-31-12 (year-day-month; in ISO 8601 this would be 2013-12-31)
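The date value illustrates why undocumented encodings hurt: a consumer has to guess the layout. A small sketch, assuming the value really is year-day-month as the slide suggests; the function name is invented for this example.

```python
from datetime import date, datetime


def parse_compact_date(value, layout="%Y%d%m"):
    """Parse a compact date string; the default layout is year-day-month."""
    return datetime.strptime(value, layout).date()


parsed = parse_compact_date("20133112")
print(parsed.isoformat())

# The more common year-month-day reading does not fit this value at all.
try:
    parse_compact_date("20133112", layout="%Y%m%d")
    fits_year_month_day = True
except ValueError:
    fits_year_month_day = False
print(fits_year_month_day)
```

Here the wrong guess fails loudly, but a value such as 20130302 would parse under both layouts and silently yield two different dates, which is why publishing the format (or using ISO 8601 directly) matters.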

Example 6

exhibition_id,city,title,location,datefrom,dateuntil
3,Rom,I LOVE POP,"Chiostro del Bramante, Rom",1999-03-23,1999-07-25
7,Krems,Zeitlos - Zur Kunstgeschichte der Zeit,Kunsthalle Krems,1999-05-30,1999-10-03
8,"Wien, Österreich",School of London,Kunst Haus Wien,1999-05-11,1999-08-29
10,Graz,Die Farben Schwarz,"Landesmuseum Joanneum, Graz",1999-05-28,1999-10-31
11,Hamburg,Psyche und Kunst,Universitäts-Krankenhaus Eppendorf,1999-08-07,1999-10-17
15,Graz,Michael Schuster,Galerie & Edition Artelier Graz,1999-06-01,1999-09-30
17,Prag,Crossings II,Rudolfinum Prag,1999-06-24,1999-09-12
32,Köln,"""Kunstwelten im Dialog""","Museum Ludwig, Köln",1999-11-05,2000-03-19


» curl -I http://data.mumok.at/exhibition.csv
HTTP/1.1 200 OK
Date: Thu, 26 Nov 2015 22:18:47 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Thu, 26 Nov 2015 02:03:28 GMT
ETag: "6d44-1b853-52567fb2450dd"
Accept-Ranges: bytes
Content-Length: 112723
Content-Type: text/csv; charset=utf-8; header=present
Link: </exhibition.csv-metadata.json>;rel=describedBy;type=application/csvm+json


• Metadata attached to CSV file
• Allows for a rich “semantic” description of the table and its data
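A client can discover the attached metadata from the Link header of the response. The following is a simplified sketch that handles header values of the shape shown in the curl output (a target in angle brackets followed by `;key=value` parameters); it is not a complete Link header parser, and the function names are invented for this example.

```python
import re
from urllib.parse import urljoin

# One link: <target> followed by any number of ;key=value parameters.
LINK_PATTERN = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")


def parse_link_header(value):
    """Parse a Link header value into a list of (target, params) pairs."""
    links = []
    for target, raw_params in LINK_PATTERN.findall(value):
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                key, _, val = part.strip().partition("=")
                params[key.lower()] = val.strip('"')
        links.append((target, params))
    return links


def metadata_url(csv_url, link_value):
    """Return the absolute URL of the rel=describedBy target, or None."""
    for target, params in parse_link_header(link_value):
        if params.get("rel", "").lower() == "describedby":
            return urljoin(csv_url, target)
    return None


link = "</exhibition.csv-metadata.json>;rel=describedBy;type=application/csvm+json"
print(metadata_url("http://data.mumok.at/exhibition.csv", link))
```

The link target is relative, so it has to be resolved against the URL of the CSV file before it can be fetched.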

W3C: CSV on the Web Working Group

Metadata about tabular data

Using JSON format

Allows for describing:
• the CSV dialect, including comment rows or multi-header rows, encoding, language, …
• data types and value ranges for columns
• primary key and relation to other tables
• transformation rules to convert CSV to RDF or CSV to JSON
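To make this concrete, here is a rough sketch of how a consumer might use such a description: the dialect configures the parser and the column datatypes drive value conversion. It is not a conformant CSV on the Web processor. The property names (`dialect`, `delimiter`, `quoteChar`, `tableSchema`, `columns`, `datatype`, `required`) are taken from the mumok example metadata in this talk; the sample text is a two-column cut of the mumok file, and `read_with_metadata` and `CASTS` are invented for illustration.

```python
import csv
import io

# Trimmed-down metadata using property names from the mumok example.
METADATA = {
    "dialect": {
        "delimiter": ",",
        "quoteChar": "\"",
        "doubleQuote": True,
        "headerRowCount": 1,
        "skipInitialSpace": False,
    },
    "tableSchema": {
        "columns": [
            {"name": "exhibition_id", "datatype": "integer", "required": True},
            {"name": "city", "datatype": "string"},
        ]
    },
}

CASTS = {"integer": int, "string": str}  # only the datatypes used in this sketch


def read_with_metadata(text, metadata):
    """Parse CSV text using the described dialect and cast cells per column."""
    dialect = metadata["dialect"]
    reader = csv.reader(
        io.StringIO(text),
        delimiter=dialect["delimiter"],
        quotechar=dialect["quoteChar"],
        doublequote=dialect["doubleQuote"],
        skipinitialspace=dialect["skipInitialSpace"],
    )
    rows = list(reader)[dialect["headerRowCount"]:]  # drop the header rows
    columns = metadata["tableSchema"]["columns"]
    records = []
    for row in rows:
        record = {}
        for column, cell in zip(columns, row):
            if cell == "":
                if column.get("required"):
                    raise ValueError(f"missing required value for {column['name']}")
                record[column["name"]] = None
            else:
                record[column["name"]] = CASTS[column["datatype"]](cell)
        records.append(record)
    return records


SAMPLE = 'exhibition_id,city\n3,Rom\n8,"Wien, Österreich"\n'
records = read_with_metadata(SAMPLE, METADATA)
print(records)
```

With the description in hand, no dialect guessing is needed, and identifiers arrive as integers rather than strings.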

W3C CSV on the Web WG

https://www.w3.org/2013/csvw/wiki/Main_Page

CSVM Properties



Example CSV Metadata

{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "url": "http://data.mumok.at/exhibition.csv",
  "dc:title": "Exhibitions for objects from the mumok collection",
  "dcat:keyword": ["art", "museum", "exhibition"],
  "dc:publisher": {
    "schema:name": "mumok - museum moderner kunst stiftung ludwig wien",
    "schema:url": {"@id": "http://www.mumok.at"}
  },
  "dc:license": {"@id": "https://creativecommons.org/licenses/by/3.0/at/legalcode"},
  "dc:modified": {"@value": "2015-07-04", "@type": "xsd:date"},
  …

Example CSV Metadata

"dialect": {
  "encoding": "utf-8",
  "lineTerminators": ["\r\n", "\n"],
  "quoteChar": "\"",
  "doubleQuote": true,
  "skipRows": 0,
  "commentPrefix": "#",
  "header": true,
  "headerRowCount": 1,
  "delimiter": ",",
  "skipColumns": 0,
  "skipBlankRows": false,
  "skipInitialSpace": false,
  "trim": false
},

Example CSV Metadata

"tableSchema": {
  "columns": [{
    "name": "exhibition_id",
    "titles": "Exhibition Identifier",
    "dc:description": "A unique identifier for the exhibition.",
    "datatype": "integer",
    "required": true
  }, {
    "name": "city",
    "titles": "City",
    "dc:description": "The city in which the exhibition took place (no language defined, mostly in German).",
    "datatype": "string"
  }, {

How to publish CSV on the WEB

Don’t publish CSV on the Web that is laid out for humans (e.g., Excel exports)

Follow RFC 4180

Encoding
• Use UTF-8, don’t mix encodings
• File extension: .csv
• Content-Type: text/csv

Optional, but a big improvement!
• Ideally, publish CSV metadata along with your CSV file
• Avoid acronyms or coded values (e.g., sex=1,2,3)
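On the publishing side, these recommendations could be followed with a few lines of Python. This is a sketch under assumptions: the `publish` function and the sample row are invented for illustration, the metadata file name follows the `<file>.csv-metadata.json` pattern seen in the mumok Link header, and the metadata is deliberately minimal (a real description would add datatypes, licence, and so on). Serving the file with `Content-Type: text/csv` is a web server setting and is not covered here.

```python
import csv
import json
import tempfile
from pathlib import Path


def publish(directory, name, header, rows, title):
    """Write a UTF-8 CSV file plus a minimal metadata file next to it."""
    csv_path = Path(directory) / name
    # newline="" lets the csv module write its own CRLF line ends unchanged.
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    metadata = {
        "@context": "http://www.w3.org/ns/csvw",
        "url": name,
        "dc:title": title,
        "dialect": {"encoding": "utf-8", "delimiter": ",", "header": True},
        "tableSchema": {"columns": [{"name": column} for column in header]},
    }
    metadata_path = Path(directory) / f"{name}-metadata.json"
    metadata_path.write_text(
        json.dumps(metadata, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return csv_path, metadata_path


with tempfile.TemporaryDirectory() as workdir:
    csv_path, metadata_path = publish(
        workdir,
        "exhibition.csv",
        ["exhibition_id", "city"],
        [[8, "Wien, Österreich"]],
        "Exhibitions (sample)",
    )
    csv_text = csv_path.read_bytes().decode("utf-8")
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

print(csv_text)
```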

ADEQUATe Open Data Survey

Please take part!
http://odsurvey.ai.wu.ac.at
OpenDataSurveyAustria

The aim of this questionnaire is to collect information about Open Data potentials and barriers. The survey takes about 5 to 15 minutes, depending on your willingness to also answer optional questions.

More information?

Please contact us if you have any further questions or need help/support.

Stay tuned!
• UI to clean “CSV” files and create/edit metadata
• Community/Publisher Workshop? If requested
