Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks Case

Ontotext, July 2016

Data - Content - User

• Psychographic vs demographic profiles
• Build behavioural profiles on the basis of semantic metadata associated with the assets
• Control results bias with runtime parameters
• Create semantic fingerprints of assets
• Driven off of a knowledge graph
• Automatically adapts through machine learning
• Semantic Database
• Replication Cluster for enterprise clients
• Connectors to 3rd party indexing/storage products & hybrid queries

Data Layer – the Core

[Diagram labels: Semantic Fingerprints of Content; Instance Data, Relationships, Facts; Ontology, Schema, Domain Model; GraphDB Enterprise, Node Zoom In: Master 1, Master 2, Node 1, Node 3]

Semantic Enrichment Overview

Personalization – User Actions Model

[Diagram labels: perform; comments, votes, posts, preview, read; contains; leads to; Article; Search Action; Result; Date; FTS Q; Tag; Cat; Tag set; results; cat taxonomy; Search Log]

Quick news-analytics case

• Our Dynamic Semantic Publishing platform offers linking of text with big open data graphs
• One can navigate from text to concepts, get trends, related entities and news
• Try it at http://now.ontotext.com

FF-NEWS: Data Integration and Loading

• DBpedia (the English version only): 496M statements
• Geonames (all geographic features on Earth): 150M statements
  − owl:sameAs links between DBpedia and Geonames: 471K statements
• Company registry data (GLEI): 3M statements
• News metadata (from NOW): 128M statements
• Total size: 986M statements
  − Mapped to FIBO; 667M explicit statements + 318M inferred statements
  − RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-spatial constraints

Open data integration for news analytics

Technology Semantic Content Enrichment

News Metadata

• Metadata from Ontotext's Dynamic Semantic Publishing platform
  − Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 10k news/month
  − ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
  − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
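A tag of this kind can be pictured as a small record that ties a character span in the article text to a knowledge-graph URI. The sketch below is illustrative only: the `Tag` fields, the `make_tag` helper, the sample sentence and the use of DBpedia URIs are assumptions, not the actual data model of the NOW showcase.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One annotation: a text span linked to an entity URI (hypothetical model)."""
    start: int          # offset of the first character of the mention
    end: int            # offset just past the last character of the mention
    uri: str            # knowledge-graph identifier of the entity
    entity_type: str    # e.g. "Organization", "Location", "Keyphrase"


def make_tag(text, mention, uri, entity_type):
    """Locate the first occurrence of `mention` in `text` and wrap it in a Tag."""
    start = text.index(mention)
    return Tag(start, start + len(mention), uri, entity_type)


def mentions_by_type(tags):
    """Count tags per entity type, as in a 'mentions by entity type' table."""
    return Counter(tag.entity_type for tag in tags)


text = "Google opened a new office in London."
tags = [
    make_tag(text, "Google", "http://dbpedia.org/resource/Google", "Organization"),
    make_tag(text, "London", "http://dbpedia.org/resource/London", "Location"),
]
```

Because each tag carries a URI rather than a string, two articles that spell an entity differently still point at the same node of the graph, which is what makes per-entity statistics possible.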

Apr 2016, Hidden Relationships in Data and Risk Analytics

News Metadata

Category                  Count
International News       52 074
Science and Technology   23 201
Sports                   20 714
Business                 15 155
Lifestyle                11 684
Total                   122 828

Mentions entity type      Count
Keyphrase             2 589 676
Organization          1 276 441
Location              1 260 972
Person                1 248 784
Work                    309 093
Event                   258 388
RelationPersonRole      236 638
Species                 180 946

Sample queries at http://ff-news.ontotext.com

F1: Big cities in Eastern Europe
F2: Airports near London
F3: People and organizations related to Google
F4: Top-level industries by number of companies
F5: Mentions in the news of an organization and its related entities
F7: Most popular companies per industry, including children
F8: Regional exposition of company – normalized

FF-NEWS is in beta: not officially launched, but available to play with.

News Popularity Ranking: Automotive

Rank  Company                    News   Rank  Company incl. mentions of child companies  News
1     General Motors             2722   1     General Motors                             4620
2     Tesla Motors               2346   2     Volkswagen Group                           3999
3     Volkswagen                 2299   3     Fiat Chrysler Automobiles                  2658
4     Ford Motor Company         1934   4     Tesla Motors                               2370
5     Toyota                     1325   5     Ford Motor Company                         2125
6     Chevrolet                  1264   6     Toyota                                     1656
7     Chrysler                   1054   7     Renault-Nissan Alliance                    1332
8     Fiat Chrysler Automobiles  1011   8     Honda                                       864
9     Audi AG                     972   9     BMW                                         715
10    Honda                       717   10    Takata Corporation                          547
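The right-hand ranking credits an article to a company when it mentions the company itself or any of its child companies. A minimal sketch of that roll-up, assuming a simple child-to-parent map and one set of mentioned companies per article. The hierarchy and the three toy articles are invented for illustration and are not taken from FF-NEWS.

```python
def ancestors_and_self(company, parent_of):
    """Yield the company and every company above it in the hierarchy."""
    seen = set()
    while company is not None and company not in seen:  # guard against cycles
        seen.add(company)
        yield company
        company = parent_of.get(company)


def rollup_counts(articles, parent_of):
    """Count, per company, the articles mentioning it or any of its descendants.

    Each article is counted at most once per company, even when it mentions
    several companies of the same group.
    """
    counts = {}
    for mentioned in articles:
        credited = set()
        for company in mentioned:
            credited.update(ancestors_and_self(company, parent_of))
        for company in credited:
            counts[company] = counts.get(company, 0) + 1
    return counts


# Hypothetical hierarchy and toy articles (sets of mentioned companies).
parent_of = {"Audi AG": "Volkswagen Group", "Volkswagen": "Volkswagen Group"}
articles = [{"Audi AG"}, {"Volkswagen", "Audi AG"}, {"Tesla Motors"}]
counts = rollup_counts(articles, parent_of)
```

Collecting the credited companies in a set per article is the design choice that keeps a group's total from being inflated when one article names two of its subsidiaries.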

News Popularity: Finance

Rank  Company          News   Rank  Company incl. mentions of controlled    News
1     Bloomberg LP     3203   1     Intra Bank                            261667
2     Goldman Sachs    1992   2     Hinduja Bank (Switzerland)             49731
3     JP Morgan Chase  1712   3     China Merchants Bank                   38288
4     Wells Fargo      1688   4     Alphabet Inc                           22601
5     Citigroup        1557   5     Capital Group Companies                 4076
6     HSBC Holdings    1546   6     Bloomberg LP                            3611
7     Deutsche Bank    1414   7     Exor                                    2704
8     Bank of America  1335   8     Nasdaq Inc                              2082
9     Barclays         1260   9     JP Morgan Chase                         1972
10    UBS               694   10    Sentinel Capital Partners               1053

Note: Including investment funds, stock exchanges, agencies, etc.

News Popularity: Banking

Rank  Company          News   Rank  Company incl. mentions of controlled   News
1     Goldman Sachs     996   1     China Merchants Bank                  38288
2     JP Morgan Chase   856   2     JP Morgan Chase                        1972
3     HSBC Holdings     773   3     Goldman Sachs                          1030
4     Deutsche Bank     707   4     HSBC                                    966
5     Barclays          630   5     Bank of America                         771
6     Citigroup         519   6     Deutsche Bank                           742
7     Bank of America   445   7     Barclays                                681
8     Wells Fargo       422   8     Citigroup                               630
9     UBS               347   9     Wells Fargo                             428
10    Chase             126   10    UBS                                     347

Offshore Leaks Database from ICIJ

• Published by the International Consortium of Investigative Journalists (ICIJ) on 9th of May
• A “searchable database” about 320 000 offshore companies
  − 214 000 extracted from Panama Papers (valid until 2015)
  − More than 100 000 from 2013 Offshore leaks investigation (valid until 2010)
• CSV extract from a graph database available for download
• https://offshoreleaks.icij.org

Offshore Leaks Database

Offshore Leaks DB as Linked Open Data

• Ontotext published the Offshore Leaks DB as Linked Open Data
• Available for exploration, querying and download at http://data.ontotext.com
• ONTOTEXT DISCLAIMERS
  We use the data as is provided by ICIJ. We make no representations and warranties of any kind, including warranties of title, accuracy, absence of errors or fitness for particular purpose. All transformations, query results and derivative works are used only to showcase the service and technological capabilities and not to serve as basis for any statements or conclusions.

Enrichment and structuring of the data

• Relationship type hierarchy
  − About 80 relationship types in the original dataset were organized into a property hierarchy
• Classification of officers into Person and Company
  − In the original database there is no way to distinguish whether an officer is a physical person
• Mapping to DBpedia
  − 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
  − About 3000 persons and 300 companies mapped to DBpedia
• Overall size of the repository: 22M statements (20M explicit)
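Statements that are not explicit are inferred. One source of inferred statements is a property hierarchy: a statement made with a specific relationship type also implies the same statement with each of its more general types. The sketch below shows that rule (RDFS sub-property entailment) in plain Python. The property names, the single-parent hierarchy and the sample officer and entity are hypothetical, and the rule set actually used in GraphDB may well be richer than this.

```python
def super_properties(prop, sub_property_of):
    """Return all direct and indirect super-properties of `prop`, nearest first."""
    result = []
    current = sub_property_of.get(prop)
    while current is not None and current not in result:  # stop on cycles
        result.append(current)
        current = sub_property_of.get(current)
    return result


def materialize(explicit, sub_property_of):
    """Return the statements implied by the hierarchy that are not already explicit."""
    inferred = set()
    for subject, prop, obj in explicit:
        for sup in super_properties(prop, sub_property_of):
            triple = (subject, sup, obj)
            if triple not in explicit:
                inferred.add(triple)
    return inferred


# Hypothetical hierarchy: each property has at most one parent here.
sub_property_of = {
    "shareholderOf": "hasCapitalRelation",
    "beneficialOwnerOf": "hasCapitalRelation",
    "hasCapitalRelation": "relatedTo",
    "directorOf": "relatedTo",
}
explicit = {
    ("officer/1", "shareholderOf", "entity/A"),
    ("officer/1", "directorOf", "entity/A"),
}
inferred = materialize(explicit, sub_property_of)
```

With the hierarchy in place, a query for the general property (here `relatedTo`) matches officers regardless of which of the specific relationship types was recorded in the source data.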

The RDF-ization Process

• Linked data variant produced without programming
  − The raw CSV files are RDF-ized using TARQL, http://tarql.github.io
  − Data was further interlinked and enriched in GraphDB using SPARQL
• The process is documented in this README file
• All relevant artifacts are open-source, available at https://github.com/Ontotext-AD/leaks
• The entire publishing and mapping took about 15 person-days
  − Including data.ontotext.com portal setup, promotion, documentation, etc.
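TARQL works by running a SPARQL CONSTRUCT query over the rows of a CSV file. For readers who want to see the row-to-triples idea without it, here is a plain-Python stand-in. It is a sketch for illustration, not the mapping Ontotext used: the column names, the `http://example.org/leaks/` namespace and the two predicates are all assumptions.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/leaks/"  # hypothetical namespace


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_ntriples(row):
    """Turn one CSV row (a dict) into a list of N-Triples lines."""
    node_id = quote(row["node_id"], safe="")
    subject = f"<{BASE}entity/{node_id}>"
    name = escape_literal(row["name"])
    lines = [f'{subject} <{BASE}name> "{name}" .']
    if row.get("jurisdiction"):  # skip empty cells instead of emitting empty IRIs
        code = quote(row["jurisdiction"], safe="")
        lines.append(f"{subject} <{BASE}jurisdiction> <{BASE}jurisdiction/{code}> .")
    return lines


def csv_to_ntriples(csv_text):
    """Convert CSV text with a header row into N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [line for row in reader for line in row_to_ntriples(row)]


# Toy input; the second row has no jurisdiction.
sample = 'node_id,name,jurisdiction\n1001,"ACME ""Holdings"" Ltd",BVI\n1002,Example Corp,\n'
triples = csv_to_ntriples(sample)
```

The declarative TARQL route keeps the same mapping in a single query file, which is what lets the conversion be described as done without programming.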

Sample queries at http://data.ontotext.com

Q1: Countries by number of entities related to them
Q2: Country pairs by ownership statistics
Q3: Statistics by incorporation year
Q4: Officers and entities by number of capital relations
Q5: Countries in Eastern Europe by number of owners
Q6: Intermediaries in Asia by name
Q7: The best connected officers
Q8: Countries by number of Person and Company officers
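The published queries are SPARQL and run against the repository itself. As a rough illustration of what an aggregate such as Q7 computes, the sketch below ranks officers by the number of distinct entities they are connected to. The edge list is toy data, the function name is hypothetical, and the real query may define "best connected" differently.

```python
from collections import defaultdict


def best_connected(edges, top_n=3):
    """Rank officers by the number of distinct entities they are linked to.

    `edges` is an iterable of (officer, entity) pairs; repeated pairs count once.
    Ties are broken alphabetically by officer identifier.
    """
    neighbours = defaultdict(set)
    for officer, entity in edges:
        neighbours[officer].add(entity)
    ranked = sorted(neighbours.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(officer, len(entities)) for officer, entities in ranked[:top_n]]


# Toy edge list, invented for illustration.
edges = [
    ("officer/1", "entity/A"),
    ("officer/1", "entity/B"),
    ("officer/2", "entity/A"),
    ("officer/1", "entity/A"),  # duplicate link, counted once
    ("officer/3", "entity/C"),
    ("officer/3", "entity/D"),
    ("officer/3", "entity/E"),
]
ranking = best_connected(edges)
```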

Play with semantically enriched news at http://now.ontotext.com

Play with open data at http://data.ontotext.com and http://ff-news.ontotext.com




Personalization ndash User Actions Model

perform

comments

votes

posts

preview

read

contains leads to

readleads to

preview

Article

Search Action

Result

Date

FTS Q Tag

Cat

Tag set

results

cattaxonomy

Search Log

-------------

-------------

-------------

-------------

-------------

Quick news-analytics case

bull Our Dynamic Semantic

Publishing platform

offers linking of text

with big open data

graphs

bull One can navigate from

text to concepts get

trends related entities

and news

bull Try it at httpnowontotextcom

FF-NEWS Data Integration and Loading

bull DBpedia (the English version only) 496M statements

bull Geonames (all geographic features on Earth) 150M statementsminus owlsameAs links between DBpedia and Geonames 471K statements

bull Company registry data (GLEI) 3M statements

bull News metadata (from NOW) 128M statements

bull Total size 986М statementsminus Mapped to FIBO 667M explicit statements + 318M inferred statements

minus RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-spatial constraints

Open data integration for news analytics

Technology Semantic Content Enrichment

News Metadata

• Metadata from Ontotext's Dynamic Semantic Publishing platform
  − Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News stream from Google since Feb 2015, about 10k news/month
  − ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph
  − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
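To make the idea of a tag concrete, here is a minimal sketch of an annotation as a text span paired with a URI. The class, the field names and the example.org URIs are assumptions made for illustration; they are not the actual data model of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One annotation: a span of text linked to a knowledge-graph URI."""
    start: int  # offset of the first character of the mention
    end: int    # offset just past the last character of the mention
    uri: str    # identifier of the entity or key phrase (hypothetical here)
    kind: str   # e.g. "Organization", "Location", "Keyphrase"


def mentions(text, tags):
    """Pair the surface text covered by each tag with the URI it links to."""
    return [(text[t.start:t.end], t.uri) for t in tags]


TEXT = "Google opened an office in London."
TAGS = [
    Tag(0, 6, "http://example.org/entity/Google", "Organization"),
    Tag(27, 33, "http://example.org/entity/London", "Location"),
]

if __name__ == "__main__":
    for surface, uri in mentions(TEXT, TAGS):
        print(f"{surface!r} -> {uri}")
```

Because the tag stores a URI rather than a string, two different spellings of the same organization can resolve to one node in the graph.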

Apr 2016, Hidden Relationships in Data and Risk Analytics

Category | Count
International News | 52 074
Science and Technology | 23 201
Sports | 20 714
Business | 15 155
Lifestyle | 11 684
Total | 122 828

Mentions entity type | Count
Keyphrase | 2 589 676
Organization | 1 276 441
Location | 1 260 972
Person | 1 248 784
Work | 309 093
Event | 258 388
RelationPersonRole | 236 638
Species | 180 946
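The unlabeled figure 122 828 is the sum of the five category counts, which a few lines of arithmetic confirm. The sketch below uses only the numbers listed above; the `shares` helper is an illustrative addition, not part of the original slide.

```python
# Article counts per category, copied from the category counts above.
CATEGORY_COUNTS = {
    "International News": 52_074,
    "Science and Technology": 23_201,
    "Sports": 20_714,
    "Business": 15_155,
    "Lifestyle": 11_684,
}

TOTAL = sum(CATEGORY_COUNTS.values())


def shares(counts):
    """Percentage of all articles that falls into each category."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print("Total articles:", TOTAL)
    for name, pct in shares(CATEGORY_COUNTS).items():
        print(f"{name}: {pct}%")
```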

Sample queries at http://ff-news.ontotext.com

F1: Big cities in Eastern Europe
F2: Airports near London
F3: People and organizations related to Google
F4: Top-level industries by number of companies
F5: Mentions in the news of an organization and its related entities
F7: Most popular companies per industry including children
F8: Regional exposition of company – normalized

FF-NEWS is in Beta. Not officially launched, but available to play with.
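A query such as F2 boils down to a distance constraint around a point. In the showcase this is answered by the store's geo-spatial index; the sketch below only illustrates the constraint itself with a plain great-circle (haversine) filter. The coordinates are approximate and the function names are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(center, places, radius_km):
    """Names of places no farther than radius_km from center, nearest first."""
    lat0, lon0 = center
    hits = [
        (haversine_km(lat0, lon0, lat, lon), name)
        for name, (lat, lon) in places.items()
    ]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Approximate coordinates (latitude, longitude), for illustration only.
LONDON = (51.5074, -0.1278)
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Manchester": (53.3537, -2.2750),
}

if __name__ == "__main__":
    print(within(LONDON, AIRPORTS, radius_km=50))
```

A linear scan like this is fine for a few points; an index is what makes the same constraint efficient over millions of geographic features.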

News Popularity Ranking: Automotive

Rank | Company | News | Rank | Company incl. mentions of child companies | News
1 | General Motors | 2722 | 1 | General Motors | 4620
2 | Tesla Motors | 2346 | 2 | Volkswagen Group | 3999
3 | Volkswagen | 2299 | 3 | Fiat Chrysler Automobiles | 2658
4 | Ford Motor Company | 1934 | 4 | Tesla Motors | 2370
5 | Toyota | 1325 | 5 | Ford Motor Company | 2125
6 | Chevrolet | 1264 | 6 | Toyota | 1656
7 | Chrysler | 1054 | 7 | Renault-Nissan Alliance | 1332
8 | Fiat Chrysler Automobiles | 1011 | 8 | Honda | 864
9 | Audi AG | 972 | 9 | BMW | 715
10 | Honda | 717 | 10 | Takata Corporation | 547
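The right-hand ranking credits a parent company with mentions of its child companies. A minimal sketch of such a roll-up follows, using invented company names and counts. It simply sums per-company counts, whereas the figures above may well count distinct articles, so it should not be expected to reproduce them.

```python
from collections import defaultdict

# Hypothetical direct mention counts and child -> parent links. Not real data.
DIRECT_MENTIONS = {"GroupCo": 100, "BrandA": 40, "BrandB": 25, "SoloCo": 90}
PARENT_OF = {"BrandA": "GroupCo", "BrandB": "GroupCo"}


def root_of(company, parent_of):
    """Follow parent links up to the top-level company (with a cycle guard)."""
    seen = set()
    while company in parent_of and company not in seen:
        seen.add(company)
        company = parent_of[company]
    return company


def rolled_up_mentions(direct, parent_of):
    """Sum each company's mentions into its top-level parent, largest first."""
    totals = defaultdict(int)
    for company, count in direct.items():
        totals[root_of(company, parent_of)] += count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for company, count in rolled_up_mentions(DIRECT_MENTIONS, PARENT_OF):
        print(company, count)
```

In a knowledge graph the parent links come from the company data itself, which is why the ranking changes once ownership relations are taken into account.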

News Popularity: Finance

Rank | Company | News | Rank | Company incl. mentions of controlled | News
1 | Bloomberg LP | 3203 | 1 | Intra Bank | 261667
2 | Goldman Sachs | 1992 | 2 | Hinduja Bank (Switzerland) | 49731
3 | JP Morgan Chase | 1712 | 3 | China Merchants Bank | 38288
4 | Wells Fargo | 1688 | 4 | Alphabet Inc | 22601
5 | Citigroup | 1557 | 5 | Capital Group Companies | 4076
6 | HSBC Holdings | 1546 | 6 | Bloomberg LP | 3611
7 | Deutsche Bank | 1414 | 7 | Exor | 2704
8 | Bank of America | 1335 | 8 | Nasdaq Inc | 2082
9 | Barclays | 1260 | 9 | JP Morgan Chase | 1972
10 | UBS | 694 | 10 | Sentinel Capital Partners | 1053

Note: Including investment funds, stock exchanges, agencies, etc.

News Popularity: Banking

Rank | Company | News | Rank | Company incl. mentions of controlled | News
1 | Goldman Sachs | 996 | 1 | China Merchants Bank | 38288
2 | JP Morgan Chase | 856 | 2 | JP Morgan Chase | 1972
3 | HSBC Holdings | 773 | 3 | Goldman Sachs | 1030
4 | Deutsche Bank | 707 | 4 | HSBC | 966
5 | Barclays | 630 | 5 | Bank of America | 771
6 | Citigroup | 519 | 6 | Deutsche Bank | 742
7 | Bank of America | 445 | 7 | Barclays | 681
8 | Wells Fargo | 422 | 8 | Citigroup | 630
9 | UBS | 347 | 9 | Wells Fargo | 428
10 | Chase | 126 | 10 | UBS | 347

Offshore Leaks Database from ICIJ

• Published by the International Consortium of Investigative Journalists (ICIJ) on 9th of May
• A "searchable database" about 320 000 offshore companies
  − 214 000 extracted from Panama Papers (valid until 2015)
  − More than 100 000 from 2013 Offshore leaks investigation (valid until 2010)
• CSV extract from a graph database available for download
• https://offshoreleaks.icij.org


Offshore Leaks Database


Offshore Leaks DB as Linked Open Data

• Ontotext published the Offshore Leaks DB as Linked Open Data
• Available for exploration, querying and download at http://data.ontotext.com
• ONTOTEXT DISCLAIMERS

We use the data as is provided by ICIJ. We make no representations and warranties of any kind, including warranties of title, accuracy, absence of errors or fitness for particular purpose. All transformations, query results and derivative works are used only to showcase the service and technological capabilities and not to serve as basis for any statements or conclusions.


Enrichment and structuring of the data

• Relationship type hierarchy
  − About 80 relationship types in the original dataset were organized in a property hierarchy
• Classification of officers into Person and Company
  − In the original database there is no way to distinguish whether an officer is a physical person
• Mapping to DBpedia
  − 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
  − About 3000 persons and 300 companies mapped to DBpedia
• Overall size of the repository: 22M statements (20M explicit)
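A property hierarchy lets a store derive extra statements: a statement made with a specific relationship also holds for each of its more general relationships. The sketch below illustrates that rule in plain Python. The property names and the hierarchy fragment are hypothetical, and the code is not how GraphDB implements inference.

```python
# Hypothetical fragment of a property hierarchy: sub-property -> super-property.
SUB_PROPERTY_OF = {
    "shareholderOf": "ownerOf",
    "beneficialOwnerOf": "ownerOf",
    "ownerOf": "relatedTo",
    "directorOf": "relatedTo",
}


def super_properties(prop, hierarchy):
    """All properties above prop in the hierarchy, nearest first."""
    chain = []
    while prop in hierarchy:
        prop = hierarchy[prop]
        if prop in chain:  # guard against an accidental cycle
            break
        chain.append(prop)
    return chain


def infer(triples, hierarchy):
    """Statements implied by the hierarchy that are not already explicit."""
    explicit = set(triples)
    inferred = set()
    for subject, prop, obj in explicit:
        for parent in super_properties(prop, hierarchy):
            inferred.add((subject, parent, obj))
    return inferred - explicit


if __name__ == "__main__":
    explicit = [("officer1", "shareholderOf", "entity1")]
    for triple in sorted(infer(explicit, SUB_PROPERTY_OF)):
        print(triple)
```

With such a hierarchy in place, a query for the general relationship also finds statements recorded with any of the more specific ones.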


The RDF-ization Process

• Linked data variant produced without programming
  − The raw CSV files are RDF-ized using TARQL, http://tarql.github.io
  − Data was further interlinked and enriched in GraphDB using SPARQL
• The process is documented in this README file
• All relevant artifacts are open-source, available at https://github.com/Ontotext-AD/leaks
• The entire publishing and mapping took about 15 person-days
  − Including data.ontotext.com portal setup, promotion, documentation, etc.
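The actual conversion was done declaratively with TARQL, not with hand-written code. Purely to illustrate what turning CSV rows into RDF statements means, here is a small Python sketch that emits N-Triples lines. The column names, the example.org namespace and the sample rows are all assumptions and do not reflect the real ICIJ files or the published mapping.

```python
import csv
import io

BASE = "http://example.org/leaks/"  # hypothetical namespace


def escape_literal(value):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text):
    """Turn each CSV row into one triple per non-empty column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + "entity/" + row["node_id"] + ">"
        for column in ("name", "country"):
            value = row[column]
            if value:  # skip empty cells instead of emitting empty literals
                predicate = "<" + BASE + column + ">"
                lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical sample input; the second row has no country.
SAMPLE_CSV = (
    "node_id,name,country\n"
    '1,"Example Holdings ""A"" Ltd",Samoa\n'
    "2,Example Trust,\n"
)

if __name__ == "__main__":
    print("\n".join(rows_to_ntriples(SAMPLE_CSV)))
```

A TARQL mapping expresses the same row-to-triple pattern as a SPARQL CONSTRUCT query over the CSV columns, which is why no programming was needed.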






News Metadata

Apr 2016Hidden Relationships in Data and Risk Analytics

Category Count

International News 52 074

Science and Technology 23 201

Sports 20 714

Business 15 155

Lifestyle 11 684

122 828

Mentions entity type Count

Keyphrase 2 589 676

Organization 1 276 441

Location 1 260 972

Person 1 248 784

Work 309 093

Event 258 388

RelationPersonRole 236 638

Species 180 946

Sample queries at httpff-newsontotextcom

F1 Big cities in Eastern Europe

F2 Airports near London

F3 People and organizations related to Google

F4 Top-level industries by number of companies

F5 Mentions in the news of an organization and its related entities

F7 Most popular companies per industry including children

F8 Regional exposition of company ndash normalized

FF-NEWS is in Beta Not officially launched but available to play with

Open data integration for news analytics

News Popularity Ranking: Automotive

Rank  Company                    News  Rank  Company incl. mentions of child companies  News
1     General Motors             2722  1     General Motors                             4620
2     Tesla Motors               2346  2     Volkswagen Group                           3999
3     Volkswagen                 2299  3     Fiat Chrysler Automobiles                  2658
4     Ford Motor Company         1934  4     Tesla Motors                               2370
5     Toyota                     1325  5     Ford Motor Company                         2125
6     Chevrolet                  1264  6     Toyota                                     1656
7     Chrysler                   1054  7     Renault-Nissan Alliance                    1332
8     Fiat Chrysler Automobiles  1011  8     Honda                                      864
9     Audi AG                    972   9     BMW                                        715
10    Honda                      717   10    Takata Corporation                         547
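The right-hand ranking credits a parent company with the news mentions of its child companies. The following is a minimal sketch of such a roll-up, not the FF-NEWS implementation: the company names, counts and parent links are hypothetical, and ranking only top-level companies is an assumption about how the right-hand column was built.

```python
from collections import defaultdict

# Hypothetical direct mention counts per company (not taken from the slides).
direct_mentions = {
    "ParentCo": 100,
    "BrandA": 40,
    "BrandB": 25,
    "SubBrandB1": 5,
    "IndependentCo": 60,
}

# Hypothetical child -> parent links.
parent_of = {
    "BrandA": "ParentCo",
    "BrandB": "ParentCo",
    "SubBrandB1": "BrandB",
}


def rolled_up_mentions(direct, parents):
    """Add each company's direct count to itself and to every ancestor."""
    totals = defaultdict(int)
    for company, count in direct.items():
        node = company
        seen = set()  # guards against accidental cycles in the links
        while node is not None and node not in seen:
            seen.add(node)
            totals[node] += count
            node = parents.get(node)
    return dict(totals)


def top_level_ranking(totals, parents):
    """Rank companies that have no parent, highest count first."""
    roots = {name: n for name, n in totals.items() if name not in parents}
    return sorted(roots.items(), key=lambda kv: (-kv[1], kv[0]))


totals = rolled_up_mentions(direct_mentions, parent_of)
ranked = top_level_ranking(totals, parent_of)
for rank, (company, count) in enumerate(ranked, start=1):
    print(rank, company, count)
```

In a graph database the same roll-up would normally be expressed as a query over the company hierarchy rather than as application code.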

News Popularity: Finance

Rank  Company          News  Rank  Company incl. mentions of controlled  News
1     Bloomberg LP     3203  1     Intra Bank                            261667
2     Goldman Sachs    1992  2     Hinduja Bank (Switzerland)            49731
3     JP Morgan Chase  1712  3     China Merchants Bank                  38288
4     Wells Fargo      1688  4     Alphabet Inc                          22601
5     Citigroup        1557  5     Capital Group Companies               4076
6     HSBC Holdings    1546  6     Bloomberg LP                          3611
7     Deutsche Bank    1414  7     Exor                                  2704
8     Bank of America  1335  8     Nasdaq Inc                            2082
9     Barclays         1260  9     JP Morgan Chase                       1972
10    UBS              694   10    Sentinel Capital Partners             1053

Note: Including investment funds, stock exchanges, agencies, etc.

News Popularity: Banking

Rank  Company          News  Rank  Company incl. mentions of controlled  News
1     Goldman Sachs    996   1     China Merchants Bank                  38288
2     JP Morgan Chase  856   2     JP Morgan Chase                       1972
3     HSBC Holdings    773   3     Goldman Sachs                         1030
4     Deutsche Bank    707   4     HSBC                                  966
5     Barclays         630   5     Bank of America                       771
6     Citigroup        519   6     Deutsche Bank                         742
7     Bank of America  445   7     Barclays                              681
8     Wells Fargo      422   8     Citigroup                             630
9     UBS              347   9     Wells Fargo                           428
10    Chase            126   10    UBS                                   347

Offshore Leaks Database from ICIJ

• Published by the International Consortium of Investigative Journalists (ICIJ) on the 9th of May
• A "searchable database" about 320 000 offshore companies
  − 214 000 extracted from the Panama Papers (valid until 2015)
  − More than 100 000 from the 2013 Offshore Leaks investigation (valid until 2010)
• CSV extract from a graph database, available for download
• https://offshoreleaks.icij.org


Offshore Leaks Database


Offshore Leaks DB as Linked Open Data

• Ontotext published the Offshore Leaks DB as Linked Open Data
• Available for exploration, querying and download at http://data.ontotext.com
• ONTOTEXT DISCLAIMERS
We use the data as is provided by ICIJ. We make no representations and warranties of any kind, including warranties of title, accuracy, absence of errors or fitness for particular purpose. All transformations, query results and derivative works are used only to showcase the service and technological capabilities and not to serve as basis for any statements or conclusions.


Enrichment and structuring of the data

• Relationship type hierarchy
  − About 80 relationship types in the original dataset were organized into a property hierarchy
• Classification of officers into Person and Company
  − In the original database there is no way to distinguish whether an officer is a physical person
• Mapping to DBpedia
  − 209 countries referred to in the Offshore Leaks DB are mapped to DBpedia
  − About 3000 persons and 300 companies are mapped to DBpedia
• Overall size of the repository: 22M statements (20M explicit)
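To illustrate why a property hierarchy is useful, and where the gap between total and explicit statements can come from, here is a minimal sketch of sub-property inference in plain Python. The relation names, officers and entities are hypothetical and are not the vocabulary used in the published dataset. GraphDB performs this kind of inference itself, so the sketch is conceptual only.

```python
# Hypothetical hierarchy: child relation -> parent relation.
sub_property_of = {
    "shareholder_of": "has_capital_relation",
    "beneficial_owner_of": "has_capital_relation",
    "director_of": "has_officer_role",
    "has_capital_relation": "related_to",
    "has_officer_role": "related_to",
}

# Hypothetical explicit statements (subject, property, object).
explicit = {
    ("officer_1", "shareholder_of", "entity_A"),
    ("officer_2", "director_of", "entity_A"),
    ("officer_1", "beneficial_owner_of", "entity_B"),
}


def super_properties(prop, hierarchy):
    """Yield every ancestor of prop, nearest first."""
    seen = set()  # guards against cycles in the hierarchy
    while prop in hierarchy and prop not in seen:
        seen.add(prop)
        prop = hierarchy[prop]
        yield prop


def infer(statements, hierarchy):
    """Return statements implied by the hierarchy but not stated explicitly."""
    inferred = set()
    for s, p, o in statements:
        for parent in super_properties(p, hierarchy):
            inferred.add((s, parent, o))
    return inferred - statements


inferred = infer(explicit, sub_property_of)
all_statements = explicit | inferred

# One pattern on the parent property now matches both specific relation types.
capital = sorted(t for t in all_statements if t[1] == "has_capital_relation")
print(len(explicit), "explicit,", len(inferred), "inferred")
print(capital)
```

A query such as "officers and entities by number of capital relations" can then refer to a single parent property instead of enumerating every specific relationship type.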


The RDF-ization Process

• Linked data variant produced without programming
  − The raw CSV files are RDF-ized using TARQL, http://tarql.github.io
  − Data was further interlinked and enriched in GraphDB using SPARQL
• The process is documented in this README file
• All relevant artifacts are open source, available at https://github.com/Ontotext-AD/leaks
• The entire publishing and mapping took about 15 person-days
  − Including data.ontotext.com portal setup, promotion, documentation, etc.
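TARQL expresses the CSV-to-RDF mapping declaratively, in SPARQL syntax, which is why no programming was needed. Purely to show what such a mapping does to each row, here is a Python sketch that turns CSV rows into N-Triples lines. The namespace, the column names and the sample rows are all hypothetical and do not reflect the actual ICIJ files or the mappings in the repository.

```python
import csv
import io

BASE = "http://example.org/leaks/"  # hypothetical namespace

# Hypothetical CSV extract; the real files have different columns.
SAMPLE_CSV = (
    "node_id,name,jurisdiction\n"
    "1001,Example Holdings Ltd,XX\n"
    '1002,"Sample Trading, Inc.",YY\n'
)


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Map one CSV row to N-Triples lines: one subject IRI, two properties."""
    subject = "<" + BASE + "entity/" + row["node_id"] + ">"
    name = escape_literal(row["name"])
    jurisdiction = "<" + BASE + "jurisdiction/" + row["jurisdiction"] + ">"
    yield subject + " <" + BASE + "vocab/name> \"" + name + "\" ."
    yield subject + " <" + BASE + "vocab/jurisdiction> " + jurisdiction + " ."


def rdfize(csv_text):
    """Convert CSV text to a list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [triple for row in reader for triple in row_to_triples(row)]


triples = rdfize(SAMPLE_CSV)
print("\n".join(triples))
```

Minting an IRI for the jurisdiction instead of keeping it as a string is what later allows it to be linked to an external resource such as a DBpedia country.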


Sample queries at http://data.ontotext.com

Q1: Countries by number of entities related to them
Q2: Country pairs by ownership statistics
Q3: Statistics by incorporation year
Q4: Officers and entities by number of capital relations
Q5: Countries in Eastern Europe by number of owners
Q6: Intermediaries in Asia by name
Q7: The best connected officers
Q8: Countries by number of Person and Company officers
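The sample queries are mostly aggregations over the graph. As a rough illustration of the logic behind a query like Q1, the sketch below counts distinct entities per country and sorts the result, which is what a SPARQL query would do with GROUP BY, COUNT(DISTINCT ...) and ORDER BY DESC. The entity-to-country links are hypothetical toy data, not results from the published dataset.

```python
from collections import Counter

# Hypothetical (entity, country) links; an entity may relate to several countries.
entity_country = [
    ("entity_A", "Country X"),
    ("entity_A", "Country Y"),
    ("entity_B", "Country X"),
    ("entity_C", "Country X"),
    ("entity_C", "Country X"),  # duplicate link, must be counted once
    ("entity_D", "Country Z"),
]


def countries_by_entity_count(links):
    """Count distinct entities per country, highest count first."""
    distinct_links = set(links)  # drop duplicate (entity, country) pairs
    counts = Counter(country for _, country in distinct_links)
    # Sort by descending count, then by name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


result = countries_by_entity_count(entity_country)
for country, n in result:
    print(country, n)
```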

Play with semantically enriched news at http://now.ontotext.com

Play with open data at http://data.ontotext.com and http://ff-news.ontotext.com
