ETL into Neo4j
Max De Marzi
About Me
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi
Built the Neography Gem (Ruby wrapper to the Neo4j REST API)
Playing with Neo4j since 10/2009
Agenda
• ETL your mind
• ETL with Batch and the REST API
• ETL with Gremlin and Groovy
• ETL with the Batch Importer
• ETL from SQL
ETL your Mind
You have to start there
More Relational than Relational
Stop thinking about how tables are related
Start thinking about relationships
Objects like to mingle
Optimized for “trees” of data
Optimized for seeing the forest and the trees, and the branches, and the trunks
SELECT skills.*, user_skill.*
FROM users
JOIN user_skill ON users.id = user_skill.user_id
JOIN skills ON user_skill.skill_id = skills.id
WHERE users.id = 1
START user = node(1) MATCH user -[user_skill]-> skill RETURN skill, user_skill
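To make the contrast between the SQL join and the Cypher traversal concrete, here is a minimal in-memory sketch in Python. The table and column names follow the SQL query; the sample rows, the "level" property and both function names are made up for illustration and are not from the talk.

```python
# Hypothetical sample rows; table and column names follow the SQL query.
skills = [{"id": 10, "name": "ruby"}, {"id": 11, "name": "neo4j"}]
user_skill = [
    {"user_id": 1, "skill_id": 10, "level": 3},
    {"user_id": 1, "skill_id": 11, "level": 5},
]


def skills_by_join(user_id):
    """Relational style: scan the join table, then look each skill up by id."""
    skills_by_id = {s["id"]: s for s in skills}
    return [
        (skills_by_id[us["skill_id"]], us)
        for us in user_skill
        if us["user_id"] == user_id
    ]


# Graph style: each node carries its own outgoing relationships,
# stored here as (relationship properties, target node) pairs.
graph = {
    1: [({"level": 3}, {"name": "ruby"}), ({"level": 5}, {"name": "neo4j"})],
}


def skills_by_traversal(user_node_id):
    """Start at one node and follow its relationships; no join table to scan."""
    return [(skill, rel) for rel, skill in graph[user_node_id]]
```

Both functions return the same skills for user 1. The difference is that the traversal only touches what hangs off the starting node.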
Property Graph
Graph model: Language (name, code, word_count) -[IS_SPOKEN_IN (as_primary)]-> Country (name, code, flag_uri)
Relational model: Language (language_code, language_name, word_count), Country (country_code, country_name, flag_uri), LanguageCountry (language_code, country_code, primary)
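A rough sketch of how the three relational tables could be turned into the property graph: each Language and Country row becomes a node, and each LanguageCountry row becomes an IS_SPOKEN_IN relationship carrying as_primary. The column names come from the slide; the sample values and the to_graph helper are assumptions for illustration only.

```python
# Hypothetical rows shaped like the three relational tables on the slide.
# All values, including the word counts, are placeholders.
languages = [
    {"language_code": "en", "language_name": "English", "word_count": 1000},
    {"language_code": "fr", "language_name": "French", "word_count": 900},
]
countries = [
    {"country_code": "CA", "country_name": "Canada", "flag_uri": "ca.png"},
]
language_country = [
    {"language_code": "en", "country_code": "CA", "primary": True},
    {"language_code": "fr", "country_code": "CA", "primary": False},
]


def to_graph(languages, countries, language_country):
    """Rows become nodes; join-table rows become relationships."""
    nodes = {}
    for row in languages:
        nodes[("Language", row["language_code"])] = {
            "name": row["language_name"],
            "code": row["language_code"],
            "word_count": row["word_count"],
        }
    for row in countries:
        nodes[("Country", row["country_code"])] = {
            "name": row["country_name"],
            "code": row["country_code"],
            "flag_uri": row["flag_uri"],
        }
    # The join table disappears: its foreign keys pick the two end nodes,
    # and its remaining column becomes a relationship property.
    rels = [
        (
            ("Language", row["language_code"]),
            "IS_SPOKEN_IN",
            ("Country", row["country_code"]),
            {"as_primary": row["primary"]},
        )
        for row in language_country
    ]
    return nodes, rels


nodes, rels = to_graph(languages, countries, language_country)
```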
As a property: (name: “Canada”, languages_spoken: “[ ‘English’, ‘French’ ]”)
As nodes: (language: “English”) -[spoken_in]-> (name: “Canada”) and (name: “USA”); (language: “French”) -[spoken_in]-> (name: “Canada”) and (name: “France”)
Denormalized: Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code, currency_name)
As a graph: Country (name, flag_uri) -[SPEAKS]-> Language (name, number_of_words, yes, no); Country -[USES_CURRENCY]-> Currency (code, name)
ETL with Batch and the REST API
Batch command from REST API
Great for importing Facebook/Twitter friends
Keep each request under 10k commands
Preferably send a request every 2k to 5k commands
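The chunking advice can be sketched as follows. This assumes the job shape used by the legacy Neo4j REST batch endpoint (a JSON array of jobs with method, to, body and id fields); the friend data, the helper names and the chunk size of 2000 are illustrative choices, and nothing is actually sent over the network here.

```python
import json


def node_command(job_id, properties):
    """One 'create node' job, in the shape assumed for the batch endpoint."""
    return {"method": "POST", "to": "/node", "body": properties, "id": job_id}


def chunked(commands, size=2000):
    """Yield successive slices of at most `size` commands."""
    for start in range(0, len(commands), size):
        yield commands[start:start + size]


# Made-up input: 4500 friends to import.
friends = [{"name": "friend %d" % i} for i in range(4500)]
commands = [node_command(i, props) for i, props in enumerate(friends)]

# One JSON payload per request; each would be POSTed to <server>/db/data/batch.
payloads = [json.dumps(chunk) for chunk in chunked(commands)]
```

With 4500 commands and a chunk size of 2000 this produces three requests, each comfortably under the 10k ceiling.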
Using Batch from Neography
Why Batch
Transactional: if any command fails, nothing is committed.
Ordered: responses guaranteed to be in the same order as sent.
Continuous loading/updating nodes and relationships in spurts or streaming.
ETL with Gremlin and Groovy
Commit every 1000 changes or so, and make sure to stop the transaction at the very end so the last few changes are committed.
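The commit pattern itself is independent of Gremlin, so here is a small Python sketch of it. RecordingGraph is a hypothetical stand-in that only counts calls; it is not the Gremlin or Neo4j API, and the batch size of 1000 simply mirrors the slide.

```python
class RecordingGraph:
    """Stand-in for a transactional graph; it only counts what happens."""

    def __init__(self):
        self.pending = 0
        self.committed = 0
        self.commits = 0

    def add_vertex(self, properties):
        self.pending += 1

    def commit(self):
        self.committed += self.pending
        self.pending = 0
        self.commits += 1


def load(graph, rows, batch_size=1000):
    """Add every row, committing after each full batch and once at the end."""
    for count, row in enumerate(rows, start=1):
        graph.add_vertex(row)
        if count % batch_size == 0:
            graph.commit()
    # Final commit picks up the rows left over after the last full batch.
    graph.commit()


graph = RecordingGraph()
load(graph, ({"n": i} for i in range(2500)))
```

Without the final commit, the last 500 of these 2500 rows would stay pending and never reach the database.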
Look into auto-indexing to make life easier.
It is disabled by default. See the docs for a trick to make it a full-text index instead of an exact index.
http://docs.neo4j.org/chunked/milestone/auto-indexing.html
Crazy format is OK
Id :: Title :: Genre|Genre|Genre
But it’s preferable to steer clear of characters that need escaping, like “|”
The data file's location is given as a string, converted to a URL, then processed one line at a time. For each line, a movie vertex is created, a genre vertex is created unless it already exists (index lookup), and an edge from the movie to the genre is created.
Full walk-through on http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-one/
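The per-line processing described here can be sketched in Python rather than Groovy. The line format is the one from the slide; the sample lines, the function names and the use of plain dicts in place of the graph and its index are assumptions for illustration.

```python
def parse_line(line):
    """Split 'Id :: Title :: Genre|Genre|Genre' into its three parts."""
    movie_id, title, genres = [part.strip() for part in line.split("::")]
    # str.split takes a literal separator, so "|" needs no escaping here.
    return movie_id, title, genres.split("|")


def load_movies(lines):
    movies = {}       # movie id -> title (stands in for movie vertices)
    genre_index = {}  # genre name -> genre vertex (stands in for the index)
    edges = []        # (movie id, genre name) pairs
    for line in lines:
        line = line.strip()
        if not line:
            continue
        movie_id, title, genres = parse_line(line)
        movies[movie_id] = title
        for genre in genres:
            # Create the genre vertex only if the index lookup misses.
            genre_index.setdefault(genre, {"name": genre})
            edges.append((movie_id, genre))
    return movies, genre_index, edges


# Made-up sample lines in the slide's format.
sample = [
    "1 :: First Movie :: Comedy|Drama",
    "2 :: Second Movie :: Drama",
]
movies, genre_index, edges = load_movies(sample)
```

"Drama" appears on both lines but ends up as a single genre vertex with two incoming edges.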
ETL with the Batch Importer
Installation Walk-Through
Testing it
7.5M nodes, 42M relationships in just over 3 minutes on a laptop.
Loading it into Neo4j
Full walk-through on http://maxdemarzi.com/2012/02/28/batch-importer-part-1/
When to use the Batch Importer?
• 1st time loading or periodic reloading
• When you need Speed
• When you don’t mind a little Java
ETL from SQL
Identities who vouched for each other
row_number() and INTO are our friends
The “term” vouched for will serve as our relationship type; status is a relationship property.
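A hedged sketch of the numbering step, run from Python against an in-memory SQLite database. The vouches table, its columns and its rows are hypothetical, since the original schema is not in the transcript. SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for it, and row_number() needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table: who vouched for whom, for what, with what status.
    CREATE TABLE vouches (voucher TEXT, vouchee TEXT, term TEXT, status TEXT);
    INSERT INTO vouches VALUES
        ('alice', 'bob',   'ruby',  'accepted'),
        ('bob',   'carol', 'neo4j', 'pending'),
        ('alice', 'carol', 'sql',   'accepted');

    -- Number every distinct identity; the row number becomes the node id.
    CREATE TABLE nodes AS
        SELECT row_number() OVER (ORDER BY name) AS node_id, name
        FROM (SELECT voucher AS name FROM vouches
              UNION
              SELECT vouchee FROM vouches);

    -- Swap names for node ids; term becomes the type, status a property.
    CREATE TABLE rels AS
        SELECT a.node_id AS start_id, b.node_id AS end_id,
               v.term AS type, v.status AS status
        FROM vouches v
        JOIN nodes a ON a.name = v.voucher
        JOIN nodes b ON b.name = v.vouchee;
""")

nodes = conn.execute(
    "SELECT node_id, name FROM nodes ORDER BY node_id").fetchall()
rels = conn.execute(
    "SELECT start_id, end_id, type, status FROM rels "
    "ORDER BY start_id, end_id").fetchall()
```

The two result sets have the shape of a nodes file and a relationships file: names in id order, then start, end, type and property columns.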
Notice there are no node ids. These are automatic: clkao is node 1.
No time to get coffee >8-[
What about multiple types of nodes?
No problem, just add the MAX(node_id) from the first table.
Full walk-through at: http://maxdemarzi.com/2012/02/28/batch-importer-part-2/
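The offset trick for a second node type amounts to simple arithmetic, sketched below. The two node types, their names and the number_rows helper are made up for illustration.

```python
def number_rows(names, offset=0):
    """Assign 1-based ids in order, shifted by `offset`."""
    return {name: offset + i for i, name in enumerate(names, start=1)}


# First node type gets ids 1..3.
people = number_rows(["person a", "person b", "person c"])

# Second node type starts right after MAX(node_id) of the first.
max_person_id = max(people.values())
skills = number_rows(["skill x", "skill y"], offset=max_person_id)
```

Because the second table's ids begin where the first table's end, the two types never collide in the combined nodes file.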
Need help? E-mail me, catch me on Google chat or Skype.
Please don’t be shy…. and read my blog:
http://maxdemarzi.com
Thank you!
http://maxdemarzi.com