Scraping in 20 mins

Posted on 06-May-2015

Paul Bradshaw
Leanpub.com/scrapingforjournalists

Friday, 13 July 2012

Function (Parameters)

=SUM(A2:A50)
=AVERAGE(B2:B300)
=COUNTIF(A10:A3000,"Smith")
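The function-plus-parameters idea carries straight over from spreadsheets to code. A minimal Python sketch, with made-up sample lists standing in for the spreadsheet ranges; `countif` is a hypothetical helper that only counts exact matches, whereas the spreadsheet COUNTIF also accepts conditions such as ">100".

```python
# Made-up sample data standing in for spreadsheet ranges (hypothetical).
amounts = [120, 80, 100]
surnames = ["Smith", "Jones", "Smith", "Patel"]


def countif(values, criterion):
    """Count the values equal to criterion: an exact-match version of COUNTIF."""
    return sum(1 for value in values if value == criterion)


# Each call is a function name followed by its parameters in brackets.
print(sum(amounts))                  # like =SUM(...): 300
print(sum(amounts) / len(amounts))   # like =AVERAGE(...): 100.0
print(countif(surnames, "Smith"))    # like =COUNTIF(..., "Smith"): 2
```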

("string", index)
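Assuming this slide describes parameters of the kind Google Sheets' =IMPORTHTML takes (a URL string, the string "table" or "list", and an index counting from 1), here is a rough standard-library Python sketch of the "table" case. `TableCollector` and `import_html_table` are hypothetical names, the HTML is passed in as a string rather than fetched, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Gathers the text of each <td>/<th> cell, grouped by row and by table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of cell texts
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def import_html_table(html, index):
    """Return the rows of the index-th table in html, counting from 1."""
    if index < 1:
        raise ValueError("index counts from 1")
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    return collector.tables[index - 1]


page = """
<table><tr><td>ignore me</td></tr></table>
<table>
  <tr><th>Name</th><th>Votes</th></tr>
  <tr><td>Smith</td><td>120</td></tr>
</table>
"""
print(import_html_table(page, 2))  # [['Name', 'Votes'], ['Smith', '120']]
```

Counting from 1 mirrors the spreadsheet convention; Python lists themselves count from 0, hence the `index - 1`.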

Tip: search for documentation

Tip: search for structure around data

//div[starts-with(@class, 'jobWrap')]
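This XPath expression selects every div whose class attribute begins with "jobWrap". Python's built-in ElementTree supports only a subset of XPath and has no starts-with(), so this sketch reproduces the same test by hand on a made-up, well-formed snippet; `divs_with_class_prefix` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment; real pages are rarely this tidy.
snippet = """<html><body>
  <div class="jobWrap odd"><h2>Reporter</h2></div>
  <div class="jobWrap even"><h2>Sub-editor</h2></div>
  <div class="advert"><h2>Buy now</h2></div>
</body></html>"""


def divs_with_class_prefix(root, prefix):
    """Hand-written stand-in for //div[starts-with(@class, prefix)]."""
    # root.iter("div") walks every <div> in the tree; a missing class counts as "".
    return [div for div in root.iter("div")
            if div.get("class", "").startswith(prefix)]


root = ET.fromstring(snippet)
for div in divs_with_class_prefix(root, "jobWrap"):
    print(div.findtext("h2"))
```

With the third-party lxml library, the original expression can instead be passed unchanged to a parsed element's `.xpath()` method.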

bit.ly/nrwscraper2

excelnotes.posterous.com/tag/importxml/tag/importhtml

Things to know

• Libraries
• Functions
• Variables
• Lists or arrays ['Bob', 'Jane']
• Index
• String, integer, float
• If/else
• For loops
• Operators
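Each item in the list above can be seen in a few lines of code. This sketch assumes Python, which the list and loop examples in these slides resemble; the names and values are made up for illustration.

```python
import math                              # a library: ready-made code you import

names = ["Bob", "Jane"]                  # a variable holding a list
first = names[0]                         # an index: Python counts from 0, so "Bob"
text, whole, decimal = "Smith", 42, 3.5  # a string, an integer and a float


def describe(value):                     # a function with one parameter
    if isinstance(value, str):           # if/else picks one branch
        return "a string"
    elif isinstance(value, int):
        return "an integer"
    else:
        return "a float"


for item in [text, whole, decimal]:      # a for loop visits each item in turn
    print(item, "is", describe(item))

# Operators (+ and >) plus a function from the math library.
print(whole + 8, whole > 40, math.floor(decimal))
```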

Following the data

• From String (URL) ->
• Variable (html) ->
• Variable (root) ->
• Variable containing a list (tds) ->
• Variable (td)
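The chain above maps onto a few lines of code. The slides do not show the original code, so this is only a standard-library sketch: `fetch` and `cell_texts` are hypothetical names, and `ET.fromstring` accepts only well-formed XML. Real web pages usually need a more forgiving HTML parser (lxml.html is a common choice).

```python
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def fetch(url):
    """String (URL) -> variable html: download the page as text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def cell_texts(html):
    """Variable html -> root -> tds -> td: parse, select cells, read each one."""
    root = ET.fromstring(html)                          # the parsed page
    tds = root.findall(".//td")                         # a list of every <td>
    return ["".join(td.itertext()).strip() for td in tds]  # text of each td


# In real use: page = fetch("https://example.com/some-table.html")  (hypothetical URL)
page = ("<table><tr><td>Duarte</td><td>Sihl</td></tr>"
        "<tr><td>Franzi</td><td>Paul</td></tr></table>")
print(cell_texts(page))  # ['Duarte', 'Sihl', 'Franzi', 'Paul']
```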

Looping through a list

• tds = ['Duarte', 'Sihl', 'Franzi', 'Paul']
• for td in tds:
• The first time, td = 'Duarte'
• The second time, td = 'Sihl'
• Then td = 'Franzi'
• Then td = 'Paul'
• Then it has finished the loop!
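The same walk-through as running Python, using the names from the slide; `enumerate` is added only to number each pass, and the `seen` list exists just to record what the loop visited.

```python
tds = ["Duarte", "Sihl", "Franzi", "Paul"]

seen = []
for turn, td in enumerate(tds, start=1):  # turn counts the passes: 1, 2, 3, 4
    print(f"Pass {turn}: td = {td}")
    seen.append(td)
print("Finished the loop after", len(seen), "passes")
```

This prints `Pass 1: td = Duarte` through `Pass 4: td = Paul`, then the closing line. Python is case-sensitive, so `tds` and `Tds` would be two different variables.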

Leanpub.com/scrapingforjournalists
@paulbradshaw

onlinejournalismblog.com
helpmeinvestigate.com

slideshare.net/onlinejournalist
linkedin.com/in/onlinejournalist