Web Crawlers - Information Security Inc. (情報セキュリティ株式会社)

Post on 16-Mar-2020

Information Security Inc.

Web Crawlers

Information Security Confidential - Partner Use Only

Contents

• What are Web Crawlers?

• Ways to crawl a website

• References

What are Web Crawlers?

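• A web crawler is a program that systematically browses the web by fetching pages and following the links it finds on them.

• Web crawlers go by a variety of names: industry jargon labels them spiders or bots, but the technical term is web crawler.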
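To make the term concrete, the following is a minimal sketch of what a crawler does: fetch a page, extract its links, and queue the ones not yet seen. It uses only the Python standard library. The names `LinkParser`, `fetch`, and `crawl`, the `max_pages` limit, and the same-host restriction are assumptions made for illustration; they do not come from the slides.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=20):
    """Breadth-first crawl restricted to the host of start_url."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("https://example.com/")` (a placeholder URL) returns the pages visited in breadth-first order. Only crawl sites you are authorized to test.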

Ways to crawl a website

• Metasploit

• HTTrack

• Black Widow

• Burp Suite Spider

• Scrapy framework (https://doc.scrapy.org/en/master/intro/tutorial.html)

▲ Example Spider (extract all links and follow them)

References

• Wikipedia: https://en.wikipedia.org/wiki/Web_crawler

• ScienceDaily: https://www.sciencedaily.com/terms/web_crawler.htm

• Metasploit: https://www.metasploit.com

• HTTrack: https://www.httrack.com

• Black Widow: http://softbytelabs.com/us/downloads.html

• Burp Suite: https://portswigger.net/burp

• Scrapy: https://scrapy.org/