Scrapebox Guide

This guide is going to teach you how to become a Scrapebox master, so brace yourself. For years the SEO community has needed one true, ultimate Scrapebox tutorial, but no SEO has been brave enough to see it all the way through. At first, I thought it would be impossible to complete. But five weeks and 9,000 words later, it is finally here. Enjoy, everyone.

Contents

1 Chapter 1: Introduction to Scrapebox
2 Chapter 2: Building Footprints
3 Chapter 3: Massive Scraping
4 Chapter 4: Keyword Research
5 Chapter 5: Expired Domaining
6 Chapter 6: Link Prospecting
7 Chapter 7: Guest Posting
8 Chapter 8: Comment Blasting
9 Chapter 9: Niche Relevant Comments
10 Chapter 10: PageRank Sculpting
11 Chapter 11: The Automator
12 Chapter 12: Competitor Backlink Analysis
13 Chapter 13: Free Scrapebox Addons
14 Chapter 14: The END

Chapter 1: Introduction to Scrapebox

If you are experienced with Scrapebox then please feel free to skip straight to other sections, but for the complete newbies out there we will walk through everything. Ok, so you have downloaded and installed your copy of Scrapebox (this can be done locally or on a VPS). It is now crucial that you purchase a set of private proxies if you are going to do some serious scraping.

Update: Scrapebox Discount Code for $40 off

If you don’t already own SB, here is the Blackhat World $40 off discount link that most people don’t know about. Simply go to Scrapebox.com/BHW and the price will automatically have the $40 off coupon code applied.

What are proxies and why do we need them?

A proxy server acts as a middleman for Scrapebox to use in grabbing data. Our primary target, Google, does not like it when its engine is hit multiple times from the same IP in a short time frame, which is why we use proxies. The requests are divided among all the proxies, allowing us to grab the data we’re after.
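To picture how requests get spread across a proxy list, here is a minimal Python sketch of simple round-robin rotation. How Scrapebox actually schedules its requests is not documented here, so treat the rotation scheme as an assumption; `assign_proxies` and the sample values are made-up names for illustration.

```python
from itertools import cycle


def assign_proxies(queries, proxies):
    """Pair each query with the next proxy in rotation (round-robin).

    Hypothetical helper: this only illustrates the idea of dividing
    requests among proxies, not Scrapebox's internal scheduling.
    """
    if not proxies:
        raise ValueError("need at least one proxy")
    rotation = cycle(proxies)  # endlessly loops over the proxy list
    return [(query, next(rotation)) for query in queries]


if __name__ == "__main__":
    plan = assign_proxies(
        ["dog training", "puppy training", "dog obedience"],
        ["1.1.1.1:8080", "2.2.2.2:8080"],
    )
    for query, proxy in plan:
        print(f"{query!r} goes out through {proxy}")
```

With 25 proxies, each one only sees every 25th request, which is the whole point of buying a bigger set.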

So pick yourself up a set of at least 25 private ScrapeBox proxies. Personally I use 100, but I go hard. Start with 25 and see if that works out for you. Get acquainted with the Scrapebox UI. It can be quite intimidating at first, but trust me, after some time you will become very comfortable with the interface and understand everything about it.

See the field where it says “Proxies go here”? That is where you paste in your proxies.

The required formatting is – IP:Port:Username:Password

Depending on your provider you might have to rearrange your proxies so they follow this format. If your proxies don’t have passwords attached and are activated through browser login, then just enter the ip:port portion after logging in.
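If your provider hands you proxies in a different layout, a few lines of Python can rearrange them. This is only a sketch: the `username:password@ip:port` input layout is an assumed example of what a provider might send, and `normalize_proxy` is a hypothetical helper name.

```python
def normalize_proxy(line):
    """Return a proxy as IP:Port:Username:Password (or IP:Port if it has no login)."""
    line = line.strip()
    if "@" in line:
        # Assumed provider layout: username:password@ip:port
        credentials, host = line.rsplit("@", 1)
        username, password = credentials.split(":", 1)
        ip, port = host.split(":", 1)
        return f"{ip}:{port}:{username}:{password}"
    parts = line.split(":")
    if len(parts) in (2, 4):
        # Already IP:Port or IP:Port:Username:Password, leave it alone
        return line
    raise ValueError(f"unrecognized proxy format: {line!r}")


if __name__ == "__main__":
    for raw in ["user:secret@1.2.3.4:8080", "5.6.7.8:3128"]:
        print(normalize_proxy(raw))
```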

Then we will click Manage.

Now the proxy test screen will pop up and we will click “Test all proxies”.

If everything is good to go, you will see nothing but green success and Y for “yes” on the Google check. This is crucial! If your proxies aren’t working, you are dead in the water. So make sure you use a reliable provider with quick proxies, otherwise this is going to be a useless endeavor. To remove any bad proxies, first click the filter button and then “Keep Google proxies”.

Good proxies are everything when it comes to using ScrapeBox effectively, so invest in a set from SquidProxies (http://www.jacobking.com/likes/squidproxies) if you’re serious about scraping.

Now click “Save to Scrapebox” and it will send all your working proxies back to Scrapebox (if they are all working, just close the window).

Ok. So our proxies are good to go, now for our settings.

Everything is good at default for the weekend scrapers out there. If you want to turn the heat up then go to “Adjust Maximum Connections” under the Settings tab. From here you can tweak the number of connections used when hitting Google under the “Google Harvester” settings. How hard you can push depends on the number of proxies you are using. I usually run 100 proxies at 10 connections, do the math. But also keep in mind the number of connections allowed depends on the type of queries you are doing. More on that in a minute.

For a massive list of queries that all use the site: operator (i.e. the Google index check), you should turn it down.

And to learn more about proxies, here is a comparison of the top providers I recently ran: http://www.jacobking.com/best-private-proxies

Chapter 2: Building Footprints

What is a footprint?

A footprint is anything that consistently comes up on the webpages you are trying to find in the search engine index.

So if you are looking for WordPress blogs to comment on, the text “Powered by WordPress” is something very common on WordPress blogs. Why is it common? Because the text comes on the default theme.

Bingo, we’ve got ourselves a footprint. Now if you combine that with our target keyword then you can start digging up some WordPress blogs/posts in your niche. And yes, we will go way more in depth, but for now understanding this simple example will be enough.

Good footprints are now your best friend as a Scrapebox user. Building them is very simple but takes some focus and attention. This is where you’re going to be better than the average Scrapebox user. If you are any type of white hat link builder then you have certainly used some sort of footprint before, you just might not have called it a “footprint”.

Have you tried searching out guest post opportunities or link resource pages before? You are using footprints.

But in this section we are building footprints for strategic reasons. We will build sets of footprints and use them again and again for specific purposes. As a quick side note, let me remind you that replication is one of the keys to success in SEO, so let’s build some badass footprints and start using them over and over again.

Fortunately I have included a massive list of footprints categorized by target platform that I’ve spent years digging up. They are enclosed below.

Once you understand the goal, building footprints is quite simple. Pull up some examples of the target site you are trying to find. Looking for link partner pages? Well, bring up a handful that you can find and open them in a bunch of tabs. Compare each one and look for consistent on page elements.

See a phrase that comes up all the time? You might have yourself a footprint.

And if you haven’t yet, and you call yourself an SEO, become an expert with advanced Google search operators (http://www.googleguide.com/advanced_operators_reference.html).

This knowledge is key to being an effective search engine scraper. So take some time, study, and become a search modifier guru. Then apply that to your footprint building and build some killer prints.

There are two main places to hunt when building footprints: in the url structure or somewhere in the content.

Here are my go-to operators.

inurl:

intitle:

intext:

How to Test Footprints:

After you think you’ve created a footprint, testing it is incredibly simple. Just go Google it!

First note how many results come up. If it’s under 1,000 your footprint sucks.

We are trying to create footprints that will dig up tons of sites based on platform, so the number should be decent.

Comb through the results and see how much honey your footprint is finding for you. See a bunch of the site types you’re searching for? Good, bank that footprint and continue building more. Save your footprints with titles for their specific purpose, so say “Vbulletin Footprints” for finding Vbulletin forums. Now that you have some footprints ready, let’s move on to massive scrapes.

Chapter 3: Massive Scraping

Now you may or may not know what you’re looking for, so let’s get a ton of it.

If you want to scrape big, you’re going to have to leave Scrapebox running for a good amount of time, sometimes even for several days. For this purpose, some may opt for a Virtual Private Server or VPS (http://www.jacobking.com/how-to-cancel-your-vps). This way you can set and forget Scrapebox, close the VPS, and go about your business without taking up resources on your desktop computer. Also know that Scrapebox is PC only, but you can run it on a Mac with Parallels. If you do run SB on Parallels, be sure to increase your RAM allocation (http://download.parallels.com/desktop/v5/docs/en/Parallels_Desktop_Users_Guide/22497.htm). Hit me up if you need some help getting a VPS set up.

Here are the different elements you need to consider with big scrapes:

Number of proxies
Speed of proxies
Number of connections
Number of queries
Delay between each query

With the default settings everything should be golden, so how long your scrapes take will depend mainly on how many “keywords” you put in.

You can change the number of connections. This depends on whether you are using private or public proxies, and how many working ones you have.

As I mentioned before I usually run with a set of 100 and set my Google threads to 10.

The keyword field in Scrapebox is where you paste in your keywords and merge in your footprints.

Merging is very simple. All we are doing is taking whatever is listed in Scrapebox and merging it with a file that contains the list of our footprints, keywords, or stop words. So say we are taking the keyword “powered by wordpress” and merging it with “dog training” to create:

“powered by wordpress” “dog training”
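If it helps to see what a merge amounts to, here is a rough Python sketch: every footprint gets paired with every keyword. The `merge` function is a hypothetical stand-in for the M button, not Scrapebox code, and the sample lists are invented. The same pairing works when the second list holds stop words instead of keywords.

```python
def merge(footprints, keywords):
    """Combine every footprint with every keyword into one search query each.

    Footprints are expected to carry their own quotes; keywords are
    wrapped in quotes here so they are searched as exact phrases.
    """
    return [f'{footprint} "{keyword}"' for footprint in footprints for keyword in keywords]


if __name__ == "__main__":
    queries = merge(
        ['"powered by wordpress"'],
        ["dog training", "puppy training"],
    )
    for query in queries:
        print(query)
```

Keep in mind the list grows fast: 50 footprints merged with 200 keywords is 10,000 queries.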

Ahh yes, this Scrapebox thing is starting to make some sense now.

Now we’re after some urls from some of our favorite search engines, which one is up to us.

See how only Google is checked? This means Scrapebox will only harvest urls from Google. If you want to hit the other engines just select them. Also be sure that you have Use Proxies checked.

Note: You can also add foreign language Google engines by clicking the dropdown and “add more google”.

Simply add the extensions for the languages you are going for and click save.

The final thing to note before starting is the Results field.

Very straightforward, this is the number of results (or urls) Scrapebox will grab from the specified search engine(s).

Depending on your goals, set this accordingly. If I am scraping for some sites to link out to in some of my link building content, I will only go 25 results deep for each keyword. But if I am trying to find every possible site out there for a certain platform I will do 1,000. And this brings us to our next problem.

What if our query yields more than 1,000 results?

This is where merging in stop words comes into play.

Manually try the query “dog training” “powered by wordpress”.

You will see there are over 500,000 results.

Now see what happens when I add the word “there”.

Besides that stupid Lynda.com ad, the organic results are different now. By using stop words combined with our footprints we can effectively scrape deeper into Google’s index and get around that 1,000 result limit.

Don’t worry, you can download my personal list of stopwords by sharing this guide below. Keep reading!

Once you have some quality footprints and stop words ready, the rest is easy. We’re going to let Scrapebox rip and come back when complete. If you’re running on your desktop then scrape overnight to minimize downtime on your system.

When the harvest finishes, you will see a prompt saying Scrapebox is complete.

Now if you stop the harvester prematurely, a prompt will appear showing you the queries that have been successfully run and the ones that have not.

Noncomplete queries can mean one of two things:

1. There were zero results for that query.

or

2. That query has not been hit yet.

If you want to complete this harvest later then be sure to export “NonComplete Keywords” and set them aside. If you inputted a list of 10,000 queries and stopped after 2,000, then you just save the remaining 8,000 queries for later.

One of the keys to massive scrapes is understanding that Scrapebox only holds 1,000,000 urls in the urls field and stacks files in the “Harvester Sessions” folder.

For each scrape, the software will create a time stamped folder containing txt files with each batch of 1,000,000 urls. And this is great, but if you don’t know about Duperemove then you are burnt.

Duperemove is an amazing free addon from Scrapebox that allows you to merge lists of millions of urls and remove dupes and dupe domains. This way we can run massive scrapes and process the resulting URLs.

We can also use Duperemove to split a massive file into smaller files so we can further process the resulting urls. We can take 100,000 urls and split them into ten files with 10,000 urls each, for example.

After finishing a massive scrape, open Duperemove.

Start by clicking “Select source files to merge” and navigating to your harvester folder with your batch files of 1,000,000 URLs. Also be sure to save the urls left in the Scrapebox harvester when stopped, and put this file with the rest of the batch files.

Select all the files and give the output file a name, I like to call it “Bulking up”. Now click “Merge files”.

Duperemove will merge everything into one enormous txt file so you can then remove dupe urls and dupe domains.

Below the Merge lists field, select the previous file “Bulking up” and choose a file name for the new output, I like to call it “Bulking down”.

Then click Remove Dupe URLs and Remove Dupe Domains. Now you have a clean list of urls without duplicates. Depending on what I have planned for this giant list, I will use the split files tool and split the large file into smaller, more manageable files.
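For anyone curious what those three operations boil down to, here is a small Python sketch of removing dupe urls, removing dupe domains, and splitting a list. It is not how Duperemove is implemented; the function names are invented, it keeps everything in memory, and it assumes every url starts with http:// or https://.

```python
from urllib.parse import urlsplit


def remove_dupe_urls(urls):
    """Keep the first occurrence of each exact url, preserving order."""
    seen, unique = set(), []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


def remove_dupe_domains(urls):
    """Keep only the first url found for each domain.

    Assumption: urls include a scheme, and www.example.com counts as a
    different domain from example.com (no normalization is attempted).
    """
    seen, unique = set(), []
    for url in urls:
        domain = urlsplit(url).netloc.lower()
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(url)
    return unique


def split_list(urls, size):
    """Split a list into consecutive chunks of at most `size` urls."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    harvested = [
        "http://a.com/1",
        "http://a.com/1",
        "http://a.com/2",
        "http://b.com/x",
    ]
    print(remove_dupe_urls(harvested))
    print(remove_dupe_domains(harvested))
    print(split_list(harvested, 2))
```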

And now that we have covered everything about footprint building and massive scrapes, let’s move on to keyword research.

Chapter 4: Keyword Research

Having fun yet? Now that we’ve gotten all the introduction shit out of the way, things are going to start getting good.

With keyword research Scrapebox continues to be one of my “go to” tools. It has two main weapons: scraping tons of keyword suggestions and giving us Google exact match result numbers.

Keyword Research Weapon #1 – The Power of Suggestion

With this method we will be using Scrapebox to harvest 100s or 1000s of suggestions related to our keywords. Then we will use the Google keyword tool to get volume and move on to our research weapon #2.

First we will explore the suggestion possibilities and how the keyword scraper works.

Start by clicking the Scrape dropdown, and then Keyword Scraper.

Now after you get the keyword scraper open, type in the keyword you would like to scrape suggestions for.

Next you can select the sources the scraper will pull suggestions from.

Protip – Tick the YouTube box if you’re doing keyword research specifically for YouTube videos. Searches can be very different on YouTube compared to typical Google queries.

After you have finished the first run through scraping keywords, remove duplicates, and then you have two options.

You can send the results straight to Scrapebox and move on, or you can transfer them to the left and scrape the resulting keywords for more suggestions. You can repeat this process over and over again until you get the desired amount of keywords. Scrape, remove dupes, transfer left, scrape again, crack beer. It’s actually quite enjoyable.

So now that you have keyword scraping/suggesting down, we will move on to one of the simplest and most powerful free addons for Scrapebox. If you haven’t yet, click “addons” in the top nav, then “show available addons”. Now install the Google Competition Finder addon.

Keyword Research Weapon #2 – Google Exact Match Results

After you open the competition finder, the first step is to import the keywords from Scrapebox. Click Load Keywords and Load from Scrapebox.

Also be sure that the Exact match box is ticked. This way Scrapebox will wrap your keywords in quotes and get the exact match results for each. You can also change the number of connections for large keyword lists, but I would recommend keeping it at the default of 10. Give your proxies a chance to breathe.

When all the results are in, click the Export dropdown, and Export content of grid as csv.

Now you will have a nice csv with all your keywords and the corresponding results. The next step is to open the grid with Excel and sort the data from low to high. Delete the proxy used and status columns, then click the Sort dropdown and “Custom Sort”.

Now that the custom sort screen is open, select the column with the results and sort from smallest to largest.

After you click OK you will have a nice sorted list of keywords with exact match results from low to high.

Depending on the yield I get, I will break the keywords down into ranges of exact match results.

0-50

50-100

100-500

500-1000

1000-5000

From there I will paste each range into the keyword tool, gather volume, and sort again, this time from high to low on the search volume. Then you can comb through and find some easy slam dunkable keywords.
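If you would rather not sort and slice in Excel by hand, the bucketing step can be sketched in Python. The assumptions here are mine: rows arrive as (keyword, exact match results) pairs, a count that lands exactly on a boundary goes into the lower range, and the extra "5000+" bucket is added only to catch anything above the listed ranges.

```python
import bisect

# Upper bounds of the ranges listed above; "5000+" is an added catch-all.
BOUNDS = [50, 100, 500, 1000, 5000]
LABELS = ["0-50", "50-100", "100-500", "500-1000", "1000-5000", "5000+"]


def bucket(results):
    """Return the range label for an exact match result count."""
    return LABELS[bisect.bisect_left(BOUNDS, results)]


def group_keywords(rows):
    """Group (keyword, results) pairs by range, lowest competition first."""
    groups = {label: [] for label in LABELS}
    for keyword, results in sorted(rows, key=lambda row: row[1]):
        groups[bucket(results)].append(keyword)
    return groups


if __name__ == "__main__":
    sample = [("dog training tips", 700), ("crate training a husky", 30)]
    for label, keywords in group_keywords(sample).items():
        print(label, keywords)
```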

Now this is by no means a 100% indicator of Google competition, but it’s a good rough estimate. And when the number is REALLY low, it becomes a more accurate indicator of an easy to dominate keyword. This method can be extremely helpful when you have a massive list of keywords and you are trying to figure out which ones to target with some supporting content. Boom, go for the ones with volume that you can easily rank for. This method will unlock those.

Chapter 5: Expired Domaining

This is by far one of the most powerful grey hat SEO areas in the game. Expired domains can hold a ton of juice, you just need to know how to find them and how to properly relaunch them. Before diving into the Scrapebox methods we will go over the basics of expired domaining.

There are three areas you can focus your domaining efforts on, or some combination of the three: building a blog network, creating money sites, and link laundering.

1. Building a blog network

Building a network is one of the most powerful SEO techniques in the business. Owning a private network of over 100 sites with PR 1-6 is quite nice, think about it.

Private Blog Network 101

There is nothing wrong with building a private blog network. This SEO strategy is not flawed in any way. The only flaw is from the creator.

If you leave a footprint, that allows Google to identify the network and your network becomes useless. And like many other things, after the Google propaganda disseminated throughout the community, people deemed PBNs worthless and ineffective. But when done right, links from your private network will be just as effective as naturally occurring links on authority sites.

Main Points:

*Use many diverse IPs and hosting accounts
*Use different themes, category structures, permalinks, and www. vs root
*Vary the extensions! .com, .net, .info, .org, etc.
*Use different domain registrars, some with private registrations and some with the old owner’s information. Godaddy, Namecheap, etc., some private and some with Joe Schmoe.
*Build some good links to each site.

2. Creating money sites

Occasionally you will find a nice domain that is fitting for a money site. In this case, congrats, you just found yourself an SEO time machine.

I’ve gone back as much as 10 years before and gained myself 40,000 natural links!

How about building a brand new site and working with a domain like that?!

These are rare but they’re out there. Most likely you’re going to have to pay for it in a small bidding war unless you get lucky. But if you know it’s a winner, then go for it.

Always be cautious with drastically changing the old content theme of the site. If you have a money domain about dog snuggies, figure out a way to rank and monetize it while keeping the content semantically relevant to that topic. Used effectively, it will easily exceed the results from the same exact efforts on a fresh domain. Also, if you get an aged domain with a diverse natural link profile you will be much safer blasting some links at the site. An existing diverse link profile can effectively camouflage grey hat link building tactics.

3. Link laundering

This is by far the dirtiest method of all when it comes to expired domaining shenanigans. With this technique we will be using our friend the 301 redirect to redirect pages, subdomains, or entire sites at the site or page we are trying to rank, effectively sending tons of link juice while also cloaking our link profile a bit.

See Bluehatseo for more info on link laundering in the traditional way. With this technique we will be link laundering through server level redirects, specifically the 301.

Step 1. Acquire expired domain

Step 2. Relaunch domain and restore everything.

Step 3. Redirect domain via 301 redirect.

http://grindstoneseo.com/yeah-i-am-that-guy.html

Step 4. Aggressively link build to the now redirected domain.

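Here is the redirect code to use in your .htaccess file to execute the redirect:

# 301 (permanent) redirect of every path to the same path on the target domain.
# RedirectMatch is a mod_alias directive, so no RewriteEngine line is needed, and
# "301" and "permanent" mean the same thing, so one rule is enough.
RedirectMatch 301 ^(.*)$ http://www.domain.com$1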

After you set the redirect, start blasting some links and enjoy.

Expired Domaining 101

In this section we will step away from Scrapebox a bit and discuss SEO domaining domination. But don’t worry, we will be back to Scrapebox shortly.

Buying expired domains takes some skill but it’s not rocket science. The thing is, for every good domain there are ten shitty ones out there that we must avoid.

Here is an overview of the process:

Part 1. Finding domains

Part 2. Analyzing your finds

Part 3. Smart Bidding

Finding Killer Domains with Shit Tons of SEO Juice

Ok, so Scrapebox has the TDNAM scraper addon that we are going to discuss in a moment, but it is limited to only Godaddy auctions (http://www.jacobking.com/likes/godaddyauctions). So while this is a free addon, you are not accessing the entire expired domain market.

In order to do that you are going to have to use some sort of domaining service. These services pull expired feeds from all different sites on the web and also offer some metrics that Scrapebox does not.

Here are my recommended domaining services that I have personally used to snag domains for over 100x the initial purchase price.

Freshdrop (http://www.jacobking.com/likes/freshdrop) – This is the top dog, and the price comes with it: $99 per month. But this is definitely the king of expired domain buying tools. If you are trying to build a network then the subscription will only be short term until you have completed all your domain buys. Recently they have added the MajesticSEO API so you can filter results by backlinks right in Freshdrop, pretty awesome.

If you can’t afford this tool then you can still land a whale on Godaddy auctions. Open the TDNAM addon and enter a keyword for domains to look up.

By default ALL extensions are selected, but you can limit the search to .com, .net, .org, or .info. Click start and, if you don’t already, begin feeling like a boss.

After the scraper is finished, click the Export dropdown and Send to Scrapebox.

Analyzing your Domains and Confirming Their Greatness

After we pull up a list of potential prospects it’s time to take things a step further and be certain we have a winner. We will be using the following tools to validate which domains are worth purchasing.

Scrapebox (of course)
SEOMoz API (sorry, but for this it’s worth it)
Ahrefs
Archive.org
Domaintools.com

First step is to check the pagerank of each domain prospect (if you haven’t sorted from a tool above already).

Click the Check Pagerank dropdown and click Get Domain Pagerank.

Now chuck everything with no PR.

Next open the Fake Page Rank Checker addon. This will confirm that each domain has legitimate Pagerank and not a false redirect.

Open the addon and load your list from Scrapebox. Click Start, filter out the trash, and grab a beer.

Open a beer and take a nice chug, you’re about to get an edge on your competition.

You can now scan through your domains with PR and use your judgement to identify the domains that have potential and that you are interested in.

But let’s put this process on steroids, shall we?

Now we can use one of the newest free addons, the Page Authority addon. Using the Moz API to scan DA (domain authority) and PA (page authority), we can quickly identify high quality prospects.

Since we will be using this tool several times later let’s set it up.

After you open the addon, click Account Setup and paste in your access id and api key in the following format.

Access ID|Secret Key

Now click Start and get some great insight from SEOMoz’s internal scoring system. Sure, it’s not perfect, but it gives us a quick and dirty evaluation of the domain prospects. Just enough screening to allow us to move on to the next phase of analysis.

Now we need to research the history of the domains and their backlink profiles.

Domain History, What we want:

The shorter the time frame the site has been down the better

Make sure the domain has not changed hands multiple times. Look at the whois history via Domaintools to verify this.

Check Archive.org to see what the site used to be. Something you can roll with?

Backlink Profile, What we want:

Take the domains you’re interested in and start putting them one by one into a backlink checking tool.

We want domains juiced with good links, not some piece of shit that someone blasted 10,000 viagra links at and threw out after they were done with it. You will also be able to spot an “SEO’d” link profile, just look for an abundance of keyword rich anchors or a lack of natural anchor text distribution and diversity. I avoid these at all costs. Typically SEOs have no idea what they’re doing, so 99% of expired domains that previously had a “link builder” behind them will be complete shit.

Also keep an eye out for some familiar super authority links, like .govs, .edus, and big news sites. Cnet, WSJ, NYTimes, etc. A few of these are usually an indicator of a once legit domain.
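One quick way to eyeball anchor text distribution is to count how often each anchor appears in an export from your backlink tool. The Python sketch below assumes you already have the anchors as a plain list of strings; `anchor_distribution` is a made-up helper, and what share counts as "too SEO'd" is left to your own judgement.

```python
from collections import Counter


def anchor_distribution(anchors):
    """Return (anchor, share of all links) pairs, most common first.

    Anchors are lowercased and stripped so "Buy Viagra" and
    "buy viagra " count as the same anchor.
    """
    counts = Counter(a.strip().lower() for a in anchors if a.strip())
    total = sum(counts.values())
    return [(anchor, count / total) for anchor, count in counts.most_common()]


if __name__ == "__main__":
    sample = ["Buy Viagra", "buy viagra", "example.com", "click here"]
    for anchor, share in anchor_distribution(sample):
        print(f"{share:.0%}  {anchor}")
```

A profile where one money keyword owns a huge share of the anchors is the kind of thing to be suspicious of, while a natural profile tends to be spread across brand names, bare urls, and generic phrases.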

Part 3: Smart Bidding

Smart bidding is a very simple process that beginners will neglect.

Wait until the last minute and then start bidding like a beast.

When you find that money domain with links from bbc.co.uk and HuffPo, contain your excitement and don’t go nuts quite yet.

Depending on the domain auction you’re using, watch the auction, and also set a reminder on your calendar and cell phone.

Use whatever works for you. I usually set two timers, the first one hour before the auction closes, and the second 15 minutes before the auction closes.

Use the TimeandDate calculator (http://www.timeanddate.com/date/dateadd.html) to find the time at which the auction is going to close. Be ready and pounce.
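If you would rather compute the two reminder times yourself, the arithmetic is tiny. This Python sketch assumes you already know the closing time as a datetime in your own time zone; `bid_reminders` is a hypothetical helper and the date in the example is invented.

```python
from datetime import datetime, timedelta


def bid_reminders(auction_close):
    """Return the two reminder times: one hour and 15 minutes before close."""
    return [
        auction_close - timedelta(hours=1),
        auction_close - timedelta(minutes=15),
    ]


if __name__ == "__main__":
    close = datetime(2013, 7, 1, 15, 0)  # example closing time, not a real auction
    for reminder in bid_reminders(close):
        print(reminder.strftime("%Y-%m-%d %H:%M"))
```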

Also keep in mind that early bidding will alert guys like me who occasionally just sort out domains by # of bids and analyze from there.

So your preemptive $50 bid just alerted me to a quality domain you found that I should throw on my calendar. Then when the time is right I strike like a hungry pit viper out for Pagerank and domain authority.

Conclusion – chill out and bid smart.

This Guide is Originally from Jacobking.com/ultimateguidetoscrapebox


Chapter 6: Link Prospecting

In this chapter we will be analyzing SERPs related to our keyword and looking for places to drop links. Say there is a forum powered by Vbulletin ranking on the 5th page for a relevant keyword. It would be easy to go and drop a link on that page, right? First register for the forum, make a legit profile, go post a few times in other threads, then go drop a nice juicy link on an already indexed page.

Or if you’re feeling real ambitious, train a VA to run this entire process for you.

Because you see, this same methodology can be applied on a massive level by scanning for multiple platform types.

Using a list of the most popular community and publishing platforms, you should be able to create simple html footprints and scan all the urls to identify the potential link drop opportunities.

There are two main approaches we can take with this technique.

1. Simply analyzing urls related to the target keyword for link dropportunities (see what I did there).

2. Performing deeper analysis on targeted scrapes

For both methods we will be using the page scanner addon to analyze the html code of all the pages we dig up.

Method #1 – Find Ranking Related Link Dropportunities

Start by scraping a bunch of keyword suggestions closely related to your target keyword.

Set the results to 1,000 and harvest.

Remove dupe urls and open the page scanner addon.

Once the page scanner is open you will need to create the footprints for it to scan with.

Here are some example footprints:

Platform – WordPress: wp-content

Platform – Drupal

Platform – Vbulletin

Platform – General Forum: All times are GMT

Note that these footprints are different from the traditional footprints we build when scanning for on-page text. We are taking it one step further and scanning the actual source code of the returned pages for a common html element. If you invest the time, you can build extremely accurate footprints and basically find any platform out there.

After you have inputted the footprints and run the scanner, export your results. All of the results will be exported and named by the footprint name. So your Vbulletin link dropportunities will all be in one file named Vbulletin.
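The idea behind source code footprints can be sketched in a few lines of Python: look for a marker string in each page's html and group the urls by whichever footprint matched. This is only an illustration, not the addon's actual logic. It uses the two footprints spelled out above, takes already downloaded html as input, and matches case-insensitively, which is my assumption.

```python
# Footprint name -> marker string to look for in the page source.
FOOTPRINTS = {
    "WordPress": "wp-content",
    "General Forum": "All times are GMT",
}


def scan_page(html, footprints=FOOTPRINTS):
    """Return the names of all footprints whose marker appears in the html."""
    lowered = html.lower()
    return [name for name, marker in footprints.items() if marker.lower() in lowered]


def scan_pages(pages, footprints=FOOTPRINTS):
    """Group urls by matching footprint. `pages` maps url -> html source."""
    results = {name: [] for name in footprints}
    for url, html in pages.items():
        for name in scan_page(html, footprints):
            results[name].append(url)
    return results


if __name__ == "__main__":
    pages = {
        "http://blog.example/post": '<link href="/wp-content/themes/x/style.css">',
        "http://forum.example/thread": "<div>All times are GMT -5.</div>",
    }
    print(scan_pages(pages))
```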

Now continue your hunt and perform further link prospecting analysis on the page level.

Check PR, OBLs, PA/DA, etc. When completed you will have a finely tuned list of relevant potential backlink targets to either hand over to a VA or run a posting script on.

Method #2 – General Page Scanning for Targeted Link Dropportunities

With this method I’m going to show you an actual exploit that I discovered the other day to clearly explain this technique.

We are going to be finding blogs with the Comment Luv platform and dofollow links enabled.

All you will need is a few bogey Twitter accounts to tweet the post and get a choice of the post you want to link to.

*Note – This technique requires your site to have a blog feed.

To start we are going to be using an on-page footprint to dig up these potential Comment Luv dofollow drops.

Here is the footprint I created, a common piece of text found right by the comment box that comes default on all Comment Luv installs.

“Confirm you are NOT a spammer” “(dofollow)”

And a bit of SEO irony there!

Now save that bad boy to a txt file as “Comment Luv Footprint” or something dear to your heart.

Bust out the keyword scraper and start scraping a shit ton of related suggestions.

Now click the M button and merge that beast in with all your freshly scraped keywords. Click start and get ready to unleash the hogs of war.

When the results are in, remove dupes, and open up the page scanner addon.

Now create a new footprint called “Comment Luv”.

And here is the gem of an html footprint that my buddy Robert Neu (https://twitter.com/rob_neu) came up with.

Sorry code wrap not working, check back later.

Thanks Robert!

Now run the scanner and you’ll have some crisp Comment Luv enabled dofollow blogs to go link drop your face off.

Hopefully you are starting to see the potential of the page scanner and the wheels are turning. Maybe an evil laugh also?

Chapter 7: Guest Posting

Contributed by Chris Dyson from TripleSEO.com (http://tripleseo.com/)

If you want to find link building opportunities beyond blog comments, then you can use Scrapebox for its primary function, which is scraping search results on an industrial scale.

A lot of white hat SEO blogs tell you to run individual searches in Google for inurl:”write for us” + Keyword and use free tools to scrape up to 100 links at a time.

This is a surefire way to:

a) Get your IP blocked by Google

b) Bore you to death

c) Waste your time and money

d) Did I say bore you to death?

Thankfully Scrapebox will come to the rescue here to save your sanity.

#1 – Load up your list of footprints into a custom list in Scrapebox

If you are not sure what to do here, please refer back to the “Massive Scraping” section (http://www.jacobking.com/?p=301#Chapter_3_Massive_Scraping)

#2 – Go grab another cold beer from the refrigerator

Jacob’s office on a Monday Morning

#3 – Now we want to remove any duplicate URLs. In the Remove/Filter drop-down, select “Remove Duplicate URL’s” and then “Remove Duplicate Domains”

#4 – Look up the PageRank

#5 – Export the results and hand the list over to the VA to check that each website is of suitable quality. You also want them to locate the blog’s contact information, such as name and email address/contact form, and to confirm whether the site meets the criteria we have for the project.

If you haven’t got a web researcher, then create a job listing on an outsourcing site such as oDesk to have the links checked against your requirements.

Here is a useful outsourcing guide from Matt Beswick: http://www.mattbeswick.co.uk/outsourcing-guide/

#6 – Once your list is cleansed, you want to upload the information into your CRM of choice and start outreach

Common Guest Blogging Footprints

Here is a list of common guest blogging footprints to get you started for free…

guest blogger wanted
guest writer
guest blog post writer
“write for us” OR “write for me”
“Submit a blog post”
“Become a contributor”
“guest blogger”
“Add blog post”
“guest post”
“submit * blog post”
“guest column”
“contributing author”
“Submit post”
“submit one guest post”
“Suggest a guest post”
“Send a guest post”
“contributing writer”
“Submit blog post”
inurl:contributors
inurl:“write for us”
“guest article OR post”
“add blog post”
“submit a guest post”
“Become an author”
submit post
“submit your own guest post”
“Contribute to our site”
“Submit an article”
“Add a blog post”
“Submit a guest post”
“Guest bloggers wanted”
“guest column”
“submit your guest post”
“guest article”
inurl:“guest posts”
“Become * guest writer”
inurl:guest*blogger
“become a contributor”

Beyond Guest Posting

As you can imagine, any search query can be added to Scrapebox to harvest URLs for link prospecting, for example:

1. Sponsorships
2. Scholarships
3. Product Reviews
4. Discount Programmes
5. Resource Lists/Link Pages

It’s quite easy to load your footprints for these types of link building opportunities into Scrapebox and build some high authority links on these types of pages.

keyword + inurl:sponsors
keyword + inurl:sponsor
keyword + intitle:sponsors
keyword + intitle:donors
keyword + intitle:scholarships site:*.edu
keyword + intitle:discounts site:*.edu
“Submit * for review”
keyword + inurl:links
keyword + inurl:resources

If you are an experienced link builder, then you can use other addons in the Scrapebox tool belt to find broken links or help webmasters fix malware issues on their site (see http://tripleseo.com/scrapebox-techniques/).

Chapter 8: Comment Blasting

No Scrapebox guide would be complete without a legit walkthrough on comment blasting.

I know what you’re thinking, comment blasting is so 2006.

Well it is, but only on the first tier. I recommend using blog comment blasts as a third tier link, more for force indexing.

Since you are dropping comments on pages that are indexed, and sometimes regularly crawled, by Google, it will follow your comment link back to whatever tier you are linking to, thus indexing it.

As in most cases with link blasting, it’s all in the list. So you need to be sure you have a decent auto approve list and aren’t swimming in the gutter too much.

The big determinants are the number of outbound links (OBLs) and PageRank. The fewer OBLs and the higher the PR, the better. The thing to be cautious of is that if you don’t deeply spin your comments, they will leave an awful footprint, which can easily be found with a quick Google search using a chunk of your comment output in quotes.

And you can bet your ass if I can dig it up with a few queries then those PhD having, algorithm writing sons of bitches can too. So keep your game tight.
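For a feel of what an OBL count involves, here is a small Python sketch using only the standard library. It is an illustration, not how Scrapebox's Outbound Link Checker works internally. It assumes a link counts as outbound when its host differs from the host of the page it sits on, and that relative links are internal. The names `count_obls` and `filter_by_obls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OutboundLinkCounter(HTMLParser):
    """Collect links that point to a different host than the page itself."""

    def __init__(self, page_url):
        super().__init__()
        self.page_host = urlparse(page_url).netloc.lower()
        self.outbound = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = urlparse(href).netloc.lower()
        # Relative links have no host, so they count as internal.
        if host and host != self.page_host:
            self.outbound.append(href)


def count_obls(page_url, html):
    """Return the number of outbound links found in the page source."""
    counter = OutboundLinkCounter(page_url)
    counter.feed(html)
    return len(counter.outbound)


def filter_by_obls(pages, max_obls):
    """Keep only URLs whose page has at most max_obls outbound links."""
    return [url for url, html in pages.items() if count_obls(url, html) <= max_obls]


html = (
    '<a href="/about">About</a> '
    '<a href="http://other.com/">Other</a> '
    '<a href="http://blog.example.com/p">Same site</a>'
)
print(count_obls("http://blog.example.com/post", html))  # 1
```

Pick the threshold to suit your list; the point is simply that pages stuffed with hundreds of outbound links get filtered out before you post.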

Here is what you need to run a comment blast:

*Spun Anchors
*Fake Auto Generated Emails
*List of Websites for Backlinking
*Spun Comments
*Auto Approve Site List

Spun anchors – To prepare your anchors, use the Scrapebox keyword suggestions. Select all sources and scrape a shit ton of keywords. The more comments you plan to blast, the more anchors you should scrape. Get at least a few hundred.

Save this file as names.txt

Optional – Mix in some generic anchors in your list. Simply paste your keyword rich anchors into Excel and count them, then paste in the desired quantity of generic anchors.

Fake Emails – Under the tools tab you will see “Open Name and Email Generator”, open that little gem.

After you get this little beauty opened up, type 100,000 in the quantity field, check “Include numbers in emails” and select Gmail under the dropdown for “Domains for emails @”

After you generate the 100,000 names, just click generate emails, save them as emails.txt and you’re good to go.

List of Websites for Backlinking – If you’ve already built links, check them with the link checker, and save those as websites.txt.

Spun Comments – Generating spun comments is actually quite simple. We will simply grab comments from relevant pages and spin them together.

In the Scrapebox harvester, check the WordPress button.

Take your relevant keywords from before and surround them with intitle:“your keyword”

*Click start harvesting
*Remove duplicate URLs when completed
*Click on Grab, Grab comments from harvested URL list
*Tick Skip comments with URLS
*Select to Ignore comments with less than 10 words and URLs in them
*Click Start

Now open your favorite text editor and find and replace the page breaks with a space.

For spinning we will be using TheBestSpinner (http://www.jacobking.com/likes/tbs).

Copy and paste the exported comments into TheBestSpinner and Click Everyone’s Favorites

*Select Better from the dropdown
*Uncheck Replace Everyone’s Favorites inside spun text
  o Tick Keep the original word found in the article
*Uncheck Only select the #1 best synonym
*Spin levels All to All with max synonyms set to 4
*Click Replace
*Once complete, highlight all, and select the Spin Together button
*Click do not include a blank paragraph

Congratulations, you have some spamtacular comments ready, save them as comments.txt

Auto Approve Site List – Try Googling some shit like “scrapebox auto approve list”. Have yourself a field day, gather up a ton of lists, and open Duperemove.

Place all the AA lists in one folder, select them all and merge them together into one monster list. Remove dupe URLs and it’s time to blast away.

Blast Settings:

First you need to get your settings right. Under the Settings menu, go to “Adjust Timeout Settings”.

Move the Fast Poster timeout to the max, 90 seconds. This way the poster will be able to load massive pages with tons of comments and slow load times without timing out.

Check the “Fast Poster” box and begin opening each of the files you created above: Names, Emails, Target Websites, Comments, and AA list, all in txt format.

Click Start Posting and open beer. Drink beer and continue reading this guide.

Chapter 9: Niche Relevant Comments

Contributed by Charles Floate

There’s a cool thing you can do with ScrapeBox to make comments that get approved at a high rate and, more specifically, are niche relevant.

Preparing Comments

Firstly, you’re going to need to make 35 different comments per 500 harvested URLs around the same topic.

For example, if you’re link building for white hat SEO, I could make a comment like:

“Content has always been king, seems the black hats are getting destroyed by the white hat profit making machines”

Then, you need to “spin” the comment, by spin I mean manually spin the comment.

An example of the above comment, manually spun would be:

“{Content|Information} {has always been|has long been|has become} {king|master}, {seems|appears} {the|all the|all of the} {black hats|black hat’s} {are getting destroyed|are getting owned} {by the|from the} {white hat profit making machines|white hat profit makers|white hat profiteers}.”

As you can see, it’s perfectly readable in all ways, and these kinds of comments tend to have a pretty high approval rate.
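To see how one spun comment turns into many, here is a short Python sketch of spinning. It is an illustration only, and it assumes the usual curly-brace spin format where each group looks like {option one|option two}. The function names `spin` and `count_variations` are made up for this example.

```python
import random
import re

# Matches the innermost {a|b|c} group (no nested braces inside it).
GROUP = re.compile(r"\{([^{}]*)\}")


def spin(text, rng=random):
    """Replace every {a|b|c} group with one randomly chosen option."""
    while True:
        match = GROUP.search(text)
        if match is None:
            return text
        choice = rng.choice(match.group(1).split("|"))
        text = text[:match.start()] + choice + text[match.end():]


def count_variations(text):
    """Count distinct outputs for a flat (non-nested) spun comment."""
    total = 1
    for group in GROUP.findall(text):
        total *= len(group.split("|"))
    return total


comment = "{Content|Information} {has always been|has long been} {king|master}."
print(spin(comment))
print(count_variations(comment))  # 2 * 2 * 2 = 8
```

The number of variations multiplies with every group you add, which is why even a handful of manually spun comments can cover a list of several hundred URLs without obvious repeats.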

There are a few different styles I like to incorporate into my strategies that can boost both the diversity and the approval rate.

Ego Approval Bait:

This is based on the ego of the writer. I’ve been trying to come up with a solution to add a name to the comment, but it looks like I can only do this with Xrumer, and this tutorial isn’t based on Xrumer, is it ;)

Example Ego Bait Comment:

Always a pleasure to read your content, seems you really do have a talent for creating great content! (As a split test, adding the exclamation mark increases approval rate by 6%!)

Social Approval Bait:

These tend to be based around asking about social mentions. Ask the author how you can connect with them on Twitter, for example.

Website Approval Bait:

This is based on complimenting the design (and if you’re posting only to WordPress sites, you already know the answer).

Example Website Bait Comment:

Site’s design is really nice, is it a custom theme or can I buy or download the WP theme from somewhere?

Harvesting

Now once you have all the comments ready, you’re going to want to search for sites related to the niche you’re building for:

Selecting WordPress will find all the WordPress blogs out there, which is great if you just want to build niche relevant nofollow comments. Selecting BlogEngine will find tons of different blog CMSs, some being dofollow.

Posting Comments

Once all your blogs are harvested, you are ready to post.

Names: In the Names area, open a text document with your anchor texts. I always create a mixture of branded, generic, and some LSI/long-tail keywords.

Emails: In the Emails section, either put your actual email (a lot of the time this will receive emails about replies, comment approvals, or declines) or just input a list of randomly generated emails so your email doesn’t get flagged for spam.

Websites: In the Websites list, just input the websites you wish to build links to.

Comments: In the Comments section, open the text document with all your manually spun comments.

Blogs List: In the Blogs List, add in the harvested blogs. This is pretty easy, as you can just click Lists > Transfer URL’s to Blogs Lists for Commenter.

Make sure you select the Fast Poster. Now click start, it’s as easy as that!

FIRE!!!

Chapter 10: PageRank Sculpting

PageRank sculpting, say it, Matt Cutts won’t hear you. Now if you sculpt like a pro, then that dumbass Algo won’t have a clue either. There are many ways to approach PR sculpting; some methods are more aggressive than others, such as pointing the majority of your posts, homepage, and category pages at the target you want to rank. My method isn’t quite as risky. Actually, if done right it’s not risky at all, it’s SEO 101.

We will be analyzing all of your indexed URLs and making sure we have taken advantage of all relevant internal link opportunities. This can also be handy for client audits, it’s a quick and easy win.

There are two methods you can use to gather your site’s urls.

1. Use the harvester and the site: command.

2. The sitemap scraper addon. This is necessary for large sites with over 1,000 indexed URLs. With this addon you can scrape XML sitemaps.
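If you want to see what scraping an XML sitemap boils down to, here is a minimal Python sketch. It assumes a standard sitemap that uses the sitemaps.org namespace and simply pulls out every `<loc>` value. It is not the addon's code, and the helper name `urls_from_sitemap` and the sample sitemap are made up.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in an XML sitemap string."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text]


# Made-up two-page sitemap for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about/</loc></url>
</urlset>"""

for url in urls_from_sitemap(sample):
    print(url)
```

Because the search is for any `<loc>` element, the same function also lists the child sitemaps when you feed it a sitemap index file.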

After you gather the URLs, simply run a PR check and save all the URLs with PR. Then open the Page Authority Addon if you have the Moz API set up, and analyze each URL. Export to CSV, then sort by Page Authority, Moz Rank, or External Links to identify your highest juiced pages.
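If you would rather not sort in a spreadsheet, the same thing can be sketched in a few lines of Python. The column names below (`URL`, `Page Authority`, `External Links`) are assumptions for illustration, so check them against the header row of your own export. The function name `top_pages` is also made up.

```python
import csv
import io


def top_pages(csv_text, column="Page Authority", limit=10):
    """Sort exported rows by a numeric column, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row[column] or 0), reverse=True)
    return rows[:limit]


# Made-up export with hypothetical column names.
sample = (
    "URL,Page Authority,External Links\n"
    "http://example.com/a,31,4\n"
    "http://example.com/b,48,12\n"
    "http://example.com/c,17,0\n"
)

for row in top_pages(sample, limit=2):
    print(row["URL"], row["Page Authority"])
```

Swap the `column` argument to sort by a different metric. The pages at the top of the list are the ones worth checking first for internal link opportunities.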

Now don’t go dropping heavy anchor text links all over the place like a link happy freak or anything. Be smart about it. Use varied anchors and only where it makes sense. Weave it in naturally, not like a drunk Scrapebox toting lunatic. If you find relevant places to drop, do it up.

And don’t go linking to your homepage a bunch of times, rook.

Chapter 11: The Automator

Ok, so not only is Scrapebox the most badass SEO tool ever created in almost every aspect, but you can also automate most tasks.

And for a whopping $20 this premium plugin can be yours. Under the tab, click Available Premium Plugins, purchase the plugin through PayPal and it will be available for download.

This is where you are going to need to use your imagination. With the automator you can easily string together huge lists of tasks and effectively automate your Scrapebox processes.

The beauty of the automator is not only its effectiveness but its ease of setup. Very low geek IQ required: simply drag and drop the desired actions, save, and dominate.

As an example I will walk through setting up a series of scrapes.

Say you have multiple clients to harvest some link partner opportunities for. You can literally set up 20 and walk away. Come back to freshly harvested and PR checked URLs.

We would start by preparing our keywords, merging with footprints, then saving them all into a folder. Client1, Client2, Client3, etc.

Now open the Automator.

Here is the sequence we would use:

Harvest URLs, Remove duplicates, Check PageRank, clear, wait a few seconds, and repeat. The screenshot below shows three loops.

After you add the commands, filling out the details should be easy to figure out. You’ll notice I put a wait command in between each loop, just set that to 5 seconds to let Scrapebox take a quick breath between harvests. I also added the email notification command at the end, which is the icing on the automator cake.

Chapter 12: Competitor Backlink Analysis

To do this right you are going to need some sort of backlink checking service. Ahrefs, Majestic SEO, or Moz Open Site Explorer will do.

If you have multiple services, you can use all of them and remove dupes. Yes, this is a bit crazy, but it will get as many of your competitors’ backlinks as possible.

Now in classic Scrapebox fashion we are not going to just look at one competitor’s backlinks, we are going to look at them all. Take your top 10 competitors, export ALL of their backlinks and merge them together.
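As a rough picture of the merge and dedupe step, here is a small Python sketch. It assumes each export has already been read into a list of URLs, and it treats a trailing slash and letter case as insignificant when comparing, which is a simplification you may not want for case-sensitive paths. The function name `merge_backlinks` and the sample URLs are made up.

```python
def merge_backlinks(*exports):
    """Merge several lists of backlink URLs, dropping duplicates but keeping order."""
    seen = set()
    merged = []
    for export in exports:
        for url in export:
            url = url.strip()
            # Assumption: ignore case and a trailing slash when comparing URLs.
            key = url.lower().rstrip("/")
            if url and key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


# Made-up exports from two backlink services.
service_a = ["http://x.com/page", "http://y.com/"]
service_b = ["http://X.com/page/", "http://z.com/post"]

print(merge_backlinks(service_a, service_b))
# ['http://x.com/page', 'http://y.com/', 'http://z.com/post']
```

The first copy of each URL wins, so load the export you trust most first.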

Once you get all the links exported and pasted into Scrapebox, you can begin analysis.

We can collect the following information on our competitors’ links:

URL or Domain PageRank
Moz Page and Domain authority
Moz External links
Social shares
Anchor text
IP Address
Whois info
Platform Type
Dofollow/nofollow links

We can approach this in two ways:

1. Get links from the same places as our competitors.

2. Get a clear picture of what is working for sites currently ranking so we can replicate it.

So let’s start with approach one, snagging competitor link opportunities. From here you will be able to break down your competitors’ links in many ways. This is where we can use our link prospecting techniques via the page scanner addon and spot some easy slam dunk link opportunities. Thanks competitors!
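One simple way to break the links down is to count how many competitors each linking domain appears for. A domain that links to several of them is usually a place where you can get a link too. Here is a hedged Python sketch of that idea. The function names and the sample data are invented, and stripping a leading `www.` is just one assumed way to normalise hosts.

```python
from collections import Counter
from urllib.parse import urlparse


def linking_domain(url):
    """Return the host of a backlink URL without a leading www."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def shared_domains(backlinks_by_competitor, min_competitors=2):
    """Count how many competitors each domain links to, most shared first."""
    counts = Counter()
    for urls in backlinks_by_competitor.values():
        # Use a set so a domain counts once per competitor, however many links it has.
        counts.update({linking_domain(u) for u in urls} - {""})
    return [(d, n) for d, n in counts.most_common() if n >= min_competitors]


# Made-up backlink exports keyed by competitor.
data = {
    "competitor1": ["http://www.blog.com/a", "http://blog.com/b", "http://forum.net/t"],
    "competitor2": ["http://blog.com/c"],
    "competitor3": ["http://forum.net/x", "http://blog.com/z"],
}

print(shared_domains(data))
# [('blog.com', 3), ('forum.net', 2)]
```

Raise `min_competitors` to narrow the list down to the domains almost everyone in the niche has a link from.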

Depending on your niche, you might be able to pick up some nice traffic driving comment links here as well. Bust out the blog analyzer and run all the links through that; it will identify blogs where your competitors have dropped links. Sort by PR and OBLs, and voilà, you’ve got some sweet comment links.

Approach Two, What’s working now…

One of the most powerful SEO tactics around, and one that will always live, is reverse engineering competitor backlinks to see what is currently working in the SERPs.

There is no one size fits all approach, so understanding what’s currently ranking the site you’re trying to outrank is key.

Sure, finding relevant link opportunities and matching your competitors’ links is huge, but understanding what Google is favoring is the insight you need.

Using the live link checker you can take the links and check the exact anchor text percentages they are using. Since the “sweet spot” can be niche specific with our pal Google, this is a necessary approach for SERPs you’re very focused on.

This is done on a site by site basis. Start by taking the top ranking site’s backlinks and saving them into a txt file, backlinks.txt

Then create an additional txt file with nothing but the competitor’s root domain, and save that as Backlinktarget.txt

In the comment poster section, tick the box “Check Links”.

Now in the Websites field open the Backlinktarget.txt file with your competitor’s homepage URL. Then in the Blog Lists field open the text file with all of the backlinks, backlinks.txt.

Click check links, let it roll, then export as CSV.

Open the file and sort the anchor text column from A to Z. From here you can easily see the percentage distribution of their anchor text. Take the number of occurrences of an anchor and divide it by the total number of backlinks. Boom, you know exactly what the anchor text percentage is for the currently top ranking site. Use that information how you will.
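The arithmetic is just occurrences divided by total backlinks. For example, an anchor that shows up 30 times across 200 backlinks sits at 15%. Here is a small Python sketch that does the counting for a whole list at once. It assumes you have already pulled the anchor text column out of the export into a list, and it lumps anchors together regardless of letter case. The function name `anchor_distribution` and the sample anchors are made up.

```python
from collections import Counter


def anchor_distribution(anchors):
    """Return (anchor, count, percent) tuples, most common first."""
    cleaned = [a.strip().lower() for a in anchors if a.strip()]
    total = len(cleaned)
    counts = Counter(cleaned)
    return [(a, n, round(100 * n / total, 1)) for a, n in counts.most_common()]


# Made-up anchor text column from a link check export.
anchors = ["Blue Widgets", "blue widgets", "click here", "example.com", "blue widgets"]

for anchor, count, percent in anchor_distribution(anchors):
    print(f"{anchor}: {count} ({percent}%)")
```

In this sample, “blue widgets” accounts for 3 of 5 links, or 60%, with the other two anchors at 20% each.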

Now we could continue to go wayyyy more in depth on competitor links and how to leverage this intelligence in hundreds of different ways, but I’m running out of gas here. The best way to learn this stuff is by getting your hands dirty. So bust open your backlink checkers, roll up your sleeves, and fire up Scrapebox already.

Start making your competitors wish they would have blocked the backlink crawlers like you did. Well, hopefully ;)

This Guide is Originally from Jacobking.com/ultimateguidetoscrapebox

Chapter 13: Free Scrapebox Addons

Social Checker – Bulk check various social metrics: Facebook, Google +1, Twitter, LinkedIn, and Pinterest. Results can be exported in multiple formats, .xlsx, .xls, .csv, .txt, .tsv, and others. Also supports proxies.

Unicode Converter – Convert text in different languages such as Chinese, Russian, and Arabic into an encoded format that can be used in the Google URL harvester keywords and footprints inputs.

Backlink Checker 2 – Download up to 1,000 backlinks for a URL or domain via Moz API.

Google Cache Extractor – Fetch the exact Google cache date for a list of URLs and export the URL and date.

Alive Checker – Take a list of URLs and check the status of the website, alive or dead. You can also customize what classifies dead URLs by adding response codes like 301 or 302. Will also follow redirects and report the status of the final destination URL.

Alexa Rank Checker – Check Alexa rank of your harvested urls.

Duperemove – Merge multiple files together of up to 180 million lines and remove dupes. Work with enormous files and split results however you’d like.

Page Scanner – Create custom footprints as plain text and HTML, then bulk scan URLs’ source code for those footprints. You can then export the matches into separate files.

Google Image Grabber – Harvest images directly from Google image search in small, medium, and large outputs.

Rapid Indexer – Submit your backlinks to various statistic, whois, and similar sites to help force indexing.

Audio Player – Bump some tunes while you scrape.

Port Scanner – Display all active connections and corresponding IP addresses and ports. Useful for debugging and monitoring connections.

Article Scraper – Scrape articles from different article directories and save them as txt files.

Dofollow Test – Load in a list of backlinks and check if they are Dofollow or Nofollow.

Bandwidth Meter – Displays your up and downstream speed.

Page Authority – Gather page authority, domain authority, and external links for bulk URLs in the harvester.

Blog Analyzer – Analyze URLs from the harvester to determine blog platform (WordPress, BlogEngine, Movable Type), comments open, spam protection, and image captcha.

Google Competition Finder – Check the number of indexed pages for a given list of keywords. Grab either broad or exact match results.

Sitemap Scraper – Harvest URLs directly from a site’s XML or AXD sitemap. Also has a “deep crawl” feature where it will visit all URLs on the sitemap and identify any URLs not present in the sitemap.

Malware and Phishing Filter – Bulk detect websites containing malware, or that have contained malware in the last 90 days.

Link Extractor – Extract all the internal and external links from a list of webpages.

BlogEngine Moderated Filter – Scan large lists of BlogEngine blogs and determine which are moderated and which are not. Then load into the fast poster and blast away.

Domain Resolver – Resolve a list of domain names to the IP address(es) they are hosted on and check location.

Outbound Link Checker – Easily determine how many outbound links each URL in a list has and filter out entries over a certain threshold.

Mass URL Shortener – Shorten massive URLs using some common shortening services such as tinyurl.

Whois Scraper – Retrieve whois entries from harvested URLs, get names, emails, and if available, domain creation and expiration date.

TDNAM Scraper – Harvest soon to expire domains straight from GoDaddy Auctions.

ANSI Converter – Export URLs from the harvester as Unicode or UTF-8 to use the Learning poster in other languages.

Fake PR Checker – Check fake Pagerank of harvested urls.

Chess – Play chess, it’s good for the mind.

Chapter 14: The END

Final Thoughts and General Ass Kicking Advice

Now that your eyes have been opened to the power of Scrapebox, you might find yourself in brief SEO shock. My hope is that not only will you see the benefits of Scrapebox, but this will also change the way you look at playing the game we call SEO.

If you are guilty of manually combing through Google SERPs for link opportunities, then I will forgive you if you promise to change your ways.

The data is at your fingertips. Leave no stone unturned and don’t let something silly like Google’s 1000 result limit stop you. One of the prerequisites to being a “good” SEO is being able to use search engines better than any other human can. And without some sort of scraping tool you’re going to get your ass handed to you.

There are always ways to improve your processes, even when you think you have it mastered and 100% optimized. SEOs neglecting the power of Scrapebox is just one example. Keep your eyes open and get money!

Wow, you made it to the end, good job. Now as a reward you can download this massive list of footprints. Yup, it just keeps getting better and better.

http://www.jacobking.com/wp-content/uploads/2013/09/footprints.zip
