Quirky insights from scraping the web



While scraping data from the web, we’ve come across different sides of the internet.

We're happy to share some of the strange insights found with the help of web data.

Here’s a sneak peek: did you know that travel websites show different prices to different users?

Read on to learn more about this and other quirky insights we found while scraping the web.

1. Every website is collecting data about you

• Creepy, but true.

• Social media websites, e-commerce sites and any other site where you create a profile to become a member collect your data and treat it as a precious asset.

• For example, Facebook reportedly saves everything you type in the status box, even if you delete the text instead of publishing the status.

2. Showing higher prices on apps

• Mobile phones aren’t great if you want to do some research before buying or booking something.

• Many e-commerce websites and online travel agencies exploit this by showing higher prices when users search from a mobile device.

• We detected this practice by crawling the websites with mobile and web user agents and comparing the extracted prices.
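A minimal sketch of that comparison in Python is shown below. The user-agent strings, the `PRICE_RE` pattern (which assumes the price sits in an element with `class="price"`), and the helper names are hypothetical; every real site needs its own extraction rule, and both requests should otherwise be identical (same cookies, same location) so that the user agent is the only variable.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical user-agent strings; a real crawler would use current browser UAs.
DESKTOP_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
MOBILE_UA = "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) Mobile"

# Hypothetical extraction rule: assumes markup like <span class="price">$1,299.00</span>.
PRICE_RE = re.compile(r'class="price"[^>]*>\s*\$?([\d,]+(?:\.\d+)?)')


def fetch(url: str, user_agent: str) -> str:
    """Download a page while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_price(html: str) -> Optional[float]:
    """Return the first price matched by PRICE_RE, or None if no price is found."""
    match = PRICE_RE.search(html)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def compare_prices(desktop_html: str, mobile_html: str) -> Optional[float]:
    """Mobile price minus desktop price; a positive value means mobile is pricier."""
    desktop = extract_price(desktop_html)
    mobile = extract_price(mobile_html)
    if desktop is None or mobile is None:
        return None
    return round(mobile - desktop, 2)
```

For a given product URL, `compare_prices(fetch(url, DESKTOP_UA), fetch(url, MOBILE_UA))` returns the price gap for that one page; only a consistently positive gap across many products would point to the practice described above.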

3. Most websites undergo changes every day

• Design improvements, code maintenance, new offers and security patches are some of the most common reasons a website changes.

• This is not a huge concern for human visitors, but for web crawlers even a renamed class can break the extraction logic and trigger a cascade of errors.

• This underscores the importance of regularly monitoring the source websites while crawling.
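One simple way to monitor for such changes, sketched below under our own assumptions, is to record the class names the extraction logic depends on and check that they still appear in each freshly crawled page. The `missing_selectors` helper and the example class names are hypothetical; the parsing uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from typing import Set


class ClassCollector(HTMLParser):
    """Collect every CSS class name used anywhere in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class may hold several names.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def missing_selectors(html: str, expected: Set[str]) -> Set[str]:
    """Return the expected class names that no longer appear in the page."""
    collector = ClassCollector()
    collector.feed(html)
    return expected - collector.classes
```

Run on a schedule, a non-empty result (for example `{"product-title"}` after a site renames that class) can raise an alert before the crawler starts producing bad data.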

4. Higher prices if you live in an expensive locality

• Many online travel agencies use location-based pricing to earn more, linking the average financial status of customers to their geography.

• Adopters of this type of pricing argue that one person’s willingness to pay more can help someone else pay less.

• So if you live in a geographical location with a higher average income, you can expect to see higher prices while browsing some of the online travel portals.

5. Nearly 2% of sites on the web block bots

• A few websites block web crawlers and automated scraping through their robots.txt file or their terms of service (TOS) page.

• Some of them also use bot-detection tools to aggressively block bots, which is largely fruitless. In fact, it can degrade the website's performance and hurt its marketing.

• Many content aggregators use web crawling to fetch snippets or summaries of your web pages and display them with a backlink to the source. Over time, these links can help your website gain the traffic and search engine presence it deserves.
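Of the two mechanisms mentioned above, robots.txt is the machine-readable one, and a well-behaved crawler can check it before fetching a page. The sketch below uses Python's standard `urllib.robotparser` module; the sample rules and the `MyCrawler` user-agent name are made up for illustration. A real crawler would instead point `RobotFileParser.set_url()` at the site's `/robots.txt` and call `read()`. TOS pages are written for humans and still have to be reviewed manually.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content, invented for illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS)
    print(allowed(rules, "MyCrawler", "https://example.com/checkout/cart"))
    print(allowed(rules, "MyCrawler", "https://example.com/flights"))
    print(rules.crawl_delay("MyCrawler"))  # seconds to wait between requests
```

With these sample rules, the checkout URL is disallowed, the flights URL is allowed, and the crawler is asked to wait 5 seconds between requests.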

6. Demand-based dynamic pricing

• Websites constantly monitor their incoming traffic and raise prices accordingly.

• The price can go up if there are more people trying to purchase a particular product or book a flight from city A to city B on a particular date.

• To find out whether this is happening, check the price at different times of the day.
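Checking prices at different times boils down to simple arithmetic over repeated observations. The sketch below assumes each observation is a `(timestamp, price)` pair collected by repeated crawls of the same product. The `price_spread` helper is hypothetical. A large spread is consistent with demand-based pricing but does not prove it, since fares can also change when cheaper seats sell out.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Observation = Tuple[datetime, float]  # (time of the crawl, observed price)


def price_spread(observations: List[Observation]) -> Dict[str, object]:
    """Summarize how much one product's price varied across the day."""
    if not observations:
        raise ValueError("need at least one observation")
    cheapest = min(observations, key=lambda o: o[1])
    priciest = max(observations, key=lambda o: o[1])
    spread = priciest[1] - cheapest[1]
    return {
        "cheapest_at": cheapest[0],
        "priciest_at": priciest[0],
        "spread": round(spread, 2),
        # Spread relative to the lowest observed price, in percent.
        "spread_pct": round(100 * spread / cheapest[1], 1),
    }
```

For example, observations of 200.0 at 09:00, 230.0 at 14:00 and 210.0 at 22:00 give a spread of 30.0, or 15.0% above the cheapest price seen at 09:00.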

7. Websites don’t have content in the source code anymore

• A decade ago, most websites had all their content in the source code of the page.

• Coding practices have evolved drastically since then, and most websites now follow best practices such as loading scripts asynchronously and avoiding inline CSS.

• Writing scripts to extract data would have been easier with the older convention, but we do appreciate and embrace positive changes happening to the web.
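Before writing an extractor, it helps to confirm whether the data is actually in the raw HTML. If text you can see in the browser is missing from the source, it is probably being loaded by JavaScript after the page arrives. The sketch below is our own hypothetical check built on Python's standard `html.parser`. It ignores text inside `<script>` tags and counts how many scripts the page carries.

```python
from html.parser import HTMLParser
from typing import Dict, List


class VisibleTextParser(HTMLParser):
    """Gather text outside <script> tags and count the script tags."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.script_count = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are code, not content, so they are skipped.
        if not self._in_script:
            self.text_parts.append(data)


def diagnose(html: str, expected_text: str) -> Dict[str, object]:
    """Report whether expected_text is in the static markup, plus script count."""
    parser = VisibleTextParser()
    parser.feed(html)
    visible = " ".join(parser.text_parts)
    return {"in_source": expected_text in visible, "script_tags": parser.script_count}
```

When `in_source` is `False` but the text shows up in a browser, a plain HTTP fetch will not be enough. The scraper then needs either a headless browser or the request the page's scripts make to load the data.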

Websites also leverage cookies to track users and create a sense of urgency.

Pro tip: Always clear your cookies before purchasing or booking tickets online.

Websites today consume large amounts of data and adjust many aspects of their pages to maximize returns.

It’s difficult to derive such insights without analyzing huge amounts of data.

We're fortunate to work extensively on various web data extraction solutions, and we frequently discover these kinds of offbeat insights.

Are you interested in collecting data and finding further quirky insights that might help your business?

Let us know your requirements at [email protected].