Just Grab All The Data

One of the things I’ve been worrying about recently is how I am going to pull data from websites so I can compare them against each other. I know there are web scraping tools that can be built or tailored from examples online, most commonly in VBA and Python, but that’s a whole lot of work that I don’t think I’m ready to start learning just yet.

The Python web scrapers in particular even show up in some of the analyst courses I’ve been looking into, so it does feel like it’s something I should be learning… or should I?

After trying out Alteryx, I read somewhere that you can use Alteryx for web scraping. Since it’s such a powerful tool already, I definitely wanted to know more about that and soon stumbled across an article on Alteryx that uses a web-based scraping service called Import.io.

After signing up for the free trial, I was impressed with the simplicity and ease of use it offered. Although the trial is only 7 days and the functions are limited during the trial, I got everything I needed out of it in under an hour for my initial use case. All this time I had been worrying about web scraping, and if I had searched for web scraping sites or tools rather than limiting myself to Python and VBA, I might have found this site sooner.

After extracting and playing with the data I had collected, I called it a night, but I’d been bitten by the data bug, and today I couldn’t help revisiting the site during work breaks and just fiddling about with the options. I have no idea how long it would take me to learn enough to build such a tool myself, but I bet it isn’t something I could be using within an hour like this one!

Then came the step I should have taken much sooner: searching for alternatives. This opened the doors to a whole world of web scraping tools, so many that there are lists of the top 30. If there are enough tools out there to fill a top 30, you probably don’t need to worry about trying to build your own version!

Anyway, now that I’ve discovered the world of web scraping and my Alteryx trial still has another week and a half to run, I have plenty of data to capture and blend, so I’d better get to it!
