My first blog post on using automated web scraping

smsmith195 · 11 April 2024 12:04

Hey guys! I’ve started blogging about my automation learning journey with Python and Selenium!

Here’s my first post, on how I learned to create web scrapers with Python and Selenium and how I got them to just work. I have more posts planned that go over how I improved the scripts further.

Feel free to check it out!
https://sigrothian.co.uk/posts/creating-python-webscrapers-part-1/

shad0wpuppet · 11 April 2024 14:45

I’m curious, have you had any problems with bot detection/protection systems on the websites you scrape data from? Some websites use mechanisms to block scrapers, DDoS attacks, and other bot activities. How many streams do you use to scrape data: just one, or multiple streams (hundreds of pages in parallel)? And I assume those websites don’t require logging in to get access to the info?

conrad.braam · 12 April 2024 07:49

A totally excellent explainer of the process, wish more people would share their journey like you have here Stephen. It proves that you will be able to recall and repeat your learnings, and that you are also able to size up the task well. Even just setting constraints on the learning task as boundaries is something very few people do for themselves. Personally I would try using one of the web browser plugins that allow you to interactively navigate the DOM.

Appium has an inspector app for mobile that lets you interactively highlight any element and then takes you to that element in the DOM. Note that WebDriver is looking at the DOM, not the HTML: the HTML is the static document. When you stop thinking about it all as HTML, things like controls, JavaScript, events and other element locator strategies start to make far more sense. The HTML can actually contain more than one ‘view’ of the page. Elements stack and hide and move all of the time, and they really do move a lot, especially in modern web apps that use fancy frameworks. That’s an area a lot of folk struggle with until they eventually stop thinking of the page as being static and start thinking of it as containing objects. Then paging through a site with longer paginated lists, for example, becomes far easier to automate. Great start. Keep this coming.
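
To make the paging point a bit more concrete, here is a rough sketch in Python. It is not code from the blog post. It assumes a driver object with the same find_elements(by, value) shape as a Selenium WebDriver, and the selector names and the wait_for_elements / scrape_paginated helpers are made up for illustration. The small polling loop stands in for Selenium's own WebDriverWait so the sketch has no dependencies.

```python
import time

# Same string value as selenium's By.CSS_SELECTOR, written out so that
# this sketch does not need selenium installed.
CSS = "css selector"


def wait_for_elements(driver, selector, timeout=10.0, poll=0.25):
    """Poll the live DOM until at least one element matches, or time out.

    Returns the matching elements, or an empty list after the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements(CSS, selector)
        if found:
            return found
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll)


def scrape_paginated(driver, item_selector, next_selector, max_pages=50):
    """Collect item text page by page, clicking 'next' while it is visible."""
    results = []
    for _ in range(max_pages):
        # Query the DOM again on every page: element objects found on an
        # earlier page may no longer exist once the page has changed.
        items = wait_for_elements(driver, item_selector)
        results.extend(item.text for item in items)

        # A 'next' control can still be in the DOM but hidden on the last
        # page, so check visibility rather than mere presence.
        next_links = [
            el for el in driver.find_elements(CSS, next_selector)
            if el.is_displayed()
        ]
        if not next_links:
            break
        next_links[0].click()
        # Assumption: the old items are gone by the time we query again.
        # On pages that update in place, wait for the old items to go
        # stale first (Selenium offers expected_conditions.staleness_of).
    return results
```

With a real browser you would pass in something like webdriver.Chrome() after calling driver.get(...) on the list page, with selectors that match that site. The max_pages cap is only there so a misbehaving 'next' button cannot loop forever.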

smsmith195 · 13 April 2024 14:46

Hey,

Really it’s a case of running one script at a time as this has mainly been a learning exercise to get to grips with Selenium in tandem with Python. I made sure to use websites with no need to log in to access the info to make sure that I’ve at least got something to show as a first project.

It’s about taking it one step at a time really.

smsmith195 · 13 April 2024 14:49

Hey Conrad.

Thanks for sharing all of that. Goes to show how much more I need to learn about understanding automation, especially coming from my background of purely manual testing.

As for the web browser plugins, I’ve made use of SelectorsHub, but I’d be happy to hear about any alternatives that could help me navigate the DOM.

Other than that, I’m more than happy to continue sharing my learning journey as that has basically become my main focus after my recent layoff.

rosie · 14 April 2024 14:37

Do you have an RSS feed for your blog? I was trying to find it so that I can feed it into our software testing news section.

mahatheed · 15 April 2024 10:30

Very nicely written. Great job.

smsmith195 · 16 April 2024 21:53

Hey! That’s something I plan to implement soon. The website still needs some kinks fixed on the backend.
