Help me avoid manually checking redirections of 100+ URLs (it's a one-off tedious task)


(Nicola) #1

We’re setting up a load of redirects in an “interesting” way and I don’t want to check 100+ links manually.

Anyone know of a handy standalone tool or easiest/quickest process that I can use for the following…

  1. get child URLs by crawling a domain (e.g. A.com/stuff[n])
  2. visit each URL in turn (e.g. /stuff[1], /stuff[2], etc.)
  3. record the resulting URL loaded (after redirection) and the final status code (e.g. A.com/stuff[1] loads A.com/thing[1] with a 200 response)

There are quite a few tools I can find for generating site maps or scraping links, but it’s not as easy to find something that will ease the checking of the redirects. I’ve found HEADMasterSEO.com, which looks potentially helpful - but I would love advice.

Oh, and just because I’m fussy… if the tool or method takes longer to install/set up/run than manually going through the list twice (once on staging and once on live), then it’s not going to help, as this is a non-repeating activity and I’m trying to save time/brain power :slight_smile:

Thanks


(Michelangelo van Dam) #2

The easiest way is to run cURL against a list of URLs you want to test.

Example: test that http://www.ministryoftesting.com redirects to https://www.ministryoftesting.com with status code 301 Moved Permanently.

curl -I http://www.ministryoftesting.com

This results in the following:

HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 22 Feb 2018 12:20:11 GMT
Content-Type: text/html
Content-Length: 178
Location: https://www.ministryoftesting.com/
Connection: keep-alive

When only checking the status code and redirect location, it’s simple to do a quick check:

# Fetch only the response headers
t1=$(curl -sSI http://www.ministryoftesting.com)
# Each grep prints 1 if the expected header is present, 0 otherwise
echo "$t1" | grep -c 'HTTP/1.1 301 Moved Permanently'
echo "$t1" | grep -c 'Location: https://www.ministryoftesting.com'

This will simply give you the following:

1
1

All you need to do now is feed it a list of URLs, expected status codes and redirect targets.
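
To give an idea of what that could look like, here is a minimal sketch that loops over a CSV file. The file name redirects.csv and its source URL / expected status / expected Location format are just assumptions for illustration, not something from this thread:

# Minimal sketch, assuming redirects.csv contains lines like:
# http://www.example.com/stuff1,301,https://www.example.com/thing1
while IFS=',' read -r url expected_status expected_location; do
  # Fetch only the response headers for this URL
  headers=$(curl -sSI "$url")
  # First line looks like "HTTP/1.1 301 Moved Permanently" - take the code
  status=$(echo "$headers" | head -n 1 | awk '{print $2}')
  # Pull the redirect target (if any) and strip the trailing carriage return
  location=$(echo "$headers" | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')
  if [ "$status" = "$expected_status" ] && [ "$location" = "$expected_location" ]; then
    echo "PASS: $url -> $location ($status)"
  else
    echo "FAIL: $url (got $status $location, expected $expected_status $expected_location)"
  fi
done < redirects.csv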


(Nicola) #3

Hmmm that works for your example, but not for my situation.
When I do
curl -I <my_URL>
I get a 200 OK and no alternative location.
As I mentioned, the redirects are being done in an “Interesting” way :smiley:


(Michelangelo van Dam) #4

Are we still talking about HTTP redirects or URL proxying/route mapping? Those are two separate things. For the latter I don’t have a run-of-the-mill solution.

So the URL www.example.com/path/to/foo-bar translates into www.example.com/foo-bar-special-of-the-day, where a mapping says that /foo-bar-special-of-the-day is actually /path/to/foo-bar, and it works because the route is mapped against the underlying structure?


(Nicola) #5

I’m not entirely sure - I know it’s been set up at the AWS S3 level, as there are limits in place preventing it from being done as a web server redirect.
So this is why HeadmasterSEO is winning so far - I’m approaching this very much as an end user: sticking in URLs and getting a report back on the final loaded URL and status code.


(Michelangelo van Dam) #6

Oh, gotcha. That’s more of a spidering tool, similar to Screaming Frog SEO Spider. We use it (or the command-line tool wget) to generate a list of URLs we can then feed into the little script I sampled earlier. This way we can automate the testing part.
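
For the wget route, a rough sketch of generating that URL list could look like this (the domain and file names are just placeholders, and the grep simply pulls anything URL-shaped out of wget’s log):

# Spider the site recursively without downloading page bodies, logging to a file
wget --spider --recursive --output-file=crawl.log https://www.example.com/
# Extract the URLs from the log and de-duplicate them into a list
grep -oE 'https?://[^ ]+' crawl.log | sort -u > urls.txt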


(Onur) #7

Maybe you can use Selenium or JMeter to extract the URLs and then check them. I suggest two articles on this topic:
Selenium: https://www.swtestacademy.com/verify-url-responses-selenium/
JMeter: https://www.swtestacademy.com/validate-website-links-jmeter/


(Nicola) #8
    Maybe you can use Selenium or JMeter to extract the URLs and then check them. I suggest two articles on this topic.

Thanks for the suggestion. Unfortunately, it was standalone or cloud-based tools I was looking for, as there was zero time for upskilling/setup etc.

In the end I did the following:

  1. Used a Chrome extension to scrape the links I needed - https://chrome.google.com/webstore/detail/link-klipper-extract-all/fahollcgofmpnehocdgofnhkkchiekoo?hl=en
  2. Saved the scraped links to a text file, removing any duplicates (e.g. with a one-liner like the one after this list)
  3. Installed HeadmasterSEO
  4. Used the “Check URLs (from file)” option with the file I’d created in step 2
  5. Checked the output & exported its redirect report to CSV to save against the task
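
A quick way to do the de-duplication in step 2 is something like the following (the file names are just placeholders):

# Sort the scraped links and keep one copy of each unique URL
sort -u scraped_links.txt > unique_links.txt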

(Mark Jones) #9

Bizarrely I’ve been asked to do this exact same thing this week by our business users…

I have set up a basic TestNG framework based on something I saw at https://www.swtestacademy.com/data-driven-excel-selenium/. The business provide an Excel sheet (they lurve Excel) and this will run through the sheet checking the URL, the expected return code and the redirect (from the Location header).

I switched to using REST Assured rather than Selenium as it’s a bit more suited to this job.

This is now run from our Jenkins every time a content change or category rejig is done. The same spreadsheet is provided to the webmasters.

I’ve put the code up here: https://github.com/mjblue/redirect-crawler if you want something to play with.


(Nicola) #10

The weird synchronicity of the tech world :slight_smile:
Like the sound of your solution - will bear it in mind if there’s ever the need to do something similar here (there has been talk of it after changes to content titles etc.)