Help me avoid manually checking redirections of 100+ URLs (it's a one-off tedious task)

We're setting up a load of redirects in an "interesting" way, and I don't want to check 100+ links manually.

Anyone know of a handy standalone tool, or the easiest/quickest process, that I can use for the following…

  1. get child URLs by crawling a domain (e.g. A.com/stuff[n])
  2. visit each URL in turn (e.g. /stuff[1], /stuff[2], etc.)
  3. record the resulting URL loaded (after redirection) and the final status code (e.g. A.com/stuff[1] loads A.com/thing[1] with a 200 response)

There are quite a few tools I can find for generating site maps or scraping links, but it's not as easy to find something that will ease the checking of the redirection. I've found HEADMasterSEO.com, which looks potentially helpful, but I would love advice.

Oh, and just because I'm fussy… if the tool or method takes longer to install/set up/run than manually going through the list twice (once on staging and once on live), then it's not going to help, as this is a non-repeating activity and I'm trying to save time/brain power :slight_smile:

Thanks

The easiest way is to run cURL against a list of URLs you want to test.

Example: test that http://www.ministryoftesting.com redirects to https://www.ministryoftesting.com with status code 301 Moved Permanently.

curl -I http://www.ministryoftesting.com

This results in the following:

HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 22 Feb 2018 12:20:11 GMT
Content-Type: text/html
Content-Length: 178
Location: https://www.ministryoftesting.com/
Connection: keep-alive

When only checking the status code and redirect target, it's simple to do a quick check:

t1=$(curl -sSI http://www.ministryoftesting.com)
echo "$t1" | grep -c 'HTTP/1.1 301 Moved Permanently'
echo "$t1" | grep -c 'Location: https://www.ministryoftesting.com'

This will simply give you the following:

1
1

All you need to do now is feed it a list of URLs, expected status codes and redirect targets.
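
A minimal sketch of what that could look like, assuming a tab-separated file called urls.tsv with one URL, expected status code and expected redirect target per line (the file name and layout are just placeholders):

#!/usr/bin/env bash
# Read url, expected status code and expected Location target from urls.tsv
# (tab-separated, one check per line) and report PASS/FAIL for each.
while IFS=$'\t' read -r url expected_code expected_location; do
  headers=$(curl -sSI "$url")
  if echo "$headers" | grep -q "HTTP/.* $expected_code" &&
     echo "$headers" | grep -q "Location: $expected_location"; then
    echo "PASS $url"
  else
    echo "FAIL $url"
  fi
done < urls.tsv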

Hmmm that works for your example, but not for my situation.
When I do
curl -I <my_URL>
I get a 200 OK and no Location header.
As I mentioned, the redirects are being done in an "interesting" way :smiley:
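
One thing I haven't ruled out yet is that -I sends a HEAD request, so a plain GET might behave differently. If I do end up scripting it, something like the following should follow any redirect chain and print the final status code plus the URL that actually loads (a sketch, not tried against my setup yet):

curl -sSL -o /dev/null -w '%{http_code} %{url_effective}\n' <my_URL>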

Are we still talking about HTTP redirects or URL proxying/route mapping? Those are two separate things. For the latter I don't have a run-of-the-mill solution.

So the URL www.example.com/path/to/foo-bar translates into www.example.com/foo-bar-special-of-the-day, where there's a map saying /foo-bar-special-of-the-day is actually /path/to/foo-bar, and it works because the route is mapped against the underlying structure?
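
If it helps narrow it down, a quick curl check can tell the two apart (www.example.com here is just the placeholder from above): an HTTP redirect answers with a 3xx status and a redirect URL, while a route mapping serves a 200 straight away and never exposes another URL.

# A 3xx status plus a non-empty redirect URL means an HTTP redirect;
# a 200 with an empty redirect URL points at route mapping / a rewrite.
curl -sS -o /dev/null -w '%{http_code} %{redirect_url}\n' www.example.com/path/to/foo-bar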

I'm not entirely sure - I know it's been set up at the AWS S3 level, as there are limits in place preventing it from being done as a web server redirect.
So this is why HeadmasterSEO is winning so far - I'm approaching this very much as an end user: stick in URLs and get a report back on the final loaded URL and status code.

Oh, gotcha. That's more of a spidering tool, similar to Screaming Frog SEO Spider. We use it (or the command-line tool wget) to generate a list of URLs we can then feed into the little script I sampled earlier. This way we can automate the testing part.
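
If you go the wget route, it's roughly something like this (the crawl depth and domain are just placeholders):

# Spider the site without downloading anything, log every URL visited,
# then pull the unique URLs out of the log into a plain list.
wget --spider --recursive --level=2 --no-verbose --output-file=crawl.log https://www.example.com/
grep -oE 'https?://[^ ]+' crawl.log | sort -u > urls.txt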

Maybe you can use Selenium or JMeter to extract and then check the extracted URLs. I suggest two articles on this topic:
Selenium: https://www.swtestacademy.com/verify-url-responses-selenium/
JMeter: https://www.swtestacademy.com/validate-website-links-jmeter/

    Maybe you can use Selenium or JMeter to extract and then check the extracted URLs. I suggest two articles on this topic.

Thanks for the suggestion. Unfortunately, it was standalone or cloud-based tools I was looking for, as there was zero time for upskilling/setup, etc.

In the end I did the following:

  1. Used a Chrome extension to scrape the links I needed - https://chrome.google.com/webstore/detail/link-klipper-extract-all/fahollcgofmpnehocdgofnhkkchiekoo?hl=en
  2. Saved the scraped links to a text file, removing any duplicates
  3. Installed HeadmasterSEO
  4. Used the "Check URLs (from file)" option with the file I'd created in step 2
  5. Checked the output and exported the redirect report to CSV to save against the task

Bizarrely, I've been asked to do this exact same thing this week by our business users…

I have set up a basic TestNG framework based on something I saw at https://www.swtestacademy.com/data-driven-excel-selenium/. The business provide an Excel sheet (they lurve Excel) and it will run through the sheet checking the URL, the expected return code and the redirect (from the Location header).

I switched to using REST Assured rather than Selenium as it's a bit more suited to this job.

This is now run from our Jenkins every time a content change or category rejig is done. The same spreadsheet is provided to the webmasters.

I've put the code up here: https://github.com/mjblue/redirect-crawler if you wanted something to play with.

The weird synchronicity of the tech world :slight_smile:
Like the sound of your solution - I'll bear it in mind if there's ever the need to do something similar here (there has been talk about it after changes of content titles, etc.).