Help me avoid manually checking redirections of 100+ URLs (it's a one-off tedious task)

(Nicola) #1

We’re setting up a load of redirects in an “interesting” way and I don’t want to check 100+ links manually.

Anyone know of a handy standalone tool or easiest/quickest process that I can use for the following…

  1. get child URLs by crawling a domain (e.g. A .com/stuff[n])
  2. visit each URL in turn (e.g. /stuff[1], /stuff[2], etc.)
  3. record the resulting URL loaded (after redirection) and final status code (e.g. A .com/stuff[1] loads A .com/thing[1] with 200 response)

There are quite a few tools for generating site maps or scraping links, but it’s not as easy to find anything that eases the checking of the redirection. I’ve found which looks potentially helpful - but would love advice.

Oh and just because I’m fussy… if the tool or method takes longer to install/setup/run than manually going through the list twice (once on staging & once on live) then it’s not going to help as this is a non-repeating activity and I’m trying to save time/brain power :slight_smile:


(Michelangelo van Dam) #2

The easiest way is to run cURL against a list of the URLs you want to test.

Example: test redirects to with status code 301 Moved Permanently.

curl -I

Results in the following

HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 22 Feb 2018 12:20:11 GMT
Content-Type: text/html
Content-Length: 178
Connection: keep-alive

When only checking status code and redirect link, it’s now simple to do a quick check:

t1=$(curl -sSI
echo $t1 | grep -c 'HTTP/1.1 301 Moved Permanently'
echo $t1 | grep -c 'Location:'

this will give you simply the following:


All you need to do now is feed it a list of URLs, each with its expected status code and redirect target.
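
A minimal sketch of that loop, assuming a hypothetical list file with one "<url> <expected_status> <expected_target>" entry per line, separated by spaces. The function names and file layout are made up for illustration. %{http_code} and %{redirect_url} are standard curl write-out variables. Leave the target empty for URLs that should not redirect.

```shell
# Sketch only: check a list of redirects against expected status and target.
# Assumed (hypothetical) input format: "<url> <expected_status> <expected_target>"

# Print "<status> <redirect_target>" for one URL, without following the redirect.
fetch_redirect() {
  curl -sS -o /dev/null -I -w '%{http_code} %{redirect_url}' "$1"
}

# Compare actual against expected; print a PASS/FAIL line, return 1 on FAIL.
check_redirect() {
  local url=$1 want_status=$2 want_target=$3
  local got got_status got_target
  got=$(fetch_redirect "$url")
  got_status=${got%% *}   # text before the first space
  got_target=${got#* }    # text after the first space (empty if no redirect)
  if [ "$got_status" = "$want_status" ] && [ "$got_target" = "$want_target" ]; then
    echo "PASS $url -> $got_target ($got_status)"
  else
    echo "FAIL $url: expected $want_status $want_target, got $got_status $got_target"
    return 1
  fi
}

# Check every line of the list file; succeeds only if no line failed.
check_list() {
  local failures=0 url status target
  while read -r url status target; do
    [ -z "$url" ] && continue
    check_redirect "$url" "$status" "$target" || failures=$((failures + 1))
  done < "$1"
  [ "$failures" -eq 0 ]
}
```

After sourcing the functions, something like check_list redirects.txt would print one PASS or FAIL line per URL. Keeping the curl call in its own function means it can be swapped for a stub when trying the comparison logic offline.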

(Nicola) #3

Hmmm that works for your example, but not for my situation.
When I do
curl -I <my_URL>
I get a 200 OK and no alternative location.
As I mentioned, the redirects are being done in an “Interesting” way :smiley:

(Michelangelo van Dam) #4

Are we still talking about HTTP redirects or URL proxying/Route mapping? Those are two separate things. For the latter I don’t have a run-of-the-mill solution.

So the url translates into where there’s a map where /foo-bar-special-of-the-day is actually /path/to/foo-bar, but works because the route is mapped against the underlying structure?

(Nicola) #5

I’m not entirely sure - I know it’s been set up at the AWS S3 level as there are limits in place preventing it being done as a web server redirect.
So this is why HeadmasterSEO is winning so far - as I’m approaching this very much as an end user sticking in URLs, and get a report back on final loaded URL and status code
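
For comparison, a rough command-line sketch of the same kind of report (original URL, final status, final URL), assuming a hypothetical urls.txt with one URL per line. It only follows HTTP-level redirects (a 3xx response with a Location header). curl does not execute JavaScript or act on a meta refresh, so it may well show nothing for redirects set up another way. The function names are made up for illustration.

```shell
# Sketch only: follow HTTP redirects and report where each URL ends up.

# Print "<final_status>,<final_url>" for one URL after following redirects.
final_destination() {
  curl -sS -L -o /dev/null -w '%{http_code},%{url_effective}' "$1"
}

# Emit a CSV report: original URL, final status, final URL.
# Note: URLs containing commas would need quoting for strict CSV.
redirect_report() {
  local url
  echo "original_url,final_status,final_url"
  while read -r url; do
    [ -z "$url" ] && continue
    echo "$url,$(final_destination "$url")"
  done < "$1"
}
```

Usage would be along the lines of redirect_report urls.txt > report.csv, run once against staging and once against live.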

(Michelangelo van Dam) #6

Oh, gotcha. That’s more of a spidering tool, similar to Screaming Frog SEO Spider. We use it (or the command-line tool wget) to generate a list of URLs that we can then feed into the little script I sampled earlier. This way we can automate the testing part.
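
A sketch of the wget route, under the assumption that pulling every http(s) URL out of wget's log is good enough for building the list. The function names and the crawl depth of 2 are arbitrary choices for illustration. The exact log format varies between wget versions, so the extraction just greps for anything URL-shaped.

```shell
# Sketch only: crawl a site with wget and list the URLs it visited.

# Pull every distinct http(s) URL out of a wget log read from stdin.
extract_urls() {
  grep -oE 'https?://[^ "]+' | sort -u
}

# Crawl without saving pages (--spider), log quietly to a temp file,
# then print the unique URLs found in that log.
crawl_urls() {
  local log
  log=$(mktemp)
  wget --spider --recursive --level=2 --no-verbose --output-file="$log" "$1"
  extract_urls < "$log"
  rm -f "$log"
}
```

Something like crawl_urls followed by the site's start URL, redirected into a text file, would then give a list to feed into the checking script.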

(Onur) #7

Maybe you can use Selenium or JMeter to extract the URLs and then check the extracted URLs. I suggest two articles on this topic.

(Nicola) #8
    Maybe you can use Selenium or JMeter to extract the URLs and then check the extracted URLs. I suggest two articles on this topic.

Thanks for the suggestion. Unfortunately it was standalone or cloud based tools I was looking for as there was zero time for upskilling/setup etc.

In the end I did the following:

  1. Used a Chrome extension to scrape the links I needed -
  2. Saved scraped links to a text file removing any duplicates
  3. Installed HeadmasterSEO
  4. Used the “Check URLs (from file)” option using the file I’d created in step 2
  5. Checked the output and exported the redirect report to CSV to save against the task
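
For anyone doing step 2 on the command line, a small sketch of the clean-up, assuming the scraped links are in a plain text file with one link per line. The function name is made up. It strips Windows line endings and stray whitespace, drops blank lines, and removes duplicates while keeping the original order.

```shell
# Sketch only: tidy and de-duplicate a list of scraped links read from stdin.
dedupe_links() {
  tr -d '\r' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    awk 'NF && !seen[$0]++'   # keep non-blank lines the first time they appear
}
```

For example, dedupe_links < scraped.txt > links.txt, where both file names are placeholders.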

(Mark Jones) #9

Bizarrely I’ve been asked to do this exact same thing this week by our business users…

I have set up a basic TestNG framework based on something I saw at
The business provide an Excel sheet (they lurve Excel) and the framework runs through the sheet checking the URL, the expected return code and the redirect (from the Location header).

I switched to using REST Assured rather than Selenium as it’s a bit more suited to this job.

This is now run from our Jenkins every time a content change or category rejig is done. The same spreadsheet is provided to the webmasters.

I’ve put the code up here: if you wanted something to play with

(Nicola) #10

The weird synchronicity of the tech world :slight_smile:
Like the sound of your solution - will bear it in mind if there is ever the need to do similar here (there has been talk about it after changing of content titles etc.)