Help me avoid manually checking redirections of 100+ URLs (it's a one-off tedious task)


(Nicola) #1

We’re setting up a load of redirects in an “interesting” way and I don’t want to check 100+ links manually.

Anyone know of a handy standalone tool or easiest/quickest process that I can use for the following…

  1. get child URLs by crawling a domain (e.g. A.com/stuff[n])
  2. visit each URL in turn (e.g. /stuff[1], /stuff[2], etc.)
  3. record the resulting URL loaded (after redirection) and the final status code (e.g. A.com/stuff[1] loads A.com/thing[1] with a 200 response)

There are quite a few tools I can find for generating site maps or scraping links, but it’s not as easy to find something that will ease the checking of the redirects. I’ve found HEADMasterSEO.com, which looks potentially helpful - but I would love advice.

Oh, and just because I’m fussy… if the tool or method takes longer to install/set up/run than manually going through the list twice (once on staging and once on live), then it’s not going to help, as this is a non-repeating activity and I’m trying to save time/brain power :slight_smile:

Thanks


(Michelangelo van Dam) #2

The easiest way is to run cURL against a list of URLs you want to test.

Example: test that http://www.ministryoftesting.com redirects to https://www.ministryoftesting.com with status code 301 Moved Permanently.

curl -I http://www.ministryoftesting.com

This results in the following:

HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 22 Feb 2018 12:20:11 GMT
Content-Type: text/html
Content-Length: 178
Location: https://www.ministryoftesting.com/
Connection: keep-alive

When only checking the status code and redirect location, it’s simple to do a quick check:

# Fetch only the response headers
t1=$(curl -sSI http://www.ministryoftesting.com)
# Each grep prints 1 if the expected header is present, 0 otherwise
echo "$t1" | grep -c 'HTTP/1.1 301 Moved Permanently'
echo "$t1" | grep -c 'Location: https://www.ministryoftesting.com'

This will simply give you the following:

1
1

All you need to do now is feed it a list of URLs, expected status codes and redirect targets.
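
To give an idea of what that could look like, here is a minimal sketch that loops over a CSV file. The file name redirects.csv and its source URL / expected status / expected Location format are just assumptions for illustration, not something from this thread:

# Minimal sketch, assuming redirects.csv contains lines like:
# http://www.example.com/stuff1,301,https://www.example.com/thing1
while IFS=',' read -r url expected_status expected_location; do
  # Fetch only the response headers for this URL
  headers=$(curl -sSI "$url")
  # First line looks like "HTTP/1.1 301 Moved Permanently" - take the code
  status=$(echo "$headers" | head -n 1 | awk '{print $2}')
  # Pull the redirect target (if any) and strip the trailing carriage return
  location=$(echo "$headers" | grep -i '^Location:' | awk '{print $2}' | tr -d '\r')
  if [ "$status" = "$expected_status" ] && [ "$location" = "$expected_location" ]; then
    echo "PASS: $url -> $location ($status)"
  else
    echo "FAIL: $url (got $status $location, expected $expected_status $expected_location)"
  fi
done < redirects.csv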


(Nicola) #3

Hmmm that works for your example, but not for my situation.
When I do
curl -I <my_URL>
I get a 200 OK and no alternative location.
As I mentioned, the redirects are being done in an “Interesting” way :smiley:


(Michelangelo van Dam) #4

Are we still talking about HTTP redirects or URL proxying/route mapping? Those are two separate things. For the latter I don’t have a run-of-the-mill solution.

So the URL www.example.com/path/to/foo-bar translates into www.example.com/foo-bar-special-of-the-day, where a mapping says that /foo-bar-special-of-the-day is actually /path/to/foo-bar, and it works because the route is mapped against the underlying structure?


(Nicola) #5

I’m not entirely sure - I know it’s been set up at the AWS S3 level, as there are limits in place preventing it from being done as a web server redirect.
So this is why HeadmasterSEO is winning so far - I’m approaching this very much as an end user: sticking in URLs and getting a report back on the final loaded URL and status code.


(Michelangelo van Dam) #6

Oh, gotcha. That’s more of a spidering tool, similar to Screaming Frog SEO Spider. We use it (or the command-line tool wget) to generate a list of URLs we can then feed into the little script I sampled earlier. This way we can automate the testing part.
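
For the wget route, a rough sketch of generating that URL list could look like this (the domain and file names are just placeholders, and the grep simply pulls anything URL-shaped out of wget’s log):

# Spider the site recursively without downloading page bodies, logging to a file
wget --spider --recursive --output-file=crawl.log https://www.example.com/
# Extract the URLs from the log and de-duplicate them into a list
grep -oE 'https?://[^ ]+' crawl.log | sort -u > urls.txt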


(Onur) #7

Maybe you can use Selenium or JMeter to extract the URLs and then check them. I suggest two articles on this topic:
Selenium: https://www.swtestacademy.com/verify-url-responses-selenium/
JMeter: https://www.swtestacademy.com/validate-website-links-jmeter/


(Nicola) #8
    Maybe you can use Selenium or JMeter to extract the URLs and then check them. I suggest two articles on this topic.

Thanks for the suggestion. Unfortunately, it was standalone or cloud-based tools I was looking for, as there was zero time for upskilling/setup etc.

In the end I did the following:

  1. Used a Chrome extension to scrape the links I needed - https://chrome.google.com/webstore/detail/link-klipper-extract-all/fahollcgofmpnehocdgofnhkkchiekoo?hl=en
  2. Saved the scraped links to a text file, removing any duplicates (e.g. with a one-liner like the one after this list)
  3. Installed HeadmasterSEO
  4. Used the “Check URLs (from file)” option with the file I’d created in step 2
  5. Checked the output & exported its redirect report to CSV to save against the task
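
A quick way to do the de-duplication in step 2 is something like the following (the file names are just placeholders):

# Sort the scraped links and keep one copy of each unique URL
sort -u scraped_links.txt > unique_links.txt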

(Mark Jones) #9

Bizarrely I’ve been asked to do this exact same thing this week by our business users…

I have set up a basic TestNG framework based on something I saw at https://www.swtestacademy.com/data-driven-excel-selenium/. The business provide an Excel sheet (they lurve Excel) and this will run through the sheet checking the URL, the expected return code and the redirect (from the Location header).

I switched to using REST Assured rather than Selenium as it’s a bit more suited to this job.

This is now run from our Jenkins every time a content change or category rejig is done. The same spreadsheet is provided to the webmasters.

I’ve put the code up here: https://github.com/mjblue/redirect-crawler if you want something to play with.


(Nicola) #10

The weird synchronicity of the tech world :slight_smile:
Like the sound of your solution - will bear it in mind if there’s ever the need to do something similar here (there has been talk of it after changes to content titles etc.)