My exploratory testing training course includes a section on URL testing, and I include it to some extent in every project. Sorry, this is going to be another long post.
I find that most testers just think of a URL as being a single thing, but it is actually made up of many parts that must be considered separately:
- The protocol. It’s usually https these days, so you may want to see what happens if you change it to http or ftp.
- Subdomains. There may be none, one or many.
- Domain name. There’s not usually much point changing this because you will just get a different website or a 404 error.
- Top-level domain. Again, there’s not much point changing this.
- Folders. There may be none, one or many.
- File name.
- File extension.
- Parameters.
- Fragment identifier. This is not likely to be interesting, but you never know.
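To make those parts concrete, here is a small Python sketch using urllib.parse on an invented URL. Note that urlparse does not split the hostname into subdomain, domain and top-level domain for you - it returns the whole hostname - so that split is only annotated in the comments.

```python
from urllib.parse import urlparse, parse_qs

# Invented URL containing every part listed above.
url = "https://shop.example.co.uk/catalogue/widgets/list.aspx?page=2&size=20#reviews"

parts = urlparse(url)
print(parts.scheme)           # protocol: 'https'
print(parts.hostname)         # subdomain + domain + top-level domain: 'shop.example.co.uk'
print(parts.path)             # folders, file name and extension: '/catalogue/widgets/list.aspx'
print(parse_qs(parts.query))  # parameters: {'page': ['2'], 'size': ['20']}
print(parts.fragment)         # fragment identifier: 'reviews'
```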
You can modify all the different parts of the URL and see what happens. Try changing the capitalisation - this should have no effect on Windows servers, but it may do on Unix and Linux servers. On at least one occasion I found different versions of a page that could be accessed by changing the capitalisation of the filename.
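If you want to try the capitalisation trick quickly, a sketch like this (using the requests library and an invented URL) requests case-altered variants of the path and compares the responses.

```python
import requests
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL - substitute a page from the site under test.
url = "https://www.example.com/Products/Overview.html"

scheme, host, path, query, fragment = urlsplit(url)
for variant in (path, path.lower(), path.upper()):
    candidate = urlunsplit((scheme, host, variant, query, fragment))
    resp = requests.get(candidate, allow_redirects=False)
    # The same response for all variants suggests a case-insensitive server (often IIS on Windows);
    # 404s for the altered variants suggest a case-sensitive one (typically Unix/Linux).
    print(resp.status_code, len(resp.content), candidate)
```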
Introducing errors
If a URL is incorrect, novice testers might expect to get a 404 error regardless of which part is wrong. However, it's not unusual to get several different types of error: one is the custom 404 page, the others are unhandled error conditions.
On one project, errors could be introduced into one URL that resulted in six different error conditions, only one of which was handled correctly. The unhandled errors leaked a huge amount of information that may be useful to an attacker. We even got different errors depending on whether there was a typo in the first or second folder name.
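A systematic way to explore this is to corrupt one path segment at a time and record what comes back. This is only a sketch with an invented URL - in a real test you would also save the response bodies to see exactly what each error leaks.

```python
import requests
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL with two folders and a file name.
url = "https://www.example.com/catalogue/widgets/list.aspx"

scheme, host, path, query, fragment = urlsplit(url)
segments = path.strip("/").split("/")

for i in range(len(segments)):
    broken = segments.copy()
    broken[i] = broken[i] + "x"   # introduce a typo into one segment only
    candidate = urlunsplit((scheme, host, "/" + "/".join(broken), query, fragment))
    resp = requests.get(candidate)
    # Different status codes or wildly different response sizes for different
    # segments hint that not every failure path is handled the same way.
    print(resp.status_code, len(resp.content), candidate)
```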
Accessing prohibited content
Every webserver should be configured to prohibit directory listing, i.e. viewing the contents of a folder directly, but some still are not. If you can do this, you can often get at lots of other interesting files and folders that cannot usually be accessed via the website.
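A rough way to check is to request each folder in a known URL directly and look for signs of a listing. "Index of" is the heading Apache's default listing uses; other servers word it differently, so this heuristic (shown here with an invented URL) will miss some cases.

```python
import requests
from urllib.parse import urlsplit, urlunsplit

# Hypothetical deep URL - the folders come from a page you already know about.
url = "https://www.example.com/assets/reports/2024/summary.pdf"

scheme, host, path, _, _ = urlsplit(url)
segments = path.strip("/").split("/")[:-1]   # drop the file name, keep the folders

for depth in range(1, len(segments) + 1):
    folder = "/" + "/".join(segments[:depth]) + "/"
    resp = requests.get(urlunsplit((scheme, host, folder, "", "")))
    listing = "Index of" in resp.text or "Directory listing" in resp.text
    print(resp.status_code, "possible listing" if listing else "", folder)
```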
On one project, directory listing revealed that there were three different versions of the website we had been engaged to test. Most of the content was the same, but the pricing was different. Apparently, the website owner offered preferential pricing to certain organisations or sectors, but did not want other clients to know. In one of the folders, we found the notes from our client’s internal sales meetings, which were “interesting”.
Accessing unpublished content
When you use a website mapping tool such as Xenu’s Link Sleuth to find all the URLs on a website, it’s common for it to find URLs ending in /node/[unique integer]. These URLs are used internally by CMSs but they can leak into the front-end code, usually in tags such as <link rel="next"> in the <head>.
The integers are almost always consecutive, so it is trivially easy to view every page that has been created, including those that have not yet been published. Just look for URLs containing the integers your mapping tool didn’t find.
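A short script can probe those gaps for you. This sketch assumes a Drupal-style /node/<id> scheme and made-up numbers; the set of known ids would come from your mapping tool.

```python
import requests

# Hypothetical site and node ids - the 'found' set would come from your mapping tool.
base = "https://www.example.com/node/{}"
found = {101, 102, 105, 108}

for node_id in range(min(found), max(found) + 1):
    if node_id in found:
        continue   # already discovered by the crawler
    resp = requests.get(base.format(node_id), allow_redirects=False)
    # A 200 on an id the crawler never saw is worth a look; a 403 often means
    # the page exists but has not yet been published.
    print(node_id, resp.status_code)
```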
On another occasion, I was thinking of signing up for a very expensive training course that was due to launch in a month’s time. While using directory listings to poke around the company’s website, I found a folder containing all the course material, which had not yet been published and which I had not paid for.
Editing parameters
This technique is as old as the web - James Whittaker described it in one of his books more than 20 years ago. In his case he was able to change the price for something he was buying because it appeared in the URL. This is rarely possible now, but you can often change other parameters.
One of my favourites is to change the number of results per page on search results pages. On one occasion I crashed the webserver by requesting a page containing 100,000 results. The webserver should check that the parameter value matches one of the options available for users to select, but it often doesn’t.
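As a sketch of the idea - the parameter name pagesize and the URL are invented, so read the real names from the links behind the site's own "results per page" control - you can step the value well past anything the UI offers and watch what happens.

```python
import requests

# Hypothetical search URL and parameter name - copy the real ones from the site.
search = "https://www.example.com/search"

for size in (10, 50, 100, 1000, 10000):
    resp = requests.get(search, params={"q": "widgets", "pagesize": size}, timeout=60)
    # Rising response times and 5xx errors suggest the server is not validating
    # the value against the options it actually offers.
    print(size, resp.status_code, round(resp.elapsed.total_seconds(), 2), len(resp.content))
```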
I have even been able to guess parameters that might be supported, although this is rarely successful because you need to guess both the parameter name and a permissible value. I don’t think I have managed to compromise a system this way, but I have found useful functionality, such as adding a sort order to search results pages that don’t offer that option. This is a reminder that code libraries often contain functions that the website is not using.
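Guessing works the same way: pick a few plausible names and values and compare each response against the unmodified page. Every name below is a guess, which is the whole point, and the comparison is naive - real pages often differ slightly between identical requests, so you still need to eyeball the results.

```python
import requests

search = "https://www.example.com/search"   # hypothetical URL
baseline = requests.get(search, params={"q": "widgets"}).text

guesses = {"sort": "price", "order": "asc", "orderby": "date", "debug": "true"}
for name, value in guesses.items():
    resp = requests.get(search, params={"q": "widgets", name: value})
    changed = resp.text != baseline
    # A changed response means the parameter did something, even though no
    # control on the page exposes it.
    print(name, resp.status_code, "response changed" if changed else "no visible effect")
```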
API testing
This is a whole separate topic that I don’t have time to go into, other than to say that I use cURL to do things that you might not be able to do with more advanced tools like Postman. An example would be introducing syntax errors - one of the advantages of the more advanced tools is that they ensure your requests are syntactically correct, at least they did when I last used them, but that very feature gets in the way when you deliberately want to send incorrect syntax. Maybe those tools can do that now, but it is important to test it either way.
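The same idea in Python looks like this: build the request body by hand so you can break it deliberately, something a tool that serialises JSON for you will quietly prevent. The endpoint is invented; with cURL you would simply pass the broken body as the request data.

```python
import requests

# Hypothetical API endpoint.
endpoint = "https://api.example.com/v1/orders"

valid = '{"item": "widget", "quantity": 2}'
broken = '{"item": "widget", "quantity": 2'   # missing closing brace - deliberately invalid JSON

for body in (valid, broken):
    resp = requests.post(endpoint, data=body, headers={"Content-Type": "application/json"})
    # A well-behaved API returns 400 with a clear message for the broken body;
    # stack traces or 500s here are findings in their own right.
    print(resp.status_code, resp.text[:200])
```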