When you talk about “all the accented characters”, you’re really talking about the Extended ASCII character set, which adds 128 characters on top of the basic ASCII character set. Then there’s Unicode, which adds well over a hundred thousand more characters.
But you’re thinking about this the wrong way. It’s actually an equivalence partitioning issue, and you will miss many test cases if you don’t think of it that way. For instance:
ASCII
Hex values 0 to 7F (or 0 to 127 in decimal) represent the ASCII character set, which is one of the partitions. But not all characters in that set are equivalent. Some are control characters that are not displayed. Some are whitespace characters. Then you have upper case, lower case, numbers and symbols (please don’t ever call them “special characters”).
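To make those partitions concrete, here is a minimal Python sketch that buckets the ASCII range into the groups just described. The bucketing order is one reasonable choice, not the only one; tab, for instance, is both a control character and whitespace.

```python
# Bucket the ASCII range (0x00-0x7F) into the partitions described above.
partitions = {"control": [], "whitespace": [], "upper": [],
              "lower": [], "digit": [], "symbol": []}

for code in range(0x80):
    ch = chr(code)
    if ch.isspace():
        partitions["whitespace"].append(ch)
    elif not ch.isprintable():
        partitions["control"].append(ch)
    elif ch.isupper():
        partitions["upper"].append(ch)
    elif ch.islower():
        partitions["lower"].append(ch)
    elif ch.isdigit():
        partitions["digit"].append(ch)
    else:
        partitions["symbol"].append(ch)

for name, chars in partitions.items():
    print(f"{name}: {len(chars)} characters")
```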
Depending on an input field’s purpose, you may need to test characters from all those partitions. It’s actually more complicated than that because some characters might be allowed from one of the partitions while others are not. For instance, the letter “e” may be allowed in a numeric field. Do you know why? Here’s a clue - the letter “E” would not be allowed.
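If the clue doesn’t click: many numeric parsers accept “e” as the exponent marker in scientific notation. A quick illustration, with the caveat that whether upper-case “E” is also accepted depends entirely on the parser sitting behind the field:

```python
# "e" can be part of a perfectly valid number - it marks the exponent
# in scientific notation.
print(float("2.5e3"))   # 2500.0

# A field that accepts "2.5e3" may still reject "2.5x3", so one character
# from the "letters" partition tells you nothing about the rest of it.
```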
Extended ASCII
Hex values 80 to FF (or 128 to 255 in decimal) represent the Extended ASCII character set. This adds another 128 characters: accented letters, symbols, shapes, Greek letters and more.
In my experience, systems tend to allow all of them or none of them, although that doesn’t have to be the case.
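As a quick way to see what lives in that range, here is a sketch that decodes the upper bytes using Latin-1 (ISO 8859-1), one common extended ASCII mapping; in that mapping, bytes 80 to 9F are yet more control characters, so the printable ones start at A0.

```python
# Bytes 0xA0-0xFF decoded as Latin-1 (ISO 8859-1), one common
# "extended ASCII" mapping.
high_bytes = bytes(range(0xA0, 0x100))
print(high_bytes.decode("latin-1"))

# Windows-1252 reassigns some of 0x80-0x9F to printable characters
# (curly quotes, the euro sign and so on), which is a frequent source
# of mismatches in its own right.
```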
Unicode
This way lies madness. Unicode was created to accommodate all the thousands of characters in languages such as Chinese and Japanese. It has changed a lot over the years, and Wikipedia currently says “Version 16.0 of the standard defines 154,998 characters and 168 scripts used in various ordinary, literary, academic, and technical contexts”.
Again, quoting Wikipedia, “The Unicode Standard defines three encodings: UTF-8, UTF-16, and UTF-32, though several others exist. Of these, UTF-8 is the most widely used by a large margin, in part due to its backwards-compatibility with ASCII.”
In UTF-8, a single character can occupy anywhere from 1 to 4 bytes, and the same sequence of bytes can decode to completely different characters depending on which encoding has been declared.
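Both points are easy to demonstrate in a couple of lines of Python: the same character takes a different number of bytes in UTF-8, and the same bytes mean different things under different declared encodings.

```python
# Characters occupy 1 to 4 bytes in UTF-8.
for ch in ["A", "é", "€", "😀"]:
    print(ch, len(ch.encode("utf-8")), "byte(s)")

# The same bytes decode differently depending on the declared encoding.
data = "café".encode("utf-8")
print(data.decode("utf-8"))    # café
print(data.decode("latin-1"))  # cafÃ© - the classic mojibake
```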
What could possibly go wrong?
The existence of different character sets causes all kinds of fun when data is transferred between the different parts of a system: the database, web server, browser, email server, CRM and ERP systems, and APIs to other systems, including third parties. Some parts of the system may work fine, yet the data gets trashed when it is passed to other parts.
And so it came to pass
We encountered this when testing a ticketing system for a football club in 2011. Two halves of the system were developed separately and joined together at the end. One team knew for sure that everyone was using UTF-8, and the other team knew for sure that everyone was using UTF-7 (an older encoding designed to squeeze Unicode through 7-bit channels such as email, and not byte-compatible with UTF-8).
Both were stubbing out the connection to the other part of the system during development, and all their testing worked perfectly until we came along to do the integration testing. The UTF-8 end still worked fine, but the UTF-7 end corrupted all the data it received.
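For anyone curious what that failure looks like in miniature, here is a rough sketch (the message content is made up): one side emits UTF-8, the other assumes UTF-7, and anything outside plain ASCII goes wrong at the boundary.

```python
# Hypothetical ticket data containing non-ASCII characters.
message = "Köln fans: €45 per seat"
sent = message.encode("utf-8")        # what the UTF-8 side transmits

try:
    received = sent.decode("utf-7")   # what the UTF-7 side believes it got
    print("received:", received)      # mangled, if it decodes at all
except UnicodeDecodeError as exc:
    print("corrupted:", exc)

# Stubs on either side never exercise this boundary, which is why both
# halves passed their own tests and only integration testing caught it.
```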