Maximum email length testing

Anyone got a really really long email domain name resource?

Because the email RFCs do not really limit the length of a valid mailbox, a long address can interfere with the way an email address is displayed. Who knows of a way to get a long domain name? Not a long email alias name; those are easy to make.

I’m talking conrad.braam+somealias@thisisasuperlonginternetdomainnamethatisreallyreallylongbutnotreallyaslongasthemaximumlengththatisstilllegalbutwilldofornow.com

Any pointers, other than to fake network packets and mess with internet proxies in order to fool the SSL network stack?

OK - So I found a cheap way to do this, but I am keen to hear if there are free offerings and cheap ($2.00) ones too. AND I found a small miscommunication area in our product: very few people know what the maximums here are, and some components do follow the RFC, but one component does not. Once I have good answers I will try and share how exactly I broke things.
Trouble is, my intent was to test with a long email name and then start using evil characters, and I’ve not gotten far enough into that to consider my job as a tester really done yet, or worth reporting on.

I did this sort of testing about 15 years ago, so my memory is a little vague, but my recollection is that the length limit for the domain name is specified in the RFC for domain names, not one of the ones for email addresses (there are several). I think it’s 255 characters, but that may include the TLD and subdomains.

A client had a list of thousands of real email addresses from a competition they had run, so I did some analysis. The longest address was 107 characters, but this was a GMail address with a very long name starting with “ihatemyboyfriendbecauseheisacheatingbastard…”, so I expect it didn’t get used much. Seven or eight genuine email addresses were longer than 50 characters, but all contained fewer than 60.

255, including the TLD, yes. Sub-domains are max 63 each, so you would need a handful to hit that length, but it appears that roughly 80 is the upper limit most people use for the domain portion.

In our implementation I’ve not seen any unit test that does a boundary check for that, so I’m still digging, and keen to learn something from it.
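For anyone wanting to add such a boundary check, here is a minimal sketch in Python. The function name `domain_length_ok` is made up for illustration, and the limits are assumptions: 63 per label, and 253 for the dotted text form (the 255 figure is for the DNS wire format, so use whichever limit your product documents). It counts characters, so it only holds for plain ASCII domains.

```python
MAX_LABEL = 63    # per-label limit
MAX_DOMAIN = 253  # assumption: dotted text form, no trailing dot


def domain_length_ok(domain: str) -> bool:
    """Hypothetical helper: True if the domain respects both length limits."""
    labels = domain.split(".")
    return (
        len(domain) <= MAX_DOMAIN
        and all(1 <= len(label) <= MAX_LABEL for label in labels)
    )


def test_label_boundary():
    assert domain_length_ok("a" * 63 + ".com")
    assert not domain_length_ok("a" * 64 + ".com")


def test_total_boundary():
    base = ".".join(["a" * 63] * 3)  # 191 characters
    assert domain_length_ok(base + "." + "b" * 57 + ".com")      # 253 in total
    assert not domain_length_ok(base + "." + "b" * 58 + ".com")  # 254 in total
```

The two tests sit exactly on either side of each limit, which is where an off-by-one in a component would show up.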

In one of my projects, when we encountered such an issue where users wanted the ability to enter longer email addresses, after much R&D, we just put in a 200 character limit on the entire email address.

You probably know this already, but I haven’t seen it stated explicitly:

  • local part (before @) max length is “64 octets” (RFC 5321). Roughly 64 characters, but UTF-8 extensions might change the math here.
  • domain in total is max 253, but a single “label” is max 63. So you need between 5 and 127 levels to reach the total. Wikipedia points to RFC 1034 and RFC 1035 here.

So absolute max seems to be 318 characters (including @).
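To check that arithmetic and get a test value at the same time, here is a rough Python sketch that builds an address sitting exactly on those limits. The function names and the "com" TLD are assumptions for illustration only; the generated domain is not registered anywhere.

```python
MAX_LOCAL = 64    # octets before the @
MAX_LABEL = 63    # octets per domain label
MAX_DOMAIN = 253  # characters in the dotted domain
TLD = "com"       # assumption: any real TLD would do


def longest_domain() -> str:
    """Build a 253-character domain from as few labels as possible."""
    labels = []
    remaining = MAX_DOMAIN - len(TLD) - 1  # room to the left of ".com"
    while remaining > 0:
        size = min(MAX_LABEL, remaining)
        labels.append("a" * size)
        remaining -= size + 1              # the label plus its dot
    return ".".join(labels + [TLD])


def longest_address() -> str:
    """Local part at its limit, then @, then the longest domain."""
    return "a" * MAX_LOCAL + "@" + longest_domain()


print(len(longest_domain()), len(longest_address()))
```

With "com" as the TLD this comes out as five labels (63, 63, 63, 57 and 3 characters), which matches the "5 levels" above. One caveat: RFC 5321 also caps a whole SMTP path at 256 octets including the angle brackets, so an address this long is mainly useful for display and validation tests, not something a mail server is likely to accept.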

Now, as for faking. In general, you can just put anything in email and send that. I am moderately confident that there is no strict requirement in email stack itself that receiver needs the ability to connect to sender domain. This is just a label.

However, modern security/anti-spoofing features, like SPF and DKIM, might intervene and make it impossible to see such email on receiver end.

If this is internal to company, the easiest way might be setting up internal DNS server that will resolve unreasonably long domain name to some other host that is properly configured.

But if you are concerned with displaying email addresses, then it might be easier to insert a fake (but valid) email directly into the data store of the system in question. If the server / client that displays email needs to connect to the sender domain, that alone looks like a problem to me. There might be valid reasons to do that, but the software should still handle a situation where this is temporarily not possible.

That is interesting. I had assumed that, including the domain segment ‘dot’ separators and the @, we were looking at 256, but now it makes sense that if you don’t count the dots, you can get closer to 318. That is a stinker. Glad I asked, although now I really am tempted to read the RFCs, and I had not intended to invest that much into a problem that is not even my team’s remit, since we don’t control how a support technician uses our product.

I should have explained: the question is really around showing the email address to someone as a way to prove identity. We can set GDPR protections aside here because this is a security product; whether an email address is an identity proof is beyond the scope of things, but it’s a security measure and so permitted despite GDPR concerns. The problem as stated is that we don’t need to show the entire email address in cases where it would scroll off-screen on a small phone, so I want to verify how long we can go, and which part of the identity to truncate with an ellipsis when screen space happens to be limited. The invite process limits people to 80 characters, which makes this easier in my case, even if 80 is a bit small. But it also mentions this

Conrad Braam <cb@somedomain.com>

address format, so that’s why I want to come back to the topic at some point and share what worked for me in terms of good test cases for email “display”: not for validation, but for display on small screens.
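As a starting point for those display test cases, here is a rough Python sketch that pulls the address out of the "Name <address>" format and shortens it with an ellipsis. `parseaddr` is from the standard library; `shorten` and `shorten_address` are hypothetical helpers, and the choice to keep roughly a third of the space for the local part and to keep the end of the domain visible is an assumption, not anything a product or RFC prescribes.

```python
from email.utils import parseaddr

ELLIPSIS = "\u2026"  # single-character ellipsis


def shorten(text: str, limit: int) -> str:
    """Middle-truncate text to at most `limit` characters (assumes limit >= 3)."""
    if len(text) <= limit:
        return text
    keep = limit - len(ELLIPSIS)
    head = (keep + 1) // 2
    tail = keep - head
    return text[:head] + ELLIPSIS + text[len(text) - tail:]


def shorten_address(raw: str, limit: int = 30) -> str:
    """Hypothetical display helper (assumes limit >= 8)."""
    _name, address = parseaddr(raw)  # drops a display name such as "Conrad Braam"
    if len(address) <= limit:
        return address
    local, _, domain = address.rpartition("@")
    budget = limit - 1               # one character reserved for the @
    short_local = shorten(local, max(3, budget // 3))
    short_domain = shorten(domain, budget - len(short_local))
    return short_local + "@" + short_domain


print(shorten_address("Conrad Braam <cb@somedomain.com>"))
```

Truncating each side in the middle keeps the @ and the tail of the domain on screen, which is usually the part a person checks when confirming an identity. It counts characters, not rendered width, so it is only a first approximation for a real phone screen.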

I don’t want to discourage you from reading the RFCs, but I read them all and only found them of limited use. They were all written about 30 years ago when no one knew how we would actually use email. As a result, they contain vast amounts of information about features that almost no one uses and may not even be implemented in email servers. You would not believe how long and obscure those documents are.

Also, email server developers implemented features that became de facto standards even though they are not mentioned in the RFCs. For example, the RFCs only require ASCII support, but every email server supports Extended ASCII and some (perhaps most) support Unicode.

It is that gray area of Unicode support, Steve, that is often much more fun to test than we admit: off-by-one errors, and counting chars, not octets, when the underlying data stream stores octets, not characters. Not to mention case sensitivity; it’s an inter-op house of fun.

Vendors will be doing sanity-test input validation for things like SQL injection attacks, and understanding where these intersect with valid input values does require a lot of reading up. But if the input field is 80 octets, not chars, you get 40 or fewer non-ASCII Unicode characters to play with, so it might be easier than I thought to do enough positive test coverage.
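A quick Python sketch of that octets-versus-characters gap, assuming the field stores UTF-8 and the limit is 80 octets. Both are assumptions about the product, and `max_chars` is just an illustrative helper name.

```python
FIELD_OCTETS = 80  # assumption: the field limit is counted in UTF-8 octets


def max_chars(ch: str, max_octets: int = FIELD_OCTETS) -> int:
    """How many copies of `ch` fit into the field when stored as UTF-8."""
    return max_octets // len(ch.encode("utf-8"))


samples = {
    "ASCII letter": "a",          # 1 octet
    "e with acute": "\u00e9",     # 2 octets
    "euro sign": "\u20ac",        # 3 octets
    "emoji": "\U0001F600",        # 4 octets
}

for label, ch in samples.items():
    print(label, len(ch.encode("utf-8")), max_chars(ch))
```

So the boundary moves with the character chosen: 80 for ASCII, 40 for two-octet characters, 26 for three-octet ones and 20 for four-octet ones. A positive test for each width, plus one value that is a single octet over, would cover the off-by-one cases mentioned above.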