IRI validation

A first-draft working version of IRI validation is here:
 https://homepages.cwi.nl/~steven/forms/tests/iri.xhtml

It only accepts absolute web addresses. The regexp is 474 characters long.

I have taken a few liberties with regard to RFC 3987:

- the set of UCS characters above ASCII that it accepts is slightly looser
than the RFC allows.
- because in the RFC an IPv4 address is matched in two ways (as an IPv4
address and as a registered name), no checking is done that the octets are
between 0 and 255, or that there are exactly 4 of them.
- an IPv6 address is recognised, but not checked beyond the characters it
may contain. So it will not see that 2001::85a3:8d3:1319::370:7348 or
2001:db8:85a3:8d3:1319:8a2e:370:7348:cefa or
20011:db8:85a3:8d3:1319:8a2e:370:7348 are invalid.
- it doesn't currently accept characters from the private-use areas in  
queries.
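
For anyone who wants to compare against stricter address checks, here is a
minimal sketch in Python. It is not the regexp used in the form. The names
DEC_OCTET, IPV4_STRICT, is_strict_ipv4 and is_valid_ipv6 are made up for
illustration, the IPv4 pattern follows the dec-octet rule of RFC 3986, and
the IPv6 check leans on Python's standard ipaddress module as a stand-in
for the RFC grammar (it takes the bare address, without the square
brackets used in a web address).

```python
import ipaddress
import re

# Hypothetical helper names; this is an illustration, not the form's regexp.
# One octet per the RFC 3986 dec-octet rule: 250-255, 200-249, 100-199, 0-99,
# with no leading zeros.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Exactly four octets separated by dots.
IPV4_STRICT = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")


def is_strict_ipv4(text: str) -> bool:
    """True only for four dot-separated octets, each in the range 0-255."""
    return IPV4_STRICT.fullmatch(text) is not None


def is_valid_ipv6(text: str) -> bool:
    """True if Python's ipaddress module accepts text as an IPv6 address."""
    try:
        ipaddress.IPv6Address(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in (
        "2001:db8:85a3:8d3:1319:8a2e:370:7348",
        "2001::85a3:8d3:1319::370:7348",
        "2001:db8:85a3:8d3:1319:8a2e:370:7348:cefa",
        "20011:db8:85a3:8d3:1319:8a2e:370:7348",
    ):
        print(candidate, is_valid_ipv6(candidate))
```

With these helpers, the three invalid IPv6 examples above are rejected,
and so are IPv4 forms such as 256.1.1.1 or 1.2.3.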

On the other hand, I have made one thing stricter:

- the definition of a registered host is the same as the one we accept for
email.

The input fields are live and incremental, so please play with them, and
see if you spot things that really ought to be better.

Comments gladly received.

Steven

Received on Wednesday, 14 March 2018 19:51:23 UTC