
Re: Bug report - no validation of URIs, not at even the most basic level

From: Benjamin Niemann <pink@odahoda.de>
Date: Sat, 04 Aug 2007 20:22:24 +0200
To: www-validator@w3.org
Message-ID: <f92g51$g3a$1@sea.gmane.org>

Hello,

Cecil Ward wrote:

> It appears that the validator does not check URI syntax, so syntactically
> incorrect documents are reported as "valid". (Equally true for URIs within
> CSS, as reported elsewhere.)

That's beyond the scope of a markup validator (at least one doing SGML/XML DTD
validation). To the validator, href (and every other attribute containing a
URI) is just an opaque CDATA string: it has no knowledge that href holds a
URI, that title holds free-form text, and so on.
URI validation (and many other checks) is left as an exercise for an '(X)HTML
conformance checker'.
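To illustrate what such a checker might do, here is a rough character-level test against the RFC 3986 syntax. It is a sketch under stated assumptions, not the behaviour of any W3C tool. The function name `looks_like_uri_reference` is hypothetical. It accepts only the characters RFC 3986 allows unencoded, requires every `%` to be followed by two hex digits, and checks the scheme when one is present. It deliberately skips the finer grammar (authority, IPv6 literals, and so on). The empty string passes, since RFC 3986 treats an empty relative reference as a same-document reference.

```python
import re

# Characters RFC 3986 allows to appear literally in a URI reference:
# unreserved, gen-delims, sub-delims, plus '%' for percent-encoding.
_ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*")
# A '%' that is not followed by two hexadecimal digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def looks_like_uri_reference(value: str) -> bool:
    """Rough syntax check of a URI reference (hypothetical helper).

    Checks characters, percent-encoding and the scheme only; it does not
    parse the authority, path or query in detail.
    """
    if not _ALLOWED.fullmatch(value):
        return False
    if _BAD_PERCENT.search(value):
        return False
    # A ':' before the first '/', '?' or '#' must end a valid scheme
    # (a relative reference may not have a ':' in its first segment).
    head = re.split(r"[/?#]", value, maxsplit=1)[0]
    if ":" in head and not _SCHEME.fullmatch(head.split(":", 1)[0]):
        return False
    return True
```

Applied to the test case below, the second href fails on its spaces and on the unencoded "é". A real conformance checker would use the full ABNF rather than this shortcut.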

> Test case follows, tested using "direct input" mode:
> 
> 
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
> "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
> 
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" >
> <head>
> <title xml:lang="la">Credo ergo sum</title>
> 
> <link rel="help" href="      " /> <!-- this @href hardly valid -->

At least in SGML (not sure about XML), whitespace at the beginning and end of
CDATA attribute values is stripped away, so the href ends up as "". Looking
at the URI grammar, I don't think that's a valid URI, though the RFC
describes an empty reference as one that '[...] refers to the start of the
current document.'
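The following is a minimal sketch of what happens to that value. It assumes that leading and trailing whitespace is discarded, as described above. Stripping leaves the empty string, and resolving an empty reference against the document's base URI yields the base URI itself. The helper name `resolve_href` and the base URL are hypothetical. `urllib.parse.urljoin` is Python's standard relative-reference resolver.

```python
from urllib.parse import urljoin


def resolve_href(raw: str, base: str) -> str:
    """Strip surrounding whitespace (assumed behaviour), then resolve against base."""
    value = raw.strip()
    # urljoin returns the base unchanged when the reference is empty.
    return urljoin(base, value)


base = "http://example.org/credo.html"  # hypothetical document URL
print(resolve_href("      ", base))
```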

> </head>
>   <body>
> <p>Je pense, mais je ne suis pas <a href="René Descartes (1637), 'Le
> discours de la méthode'">valide</a>.</p>

That makes the document 'not conforming to the XHTML specification', but, as
mentioned above, this does not affect validation.

>   </body>
> </html>
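This sketch illustrates the separate pass a conformance checker might make after (or instead of) DTD validation. It parses the test case with Python's `xml.etree.ElementTree`, which performs no DTD validation, and flags any href containing whitespace or non-ASCII characters, since neither may appear literally in a URI. The DOCTYPE is omitted, and the function name `suspicious_hrefs` and the deliberately crude test are assumptions for illustration only. Note that an XML parser keeps the six spaces in the link's href instead of trimming them.

```python
import xml.etree.ElementTree as ET

# The test case above, minus the DOCTYPE (nothing here uses the DTD).
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr">
<head>
<title xml:lang="la">Credo ergo sum</title>
<link rel="help" href="      " />
</head>
<body>
<p>Je pense, mais je ne suis pas <a href="René Descartes (1637), 'Le
discours de la méthode'">valide</a>.</p>
</body>
</html>"""


def suspicious_hrefs(markup: str) -> list[tuple[str, str]]:
    """Return (element name, href) pairs whose href cannot be a literal URI."""
    found = []
    for el in ET.fromstring(markup).iter():
        href = el.get("href")
        if href is None:
            continue
        # Whitespace and non-ASCII characters are never allowed unencoded in a URI.
        if any(c.isspace() or ord(c) > 127 for c in href):
            local = el.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
            found.append((local, href))
    return found


for tag, href in suspicious_hrefs(DOC):
    print(f"<{tag}> has a suspicious href: {href!r}")
```

Both hrefs in the test case are reported, even though the document is well-formed and DTD-valid.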


HTH

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/
Received on Saturday, 4 August 2007 18:22:46 GMT
