Re: ID Characters (was: Re: 3.4. Global attributes) from Henri Sivonen on 2007-08-01 (public-html@w3.org from August 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 1 Aug 2007 09:59:30 +0300
To: Jason White <jason@jasonjgw.net>
Cc: public-html@w3.org
Message-Id: <C612BF9A-8801-4962-9263-A2A6C02A7B5D@iki.fi>

On Aug 1, 2007, at 05:05, Jason White wrote:

> On Wed, Aug 01, 2007 at 09:36:37AM +0900, Karl Dubost wrote:
>> (here I would add a reference to
>> http://www.w3.org/TR/xml-id/#id-avn )
>
> A normative reference?

I think it is worth pointing out that an xml:id Processor, per spec,  
can perform ID assignment even if the value is not an NCName. That  
is, even the xml:id spec implicitly concedes that it is possible to  
continue processing with a value that isn't an NCName.

"The xml:id processor performs ID type assignment on all xml:id  
attributes, even those that do not satisfy the constraints."

> Also, for XML compatibility, shouldn't the "author" section of the
> proposal require that the id satisfy the XML id syntax?

Why would such "XML compatibility" be needed?

When DTDless XHTML5 is parsed using an XML parser, the id attribute
doesn't have the type ID. It has the type CDATA, so the XML Processor
(a.k.a. parser) does not apply the ID constraints of the XML 1.0 spec
to it.

If you wish to expose XHTML5 id attributes as IDs to XML tools, you  
need an intermediate ID assignment processing stage analogous to an  
xml:id Processor. I propose calling it an "XHTML id Processor". This  
is what I use to make the XPath id() function work with XHTML5.
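
As a rough illustration of such an ID assignment stage (a sketch in
Python, not the implementation referred to above; the function name
assign_ids and the rule that the first occurrence wins on duplicate
values are assumptions made for the example):

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def assign_ids(root):
    """Map each id value to the first XHTML element carrying it.

    No Name/NCName check is made: values are kept as plain strings.
    """
    ids = {}
    for el in root.iter():
        # Only elements in the XHTML namespace are considered.
        if not el.tag.startswith("{" + XHTML_NS + "}"):
            continue
        value = el.get("id")  # the no-namespace attribute named "id"
        if value is not None and value not in ids:
            ids[value] = el  # assumption: first occurrence wins
    return ids

# Neither "1st" nor "note:2" is an NCName, yet a DTDless parse accepts both.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p id="1st">a</p><p id="note:2">b</p></body></html>'
)
ids = assign_ids(doc)
print(ids["1st"].text)
```

Later lookups are then dictionary lookups, that is, plain string
equality on the value.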

As far as I can tell, the Name and NCName restrictions of XML 1.0 and
xml:id/XSD are completely arbitrary, and real software that processes
XML through an XML API tends to just check IDs for string equality
instead of rechecking the Name/NCName matching at each stage after the
initial parse.

Do you have examples of real software that breaks if non-NCName or  
non-Name (whitespaceless) IDs enter into an XML processing pipeline?

(If an author chooses to use a DTD and an XML Processor to do ID
assignment, the use of a DTD is a self-inflicted wound and the author
is solely responsible for the additional constraints that the DTD
imposes. Other users of XHTML5 shouldn't have to suffer.)

> How should implementations that read the HTML syntax handle ids that
> don't meet XML syntax requirements?

Just compare values for string equality when you do matching.
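
For the HTML syntax, a minimal sketch of that matching could look like
the following. It uses Python's html.parser; the class name IdCollector
and the choice of label "for" as the idref to resolve are assumptions
made for the example.

```python
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collect id values and label "for" references as plain strings."""

    def __init__(self):
        super().__init__()
        self.ids = {}   # id value -> tag name of the first element with it
        self.refs = []  # values of "for" attributes seen on label elements

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute given without a value
            if name == "id" and value not in self.ids:
                self.ids[value] = tag
            elif tag == "label" and name == "for":
                self.refs.append(value)

collector = IdCollector()
# "1:name" is not an NCName; it is matched by string equality anyway.
collector.feed('<label for="1:name">Name</label><input id="1:name">')
collector.close()

resolved = {ref: collector.ids.get(ref) for ref in collector.refs}
print(resolved)
```

Nothing here inspects the syntax of the value, so an id that is not an
XML Name resolves the same way as one that is.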

> Two possibilities that occur to me are:
>
> 1. Ignore them, as though there is no id attribute.

This is not backwards-compatible.

> 2. Accept them for purposes of matching corresponding idref values in the
> document, but when writing out the document (whether in HTML or XHTML 5
> syntax) transform each syntactically invalid id to a syntactically valid one
> in an implementation-defined manner. Maybe the same transformation could be
> applied before exposing the id as a DOM attribute. This also entails
> similarly transforming every matching idref.

And transforming selectors. And transforming scripts, which you can't  
do with a language like JavaScript without hitting the Halting Problem.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Wednesday, 1 August 2007 06:59:50 UTC