Re: ID Characters (was: Re: 3.4. Global attributes)

On Aug 1, 2007, at 5:04 PM, Jim Jewett wrote:

>
> Putting it in a green note would be fine.
>
>> Something like:
>>
>> "Note: Authors should be aware that some legacy tools may not handle
>> Unicode characters outside the ASCII range properly when processing
>> IDs. For maximum compatibility, authors should stick with ASCII-only
>> characters in producing a value for the @id attribute."
>
> Slight rewording to
>
> "Note: Authors should be aware that some [legacy] tools may not  
> handle all IDs properly.
> For maximum compatibility, authors should use IDs starting with an
> ASCII letter, containing only ASCII letters and numbers, and
> containing only a single case (upper or lower) of letter."

That's fine. I prefer using the term "legacy tools" for rhetorical
effect. As far as I'm concerned, any tool processing HTML or XML that
is Unicode-unaware is a legacy tool, even if it's created ten years
from now.

On these issues (especially containing only a single case), could you
provide some examples of tools that have problems? Not that we're
going to include them in the recommendation, but it would be helpful
for us to have research citations backing up notes like this.
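
To make the reworded note concrete, here is a minimal sketch in Python
of the conservative rule it describes: start with an ASCII letter, use
only ASCII letters and digits, and keep to a single case. The name
is_conservative_id is mine, purely for illustration. It is not from the
draft, and it checks only the advice in the note, not what the spec
actually permits in an ID.

```python
import re

# ASCII letter first, then ASCII letters or digits only.
_ASCII_ID = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_conservative_id(value: str) -> bool:
    """Return True if value follows the conservative advice in the note.

    Hypothetical helper: this is deliberately stricter than anything
    the spec requires.
    """
    if _ASCII_ID.fullmatch(value) is None:
        return False
    letters = [c for c in value if c.isalpha()]
    # Every letter must share one case (all lower or all upper).
    return all(c.islower() for c in letters) or all(c.isupper() for c in letters)
```

Under this sketch "nav1" and "NAV1" pass, while "Nav1" (mixed case),
"1nav" (leading digit) and "café" (non-ASCII) do not.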

>> In this way we let authors know about wearing seatbelts. However, we
>> don't make it seem like we endorse the continued poor state of the
>> tools. ... After all many of those maturity issues may be fixed
>> even before we go to CR status.
>
> It isn't just legacy tools that will get this wrong.
>
> Many tools written primarily for something other than HTML5 will
> continue to be used with html, simply because they are available and
> familiar.  Other languages (including xml and html 4) have different
> rules.

By "legacy tools" I was referring only to the Unicode problems and the
case (-folding?) problems.
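
As an illustration of the case problem, here is a rough Python sketch
that flags IDs a case-insensitive tool would be unable to tell apart.
The function name case_fold_collisions is hypothetical, and using
Unicode case folding (str.casefold) as the comparison is my assumption;
an actual tool might fold ASCII only, or fold differently.

```python
from collections import defaultdict


def case_fold_collisions(ids):
    """Group distinct IDs that become identical after case folding.

    Hypothetical helper; assumes Unicode case folding approximates
    what a case-insensitive tool would do.
    """
    groups = defaultdict(list)
    for value in ids:
        groups[value.casefold()].append(value)
    # Keep only groups holding more than one distinct spelling.
    return [group for group in groups.values() if len(set(group)) > 1]
```

A document using both "Intro" and "intro" as IDs would be reported,
since the two differ only in case.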

>
> Even new (but simple or homegrown) tools written explicitly for use
> with html5 will often get this wrong, because people will continue to
> assume the "obvious" constraints on an ID.  (Depending on their
> previous experience, they may disagree about what those constraints
> are, but they won't always think to remove the constraints.)

I understand that some tools might have problems when the starting
and continuation characters come from a larger set of characters than
XML allows. However, the case issues and the non-ASCII issues are
certainly legacy problems. I get the feeling some of our members
might even say that the restricted character set for the NAME
production is also a legacy issue (and I think they may have made a
convincing case).

Take care,
Rob

Received on Thursday, 2 August 2007 05:43:34 UTC