- From: Robert Burns <rob@robburns.com>
- Date: Thu, 2 Aug 2007 00:43:27 -0500
- To: Jim Jewett <jimjjewett@gmail.com>
- Cc: public-html@w3.org
On Aug 1, 2007, at 5:04 PM, Jim Jewett wrote:

> Putting it in a Green note would be fine.
>
>> Something like:
>>
>> "Note: Authors should be aware that some legacy tools may not handle
>> Unicode characters outside the ASCII range properly when processing
>> IDs. For maximum compatibility authors should stick with ASCII only
>> characters in producing a value for the @id attribute."
>
> Slight rewording to
>
> "Note: Authors should be aware that some [legacy] tools may not
> handle all IDs properly.
> For maximum compatibility, authors should use IDs starting with an
> ASCII letter, containing only ASCII letters and numbers, and
> containing only a single case (upper or lower) of letter."

That's fine. I prefer using the term legacy tools for rhetorical effect. As far as I'm concerned, any tool processing HTML or XML that is Unicode-unaware is a legacy tool, even if it's created ten years from now.

On these issues (especially containing only a single case), could you provide some examples of tools that have problems? Not that we're going to include them in the recommendation, but it would be helpful for us to have research citations backing up notes like this.

>> In this way we let authors know about wearing seatbelts. However, we
>> don't make it seem like we endorse the continued poor state of the
>> tools. ... After all many of those maturity issues may be fixed
>> even before we go to CR status.
>
> It isn't just legacy tools that will get this wrong.
>
> Many tools written primarily for something other than HTML5 will
> continue to be used with html, simply because they are available and
> familiar. Other languages (including xml and html 4) have different
> rules.

With the legacy tools part I was simply referring to the Unicode problems and the case (-folding?) problems.

> Even new (but simple or homegrown) tools written explicitly for use
> with html5 will often get this wrong, because people will continue to
> assume the "obvious" constraints on an ID. (Depending on their
> previous experience, they may disagree about what those constraints
> are, but they won't always think to remove the constraints.)

I understand if there are tools that might have problems with the starting character and continuation characters being drawn from an expanded set compared to XML. However, the case issues and the non-ASCII issues are certainly legacy problems. I get the feeling some of our members might even say that the restricted character set for the NAME production is also a legacy issue (and I think they may have made a convincing case).

Take care,
Rob
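
For illustration only, the conservative rule in the reworded note (start with an ASCII letter, use only ASCII letters and digits, stick to a single letter case) could be checked with something like the sketch below. The function name is_conservative_id is hypothetical and not part of any proposal; this is just one reading of the note's wording, not spec text.

```python
import string

# Explicit ASCII sets: str.isalpha()/str.isalnum() would also accept
# non-ASCII letters and digits, which is exactly what the note avoids.
ASCII_LETTERS = set(string.ascii_letters)
ASCII_ALNUM = ASCII_LETTERS | set(string.digits)


def is_conservative_id(value: str) -> bool:
    """Hypothetical check for the 'maximum compatibility' ID advice."""
    # Must be non-empty and start with an ASCII letter.
    if not value or value[0] not in ASCII_LETTERS:
        return False
    # Every character must be an ASCII letter or digit.
    if any(ch not in ASCII_ALNUM for ch in value):
        return False
    # All letters must share one case (all lower or all upper).
    letters = [ch for ch in value if ch in ASCII_LETTERS]
    return all(ch.islower() for ch in letters) or all(ch.isupper() for ch in letters)


if __name__ == "__main__":
    for candidate in ["section1", "SECTION2", "Section1", "1section", "r\u00e9sum\u00e9"]:
        print(repr(candidate), is_conservative_id(candidate))
```

An ID that fails this check may still be perfectly valid; the check only flags IDs that Unicode-unaware or case-folding tools might mishandle.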
Received on Thursday, 2 August 2007 05:43:34 UTC