- From: Martin Duerst <duerst@w3.org>
- Date: Sat, 25 May 2002 08:11:43 +0900
- To: Aaron Swartz <me@aaronsw.com>, www-tag@w3.org
Hello Aaron,

First some procedural points, starting with the end of your mail:

>I'm considering appealing this decision,

The Character Model is in last call, so you can raise a comment.
If you think this is important to you, please use the form at
http://www.w3.org/2002/05/charmod/LastCall to make sure we get
your comment. If the I18N WG then decides to reject your comment,
you can appeal that decision at that stage.

>but I wanted to hear the TAG's position first,

If you think it's important, you should send in a comment yourself.
And the WG should treat this comment as seriously as any other,
on its merits. If the TAG has the same issue, they should send
it in by themselves.

At 14:58 02/05/24 -0500, Aaron Swartz wrote:
>I would like to draw the TAG's attention to this requirement in charmod:
>
>"""
>W3C specifications that define protocol or format elements (e.g. HTTP
>headers, XML attributes, etc.) which are to be interpreted as URI
>references (or specific subsets of URI references, such as absolute URI
>references, URIs, etc.) SHOULD use Internationalized Resource Identifiers
>(IRI) [I-D IRI] (or an appropriate subset thereof).
>"""
> - http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-URIs
>
>RDF, for example, has recently moved to replace URIs with IRIs (or
>something like them).

Strictly speaking, the RDF Core WG is working on clarifying the
following note in the RDF Model and Syntax Specification:
(http://www.w3.org/TR/REC-rdf-syntax/)

>>>>
Note: Although non-ASCII characters in URIs are not allowed by [URI],
[XML] specifies a convention to avoid unnecessary incompatibilities in
extended URI syntax. Implementors of RDF are encouraged to avoid further
incompatibility and use the XML convention for system identifiers.
Namely, that a non-ASCII character in a URI be represented in UTF-8 as
one or more bytes, and then these bytes be escaped with the URI escaping
mechanism (i.e., by converting each byte to %HH, where HH is the
hexadecimal notation of the byte value).
>>>>

>I find this seriously problematic since it will break many utilities which
>have made the assumption that RDF identifiers are ASCII strings with no
>spaces, etc.

I have heard such things in other contexts years ago. I very much
hope the Semantic Web is leading technology, not just repeating
past errors. Making things work without arbitrary restrictions
is easy if you do it from the start, and not too difficult to
fix later in most cases.

>I can understand presenting strings this way for user-display and
>user-entry but storing them this way and making them the official encoding
>seems to be going too far.

XML can 'store' them without problems. N3 also should be able to do it.

>I would think that simply using UTF-8 %-encoding would be fine for these
>purposes.

Why do you think so? Would you think it would make sense to replace
   mailto:me@aaronsw.com
with something like
   mailto:%6d%65@%61%61%72%6f%6e%73%77.%63%6f%6d
or maybe even more appropriately, with something like the above
but using Greek letters instead of Latin ones? This is just about
how people using a script other than Latin in their day-to-day work
would feel. Why should they have to use special tools (having to
do syntax analysis so that they can figure out where a % is an
escape character and where it is not, ...) just to be able to read
the text, just because some tools make too restrictive assumptions?

>An example from the RDF test cases shows an HTTP URI with embedded
>accented characters in Unicode.

Great. I think we should get a few more examples, with different
scripts. I'd be glad to contribute. That way, tools will quickly
be upgraded.

Regards,   Martin.
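
The escaping convention described in the RDF note quoted above (each
non-ASCII character written as its UTF-8 bytes, each byte as %HH) can be
sketched in Python as follows. This is only an illustration, not part of
any specification: the function name escape_non_ascii and the example.org
address are hypothetical, and the sketch deliberately leaves ASCII
characters (including an existing %) untouched.

```python
from urllib.parse import unquote


def escape_non_ascii(iri: str) -> str:
    """Replace each non-ASCII character by %HH for each of its UTF-8 bytes.

    Hypothetical helper for illustration; ASCII characters pass through as-is.
    """
    parts = []
    for ch in iri:
        if ord(ch) < 128:
            parts.append(ch)  # ASCII: keep unchanged
        else:
            # Non-ASCII: encode to UTF-8, then write every byte as %HH
            parts.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(parts)


if __name__ == "__main__":
    readable = "http://example.org/caf\u00e9"  # hypothetical address ending in e-acute
    escaped = escape_non_ascii(readable)
    print(escaped)           # http://example.org/caf%C3%A9
    print(unquote(escaped))  # decodes back to the readable form
```

The escaped form is what ASCII-only tools would see; the readable form is
what a person working in a non-Latin script would prefer to write and read.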
Received on Friday, 24 May 2002 19:12:45 UTC