- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Wed, 20 Jun 2007 10:23:44 +0900
- To: "Grosso, Paul" <pgrosso@ptc.com>
- Cc: <public-iri@w3.org>, "Richard Ishida" <ishida@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, public-i18n-core@w3.org
Hello Paul, others,

First, I'd prefer a bit more notice if you want to set such a hard
deadline, and I guess others would, too.

Second, with respect to minor differences between the IRI spec and
the HRRI draft, I think that you should be able to look at the
differences just as carefully as I am able to look at your draft.

Nevertheless, I have tried to give it another look. Here is what
I have found (with no guarantee of completeness, of course; please
check again on your side).

http://www.w3.org/TR/REC-xml/#charsets allows (although, at least
in newer versions, discourages):

[#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF]

In the IRI spec, these are excluded:

ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
        / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
        / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
        / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
        / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
        / %xD0000-DFFFD / %xE1000-EFFFD

so you have to add them to your list in section 3.

This in essence seems to amount to adding 'iprivate', the list of
characters above, and the list of characters you already have
to 'ucschar'. Please check this understanding as a cross-check.

Apart from these small fixes, I have to very clearly note that
most of my more general concerns haven't been addressed. These are,
somewhat reworded/completed:

- The overall usefulness (seen from an overall W3C or overall IETF
  standpoint) of having separate definitions, in separate documents,
  for two essentially extremely closely related protocol elements.
  [I have proposed to integrate your material into an update of
  the IRI spec.]
- The choice of name, which is highly suggestive instead of
  descriptive, and inappropriate on several accounts (for the largest
  part of URIs/IRIs, HRRIs are only marginally more readable, if at
  all, and the overall syntax still poses a lot of problems for
  average human users (http://...)).
- The overall description. I note e.g. the following:
  "However, it is often inconvenient for authors to encode these
  characters."
  How often? Unless somebody is authoring a lot of XPointers by hand,
  this can't happen that often (maybe with the exception of the space,
  but then you discourage that (correctly!) yourself). I suggest
  rewording "often" to "occasionally". There are similar examples
  elsewhere.
- The classification as a BCP. Procedurally, it's unclear to me
  why the IETF would classify a protocol element spec as a BCP
  when the related ones (URI, IRI) are standards track.
  Content-wise, it's unclear why the IETF would call something
  a BEST current practice if in earlier discussion, they have
  clearly preferred to disallow or marginalize this practice
  (and that was only for spaces and such, not for controls).
- The security section now mentions the issues with control
  characters. This should definitely be a bit more specific,
  and should contain explicit recommendations. I'd write that
  receivers may want to filter out such characters, or URIs with
  such characters, and therefore including them in the first place
  is discouraged.
- You have some advice against using raw spaces ("Also, authors of
  HRRIs are advised to percent encode space characters themselves,
  rather than rely on the processor to do so, because spaces are often
  used to separate HRRIs in a sequence"), but not against others,
  where similar arguments apply:
  - tabs and CR/LF are removed/merged/converted to spaces in
    attribute values (merging also occurs for spaces)
  - <> are often used to delimit URIs/IRIs
  - arbitrary controls may trigger some security filter
  - private use characters are not interoperable
  - non-characters (the list above) are discouraged in XML itself
  (not sure this list is complete, but I guess it's getting close)
- The last paragraph of Section 3 is somewhat problematic. In general,
  it's okay, but the second half of the last sentence ("nor the
  process of passing a Human Readable Resource Identifier to a process
  or software component responsible for dereferencing it SHOULD
  trigger percent encoding") may suggest that resolution interfaces
  come with three different entry points. I think it would be better
  to have this work done on the XML side when resolving something.

Not only have these concerns not yet been addressed, but I also do
not remember having received any kind of reply on these issues.

Looking forward to hearing from you again.

Regards, Martin.

At 07:37 07/06/19, Grosso, Paul wrote:
>
>Martin et al.,
>
>Please check out
>http://www.w3.org/XML/2007/04/hrri/draft-walsh-tobin-hrri-01c.html
>and let us know whether you have any further comments or are
>satisfied with this (draft) ID. In either case, please send
>a response. Several specs are on hold awaiting progression
>of this to RFC, and we would like to be sure to make progress.
>
>We would prefer to have a definite response from you, but if
>we have not heard by 11:00 ET (Boston time) this Wednesday,
>we will assume you have no more comments on this spec.
>
>paul
>
>> -----Original Message-----
>> From: public-xml-core-wg-request@w3.org
>> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Norman Walsh
>> Sent: Tuesday, 2007 June 12 12:25
>> To: Martin Duerst
>> Cc: public-iri@w3.org; Richard Ishida; Felix Sasaki;
>> www-xml-linking-comments@w3.org; public-xml-core-wg@w3.org
>> Subject: Re: Fwd: Re: HRRIs, IRIs, etc
>>
>> / Martin Duerst <duerst@it.aoyama.ac.jp> was heard to say:
>> | Dear IRI and XML experts,
>> [...]
>> | - The IRI spec excludes private use characters from all but
>> the query part.
>>
>> We have attempted to address this concern[1] by adding
>>
>> * characters in the Unicode private use area (#xE000-#xF8FF), except
>> where they appear in the query part of the resulting IRI.
>>
>> to the list.
>>
>> | (there are other smaller differences, but for the moment,
>> this is enough)
>>
>> Could you, please, provide a more exhaustive account of the
>> differences which concern you? The Core WG thinks it would be most
>> efficient if we could consider as many of them as possible at the same
>> time.
>
>
> [1] http://www.w3.org/XML/2007/04/hrri/draft-walsh-tobin-hrri-01c.html

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Received on Wednesday, 20 June 2007 01:24:32 UTC
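
The cross-check requested in the message (that the non-characters
permitted by XML fall outside the IRI spec's 'ucschar') can be sketched
in Python. This is an illustrative sketch only: the range tables are
copied from the two lists quoted in the message, and the names UCSCHAR,
XML_DISCOURAGED, in_ranges and overlap are assumptions introduced here,
not identifiers from either specification.

```python
# 'ucschar' ranges, copied from the list quoted in the message.
UCSCHAR = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]

# Non-characters as listed in the message: #xFDD0-#xFDDF plus the
# last two code points of planes 1 through 16.
XML_DISCOURAGED = [(0xFDD0, 0xFDDF)] + [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF)
    for plane in range(1, 17)
]


def in_ranges(cp, ranges):
    """True if code point cp lies in any inclusive (lo, hi) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def overlap(a, b):
    """Return the inclusive sub-ranges shared by range lists a and b."""
    shared = []
    for alo, ahi in a:
        for blo, bhi in b:
            lo, hi = max(alo, blo), min(ahi, bhi)
            if lo <= hi:
                shared.append((lo, hi))
    return shared


if __name__ == "__main__":
    # An empty result means none of the listed non-characters is
    # already covered by 'ucschar'.
    print(overlap(UCSCHAR, XML_DISCOURAGED))
    # The private use area (#xE000-#xF8FF) is likewise outside 'ucschar'.
    print(in_ranges(0xE000, UCSCHAR))
```

An empty overlap is consistent with the statement that these code
points have to be listed separately in section 3 of the draft.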