- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Tue, 2 Oct 2001 12:11:00 +0100
- To: <www-i18n-comments@w3.org>
Background
==========

I saw your request not to review the current draft of charmod, but took
it to mean not to *specifically* review it. I had a few new comments
against the previous draft that still seem pertinent.

My comments are based on "implementation experience", i.e. trying to
contribute bits of text to the RDF Core WG designed to go into a spec
conforming with (the previous version of) charmod. I will make these
comments in the language of the new version.

If you prefer, I would be happy to repost these comments during the next
public comment period.

I split separate issues under separate headings in this e-mail.

FYI my attempt to explore RDF literals and charmod conformance is
currently at
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0341.html

[In the RDF Core WG, I currently have an action to separate out the
charmod-and-literals issues from the other literals issues; I'll be
happy to post a pointer to that to this list when I have done it. Please
e-mail me if that would be helpful.]

Issue XML comments
==================

Example:
"suc<!-- comment -->̧on"

By charmod, "suc<!-- comment -->̧on" is fully normalized. An XML
processor that strips comments then ends up with a non-normalized
string, which appears to require further normalization. Early uniform
normalization might be better captured by taking comments into account
in the definition of full normalization.

Issue XPath string-value
========================

More generally, for the many XML specs based around an XPath Nodeset
data model, the XPath string-value is the crucial representation of
strings from the XML document. For XML elements this is defined at
http://www.w3.org/TR/xpath.html#element-nodes
as "the concatenation of the string-values of all text node descendants
of the element node in document order."

Unfortunately, requiring all of these string-values to be in NFC may be
burdensome for document authoring tools.

Issue Full Normalization as document syntax dependent.
=====================================================

The second note in subsection "4.2.2. Fully Normalized Text"
acknowledges that "Full normalization is specified against the context
of a markup language". I wonder whether this should be upgraded to a
requirement on specifications: if a specification defines a class of
documents, it should define full normalization for those documents.
(E.g. a new syntax introduces its own full normalization, but a new
interpretation may also stress certain string concatenations, which
should then be included in the definition of full normalization.)

E.g. for RDF/XML the formation of literals (which are a particular XPath
string-value) is a stressed concatenation, whereas other XPath
string-values are unimportant and can safely be left unnormalized.

There is also merit in defining full normalization once and for all for
XML. Notice my lack of commitment one way or the other.

Issue Full Normalization as a Web Content requirement
=====================================================

See particularly:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0347.html
where Graham Klyne says:

> OK. If it's important, then why not "documents MUST be W3C-normalized"?

I had no answer to that; it does seem to summarise early uniform
normalization more concisely than "4.3 Responsibility for
Normalization", which I had used as my template. This turns the
requirements on recipients and producers [I] into equivalent
requirements on specifications of documents and on documents themselves
[S][C].

Aside: I find the [S] [I] [C] labels a very significant improvement to
charmod.

Issue Responsibilities "Proxy" versus "Recipient"
=================================================

Regarding section 4.3 Responsibility for Normalization: when considering
an RDF Processor (whatever that is), I have difficulty in deciding
whether the "proxy" rules or the "recipient" rules apply.
In particular, the requirement that proxies MUST NOT reject un-normalized
data forces a decision as to the role of a component, and that decision
may be unnatural. (Consider a web site mirror, for example, which could
be considered either a proxy or a recipient and a producer: one sort of
RDF processor may be quite like a mirror, except that it picks up
document *fragments* from around the web and merges them into a single
document.)

I think that "Proxies MAY reject unnormalized data" would be consistent
with the early uniform normalization framework, and would resolve this
issue.

----

Congratulations on your latest working draft. I have found your work
genuinely helpful.

Jeremy Carroll
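
Illustration (not part of the original message): the XML comments issue
above can be reproduced with a short Python sketch. It assumes that the
character following the comment in the example is U+0327 COMBINING
CEDILLA, and it uses a plain NFC check as a stand-in for charmod's
"fully normalized" test, which is a simplification. The helper names
strip_comments and is_nfc are hypothetical, and the regular expression
is only a rough approximation of what an XML processor does.

```python
import re
import unicodedata

# Assumption: the character after the comment is U+0327 COMBINING CEDILLA.
COMBINING_CEDILLA = "\u0327"
source = "suc<!-- comment -->" + COMBINING_CEDILLA + "on"


def strip_comments(text):
    """Remove XML comments with a regex (a rough stand-in for a real parser)."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def is_nfc(text):
    """Return True if the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


stripped = strip_comments(source)

# With the comment in place, the cedilla follows ">" and nothing composes.
print(is_nfc(source))
# Once the comment is gone, "c" + U+0327 is adjacent and would compose to U+00E7.
print(is_nfc(stripped))
# Renormalizing the stripped string yields the precomposed form.
print(unicodedata.normalize("NFC", stripped) == "su\u00e7on")
```

Under these assumptions the string passes the NFC check while the
comment is present and fails it once the comment is removed, which is
the extra normalization step the issue describes.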
Received on Tuesday, 2 October 2001 07:11:45 UTC