- From: Steven Taschuk <staschuk@telusplanet.net>
- Date: Wed, 08 Jan 2003 06:26:47 -0700
- To: W3C XML Schema Comments list <www-xml-schema-comments@w3.org>
Quoth noah_mendelsohn@us.ibm.com:
> Steven Taschuk writes:
> > _Part 2: Datatypes_ defines canonical lexical representations for most of the built-in simple types, but their use is unclear.  [...]
>
> a) May wish to build implementations that start with a value and eventually serialize to characters.  [...]

Ah, yes.  Good point.

> > Trolling through the archives, I find a suggestion that canonicalization is useful in the context of signed XML [...]
>
> Hard to comment without seeing the note in question.  [...]

Fair enough.  I refer to "XML Schema and the necessity for canonical representations", <dee3@us.ibm.com>, 1999-05-21:

    <http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999AprJun/0060.html>

I gather that that note was written fairly early in the process, to argue for the need for canonical representations in the first place.  Digital signatures are just one example of an application for which canonicalization issues are important; others certainly exist, and I have no particular stake in signatures specifically.

> [...]  Specifically, such a c14n would support signatures in cases where you truly do not care that a float:
>
>     100
>
> has been rewritten as
>
>     1.0E+2
>
> The fact is, there are some applications for which you do NOT want the signature to match on the above; you want to know that someone has tampered with your document.  [...]  I think the W3C can at best standardize c14n conventions for some of the most common use cases.

Absolutely.  Let me clarify the angle I'm approaching this from.

Whatever equivalence relation on documents I wish to use in a particular application, it is useful to have a canonicalizer for that relation -- that is, a processor which takes as input an arbitrary document and produces as output an equivalent document in canonical form, under the equivalence relation of interest.  (This is not the only way to implement an equivalence relation, but it has the merit of loose coupling: it permits, for example, digital signature software, version management systems, and file comparison tools to operate on byte or character streams without any knowledge of the equivalence relation I deem most appropriate for the case at hand.)

XML Schema implies a model of what XML documents consist of; I feel it is desirable to be able to write such a canonicalizer for the equivalence relation under which documents are equivalent if they differ only in ways not reflected in that model.  Among other things, this includes the use of alternative lexical representations for the same value.  So far this is all obvious.

Now, how should such a canonicalizer canonicalize representations of user-defined simple types?  A naïve implementation would apply the algorithms appropriate for the built-in types from which they are derived; if this approach were sound, it would have the merit of being applicable to any simple type whatsoever (provided schema information were available).  My onTheHour example, however, shows that this approach can generate "canonical" documents that are not schema-valid.  (Schema-invalid canonical documents might be tolerable if schema-valid versions could be reconstructed at need; but, as you pointed out on a related point, that is in general a theorem-proving exercise.)

This is the problem I refer to when I say that a canonicalizer needs special knowledge of all the simple types it encounters -- namely, knowledge of how to canonicalize representations of those types.
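For concreteness, here is a sketch of the sort of type I have in mind (the exact definition from my earlier message may differ; the pattern facet below is merely illustrative):

    <xsd:simpleType name="onTheHour">
      <xsd:restriction base="xsd:dateTime">
        <!-- illustrative only: require minutes and seconds to be zero
             in the lexical form -->
        <xsd:pattern value="\d{4}-\d{2}-\d{2}T\d{2}:00:00(Z|[+-]\d{2}:\d{2})?"/>
      </xsd:restriction>
    </xsd:simpleType>

The literal 2003-01-08T06:00:00+05:30 matches the pattern and so is a valid onTheHour representation.  But Part 2 requires the canonical representation of a timezoned dateTime to express its time zone as UTC ("Z"), so a canonicalizer which treats the representation simply as an xsd:dateTime rewrites it:

    before:  2003-01-08T06:00:00+05:30    (matches the pattern)
    after:   2003-01-08T00:30:00Z         (violates the pattern facet)

and the resulting "canonical" document is no longer valid against onTheHour.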
This requirement seems to make canonical lexical representations much less useful than they would otherwise be -- indeed, to make impossible what is to me the most obvious and desirable use of them... which is what prompted my question.  [...]

> > While I'm at it, why isn't canonical form a facet of the type?
>
> IMO, because you can't alter or depend on the canonical form when creating restrictions.  [...]

A good point.  But you can't alter the equal facet either (unless I'm missing something in the Recommendation).

> > Incidentally, the above example, silly as it is, illustrates an important respect in which values of a type derived by restriction cannot be treated by a generic processor as values of the base type.  [...]
>
> I don't understand.  Your "onTheHour" times aren't legal as both lexical and value space forms for xsd:dateTime?

They are, and surely that is sufficient for most processors; I meant to refer to canonicalizers specifically, which cannot canonicalize representations of a type as if they were representations of the base type without sometimes producing schema-invalid output.

-- 
Steven Taschuk                  | Receive them ignorant;
staschuk@telusplanet.net        | dispatch them confused.
                                |  (Weschler's Teaching Motto)