- From: David Booth <david@dbooth.org>
- Date: Tue, 31 Jul 2012 22:48:53 -0400
- To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
- Cc: public-rdf-comments <public-rdf-comments@w3.org>
On Tue, 2012-07-31 at 16:24 -0400, Peter F. Patel-Schneider wrote:
> On 07/31/2012 03:59 PM, David Booth wrote:
> > Hi Peter,
> >
> > On Tue, 2012-07-31 at 15:36 -0400, Peter F. Patel-Schneider wrote:
> >> Hmm.
> >>
> >> Your two examples have different canonical forms in XML. I do not believe
> >> that going beyond XML canonicalization is a good idea.
> > What downside do you see?
>
> If RDF goes beyond XML canonicalization is it doing something to XML datatypes
> that is not part of the XML specification. This appears to be driving a
> further wedge between RDF and XML data.

I guess I'm not following what you mean. For example, the
xsd:dateTimeStamp datatype already requires a timezoneFrag to be
specified, and one permissible timezoneFrag is "Z" (meaning UTC). If
RDF canonicalization suggested that the timezoneFrag always be "Z", what
wedge would that drive between RDF and XML data?

> > [...]
> >
> >> In any case, I don't see the point here. If equality-unique canonical forms
> >> are only encouraged, then applications will still have to do datatype-aware
> >> comparisons.
> > Only if they need to handle all possible data serializations. If 90%
> > of the available datasets use the canonical forms then many apps will
> > not need to do datatype-aware comparisons, though the ones that need to
> > cover 100% will.
>
> If even 99.99% of available datasets use the canonical forms then all apps
> should still be prepared for non-canonical forms. To do otherwise is to be
> wrong.

It would be wrong for *some* apps, but by no means all. You can't paint
all apps with the same brush. For example, if there are 100 datasets
available and 100 apps, 90 of the datasets use the canonical forms, and
40 of the apps need only the datasets that use the canonical forms, then
canonicalization substantially lowers the implementation barrier for
those 40 apps.

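As a rough illustration of what canonicalizing the timezoneFrag to "Z"
buys, here is a minimal Python sketch. The helper name to_utc_z is
hypothetical, and it leans on Python's datetime.fromisoformat, so it
covers only the common lexical forms rather than the full xsd:dateTime
lexical space, and it drops fractional seconds.

```python
from datetime import datetime, timezone


def to_utc_z(lexical: str) -> str:
    """Rewrite a dateTime string that carries a timezone into UTC "Z" form.

    Hypothetical helper for illustration only; fractional seconds are dropped.
    """
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so map it to the equivalent numeric offset first.
    if lexical.endswith("Z"):
        lexical = lexical[:-1] + "+00:00"
    parsed = datetime.fromisoformat(lexical)
    if parsed.tzinfo is None:
        raise ValueError("a timezone is required (as in xsd:dateTimeStamp)")
    # Shift the value to UTC, then serialize with the "Z" timezoneFrag.
    utc = parsed.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + "Z"


# Two lexical forms of the same instant.
a = "2012-07-31T22:48:53-04:00"
b = "2012-08-01T02:48:53Z"

# Plain string comparison treats them as different values ...
print(a == b)
# ... but once both are in the "Z" form, string comparison suffices.
print(to_utc_z(a) == to_utc_z(b))
```

A dataset that already publishes the "Z" form spares the consuming app
the parsing step entirely: comparison reduces to string equality.
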
> That is not to say that being wrong is not useful on occasion, but I
> don't see that there is any good to be had here in the WG suggesting canonical
> forms be used exclusively.

I just described some substantial good. I'm not suggesting that
canonicalization be used *exclusively*, but merely that it be
*encouraged*, because it does significantly simplify processing when it
can be used.

> > I think it is important to keep the RDF entry barrier as low as possible
> > whenever possible, in order to support scruffy apps that are good enough
> > for many purposes, even if they don't handle every case.
> >
> > David
>
> It is important that apps should do the right thing. For example, should apps
> ignore character encoding? How hard is doing datatype-aware processing of
> literals, compared with all the rest of the stuff that is required to handle RDF?

It depends entirely on the application. In the case of xsd:dateTime,
for example, it means the literal must be completely parsed into its
year, month, day, hour, minute, seconds and timezone offset, and then
datetime arithmetic -- which is *not* simple -- must be used to properly
add the timezone offset in order to compare two values. All this
instead of a simple string comparison!

But the worst part is that the application has to *understand* the
different datatypes, and this means that the code either has to
special-case every datatype, or it has to implement some kind of general
datatype-handling framework. Suddenly, an app that could have been a
one-off, three-line Perl script blows up into something that requires
significantly more development effort.

The RDF model is so simple. It would be nice if it could be processed
very simply whenever possible. "Make the simple cases simple", etc.

> peter
>
> PS: Yes, I do use text processors to handle RDF, and quite often, even
> analysing the 2011 Billion Triple Challenge triples using sed and grep.
> However, I check to ensure that the right thing happens.

Right, that's exactly the kind of simplified processing that I think we
should facilitate as often as possible.

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Wednesday, 1 August 2012 02:49:23 UTC