- From: Arnaud Le Hors <lehors@us.ibm.com>
- Date: Thu, 14 Aug 2014 08:01:42 -0700
- To: David Booth <david@dbooth.org>
- Cc: public-rdf-shapes@w3.org
- Message-ID: <OFD6F2A548.F61D7E54-ON88257D34.00517AF2-88257D34.00528BF5@us.ibm.com>
Hi David,

Maybe I'm just missing something, but I have to admit I'm not convinced by your argument that this is a necessity for validation. Rather, it seems to me that you're just trying to piggyback on top of this WG to have it do something that you think would be useful. I understand you have good intentions, but I'm sure you know that every deliverable has a cost, even if optional, and I'd rather we not add to a charter that is already going to require a lot of work.

Regards.
--
Arnaud Le Hors - Senior Technical Staff Member, Open Web Standards - IBM Software Group

David Booth <david@dbooth.org> wrote on 08/13/2014 08:14:38 PM:

> From: David Booth <david@dbooth.org>
> To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, public-rdf-shapes@w3.org
> Date: 08/13/2014 08:15 PM
> Subject: Re: regression testing [was Re: summarizing proposed changes to charter]
>
> On 08/13/2014 10:04 PM, Peter F. Patel-Schneider wrote:
> > OK, even though regression testing doesn't need canonicalization, it is useful to have RDF canonicalization to support a particular regression testing system.
> >
> > But how is the lack of a W3C-blessed method for RDF canonicalization hindering the development or deployment of this system? How would a W3C-blessed method for RDF canonicalization help the development or deployment of this system?
> >
> > The system could use any canonical form whatsoever, after all, right?
>
> Yes and no. The lack of a W3C-blessed method of RDF canonicalization makes the comparison dependent on the particular canonicalization tool that is used, which means that RDF data produced by different tools (or different versions of the same tool) could not be reliably compared. In many scenarios this won't be an issue, but in some it will.
>
> But more importantly, the lack of a standard RDF canonicalization method discourages the development of canonicalization tools. Canonicalization has gotten little attention in RDF tools, in my view largely *because* of the difficulty of doing it and the lack of a W3C-blessed method. It is non-trivial to implement, and if one's implementation would just end up as one's own idiosyncratic canonicalization anyway, instead of being an implementation of a standard, then there isn't as much motivation to do it. I think a W3C-blessed method would help a lot.
>
> Would you be okay with canonicalization being an OPTIONAL deliverable?
>
> David
>
> >
> > peter
> >
> >
> > On 08/13/2014 12:00 PM, David Booth wrote:
> >> Hi Peter,
> >>
> >> On 08/13/2014 01:25 PM, Peter F. Patel-Schneider wrote:
> >>> On 08/13/2014 08:45 AM, David Booth wrote:
> >>>> Hi Peter,
> >>>>
> >>>> Here is my main use case for RDF canonicalization.
> >>>>
> >>>> The RDF Pipeline Framework http://rdfpipeline.org/ allows any kind of data to be manipulated in a data production pipeline -- not just RDF. The Framework has regression tests that, when run, are used to validate the correctness of the output of each node in a pipeline. A test passes if the actual node output exactly matches the expected node output, *after* filtering out ignorable differences. (For example, differences in dates and times are typically treated as ignorable -- they don't cause a test to fail.) Since a generic comparison tool is used (because the pipeline is permitted to carry *any* kind of data), data serialization must be predictable and canonical. This works great for all kinds of data *except* RDF.
> >>>
> >>> Why? You could just use RDF graph or dataset isomorphism. Those are already defined by W3C. Well, maybe you need to modify the graphs first (e.g., to fudge dates and times), but you are already doing that for other data types.
> >>>
> >>>> If a canonical form of RDF were defined, then the exact same tools that are used to compare other kinds of data for differences could also be used for comparing RDF.
> >>>
> >>> What are these tools? Why should a tool to determine whether two strings are the same also work for determining whether two XML documents are the same? Oh, maybe you think that you should first canonicalize everything and then do string comparison. However, you are deluding yourself that this is using the same tools for comparing different kinds of data. The tool that you are actually using to compare, e.g., XML documents, is the composition of the datatype-specific canonicalizer and a string comparer. There is no free lunch---you still need tools specific to each datatype.
> >>
> >> Not quite. cmp is used for comparison of *serialized* data, and canonicalization is part of the data *serialization* process -- not the data *comparison* process. The serialization process must necessarily understand what kind of data it is -- there is no way around that -- so that is the logical place to do the canonicalization. But the comparison process does *not* know what kind of data is being compared -- nor should it have to. It's the serializer's job to produce a predictable, repeatable serialization of the data. This works great and is trivially easy for everything *except* RDF, because of the instability of blank node labels. In RDF, comparison is embarrassingly difficult.
> >>
> >> One could argue that my application could use some workaround to solve this problem, but that belies the fact that the root cause of the problem is *not* some weird thing my application is trying to do; it is a weakness of RDF itself -- a gap in the RDF specs. This gap makes RDF harder to use than it needs to be. If we want RDF to be adopted by a wider audience -- and I certainly do -- then we need to fix obvious gaps like this.
> >>
> >> I hope that helps clarify why I see this as a problem. Given the above, would you be okay with canonicalization being an OPTIONAL deliverable?
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>>> I consider this a major deficiency in RDF that really needs to be corrected. Any significant software effort uses regression tests to validate changes. But comparing two documents is currently complicated and difficult with RDF data. RDF canonicalization would make it as easy as it is for every other data representation.
> >>>
> >>> How so? Right now you can just use a tool that does RDF graph or dataset isomorphism. Under your proposal you would need a tool that does RDF graph or dataset canonicalization, which is no easier than isomorphism checking. What's the difference?
> >>>
> >>>> I realize that this is a slightly different -- and more stringent -- notion of RDF validation than just looking at the general shape of the data, because it requires that the data not only has the expected shape, but also contains the expected *values*. Canonicalization would solve this problem.
> >>>
> >>> Canonicalization is a part of a solution to a problem that is already solved.
> >>>
> >>>
> >>>> Given this motivation, would you be okay with RDF canonicalization being included as an OPTIONAL deliverable in the charter?
> >>>>
> >>>> Thanks,
> >>>> David
> >>>
> >>>
> >>> peter
> >>>
> >>>> On 08/13/2014 01:11 AM, Peter F. Patel-Schneider wrote:
> >>>>> I'm still not getting this at all.
> >>>>>
> >>>>> How does canonicalization help me determine that I got the RDF data that I expected (exact or otherwise)? For example, how does canonicalization help me determine that I got some RDF data that tells me the phone numbers of my friends?
> >>>>>
> >>>>> I just can't come up with a use case at all related to RDF data validation where canonicalization is relevant, except for signing RDF graphs, and that can just as easily be done at the surface syntax level, and signing is quite tangential to the WG's purpose, I think.
> >>>>>
> >>>>> peter
> >>>>>
> >>>>>
> >>>>> On 08/12/2014 09:17 PM, David Booth wrote:
> >>>>>> I think "canonicalization" would be a clearer term, as in:
> >>>>>>
> >>>>>> "OPTIONAL - A Recommendation for canonical serialization of RDF graphs and RDF datasets."
> >>>>>>
> >>>>>> The purpose of this (to me) is to be able to validate that I got the *exact* RDF data that I expected -- not merely the right classes and predicates and such. Would you be okay with including this in the charter?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>>
> >>>>>> On 08/12/2014 10:00 PM, Peter F. Patel-Schneider wrote:
> >>>>>>> I'm still not exactly sure just what normalization means in this context or what relationship it has to RDF validation.
> >>>>>>>
> >>>>>>> peter
> >>>>>>>
> >>>>>>>
> >>>>>>> On 08/12/2014 06:55 PM, David Booth wrote:
> >>>>>>>> +1 for all except one item.
> >>>>>>>>
> >>>>>>>> I'd like to make one last-ditch attempt to include graph normalization as an OPTIONAL deliverable. I expect the WG to treat it as low priority, and would only anticipate a normalization document being produced if someone takes the personal initiative to draft it. I do not see any significant harm in including it in the charter on that basis, but I do see a benefit, because if the WG did somehow get to it then it would be damn nice to have, so that we could finally validate RDF data by having a standard way to compare two RDF documents for equality, like we can routinely do with every other data representation.
> >>>>>>>>
> >>>>>>>> Peter, would that be okay with you, to include graph normalization as OPTIONAL that way?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> David
> >>>>>>>>
> >>>>>>>> On 08/12/2014 08:55 PM, Eric Prud'hommeaux wrote:
> >>>>>>>>> Hi all, we can have a face-to-face at the W3C Technical Plenary in November if we can quickly endorse a good-enough charter. As it stands now, it isn't clear that the group will be able to reach consensus within the Working Group, let alone get through the member review without objection.
> >>>>>>>>>
> >>>>>>>>> Please review the proposals that I've culled from the list. I encourage compromise on all our parts, and we'll have to suppress the desire to wordsmith. (Given the 3-month evaluation period, wordsmithing won't change much anyway.)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> separate semantics:
> >>>>>>>>>
> >>>>>>>>> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E2AFBD.9050102@gmail.com>
> >>>>>>>>> A syntax and semantics for shapes specifying how to construct shape expressions and how shape expressions are evaluated against RDF graphs.
> >>>>>>>>> "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID: <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
> >>>>>>>>> defining the (direct) semantic meaning of shapes and defining the associated validation process.
> >>>>>>>>>
> >>>>>>>>> opposition: Holger Knublauch
> >>>>>>>>>
> >>>>>>>>> proposed resolution: include, noting that if SPARQL is judged to be useful for the semantics, there's nothing preventing us from using it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> make graph normalization optional or use-case specific:
> >>>>>>>>>
> >>>>>>>>> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E2AFBD.9050102@gmail.com>
> >>>>>>>>> 3 OPTIONAL A specification of how shape verification interacts with inference.
> >>>>>>>>> Jeremy J Carroll <jjc@syapse.com> - Message-Id: <D954B744-05CD-4E5C-8FC2-C08A9A99BA9F@syapse.com>
> >>>>>>>>> the WG will consider whether it is necessary, practical or desirable to normalize a graph...
> >>>>>>>>> A graph normalization method, suitable for the use cases determined by the group....
> >>>>>>>>> David Booth <david@dbooth.org> - Message-ID: <53E28D07.9000804@dbooth.org>
> >>>>>>>>> OPTIONAL - A Recommendation for normalization/canonicalization of RDF graphs and RDF datasets that are serialized in N-Triples and N-Quads.
> >>>>>>>>> opposition - don't do it at all:
> >>>>>>>>> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E3A4CB.4040200@gmail.com>
> >>>>>>>>> the WG should not be working on this.
> >>>>>>>>>
> >>>>>>>>> proposed resolution: withdrawn, to go to new light-weight, focused WG, removing this text:
> >>>>>>>>> [[
> >>>>>>>>> The WG MAY produce a Recommendation for graph normalization.
> >>>>>>>>> ]]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> mandatory human-facing language:
> >>>>>>>>>
> >>>>>>>>> "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID: <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
> >>>>>>>>> ShExC mandatory, but potentially as a Note.
> >>>>>>>>> David Booth <david@dbooth.org> - Message-ID: <53E28D07.9000804@dbooth.org>
> >>>>>>>>> In Section 4 (Deliverables), change "OPTIONAL - Compact, human-readable syntax" to "Compact, human-readable syntax", i.e., make it required.
> >>>>>>>>> Jeremy J Carroll <jjc@syapse.com> - Message-Id: <54AA894F-F4B4-4877-8806-EB85FB5A42E5@syapse.com>
> >>>>>>>>>
> >>>>>>>>> opposition - make it OPTIONAL
> >>>>>>>>> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E2AFBD.9050102@gmail.com>
> >>>>>>>>> OPTIONAL A compact, human-readable syntax for expressing shapes.
> >>>>>>>>>
> >>>>>>>>> proposed resolution: keep as OPTIONAL, not mentioning ShExC, but clarifying that it's different from the RDF syntax.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> report formats:
> >>>>>>>>> Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
> >>>>>>>>> provide flexible validation execution plans that range from:
> >>>>>>>>> Success / fail
> >>>>>>>>> Success / fail per constraint
> >>>>>>>>> Fails with error counts
> >>>>>>>>> Individual resources that fail per constraint
> >>>>>>>>> And enriched failed resources with annotations
> >>>>>>>>>
> >>>>>>>>> proposed resolution: no change, noting that no one seconded this proposal.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> test suite/validator:
> >>>>>>>>>
> >>>>>>>>> Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
> >>>>>>>>> Validation results are very important for the progress of this WG and should be a standalone deliverable.
> >>>>>>>>> David Booth <david@dbooth.org> - Message-ID: <53E28D07.9000804@dbooth.org>
> >>>>>>>>> Test Suite, to help ensure interoperability and correct implementation. The group will choose the location of this deliverable, such as a git repository.
> >>>>>>>>>
> >>>>>>>>> proposed resolution: leave out of the charter, as WGs usually choose to do this anyway and it has no impact on IP commitments.
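Much of the exchange above turns on two claims: that a plain byte comparison (cmp) fails on RDF because blank node labels and triple order are unstable, and that canonicalization is no easier than isomorphism checking. The following is a minimal Python sketch of both points, under stated assumptions: triples are modeled as tuples of term strings with blank nodes prefixed `_:` (a toy model, not an N-Triples parser), and the helper names `serialize` and `canonicalize` are hypothetical. The canonicalization is brute force over every relabeling of blank nodes, keeping the lexicographically smallest sorted serialization. It is not the algorithm any W3C document specifies, and it is exponential in the number of blank nodes.

```python
from itertools import permutations

# Toy model (assumption): a triple is (subject, predicate, object) as strings;
# blank nodes start with "_:", other terms are written as-is.
Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def serialize(triples: list[Triple]) -> str:
    """Naive N-Triples-style output: one line per triple, in input order."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def canonicalize(triples: list[Triple]) -> str:
    """Brute-force canonical form: relabel blank nodes to _:c0, _:c1, ...
    in every possible way, sort the (deduplicated) lines, and keep the
    lexicographically smallest document. Exponential in blank node count."""
    blanks = sorted({t for triple in triples for t in triple if is_blank(t)})
    best = None
    for order in permutations(range(len(blanks))):
        mapping = {b: f"_:c{i}" for b, i in zip(blanks, order)}
        # A graph is a set of triples, so duplicate lines collapse.
        lines = sorted({" ".join(mapping.get(t, t) for t in triple) + " .\n"
                        for triple in triples})
        doc = "".join(lines)
        if best is None or doc < best:
            best = doc
    return best


FOAF = "http://xmlns.com/foaf/0.1/"
# The same graph, written with different blank node labels and triple order.
a = [("_:x", f"<{FOAF}name>", '"Alice"'),
     ("_:x", f"<{FOAF}knows>", "_:y"),
     ("_:y", f"<{FOAF}name>", '"Bob"')]
b = [("_:b1", f"<{FOAF}name>", '"Bob"'),
     ("_:b0", f"<{FOAF}knows>", "_:b1"),
     ("_:b0", f"<{FOAF}name>", '"Alice"')]

print(serialize(a) == serialize(b))        # False: byte-wise comparison fails
print(canonicalize(a) == canonicalize(b))  # True: same graph, same canonical text
```

Both sides of the argument show up here. Once each output is canonicalized at serialization time, an ordinary text comparison suffices, which is David's point about keeping the comparison step data-agnostic. Equal canonical forms also decide isomorphism directly, so producing them cannot be easier than isomorphism checking, which is Peter's point.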
Received on Thursday, 14 August 2014 15:02:16 UTC