- From: David Booth <david@dbooth.org>
- Date: Thu, 14 Aug 2014 13:26:25 -0400
- To: Arnaud Le Hors <lehors@us.ibm.com>
- CC: public-rdf-shapes@w3.org
On 08/14/2014 11:01 AM, Arnaud Le Hors wrote:
> Hi David,
>
> Maybe I'm just missing something but I have to admit not to be convinced
> by your argument that this is a necessity for validation. Rather, it
> seems to me that you're just trying to piggyback on top of this WG to
> have it do something that you think would be useful.

In a sense I am, because as I mentioned before, this is a somewhat
different notion of validation than just looking at the shape of the
data. I agree that it is not a necessity for *shape* validation, but I
do see it as important for validating, in a uniform way, that actual
data is equivalent to expected data. But I understand that that is
tangential to the main use case that the group wants to focus on, so I
won't push it further.

>
> I understand you have good intentions but I'm sure you know that every
> deliverable has a cost, even if optional, and I'd rather we don't add to
> a charter that is already going to require a lot of work.

Ok, I'll drop the request to include it. Thanks for considering.

David

>
> Regards.
> --
> Arnaud Le Hors - Senior Technical Staff Member, Open Web Standards -
> IBM Software Group
>
>
> David Booth <david@dbooth.org> wrote on 08/13/2014 08:14:38 PM:
>
> > From: David Booth <david@dbooth.org>
> > To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>,
> > public-rdf-shapes@w3.org
> > Date: 08/13/2014 08:15 PM
> > Subject: Re: regression testing [was Re: summarizing proposed
> > changes to charter]
> >
> > On 08/13/2014 10:04 PM, Peter F. Patel-Schneider wrote:
> > > OK, even though regression testing doesn't need canonicalization, it is
> > > useful to have RDF canonicalization to support a particular regression
> > > testing system.
> > >
> > > But how is the lack of a W3C-blessed method for RDF canonicalization
> > > hindering the development or deployment of this system? How would a
> > > W3C-blessed method for RDF canonicalization help the development or
> > > deployment of this system?
> > >
> > > The system could use any canonical form whatsoever, after all, right?
> >
> > Yes and no. The lack of a W3C-blessed method of RDF canonicalization
> > makes the comparison dependent on the particular canonicalization tool
> > that is used, which means that RDF data produced by different tools (or
> > different versions of the same tool) could not be reliably compared. In
> > many scenarios this won't be an issue, but it will in some.
> >
> > But more importantly, the lack of a standard RDF canonicalization method
> > discourages the development of canonicalization tools. Canonicalization
> > has gotten little attention in RDF tools, in my view largely *because*
> > of the difficulty of doing it and the lack of a W3C-blessed method. It
> > is non-trivial to implement, and if one's implementation would just end
> > up as one's own idiosyncratic canonicalization anyway, instead of being
> > an implementation of a standard, then there isn't as much motivation to
> > do it. I think a W3C-blessed method would help a lot.
> >
> > Would you be okay with canonicalization being an OPTIONAL deliverable?
> >
> > David
> >
> > >
> > > peter
> > >
> > >
> > > On 08/13/2014 12:00 PM, David Booth wrote:
> > >> Hi Peter,
> > >>
> > >> On 08/13/2014 01:25 PM, Peter F. Patel-Schneider wrote:
> > >>> On 08/13/2014 08:45 AM, David Booth wrote:
> > >>>> Hi Peter,
> > >>>>
> > >>>> Here is my main use case for RDF canonicalization.
> > >>>>
> > >>>> The RDF Pipeline Framework http://rdfpipeline.org/ allows any kind
> > >>>> of data to be manipulated in a data production pipeline -- not just
> > >>>> RDF. The Framework has regression tests that, when run, are used to
> > >>>> validate the correctness of the output of each node in a pipeline.
> > >>>> A test passes if the actual node output exactly matches the
> > >>>> expected node output, *after* filtering out ignorable differences.
> > >>>> (For example, differences in dates and times are typically treated
> > >>>> as ignorable -- they don't cause a test to fail.) Since a generic
> > >>>> comparison tool is used (because the pipeline is permitted to carry
> > >>>> *any* kind of data), data serialization must be predictable and
> > >>>> canonical. This works great for all kinds of data *except* RDF.
> > >>>
> > >>> Why? You could just use RDF graph or dataset isomorphism. Those are
> > >>> already defined by W3C. Well maybe you need to modify the graphs
> > >>> first (e.g., to fudge dates and times), but you are already doing
> > >>> that for other data types.
> > >>>
> > >>>> If a canonical form of RDF were defined, then the exact same tools
> > >>>> that are used to compare other kinds of data for differences could
> > >>>> also be used for comparing RDF.
> > >>>
> > >>> What are these tools? Why should a tool to determine whether two
> > >>> strings are the same also work for determining whether two XML
> > >>> documents are the same? Oh, maybe you think that you should first
> > >>> canonicalize everything and then do string comparison. However, you
> > >>> are deluding yourself that this is using the same tools for
> > >>> comparing different kinds of data. The tool that you are actually
> > >>> using to compare, e.g., XML documents, is the composition of the
> > >>> datatype-specific canonicalizer and a string comparer. There is no
> > >>> free lunch---you still need tools specific to each datatype.
> > >>
> > >> Not quite. cmp is used for comparison of *serialized* data, and
> > >> canonicalization is part of the data *serialization* process -- not
> > >> the data *comparison* process. The serialization process must
> > >> necessarily understand what kind of data it is -- there is no way
> > >> around that -- so that is the logical place to do the
> > >> canonicalization.
> > >> But the comparison process does *not* know what kind of data is
> > >> being compared -- nor should it have to. It's the serializer's job
> > >> to produce a predictable, repeatable serialization of the data.
> > >> This works great and is trivially easy for everything *except* RDF,
> > >> because of the instability of blank node labels. In RDF, comparison
> > >> is embarrassingly difficult.
> > >>
> > >> One could argue that my application could use some workaround to
> > >> solve this problem, but that belies the fact that the root cause of
> > >> the problem is *not* some weird thing my application is trying to
> > >> do, it is a weakness of RDF itself -- a gap in the RDF specs. This
> > >> gap makes RDF harder to use than it needs to be. If we want RDF to
> > >> be adopted by a wider audience -- and I certainly do -- then we need
> > >> to fix obvious gaps like this.
> > >>
> > >> I hope that helps clarify why I see this as a problem. Given the
> > >> above, would you be okay with canonicalization being an OPTIONAL
> > >> deliverable?
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >>>
> > >>>> I consider this a major deficiency in RDF that really needs to be
> > >>>> corrected. Any significant software effort uses regression tests
> > >>>> to validate changes. But comparing two documents is currently
> > >>>> complicated and difficult with RDF data. RDF canonicalization
> > >>>> would make it as easy as it is for every other data
> > >>>> representation.
> > >>>
> > >>> How so? Right now you can just use a tool that does RDF graph or
> > >>> dataset isomorphism. Under your proposal you would need a tool that
> > >>> does RDF graph or dataset canonicalization, which is no easier than
> > >>> isomorphism checking. What's the difference?
> > >>>
> > >>>> I realize that this is a slightly different -- and more stringent
> > >>>> -- notion of RDF validation than just looking at the general shape
> > >>>> of the data, because it requires that the data not only has the
> > >>>> expected shape, but also contains the expected *values*.
> > >>>> Canonicalization would solve this problem.
> > >>>
> > >>> Canonicalization is a part of a solution to a problem that is already
> > >>> solved.
> > >>>
> > >>>
> > >>>> Given this motivation, would you be okay with RDF canonicalization
> > >>>> being included as an OPTIONAL deliverable in the charter?
> > >>>>
> > >>>> Thanks,
> > >>>> David
> > >>>
> > >>>
> > >>> peter
> > >>>
> > >>>> On 08/13/2014 01:11 AM, Peter F. Patel-Schneider wrote:
> > >>>>> I'm still not getting this at all.
> > >>>>>
> > >>>>> How does canonicalization help me determine that I got the RDF
> > >>>>> data that I expected (exact or otherwise)? For example, how does
> > >>>>> canonicalization help me determine that I got some RDF data that
> > >>>>> tells me the phone numbers of my friends?
> > >>>>>
> > >>>>> I just can't come up with a use case at all related to RDF data
> > >>>>> validation where canonicalization is relevant, except for signing
> > >>>>> RDF graphs, and that can just as easily be done at the surface
> > >>>>> syntax level, and signing is quite tangential to the WG's
> > >>>>> purpose, I think.
> > >>>>>
> > >>>>> peter
> > >>>>>
> > >>>>>
> > >>>>> On 08/12/2014 09:17 PM, David Booth wrote:
> > >>>>>> I think "canonicalization" would be a clearer term, as in:
> > >>>>>>
> > >>>>>> "OPTIONAL - A Recommendation for canonical serialization
> > >>>>>> of RDF graphs and RDF datasets."
> > >>>>>>
> > >>>>>> The purpose of this (to me) is to be able to validate that I got
> > >>>>>> the *exact* RDF data that I expected -- not merely the right
> > >>>>>> classes and predicates and such.
> > >>>>>> Would you be okay with including this in the charter?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> David
> > >>>>>>
> > >>>>>> On 08/12/2014 10:00 PM, Peter F. Patel-Schneider wrote:
> > >>>>>>> I'm still not exactly sure just what normalization means in
> > >>>>>>> this context or what relationship it has to RDF validation.
> > >>>>>>>
> > >>>>>>> peter
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 08/12/2014 06:55 PM, David Booth wrote:
> > >>>>>>>> +1 for all except one item.
> > >>>>>>>>
> > >>>>>>>> I'd like to make one last ditch attempt to include graph
> > >>>>>>>> normalization as an OPTIONAL deliverable. I expect the WG to
> > >>>>>>>> treat it as low priority, and would only anticipate a
> > >>>>>>>> normalization document being produced if someone takes the
> > >>>>>>>> personal initiative to draft it. I do not see any significant
> > >>>>>>>> harm in including it in the charter on that basis, but I do
> > >>>>>>>> see a benefit, because if the WG did somehow get to it then it
> > >>>>>>>> would be damn nice to have, so that we could finally validate
> > >>>>>>>> RDF data by having a standard way to compare two RDF documents
> > >>>>>>>> for equality, like we can routinely do with every other data
> > >>>>>>>> representation.
> > >>>>>>>>
> > >>>>>>>> Peter, would that be okay with you, to include graph
> > >>>>>>>> normalization as OPTIONAL that way?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> David
> > >>>>>>>>
> > >>>>>>>> On 08/12/2014 08:55 PM, Eric Prud'hommeaux wrote:
> > >>>>>>>>> Hi all, we can have a face-to-face at the W3C Technical
> > >>>>>>>>> Plenary in November if we can quickly endorse a good-enough
> > >>>>>>>>> charter.
> > >>>>>>>>> As it stands now, it isn't clear that the group will be able
> > >>>>>>>>> to reach consensus within the Working Group, let alone get
> > >>>>>>>>> through the member review without objection.
> > >>>>>>>>>
> > >>>>>>>>> Please review the proposals that I've culled from the list. I
> > >>>>>>>>> encourage compromise on all our parts and we'll have to
> > >>>>>>>>> suppress the desire to wordsmith. (Given the 3-month
> > >>>>>>>>> evaluation period, wordsmithing won't change much anyways.)
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> separate semantics:
> > >>>>>>>>>
> > >>>>>>>>>   "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E2AFBD.9050102@gmail.com>
> > >>>>>>>>>     A syntax and semantics for shapes specifying how to
> > >>>>>>>>>     construct shape expressions and how shape expressions are
> > >>>>>>>>>     evaluated against RDF graphs.
> > >>>>>>>>>   "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID: <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
> > >>>>>>>>>     defining the (direct) semantics meaning of shapes and
> > >>>>>>>>>     defining the associated validation process.
> > >>>>>>>>>
> > >>>>>>>>>   opposition: Holger Knublauch
> > >>>>>>>>>
> > >>>>>>>>>   proposed resolution: include, noting that if SPARQL is
> > >>>>>>>>>   judged to be useful for the semantics, there's nothing
> > >>>>>>>>>   preventing us from using it.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> make graph normalization optional or use-case specific:
> > >>>>>>>>>
> > >>>>>>>>>   "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E2AFBD.9050102@gmail.com>
> > >>>>>>>>>     3 OPTIONAL A specification of how shape verification
> > >>>>>>>>>     interacts with inference.
> > >>>>>>>>>   Jeremy J Carroll <jjc@syapse.com> - Message-Id: <D954B744-05CD-4E5C-8FC2-C08A9A99BA9F@syapse.com>
> > >>>>>>>>>     the WG will consider whether it is necessary, practical
> > >>>>>>>>>     or desirable to normalize a graph...
> > >>>>>>>>>     A graph normalization method, suitable for the use cases
> > >>>>>>>>>     determined by the group....
> > >>>>>>>>>   David Booth <david@dbooth.org> - Message-ID: <53E28D07.9000804@dbooth.org>
> > >>>>>>>>>     OPTIONAL - A Recommendation for
> > >>>>>>>>>     normalization/canonicalization of RDF graphs and RDF
> > >>>>>>>>>     datasets that are serialized in N-Triples and N-Quads.
> > >>>>>>>>>   opposition - don't do it at all:
> > >>>>>>>>>     "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E3A4CB.4040200@gmail.com>
> > >>>>>>>>>     the WG should not be working on this.
> > >>>>>>>>>
> > >>>>>>>>>   proposed resolution: withdrawn, to go to new light-weight,
> > >>>>>>>>>   focused WG, removing this text:
> > >>>>>>>>>   [[
> > >>>>>>>>>   The WG MAY produce a Recommendation for graph normalization.
> > >>>>>>>>>   ]]
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> mandatory human-facing language:
> > >>>>>>>>>
> > >>>>>>>>>   "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID: <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
> > >>>>>>>>>     ShExC mandatory, but potentially as a Note.
> > >>>>>>>>>   David Booth <david@dbooth.org> - Message-ID: <53E28D07.9000804@dbooth.org>
> > >>>>>>>>>     In Section 4 (Deliverables), change "OPTIONAL - Compact,
> > >>>>>>>>>     human-readable syntax" to "Compact, human-readable
> > >>>>>>>>>     syntax", i.e., make it required.
> > >>>>>>>>>   Jeremy J Carroll <jjc@syapse.com> - Message-Id: <54AA894F-F4B4-4877-8806-EB85FB5A42E5@syapse.com>
> > >>>>>>>>>
> > >>>>>>>>>   opposition - make it OPTIONAL
> > >>>>>>>>>     "Peter F. Patel-Schneider" <pfpschneider@gmail.com> - Message-ID: <53E2AFBD.9050102@gmail.com>
> > >>>>>>>>>     OPTIONAL A compact, human-readable syntax for expressing
> > >>>>>>>>>     shapes.
> > >>>>>>>>>
> > >>>>>>>>>   proposed resolution: keep as OPTIONAL, not mentioning
> > >>>>>>>>>   ShExC, but clarifying that it's different from the RDF
> > >>>>>>>>>   syntax.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> report formats:
> > >>>>>>>>>   Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
> > >>>>>>>>>     provide flexible validation execution plans that range from:
> > >>>>>>>>>       Success / fail
> > >>>>>>>>>       Success / fail per constraint
> > >>>>>>>>>       Fails with error counts
> > >>>>>>>>>       Individual resources that fail per constraint
> > >>>>>>>>>       And enriched failed resources with annotations
> > >>>>>>>>>
> > >>>>>>>>>   proposed resolution: no change, noting that no one seconded
> > >>>>>>>>>   this proposal.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> test suite/validator:
> > >>>>>>>>>
> > >>>>>>>>>   Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
> > >>>>>>>>>     Validation results are very important for the progress of
> > >>>>>>>>>     this WG and should be a standalone deliverable.
> > >>>>>>>>>   David Booth <david@dbooth.org> - Message-ID: <53E28D07.9000804@dbooth.org>
> > >>>>>>>>>     Test Suite, to help ensure interoperability and correct
> > >>>>>>>>>     implementation. The group will choose the location of
> > >>>>>>>>>     this deliverable, such as a git repository.
> > >>>>>>>>>
> > >>>>>>>>>   proposed resolution: leave from charter as WGs usually
> > >>>>>>>>>   choose to do this anyways and it has no impact on IP
> > >>>>>>>>>   commitments.
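The blank node point argued in the thread can be made concrete with a minimal sketch. This is not code from the RDF Pipeline Framework and not any standardized canonicalization method; it assumes a graph is held as a set of (subject, predicate, object) strings in N-Triples-like notation, with blank nodes recognizable by a "_:" prefix. All function names and the "_:c0", "_:c1" label scheme are made up for illustration. It shows that a byte comparison of two sorted serializations (roughly what cmp sees) fails when two tools pick different blank node labels, and that a brute-force canonical relabeling makes the same byte comparison succeed. The brute force tries every relabeling, so it is exponential in the number of blank nodes and only suitable for tiny graphs.

```python
from itertools import permutations


def is_bnode(term):
    """Blank nodes are assumed to be written with a '_:' prefix."""
    return term.startswith("_:")


def bnodes(graph):
    """Return the sorted list of blank node labels used in the graph."""
    return sorted({t for triple in graph for t in triple if is_bnode(t)})


def relabel(graph, mapping):
    """Rename blank nodes according to mapping; other terms are untouched."""
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}


def serialize(graph):
    """Sorted, line-per-triple serialization (N-Triples-like)."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in sorted(graph))


def canonicalize(graph):
    """Illustrative brute force: try every assignment of the labels
    _:c0 .. _:cN to the blank nodes and keep the lexicographically
    smallest serialization. Exponential; for demonstration only."""
    old = bnodes(graph)
    new = [f"_:c{i}" for i in range(len(old))]
    return min(
        serialize(relabel(graph, dict(zip(old, perm))))
        for perm in permutations(new)
    )


KNOWS = "<http://example.org/knows>"
NAME = "<http://example.org/name>"

# Same graph, but two hypothetical serializers chose different labels.
expected = {("_:a", KNOWS, "_:b"), ("_:b", NAME, '"Bob"')}
actual = {("_:x1", KNOWS, "_:x0"), ("_:x0", NAME, '"Bob"')}
# A genuinely different graph: the name is attached to the other node.
different = {("_:a", KNOWS, "_:b"), ("_:a", NAME, '"Bob"')}

print(serialize(expected) == serialize(actual))           # False
print(canonicalize(expected) == canonicalize(actual))     # True
print(canonicalize(expected) == canonicalize(different))  # False
```

In this sketch the datatype-specific work lives entirely in canonicalize(), on the serializer side, and the final check is plain string equality. That is the division of labor David describes, and it also illustrates Peter's point that the canonicalizer itself is no simpler than an isomorphism check.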
Received on Thursday, 14 August 2014 17:26:54 UTC