- From: Markus Lanthaler <markus.lanthaler@gmx.net>
- Date: Fri, 15 Aug 2014 18:13:49 +0200
- To: <public-rdf-shapes@w3.org>
- Cc: "'Manu Sporny'" <msporny@digitalbazaar.com>, "'Dave Longley'" <dlongley@digitalbazaar.com>
On 15 Aug 2014 at 04:15, Holger Knublauch wrote:
> I sympathize with your desire to piggyback on a WG for this work,
> but I also see the risk of spreading too far. There are probably
> other ways in the W3C processes, to allow very small working groups
> to proceed with such deliverables, outside of the larger WGs for the
> "big picture" items? If not, then the W3C process probably has an
> unnecessary limitation. As I understand this problem is something
> that less than 3 people can write up, it gets reviewed and could be
> signed off without impacting any other semantic web standards.
>
> On the specific topic, we have had requests for reliably sorted
> Turtle files for years, especially so that people can compare
> versions of the same file with concurrent versioning systems.
> TopBraid includes this feature now also in the Free Edition,
> whenever you save .ttl files. Jeremy knows much more about this
> algorithm than I do, and it is certainly a useful feature. Whether
> this needs to be a "standard" or just an algorithm from an open
> source library is a question that I cannot answer.

The JSON-LD CG worked on graph normalization [1] but there wasn't
really enough interest (apart from Digital Bazaar) to continue the
work on it. Nevertheless, I think setting up a separate CG focusing
on just this might be interesting and will help to find out if
there's enough interest to produce a spec. I CCed Manu and Dave as
they might be interested or have something more to say.

[1] http://json-ld.org/spec/latest/rdf-graph-normalization/

--
Markus Lanthaler
@markuslanthaler

On 8/15/2014 3:26, David Booth wrote:
> On 08/14/2014 11:01 AM, Arnaud Le Hors wrote:
>> Hi David,
>>
>> Maybe I'm just missing something but I have to admit not to be convinced
>> by your argument that this is a necessity for validation. Rather, it
>> seems to me that you're just trying to piggyback on top of this WG to
>> have it do something that you think would be useful.
>
> In a sense I am, because as I mentioned before, this is a somewhat
> different notion of validation than just looking at the shape of the
> data. I agree that it is not a necessity for *shape* validation, but
> I do see it as important for validating, in a uniform way, that actual
> data is equivalent to expected data. But I understand that that is
> tangential to the main use case that the group wants to focus on, so I
> won't push it further.
>
>>
>> I understand you have good intentions but I'm sure you know that every
>> deliverable has a cost, even if optional, and I'd rather we don't add to
>> a charter that is already going to require a lot of work.
>
> Ok, I'll drop the request to include it. Thanks for considering.
>
> David
>
>>
>> Regards.
>> --
>> Arnaud Le Hors - Senior Technical Staff Member, Open Web Standards -
>> IBM Software Group
>>
>>
>> David Booth <david@dbooth.org> wrote on 08/13/2014 08:14:38 PM:
>>
>> > From: David Booth <david@dbooth.org>
>> > To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>,
>> > public-rdf-shapes@w3.org
>> > Date: 08/13/2014 08:15 PM
>> > Subject: Re: regression testing [was Re: summarizing proposed
>> > changes to charter]
>> >
>> > On 08/13/2014 10:04 PM, Peter F. Patel-Schneider wrote:
>> > > OK, even though regression testing doesn't need canonicalization, it is
>> > > useful to have RDF canonicalization to support a particular regression
>> > > testing system.
>> > >
>> > > But how is the lack of a W3C-blessed method for RDF canonicalization
>> > > hindering the development or deployment of this system? How would a
>> > > W3C-blessed method for RDF canonicalization help the development or
>> > > deployment of this system?
>> > >
>> > > The system could use any canonical form whatsoever, after all, right?
>> >
>> > Yes and no.
>> > The lack of a W3C-blessed method of RDF canonicalization
>> > makes the comparison dependent on the particular canonicalization tool
>> > that is used, which means that RDF data produced by different tools (or
>> > different versions of the same tool) could not be reliably compared. In
>> > many scenarios this won't be an issue, but it will in some.
>> >
>> > But more importantly, the lack of a standard RDF canonicalization method
>> > discourages the development of canonicalization tools. Canonicalization
>> > has gotten little attention in RDF tools, in my view largely *because*
>> > of the difficulty of doing it and the lack of a W3C-blessed method. It
>> > is non-trivial to implement, and if one's implementation would just end
>> > up as one's own idiosyncratic canonicalization anyway, instead of being
>> > an implementation of a standard, then there isn't as much motivation to
>> > do it. I think a W3C-blessed method would help a lot.
>> >
>> > Would you be okay with canonicalization being an OPTIONAL deliverable?
>> >
>> > David
>> >
>> > >
>> > > peter
>> > >
>> > >
>> > > On 08/13/2014 12:00 PM, David Booth wrote:
>> > >> Hi Peter,
>> > >>
>> > >> On 08/13/2014 01:25 PM, Peter F. Patel-Schneider wrote:
>> > >>> On 08/13/2014 08:45 AM, David Booth wrote:
>> > >>>> Hi Peter,
>> > >>>>
>> > >>>> Here is my main use case for RDF canonicalization.
>> > >>>>
>> > >>>> The RDF Pipeline Framework http://rdfpipeline.org/ allows any kind of
>> > >>>> data to be manipulated in a data production pipeline -- not just RDF.
>> > >>>> The Framework has regression tests that, when run, are used to validate
>> > >>>> the correctness of the output of each node in a pipeline. A test passes
>> > >>>> if the actual node output exactly matches the expected node output,
>> > >>>> *after* filtering out ignorable differences.
>> > >>>> (For example, differences in dates and times are typically treated as
>> > >>>> ignorable -- they don't cause a test to fail.) Since a generic
>> > >>>> comparison tool is used (because the pipeline is permitted to carry
>> > >>>> *any* kind of data), data serialization must be predictable and
>> > >>>> canonical. This works great for all kinds of data *except* RDF.
>> > >>>
>> > >>> Why? You could just use RDF graph or dataset isomorphism. Those are
>> > >>> already defined by W3C. Well maybe you need to modify the graphs first
>> > >>> (e.g., to fudge dates and times), but you are already doing that for
>> > >>> other data types.
>> > >>>
>> > >>>> If a canonical form of RDF were defined, then the exact same tools
>> > >>>> that are used to compare other kinds of data for differences could
>> > >>>> also be used for comparing RDF.
>> > >>>
>> > >>> What are these tools? Why should a tool to determine whether two
>> > >>> strings are the same also work for determining whether two XML documents
>> > >>> are the same? Oh, maybe you think that you should first canonicalize
>> > >>> everything and then do string comparison. However, you are deluding
>> > >>> yourself that this is using the same tools for comparing different kinds
>> > >>> of data. The tool that you are actually using to compare, e.g., XML
>> > >>> documents, is the composition of the datatype-specific canonicalizer and
>> > >>> a string comparer. There is no free lunch---you still need tools
>> > >>> specific to each datatype.
>> > >>
>> > >> Not quite. cmp is used for comparison of *serialized* data, and
>> > >> canonicalization is part of the data *serialization* process -- not
>> > >> the data *comparison* process.
>> > >> The serialization process must necessarily understand what kind of
>> > >> data it is -- there is no way around that -- so that is the logical
>> > >> place to do the canonicalization. But the comparison process does
>> > >> *not* know what kind of data is being compared -- nor should it have
>> > >> to. It's the serializer's job to produce a predictable, repeatable
>> > >> serialization of the data. This works great and is trivially easy for
>> > >> everything *except* RDF, because of the instability of blank node
>> > >> labels. In RDF, comparison is embarrassingly difficult.
>> > >>
>> > >> One could argue that my application could use some workaround to solve
>> > >> this problem, but that belies the fact that the root cause of the
>> > >> problem is *not* some weird thing my application is trying to do, it is
>> > >> a weakness of RDF itself -- a gap in the RDF specs. This gap makes RDF
>> > >> harder to use than it needs to be. If we want RDF to be adopted by a
>> > >> wider audience -- and I certainly do -- then we need to fix obvious
>> > >> gaps like this.
>> > >>
>> > >> I hope that helps clarify why I see this as a problem. Given the
>> > >> above, would you be okay with canonicalization being an OPTIONAL
>> > >> deliverable?
>> > >>
>> > >> Thanks,
>> > >> David
>> > >>
>> > >>>
>> > >>>> I consider this a major deficiency in RDF that really needs to be
>> > >>>> corrected. Any significant software effort uses regression tests to
>> > >>>> validate changes. But comparing two documents is currently complicated
>> > >>>> and difficult with RDF data. RDF canonicalization would make it as
>> > >>>> easy as it is for every other data representation.
>> > >>>
>> > >>> How so? Right now you can just use a tool that does RDF graph or
>> > >>> dataset isomorphism.
>> > >>> Under your proposal you would need a tool that
>> > >>> does RDF graph or dataset canonicalization, which is no easier than
>> > >>> isomorphism checking. What's the difference?
>> > >>>
>> > >>>> I realize that this is a slightly different -- and more stringent --
>> > >>>> notion of RDF validation than just looking at the general shape of
>> > >>>> the data, because it requires that the data not only has the expected
>> > >>>> shape, but also contains the expected *values*. Canonicalization
>> > >>>> would solve this problem.
>> > >>>
>> > >>> Canonicalization is a part of a solution to a problem that is already
>> > >>> solved.
>> > >>>
>> > >>>
>> > >>>> Given this motivation, would you be okay with RDF canonicalization
>> > >>>> being included as an OPTIONAL deliverable in the charter?
>> > >>>>
>> > >>>> Thanks,
>> > >>>> David
>> > >>>
>> > >>>
>> > >>> peter
>> > >>>
>> > >>>> On 08/13/2014 01:11 AM, Peter F. Patel-Schneider wrote:
>> > >>>>> I'm still not getting this at all.
>> > >>>>>
>> > >>>>> How does canonicalization help me determine that I got the RDF data
>> > >>>>> that I expected (exact or otherwise)? For example, how does
>> > >>>>> canonicalization help me determine that I got some RDF data that
>> > >>>>> tells me the phone numbers of my friends?
>> > >>>>>
>> > >>>>> I just can't come up with a use case at all related to RDF data
>> > >>>>> validation where canonicalization is relevant, except for signing RDF
>> > >>>>> graphs, and that can just as easily be done at the surface syntax
>> > >>>>> level, and signing is quite tangential to the WG's purpose, I think.
>> > >>>>>
>> > >>>>> peter
>> > >>>>>
>> > >>>>>
>> > >>>>> On 08/12/2014 09:17 PM, David Booth wrote:
>> > >>>>>> I think "canonicalization" would be a clearer term, as in:
>> > >>>>>>
>> > >>>>>> "OPTIONAL - A Recommendation for canonical serialization
>> > >>>>>> of RDF graphs and RDF datasets."
>> > >>>>>>
>> > >>>>>> The purpose of this (to me) is to be able to validate that I got
>> > >>>>>> the *exact* RDF data that I expected -- not merely the right
>> > >>>>>> classes and predicates and such. Would you be okay with including
>> > >>>>>> this in the charter?
>> > >>>>>>
>> > >>>>>> Thanks,
>> > >>>>>> David
>> > >>>>>>
>> > >>>>>> On 08/12/2014 10:00 PM, Peter F. Patel-Schneider wrote:
>> > >>>>>>> I'm still not exactly sure just what normalization means in this
>> > >>>>>>> context or what relationship it has to RDF validation.
>> > >>>>>>>
>> > >>>>>>> peter
>> > >>>>>>>
>> > >>>>>>>
>> > >>>>>>> On 08/12/2014 06:55 PM, David Booth wrote:
>> > >>>>>>>> +1 for all except one item.
>> > >>>>>>>>
>> > >>>>>>>> I'd like to make one last-ditch attempt to include graph
>> > >>>>>>>> normalization as an OPTIONAL deliverable. I expect the WG to
>> > >>>>>>>> treat it as low priority, and would only anticipate a
>> > >>>>>>>> normalization document being produced if someone takes the
>> > >>>>>>>> personal initiative to draft it. I do not see any significant
>> > >>>>>>>> harm in including it in the charter on that basis, but I do see
>> > >>>>>>>> a benefit, because if the WG did somehow get to it then it would
>> > >>>>>>>> be damn nice to have, so that we could finally validate RDF data
>> > >>>>>>>> by having a standard way to compare two RDF documents for
>> > >>>>>>>> equality, like we can routinely do with every other data
>> > >>>>>>>> representation.
>> > >>>>>>>>
>> > >>>>>>>> Peter, would that be okay with you, to include graph
>> > >>>>>>>> normalization as OPTIONAL that way?
>> > >>>>>>>>
>> > >>>>>>>> Thanks,
>> > >>>>>>>> David
>> > >>>>>>>>
>> > >>>>>>>> On 08/12/2014 08:55 PM, Eric Prud'hommeaux wrote:
>> > >>>>>>>>> Hi all, we can have a face-to-face at the W3C Technical Plenary in
>> > >>>>>>>>> November if we can quickly endorse a good-enough charter. As it
>> > >>>>>>>>> stands now, it isn't clear that the group will be able to reach
>> > >>>>>>>>> consensus within the Working Group, let alone get through the
>> > >>>>>>>>> member review without objection.
>> > >>>>>>>>>
>> > >>>>>>>>> Please review the proposals that I've culled from the list. I
>> > >>>>>>>>> encourage compromise on all our parts and we'll have to suppress
>> > >>>>>>>>> the desire to wordsmith. (Given the 3-month evaluation period,
>> > >>>>>>>>> wordsmithing won't change much anyways.)
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> separate semantics:
>> > >>>>>>>>>
>> > >>>>>>>>>   "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
>> > >>>>>>>>>   Message-ID: <53E2AFBD.9050102@gmail.com>
>> > >>>>>>>>>     A syntax and semantics for shapes specifying how to construct
>> > >>>>>>>>>     shape expressions and how shape expressions are evaluated
>> > >>>>>>>>>     against RDF graphs.
>> > >>>>>>>>>   "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID:
>> > >>>>>>>>>   <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
>> > >>>>>>>>>     defining the (direct) semantics meaning of shapes and
>> > >>>>>>>>>     defining the associated validation process.
>> > >>>>>>>>>
>> > >>>>>>>>>   opposition: Holger Knublauch
>> > >>>>>>>>>
>> > >>>>>>>>>   proposed resolution: include, noting that if SPARQL is judged
>> > >>>>>>>>>   to be useful for the semantics, there's nothing preventing us
>> > >>>>>>>>>   from using it.
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> make graph normalization optional or use-case specific:
>> > >>>>>>>>>
>> > >>>>>>>>>   "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
>> > >>>>>>>>>   Message-ID: <53E2AFBD.9050102@gmail.com>
>> > >>>>>>>>>     3 OPTIONAL A specification of how shape verification
>> > >>>>>>>>>     interacts with inference.
>> > >>>>>>>>>   Jeremy J Carroll <jjc@syapse.com> - Message-Id:
>> > >>>>>>>>>   <D954B744-05CD-4E5C-8FC2-C08A9A99BA9F@syapse.com>
>> > >>>>>>>>>     the WG will consider whether it is necessary, practical or
>> > >>>>>>>>>     desirable to normalize a graph...
>> > >>>>>>>>>     A graph normalization method, suitable for the use cases
>> > >>>>>>>>>     determined by the group....
>> > >>>>>>>>>   David Booth <david@dbooth.org> - Message-ID:
>> > >>>>>>>>>   <53E28D07.9000804@dbooth.org>
>> > >>>>>>>>>     OPTIONAL - A Recommendation for normalization/canonicalization
>> > >>>>>>>>>     of RDF graphs and RDF datasets that are serialized in
>> > >>>>>>>>>     N-Triples and N-Quads.
>> > >>>>>>>>>   opposition - don't do it at all:
>> > >>>>>>>>>     "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
>> > >>>>>>>>>     Message-ID: <53E3A4CB.4040200@gmail.com>
>> > >>>>>>>>>       the WG should not be working on this.
>> > >>>>>>>>>
>> > >>>>>>>>>   proposed resolution: withdrawn, to go to new light-weight,
>> > >>>>>>>>>   focused WG, removing this text:
>> > >>>>>>>>>   [[
>> > >>>>>>>>>   The WG MAY produce a Recommendation for graph normalization.
>> > >>>>>>>>>   ]]
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> mandatory human-facing language:
>> > >>>>>>>>>
>> > >>>>>>>>>   "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID:
>> > >>>>>>>>>   <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
>> > >>>>>>>>>     ShExC mandatory, but potentially as a Note.
>> > >>>>>>>>>   David Booth <david@dbooth.org> - Message-ID:
>> > >>>>>>>>>   <53E28D07.9000804@dbooth.org>
>> > >>>>>>>>>     In Section 4 (Deliverables), change "OPTIONAL - Compact,
>> > >>>>>>>>>     human-readable syntax" to "Compact, human-readable syntax",
>> > >>>>>>>>>     i.e., make it required.
>> > >>>>>>>>>   Jeremy J Carroll <jjc@syapse.com> - Message-Id:
>> > >>>>>>>>>   <54AA894F-F4B4-4877-8806-EB85FB5A42E5@syapse.com>
>> > >>>>>>>>>
>> > >>>>>>>>>   opposition - make it OPTIONAL
>> > >>>>>>>>>     "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
>> > >>>>>>>>>     Message-ID: <53E2AFBD.9050102@gmail.com>
>> > >>>>>>>>>       OPTIONAL A compact, human-readable syntax for expressing
>> > >>>>>>>>>       shapes.
>> > >>>>>>>>>
>> > >>>>>>>>>   proposed resolution: keep as OPTIONAL, not mentioning ShExC,
>> > >>>>>>>>>   but clarifying that it's different from the RDF syntax.
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> report formats:
>> > >>>>>>>>>   Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
>> > >>>>>>>>>     provide flexible validation execution plans that range from:
>> > >>>>>>>>>       Success / fail
>> > >>>>>>>>>       Success / fail per constraint
>> > >>>>>>>>>       Fails with error counts
>> > >>>>>>>>>       Individual resources that fail per constraint
>> > >>>>>>>>>       And enriched failed resources with annotations
>> > >>>>>>>>>
>> > >>>>>>>>>   proposed resolution: no change, noting that no one seconded
>> > >>>>>>>>>   this proposal.
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> test suite/validator:
>> > >>>>>>>>>
>> > >>>>>>>>>   Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
>> > >>>>>>>>>     Validation results are very important for the progress of
>> > >>>>>>>>>     this WG and should be a standalone deliverable.
>> > >>>>>>>>>   David Booth <david@dbooth.org> - Message-ID:
>> > >>>>>>>>>   <53E28D07.9000804@dbooth.org>
>> > >>>>>>>>>     Test Suite, to help ensure interoperability and correct
>> > >>>>>>>>>     implementation. The group will choose the location of this
>> > >>>>>>>>>     deliverable, such as a git repository.
>> > >>>>>>>>>
>> > >>>>>>>>>   proposed resolution: leave from charter as WGs usually
>> > >>>>>>>>>   choose to do this anyways and it has no impact on IP
>> > >>>>>>>>>   commitments.
>> > >>>>>>>>>
Received on Friday, 15 August 2014 16:14:32 UTC