Re: regression testing [was Re: summarizing proposed changes to charter]

Hi David,

Maybe I'm just missing something, but I have to admit I'm not convinced
by your argument that this is a necessity for validation. Rather, it seems
to me that you're just trying to piggyback on this WG to have it do
something that you think would be useful.

I understand you have good intentions, but I'm sure you know that every
deliverable has a cost, even if optional, and I'd rather we not add to a
charter that is already going to require a lot of work.

Regards.
--
Arnaud  Le Hors - Senior Technical Staff Member, Open Web Standards - IBM 
Software Group


David Booth <david@dbooth.org> wrote on 08/13/2014 08:14:38 PM:

> From: David Booth <david@dbooth.org>
> To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>,
> public-rdf-shapes@w3.org
> Date: 08/13/2014 08:15 PM
> Subject: Re: regression testing [was Re: summarizing proposed 
> changes to charter]
> 
> On 08/13/2014 10:04 PM, Peter F. Patel-Schneider wrote:
> > OK, even though regression testing doesn't need canonicalization, it is
> > useful to have RDF canonicalization to support a particular regression
> > testing system.
> >
> > But how is the lack of a W3C-blessed method for RDF canonicalization
> > hindering the development or deployment of this system?  How would a
> > W3C-blessed method for RDF canonicalization help the development or
> > deployment of this system?
> >
> > The system could use any canonical form whatsoever, after all, right?
> 
> Yes and no.  The lack of a W3C-blessed method of RDF canonicalization
> makes the comparison dependent on the particular canonicalization tool
> that is used, which means that RDF data produced by different tools (or
> different versions of the same tool) could not be reliably compared.  In
> many scenarios this won't be an issue, but it will in some.
> 
> But more importantly, the lack of a standard RDF canonicalization method
> discourages the development of canonicalization tools.  Canonicalization
> has gotten little attention in RDF tools, in my view largely *because*
> of the difficulty of doing it and the lack of a W3C-blessed method.  It
> is non-trivial to implement, and if one's implementation would just end
> up as one's own idiosyncratic canonicalization anyway, instead of being
> an implementation of a standard, then there isn't as much motivation to
> do it.  I think a W3C-blessed method would help a lot.
> 
> Would you be okay with canonicalization being an OPTIONAL deliverable?
> 
> David
> 
> >
> > peter
> >
> >
> > On 08/13/2014 12:00 PM, David Booth wrote:
> >> Hi Peter,
> >>
> >> On 08/13/2014 01:25 PM, Peter F. Patel-Schneider wrote:
> >>> On 08/13/2014 08:45 AM, David Booth wrote:
> >>>> Hi Peter,
> >>>>
> >>>> Here is my main use case for RDF canonicalization.
> >>>>
> >>>> The RDF Pipeline Framework http://rdfpipeline.org/ allows any kind of
> >>>> data to be manipulated in a data production pipeline -- not just RDF.
> >>>> The Framework has regression tests that, when run, are used to validate
> >>>> the correctness of the output of each node in a pipeline.  A test passes
> >>>> if the actual node output exactly matches the expected node output,
> >>>> *after* filtering out ignorable differences.  (For example, differences
> >>>> in dates and times are typically treated as ignorable -- they don't
> >>>> cause a test to fail.)  Since a generic comparison tool is used (because
> >>>> the pipeline is permitted to carry *any* kind of data), data
> >>>> serialization must be predictable and canonical.  This works great for
> >>>> all kinds of data *except* RDF.
> >>>
> >>> Why?  You could just use RDF graph or dataset isomorphism.  Those are
> >>> already defined by W3C.  Well maybe you need to modify the graphs first
> >>> (e.g., to fudge dates and times), but you are already doing that for
> >>> other data types.
> >>>
> >>>> If a canonical form of RDF were defined, then the exact same tools that
> >>>> are used to compare other kinds of data for differences could also be
> >>>> used for comparing RDF.
> >>>
> >>> What are these tools?  Why should a tool to determine whether two
> >>> strings are the same also work for determining whether two XML documents
> >>> are the same?  Oh, maybe you think that you should first canonicalize
> >>> everything and then do string comparison.  However, you are deluding
> >>> yourself that this is using the same tools for comparing different kinds
> >>> of data.  The tool that you are actually using to compare, e.g., XML
> >>> documents, is the composition of the datatype-specific canonicalizer and
> >>> a string comparer.  There is no free lunch---you still need tools
> >>> specific to each datatype.
> >>
> >> Not quite.  cmp is used for comparison of *serialized* data, and
> >> canonicalization is part of the data *serialization* process -- not the
> >> data *comparison* process.  The serialization process must necessarily
> >> understand what kind of data it is -- there is no way around that -- so
> >> that is the logical place to do the canonicalization.  But the comparison
> >> process does *not* know what kind of data is being compared -- nor should
> >> it have to.  It's the serializer's job to produce a predictable,
> >> repeatable serialization of the data.  This works great and is trivially
> >> easy for everything *except* RDF, because of the instability of blank
> >> node labels.  In RDF, comparison is embarrassingly difficult.
> >>
> >> One could argue that my application could use some workaround to solve
> >> this problem, but that belies the fact that the root cause of the problem
> >> is *not* some weird thing my application is trying to do, it is a
> >> weakness of RDF itself -- a gap in the RDF specs.  This gap makes RDF
> >> harder to use than it needs to be.  If we want RDF to be adopted by a
> >> wider audience -- and I certainly do -- then we need to fix obvious gaps
> >> like this.
> >>
> >> I hope that helps clarify why I see this as a problem.  Given the above,
> >> would you be okay with canonicalization being an OPTIONAL deliverable?
> >>
> >> Thanks,
> >> David
> >>
> >>>
> >>>> I consider this a major deficiency in RDF that really needs to be
> >>>> corrected.  Any significant software effort uses regression tests to
> >>>> validate changes.  But comparing two documents is currently complicated
> >>>> and difficult with RDF data.  RDF canonicalization would make it as easy
> >>>> as it is for every other data representation.
> >>>
> >>> How so?  Right now you can just use a tool that does RDF graph or
> >>> dataset isomorphism.  Under your proposal you would need a tool that
> >>> does RDF graph or dataset canonicalization, which is no easier than
> >>> isomorphism checking. What's the difference?
> >>>
> >>>> I realize that this is a slightly different -- and more stringent --
> >>>> notion of RDF validation than just looking at the general shape of the
> >>>> data, because it requires that the data not only has the expected shape,
> >>>> but also contains the expected *values*.  Canonicalization would solve
> >>>> this problem.
> >>>
> >>> Canonicalization is a part of a solution to a problem that is already
> >>> solved.
> >>>
> >>>
> >>>> Given this motivation, would you be okay with RDF canonicalization being
> >>>> included as an OPTIONAL deliverable in the charter?
> >>>>
> >>>> Thanks,
> >>>> David
> >>>
> >>>
> >>> peter
> >>>
> >>>> On 08/13/2014 01:11 AM, Peter F. Patel-Schneider wrote:
> >>>>> I'm still not getting this at all.
> >>>>>
> >>>>> How does canonicalization help me determine that I got the RDF data
> >>>>> that I expected (exact or otherwise)?  For example, how does
> >>>>> canonicalization help me determine that I got some RDF data that tells
> >>>>> me the phone numbers of my friends?
> >>>>>
> >>>>> I just can't come up with a use case at all related to RDF data
> >>>>> validation where canonicalization is relevant, except for signing RDF
> >>>>> graphs, and that can just as easily be done at the surface syntax
> >>>>> level, and signing is quite tangential to the WG's purpose, I think.
> >>>>>
> >>>>> peter
> >>>>>
> >>>>>
> >>>>> On 08/12/2014 09:17 PM, David Booth wrote:
> >>>>>> I think "canonicalization" would be a clearer term, as in:
> >>>>>>
> >>>>>>    "OPTIONAL - A Recommendation for canonical serialization
> >>>>>>     of RDF graphs and RDF datasets."
> >>>>>>
> >>>>>> The purpose of this (to me) is to be able to validate that I got the
> >>>>>> *exact* RDF data that I expected -- not merely the right classes and
> >>>>>> predicates and such.  Would you be okay with including this in the
> >>>>>> charter?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> David
> >>>>>>
> >>>>>> On 08/12/2014 10:00 PM, Peter F. Patel-Schneider wrote:
> >>>>>>> I'm still not exactly sure just what normalization means in this
> >>>>>>> context or what relationship it has to RDF validation.
> >>>>>>>
> >>>>>>> peter
> >>>>>>>
> >>>>>>>
> >>>>>>> On 08/12/2014 06:55 PM, David Booth wrote:
> >>>>>>>> +1 for all except one item.
> >>>>>>>>
> >>>>>>>> I'd like to make one last-ditch attempt to include graph
> >>>>>>>> normalization as an OPTIONAL deliverable.  I expect the WG to treat
> >>>>>>>> it as low priority, and would only anticipate a normalization
> >>>>>>>> document being produced if someone takes the personal initiative to
> >>>>>>>> draft it.  I do not see any significant harm in including it in the
> >>>>>>>> charter on that basis, but I do see a benefit, because if the WG did
> >>>>>>>> somehow get to it then it would be damn nice to have, so that we
> >>>>>>>> could finally validate RDF data by having a standard way to compare
> >>>>>>>> two RDF documents for equality, like we can routinely do with every
> >>>>>>>> other data representation.
> >>>>>>>>
> >>>>>>>> Peter, would that be okay with you, to include graph normalization
> >>>>>>>> as OPTIONAL that way?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> David
> >>>>>>>>
> >>>>>>>> On 08/12/2014 08:55 PM, Eric Prud'hommeaux wrote:
> >>>>>>>>> Hi all, we can have a face-to-face at the W3C Technical Plenary in
> >>>>>>>>> November if we can quickly endorse a good-enough charter.  As it
> >>>>>>>>> stands now, it isn't clear that the group will be able to reach
> >>>>>>>>> consensus within the Working Group, let alone get through the
> >>>>>>>>> member review without objection.
> >>>>>>>>>
> >>>>>>>>> Please review the proposals that I've culled from the list.  I
> >>>>>>>>> encourage compromise on all our parts and we'll have to suppress
> >>>>>>>>> the desire to wordsmith. (Given the 3-month evaluation period,
> >>>>>>>>> wordsmithing won't change much anyways.)
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> separate semantics:
> >>>>>>>>>
> >>>>>>>>>    "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
> >>>>>>>>>    Message-ID: <53E2AFBD.9050102@gmail.com>
> >>>>>>>>>      A syntax and semantics for shapes specifying how to construct
> >>>>>>>>>      shape expressions and how shape expressions are evaluated
> >>>>>>>>>      against RDF graphs.
> >>>>>>>>>    "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID:
> >>>>>>>>>    <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
> >>>>>>>>>      defining the (direct) semantics meaning of shapes and defining
> >>>>>>>>>      the associated validation process.
> >>>>>>>>>
> >>>>>>>>>    opposition: Holger Knublauch
> >>>>>>>>>
> >>>>>>>>>    proposed resolution: include, noting that if SPARQL is judged
> >>>>>>>>>    to be useful for the semantics, there's nothing preventing us
> >>>>>>>>>    from using it.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> make graph normalization optional or use-case specific:
> >>>>>>>>>
> >>>>>>>>>    "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
> >>>>>>>>>    Message-ID: <53E2AFBD.9050102@gmail.com>
> >>>>>>>>>      3 OPTIONAL A specification of how shape verification interacts
> >>>>>>>>>      with inference.
> >>>>>>>>>    Jeremy J Carroll <jjc@syapse.com> - Message-Id:
> >>>>>>>>>    <D954B744-05CD-4E5C-8FC2-C08A9A99BA9F@syapse.com>
> >>>>>>>>>      the WG will consider whether it is necessary, practical or
> >>>>>>>>>      desirable to normalize a graph...
> >>>>>>>>>      A graph normalization method, suitable for the use cases
> >>>>>>>>>      determined by the group....
> >>>>>>>>>    David Booth <david@dbooth.org> - Message-ID:
> >>>>>>>>>    <53E28D07.9000804@dbooth.org>
> >>>>>>>>>      OPTIONAL - A Recommendation for normalization/canonicalization
> >>>>>>>>>      of RDF graphs and RDF datasets that are serialized in N-Triples
> >>>>>>>>>      and N-Quads.
> >>>>>>>>>
> >>>>>>>>>    opposition - don't do it at all:
> >>>>>>>>>    "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
> >>>>>>>>>    Message-ID: <53E3A4CB.4040200@gmail.com>
> >>>>>>>>>      the WG should not be working on this.
> >>>>>>>>>
> >>>>>>>>>    proposed resolution: withdrawn, to go to new light-weight,
> >>>>>>>>>    focused WG, removing this text:
> >>>>>>>>>    [[
> >>>>>>>>>    The WG MAY produce a Recommendation for graph normalization.
> >>>>>>>>>    ]]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> mandatory human-facing language:
> >>>>>>>>>
> >>>>>>>>>    "Dam, Jesse van" <jesse.vandam@wur.nl> - Message-ID:
> >>>>>>>>>    <63CF398D7F09744BA51193F17F5252AB1FD60B24@SCOMP0936.wurnet.nl>
> >>>>>>>>>      ShExC mandatory, but potentially as a Note.
> >>>>>>>>>    David Booth <david@dbooth.org> - Message-ID:
> >>>>>>>>>    <53E28D07.9000804@dbooth.org>
> >>>>>>>>>      In Section 4 (Deliverables), change "OPTIONAL - Compact,
> >>>>>>>>>      human-readable syntax" to "Compact, human-readable syntax",
> >>>>>>>>>      i.e., make it required.
> >>>>>>>>>    Jeremy J Carroll <jjc@syapse.com> - Message-Id:
> >>>>>>>>>    <54AA894F-F4B4-4877-8806-EB85FB5A42E5@syapse.com>
> >>>>>>>>>
> >>>>>>>>>    opposition - make it OPTIONAL
> >>>>>>>>>    "Peter F. Patel-Schneider" <pfpschneider@gmail.com> -
> >>>>>>>>>    Message-ID: <53E2AFBD.9050102@gmail.com>
> >>>>>>>>>      OPTIONAL A compact, human-readable syntax for expressing
> >>>>>>>>>      shapes.
> >>>>>>>>>
> >>>>>>>>>    proposed resolution: keep as OPTIONAL, not mentioning ShExC,
> >>>>>>>>>    but clarifying that it's different from the RDF syntax.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> report formats:
> >>>>>>>>>    Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
> >>>>>>>>>      provide flexible validation execution plans that range from:
> >>>>>>>>>        Success / fail
> >>>>>>>>>        Success / fail per constraint
> >>>>>>>>>        Fails with error counts
> >>>>>>>>>        Individual resources that fail per constraint
> >>>>>>>>>        And enriched failed resources with annotations
> >>>>>>>>>
> >>>>>>>>>    proposed resolution: no change, noting that no one seconded
> >>>>>>>>>    this proposal.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> test suite/validator:
> >>>>>>>>>
> >>>>>>>>>    Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
> >>>>>>>>>      Validation results are very important for the progress of
> >>>>>>>>>      this WG and should be a standalone deliverable.
> >>>>>>>>>    David Booth <david@dbooth.org> - Message-ID:
> >>>>>>>>>    <53E28D07.9000804@dbooth.org>
> >>>>>>>>>      Test Suite, to help ensure interoperability and correct
> >>>>>>>>>      implementation.  The group will choose the location of this
> >>>>>>>>>      deliverable, such as a git repository.
> >>>>>>>>>
> >>>>>>>>>    proposed resolution: leave out of the charter, as WGs usually
> >>>>>>>>>    choose to do this anyway and it has no impact on IP commitments.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
> >
> 
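
For readers following the thread, here is a minimal sketch of the blank
node problem being debated. It is an illustration only, not code from the
RDF Pipeline Framework or from anyone in the thread. It assumes a toy
N-Triples subset in which no term contains whitespace; the example.org
IRIs and the names parse, bnodes and isomorphic are hypothetical. Two
serializations of the same graph differ as text because their blank node
labels differ, so a cmp-style comparison fails, while a brute-force
isomorphism check (factorial in the number of blank nodes) succeeds.

```python
from itertools import permutations

# Hypothetical test data: EXPECTED and ACTUAL describe the same graph,
# but use different blank node labels and a different line order.
EXPECTED = """
_:a <http://example.org/knows> _:b .
_:b <http://example.org/name> "Bob" .
"""
ACTUAL = """
_:x1 <http://example.org/name> "Bob" .
_:x0 <http://example.org/knows> _:x1 .
"""
# DIFFERENT has the same size but a different structure.
DIFFERENT = """
_:x0 <http://example.org/knows> _:x1 .
_:x0 <http://example.org/name> "Bob" .
"""


def parse(text):
    """Parse a toy N-Triples subset: three whitespace-free terms and a '.'."""
    triples = set()
    for line in text.strip().splitlines():
        s, p, o, _dot = line.split()
        triples.add((s, p, o))
    return triples


def bnodes(triples):
    """Return the sorted blank node labels used in a set of triples."""
    return sorted({t for triple in triples for t in triple if t.startswith("_:")})


def isomorphic(g1, g2):
    """Brute force: try every bijection between the blank nodes of g1 and g2."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        mapping = dict(zip(b1, perm))
        renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
        if renamed == g2:
            return True
    return False


# Text comparison, even after sorting lines, sees two different documents.
text_equal = sorted(EXPECTED.strip().splitlines()) == sorted(ACTUAL.strip().splitlines())
# Graph comparison sees the same graph.
graph_equal = isomorphic(parse(EXPECTED), parse(ACTUAL))
```

A standard canonical form would, in effect, relabel the blank nodes
deterministically at serialization time so that the plain text comparison
above would also succeed; the isomorphism check reaches the same verdict
but needs an RDF-aware comparison step.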

Received on Thursday, 14 August 2014 15:02:16 UTC