Re: [OFFICIAL] - RE: name of the group

From: Dave Reynolds <dave.e.reynolds@gmail.com>
Date: Tue, 22 Jul 2014 22:34:29 +0100
Message-ID: <53CED8E5.8000309@gmail.com>
To: Kendall Clark <kendall@clarkparsia.com>, Paul <paul@proxml.be>
CC: Paul Davidson <Paul.Davidson@sedgemoor.gov.uk>, "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
As Karen says, part of it is intended usage:

For constraint validation you are interested in executability and look 
for a language with high expressivity.  If the constraint is hard to 
understand that doesn't matter.

For shapes/profiles/whatever you are aiming for ease of comprehension,
both for humans and for simple software that might, e.g., create a UI
for data input or presentation. Limited expressivity is acceptable.

I accept this is a soft, subjective distinction.

At a marginally more technical level, there are things you want to
express in describing data that aren't formal constraints. An obvious
example is optionals in an open world. For example, to say "you don't
have to give a description for these resources but if you do give one or
more then please use dct:description (rather than e.g. rdfs:comment,
skos:note, myontology:summary ...)". I've seen people try to express
this in OWL as

     dct:description min 0

From a formal point of view this is vacuous, especially in an open
world where we allow other properties to appear in the data as well
as those in the shape/profile/whatever. There is no constraint to check:
whether we have no dct:descriptions or 10 of them, that's equally fine.
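
A minimal sketch of why the "min 0" case is vacuous. Plain Python tuples
stand in for triples here, and the names count_values and satisfies_min
are hypothetical helpers made up for illustration, not part of OWL or of
any RDF library:

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples.
DCT_DESCRIPTION = "http://purl.org/dc/terms/description"


def count_values(triples, subject, prop):
    """Count how many values `subject` has for `prop` in the data."""
    return sum(1 for s, p, _ in triples if s == subject and p == prop)


def satisfies_min(triples, subject, prop, minimum):
    """Check a minimum-cardinality constraint against the data as given."""
    return count_values(triples, subject, prop) >= minimum


# One resource with no descriptions, one with ten of them.
no_descriptions = [("ex:a", "rdfs:label", "A")]
ten_descriptions = [("ex:a", DCT_DESCRIPTION, f"text {i}") for i in range(10)]

# A count is never negative, so "min 0" holds for any data at all.
min0_none = satisfies_min(no_descriptions, "ex:a", DCT_DESCRIPTION, 0)
min0_ten = satisfies_min(ten_descriptions, "ex:a", DCT_DESCRIPTION, 0)

# By contrast, "min 1" is something a validator can actually fail.
min1_none = satisfies_min(no_descriptions, "ex:a", DCT_DESCRIPTION, 1)
```

Both min-0 checks pass regardless of the data, whereas the min-1 check
can fail, which is what makes it a real constraint under this
closed-world style of checking.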

However, from the point of view of guiding users and UI generators you
can adopt the heuristic that if it is mentioned in an axiom like this
then it must be somehow a preferred property and so include it in forms
and presentation templates. Several systems, including commercial ones,
use this sort of heuristic, but there's no constraint validation here.
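
A rough sketch of that heuristic, assuming a profile is reduced to a list
of (property, minimum cardinality) pairs. The form_fields function and
the field layout are invented for illustration and do not describe any
particular system:

```python
def form_fields(axioms):
    """Turn (property, min_cardinality) pairs into form field descriptions.

    Every mentioned property gets a field, even when its minimum is 0;
    the minimum only decides whether the field is marked as required.
    """
    required_by_property = {}
    for prop, minimum in axioms:
        already_required = required_by_property.get(prop, False)
        required_by_property[prop] = already_required or minimum > 0
    return [
        {"property": prop, "required": required}
        for prop, required in required_by_property.items()
    ]


# Hypothetical profile: description is optional, label is mandatory.
profile = [("dct:description", 0), ("rdfs:label", 1)]
fields = form_fields(profile)
```

Here dct:description still yields an (optional) form field purely because
it was mentioned, although nothing about it is ever validated.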

In the semi-structured database world I seem to recall a notion of a 
"data guide" as distinct from a schema, which is an extreme version of 
this. A Google search turned up several papers by Serge Abiteboul 
mentioning data guides, such as [1], but none of them quite match my 
hazy memory.

Dave

[1] http://cs.brown.edu/courses/cs295-11/2006/semistructured.pdf

On 22/07/14 15:25, Kendall Clark wrote:
> Can someone help me understand what the technical difference is
> between these requirements about "shape" and what is otherwise called
> constraint validation? I'm genuinely confused if these aren't just
> synonyms.
>
> Cheers,
> Kendall
>
>
> On Tue, Jul 22, 2014 at 10:17 AM, Paul <paul@proxml.be> wrote:
>
>     Paul, Dave,
>
>     That's very similar to experiences and expectations at the Flemish
>     and Dutch government.
>
>
>     Paul
>
>
>
>     On 22 Jul 2014, at 16:10, Paul Davidson
>     <Paul.Davidson@Sedgemoor.gov.uk> wrote:
>
>     > Thanks Dave
>     >
>     > Yes - my requirement is about having some confidence about the
>     > properties, classes etc that a data producer has used, and will
>     > continue to use, and being able to encourage other data producers
>     > to adopt the same 'shape'.  As Local Authorities, there are
>     > hundreds of councils, all providing similar services, and to be
>     > able to combine data from each, we need some way of expressing a
>     > desired shape, and to discover data that is in that shape.
>     >
>     > Paul Davidson
>     > Chief Information Officer
>     > Sedgemoor District Council
>     > UK
>     >
>     > -----Original Message-----
>     > From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com]
>     > Sent: 22 July 2014 15:05
>     > To: public-rdf-shapes@w3.org
>     > Subject: Re: name of the group
>     >
>     > On 22/07/14 13:54, Sandro Hawke wrote:
>     >> On 07/22/2014 08:20 AM, Irene Polikoff wrote:
>     >>> +1 for renaming the group.
>     >>> Not only does the name pre-impose the outcome, even more importantly,
>     >>> it introduces a brand new terminology where none is required.
>     >>>
>     >>>
>     >>> There are already widely understood and used ways to talk about this
>     >>> topic such as constraint and data validation.
>     >>
>     >> The workshop was called "RDF Validation Workshop" and people pushed
>     >> back that this was about more than validation, so the name should be
>     >> broader.
>     >>
>     >> I hear "constraints" meaning a lot of different things, even within
>     >> RDF.
>     >>
>     >> I think consensus at the Validation Workshop was that the core notion
>     >> was about what we usually call graph patterns, but with additional
>     >> things like constraining the types and values of literals, and making
>     >> these patterns recursive/reusable.    So the name "pattern" no longer
>     >> really applied either.
>     >>
>     >> IBM had proposed "resource shapes", and so "shapes" ended up being the
>     >> word that stuck, and after some recent discussion, we migrated to
>     >> "data shapes" for the broader context, to help avoid confusion for
>     >> people who think it might be about visual or physical stuff.
>     >>
>     >> There's nothing about that name that pre-supposes the technology.
>     >> SPARQL, SPIN, OWL, ICV, ...  are perfectly reasonable technologies for
>     >> declaring data shapes, give or take some tweaks that have been
>     >> mentioned.
>     >
>     > +1
>     >
>     > The requirement I've personally heard most strongly expressed by
>     > those I've worked with in UK Gov circles is that given by Paul
>     > Davidson in his presentation at the workshop.
>     >
>     > He called for some simple, easy to understand and deploy means
>     > to declare and discover the "shape" (for want of a better term) of
>     > data.
>     >
>     > For a data producer to be able to say "our data stitches together
>     > some bits of foaf, org, dct, skos etc *this* way, so here's what
>     > you should expect to see in our data (though there might be other
>     > properties we haven't mentioned)".
>     >
>     > For a data consumer to say "we'd like your data to include at
>     > least these types and properties or we won't know what to do with
>     > it, if you are going to express concept X then please use property
>     > p for it (though p is optional), you may also use other properties
>     > we don't know about but that's fine."
>     >
>     > Formally checking that data matches this "shape" is a useful but
>     > not primary requirement for those users. They are not looking for
>     > really complex data validation; data quality is typically
>     > validated elsewhere in the chain by rather powerful existing data
>     > tools.
>     >
>     > We have tried "actually you can say (most) of that in OWL
>     > but you have to apply the semantics a little differently and find
>     > some way to associate the OWL 'constraints' with your data". That
>     > didn't fly for these particular users - they find the
>     > specifications and narrative around OWL too complex and alien to
>     > meet the "simple to understand" and "simple to deploy"
>     > requirement. Though personally it largely works for me.
>     >
>     > Similarly "why not just express it in SPARQL" didn't fly: fine
>     > for implementation under the hood but not as a way to comprehend
>     > what the shape specification is saying (whether by human or machine).
>     >
>     > Probably the IBM resource shapes proposal is the closest in
>     > spirit to this requirement so the name "RDF Data Shapes" seems
>     > like a pretty accurate name to me.
>     >
>     > An alternative would be profile. That's the term we used in the
>     > GLD vocabulary Recommendations and it does seem to be closely
>     > related to the Dublin Core notion of application profiles.
>     >
>     > [Note: This is my interpretation of what people like Paul were
>     > saying but I don't formally represent him or any other W3C member
>     > so any misunderstanding is mine. The chances of my being able to join
>     > the WG, if it actually got off the ground, are very low so I'll
>     > mostly try to keep out of the discussion.]
>     >
>     > Dave
>     >
>
>
>     Kind Regards,
>     Paul Hermans
>
>     -------------------------
>     ProXML bvba
>     Linked Data services
>     (w) www.proxml.be
>     (e) paul@proxml.be
>     (tw)  @PaulZH
>     (t) +32 15 23 00 76
>     (m) +32 473 66 03 20
>
Received on Tuesday, 22 July 2014 21:35:03 UTC