Re: [OFFICIAL] - RE: name of the group

Irene,

As someone who does basic SQL with a cheat sheet and has managed a few 
simple SPARQL queries, I could agree that for simple business queries 
they may well be equivalent. But we aren't talking about queries for 
simple business cases - we're talking about highly complex validation 
strategies. At the RDF Validation Workshop, people talked of having 
dozens or even hundreds of SPARQL queries to fully test their data. So I 
don't think the comparison works well.
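
To make that concrete: each such check is typically its own query. A 
minimal sketch of one rule, assuming Dublin Core data (the prefix and 
the specific rule are mine, purely for illustration), might look like:

    # One validation rule out of potentially hundreds:
    # report every resource that lacks a dc:title.
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT DISTINCT ?item
    WHERE {
      ?item ?p ?o .
      FILTER NOT EXISTS { ?item dc:title ?title }
    }

Now multiply that by every rule in a complex validation strategy, each 
one hand-written and hand-maintained.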

I also think that we may be talking about different users. There are the 
folks who will develop major software products that will be used by 
significantly large enterprises. Then there are the individual users in 
those enterprises. These range from the people who maintain the software, 
understand it in depth, and train and support others, to the end users 
themselves.

The users I work with are unable to purchase enterprise-level software, 
have data stores (sometimes still in spreadsheets) as small as 5-30K 
items (records), manage all of the functions of an archive with 1.5 FTE 
and some volunteers, and have data that currently uses only 10 of the 
15 Dublin Core elements. Yet these archives hold highly valuable, unique 
materials that we should all want to be discoverable on the open Web.

There are at least thousands of users of this type. These users need a 
very simple solution -- let's call it a "small language" -- that they can 
use to describe their data so that they and others can understand it and 
validate it. Even if some particularly savvy members of the community 
develop open source software solutions that these folks can use, each 
library or archive or museum needs to be able to define their data usage 
in simple terms. I will go out on a limb here and say that it is NOT 
acceptable to me (and I speak for myself only) that solutions have to 
come in the form of major software packages created for an entirely 
different set of needs. This community MUST be able to develop its own 
solutions where that makes sense, and at the same time the communication 
about data profiles/shapes needs to be universally understood.
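
To give a flavor of what I mean by a "small language", here is a rough 
sketch, loosely modeled on ShEx (the shape name and the particular 
rules are mine, purely for illustration):

    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # An archive item has exactly one title, any number of
    # creators, and at most one date.
    <ArchiveItemShape> {
      dc:title xsd:string,
      dc:creator xsd:string*,
      dc:date xsd:string?
    }

Something that a 1.5-FTE archive could read, write, and share is the 
level of simplicity I am arguing for.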

kc



On 7/22/14, 10:52 AM, Irene Polikoff wrote:
> If one was to look at the existing analogies, then XML Schema is for defining what data should look like so that it can be validated. Thus, these are really just two sides of the same coin - the first one is what you do, the second one is why you do it - what you expect the software to do with your definition. Same with SPIN, ICV and, probably, ShEx.
>
> Speaking of analogies, it may be somewhat tangential, but still interesting to consider how the relational database community embraced SQL, including (shockingly :)) having many business users learn it, as opposed to saying it was too hard and something simpler (but using a different syntax, semantics, more layers, farther removed from how the data is stored, etc.) should be introduced. And to what extent this approach played a positive role in the adoption of RDBMS-based technology. Having worked for many years with both SQL and SPARQL, I see SPARQL as being no harder and even easier than SQL.
>
> Irene
>
> -----Original Message-----
> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
> Sent: Tuesday, July 22, 2014 11:38 AM
> To: public-rdf-shapes@w3.org
> Subject: Re: [OFFICIAL] - RE: name of the group
>
>
>
> On 7/22/14, 7:25 AM, Kendall Clark wrote:
>> Can someone help me understand what the technical difference is
>> between these requirements about "shape" and what is otherwise called
>> constraint validation? I'm genuinely confused if these aren't just synonyms.
>
> I'd say that it isn't a "technical difference" but a "human difference"
> - looking at cognitive psych work on concepts, most people are more likely to have a concept of a shape or profile or mental picture of their data than of "constraint validation". Constraint validation is the action you can develop based on shapes, but shapes are what the data means to the people who work with it.
>
> If our audience is only machines, then perhaps SPARQL + closed world OWL is sufficient (which we won't know until we have done a fairly intense set of case studies), but if your audience is the people who must think about their data and express the boundaries and limitations they wish for their data, then you need a human-facing language that helps them both conceptualize the possibilities and express those concepts in a way that can be translated to machine functionality.
>
> Unless we're in the age of Skynet, humans are in the equation for providing the intellectual meaning to the data, and to the use of the data to create more knowledge.
>
> Some on this list seem to want to standardize the underlying code, assuming that there will be a nice interface imposed over it for people to use. Others want to standardize a simple language that can be converted to underlying code, that could be used directly or sit between a UI and the machine code.
>
> Personally, I have found that the UIs I have encountered do not explain to me what transformations are happening between the UI and the underlying code (which often is not documented). Without a deep understanding of that code, I cannot trust the results I get. (Is it a bug, or a feature?) The distance between the UI and the code is too far.
>
> An intermediate language, let's call it "python/ruby/perl/.../ for RDF",
> would make it possible for thousands of skilled developers to write their own code, and to have predictable results.
>
> That's my 3 cents.
>
> kc
>
>
>>
>> Cheers,
>> Kendall
>>
>>
>> On Tue, Jul 22, 2014 at 10:17 AM, Paul <paul@proxml.be
>> <mailto:paul@proxml.be>> wrote:
>>
>>      Paul, Dave,
>>
>>      That's very similar to experiences and expectations at the Flemish
>>      and Dutch government.
>>
>>
>>      Paul
>>
>>
>>
>>      On 22 Jul 2014, at 16:10, Paul Davidson
>>      <Paul.Davidson@Sedgemoor.gov.uk
>>      <mailto:Paul.Davidson@Sedgemoor.gov.uk>> wrote:
>>
>>       > Thanks Dave
>>       >
>>       > Yes - my requirement is about having some confidence about the
>>      properties, classes, etc. that a data producer has used, and will
>>      continue to use, and being able to encourage other data producers to
>>      adopt the same 'shape'. Among Local Authorities there are hundreds of
>>      councils, all providing similar services, and to be able to combine
>>      data from each, we need some way of expressing a desired shape, and
>>      to discover data that is in that shape.
>>       >
>>       > Paul Davidson
>>       > Chief Information Officer
>>       > Sedgemoor District Council
>>       > UK
>>       >
>>       > -----Original Message-----
>>       > From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com
>>      <mailto:dave.e.reynolds@gmail.com>]
>>       > Sent: 22 July 2014 15:05
>>       > To: public-rdf-shapes@w3.org <mailto:public-rdf-shapes@w3.org>
>>       > Subject: Re: name of the group
>>       >
>>       > On 22/07/14 13:54, Sandro Hawke wrote:
>>       >> On 07/22/2014 08:20 AM, Irene Polikoff wrote:
>>       >>> +1 for renaming the group.
>>       >>> Not only does the name pre-impose the outcome, even more
>>      importantly,
>>       >>> it introduces a brand new terminology where none is required.
>>       >>>
>>       >>>
>>       >>> There are already widely understood and used ways to talk about
>>      this
>>       >>> topic such as constraint and data validation.
>>       >>
>>       >> The workshop was called "RDF Validation Workshop" and people pushed
>>       >> back that this was about more than validation, so the name
>>      should be broader.
>>       >>
>>       >> I hear "constraints" meaning a lot of different things, even
>>      within RDF.
>>       >>
>>       >> I think consensus at the Validation Workshop was that the core
>>      notion
>>       >> was about what we usually call graph patterns, but with additional
>>       >> things like constraining the types and values of literals, and
>>      making
>>       >> these patterns recursive/reusable.    So the name "pattern" no
>>      longer
>>       >> really applied either.
>>       >>
>>       >> IBM had proposed "resource shapes", and so "shapes" ended up
>>      being the
>>       >> word that stuck, and after some recent discussion, we migrated to
>>       >> "data shapes" for the broader context, to help avoid confusion for
>>       >> people who think it might be about visual or physical stuff.
>>       >>
>>       >> There's nothing about that name that presupposes the technology.
>>       >> SPARQL, SPIN, OWL, ICV, ...  are perfectly reasonable
>>      technologies for
>>       >> declaring data shapes, give or take some tweaks that have been
>>      mentioned.
>>       >
>>       > +1
>>       >
>>       > The requirement I've personally heard most strongly expressed by
>>      those I've worked with in UK Gov circles is that given by Paul
>>      Davidson in his presentation at the workshop.
>>       >
>>       > He called for some simple, easy to understand and deploy means to
>>      declare and discover the "shape" (for want of a better term) of data.
>>       >
>>       > For a data producer to be able to say "our data stitches together
>>      some bits of foaf, org, dct, skos etc *this* way, so here's what you
>>      should expect to see in our data (though there might be other
>>      properties we haven't mentioned)".
>>       >
>>       > For a data consumer to say "we'd like your data to include at
>>      least these types and properties or we won't know what to do with
>>      it, if you are going to express concept X then please use property p
>>      for it (though p is optional), you may also use other properties we
>>      don't know about but that's fine."
>>       >
>>       > Formally checking that data matches this "shape" is a useful but
>>      not primary requirement for those users. They are not looking for
>>      really complex data validation; data quality is typically validated
>>      elsewhere in the chain by rather powerful existing data tools.
>>       >
>>       > We have tried the "actually you can say (most) of that in OWL
>>      but you have to apply the semantics a little differently and find
>>      some way to associate the OWL 'constraints' with your data". That
>>      didn't fly for these particular users - they find the specifications
>>      and narrative around OWL too complex and alien to meet the "simple
>>      to understand" and "simple to deploy" requirement. Though personally
>>      it largely works for me.
>>       >
>>       > Similarly "why not just express it in SPARQL" didn't fly: fine
>>      for implementation under the hood but not as a way to comprehend
>>      what the shape specification is saying (whether by human or machine).
>>       >
>>       > Probably the IBM resource shapes proposal is the closest in
>>      spirit to this requirement so the name "RDF Data Shapes" seems like
>>      a pretty accurate name to me.
>>       >
>>       > An alternative would be profile. That's the term we used in the
>>      GLD vocabulary Recommendations and it does seem to be closely
>>      related to the Dublin Core notion of application profiles.
>>       >
>>       > [Note: This is my interpretation of what people like Paul were
>>      saying but I don't formally represent him or any other W3C member so
>>      any misunderstanding is mine. The chances of my being able to join the
>>      WG, if it actually got off the ground, are very low so I'll mostly
>>      try to keep out of the discussion.]
>>       >
>>       > Dave
>>       >
>>
>>
>>      Kind Regards,
>>      Paul Hermans
>>
>>      -------------------------
>>      ProXML bvba
>>      Linked Data services
>>      (w) www.proxml.be <http://www.proxml.be>
>>      (e) paul@proxml.be <mailto:paul@proxml.be>
>>      (tw)  @PaulZH
>>      (t) +32 15 23 00 76 <tel:%2B32%2015%2023%2000%2076>
>>      (m) +32 473 66 03 20 <tel:%2B32%20473%2066%2003%2020>
>>
>
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet
>
>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet

Received on Tuesday, 22 July 2014 19:43:14 UTC