Re: AW: Thoughts on validation requirements from Arthur Ryman on 2014-07-31 (public-rdf-shapes@w3.org from July 2014)

From: Arthur Ryman <ryman@ca.ibm.com>
Date: Thu, 31 Jul 2014 15:26:50 -0400
To: Holger Knublauch <holger@topquadrant.com>
Cc: public-rdf-shapes@w3.org
Message-ID: <OFE012D19E.C18280FA-ON85257D26.006A3DAA-85257D26.006AD4E5@ca.ibm.com>
Holger,

 An RDF document is a set of triples (statements). The triples do not 
necessarily have to look like they came from an object (in the OO sense), 
i.e. that there is one distinguished root node (this, self) that has some 
type (class) attribute, and other triples (properties) that have the root 
node as their subject, and so on recursively.

If you think of an RDF document as a graph, then there should be no special 
significance to a node that happens to have a type attribute. We should be 
able to state constraints on the graph in the absence of type triples. 
This is very consistent with SPARQL. SPARQL includes a graph pattern 
matching language. SPARQL queries are not associated with any specific 
type, so why should constraints be?
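
As a rough illustration of a constraint that selects its nodes by pattern rather than by type, here is a minimal sketch in plain Python. The graph data, the ex: node names, the choice of dcterms:title and dcterms:creator, and the helper name violations are all assumptions made up for this example; none of them come from any spec or tool.

```python
from collections import defaultdict

# Hypothetical example data: triples as (subject, predicate, object) tuples.
# Note that no node carries an rdf:type triple.
GRAPH = {
    ("ex:bug1", "dcterms:title", "Crash on save"),
    ("ex:bug1", "dcterms:creator", "ex:alice"),
    ("ex:bug2", "dcterms:title", "Typo in menu"),
}


def violations(graph, selector_predicate, required_predicate):
    """Return subjects that have selector_predicate but lack required_predicate.

    Nodes are selected by a graph pattern (the presence of a predicate),
    not by any declared type.
    """
    predicates_by_subject = defaultdict(set)
    for subject, predicate, _ in graph:
        predicates_by_subject[subject].add(predicate)
    return sorted(
        subject
        for subject, predicates in predicates_by_subject.items()
        if selector_predicate in predicates and required_predicate not in predicates
    )


print(violations(GRAPH, "dcterms:title", "dcterms:creator"))
```

Here the constraint "anything with a title must have a creator" is checked without any node declaring what class it belongs to.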

However, even if the RDF document were the representation of some OO-like 
object that had a root node and some type (class) attribute, its shape 
could vary depending on the context. For example, when you create a 
resource using POST, certain triples will be absent, e.g. creation date, 
and if present they may be ignored or the server may fail the request. 
When you GET that resource you will normally see other triples that the 
server added, e.g. creation date, and these will occur exactly once. 

OSLC lets you associate shapes with operations. Resource instances may 
link to shape resources that describe the instance. There is no need to 
base the association between resources and constraints on an RDF type. 
A shape could apply to several types or to no types.
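
To make the POST/GET example concrete, here is a minimal sketch, again in plain Python, of choosing a shape by operation instead of by type. The shape encoding (predicate mapped to min and max occurrences), the names CREATION_SHAPE, RETRIEVAL_SHAPE and check, and the policy of rejecting a creation date on POST are assumptions for illustration only. This is not how OSLC Resource Shapes are actually represented, and a server could equally choose to ignore the extra triple.

```python
# Hypothetical shapes: predicate -> (min, max) allowed occurrences on a node.
CREATION_SHAPE = {"dcterms:title": (1, 1), "dcterms:created": (0, 0)}
RETRIEVAL_SHAPE = {"dcterms:title": (1, 1), "dcterms:created": (1, 1)}

# The shape is chosen by operation; no rdf:type is consulted anywhere.
SHAPE_FOR_OPERATION = {"POST": CREATION_SHAPE, "GET": RETRIEVAL_SHAPE}


def check(graph, node, operation):
    """Return the sorted predicates whose occurrence count on node breaks the shape."""
    shape = SHAPE_FOR_OPERATION[operation]
    failing = []
    for predicate, (minimum, maximum) in shape.items():
        count = sum(1 for s, p, _ in graph if s == node and p == predicate)
        if count < minimum or count > maximum:
            failing.append(predicate)
    return sorted(failing)


# The same resource, as posted by a client and as later returned by the server.
posted = {("ex:r1", "dcterms:title", "Report")}
stored = posted | {("ex:r1", "dcterms:created", "2014-07-31")}

print(check(posted, "ex:r1", "POST"))  # conforms to the creation shape
print(check(stored, "ex:r1", "GET"))   # conforms to the retrieval shape
print(check(posted, "ex:r1", "GET"))   # creation date missing on retrieval
```

The same node conforms or fails depending only on which operation's shape is applied to it.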

Regards, 
___________________________________________________________________________
Arthur Ryman, PhD

Chief Data Officer, Rational
Chief Architect, Portfolio & Strategy Management
Distinguished Engineer | Master Inventor | Academy of Technology

Toronto Lab | +1-905-413-3077 (office) | +1-416-939-5063 (mobile)





From:   Holger Knublauch <holger@topquadrant.com>
To:     public-rdf-shapes@w3.org, 
Date:   07/30/2014 06:28 PM
Subject:        Re: AW: Thoughts on validation requirements



I would definitely be in favor of attaching constraints to classes. This 
is IMHO most intuitive, compatible with similar approaches (e.g. OWL 
restrictions, UML/OCL, general object-orientation) and makes it easy to 
apply inheritance (constraints defined on a superclass also apply to its 
subclasses). Why reinvent the wheel with another parallel structure to 
maintain?

Holger


On 7/31/14, 12:57 AM, Eric Prud'hommeaux wrote:
> * Peter F. Patel-Schneider <pfpschneider@gmail.com> [2014-07-29 08:01-0700]
>> On 07/29/2014 03:43 AM, Eric Prud'hommeaux wrote:
>>> * Peter F. Patel-Schneider <pfpschneider@gmail.com> [2014-07-28 07:54-0700]
>>>> On 07/28/2014 02:20 AM, Eric Prud'hommeaux wrote:
>>>>> On Jul 28, 2014 12:08 AM, "Peter F. Patel-Schneider" <pfpschneider@gmail.com> wrote:
>> [...]
>>
>>>> An RDF document, on the other hand, almost invariably contains
>>>> multiple somethings, very often not arranged in a tree, and
>>>> sometimes even without any connection between them.  In RDF it is
>>>> generally permissible to have any sort of information, whereas XML
>>>> information is generally required to fit into what is expected.
>>> I agree, but fear this is a sort of selection bias.
>> Well obviously there is a bias towards using RDF for multiple
>> somethings, because RDF is good at that and other formats are not.
>> Because of this virtuous bias, there is the concomitant bias that
>> there is relatively less RDF that is used for single somethings.
>> There is, of course, nothing wrong with this so far.
>>
>> It may be that because RDF is good for multiple somethings, some
>> people think that it is not good for single somethings.  If so, this
>> would be somewhat unfortunate.
> Agreed, and that's probably a point that will require constant
> reminders, though the cases I'm referring to use multiple somethings,
> see "Linked Data Basic Profile 1.0 - Use Cases and Requirements".
> <http://www.w3.org/Submission/2012/SUBM-ldbpucr-20120326/#usecases>
> Below, consider a HospitalTransferRecord from Clinic A to Clinic B.
> This would incorporate a bunch of somethings like a target problem,
> vitals, prescriptions, and a patient (well, more rigorously just a
> person temporarily acting in the role of patient).
>
>
>> However, this certainly doesn't mean that RDF validation should
>> ignore the common situation of multiple somethings, most or all with
>> explicit types.  Nor does it mean that RDF validation should be
>> targeted towards single untyped somethings.  To do either of these
>> is to ignore RDF's strengths.
> I see the multiple somethings as a strong case for detaching the shape
> (the way that a particular app is using these types) from the types
> themselves. Even if Clinic A and Clinic B are in the same clinical
> network, they'll capture different information about e.g. the
> admitting physician's credentials. In OWL, one would probably capture
> these as anonymous restrictions, e.g. ClinicB:AdmissionRecord:
>
>    Class: ClinicB:AdmissionRecord
>      SubClassOf:
>        clin:AdmissionRecord,
>        clin:admitter only
>          ((clin:credential some (clin:authority only ({"AMA" , "GMC"})))
>           and (clin:credential min 1 owl:Thing))
>
>
>> So I remain very skeptical that ShEx is a viable start towards RDF
>> validation, as it appears to me to be targeted towards an uncommon
>> use of RDF and not easily extended to nicely cover the bulk of
>> extant and proposed RDF.
>>
>>> Perhaps the
>>> majority of LDP uses include a backend which is not a triple store
>>> (possibly SQL, possibly state stored in the position of a lightswitch
>>> on a wall). In these cases, the data one posts must be limited to the
>>> exact arrangement of somethings that the server expects or data will
>>> be (silently) dropped. I suspect that the majority of the business use
>>> cases on the horizon for RDF involve services that are not willing to
>>> store arbitrary triples.
>> Even if true this is at best an argument for validation that covers
>> all (local) triples.  It still doesn't get one from multiple
>> somethings to single somethings.  I'm also still skeptical that
>> covering all (local) triples is a good idea even here, as it would
>> prohibit, for example, extra information coming from a node
>> belonging to an unexpected (or maybe even expected) subtype.
>>
>>>> Validation then should work differently in RDF than in XML.  My view
>>>> of RDF validation is determining whether the instances of a type
>>>> (not necessarily explicitly signalled by an rdf:type link) meet some
>>>> constraint, and that RDF validation generally involves multiple
>>>> types, often unrelated types.  I don't see how ShEx can do this, and
>>>> thus my questions as to how ShEx can do RDF validation.
>>> What if shapes were types? I think that would meet your definition.
>> Well, that's the method used in Stardog ICV, and in lots of work on
>> constraints over logical formalisms (including description logics).
>
> I don't see ShEx as having a problem with multiple somethings. The
> ShExC for the above ClinicB:AdmissionRecord could set licensing
> requirements on the admitting physician and coding requirements on the
> principal complaint:
>
>    ClinicB:AdmissionRecord {
>      clin:admitter {
>        clin:credential { clin:authority ("AMA" | "GMC") }+
>      }
>      clin:principleComplaint {
>        hl7:coding { hl7:CD.CodingSystem ("SNOMED" | "LOINC") }
>      }+
>    }
>
>
>> However, just making shapes be types doesn't immediately get one
>> from ShEx to something that can nicely handle multiple somethings in
>> RDF.  One also needs machinery to require that each instance of a
>> particular type must match a particular constraint type.
> Why do we need to attach it to a type? Wouldn't that mean that every
> reusable object would have to have a bunch of types attempting to
> predict all of the ways that data might be used? For instance, would
> the admitting physician need to have type arcs asserting that he/she
> was a bethIsreal:SurgicalPhysician, bethIsreal:EDAdmittingPhysician,
> BOSchildrens:Surgeon, mgh:ThoracicSurgeon, mgh:AdmittingPhysician?
>
> I'd expect that the physician's record should only advertise the
> type arcs that are part of some shared ontology:
>    <Pat> a foaf:Person , clin:Physician .
> If the type arcs are only notionally attached to the data for the
> purposes of verification, then the argument that they need to be types
> is circular; they're only there because some verification system thinks
> in terms of types.
>
>
>>> There's some language (ShEx, Resource Shapes, Description Set Profiles
>>> or something else whose name I can't recall) to verify that a node in
>>> an instance graph matches a declared structure in a schema. Some
>>> mechanism like oslc:resourceShape associates a graph node with that
>>> structure. Does that fit your view?
>> Maybe.  I'm not sure how Resource Shapes 2.0 works, as the
>> description is very loose.  It does appear that typed shapes are
>> what is intended to be used for what I think of as the usual case of
>> RDF validation - requiring that instances of a class have a
>> particular shape.  However, some aspects of Resource Shapes 2.0
>> appear to be inimical to type hierarchies.
> It seems like predicates like oslc:resourceShape give us the duck
> typing that we need to get practical interoperability out of our
> reusable somethings.
>
>
>> peter
Received on Thursday, 31 July 2014 19:27:23 UTC