Re: AW: Thoughts on validation requirements from Holger Knublauch on 2014-07-30 (public-rdf-shapes@w3.org from July 2014)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 31 Jul 2014 08:27:21 +1000
To: public-rdf-shapes@w3.org
Message-ID: <53D97149.6000905@topquadrant.com>
I would definitely be in favor of attaching constraints to classes. This 
is IMHO the most intuitive option, compatible with similar approaches (e.g. OWL 
restrictions, UML/OCL, general object-orientation) and makes it easy to 
apply inheritance (constraints defined on a superclass also apply to its 
subclasses). Why reinvent the wheel with another parallel structure to 
maintain?
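Here is a minimal Python sketch of class-attached constraints with inheritance. It assumes triples are held as plain string tuples, constraints are ordinary functions keyed by class, and the rdfs:subClassOf closure is computed by hand. The ex: names and the helper functions are hypothetical, not any existing API.

```python
from typing import Callable

# Hypothetical representation: each triple is a (subject, predicate, object)
# tuple of strings; the ex: IRIs used below are made up for illustration.
Triple = tuple[str, str, str]
Constraint = Callable[[set[Triple], str], bool]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(graph: set[Triple], cls: str) -> set[str]:
    """Return cls together with all of its transitive superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """All objects of (subject, predicate, ?) in the graph."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def validate(graph: set[Triple],
             constraints: dict[str, list[Constraint]]) -> list[tuple[str, str]]:
    """Check every typed node against the constraints attached to its classes
    and, via rdfs:subClassOf, to their superclasses.
    Returns the sorted (node, class) pairs whose constraint failed."""
    failures: set[tuple[str, str]] = set()
    for s, p, o in graph:
        if p != RDF_TYPE:
            continue
        for cls in superclasses(graph, o):
            for check in constraints.get(cls, []):
                if not check(graph, s):
                    failures.add((s, cls))
    return sorted(failures)


# Usage: a constraint declared once on ex:Person also applies to ex:Employee.
EXAMPLE_GRAPH = {
    ("ex:Employee", SUBCLASS_OF, "ex:Person"),
    ("ex:alice", RDF_TYPE, "ex:Employee"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Employee"),
}
EXAMPLE_CONSTRAINTS = {
    # every ex:Person needs at least one ex:name
    "ex:Person": [lambda g, node: len(objects(g, node, "ex:name")) >= 1],
}

print(validate(EXAMPLE_GRAPH, EXAMPLE_CONSTRAINTS))  # [('ex:bob', 'ex:Person')]
```

The only inheritance machinery is the walk up rdfs:subClassOf, so the ex:Person constraint never has to be repeated on ex:Employee.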

Holger


On 7/31/14, 12:57 AM, Eric Prud'hommeaux wrote:
> * Peter F. Patel-Schneider <pfpschneider@gmail.com> [2014-07-29 08:01-0700]
>> On 07/29/2014 03:43 AM, Eric Prud'hommeaux wrote:
>>> * Peter F. Patel-Schneider <pfpschneider@gmail.com> [2014-07-28 07:54-0700]
>>>> On 07/28/2014 02:20 AM, Eric Prud'hommeaux wrote:
>>>>> On Jul 28, 2014 12:08 AM, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
>>>>> wrote:
>> [...]
>>
>>>> An RDF document, on the other hand, almost invariably contains
>>>> multiple somethings, very often not arranged in a tree, and
>>>> sometimes even without any connection between them.  In RDF it is
>>>> generally permissible to have any sort of information, whereas XML
>>>> information is generally required to fit into what is expected.
>>> I agree, but I fear this is a sort of selection bias.
>> Well obviously there is a bias towards using RDF for multiple
>> somethings, because RDF is good at that and other formats are not.
>> Because of this virtuous bias, there is the concomitant bias that
>> there is relatively less RDF that is used for single somethings.
>> There is, of course, nothing wrong with this so far.
>>
>> It may be that because RDF is good for multiple somethings, some
>> people think that it is not good for single somethings.  If so, this
>> would be somewhat unfortunate.
> Agreed, and that's probably a point that will require constant
> reminders, though the cases I'm referring to use multiple somethings;
> see "Linked Data Basic Profile 1.0 - Use Cases and Requirements"
> <http://www.w3.org/Submission/2012/SUBM-ldbpucr-20120326/#usecases>.
> For example, consider a HospitalTransferRecord from Clinic A to Clinic B.
> This would incorporate a bunch of somethings like a target problem,
> vitals, prescriptions, and a patient (well, more rigorously just a
> person temporarily acting in the role of patient).
>
>
>> However, this certainly doesn't mean that RDF validation should
>> ignore the common situation of multiple somethings, most or all with
>> explicit types.  Nor does it mean that RDF validation should be
>> targeted towards single untyped somethings.  To do either of these
>> is to ignore RDF's strengths.
> I see the multiple somethings as a strong case for detaching the shape
> (the way that a particular app is using these types) from the types
> themselves. Even if Clinic A and Clinic B are in the same clinical
> network, they'll capture different information about e.g. the
> admitting physician's credentials. In OWL, one would probably capture
> these as anonymous restrictions, e.g. for ClinicB:AdmissionRecord:
>
>    Class: ClinicB:AdmissionRecord
>      SubClassOf:
>        clin:AdmissionRecord,
>        clin:admitter only
>          ((clin:credential some (clin:authority only ({"AMA" , "GMC"})))
>           and (clin:credential min 1 owl:Thing))
>
>
>> So I remain very skeptical that ShEx is a viable start towards RDF
>> validation, as it appears to me to be targeted towards an uncommon
>> use of RDF and not easily extended to nicely cover the bulk of
>> extant and proposed RDF.
>>
>>> Perhaps the
>>> majority of LDP uses include a backend which is not a triple store
>>> (possibly SQL, possibly state stored in the position of a lightswitch
>>> on a wall). In these cases, the data one posts must be limited to the
>>> exact arrangement of somethings that the server expects or data will
>>> be (silently) dropped. I suspect that the majority of the business use
>>> cases on the horizon for RDF involve services that are not willing to
>>> store arbitrary triples.
>> Even if true this is at best an argument for validation that covers
>> all (local) triples.  It still doesn't get one from multiple
>> somethings to single somethings.  I'm also still skeptical that
>> covering all (local) triples is a good idea even here, as it would
>> prohibit, for example, extra information coming from a node
>> belonging to an unexpected (or maybe even expected) subtype.
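Peter's concern about requiring coverage of all (local) triples is essentially the difference between open and closed shapes. Here is a minimal Python sketch, assuming a graph of string triples and hypothetical predicate names, showing the same node passing an open check and failing a closed one because of an extra triple.

```python
# Hypothetical sketch: an "open" shape only checks the predicates it names,
# while a "closed" shape also rejects any predicate it does not name.
Triple = tuple[str, str, str]


def predicates_of(graph: set[Triple], node: str) -> set[str]:
    """The set of predicates used on triples whose subject is node."""
    return {p for s, p, o in graph if s == node}


def conforms(graph: set[Triple], node: str, required: set[str], closed: bool) -> bool:
    present = predicates_of(graph, node)
    if not required <= present:   # every required predicate must appear
        return False
    if closed:                    # closed: nothing beyond the required set
        return present <= required
    return True                   # open: extra triples are tolerated


# ex:pat carries an extra triple, e.g. from an unexpected subtype.
GRAPH = {
    ("ex:pat", "rdf:type", "ex:Surgeon"),
    ("ex:pat", "clin:name", "Pat"),
    ("ex:pat", "ex:specialty", "thoracic"),
}
REQUIRED = {"rdf:type", "clin:name"}

print(conforms(GRAPH, "ex:pat", REQUIRED, closed=False))  # True
print(conforms(GRAPH, "ex:pat", REQUIRED, closed=True))   # False
```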
>>
>>>> Validation then should work differently in RDF than in XML.  My view
>>>> is that RDF validation means determining whether the instances of a type
>>>> (not necessarily explicitly signalled by an rdf:type link) meet some
>>>> constraint, and that RDF validation generally involves multiple
>>>> types, often unrelated types.  I don't see how ShEx can do this, and
>>>> thus my questions as to how ShEx can do RDF validation.
>>> What if shapes were types? I think that would meet your definition.
>> Well, that's the method used in Stardog ICV, and in lots of work on
>> constraints over logical formalisms (including description logics).
>
> I don't see ShEx as having a problem with multiple somethings. The
> ShExC for the above ClinicB:AdmissionRecord could set licensing
> requirements on the admitting physician and coding requirements on the
> principal complaint:
>
>    ClinicB:AdmissionRecord {
>      clin:admitter {
>        clin:credential { clin:authority ("AMA" | "GMC") }+
>      }
>      clin:principleComplaint {
>        hl7:coding { hl7:CD.CodingSystem ("SNOMED" | "LOINC") }
>      }+
>    }
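For readers less familiar with ShExC, the following Python sketch spells out one reading of the shape above as plain functions. It assumes an unmarked arc means exactly one value and `+` means one or more. The helper names and the instance data (ex:rec1, ex:Pat, the _: labels) are hypothetical, and literals are treated as bare strings.

```python
# Hand-written reading of the ClinicB:AdmissionRecord shape, assuming an
# unmarked arc means "exactly one" and "+" means "one or more". Literals and
# IRIs are both plain strings here, which glosses over real RDF term types.
from typing import Callable

Triple = tuple[str, str, str]
Check = Callable[[set[Triple], str], bool]


def values(graph: set[Triple], node: str, pred: str) -> list[str]:
    return [o for s, p, o in graph if s == node and p == pred]


def exactly_one(graph: set[Triple], node: str, pred: str, check: Check) -> bool:
    vals = values(graph, node, pred)
    return len(vals) == 1 and check(graph, vals[0])


def one_or_more(graph: set[Triple], node: str, pred: str, check: Check) -> bool:
    vals = values(graph, node, pred)
    return len(vals) >= 1 and all(check(graph, v) for v in vals)


def in_set(allowed: set[str]) -> Check:
    return lambda graph, value: value in allowed


def credential(graph, node):   # { clin:authority ("AMA" | "GMC") }
    return exactly_one(graph, node, "clin:authority", in_set({"AMA", "GMC"}))


def admitter(graph, node):     # { clin:credential {...}+ }
    return one_or_more(graph, node, "clin:credential", credential)


def coding(graph, node):       # { hl7:CD.CodingSystem ("SNOMED" | "LOINC") }
    return exactly_one(graph, node, "hl7:CD.CodingSystem", in_set({"SNOMED", "LOINC"}))


def complaint(graph, node):    # { hl7:coding {...} }
    return exactly_one(graph, node, "hl7:coding", coding)


def admission_record(graph, node):
    return (exactly_one(graph, node, "clin:admitter", admitter)
            and one_or_more(graph, node, "clin:principleComplaint", complaint))


# Hypothetical instance data; note that no rdf:type arcs are needed.
GOOD = {
    ("ex:rec1", "clin:admitter", "ex:Pat"),
    ("ex:Pat", "clin:credential", "_:c1"),
    ("_:c1", "clin:authority", "AMA"),
    ("ex:rec1", "clin:principleComplaint", "_:pc1"),
    ("_:pc1", "hl7:coding", "_:cd1"),
    ("_:cd1", "hl7:CD.CodingSystem", "SNOMED"),
}
BAD = (GOOD - {("_:c1", "clin:authority", "AMA")}) | {("_:c1", "clin:authority", "XYZ")}

print(admission_record(GOOD, "ex:rec1"))  # True
print(admission_record(BAD, "ex:rec1"))   # False
```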
>
>
>> However, just making shapes be types doesn't immediately get one
>> from ShEx to something that can nicely handle multiple somethings in
>> RDF.  One also needs machinery to require that each instance of a
>> particular type must match a particular constraint type.
> Why do we need to attach it to a type? Wouldn't that mean that every
> reusable object would have to have a bunch of types attempting to
> predict all of the ways that data might be used? For instance, would
> the admitting physician need to have type arcs asserting that he/she
> was a bethIsreal:SurgicalPhysician, bethIsreal:EDAdmittingPhysician,
> BOSchildrens:Surgeon, mgh:ThoracicSurgeon, mgh:AdmittingPhysician?
>
> I'd expect that the physician's record should only advertise the
> type arcs that are part of some shared ontology:
>    <Pat> a foaf:Person , clin:Physician .
> If the type arcs are only notionally attached to the data for the
> purposes of verification, then the argument that they need to be types
> is circular; they're only there because some verification system thinks
> in terms of types.
>
>
>>> There's some language (ShEx, Resource Shapes, Description Set Profiles
>>> or something else whose name I can't recall) to verify that a node in
>>> an instance graph matches a declared structure in a schema. Some
>>> mechanism like oslc:resourceShape associates a graph node with that
>>> structure. Does that fit your view?
>> Maybe.  I'm not sure how Resource Shapes 2.0 works, as the
>> description is very loose.  It does appear that typed shapes are
>> what is intended to be used for what I think of as the usual case of
>> RDF validation - requiring that instances of a class have a
>> particular shape.  However, some aspects of Resource Shapes 2.0
>> appear to be inimical to type hierarchies.
> It seems like predicates like oslc:resourceShape give us the duck
> typing that we need to get practical interoperability out of our
> reusable somethings.
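Here is a minimal sketch of the duck-typing idea. It assumes a hypothetical external map from nodes to shape names, standing in for whatever association mechanism (oslc:resourceShape or otherwise) is actually used. The shape names and predicates such as clin:surgicalPrivileges are made up. The same node, carrying only shared-ontology types, is checked against two applications' shapes without adding any rdf:type arcs.

```python
# Hypothetical sketch of "duck typing": shapes are looked up through an
# external node-to-shape map, not through rdf:type arcs on the data.
from typing import Callable

Triple = tuple[str, str, str]
Shape = Callable[[set[Triple], str], bool]


def has(pred: str) -> Shape:
    """Shape satisfied when the node has at least one pred arc."""
    return lambda graph, node: any(s == node and p == pred for s, p, o in graph)


def all_of(*shapes: Shape) -> Shape:
    return lambda graph, node: all(shape(graph, node) for shape in shapes)


# Each application keeps its own shapes; these names are made up.
SHAPES: dict[str, Shape] = {
    "clinicA:AdmittingPhysician": all_of(has("foaf:name"), has("clin:credential")),
    "clinicB:Surgeon": all_of(has("foaf:name"), has("clin:surgicalPrivileges")),
}


def validate(graph: set[Triple], shape_map: dict[str, list[str]]) -> dict[tuple[str, str], bool]:
    """Check each node against every shape the map associates with it."""
    return {(node, name): SHAPES[name](graph, node)
            for node, names in shape_map.items() for name in names}


# The physician's record only carries types from a shared ontology.
GRAPH = {
    ("ex:Pat", "rdf:type", "foaf:Person"),
    ("ex:Pat", "rdf:type", "clin:Physician"),
    ("ex:Pat", "foaf:name", "Pat"),
    ("ex:Pat", "clin:credential", "_:c1"),
}
# Clinic A's shape is satisfied; Clinic B's is not (no surgical privileges).
print(validate(GRAPH, {"ex:Pat": ["clinicA:AdmittingPhysician", "clinicB:Surgeon"]}))
```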
>
>
>> peter
Received on Wednesday, 30 July 2014 22:27:54 UTC