Re: shapes as classes from Johnston, Patrick - Hoboken on 2014-12-28 (public-data-shapes-wg@w3.org from December 2014)

From: Johnston, Patrick - Hoboken <pjohnston@wiley.com>
Date: Sun, 28 Dec 2014 18:25:57 -0500
To: Eric Prud'hommeaux <eric@w3.org>
CC: "kcoyle@kcoyle.net" <kcoyle@kcoyle.net>, "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>, Arthur Ryman <ryman@ca.ibm.com>
Message-ID: <D0C5EBE5.D639%pjohnston@wiley.com>


On 12/28/14, 5:16 PM, "Eric Prud'hommeaux" <eric@w3.org> wrote:

>* Johnston, Patrick - Hoboken <pjohnston@wiley.com> [2014-12-28 14:26-0500]
>> I am still struggling to understand the fuss about disconnected graphs. If
>> you are prepared to impose constraints then surely there is an implicit
>> connection, so they aren't _really_ disconnected. The connection may not
>> be realized through a class construct, but it should be realizable through
>> a shape, otherwise there is nothing to constrain (I think I am agreeing
>> with Holger here). The shape needs a scope on which it can act: that may
>> manifest as membership of a class (rdf:type); as some other explicit
>> linkage (foo:isConstrainedBy) we may define through this group; as a
>> SPARQL query on a graph store; or through something as tenuous as the web
>> address through which the graph is rendered. The latter, I assume, is what
>> S35 is trying to highlight? I don't understand why Peter says this would
>> be out of scope: RDF evolved from the notion of being able to connect web
>> resources*, so I would be disappointed if what we cook up here didn't
>> cover that as a scenario.
>>
>> S35
>> ===
>>
>> In S35, you use the JSON-LD @graph construct
>> (http://json-ld.org/spec/ED/json-ld-syntax/20120522/#named-graphs) to
>> create a list of resources under a named graph. Presumably, the underlying
>> story is that the developer wanted to provide a means of filtering
>> resources available through a given endpoint by their project grouping.
>> So, I would browse to <https://a.example.com/acclist>, get back the graph
>> which contained the list of projects, as shown in the example, and I could
>> then go to <https://a.example.com/acclist#alpha> and get those resources?
>> Nigglingly, this doesn't follow convention
>> (http://www.w3.org/wiki/HashVsSlash). Or is the intention that I would get
>> a fuller graph with all possible resources at
>> <https://a.example.com/acclist>? There is a wee bit of context missing
>> from this to make it into a functioning user story.
>>
>> So, really, Arthur, it looks like the underlying requirement is that you
>> want shapes to apply to JSON-LD named graphs (and lists). This I am
>> personally fine with; I have never been a fan of reverting to linked lists
>> in RDF. However, the resources in the graph are only 'disconnected' in the
>> sense that the straight RDF representation loses the notion of the graph
>> (I took the example and ran it through http://json-ld.org/playground/):
>>
>>      <https://a.example.com/acclist#alpha> <http://purl.org/dc/terms/description> "Resources for Alpha project" .
>>      <https://a.example.com/acclist#alpha> <http://purl.org/dc/terms/title> "Alpha" .
>>      <https://a.example.com/acclist#alpha> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://open-services.net/ns/core/acc#AccessContext> .
>>      <https://a.example.com/acclist#beta> <http://purl.org/dc/terms/description> "Resources for Beta project" .
>>      <https://a.example.com/acclist#beta> <http://purl.org/dc/terms/title> "Beta" .
>>      <https://a.example.com/acclist#beta> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://open-services.net/ns/core/acc#AccessContext> .
>>      <https://a.example.com/acclist> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://open-services.net/ns/core/acc#AccessContextList> .
>>
>> What isn't clear is an example of the kind of shape you want to apply to
>> this graph. All of these resources have an RDF type, so if I wanted to
>> constrain by acc:AccessContext I could, and wouldn't even need to think of
>> the @graph construct. Would the intention be to constrain every resource
>> to be tagged with one or more of the members of acclist, say through a
>> specific predicate? What kind of shape relies specifically on the @graph
>> construct?
>
>This requirement on its own doesn't appear to require named graphs,
>just the ability to match disconnected stuff in a single graph. At
>least that's how I understood it in <http://w3.org/brief/NDI4>.

Right, but what would make it valid is if it did require such a thing.
I was trying to tease out a scenario where it would apply (below). If you
take a look at the JSON-LD example in http://schema.org/Article, it does
use a named graph, though I don't think that should be required.
@graph and @list are higher-order animals than plain RDF triples, and as
you can see above, the N-Triples serialization is semantically lossy
compared with JSON-LD: it has no way to record which graph a triple came
from. For this reason, I think it is worth pursuing.
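To make the lossiness concrete, here is a minimal Python sketch that checks Arthur's graph-level constraint (exactly one acc:AccessContextList, zero or more acc:AccessContext) once per named graph and once after the graph names are dropped. It uses plain tuples rather than any RDF library; the graph names g1 and g2 and the second list at b.example.com are hypothetical, added only so there are two graphs to compare.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ACC = "http://open-services.net/ns/core/acc#"

# (subject, predicate, object, graph) quads. Graph names "g1"/"g2" and the
# b.example.com list are hypothetical, added only to show two graphs.
quads = [
    ("https://a.example.com/acclist#alpha", RDF_TYPE, ACC + "AccessContext", "g1"),
    ("https://a.example.com/acclist#beta", RDF_TYPE, ACC + "AccessContext", "g1"),
    ("https://a.example.com/acclist", RDF_TYPE, ACC + "AccessContextList", "g1"),
    ("https://b.example.com/acclist", RDF_TYPE, ACC + "AccessContextList", "g2"),
]


def type_counts(triples):
    """Count distinct rdf:type triples per class over (s, p, o) triples."""
    return Counter(o for s, p, o in set(triples) if p == RDF_TYPE)


def conforms(triples):
    """Exactly one AccessContextList; any number of AccessContext nodes
    (the 'zero or more' half of the constraint can never fail)."""
    return type_counts(triples)[ACC + "AccessContextList"] == 1


def per_graph(quads):
    """Group quads by graph name and validate each graph on its own."""
    graphs = {}
    for s, p, o, g in quads:
        graphs.setdefault(g, []).append((s, p, o))
    return {g: conforms(triples) for g, triples in graphs.items()}


# Dropping the graph column mimics serializing the dataset as N-Triples.
flattened = [(s, p, o) for s, p, o, _ in quads]

print(per_graph(quads))     # {'g1': True, 'g2': True}
print(conforms(flattened))  # False: two AccessContextList nodes after merging
```

Checked graph by graph, both graphs conform; merged into one set of triples, as a plain N-Triples dump would be, the same data fails. The graph boundary carries information that this kind of constraint depends on.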

>
>
>> While I am by no means part of the 'new wave' of programmers, I can see
>> that a more pragmatic approach to this sort of thing would make this far
>> more accessible in practical application.
>>
>> Another example
>> ===============
>>
>> Let's say I publish news articles as HTML web pages on my site, and I mark
>> each page up with the semantic tagging of my choice (RDFa or JSON-LD). I
>> want to say that every news article has at least one author and a single
>> copyright notice, for example. I could say that every article page is a
>> manifestation of a specific class identified by a page URI on my site, and
>> that leads me to the class-based approach to shaping. Alternatively, I
>> could say that I really don't care - I am just working to a deadline and
>> all that interests me is that I get the SEO tagging right and my page
>> shows up pretty in search results. What I want to do is ensure that every
>> news article I publish can be validated, and this is where I want to be
>> able to invoke the shape concept.
>>
>> Really, I am imposing additional constraints based on
>> http://schema.org/Article, even though I am not saying it explicitly
>> (schema.org is scared of OWL).
>>
>> So all I do is state that every page on my site (a set of graphs?) adheres
>> to the constraint that says every page must have at least one author and
>> one, and only one, copyright notice, and I call that a shape. In JSON-LD
>> terms, each page is not necessarily a named graph; it is just a set of
>> triples returned when I parse the page. I could represent it as a named
>> graph if I set the @id property to the URI of the page of the article.
>> However, I don't see that I should _need_ to declare this explicitly -
>> here I would merely want to say that all articles on my site (pages tagged
>> appropriately - see the examples at http://schema.org/Article) are in
>> scope for my shape.
>>
>> Large scale stores (S34)
>> ========================
>>
>> I understand that at large scale you might want to optimize the shapes to
>> make them computationally efficient
>> (https://www.w3.org/2014/data-shapes/wiki/User_Stories#S34:_Large-scale_dataset_validation),
>> so I can see that the scope of a shape could be defined in terms of a
>> SPARQL query, or some other similar construct. However, I see this as
>> leading to a different set of requirements than those implied by S35, so I
>> am not sure why Dimitris has commented that they are connected.
>>
>> The soapbox bit
>> ===============
>>
>> I am not sufficiently well-versed in the theory to understand why a shape
>> cannot always be a class in the pure RDF Schema sense. I just want a Thing
>
>One issue is that some data doesn't have type arcs, and pretty much no
>data has type arcs that fully enumerate all of the shapes that it
>might fit. For example, your system might use foaf:Person in a couple of
>ways with different constraints on what properties must or may
>appear. You can attach one schema to foaf:Person for one use, but as
>soon as you have two, you have mutual inconsistency and effectively
>a truth maintenance system.

OK, I can see that decoupling one from the other would have its uses, but
I think of this as an edge case rather than the norm.
I would prefer to see a sensible defaulting mechanism that would allow a
programmer to assume a 1:1 relationship between class and constraint
unless explicitly stated otherwise.
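One way to read that defaulting idea, sketched in Python under the assumption that a shape can be looked up by class plus an optional usage context: each class maps to a same-named default shape unless a binding explicitly overrides it. ShapeRegistry, the "authors" context and ex:AuthorShape are all hypothetical names, not anything the group has defined.

```python
from dataclasses import dataclass, field


@dataclass
class ShapeRegistry:
    """Hypothetical registry: a class maps 1:1 to a default shape of the
    same name unless an explicit (class, context) binding overrides it."""
    overrides: dict = field(default_factory=dict)

    def bind(self, cls, context, shape):
        # Explicitly decouple a class from its default shape in one context.
        self.overrides[(cls, context)] = shape

    def shape_for(self, cls, context=None):
        # No override for this (class, context) pair: fall back to the
        # default, where the class name doubles as the shape name.
        return self.overrides.get((cls, context), cls)


reg = ShapeRegistry()
reg.bind("foaf:Person", "authors", "ex:AuthorShape")

print(reg.shape_for("foaf:Person"))             # foaf:Person
print(reg.shape_for("foaf:Person", "authors"))  # ex:AuthorShape
```

In this reading, the foaf:Person case costs one explicit binding per extra use, while every other class keeps the 1:1 default without any declaration.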

>
>
>> that can have a defined scope (as described above) and one or more
>> constraints it imposes on the resources that fall within that scope. If
>> that's not always a class, fine. When classes are shapes, though, I would
>> like them to be interoperable with OWL constructs. This is simply because
>> when I define an ontology for a graph I fully control, I actually mean the
>> OWL to manifest as closed world assertions on that graph. This is the
>> Stardog ICV approach - call it the 'closed world approximation' on the
>> open model.
>>
>> I also expect shapes to be able to coexist with existing validation
>> mechanisms such as Schematron for XML - this is actually pretty important
>> as we evolve EPUB in the publishing space (I work for John Wiley) to
>> incorporate semantic tagging, and as the web itself absorbs the notion of
>> packages which are more than simply web pages
>> (http://w3ctag.github.io/packaging-on-the-web/). In other words, RDF
>> doesn't exist in isolation, and nor should shapes. Even in the world of
>> graph stores, hybrid approaches are starting to take hold, and I would
>> expect something post-SPARQL in our future.
>
>I'd be interested in banging out a use case with you. Do you have some
>leads?

In terms of which? Happy to have a go; I am still finding my sea legs with
the W3C and this group, which is why I've been a bit quiet.

In terms of the closed world approximation, I have butted against this a
number of times in my frustrations with OWL, but isn’t it one of the main
requirements already?
My take on this is that we build content systems to curate data. Because
there is no standardized means to maintain integrity, graph stores are
necessarily deployed as consumers of pre-curated data, so in their current
incarnations they can at best aspire to become application databases. If
they want to join their more mature cousins, they need to provide tools to
maintain integrity.
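As an illustration of the closed world approximation, here is a minimal Python sketch that applies the news-article rule from the example above (at least one author, exactly one copyright notice) by counting only the triples actually present. Under OWL's open-world semantics a missing author is not a contradiction, and, because OWL makes no unique name assumption, two distinct object values under a max-1 restriction may simply be inferred to denote the same individual; this sketch reports both cases as violations instead. The ex: data and the ex:copyrightNotice property are hypothetical.

```python
# Hypothetical article data as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:a1", "rdf:type", "schema:Article"),
    ("ex:a1", "schema:author", "ex:alice"),
    ("ex:a1", "ex:copyrightNotice", "ex:wiley"),
    ("ex:a2", "rdf:type", "schema:Article"),
    ("ex:a2", "ex:copyrightNotice", "ex:wiley"),
    ("ex:a2", "ex:copyrightNotice", "ex:other"),
}

# (property, min, max) cardinalities; max=None means unbounded.
ARTICLE_SHAPE = [("schema:author", 1, None), ("ex:copyrightNotice", 1, 1)]


def violations(triples, cls, shape):
    """Closed-world check: absent triples count as absent, so every
    instance of cls must meet each [min, max] bound on what is present."""
    focus = sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)
    found = []
    for node in focus:
        for prop, lo, hi in shape:
            n = sum(1 for s, p, _ in triples if s == node and p == prop)
            if n < lo or (hi is not None and n > hi):
                found.append((node, prop, n))
    return found


print(violations(TRIPLES, "schema:Article", ARTICLE_SHAPE))
# [('ex:a2', 'schema:author', 0), ('ex:a2', 'ex:copyrightNotice', 2)]
```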

In terms of XML + RDFa + EPUB (and what we do about validating HTML
production in general), we are actively pursuing this at the moment. It's
not fully formed, but it could turn into a use case. It's one of the main
reasons I was interested in joining this group.
I am not sure it necessarily leads to something different from what has
been submitted already, but it would be interesting to test against it and
to ensure we are taking RDF context in hybrid documents into account.
For example, if I were to restrict to the XML serialization of HTML5, I
could feasibly use Schematron on the RDFa example at
http://schema.org/Article to require at least one <span
property="author"/> in an HTML container with @typeof="ScholarlyArticle".
In effect, I have defined a sort of shape, albeit with a vocabulary that
can only be used in an XML/RDFa context. I don't know if we can generalize
this sort of thing into the shape specification, but the approaches should
be compatible. JSON-LD is more straightforwardly "shapeable", in that it
is isolated from the remainder of the markup; there I would expect shapes
to be compatible with JSON Schema, but I am much less familiar with that
space.
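The Schematron idea can be approximated outside Schematron. The sketch below uses Python's xml.etree.ElementTree and its limited XPath support as a stand-in (it is not a Schematron processor) to express roughly what a rule with context *[@typeof='ScholarlyArticle'] and an assertion of .//*[@property='author'] would say. The XHTML fragment is hypothetical, and the exact-match test on typeof is a simplification: real RDFa allows prefixed or space-separated typeof values.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML5 (XML serialization) fragment with RDFa attributes.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <div vocab="http://schema.org/" typeof="ScholarlyArticle">
    <h1 property="name">Paper one</h1>
    <span property="author">A. Author</span>
  </div>
  <div vocab="http://schema.org/" typeof="ScholarlyArticle">
    <h1 property="name">Paper two</h1>
  </div>
</body></html>"""


def missing_author(xml_text):
    """Return every typeof="ScholarlyArticle" element that has no
    descendant with property="author" (exact attribute match only)."""
    root = ET.fromstring(xml_text)
    bad = []
    # "*" matches elements in any namespace, so the XHTML default
    # namespace does not get in the way of the attribute tests.
    for article in root.iterfind(".//*[@typeof='ScholarlyArticle']"):
        if article.find(".//*[@property='author']") is None:
            bad.append(article)
    return bad


for article in missing_author(DOC):
    name = article.find(".//*[@property='name']")
    print("no author:", name.text if name is not None else "(untitled)")
# prints: no author: Paper two
```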


>
>
>> chz
>> Patrick Johnston (magyarblip)
>>
>> *Please don't hit me over the head about using the word 'resource'.
>>
>>
>>
>>
>> On 12/23/14, 2:06 PM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>>
>> >Thanks, Eric. The visualization really helps. I can now see that what
>> >holds these two together is in the "proxy" statements, and that I wasn't
>> >noticing the subtle differences in the URIs. (Also, I do wish that the
>> >ORE proxy were a bit more amply defined. [1]) I'm not sure what makes a
>> >package a package in Arthur's case. Arthur?
>> >
>> >kc
>> >[1] http://www.openarchives.org/ore/1.0/datamodel.html#Proxies
>> >
>> >On 12/23/14 9:35 AM, Eric Prud'hommeaux wrote:
>> >> * Karen Coyle <kcoyle@kcoyle.net> [2014-12-20 08:22-0800]
>> >>>
>> >>>
>> >>> On 12/19/14 8:11 PM, Peter F. Patel-Schneider wrote:
>> >>>> The narrative for S35 says "There is no path from the
>> >>>> acc:AccessContextList node to either of the acc:AccessContext nodes.
>> >>>> There is an implicit containment relation of acc:AccessContext nodes
>> >>>> in the acc:AccessContextList by virtue of these nodes being in the
>> >>>> same information resource."  This implicit connection is not part of
>> >>>> RDF.
>> >>>
>> >>> An example would really help here. I have what may be a similar
>> >>> example from the Europeana data. I'm not sure if this mailing list
>> >>> takes attachments, so the (short) example is here:
>> >>>
>> >>> http://kcoyle.net/temp/edmtest.ttl
>> >>>
>> >>> I cut the data down from something with dozens of related files and
>> >>> subject headings, but I think I kept the structure intact. The main
>> >>> nodes of the model are edm:ProvidedCHO and ore:Aggregation. The data
>> >>> is natively in RDF/XML but I have trouble reading that so I
>> >>> converted it to TTL.
>> >>>
>> >>> Q: Is this an example of what is being discussed here?
>> >>
>> >> Running this through dot (attached), it seems like this includes a
>> >> couple of bibliographic resources (uh oh, "resources"!) which proxy for
>> >> a third. This seems to be a well-connected graph. Arthur's example is of
>> >> data which has no connections apart from some implied by being in the
>> >> same package.
>> >>
>> >> <X> a <Foo> .
>> >> <Y> a <Foo> .
>> >> <Z> a <FooList> .
>> >>
>> >> The presence of something of type FooList appears to trigger some
>> >> special processing which kicks off a search for <Foo>s (and possibly
>> >> whines if there aren't any). Arthur, is that right?
>> >>
>> >> I'm not confident this is a good idea, but to try it out, I mocked up
>> >> a notion of a concomitant shape:
>> >>
>> >> [[
>> >>    start= {
>> >>      a (oslc:AccessContextList),
>> >>      CONCOMITANT @<ContextShape>+
>> >>    }
>> >>
>> >>    <ContextShape> {
>> >>      a (oslc:AccessContext),
>> >>      dc:description xsd:string,
>> >>      dc:title xsd:string
>> >>    }
>> >> ]]
>> >> with a questionable RDF representation:
>> >> [[
>> >>      rs:property [
>> >>          rs:name "???" ;
>> >>          se:concomitantShape true ;
>> >>          rs:valueShape <ContextShape> ;
>> >>          rs:occurs rs:One-or-many ;
>> >>      ] ;
>> >> ]]
>> >>
>> >> http://w3.org/brief/NDI4
>> >>
>> >>
>> >>> Thanks,
>> >>> kc
>> >>>
>> >>>
>> >>>>
>> >>>>
>> >>>> peter
>> >>>>
>> >>>>
>> >>>> On 12/19/2014 06:01 PM, Karen Coyle wrote:
>> >>>>> DC has at least one similar case, in use today. Can you, however, say
>> >>>>> what you mean by "some characteristic of two nodes"? What
>> >>>>> "characteristics" would put them out of scope?
>> >>>>>
>> >>>>> kc
>> >>>>>
>> >>>>> On 12/19/14 4:12 PM, Peter F. Patel-Schneider wrote:
>> >>>>>> If the only connection is that they are in the same graph, then it
>> >>>>>> might be in scope.  However, if there is some indication that the
>> >>>>>> connection is somehow special because of some characteristic of two
>> >>>>>> nodes that are both in a particular graph, then I would say that this
>> >>>>>> is out of scope.
>> >>>>>>
>> >>>>>> It appears to me that the latter is the case.
>> >>>>>>
>> >>>>>> peter
>> >>>>>>
>> >>>>>>
>> >>>>>> On 12/19/2014 12:42 PM, Arthur Ryman wrote:
>> >>>>>>> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> wrote on
>> >>>>>>> 12/19/2014 02:40:44 PM:
>> >>>>>>>
>> >>>>>>>> From: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
>> >>>>>>>> To: Arthur Ryman/Toronto/IBM@IBMCA, public-data-shapes-wg@w3.org
>> >>>>>>>> Date: 12/19/2014 02:41 PM
>> >>>>>>>> Subject: Re: shapes as classes
>> >>>>>>>>
>> >>>>>>>> S35 talks about an implicit connection between acc:AccessContext
>> >>>>>>>> nodes and acc:AccessContextList nodes.  This implicit connection
>> >>>>>>>> appears to me to be outside the scope of RDF.
>> >>>>>>>> peter
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Peter,
>> >>>>>>> I think this implicit connection is in scope because the concept
>> >>>>>>> of an RDF graph is within the scope of RDF. The implicit connection
>> >>>>>>> between the nodes is a consequence of them being in the same RDF
>> >>>>>>> graph. A shape language should let me describe a constraint such as
>> >>>>>>> "The graph must have exactly one node of type acc:AccessContextList,
>> >>>>>>> and zero or more nodes of type acc:AccessContext."
>> >>>>>>>
>> >>>>>>> -- Arthur
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>> --
>> >>> Karen Coyle
>> >>> kcoyle@kcoyle.net http://kcoyle.net
>> >>> m: 1-510-435-8234
>> >>> skype: kcoylenet/+1-510-984-3600
>> >>>
>> >>
>> >
>> >--
>> >Karen Coyle
>> >kcoyle@kcoyle.net http://kcoyle.net
>> >m: 1-510-435-8234
>> >skype: kcoylenet/+1-510-984-3600
>> >
>>
>
>--
>-ericP
>
>office: +1.617.599.3509
>mobile: +33.6.80.80.35.59
>
>(eric@w3.org)
>Feel free to forward this message to any list for any purpose other than
>email address distribution.
>
>There are subtle nuances encoded in font variation and clever layout
>which can only be seen by printing this message on high-clay paper.
Received on Tuesday, 30 December 2014 20:27:00 UTC