Another possible problem? Re: Reproducing Gregg/Niklas' thoughts (@itemref issue) (ISSUE-144) from Ivan Herman on 2012-12-08 (public-rdfa-wg@w3.org from December 2012)

From: Ivan Herman <ivan@w3.org>
Date: Sat, 8 Dec 2012 09:45:57 -0500
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: Niklas Lindström <lindstream@gmail.com>, Dan Brickley <danbri@danbri.org>, W3C RDFa WG <public-rdfa-wg@w3.org>
Message-Id: <71A67CF3-7D38-432E-9409-9A7E43CBE503@w3.org>
I also see another potential issue, namely a significant difference between @itemref and this approach (unless I am mistaken in my understanding of @itemref).

@itemref is, essentially, syntactical. The DOM manipulation approach I had really seems to model what is happening there as far as microdata is concerned. What this means is that, again unless I really misunderstand things, the following:

<div id="c">
 <p>Band: <span itemprop="name">Jazz Band</span></p>
 <p>Size: <span itemprop="size">12</span> players</p>
</div>

<div itemtype="http://a.b.c/A">
  ...
  <... itemref="c"
</div>

<div itemtype="http://p.q.r/B">
  ...
  <... itemref="c"
</div>

would yield, in RDF, something like

[ a <http://a.b.c/A> ;
  <http://a.b.c/name> "Jazz Band" ;
  <http://a.b.c/size> "12"
] .

[ a <http://p.q.r/B> ;
  <http://p.q.r/name> "Jazz Band" ;
  <http://p.q.r/size> "12"
] .

Ie, the expansion rules for the property names are, shall we say, contextual.

However, this is _not_ the case for the entailment-based approach. The URI-s for the properties depend on the prefix settings in effect in the prototype generation, and they do not depend on the context where the prototype is used.

I am a bit concerned about that: it is a subtle but fairly critical difference between microdata and RDFa which may lead to practical issues. Of course, if we consider microdata as a single-vocabulary (ie, schema.org) syntax, this may not be an issue. Although even that is not absolutely true; users have the possibility to use extensions to the schema.org hierarchy, in which case the problem may appear.
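
To make the difference concrete, here is a minimal Python sketch. It is an illustration only, not how any parser actually works: the helper names contextual_uri and prefixed_uri are made up, the contextual rule is reduced to "resolve the name against the type URI" (the real microdata-to-RDF mapping is more involved), and the schema prefix is just an example.

```python
from urllib.parse import urljoin

def contextual_uri(itemtype, name):
    """Simplified microdata-style rule: resolve the name against the item's type URI."""
    return urljoin(itemtype, name)

def prefixed_uri(prefixes, curie):
    """RDFa-style rule: expand a CURIE with the prefixes in effect where it is written."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local

shared_names = ["name", "size"]

# The shared block is written once, but its property URIs follow the referring item.
for itemtype in ("http://a.b.c/A", "http://p.q.r/B"):
    print(itemtype, [contextual_uri(itemtype, n) for n in shared_names])

# A prototype gets its URIs fixed where it is parsed, whoever refers to it later.
prototype_prefixes = {"schema": "http://schema.org/"}  # assumed prefix setting
print([prefixed_uri(prototype_prefixes, "schema:" + n) for n in shared_names])
```

In the first loop the same two names end up with different URIs per referring item; in the last line they cannot.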







On Dec 7, 2012, at 21:42 , Gregg Kellogg wrote:

> On Dec 7, 2012, at 5:29 PM, Niklas Lindström <lindstream@gmail.com> wrote:
> 
>> Ivan, Gregg,
>> 
>> Thanks for examining this! I don't need to write a proposal then, you
>> readily got the gist of it.
>> 
>> Here's a working implementation in Python:
>> 
>> -- 8< --
>> from rdflib import Namespace, RDF
>> 
>> RDFA = Namespace("http://www.w3.org/ns/rdfa#")
>> 
>> def expand_rdfa_prototype(graph, keep_prototype=False):
>>   # Snapshot the rdfa:ref pairs first, since the graph is modified below.
>>   refs = list(graph.subject_objects(RDFA.ref))
>>   # Copy every statement of a prototype (except its type) onto each referrer.
>>   for s, proto in refs:
>>       if (proto, RDF.type, RDFA.Prototype) in graph:
>>           for p, o in list(graph.predicate_objects(proto)):
>>               if (p, o) != (RDF.type, RDFA.Prototype):
>>                   graph.add((s, p, o))
>>   if keep_prototype:
>>       return
>>   # Drop the rdfa:ref links and the prototype descriptions themselves.
>>   for s, proto in refs:
>>       graph.remove((s, RDFA.ref, proto))
>>       if (proto, RDF.type, RDFA.Prototype) in graph:
>>           graph.remove((proto, None, None))
>> -- >8 --
>> 
>> Ivan, feel free to add that to pyRdfa for experimentation. Although I
>> imagine you're already writing code for this. ;)
>> 
>> I'm somewhat optimistic at the moment. This does seem to provide the
>> necessary mechanics.
>> 
>> I need to stress a couple of things though:
>> 
>> 1) IMO, this *must not* be promoted as an alternate way of describing
>> *one* resource in different parts of the page. We have always had
>> @resource (and in full RDFa, @about) for this very reason. I fear that
>> that may have been overlooked recently. For instance, four of Gregg's
>> examples would be much clearer using that instead of any
>> itemref/prototype feature. A prototype feature should only be about
>> reproducing descriptions for multiple *different* resources.
> 
> I wouldn't construe those examples as promoting best practice, just reproducing equivalents for existing microdata tests. It's important to see that this mechanism does everything that @itemref does in microdata.
> 
>> 2) @itemref is sometimes used for copying statements already in scope
>> for describing an item. E.g. the title and author of a page, reused
>> for e.g. an event. In the case we've seen with e.g. NCSU, that *may*
>> lead to odd data though (e.g. domain violations). And I really think
>> it is much better (as in easier to see what's going on) to just
>> reproduce any smaller parts that are shared, using <link> and <meta>
>> (if they are to be hidden). That said, this Prototype feature does
>> work here as well, by use of a nested rdfa:ref to capture the piece
>> which is both to describe the current subject, and reused elsewhere.
>> Example:
>> 
>>   <div property="about" typeof="CreativeWork">
>>     ....
>>     <div property="rdfa:ref" typeof="rdfa:Prototype" resource="_:main_image">
>>       <img alt="Sketch" property="image" src="/building_sketch.jpg" />
>>     </div>
>>     ....
>>   <div property="about" typeof="LandmarksOrHistoricalBuildings">
>>     <link property="rdfa:ref" resource="_:main_image" />
>>     ....
>> 
>> Here, the CreativeWork and LandmarksOrHistoricalBuildings share the
>> same image relation, via a prototype which is "folded in" by the
>> prototype post-processing. Of course, as just stated, I certainly
>> think it's better to just link to it twice (and in this case it would
>> save bytes, although in the NCSU original it might be a tie, since the
>> real link is 225 chars long). In any case, the prototype feature is
>> much more verbose than @itemref (the wrapping ref div can be replaced
>> by just an @id in the image, to be ref:ed in just an @itemref on the
>> landmarks div). Although I still believe that this is fine, since it
>> ought to prevent prototypes from being overused where simple
>> repetition is just plain.. simpler.
> 
> I don't think the added verbosity of the RDFa version creates too much of an issue, as it really is a different expression of the same pattern.
> 
>> 3) The Prototype feature may come off as using semantics for what
>> seems like a syntax issue. Granted, "rdfa:ref" can be interpreted like
>> "reference to a prototypical resource whose (non-meta) characteristics
>> also apply to this resource" (and by "meta" I mean its rdfa:Prototype
>> class). Somewhat like a *very* distilled form of a union of
>> onProperty+hasValue OWL restrictions (see my example at [1]). So it
>> might not be too artificial (even for RDF people).
>> 
>> 4) To me, the most important question continues to be: how is this
>> data supposed to be consumed? Does a ProductModel and variants thereof
>> actually suffice in reality [2]? Should a name, image and keywords be
>> copied verbatim for both a page, the work it describes, and the
>> building that the work depicts? What is necessary, and what is SEO
>> guesswork? If it's the latter, can it be acceptable to simply use
>> <link> and <meta>? Or does the interlinking between resources provide
>> a better context anyway, with clean, descriptive data enabling
>> services to make rich snippets usable? Or does embedded metadata have
>> to be denormalized? If so, will RDFa Prototypes be a viable option?
> 
> Unfortunately, getting SEO folks to re-model their data is not going to work, particularly as it's based on examples that are already out there, AFAIK. The meme is here to stay, so regardless of the absolute need, it's important for RDFa to have an equivalent mechanism; the fact that it really falls out of the design, through a modest extension of vocabulary expansion, just goes to show how useful a model RDF and RDFa actually are for doing these kinds of things.
> 
> The key will be, if we build it, will Google consume it? I would hope so.
> 
> Gregg
> 
>> Let's continue to debate this, and gather more feedback.
>> 
>> Best regards,
>> Niklas
>> 
>> [1]: https://gist.github.com/4039715
>> [2]: https://github.com/niklasl/rdf-sparql-lab/blob/master/schema.org/tests/expand-model/001-in.html
>> 
>> (PS. Just for the record: we did in the past (on the subject of
>> @itemref) discuss supporting multiple resources in @about/@resource. I
>> do not suggest to debate it again, but I don't want us to completely
>> forget about it. Though it would only solve some of the cases.)
>> 
>> On Fri, Dec 7, 2012 at 11:30 PM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
>>> I've updated my distiller at http://rdf.greggkellogg.net/distiller with support for rdfa:ref. To make this work, be sure to check the "Expand graph" checkbox.
>>> 
>>> All in all, implementing it took about an hour, most of which was for creating tests. It provides essentially equivalent functionality to @itemref, but in a more RDF-friendly way. I recommend adding support for the feature to HTML5+RDFa.
>>> 
>>> Gregg Kellogg
>>> gregg@greggkellogg.net
>>> 
>>> On Dec 7, 2012, at 2:12 PM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
>>> 
>>>> I added experimental support to my parser (will deploy to distiller later) as part of vocabulary expansion. I pretty much implement Ivan's algorithm as part of RDFa vocabulary expansion, with the following difference:
>>>> 
>>>> I modified the DELETE clause to remove the rdfa:Prototype on the subject resource as well:
>>>> 
>>>> DELETE {
>>>> ?x rdfa:ref ?PR .
>>>> ?x rdf:type rdfa:Prototype .
>>>> ?PR ?p ?y .
>>>> }
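
As an illustration of what this rule does to a graph, the following sketch applies it once to a plain Python set of triples. It is only a reading of the clauses quoted in this thread, under two assumptions: triples are modelled as string tuples rather than with an RDF library, and the copied statements are inserted before the matched ones are removed (as in a rule-based post-processing step). The function name expand_prototypes is made up for the example.

```python
RDF_TYPE = "rdf:type"
RDFA_REF = "rdfa:ref"
RDFA_PROTOTYPE = "rdfa:Prototype"

def expand_prototypes(triples):
    """Apply the prototype rule once to a set of (subject, predicate, object) tuples."""
    triples = set(triples)
    prototypes = {s for (s, p, o) in triples
                  if p == RDF_TYPE and o == RDFA_PROTOTYPE}
    # WHERE { ?x rdfa:ref ?PR . ?PR ?p ?y . ?PR a rdfa:Prototype . }
    bindings = [(x, pr, p, y)
                for (x, ref, pr) in triples
                if ref == RDFA_REF and pr in prototypes
                for (s, p, y) in triples
                if s == pr]
    inserted = {(x, p, y) for (x, pr, p, y) in bindings}
    deleted = set()
    for (x, pr, p, y) in bindings:
        deleted.add((x, RDFA_REF, pr))              # ?x rdfa:ref ?PR
        deleted.add((x, RDF_TYPE, RDFA_PROTOTYPE))  # ?x rdf:type rdfa:Prototype
        deleted.add((pr, p, y))                     # ?PR ?p ?y
    # Assumption: insert first, then remove the matched statements.
    return (triples | inserted) - deleted

graph = {
    ("_:person", "rdf:type", "schema:Person"),
    ("_:person", "rdfa:ref", "_:a"),
    ("_:a", "rdf:type", "rdfa:Prototype"),
    ("_:a", "schema:name", "Amanda"),
}
result = expand_prototypes(graph)
print(sorted(result))
```

Because the copy step also matches the prototype's own rdf:type statement, the referring resource briefly becomes an rdfa:Prototype itself; the extra pattern in the delete clause is what removes that statement again in this sketch.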
>>>> 
>>>> Here are some example tests, based on those used by the Microdata RDF note:
>>>> 
>>>> To a single ID:
>>>> 
>>>>        <div>
>>>>          <div typeof="schema:Person">
>>>>            <link property="rdfa:ref" resource="_:a"/>
>>>>          </div>
>>>>          <p resource="_:a" typeof="rdfa:Prototype">Name: <span property="schema:name">Amanda</span></p>
>>>>        </div>
>>>> 
>>>> should produce
>>>> 
>>>>        @prefix schema: <http://schema.org/> .
>>>>        [a schema:Person; schema:name "Amanda"] .
>>>> 
>>>> Adds additional property:
>>>> 
>>>>      <div>
>>>>        <div typeof="schema:Person">
>>>>          <p>My name is <span property="schema:name">Gregg</span></p>
>>>>          <link property="rdfa:ref" resource="_:surname"/>
>>>>        </div>
>>>>        <p resource="_:surname" typeof="rdfa:Prototype">My name is <span property="schema:name">Kellogg</span></p>
>>>>      </div>
>>>> 
>>>> should produce
>>>> 
>>>>        @prefix schema: <http://schema.org/> .
>>>>        [ a schema:Person; schema:name "Gregg", "Kellogg"] .
>>>> 
>>>> Multiple subjects with different types:
>>>> 
>>>>        <div>
>>>>          <div typeof="schema:Person">
>>>>            <link property="rdfa:ref" resource="_:a"/>
>>>>          </div>
>>>>          <div typeof="foaf:Person">
>>>>            <link property="rdfa:ref" resource="_:a"/>
>>>>          </div>
>>>>          <p resource="_:a" typeof="rdfa:Prototype">Name: <span property="schema:name foaf:name">Amanda</span></p>
>>>>        </div>
>>>> 
>>>> should produce
>>>> 
>>>>        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>>>>        @prefix schema: <http://schema.org/> .
>>>>        [ a schema:Person; schema:name "Amanda"; foaf:name "Amanda"] .
>>>>        [ a foaf:Person; schema:name "Amanda"; foaf:name "Amanda"] .
>>>> 
>>>> Multiple references:
>>>> 
>>>>        <div>
>>>>          <div typeof="schema:Person">
>>>>            <link property="rdfa:ref" resource="_:a"/>
>>>>            <link property="rdfa:ref" resource="_:b"/>
>>>>          </div>
>>>>          <p resource="_:a" typeof="rdfa:Prototype">Name: <span property="schema:name">Amanda</span></p>
>>>>          <p resource="_:b" typeof="rdfa:Prototype"><span property="schema:band">Jazz Band</span></p>
>>>>        </div>
>>>> 
>>>> should produce
>>>> 
>>>>        @prefix schema: <http://schema.org/> .
>>>>        [ a schema:Person;
>>>>          schema:name "Amanda";
>>>>          schema:band "Jazz Band";
>>>>        ] .
>>>> 
>>>> 
>>>> With chaining:
>>>> 
>>>>        <div>
>>>>          <div typeof="schema:Person">
>>>>            <link property="rdfa:ref" resource="_:a"/>
>>>>            <link property="rdfa:ref" resource="_:b"/>
>>>>          </div>
>>>>          <p resource="_:a" typeof="rdfa:Prototype">Name: <span property="schema:name">Amanda</span></p>
>>>>          <div resource="_:b" typeof="rdfa:Prototype">
>>>>            <div property="schema:band" typeof="schema:MusicGroup">
>>>>              <link property="rdfa:ref" resource="_:c"/>
>>>>            </div>
>>>>          </div>
>>>>          <div resource="_:c" typeof="rdfa:Prototype">
>>>>            <p>Band: <span property="schema:name">Jazz Band</span></p>
>>>>            <p>Size: <span property="schema:size">12</span> players</p>
>>>>          </div>
>>>>        </div>
>>>> 
>>>> should produce
>>>> 
>>>>        @prefix schema: <http://schema.org/> .
>>>>        [ a schema:Person;
>>>>          schema:name "Amanda" ;
>>>>          schema:band [
>>>>            a schema:MusicGroup;
>>>>            schema:name "Jazz Band";
>>>>            schema:size "12"
>>>>          ]
>>>>        ] .
>>>> 
>>>> Shared resource:
>>>> 
>>>>        <div>
>>>>          <div typeof=""><link property="rdfa:ref" resource="_:a"/></div>
>>>>          <div typeof=""><link property="rdfa:ref" resource="_:a"/></div>
>>>>          <div resource="_:a" typeof="rdfa:Prototype">
>>>>            <div property="schema:refers-to" typeof="">
>>>>              <span property="schema:name">Amanda</span>
>>>>            </div>
>>>>          </div>
>>>>        </div>
>>>> 
>>>> should produce:
>>>> 
>>>>        @prefix schema: <http://schema.org/> .
>>>>        [ schema:refers-to _:a ] .
>>>>        [ schema:refers-to _:a ] .
>>>>        _:a schema:name "Amanda" .
>>>> 
>>>> 
>>>> I'll have my updated distiller, with support for this, released later today.
>>>> 
>>>> Gregg
>>>> 
>>>> On Dec 7, 2012, at 11:22 AM, Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>>> 
>>>>> On Dec 7, 2012, at 14:00 , Dan Brickley wrote:
>>>>> 
>>>>>> 
>>>>>> On 7 Dec 2012 15:21, "Ivan Herman" <ivan@w3.org> wrote:
>>>>>>> 
>>>>>>> Hi guys,
>>>>>>> 
>>>>>>> I tried to reproduce what Gregg/Niklas were considering yesterday and, I believe, here are the rules that we may define and then use a post-processing step on the resulting graph that execute those:
>>>>>>> 
>>>>>>> DELETE {
>>>>>>> ?x rdfa:ref ?PR .
>>>>>>> ?PR ?p ?y .
>>>>>>> }
>>>>>>> INSERT {
>>>>>>> ?x ?p ?y .
>>>>>>> }
>>>>>>> WHERE {
>>>>>>> ?x rdfa:ref ?PR .
>>>>>>> ?PR ?p ?y.
>>>>>>> ?PR a rdfa:Prototype .
>>>>>>> }
>>>>>>> 
>>>>>>> Ie, if I have somewhere:
>>>>>>> 
>>>>>>> <div resource="#p" typeof="rdfa:Prototype">
>>>>>>> <span property="foo">bar</span>
>>>>>>> </div>
>>>>>> 
>>>>>> ....ah, so you're using special terms in an rdf vocab, to avoid making extra syntax?
>>>>>> 
>>>>>> If this <div> had nested subelements, which part would be in the Prototype?
>>>>> 
>>>>> Everything. The whole lot:-)
>>>> 
>>>>>> 
>>>>>>> ...
>>>>>>> ...
>>>>>>> 
>>>>>>> <div resource="#A">
>>>>>>> <span property="yep">Yep Yep</span>
>>>>>>> <span property="rdfa:ref" resource="#p"/>
>>>>>>> </div>
>>>>>>> 
>>>>>>> then what I would get is the following graph:
>>>>>>> 
>>>>>>> <#A>
>>>>>>> <yep> "Yep Yep" ;
>>>>>>> <foo> "bar" .
>>>>>>> 
>>>>>>> <#p> a rdfa:Prototype ;
>>>>>>> <foo> "bar" .
>>>>>>> 
>>>>>>> 
>>>>>>> Which is roughly an @itemref as we know it. I think it works and can be implemented without too many problems.
>>>>>> 
>>>>>> 
>>>>>> Thanks for investigating this issue!
>>>>>> 
>>>>>>> Here, though, are the problems I see with this. I do not consider these show stoppers, but we have to be aware of them:
>>>>>>> 
>>>>>>> - As you see, the triples on the prototype itself also make it in the final graph. I am not sure it is o.k., but I also do not know how to remove them. We could define, in the SPARQL 1.1 terms, some sort of a property path based DELETE DATA clause, but implementation of that might be a bit difficult. I am not sure it is worth it.
>>>>>>> 
>>>>>> 
>>>>>> I assume SPARQL is purely for documentational convenience / spec here, and not a real dependency?
>>>>> 
>>>>> Yes. At the moment, that is the only syntax that can express all these rules (one cannot express removal in N3 :-( ).
>>>> 
>>>> Pretty easy to do; I just create an additional rule to match the statements to be removed, and remove them from my output graph.
>>>> 
>>>>>> 
>>>>>>> - The pattern I used above is of course fine. But what happens if the user does the following:
>>>>>>> 
>>>>>>> <div property="rdf:type" resource="rdfa:Prototype">
>>>>>>> <span property="foo">bar</span>
>>>>>>> </div>
>>>>>>> 
>>>>>>> the subject, ie, the ?PR in the SPARQL pattern, would be whatever was inherited, which may lead to funny situations. In other words, we do give the user a rope to hang himself, although I agree that this is very much a corner case.
>>>>>> I do worry about mixing vocab and syntax for such reasons.
>>>>>> 
>>>>>>> - Would the execution of those rules be a required feature? If so, we would have to talk to the Google implementers (via DanBri) whether they would implement this at all. If not, the major use case of introducing this falls...
>>>>>> I don't fully understand. But I'd like to work this through next week with examples...
>>>>> 
>>>>> O.k.
>>>>> 
>>>>> For reference, there was another approach:
>>>>> 
>>>>> http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Nov/0003.html
>>>>> 
>>>>> which was based on the idea of a DOM manipulation *before* any type of RDFa processing, but reproducing a similar feature to @itemref. There are also issues with that one
>>>>> 
>>>>> - does not work (well) if a streaming parser is used
>>>>> - for any implementation that is in a browser, it should start by duplicating the DOM and work on that one; indeed, manipulating the DOM that is also used for display is not a good idea:-(
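
A rough sketch of that DOM-based alternative, for comparison. It is not the algorithm from the message linked above: the attribute name data-ref and the function name inline_references are placeholders invented for this example, and xml.etree stands in for a real HTML DOM. The one point it does take from the list above is that the work happens on a copy of the tree, never on the tree used for display.

```python
import copy
import xml.etree.ElementTree as ET

def inline_references(root, ref_attr="data-ref"):
    """Return a copy of the tree where each referenced block is copied into its referrer."""
    tree = copy.deepcopy(root)  # work on a duplicate, not on the displayed tree
    by_id = {el.get("id"): el for el in tree.iter() if el.get("id")}
    for el in list(tree.iter()):
        # Remove the (hypothetical) reference attribute and look up its target.
        target = by_id.get(el.attrib.pop(ref_attr, None))
        if target is not None:
            for child in list(target):
                el.append(copy.deepcopy(child))
    return tree

page = ET.fromstring(
    '<body>'
    '<div id="c"><span property="name">Jazz Band</span></div>'
    '<div typeof="A" data-ref="c"/>'
    '<div typeof="B" data-ref="c"/>'
    '</body>'
)
expanded = inline_references(page)
print(ET.tostring(expanded, encoding="unicode"))
```

Since the copying happens before any RDFa processing, the copied markup is interpreted in the context of each referrer. The whole tree has to be available up front, which is why a streaming parser has trouble with this approach.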
>>>>> 
>>>>> I am not 100% sure which of the two approaches I prefer (if we do anything, that is). I still tend to prefer the DOM manipulation one, which seems to have fewer caveats for me, but that is just a mild preference...
>>>>> 
>>>>> Ivan
>>>>> 
>>>>> 
>>>>>> 
>>>>>> cheers,
>>>>>> 
>>>>>> Dan
>>>>>> 
>>>>>>> Food for thought...
>>>>>>> 
>>>>>>> Ivan
>>>>>>> 
>>>>>>> 
>>>>>>> ----
>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>> mobile: +31-641044153
>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
> 
> 


Received on Saturday, 8 December 2012 14:46:26 UTC