Re: Weighing the ideas around itemref from Niklas Lindström on 2012-12-10 (public-rdfa-wg@w3.org from December 2012)

From: Niklas Lindström <lindstream@gmail.com>
Date: Mon, 10 Dec 2012 19:16:05 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Gregg Kellogg <gregg@greggkellogg.net>, public-rdfa-wg <public-rdfa-wg@w3.org>, Dan Brickley <danbri@danbri.org>
Message-ID: <CADjV5jd1QVM7_nZGtQ1xZ-g5OiPAiAppo3R489V=VPJtH4Jong@mail.gmail.com>
Gregg, Ivan,

On Mon, Dec 10, 2012 at 12:21 PM, Ivan Herman <ivan@w3.org> wrote:
> Looking at these arguments... I have the impression that, in many places, it is some sort of a judgement call. I also think that for this group to go out and say "this is how you ought to do that, the way you do it is semantically wrong", etc, will not really be a good idea.

I agree with you both here. I do recognize the common practice of
reusing the same value for items at different levels of abstraction, as
in bibliographic scenarios (which is what I currently work with,
albeit in their rawest form, i.e. MARC). I personally (and
professionally) strive to find higher abstractions to facilitate
discovery and usage, which certainly affects my values here (and
appreciation of e.g. ProductModel). But I also agree that we cannot
enforce that approach. It is not always cost effective or even viable.
There is no "right" or "wrong"; these are all conceptual
constructions. I am actually not a platonist, far from it. :)

And it is also common to make repeated statements about a set of
resources, as Gregg exemplifies (e.g. the creator/location/licence for
a set of pictures).

My questions are only about the actual, observed frequency and size.
Since the data extracted will contain these "duplications", I wondered
if the repetition in RDFa is acceptable as well. I suppose this in
turn comes from my inclination to "admit the cost" of repetition up
front, instead of "hiding" it with syntax (as the include pattern in
microformats/microdata/general templating does). But that comes from
work with *raw* data. Publishing data in HTML is a different use case.
So I can certainly respect this need. For all the odd things @itemref
does (or is mistakenly thought to do), it certainly eliminates that
repetition.

And for this, RDFa Prototypes seem to work really well!

.. Granted, I keep thinking that I'd like some way to restrict the
properties that are inherited, so that you could pick just some
properties from e.g. a ProductModel into a Product, without wrapping
that which is shared between a ProductModel and Products in a
prototype. (This is necessary to avoid e.g. the Product inheriting the
ProductModel type and potentially other "local" properties, like
releaseDate.) But all I come up with are constructs like this (which
like prototype would use an expansion step on the resulting graph), to
be used in each Product:

    <span property="rdfa:inherits" typeof>
      <link property="rdfa:source" resource="#model">
      <link property="rdfa:prop" resource="schema:name">
      <link property="rdfa:prop" resource="schema:manufacturer">
    </span>

Basically as heavy as just repeating the data. Granted, with some
syntax extension, it could be done e.g. like:

    <link inherits="name manufacturer" resource="#work">

But the rdfa:ref/rdfa:Prototype combo is probably still the simplest
thing which solves both these cases *and* the case where the shared
properties are "meaningless" on their own (as in Gregg's ImageObject
example). And it does this without additional syntax, using a pattern
I've been pondering for a while now. Perhaps I should just be happy.
:)
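A minimal sketch of the expansion step on the resulting graph, assuming the extracted graph is available as a set of (subject, predicate, object) string tuples and using the experimental names `rdfa:ref` and `rdfa:Prototype` from this thread. The `only` parameter is a hypothetical stand-in for the restricted inheritance wished for above (the `rdfa:inherits` / `inherits` constructs); nothing like it is part of the proposal. The sample data mirrors Gregg's ImageObject example quoted below.

```python
from typing import Optional, Set, Tuple

# A triple is (subject, predicate, object); CURIEs are kept as plain strings here.
Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
RDFA_REF = "rdfa:ref"              # experimental name used in this thread
RDFA_PROTOTYPE = "rdfa:Prototype"  # experimental name used in this thread


def expand_prototypes(graph: Set[Triple], only: Optional[Set[str]] = None) -> Set[Triple]:
    """Copy each prototype's statements onto every subject that points to it with
    rdfa:ref, then drop the prototype and the rdfa:ref links. If `only` is given
    (hypothetical restriction), copy just those predicates."""
    prototypes = {s for (s, p, o) in graph if p == RDF_TYPE and o == RDFA_PROTOTYPE}

    # Keep everything except prototype statements and the references to prototypes.
    result = {
        (s, p, o)
        for (s, p, o) in graph
        if s not in prototypes and not (p == RDFA_REF and o in prototypes)
    }

    for (s, p, proto) in graph:
        if p != RDFA_REF or proto not in prototypes:
            continue
        for (ps, pp, po) in graph:
            if ps != proto or (pp == RDF_TYPE and po == RDFA_PROTOTYPE):
                continue  # statements about other subjects, or the marker type itself
            if only is None or pp in only:
                result.add((s, pp, po))
    return result


graph = {
    ("_:img1", "rdf:type", "schema:ImageObject"),
    ("_:img1", "schema:name", "Image 1"),
    ("_:img1", "rdfa:ref", "_:imagecontents"),
    ("_:img2", "rdf:type", "schema:ImageObject"),
    ("_:img2", "schema:name", "Image 2"),
    ("_:img2", "rdfa:ref", "_:imagecontents"),
    ("_:imagecontents", "rdf:type", "rdfa:Prototype"),
    ("_:imagecontents", "schema:location", "someplace"),
    ("_:imagecontents", "schema:about", "someone"),
}
expanded = expand_prototypes(graph)
```

Each image ends up with its own `schema:location` and `schema:about`, while the prototype node itself disappears, which is what makes the shared properties "meaningless" on their own. Chained prototypes (a prototype that itself uses `rdfa:ref`) are not handled in this sketch.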

> I would propose the following as a way forward
>
> - we get a consensus on the technical design, including the necessary spec text; once we have consensus, having 1-2 implementations is also a good idea for people to experiment with (Gregg has one, I will try to find the time to do one, although that may happen only at the end of the month or in January)

I can add the Python code I've shown to the pyRdfa part of RDFLib (in a
branch), and we can take it from there.

(And I'll also add this to clj-rdfa if we decide it's to be done.)

> - we put this feature into the Last Call document, but we list it as an 'AT RISK' feature, with a short note in the document giving a _concise_ reason why we consider it at risk (possibility for misuse, clearer modeling, etc.)
> - in our public communication at LC we emphasize this AT RISK stuff, we really ask for public comments on this and we decide based on those
>
> How does that sound?

I'm on board with this.

Best regards,
Niklas


> Ivan
>
>
> On Dec 9, 2012, at 21:16 , Gregg Kellogg wrote:
>
>> On Dec 9, 2012, at 5:09 PM, Niklas Lindström <lindstream@gmail.com> wrote:
>>
>>> I've been weighing the inputs and ideas that have been put forward
>>> regarding ISSUE-144 (an @itemref-like feature) [1]. Let's discuss the
>>> requirements here. Whatever we do, we must not rush things.
>>>
>>> I still believe we need more knowledge about the actual needs: what
>>> publishers need to do to capture their relevant content, what
>>> consumers need from that content, and how they use it. We need solid
>>> examples; otherwise the claim that this is much needed in the wild is
>>> unsubstantiated.
>>>
>>> I have observed two aspects of @itemref which we do *not* need to reproduce:
>>>
>>> 1) Based on the microdata spec, and some online articles (e.g. [2],
>>> [3]), @itemref is sometimes used to capture data about a resource
>>> which is not nested within the element/@itemscope in question. As has
>>> been said many times, this has always been a basic feature of RDFa,
>>> since it supports specifying the subject (using @resource (and in
>>> full, @about)) in the portion where the data exists.
>>>
>>> (I have used this many times, e.g. when adding RDFa to our company
>>> intranet. Using @itemref there instead would have made the solution
>>> brittle and hard to grasp, since the subject for a piece of content
>>> would only be discernible by noticing that the local @id was actually
>>> used in an @itemref elsewhere in the template source, often spread
>>> across files.)
>>
>> Yes, and people coming from microdata don't always get this, as they typically use @itemscope without @itemid, and therefore get a unique BNode (or item if staying within the microdata-json rep) on each use, so the idea of spreading the assertions across the DOM is foreign; this is just a matter of education, IMO.
>>
>>> 2) In this stackoverflow thread [4] the practice of linking to
>>> resources is oddly done by using @itemref (instead of using references
>>> with an @itemid, which is mainly like @resource). It also seems that
>>> GoodRelations recommends this [5]. IIUC, the consequence, according to
>>> the microdata algorithm, is basically a copying by value, resulting in
>>> two different items (bnodes), instead of linking to the same resource.
>>> This copying is not apparent of course; in the thread this is thought
>>> of as linking data.
>>>
>>> (And I don't blame them; the name @itemref certainly implies that it
>>> is used for making item references, not element references for a
>>> parser to jump to.. From what I've gathered, it basically instructs a
>>> parser that "this item is also described by the content blocks at
>>> these IDs". Basically an @itemdescriptionref.. Please correct me if
>>> I'm missing a point here.)
>>
>> Yes, I responded to Aaron Bradley on Manu's G+ thread (https://plus.google.com/u/1/102122664946994504971/posts/Zoq5EiNR9pw); he was noting @itemref as being important for this very reason, using an @itemref to reference an element with an @itemprop relating to a new item. I did point out to him that this ends up creating two items with the same information, and probably isn't really what he wants anyway, but such is the weight of examples that misuse the syntax.
>>
>>> So whatever we're after here, it doesn't need to be the exact
>>> equivalent of @itemref (in fact, given the above, that would be a
>>> costly choice in terms of complexity). We need to define the core of
>>> what is sought after.
>>>
>>> AFAIK, we have so far received two instances where an @itemref feature
>>> is said to be needed:
>>>
>>> 1) Martin Hepp initially reported that @itemref is necessary in real
>>> world usage. Unfortunately, we haven't gotten any real examples
>>> supporting this claim. From what I have gathered, the case described
>>> is readily solved by using a ProductModel. If it is not, it is still
>>> unclear whether adding link and meta elements would suffice or not.
>>> How many properties are to be copied into each product? (And are there
>>> no dedicated product pages with more details, given the apparent need
>>> to discover each product in search engines?)
>>
>> It may be that ProductModel handles this case semantically, but I can imagine a number of other cases where something similar is done, for example a sequence of photos that all relate to the same subject, where the photo is itself the primary resource:
>>
>> <figure typeof="schema:ImageObject">
>>  <img property="schema:contentUrl" src="img1"/>
>>  <figcaption property="schema:name">Image 1</figcaption>
>>  <link property="rdfa:ref" resource="_:imagecontents"/>
>> </figure>
>>
>> <figure typeof="schema:ImageObject">
>>  <img property="schema:contentUrl" src="img2"/>
>>  <figcaption property="schema:name">Image 2</figcaption>
>>  <link property="rdfa:ref" resource="_:imagecontents"/>
>> </figure>
>>
>> <div typeof="rdfa:Prototype" resource="_:imagecontents">
>>  <a property="schema:location" href="someplace">Some place</a>
>>  <a property="schema:about" href="someone">Some one</a>
>>  ...
>> </div>
>>
>> In any case, I think the community has made the case that an @itemref-like feature is necessary. There are certainly cases where existing use of @itemref can be replaced by distributing @resource across a page, but this usage pattern may be foreign enough to people coming from an SEO perspective that the feature is still important. Also, past experience indicates that expecting people to solve their problems by re-modeling their data to work around a missing feature (e.g., Product/ProductModel) is just not realistic; it can become an argument, for new users of semantic markup, that the solutions put forward by the Semantic Web crowd are just not sympathetic to the needs of web developers.
>>
>>> What other cases like this are there? Do sub-events need to copy
>>> certain properties from their parent events? All properties or just a
>>> handful? (Certainly the "subEvent" relation must not be copied, so
>>> using @itemref to copy the parent data seems off.) This is the source
>>> of our prototype idea, which I've been entertaining for a while. It is
>>> alluring, and quite easy to implement. But that doesn't equate with
>>> utility. I still don't know if the needs presented really demand it. I
>>> think we need more experience and input here.
>>>
>>> 2) The other case is from Jason Ronallo. This example [6] uses
>>> @itemref to reuse a name, an image and a set of keywords between an
>>> ItemPage, LandmarksOrHistoricalBuildings and CreativeWork (please see
>>> the source to understand the details). In a way it is similar to the
>>> Product case, with the potential difference (depending on how
>>> important the ProductModel is as a concept in that example) that the
>>> copied data here is also used within an itemscope to describe an
>>> entity, and, above all, that this is mostly about picking out a few
>>> pieces to avoid repetition in hidden @content. I do sympathize with
>>> the desire to avoid duplication, but in general I still think the
>>> repetition in meta and link elements would be fairly negligible. (And
>>> that such direct use makes the effect much clearer, and reduces
>>> complexity.)
>>
>> This could also be seen in bibliographic use, where you have a Work, Product and Manifestation that share many properties, but are certainly semantically distinct entities.
>>
>>> I've put a version of that example as RDFa using our experimental
>>> prototypes at [7]. It certainly works, but I wonder if it's necessary.
>>>
>>> Perhaps it would be enough if there was a mechanism to reuse a literal
>>> value from another place in the document? That would remove the need
>>> for sometimes copying the same textual value into several descriptions
>>> within a page (often hidden in @content of meta elements). This could
>>> be done by adding a new @contentref attribute:
>>>
>>>   <div resource="#page" typeof="ItemPage">
>>>     <h1 property="name" id="page_name">A Very Long Name Which Would Be Tedious To Repeat</h1>
>>>     <div property="about" resource="#creativework" typeof="CreativeWork">
>>>       <meta property="name" contentref="page_name"/>
>>>     </div>
>>>   </div>
>>>
>>> This would only copy the literal value. The @property (and any
>>> @datatype) will be on the "start" element which uses the @contentref.
>>> So in a way it would be like @datetime or @value in HTML5, just
>>> indirected via an @id lookup. Adding just this would still require
>>> repetition of links and meta elements though (e.g. for multiple
>>> keywords). It would just remove the need for repeating literal
>>> content. The question is still open whether that would suffice. I'm
>>> suggesting this mostly to promote a balance of requirements.
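A minimal sketch of how the proposed @contentref could be resolved, assuming a well-formed (XHTML-like) snippet parsed with Python's `xml.etree.ElementTree`. @contentref is only a proposal in this message, not part of any spec, and `resolve_contentref` is a hypothetical name. The sketch shows just the literal lookup via @id; working out the subject of each statement is left to a real RDFa processor.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def resolve_contentref(root: ET.Element) -> List[Tuple[Optional[str], str, Optional[str]]]:
    """Return (property, literal, datatype) for every element using the proposed @contentref."""
    by_id: Dict[str, ET.Element] = {el.get("id"): el for el in root.iter() if el.get("id")}
    results = []
    for el in root.iter():
        ref = el.get("contentref")
        if ref is None:
            continue
        target = by_id.get(ref)
        if target is None:
            continue  # dangling reference: produce nothing
        # Only the literal text is copied; @property and @datatype stay on the referencing element.
        literal = "".join(target.itertext())
        results.append((el.get("property"), literal, el.get("datatype")))
    return results


SNIPPET = """
<div resource="#page" typeof="ItemPage">
  <h1 property="name" id="page_name">A Very Long Name Which Would Be Tedious To Repeat</h1>
  <div property="about" resource="#creativework" typeof="CreativeWork">
    <meta property="name" contentref="page_name"/>
  </div>
</div>
"""

doc = ET.fromstring(SNIPPET.strip())
print(resolve_contentref(doc))
# [('name', 'A Very Long Name Which Would Be Tedious To Repeat', None)]
```

As described above, this only removes repeated literal content; repeated links and keywords would still need their own link and meta elements.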
>>>
>>> The remaining, *very important* question is whether search engines
>>> penalize the use of meta and link elements. This has come up time and
>>> again as a point of uncertainty for authors. I hope Schema.org
>>> representatives can answer this, since it is a generally useful
>>> pattern at times. I would expect it to be perfectly fine to add some
>>> precision, as long as neither content nor links deviate in subject
>>> matter. (There are many other ways to add hidden content for
>>> subversive SEO purposes.)
>>
>> This question comes up time and time again. Only time will tell what emerges, but IMO, uses of link and meta should prove to be okay, the warning against invisible markup is typically for large blocks of text which are moved off page or hidden in an obvious attempt to fool the ranking algorithms. We will probably see attempts to semantically fool updated algorithms too, if the use of schema.org really takes off. Presumably the solution to that would be to do some semantic textual analysis to see if it corresponds to the asserted markup.
>>
>> Gregg
>>
>>> Best regards,
>>> Niklas
>>>
>>> [1]: http://www.w3.org/2010/02/rdfa/track/issues/144
>>> [2]: http://html5doctor.com/microdata/
>>> [3]: http://net.tutsplus.com/tutorials/html-css-techniques/html5-microdata-welcome-to-the-machine/
>>> [4]: http://stackoverflow.com/questions/8726413/schema-org-itemref-linking-multiple-sportevents-to-a-single-place
>>> [5]: http://wiki.goodrelations-vocabulary.org/Cookbook/Video_content
>>> [6]: http://d.lib.ncsu.edu/collections/catalog/mc00096-001-ff0155-000-001_0001
>>> [7]: https://gist.github.com/4243921
>>>
>>
>>
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 10 December 2012 18:17:05 UTC