Weighing the ideas around itemref from Niklas Lindström on 2012-12-10 (public-rdfa-wg@w3.org from December 2012)

From: Niklas Lindström <lindstream@gmail.com>
Date: Mon, 10 Dec 2012 02:09:46 +0100
To: public-rdfa-wg <public-rdfa-wg@w3.org>
Cc: Dan Brickley <danbri@danbri.org>
Message-ID: <CADjV5jfDpUvWEg5mYLQgAPLvvXNNELkpBibLMdrpa7RSM_eymQ@mail.gmail.com>

I've been weighing the inputs and ideas that have been put forward
regarding ISSUE-144 (an @itemref-like feature) [1]. Let's discuss the
requirements here. Whatever we do, we must not rush things.

I still believe we need more knowledge about the actual needs: what
publishers need to do to capture their relevant content, and what
consumers need from that data and how they use it. We need solid
examples; otherwise the claim that this is much needed in the wild is
moot.

I have observed two aspects of @itemref which we do *not* need to reproduce:

1) Based on the microdata spec, and some online articles (e.g. [2],
[3]), @itemref is sometimes used to capture data about a resource
which is not nested within the element/@itemscope in question. As has
been said many times, this has always been a basic feature of RDFa,
since it supports specifying the subject (using @resource (and in
full, @about)) in the portion where the data exists.

(I have used this many times, e.g. when adding RDFa to our company
intranet. Using @itemref there instead would have made the solution
brittle and hard to grasp, since the subject for a piece of content
would only be discernible by noticing that the local @id was actually
used in an @itemref elsewhere in the template source, often spread
across files.)
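
A minimal sketch of that pattern, assuming the schema.org vocabulary;
the #building identifier, the file name and the text are made up for
illustration. The second block is not nested in the first, yet it
names its subject right where the data sits:

```html
<body vocab="http://schema.org/">
  <!-- The resource is introduced in one part of the page. -->
  <div resource="#building" typeof="LandmarksOrHistoricalBuildings">
    <h1 property="name">Example Hall</h1>
  </div>

  <!-- Elsewhere, unnested content states the same subject locally. -->
  <p resource="#building">
    <img property="image" src="hall.jpg" alt="Example Hall"/>
  </p>
</body>
```

Both blocks describe the one resource #building, and the subject of
the image is visible without searching the template for a matching
@id reference.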

2) In this stackoverflow thread [4] the practice of linking to
resources is oddly done by using @itemref (instead of using references
with an @itemid, which is mainly like @resource). It also seems that
GoodRelations recommends this [5]. IIUC, the consequence, according to
the microdata algorithm, is basically a copying by value, resulting in
two different items (bnodes), instead of linking to the same resource.
This copying is not apparent of course; in the thread this is thought
of as linking data.

(And I don't blame them; the name @itemref certainly implies that it
is used for making item references, not element references for a
parser to jump to. From what I've gathered, it basically instructs a
parser that "this item is also described by the content blocks at
these IDs". Basically an @itemdescriptionref. Please correct me if
I'm missing a point here.)
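
For contrast, a hedged sketch of linking by reference in RDFa,
loosely modelled on the situation in [4] (several events at one
place). The identifiers and names are invented for illustration:

```html
<body vocab="http://schema.org/">
  <div resource="#arena" typeof="Place">
    <span property="name">Example Arena</span>
  </div>

  <div resource="#match1" typeof="SportsEvent">
    <span property="name">First Match</span>
    <!-- A link to the resource, not a copy of its description. -->
    <link property="location" href="#arena"/>
  </div>

  <div resource="#match2" typeof="SportsEvent">
    <span property="name">Second Match</span>
    <link property="location" href="#arena"/>
  </div>
</body>
```

Both events point at the same #arena resource, so a consumer sees one
Place rather than two separate anonymous copies.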

So whatever we're after here, it doesn't need to be the exact
equivalent of @itemref (in fact, given the above, that would be a
costly choice in terms of complexity). We need to define the core of
what is sought after.

AFAIK, we have so far received two instances where an @itemref feature
is said to be needed:

1) Martin Hepp initially reported that @itemref is necessary in real
world usage. Unfortunately, we haven't gotten any real examples
supporting this claim. From what I have gathered, the case described
is readily solved by using a ProductModel. If it is not, it is as yet
unclear whether adding link and meta elements would suffice or not.
How many properties are to be copied into each product? (And are there
no dedicated product pages with more details, given the apparent need
to discover each product in search engines?)

What other cases like this are there? Do sub-events need to copy
certain properties from their parent events? All properties or just a
handful? (Certainly the "subEvent" relation must not be copied, so
using @itemref to copy the parent data seems off.) This is the source
of our prototype idea, which I've been entertaining for a while. It is
alluring, and quite easy to implement. But that doesn't equate with
utility. I still don't know if the needs presented really demand it. I
think we need more experience and input here.

2) The other case is from Jason Ronallo. This example [6] uses
@itemref to reuse a name, an image and a set of keywords between an
ItemPage, LandmarksOrHistoricalBuildings and CreativeWork (please see
the source to understand the details). In a way it is similar to the
Product case, with the potential difference (depending on how
important the ProductModel is as a concept in that example) that the
copied data here is also used within an itemscope to describe an
entity. And especially that this is mostly about picking out a few
pieces to avoid repetition in hidden @content. I do sympathize with
the desire to avoid duplication, but in general I still think the
repetition in meta and link elements would be fairly negligible. (And
that such direct use makes the effect much clearer, and reduces
complexity.)
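
As a rough illustration of that repetition (not taken from [6]; the
values are placeholders), the shared name, image and keyword can
simply be restated with meta and link elements:

```html
<body vocab="http://schema.org/">
  <div resource="#page" typeof="ItemPage">
    <h1 property="name">Example Hall, East Elevation</h1>
    <img property="image" src="hall.jpg" alt="Example Hall"/>
    <span property="keywords">architecture</span>

    <div property="about" resource="#work" typeof="CreativeWork">
      <!-- Restated by hand instead of being pulled in by reference. -->
      <meta property="name" content="Example Hall, East Elevation"/>
      <link property="image" href="hall.jpg"/>
      <meta property="keywords" content="architecture"/>
    </div>
  </div>
</body>
```

The cost is three extra elements for the nested description, and each
statement can be read off directly where it is made.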

I've put a version of that example as RDFa using our experimental
prototypes at [7]. It certainly works, but I wonder if it's necessary.

Perhaps it would be enough if there was a mechanism to reuse a literal
value from another place in the document? That would remove the
occasional need to copy the same textual value into several
descriptions within a page (often hidden in @content of meta
elements). This could
be done by adding a new @contentref attribute:

    <div resource="#page" typeof="ItemPage">
      <h1 property="name" id="page_name">A Very Long Name Which Would
        Be Tedious To Repeat</h1>
      <div property="about" resource="#creativework" typeof="CreativeWork">
        <meta property="name" contentref="page_name"/>
      </div>
    </div>

This would only copy the literal value. The @property (and any
@datatype) will be on the "start" element which uses the @contentref.
So in a way it would be like @datetime or @value in HTML5, just
indirected via an @id lookup. Adding just this would still require
repetition of links and meta elements though (e.g. for multiple
keywords). It would just remove the need for repeating literal
content. The question is still open whether that would suffice. I'm
suggesting this mostly to promote a balance of requirements.
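
To make the remaining repetition concrete, here is a sketch with two
keywords. Note that @contentref is only the hypothetical attribute
proposed in this message and is not part of any RDFa specification;
the ids and keyword values are invented:

```html
<div resource="#page" typeof="ItemPage">
  <span property="keywords" id="kw_1">architecture</span>
  <span property="keywords" id="kw_2">photographs</span>
  <div property="about" resource="#creativework" typeof="CreativeWork">
    <!-- Hypothetical @contentref: one meta element per reused value. -->
    <meta property="keywords" contentref="kw_1"/>
    <meta property="keywords" contentref="kw_2"/>
  </div>
</div>
```

The literal text is written once, but the number of meta elements is
the same as with plain @content.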

The remaining, *very important* question is whether search engines
penalize usage of meta and link elements. This has come up time and
again as a point of uncertainty for authors. I hope Schema.org
representatives can answer this, since it is a generally useful
pattern at times. I would expect it to be perfectly fine to add some
precision, as long as neither content nor links deviate in subject
matter. (There are many other ways to add hidden content for
subversive SEO purposes.)

Best regards,
Niklas

[1]: http://www.w3.org/2010/02/rdfa/track/issues/144
[2]: http://html5doctor.com/microdata/
[3]: http://net.tutsplus.com/tutorials/html-css-techniques/html5-microdata-welcome-to-the-machine/
[4]: http://stackoverflow.com/questions/8726413/schema-org-itemref-linking-multiple-sportevents-to-a-single-place
[5]: http://wiki.goodrelations-vocabulary.org/Cookbook/Video_content
[6]: http://d.lib.ncsu.edu/collections/catalog/mc00096-001-ff0155-000-001_0001
[7]: https://gist.github.com/4243921

Received on Monday, 10 December 2012 01:10:44 UTC