recommended pattern for markup-valued 'breadcrumb' properties in RDFa

Dear RDFa WG,

I'm looking for some advice on schema.org markup options. I hope to
join the WG shortly but wanted to start a conversation as early as
possible.

Schema.org's markup for breadcrumbs is both popular and (currently)
broken. The issue at http://www.w3.org/2011/webschema/track/issues/10
gives some backstory, but factors include Microdata's rule for
concatenating subelements, as well as the difficulty of representing
ordered lists of link/label pairs as simple triples without complex
markup. For the purposes of this mail, I am only interested in the
RDFa 1.1 possibilities.

Egor (cc:'d) has drafted a proposal for improving our design,
http://www.w3.org/wiki/WebSchemas/Breadcrumbs . This draft explores an
approach that makes the ordering, labelling and URLs from a
'breadcrumbs' section of HTML explicit within the extracted graph.

I would very much like to get the RDFa WG's perspective on this issue.

Looking at http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#markup-fragments-and-rdfa
and http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#s-xml-literals
it seems an alternate design might be possible with RDFa. Instead of
trying to make the entire 'breadcrumb' structure explicit as a graph,
we could put the whole breadcrumb into a single property value as a
larger piece of markup. The current spec shows this example:

<h2 property="dc:title" datatype="rdf:XMLLiteral">
  E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
</h2>

...presumably this will be adjusted in the HTML+RDFa world. There was
discussion in the RDF WG earlier this year towards HTMLLiteral or HTML
as a datatype; http://lists.w3.org/Archives/Public/public-rdf-wg/2012May/0612.html
and the latest drafts now have such a datatype:

http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-html
http://www.w3.org/TR/2012/WD-rdf11-concepts-20120605/#section-html
(latest public and editor's drafts seem identical)

"5.2 The rdf:HTML Datatype

RDF provides for HTML content as a possible literal value. This allows
markup in literal values. Such content is indicated in an RDF graph
using a literal whose datatype is a special built-in datatype
rdf:HTML. This datatype is defined as follows[...]"

Let's look at the older Microdata example we still publish at
schema.org. Can we talk through how this might look as an HTML
fragment?

First, the current example:

<body itemscope itemtype="http://schema.org/WebPage">
...
<div itemprop="breadcrumb">
  <a href="category/books.html">Books</a> >
  <a href="category/books-literature.html">Literature & Fiction</a> >
  <a href="category/books-classics">Classics</a>
</div> ...
</body>

Now, let's put that in RDFa 1.1, with the whole markup block as the
value of the 'breadcrumb' property:

<body vocab="http://schema.org/" typeof="WebPage">
...
<div property="breadcrumb" datatype="rdf:HTML">
  <a href="category/books.html">Books</a> >
  <a href="category/books-literature.html">Literature & Fiction</a> >
  <a href="category/books-classics">Classics</a>
</div> ...
</body>
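
To make the trade-off concrete: with the whole block carried as one literal, the ordering, labels and links are no longer triples, so a consumer has to parse the markup again to recover them. The following is a minimal sketch of that consumer-side step, assuming the literal's lexical form is exactly the content of the <div> above. The class name BreadcrumbLinks is hypothetical, and nothing here is part of RDFa or of any RDFa parser.

```python
from html.parser import HTMLParser


class BreadcrumbLinks(HTMLParser):
    """Collect (label, href) pairs from <a> elements, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (label, href) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the chunks and collapse runs of whitespace in the label.
            label = " ".join("".join(self._text).split())
            self.links.append((label, self._href))
            self._href = None


# Assumed lexical form of the rdf:HTML literal (the <div> content above).
literal = """
  <a href="category/books.html">Books</a> >
  <a href="category/books-literature.html">Literature & Fiction</a> >
  <a href="category/books-classics">Classics</a>
"""

parser = BreadcrumbLinks()
parser.feed(literal)
parser.close()
for position, (label, href) in enumerate(parser.links, start=1):
    print(position, label, href)
```

The position comes only from document order inside the literal, and the href values come back exactly as written, still relative.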


While this meets our goal of simple markup, I see a couple of
potential problems. Firstly, the name of the datatype looks a little
odd from an HTML markup perspective. Secondly, the RDF spec requires
that all supporting context and declarations be packed into the
markup, and gives relative URIs no well-defined base. So the relative
URIs wouldn't work.

"Any language annotation (lang="…") or XML namespaces (xmlns) desired
in the HTML content must be included explicitly in the HTML literal.
Relative URLs in attributes such as href do not have a well-defined
base URL and are best avoided."
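
A small illustration of why the missing base matters: the same relative reference resolves to different URLs depending on which document the literal is read against. This sketch uses only Python's standard urljoin; both base URLs are made up for the example.

```python
from urllib.parse import urljoin

href = "category/books.html"  # relative reference from the breadcrumb markup

# Two hypothetical documents that could both carry the same literal.
bases = [
    "http://example.com/",
    "http://example.com/shop/item/42.html",
]

resolved = [urljoin(base, href) for base in bases]
for base, url in zip(bases, resolved):
    print(base, "->", url)
```

Once the literal is detached from its source page, there is nothing in the literal itself that says which of these was intended.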

My conclusion so far is that our markup would have to be either

A)
<body vocab="http://schema.org/" typeof="WebPage">
...
<div property="breadcrumb" datatype="rdf:HTML">
  <a href="http://example.com/category/books.html">Books</a> >
  <a href="http://example.com/category/books-literature.html">Literature & Fiction</a> >
  <a href="http://example.com/category/books-classics">Classics</a>
</div> ...
</body>

B) put <base href="http://example.com/"> in the HTML <head>.

From http://www.w3.org/TR/2012/REC-rdfa-core-20120607/#s_curieprocessing
I understand that an RDFa 1.1 parser will help by resolving relative
URI paths, but only for the values of the core RDFa attributes. Am I
correct to understand that they will not rewrite rdf:HTML markup
blocks to make URI references absolute?
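
If parsers do not do this, the rewriting would have to happen on the publisher side (option A) or in a consumer that still knows the source page's URL. A rough sketch of such a step, assuming a known base URL and double-quoted href attributes only; the function name absolutize_hrefs is hypothetical, and a real tool would use an HTML parser rather than a regular expression.

```python
import re
from urllib.parse import urljoin

# Matches double-quoted href attributes only (an assumption of this sketch).
HREF = re.compile(r'href="([^"]*)"')


def absolutize_hrefs(fragment: str, base: str) -> str:
    """Return the fragment with each double-quoted href resolved against base."""
    return HREF.sub(
        lambda m: 'href="%s"' % urljoin(base, m.group(1)),
        fragment,
    )


fragment = (
    '<a href="category/books.html">Books</a> > '
    '<a href="category/books-classics">Classics</a>'
)
absolute = absolutize_hrefs(fragment, "http://example.com/")
print(absolute)
```

References that are already absolute pass through unchanged, since urljoin returns them as they are.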


Apologies for the long mail, but both crawl data and schema.org site
logs show that breadcrumb markup is of great interest to Web
developers, so I would like to do everything possible to explore the
design space while we still have some possibility to fine-tune the
designs at schema.org and in the RDFa/HTML spec.

Does the direction I sketch make sense, from an RDFa WG perspective?
Is there anything we can do to make the markup easier for publishers
and developers? Would another named markup datatype that absolute-ized
relative links be feasible at this stage? Did I miss any other design
options? Would more formal requirements analysis be useful?

cheers,

Dan

Received on Monday, 12 November 2012 12:24:50 UTC