[whatwg] Annotating structured data that HTML has no semantics for

On Thu, May 14, 2009 at 1:25 PM, Dan Brickley <danbri at danbri.org> wrote:
> Having HTML5-microdata -to- RDF parsers is pretty critical to having test
> cases that help us all understand where RDFa-Classic and HTML5 diverge. I'm
> very happy to see this work being done and that there are multiple
> implementations.
>
> As far as I can see, the main point of divergence is around URI abbreviation
> mechanisms. But also HTML5 might not have a notion equivalent to RDF/RDFa's
> bNodes construct. The sooner we have these parsers the sooner we'll know for
> sure.

If I understand RDF correctly, the idea is that every term in a triple
can be a URI, except that subjects and objects can instead be blank
nodes, and objects can instead be literals. If we restrict literals to
strings (optionally with languages), then I think all triples must
follow one of these eight patterns:

  <urn:subject> <urn:predicate> <urn:object> .
  <urn:subject> <urn:predicate> "object" .
  <urn:subject> <urn:predicate> "object"@lang .
  <urn:subject> <urn:predicate> _:X .
  _:X <urn:predicate> <urn:object> .
  _:X <urn:predicate> "object" .
  _:X <urn:predicate> "object"@lang .
  _:X <urn:predicate> _:Y .
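
The count of eight comes from two kinds of subject combined with four
kinds of object, with the predicate always a URI. A minimal Python
sketch of that enumeration follows; the kind names ("uri", "bnode",
"literal", "lang_literal") are labels chosen for illustration and are
not taken from any spec:

```python
from itertools import product

# Illustrative labels only; the predicate is always a URI, so it adds no choice.
SUBJECT_KINDS = ["uri", "bnode"]
OBJECT_KINDS = ["uri", "literal", "lang_literal", "bnode"]


def triple_patterns():
    """Return every (subject kind, object kind) combination."""
    return list(product(SUBJECT_KINDS, OBJECT_KINDS))


patterns = triple_patterns()
print(len(patterns))  # 8
```

The pairs come out in the same order as the list above: URI subjects
first, then blank node subjects.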

These cases can be trivially mapped into HTML5 microdata, in the same
order, as:

  <div item>
    <link itemprop="about" href="urn:subject">
    <link itemprop="urn:predicate" href="urn:object">
  </div>

  <div item>
    <link itemprop="about" href="urn:subject">
    <meta itemprop="urn:predicate" content="object">
  </div>

  <div item>
    <link itemprop="about" href="urn:subject">
    <meta itemprop="urn:predicate" content="object" lang="lang">
  </div>

  <div item>
    <link itemprop="about" href="urn:subject">
    <meta itemprop="urn:predicate" item id="X">
  </div>

  <link subject="X" itemprop="urn:predicate" href="urn:object">

  <meta subject="X" itemprop="urn:predicate" content="object">

  <meta subject="X" itemprop="urn:predicate" content="object" lang="lang">

  <meta subject="X" itemprop="urn:predicate" item id="Y">

(There's the caveat about <link> and <meta> being moved into <head> in
some browsers; you can replace them with <a> and <span> instead.)

These aren't the most elegant ways of expressing complex structures
(because they don't make much use of nesting), but hopefully they
demonstrate that it's possible to express any RDF graph (that only
uses string literals) by decomposing it into triples and then writing
each triple as HTML with one of these patterns.
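
To make the "decompose, then write each triple" step concrete, here is
a rough Python sketch that emits the markup patterns from this message
for a single triple. The tuple representation of terms and the helper
names (uri, bnode, literal, attr, triple_to_microdata) are made up for
this example. It also assumes that any blank node used as a subject
appears as the object of some other triple, so that an element with the
matching id exists; a blank node that never appears as an object is not
handled.

```python
from html import escape


# Hypothetical term constructors; terms are plain tuples for illustration.
def uri(value):
    return ("uri", value)


def bnode(name):
    return ("bnode", name)


def literal(text, lang=None):
    return ("literal", text, lang)


def attr(value):
    """Escape a string for use inside a double-quoted HTML attribute."""
    return escape(value, quote=True)


def triple_to_microdata(subject, predicate, obj):
    """Render one triple using the markup patterns shown in this message."""
    if obj[0] == "uri":
        tag, value = "link", f'href="{attr(obj[1])}"'
    elif obj[0] == "literal":
        tag, value = "meta", f'content="{attr(obj[1])}"'
        if obj[2] is not None:
            value += f' lang="{attr(obj[2])}"'
    else:
        # Blank node object: a new item whose id later triples can refer to.
        tag, value = "meta", f'item id="{attr(obj[1])}"'

    # Blank node subjects point at an existing item through 'subject'.
    subject_attr = f'subject="{attr(subject[1])}" ' if subject[0] == "bnode" else ""
    element = f'<{tag} {subject_attr}itemprop="{attr(predicate[1])}" {value}>'
    if subject[0] == "bnode":
        return element

    # URI subjects get their own item with an 'about' link.
    about = f'<link itemprop="about" href="{attr(subject[1])}">'
    return f"<div item>\n  {about}\n  {element}\n</div>"


print(triple_to_microdata(bnode("X"), uri("urn:predicate"), literal("object", "lang")))
# <meta subject="X" itemprop="urn:predicate" content="object" lang="lang">
```

Calling it once per triple and concatenating the results gives the flat,
un-nested form used above.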

(If all the triples using a blank node have the same subject, then you
don't need to use 'id' and 'subject' because you can just nest the
markup instead, I think.)

With my parser (in Firefox 3.0), the output triples (sorted into a
clearer order) are:

  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <urn:subject> <urn:predicate> <urn:object> .
  <urn:subject> <urn:predicate> "object" .
  <urn:subject> <urn:predicate> "object"@lang .
  <urn:subject> <urn:predicate> _:n0 .
  _:n0 <urn:predicate> <urn:object> .
  _:n0 <urn:predicate> "object" .
  _:n0 <urn:predicate> "object"@lang .
  _:n0 <urn:predicate> _:n1 .

which corresponds to what was desired.
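
One way to check that correspondence mechanically is sketched below in
Python. It drops the extra vocab#item triples, renames the parser's
blank nodes with a hand-written mapping (_:n0 to _:X, _:n1 to _:Y), and
compares the result with the eight desired triples. The function names
are invented for this sketch, and a fixed mapping is a shortcut that
works for this small case; it is not a general graph-isomorphism test.

```python
ITEM = "<http://www.w3.org/1999/xhtml/vocab#item>"
P = "<urn:predicate>"

# Terms are kept as N-Triples-style strings for simplicity.
desired = [
    ("<urn:subject>", P, "<urn:object>"),
    ("<urn:subject>", P, '"object"'),
    ("<urn:subject>", P, '"object"@lang'),
    ("<urn:subject>", P, "_:X"),
    ("_:X", P, "<urn:object>"),
    ("_:X", P, '"object"'),
    ("_:X", P, '"object"@lang'),
    ("_:X", P, "_:Y"),
]

# Parser output as listed above, including the four identical item triples.
output = [("<>", ITEM, "<urn:subject>")] * 4 + [
    ("<urn:subject>", P, "<urn:object>"),
    ("<urn:subject>", P, '"object"'),
    ("<urn:subject>", P, '"object"@lang'),
    ("<urn:subject>", P, "_:n0"),
    ("_:n0", P, "<urn:object>"),
    ("_:n0", P, '"object"'),
    ("_:n0", P, '"object"@lang'),
    ("_:n0", P, "_:n1"),
]


def strip_item_triples(triples):
    """Drop the triples whose predicate is the vocab#item URI."""
    return [t for t in triples if t[1] != ITEM]


def rename_bnodes(triples, mapping):
    """Apply a blank node renaming to subjects and objects; return a set."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples}


mapping = {"_:n0": "_:X", "_:n1": "_:Y"}
print(rename_bnodes(strip_item_triples(output), mapping) == set(desired))  # True
```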

So, I can't see any limits on expressivity other than that literals
must be strings. (But I'm not at all an expert on RDF, and I may have
missed something in the microdata spec, so please let me know if I'm
wrong!)

-- 
Philip Taylor
excors at gmail.com

Received on Thursday, 14 May 2009 06:54:21 UTC