Re: Testing Google's Rich Snippets RDFa support

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Wed, 16 Sep 2009 21:20:38 +0100
Message-ID: <640dd5060909161320i8ed9e01q5263f4b317af786a@mail.gmail.com>
To: Philip Taylor <pjt47@cam.ac.uk>
Cc: Toby Inkster <tai@g5n.co.uk>, Othar Hansson <othar@othar.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>

Hi Philip/Toby,

Just to clarify a couple of things:

>> Neither. I am claiming that implementers will often want to implement a
>> superset of RDFa. e.g. they'll want to parse RDFa plus some other HTML
>> semantics (like <blockquote@cite>, <title>, etc).
>
> Ah, that sounds like a valid option. I assume by "superset" you mean
> specifically that the set of RDF triples extracted is a superset of the
> default graph defined by RDFa.

RDFa allows for additional graphs of data to be derived from a
document, but it doesn't allow the default graph to contain more or
less than can be interpreted using its rules.

I don't want to put words into Toby's mouth, but I'm pretty sure that
he's well aware of this -- so by "superset" I think he's just being
very general, and saying "understanding more attribute/element
patterns than the RDFa spec has allowed for", but I don't think he's
being specific and saying "put the derived triples into the default
graph".

Anyway, the way to look at the spec is that it's very draconian when
it comes to the default graph -- we're very precise about the triples
that can and must be generated from particular markup. But it's very
liberal when it comes to other graphs -- you can 'derive' any values
you like from other patterns, assume default prefixes, or whatever.

The idea was to allow people to experiment with new formats and
approaches, but without polluting the default graph. And obviously if
some technique catches on, and finds its way back into the main spec,
then those triples could then be defined to be part of the default
graph.


> But in that case, your earlier proposal ...
>
>> The best way to forgive webmasters who forget to declare the 'v' CURIE
>> prefix would be to pre-populate the "list of URI mappings" which is
>> described as initially empty in
>> <http://www.w3.org/TR/rdfa-syntax/#sec_5.5.>.
>
> ... seems to fail for input like:
>
>  <div xmlns:example="http://example.com/">
>    <span typeof="example:foo">
>      <span typeof="v:Person"> <!-- undeclared prefix v -->
>        <span property="example:bar">baz</span>
>      </span>
>    </span>
>  </div>
>
> Per RDFa, the second typeof must be ignored entirely, so the example:bar
> property will be associated with the example:foo, resulting in some triples
> in the default graph. If the prefix 'v' was pre-populated, the second typeof
> would set a new subject and the example:foo would have no properties, so the
> result would not be a superset of the default graph.

Right. But as discussed in another thread, I'm suggesting that we
issue an erratum on this, because we shouldn't be ignoring the presence
of @typeof (any more than we ignore the presence of @rel or @rev).
Instead we should be ignoring *values* that we don't understand.

(More on this below.)


> I can't think of any straightforward modifications to the RDFa processing
> model that would be forgiving to authors who forget to declare prefixes,
> without sometimes violating the spec by failing to extract triples that are
> meant to be in the default graph. (But maybe I'm just not thinking hard
> enough!)
>
> (I suppose it would always be possible to run the proper RDFa processing
> model once, then run it again in error-forgiveness mode, then union the
> results, which would guarantee it's a superset, but that really doesn't
> sound like a good idea...)

You are right that there is no way we can work out what undeclared
prefixes mean, but there are two things of interest here.

First, if we issue an erratum that says we don't ignore @typeof, but
merely ignore values we don't recognise, then your pattern:

  <div xmlns:example="http://example.com/">
    <span typeof="example:foo">
      <span typeof="???"> <!-- anything! -->
        <span property="example:bar">baz</span>
      </span>
    </span>
  </div>

will always generate two different bnodes for the two @typeof
occurrences, regardless of what the values contain. This means that at
the very least you can say that the property 'example:bar' will never
be attached to an object of type 'example:foo' (whereas with the
current algorithm, you can never be sure of that).
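
As a rough sketch of the difference, the following Python walks the
pattern above under both readings. It is a heavy simplification, not
the RDFa processing model: it looks only at @typeof and @property,
takes the prefix mappings as a hand-written dictionary (ElementTree
does not expose xmlns declarations as attributes), and the names
process, PREFIXES and typeof_always_sets_subject are made up for
illustration.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumption: mappings are supplied by hand instead of read from xmlns.
PREFIXES = {"example": "http://example.com/"}

MARKUP = """
<div xmlns:example="http://example.com/">
  <span typeof="example:foo">
    <span typeof="v:Person">
      <span property="example:bar">baz</span>
    </span>
  </span>
</div>
"""


def resolve(curie, prefixes):
    """Return the full URI for a CURIE, or None if it cannot be resolved."""
    prefix, sep, ref = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + ref


def process(markup, prefixes, typeof_always_sets_subject):
    """Toy walk over @typeof/@property only; returns a list of triples."""
    counter = itertools.count(1)
    triples = []

    def walk(el, subject):
        if "typeof" in el.attrib:
            candidates = (resolve(c, prefixes) for c in el.attrib["typeof"].split())
            types = [t for t in candidates if t is not None]
            # First reading: @typeof with no usable value is ignored entirely.
            # Proposed reading: @typeof always starts a new bnode; only the
            # unrecognised values are dropped.
            if types or typeof_always_sets_subject:
                subject = f"_:b{next(counter)}"
                triples.extend((subject, RDF_TYPE, t) for t in types)
        if "property" in el.attrib:
            for curie in el.attrib["property"].split():
                predicate = resolve(curie, prefixes)
                if predicate is not None:
                    triples.append((subject, predicate, (el.text or "").strip()))
        for child in el:
            walk(child, subject)

    walk(ET.fromstring(markup.strip()), "<>")
    return triples


ignore_attribute = process(MARKUP, PREFIXES, typeof_always_sets_subject=False)
ignore_values = process(MARKUP, PREFIXES, typeof_always_sets_subject=True)
```

With the flag off, 'example:bar' lands on _:b1, the node typed
example:foo. With the flag on, it lands on a second bnode _:b2 that
has no type, whatever the inner @typeof value contains.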

The second thing of interest is the point I made earlier: the spec
*explicitly* allows values that aren't understood according to the
current spec to be processed anyway, as long as the resulting triples
are placed into some other graph. This means that you can add anything
you like to your RDFa processor, provided you are consistent with the
basics.

This means that although it would be wrong in RDFa 1.0 for some
processor to automatically generate a triple for v:Person in the
default graph if there was no prefix mapping for 'v' (as Toby is
saying), it would be perfectly acceptable to assume a default prefix
mapping, as long as the triple is in some other graph.

(Of course this theoretical RDFa processor would need to provide a way
to get at this named graph, but that would simply be a case of
documenting this extension feature in its API.)
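
To make that concrete, here is a minimal sketch of how such a
processor might route type triples. Everything named here is an
assumption for illustration: the graph name
'urn:example:assumed-prefixes', the placeholder namespace bound to
'v', and the add_type helper are not taken from any spec or existing
implementation.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

DEFAULT_GRAPH = "default"
# Hypothetical name for the extension graph a processor would document.
EXTENSION_GRAPH = "urn:example:assumed-prefixes"
# Hypothetical built-in mapping; the namespace is a placeholder.
ASSUMED_PREFIXES = {"v": "http://vocab.example/v#"}


def add_type(graphs, subject, curie, declared, assumed=ASSUMED_PREFIXES):
    """Add an rdf:type triple to the graph its prefix entitles it to.

    Declared prefixes go to the default graph. Prefixes known only from
    the processor's built-in list go to the extension graph. Anything
    else is dropped. Returns the graph name used, or None.
    """
    prefix, sep, ref = curie.partition(":")
    if not sep:
        return None
    if prefix in declared:
        name, base = DEFAULT_GRAPH, declared[prefix]
    elif prefix in assumed:
        name, base = EXTENSION_GRAPH, assumed[prefix]
    else:
        return None
    graphs[name].add((subject, RDF_TYPE, base + ref))
    return name


graphs = defaultdict(set)
declared = {"example": "http://example.com/"}
add_type(graphs, "_:b1", "example:foo", declared)  # goes to the default graph
add_type(graphs, "_:b2", "v:Person", declared)     # goes to the extension graph
```

A consumer that only reads graphs[DEFAULT_GRAPH] sees exactly what a
strict processor would produce for these two values; the guessed
v:Person triple is available only to callers who ask for the
extension graph by name.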

Unfortunately, if we all start relying on our own set of default
prefixes, and encouraging people to produce documents that have
practically nothing in the default graph, and everything in some
second graph, then we're not going to be able to process each other's
data in a generic way. :)

But that's the next step. The key thing for now is to do what you have
done, and encourage implementers to get the default graph exactly
right.

Regards,

Mark

-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Wednesday, 16 September 2009 20:25:44 GMT