The implied @about="": Explanation and some problems

Hello all,

I have an (old) action item to explain why we have the rule for
setting @about on head/body. I'll explain it, and then also flag up
some problems with it.

The background is this: say you navigate to the following URL in your browser:

  <http://a.b/c/d.e#f>

You don't want the RDFa parser to use that full URL for generating
triples, because it means you'll get a different set of triples
depending on how you navigate to that page:

  <http://a.b/c/d.e#g>
  <http://a.b/c/d.e#h>
  <http://a.b/c/d.e#i>

etc.

So instead, we want to say that any fragment identifiers are removed
when using the URL as a subject. However, creating such a rule is a
little protocol-specific -- the document could in principle come from
anywhere -- so instead I added rules that effectively coerce the first
subject to being an absolute URL, created from the relative URL of "".

The reason this works is this. Say you have an algorithm for creating
an absolute URI, which takes a base path and the path to convert:

  makeAbsolute(base, uri)

If you now feed this function the base URI from our example, and the
relative path of "", then the following *must* be true, according to
RFC 3986:

  makeAbsolute("http://a.b/c/d.e#g", "") == "http://a.b/c/d.e"

In other words, saying that there is an implied @about="" becomes a
protocol independent way of tidying up the URL.

(As it happens, a parser is more likely to do this:

  baseURI = makeAbsolute("http://a.b/c/d.e#g", "")

  subjectURI =  makeAbsolute(baseURI, "")

so the 'tidying up' was done on the base URL, but the effect is the same.)
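
As a rough illustration of the resolution step, here is a minimal Python sketch. The name make_absolute just mirrors the makeAbsolute placeholder above and is not a real library function. The empty reference is handled explicitly with urldefrag, rather than relying on how urljoin happens to treat an empty string.

```python
from urllib.parse import urldefrag, urljoin


def make_absolute(base: str, uri: str) -> str:
    """Resolve uri against base (hypothetical helper, not a library API)."""
    if uri == "":
        # RFC 3986, section 5.2.2: an empty reference keeps the base's
        # scheme, authority, path and query, but not its fragment.
        return urldefrag(base).url
    return urljoin(base, uri)


# However the page was reached, the implied @about="" gives one subject.
for fragment in ("f", "g", "h", "i"):
    assert make_absolute(f"http://a.b/c/d.e#{fragment}", "") == "http://a.b/c/d.e"

# The two-step form: tidy the base first, then resolve the subject.
base_uri = make_absolute("http://a.b/c/d.e#g", "")
subject_uri = make_absolute(base_uri, "")
```

Both forms end up with the same fragment-free subject, which is the whole point of the rule.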

Now, although the principle seems sound, the rules defined in order to
achieve it might be causing some problems. The first is one that I
think was flagged up by Ivan a while ago, but I'll list it here to jog
your memories; if you put a subject onto the root (most likely the
HTML element), then your subject gets overridden when parsing hits the
<head> and <body>:

  <html about="http://somewhereelse.com/">
    ...
  </html>

Perhaps we could just live with that, but advise people that if they
want to do this they should really be using <base>. But either way,
it's still a quirk.

The second issue is the use of @typeof on <body> or <head>. I have a
vague recollection that this also came up in Ivan's example, though I
might be wrong. Either way, it came up for me today when I was asked
to check someone's RDFa documents. They have this in their document:

  <body typeof="foaf:Document">
    ...
  </body>

My parser gave this triple:

  <> a <http://xmlns.com/foaf/0.1/Document> .

and since I was expecting the subject to be a bnode, I assumed there
was a bug in my parser. However, looking at the spec I see that we do
indeed place the 'implied @about' at a higher level than the bnode:

  4. If the [current element] contains no @rel or @rev attribute, then
     the next step is to establish a value for [new subject]. Any of
     the attributes that can carry a resource can set [new subject];

     [new subject] is set to the URI obtained from the first match
     from the following rules:

       @about...@src...@resource...@href, etc.;

     If no URI is provided by a resource attribute, then the first
     match from the following rules will apply:

       if the element is the head or body element then act as if
       there is an empty @about present, and process it according to
       the rule for @about, above;

       if @typeof is present, obtained according to the section on
       CURIE and URI Processing, then [new subject] is set to be a
       newly created [bnode].

       otherwise, if [parent object] is present, [new subject] is set
       to the value of [parent object]. Additionally, if @property is
       not present then the [skip element] flag is set to 'true';

It's these last three rules that we're focusing on.
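
To make the current ordering concrete, here is a simplified Python sketch of how the subject gets chosen. It is only an illustration: the function names, the dict of attributes and the bnode labels are my own assumptions, CURIE processing is left out, and only the four resource attributes named in the quoted text are handled.

```python
from itertools import count
from urllib.parse import urldefrag, urljoin

_bnode_ids = count(1)  # hypothetical source of bnode labels


def resolve(base, uri):
    # An empty reference resolves to the base without its fragment.
    return urldefrag(base).url if uri == "" else urljoin(base, uri)


def new_subject_current(tag, attrs, parent_object, base):
    """Return (new_subject, skip_element) using the order quoted above."""
    for name in ("about", "src", "resource", "href"):
        if name in attrs:
            return resolve(base, attrs[name]), False
    if tag in ("head", "body"):
        # Implied @about="" is checked before @typeof and [parent object].
        return resolve(base, ""), False
    if "typeof" in attrs:
        return f"_:b{next(_bnode_ids)}", False
    if parent_object is not None:
        return parent_object, "property" not in attrs
    return None, False


base = "http://a.b/c/d.e#g"
# @typeof on <body> types the document, not a bnode.
typed_body = new_subject_current("body", {"typeof": "foaf:Document"}, None, base)
# A subject inherited from <html about="..."> is replaced on <body>.
overridden = new_subject_current("body", {}, "http://somewhereelse.com/", base)
```

In this sketch both calls return the document URL as the subject, which matches the two quirks described earlier.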

I would argue that since the intention of setting @about="" on
head/body was simply to 'tidy up' the initial subject so that it
didn't have any fragment identifiers, then the rule that achieves this
should only be applied if the subject wasn't set in any other way.

This would be easily achieved if we moved the rule to the end of the
group of three quoted above. That would solve both problems mentioned
at the top, because:

  * if @typeof is used on <head> or <body> then a bnode is created,
    making it consistent with processing in other situations;

  * if there is a parent subject (i.e., on <html>) then that is used,
    and no 'implied @about' is needed.

You could argue that this then removes an easy way to indicate the
type of the *document*, but I think the answer to that is that the
reordered rules would simply force you to be explicit; if you want to
set the type of the document rather than generating a bnode, then you
would simply do this:

  <body about="" typeof="foaf:Document">
    ...
  </body>
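
For comparison, here is a sketch of the reordered rules, with the head/body rule moved to the end of the group. As before, this is a simplified illustration under my own assumptions (hypothetical function names, a plain dict of attributes, no CURIE processing), not text proposed for the spec.

```python
from itertools import count
from urllib.parse import urldefrag, urljoin

_bnode_ids = count(1)  # hypothetical source of bnode labels


def resolve(base, uri):
    # An empty reference resolves to the base without its fragment.
    return urldefrag(base).url if uri == "" else urljoin(base, uri)


def new_subject_proposed(tag, attrs, parent_object, base):
    """Return (new_subject, skip_element) with head/body checked last."""
    for name in ("about", "src", "resource", "href"):
        if name in attrs:
            return resolve(base, attrs[name]), False
    if "typeof" in attrs:
        return f"_:b{next(_bnode_ids)}", False
    if parent_object is not None:
        return parent_object, "property" not in attrs
    if tag in ("head", "body"):
        # Implied @about="" only applies when nothing else set a subject.
        return resolve(base, ""), False
    return None, False


base = "http://a.b/c/d.e#g"
bnode_body = new_subject_proposed("body", {"typeof": "foaf:Document"}, None, base)
inherited = new_subject_proposed("body", {}, "http://somewhereelse.com/", base)
explicit = new_subject_proposed(
    "body", {"about": "", "typeof": "foaf:Document"}, None, base
)
```

Here @typeof on its own yields a bnode, a subject set on <html> survives, and the explicit @about="" still types the document itself.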


I realise changing the spec or issuing an erratum is not something that
can be taken lightly, so this email is primarily about completing my
action item to explain what the implied @about was all about.

Then I suppose the next step would be to see whether we just live with
the quirks that we have, or whether we want to tidy them up. And if
so, we should probably try to find out if anyone is actually producing
documents with @typeof on <body> or <head>, and if they are, what's
the effect they are trying to achieve.

Regards,

Mark

-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Wednesday, 1 April 2009 09:16:24 UTC