ACTION: MarkB to email TF with his bnode proposal

Hello everyone,

One of the remaining issues with XHTML 2 and its metadata story is how to
handle bnodes. The current proposal in RDF/A is to use an XPointer
mechanism, but it has been described as not being aesthetically pleasing.
Obviously, that's a matter of taste, but perhaps if we have a couple more
proposals to mull over, it may help us to reach an acceptable decision.

First, I'll recap what we want to achieve.


ANONYMOUS NODES
As you all know, in RDF you can make statements about anonymous nodes by
using a bnode. In terms of RDFCONCEPTS, no limit is placed on what form this
bnode can take, other than it must not be from the set of string literals,
and also not from the set of URIs.

To remind ourselves why we would want to do this: suppose we have an
anonymous node that has a property of an email address
"mailto:mark.birbeck@x-port.net" and another anonymous node that has a
property of an email address of "mailto:Steven.Pemberton@cwi.nl", and we
want to say that the two people identified by the email addresses know each
other. This:

 _:a foaf:knows _:b

would do the trick.
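
Spelled out in full, that graph has three triples. Here is a minimal Python
sketch of it. Note the assumptions: foaf:mbox is my guess at the property
carrying the email address (the text above only says "a property of an email
address"), and the tuple representation and the to_lines helper are purely
illustrative, not any real RDF API.

```python
# Each triple is a (subject, predicate, object) tuple of strings.
# Terms starting with "_:" are bnodes; the labels mean nothing outside this graph.
# ASSUMPTION: foaf:mbox is used here as the email-address property.
triples = [
    ("_:a", "foaf:mbox", "<mailto:mark.birbeck@x-port.net>"),
    ("_:b", "foaf:mbox", "<mailto:Steven.Pemberton@cwi.nl>"),
    ("_:a", "foaf:knows", "_:b"),
]


def to_lines(graph):
    """Render one triple per line (prefixed names kept for brevity)."""
    return [f"{s} {p} {o} ." for s, p, o in graph]


for line in to_lines(triples):
    print(line)
```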


BNODES IN XHTML 2
When tackling problems like this, I've tended to look at some syntax and
then ask what we and HTML authors would 'expect' it to mean. So, let's
begin with the CC example given at the end of the current RDF/A draft [1]:

<p>
  This document is licensed under a
  <a rel="cc:license" href="http://creativecommons.org/licenses/by-sa/2.0/">
      Creative Commons License
  </a>
  which, among other things, requires that you provide 
  attribution to the author,
  <a rel="dc:creator" href="http://ben.adida.net">Ben Adida</a>.
</p>

NOTE: RDF/A has changed the inheritance rules, so the current draft is out
of date. In the new syntax, this fragment is making statements about *the
document* and not the <p> -- so ignore the prose just after the example.
      

Now, we can easily make further statements about Ben in our document. There
are a number of ways to do it, but one is this:

  <a rel="dc:creator" href="http://ben.adida.net">Ben Adida</a>.

  <meta about="http://ben.adida.net" property="foaf:name"
    content="Ben Adida" />

However, the big question is, what would we expect to be the meaning of a
similar structure that used fragment identifiers:

  <a rel="dc:creator" href="#ben">Ben Adida</a>.

  <meta id="ben" property="foaf:name"
    content="Ben Adida" />

I think it's clear that '#ben' is an anonymous node, and so intuitively @id
is actually the equivalent of @rdf:nodeID. If this were not the case, then
we would be saying that the creator of the document were an HTML node, and
that HTML node had a foaf:name of "Ben Adida".

An interesting thing here is that this is conveniently what a non-RDF person
would most likely understand it to mean anyway -- that the document was
created by a thing that has the name "Ben Adida". They wouldn't necessarily
insert 'thing' in there, but most authors would surely accept that whatever
it was that created this document, it certainly had the name "Ben Adida".

So we've effectively 'slipped in' anonymous nodes without any real trouble.
And note that this is *not* what would happen if we introduced some special
way of naming anonymous nodes -- such as @bnodeID, or my XPointer proposal
-- since then you have to explain why an anonymous node is different to
another node. (I'll discuss this a little more below, but I'm effectively
saying that the exception is to 'name' a node, not to make it anonymous.)


RDF/XML
As it happens, we have effectively mirrored the RDF/XML syntax. In RDF/XML
you have two ways to name a node:

 * use @rdf:about or its abbreviated form, @rdf:ID;
 * use @rdf:nodeID.

However, since RDF/XML uses striping, the attribute rdf:nodeID is used for
both the subject and object -- this is not possible in RDF/A. We would
therefore need to say something like:

 * @id is equivalent to @rdf:nodeID as the subject;
 * @href with a fragment identifier is equivalent to
   @rdf:nodeID as the object.

(The second bullet is qualified, below.)


ANONYMOUS OR PARTLY ANONYMOUS?
If we accept that this is a better *syntax* than we currently have, the only
issue that remains is what exactly should be serialised. There are two
straightforward choices:

 * serialise the URIs 'as is';
 * do some conversion with a 'bnode formula'.

I favour the second solution, for reasons I'll explain.


SERIALISE 'AS IS'
One possibility is that the triples generated by a document with @id used in
the statements are simply serialised 'as is'. This means in effect there are
no bnodes. Every node with an @id becomes a fully referenceable item from
other locations. Putting aside the philosophical points from the RDF
standpoint, there are actually quite fundamental problems with doing this,
which I'll explain in a moment.


CONVERT TO BNODES
The second solution is, on serialisation, to convert the @id values (and any
@href with a fragment identifier) to bnodes. The consequence is that once in
a triple store, no other triples outside of the original document could
refer to this data -- it would really be anonymous.

Note that there is nothing to stop someone referring to this @id from
another set of triples. However, by taking the second approach (converting
@id to bnodes) we ensure that this external reference is actually about the
HTML node and not Ben. For example, it would be legitimate for some editing
software to say that this node was created on Friday, but by making the node
anonymous we don't end up with the problem that we now have a set of triples
that say that Ben was created by some software on Friday.
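
To make the difference between the two choices concrete, here is a rough
Python sketch. Everything named in it is an assumption for illustration only:
the document URI http://example.org/doc, the property ex:createdOn for the
editing software's statement, and the helper functions themselves.

```python
DOC = "http://example.org/doc"  # hypothetical document URI


def serialise(convert_ids_to_bnodes):
    """Return the triples for the '#ben' example under either policy."""
    ben = "_:ben" if convert_ids_to_bnodes else f"<{DOC}#ben>"
    return [
        (f"<{DOC}>", "dc:creator", ben),
        (ben, "foaf:name", '"Ben Adida"'),
    ]


# A statement made elsewhere about the HTML element, e.g. by editing software.
# ASSUMPTION: ex:createdOn is a made-up property.
external = (f"<{DOC}#ben>", "ex:createdOn", '"Friday"')


def facts_about_creator(graph):
    """Collect (predicate, object) pairs for whatever node is the dc:creator."""
    creators = {o for s, p, o in graph if p == "dc:creator"}
    return sorted((p, o) for s, p, o in graph if s in creators)


as_is = serialise(False) + [external]
as_bnodes = serialise(True) + [external]

# 'As is': the external statement attaches to the creator.
print(facts_about_creator(as_is))
# Converted to bnodes: the creator only has a name; the external
# statement stays on the HTML node.
print(facts_about_creator(as_bnodes))
```

With the 'as is' policy the creator picks up the ex:createdOn statement; with
the bnode policy that statement remains about the element at #ben and the
creator is untouched.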


NAMED NODES
So, the only bit missing is if I really did want to allow people to make
statements about my statements. What if I wanted to actually say that this
really is the definitive location for Ben's data? I would say that we
already have a mechanism, and that is to use @about:

  <a rel="dc:creator" href="#ben">Ben Adida</a>.

  <meta about="#ben" property="foaf:name"
    content="Ben Adida" />

We are therefore asking authors to *explicitly* say that they want their
data to be referenceable from external statements, which I believe is
actually more 'correct' for an HTML author.

To put it a different way, whilst the RDF author makes all their data
accessible and uses rdf:nodeID only when they want to 'hide it', the HTML
author who knows nothing of RDF is by default creating 'local' data. It's
only if they want to 'publish' their data that they need to understand a
little more about RDF, and move from @id to @about.

NOTE: For this to work, we need to be able to distinguish between @href
pointing to an anonymous node, and @href pointing to a named node when
serialising. However, I think this is possible, and in fact, the only check
you need to do is whether an @id exists with that name. If there is no @id,
then we serialise as normal, even if there is no @about with that value.
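
The check is small enough to sketch in Python. This is only an illustration
under several assumptions: the document URI is hypothetical, the @id value is
written without the leading '#', the subject defaults to the document itself
when there is neither @id nor @about, and the real RDF/A inheritance rules
are ignored entirely.

```python
import xml.etree.ElementTree as ET

DOC = "http://example.org/doc"  # hypothetical document URI


def resolve(ref):
    """Resolve a reference against DOC (bare fragments only; enough here)."""
    return f"<{DOC}{ref}>" if ref.startswith("#") else f"<{ref}>"


def triples_from(markup):
    """Extract triples; @id and any @href matching an @id become bnodes."""
    root = ET.fromstring(f"<div>{markup}</div>")
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    triples = []
    for el in root.iter():
        # Subject: @id gives a bnode, @about a named node,
        # otherwise the document itself (a simplification).
        if el.get("id"):
            subject = f"_:{el.get('id')}"
        elif el.get("about"):
            subject = resolve(el.get("about"))
        else:
            subject = f"<{DOC}>"
        if el.get("rel") and el.get("href"):
            href = el.get("href")
            # The only check: does an @id exist with that name?
            anonymous = href.startswith("#") and href[1:] in ids
            obj = f"_:{href[1:]}" if anonymous else resolve(href)
            triples.append((subject, el.get("rel"), obj))
        if el.get("property") and el.get("content") is not None:
            literal = f'"{el.get("content")}"'
            triples.append((subject, el.get("property"), literal))
    return triples


link = '<a rel="dc:creator" href="#ben">Ben Adida</a>'
with_id = link + '<meta id="ben" property="foaf:name" content="Ben Adida" />'
with_about = link + '<meta about="#ben" property="foaf:name" content="Ben Adida" />'

print(triples_from(with_id))     # '#ben' becomes the bnode _:ben
print(triples_from(with_about))  # '#ben' stays a resolvable URI
```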

NOTE: We'd need to decide whether @id and @about can exist on the same
element. At first sight it looks like it would be best if they didn't.


TAG AND URIs
Finally, as I said on the call, I don't believe that munging the URIs into
bnodes breaks anything at the TAG level. The URI for the *HTML node* is
still intact, and @id is still being used in the usual way. What we are
saying is that at the level of addressing HTML nodes, @id retains its
current and accepted use, but at the level of naming concepts (or metadata
for serialisation) we are saying that it does not have the same use, and in
fact, I would say that it *cannot* have its current use without causing the
"Ben was created by GoLive on Friday" type of problems.

Regards,

Mark

[1] <http://www.w3.org/MarkUp/2004/rdf-a.html#div248219168>


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/

Received on Wednesday, 13 April 2005 14:01:57 UTC