Once more on bnodes...

Hi all,

As Steven promised, the following are my attempts to make some progress on
what is hopefully the only remaining issue in our RDF-in-XHTML2 story.

As it stands, the problem is that we need bnodes, but we don't quite know how
to mark them up. Why do we need them? I would say that it's because we don't
want people accidentally making statements about things that they shouldn't
be making statements about. There may be a more 'correct' set of RDF
terminology to explain this, but I'll give you my understanding, which is
that if I say this:

  _:a dc:creator "Mark Birbeck" .

there is no way that someone else can say anything about _:a since they
cannot generate a reference to my anonymous nodes.
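To make that concrete, here is a minimal Python sketch of what a triple store might do when it merges statements from two documents. It assumes triples are plain (subject, predicate, object) string tuples and that bnode labels start with "_:". Each document's labels are renamed apart, so my _:a and someone else's _:a end up as different nodes. The function name merge() and the "_:bN" relabelling scheme are illustrative choices only, not the behaviour of any particular store.

```python
from itertools import count


def merge(*graphs: list[tuple[str, str, str]]) -> list[tuple[str, ...]]:
    """Merge several documents' triples into one store, renaming blank
    nodes apart: a label such as "_:a" only means something inside the
    document that used it."""
    fresh = count()                  # store-wide supply of new labels
    merged = []
    for graph in graphs:
        labels = {}                  # per document: old label -> new label
        for triple in graph:
            renamed = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in labels:
                        labels[term] = f"_:b{next(fresh)}"
                    term = labels[term]
                renamed.append(term)
            merged.append(tuple(renamed))
    return merged


mine = [("_:a", "dc:creator", '"Mark Birbeck"')]
theirs = [("_:a", "dc:creator", '"Someone Else"')]
print(merge(mine, theirs))
```

The two _:a labels come out as _:b0 and _:b1, so nothing in the other document can end up attached to my node, while repeated uses of _:a within one document still refer to a single node.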

If we move to the world of XML, how is this requirement addressed? In
RDF/XML it's addressed by the rdf:nodeID attribute, which can identify either
a subject or an object. Since RDF/XML is 'striped', there is never any
ambiguity about whether we are dealing with a subject or an object, so we
only need one attribute.
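As an illustration of why one attribute is enough in RDF/XML, here is a rough Python sketch that reads a tiny subset of RDF/XML with the standard xml.etree.ElementTree module. Whether rdf:nodeID names a subject or an object is decided purely by where it appears: on a node element or on a property element. The function name and the sample document are hypothetical, and this is nowhere near a complete RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def rdfxml_triples(source: str) -> list[tuple[str, str, str]]:
    """Read a tiny, striped subset of RDF/XML: node elements directly under
    rdf:RDF, each containing property elements. Not a full parser."""
    triples = []
    for node in ET.fromstring(source):
        # Node element: here rdf:nodeID can only name a subject.
        if RDF + "nodeID" in node.attrib:
            subject = "_:" + node.attrib[RDF + "nodeID"]
        else:
            subject = "<" + node.attrib[RDF + "about"] + ">"
        for prop in node:
            # ElementTree writes the tag as {namespace}local; join the two.
            predicate = "<" + prop.tag[1:].replace("}", "", 1) + ">"
            # Property element: the same attribute now names an object.
            if RDF + "nodeID" in prop.attrib:
                obj = "_:" + prop.attrib[RDF + "nodeID"]
            elif RDF + "resource" in prop.attrib:
                obj = "<" + prop.attrib[RDF + "resource"] + ">"
            else:
                obj = '"' + (prop.text or "") + '"'
            triples.append((subject, predicate, obj))
    return triples


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:nodeID="a">
    <dc:creator>Mark Birbeck</dc:creator>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.com/mydoc">
    <dc:creator rdf:nodeID="a"/>
  </rdf:Description>
</rdf:RDF>"""

for t in rdfxml_triples(SAMPLE):
    print(*t, ".")
```

The same rdf:nodeID="a" yields _:a as the subject of the first triple and as the object of the second, with no second attribute needed.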

In XHTML 2 it's not yet addressed, but it's clear that if we were to use
attributes to solve it, we would need two of them, since the syntax for
carrying RDF in XHTML 2 allows one element to be used to represent an entire
triple (we therefore need to differentiate between subjects and objects).

I have looked in depth at a number of possible solutions, and will
re-present them now. The first one I proposed publicly was the XPointer
solution, although it was not the first one I investigated; that was simply
having two attributes. The next solution I proposed was making all @ids
anonymous. I'll look at all three now:

  * two attributes method;
  * 'reverse-@id' method;
  * XPointer method.



TWO ATTRIBUTES
This appears to be the easiest option and, failing agreement on anything
else, is probably acceptable to Steven and me. The solution would be to
mirror RDF/XML by having an attribute that indicates bnodes, but as pointed
out earlier, we would need two of them (the attribute names in the examples
below are just placeholders).
For example, a statement about an anonymous node:

  <meta subjectbnode="a" property="dc:creator">Mark Birbeck</meta>

  _:a dc:creator "Mark Birbeck" .


and a statement using an anonymous node as an object:

  <link rel="dc:creator" objectbnode="b" />

  <> dc:creator _:b .

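A minimal Python sketch of how a serialiser might handle the two placeholder attributes, assuming a single <meta> or <link> element with no namespace, and writing the current document as <> as in the examples above. The attribute names subjectbnode and objectbnode are the placeholders from those examples, not proposed syntax, and element_triples() is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def element_triples(markup: str) -> list[tuple[str, str, str]]:
    """Turn one <meta> or <link> element into triples, using the
    placeholder attributes subjectbnode and objectbnode."""
    el = ET.fromstring(markup)
    # Subject: a bnode if subjectbnode is given, otherwise the current
    # document, written <> as in the examples.
    if "subjectbnode" in el.attrib:
        subject = "_:" + el.attrib["subjectbnode"]
    else:
        subject = "<>"
    if el.tag == "meta":
        # The element content is the literal object.
        return [(subject, el.attrib["property"], '"' + (el.text or "") + '"')]
    if el.tag == "link":
        # Object: a bnode if objectbnode is given, otherwise @href.
        if "objectbnode" in el.attrib:
            obj = "_:" + el.attrib["objectbnode"]
        else:
            obj = "<" + el.attrib["href"] + ">"
        return [(subject, el.attrib["rel"], obj)]
    return []


print(element_triples('<meta subjectbnode="a" property="dc:creator">Mark Birbeck</meta>'))
print(element_triples('<link rel="dc:creator" objectbnode="b" />'))
```

Because a single element carries a whole triple, the serialiser needs to know which end of the triple the bnode belongs to, which is exactly why two attributes are needed here where RDF/XML gets away with one.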

However, if it's so easy, why have we spent so much time and effort trying
to find a better solution? The main reason is that it doesn't really feel
right to augment XHTML 2 with something that really belongs to the domain of
RDF. That probably sounds wrong, since we have provided a means to carry RDF
within XHTML 2! But the difference is that everything we have done along
this road in the last year or more has been aimed at finding ways for the
*ordinary* usage of XHTML to produce RDF. That's why we have leveraged
<meta>, <link> and @rel rather than inventing new tags and attributes.

The fact that some nodes should not be addressable outside of the document
feels to me like a problem to do with triple stores and the use of URIs as
the identifiers for the subjects of statements, so if there is a solution to
the problem that lies outside XHTML 2, then that is to be preferred.

If we have to add two new attributes, we will, but I'd also like to comment
a little more on whether this solution is really as good as people think.
For a start, we would be asking for XML to have two ways of naming things,
depending on how they will be referred to. This isn't the case in RDF/XML
because it makes no claim to be anything other than a serialisation of RDF.
There is no XML @id, for example, so there is no confusion or mixing up of
levels. XHTML, however, is not primarily a serialisation language for RDF,
but a language that carries semantics which we hope to be able to map easily
to RDF. It *does* already have a way of naming nodes, and it seems very odd
to introduce two further ways of naming those same nodes.



REVERSE-@ID
Another proposal I posted to this list was the idea that we flip everything
on its head, so that any reference to a node in the document is *always*
anonymous. This would give us this:

  <link rel="dc:creator" href="#b" />
  <span id="b">...</span>

  <> dc:creator _:b .

A serialiser would simply 'know' that fragments referring to named nodes are
actually references to anonymous nodes. This would mean that the serialiser
would have to look at the target of the statement to see what was being
referred to (although actually that's just an ID look-up).
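A minimal Python sketch of that look-up, assuming the serialiser first collects every @id in the document and then resolves each <link> @href: a fragment naming an element in this document becomes a bnode, and anything else stays a URI. The base URI and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def object_for_href(href: str, ids: set[str], base: str) -> str:
    """Reverse-@id rule: a same-document fragment that names an element
    with @id denotes an anonymous node; anything else stays a URI."""
    if href.startswith("#") and href[1:] in ids:
        return "_:" + href[1:]          # just an ID look-up
    if href.startswith("#"):
        return "<" + base + href + ">"  # fragment with no matching @id
    return "<" + href + ">"


def link_triples(xhtml: str, base: str) -> list[tuple[str, str, str]]:
    root = ET.fromstring(xhtml)
    # Collect every @id first, so a link can point forwards or backwards.
    ids = {el.attrib["id"] for el in root.iter() if "id" in el.attrib}
    return [
        ("<" + base + ">", link.attrib["rel"],
         object_for_href(link.attrib["href"], ids, base))
        for link in root.iter("link")
    ]


doc = '<body><link rel="dc:creator" href="#b" /><span id="b">...</span></body>'
print(link_triples(doc, "http://example.com/mydoc"))
```

With the base set to http://example.com/mydoc this gives the single triple (<http://example.com/mydoc>, dc:creator, _:b), while a link to another document's fragment is left as an ordinary URI.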

Note that this does not stop other people making statements about my
elements--it simply stops your statements and mine running in together. So
if the Creative Commons document has a key paragraph:

  <div id="keyp">
    <meta property="dc:creator">John Doe</meta>
  </div>

we can all make references to it:

  <link rel="cc:xx" href="http://...#keyp" />

but they will not be merged with any statements made inside the target
document:

  _:keyp dc:creator "John Doe" .
  <http://example.com/mydoc> cc:xx <http://...#keyp> .

The idea was that if an author really wanted their statements to 'merge'
with the rest of the world's statements, then they would do this:

  <div about="#keyp">
    <meta property="dc:creator">John Doe</meta>
  </div>

  <http://...#keyp> dc:creator "John Doe" .
  <http://example.com/mydoc> cc:xx <http://...#keyp> .
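The subject side of the same idea can be sketched in Python as well. The assumptions here are that @about always yields a full (global) URI, that an @id on its own yields a local bnode, and that metadata with neither inherits its parent's subject, falling back to the document itself. The base URI http://example.org/license, standing in for the Creative Commons document, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def subject_for(el: ET.Element, inherited: str, base: str) -> str:
    """Subject an element sets for any metadata nested inside it."""
    about = el.get("about")
    if about is not None:
        # Explicit @about: a full URI, so these statements 'merge' with
        # everyone else's statements about the same resource.
        return "<" + (base + about if about.startswith("#") else about) + ">"
    ident = el.get("id")
    if ident is not None:
        # @id alone: under reverse-@id this is a local, anonymous node.
        return "_:" + ident
    return inherited


def meta_triples(el: ET.Element, base: str, subject: str) -> list[tuple[str, str, str]]:
    if el.tag == "meta" and "property" in el.attrib:
        return [(subject, el.attrib["property"], '"' + (el.text or "") + '"')]
    subject = subject_for(el, subject, base)
    triples = []
    for child in el:
        triples.extend(meta_triples(child, base, subject))
    return triples


def document_triples(xhtml: str, base: str) -> list[tuple[str, str, str]]:
    # Metadata with no @about or @id above it is about the document itself.
    return meta_triples(ET.fromstring(xhtml), base, "<" + base + ">")


DOC = """<html>
  <head><meta property="dc:creator">Mark Birbeck</meta></head>
  <body>
    <div id="keyp"><meta property="dc:creator">John Doe</meta></div>
    <div about="#keyp"><meta property="dc:date">2005-05-31</meta></div>
  </body>
</html>"""

for t in document_triples(DOC, "http://example.org/license"):
    print(*t, ".")
```

The statement inside the @id-only <div> comes out about _:keyp, while the one inside the @about <div> comes out about <http://example.org/license#keyp>, and the <head> metadata stays about the document.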

One of the key advantages of this is that the author is generally creating
'local' metadata by default. I say 'generally' because the metadata about
the document is still 'global', and will 'merge' with other data:

  <head>
    <meta property="dc:creator">Mark Birbeck</meta>
  </head>

It's just that now any use of 'second-level' metadata will be 'local' unless
the author explicitly makes it global with @about.

Another advantage of this technique is that it distinguishes between making
further statements about the same subject, and making further statements
about an HTML node. For example, these are all statements about the same
thing:

  <div about="#keyp">
    <meta property="dc:creator">John Doe</meta>
  </div>
  ...
  <div about="#keyp">
    <meta property="dc:date">2005-05-31</meta>
  </div>

And this is a statement about the mark-up:

  <div id="a" about="#keyp">
    <meta property="dc:creator">John Doe</meta>
  </div>
  ...
  <div about="#a">
    <meta property="dc:creator">Jane Doe</meta>
  </div>

This says that the actual license (#keyp) was created by John Doe, and that
the node in the HTML document (#a) that carries this meta-information was
created by Jane Doe.

However, the big drawback is that we want to use @id to identify the target
of a link traversal in XHTML, at the same time as allowing authors to make
statements about the target:

  <a rel="cc:xx" href="http://...#keyp">license</a>

This still works, but over in the document itself, the author is unable to
make any statements about the thing being linked to, since the presence of
@id turns all of their further statements into statements about an anonymous
node. To put it a different way, you can't use the most basic of XHTML's
cross-referencing mechanisms while also making publicly available statements
about your nodes.



XPOINTER
This led me back to my first proposal in the RDF/A draft: the XPointer one!
The syntax is like this:

  <link rel="dc:creator" href="#bnode('b')" />

  <> dc:creator _:b .

Although the XPointer expression has the form of a URI, it should first be
evaluated, so what is actually stored in the triple store is the *result* of
executing the XPointer function, not the unaltered URI. It's essentially a
'request' to the serialiser to come up with a different identifier from the
one it would ordinarily produce.
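A minimal Python sketch of that 'request', assuming the fragment has exactly the form #bnode('label'). The precise grammar for the label is my assumption, as are the pattern and function names. The serialiser evaluates the pointer and stores a bnode, while every other href is resolved to a URI as usual.

```python
import re

# Assumed grammar for the proposed fragment form #bnode('label').
BNODE_POINTER = re.compile(r"#bnode\('([A-Za-z_][A-Za-z0-9_-]*)'\)")


def resolve_href(href: str, base: str) -> str:
    """Evaluate an @href as the serialiser would under the XPointer proposal."""
    match = BNODE_POINTER.fullmatch(href)
    if match:
        # Store the *result* of the pointer, a blank node, rather than
        # the unaltered URI ending in #bnode('b').
        return "_:" + match.group(1)
    if href.startswith("#"):
        return "<" + base + href + ">"   # ordinary same-document fragment
    return "<" + href + ">"


base = "http://example.com/mydoc"
print("<" + base + ">", "dc:creator", resolve_href("#bnode('b')", base), ".")
```

This prints the same triple as the example above, with <> expanded to the hypothetical document URI. Note that no new attribute is involved; the only change is in how the serialiser treats one particular kind of fragment.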

People have objected to it on "aesthetic" grounds, and on the basis that it
creates confusion. The aesthetic side is obviously going to be a matter of
taste, although I will say that using an XPointer scheme does have the merit
of taking the problem out of the realm of mark-up, and of recognising the
whole process for what it is: a request to the serialiser to 'cloak' the
URI. It also saves inventing more attributes, which I think we should try to
avoid if we can; at the moment incorporating RDF/A into XHTML 2 involves
small, incremental changes, but introducing two new attributes creates the
possibility of confusion even amongst those who don't need to use the new
attributes.

As to the comments about confusion, I have to say that I don't agree.
Firstly, I think the people who will usually be using bnodes will understand
this easily. And secondly, any confusion would be with RDF/XML, not with the
abstract RDF model, and I don't feel that we are obliged to provide
continuity with RDF/XML.



CONCLUSION
My original arguments for the XPointer approach were based on a lot of work
to try and find a solution, and having tried a number of alternatives to
accommodate various objections to it, I'm afraid I've been repeatedly drawn
back to it. I'd recommend that people reflect on the issues a little, and
then we simply adopt one solution or the other (flip a coin? ;)).

Regards,

Mark


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/


Received on Monday, 6 June 2005 11:23:38 UTC