Re: High-level comments on RDFa Syntax Document from Mark Birbeck on 2008-01-06 (public-rdf-in-xhtml-tf@w3.org from January 2008)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Sun, 6 Jan 2008 21:30:45 +0000
To: "Manu Sporny" <msporny@digitalbazaar.com>
Cc: "RDFa mailing list" <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <a707f8300801061330h1c720f10q952c162d0e9e954f@mail.gmail.com>
Hi Manu,

Thanks for an excellent review.

> I spent a good deal of time looking at the processing model and found a
> couple of minor issues with it.
>
> Issue #1:
> The biggest one was in how the "new subject"/"current subject" is set
> and used for completing incomplete triples (hanging rels). I think there
> are some nasty side-effects in the way "new subject" is set and used to
> initialize "current subject". Most notably, it looks like this:
>
> <div resource="#betty" rel="foaf:knows">
>    <span resource="#fred"></span>
> </div>
>
> generates the following triple:
>
> <#betty> <foaf:knows> <#fred> .

No, this still generates:

  <> <foaf:knows> <#betty> .

as usual. The key to whether an attribute is a subject or an object
(or both) is its relationship to other resources. This is unavoidable
if we are to do proper chaining.

Start with the simplest example:

  <div about="#betty" rel="foaf:knows" resource="#fred"></div>

Now say that we want to do some chaining:

  <div about="#betty" rel="foaf:knows" resource="#fred">
    <div rel="foaf:knows" resource="#manu"></div>
  </div>

Since we want to be able to support all sorts of cut-and-paste
permutations, we also support 'incomplete triples', so that the
following syntax has the same meaning:

  <div about="#betty" rel="foaf:knows">
    <div about="#fred" rel="foaf:knows" resource="#manu"></div>
  </div>

That's nice, because if we go back to the beginning:

  <div about="#betty" rel="foaf:knows" resource="#fred"></div>

we could have simply wrapped this with another statement and all would
have been well:

  <div about="#manu" rel="foaf:knows">
    <div about="#betty" rel="foaf:knows" resource="#fred"></div>
  </div>

Powerful cut-and-paste features, in my view.

Now, let's make the hierarchy clearer:

  <div about="#betty" rel="foaf:knows">
    <div resource="#fred"></div>
  </div>

Again, no problem there. But what if I cut-and-paste a relationship
between Fred and you:

  <div about="#betty" rel="foaf:knows">
    <div resource="#fred">
      <div rel="foaf:knows" resource="#manu"></div>
    </div>
  </div>

Since the whole point of chaining is that a resource can be both a
subject and object at certain times, there is no reason that this
should not be parsed, and the middle @resource is both an object and a
subject. But thanks to the power of cut-and-paste, we're left with the
possibility that the author may remove the outer statement concerning
Betty, which would leave:

  <div resource="#fred">
    <div rel="foaf:knows" resource="#manu"></div>
  </div>

In my view that should still be valid. (And in fact if it isn't, the
whole chaining thing falls down.)

Note that the key to the whole thing is whether @rel or @rev is
present on the same element as the attribute in question.
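
One rough way to check the examples above is a toy processor in Python.
This is a sketch only, not the processing model in the draft: it
understands nothing but @about, @rel and @resource, it ignores @rev,
@href, @src, @property and @instanceof, and the name triples_from and the
use of an empty string to stand for <> are made up for illustration.

```python
import xml.etree.ElementTree as ET


def triples_from(markup, base=""):
    """Return (subject, predicate, object) tuples for a tiny RDFa-like subset.

    Hypothetical helper: only @about, @rel and @resource are understood,
    and `base` stands in for <> (the current document).
    """
    triples = []

    def walk(element, subject, pending):
        attrs = element.attrib
        rel = attrs.get("rel")
        resource = attrs.get("resource")

        # @about always names a subject. @resource names a subject only
        # when there is no @rel on the same element.
        new_subject = attrs.get("about")
        if new_subject is None and rel is None:
            new_subject = resource

        if new_subject is not None:
            # A new subject completes any hanging rels left by ancestors.
            for predicate in pending:
                triples.append((subject, predicate, new_subject))
            subject, pending = new_subject, []

        if rel is not None and resource is not None:
            # Complete triple; the object becomes the subject for children.
            triples.append((subject, rel, resource))
            subject, pending = resource, []
        elif rel is not None:
            # Hanging rel: wait for a descendant to supply the object.
            # Simplification: any inherited pending rel is dropped here,
            # a combination that none of the examples above exercise.
            pending = [rel]

        for child in element:
            walk(child, subject, pending)

    walk(ET.fromstring(markup), base, [])
    return triples


print(triples_from('<div resource="#betty" rel="foaf:knows">'
                   '<span resource="#fred"></span></div>'))
# [('', 'foaf:knows', '#betty')]
```

Under these assumptions the wrapped and unwrapped forms of the
Betty/Fred/Manu markup yield the same two triples, and removing the outer
statement about Betty still leaves <#fred> foaf:knows <#manu>.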


>  Issue #2:
> The "recurse" flag is disabled after a triple is generated. This makes
> the parser stop entirely when the first triple is generated - which
> isn't what we want...

It's not quite as bad as you say, but you are right that there is a
flaw in the logic--thanks for spotting it. :)

My thinking was that since it doesn't make any difference if you
unconditionally switch off recursion in all of the following
situations:

  <div property="dc:title">
    E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
  </div>

  <div property="dc:title" datatype="rdf:XMLLiteral">King Lear</div>


  <div property="dc:title">Macbeth</div>

then we might as well turn off [recurse] whenever there is a
[property] value. Unfortunately, recursion should NOT be turned off if
the object literal was obtained via @content, so I'll fix that,
thanks.
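
The intended correction can be put in a few lines of Python. This is only
a sketch of the intent described above, not the wording that will go into
the draft; should_recurse and the plain dict of attributes are
hypothetical names used for illustration.

```python
def should_recurse(attrs):
    """Decide whether child elements still need processing (sketch only)."""
    if "property" not in attrs:
        # No literal is generated, so nothing stops the descent.
        return True
    # With @content the literal comes from the attribute, so the element's
    # children have not been consumed and may still carry RDFa.
    return "content" in attrs


print(should_recurse({"property": "dc:title"}))                    # False
print(should_recurse({"property": "foaf:name", "content": "Manu",
                      "rel": "foaf:knows"}))                       # True
```

The first call corresponds to the three dc:title cases above, where the
literal is taken from the element content; the second is the @content
case that the unconditional rule handled wrongly.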


> Issue #3:
> @instanceof generates a new bnode, even if @about is present? That's how
> I interpreted the processing rules (see HTML file).

Do you mean because I haven't stressed that only one of the rules
applies? If so, I think that's the same point that Ivan raised, and
should be resolved now.


> -----------------------------------------------------------------------
> There were several detail-oriented things that confused me in Section 7,
> the CURIE specification (see HTML file).
> -----------------------------------------------------------------------
> The dbpedia namespace should have a more verbose namespace abbreviation,
> some might confuse p: with a property on the <p> HTML tag in the examples.
>
> p: http://dbpedia.org/property/
>
> Perhaps the following should be used:
>
> dbp: http://dbpedia.org/property/
>
> Or:
>
>  ped: http://dbpedia.org/property/
> dbpedia:  http://dbpedia.org/property/

Fair point. I've gone for 'dbp'.


> There is a lot of material about RDF on the web, and a growing range of tools
> that will support RDFa...

Done.

> Some open technical issues are also identified with the same markup. These
> include an open issue on the interpretation of @instanceof when @about is
> present, and  These include the handling of some unprefixed @rel and @rev
> values, whether @src sets the subject, and whether @instanceof can apply
> to @resource.

The unprefixed @rel/@rev issue has been resolved, and I thought @src
had too. I've added the @resource one that you mention, though.

>  [It's also a gigantic pain in the ass to author RDF/XML by hand...]

:) Would you like a comment to that effect added? Although I obviously
think that XHTML+RDFa is easier to code than RDF/XML, I hadn't put
that in since it seems like a value-laden observation. What do others
think?


> @src [ This is currently under debate, @src might be used to set the subject -
> should be marked via an editor's note ]

I thought this was resolved, although Ivan's view seemed to be that
@src is no longer an object, which is different to how I perceived it.

> <html
>   xmlns="http://www.w3.org/1999/xhtml"
>   xmlns:bib="http://example.org/"
> [This should be xmlns:biblio=http://example.org/biblio/0.1/ to match the URL
> provided later in the document]


You and Ivan both have eagle-eyes! Thanks.


> ...and the RDF Sytax Document [RDF-SYNTAX].

Done.

> URIs are most commonly used to identify web pages, but RDF makes use of them
> as a way to provide unique identifiers for concepts. For example, we could identify
> the subject of all of our statements (the first part of each triple) by using the
> DBPedia [?ref] [Where's this reference?] URI for Albert Einstein, instead of the
> ambiguous string 'Albert':

Added.

> Here 'p:' has been mapped to the URI for DBPedia, and 'foaf:' has been mapped
> to the URI for the 'Friend of a Friend' taxonomy.
> [p: is too short and could be confusing - use dbp: instead]

Done.

> There MUST be a DOCTYPE declaration in the document prior to the root element.
> If present, the public identifier included in the DOCTYPE declaration must reference
> the DTD found in Appendix B - XHTML+RDFa Document Type Definition using its
> Public Identifier. The system identifier may  [Should this be MAY] be modified
> appropriately. [Is the DOCTYPE strictly required, I thought we discussed that it
> SHOULD be there, not MUST be there... what if someone wants to cut/paste a
> snipped of RDFa into their HTML document?]

This is a tricky one. To be fully XHTML conformant the DOCTYPE is
needed, but that doesn't mean that some processor couldn't make use of
a document that doesn't contain the DOCTYPE. This may need further
discussion though, if it is confusing.


>  For further information on using media types with XHTML family markup languages,
> see the informative note [XHTMLMIME]. [Just curious - why are we using this
> instread of "application/xhtml+rdfa"? Isn't xhtml+xml sort of redundant? I'm guessing
> that it's because browsers wouldn't recognize xhtml+rdfa?]

From the XHTML 2 Working Group side of things, XHTML now includes
RDFa, so there would be no distinction.


> A conforming RDFa Processor MUST make available to a consuming application a
> single RDF [graph] containing all possible triples generated by using the rules in the
> Processing Model section. This is the 'default [graph]'. [Should this be [default
> graph]?]

I don't think so; 'graph' is a defined term in RDF, which is what I'm
trying to highlight here. However, I don't think there is a notion of
'default graph', except perhaps in SPARQL.

> if @instanceof is present  and @about is not present, then [new subject] is set to be
> a [bnode];

As per Ivan's comments I've tried to clarify that in this block the
first matching rule applies.

> by using @resource, if present. ... [I thought @resource cannot set the subject on
> the current element, which is what effectively happens in the next step. I thought
> @resource could only set the RDF object, as stated earlier in the document in
> Section 2.1. Should this be called [chained object]?]

I agree that the wording in 2.1 needs tightening up a little, but I
believe this rule to be the only way to be consistent with chaining.

> if [new subject] was set to a non-null value in the previous step, it is now used to:
>
> complete any incomplete triples;
> furnish a new value for [current subject]. [Doesn't this mean that this:
> <div resource="#betty" rel="foaf:knows"><div resource="#jack"></div></div>
> would generate: <#betty> <foaf:knows> <#jack> . -- We don't want that, do we?]

No. The key is the presence of @rel or @rev. (See also the notes at
the top of this email.)


> ... If [direction] is 'forward' then the following triple is generated: [There should be
> a clear distinction between [new subject] and [chained object] -- otherwise we end
> up with the resource generating triples, as shown above.]

See notes at the top.


> subject  the [current subject]  predicate  the predicate from the iterated incomplete
> triple  object  [new subject]
> If [direction] is 'reverse' not 'forward' then this is the triple generated:

I've been going backwards and forwards between using:

  [forwards] == true

and:

  [direction] == 'forwards'

Either way, I generally prefer to have one 'true' condition, and then
a negation to express the opposite condition.
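
With a boolean flag, the completion step reads as one true branch plus
its negation. The sketch below is illustrative only: the reverse case is
not spelled out in the text quoted above, so swapping subject and object
is an assumption based on how @rev is normally understood, and
complete_triple and its parameter names are hypothetical.

```python
def complete_triple(current_subject, predicate, new_subject, forwards):
    """Build the triple for one incomplete triple (sketch only)."""
    if forwards:
        # Forward (@rel): the current subject points at the new subject.
        return (current_subject, predicate, new_subject)
    # Not forwards (@rev, assumed): the same predicate, with subject and
    # object swapped.
    return (new_subject, predicate, current_subject)


print(complete_triple("#betty", "foaf:knows", "#fred", forwards=True))
# ('#betty', 'foaf:knows', '#fred')
```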

> Once all 'incomplete triples' have been resolved, [current subject] is set to [new
> subject]. [This is problematic, see comments about [chained object] above...]

See above.


> that is not present, @src is used [It is currently under debate as to whether @src
> should set the subject or the object] . If none of these are present but @rel or
> @rev is present, then [current object resource] is set to null.

Well...my point is that it could set both, depending on its position.


> ...a string created by concatenating the text nodes and inner content of each of
> the child elements in turn, of the [current element]. The final string includes the
> datatype, as described in [RDFCONCEPTS].

Good point. I need to do a bit more on this anyway, though, since we
agreed that we're going to use the XPath wording.


> Once object resolution is complete, the processor will have two objects, one
> a resource and the other a literal

Done.


> ...Once the triple has been created, the [recurse] flag is set to false. [If the recurse
> flag is set to false at this point, no other triples will be generated from child elements,
> correct? This isn't what we want, is it? I thought we wanted to disable the "recurse"
> flag only when [XML Literal] was the datatype of the current object?]

See notes above.


> If the [recurse] flag is true, all elements that are children of the [current element] are
> processed using the rules described here [But the recurse flag is never true at this
> point, it is always false after a triple is generated!].

Only if there was an object literal, so it's not _quite_ as bad as you
think. But you are right that some 'correct' use-cases won't get
processed properly with my current rules, such as the nested <div> in
this example:

  <div
    about="#manu"
      property="foaf:name" content="Manu"
      rel="foaf:knows"
  >
    <div about="#mark" />
  </div>

I.e., the presence of @property inhibits further processing, even
though the object literal is provided by @content, rather than the
element content. (The same goes for the use of @datatype="".)


>  dbp: http://dbpedia.org/property/

Done.

> @instanceof is unique in that it sets both a predicate and an object at the same
> time, and inline content might set an object if @content is not present, but
> @property is [I thought @instanceof only applied to @about?].

This seems to be a big source of confusion. In my motivation for
@instanceof to behave in the way I proposed, I was trying to argue
that it should apply to the subject of a triple, and then separately
we'd have rules that set the subject. So yes, @about can set the
subject, but so could @src, @resource or @href when they occur on
their own. (This is why I tried to establish the chaining rules before
clarifying the behaviour of @instanceof in the previous long debates.)


> <p about="#bbq" instanceof="cal:Vevent">[Should @instanceof be highlighted
> in red here too?]

Done.


> As described in the previous two sections, @about will always take precedence and
> mark a new subject, but if no @about value is available then @instanceof will do the
> same job, although using an implied identifier, or bnode. [This is a bit confusing... do
> you mean "using an implied identifier, which is a bnode", here? If so, couldn't you
> just say "using a bnode"?]

How about "i.e., a bnode"? I'd like to keep the word 'implied' because
I'm trying to draw attention to the commonality between setting an
identifier _explicitly_ and setting one _implicitly_.


> In this situation, all statements that are 'contained' by the object resource
> representing Germany (the value in @resource) will have the same subject, making
> it easy for authors to add additional statements: [While I agree with allowing the
> author to do this, I don't see how we prevent the following from happening: <div
> resource="#betty" rel="foaf:knows"><span resource="#phil"></span></div> -- and
> I'm pretty sure we never talked about allowing something like this to happen...
> maybe I wasn't there for that discussion?]

There is no problem here; @resource will be an object when @rel is
present, so all you'll get is this:

  <> foaf:knows <#betty> .

Also, the inner @resource will have no effect.


> Note also that the same principle described here applies to @src and @href.
> [So doesn't this mean we can also do: <div href="#betty" rel="foaf:knows"><span
> href="#phil"></span></div> and it would generate <#betty> <foaf:knows> <#phil>
> . ?]

No. It would generate:

  <> foaf:knows <#betty> .

as usual. The key is the presence of @rel or @rev.


> In this example there is This example starts with one incomplete triple:

I'm not quite sure what you're getting at here. I've left it unchanged
for now, but feel free to explain further.


> For example, when @instanceof creates a new bnode (as described above), that will
> be used to complete any incomplete triples' [Is that trailing ' gramatically correct - I
> don't know...].

Eagle-eyes... :) Actually the problem is a missing apostrophe in front
of the word 'incomplete', but well spotted.

Thanks again for a very thorough review, Manu, it's much appreciated.

Regards,

Mark
-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@formsPlayer.com | +44 (0) 20 7689 9232
  http://www.formsPlayer.com |  http://internet-apps.blogspot.com

  standards. innovation.
Received on Sunday, 6 January 2008 21:30:52 UTC