Re: Comments on RDF/A spec from Jeremy Carroll on 2005-10-27 (public-rdf-in-xhtml-tf@w3.org from October 2005)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Thu, 27 Oct 2005 12:00:52 +0100
To: Ben Adida <ben@mit.edu>
CC: public-rdf-in-xhtml-tf@w3.org
Message-ID: <4360B364.5040502@hpl.hp.com>
Inline comments - with some of your text snipped to reduce length

Ben Adida wrote:
>
>> Looks very good. Fixes the inheritance problems of last year's version.
>> Although, with this certain idioms might become a bit wordy (e.g. an 
>> object consisting
>> of a bnode with properties hanging off it, now is best marked up as 
>> explicit triples, one after the other, with no nesting). No change 
>> suggested.
>
> Yes, indeed, could be a bit wordy. I'm trying to partially address 
> that with predicate inheritance under specific circumstances:
>
> http://www.w3.org/2001/sw/BestPractices/HTML/2005-current-issues#predicate-inheritance 
>
>
(long comment - sorry)
Perhaps overly influenced by both RDF/XML and N3 I feel that predicate 
inheritance is not a common usage pattern, and so think any additional 
complexity is probably not worth it. (Trade off between difficulty in 
learning the rules and ease of writing the syntax)

I think Mark's rule of thumb that an HTML author should write what they 
would write and it should mean what they expect it to mean is a good 
one; and that gives us the basics. I think the big improvement in this 
version from last year's version reflects that efforts to second guess 
what people might think more complex structures might mean were probably 
misguided. i.e. overall additional complexity is unhelpful.

Two very basic patterns supported, and used in almost every document, by 
both N3 and RDF/XML are:

1. Making multiple statements with the same subject
2. Making multiple statements about the uri/blank object of some other 
statement.

(2 being a special case of 1)

RDF/A supports 1 but not 2. I think it is OK to take the position that 2 
is just a special case of 1, and the extra overhead in supporting 2 is 
not worth it. Also RDF/A supports @rev that does allow an idiom for case 2.

example:

RDF/XML

<rdf:Description rdf:about="&eg;doc">
<dc:creator>
<rdf:Description>
<ex:creatorType rdf:resource="&ex;Illustrator"/>
<vcard:FullName>Pablo Picasso</vcard:FullName>
</rdf:Description>
</dc:creator>
</rdf:Description>


Triples:


eg:doc dc:creator _:b .
_:b ex:creatorType ex:Illustrator .
_:b vcard:FullName "Pablo Picasso" .

RDF/A

<p>
<link href="[eg:doc]" rev="[dc:creator]"/>
<link href="[ex:Illustrator]" rel="[ex:creatorType]"/>
<meta property="[vcard:FullName]" content="Pablo Picasso"/>
</p>



Which is sort of OK, but has the following weaknesses:
a) the bnode in some sense corresponds to the <p> element in the RDF/A.
This has no motivation and is merely an artifact of the serialization
b) The current RDF/A syntax does not permit us to use a similar construct
in which the bnode _:b sits on a meta or link element (although we
could repeat it on each of the three elements)
This is because a form such as:
<link>
<link href="[eg:doc]" rev="[dc:creator]"/>
<link href="[ex:Illustrator]" rel="[ex:creatorType]"/>
<meta property="[vcard:FullName]" content="Pablo Picasso"/>
</link>
seems to be making a description about reifications of an empty set of 
triples generated by the <link> element (I am assuming that the 
reification rules apply to each of the triples generated by the parent 
<link> or <meta>, although the document only gives examples and 
description when there is exactly one such triple)

A further RDF/A version, which allows inline "Pablo Picasso" would be:

<p about="[_:b]">
<link href="[eg:doc]" rev="[dc:creator]"/>
<link href="[ex:Illustrator]" rel="[ex:creatorType]"/>
<span property="[vcard:FullName]">Pablo Picasso</span>
</p>

although with that the object is an XMLLiteral rather than a plain one.
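As a rough illustration of how the two RDF/A forms above are being read, here is a minimal Python sketch. It is not the RDF/A processing model: it assumes the parent element (or a default blank node, here the hypothetical label `_:b`) is the subject for its direct children, that @rel and @property point away from the subject, and that @rev points towards it. The function names are invented for this example, and literal typing (plain versus XMLLiteral) is deliberately ignored.

```python
import xml.etree.ElementTree as ET


def strip_curie(value):
    """Turn "[eg:doc]" into "eg:doc"; leave other values untouched."""
    if value.startswith("[") and value.endswith("]"):
        return value[1:-1]
    return value


def triples_for(parent_xml, bnode="_:b"):
    """Sketch: derive triples from the direct children of one parent element.

    Assumption: the parent's @about (or a blank node) is the subject.
    """
    parent = ET.fromstring(parent_xml)
    subject = strip_curie(parent.get("about", bnode))
    triples = []
    for child in parent:
        if child.get("rev") is not None:
            # @rev reverses the direction: the href is the subject.
            triples.append((strip_curie(child.get("href")),
                            strip_curie(child.get("rev")),
                            subject))
        if child.get("rel") is not None:
            triples.append((subject,
                            strip_curie(child.get("rel")),
                            strip_curie(child.get("href"))))
        if child.get("property") is not None:
            text = child.get("content")
            if text is None:
                text = "".join(child.itertext())
            triples.append((subject,
                            strip_curie(child.get("property")),
                            '"%s"' % text))
    return triples


example = (
    '<p>'
    '<link href="[eg:doc]" rev="[dc:creator]"/>'
    '<link href="[ex:Illustrator]" rel="[ex:creatorType]"/>'
    '<meta property="[vcard:FullName]" content="Pablo Picasso"/>'
    '</p>'
)
for s, p, o in triples_for(example):
    print(s, p, o, ".")
```

Under these assumptions the sketch prints the same three triples listed earlier, with the <p> element standing in for the bnode.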


>> At least one issue not on list:
>> - language tags in XML Literals, see comment 7 below.
>>
>> 1) encoding
>
> should be fixed with the new document:
> http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax

Works for me.
>
>> 5) 5.1.2.1
>> Minor comment: it is possible to use rdf:XMLLiteral and content 
>> attribute. However an example is hard to construct, more later, 
>> possibly much later.
>
> okay, I'll wait for your example to do something here, but will mark 
> an issue.
>
Let's put that as an action on me, it is not necessary for f2f 
discussion though, so I will probably do it after.
>> 6) Typed literals
>> The document seems to only allow typed literals with content attribute
>> I think we can also permit typed literals with lexical form given by 
>> the concatenation of the text() descendants of the element.
>
> can you say a bit more about this?

OK.

Suggested rule:

If there is a property attribute, then a triple is generated, and the 
object is formed by following rules:

1) If there is a datatype attribute then the resulting literal is a 
typed literal with that type:
and
1.1) If there is a content attribute that is the lexical form of the 
typed literal
1.2) else the lexical form of the typed literal is the string value of 
the xpath expression:
descendant::text() [I think that is correctly expressed, but it may need 
fine tuning]
i.e. the concatenation of the text content of this element and all its 
descendants,
(concatenated in document order)
2) Else
2.1) If there is a content attribute then that is the lexical form, and 
the literal
is a plain literal, using the in-scope xml:lang
2.2) Else the literal has datatype rdf:XMLLiteral and its lexical form is
the exclusive canonicalization of the content of the current element.

1.2 here being the answer to the specific question.
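The suggested rule can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `literal_object` is a hypothetical helper, the element is parsed with the standard library's ElementTree, the in-scope language is passed in by the caller, and the serialization used for case 2.2 is a simple stand-in for exclusive canonicalization, not the real thing.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def literal_object(element, inherited_lang=None):
    """Return (lexical_form, datatype, lang) for an element with @property.

    Follows rules 1.1, 1.2, 2.1 and 2.2 of the suggested rule.
    """
    lang = element.get(XML_LANG, inherited_lang)
    datatype = element.get("datatype")
    content = element.get("content")

    if datatype is not None:
        if content is not None:
            lexical = content                       # rule 1.1
        else:
            # rule 1.2: all descendant text, in document order
            lexical = "".join(element.itertext())
        return (lexical, datatype, None)

    if content is not None:
        return (content, None, lang)                # rule 2.1: plain literal

    # rule 2.2: XMLLiteral. Assumption: plain serialization of the
    # element's content stands in for exclusive canonicalization.
    inner = (element.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )
    return (inner, "rdf:XMLLiteral", None)


el = ET.fromstring(
    '<span property="[dc:date]" datatype="[xsd:date]">2005-<b>10</b>-27</span>'
)
print(literal_object(el))
```

Case 1.2 is the one that answers the question: the markup inside the element is ignored and only the concatenated text becomes the lexical form of the typed literal.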


In addition, here:
>> 8) plain literals from text() nodes
>> There is no method for generating plain literals from the children 
>> text() nodes.
>> Plain literals can only be generated using the @content attribute.
>> This may have been desirable behaviour. No change suggested.
>
> We may want to allow for concatenation of text() nodes... I'll add 
> that as an issue to
>
> http://www.w3.org/2001/sw/BestPractices/HTML/2005-current-issues#plain-literals 
>


For example, I believe the current RSS consensus is that it is better 
not to include markup in the content of a feed. To support this, we 
could have a pseudo-datatype [xh2:plain] that modifies case 1 by: "if the 
datatype is xh2:plain then the result is a plain literal, with language 
tag being the current in-scope xml:lang and the lexical form as in 1.1 
or 1.2".

On the other hand, I believe the I18N group would argue that markup 
should always be allowed for natural language text and the (putative) 
RSS consensus is misguided. Also the KISS rule would argue against the 
[xh2:plain] since we have had to introduce a new concept 'pseudo-datatype'.
> Is this a similar issue to #5 above?
>
No

>> 7) lang tag in XML Literals 5.1.2.1, 4.4.1
>> The behaviour for literal objects, no content attribute, and no 
>> datatype attribute constructs an rdf:XMLLiteral and loses any lang 
>> tag from the context. I suggest this is a mistake, and should be 
>> fixed by inserting a span or div as appropriate.
>
> can you send an example
>

OK. I am looking at:

http://www.bbc.co.uk/turkish/

and its corresponding RSS feed:

http://www.bbc.co.uk/turkish/index.xml

(I was looking at the Persian feed, but I have some Turkish and can't 
read the Persian font so ...)

This HTML fragment

<body lang="tr">

...
<div class="indexheadline"><a href="/turkish/europe/story/2005/10/051027_hamptoncourt.shtml">AB liderlerinin gündemi küreselleşme</a></div>

...
</body>

(actually the lang="tr" isn't there, they currently are only indicating 
language with

<meta http-equiv="Content-Language" content="tr">

I think this may be a temporary glitch)

This corresponds to this RDF/XML


<item 
rdf:about="http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/turkish/europe/story/2005/10/051027_hamptoncourt.shtml">
<title xml:lang="tr">AB liderlerinin gündemi küreselleşme</title>
...
</item>

Mapping this example into RDF/A, XHTML2 we might have:

<body xml:lang="tr">

...
<div class="indexheadline"
  href="/turkish/europe/story/2005/10/051027_hamptoncourt.shtml"
>AB liderlerinin gündemi küreselleşme</div>

...
</body>

Ugggghhhh !!!
I can't make a triple linking the href and the content, I have to 
duplicate the URI :(
So here goes:

<body xml:lang="tr">

...
<div class="indexheadline" 
  href="/turkish/europe/story/2005/10/051027_hamptoncourt.shtml"
>
<span 
about="/turkish/europe/story/2005/10/051027_hamptoncourt.shtml"
rel="[rdf:type]"
href="[rss:item]"
 property="[rss:title]"
>AB liderlerinin gündemi küreselleşme</span></div>

...
</body>

The corresponding RDF/XML according to the current rules is:

<item 
rdf:about="http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/turkish/europe/story/2005/10/051027_hamptoncourt.shtml">
<title rdf:parseType="Literal">AB liderlerinin gündemi küreselleşme</title>
...
</item>

note: the xml:lang has got lost, and I have to use heuristics to know that
AB liderlerinin gündemi küreselleşme
is Turkish, despite this information being explicit in the XHTML2 document.
A possible fix is to insert a <span> in the rules for forming an 
XMLLiteral on which to hold the xml:lang information.
e.g. we could output
<item 
rdf:about="http://www.bbc.co.uk/go/wsy/pub/rss/1.0/-/turkish/europe/story/2005/10/051027_hamptoncourt.shtml">
<title rdf:parseType="Literal"
 ><span xml:lang="tr">AB liderlerinin gündemi küreselleşme</span></title>
...
</item>
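The proposed fix amounts to a small wrapping step when the XMLLiteral is formed. A minimal Python sketch, assuming the in-scope language has already been determined by the processor; `wrap_with_lang` is a hypothetical name, and namespace declarations on the inserted <span> are left out for brevity:

```python
from xml.sax.saxutils import quoteattr


def wrap_with_lang(inner_xml, lang):
    """Wrap XMLLiteral content in a <span> carrying the in-scope xml:lang.

    If no language is in scope, the content is returned unchanged.
    """
    if not lang:
        return inner_xml
    return "<span xml:lang=%s>%s</span>" % (quoteattr(lang), inner_xml)


print(wrap_with_lang("AB liderlerinin gündemi küreselleşme", "tr"))
```

Whether a <span> or a <div> is inserted would depend on whether the content is inline or block level; the sketch only covers the <span> case.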







> -Ben
>
Received on Thursday, 27 October 2005 11:02:50 UTC