RE: datatyping in RDFa (was Re: Implementing an RDFa parser...) from Mark Birbeck on 2006-06-05 (public-rdf-in-xhtml-tf@w3.org from June 2006)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Mon, 5 Jun 2006 17:01:59 +0100
To: <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <025401c688b9$60a57020$6502a8c0@Jan>
Hi Dan,

Elias wrote:
> > 
> > The examples are missing the geo declaration. I used 
> > http://www.w3.org/2003/01/geo/

and you replied, Dan:

> If you mean the namespace described by that doc, the ns should be 
>  	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
> 
> Judging from the example in 2005-rdfa-syntax, the intent does 
> indeed seem to be to use that namespace. This is a good thing 
> since the ns is a reasonably popular one, and its usage 
> provided some background for efforts such as http://www.georss.org/

Yes, the syntax document does use your geo work, as do many of the other
examples we've been working on. I quite like it... :)


> Unfortunately...
> 
> > Also, some of the triples containing literals are missing 
> > ^^rdf:XMLLiteral. I'm not sure what you meant for geo:lat, 
> > geo:long, dc:title, foaf:name.
> 
> ...we run into this problem. The ranges of the geo: wgs84 
> properties geo:lat and geo:long are *not* rdf:XMLLiteral. 
> When I pointed this out before w.r.t. the FOAF examples in 
> the RDFa primer, the result was the replacement of FOAF in 
> the examples with 
> xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#"

On the second point, as far as I know we 'commented out' the FOAF example
because of the whole WebArch thing--although I think that was still prompted
by a comment from you, so that may be what you are remembering. In passing,
I'd just say that I'm keen to get the FOAF example back in again once
everyone is happy with it (and of course, the WebArch question!) since I see
RDFa as an ideal way to publish FOAF documents. At the moment there is quite
a lot of hackery involved in becoming a member of the FOAF community.

On the first point though, about geo:lat and geo:long not being
rdf:XMLLiterals, this is an interesting one. One of the justifications for
making rdf:XMLLiteral the default datatype for the content of <meta> was
that:

  Note: RDF applications may use additional equivalence
  relations, such as that which relates an xsd:string
  with an rdf:XMLLiteral corresponding to a single text
  node of the same string.

(See
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-rdf-XMLLiteral)
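A minimal Python sketch of the optional equivalence that note describes, assuming the rdf:XMLLiteral lexical form is given as a string. It checks whether the literal is a single text node (no child elements) and, if so, whether that text equals a given xsd:string. The helper names are hypothetical, and the sketch is not taken from any RDF toolkit.

```python
import xml.etree.ElementTree as ET


def parse_content(xml_lexical):
    """Parse XML content that may lack a single root element.

    Wrapping the content in a dummy element lets bare text (or several
    sibling elements) be parsed. Returns None if it is not well-formed.
    ElementTree's default parser drops comments and processing
    instructions, so this sketch ignores them.
    """
    try:
        return ET.fromstring("<wrapper>" + xml_lexical + "</wrapper>")
    except ET.ParseError:
        return None


def equivalent_to_string(xml_lexical, s):
    """True if the XML literal is a single text node whose value is s."""
    wrapper = parse_content(xml_lexical)
    if wrapper is None or len(wrapper) > 0:
        return False  # not well-formed, or it contains child elements
    # Character and entity references are resolved by the parser,
    # so "fish &amp; chips" compares equal to "fish & chips".
    return (wrapper.text or "") == s


print(equivalent_to_string("banana", "banana"))                     # True
print(equivalent_to_string('<banana colour="yellow" />', "banana"))  # False
```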

The idea was that since an xsd:string can also be treated as an
rdf:XMLLiteral, we didn't need to get the author to worry about switching
between this:

  <meta property="food:fruit">
    <banana colour="yellow" />
  </meta>

and this:

  <meta property="food:fruit">banana</meta>

Both would be parsed in a consistent way, like this:

  <> food:fruit "<banana colour=\"yellow\" />"^^rdf:XMLLiteral .
  <> food:fruit "banana"^^rdf:XMLLiteral .

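To make that default concrete, here is a minimal Python sketch (an illustration, not the parser the drafts describe) that turns the content of a <meta> element into an rdf:XMLLiteral object. It assumes un-namespaced markup and trims surrounding whitespace to match the output shown; double quotes inside the literal are backslash-escaped, as Turtle/N-Triples string syntax requires. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def turtle_quote(s):
    """Quote a lexical form as a Turtle/N-Triples string literal."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + s.replace("\n", "\\n").replace("\r", "\\r") + '"'


def inner_xml(elem):
    """Serialise an element's content (text and children), not the element itself."""
    parts = [escape(elem.text or "")]
    # ET.tostring() also writes each child's tail text, so tails are not added again.
    parts += [ET.tostring(child, encoding="unicode") for child in elem]
    # Assumption: leading/trailing whitespace is trimmed for readability.
    return "".join(parts).strip()


def meta_triple(meta_markup):
    """Current default: every <meta> yields an rdf:XMLLiteral, even plain text."""
    meta = ET.fromstring(meta_markup)
    obj = turtle_quote(inner_xml(meta)) + "^^rdf:XMLLiteral"
    return "<> %s %s ." % (meta.get("property"), obj)


print(meta_triple('<meta property="food:fruit">\n  <banana colour="yellow" />\n</meta>'))
print(meta_triple('<meta property="food:fruit">banana</meta>'))
```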
However, I think you've drawn out an error in the documents, since the prose
seems to be implying that this rule also applies to <span> and so on, which
is not the case. I think this might be a hangover from some earlier drafts,
since the original drafts of RDFa tried to do everything with attributes and
as a consequence made <meta> no different to <span> or <div>. But after a
lot of work on the IPTC's requirements it became clear that the only easy
way to get reification was to make <meta> and <link> 'special', so this was
changed quite a while ago.

I'll see if we can get this clarified...thanks for flagging it up.

(More on this topic below.)


>  ...a 
> namespace which doesn't use rdfs:range at all (but which in 
> practice hasn't been used much with RDF datatyping either, 
> and whose properties are capitalised btw, unlike those in the 
> primer currently). Dublin Core, similarly, hasn't been using 
> rdf:XMLLiteral, but unlike FOAF: and Geo:
> namespaces, doesn't use rdfs:range so the conflict between 
> DC: common practice and RDFa examples isn't formally visible.
> 
> I see three options here:
> 
>  a) change the RDFa stuff again to use a different rdf:XMLLiteral
>    based namespace.
>  b) change the namespaces RDFa uses to have range of rdf:XMLLiteral
>    on the currently used properties, or add new properties with
>    that range.
>  c) change RDFa to make plain-literals easier to generate.

If we correct these inconsistencies in the documents I think we'll have what
you want.


> FWIW in the case of geo:, we did once use datatyping (some 
> numeric type, not XMLLiteral). I removed this because the 
> syntactic overhead in the RDF/XML notation was too high (eg. 
> it stopped us putting the data in XML attributes). 
> 
> The tradeoffs outlined in
> http://www.w3.org/TR/rdf-concepts/#section-Literals continue 
> to haunt us...
> 
> [[
> Note: RDF applications may use additional equivalence 
> relations, such as that which relates an xsd:string  with an 
> rdf:XMLLiteral corresponding to a single text node of the same string.
> ]] 
>
> ...is intriguing.  I remember Jeremy Carroll was doing some 
> work in this area, per discussions in the Galway SWBPD WG 
> F2F, but I'm not sure how far things got. 

Oops...well, I'll leave my quote of the same thing above, since I'd have to
rewrite the first part of my mail ;)

Just on this though, XSLT (borrowing from XML 1.0) has this notion of
external general parsed entities...basically, the idea of a blob of XML
which need not have a single root element, and so need not be wrapped in
opening and closing tags. This is a more generalised idea, but it includes
the notion of allowing a simple string of text to be classed as XML, as
implied by this quote.


In the remainder of your post, I agree with the issues you raise. If we
assume that <span> and so on no longer generate XML literals, then I believe
the big issue we are left with is that rdf:XMLLiteral does not have a
language. This is a pain, because it's always been a goal that <meta>+RDFa
is a handy way to get i18n into documents in situations where you couldn't
in the past, in particular over on the XHTML side of the RDFa coin, where
we've used the <meta> element to solve a number of accessibility and i18n
issues.

For example, this is not i18ned:

  <img src="..." title="A picture of a banana" />

But this is (or was meant to be until you threw a spanner in the works):

  <img src="...">
    <meta property="title" xml:lang="en">A picture of a banana</meta>
  </img>

as is this:

  <img src="...">
    <meta property="title" xml:lang="en">A picture of a banana</meta>
    <meta property="title" xml:lang="fr">Ceci n'est pas une banane</meta>
  </img>

As you've now made me realise, if we make the default into an XML literal
then it doesn't have a language, and we can't do these things.

One of the things I was assuming way back when we decided to make
rdf:XMLLiteral the default was that a parser might not like having to
distinguish between these two:

  <meta property="food:fruit">
    <banana colour="yellow" />
  </meta>

  <meta property="food:fruit">banana</meta>

But I'm now wondering whether we should just put the onus back on to the
parser, and get it to do the work to distinguish between strings and XML
literals. The above would therefore generate this:

  <> food:fruit "<banana colour=\"yellow\" />"^^rdf:XMLLiteral .
  <> food:fruit "banana" .

This would allow us to put language tags on the strings, as we'd always
thought we could:

  <meta property="food:fruit" xml:lang="en">banana</meta>

  <> food:fruit "banana"@en .

Taking these two factors together (that <span> defaults to a plain literal,
and <meta> is only an rdf:XMLLiteral if it has child elements), I don't
think we need the extra attribute that you are tentatively proposing.
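A minimal Python sketch of the two rules just described, assuming un-namespaced markup, that xml:lang is read only from the element itself (no inheritance from ancestors), and that surrounding whitespace is trimmed. The function names are hypothetical; this is one way a parser could apply the rules, not a specified algorithm.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# ElementTree reports xml:lang under the predeclared XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def turtle_quote(s):
    """Quote a lexical form as a Turtle/N-Triples string literal."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + s.replace("\n", "\\n").replace("\r", "\\r") + '"'


def inner_xml(elem):
    """Serialise an element's content; ET.tostring() includes each child's tail."""
    parts = [escape(elem.text or "")]
    parts += [ET.tostring(child, encoding="unicode") for child in elem]
    return "".join(parts).strip()


def object_literal(elem):
    """Proposed rule: only a <meta> with child elements gives an XML literal."""
    if elem.tag == "meta" and len(elem) > 0:
        return turtle_quote(inner_xml(elem)) + "^^rdf:XMLLiteral"
    # <span>, and <meta> holding only text, give a plain literal,
    # language-tagged when xml:lang is present on the element.
    literal = turtle_quote("".join(elem.itertext()).strip())
    lang = elem.get(XML_LANG)
    return literal + "@" + lang if lang else literal


def triple(markup):
    elem = ET.fromstring(markup)
    return "<> %s %s ." % (elem.get("property"), object_literal(elem))


for markup in [
    '<meta property="food:fruit"><banana colour="yellow" /></meta>',
    '<meta property="food:fruit">banana</meta>',
    '<meta property="food:fruit" xml:lang="en">banana</meta>',
    '<span property="dc:title">A <em>ripe</em> banana</span>',
]:
    print(triple(markup))
```

Note that the <span> case drops its inline markup, which is the trade-off of defaulting <span> to a plain literal.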



> http://www.w3.org/TR/rdf-sparql-query/#func-str is also 
> relevant somehow, also http://www.w3.org/TR/rdf-sparql-query/#tests
> 
> "Literals may be cast to typed literals in order to use the 
> SPARQL operators."
> 
> I believe at least some RDF databases (3Store?) 
> optimistically attempt to cast strings into datatypes they're 
> lexically compatible with, so that indices can be built in 
> advance of queries. This might have some bearing upon our 
> sense for "best practice" in this area; in other words, ... 
> should SWBPD be pushing more heavily the use of RDF's 
> datatyping construct? Or is it possible to get some of the 
> benefits of datatyping without having to use RDF's 
> instance-level datatyping mechanism?
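As an illustration of that kind of optimistic casting (not a description of how 3Store or any other store actually does it), here is a minimal Python sketch that checks a plain literal's lexical form against simplified xsd:integer and xsd:decimal lexical patterns and otherwise falls back to xsd:string. The function name and the choice of types are assumptions.

```python
import re
from decimal import Decimal

# Simplified lexical patterns for two XSD numeric types.
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def optimistic_cast(lexical):
    """Return (datatype, value) for the first type whose lexical space matches."""
    s = lexical.strip()  # numeric XSD types collapse surrounding whitespace
    if INTEGER.fullmatch(s):
        return ("xsd:integer", int(s))
    if DECIMAL.fullmatch(s):
        return ("xsd:decimal", Decimal(s))
    return ("xsd:string", lexical)  # leave anything else as a string


# A plain-literal latitude could then be indexed numerically ahead of queries.
print(optimistic_cast("51.4778"))  # ('xsd:decimal', Decimal('51.4778'))
print(optimistic_cast("banana"))   # ('xsd:string', 'banana')
```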
> 
> Here's where I think we are...
> 
> Currently: 
> 
> In RDF/XML, datatyping is a syntactic burden, and so the 
> default is that documents generate plain literals (which 
> carry lang tags but don't deal well with markup). These 
> literals are I18N-happy in that they're explicitly 
> lang-tagged (whether or not that makes sense for the 
> particular property is currently mechanically 
> invisible). But they're I18N-unhappy in that they don't 
> support inline markup without entity-escaping, thus making it 
> hard to use things like Ruby markup 
> (http://en.wikipedia.org/wiki/Ruby_characters).
> 
> By contrast, in RDF/A, datatyping is a different kind of 
> syntactic burden. The default is that documents generate 
> rdf:XMLLiteral literals (which *don't* carry lang tags, but 
> which do allow inline markup; useful for Ruby, and for those 
> aforementioned language tags).
> 
> Where does this leave folk trying to manage schemas in a 
> syntax-independent manner? In an awkward position, I think! 
> The design of an RDF property, in practice, is something that 
> is rarely done without consideration of its likely 
> deployment in markup. When there are two alternate encodings 
> of that property (RDF/XML, RDF/A), each with conflicting 
> defaults for textually valued properties, ... schema authors 
> are put in a very difficult situation. The only schemas which 
> can be used in simple idiomatic markup across both encodings 
> are those (such as the ageing vcard-3 proposal from Renato, 
> or Dublin Core) which don't (yet) specify a range. It would 
> be a shame if the best we can recommend to schema authors at 
> this stage in time (ie. after 8 years of RDFS, and 4 years of 
> OWL work) is that schema authors say nothing about their 
> properties' ranges.
> 
> Having said all that, ... I look back at the RDFa syntax 
> spec, and see that it is at least possible to do property 
> generation with plain literals:
> 
> http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax#id0x08541ce8
> [[
> Without a datatype attribute, the object literal will either 
> be a plain literal or an XML literal, depending on whether 
> the content attribute is used. For example, consider the 
> following XHTML with RDF/A which designates the author of a web page:
>
> <html xmlns="http://www.w3.org/1999/xhtml">
>   <head property="dc:creator" content="Mark Birbeck">
>     <title>Internet Applications</title>
>   </head>
>   ...
> ]]
> 
> 
> We can also do things like this:
> 
> <span about="http://example.org/foo"
>       property="ex:bar" content="10" datatype="xsd:integer">ten</span>
> 
> ...so presumably, 
> 
>       datatype="rdfs:Literal" would do the job here.
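A minimal Python sketch of how a parser might choose the object literal under the rule quoted above from the syntax draft: with no datatype attribute, a content attribute gives a plain literal and otherwise the element content becomes an rdf:XMLLiteral; with a datatype attribute, the result is a typed literal. Reading datatype="rdfs:Literal" as "plain literal" follows the "presumably" above and is an assumption, as is taking the lexical form from the element's text when content is absent. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def turtle_quote(s):
    """Quote a lexical form as a Turtle/N-Triples string literal."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + s.replace("\n", "\\n").replace("\r", "\\r") + '"'


def inner_xml(elem):
    """Serialise an element's content; ET.tostring() includes each child's tail."""
    parts = [escape(elem.text or "")]
    parts += [ET.tostring(child, encoding="unicode") for child in elem]
    return "".join(parts).strip()


def object_literal(elem):
    content = elem.get("content")
    datatype = elem.get("datatype")
    if datatype is None:
        # No datatype: a content attribute gives a plain literal,
        # otherwise the element's content becomes an XML literal.
        if content is not None:
            return turtle_quote(content)
        return turtle_quote(inner_xml(elem)) + "^^rdf:XMLLiteral"
    # With a datatype, prefer the content attribute as the lexical form,
    # falling back to the element's text with markup dropped.
    lexical = content if content is not None else "".join(elem.itertext()).strip()
    if datatype == "rdfs:Literal":
        return turtle_quote(lexical)  # assumption: rdfs:Literal means "plain"
    return turtle_quote(lexical) + "^^" + datatype


examples = [
    '<head property="dc:creator" content="Mark Birbeck"><title>Internet Applications</title></head>',
    '<span property="ex:bar" content="10" datatype="xsd:integer">ten</span>',
    '<span property="geo:lat" datatype="rdfs:Literal">51.4778</span>',
    '<span property="dc:title">A <em>ripe</em> banana</span>',
]
for markup in examples:
    print(object_literal(ET.fromstring(markup)))
```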
> 
> 
> But both are kind of ugly, as evidenced by their not being 
> used in the current spec examples for geo: and foaf:, perhaps.
> 
> What could be done instead? I guess it comes back to the 
> "property=" attribute. We could have "xmlproperty=" for 
> rdf:XMLLiteral, or "plainproperty" for plain rdfs:Literal.
> 
> This is a somewhat esoteric point for content authors, ... 
> will they tolerate having an extra property available to 
> them? It's hard to know the costs/benefits here. If they 
> don't have some syntactic support for getting their 
> datatyping right, ... things will only break further down the 
> processing pipeline. Eg. their lat/long properties won't 
> match the usual SPARQL queries for the geo: namespace, and 
> their data won't show up on the map.
> 
> My vote would be for adding "plainproperty" (or some better name) 
> as a shorthand syntax for "property='foo' datatype='rdfs:Literal' ".
> 
> ie. this is a design for taking option (c), above.
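The proposed shorthand amounts to rewriting attributes before ordinary property= processing. A minimal Python sketch, in which "plainproperty" and "xmlproperty" are only the placeholder names floated above (not attributes from any RDFa draft), and in which rejecting a clash with an existing property or datatype attribute is an assumed design choice.

```python
import xml.etree.ElementTree as ET

# Placeholder shorthand attributes from the proposal; hypothetical names.
SHORTHANDS = {
    "plainproperty": "rdfs:Literal",
    "xmlproperty": "rdf:XMLLiteral",
}


def expand_shorthands(root):
    """Rewrite shorthand attributes in place as property= plus datatype=."""
    for elem in root.iter():
        for attr, datatype in SHORTHANDS.items():
            value = elem.attrib.pop(attr, None)
            if value is None:
                continue
            # Assumption: mixing a shorthand with explicit attributes is an error.
            if "property" in elem.attrib or "datatype" in elem.attrib:
                raise ValueError("%s conflicts with property/datatype on <%s>"
                                 % (attr, elem.tag))
            elem.set("property", value)
            elem.set("datatype", datatype)
    return root


doc = ET.fromstring('<p><span plainproperty="geo:lat">51.4778</span></p>')
print(ET.tostring(expand_shorthands(doc), encoding="unicode"))
```

Expanding to property= plus datatype= keeps a single processing path in the parser; the shorthand only saves authors from remembering the datatype.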
> 
> Thoughts?
> 
> Dan
Received on Monday, 5 June 2006 16:02:24 UTC