datatyping in RDFa (was Re: Implementing an RDFa parser...)

(long; sorry...)

* Elias Torres <elias@torrez.us> [2006-06-03 14:43-0400]
> I decided to implement an RDFa parser [1] 

Nice work :)

> Section 6.2

(ie.
http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax#id0x06668bb8)
> 
> The examples are missing the geo declaration. I used
> http://www.w3.org/2003/01/geo/

If you mean the namespace described by that doc, the ns should be 
 	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"

Judging from the example in 2005-rdfa-syntax, the intent does indeed
seem to be to use that namespace. This is a good thing since the ns
is a reasonably popular one, and its usage provided some background for 
efforts such as http://www.georss.org/
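
Here's a minimal Python sketch of why the exact namespace string matters. It is only an illustration, not code from any RDFa parser. Prefix expansion is plain string concatenation, and the two prefix maps below are assumptions standing in for the two candidate xmlns:geo declarations. Only the second gives the property URIs the wgs84 schema actually defines.

```python
def expand(curie: str, prefixes: dict[str, str]) -> str:
    """Expand 'prefix:local' by concatenating the namespace URI and local name."""
    prefix, local = curie.split(":", 1)
    return prefixes[prefix] + local

doc_ns = {"geo": "http://www.w3.org/2003/01/geo/"}               # URL of the doc
wgs84_ns = {"geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"}   # the namespace

print(expand("geo:lat", doc_ns))    # http://www.w3.org/2003/01/geo/lat
print(expand("geo:lat", wgs84_ns))  # http://www.w3.org/2003/01/geo/wgs84_pos#lat
```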

Unfortunately...

> Also, some of the triples containing literals are missing
> ^^rdf:XMLLiteral. I'm not sure what you meant for geo:lat, geo:long,
> dc:title, foaf:name.

...we run into this problem. The range of the geo: wgs84 properties
geo:lat and geo:long is *not* rdf:XMLLiteral. When I pointed this out
before w.r.t. the FOAF examples in the RDFa primer, the result was the 
replacement of FOAF in the examples with 
xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#"  ...a namespace
which doesn't use rdfs:range at all (but which in practice hasn't been
used much with RDF datatyping either, and whose properties 
are capitalised btw, unlike those in the primer currently). Dublin Core,
similarly, hasn't been using rdf:XMLLiteral, but unlike the foaf: and geo:
namespaces, it doesn't use rdfs:range, so the conflict between DC common
practice and the RDFa examples isn't formally visible.
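
To make the mismatch concrete, here's a toy Python model of RDF literal terms. It's my own illustration, not the API of any RDF library. The same characters yield three distinct terms depending on whether they come out as a plain literal, an rdf:XMLLiteral or a typed literal. The value 51.47 and the choice of xsd:decimal are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

@dataclass(frozen=True)
class Literal:
    """Toy RDF literal: a lexical form plus either a language tag (plain
    literals) or a datatype URI (typed literals); 'never both' is not enforced."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None  # None means plain literal

plain = Literal("51.47")                             # typical RDF/XML output
xml_lit = Literal("51.47", datatype=RDF_XMLLITERAL)  # RDFa default for element content
typed = Literal("51.47", datatype=XSD_DECIMAL)       # numeric typed literal

# Same characters, three different terms: a lookup keyed on one misses the others.
print(plain == xml_lit, plain == typed, xml_lit == typed)  # False False False
```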

I see three options here:

 a) change the RDFa stuff again to use a different rdf:XMLLiteral-based
   namespace.
 b) change the namespaces RDFa uses to have range of rdf:XMLLiteral
   on the currently used properties, or add new properties with
   that range.
 c) change RDFa to make plain-literals easier to generate.

FWIW in the case of geo:, we did once use datatyping (some numeric type,
not XMLLiteral). I removed this because the syntactic overhead in 
the RDF/XML notation was too high (eg. it stopped us putting the 
data in XML attributes). 

The tradeoffs outlined in
http://www.w3.org/TR/rdf-concepts/#section-Literals continue to haunt
us...

[[
Note: RDF applications may use additional equivalence relations, such as
that which relates an xsd:string  with an rdf:XMLLiteral corresponding
to a single text node of the same string.
]] 

...is intriguing.  I remember Jeremy Carroll was doing some work in
this area, per discussions at the Galway SWBPD WG F2F, but I'm not sure
how far things got. 
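
For what it's worth, one reading of that note can be sketched in a few lines of Python. This is a hypothetical helper, not Jeremy's work and not anything from the spec. It treats an rdf:XMLLiteral as equal to a string when the literal's content parses as a single text node with the same characters. It ignores XML literal canonicalisation and will reject content that uses undeclared namespace prefixes.

```python
import xml.etree.ElementTree as ET

def same_text(xsd_string: str, xml_literal_lexical: str) -> bool:
    """Hypothetical extra equivalence: an rdf:XMLLiteral equals an xsd:string
    when its content is a single text node holding the same characters."""
    # Wrap the literal's lexical form so it parses as one well-formed element.
    root = ET.fromstring("<w>" + xml_literal_lexical + "</w>")
    if len(root) > 0:  # child elements mean inline markup, not a lone text node
        return False
    return (root.text or "") == xsd_string

print(same_text("Mark Birbeck", "Mark Birbeck"))               # True
print(same_text("a < b", "a &lt; b"))                          # True: entity decoded
print(same_text("Tokyo", "<ruby>Tokyo<rt>tokyo</rt></ruby>"))  # False: has markup
```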

http://www.w3.org/TR/rdf-sparql-query/#func-str is also relevant
here, as is http://www.w3.org/TR/rdf-sparql-query/#tests:

"Literals may be cast to typed literals in order to use the SPARQL
operators."

I believe at least some RDF databases (3Store?) optimistically attempt
to cast strings into datatypes they're lexically compatible with, so
that indices can be built in advance of queries. This might have some 
bearing upon our sense for "best practice" in this area; in other words,
... should SWBPD be pushing more heavily the use of RDF's datatyping
construct? Or is it possible to get some of the benefits of datatyping
without having to use RDF's instance-level datatyping mechanism?
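
Something like the following is what I have in mind by optimistic casting. It's a hypothetical sketch, not 3Store's actual code. The lexical patterns follow the XML Schema definitions of xsd:integer and xsd:decimal. The try-integer-first ordering and the function name are my assumptions.

```python
import re
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical spaces from XML Schema: xsd:integer, then xsd:decimal (no exponent).
INTEGER_RE = re.compile(r"[+-]?[0-9]+")
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def guess_value(lexical: str):
    """Return (datatype URI, value) for the first numeric type whose lexical
    space accepts the string, else (None, original string)."""
    s = lexical.strip()  # roughly XML Schema's whitespace collapsing for these types
    if INTEGER_RE.fullmatch(s):
        return XSD + "integer", int(s)
    if DECIMAL_RE.fullmatch(s):
        return XSD + "decimal", Decimal(s)
    return None, lexical

# Build a sortable index over whichever plain literals turned out to be numeric.
values = ["51.47", "-0.45", "10", "ten"]
index = sorted(v for dt, v in map(guess_value, values) if dt is not None)
print(index)
```

The point is that the index can be built at load time from plain literals, before any query asks for a numeric comparison.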

Here's where I think we are...

Currently: 

In RDF/XML, datatyping is a syntactic burden, and so the 
default is that documents generate plain literals (which carry lang 
tags but don't deal well with markup). These literals are I18N-happy in
that they're explicitly lang-tagged (though whether a lang tag makes
sense for any particular property is currently mechanically invisible).
But they're I18N-unhappy in that they don't support inline markup
without entity-escaping, thus making it hard to use things like Ruby
markup (http://en.wikipedia.org/wiki/Ruby_characters).

By contrast, in RDFa, datatyping is a different kind of syntactic 
burden. The default is that documents generate rdf:XMLLiteral literals
(which *don't* carry lang tags, but which do allow inline markup; 
useful for Ruby, and for those aforementioned language tags).

Where does this leave folk trying to manage schemas in a
syntax-independent manner? In an awkward position, I think! In practice,
an RDF property is rarely designed without consideration of its likely
deployment in markup. When there are two alternate encodings of that
property (RDF/XML, RDFa), each with conflicting defaults for textually
valued properties, ... schema authors are put in a very difficult
situation. The only schemas which can be used in simple idiomatic
markup across both encodings are those (such as the ageing vcard-3
proposal from Renato, or Dublin Core) which don't (yet) specify a
range. It would be a shame if the best we can recommend to schema
authors at this point (ie. after 8 years of RDFS, and 4 years of OWL
work) is that they say nothing about their properties' ranges.

Having said all that, ... I look back at the RDFa syntax spec,
and see that it is at least possible to do property generation with
plain literals:

http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax#id0x08541ce8
[[
Without a datatype attribute, the object literal will either be a plain
literal or an XML literal, depending on whether the content  attribute
is used. For example, consider the following XHTML with RDF/A which
designates the author of a web page:

<html xmlns="http://www.w3.org/1999/xhtml">
    <head property="dc:creator" content="Mark Birbeck">
        <title>Internet Applications</title>
    </head>
    ...
]]
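
Read as a processing rule, the quoted text amounts to something like this Python sketch. The function name and its inputs (an attribute dict, plus the element's serialised content and its text) are hypothetical. xml:lang handling is left out. The typed-literal branch takes its lexical form from content= when present, which is my reading rather than something the quoted passage states.

```python
from typing import Optional

def object_literal(attrs: dict[str, str], inner_xml: str,
                   inner_text: str) -> tuple[str, Optional[str]]:
    """Return (lexical form, datatype) for an element carrying property=.
    datatype None means a plain literal."""
    content = attrs.get("content")
    datatype = attrs.get("datatype")
    if datatype is not None:
        # Assumed: typed literals take content= if given, else the element's text.
        return (content if content is not None else inner_text), datatype
    if content is not None:
        return content, None                # content= present: plain literal
    return inner_xml, "rdf:XMLLiteral"      # otherwise the markup itself

head = {"property": "dc:creator", "content": "Mark Birbeck"}
print(object_literal(head, "<title>Internet Applications</title>",
                     "Internet Applications"))  # ('Mark Birbeck', None)
```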


We can also do things like this:

<span about="http://example.org/foo"
      property="ex:bar" content="10" datatype="xsd:integer">ten</span>

...so presumably, 

      datatype="rdfs:Literal" would do the job here.


But both are kind of ugly, as evidenced by their not being 
used in the current spec examples for geo: and foaf:, perhaps.

What could be done instead? I guess it comes back to the 
"property=" attribute. We could have "xmlproperty=" for rdf:XMLLiteral,
or "plainproperty=" for plain rdfs:Literal.

This is a somewhat esoteric point for content authors, ... will they 
tolerate having an extra attribute available to them? It's hard to 
know the costs/benefits here. If they don't have some syntactic support
for getting their datatyping right, ... things will only break further
down the processing pipeline. Eg. their lat/long properties won't match
the usual SPARQL queries for the geo: namespace, and their data won't 
show up on the map.

My vote would be for adding "plainproperty=" (or some better name)
as a shorthand syntax for "property='foo' datatype='rdfs:Literal'".

ie. this is a design for taking option (c), above.
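
The shorthand could be handled as a simple pre-processing step before the usual rules run. Both plainproperty= and reading datatype='rdfs:Literal' as "emit a plain literal" are hypothetical here, since neither is in the current syntax draft. The Python sketch just shows how small the desugaring would be.

```python
from typing import Optional

def desugar(attrs: dict[str, str]) -> dict[str, str]:
    """Rewrite the hypothetical plainproperty= attribute into the longhand
    property= plus datatype='rdfs:Literal'; other elements pass through."""
    if "plainproperty" not in attrs:
        return dict(attrs)
    out = {k: v for k, v in attrs.items() if k != "plainproperty"}
    out["property"] = attrs["plainproperty"]
    out["datatype"] = "rdfs:Literal"
    return out

def plain_object(attrs: dict[str, str], inner_text: str) -> tuple[str, Optional[str]]:
    """Assumed reading: datatype='rdfs:Literal' means a plain literal, with the
    lexical form from content= if present, else the element's text."""
    a = desugar(attrs)
    if a.get("datatype") != "rdfs:Literal":
        raise ValueError("element does not request a plain literal")
    return a.get("content", inner_text), None  # None = no datatype (plain)

span = {"about": "http://example.org/place", "plainproperty": "geo:lat"}
print(desugar(span))
print(plain_object(span, "51.47"))
```

Applied to something like <span plainproperty="geo:lat">51.47</span>, this yields the plain literal "51.47" rather than an rdf:XMLLiteral.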

Thoughts?

Dan

Received on Sunday, 4 June 2006 10:21:12 UTC