Re: datatyping in RDFa (was Re: Implementing an RDFa parser...) from Elias Torres on 2006-06-04 (public-rdf-in-xhtml-tf@w3.org from June 2006)

From: Elias Torres <elias@torrez.us>
Date: Sun, 4 Jun 2006 08:52:23 -0400
To: "Dan Brickley" <danbri@danbri.org>
Cc: public-rdf-in-xhtml-tf@w3.org
Message-ID: <905f7c910606040552n1abc2776l1f747dbcb546e630@mail.gmail.com>
On 6/4/06, Dan Brickley <danbri@danbri.org> wrote:
>
> (long; sorry...)
>
> * Elias Torres <elias@torrez.us> [2006-06-03 14:43-0400]
> > I decided to implement an RDFa parser [1]
>
> Nice work :)
>
> > Section 6.2
>
> (ie.
> http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax#id0x06668bb8)
> >
> > The examples are missing the geo declaration. I used
> > http://www.w3.org/2003/01/geo/
>
> If you mean the namespace described by that doc, the ns should be
>         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
>
> Judging from the example in 2005-rdfa-syntax, the intent does indeed
> seem to be to use that namespace. This is a good thing since the ns
> is a reasonably popular one, and its usage provided some background for
> efforts such as http://www.georss.org/
>
> Unfortunately...
>
> > Also, some of the triples containing literals are missing
> > ^^rdf:XMLLiteral. I'm not sure what you meant for geo:lat, geo:long,
> > dc:title, foaf:name.
>
> ...we run into this problem. The ranges of the geo: wgs84 properties
> geo:lat and geo:long are *not* rdf:XMLLiteral. When I pointed this out
> before w.r.t. the FOAF examples in the RDFa primer, the result was the
> replacement of FOAF in the examples with
> xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#"  ...a namespace
> which doesn't use rdfs:range at all (but which in practice hasn't been
> used much with RDF datatyping either, and whose properties
> are capitalised btw, unlike those in the primer currently). Dublin Core,
> similarly, hasn't been using rdf:XMLLiteral, but unlike FOAF: and Geo:
> namespaces, doesn't use rdfs:range so the conflict between DC: common
> practice and RDFa examples isn't formally visible.
>
> I see three options here:
>
>  a) change the RDFa stuff again to use a different rdf:XMLLiteral-
>    based namespace.
>  b) change the namespaces RDFa uses to have range of rdf:XMLLiteral
>    on the currently used properties, or add new properties with
>    that range.
>  c) change RDFa to make plain-literals easier to generate.
>
> FWIW in the case of geo:, we did once use datatyping (some numeric type,
> not XMLLiteral). I removed this because the syntactic overhead in
> the RDF/XML notation was too high (eg. it stopped us putting the
> data in XML attributes).
>
> The tradeoffs outlined in
> http://www.w3.org/TR/rdf-concepts/#section-Literals continue to haunt
> us...
>
> [[
> Note: RDF applications may use additional equivalence relations, such as
> that which relates an xsd:string  with an rdf:XMLLiteral corresponding
> to a single text node of the same string.
> ]]
>
> ...is intriguing.  I remember Jeremy Carroll was doing some work in
> this area, per discussions in the Galway SWBDP WG F2F, but I'm not sure
> how far things got.
>
> http://www.w3.org/TR/rdf-sparql-query/#func-str is also relevant
> somehow, also http://www.w3.org/TR/rdf-sparql-query/#tests
>
> "Literals may be cast to typed literals in order to use the SPARQL
> operators."
>
> I believe at least some RDF databases (3Store?) optimistically attempt
> to cast strings into datatypes they're lexically compatible with, so
> that indices can be built in advance of queries. This might have some
> bearing upon our sense for "best practice" in this area; in other words,
> ... should SWBPD be pushing more heavily the use of RDF's datatyping
> construct? Or is it possible to get some of the benefits of datatyping
> without having to use RDF's instance-level datatyping mechanism?
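
A rough sketch of the "optimistic cast" idea mentioned above, in Python. This is not 3Store's code (I have not checked what it actually does); the function name, the simplified lexical patterns and the choice of only two XSD types are all assumptions made for illustration.

```python
import re
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified lexical patterns (assumption: only integer and decimal are tried).
_INTEGER = re.compile(r"[+-]?[0-9]+")
_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def optimistic_cast(lexical):
    """Guess a datatype for a plain literal from its lexical form.

    Returns (datatype IRI, parsed value), or (None, original string)
    when no pattern matches and the literal is left as it was.
    """
    text = lexical.strip()
    if _INTEGER.fullmatch(text):
        return XSD + "integer", int(text)
    if _DECIMAL.fullmatch(text):
        return XSD + "decimal", Decimal(text)
    return None, lexical
```

A store could run something like this at load time and index the parsed value next to the original string, so that numeric comparisons in queries do not need a cast per row.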
>
> Here's where I think we are...
>
> Currently:
>
> In RDF/XML, datatyping is a syntactic burden, and so the
> default is that documents generate plain literals (which carry lang
> tags but don't deal well with markup). These literals are I18N-happy in
> that they're explicitly lang-tagged (whether or not that makes sense
> for the particular property is currently
> mechanically invisible). But they're I18N-unhappy in that they don't
> support inline markup without entity-escaping, thus making it hard to
> use things like Ruby markup (http://en.wikipedia.org/wiki/Ruby_characters).
>
> By contrast, in RDF/A, datatyping is a different kind of syntactic
> burden. The default is that documents generate rdf:XMLLiteral literals
> (which *don't* carry lang tags, but which do allow inline markup;
> useful for Ruby, and for those aforementioned language tags).
>
> Where does this leave folk trying to manage schemas in a syntax-
> independent manner? In an awkward position, I think! The design of
> an RDF property, in practice, is something that is rarely done without
> consideration of its likely deployment in markup. When there are two
> alternate encodings of that property (RDF/XML, RDF/A), each with conflicting
> defaults for textually valued properties, ... schema authors are
> put in a very difficult situation. The only schemas which can
> be used in simple idiomatic markup across both encodings are those
> (such as the ageing vcard-3 proposal from Renato, or Dublin Core)
> which don't (yet) specify a range. It would be a shame if the best
> we can recommend to schema authors at this stage in time (ie. after
> 8 years of RDFS, and 4 years of OWL work) is that schema authors say
> nothing about their properties' ranges.
>
> Having said all that, ... I look back at the RDFa syntax spec,
> and see that it is at least possible to do property generation with
> plain literals:
>
> http://www.w3.org/2001/sw/BestPractices/HTML/2005-rdfa-syntax#id0x08541ce8
> [[
> Without a datatype attribute, the object literal will either be a plain
> literal or an XML literal, depending on whether the content  attribute
> is used. For example, consider the following XHTML with RDF/A which
> designates the author of a web page:
>
> <html xmlns="http://www.w3.org/1999/xhtml">
>     <head property="dc:creator" content="Mark Birbeck">
>         <title>Internet Applications</title>
>     </head>
>     ...
> ]]
>
>
> We can also do things like this:
>
> <span about="http://example.org/foo"
>       property="ex:bar" content="10" datatype="xsd:integer">ten</span>
>
> ...so presumably,
>
>       datatype="rdfs:Literal" would do the job here.
>
>
> But both are kind of ugly, as evidenced by their not being
> used in the current spec examples for geo: and foaf:, perhaps.
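
For what it's worth, the rule quoted from the spec, plus the datatype example, can be written down as a small Python sketch. The function name and the (lexical form, datatype) return shape are mine, not the spec's, and datatype values are left as unexpanded prefixed names. It covers only the cases quoted above.

```python
RDF_XMLLITERAL = "rdf:XMLLiteral"  # kept as a prefixed name for brevity


def literal_for(attrs, inner_xml):
    """Pick the object literal for an element that carries property=.

    attrs     -- dict of the element's attributes
    inner_xml -- the element's content, serialised as a string
    Returns (lexical form, datatype); datatype None means a plain literal.
    """
    datatype = attrs.get("datatype")
    if "content" in attrs:
        # content= supplies the lexical form; plain unless datatype= is given.
        return attrs["content"], datatype
    if datatype is not None:
        # Explicit datatype, lexical form taken from the element content.
        return inner_xml, datatype
    # Neither content= nor datatype=: the default is an XML literal.
    return inner_xml, RDF_XMLLITERAL
```

So the dc:creator example yields a plain literal, the ex:bar example yields "10" typed as xsd:integer, and a bare span with only property= yields an rdf:XMLLiteral.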
>
> What could be done instead? I guess it comes back to the
> "property=" attribute. We could have "xmlproperty=" for rdf:XMLLiteral,
> or "plainproperty" for plain rdfs:Literal.
>
> This is a somewhat esoteric point for content authors, ... will they
> tolerate having an extra property available to them? It's hard to
> know the costs/benefits here. If they don't have some syntactic support
> for getting their datatyping right, ... things will only break further
> down the processing pipeline. Eg. their lat/long properties won't match
> the usual SPARQL queries for the geo: namespace, and their data won't
> show up on the map.
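
To make the breakage concrete: two RDF literals are the same term only if lexical form, datatype and language tag all agree, so an rdf:XMLLiteral "51.47" never matches a query for the plain literal "51.47". A minimal Python sketch follows; the Literal tuple, the match helper and the example.org subject are made up for illustration, and no real SPARQL engine is involved.

```python
from typing import NamedTuple, Optional

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def match(triples, predicate, obj):
    """Return subjects whose (predicate, object) pair equals the pattern exactly."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


# What an RDFa parser would emit by default for <span property="geo:lat">51.47</span>
data = [("http://example.org/place", GEO + "lat", Literal("51.47", RDF_XMLLITERAL))]

# The "usual" query pattern uses a plain literal, so it finds nothing.
plain_hits = match(data, GEO + "lat", Literal("51.47"))
xml_hits = match(data, GEO + "lat", Literal("51.47", RDF_XMLLITERAL))
```

Here plain_hits is empty and xml_hits contains the one subject, even though both literals have the same characters in them.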
>
> My vote would be for adding "plainproperty" (or some better name)
> as a shorthand syntax for "property='foo' datatype='rdfs:Literal' ".
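
If a parser wanted to try the proposed shorthand, the expansion is a simple attribute rewrite done before normal processing. A sketch in Python; "plainproperty" is only the attribute proposed above, not something in the current spec, and the function name is hypothetical.

```python
def expand_plainproperty(attrs):
    """Rewrite the proposed plainproperty= shorthand into property= plus datatype=.

    attrs is a dict of element attributes; a new dict is returned and the
    input is left untouched. Elements without plainproperty= pass through.
    """
    if "plainproperty" not in attrs:
        return dict(attrs)
    expanded = {k: v for k, v in attrs.items() if k != "plainproperty"}
    expanded["property"] = attrs["plainproperty"]
    expanded["datatype"] = "rdfs:Literal"
    return expanded
```

For example, {"plainproperty": "geo:lat"} becomes {"property": "geo:lat", "datatype": "rdfs:Literal"}, after which the existing datatype handling applies unchanged.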

I saw datatype="plaintext" in the spec, which designates the content of
the RDFa element as a plain literal so that the xml:lang attribute can be
taken into account. It's kind of hidden, and I'm not sure if it's what
you are asking for.

Also, notice datatype="xsd:string", which will remove any markup
from the XMLLiteral, turning it into an xsd:string.
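
Roughly how I read the xsd:string case, as a Python sketch: parse the element content and keep only the text nodes. The function name is mine, and it assumes the content is a well-formed fragment with no undeclared namespace prefixes; the spec may define the details differently.

```python
import xml.etree.ElementTree as ET


def strip_markup(xml_literal):
    """Return the concatenated text nodes of an XML fragment, dropping all tags."""
    # Wrap the fragment so mixed content parses as a single document.
    root = ET.fromstring("<wrapper>" + xml_literal + "</wrapper>")
    return "".join(root.itertext())
```

So "Internet <em>Applications</em>" would come out as the string "Internet Applications".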

Just pointing those out. I'll be digesting the rest of your email to
improve my understanding of the issue.

-Elias

>
> ie. this is a design for taking option (c), above.
>
> Thoughts?
>
> Dan
Received on Sunday, 4 June 2006 12:52:39 UTC