Re: [RDFa] rdf:XMLLiteral (was RE: Missing issue on the list: identification of RDFa content) from Mark Birbeck on 2007-03-18 (public-rdf-in-xhtml-tf@w3.org from March 2007)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Sun, 18 Mar 2007 16:49:29 -0700
To: "Elias Torres" <elias@torrez.us>
Cc: "Ian Davis" <iand@internetalchemy.org>, "Ben Adida" <ben@adida.net>, public-rdf-in-xhtml-tf@w3.org
Message-ID: <640dd5060703181649u5952b226sdadc45ddc8b2b565@mail.gmail.com>

Hi Elias/Ian,

I'm afraid two things are still missing for me in this discussion:
first, what we *lose* by using rdf:XMLLiteral, and second, a clear-cut
explanation of why plain literals are *logically* the correct default,
rather than simply someone's 'preference'.

The feature being referred to has been in RDFa and its predecessors
pretty much since the beginning, and there is a strong *logical*
justification for it. I certainly don't mind if some stronger argument
is presented, but I have not yet seen an alternative proposal that
makes the case logically, and in my view there needs to be one.

So, to go through the rationale again, my thinking was this:

1. There are two alternatives: to use plain literals or XML literals.
(There is no logical foundation for defaulting to typed literals with
some other datatype.)

2. Using plain literals seems the most obvious at first sight, since
it is basic and unadorned, and appears to help SPARQL. However, as I
showed, this latter justification is spurious, since SPARQL requires
queries to be constructed in such a way that language and datatype are
ignored _anyway_, even if the data originated from non-RDFa sources.

3. Plain literals have the main problem that they *remove* the XHTML
author's intent; anyone creating RDF by using RDFa in XHTML is
obviously using XHTML; anyone using 'sup' and 'sub' in the title of a
document obviously knows what they are doing at the level of both the
book and XHTML, and this should be preserved.

4. You could argue that such authors should add
@datatype="rdf:XMLLiteral" to really prove that they know what they
are doing, but with all due respect to the proponents of this view,
this is _exactly_ the kind of authoring requirement that has kept
RDF/XML sidelined! The whole idea of RDFa is to allow authors to add
minimal mark-up to their documents to create RDF, almost without
thinking about it, since the result is a whole load of lovely triples
for the 'RDF community' to process...so ease of authoring is a
fundamental design goal which @datatype="rdf:XMLLiteral" completely
breaks.
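
As a rough illustration of what is at stake in points 2 and 3, the
following Python sketch takes one XHTML fragment and derives the two
candidate object values from it: the plain-literal reading (the XPath
string value, with tags dropped) and the XML-literal reading (the
markup kept). The fragment, the dc:title property and both function
names are assumptions made up for this example; namespace handling
and canonicalization, which a real rdf:XMLLiteral needs, are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment carrying an RDFa property (illustrative only).
fragment = '<span property="dc:title">E = mc<sup>2</sup>: A Biography</span>'
root = ET.fromstring(fragment)


def plain_literal(element):
    """XPath-style string value: all descendant text, markup dropped."""
    return "".join(element.itertext())


def xml_literal(element):
    """Element content with child markup preserved (not canonicalized)."""
    parts = [element.text or ""]
    for child in element:
        # ET.tostring() serializes the child together with its tail text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


print(plain_literal(root))
print(xml_literal(root))
```

With the plain-literal reading the superscript collapses into "mc2";
with the XML-literal reading the 'sup' element survives into the
triple's object.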


SUMMARY

I would suggest therefore there are two main points here; the first is
that using plain literals is actually *incorrect* since the author has
used XHTML for their mark-up, and therefore does have XML literals.
Interestingly, RDF/XML has this problem in reverse; since it uses XML
as a 'transparent carrier' for RDF, then the syntax has to provide a
way to flag up XML literals so that a processor knows when the RDF/XML
'contains' XML. We don't have that problem, since there is no point at
which the mark-up can represent *only* RDF--the mark-up is always the
mark-up.
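
For comparison, here is a small Python sketch of the RDF/XML flag
mentioned above. RDF/XML marks an XML literal with
rdf:parseType="Literal" on the property element; the snippet embeds a
hand-written RDF/XML document and checks for that attribute. The
subject URI and the title are invented for the example, and this only
inspects the attribute rather than doing any real RDF/XML parsing.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical RDF/XML document; the book URI is made up for illustration.
rdf_xml = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/book">
    <dc:title rdf:parseType="Literal">E = mc<sup xmlns="http://www.w3.org/1999/xhtml">2</sup></dc:title>
  </rdf:Description>
</rdf:RDF>
"""

doc = ET.fromstring(rdf_xml)

# Locate the dc:title property element under rdf:Description.
title = doc.find("{%s}Description/{%s}title" % (RDF_NS, DC_NS))

# The explicit flag that tells an RDF/XML processor the content is XML.
parse_type = title.get("{%s}parseType" % RDF_NS)
print(parse_type)
```

Without that attribute, an RDF/XML processor would not treat the
nested 'sup' element as literal content, which is the "problem in
reverse" described above.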

The second key point is this: what does the 'RDF community' lose by
making the default datatype rdf:XMLLiteral, anyway? Or to put it the
more important way round, what does the XHTML authoring community gain
by making plain literals the default?

The goal of RDFa since the beginning has been to make it as easy as
possible for XHTML authors to add RDF to their documents, so as to
create a 'foundation' for the semantic web. I would therefore ask that
we don't get distracted by what might seem easier for our triple
stores.

Regards,

Mark


On 16/03/07, Elias Torres <elias@torrez.us> wrote:
>
> Are we neglecting the 80/20 rule? I think it comes in very handy
> sometimes when we have strong technical reasons but no consensus from
> the community.
>
> I also prefer the default be plain literal since that's the most common
> case as I believe Mark has acknowledged. If the author has markup and
> wants XMLLiteral then she just adds datatype. I think that makes more
> sense (even though Mark's email on the subject is so thorough and
> technically convincing) because I am thinking of HTML as well.
>
> For example, if XMLLiteral was so important, then people would need to
> do something similar to what they do in Atom XHTML content payloads:
> they wrap with a div and the xhtml namespace declaration. I think that
> having XMLLiteral default w/o all of the baggage of xmlns prefixes and
> such is not that useful. And if we were to add all of the processing to
> make sure no xmlns prefix declaration is lost, then it's too much work.
>
> Ian Davis wrote:
> >
> > On 16/03/2007 16:20, Ben Adida wrote:
> >> Ian Davis wrote:
> >>> Yes, that doesn't cater for the <sup>2</sup> argument
> >>
> >> What would you suggest in that case with no datatype? Stripping HTML
> >> tags?
> >
> > Taking the string value:
> >
> > http://www.w3.org/TR/xpath#dt-string-value
>
> At first I thought I'd rather have the markup, but I think you are
> right. If we have markup in plain literals we run the risk of dealing
> with crappy displays of titles containing HTML in feeds like with RSS. I
> wonder if we need an XHTML specific datatype like Atom uses to indicate
> in a model that this is XHTML and can be rendered in a browser. Maybe
> XHTMLLiteral extends XMLLiteral? Is this crazy?
>
> -Elias
>
> >
> > Ian
> >
>
>


-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@x-port.net | +44 (0) 20 7689 9232
  http://www.formsPlayer.com | http://internet-apps.blogspot.com

  standards. innovation.
Received on Sunday, 18 March 2007 23:49:33 UTC