Coming to consensus on the default datatype

Hi all,

I see that my "sick leave" only caused an increase in mailing list
traffic :) Maybe I should get sick more often!

I want to address a few of the technical issues here, so that we can
focus on the decision at hand. I also want to clarify the setting for
these decisions, because it's clear that we all have slightly different
motivations. I'll quote from emails sent by Mark and Ian, though I'm not
trying to be complete here. (I don't have time, and I'm not trying to
prove anyone wrong, just trying to reexplain the setting for this
discussion and hopefully reach consensus and mutual appreciation of our
different viewpoints.)

Mark says:
> I would suggest therefore there are two main points here; the first is
> that using plain literals is actually *incorrect* since the author has
> used XHTML for their mark-up, and therefore does have XML literals.
> Interestingly, RDF/XML has this problem in reverse; since it uses XML
> as a 'transparent carrier' for RDF, then the syntax has to provide a
> way to flag up XML literals so that a processor knows when the RDF/XML
> 'contains' XML. We don't have that problem, since there is no point at
> which the mark-up can represent *only* RDF--the mark-up is always the
> mark-up.

I think RDFa may have a similar "problem." At some point, Ian sent the
following challenge triple to embed using RDFa:

=======
<http://example.com/doc> v:name "<span
property="v:nickname">neuro</span>"^^rdf:XMLLiteral .
=======

I'm not entirely sure how we would achieve the above without generating
another triple, which means we are dealing with "something weird." This
is not an argument for plain literals by default, of course, since that
wouldn't fix the above problem. I'm just making the point that we should
think carefully about this: we *do* have issues that are reminiscent of
RDF/XML: content and carrier are mixed. (And maybe Ian's example above
is simply not doable in RDFa, by the way.)
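
To make the "another triple" worry concrete, here is a rough Python sketch.
It is NOT the RDFa processing model: naive_triples is a hypothetical helper
that emits one triple for every element carrying a property attribute, and
the outer span wrapping Ian's literal is my own assumption about how one
might try to mark it up.

```python
import xml.etree.ElementTree as ET


def naive_triples(xhtml, subject):
    """Hypothetical extractor: one triple per element with a property attribute.

    The object is the element's inner markup, serialized as a string.
    """
    root = ET.fromstring(xhtml)
    triples = []
    for el in root.iter():  # document order, root included
        prop = el.get("property")
        if prop is None:
            continue
        # Inner markup = leading text + each child serialized (tail text included).
        inner = (el.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in el
        )
        triples.append((subject, prop, inner))
    return triples


# Assumed markup for Ian's challenge: the literal itself contains a property.
markup = '<span property="v:name"><span property="v:nickname">neuro</span></span>'
triples = naive_triples(markup, "http://example.com/doc")
for t in triples:
    print(t)
```

Under these assumptions the extractor reports the v:name triple Ian asked
for, plus a second v:nickname triple that nobody asked for, which is the
"something weird" I mean.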

I also wonder about the issue that Ivan brought up: what about folks who
want to render their RDF using XHTML (say, their FOAF file), and then
want to add a bit of markup to make it "look nice"? This is going to be
a very common use case, where the markup is added for presentation, not
at all for true semantics. We can't assume the markup is always
meaningful. In other words, we do have to consider when markup plays the
role of carrier, decoration, or content.

What I know for sure is that we have a use case document that we all
discussed that shows HTML and RDF triples that we wish to embed within
that HTML. Taking the FOAF example, plenty of people will want to embed
plain literals. We need to allow this simply, in a DRY-compliant way.

In another email, Mark said:
> And counting triple datatypes is certainly not a logical argument,
> since the whole point I have been making is that RDFa is *not* driven
> by RDF, but by XHTML.

I disagree. RDFa is driven by *both* XHTML and RDF needs. That's why
we're a joint task force. We cannot ignore the needs of the RDF
community just because they're not the same as XHTML authors' needs. We
simply don't have the freedom to start with a vision of RDF "from scratch."

At the same time, where I agree strongly with Mark is that we should use
the advantages of XHTML, because, as we've heard a few times from
observers, "RDFa is what the serialization of RDF should have been to
begin with." There is great power here in making XHTML markup the
default datatype, even without the lang (which can be part of the markup
anyways, right?)

Ian says:
> Either way, RDFa currently provides no way of encoding a triple with a 
> plain literal value using the text that the author has marked up. 
> Currently it requires duplication of the text in a content attribute. 
> This is a serious flaw.

I agree. We should not have this kind of flaw. It should be
straightforward to do what Ian wants to do here.

At the same time, I do see Mark's point about the advantage of using the
full power of markup. It would be unfortunate to limit ourselves to what
RDF has done to date, which is to develop an entirely parallel web for
machines. We're bridging clickable and semantic, right?

Mark says:
> I'm trying to
> stress that RDFa is primarily about being able to extract metadata
> from documents that XHTML authors have created without them really
> being aware of RDF. The idea has always been that if we can make it
> easy for XHTML authors to add metadata then the 'RDF community' will
> benefit.

I know this is your goal, Mark, and I fully encourage it, but it's not
the only goal of members of this group. It's also about embedding RDF in
XHTML. In addition, I think we'd be fooling ourselves to think that
XHTML authors won't need to be aware of RDF at all. As soon as they
declare a namespace, they're going to wonder what the heck it means, and
they're going to have to be aware of *some* concepts of RDF. Hopefully
not everything up to and including bnodes, of course, but *something*.

Mark says:
> But more importantly, there is no social aspect to this issue. RDF
> defines both plain literals and typed literals, and it defines how
> literals are compared for equality. There's not much more to it than
> that, and to suggest that we should look at whether typed literals are
> more common than plain literals is just strange. It's like asking
> which numbers are more common. Should everyone who is 6 ft tall choose
> to be either 5 ft or 7ft, since those numbers are more common than 6?
> It's a non-argument, because we're not dealing with specific numbers,
> but a numbering _system_, and RDF is exactly the same--it's an entire
> system that you cannot cherry-pick.

Here you lose me. No specification lives in a bubble. Having the semweb
community publish their FOAF files as XHTML+RDFa is very much a use case
for this group. There is no doubt they will want to do so with plain
literals. Ergo, we need that to be easy and to show off the advantages
of RDFa, including DRY. Otherwise, they'll be link-rel-alternate'ing
till the cows come home.

Mark says:
> I keep insisting that if an author
> has put mark-up into their document, we should preserve as much of it
> as possible.

Yes, I agree wholeheartedly. We should look to the future, a bright
future where text is always markup, where text without markup is like
Courier font: tasteless and way too plain. I truly believe that. A
closer integration of HTML and RDF is crucial, because I do agree with
Mark that the RDF community missed some opportunities by ignoring HTML,
and that we have *some* power here to do better.

So here's a proposal for discussion. It is only a slight tweak on the
approach already mentioned by Mark and Steven, and it is meant to give us
easy defaults and a reasonably easy override in both directions, all
while staying DRY.

1) Ian's use case:

<http://example.com/doc> dc:title "RDF or Bust" .

========
<h1 property="dc:title">RDF or Bust</h1>
========

2) Mark's use case:

<http://example.com/doc> dc:title "E=mc<sup>2</sup>"^^rdf:XMLLiteral .

========
<h1 property="dc:title">E=mc<sup>2</sup></h1>
========

3) Overriding the type when the markup is for presentation only:

<http://example.com/doc> dc:title "This guy is truly intelligent" .

========
<h1 property="dc:title" datatype="plain">
   This guy is <em>truly</em> intelligent
</h1>
========

4) Overriding the type to stay consistent with other XMLLiteral data stores:

<http://example.com/doc> dc:title "E equals mc squared"^^rdf:XMLLiteral .

========
<h1 property="dc:title" datatype="rdf:XMLLiteral">
   E equals mc squared
</h1>
========
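
The four cases above boil down to one small rule, sketched here in Python.
Treat it as an illustration under assumptions, not as spec text: literal_for
is a hypothetical helper, the whitespace trimming is my guess at what we
would want, and datatypes other than "plain" and rdf:XMLLiteral are left out.

```python
import xml.etree.ElementTree as ET

XML_LITERAL = "rdf:XMLLiteral"


def literal_for(xhtml):
    """Return (lexical_form, datatype) for one element; None means plain literal."""
    el = ET.fromstring(xhtml)
    datatype = el.get("datatype")
    if datatype is None:
        # Default: child markup present -> XML literal, otherwise plain.
        datatype = XML_LITERAL if len(el) > 0 else "plain"
    if datatype == "plain":
        # Drop the markup, keep the text; whitespace collapsing is an assumption.
        text = "".join(el.itertext())
        return " ".join(text.split()), None
    if datatype == XML_LITERAL:
        # Keep the inner markup as-is; outer trimming is an assumption.
        inner = (el.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in el
        )
        return inner.strip(), XML_LITERAL
    raise ValueError("datatype not covered by this sketch: " + datatype)


case1 = literal_for('<h1 property="dc:title">RDF or Bust</h1>')
case2 = literal_for('<h1 property="dc:title">E=mc<sup>2</sup></h1>')
case3 = literal_for('''<h1 property="dc:title" datatype="plain">
   This guy is <em>truly</em> intelligent
</h1>''')
case4 = literal_for('''<h1 property="dc:title" datatype="rdf:XMLLiteral">
   E equals mc squared
</h1>''')
for case in (case1, case2, case3, case4):
    print(case)
```

The point of the sketch is that the author only writes a datatype attribute
in cases 3 and 4, when overriding the default; cases 1 and 2 need nothing
extra, which is where DRY comes from.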


I think we're pretty close to consensus on something like what Mark and
Steven proposed, possibly with the above twist. So let's aim for some
mutual understanding here, and let's get this issue resolved. I think
we're very close.

-Ben

Received on Wednesday, 21 March 2007 01:55:30 UTC