Re: [RDFa] rdf:XMLLiteral (was RE: Missing issue on the list: identification of RDFa content) from Ivan Herman on 2007-03-19 (public-rdf-in-xhtml-tf@w3.org from March 2007)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 19 Mar 2007 10:07:42 +0100
To: mark.birbeck@x-port.net
Cc: Ian Davis <iand@internetalchemy.org>, public-rdf-in-xhtml-tf@w3.org
Message-ID: <45FE52DE.8080409@w3.org>

Wow. The discussion has suddenly become really intense. Cheer up, guys,
this is only technology :-)

Anyway...

Mark, I read all the arguments but, I am sorry to say, you still have
not convinced me, and I still believe that the default should be plain
literal....

There is the 'social' aspect Ian referred to several times. Whether
we like it or not, most of the RDF data out there uses plain literals.
In fact, I have very rarely seen graphs with XMLLiteral, and I think I
have used it only once myself. I do not think it is o.k. if the
RDF graphs resulting from RDFa authoring get such a different flavour;
they should 'blend in' with the RDF world. And, although your argumentation
around my old SPARQL issue was logically and technically correct, the
fact still remains that lots of SPARQL queries, though scruffy, work
with some of the assumptions that I made back then (ie, I did not check
the language tag, for example). We cannot ignore that, it *is* part of
the 80/20 cut. And remember: my SPARQL query *did* fail because of that!
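
To make the SPARQL point concrete, here is a minimal Python sketch of why such a query fails. It is a toy model, not a SPARQL engine: the Literal tuple, the two matching helpers and the dc: namespace IRI are assumptions introduced for illustration. It only shows that the plain, language-tagged and XMLLiteral forms of the same string are three distinct RDF terms, so a pattern written against the plain literal matches just one of them.

```python
from typing import NamedTuple, Optional

# Assumed namespace IRIs for the "usual" rdf: and dc: prefix mappings.
RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DOC = "http://example.com/doc"


class Literal(NamedTuple):
    """Toy RDF literal: lexical form plus optional language tag or datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


# The same title as three different RDF terms.
plain = Literal("RDF or Bust")
tagged = Literal("RDF or Bust", lang="en")
xml = Literal("RDF or Bust", datatype=RDF_XMLLITERAL)

graph = [(DOC, DC_TITLE, lit) for lit in (plain, tagged, xml)]


def exact_matches(triples, wanted):
    """Term-equality matching, as a pattern with a literal object would do."""
    return [o for (_s, _p, o) in triples if o == wanted]


def lexical_matches(triples, text):
    """Looser matching on the lexical form only, ignoring tag and datatype."""
    return [o for (_s, _p, o) in triples if o.lexical == text]


print(len(exact_matches(graph, plain)))             # 1: only the plain literal
print(len(lexical_matches(graph, "RDF or Bust")))   # 3: all three variants
```

The looser helper is similar in spirit to filtering on str(?o) in SPARQL, which is the extra step a 'scruffy' query typically leaves out.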

[A small remark to Dan Brickley: even if any XMLLiteral is, in fact, a
general Literal according to RDFS, SPARQL endpoints do not necessarily
have an RDFS reasoner. Ie, in practice this relationship will not be
recognized...]

I also have a more technical issue. You convincingly argue with the
Einstein example:

E = mc<sup>2</sup>: The Most Urgent Problem of Our Time

where the <sup> tag plays an essential, shall we say, semantic role.
True. But I could just as well use another example, like

This guy is <em>truly</em> intelligent

where the author puts in the <em> tag for a visual emphasis only, but
the real "semantics" he/she wants to convey is "This guy is truly
intelligent", in which case the <em> tag really gets in the way (again,
whether this usage of <em> is technically correct or not is beside the
point; this *is* the way it is used many times!). The same holds for a
number of cases: if the text in question is inside a <h1> tag, some sort
of <span>, etc. In all those cases, keeping the extra XML tag in the
graph is counter-intuitive to me. Although I have no statistics, my gut
feeling tells me that these examples are in the majority compared to the
Einstein example.
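
The trade-off between the two examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the TextOnly class and the plain_literal helper are hypothetical names, and nothing here is taken from the RDFa syntax document. The sketch drops the tags and keeps the character data, which is one plausible reading of 'default to a plain literal'.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_literal(fragment: str) -> str:
    """Return the text content of a mark-up fragment (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


# The <em> case: nothing of value is lost.
print(plain_literal("This guy is <em>truly</em> intelligent"))
# The <sup> case: the exponent collapses into "mc2".
print(plain_literal("E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"))
```

The first call yields "This guy is truly intelligent", which is what the author meant; the second yields "E = mc2: ...", which is exactly the loss the Einstein example warns about. The question is therefore which of the two cases deserves to be the default.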

Finally, Ian's lingering question is still around: if I *want* a plain
literal, ie, I *want* the system to get rid of the extra xml tags, what
do I do? Does RDFa want to introduce yet another keyword for this? Why
not follow the default mechanism that is used by both RDF/XML and Turtle,
and that nobody seems to have a problem with?

I am not 'officially' part of the Working Group, for obvious reasons,
but I would think that this is an issue that, eventually, should be
voted upon by the group. This discussion has dragged on for a long time
and, somehow, should be closed...

Ivan

Mark Birbeck wrote:
> 
> Hi Ian,
> 
>> I'm looking specifically at the default of XMLLiteral. I've read Mark's
>> clear explanation of the thinking behind it[1] and understand the
>> arguments. My preference, from a usability and "principle of least
>> surprise" point of view is for the default to be plain literals.
> 
> I don't follow such arguments...who are we talking about surprising
> here? And in what way is there a question of usability? We need to be
> clear that the way RDFa operates currently is such that everything we
> are talking about is *hidden* from the author, and only affects what
> is stored in a triple store.
> 
> I should add though, that the only alternative on the table at the
> moment--to require the author to be explicit about using mark-up in
> strings--does not itself hold to the principle of least surprise _or_
> have a significant measure of usability. To mark up a document with
> 'sup' or 'sub' in the title would, with this approach, require the use
> of the datatype attribute and of course the specification of the RDF
> namespace:
> 
>  <h2 property="dc:title" datatype="rdf:XMLLiteral"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
>    E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>  </h2>
> 
> I'm having trouble seeing why this is advantageous in any way.
> 
> 
>> However, I'm still working through the various pros/cons.
> 
> Ok.
> 
> 
>> I have a question which I couldn't find an answer to in my reading of
>> the syntax doc[2]. How would the following triple be encoded in RDFa
>> (given usual namespace prefix mappings):
>>
>> <http://example.com/doc> dc:title "RDF or Bust" .
>>
>> The natural place for this to fit would be on the <title> element of the
>> HTML document, or possibly on an <h1>.
> 
> The only way to encode this *exactly* would be to do this:
> 
>  <span property="dc:title" content="RDF or Bust" />
> 
> However, I would suggest that in this discussion we have to keep
> reminding ourselves that we are dealing with XHTML that contains RDF,
> and *not* with RDF. So, to illustrate why this is significant, take
> the fact that best practice in XHTML is to have a default language at
> the root of the document. Your example might now look like this:
> 
>  <html xmlns="..." lang="en" xml:lang="en">
>    ...
>    <body>
>      <span property="dc:title" content="RDF or Bust" />
>    </body>
>  </html>
> 
> But the addition of the language attribute means that your triple
> immediately becomes this:
> 
>  <http://example.com/doc> dc:title "RDF or Bust"@en .
> 
> In other words, even taking the apparently simple route--using plain
> literals--you still don't actually get the *exact* triple that you
> want, because we are not dealing with just another RDF serialisation,
> we are dealing with RDF in XHTML.
> 
> Where do we go from here? Do we ignore the language settings? If so,
> that means that we are now ignoring even more of authors' intent from
> their mark-up. I'm afraid that's not a route we can go down.
> 
> So, all I did when trying to address this problem originally was ask,
> what is so bad about the following triple being used instead:
> 
>  <http://example.com/doc> dc:title "RDF or Bust"^^rdf:XMLLiteral .
> 
> Although it's not perfect, I found after a lot of research that there
> isn't as much wrong with it as one might initially think, and I've
> still not heard of another way to solve all of the issues and remain
> consistent with our design goals.
> 
> Regards,
> 
> Mark
> 

--

Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 19 March 2007 09:07:46 UTC