Re: [RDFa] rdf:XMLLiteral (was RE: Missing issue on the list: identification of RDFa content) from Ivan Herman on 2007-03-19 (public-rdf-in-xhtml-tf@w3.org from March 2007)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 19 Mar 2007 17:35:21 +0100
To: mark.birbeck@x-port.net
Cc: public-rdf-in-xhtml-tf@w3.org
Message-ID: <45FEBBC9.6070500@w3.org>



Mark Birbeck wrote:
>
[snip]
> 
> As I have said elsewhere though, there is a modification of the
> current solution that I can see working, which is that we say that an
> element with no child element nodes becomes a plain literal, whilst
> elements with child element nodes become XML literals. This was
> discussed quite a long time ago by myself and Steven, but we never
> really pursued it, mainly because we thought people might find it
> unacceptable. Having said that, the issue of XML literals has only
> just recently started to be discussed again, so it's only now become
> prescient. Encouragingly, a few months ago I implemented exactly this
> algorithm in my RDFa parser, and I believe it to be pretty
> straightforward.
> 

This may be an acceptable compromise indeed.
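
A minimal sketch of the 'mixed' rule described above, in Python. It is
not Mark's parser: the function name mixed_literal is hypothetical, the
input is assumed to be a single well-formed XML element, and a real RDFa
processor would also have to handle namespaces and canonicalisation of
the XML literal, which this sketch ignores.

```python
import xml.etree.ElementTree as ET


def mixed_literal(markup):
    """Return (kind, value) for an element's content under the mixed rule."""
    elem = ET.fromstring(markup)
    if len(elem) == 0:
        # No child element nodes: the content becomes a plain literal.
        return ("plain", elem.text or "")
    # Child element nodes present: keep the inner mark-up as an XML literal.
    # ET.tostring() on a child also emits the text that follows it (its tail).
    inner = (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )
    return ("xml", inner)


print(mixed_literal('<span property="bla">Something here</span>'))
print(mixed_literal(
    '<span property="bla">E = mc<sup>2</sup>: '
    'The Most Urgent Problem of Our Time</span>'
))
```

Under this rule the first element yields a plain literal and the second
keeps its <sup> mark-up as an XML literal.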

> But I still want to emphasise that the solution that simply does not
> work is to 'flatten' all mark-up in situations where the author has
> not specifically asked for XML literals.
> 

I am not sure I understand your term 'does not work'...

> 
>> Anyway...
>>
>> Mark, I read all the arguments but, I am sorry to say, you still have
>> not convinced me, and I still believe that the default should be plain
>> literal....
> 
> Just so that we're clear, the current position in the RDFa spec is of
> XML literals, so the shoe is actually on the other foot...those
> opposed to it need to provide convincing arguments for *removing* this
> behaviour. I have still to hear a good argument for using _only_ plain
> literals, but I'd also like to hear views on the 'mixed' approach of
> generating _either_ a plain literal _or_ an XML literal, as
> appropriate.
> 

Well, I was not around when this decision was taken. True. But I do not
think it is a question of the right or left shoe. The issue has been
reopened, and not only by me; a number of people think they have good
arguments, so things have to be discussed fairly. (Which we actually do,
so let us forget about those shoes... :-)

As I said, the mixed approach might actually work.

> 
>> There is the 'social' aspect Ian was referring to several times. Whether
>> we like it or not, most of the RDF-s used out there use plain literals.
>> I have seen very very rarely graphs with XMLLiteral, in fact, and I
>> think I have used it only once myself. I do not think it is o.k. if the
>> RDF graphs resulting from RDFa authoring get such a different flavour;
>> they should 'blend in' the RDF world.
> 
> I'm having trouble squaring this with my understanding of RDF. Do you
> curl up in bed and 'read' RDF graphs? ;) Or are they processed by
> machines? And how many times have you used xsd:base64Binary? In other
> words, existing data storage patterns tell us nothing about future
> ones; features are not less important just because you haven't used
> them. Etc., etc. There does not seem to be any technical reason why
> triples that originate from RDFa would have any trouble 'blending'
> with triples from any other source.
> 

And I still believe there is a problem with blending.... See below

> 
>> And, although your argumentation
>> around my old SPARQL issue was logically and technically correct, the
>> fact still remains that lots of SPARQL queries, though scruffy, work
>> with some of the assumptions that I made back then (ie, I did not check
>> the language tag, for example). We cannot ignore that, it *is* part of
>> the 80/20 cut. And remember: my SPARQL query *did* fail because of that!
> 
> I'm really not sure how to reply...you seem to be saying that whilst
> my argument was correct both logically and technically, it is
> unacceptable because it doesn't factor in your mistaken assumptions
> about RDF Concepts' notion of equality. If so, then you have me... :)
> 
> Don't forget, though, that I showed that your query failed _anyway_,
> regardless of RDFa; the mistaken assumptions you were making about RDF
> equality were already tripping you up with data from your RDF/XML
> documents.
>

I think you are using pretty strong terms here with 'mistaken
assumption'. The fact is that if all graphs around use plain literals,
then there is no problem with the assumption. Show me one single foaf
file that uses XMLLiteral for a foaf:name (except those that originate
from RDFa, that is:-). Anybody using the dc terms use plain literal for
title, name, ... and there are lots of those.

A SPARQL processor that uses these foaf files will happily go on with
that assumption, and its equality tests will work. Huge amounts of data
are currently retrieved, e.g., by the guys at the University of Berlin

http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/index.html#examples

using their Relational Database access, and queried with SPARQL. Does it
mean that RDF data coming from RDFa should not mash up with that?
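
To make the equality concern concrete, here is a minimal sketch in
Python. The Literal class, the subjects_with helper and the ex: subject
names are hypothetical, invented for illustration. The sketch models
only the RDF Concepts rule that two literals are equal when lexical
form, datatype and language tag all agree; it does not reproduce any
particular SPARQL engine.

```python
from dataclasses import dataclass
from typing import Optional

XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


@dataclass(frozen=True)
class Literal:
    """Toy RDF literal: equal only if all three components are equal."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def subjects_with(graph, predicate, obj):
    """Return the sorted subjects of triples matching (?s, predicate, obj)."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == obj)


graph = {
    # As found in a typical foaf file: a plain literal.
    ("ex:fromFoafFile", FOAF_NAME, Literal("Ivan Herman")),
    # The same string, but typed as rdf:XMLLiteral.
    ("ex:fromRDFa", FOAF_NAME, Literal("Ivan Herman", datatype=XML_LITERAL)),
}

# A query written with a plain literal only finds the plain-literal triple.
print(subjects_with(graph, FOAF_NAME, Literal("Ivan Herman")))
```

The query pattern matches only the subject whose name is a plain
literal; the XMLLiteral with the same characters is a different term.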


> 
>> [A small remark to Dan Brickley: even if any XMLLiteral is, in fact, a
>> general Literal according to RDFS, SPARQL endpoints do not necessarily
>> have an RDFS reasoner. Ie, in practice this relationship will not be
>> recognized...]
> 
> It has nothing to do with reasoning; it's the RDF Concepts document
> that says that *both* plain literals and typed literals are of type
> 'literal'. And also, SPARQL knows about both types, independent of RDF
> Schema.
> 

I am sorry Mark, but I believe it has. The RDF Semantics document indeed
says that an XML Literal is a general Literal; it defines subclass and
typing relationships on those. But these are only used in practice (if
you like, the RDFS entailment rules are applied to the graph, expanding
it) if the RDF triple store in question implements that, which is
equivalent to the implementation of an RDFS reasoner. If this is not the
case, then those relationships are not recognized, and matching will
fail. In the case of SPARQL, there is no way a client can tell the SPARQL
endpoint (in SPARQL) to expand the graph according to the entailment
rules; this is an out-of-band instruction, and indeed some SPARQL
endpoints do not have this capability at all (e.g., RDFLib with my SPARQL
implementation can be used for an endpoint, but RDFLib does not have an
RDFS reasoner...)

I am not saying this is an ideal situation. But this is the way it is.
And we have to live with it.

> 
>> I also have a more technical issue. You convincingly argue with the
>> Einstein example:
>>
>> E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>>
>> where the <sup> tag plays an essential, shall we say, semantic role.
>> True. But I could just as well use another example, like
>>
>> This guy is <em>truly</em> intelligent
>>
>> where the author puts in the <em> tag for a visual emphasis only, but
>> the real "semantics" he/she wants to convey is "This guy is truly
>> intelligent", in which case the <em> tag really gets in the way (again,
>> whether this usage of <em> is technically correct or not is beside the
>> point; this *is* the way it is used many times!). The same holds for a
>> number of cases: if the text in question is inside a <h1> tag, some sort
>> of <span>, etc. In all those cases, keeping the extra XML tag in the
>> graph is counter-intuitive to me. Although I have no statistics, my gut
>> feeling tells me that these examples are in majority compared to the
>> Einstein example.
> 
> What you describe is not accepted or conventional usage of HTML and
> XHTML, in that nowadays you wouldn't find many people who put 'em'
> into their mark-up just for some visual effect, whilst not wanting the
> 'em' to be part of the text's meaning. The trend today is almost
> exactly the opposite, and is why I keep insisting that if an author
> has put mark-up into their document, we should preserve as much of it
> as possible.
> 

Please, Mark, give me some credit. Yes, <em> may not have been a good
example; I could have used a span with a class that the user could have
used for visual effect. The point is that, in my view at least, in many
many cases the extra markup is *not* information one wants to keep.

Maybe it is indeed a different origin. I am of course using HTML, but my
vision for RDFa is not to expose the full structure into RDF but the
possibility to easily blend information that I want in the HTML document
with information that I want for mashup via RDF. And, for the latter, I
do not want to expose the HTML structure like h1, dl, em, span, strong,
etc.


> 
>> Finally, Ian's lingering question is still around: if I *want* a plain
>> literal, ie, I *want* the system to get rid of the extra xml tags, what
>> do I do? Does RDFa want to introduce yet another keyword for this? Why
>> not follow the default mechanism that is used both by RDF/XML and Turtle
>> and nobody seems to have a problem with?
> 
> First, on producing plain literals, there is a way to do it, and that
> is to use the content attribute. 

True. But repeating the content is *really* not what we would like to
push the user to, would we...

>                                    But I would stress that it is the
> *need* for plain literals that is the edge case, since as I've tried
> to show, I can't find a situation yet where it makes a difference
> whether a simple string is represented by a plain literal or an XML
> literal.

Which means that all the RDF content out there is wrong because it uses
plain literals? Isn't that a bit strong? Or do I not understand this
statement?

> 
> But second, you are moving the goalposts when you say that you "*want*
> the system to get rid of the extra xml tags". Where did that
> requirement come from? I feel the need to stress again that we are
> dealing here with XHTML authors, and not RDF ones. Which XHTML authors
> will want plain literals, or even know what they are? And who would
> want their mark-up 'flattened' rather than remaining as mark-up?
> 

??? I do not understand this argument. Isn't this what we are discussing
all along?



> And finally, we've agreed that we are not trying to support all of
> RDF, so if it's the case that it is not possible to create plain
> literals (which is not the case, but just say it was) then that is not
> in itself an argument against making XML literal the default, unless
> it can be shown that plain literals are needed for some significant
> use case.
> 

I have the impression that we are repeating ourselves. This has simply
been the

E = mc<sup>2</sup>: The Most Urgent Problem of Our Time

vs.

This guy is <span class="makeitfancy">truly</span> intelligent

argument all along, and a question of which one we believe is the more
convincing use case. And we do not seem to be able to convince one
another... :-(
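
For reference, a minimal Python sketch of what 'flattening' does to the
two examples. The helper name flatten is hypothetical, the enclosing
<span property="bla"> wrapper is assumed for illustration, and the input
is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET


def flatten(markup):
    """Drop all tags and keep only the text content, in document order."""
    return "".join(ET.fromstring(markup).itertext())


einstein = ('<span property="bla">E = mc<sup>2</sup>: '
            'The Most Urgent Problem of Our Time</span>')
fancy = ('<span property="bla">This guy is '
         '<span class="makeitfancy">truly</span> intelligent</span>')

print(flatten(einstein))  # the exponent is lost: "E = mc2: ..."
print(flatten(fancy))     # the text survives unchanged
```

Flattening loses the exponent in the first title, while in the second
case it yields exactly the string the author had in mind.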

But... after all, the original problem came from the fact that

<span property="bla">Something here</span>

ended up as an XML Literal for 'Something here'. *That* created problems,
*that* was what triggered my original mails. And, indeed, your 'mixed'
solution would at least solve that, because this would end up as a plain
literal, right?

> 
>> I am not 'officially' part of the Working Group, for obvious reasons,
>> but I would think that this is an issue that, eventually, should be
>> voted upon by the group. This discussion has dragged on for a long time
>> and, somehow, should be closed...
> 
> I don't know what to say to this either..."dragged on" is quite a
> loaded term. 

No load intended:-)

Ivan

>               Anyway, until someone can convincingly justify the
> removal of authors' mark-up then this issue still needs to be
> discussed, so I don't see the point in attempting to wind it up
> prematurely.
> 
> Regards,
> 
> Mark
> 

--

Ivan Herman, W3C Semantic Web Activity Lead
URL: http://www.w3.org/People/Ivan/
PGP Key: http://www.cwi.nl/%7Eivan/AboutMe/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 19 March 2007 16:35:18 UTC