Re: [RDFa] rdf:XMLLiteral (was RE: Missing issue on the list: identification of RDFa content) from Mark Birbeck on 2007-03-19 (public-rdf-in-xhtml-tf@w3.org from March 2007)

From: Mark Birbeck <mark.birbeck@x-port.net>
Date: Mon, 19 Mar 2007 05:31:56 -0700
To: public-rdf-in-xhtml-tf@w3.org
Message-ID: <640dd5060703190531x2fdbd33ev4601cc48fbbbefa1@mail.gmail.com>

Hi Ivan,

> Wow. The discussion has really become suddenly intense. Cheer up guys,
> this is only technology:-)

Indeed. Kind of aggressive I feel, given that all I have done is try
to find a solution to this problem.
Although I can't tell people what to think or how to argue, perhaps I
can suggest that they at least take as their starting-point that the
plain literal solution is the most obvious, and I would of course
favour it if it could be made to work. I spent a lot of time on this a
number of years ago, but at the end of the day I felt that ignoring
mark-up that is placed in text by authors was just not acceptable.

As I have said elsewhere though, there is a modification of the
current solution that I can see working, which is that we say that an
element with no child element nodes becomes a plain literal, whilst
elements with child element nodes become XML literals. This was
discussed quite a long time ago by myself and Steven, but we never
really pursued it, mainly because we thought people might find it
unacceptable. Having said that, the issue of XML literals has only
just recently started to be discussed again, so it's only now become
pressing. Encouragingly, a few months ago I implemented exactly this
algorithm in my RDFa parser, and I believe it to be pretty
straightforward.
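
[Editorial sketch, not the code from the parser mentioned above.] The
'mixed' rule described here can be illustrated roughly as follows. The
function name literal_for and the use of Python's ElementTree are
assumptions made for illustration; a real RDFa parser would also have
to handle namespaces and canonicalise the XML literal, both of which
this sketch ignores.

```python
import xml.etree.ElementTree as ET


def literal_for(element):
    """Apply the 'mixed' rule to one element (hypothetical helper).

    Returns ("plain", text) when the element has no child element nodes,
    otherwise ("xml", serialised inner mark-up).
    """
    if len(element) == 0:
        # No child element nodes: the text content becomes a plain literal.
        return ("plain", element.text or "")
    # Child element nodes present: keep the author's mark-up.
    # ET.tostring() on a child includes the text that follows it (its tail).
    inner = (element.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in element
    )
    return ("xml", inner)


simple = ET.fromstring("<span>Albert Einstein</span>")
marked_up = ET.fromstring(
    "<span>E = mc<sup>2</sup>: The Most Urgent Problem of Our Time</span>"
)

print(literal_for(simple))
print(literal_for(marked_up))
```

Under this rule the first span yields a plain literal and the second
keeps its <sup> element, so nothing is 'flattened'.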

But I still want to emphasise that the solution that simply does not
work is to 'flatten' all mark-up in situations where the author has
not specifically asked for XML literals.


> Anyway...
>
> Mark, I read all the arguments but, I am sorry to say, you still have
> not convinced me, and I still believe that the default should be plain
> literal....

Just so that we're clear, the current position in the RDFa spec is to
use XML literals, so the shoe is actually on the other foot...those
opposed to it need to provide convincing arguments for *removing* this
behaviour. I have still to hear a good argument for using _only_ plain
literals, but I'd also like to hear views on the 'mixed' approach of
generating _either_ a plain literal _or_ an XML literal, as
appropriate.


> There is the 'social' aspect Ian was referring to several times. Whether
> we like it or not, most of the RDF-s used out there use plain literals.
> I have seen very very rarely graphs with XMLLiteral, in fact, and I
> think I have used it only once myself. I do not think it is o.k. if the
> RDF graphs resulting from RDFa authoring get such a different flavour;
> they should 'blend in' the RDF world.

I'm having trouble squaring this with my understanding of RDF. Do you
curl up in bed and 'read' RDF graphs? ;) Or are they processed by
machines? And how many times have you used xsd:base64Binary? In other
words, existing data storage patterns tell us nothing about future
ones, and features are not less important just because you haven't used
them. Etc., etc. There does not seem to be any technical reason why
triples that originate from RDFa would have any trouble 'blending'
with triples from any other source.


> And, although your argumentation
> around my old SPARQL issue was logically and technically correct, the
> fact still remains that lots of SPARQL queries, though scruffy, work
> with some of the assumptions that I made back then (ie, I did not check
> the language tag, for example). We cannot ignore that, it *is* part of
> the 80/20 cut. And remember: my SPARQL query *did* fail because of that!

I'm really not sure how to reply...you seem to be saying that whilst
my argument was correct both logically and technically, it is
unacceptable because it doesn't factor in your mistaken assumptions
about RDF Concepts' notion of equality. If so, then you have me... :)

Don't forget, though, that I showed that your query failed _anyway_,
regardless of RDFa; the mistaken assumptions you were making about RDF
equality were already tripping you up with data from your RDF/XML
documents.
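
[Editorial sketch.] The equality point can be made concrete with a
small model of the literal equality rule in RDF Concepts: two literals
are equal only if their lexical forms match, their language tags match
(or both are absent), and their datatype URIs match (or both are
absent). The Literal class below is a hypothetical stand-in, not the
API of any RDF library, and the example values are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of an RDF literal; equality compares all fields."""

    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None

    def __post_init__(self):
        # Language tags are compared case-insensitively, so normalise them.
        if self.language is not None:
            object.__setattr__(self, "language", self.language.lower())


in_query = Literal("Ivan Herman")                   # no language tag in the query
in_graph = Literal("Ivan Herman", language="en")    # tag carried over from xml:lang

print(in_query == in_graph)                                   # False
print(in_graph == Literal("Ivan Herman", language="EN"))      # True
```

A query term without a language tag does not match the tagged literal,
whichever syntax the triple originally came from.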


> [A small remark to Dan Brickley: even if any XMLLiteral is, in fact, a
> general Literal according to RDFS, SPARQL endpoints do not necessarily
> have an RDFS reasoner. Ie, in practice this relationship will not be
> recognized...]

It has nothing to do with reasoning; it's the RDF Concepts document
that says that *both* plain literals and typed literals are of type
'literal'. And also, SPARQL knows about both types, independent of RDF
Schema.


> I also have a more technical issue. You convincingly argue with the
> Einstein example:
>
> E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>
> where the <sup> tag plays an essential, shall we say, semantic role.
> True. But I could just as well use another example, like
>
> This guy is <em>truly</em> intelligent
>
> where the author puts in the <em> tag for a visual emphasis only, but
> the real "semantics" he/she wants to convey is "This guy is truly
> intelligent", in which case the <em> tag really gets in the way (again,
> whether this usage of <em> is technically correct or not is besides the
> point; this *is* the way it is used many times!). The same holds for a
> number of cases: if the text in question is inside a <h1> tag, some sort
> of <span>, etc. In all those cases, keeping the extra XML tag in the
> graph is counter-intuitive to me. Although I have no statistics, my gut
> feeling tells me that these examples are in majority compared to the
> Einstein example.

What you describe is not accepted or conventional usage of HTML and
XHTML, in that nowadays you wouldn't find many people who put 'em'
into their mark-up just for some visual effect, whilst not wanting the
'em' to be part of the text's meaning. The trend today is almost
exactly the opposite, and is why I keep insisting that if an author
has put mark-up into their document, we should preserve as much of it
as possible.


> Finally, Ian's lingering question is still around: if I *want* a plain
> literal, ie, I *want* the system to get rid of the extra xml tags, what
> do I do? Does RDFa wants to introduce yet another keyword for this? Why
> not follow the default mechanism that is used both by RDF/XML and Turtle
> and nobody seems to have a problem with?

First, on producing plain literals, there is a way to do it, and that
is to use the content attribute. But I would stress that it is the
*need* for plain literals that is the edge case, since as I've tried
to show, I can't find a situation yet where it makes a difference
whether a simple string is represented by a plain literal or an XML
literal.
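
[Editorial sketch.] As a rough illustration of the content attribute
route, the snippet below reads a plain string from content while the
inline mark-up stays untouched in the document. The helper name
plain_literal_via_content and the property value dc:title are
assumptions chosen for the example, and ElementTree merely stands in
for a real RDFa parser.

```python
import xml.etree.ElementTree as ET


def plain_literal_via_content(element):
    """Return the value of the content attribute, or None if it is absent."""
    return element.get("content")


snippet = (
    '<span property="dc:title" content="This guy is truly intelligent">'
    "This guy is <em>truly</em> intelligent</span>"
)
span = ET.fromstring(snippet)

# The author states the plain string explicitly in the attribute...
print(plain_literal_via_content(span))
# ...while the <em> element remains in the rendered mark-up.
print(span[0].tag)
```

The author who wants a plain string asks for one explicitly, and the
visible text keeps its emphasis.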

But second, you are moving the goalposts when you say that you "*want*
the system to get rid of the extra xml tags". Where did that
requirement come from? I feel the need to stress again that we are
dealing here with XHTML authors, and not RDF ones. Which XHTML authors
will want plain literals, or even know what they are? And who would
want their mark-up 'flattened' rather than remaining as mark-up?

And finally, we've agreed that we are not trying to support all of
RDF, so if it's the case that it is not possible to create plain
literals (which is not the case, but just say it was) then that is not
in itself an argument against making XML literal the default, unless
it can be shown that plain literals are needed for some significant
use case.


> I am not 'officially' part of the Working Group, for obvious reasons,
> but I would think that this is an issue that, eventually, should be
> voted upon the group. This discussion has dragged on for a long time
> and, somehow, should be closed...

I don't know what to say to this either..."dragged on" is quite a
loaded term. Anyway, until someone can convincingly justify the
removal of authors' mark-up then this issue still needs to be
discussed, so I don't see the point in attempting to wind it up
prematurely.

Regards,

Mark

-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@x-port.net | +44 (0) 20 7689 9232
  http://www.formsPlayer.com | http://internet-apps.blogspot.com

  standards. innovation.
Received on Monday, 19 March 2007 12:32:02 UTC