Re: Change in definition of RDF literals from Martin Duerst on 2003-05-19 (w3c-rdfcore-wg@w3.org from May 2003)

From: Martin Duerst <duerst@w3.org>
Date: Mon, 19 May 2003 08:38:18 -0400
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-i18n-ig@w3.org>
Cc: <w3c-rdfcore-wg@w3.org>
Message-Id: <4.2.0.58.J.20030515125610.06707a50@localhost>
Hello Jeremy,

We discussed this issue briefly at the last i18n core teleconference,
but most people didn't feel they understood the issues well enough.

I feel that I have some understanding, and also a clear opinion.
(I think that the decision you told us about was clearly wrong.)

So I'm going ahead and starting the discussion. I hope I will see many
of you in Budapest.

At 16:40 03/05/13 +0200, Jeremy Carroll wrote:


>Hello
>
>the RDF Core WG made a decision as part of its last call process that we
>decided to formally communicate to the I18N WG.
>
>Note, we are still looking forward to your review comments on our Last Call
>documents.

Yes, we know. I have read the RDF Primer and RDF Concepts drafts,
and have started the RDF Semantics draft, but got stuck, both
because of the topic/style and because of other urgent stuff.

We are trying to tell you about issues we find when we find
(and discuss) them, but we unfortunately are not finished yet.


>The decision made on Friday [1] is to modify the definition of a literal to
>exclude the possibility of typed literals having an associated language tag:
>
>[[
> > Option 4:
> > Language tag is simply dropped from all typed literals including
> > rdf:XMLLiteral

For typed literals other than XMLLiteral, you write in your rationale:


 >>>>
For all these datatypes, syntactically excluding the language tag from typed
literals merely better articulates the WG's earlier decision (approx Nov
2002) that such language tags had no meaning. That decision is clearly
articulated in three of the last call documents.
 >>>>

That decision is hard to find, and I wasn't aware of it. It may be the
right decision, but it needs further discussion.



>PROPOSE
>   Concepts is changed to say that a literal can have either a datatype or a
>language tag and not both.
>   rdf:XMLLiteral datatype is changed to have the identity as its lexical
>value mapping (no wrapping), with consequential change to the value space of
>rdf:XMLLiteral.
>   Other editors to make consequential changes.
>]]
>from [2]
>
>We specifically draw your attention to this being at variance with the
>decisions made at the inter-WG meeting at the Cannes Plenary in 2002
>concerning the scope of language tags (xml:lang) and embedded XML within RDF
>(the rdf:parseType="Literal" construct).

Thanks for pointing that out. I'm not at all happy with it.
From an i18n point of view, marked-up text is an extension of
simple literals. The decision totally breaks this.

Not having language apply to certain datatypes seems to make sense,
in particular for XML Schema datatypes such as dates and numbers.
However, it may not be the right thing e.g. for xsd:string.
It may also not be the right thing for certain datatypes from
other frameworks. At the minimum, this issue/restriction should
clearly be mentioned with the other assumptions about how datatypes
work (distinction of value space and lexical space).

Now back to parseType='Literal'.

Your example in your 'rationale' provides the simplest way of showing
the central problem. Let's look at four different cases.

A)
<rdf:Description xmlns="...xhtml...">
   <eg:prop xml:lang='en'
   >Hello World</eg:prop>
</rdf:Description>

B)
<rdf:Description xmlns="...xhtml...">
   <eg:prop rdf:parseType="Literal"
   ><span xml:lang="en">Hello World</span></eg:prop>
</rdf:Description>

C)
<rdf:Description xmlns="...xhtml...">
   <eg:prop rdf:parseType="Literal" xml:lang='en'
   >Hello <em xml:lang='es'>Mundo</em></eg:prop>
</rdf:Description>

D)
<rdf:Description xmlns="...xhtml...">
   <eg:prop rdf:parseType="Literal"
   ><span xml:lang="en">Hello <em xml:lang='es'>Mundo</em></span></eg:prop>
</rdf:Description>


With the recent decision, A), B), and D) would work, but C) would
not work as intended (RDF would ignore xml:lang). The user would
probably use A) and D). The difference is really difficult to explain
and completely artificial. What is more, A) and B) are not the same,
and C) and D) are not the same, even assuming that we went back
to the old solution. <span> looks like a fairly neutral and
low-profile element, but in some cases, there may be no such thing
in the markup we want to use. It looks to me as if this solution
will force applications to define the equivalence between
plain text and the same text wrapped in some specific element,
which seems completely unnecessary.

What is very important to understand is that in many cases,
the xml:lang will not be on the same element as rdf:parseType,
but much higher up in the tree, e.g. when all text in
an RDF document is in English, it may be on the root element.
Also, the rdf:parseType may not actually be in the document,
it may be added in with some DTD fragment. In this way,
many documents that don't look like RDF can be RDF, as
we are seeing with data-oriented RDF.
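
To make the inheritance point concrete, here is a minimal sketch in
Python (standard library only) that reports which language is in scope
at each rdf:parseType="Literal" property when xml:lang sits on the root
element. The function name, the eg: namespace URI, and the sample
document are illustrative assumptions, not taken from any RDF
specification or parser.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_PARSETYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}parseType"


def literals_with_inherited_lang(element, inherited=None):
    """Yield (tag, language in scope) for each parseType="Literal" element."""
    # An xml:lang on this element overrides whatever was inherited.
    lang = element.get(XML_LANG, inherited)
    if element.get(RDF_PARSETYPE) == "Literal":
        yield element.tag, lang
        return  # the content is literal XML, so do not descend into it
    for child in element:
        yield from literals_with_inherited_lang(child, lang)


# Hypothetical document: xml:lang appears only on the root element.
DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/terms#"
         xml:lang="en">
  <rdf:Description>
    <eg:prop rdf:parseType="Literal">Hello <em>World</em></eg:prop>
  </rdf:Description>
</rdf:RDF>
"""

result = list(literals_with_inherited_lang(ET.fromstring(DOC)))
print(result)
```

In this sketch the eg:prop literal is reported with language "en" even
though no xml:lang appears anywhere near it, which is exactly the
information that would be dropped if the language tag is removed from
XML literals.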

I think one way to see it is that the underlying problem is that the use
of a datatype of rdf:XMLLiteral for parseType='Literal' is rather
artificial. When I read that for the first time, I thought that it
might be nice to allow XML Schema complex types there, which
would allow validation of the contents, and would bring simple
types and complex types closer together.
The alternative solution is to not treat parseType='Literal' as
a type at all, but as something separate, as a basic literal in
and by itself. One way to go would be to treat all literals as
being XML, with the simple case just having no markup. The
N-triples notation then would maybe just use some elements
of XML syntax, such as &amp; and &lt;. Just an idea.
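
As a rough illustration of the "all literals are XML" idea, the sketch
below (Python, standard library only) shows how a plain string would
look if it were written as XML character data. The helper name is
hypothetical, and this is not how N-Triples is defined; note that
xml.sax.saxutils.escape also escapes '>' in addition to the '&' and '<'
mentioned above.

```python
from xml.sax.saxutils import escape


def plain_text_as_xml_literal(text):
    """Return the text as XML character data (hypothetical helper)."""
    # escape() replaces '&', '<' and '>' with their XML entity references.
    return escape(text)


plain = "AT&T <research>"
as_xml = plain_text_as_xml_literal(plain)
unchanged = plain_text_as_xml_literal("Hello World")
print(as_xml)
print(unchanged)
```

A string without markup-significant characters comes out unchanged, so
under this view the simple case really is just XML with no markup.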

With respect to the 'concern of infection' from xml:lang
attributes in a particular serialization, this exists both
for plain string literals and for XML literals. The I18N
WG became aware of this problem quite a while ago,
both from the RDF Core WG and from other WGs, and
has worked together with the XML Core WG and coordinated
with other experts to establish the use of xml:lang=''
to shield subtrees from such 'infection'. So this should
not really be a concern.
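
A minimal sketch of the shielding behaviour, in Python with the
standard library only: an element carrying xml:lang='' and everything
below it ends up with no language, while siblings keep the inherited
one. The function name and the element names in the sample are made up
for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_in_scope(element, inherited=None):
    """Map each element tag to its language in scope (None = no language)."""
    declared = element.get(XML_LANG)
    if declared is None:
        lang = inherited   # nothing declared here: inherit from the parent
    elif declared == "":
        lang = None        # xml:lang='' shields this subtree
    else:
        lang = declared    # an explicit tag overrides the inherited one
    scopes = {element.tag: lang}
    for child in element:
        scopes.update(lang_in_scope(child, lang))
    return scopes


# Hypothetical document with unique element names for easy lookup.
DOC = '<outer xml:lang="en"><inner xml:lang=""><leaf/></inner><sibling/></outer>'
scopes = lang_in_scope(ET.fromstring(DOC))
print(scopes)
```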


So my conclusion is that due to the specific steps by which
the RDF spec has evolved, we have arrived at a highly
undesirable state (a very local optimum). It seems
crucial to take a step back and make sure we understand
why parseType='Literal' was put into the spec in the
first place. Then I think that it will not be too
difficult to arrive at a much better solution.


Regards,     Martin.


>As an example:
>
><rdf:Description xml:lang="en">
>    <eg:prop rdf:parseType="Literal"><b>chat</b></eg:prop>
></rdf:Description>
>
>and
>
><rdf:Description xml:lang="fr">
>    <eg:prop rdf:parseType="Literal"><b>chat</b></eg:prop>
></rdf:Description>
>
>are given exactly the same representation as an RDF graph and exactly the
>same meaning. (Which differs from the Last Call documents in which the
>language tag is significant).
>
>The intention in these examples is now expressed as:
>
><rdf:Description>
>    <eg:prop rdf:parseType="Literal"><span
>  xml:lang="en"><b>chat</b></span></eg:prop>
></rdf:Description>
>
>and
>
><rdf:Description>
>    <eg:prop rdf:parseType="Literal"><span
>  xml:lang="fr"><b>chat</b></span></eg:prop>
></rdf:Description>
>
>I have produced a rationale [3] (not endorsed by the WG).
>
>Jeremy, on behalf of RDF Core
>
>
>[1] minutes (not yet approved)
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0138.html
>RESOLVED: Typed literals option 4 from msg 0086
>[2] proposal (#4)
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0086.html
>[3] rationale (personal not WG)
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0145.html
Received on Monday, 19 May 2003 08:55:36 UTC