- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 19 May 2003 08:38:18 -0400
- To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-i18n-ig@w3.org>
- Cc: <w3c-rdfcore-wg@w3.org>
Hello Jeremy,

We discussed this issue briefly at the last i18n core teleconference, but most people didn't feel they understood the issues well enough. I feel that I have some understanding, and also a clear opinion. (I think that the decision you told us about was clearly wrong.) So I'm going ahead and starting the discussion. I hope I will see many of you in Budapest.

At 16:40 03/05/13 +0200, Jeremy Carroll wrote:
>Hello
>
>the RDF Core WG made a decision as part of its last call process that we
>decided to formally communicate to the I18N WG.
>
>Note, we are still looking forward to your review comments on our Last Call
>documents.

Yes, we know. I have read the RDF Primer and RDF Concepts drafts, and have started on the RDF Semantics draft, but got stuck, both because of its topic and style and because of other urgent work. We are trying to tell you about issues as we find (and discuss) them, but unfortunately we are not finished yet.

>The decision made on Friday [1] is to modify the definition of a literal to
>exclude the possibility of typed literals having an associated language tag:
>
>[[
> > Option 4:
> > Language tag is simply dropped from all typed literals including
> > rdf:XMLLiteral

For typed literals other than XMLLiteral, you write in your
>>>>
For all these datatypes, syntactically excluding the language tag from typed literals merely better articulates the WG's earlier decision (approx Nov 2002) that such language tags had no meaning. That decision is clearly articulated in three of the last call documents.
>>>>
It's hard to find, and I wasn't aware of this. It may be the right decision, but it needs further discussion.

>PROPOSE
> Concepts is changed to say that a literal can have either a datatype or a
>language tag and not both.
> rdf:XMLLiteral datatype is changed to have the identity as its lexical
>value mapping (no wrapping), with consequential change to the value space of
>rdf:XMLLiteral.
> Other editors to make consequential changes.
>]]
>from [2]
>
>We specifically draw your attention to this being at variance with the
>decisions made at the inter-WG meeting at the Cannes Plenary in 2002
>concerning the scope of language tags (xml:lang) and embedded XML within RDF
>(the rdf:parseType="Literal" construct).

Thanks for pointing that out. I'm not at all happy with it. From an i18n point of view, marked-up text is an extension of simple literals. The decision totally breaks this.

Not having language apply to certain datatypes seems to make sense, in particular for XML Schema types such as dates and numbers. However, it may not be the right thing for, e.g., xsd:string. It may also not be the right thing for certain datatypes from other frameworks. At a minimum, this issue/restriction should be stated clearly alongside the other assumptions about how datatypes work (the distinction between value space and lexical space).

Now back to parseType='Literal'. The example in your 'rationale' provides the simplest way of showing the central problem. Let's look at four different cases.

A)
<rdf:Description xmlns="...xhtml...">
  <eg:prop xml:lang='en'>Hello World</eg:prop>
</rdf:Description>

B)
<rdf:Description xmlns="...xhtml...">
  <eg:prop rdf:parseType="Literal"><span xml:lang="en">Hello World</span></eg:prop>
</rdf:Description>

C)
<rdf:Description xmlns="...xhtml...">
  <eg:prop rdf:parseType="Literal" xml:lang='en'>Hello <em xml:lang='es'>Mundo</em></eg:prop>
</rdf:Description>

D)
<rdf:Description xmlns="...xhtml...">
  <eg:prop rdf:parseType="Literal"><span xml:lang="en">Hello <em xml:lang='es'>Mundo</em></span></eg:prop>
</rdf:Description>

With the recent decision, A), B), and D) would work, but C) would not work as intended (RDF would ignore the xml:lang). Users would probably end up using A) and D). The difference between them is really difficult to explain, and completely artificial.
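A minimal Python sketch of how cases A) to D) come out under Option 4, assuming a simplified model: a literal is a lexical form plus either a datatype or a language tag (as in the PROPOSE text), and parseType='Literal' yields an rdf:XMLLiteral whose content is serialized naively with ElementTree rather than canonicalized as the RDF specs require. The XHTML namespace URI stands in for "...xhtml...", the eg: namespace URI is hypothetical, and the helper names `literal_for`, `text_runs` and `languages_seen` are illustrative only. `languages_seen` reports, for each run of text, which language an application can still recover from the resulting literal.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
XML_LITERAL = RDF + "XMLLiteral"


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

    def __post_init__(self):
        # Option 4: a literal has either a datatype or a language tag, not both.
        if self.datatype is not None and self.lang is not None:
            raise ValueError("a typed literal cannot carry a language tag")


def literal_for(prop: ET.Element, inherited_lang: Optional[str]) -> Literal:
    """Build the object literal of a property element under Option 4 (simplified)."""
    if prop.get(f"{{{RDF}}}parseType") == "Literal":
        # Naive serialization of the element content (text plus child elements);
        # NOT the canonicalization the RDF specs prescribe.
        content = escape(prop.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in prop)
        # Any xml:lang in scope is dropped, because the literal is typed.
        return Literal(content, datatype=XML_LITERAL)
    lang = prop.get(XML_LANG, inherited_lang)
    return Literal(prop.text or "", lang=lang or None)


def text_runs(elem: ET.Element, lang: Optional[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each non-blank text run with the xml:lang in scope for it."""
    lang = elem.get(XML_LANG, lang) or None
    runs = []
    if elem.text and elem.text.strip():
        runs.append((elem.text.strip(), lang))
    for child in elem:
        runs.extend(text_runs(child, lang))
        if child.tail and child.tail.strip():
            runs.append((child.tail.strip(), lang))
    return runs


def languages_seen(lit: Literal) -> List[Tuple[str, Optional[str]]]:
    """Language information recoverable from the literal alone."""
    if lit.datatype == XML_LITERAL:
        return text_runs(ET.fromstring(f"<wrapper>{lit.lexical}</wrapper>"), None)
    return [(lit.lexical, lit.lang)]


HEAD = ('<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:eg="http://example.org/eg#" xmlns="http://www.w3.org/1999/xhtml">')
CASES = {
    "A": '<eg:prop xml:lang="en">Hello World</eg:prop>',
    "B": '<eg:prop rdf:parseType="Literal"><span xml:lang="en">Hello World</span></eg:prop>',
    "C": ('<eg:prop rdf:parseType="Literal" xml:lang="en">'
          'Hello <em xml:lang="es">Mundo</em></eg:prop>'),
    "D": ('<eg:prop rdf:parseType="Literal"><span xml:lang="en">'
          'Hello <em xml:lang="es">Mundo</em></span></eg:prop>'),
}

results = {}
for name, body in CASES.items():
    desc = ET.fromstring(HEAD + body + "</rdf:Description>")
    results[name] = languages_seen(literal_for(desc[0], desc.get(XML_LANG)))
    print(name, results[name])
```

Under these assumptions C) yields [('Hello', None), ('Mundo', 'es')]: the English tag on the property element is lost, while D) keeps it only because the <span> carries it inside the literal.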
What is more, A) and B) are not the same, and neither are C) and D), even if we went back to the old solution. <span> looks like a fairly neutral and low-profile element, but in some cases there may be no such element in the markup we want to use. It looks to me as if this solution will force applications to define an equivalence between plain text and text wrapped in some specific element, which seems completely unnecessary.

What is very important to understand is that in many cases the xml:lang will not be on the same element as rdf:parseType, but much higher up in the tree; e.g. when all the text in an RDF document is in English, it may be on the root element. Also, the rdf:parseType may not actually appear in the document; it may be supplied by some DTD fragment. In this way, many documents that don't look like RDF can be RDF, as we are already seeing with data-oriented RDF.

I think one way to see it is that the underlying problem is that using the datatype rdf:XMLLiteral for parseType='Literal' is rather artificial. When I read about it for the first time, I thought that it might be nice to allow XML Schema complex types there, which would allow validation of the contents and would bring simple types and complex types closer together. The alternative is not to treat parseType='Literal' as a type at all, but as something separate: a basic kind of literal in and of itself. One way to go would be to treat all literals as XML, with the simple case just having no markup. The N-Triples notation would then perhaps just use some elements of XML syntax, such as &amp; and &lt;. Just an idea.

With respect to the 'concern of infection' from xml:lang attributes in a particular serialization, this concern exists both for plain string literals and for XML literals.
The I18N WG became aware of this problem quite a while ago, from the RDF Core WG as well as from other WGs, and has worked with the XML Core WG and coordinated with other experts to establish the use of xml:lang='' to shield subtrees from such 'infection'. So this should not really be a concern.

My conclusion is that, due to the specific steps by which the RDF spec has evolved, we have arrived at a highly undesirable state (a very local optimum). It seems crucial to take a step back and make sure we understand why parseType='Literal' was put into the spec in the first place. Then I think that it will not be too difficult to arrive at a much better solution.

Regards,    Martin.

>As an example:
>
><rdf:Description xml:lang="en">
>  <eg:prop rdf:parseType="Literal"><b>chat</b></eg:prop>
></rdf:Description>
>
>and
>
><rdf:Description xml:lang="fr">
>  <eg:prop rdf:parseType="Literal"><b>chat</b></eg:prop>
></rdf:Description>
>
>are given exactly the same representation as an RDF graph and exactly the
>same meaning. (Which differs from the Last Call documents in which the
>language tag is significant).
>
>The intention in these examples is now expressed as:
>
><rdf:Description>
>  <eg:prop rdf:parseType="Literal"><span
>  xml:lang="en"><b>chat</b></span></eg:prop>
></rdf:Description>
>
>and
>
><rdf:Description>
>  <eg:prop rdf:parseType="Literal"><span
>  xml:lang="fr"><b>chat</b></span></eg:prop>
></rdf:Description>
>
>I have produced a rationale [3] (not endorsed by the WG).
>
>Jeremy, on behalf of RDF Core
>
>
>[1] minutes (not yet approved)
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0138.html
>RESOLVED: Typed literals option 4 from msg 0086
>[2] proposal (#4)
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0086.html
>[3] rationale (personal not WG)
>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0145.html
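Jeremy's en/fr example, together with the points about xml:lang sitting on the root element and about shielding a subtree with xml:lang='', can be illustrated with a second minimal Python sketch. It assumes that the Last Call behaviour can be roughly approximated by pairing the XML literal's content with the xml:lang in scope, while Option 4 simply drops the language. The function names, the document template, the eg: namespace URI, and the naive ElementTree serialization are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from xml.sax.saxutils import escape

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
PARSE_TYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}parseType"


def in_scope_lang(path: List[ET.Element]) -> Optional[str]:
    """Return the xml:lang in scope for path[-1]; path runs from the root down."""
    lang = None
    for elem in path:
        value = elem.get(XML_LANG)
        if value is not None:
            # The nearest xml:lang wins; xml:lang="" resets to "no language".
            lang = value or None
    return lang


def xml_literals(doc: str, keep_lang: bool) -> List[Tuple[str, Optional[str]]]:
    """Return (content, language) for each rdf:parseType="Literal" element.

    keep_lang=True roughly approximates the Last Call drafts (language kept);
    keep_lang=False follows Option 4 (language dropped).
    """
    found = []

    def walk(path: List[ET.Element]) -> None:
        elem = path[-1]
        if elem.get(PARSE_TYPE) == "Literal":
            content = escape(elem.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in elem)
            found.append((content, in_scope_lang(path) if keep_lang else None))
            return  # the content is an XML literal, not further RDF/XML
        for child in elem:
            walk(path + [child])

    walk([ET.fromstring(doc)])
    return found


# Hypothetical document template wrapped around Jeremy's example.
TEMPLATE = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
            'xmlns:eg="http://example.org/eg#"{root_lang}>'
            '<rdf:Description{desc_lang}>'
            '<eg:prop rdf:parseType="Literal"><b>chat</b></eg:prop>'
            '</rdf:Description></rdf:RDF>')

en = TEMPLATE.format(root_lang="", desc_lang=' xml:lang="en"')
fr = TEMPLATE.format(root_lang="", desc_lang=' xml:lang="fr"')
# English declared once on the root element and inherited by the literal.
root = TEMPLATE.format(root_lang=' xml:lang="en"', desc_lang="")
# The same root language, shielded from the subtree with xml:lang="".
shielded = TEMPLATE.format(root_lang=' xml:lang="en"', desc_lang=' xml:lang=""')

for label, doc in [("en", en), ("fr", fr), ("root", root), ("shielded", shielded)]:
    print(label, xml_literals(doc, keep_lang=True), xml_literals(doc, keep_lang=False))
```

With keep_lang=False the en and fr documents give identical results, which is the loss Jeremy's message describes; with keep_lang=True the language comes from the nearest ancestor, including the root element, and xml:lang='' removes it.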
Received on Monday, 19 May 2003 08:55:36 UTC