- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 09 Jul 2003 19:35:28 -0400
- To: w3c-rdfcore-wg@w3.org, w3c-i18n-ig@w3.org
As promised yesterday, here is an updated list of arguments from the I18N side on the issues surrounding 'XML Literals', and our understanding behind it. Many thanks to many of you for helping me to understand how to better explain things; anything that is not yet clear is clearly my fault. The rationale is split into various parts, such as technical and procedural arguments. Technical Arguments ------------------- One of the needs in internationalization is the need for what we may call 'micro-markup'. Although this is not necessarily very frequent, it turns up repeatedly in all kinds of instances such as: - multilingual text fragments - bidirectionality - ruby - special glyph variants - translation hints for localization There are other cases where similar micro-markup is desirable, such as using mathematical or chemical formulae in text. In the RDF context, the typical example for this is a book title. The RDF M&S specification uses a Math example, and not an i18n example, mainly because that was felt that it was easier for a wide audience to understand. The need for micro-markup for i18n is not limited to RDF at all. We have had long discussions with XML Schema about it, which have led to the 'text' type described in the XML Schema primer (see http://www.w3.org/TR/xmlschema-0/#any), which ended up in the XML Schema type library (see http://www.w3.org/2001/03/XMLSchema/TypeLibrary.xsd http://www.w3.org/2001/03/XMLSchema/TypeLibrary-text.xsd http://www.w3.org/2001/03/XMLSchema/TypeLibrary-nn-text.xsd, you may have to do 'view source' to get the full picture). We have also made review comments to other WGs requesting that elements rather than attributes be used for anything that looks like natural text rather than enumerations, numeric values, and the like, in order to be able to use micro-markup. It is also leading to changes from XHTML 1.0 to XHTML 2.0. The need for micro-markup in various ways makes it quite important that 'text' and 'text-with-markup' are not seen as two completely different things, but that text-with-markup, to whatever extent possible, be seen and handled as an extension of text. This is even more important because the need for micro-markup may often not appear for very long, in particular if mainly data of particular languages is handled. Therefore, it should be as natural as possible to make the transition from plain text to text with micro-markup. All this is not just an issue from the view of RDF/XML (Pat's view X), but very much also, or actually first and foremost, applies to the model and the graph (Pat's view G). The best thing would be the way M&S treats literals, with language information, and with literals containing markup as a natural extension of literals with only plain text. Please note that this does not at all preclude other usages of XML content in RDF literals, be it larger textual pieces (typical example is documentation) or, as some people in this discussion have, to my surprise, suggested, as blobs of data-oriented XML. There is no need for every XML literal to have a language in the Graph (we just want it to be possible to have one), and in RDF/XML, xml:lang="" does the job. We understand that there are some implementation problems, primarily motivated by the desire to store plain strings as strings without escaping, but we see this as an implementation issue, not as a reason for making plain literals and XML literals completely different things, and having them behave completely differently. We see parseType='Literal' as what it says, namely an instruction about how to parse RDF/XML, similar to the other values of parseType, and not as something that should be directly reflected in the graph. From an user point of view, users familar with XML will be surprised that xml:lang tag does affect plain literals (as the XML 1.0 REC says) but does not affect xml literals. Why are the rules for XML switched off just for XML? The idea that applications use an application-specific wrapper element is in conflict with the idea of micro-markup, which should be used only when necessary, and with the concept of markup integrity, which means that in some way or another, markup may always be significant, and changing it may be unapropriate. Procedural ---------- - It is our understanding that RDF Core was chartered with clarifying the RDF M&S spec, not changing it. Already by separating plain literals and XML literals, and much more by removing language information from XML literals, the new spec is a clear change from M&S, rather than a reinterpretation. - We agreed in Cannes that the ambiguity in M&S that RDF applications may or may not consider language information would be resolved to that the RDF graph would provide the language information. - Later, RDF Core asked us about the problem of integrating arbitrary pieces of XML without language information into an RDF/XML document. The same problem was brought up by XML Signature (or was it encryption) and SOAP. The I18N WG recognized this problem, checked with the experts on language tagging standards, and recommended to XML Core to issue an erratum to define xml:lang="" for this case, which they did. - Later, RDF Core asked about the applicability of language information to datatypes such as (XML Schema) integer. We told them that these were designed as language- and locale-independent datatypes, and so it would be appropriate to specify that they did not carry language information. - Although this was rather implicit (in the sense of a common understanding that didn't have to be made explicit), I think neither side ever assumed that removing language information from XML Schema simple datatypes would affect plain literals or XML literals. - After last call, RDF Core asked us whether we would be okay with removing language information from XML literals. It was nice for them to ask, but it also clearly indicates that they understood it to break our previous agreement. We had a look at it and decided that, for the reasons explained above, it would not be okay. It also helped us to understand that the RDF M&S design for literals had been changed rather substantially, with undesired consequences for internationalization, and that ideally, more than just putting language information back on XML literals was needed, but that if really necessary, we could live with only that change back (to the last call state). The bigger picture ------------------ Some people have said that the M&S solution for dealing with language information is not very good. We don't deny that there may be other, potentially better solutions. We also would have been ready (and would still be ready), I guess, to discuss such solutions with RDF Core. But we are not ready to trade something that we have currently (M&S, or with somewhat less satisfaction, the last call status) with something that is according to our understanding clearly less consistent and less useful, with or without the potential to 'get things fixed' in the future. Regards, Martin.
Received on Thursday, 10 July 2003 00:55:33 UTC