Re: Updated summary from I18N on 'XML Literals'

Hi Martin,

Thank you for this.  I'm sure it will be helpful in understanding your
point of view.

However, I note with some frustration, that after three weeks of
discussion, I still don't have a single use case illustrating
significant harm done to the cause of internationalization by RDFCore's
post last call decision.

As things stand, RDFCore has no case to answer.

Brian

On Thu, 2003-07-10 at 00:35, Martin Duerst wrote:
> As promised yesterday, here is an updated list of arguments
> from the I18N side on the issues surrounding 'XML Literals',
> and our understanding behind it.
> 
> Many thanks to many of you for helping me to understand
> how to better explain things; anything that is not yet
> clear is clearly my fault.
> 
> The rationale is split into various parts, such as technical
> and procedural arguments.
> 
> 
> Technical Arguments
> -------------------
> 
> One of the needs in internationalization is the need for what we
> may call 'micro-markup'. Although this is not necessarily very
> frequent, it turns up repeatedly in all kinds of instances such as:
>    - multilingual text fragments
>    - bidirectionality
>    - ruby
>    - special glyph variants
>    - translation hints for localization
> There are other cases where similar micro-markup is desirable,
> such as using mathematical or chemical formulae in text.
> 
> In the RDF context, the typical example for this is a book title.
> The RDF M&S specification uses a Math example, and not an i18n
> example, mainly because that was felt that it was easier for a
> wide audience to understand.
> 
> The need for micro-markup for i18n is not limited to RDF at all.
> We have had long discussions with XML Schema about it, which have
> led to the 'text' type described in the XML Schema primer (see 
> http://www.w3.org/TR/xmlschema-0/#any), which ended up in the
> XML Schema type library (see
> http://www.w3.org/2001/03/XMLSchema/TypeLibrary.xsd
> http://www.w3.org/2001/03/XMLSchema/TypeLibrary-text.xsd
> http://www.w3.org/2001/03/XMLSchema/TypeLibrary-nn-text.xsd,
> you may have to do 'view source' to get the full picture).
> 
> We have also made review comments to other WGs requesting that
> elements rather than attributes be used for anything that looks
> like natural text rather than enumerations, numeric values, and
> the like, in order to be able to use micro-markup. It is also
> leading to changes from XHTML 1.0 to XHTML 2.0.
> 
> The need for micro-markup in various ways makes it quite important
> that 'text' and 'text-with-markup' are not seen as two completely
> different things, but that text-with-markup, to whatever extent
> possible, be seen and handled as an extension of text. This is even
> more important because the need for micro-markup may often not
> appear for very long, in particular if mainly data of particular
> languages is handled. Therefore, it should be as natural as possible
> to make the transition from plain text to text with micro-markup.
> 
> All this is not just an issue from the view of RDF/XML (Pat's
> view X), but very much also, or actually first and foremost,
> applies to the model and the graph (Pat's view G). The best thing
> would be the way M&S treats literals, with language information,
> and with literals containing markup as a natural extension of
> literals with only plain text. Please note that this does not at
> all preclude other usages of XML content in RDF literals, be it
> larger textual pieces (typical example is documentation) or,
> as some people in this discussion have, to my surprise, suggested,
> as blobs of data-oriented XML. There is no need for every XML literal
> to have a language in the Graph (we just want it to be possible to
> have one), and in RDF/XML, xml:lang="" does the job.
> 
> 
> We understand that there are some implementation problems, primarily
> motivated by the desire to store plain strings as strings without
> escaping, but we see this as an implementation issue, not as a
> reason for making plain literals and XML literals completely
> different things, and having them behave completely differently.
> 
> 
> We see parseType='Literal' as what it says, namely an instruction
> about how to parse RDF/XML, similar to the other values of parseType,
> and not as something that should be directly reflected in the
> graph.
> 
> 
>  From an user point of view, users familar with XML will be surprised
> that xml:lang tag does affect plain literals (as the XML 1.0 REC says)
> but does not affect xml literals. Why are the rules for XML switched
> off just for XML?
> 
> 
> The idea that applications use an application-specific wrapper element
> is in conflict with the idea of micro-markup, which should be used
> only when necessary, and with the concept of markup integrity,
> which means that in some way or another, markup may always be
> significant, and changing it may be unapropriate.
> 
> 
> Procedural
> ----------
> 
> - It is our understanding that RDF Core was chartered with
>    clarifying the RDF M&S spec, not changing it. Already by
>    separating plain literals and XML literals, and much more
>    by removing language information from XML literals, the
>    new spec is a clear change from M&S, rather than a
>    reinterpretation.
> 
> - We agreed in Cannes that the ambiguity in M&S that RDF applications
>    may or may not consider language information would be resolved
>    to that the RDF graph would provide the language information.
> 
> - Later, RDF Core asked us about the problem of integrating
>    arbitrary pieces of XML without language information into
>    an RDF/XML document. The same problem was brought up by
>    XML Signature (or was it encryption) and SOAP. The I18N
>    WG recognized this problem, checked with the experts on
>    language tagging standards, and recommended to XML Core
>    to issue an erratum to define xml:lang="" for this case,
>    which they did.
> 
> - Later, RDF Core asked about the applicability of language
>    information to datatypes such as (XML Schema) integer.
>    We told them that these were designed as language- and
>    locale-independent datatypes, and so it would be appropriate
>    to specify that they did not carry language information.
> 
> - Although this was rather implicit (in the sense of a common
>    understanding that didn't have to be made explicit), I think
>    neither side ever assumed that removing language information
>    from XML Schema simple datatypes would affect plain literals
>    or XML literals.
> 
> - After last call, RDF Core asked us whether we would be okay
>    with removing language information from XML literals. It was
>    nice for them to ask, but it also clearly indicates that they
>    understood it to break our previous agreement. We had a look
>    at it and decided that, for the reasons explained above, it
>    would not be okay. It also helped us to understand that the
>    RDF M&S design for literals had been changed rather substantially,
>    with undesired consequences for internationalization, and that
>    ideally, more than just putting language information back on
>    XML literals was needed, but that if really necessary, we
>    could live with only that change back (to the last call state).
> 
> 
> 
> The bigger picture
> ------------------
> 
> Some people have said that the M&S solution for dealing with
> language information is not very good. We don't deny that there
> may be other, potentially better solutions. We also would have
> been ready (and would still be ready), I guess, to discuss such
> solutions with RDF Core. But we are not ready to trade something
> that we have currently (M&S, or with somewhat less satisfaction,
> the last call status) with something that is according to our
> understanding clearly less consistent and less useful, with
> or without the potential to 'get things fixed' in the future.
> 
> 
> 
> Regards,    Martin.
> 
> 
> 
> 
> 

Received on Thursday, 10 July 2003 05:07:47 UTC