
Re: JJC's take on I18N concerns

From: Martin Duerst <duerst@w3.org>
Date: Thu, 14 Aug 2003 16:32:13 -0400
To: Jeremy Carroll <jjc@hpl.hp.com>, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Cc: swick@w3.org, timbl@w3.org, sandro@w3.org, djweitzner@w3.org

Hello Jeremy,

Just to make sure, here are some responses:

At 21:32 03/08/13 +0300, Jeremy Carroll wrote:

>This is a reply to
>which was (AFAICT) endorsed by I18N at their recent telecon.
>I am also copying the recipients of
>(other than those who I believe are already on the To lists)
>1. The current approach fails to preserve markup integrity for XML
>literals that have been scraped or obtained from another repository.
>I18N is not convinced that there will not be use cases where markup
>integrity is important, and that the current approach will amount to an
>insuperable issue in those situations.
>A simple reversible algorithm for XHTML family is:
>- take the XML fragment
>- take the enclosing lang tag
>- wrap the XML fragment with a span element if legal, or otherwise a div
>- apply the language tag to the span element
>This algorithm needs to be applied systematically. In particular it must be
>applied to XML content consisting of precisely a span or a div element. This
>then ensures that the algorithm is reversible. Given reversibility markup
>integrity can be preserved.

This algorithm is restricted to the XHTML family,
and as you say, would need to be applied systematically.
Which spec will give the details, and which spec will
say that it has to be applied?
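
To make the discussion concrete, here is a minimal sketch of the wrapping
algorithm quoted above, in Python. The function names wrap() and unwrap()
and the "inline" flag (standing in for "a span is legal here") are my own
assumptions for illustration; no spec defines them. The fragment is treated
as an opaque string, and the wrapper is always added, even when the fragment
is itself exactly one span or div, which is what makes the step reversible.

```python
import re
from typing import Tuple

# Hypothetical helpers: the names and the "inline" flag are illustrative only.
_LANG = re.compile(r"\A[A-Za-z0-9-]+\Z")
_WRAPPER = re.compile(
    r'\A<(span|div) xml:lang="([A-Za-z0-9-]+)">(.*)</\1>\Z', re.DOTALL
)


def wrap(fragment: str, lang: str, inline: bool = True) -> str:
    """Always add one outer span (or div) carrying the enclosing language."""
    if not _LANG.match(lang):
        raise ValueError("unexpected characters in language tag")
    tag = "span" if inline else "div"
    return f'<{tag} xml:lang="{lang}">{fragment}</{tag}>'


def unwrap(wrapped: str) -> Tuple[str, str]:
    """Strip exactly one wrapper added by wrap(); return (fragment, lang)."""
    m = _WRAPPER.match(wrapped)
    if m is None:
        raise ValueError("input was not produced by wrap()")
    return m.group(3), m.group(2)


original = '<span xml:lang="fr">bonjour</span>'
wrapped = wrap(original, "en")   # wrapped even though it is already a span
fragment, lang = unwrap(wrapped)
```

Because unwrap() removes exactly one layer, the original fragment comes back
unchanged only if every producer applies wrap() without exception, which is
why the question of which spec mandates it matters.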

>For non-xhtml markup see 3.
>2. I18N feels that the currently proposed implementation is overly
>complicated for the user, and that this will introduce a strong risk
>that users do not implement language information properly.
>RDF Core had feedback against other implementations on grounds of their
>complexity. This was a tradeoff decision.

I think it is complexity for the user (somebody writing RDF (RDF/XML
or otherwise) or scraping,...) vs. complexity for implementers of
core software.

>3.  The current approach assumes the existence of constructs to describe
>language and carry language information in the native markup associated
>with a fragment.  Such constructs may not exist, in which case it seems
>impossible to ascribe such information at a meta level.  I18N feels that
>such a situation is very bad.
>RDF Core only has compelling use cases for XHTML and friends.
>A markup intended to carry natural language without the ability to use XHTML
>constructs and without the ability to add arbitrary language markup is
>deficient, and RDF Core is not tasked with correcting those deficiencies.

RDF Core is not tasked to correct these deficiencies, and if they
exist, they are indeed deficiencies. This is not a strong argument
from our side, just an additional point.

>4. It seems to I18N that it will be difficult to convert rdf created
>using the old syntax to the new syntax. Where legacy documents simply
>declared xml:lang at the top of the file, they will now have to declare
>it for every XML literal.  Also, there is no provision for automatic
>conversion from the old to the new syntax.
>Old style was vague, no indication that xhtml namespace needed declaring
>(predated xhtml?). Not really useable because of such problems, certainly not
>in a portable fashion. The old spec is sufficiently bad to make this problem
>a non-starter since it is not clear what old style xml literal are supposed
>to mean, particularly the treatment of namespaces. Also old spec was somewhat
>unclear how language was supposed to be treated.

There definitely were some vaguenesses, but we agreed on what these
were in the area of language tagging at the Technical Plenary in Cannes.
The lastcall draft has clarified these.

>5. I18N considers that it should be possible to conclude that a plain
>literal and an XML literal without markup are the same text. Introducing
>language markup as proposed in the current solution makes this
>impossible, since it is never clear whether the markup was in the
>original text or not.
>This seems like a more sophisticated language string oriented feature that
>belongs near the postponed issue
>I think RDF Core could consider broadening the scope of the postponed issue

This can be seen as part of a broader issue. But then likewise,
the equality of two plain literals could be seen as part of
the broader issue of matching substrings among plain literals.

>6. I18N has not been convinced that either of the alternative proposals
>for including language information are problematic, and feels they are
>more intuitive and workable than the current proposal because they do
>not entail the problems cited above.
>I think this is answered by Sandro
>There's a serious concern that people who don't care about XML won't
>bother to implement these bits if they are bolted onto the side
>like that.  As just another datatype, it fits in smoothly, with no
>particular extra work required.  (except for that language tag...)
>Would you rather many implementations not support XML at all?
>(Perhaps not really a fair question....)

Implementing XML Literals right is basically just a combination of
plain literals and datatyped literals. So it's not that difficult
to implement.
I'm glad to volunteer for implementing XML Literals with language
information in one RDF implementation (I did this for normalization
checking for XML 1.1 over a weekend, and my current guess is that
implementing XML Literals with language information is easier
than implementing normalization checking).
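
As a rough illustration of the "combination" point, here is a sketch in
Python of one literal structure that covers all three cases. The class name
Literal and its field names are assumptions made for this example, not taken
from any RDF implementation, and the model reflects what is being argued for
in this thread rather than any published data model.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype URI for XML literals in the RDF namespace.
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: lexical form plus optional language and datatype."""
    lexical: str
    lang: Optional[str] = None       # as carried by a plain literal
    datatype: Optional[str] = None   # as carried by a datatyped literal


plain = Literal("chat", lang="fr")
typed = Literal("42", datatype="http://www.w3.org/2001/XMLSchema#integer")
# An XML literal with language information simply fills in both slots.
xml_fr = Literal("<em>chat</em>", lang="fr", datatype=XML_LITERAL)
xml_en = Literal("<em>chat</em>", lang="en", datatype=XML_LITERAL)
```

With both slots stored, equality and hashing come for free from the
dataclass, so two XML literals with the same markup but different language
tags compare as different values.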

Also, there are numerous examples of specs where it took some time
to let implementations catch on. This is particularly true for i18n.

Regards,    Martin.
Received on Thursday, 14 August 2003 17:33:41 UTC
