Re: JJC's take on I18N concerns from pat hayes on 2003-08-15 (w3c-rdfcore-wg@w3.org from August 2003)

From: pat hayes <phayes@ihmc.us>
Date: Fri, 15 Aug 2003 10:16:29 -0700
To: Martin Duerst <duerst@w3.org>
Cc: Jeremy Carroll <jjc@hpl.hp.com>, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Message-Id: <p06001a01bb629cc1c63c@[10.0.1.2]>
>Hello Jeremy,
>
>Just to make sure, here are some responses:
>
>At 21:32 03/08/13 +0300, Jeremy Carroll wrote:

I agree with Jeremy; further comments below.

>
>>This is a reply to
>>
>>http://lists.w3.org/Archives/Member/w3c-i18n-ig/2003Aug/0022
>>
>>which was (AFAICT) endorsed by I18N at their recent telecon.
>>
>>I am also copying the recipients of
>>http://lists.w3.org/Archives/Member/w3c-archive/2003Aug/0027
>>(other than those who I believe are already on the To lists)
>>
>>
>>[[
>>1. The current approach fails to preserve markup integrity for XML
>>literals that have been scraped or obtained from another repository.
>>I18N is not convinced that there will not be use cases where markup
>>integrity is important, and that the current approach will amount to an
>>insuperable issue in those situations.
>>]]
>>
>>A simple reversible algorithm for XHTML family is:
>>- take the XML fragment
>>- take the enclosing lang tag
>>- wrap the XML fragment with a span element if legal, or otherwise a div
>>element.
>>- apply the language tag to the span element
>>
>>This algorithm needs to be applied systematically. In particular it must be
>>applied to XML content consisting of precisely a span or a div element. This
>>then ensures that the algorithm is reversible. Given reversibility markup
>>integrity can be preserved.
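
A minimal sketch of the wrap/unwrap round trip described above, for illustration only. The function names, the use of an xml:lang attribute, and the string-level regex matching are assumptions made here; no spec defines this procedure, and real code would operate on parsed XML rather than strings.

```python
import re
from typing import Tuple

def wrap(fragment: str, lang: str, inline: bool = True) -> str:
    """Wrap an XML fragment in a span (or a div where span is not legal)
    and attach the enclosing language tag to the wrapper."""
    tag = "span" if inline else "div"
    return f'<{tag} xml:lang="{lang}">{fragment}</{tag}>'

# Matches exactly one outermost wrapper produced by wrap().
_WRAPPER = re.compile(r'^<(span|div) xml:lang="([^"]*)">(.*)</\1>$', re.DOTALL)

def unwrap(wrapped: str) -> Tuple[str, str]:
    """Strip the outermost wrapper, returning (original fragment, lang)."""
    m = _WRAPPER.match(wrapped)
    if m is None:
        raise ValueError("not a wrapped literal")
    return m.group(3), m.group(2)

# The wrapper is added even when the fragment is itself a single span,
# which is what keeps the transformation reversible.
original = "<span>bonjour</span>"
wrapped = wrap(original, "fr")
restored, lang = unwrap(wrapped)
```

Because unwrap always removes exactly one layer, a fragment that already consisted of a single span or div comes back unchanged.
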
>
>This algorithm is restricted to the XHTML family,
>and as you say, would need to be applied systematically.
>Which spec will give the details, and which spec will
>say that it has to be applied?
>
>>For non-xhtml markup see 3.
>>
>>[[
>>2. I18N feels that the currently proposed implementation is overly
>>complicated for the user, and that this will introduce a strong risk
>>that users do not implement language information properly.
>>]]
>>
>>RDF Core had feedback against other implementations on grounds of their
>>complexity. This was a tradeoff decision.
>
>I think it is complexity for the user (somebody writing RDF (RDF/XML
>or otherwise) or scraping,...) vs. complexity for implementers of
>core software.

You seem to be operating under assumptions about RDF/XML which are 
inappropriate. We do not anticipate that the typical RDF user will 
ever write or even see RDF/XML. In the RDF design it plays the role 
of a machine-oriented data transmission notation; it will be as 
remote from user experience as Javascript code is from the typical 
user of a web browser. The readability of RDF/XML is relevant only to 
implementers who are concerned with debugging their software.

>>[[
>>3.  The current approach assumes the existence of constructs to describe
>>language and carry language information in the native markup associated
>>with a fragment.  Such constructs may not exist, in which case it seems
>>impossible to ascribe such information at a meta level.

It is certainly not impossible, though it may indeed be ad-hoc. For 
example, one could use a construction like this:

_:x  ex:textIs "somepieceofXMLscrapedfromsomewhere"^^rdf:XMLLiteral .
_:x  ex:language "fr" .

where the lang tag is coded as a plain literal.
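
As a rough sketch of how an application might consume such a pair of triples: the tuple representation and the helper language_of below are hypothetical, and ex:textIs and ex:language are only the example properties used above, not standard vocabulary.

```python
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

# Each triple is (subject, predicate, object); a literal object is
# (lexical form, datatype URI), with None marking a plain literal.
graph = [
    ("_:x", "ex:textIs", ("somepieceofXMLscrapedfromsomewhere", XML_LITERAL)),
    ("_:x", "ex:language", ("fr", None)),
]

def language_of(triples, lexical):
    """Return the language recorded for the node whose ex:textIs value
    has the given lexical form, or None if nothing is recorded."""
    for s, p, o in triples:
        if p == "ex:textIs" and o[0] == lexical:
            for s2, p2, o2 in triples:
                if s2 == s and p2 == "ex:language":
                    return o2[0]
    return None

lang = language_of(graph, "somepieceofXMLscrapedfromsomewhere")
```
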

>>I18N feels that
>>such a situation is very bad.
>>]]
>>
>>RDF Core only has compelling use cases for XHTML and friends.
>>A markup intended to carry natural language without the ability to use XHTML
>>constructs and without the ability to add arbitrary language markup is
>>deficient, and RDF Core is not tasked with correcting those deficiencies.
>
>RDF Core is not tasked to correct these deficiencies, and if they
>exist, they are indeed deficiencies. This is not a strong argument
>from our side, but just some additional point.
>
>>[[
>>4. It seems to I18N that it will be difficult to convert rdf created
>>using the old syntax to the new syntax. Where legacy documents simply
>>declared xml:lang at the top of the file, they will now have to declare
>>it for every XML literal.  Also, there is no provision for automatic
>>conversion from the old to the new syntax.
>>]]
>>
>>Old style was vague, no indication that xhtml namespace needed declaring
>>(predated xhtml?). Not really useable because of such problems, certainly not
>>in a portable fashion. The old spec is sufficiently bad to make this problem
>>a non-starter since it is not clear what old style xml literals are supposed
>>to mean, particularly the treatment of namespaces. Also old spec was somewhat
>>unclear how language was supposed to be treated.
>
>There definitely were some vaguenesses, but we agreed on what these
>were in the area of language tagging at the Technical Plenary in Cannes.
>The lastcall draft has clarified these.

But there seems to be a basic assumption lying behind your point, 
that the current spec must be backwards compatible with M&S. This is 
impossible: indeed, our WG charter is to clarify and rationalize the 
M&S spec, which was known to be confused when the WG was founded.

>>[[
>>5. I18N considers that it should be possible to conclude that a plain
>>literal and an XML literal without markup are the same text.

They are clearly the same *text*; the literal strings will be 
identical. So, taken literally, your point here is already satisfied. 
If you meant to say that a plain literal, and an XML literal without 
markup, could denote the same value, then this raises a more complex 
issue concerning datatyping.  I would suggest that you raise this 
matter with the XML Schema group, since they have devoted 
considerable time and effort to considerations of identity between 
items in the value spaces of datatypes.  The consensus in XML Schema 
1.0 was that all such value spaces should be considered to be 
disjoint. I gather that the story in 1.1 will be more complex, and 
involve a rather subtle distinction between equality and identity.

>>Introducing
>>language markup as proposed in the current solution makes this
>>impossible, since it is never clear whether the markup was in the
>>original text or not.

That seems to be a weak argument. For example, the use of a 
distinctive wrapper (as suggested by Jeremy at one point) would 
probably suffice to keep the distinction clear, in practice.  But in 
any case, surely this would apply equally well to XML literals with 
language tags?

>>]]
>>This seems like a more sophisticated language string oriented feature that
>>belongs near the postponed issue
>>http://www.w3.org/2000/03/rdf-tracking/#rdfs-lang-vocab
>>
>>I think RDF Core could consider broadening the scope of the postponed issue
>
>This can be seen as part of a broader issue. But then likewise,
>the equality of two plain literals could be seen as part of
>the broader issue of matching substrings among plain literals.
>
>>[[
>>6. I18N has not been convinced that either of the alternative proposals
>>for including language information is problematic, and feels they are
>>more intuitive and workable than the current proposal because they do
>>not entail the problems cited above.
>>]]
>>I think this is answered by Sandro
>>http://lists.w3.org/Archives/Public/www-archive/2003Aug/0004
>>[[
>>There's a serious concern that people who don't care about XML won't
>>bother to implement these bits if they are bolted onto the side
>>like that.  As just another datatype, it fits in smoothly, with no
>>particular extra work required.  (except for that language tag...)
>>Would you rather many implementations not support XML at all?
>>(Perhaps not really a fair question....)
>>]]
>
>Implementing XML Literals right is basically just a combination of
>plain literals and datatyped literals. So it's not that difficult
>to implement.

It is trickier than you might think when your implementation has to 
interoperate with other SW standards, such as OWL. For example, does 
the owl:Class containing

"nomarkuphere"
"nomarkuphere"^^rdf:XMLLiteral

have one or two items in it? How about the class containing

"nomarkuphere"
"nomarkuphere"^^ex:someDatatype

? You must provide an answer.  Does the answer change when you later 
discover that

ex:someDatatype owl:sameAs rdf:XMLLiteral .

? It should not change.  These are the kinds of question that are 
settled by a formal semantics; so in order to make sure that a 
feature is implementable it is not sufficient to implement one app; 
you have to provide a formal semantics for the feature.
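
To make the question concrete, here is a small sketch of one possible answer, in which a typed literal is never identified with a plain literal, so the count stays the same once the owl:sameAs fact is learned. The Literal type, the alias map, and count_members are assumptions made for illustration; this is not the RDF or OWL semantics, only a toy model of the stability requirement.

```python
from typing import Dict, Iterable, NamedTuple, Optional

class Literal(NamedTuple):
    lexical: str
    datatype: Optional[str]  # None marks a plain literal

def canonical(lit: Literal, same_as: Dict[str, str]) -> Literal:
    """Replace the datatype name by its canonical alias, if one is known."""
    dt = lit.datatype
    seen = set()
    while dt in same_as and dt not in seen:
        seen.add(dt)
        dt = same_as[dt]
    return Literal(lit.lexical, dt)

def count_members(literals: Iterable[Literal], same_as: Dict[str, str]) -> int:
    """Count distinct items, treating plain and typed literals as distinct."""
    return len({canonical(lit, same_as) for lit in literals})

members = [
    Literal("nomarkuphere", None),
    Literal("nomarkuphere", "ex:someDatatype"),
]
before = count_members(members, {})
after = count_members(members, {"ex:someDatatype": "rdf:XMLLiteral"})
```

In this model both counts are two. A rule that merged markup-free XML literals with plain literals would instead give two before the owl:sameAs fact and one after it.
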

BTW, there is a real problem in having entailments depend on the 
detailed syntactic form of literal strings (which they would if you 
want XML literals without markup to be identical to plain literals). 
The whole point of datatyping is that each datatype provides criteria 
for identity between lexical forms which can be checked efficiently 
and locally. If the criteria vary depending on the form of the 
string, then one has effectively created multiple datatypes. If we 
were to say that XML strings without markup were identical to simple 
character strings, but those with markup were a distinct category, 
then this amounts to having two datatypes, rdf:XMLLiteralWithMarkup 
and rdf:XMLLiteralWithoutMarkup, the latter being a subclass of 
xsd:string and the former being disjoint from it. We could have done 
this, but (apart from the obvious artificiality) it seems to be of 
little utility, since one could use simple literals or xsd:string to 
type the lexical forms which have no markup and get the very same 
entailments, in this case.
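
A sketch of why a form-dependent identity rule amounts to two datatypes: the rule has to inspect the lexical form first and then dispatch. The two datatype names are the hypothetical ones mentioned above, and the test for markup (looking for a "<") is a deliberately crude assumption.

```python
WITH_MARKUP = "rdf:XMLLiteralWithMarkup"
WITHOUT_MARKUP = "rdf:XMLLiteralWithoutMarkup"

def has_markup(lexical: str) -> bool:
    """Crude stand-in for 'contains XML markup'."""
    return "<" in lexical

def effective_datatype(lexical: str) -> str:
    """The datatype an XML literal effectively has under a rule whose
    identity criteria depend on the form of the string."""
    return WITH_MARKUP if has_markup(lexical) else WITHOUT_MARKUP

def equals_plain(xml_lexical: str, plain: str) -> bool:
    """Under that rule, only markup-free XML literals can be identical
    to a plain literal with the same characters."""
    return effective_datatype(xml_lexical) == WITHOUT_MARKUP and xml_lexical == plain
```
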

I confess to finding this entire issue of XML compliance rather 
boring, but I have always understood, from the WG discussions on this 
issue, that the whole point of XML literal typing was to distinguish 
text which came from an XML 'source' from text which simply happened 
to be like XML in some way, but was not itself genuine XML. Your 
insistence, that genuine XML  should be indistinguishable from faux 
XML in the case that neither contains XML markup, seems to be 
fundamentally at odds with this basic idea of distinguishing 'real 
XML' from mere text.

Pat

-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 15 August 2003 13:16:36 UTC