Re: Comment on strings and languages in RDF from Graham Klyne on 2003-06-06 (w3c-rdfcore-wg@w3.org from June 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 06 Jun 2003 14:01:17 +0100
To: Martin Duerst <duerst@w3.org>, w3c-rdfcore-wg@w3.org
Cc: w3c-i18n-ig@w3.org, swick@w3.org
Message-Id: <5.1.0.14.2.20030606130832.02f79ee8@127.0.0.1>
This is a complicated issue, and I may have overlooked some of the reasons 
we are where we are, but I think Martin's description [1], second part 
(starting with "The orgininal RDF spec 
(http://www.w3.org/TR/1999/REC-rdf-syntax-19990222)had very few variation 
for literals."), is useful and well-presented background.

I think the arguments made reinforce the idea of losing XMLLiteral as a 
datatype, which I previously [2] argued in favour of in response to 
Martin's earlier comments [5], which were themselves in response to the 
decision [3] [4] to drop language tags and <rdf-wrapper> stuff from XML 
literals.

So, along the line, it seems we decided:
(a) that XML C14N was a parser issue, and the abstract graph syntax is 
presumed to contain only canonical XML literal data,
(b) to drop the language tag from XML literals, hence the need for 
<rdf-wrapper> in the abstract syntax.

Martin argues that (1) XML literals shouldn't be gratuitously different 
than plain literals, and (2) XML literals really need to have language tags.

Thus, I propose that parsetype='Literal' values be treated as plain 
literals, and that the parsetype='Literal' is simply a flag to the parset 
to not try to interpret the contained XML data as any form of RDF 
description.  The value of a literal thus described is simply the sequence 
of characters (after C14N applied to the RDF/XML) contained within the 
corresponding element, together with the in-scope XML language tag (if any).

The current approved definitions, per [6] [7] make it clear that a plain 
literal denotes either a string or a pair of two strings, one of which is a 
language tags.

After that, there's one final wrinkle raised by Martin's last message, 
namely the status of xsd:string datatyped literal values.  My view on these 
is that they are syntactically distinct values in the RDF graph, but under 
a suitable datatyped interpretation will denote the same thing as a 
corresponding plain literal without language tag.

#g
--

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jun/0023.html

[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0203.html

[3] Meeting minutes 9-May-2003 (item)
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0200.html

[4] Jeremy's outline proposal, option 4 approved per [3]
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0138.html

[5] Martin's comments in response to this decision:
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0200.html

[6] Meeting minutes 16-May-2003
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0199.html

[7] Jeremy's proposal for revised literal interpretation,
     broadly accepted per [6]:
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0151.html

[8] Meeting minutes 28-Mar-2003 (item 13):
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0199.html

[9] C14N proposal accepted per [5]:
     http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003May/0199.html


At 17:50 05/06/03 -0400, Martin Duerst wrote:

>Dear RDF WG,
>
>I have been actioned by the I18N WG (Core TF) to write to you.
>
>This is partially a Last Call comment, and partially a comment
>on your recently announced post-Last Call changes. It affects
>several of your specifications.
>
>On its recent teleconference, the I18N WG (Core TF) agreed with
>my summary of the situation in
>http://www.w3.org/mid/4.2.0.58.J.20030605145023.06c05ce0@localhost.
>
>In particular, we have looked at the current (both in the Last
>Call as well as in your later proposal) status of string and
>language handling in RDF literals (plain literals, XML literals,
>typed literals of XML Schema Datatype 'string').
>
>The core arguments for our case are contained in the above email,
>but I'll copy them here for your easy reference:
>
> >>>>>>>>
>This situation is not at all satisfactory from the viewpoint
>of I18N because:
>- We have worked hard to eliminate artificial differences between
>   text strings that are essentially the same:
>   - by basing XML and RDF on Unicode, and therefore eliminating
>     differences in character encoding.
>   - by working on normalization (NFC) to reduce or avoid accidental
>     differences based on remaining encoding choices in Unicode
>   It would be very bad if after all that work, we were left with
>   gratuitously different ways of representing textual strings due
>   to idiosyncrasies of a type system.
>
>- Language tagging is an important aspect of internationalization.
>   Also, small-scale markup is important for internationalization
>   (multilanguage strings, bidirectionality, ruby, glyph variants,...).
>   Both are in many ways natural extensions of plain text strings
>   as soon as markup is available.
>
>   The current handling of XML literal strings without any actual
>   markup, as well as the recent change to ignore xml:lang on XML
>   literals, break this natural extension.
>
>   In addition, the recent change to ignore xml:lang on XML
>   literals makes language tagging more tedious in the prevalent
>   case of monolingual or mostly monolingual data.
> >>>>>>>>
>
>
>We think that this is a very important issue for RDF and I18N,
>and strongly urge you to find a better solution. We think the
>proposal given by Ralph is a very good start, but we are sure
>you will have other ideas.
>
>
>With kind regards,     Martin.

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Friday, 6 June 2003 09:15:48 UTC