Re: RDF Concepts document Jan 23, lang comments from Tex Texin on 2003-04-04 (www-rdf-comments@w3.org from April to June 2003)

From: Tex Texin <tex@i18nguy.com>
Date: Fri, 04 Apr 2003 03:57:31 -0500
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
CC: Graham Klyne <GK@ninebynine.org>, Tex Texin <tex@XenCraft.com>, www-rdf-comments@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>, "Martin Dürst" <duerst@w3.org>
Message-ID: <3E8D48FB.72DFE4D8@i18nguy.com>
Jeremy, Graham,

Thanks for the responses. Here are my personal replies for now. I want the
i18n wg to review and sanction this response, and I will send you confirmation
as soon as it does (hopefully after next week's telecon).


1) I now understand that the lowercasing of the lang identifier is constrained
to the RDF graph.
The proposed text is a better solution as it makes the specification explicit,
but I would find the test cases adequate to clarify the issue.
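
As an illustration only, here is a minimal Python sketch of what "lowercasing
constrained to the RDF graph" could look like. The names Literal and
make_literal are hypothetical and do not come from any RDF specification or
library. The tag is lowercased when the graph literal is built, so en-US,
en-us and en-Us compare equal, while the source document is never rewritten.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical graph-level literal: lexical form plus optional language tag."""
    lexical_form: str
    language: Optional[str] = None


def make_literal(lexical_form: str, language: Optional[str] = None) -> Literal:
    """Build a graph literal, lowercasing the language tag (assumed ASCII, per RFC 3066)."""
    normalized = language.lower() if language is not None else None
    return Literal(lexical_form, normalized)


# Tags that differ only in case yield equal graph literals.
print(make_literal("colour", "en-GB") == make_literal("colour", "en-gb"))  # True
# Different tags stay different: no prefix matching is implied.
print(make_literal("colour", "en") == make_literal("colour", "en-gb"))     # False
```

An implementation could equally keep the original case and compare tags
case-insensitively; the observable result of the comparison would be the same.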

2) As for the issue of comparing literals, it would be good to postpone the
issue.

I will take up the discussion of comparing or matching identifiers in the i18n
wg and in other venues where identifiers and locales are now being discussed.
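
For reference, the kind of matching at stake is the language range construct
of RFC 3066 section 2.5, which is raised in the quoted message below. The
following is a rough Python sketch of that rule, not something RDF defines;
the function name range_matches is an assumption made for illustration. A
range matches a tag if the two are equal, or if the range is a prefix of the
tag and the next character of the tag is "-"; the range "*" matches any tag.

```python
def range_matches(language_range: str, language_tag: str) -> bool:
    """Sketch of RFC 3066 section 2.5 language-range matching (case-insensitive)."""
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        # The wildcard range matches every tag.
        return True
    # Exact match, or a prefix match that ends on a subtag boundary.
    return tag == rng or tag.startswith(rng + "-")


print(range_matches("en", "en-GB"))   # True
print(range_matches("en", "eng"))     # False: "en" is not followed by "-"
print(range_matches("en-GB", "en"))   # False: matching is not symmetric
```

Note that the match is directional: it says a tag falls within a range, not
that two literals with different tags are equal.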

3) The IRI NFC issue was discussed in a previous mail by Martin Dürst. The
recommendation is to remove the reference to NFC with respect to
IRIs. I'll paraphrase the mail here, because I just noticed that the mail
didn't go to the RDF comments list:

"Making no mention of NFC in IRIs is better, because otherwise RDF may
conflict with the IRI spec, if the definition is tweaked.
On the question of NFC for RDF overall, we have backed up from the position
that everybody MUST check and reject. Check and reject for NFC is now just a
SHOULD.

It is still true, in all cases, that 'applications' (such as RDF) MUST NOT
normalize received text themselves."
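
A minimal Python sketch of the distinction between checking for NFC and
normalizing, under stated assumptions: the function names is_nfc and
accept_text and the strict flag are hypothetical. The check uses the standard
library's unicodedata.normalize only to compare, and the received text is
either rejected or passed through unchanged, never rewritten.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def accept_text(text: str, strict: bool = False) -> str:
    """Check received text for NFC; reject if strict, otherwise pass it on as-is.

    The text is never normalized here: it is returned exactly as received.
    """
    if not is_nfc(text) and strict:
        raise ValueError("received text is not in Normalization Form C")
    return text


decomposed = "e\u0301"   # "e" followed by a combining acute accent (not NFC)
composed = "\u00e9"      # precomposed "e with acute" (NFC)

print(is_nfc(composed))                         # True
print(is_nfc(decomposed))                       # False
print(accept_text(decomposed) == decomposed)    # True: passed through unchanged
```

With strict=True the decomposed string would be rejected with a ValueError,
which corresponds to the "check and reject" behaviour that is now a SHOULD.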


Tex


Jeremy Carroll wrote:
> 
> RDF Core coordination:
>    Brian please assign issue numbers for
>      - language tag case
>      - language ranges
>    Please add Tex to the williams-02 IRI issue.
> 
> Tex, I have specific questions for you in the text below i.e.
> (A)  Would adding a test case(s) suffice for the language tag case issue,
> or do you request a note in the text?
> (B)  Is it worth adding a postponed issue for the language range comment,
> or do you request further WG consideration now, or are you happy to
> withdraw the comment?
> (C)   Do I18N-WG withdraw the advice that URIs in RDF should be in NFC, in
> favour of advice that we should defer to namespaces 1.1?
> 
> Picking up Graham's initial response:
> 
> Graham Klyne wrote:
> 
> > Tex,
> >
> > Thank you for your comments.
> >
> > My co-editor may wish to pick up on some of your points, but meanwhile
> > I'll respond, as he is travelling...
> 
> Thanks Graham ...
> 
> >
> >> 1) The requirement for lang identifiers to be lowercase seems needless
> >> (small
> >> cpu savings) and dangerous.
> >
> >
> > I think there may be a misunderstanding here.  There is no intent to
> > require that language identifiers be lower case in RDF documents.  The
> > lowercasing is applied in the process of creating an RDF graph, which is
> > an abstract syntax upon which the RDF formal semantics is based.
> >
> > [[
> > A literal in an RDF graph contains three components called:
> >
> > The lexical form being a Unicode [UNICODE] string in Normal Form C [NFC].
> > The language identifier as defined by [RFC-3066], normalized to lowercase.
> > The datatype URI being an RDF URI reference.
> > ]]
> >
> > The key phrase here is *in an RDF graph*.  The normalization to lower
> > case is applied precisely to achieve the case-insensitive comparison you
> > request.
> >
> > Please let me know if you think this remains an issue.
> 
> In my view, it does.
> Tex misread the text; we should ensure that that is a misreading and not a
> reading.
> 
> Two things we can do are:
> 1) add new test cases to reflect that
>     en-US en-us and en-Us
>     all mean the same thing.
> This would be easy, and unlikely to meet opposition.
> The test would show that in RDF/XML documents there is no expectation of
> case normalization on language tags; but that however you write the tag, it
> is the same tag.
> 
> 2) [with more potential opposition]
>     Add a note to concepts like:
> [[
> Note: The case normalization of language tags is part of
> the description of the abstract syntax, and implicitly the abstract
> behaviour of RDF applications. It is not intended to constrain an
> RDF implementation to actually normalize the case. Crucially, the result
> of comparing two language tags should not be sensitive to the case of
> the original input.
> ]]
> 
> (A) Tex, would (1) adequately address this issue; or would you strongly
> prefer text along the lines of (2).
> 
> Historical note: earlier drafts did not normalize language tags but had an
> explicit case insensitive comparison (referencing RFC 3066). However, this
> created difficulties in the semantics doc; and the current text provided a
> fix. The alternative would be to have too many instances of 'case
> insensitive comparison of language tags' in the semantics doc.
> 
> >
> >> 2) With respect to the rules for comparing literals:
> >> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
> >
> >
> > You ask for, e.g., (lang="en", str) to be equivalent to (lang="en-gb",
> > str).
> >
> > I would oppose this change because this behaviour is explicitly
> > discouraged by RFC 3066:
> > [[
> > 2.4 Meaning of the language tag
> >
> >    The language tag always defines a language as spoken (or written,
> >    signed or otherwise signaled) by human beings for communication of
> >    information to other human beings.  Computer languages such as
> >    programming languages are explicitly excluded.  There is no
> >    guaranteed relationship between languages whose tags begin with the
> >    same series of subtags; specifically, they are NOT guaranteed to be
> >    mutually intelligible, although it will sometimes be the case that
> >    they are.
> > ]]
> > -- http://www.ietf.org/rfc/rfc3066.txt
> >
> > If you still feel that you would like the WG to consider your request,
> > please let us know and I will ask for it to be raised as a last-call issue.
> >
> 
> Tex seems to assume that we provide some mechanism for supporting such
> comparisons which are discussed in RFC 3066. Specifically the language
> range construct of section 2.5 of RFC 3066. We don't. RDF Model & Syntax
> didn't. It was not in our charter to explore such mechanisms.
> 
> If the I18N-WG felt it important we could add a postponed issue to the RDF
> issue list on this one. This would put down a marker for the future.
> 
> (B) Tex, would you want a postponed issue?
> Do you want us to consider language ranges more widely?
> Or are you happy with the response, no we don't do that?
> 
> >> 3) "RDF URI References" are defined and are essentially IRI. It would be
> >> better if the spec could simply cite the upcoming IRI spec, ...
> >
> >
> > > This issue has already been raised to the WG as "williams-02"
> > [http://www.w3.org/2001/sw/RDFCore/20030123-issues/#williams-02]
> >
> 
> A concern is that the current text partially reflects a meeting between RDF
> Core and I18N WGs at the Cannes plenary 2002. At that meeting the I18N WG
> were supportive of the constraint on IRIs that they should be in NFC.
> A plausible resolution to the williams issue is to propose that we rename
> "RDF URI references" as "IRI" and defer to XML Namespaces 1.1.
> This will have the substantive change of permitting identifiers which are
> not in normal form C; these will be expressible in the abstract syntax, and
> in XML 1.0; but not in XML 1.1 (which is fully normalized).
> I also note that XML Namespaces 1.1 uses the term "IRI" to include IRI
> references.
> 
> A further note to this discussion is that there is deep opposition within
> the WG to citing a draft from a normative document; and so Tex's apparent
> preference for citing the IRI draft will not gather consensus.
> 
> (C) I request that I18N WG clearly indicate their preference here.
> 
> > If you feel that consideration of this issue does not address your
> > concern, please let us know that we can raise it as a separate issue.
> >
> > ...
> >
> > In summary, I have not yet requested that any of these be raised as
> > formal last-call issues for the reasons given.  If you find the reasons
> > less-than-convincing, please reply and I shall take appropriate steps to
> > have them considered.
> 
> My summary: in the absence of a reply I will propose to the WG:
> - adding test cases on the language tag case issue (but not any note)
> - dropping the language range issue
> - formally requesting a response from I18N-WG on the IRI NFC sub-issue.
> 
> Jeremy

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Friday, 4 April 2003 03:59:46 UTC