Re: RDF Concepts document Jan 23, lang comments from Jeremy Carroll on 2003-03-11 (www-rdf-comments@w3.org from January to March 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Tue, 11 Mar 2003 08:36:26 +0000
To: Graham Klyne <GK@ninebynine.org>
CC: Tex Texin <tex@XenCraft.com>, Jeremy Carroll <jjc@hplb.hpl.hp.com>, www-rdf-comments@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
Message-ID: <3E6DA00A.3030208@hpl.hp.com>
RDF Core coordination:
   Brian please assign issue numbers for
     - language tag case
     - language ranges
   Please add Tex to the williams-02 IRI issue.

Tex, I have specific questions for you in the text below, namely:
(A) Would adding one or more test cases suffice for the language tag case 
issue, or do you request a note in the text?
(B) Is it worth adding a postponed issue for the language range comment, 
or do you request further WG consideration now, or are you happy to 
withdraw the comment?
(C) Does the I18N-WG withdraw its advice that URIs in RDF should be in NFC, 
in favour of advice that we should defer to Namespaces 1.1?

Picking up Graham's initial response:

Graham Klyne wrote:

> Tex,
> 
> Thank you for your comments.
> 
> My co-editor may wish to pick up on some of your points, but meanwhile 
> I'll respond, as he is travelling...


Thanks Graham ...


> 
>> 1) The requirement for lang identifiers to be lowercase seems needless 
>> (small
>> cpu savings) and dangerous.
> 
> 
> I think there may be a misunderstanding here.  There is no intent to 
> require that language identifiers be lower case in RDF documents.  The 
> lowercasing is applied in the process of creating an RDF graph, which is 
> an abstract syntax upon which the RDF formal semantics is based.
> 
> [[
> A literal in an RDF graph contains three components called:
> 
> The lexical form being a Unicode [UNICODE] string in Normal Form C [NFC].
> The language identifier as defined by [RFC-3066], normalized to lowercase.
> The datatype URI being an RDF URI reference.
> ]]
> 
> The key phrase here is *in an RDF graph*.  The normalization to lower 
> case is applied precisely to achieve the case-insensitive comparison you 
> request.
> 
> Please let me know if you think this remains an issue.


In my view, it does.
Tex misread the text; we should ensure that it is clearly a misreading and 
not a legitimate reading.

Two things we can do are:
1) add new test cases to reflect that
    en-US, en-us and en-Us
    all mean the same thing.
This would be easy, and unlikely to meet opposition.
The tests would show that there is no expectation of case normalization 
of language tags in RDF/XML documents, but that however you write the 
tag, it is the same tag.
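Roughly, such test cases would check the following. This is only a sketch: 
the Literal record and make_literal helper are hypothetical (not taken from 
any RDF toolkit), and they lowercase the tag when a literal enters the 
graph, as the Concepts text describes:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of an RDF literal in the abstract syntax."""
    lexical_form: str
    language: Optional[str] = None  # held lowercased in the graph
    datatype: Optional[str] = None


def make_literal(lexical_form: str, lang: Optional[str] = None) -> Literal:
    # Whatever case the RDF/XML document used for xml:lang, the graph
    # holds the lowercased tag (RFC 3066 tags are ASCII, so lower() suffices).
    return Literal(lexical_form, lang.lower() if lang is not None else None)


# The three spellings from the proposed test cases give one literal.
tags = ["en-US", "en-us", "en-Us"]
literals = {make_literal("colour", t) for t in tags}
print(len(literals))  # 1
```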

2) [with more potential opposition]
    Add a note to concepts like:
[[
Note: The case normalization of language tags is part of
the description of the abstract syntax, and implicitly the abstract
behaviour of RDF applications. It is not intended to constrain an
RDF implementation to actually normalize the case. Crucially, the result
of comparing two language tags should not be sensitive to the case of
the original input.
]]
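Such a note would permit, for instance, an implementation that keeps each 
tag exactly as written and only folds case when comparing. A rough sketch, 
using a hypothetical LangTag class:

```python
class LangTag:
    """Hypothetical wrapper: keeps the tag as written, compares case-insensitively."""

    def __init__(self, text: str) -> None:
        self.text = text  # original spelling preserved, e.g. for re-serialization

    def _key(self) -> str:
        # RFC 3066 tags are ASCII, so ASCII lowercasing is enough here.
        return self.text.lower()

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, LangTag):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Must agree with __eq__ so equal tags collapse in sets and dicts.
        return hash(self._key())

    def __repr__(self) -> str:
        return f"LangTag({self.text!r})"


a, b = LangTag("en-US"), LangTag("en-us")
print(a == b, a.text, b.text)  # True en-US en-us
```

Whether the tag is stored lowercased or folded at comparison time, the 
result of comparing two tags is the same, which is what the note is meant 
to guarantee.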

(A) Tex, would (1) adequately address this issue, or would you strongly 
prefer text along the lines of (2)?

Historical note: earlier drafts did not normalize language tags but had an 
explicit case-insensitive comparison (referencing RFC 3066). However, this 
created difficulties in the semantics doc, and the current text provided a 
fix. The alternative would have been to repeat 'case-insensitive comparison 
of language tags' in too many places in the semantics doc.


> 
>> 2) With respect to the rules for comparing literals:
>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
> 
> 
> You ask for, e.g., (lang="en", str) to be equivalent to (lang="en-gb", 
> str).
> 
> I would oppose this change because this behaviour is explicitly 
> discouraged by RFC 3066:
> [[
> 2.4 Meaning of the language tag
> 
>    The language tag always defines a language as spoken (or written,
>    signed or otherwise signaled) by human beings for communication of
>    information to other human beings.  Computer languages such as
>    programming languages are explicitly excluded.  There is no
>    guaranteed relationship between languages whose tags begin with the
>    same series of subtags; specifically, they are NOT guaranteed to be
>    mutually intelligible, although it will sometimes be the case that
>    they are.
> ]]
> -- http://www.ietf.org/rfc/rfc3066.txt
> 
> If you still feel that you would like the WG to consider your request, 
> please let us know and I will ask for it to be raised as a last-call issue.
> 


Tex seems to assume that we provide some mechanism for supporting such 
comparisons, which are discussed in RFC 3066: specifically, the language 
range construct of section 2.5. We don't. RDF Model & Syntax didn't. It 
was not in our charter to explore such mechanisms.
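To make the distinction concrete, here is a rough sketch (the function 
names are hypothetical, not from any RDF API) of prefix matching after 
RFC 3066 section 2.5, next to the plain equality test that RDF literal 
comparison uses. Only the second is what RDF provides:

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Match a language-range against a tag, after RFC 3066 section 2.5.

    The range matches if it equals the tag, or equals a prefix of the tag
    followed by "-". "*" matches any tag. Case-insensitive, as tags are.
    """
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


def rdf_lang_equal(a: str, b: str) -> bool:
    # RDF literal equality: tags must be identical after case
    # normalization; no range or prefix semantics.
    return a.lower() == b.lower()


print(range_matches("en", "en-GB"))   # True
print(rdf_lang_equal("en", "en-GB"))  # False
print(range_matches("en", "eng"))     # False: prefix must end at a "-"
```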

If the I18N-WG felt it important we could add a postponed issue to the RDF 
issue list on this one. This would put down a marker for the future.

(B) Tex, would you want a postponed issue?
Do you want us to consider language ranges more widely?
Or are you happy with the response "no, we don't do that"?


>> 3) "RDF URI References" are defined and are essentially IRI. It would be
>> better if the spec could simply cite the upcoming IRI spec, ...
> 
> 
> This issue has already been raised to the WG as "williams-02" 
> [http://www.w3.org/2001/sw/RDFCore/20030123-issues/#williams-02]
> 



A concern is that the current text partially reflects a meeting between RDF 
Core and I18N WGs at the Cannes plenary 2002. At that meeting the I18N WG 
were supportive of the constraint on IRIs that they should be in NFC.
A plausible resolution to the williams issue is to propose that we rename 
"RDF URI references" as "IRIs" and defer to XML Namespaces 1.1.
This would bring the substantive change of permitting identifiers which 
are not in Normal Form C; these would be expressible in the abstract 
syntax and in XML 1.0, but not in XML 1.1 (which is fully normalized).
I also note that XML Namespaces 1.1 uses the term "IRI" to include IRI 
references.
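For illustration, here are two hypothetical identifiers (example.org, not 
from any real vocabulary) that look identical but differ as character 
strings; only the first is in NFC. Under the current NFC constraint only 
the first is a legal RDF URI reference; after deferring to Namespaces 1.1, 
both would be expressible, and (assuming character-by-character 
comparison) they would be distinct. The sketch needs Python 3.8+ for 
unicodedata.is_normalized:

```python
import unicodedata

composed = "http://example.org/caf\u00e9"     # é as one code point (U+00E9)
decomposed = "http://example.org/cafe\u0301"  # e + COMBINING ACUTE ACCENT (U+0301)

print(composed == decomposed)                                # False
print(unicodedata.is_normalized("NFC", composed))            # True
print(unicodedata.is_normalized("NFC", decomposed))          # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```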

A further point: there is deep opposition within the WG to citing a draft 
from a normative document, so Tex's apparent preference for citing the IRI 
draft will not gain consensus.

(C) I request that I18N WG clearly indicate their preference here.


> If you feel that consideration of this issue does not address your 
> concern, please let us know so that we can raise it as a separate issue.
> 
> ...
> 
> In summary, I have not yet requested that any of these be raised as 
> formal last-call issues for the reasons given.  If you find the reasons 
> less-than-convincing, please reply and I shall take appropriate steps to 
> have them considered.


My summary: in the absence of a reply I will propose to the WG:
- adding test cases on the language tag case issue (but not any note)
- dropping the language range issue
- formally requesting a response from I18N-WG on the IRI NFC sub-issue.

Jeremy
Received on Tuesday, 11 March 2003 03:37:03 UTC