Re: RDF Concepts document Jan 23, lang comments from Graham Klyne on 2003-03-04 (www-rdf-comments@w3.org from January to March 2003)

From: Graham Klyne <GK@ninebynine.org>
Date: Tue, 04 Mar 2003 12:12:28 +0000
To: Tex Texin <tex@XenCraft.com>, Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: www-rdf-comments@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
Message-Id: <5.1.0.14.2.20030304115559.0393bec0@127.0.0.1>

Tex,

Thank you for your comments.

My co-editor may wish to pick up on some of your points, but meanwhile I'll 
respond, as he is travelling...

>1) The requirement for lang identifiers to be lowercase seems needless (small
>cpu savings) and dangerous.

I think there may be a misunderstanding here.  There is no intent to 
require that language identifiers be lower case in RDF documents.  The 
lowercasing is applied in the process of creating an RDF graph, which is an 
abstract syntax upon which the RDF formal semantics is based.

[[
A literal in an RDF graph contains three components called:

The lexical form being a Unicode [UNICODE] string in Normal Form C [NFC].
The language identifier as defined by [RFC-3066], normalized to lowercase.
The datatype URI being an RDF URI reference.
]]

The key phrase here is *in an RDF graph*.  The normalization to lower case 
is applied precisely to achieve the case-insensitive comparison you request.
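
As a rough illustration of that distinction, here is a minimal Python sketch. 
It is not taken from the Concepts draft: the Literal class and the 
make_literal helper are hypothetical names, and the sketch assumes a parser 
that lowercases the tag at the moment it builds the graph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of a literal as it appears in an RDF graph."""
    lexical_form: str
    language: Optional[str] = None   # language identifier, lowercased
    datatype: Optional[str] = None   # datatype URI, if any


def make_literal(lexical_form, language=None, datatype=None):
    """Build the graph-level literal from what the document supplied.

    Assumption: the lowercasing happens here, at graph-construction time,
    so the document itself may use any casing for the language identifier.
    """
    return Literal(
        lexical_form=lexical_form,
        language=language.lower() if language is not None else None,
        datatype=datatype,
    )


# Two documents that differ only in the casing of the language identifier
# yield the same literal in the graph.
a = make_literal("colour", "en-GB")
b = make_literal("colour", "en-gb")
assert a == b
assert a.language == "en-gb"
```

Under these assumptions, plain equality on the graph-level literals is 
already case-insensitive with respect to the tag written in the document.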

Please let me know if you think this remains an issue.

>2) With respect to the rules for comparing literals:
>http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality

You ask for, e.g., (lang="en", str) to be equivalent to (lang="en-gb", str).

I would oppose this change because this behaviour is explicitly discouraged 
by RFC 3066:
[[
2.4 Meaning of the language tag

    The language tag always defines a language as spoken (or written,
    signed or otherwise signaled) by human beings for communication of
    information to other human beings.  Computer languages such as
    programming languages are explicitly excluded.  There is no
    guaranteed relationship between languages whose tags begin with the
    same series of subtags; specifically, they are NOT guaranteed to be
    mutually intelligible, although it will sometimes be the case that
    they are.
]]
-- http://www.ietf.org/rfc/rfc3066.txt
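
To make the difference concrete, here is a minimal Python sketch that keeps 
literal equality separate from a prefix-style match. The function names and 
the (lexical form, language) tuples are hypothetical; range_matches follows 
the language-range rule of RFC 3066 section 2.5 and is shown only as an 
operation distinct from equality, not as anything the Concepts draft defines.

```python
def literals_equal(a, b):
    """Equality on (lexical form, lowercased language tag) pairs.

    The tags must be identical; "en" and "en-gb" are different tags.
    """
    return a == b


def range_matches(language_range, language_tag):
    """Prefix match in the style of RFC 3066 section 2.5.

    A range matches a tag if it equals the tag, or equals a prefix of the
    tag that is immediately followed by "-". The range "*" matches any tag.
    """
    r = language_range.lower()
    t = language_tag.lower()
    if r == "*":
        return True
    return t == r or t.startswith(r + "-")


en = ("colour", "en")
en_gb = ("colour", "en-gb")

assert not literals_equal(en, en_gb)       # distinct literals
assert range_matches("en", "en-gb")        # but a range query would find both
assert not range_matches("en", "eng")      # prefix must end at a "-" boundary
assert not range_matches("en-gb", "en")    # matching is not symmetric
```

The last assertion shows why such matching cannot serve as an equality 
relation: it is not symmetric.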

If you still feel that you would like the WG to consider your request, 
please let us know and I will ask for it to be raised as a last-call issue.

>3) "RDF URI References" are defined and are essentially IRI. It would be
>better if the spec could simply cite the upcoming IRI spec, ...

This issue has already been raised to the WG as "williams-01" 
[http://www.w3.org/2001/sw/RDFCore/20030123-issues/#williams-02]

If you feel that consideration of this issue does not address your concern, 
please let us know so that we can raise it as a separate issue.

...

In summary, I have not yet requested that any of these be raised as formal 
last-call issues for the reasons given.  If you find the reasons 
less-than-convincing, please reply and I shall take appropriate steps to 
have them considered.

Thank you,

#g
--

At 02:49 PM 2/28/03 -0500, Tex Texin wrote:

>To the editors of the RDF Concepts document Jan 23 edition:
>http://www.w3.org/TR/rdf-concepts/
>
>I have 3 comments. These are individual comments; the W3C I18n Core group will
>follow up with you on their comments at next week's plenary meeting. I
>apologize that my comments come after your deadline.
>
>1) The requirement for lang identifiers to be lowercase seems needless (small
>cpu savings) and dangerous.
>
>If different specs assert different rules for lang identifier casing, we may
>one day run into a conflict, if these are enforced.
>
>Individual specs should not make arbitrary policy.
>
>The particular rule of all lower case is in conflict with other
>recommendations and standards: RFC 3066 and ISO 3166 recommend (but do not
>require) that the country code be uppercase.
>The casing rules made by ISO's TC 37/SC 2 committee were defined a long time
>ago and so are ingrained and prevalent in software.
>It is therefore likely that other technologies and applications that want to
>leverage RDF will already have tags in another casing form. For ease of
>interoperability, the tag should ideally be case-insensitive. If they must be
>case-sensitive, they should at least follow convention.
>
>I am told that some applications may follow the convention of lower language
>and upper country codes, so that if they are used individually, (perhaps when
>parsed and taken apart) they can be distinguished.
>
>I would urge you to make the lang identifier case-insensitive.
>
>2) With respect to the rules for comparing literals:
>http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>
>For reasons of standardization and ease of use, there should exist a higher
>level matching rule that allows one to search for (lang="en", str) and to get
>matches to more detailed tags (lang="en-gb", str).
>This higher level rule should be defined to insure a standard practice. I
>assume this is, or will be, defined somewhere else in RDF. Presumably this rule
>will also provide for inclusion of strings with no attribute as well, so I can
>search for a string and find all matches with relevant sets of lang attributes.
>
>To repeat the earlier point, the comparison rule should also be made case
>insensitive for language identifiers.
>
>
>3) "RDF URI References" are defined and are essentially IRI. It would be
>better if the spec could simply cite the upcoming IRI spec, but I see RDF is
>waiting on the TAG to decide if they can do that yet. Hopefully that will be
>decided in time for RDF to cite the spec before the Concept doc is final. This
>would insure consistency of definitions.
>
>tex
>
>
>--
>-------------------------------------------------------------
>Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
>Xen Master                          http://www.i18nGuy.com
>
>XenCraft                            http://www.XenCraft.com
>Making e-Business Work Around the World
>-------------------------------------------------------------

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Tuesday, 4 March 2003 10:22:23 UTC