RE: getting to Last Call on rdf:text

From: Phillips, Addison <addison@amazon.com>
Date: Tue, 24 Mar 2009 12:07:35 -0700
To: Alan Ruttenberg <alanruttenberg@gmail.com>, Sandro Hawke <sandro@w3.org>
CC: "public-rdf-text@w3.org" <public-rdf-text@w3.org>, "team-rif-chairs@w3.org" <team-rif-chairs@w3.org>, "team-owl-chairs@w3.org" <team-owl-chairs@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA019ECD5473@EX-SEA5-D.ant.amazon.com>
> Here is my take on the editor notes:
> Issue 1, re: an infinity of characters in Unicode, seems wrong
> according to the documentation of Unicode "All three encoding forms
> need at most 4 bytes (or 32-bits) of data for each character", but
> arguments for defining it that way are pragmatic. It would seem that
> this needs to be a technical decision about this, probably by vote
> if there is not consensus at this point.

The largest Unicode code point is 0x10FFFF. Period. There is not an infinity of Unicode code points. A better solution would just be to drop this sentence:

    "The set of available characters is assumed to be infinite, and it is thus independent of the current version of UCS and Unicode."

The set of characters is independent of the version of Unicode provided that the full range is supported.
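
As a rough illustration of the fixed upper bound (a sketch only, not part of the rdf:text draft), the following Python checks that the code space ends at 0x10FFFF and that this last code point takes 4 bytes in each of the three encoding forms. The names MAX_CODE_POINT, is_code_point and encoded_lengths are made up for this example.

```python
import sys

# Upper bound of the Unicode code space, as stated in the message above.
MAX_CODE_POINT = 0x10FFFF


def is_code_point(n: int) -> bool:
    """Return True if n lies inside the Unicode code space (hypothetical helper)."""
    return 0 <= n <= MAX_CODE_POINT


def encoded_lengths(ch: str) -> dict:
    """Byte length of one character in UTF-8, UTF-16 and UTF-32.

    The explicit little-endian codecs are used so that no byte order
    mark is added to the count.
    """
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


if __name__ == "__main__":
    # Python's own limit for chr() is the same bound.
    print(hex(sys.maxunicode))
    print(is_code_point(0x10FFFF), is_code_point(0x110000))
    print(encoded_lengths(chr(MAX_CODE_POINT)))
    try:
        chr(MAX_CODE_POINT + 1)
    except ValueError as exc:
        print("rejected:", exc)
```

The code space is therefore finite: 0x110000 code points in total, whichever version of Unicode has assigned characters to them.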

> Issue 2 asks for an example of pattern and langpattern.
> An example of pattern would be "(in)|(out)", which matches the
> character sequences "in" and "out" and nothing else. It is unclear
> to me whether the literal should be written as a plain literal or not,
> but I am guessing so.
> An example of a langpattern is "(en)|(en-.+)" - one could get more
> precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> I'm not sure it's worth it.

I think it's important to follow RFC 4647. A multiplicity of formats makes it harder to work with languages, and the most likely useful source of 'langpattern' values will be RFC 4647-style language priority lists. Also, the pattern shown would NOT be compliant with BCP 47 language tag matching: (en-.+) matches many invalid tags, for example.
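
To make the difference concrete, here is a minimal Python sketch comparing the regular expression from the quoted text with RFC 4647 basic filtering (section 3.3.1). The function names are invented for this example, and looks_like_tag is only a loose shape check (hyphen-separated subtags of 1 to 8 ASCII alphanumerics), which is a necessary but not sufficient condition for a valid tag; it is an assumption made to keep the example short, not a BCP 47 validator.

```python
import re

# The pattern suggested in the quoted text, applied to the whole string.
NAIVE_PATTERN = re.compile(r"(en)|(en-.+)")

# Loose shape check only (assumption for this sketch): subtags of 1 to 8
# ASCII letters or digits, separated by single hyphens.
SUBTAG_SHAPE = re.compile(r"[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")


def naive_match(tag: str) -> bool:
    """Match a tag against the regular expression from the message."""
    return NAIVE_PATTERN.fullmatch(tag) is not None


def looks_like_tag(tag: str) -> bool:
    """Return True if the string at least has the shape of a language tag."""
    return SUBTAG_SHAPE.fullmatch(tag) is not None


def basic_filter(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: case-insensitive equality, or the range
    is a prefix of the tag and the next character is a hyphen."""
    if language_range == "*":
        return True
    rng, candidate = language_range.lower(), tag.lower()
    return candidate == rng or candidate.startswith(rng + "-")


if __name__ == "__main__":
    for tag in ("en", "en-US", "EN-us", "eng", "en-!!!", "en--"):
        print(
            f"{tag!r:10} regex={naive_match(tag)!s:6}"
            f" shape_ok={looks_like_tag(tag)!s:6}"
            f" rfc4647={basic_filter('en', tag)}"
        )
```

Two differences show up in this sketch: the regular expression accepts strings such as "en-!!!" and "en--" that do not even have the shape of a language tag, and it rejects "EN-us" because it is case-sensitive, whereas RFC 4647 matching ignores case.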


Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG
Editor -- IETF LTRU WG (BCP 47)

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 24 March 2009 19:08:15 UTC