RE: getting to Last Call on rdf:text

From: Phillips, Addison <addison@amazon.com>
Date: Tue, 24 Mar 2009 12:07:35 -0700
To: Alan Ruttenberg <alanruttenberg@gmail.com>, Sandro Hawke <sandro@w3.org>
CC: "public-rdf-text@w3.org" <public-rdf-text@w3.org>, "team-rif-chairs@w3.org" <team-rif-chairs@w3.org>, "team-owl-chairs@w3.org" <team-owl-chairs@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA019ECD5473@EX-SEA5-D.ant.amazon.com>
> Here is my take on the editor notes:
> 
> 
> Issue 1, re: an infinity of characters in Unicode, seems wrong
> according to the documentation of Unicode "All three encoding forms
> need at most 4 bytes (or 32-bits) of data for each character", but
> arguments for defining it that way are pragmatic. It would seem
> that this needs to be a technical decision, probably by vote
> if there is not consensus at this point.

The largest Unicode code point is 0x10FFFF. Period. There is not an infinity of Unicode code points. A better solution would just be to drop this sentence:

--
The set of available characters is assumed to be infinite, and it is thus independent of the current version of UCS and Unicode.
--

The set of characters is independent of the version of Unicode provided that the full range is supported.
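
As a rough illustration of the bound (a sketch only, not text proposed for the rdf:text draft), the Unicode codespace can be checked with a fixed upper limit. The names MAX_CODE_POINT, CODESPACE_SIZE, is_code_point and is_scalar_value are made up for this example:

```python
# The Unicode codespace is finite: code points run from U+0000 to U+10FFFF.
MAX_CODE_POINT = 0x10FFFF

# Total number of code points in the codespace (17 planes of 65,536 each).
CODESPACE_SIZE = MAX_CODE_POINT + 1


def is_code_point(n: int) -> bool:
    """Return True if n lies inside the Unicode codespace."""
    return 0 <= n <= MAX_CODE_POINT


def is_scalar_value(n: int) -> bool:
    """Return True if n is a code point other than a surrogate (U+D800..U+DFFF)."""
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)


if __name__ == "__main__":
    print(CODESPACE_SIZE)              # size of the codespace
    print(is_code_point(0x10FFFF))     # last valid code point
    print(is_code_point(0x110000))     # one past the end
```

A check of this kind does not depend on which characters a given Unicode version has assigned, which is the sense in which the set is version-independent.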

> 
> Issue 2 asks for an example of pattern and langpattern.
> 
> An example of pattern would be "(in)|(out)", which matches the
> character sequences "in" and "out" and nothing else. It is unclear
> to me whether the literal should be written as a plain literal or not,
> but I am guessing so.
> 
> An example of a langpattern is "(en)|(en-.+)" - one could get more
> precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> I'm not sure it's worth it.

I think it's important to follow RFC 4647. A multiplicity of formats makes it more difficult to work with languages and the most likely useful source of 'langpattern' will be RFC 4647-style language priority lists. Also: following the pattern shown would NOT be compliant with BCP 47 language tag matching. (en-.+) matches many invalid tags, for example.
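
To make the difference concrete, here is a minimal sketch comparing the regular expression quoted above with RFC 4647 basic filtering (section 3.3.1). The helper names NAIVE, SUBTAGS and basic_filter are invented for this example, and the SUBTAGS check is only a loose syntactic approximation (1 to 8 ASCII alphanumerics per subtag), not the full BCP 47 grammar:

```python
import re

# The pattern quoted in the thread, applied as a whole-string match.
NAIVE = re.compile(r"(en)|(en-.+)")

# ASSUMPTION: loose shape check only, not the complete BCP 47 ABNF.
# Subtags are 1-8 ASCII letters or digits, separated by single hyphens.
SUBTAGS = re.compile(r"[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")


def basic_filter(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering with a shape check on the tag.

    A range matches a tag if, ignoring case, the two are equal or the tag
    starts with the range followed by "-". The range "*" matches any tag.
    """
    if SUBTAGS.fullmatch(tag) is None:
        return False  # reject strings that cannot be language tags at all
    if language_range == "*":
        return True
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    for tag in ["en", "en-US", "EN-us", "en-!!!", "eng"]:
        print(tag, NAIVE.fullmatch(tag) is not None, basic_filter("en", tag))
```

In this sketch the regular expression accepts a string such as "en-!!!" and rejects "EN-us", while the RFC 4647-style check does the opposite in both cases, since language tag matching is case-insensitive and operates on whole subtags.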

Addison

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG
Editor -- IETF LTRU WG (BCP 47)

Internationalization is not a feature.
It is an architecture.



Received on Tuesday, 24 March 2009 19:08:15 UTC
