RE: getting to Last Call on rdf:text

> Here is my take on the editor notes:
> 
> 
> Issue 1, re: an infinity of characters in Unicode, seems wrong
> according to the documentation of Unicode "All three encoding forms
> need at most 4 bytes (or 32-bits) of data for each character", but
> arguments for defining it that way are pragmatic. It would seem
> that this needs to be a technical decision, probably by vote
> if there is no consensus at this point.

The largest Unicode code point is U+10FFFF. Period. There is no infinity of Unicode code points. A better solution would be simply to drop this sentence:

--
The set of available characters is assumed to be infinite, and it is thus independent of the current version of UCS and Unicode.
--

The set of characters is independent of the version of Unicode, provided that the full code point range (U+0000 through U+10FFFF) is supported.
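
To make the bound concrete, here is a minimal Python sketch (not part of the rdf:text draft). The names MAX_CODE_POINT, is_code_point, and encoded_lengths are made up for illustration. It assumes Python 3.3 or later, where sys.maxunicode is always 0x10FFFF, and it only encodes non-surrogate code points.

```python
import sys

# Upper bound of the Unicode code space. The constant name is illustrative.
MAX_CODE_POINT = 0x10FFFF

# Python 3.3+ exposes the same bound. The code space therefore holds
# 0x110000 (1,114,112) code points, which is large but finite.
assert sys.maxunicode == MAX_CODE_POINT


def is_code_point(n: int) -> bool:
    """Return True if n lies inside the Unicode code space."""
    return 0 <= n <= MAX_CODE_POINT


def encoded_lengths(cp: int) -> dict:
    """Byte length of one (non-surrogate) code point in each encoding form."""
    ch = chr(cp)
    # The "-be" variants are used so that no byte order mark is counted.
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    print(encoded_lengths(0x41))            # LATIN CAPITAL LETTER A
    print(encoded_lengths(MAX_CODE_POINT))  # the last code point
    print(is_code_point(0x110000))          # one past the end
```

Even the last code point fits in four bytes in every encoding form, which is consistent with the quoted Unicode documentation, and chr(0x110000) raises ValueError because nothing exists past U+10FFFF.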

> 
> Issue 2 asks for an example of pattern and langpattern.
> 
> An example of pattern would be "(in)|(out)", which matches the
> character sequences "in" and "out" and nothing else. It is unclear
> to me whether the literal should be written as a plain literal or not,
> but I am guessing so.
> 
> An example of a langpattern is "(en)|(en-.+)" - one could get more
> precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> I'm not sure it's worth it.

I think it's important to follow RFC 4647. A multiplicity of formats makes it more difficult to work with languages, and the most likely useful source of 'langpattern' values will be RFC 4647-style language priority lists. Also, the pattern shown would NOT be compliant with BCP 47 language tag matching: (en-.+) matches many invalid tags, for example.
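
A rough Python sketch of the difference, for illustration only. The function basic_filter_match follows my reading of RFC 4647 basic filtering (section 3.3.1): a range matches a tag if the two are equal ignoring case, or if the range is a prefix of the tag and the next character is "-". Everything named here is hypothetical. In particular, LOOSE_SUBTAGS is a deliberately simplified shape check (hyphen-separated runs of 1 to 8 ASCII alphanumerics), not the full BCP 47 grammar, and Python's re.fullmatch is used as a stand-in for the anchored matching the quoted pattern presumably intends.

```python
import re

# The pattern proposed in the quoted message, matched against the whole tag.
QUOTED_PATTERN = re.compile(r"(en)|(en-.+)")

# Simplified shape check only: NOT the complete BCP 47 ABNF.
LOOSE_SUBTAGS = re.compile(r"[A-Za-z0-9]{1,8}(-[A-Za-z0-9]{1,8})*")


def basic_filter_match(language_range: str, tag: str) -> bool:
    """Basic filtering in the style of RFC 4647 section 3.3.1 (sketch)."""
    rng, candidate = language_range.lower(), tag.lower()
    if rng == "*":
        return True  # the wildcard range matches any tag
    # Exact match, or prefix match ending on a subtag boundary.
    return candidate == rng or candidate.startswith(rng + "-")


def regex_match(tag: str) -> bool:
    """Does the quoted regular expression accept the whole tag?"""
    return QUOTED_PATTERN.fullmatch(tag) is not None


def looks_like_tag(tag: str) -> bool:
    """Loose well-formedness check (assumption, see lead-in)."""
    return LOOSE_SUBTAGS.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ("en", "en-US", "EN-US", "eng", "en-!!"):
        print(tag, regex_match(tag), basic_filter_match("en", tag), looks_like_tag(tag))
```

Two differences show up. The regular expression is case-sensitive, so it rejects "EN-US" even though language tag matching ignores case, and it accepts strings such as "en-!!" that cannot be language tags at all. Basic filtering itself does not validate tags either; it assumes well-formed input, which is why the shape check is kept separate here.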

Addison

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG
Editor -- IETF LTRU WG (BCP 47)

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 24 March 2009 19:08:15 UTC