RE: getting to Last Call on rdf:text

Hello Mr. Addison,

Thank you very much for your invaluable input! I am not an expert on Unicode, so
I was unaware of the fact that the number of code points is fixed. In light of
what you said, we've changed the definitions of rdf:text. In order to follow XML
Schema, we've defined a character as in XML 1.1. Since XML 1.1 excludes certain
code points (U+0000, the surrogates U+D800 to U+DFFF, U+FFFE, and U+FFFF), we
are left with 1,112,061 code points in rdf:text.
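
As a sanity check on that figure, here is a minimal sketch that sums the sizes of
the three ranges in the XML 1.1 Char production. The names XML11_CHAR_RANGES and
count_code_points are made up for this illustration and do not come from any
specification.

```python
# Inclusive code point ranges of the XML 1.1 "Char" production:
# [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# (the names below are illustrative only)
XML11_CHAR_RANGES = [
    (0x1, 0xD7FF),        # everything below the surrogates, except U+0000
    (0xE000, 0xFFFD),     # above the surrogates, excluding U+FFFE and U+FFFF
    (0x10000, 0x10FFFF),  # the supplementary planes
]


def count_code_points(ranges):
    """Return the number of code points covered by inclusive (lo, hi) ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


total = count_code_points(XML11_CHAR_RANGES)
print(f"{total:,}")  # prints 1,112,061
```

The same count falls out of subtracting the exclusions from the full Unicode
range: 1,114,112 - 2,048 surrogates - 3 (U+0000, U+FFFE, U+FFFF) = 1,112,061.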

Thank you also for the pointer to RFC 4647; we'll take this under consideration.
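
To make the comparison concrete, here is a rough sketch of the "basic filtering"
scheme from RFC 4647 (Section 3.3.1) next to the regular expression proposed
earlier in the thread. The function name basic_filter_matches and the loose shape
check on tags are assumptions made for this illustration; the shape check is
much weaker than full BCP 47 well-formedness and only serves to reject obvious
garbage.

```python
import re

# Assumption: a loose shape check (1-8 letters, then hyphen-separated
# subtags of 1-8 alphanumerics). This is NOT full BCP 47 validation.
_TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

# The pattern suggested earlier in the thread, for comparison.
NAIVE_PATTERN = re.compile(r"(en)|(en-.+)")


def basic_filter_matches(language_range, tag):
    """Sketch of RFC 4647 basic filtering for one range and one tag.

    A range matches a tag if, compared case-insensitively, it equals the
    tag or equals a prefix of the tag that is followed by "-". The range
    "*" matches any tag.
    """
    if _TAG_SHAPE.fullmatch(tag) is None:
        return False  # reject strings that do not even look like a tag
    if language_range == "*":
        return True
    rng = language_range.lower()
    candidate = tag.lower()
    return candidate == rng or candidate.startswith(rng + "-")


for tag in ["en", "en-US", "eng", "en-!!!"]:
    naive = NAIVE_PATTERN.fullmatch(tag) is not None
    print(tag, naive, basic_filter_matches("en", tag))
```

Both approaches accept "en" and "en-US" and reject "eng", but the regular
expression also accepts strings such as "en-!!!" that are not language tags.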

Should you be interested in the changes, you can take a look at them here:

http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec

We would appreciate any further feedback you can give us.

Regards,

Boris Motik

> -----Original Message-----
> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> On Behalf Of Phillips, Addison
> Sent: 24 March 2009 19:08
> To: Alan Ruttenberg; Sandro Hawke
> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> Subject: RE: getting to Last Call on rdf:text
> 
> > Here is my take on the editor notes:
> >
> >
> > Issue 1, re: an infinity of characters in Unicode, seems wrong
> > according to the documentation of Unicode "All three encoding forms
> > need at most 4 bytes (or 32-bits) of data for each character", but
> > arguments for defining it that way are pragmatic. It would seem
> > that this needs to be a technical decision, probably by vote
> > if there is not consensus at this point.
> 
> The largest Unicode code point is 0x10FFFF. Period. There is not an infinity
> of Unicode code points. A better solution would just be to drop this sentence:
> 
> --
> The set of available characters is assumed to be infinite, and it is thus
> independent of the current version of UCS and Unicode.
> --
> 
> The set of characters is independent of the version of Unicode provided that
> the full range is supported.
> 
> >
> > Issue 2 asks for an example of pattern and langpattern.
> >
> > An example of pattern would be "(in)|(out)", which matches the
> > character sequences "in" and "out" and nothing else. It is unclear
> > to me whether the literal should be written as a plain literal or not,
> > but I am guessing so.
> >
> > An example of a langpattern is "(en)|(en-.+)" - one could get more
> > precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> > I'm not sure it's worth it.
> 
> I think it's important to follow RFC 4647. A multiplicity of formats makes it
> more difficult to work with languages and the most likely useful source of
> 'langpattern' will be RFC 4647-style language priority lists. Also: following
> the pattern shown would NOT be compliant with BCP 47 language tag matching.
> (en-.+) matches many invalid tags, for example.
> 
> Addison
> 
> Addison Phillips
> Globalization Architect -- Lab126
> Chair -- W3C Internationalization WG
> Editor -- IETF LTRU WG (BCP 47)
> 
> Internationalization is not a feature.
> It is an architecture.
> 
> 

Received on Thursday, 26 March 2009 12:17:17 UTC