RE: getting to Last Call on rdf:text from Boris Motik on 2009-03-26 (public-rdf-text@w3.org from January to March 2009)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Thu, 26 Mar 2009 12:16:03 -0000
To: "'Phillips, Addison'" <addison@amazon.com>, "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'Sandro Hawke'" <sandro@w3.org>
Cc: <public-rdf-text@w3.org>, <team-rif-chairs@w3.org>, <team-owl-chairs@w3.org>
Message-ID: <7D8DB22D28EE4ED9BFA8C18F386B6727@wolf>

Hello Mr. Phillips,

Thank you very much for your invaluable input! I am not an expert on Unicode, so
I was unaware that the number of code points is fixed. In light of what you
said, we've changed the definitions in rdf:text. To follow XML Schema, we now
define a character as in XML 1.1. Since XML 1.1 excludes certain code points
(#x0, the surrogate range #xD800-#xDFFF, #xFFFE and #xFFFF), rdf:text is left
with 1,112,061 code points.
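
The figure of 1,112,061 can be checked directly. The following Python sketch is an illustration only, not part of the rdf:text draft: it writes the XML 1.1 Char production as inclusive code point ranges, counts them, and confirms that the range endpoints (up to U+10FFFF) need at most 4 bytes in UTF-8, UTF-16 and UTF-32. The names XML11_CHAR_RANGES, count_code_points, is_xml11_char and max_encoded_width are hypothetical, chosen for this example.

```python
# Illustrative sketch: the XML 1.1 Char production as inclusive code point ranges.
# All names below are hypothetical, chosen for this example only.
XML11_CHAR_RANGES = [
    (0x1, 0xD7FF),        # everything below the surrogates, except #x0
    (0xE000, 0xFFFD),     # skips surrogates #xD800-#xDFFF, and #xFFFE, #xFFFF
    (0x10000, 0x10FFFF),  # supplementary planes, up to the Unicode maximum
]
UNICODE_MAX = 0x10FFFF


def count_code_points(ranges):
    """Number of code points covered by a list of inclusive (low, high) ranges."""
    return sum(high - low + 1 for low, high in ranges)


def is_xml11_char(cp):
    """True if code point cp is allowed by the XML 1.1 Char production."""
    return any(low <= cp <= high for low, high in XML11_CHAR_RANGES)


def max_encoded_width(code_points, codecs=("utf-8", "utf-16-be", "utf-32-be")):
    """Largest number of bytes any given code point needs in any given encoding."""
    return max(len(chr(cp).encode(c)) for cp in code_points for c in codecs)


if __name__ == "__main__":
    print(UNICODE_MAX + 1)                       # 1114112 code points in total
    print(count_code_points(XML11_CHAR_RANGES))  # 1112061 allowed by XML 1.1
    endpoints = [cp for pair in XML11_CHAR_RANGES for cp in pair]
    print(max_encoded_width(endpoints))          # 4
    try:
        chr(UNICODE_MAX + 1)                     # one past U+10FFFF
    except ValueError:
        print("U+110000 is outside the Unicode code space")
```

The difference from the full code space is the 2,048 surrogates plus #x0, #xFFFE and #xFFFF.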

Thank you also for the pointer to RFC 4647; we'll take this under consideration.
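
For reference, the simplest matching scheme in RFC 4647, basic filtering (Section 3.3.1), compares a language range with a tag case-insensitively and accepts either an exact match or a prefix match that ends at a "-" boundary; the range "*" matches every tag. The Python sketch below illustrates that reading only; it is not a proposed rdf:text definition, basic_filter_match and basic_filter are hypothetical names, and it assumes the tags are already well-formed BCP 47 tags. It also contrasts the ad-hoc langpattern (en)|(en-.+) from the quoted thread, which, unlike basic filtering, is case-sensitive.

```python
import re


def basic_filter_match(language_range, tag):
    """Sketch of RFC 4647 basic filtering: case-insensitive exact match, or the
    range is a prefix of the tag followed by '-'; '*' matches any tag."""
    if language_range == "*":
        return True
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def basic_filter(priority_list, tags):
    """Tags matched by any range in a language priority list (input order kept)."""
    return [t for t in tags if any(basic_filter_match(r, t) for r in priority_list)]


# The ad-hoc langpattern from the thread; fullmatch anchors it to the whole tag.
AD_HOC = re.compile(r"(en)|(en-.+)")

if __name__ == "__main__":
    for tag in ["en", "en-GB", "EN-gb", "eng"]:
        print(tag, AD_HOC.fullmatch(tag) is not None, basic_filter_match("en", tag))
    # en True True
    # en-GB True True
    # EN-gb False True
    # eng False False
    print(basic_filter(["de-CH", "fr"], ["de-CH-1996", "fr-CA", "en"]))
    # ['de-CH-1996', 'fr-CA']
```

Basic filtering is a plain string comparison, so checking that a tag is well-formed would be a separate step.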

Should you be interested in the changes, you can take a look at them here:

http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec

We would appreciate any further feedback you can give us.

Regards,

Boris Motik

> -----Original Message-----
> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> On Behalf Of Phillips, Addison
> Sent: 24 March 2009 19:08
> To: Alan Ruttenberg; Sandro Hawke
> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> Subject: RE: getting to Last Call on rdf:text
> 
> > Here is my take on the editor notes:
> >
> >
> > Issue 1, re: an infinity of characters in Unicode, seems wrong
> > according to the Unicode documentation: "All three encoding forms
> > need at most 4 bytes (or 32-bits) of data for each character". The
> > arguments for defining it that way are pragmatic, though. It seems
> > that this needs a technical decision, probably by vote if there is
> > no consensus at this point.
> 
> The largest Unicode code point is 0x10FFFF. Period. There is not an infinity
> of Unicode code points. A better solution would just be to drop this sentence:
> 
> --
> The set of available characters is assumed to be infinite, and it is thus
> independent of the current version of UCS and Unicode.
> --
> 
> The set of characters is independent of the version of Unicode provided that
> the full range is supported.
> 
> >
> > Issue 2 asks for an example of pattern and langpattern.
> >
> > An example of pattern would be "(in)|(out)", which matches the
> > character sequences "in" and "out" and nothing else. It is unclear
> > to me whether the literal should be written as a plain literal or not,
> > but I am guessing so.
> >
> > An example of a langpattern is "(en)|(en-.+)"; one could get more
> > precise by following http://www.rfc-editor.org/rfc/rfc4647.txt, but
> > I'm not sure it's worth it.
> 
> I think it's important to follow RFC 4647. A multiplicity of formats makes it
> more difficult to work with languages and the most likely useful source of
> 'langpattern' will be RFC 4647-style language priority lists. Also: following
> the pattern shown would NOT be compliant with BCP 47 language tag matching.
> (en-.+) matches many invalid tags, for example.
> 
> Addison
> 
> Addison Phillips
> Globalization Architect -- Lab126
> Chair -- W3C Internationalization WG
> Editor -- IETF LTRU WG (BCP 47)
> 
> Internationalization is not a feature.
> It is an architecture.
> 
>
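
To illustrate the pattern example from Issue 2 above: assuming the pattern facet follows XML Schema regular expressions, which implicitly match the entire string, "(in)|(out)" accepts exactly "in" and "out". The Python sketch below approximates that with re.fullmatch. Python's re dialect differs from XML Schema regular expressions in other respects, so this is only a stand-in for simple patterns like this one, and matches_facet is a hypothetical helper.

```python
import re

# Assumption: as with XML Schema's pattern facet, the whole value must match,
# so re.fullmatch is used rather than re.search.
IN_OR_OUT = re.compile(r"(in)|(out)")


def matches_facet(pattern, value):
    """True if the compiled pattern matches the entire value (hypothetical helper)."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["in", "out", "inout", "input", "IN", ""]:
        print(repr(value), matches_facet(IN_OR_OUT, value))
    # Only 'in' and 'out' print True.
    # An unanchored search would wrongly accept "input", since it contains "in".
    print(IN_OR_OUT.search("input") is not None)  # True
```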

Received on Thursday, 26 March 2009 12:17:17 UTC