- From: Boris Motik <boris.motik@comlab.ox.ac.uk>
- Date: Thu, 26 Mar 2009 12:16:03 -0000
- To: "'Phillips, Addison'" <addison@amazon.com>, "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'Sandro Hawke'" <sandro@w3.org>
- Cc: <public-rdf-text@w3.org>, <team-rif-chairs@w3.org>, <team-owl-chairs@w3.org>
Hello Mr. Addison,

Thank you very much for your invaluable input! I am not an expert on Unicode, so I was unaware of the fact that the number of code points is fixed.

In light of what you said, we've changed the definitions of rdf:text. In order to follow XML Schema, we've defined a character as in XML 1.1. Since XML 1.1 excludes certain characters, we are left with 1,112,061 code points in rdf:text.

Thank you also for the pointer to RFC 4647; we'll take this under consideration.

Should you be interested in the changes, you can take a look at them here:

http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec

We'll appreciate any further feedback you can give us.

Regards,

Boris Motik

> -----Original Message-----
> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> On Behalf Of Phillips, Addison
> Sent: 24 March 2009 19:08
> To: Alan Ruttenberg; Sandro Hawke
> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> Subject: RE: getting to Last Call on rdf:text
>
> > Here is my take on the editor notes:
> >
> > Issue 1, re: an infinity of characters in Unicode, seems wrong
> > according to the documentation of Unicode "All three encoding forms
> > need at most 4 bytes (or 32-bits) of data for each character", but
> > arguments for defining it that way are pragmatic. It would seem
> > that this needs to be a technical decision, probably by vote
> > if there is not consensus at this point.
>
> The largest Unicode code point is 0x10FFFF. Period. There is not an infinity
> of Unicode code points. A better solution would just be to drop this sentence:
>
> --
> The set of available characters is assumed to be infinite, and it is thus
> independent of the current version of UCS and Unicode.
> --
>
> The set of characters is independent of the version of Unicode provided that
> the full range is supported.
>
> > Issue 2 asks for an example of pattern and langpattern.
> >
> > An example of pattern would be "(in)|(out)", which matches the
> > character sequences "in" and "out" and nothing else. It is unclear
> > to me whether the literal should be written as a plain literal or not,
> > but I am guessing so.
> >
> > An example of a langpattern is "(en)|(en-.+)" - one could get more
> > precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> > I'm not sure it's worth it.
>
> I think it's important to follow RFC 4647. A multiplicity of formats makes it
> more difficult to work with languages and the most likely useful source of
> 'langpattern' will be RFC 4647-style language priority lists. Also: following
> the pattern shown would NOT be compliant with BCP 47 language tag matching.
> (en-.+) matches many invalid tags, for example.
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Lab126
> Chair -- W3C Internationalization WG
> Editor -- IETF LTRU WG (BCP 47)
>
> Internationalization is not a feature.
> It is an architecture.
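For readers who want to check the two technical points in this thread, here is a minimal Python sketch. It is not part of the rdf:text specification. It assumes the XML 1.1 Char production is [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], and it assumes that RFC 4647 "basic filtering" (section 3.3.1) is the matching scheme of interest. The names count_xml11_chars, basic_filter_match, and LOOSE_PATTERN are hypothetical and chosen only for illustration.

```python
import re

# Assumed XML 1.1 "Char" production, as inclusive code point ranges:
# [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML11_CHAR_RANGES = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def count_xml11_chars(ranges=XML11_CHAR_RANGES):
    """Count the code points admitted by the given inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def basic_filter_match(language_range, tag):
    """Sketch of RFC 4647 basic filtering: a range matches a tag if it is
    "*", equals the tag, or is a prefix of the tag followed by "-".
    Comparison is case-insensitive."""
    if language_range == "*":
        return True
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


# The regular expression proposed in the quoted message, for comparison.
LOOSE_PATTERN = re.compile(r"(en)|(en-.+)")

print(count_xml11_chars())                      # 1112061
print(basic_filter_match("en", "en-GB"))        # True
print(basic_filter_match("en", "eng"))          # False
print(bool(LOOSE_PATTERN.fullmatch("en-!!")))   # True
```

The first result reproduces the 1,112,061 figure quoted above. The last line shows that the regular expression accepts "en-!!", a string that is not a well-formed language tag, which is the kind of over-matching the quoted message warns about.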
Received on Thursday, 26 March 2009 12:17:17 UTC