- From: Jos de Bruijn <debruijn@inf.unibz.it>
- Date: Thu, 26 Mar 2009 14:44:08 +0100
- To: Boris Motik <boris.motik@comlab.ox.ac.uk>
- CC: "'Phillips, Addison'" <addison@amazon.com>, 'Alan Ruttenberg' <alanruttenberg@gmail.com>, 'Sandro Hawke' <sandro@w3.org>, public-rdf-text@w3.org, team-rif-chairs@w3.org, team-owl-chairs@w3.org
I was the one who originally raised the issue in the editor's note and I
am satisfied with the way it has been handled in the latest change by Boris.

Best, Jos

Boris Motik wrote:
> Hello Mr. Addison,
>
> Thank you very much for your invaluable input! I am not an expert on Unicode, so
> I was unaware of the fact that the number of code points is fixed. In light of
> what you said, we've changed the definitions of rdf:text. In order to follow XML
> Schema, we've defined a character as in XML 1.1. Since XML 1.1 excludes certain
> characters, we are left with 1,112,061 code points in rdf:text.
>
> Thank you also for the pointer to RFC 4647; we'll take this under consideration.
>
> Should you be interested in the changes, you can take a look at them here:
>
> http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
>
> We'll appreciate any further feedback you can give us.
>
> Regards,
>
> Boris Motik
>
>> -----Original Message-----
>> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
>> On Behalf Of Phillips, Addison
>> Sent: 24 March 2009 19:08
>> To: Alan Ruttenberg; Sandro Hawke
>> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
>> Subject: RE: getting to Last Call on rdf:text
>>
>>> Here is my take on the editor notes:
>>>
>>>
>>> Issue 1, re: an infinity of characters in Unicode, seems wrong
>>> according to the documentation of Unicode "All three encoding forms
>>> need at most 4 bytes (or 32-bits) of data for each character", but
>>> arguments for defining it that way are pragmatic. It would seem
>>> that this needs to be a technical decision about this, probably by vote
>>> if there is not consensus at this point.
>> The largest Unicode code point is 0x10FFFF. Period. There is not an infinity
>> of Unicode code points. A better solution would just be to drop this sentence:
>>
>> --
>> The set of available characters is assumed to be infinite, and it is thus
>> independent of the current version of UCS and Unicode.
>> --
>>
>> The set of characters is independent of the version of Unicode provided that
>> the full range is supported.
>>
>>> Issue 2 asks for an example of pattern and langpattern.
>>>
>>> An example of pattern would be "(in)|(out)", which matches the
>>> character sequences "in" and "out" and nothing else. It is unclear
>>> to me whether the literal should be written as a plain literal or not,
>>> but I am guessing so.
>>>
>>> An example of a langpattern is "(en)|(en-.+)" - one could get more
>>> precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
>>> I'm not sure it's worth it.
>> I think it's important to follow RFC 4647. A multiplicity of formats makes it
>> more difficult to work with languages and the most likely useful source of
>> 'langpattern' will be RFC 4647-style language priority lists. Also: following
>> the pattern shown would NOT be compliant with BCP 47 language tag matching.
>> (en-.+) matches many invalid tags, for example.
>>
>> Addison
>>
>> Addison Phillips
>> Globalization Architect -- Lab126
>> Chair -- W3C Internationalization WG
>> Editor -- IETF LTRU WG (BCP 47)
>>
>> Internationalization is not a feature.
>> It is an architecture.
>>
>

--
+43 1 58801 18470 debruijn@inf.unibz.it
Jos de Bruijn, http://www.debruijn.net/
----------------------------------------------
Many would be cowards if they had courage
enough.
  - Thomas Fuller
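
The code point figures quoted in this message can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the rdf:text specification: it assumes the three ranges of the XML 1.1 `Char` production ([#x1-#xD7FF], [#xE000-#xFFFD], [#x10000-#x10FFFF]), and the names `XML11_CHAR_RANGES`, `count_code_points` and `is_xml11_char` are made up for the example.

```python
# Largest Unicode code point, as stated in the message above.
MAX_UNICODE_CODE_POINT = 0x10FFFF

# Assumed ranges of the XML 1.1 `Char` production (inclusive bounds).
XML11_CHAR_RANGES = [
    (0x1, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def count_code_points(ranges):
    """Count the code points covered by a list of inclusive (low, high) ranges."""
    return sum(high - low + 1 for low, high in ranges)


def is_xml11_char(code_point):
    """Return True if the code point falls inside one of the assumed ranges."""
    return any(low <= code_point <= high for low, high in XML11_CHAR_RANGES)


# Code points run from 0 to 0x10FFFF inclusive, so the total is finite.
total_unicode = MAX_UNICODE_CODE_POINT + 1
xml11_total = count_code_points(XML11_CHAR_RANGES)

print(total_unicode)                # 1114112
print(xml11_total)                  # 1112061
print(total_unicode - xml11_total)  # 2051 (U+0000, the surrogates, U+FFFE, U+FFFF)
```

Under these assumptions the count agrees with the 1,112,061 code points mentioned by Boris.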
Received on Thursday, 26 March 2009 13:45:14 UTC
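
To illustrate the pattern and langpattern discussion in the message above, here is a small sketch comparing the regular expression "(en)|(en-.+)" with the basic filtering scheme of RFC 4647 (section 3.3.1). It rests on several assumptions: Python's `re` module stands in for XML Schema regular expressions (the two dialects differ in general, and `fullmatch` is used because XML Schema patterns match the whole string), `looks_like_tag` checks only the loose shape of hyphen-separated subtags of 1 to 8 alphanumerics rather than full BCP 47 well-formedness, and all function names are hypothetical.

```python
import re

# The two example patterns proposed in the thread.
PATTERN = re.compile(r"(in)|(out)")
LANGPATTERN = re.compile(r"(en)|(en-.+)")

# Loose shape check only: NOT a full BCP 47 well-formedness test.
TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def regex_matches(regex, value):
    """Whole-string match, mimicking how XML Schema patterns are anchored."""
    return regex.fullmatch(value) is not None


def looks_like_tag(tag):
    """Return True if the tag has the rough subtag shape described above."""
    return TAG_SHAPE.fullmatch(tag) is not None


def basic_filter(language_range, tag):
    """RFC 4647 basic filtering: case-insensitive equality or prefix followed by '-'."""
    if language_range == "*":
        return True
    wanted, actual = language_range.lower(), tag.lower()
    return actual == wanted or actual.startswith(wanted + "-")


for candidate in ["en", "en-US", "EN-gb", "eng", "en-!!!"]:
    print(
        candidate,
        regex_matches(LANGPATTERN, candidate),
        looks_like_tag(candidate) and basic_filter("en", candidate),
    )
```

In this sketch the regular expression accepts "en-!!!", which is not a plausible language tag, and rejects "EN-gb" because it is case-sensitive, whereas the shape check combined with basic filtering does the opposite in both cases. Basic filtering on its own does not validate tags either; it presumes that the tags it is given are already well-formed.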