
Re: getting to Last Call on rdf:text

From: Jos de Bruijn <debruijn@inf.unibz.it>
Date: Thu, 26 Mar 2009 14:44:08 +0100
Message-ID: <49CB86A8.6020602@inf.unibz.it>
To: Boris Motik <boris.motik@comlab.ox.ac.uk>
CC: "'Phillips, Addison'" <addison@amazon.com>, 'Alan Ruttenberg' <alanruttenberg@gmail.com>, 'Sandro Hawke' <sandro@w3.org>, public-rdf-text@w3.org, team-rif-chairs@w3.org, team-owl-chairs@w3.org
I was the one who originally raised the issue in the editor's note and I
am satisfied with the way it has been handled in the latest change by Boris.

Best, Jos

Boris Motik wrote:
> Hello Mr. Addison,
> Thank you very much for your invaluable input! I am not an expert on Unicode, so
> I was unaware of the fact that the number of code points is fixed. In light of
> what you said, we've changed the definitions of rdf:text. In order to follow XML
> Schema, we've defined a character as in XML 1.1. Since XML 1.1 excludes certain
> characters, we are left with 1,112,061 code points in rdf:text.
> Thank you also for the pointer to RFC 4647; we'll take this under consideration.
> Should you be interested in the changes, you can take a look at them here:
> http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
> We'll appreciate any further feedback you can give us.
> Regards,
> Boris Motik
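The 1,112,061 figure above can be checked by counting the ranges in the Char production of XML 1.1, which is [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The following is only a small sketch of that arithmetic; the names XML11_CHAR_RANGES, count_code_points and is_xml11_char are made up for illustration and do not come from the rdf:text draft.

```python
# Ranges of the XML 1.1 "Char" production (inclusive bounds).
XML11_CHAR_RANGES = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# Unicode code points run from 0 to 0x10FFFF, so the full space is finite.
UNICODE_CODE_POINTS = 0x10FFFF + 1


def count_code_points(ranges):
    """Number of code points covered by a list of inclusive ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


def is_xml11_char(cp):
    """True if the code point falls inside one of the XML 1.1 Char ranges."""
    return any(lo <= cp <= hi for lo, hi in XML11_CHAR_RANGES)


print(UNICODE_CODE_POINTS)                   # 1114112
print(count_code_points(XML11_CHAR_RANGES))  # 1112061
```

The 2,051 excluded code points are U+0000, the surrogate range U+D800 to U+DFFF, and U+FFFE and U+FFFF.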
>> -----Original Message-----
>> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
>> On Behalf Of Phillips, Addison
>> Sent: 24 March 2009 19:08
>> To: Alan Ruttenberg; Sandro Hawke
>> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
>> Subject: RE: getting to Last Call on rdf:text
>>> Here is my take on the editor notes:
>>> Issue 1, re: an infinity of characters in Unicode, seems wrong
>>> according to the documentation of Unicode "All three encoding forms
>>> need at most 4 bytes (or 32-bits) of data for each character", but
>>> arguments for defining it that way are pragmatic. It would seem that
>>> this needs to be a technical decision, probably by vote if there is
>>> not consensus at this point.
>> The largest Unicode code point is 0x10FFFF. Period. There is not an infinity
>> of Unicode code points. A better solution would just be to drop this sentence:
>> --
>> The set of available characters is assumed to be infinite, and it is thus
>> independent of the current version of UCS and Unicode.
>> --
>> The set of characters is independent of the version of Unicode provided that
>> the full range is supported.
>>> Issue 2 asks for an example of pattern and langpattern.
>>> An example of pattern would be "(in)|(out)", which matches the
>>> character sequences "in" and "out" and nothing else. It is unclear
>>> to me whether the literal should be written as a plain literal or not,
>>> but I am guessing so.
>>> An example of a langpattern is "(en)|(en-.+)" - one could get more
>>> precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
>>> I'm not sure it's worth it.
>> I think it's important to follow RFC 4647. A multiplicity of formats makes it
>> more difficult to work with languages and the most likely useful source of
>> 'langpattern' will be RFC 4647-style language priority lists. Also: following
>> the pattern shown would NOT be compliant with BCP 47 language tag matching.
>> (en-.+) matches many invalid tags, for example.
>> Addison
>> Addison Phillips
>> Globalization Architect -- Lab126
>> Chair -- W3C Internationalization WG
>> Editor -- IETF LTRU WG (BCP 47)
>> Internationalization is not a feature.
>> It is an architecture.
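
To illustrate the point about "(en)|(en-.+)" versus RFC 4647, here is a rough sketch that compares the regular expression with the basic filtering rule of RFC 4647 (section 3.3.1): a language range matches a tag if, compared case-insensitively, it equals the tag or equals a prefix of the tag that is followed by "-". The function names are hypothetical, and looks_like_tag is only a loose shape check (hyphen-separated alphanumeric subtags of 1 to 8 characters), not a BCP 47 validator.

```python
import re

# The langpattern proposed in the thread, applied to the whole tag.
LANGPATTERN = re.compile(r"(en)|(en-.+)")

# Loose shape check only (assumption): alphabetic first subtag, then
# alphanumeric subtags, each 1 to 8 characters, separated by hyphens.
TAG_SHAPE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def regex_match(tag):
    """Match a tag against the regular-expression langpattern."""
    return LANGPATTERN.fullmatch(tag) is not None


def looks_like_tag(tag):
    """True if the string has the rough shape of a language tag."""
    return TAG_SHAPE.fullmatch(tag) is not None


def basic_filter(language_range, tag):
    """RFC 4647 basic filtering of one tag against one language range."""
    if language_range == "*":
        return True
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def matches(language_range, tag):
    """Basic filtering, applied only to strings shaped like a tag."""
    return looks_like_tag(tag) and basic_filter(language_range, tag)


for tag in ["en", "en-US", "EN-us", "en-!!", "de-DE"]:
    print(tag, regex_match(tag), matches("en", tag))
```

In this sketch the regular expression accepts "en-!!", which is not a well-formed tag, and rejects "EN-us" because it is case-sensitive, whereas the range "en" under basic filtering accepts "EN-us" and the shape check rejects "en-!!".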

+43 1 58801 18470        debruijn@inf.unibz.it

Jos de Bruijn,        http://www.debruijn.net/
Many would be cowards if they had courage
  - Thomas Fuller
Received on Thursday, 26 March 2009 13:45:14 UTC
