- From: Boris Motik <boris.motik@comlab.ox.ac.uk>
- Date: Thu, 26 Mar 2009 18:34:17 -0000
- To: "'Phillips, Addison'" <addison@amazon.com>, "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'Sandro Hawke'" <sandro@w3.org>
- Cc: <public-rdf-text@w3.org>, <team-rif-chairs@w3.org>, <team-owl-chairs@w3.org>
Hello,

Thanks for these comments. I've replaced the reference to XML 1.1 with a
reference to XML 1.0. Furthermore, I've changed the definition of a language
tag to point to the langtag production in BCP-47. (Just pointing to it seems
to me preferable to repeating it.)

Please let me know should you have any further comments.

Regards,

Boris

> -----Original Message-----
> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> On Behalf Of Phillips, Addison
> Sent: 26 March 2009 15:15
> To: Boris Motik; 'Alan Ruttenberg'; 'Sandro Hawke'
> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> Subject: RE: getting to Last Call on rdf:text
>
> Hello Mr. Motik,
>
> Thank you for modifying the description of characters in rdf:text. I have some
> small concerns about using XML 1.1 as the reference. You might be better off
> referencing XML 1.0 Fifth Edition, whose definition of Char is identical (but
> XML 1.0 is more widely used than 1.1), or, as XML Schema does, referencing
> both. Since the two are now in alignment, the choice of reference no longer
> matters.
>
> Thank you for noting RFC 4647.
>
> I have an additional concern about how language tags are handled in the draft
> page. Specifically:
>
> The regular expression for a language tag is wrong, even by the very relaxed
> standards of former-BCP47 RFC 3066. If you mean to permit the older syntax
> (which was simpler), you should reference obs-langtag in BCP 47 or at least
> convert it properly to a schema-style regular expression. Under that syntax,
> subtags were limited to a length of eight characters.
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Lab126
>
> Internationalization is not a feature.
> It is an architecture.
>
> > -----Original Message-----
> > From: Boris Motik [mailto:boris.motik@comlab.ox.ac.uk]
> > Sent: Thursday, March 26, 2009 5:16 AM
> > To: Phillips, Addison; 'Alan Ruttenberg'; 'Sandro Hawke'
> > Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> > Subject: RE: getting to Last Call on rdf:text
> >
> > Hello Mr. Addison,
> >
> > Thank you very much for your invaluable input! I am not an expert on
> > Unicode, so I was unaware of the fact that the number of code points is
> > fixed. In light of what you said, we've changed the definitions of
> > rdf:text. In order to follow XML Schema, we've defined a character as in
> > XML 1.1. Since XML 1.1 excludes certain characters, we are left with
> > 1,112,061 code points in rdf:text.
> >
> > Thank you also for the pointer to RFC 4647; we'll take this under
> > consideration.
> >
> > Should you be interested in the changes, you can take a look at them here:
> >
> > http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
> >
> > We'll appreciate any further feedback you can give us.
> >
> > Regards,
> >
> > Boris Motik
> >
> > > -----Original Message-----
> > > From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> > > On Behalf Of Phillips, Addison
> > > Sent: 24 March 2009 19:08
> > > To: Alan Ruttenberg; Sandro Hawke
> > > Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> > > Subject: RE: getting to Last Call on rdf:text
> > >
> > > > Here is my take on the editor notes:
> > > >
> > > > Issue 1, re: an infinity of characters in Unicode, seems wrong
> > > > according to the documentation of Unicode "All three encoding forms
> > > > need at most 4 bytes (or 32-bits) of data for each character", but
> > > > arguments for defining it that way are pragmatic.
> > > > It would seem that this needs to be a technical decision about this,
> > > > probably by vote if there is not consensus at this point.
> > >
> > > The largest Unicode code point is 0x10FFFF. Period. There is not an
> > > infinity of Unicode code points. A better solution would just be to drop
> > > this sentence:
> > >
> > > --
> > > The set of available characters is assumed to be infinite, and it is thus
> > > independent of the current version of UCS and Unicode.
> > > --
> > >
> > > The set of characters is independent of the version of Unicode provided
> > > that the full range is supported.
> > >
> > > > Issue 2 asks for an example of pattern and langpattern.
> > > >
> > > > An example of pattern would be "(in)|(out)", which matches the
> > > > character sequences "in" and "out" and nothing else. It is unclear
> > > > to me whether the literal should be written as a plain literal or not,
> > > > but I am guessing so.
> > > >
> > > > An example of a langpattern is "(en)|(en-.+)" - one could get more
> > > > precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> > > > I'm not sure it's worth it.
> > >
> > > I think it's important to follow RFC 4647. A multiplicity of formats
> > > makes it more difficult to work with languages and the most likely useful
> > > source of 'langpattern' will be RFC 4647-style language priority lists.
> > > Also: following the pattern shown would NOT be compliant with BCP 47
> > > language tag matching. (en-.+) matches many invalid tags, for example.
> > >
> > > Addison
> > >
> > > Addison Phillips
> > > Globalization Architect -- Lab126
> > > Chair -- W3C Internationalization WG
> > > Editor -- IETF LTRU WG (BCP 47)
> > >
> > > Internationalization is not a feature.
> > > It is an architecture.
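
[Illustrative note, not part of the original correspondence.] The objection to the "(en)|(en-.+)" langpattern can be made concrete with a small Python sketch. It compares that regular expression with RFC 4647 basic filtering (Section 3.3.1) and with a well-formedness check based on the older RFC 3066 grammar, in which every subtag is limited to eight characters. The names THREAD_PATTERN, OBS_LANGTAG, and basic_filter are hypothetical and chosen only for this example; the sketch does not reproduce the regular expression from the rdf:text draft, and it does not implement the full BCP 47 langtag production.

```python
import re

# The langpattern suggested in the thread, used here as an ordinary regex.
THREAD_PATTERN = re.compile(r"(en)|(en-.+)")

# Assumption: RFC 3066-style syntax, i.e. a primary subtag of 1-8 letters
# followed by any number of "-"-separated subtags of 1-8 letters or digits.
OBS_LANGTAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def basic_filter(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: a range matches a tag if they are equal,
    or if the tag starts with the range followed by "-". The comparison is
    case-insensitive, and the wildcard range "*" matches every tag."""
    if language_range == "*":
        return True
    rng, candidate = language_range.lower(), tag.lower()
    return candidate == rng or candidate.startswith(rng + "-")


if __name__ == "__main__":
    for tag in ["en", "en-US", "EN-us", "en-!!", "en-abcdefghi", "eng"]:
        print(
            f"{tag!r:16}"
            f" thread regex: {THREAD_PATTERN.fullmatch(tag) is not None!s:6}"
            f" well-formed: {OBS_LANGTAG.fullmatch(tag) is not None!s:6}"
            f" basic filter 'en': {basic_filter('en', tag)}"
        )
```

Under these assumptions, the thread's pattern accepts strings such as "en-!!" that are not well-formed tags at all, and rejects "EN-us" because it is case-sensitive. Basic filtering, by contrast, only compares prefixes and is meant to be applied to tags that have already been checked for well-formedness.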
Received on Thursday, 26 March 2009 18:35:31 UTC