RE: getting to Last Call on rdf:text from Boris Motik on 2009-03-26 (public-rdf-text@w3.org from January to March 2009)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Thu, 26 Mar 2009 18:34:17 -0000
To: "'Phillips, Addison'" <addison@amazon.com>, "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'Sandro Hawke'" <sandro@w3.org>
Cc: <public-rdf-text@w3.org>, <team-rif-chairs@w3.org>, <team-owl-chairs@w3.org>
Message-ID: <5415DA0B19DD4490B9C1D683DC68E7F9@wolf>
Hello,

Thanks for these comments. I've replaced the reference to XML 1.1 with a
reference to XML 1.0. Furthermore, I've changed the definition of a language tag
to point to the langtag production in BCP-47. (Just pointing to it seems to me
preferable to repeating it.)
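
For readers who want a rough idea of what the langtag production accepts, here is a minimal Python sketch. It is an illustration only, not text from the draft: it assumes the subtag shapes given in RFC 4646 (the BCP 47 document current at the time of writing), checks well-formedness only, ignores the grandfathered and private-use-only forms, and does not consult the subtag registry. The names LANGTAG and is_langtag are hypothetical.

```python
import re

# Simplified, well-formedness-only rendering of the BCP 47 "langtag" production.
# Assumption: subtag shapes as in RFC 4646; registry validity is NOT checked.
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}               # language, optional extlang subtags
      |[a-z]{4}                                   # reserved for future use
      |[a-z]{5,8})                                # registered language subtag
    (?:-[a-z]{4})?                                # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                   # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*      # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*          # extensions (singleton is never "x")
    (?:-x(?:-[a-z0-9]{1,8})+)?                    # private use
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_langtag(tag: str) -> bool:
    """Hypothetical helper: True if tag is well-formed under the sketch above."""
    return LANGTAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["en", "en-US", "zh-Hant-TW", "de-1996", "en-!!!", "en-"]:
        print(tag, is_langtag(tag))
```

Even this simplified form is long enough to show why a pointer to the production is easier to keep correct than a copy of it.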

Please let me know should you have any further comments.

Regards,

	Boris 

> -----Original Message-----
> From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> On Behalf Of Phillips, Addison
> Sent: 26 March 2009 15:15
> To: Boris Motik; 'Alan Ruttenberg'; 'Sandro Hawke'
> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> Subject: RE: getting to Last Call on rdf:text
> 
> Hello Mr. Motik,
> 
> Thank you for modifying the description of characters in rdf:text. I have some
> small concerns about using XML 1.1 as the reference. You might be better off
> referencing XML 1.0 Fifth Edition, whose definition of Char is identical (but
> XML 1.0 is more widely used than 1.1), or, as XML Schema does, referencing
> both. Since the two are now in alignment, the choice of reference no longer
> matters.
> 
> Thank you for noting RFC 4647.
> 
> I have an additional concern about how language tags are handled in the draft
> page. Specifically:
> 
> The regular expression for a language tag is wrong, even by the very relaxed
> standards of former-BCP47 RFC 3066. If you mean to permit the older syntax
> (which was simpler), you should reference obs-langtag in BCP 47 or at least
> convert it properly to a schema-style regular expression. Under that syntax,
> subtags were limited to a length of eight characters.
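
As an illustration of the older syntax mentioned above: RFC 3066 defines a tag as a primary subtag of one to eight letters, followed by any number of hyphen-separated subtags of one to eight letters or digits. A minimal Python sketch of that shape as a regular expression follows. The names RFC3066_TAG and is_rfc3066_tag are hypothetical, and the sketch checks syntax only, not whether the subtags are registered.

```python
import re

# RFC 3066 shape: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """Hypothetical helper: True if tag has the RFC 3066 syntactic shape."""
    return RFC3066_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["en", "en-US", "i-klingon", "en-toolongsubtag", "en-"]:
        print(tag, is_rfc3066_tag(tag))
```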
> 
> Addison
> 
> Addison Phillips
> Globalization Architect -- Lab126
> 
> Internationalization is not a feature.
> It is an architecture.
> 
> 
> > -----Original Message-----
> > From: Boris Motik [mailto:boris.motik@comlab.ox.ac.uk]
> > Sent: Thursday, March 26, 2009 5:16 AM
> > To: Phillips, Addison; 'Alan Ruttenberg'; 'Sandro Hawke'
> > Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> > Subject: RE: getting to Last Call on rdf:text
> >
> > Hello Mr. Addison,
> >
> > Thank you very much for your invaluable input! I am not an expert on Unicode,
> > so I was unaware of the fact that the number of code points is fixed. In light
> > of what you said, we've changed the definitions of rdf:text. In order to
> > follow XML Schema, we've defined a character as in XML 1.1. Since XML 1.1
> > excludes certain characters, we are left with 1,112,061 code points in
> > rdf:text.
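
The figure of 1,112,061 can be checked by summing the sizes of the three ranges in the XML 1.1 Char production ([#x1-#xD7FF], [#xE000-#xFFFD], [#x10000-#x10FFFF]). A minimal Python sketch follows; the names XML11_CHAR_RANGES and count_code_points are made up for this illustration.

```python
# XML 1.1 Char production:
#   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML11_CHAR_RANGES = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def count_code_points(ranges):
    """Sum the sizes of inclusive (low, high) code point ranges."""
    return sum(high - low + 1 for low, high in ranges)


if __name__ == "__main__":
    # 55,295 + 8,190 + 1,048,576
    print(count_code_points(XML11_CHAR_RANGES))  # prints 1112061
```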
> >
> > Thank you also for the pointer to RFC 4647; we'll take this under
> > consideration.
> >
> > Should you be interested in the changes, you can take a look at
> > them here:
> >
> > http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
> >
> > We'll appreciate any further feedback you can give us.
> >
> > Regards,
> >
> > Boris Motik
> >
> > > -----Original Message-----
> > > From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> > > On Behalf Of Phillips, Addison
> > > Sent: 24 March 2009 19:08
> > > To: Alan Ruttenberg; Sandro Hawke
> > > Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> > > Subject: RE: getting to Last Call on rdf:text
> > >
> > > > Here is my take on the editor notes:
> > > >
> > > >
> > > > Issue 1, re: an infinity of characters in Unicode, seems wrong
> > > > according to the documentation of Unicode "All three encoding forms
> > > > need at most 4 bytes (or 32-bits) of data for each character", but
> > > > arguments for defining it that way are pragmatic. It would seem that
> > > > this needs to be a technical decision, probably by vote
> > > > if there is not consensus at this point.
> > >
> > > The largest Unicode code point is 0x10FFFF. Period. There is not an infinity
> > > of Unicode code points. A better solution would just be to drop this sentence:
> > >
> > > --
> > > The set of available characters is assumed to be infinite, and it is thus
> > > independent of the current version of UCS and Unicode.
> > > --
> > >
> > > The set of characters is independent of the version of Unicode provided that
> > > the full range is supported.
> > >
> > > >
> > > > Issue 2 asks for an example of pattern and langpattern.
> > > >
> > > > An example of pattern would be "(in)|(out)", which matches the
> > > > character sequences "in" and "out" and nothing else. It is unclear
> > > > to me whether the literal should be written as a plain literal or not,
> > > > but I am guessing so.
> > > >
> > > > An example of a langpattern is "(en)|(en-.+)" - one could get more
> > > > precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> > > > I'm not sure it's worth it.
> > >
> > > I think it's important to follow RFC 4647. A multiplicity of formats makes it
> > > more difficult to work with languages and the most likely useful source of
> > > 'langpattern' will be RFC 4647-style language priority lists. Also: following
> > > the pattern shown would NOT be compliant with BCP 47 language tag matching.
> > > (en-.+) matches many invalid tags, for example.
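
To make the difference concrete, the Python sketch below compares the proposed pattern with RFC 4647 basic filtering, under which a language range matches a tag if the two are equal ignoring case, or if the range is a prefix of the tag and the next character is "-". This is a sketch under stated assumptions, not a full RFC 4647 implementation: NAIVE and basic_filter_match are hypothetical names, and basic filtering itself assumes the tags it is given are already well-formed.

```python
import re

# The langpattern proposed in the quoted message, applied as a regular expression.
NAIVE = re.compile(r"(en)|(en-.+)")


def basic_filter_match(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering for one range and one tag (case-insensitive)."""
    if language_range == "*":
        return True  # the wildcard range matches every tag
    rng, candidate = language_range.lower(), tag.lower()
    return candidate == rng or candidate.startswith(rng + "-")


if __name__ == "__main__":
    # The regular expression accepts a string that is not a language tag at all,
    # and rejects a valid tag that merely differs in case.
    print(NAIVE.fullmatch("en-!!!") is not None)
    print(NAIVE.fullmatch("EN-US") is not None)
    # Basic filtering compares whole subtags and ignores case.
    print(basic_filter_match("en", "EN-US"))
    print(basic_filter_match("en", "eng"))
    print(basic_filter_match("de-de", "de-DE-1996"))
    print(basic_filter_match("de-de", "de-Deva"))
```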
> > >
> > > Addison
> > >
> > > Addison Phillips
> > > Globalization Architect -- Lab126
> > > Chair -- W3C Internationalization WG
> > > Editor -- IETF LTRU WG (BCP 47)
> > >
> > > Internationalization is not a feature.
> > > It is an architecture.
> > >
> > >
> >
Received on Thursday, 26 March 2009 18:35:31 UTC