RE: getting to Last Call on rdf:text from Phillips, Addison on 2009-03-26 (public-rdf-text@w3.org from January to March 2009)

From: Phillips, Addison <addison@amazon.com>
Date: Thu, 26 Mar 2009 08:14:53 -0700
To: Boris Motik <boris.motik@comlab.ox.ac.uk>, "'Alan Ruttenberg'" <alanruttenberg@gmail.com>, "'Sandro Hawke'" <sandro@w3.org>
CC: "public-rdf-text@w3.org" <public-rdf-text@w3.org>, "team-rif-chairs@w3.org" <team-rif-chairs@w3.org>, "team-owl-chairs@w3.org" <team-owl-chairs@w3.org>
Message-ID: <4D25F22093241741BC1D0EEBC2DBB1DA019ED99BAB@EX-SEA5-D.ant.amazon.com>

Hello Mr. Motik,

Thank you for modifying the description of characters in rdf:text. I have some small concerns about using XML 1.1 as the reference. You might be better off referencing XML 1.0 Fifth Edition, whose definition of Char is similar (and XML 1.0 is more widely used than 1.1), or, as XML Schema does, referencing both. The two definitions are closely aligned, so the choice changes little in practice, but the more widely deployed specification is the safer reference.
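
For what it's worth, the 1,112,061 figure in your message can be checked directly against the Char production of XML 1.1. The following is only a sketch in Python (my choice of language, purely for illustration); the constant and function names are mine and do not come from any specification.

```python
# Inclusive code point ranges of the XML 1.1 Char production:
# [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML11_CHAR_RANGES = [
    (0x1, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]

def count_code_points(ranges):
    """Return how many code points the inclusive ranges cover."""
    return sum(high - low + 1 for low, high in ranges)

xml11_total = count_code_points(XML11_CHAR_RANGES)
print(xml11_total)  # 1112061
```

Put another way, this is the full range U+0000 through U+10FFFF minus the 2,048 surrogates, U+0000, U+FFFE and U+FFFF.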

Thank you for noting RFC 4647.
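
To make the earlier point about "(en)|(en-.+)" concrete, here is a minimal sketch that compares that pattern with RFC 4647 basic filtering for a single language range. It is written in Python purely for illustration. The helper names are hypothetical, and looks_like_tag is only a rough shape check that I am assuming for the example, not BCP 47 validation.

```python
import re

# The pattern from the earlier message, read as an anchored schema-style regex.
LOOSE_PATTERN = re.compile(r"(en)|(en-.+)")

def looks_like_tag(tag):
    """Rough shape check only: hyphen-separated ASCII alphanumeric subtags
    of 1 to 8 characters. An assumption for this sketch, not full
    BCP 47 validation."""
    return all(
        1 <= len(subtag) <= 8 and subtag.isascii() and subtag.isalnum()
        for subtag in tag.split("-")
    )

def basic_filter(language_range, tag):
    """RFC 4647 basic filtering for one range: a case-insensitive match on
    the whole tag, or on a prefix that ends at a subtag boundary."""
    language_range, tag = language_range.lower(), tag.lower()
    if language_range == "*":
        return True
    return tag == language_range or tag.startswith(language_range + "-")

def matches(language_range, tag):
    """Hypothetical helper: only well-shaped tags are eligible to match."""
    return looks_like_tag(tag) and basic_filter(language_range, tag)

for candidate in ["en", "en-US", "EN-gb", "en-!!!", "en-", "eng"]:
    regex_says = LOOSE_PATTERN.fullmatch(candidate) is not None
    print(candidate, regex_says, matches("en", candidate))
```

The regular expression accepts "en-!!!" and, being case-sensitive, rejects "EN-gb". The filtering sketch does the opposite in both cases, which is the behavior I would expect from language tag matching.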

I have an additional concern about how language tags are handled in the draft page. Specifically:

The regular expression for a language tag is wrong, even by the very relaxed standards of former-BCP47 RFC 3066. If you mean to permit the older syntax (which was simpler), you should reference obs-langtag in BCP 47 or at least convert it properly to a schema-style regular expression. Under that syntax, subtags were limited to a length of eight characters.
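
As a rough illustration of what a proper conversion might look like, the older grammar is, as I recall it, a primary subtag of one to eight letters followed by any number of hyphen-separated subtags of one to eight letters or digits. The sketch below expresses that in Python (chosen only for illustration); please check the ABNF in the RFC before relying on it. The constant and function names are mine. Note that a schema-style regular expression is implicitly anchored, which fullmatch emulates here.

```python
import re

# Assumed ABNF for the older syntax (verify against the RFC):
#   obs-language-tag = primary-subtag *( "-" subtag )
#   primary-subtag   = 1*8ALPHA
#   subtag           = 1*8(ALPHA / DIGIT)
OBS_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_obs_language_tag(tag):
    """Return True if the whole string fits the older language tag syntax."""
    return OBS_LANGUAGE_TAG.fullmatch(tag) is not None

for candidate in ["en", "en-US", "zh-Hant-TW", "i-klingon",
                  "en-", "en--US", "en_US", "en-abcdefghi", "1en"]:
    print(candidate, is_obs_language_tag(candidate))
```

This checks only the shape of a tag. It says nothing about whether the subtags are registered or otherwise valid.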

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: Boris Motik [mailto:boris.motik@comlab.ox.ac.uk]
> Sent: Thursday, March 26, 2009 5:16 AM
> To: Phillips, Addison; 'Alan Ruttenberg'; 'Sandro Hawke'
> Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> Subject: RE: getting to Last Call on rdf:text
> 
> Hello Mr. Addison,
> 
> Thank you very much for your invaluable input! I am not an expert on
> Unicode, so I was unaware of the fact that the number of code points is
> fixed. In light of what you said, we've changed the definitions of
> rdf:text. In order to follow XML Schema, we've defined a character as in
> XML 1.1. Since XML 1.1 excludes certain characters, we are left with
> 1,112,061 code points in rdf:text.
> 
> Thank you also for the pointer to RFC 4647; we'll take this under
> consideration.
> 
> Should you be interested in the changes, you can take a look at
> them here:
> 
> http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
> 
> We'll appreciate any further feedback you can give us.
> 
> Regards,
> 
> Boris Motik
> 
> > -----Original Message-----
> > From: public-rdf-text-request@w3.org [mailto:public-rdf-text-request@w3.org]
> > On Behalf Of Phillips, Addison
> > Sent: 24 March 2009 19:08
> > To: Alan Ruttenberg; Sandro Hawke
> > Cc: public-rdf-text@w3.org; team-rif-chairs@w3.org; team-owl-chairs@w3.org
> > Subject: RE: getting to Last Call on rdf:text
> >
> > > Here is my take on the editor notes:
> > >
> > >
> > > Issue 1, re: an infinity of characters in Unicode, seems wrong
> > > according to the documentation of Unicode "All three encoding forms
> > > need at most 4 bytes (or 32-bits) of data for each character", but
> > > arguments for defining it that way are pragmatic. It would seem that
> > > this needs to be a technical decision about this, probably by vote
> > > if there is not consensus at this point.
> >
> > The largest Unicode code point is 0x10FFFF. Period. There is not an
> > infinity of Unicode code points. A better solution would just be to
> > drop this sentence:
> >
> > --
> > The set of available characters is assumed to be infinite, and it is
> > thus independent of the current version of UCS and Unicode.
> > --
> >
> > The set of characters is independent of the version of Unicode
> > provided that the full range is supported.
> >
> > >
> > > Issue 2 asks for an example of pattern and langpattern.
> > >
> > > An example of pattern would be "(in)|(out)", which matches the
> > > character sequences "in" and "out" and nothing else. It is unclear
> > > to me whether the literal should be written as a plain literal or
> > > not, but I am guessing so.
> > >
> > > An example of a langpattern is "(en)|(en-.+)" - one could get more
> > > precise by following http://www.rfc-editor.org/rfc/rfc4647.txt but
> > > I'm not sure it's worth it.
> >
> > I think it's important to follow RFC 4647. A multiplicity of formats
> > makes it more difficult to work with languages and the most likely
> > useful source of 'langpattern' will be RFC 4647-style language
> > priority lists. Also: following the pattern shown would NOT be
> > compliant with BCP 47 language tag matching. (en-.+) matches many
> > invalid tags, for example.
> >
> > Addison
> >
> > Addison Phillips
> > Globalization Architect -- Lab126
> > Chair -- W3C Internationalization WG
> > Editor -- IETF LTRU WG (BCP 47)
> >
> > Internationalization is not a feature.
> > It is an architecture.
> >
> >
>

Received on Thursday, 26 March 2009 15:15:36 UTC