RE: comments on rdf:text draft from Boris Motik on 2009-04-08 (public-i18n-core@w3.org from April to June 2009)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Wed, 8 Apr 2009 11:02:17 +0100
To: "'Phillips, Addison'" <addison@amazon.com>, <public-rdf-text@w3.org>
Cc: <public-i18n-core@w3.org>
Message-ID: <DD7FCB6667504A6D81B7B21B5170813A@wolf>

Hello,

> -----Original Message-----
> From: Phillips, Addison [mailto:addison@amazon.com]
> Sent: 07 April 2009 21:39
> To: Boris Motik; public-rdf-text@w3.org
> Cc: public-i18n-core@w3.org
> Subject: RE: comments on rdf:text draft
> 
> >
> > Thanks for this comment. I'm afraid, however, that in response to Sandro's
> > comments, I rewrote this part of the introduction earlier today. I've
> > adopted the "elevator pitch" that Sandro suggested. Please let me know
> > should you consider that the current intro needs further revision.
> >
> 
> The new text is okay, although I think it might leave the average reader
> slightly mystified about what rdf:text is for. There is a lot of text about
> different literal flavors, but no mention about why the presence or absence of
> a language tag is interesting. And it concludes with this paragraph, which
> suggests some confusion about how to represent text in RDF:
> 
> --
> RDF tools may use other mechanisms for representing internationalized text,
> such as the xml:lang feature of the rdf:XMLLiteral datatype. The rdf:text
> datatype does not provide a replacement for such mechanisms.
> --
> 
> It seems to me that the introduction should say why these three classes of
> literals are related and why rdf:text might be interesting. I would at least
> include some sort of notation about why language tags might be needed. Perhaps
> add a third bullet point:
> 
> --
> * Literals often contain human-readable natural language text. RDF needs a
> mechanism for representing literals in various different languages, for
> selecting the proper literal in a specific language, and for allowing
> applications to keep language information with literals to facilitate
> language-sensitive processing.
> --
> 

I agree with your point. I've rewritten the first paragraph of the introduction
along the lines of what you suggested, and I hope that the introduction is now
clearer.

> Minor notes: first bullet s/literals/literal/
> Also: "internationalized text" is a misnomer. Perhaps "text in different
> languages"??
> 

I've changed this.

> >
> > > 3. The intro to section 2 is still not quite right. Instead of the first
> > > paragraph, I think it suffices to say:
> > >
> > > --
> > > A 'character' is an atomic unit of text, as defined in [Unicode] and/or
> > > [ISO/IEC 10646] and corresponding to the 'Char' production from [XML].
> > > --
> > >
> >
> > This formulation was taken from XML Schema. Nevertheless, your suggestion
> > is an improvement, modulo the fact that, if a character must match the
> > 'Char' production, it is not defined as in [Unicode]. Therefore, I've
> > rewritten the first two sentences like this:
> 
> I'm not sure what you mean by this. Unicode defines a range of code points and
> 'Char' mirrors it. The definition of 'Char' actually says "Unicode code
> points" :-).
> 

But the 'Char' production seems to actually exclude some Unicode code points.
Here is how I interpreted all the specs:

- Any integer between 0 and 0x10FFFF is a Unicode code point.
- Not every such integer, however, matches the 'Char' production from XML.

Because of that, it seems to me that the two definitions (i.e., the one in
Unicode and the one in XML) are *not* equivalent; in fact, the latter is a
proper subset of the former.
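
To make the difference concrete, here is a minimal Python sketch of the two
definitions as I read them. It assumes the 'Char' production of XML 1.0
(#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]); the
function names are mine and purely illustrative.

```python
def is_unicode_code_point(n: int) -> bool:
    """Any integer from 0 to 0x10FFFF is a Unicode code point."""
    return 0 <= n <= 0x10FFFF


def matches_xml_char(n: int) -> bool:
    """Assumed XML 1.0 'Char' production, written out as plain ranges."""
    return (
        n in (0x9, 0xA, 0xD)
        or 0x20 <= n <= 0xD7FF
        or 0xE000 <= n <= 0xFFFD
        or 0x10000 <= n <= 0x10FFFF
    )


# A few code points that are valid in Unicode but rejected by 'Char':
# a C0 control, a surrogate, and two noncharacters.
for n in (0x0, 0xD800, 0xFFFE, 0xFFFF):
    assert is_unicode_code_point(n) and not matches_xml_char(n)
```

Under this reading, every integer accepted by matches_xml_char is also a
code point, but not the other way around.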

[snip]

> 
> >
> > > 4. The sentence "Code points are written as U+ followed by the
> > > hexadecimal value of the code point" is not quite right. You might
> > > moderate this by saying "are represented by U+ (etc.) in this document".
> > > Although you barely use the U+ syntax in the document. Note that the
> > > sentence is also incomplete: the usual minimum length of a U+ hex
> > > sequence is four hex digits (U+00E9).
> > >
> >
> > I've rephrased the sentence like this:
> >
> > Code points are represented in this document as U+ followed by a
> > four-digit hexadecimal value of the code point.
> 
> That sounds good, although I'd even tend to say "are sometimes represented",
> since there are plenty of code points that are represented as ASCII characters
> :-).
> 

I've added "sometimes".
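
For illustration only, the notation can be produced like this in Python. The
helper name is hypothetical, and I am assuming the usual convention that four
hex digits are a minimum, so that code points above U+FFFF simply use more
digits.

```python
def format_code_point(n: int) -> str:
    """Render a code point as U+ followed by at least four uppercase hex digits."""
    if not 0 <= n <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {n}")
    return f"U+{n:04X}"


print(format_code_point(0xE9))     # U+00E9
print(format_code_point(0x1F600))  # U+1F600
```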

[snip]

> >
> > Nevertheless, I don't understand now whether foo-bar is a valid language
> > tag. It does seem to match the production from BCP 47, so I'd say it is.
> > Your explanation, however, suggests that the "en" part must be registered;
> > is this really the case? In any case, I strongly believe that the
> > definitions *must not* depend on any kind of a registry, as this would make
> > the consequences of an OWL 2 ontology possibly vary over time.
> 
> This is why I referred specifically to the conformance requirements in BCP 47,
> which defines two separate terms:
> 
> - "well-formed" means matching the ABNF/grammar but not necessarily checking
> to see if the subtags are registered. This is the sort of conformance you
> have.
> - "valid" means "well-formed" plus checking that the subtags are each properly
> registered (and a few other very minor checks on stuff like extensions). This
> is not the sort of conformance you require, although you allow it.
> 

OK, thanks. But would it then be possible to have an example with "xy-fubar"?
Currently, we are using "en-fubar", which seems to suggest that the first part
("en") must somehow be valid. (That is, I don't see anything in the 'langtag'
production of BCP 47 that would force the first part to be "en", "de", or
anything registered.) I'd like to use an example where nothing is "valid", just
to drive the message home that no registry needs to be taken into account.
Hence, if you agree, I'd change the example to "xy-fubar".
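
To show what I mean by a purely syntactic check, here is a rough Python
sketch of a "well-formed" test. The regular expression is my own simplified
approximation of the 'langtag' production: it ignores grandfathered tags and
tags that consist only of private-use subtags, and it never consults a
registry. The names LANGTAG and is_well_formed are hypothetical.

```python
import re

# Simplified approximation of the BCP 47 'langtag' production (an assumption,
# not a complete implementation of the ABNF).
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language (+ extlang)
    (?:-[a-z]{4})?                                        # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                           # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*              # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                  # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                            # private use
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_well_formed(tag: str) -> bool:
    """True if the whole tag matches the grammar; registration is not checked."""
    return LANGTAG.fullmatch(tag) is not None


for tag in ("en-fubar", "xy-fubar", "foo-bar"):
    assert is_well_formed(tag)
```

With such a check, "xy-fubar" and "en-fubar" are treated identically, which
is exactly the point of the proposed example.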

> Hope this helps.
> 

It does indeed: thanks a lot for your expert help!

Regards,

	Boris

> Addison
> 
> Addison Phillips
> Globalization Architect -- Lab126
> 
> Internationalization is not a feature.
> It is an architecture.
> 
>
Received on Wednesday, 8 April 2009 10:03:35 UTC