comments on http://www.w3.org/TR/2009/WD-rdf-text-20090421/ from C. M. Sperberg-McQueen on 2009-04-22 (public-owl-comments@w3.org from April 2009)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Tue, 21 Apr 2009 18:05:40 -0600
To: public-owl-comments@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <2826AFBB-788B-4128-AD67-249F6B2D7963@blackmesatech.com>
[Speaking for myself and not for any organization or working group]

I've just read "rdf:text: A Datatype for Internationalized Text"
in the version of 21 April 2009.  Nice work.

I do have a few questions or comments.

(1) Typo in two namespace names?

In section 2, you define conventional meanings for several
namespace prefixes, including

   xs for http://www.w3.org/2001/XMLSchema#
   fn for http://www.w3.org/2005/xpath-functions#

I realize that for reasons I think I once understood (but do not
now recall -- explain if you like, but I don't mind if you spare
yourself the effort) RDF users often create namespace names with
trailing hash marks.  But I'm pretty sure that there is no
trailing hash mark in the XML Schema namespace defined by the XML
Schema spec at

   http://www.w3.org/TR/xmlschema-1/
   http://www.w3.org/TR/xmlschema-2/

or, for XSD 1.1, by

   http://www.w3.org/TR/xmlschema11-1/
   http://www.w3.org/TR/xmlschema11-2/

If you are endeavoring to refer to that namespace, you have a
typo and should (I think) remove the hash mark.  Simple-minded
readers who copy and paste the namespace name into (say) a schema
document will be disappointed, perhaps, to find that most XSD
validators don't recognize the form with the hash mark.  And a
quick test reveals that some of them are fairly nasty about it.

If on the other hand you are endeavoring to refer not to that
namespace but to a different one, related conceptually to the
first (thus motivating the mnemonic of having a similar
spelling), it would probably be helpful to the reader to mention
that fact.

From uses of the xs: prefix later in the document (e.g. the
reference in 5.1.1 to xs:string), I think the former more likely.

It may be a mistake on the part of the XML Schema WG not to have
provided our namespace with a hash mark, but if so, it's a
mistake we've made (note the past tense here) and cannot now
unmake.

Similar remarks apply to the fn namespace.
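To illustrate the copy-and-paste hazard: a namespace-aware parser
treats the form with the hash mark as a completely different
namespace. Below is a minimal sketch using Python's standard-library
ElementTree. The helper is_xsd_schema is hypothetical. The sketch
shows only what the parser reports, not how any particular validator
reacts.

```python
import xml.etree.ElementTree as ET

# Namespace name of XML Schema as given in the XSD 1.0 and 1.1 specs (no '#').
XSD_NS = "http://www.w3.org/2001/XMLSchema"

def is_xsd_schema(doc: str) -> bool:
    """True if the root element is 'schema' in the XML Schema namespace.

    ElementTree reports expanded names as '{namespace}local', and namespace
    names are compared as plain strings, so a trailing '#' yields a
    different namespace.
    """
    root = ET.fromstring(doc)
    return root.tag == "{" + XSD_NS + "}schema"

# Binding copied with the trailing hash mark vs. the namespace name as specified.
hashed = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema#"/>'
plain = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>'

print(is_xsd_schema(hashed))  # False
print(is_xsd_schema(plain))   # True
```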

(2) Should XSD 1.1 refer to rdf:text?

As you may know, XSD 1.1 differs from XSD 1.0 in allowing
conforming validators to support primitive datatypes and facets
beyond those defined by the XSD 1.1 spec itself.  It
occurs to me that it might be helpful to refer, from the XSD 1.1
spec, to the rdf:text spec as an example of a published
definition of such an additional primitive datatype, with
(voilà!) a facet defined for it.  Would the OWL and RIF working
groups have any objection to my suggesting this to the XML Schema
WG?

(3) Required export to plain literals

In section 4, you require that all RDF tools translate rdf:text
values into plain literals before exporting data to exchange with
another RDF tool.  This seems likely to have the effect that some
toolmakers, at least, will argue that there is no need to support
rdf:text because no one is using it: they never see any instances
of it.  (The rules in XML 1.1 which encourage users of XML 1.1 to
label their data as XML 1.0 whenever possible have led to similar
arguments that there is no XML 1.1 data anywhere, nor any XML 1.1
processors, both of which are falsehoods but apparently cannot be
rooted out.)

I wonder if it would be better just to encourage, or require,
that RDF tools which support rdf:text provide user control over
whether to export to plain literals or not.  It's your decision,
of course: since rdf:text and plain literals are semantically
interchangeable, I suppose it may not matter as much as I
imagine.
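Here is a rough sketch of the user-controlled export suggested
above. It makes two assumptions: that the draft's lexical form for
rdf:text is the text, then an "@", then a possibly empty language
tag; and that the datatype IRI is the rdf: namespace followed by
"text". The function names and the N-Triples-style output are
illustrative only and are not taken from the draft.

```python
# Assumed datatype IRI: rdf: namespace + "text".
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"

def _escape(s: str) -> str:
    # Escape the characters N-Triples does not allow raw inside a quoted literal.
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))

def split_rdf_text(lexical: str) -> tuple[str, str]:
    # Language tags never contain '@', so the LAST '@' is the separator;
    # the text itself may contain '@'.
    text, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError(f"not an rdf:text lexical form: {lexical!r}")
    return text, lang

def export_literal(lexical: str, to_plain: bool = True) -> str:
    # to_plain is the hypothetical user-facing switch: True emits a plain
    # literal, False keeps the typed rdf:text literal unchanged.
    if not to_plain:
        return f'"{_escape(lexical)}"^^<{RDF_TEXT}>'
    text, lang = split_rdf_text(lexical)
    return f'"{_escape(text)}"@{lang}' if lang else f'"{_escape(text)}"'

print(export_literal("Hello world@en"))  # "Hello world"@en
print(export_literal("Hello@"))          # "Hello"
print(export_literal("Hi@en", to_plain=False))
```

Defaulting to_plain to True preserves the behavior section 4
requires. It also leaves the choice visible to the user, so that
rdf:text data need not vanish on every export.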

(4) rtfn:length function

In section 5.3.1 you define an rtfn:length function.  To avoid
confusion or error, it might be helpful to remind the reader and
implementor explicitly here that what are counted are characters,
not 16-bit code units or octets.

Otherwise, it seems inevitable that someone is just going to
implement the length function with a call to strlen(), oblivious
to the havoc that shortcut will wreak later on.
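The three counts diverge as soon as text leaves ASCII. Below is a
small Python illustration. It assumes that "character" means a
Unicode code point, as in XML and XSD, which is also what Python's
len() counts for a str. The UTF-8 octet count is what a
byte-counting strlen() would return for a NUL-free UTF-8 buffer.
The function name length_report is hypothetical.

```python
def length_report(s: str) -> dict[str, int]:
    return {
        # Python's len() on str counts code points, i.e. characters in the XSD sense.
        "characters": len(s),
        # UTF-16 code units: a supplementary character takes a surrogate pair.
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # Octets in UTF-8: what a byte-counting strlen() would see.
        "utf8_octets": len(s.encode("utf-8")),
    }

# 'naive' with a precomposed i-diaeresis (U+00EF), a space,
# and MUSICAL SYMBOL G CLEF (U+1D11E).
print(length_report("na\u00efve \U0001D11E"))
# {'characters': 7, 'utf16_code_units': 8, 'utf8_octets': 11}
```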

(5) Internationalization issues

From the fact that rdf:text values are pairs of UCS strings and
language tags, I infer that the type is intended to handle
natural-language text.

But if I understand correctly, some authorities strongly
recommend the use of explicit XML markup both for bidirectional
text (which, n.b., is not necessarily polyglot text) and for text
with ruby-style annotations.

I assume that one reason you don't allow internal XML markup is
that that would break compatibility with plain literals.

I think your document would be the stronger if you explained what
is to be done with Japanese text with ruby annotations, or with
Hebrew or Arabic text for which the Unicode bidi algorithm does
not suffice (and which therefore appears to need internal XML
markup to be handled reliably).


Good luck with the spec.

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
****************************************************************
Received on Wednesday, 22 April 2009 00:06:27 UTC