Re: review of rdf:text, dated 2008-11-04 from Axel Polleres on 2008-11-06 (public-rdf-text@w3.org from October to December 2008)

From: Axel Polleres <axel.polleres@deri.org>
Date: Thu, 06 Nov 2008 16:29:27 +0000
To: Jos de Bruijn <debruijn@inf.unibz.it>, public-rdf-text@w3.org
CC: RIF WG <public-rif-wg@w3.org>
Message-ID: <49131B67.7000102@deri.org>
Jos de Bruijn wrote:
> I reviewed the current draft of the rdf:text specification [1].
> I subdivided my comments into criticism on the content, criticism on the
> structure, errors in the document, and editorial issues.
> 
> Criticism on the content
> ====
> - to assure maximum compatibility with current and future versions of
> XML schema datatypes, the string parts of both the lexical and value
> space should be based on the respective spaces of the XML schema
> datatype string.
> - the set of characters is finite, and thus it cannot be assumed that it
> is infinite. The problem that some OWL 2 implementations might have some
> issue with the finiteness of this set is of no concern to this datatype
> per se. In fact, the XML schema string datatype is based on a finite set
> of characters, and so OWL 2 implementations will run into problems with
> this datatype.
> If there is really a problem to be expected with implementations of OWL
> 2, it should be dealt with in the OWL 2 specification, and not the
> specification of this datatype.

I guess all the above is better answered by the OWL crowd who insisted 
on that solution with infinite characters. We had quite some discussions 
on that. I see Bijan already picked up the issue.

> - concerning the definition of fn:text-length: It is not obvious that
> this function should return the length of only the string part of the
> text. A user might expect the language tag, and perhaps even the
> separator used in the lexical space, to be taken into account when
> computing the length.
> Therefore, I believe no text-length function should be provided.

It is convenient to emulate the length facet.
Whether or not to include the lang-tag is a coin-flipping decision, 
where I went for the option to not include it.
The rationale to include this and other functions was to have simple 
functions (even if syntactic sugar) to emulate the facets with one 
function and not having to write complicated nested functions for that.
Therefore, I believe the text-length function should persist.
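
To illustrate the choice described above, here is a minimal Python sketch 
of a length function that counts only the string part. It assumes the 
lexical form is "string@lang" and that the last '@' is the separator; the 
names split_text and text_length are hypothetical helpers, not taken from 
the draft.

```python
def split_text(lexical):
    """Split a lexical form at its last '@' into (string part, language tag).

    Assumption: the lexical form always contains at least one '@', and the
    last one separates the string part from the (possibly empty) tag.
    """
    string_part, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("lexical form must contain an '@' separator")
    return string_part, lang


def text_length(lexical):
    """Length of the string part only; separator and tag are not counted."""
    string_part, _lang = split_text(lexical)
    return len(string_part)
```

With this reading, "Padre de familia@es" and "Padre de familia@" have the 
same length, because neither the separator nor the tag contributes.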

> Criticism on the structure
> ====
> - the sections 3.1 and 3.2 are not logically part of the definition of
> the data type, and so should not be included in section 3.

I see no other logical place for them. Moving these sections 
into sections of their own would give the impression that these shortcuts 
and the treatment of xs:string as rdf:text are optional, but in my opinion 
they are an essential part of the whole idea of introducing this unifying 
datatype. I would not strongly object to moving 3.1 and 3.2 out of 
section 3, though: we could just make them sections 4 and 5, i.e., promote 
them from subsections to sections. Would that remedy your concern?

> Errors in the document
> ====
> - In the example in section 3.2 it is claimed that the string "Padre de
> familia" is mapped to the same value as the text "Padre de familia@".
> This is clearly not true.

It is, under the reading of section 3.1. Maybe this example should be 
moved to section 3.1 then, because it illustrates the uniform treatment 
of strings as texts.
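
A small Python sketch of that reading, under the assumption that a value 
is modelled as a (string, language tag) pair and that a plain string is 
treated as a text with an empty tag. The function names are hypothetical 
and only serve the illustration.

```python
def text_value(lexical):
    """Map a lexical form 'string@lang' to a (string, lang) pair.

    Assumption: the last '@' in the lexical form is the separator.
    """
    string_part, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("lexical form must contain an '@' separator")
    return (string_part, lang)


def string_value(plain):
    """Map a plain string to the same kind of pair, with an empty tag."""
    return (plain, "")
```

Under these assumptions the string "Padre de familia" and the text 
"Padre de familia@" map to the same pair, while "Padre de familia@es" 
maps to a different one.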

> - In the definition of text-from-string-lang, $arg2 must be a string as
> specified in BCP 47, and otherwise an error must be raised.

you mean a valid language tag... yes, you are right, I corrected this.
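
As a rough illustration of such a check, here is a Python sketch. The 
regular expression is a deliberately simplified stand-in (letters, then 
hyphen-separated alphanumeric subtags of up to 8 characters); it is NOT a 
full BCP 47 validator. Rejecting the empty tag and returning the lexical 
form "string@lang" are likewise assumptions of this sketch.

```python
import re

# Simplified approximation of a language tag, not the full BCP 47 grammar.
_LANG_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def text_from_string_lang(string, lang):
    """Build 'string@lang', raising an error if lang is not a plausible tag."""
    if not _LANG_TAG.fullmatch(lang):
        raise ValueError("not a valid language tag: %r" % lang)
    return "%s@%s" % (string, lang)
```

For example, text_from_string_lang("Padre de familia", "es") succeeds, 
while a second argument such as "not a tag" raises an error.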
> 
> Editorial issues
> ====
> - abstract: "both in" => "in both"

?

> - introduction: the text about how this document came to be and about
> the collaboration between the working groups might be interesting for
> the "purpose of this document" section, but not for the specification
> document itself. However, I guess that for the first public working
> draft it's not really an issue.

probably not, we might move it in future versions.

> - the references of the form [1],... are awkward. Please use the same
> style for all references.

I agree that, in the final version, we should remove the references to WG 
mails and wiki documents and fix the references to documents.

> - some of the references are italicized, and some are not, e.g., the
> second sentence of section 2.
> - sections 4.1.3 and 4.1.4: please specify the return values; extraction
> is a process.

the return types are given:

as xs:string

as xs:lang

(BTW, I changed xsd: to xs: throughout the document, following an 
earlier RIF resolution in this regard)

> - the text and summaries in sections 4.2.1 and 4.2.2 is not entirely
> clear. Please use symbols for referring to the individual parts of the
> arguments and to state properties about them, like in sections 4.1.3 and
> 4.1.4.

Hmmm, the summary is a summary; the exact semantics are obvious from the 
XQuery function declaration, so I honestly don't see a problem. I don't 
see how the XQuery/XPath function summaries are in any sense more 
precise than what I used here.

> - there is a question-mark in the signature declaration in section
> 4.3.2. It is not clear what this means.

"The terminology used and structure to describe these functions and 
operators is in accordance with the XQuery 1.0 and XPath 2.0 Functions 
and Operators [XPathFunc]."

It is explained there: '?' says that the input can be the empty sequence. 
If people think that we should duplicate these terminology definitions 
here, they can of course be included. We need to decide on that.
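
For readers unfamiliar with that notation: in [XPathFunc] signatures a 
trailing '?' on a type marks "zero or one" item, so the empty sequence is 
accepted. The Python sketch below mirrors the signature 
fn:string-length($arg as xs:string?) as xs:integer from that document, 
using None as a stand-in for the empty sequence; that mapping is an 
assumption made only for this illustration.

```python
from typing import Optional


def string_length(arg: Optional[str]) -> int:
    """Analogue of fn:string-length($arg as xs:string?) as xs:integer.

    None plays the role of the empty sequence, for which the XPath
    function is defined to return 0.
    """
    if arg is None:
        return 0
    return len(arg)
```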

> [1] http://www.w3.org/2007/OWL/draft/ED-owl2-rdf-text-20081104/


-- 
Dr. Axel Polleres
Digital Enterprise Research Institute, National University of Ireland, 
Galway
email: axel.polleres@deri.org  url: http://www.polleres.net/
Received on Thursday, 6 November 2008 16:30:27 UTC