Re: review of rdf:text, dated 2008-11-04 from Jos de Bruijn on 2008-11-06 (public-rdf-text@w3.org from October to December 2008)

From: Jos de Bruijn <debruijn@inf.unibz.it>
Date: Thu, 06 Nov 2008 19:57:17 +0100
To: Axel Polleres <axel.polleres@deri.org>
CC: public-rdf-text@w3.org, RIF WG <public-rif-wg@w3.org>
Message-ID: <49133E0D.6040302@inf.unibz.it>
Axel Polleres wrote:
> 
> Jos de Bruijn wrote:
>> I reviewed the current draft of the rdf:text specification [1].
>> I subdivided my comments into criticism on the content, criticism on the
>> structure, errors in the document, and editorial issues.
>>
>> Criticism on the content
>> ====
>> - to assure maximum compatibility with current and future versions of
>> XML schema datatypes, the string parts of both the lexical and value
>> space should be based on the respective spaces of the XML schema
>> datatype string.
>> - the set of characters is finite, and thus it cannot be assumed that it
>> is infinite. The problem that some OWL 2 implementations might have some
>> issue with the finiteness of this set is of no concern to this datatype
>> per se. In fact, the XML schema string datatype is based on a finite set
>> of characters, and so OWL 2 implementations will run into problems with
>> this datatype.
>> If there is really a problem to be expected with implementations of OWL
>> 2, it should be dealt with in the OWL 2 specification, and not the
>> specification of this datatype.
> 
> I guess all the above is better answered by the OWL crowd who insisted
> on that solution with infinite characters. We had quite some discussions
> on that. I see Bijan already picked up the issue.
> 
>> - concerning the definition of fn:text-length: It is not obvious that
>> this function should return the length of only the string part of the
>> text. A user might expect the language tag, and perhaps even the
>> separator used in the lexical space, to be taken into account when
>> computing the length.
>> Therefore, I believe no text-length function should be provided.
> 
> It is convenient to emulate the length facet.
> Whether or not to include the lang-tag is a coin-flipping decision,
> where I went for the option to not include it.

To me, the fact that the coin needs to be flipped indicates that one
should not have this function.
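
To make the ambiguity concrete, here is a minimal Python sketch of three
readings a user might expect. The pair representation and all function
names are my own illustration, not part of the draft; I also assume "@"
as the separator in the lexical form.

```python
from typing import NamedTuple


class Text(NamedTuple):
    """Illustrative model of a text value: a string plus a language tag."""
    string: str
    lang: str  # empty string stands for "no language tag"


def length_string_only(t: Text) -> int:
    # Reading 1: count only the characters of the string part.
    return len(t.string)


def length_with_tag(t: Text) -> int:
    # Reading 2: count the string part and the language tag.
    return len(t.string) + len(t.lang)


def length_lexical(t: Text) -> int:
    # Reading 3: count the whole lexical form, including the "@" separator.
    return len(f"{t.string}@{t.lang}")
```

All three readings are defensible, and they give different answers for
the same text.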

> The rationale to include this and other functions was to have simple
> functions (even if syntactic sugar) to emulate the facets with one
> function and not having to write complicated nested functions for that.
> Therefore, I believe the text-length function should persist.
> 
>> Criticism on the structure
>> ====
>> - the sections 3.1 and 3.2 are not logically part of the definition of
>> the data type, and so should not be included in section 3.
> 
> I see no other logical place for them. Moving these sections into their
> own section would give the impression that these shortcuts and the
> treatment of xs:string as rdf:text are optional, but in my opinion they
> are an essential part of the whole idea of introducing this unifying
> datatype. I would not strongly object, though, to moving 3.1 and 3.2 to
> their own sections; we could just make them sections 4 and 5, i.e.
> promote them from subsections to sections. Would that remedy your concern?

Yes.

> 
>> Errors in the document
>> ====
>> - In the example in section 3.2 it is claimed that the string "Padre de
>> familia" is mapped to the same value as the text "Padre de familia@".
>> This is clearly not true.
> 
> It is, in the reading of section 3.1. Maybe this example should be moved
> to section 3.1 then, because it illustrates the uniform treatment of
> strings as texts.
> 
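
For reference, the reading of section 3.1 described above could be
sketched as follows in Python. This is only an illustration under stated
assumptions: the lexical form is taken to be split at the last "@", a
plain string is taken to denote a text with an empty language tag, and
both function names are hypothetical rather than taken from the draft.

```python
from typing import Tuple

Text = Tuple[str, str]  # (string part, language tag)


def text_from_lexical(lexical: str) -> Text:
    # Assumption: the last "@" separates the string part from the tag,
    # so the string part itself may contain "@" characters.
    string, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("lexical form must contain an '@' separator")
    return (string, lang)


def text_from_string(s: str) -> Text:
    # Assumption: a plain string denotes the text with an empty tag.
    return (s, "")
```

Under these assumptions, text_from_string("Padre de familia") and
text_from_lexical("Padre de familia@") yield the same pair.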
>> - In the definition of text-from-string-lang, $arg2 must be a string as
>> specified in BCP 47, and otherwise an error must be raised.
> 
> you mean a valid language tag... yes, you are right, I corrected this.
>>
>> Editorial issues
>> ====
>> - abstract: "both in" => "in both"
> 
> ?

I meant: replace "both in" with "in both"

> 
>> - introduction: the text about how this document came to be and about
>> the collaboration between the working groups might be interesting for
>> the "purpose of this document" section, but not for the specification
>> document itself. However, I guess that for the first public working
>> draft it's not really an issue.
> 
> probably not, we might move it in future versions.
> 
>> - the references of the form [1],... are awkward. Please use the same
>> style for all references.
> 
> I agree that we should, in the final version, remove the references to WG
> mails and wiki documents, and we should fix the references to documents.
> 
>> - some of the references are italicized, and some are not, e.g., the
>> second sentence of section 2.
>> - sections 4.1.3 and 4.1.4: please specify the return values; extraction
>> is a process.
> 
> the return types are given:
> 
> as xs:string
> 
> as xs:lang
> 
> (BTW, I changed xsd: to xs: throughout the document following some
> earlier resolution in RIF in this regard)

I meant the summary text. This talks about "extracts", rather than
"returns", whereas all other function definitions talk about return values.

> 
>> - the text and summaries in sections 4.2.1 and 4.2.2 are not entirely
>> clear. Please use symbols for referring to the individual parts of the
>> arguments and to state properties about them, like in sections 4.1.3 and
>> 4.1.4.
> 
> Hmmm, the summary is a summary, the exact semantics is obvious from the
> XQuery function declaration, so I honestly don't see a problem. I don't

The semantics is not obvious. It needs to be specified, as you did for
the other functions.

Something like:
returns true if and only if for $comparand1=(s1,l1) and
$comparand2=(s2,l2), s1=s2 and l1=l2
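
In Python terms, a minimal sketch of that condition might look like the
following. Texts are modelled as (string, language tag) pairs; the
function name is hypothetical, and the exact, case-sensitive comparison
of the language tags is my assumption, not something the draft states.

```python
from typing import Tuple

Text = Tuple[str, str]  # (string part, language tag)


def text_equal(comparand1: Text, comparand2: Text) -> bool:
    # Unpack both arguments into their string and language-tag parts.
    (s1, l1), (s2, l2) = comparand1, comparand2
    # True if and only if both parts agree.
    return s1 == s2 and l1 == l2
```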


> see how the XQuery/XPath function summaries are in any sense more
> precise than what I used here.
> 
>> - there is a question-mark in the signature declaration in section
>> 4.3.2. It is not clear what this means.
> 
> "The terminology used and structure to describe these functions and
> operators is in accordance with the XQuery 1.0 and XPath 2.0 Functions
> and Operators [XPathFunc]."
> 
> It is explained there, '?' says that the input can be a sequence. If
> people think that we should duplicate these terminology definitions here,
> they can of course be included. We need to decide on that.

I would suggest to link directly to the specification of the syntax, or
include a brief explanation of what the question-mark means.

> 
>> [1] http://www.w3.org/2007/OWL/draft/ED-owl2-rdf-text-20081104/
> 
> 

-- 
Jos de Bruijn            debruijn@inf.unibz.it
+390471016224         http://www.debruijn.net/
----------------------------------------------
No one who cannot rejoice in the discovery of
his own mistakes deserves to be called a
scholar.
  - Donald Foster
Received on Thursday, 6 November 2008 19:04:34 UTC