RE: Comments on datatyping draft

> -----Original Message-----
> From: ext Graham Klyne [mailto:GK@NineByNine.org]
> Sent: 30 August, 2002 17:49
> To: RDF core WG
> Subject: Comments on datatyping draft
> 
> 
> 
> Comments based on:
> http://lists.w3.org/Archives/Public/www-archive/2002Aug/att-0100/01-dtrec2.zip
> 
> [I've looked very briefly at Patrick's new Part1/Part2 
> document -- I think 
> some of these comments *may* still be relevant, so I'm posting 
> this now, 
> without updating it or re-doing the review.]
> 

Fair enough. I'll only reply to those comments I feel are relevant
to the latest restructured document. If you think I've missed
something, let me know.

> Abstract:
> ---------
> If I read this in isolation, I'm not sure I'd know what the 
> document was about.
> 
> Starting with something like:
> [[
> This document describes how to reference data type values, 
> such as numbers 
> or dates, in an RDF graph. ...
> ]]
> would help.

Well, I would be concerned about saying that the values are
"in the graph". Do we now say that URI denoted resources are
themselves "in the graph" or that their denotation by a 
URIref node is in the graph?

This touches on whether there are any truly native datatypes
in the RDF abstract graph syntax, or just a standardized,
generic means of denoting datatype values.

> 
> Section 1:
> ----------
> Seems very heavy on general RDF background -- I'd have 
> thought one short 
> paragraph would be enough for that.

Sure, that can be trimmed down a good bit, and a reference
added to one or more of the other specs.

> 
> 1.1 What is Datatyping?
> -----------------------
> 
> 1st para:
> [[
> Due to RDF's role as a means of interchange between disparate 
> systems, and 
> in order to achieve portability and independence of platform it is 
> necessary to forgo any native representation of values or 
> native datatypes 
> in RDF itself. This means that
> ]]
> I suggest deleting the above text, leaving:
> [[
> RDF has no built-in knowledge about particular datatypes such 
> as strings or 
> integers, and the lexical representation of a given value, 
> such as the 
> number twenty-five "25", has no native interpretation in RDF. RDF is 
> datatype neutral in the same manner as it is vocabulary neutral. The 
> specific semantics for individual datatypes must reside in 
> the application 
> layers above RDF.
> ]]
> 

Sounds better. Thanks.
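
To make the point concrete, here is a rough Python sketch (not from
the draft; the registry and the function name are my own assumptions,
and Python's int is only a stand-in for the real xsd:integer lexical
rules). RDF carries just the lexical form "25"; a layer above RDF
that knows the datatype supplies the mapping to a value:

```python
# Hypothetical application-layer registry: datatype URI -> lexical-to-value
# mapping. RDF itself knows none of this; all names here are illustrative.
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

LEXICAL_TO_VALUE = {
    # int() is a convenient stand-in, not an exact xsd:integer parser.
    XSD_INTEGER: int,
}


def interpret(datatype_uri, lexical_form):
    """Return the value that lexical_form denotes under datatype_uri.

    Raises KeyError if the application has no knowledge of the datatype,
    which is the case RDF alone is always in.
    """
    mapping = LEXICAL_TO_VALUE[datatype_uri]
    return mapping(lexical_form)


value = interpret(XSD_INTEGER, "25")
```

Without an entry in the registry, the string "25" stays an
uninterpreted lexical form, which is the sense in which RDF is
datatype neutral.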


> 1.2 Desiderata for RDF Datatyping
> ---------------------------------
> 
> I think this was already noted:  the desiderata seem somewhat 
> out of sync 
> with the current goals.

Umm, ahem, the desiderata are meant to reflect the original concerns
and issues that WG members expected/hoped datatyping would address,
not what the current solution ends up addressing. I think it's valuable
to document these and clearly indicate which have been satisfied
and which have not.

However, that said, the desiderata section could very well be moved
to the non-normative appendices, and paired up with a section (to be
added) that details the official closed issues list relating to
datatyping.

> 
> 1.5 Comments on the Structure of RDF Literals
> ---------------------------------------------
> 
> I appreciate the text is currently still under 
> review/development, but in a 
> final version it might be appropriate to refer to Jeremy's 
> abstract syntax 
> for this.

Agreed, that's what

   "[refs to syntax/primer/mt/etc]"

was meant to suggest ;-)

Sorry the 'etc' was too vague... I wasn't yet sure which other
spec it would be best to reference.

> 
> 2. RDF Datatypes
> ----------------
> Concerning Dan's comments about verbification, I think many of the 
> instances of "datatyping" here could be replaced by 
> "datatype" without loss 
> of information.  e.g.
> 
> Also, I'd suggest simplifying the reference to XML datatypes:
> 
> [[
> The conceptual framework for RDF datatypes presented in this 
> specification 
> uses concepts from the type system defined by XML Schema.  It 
> also can be 
> used with any datatype framework which conforms to the 
> characteristics 
> defined below.
> ]]

The restructured document includes just such a simplification. Have
a look.


> 
> 2.1 rdfs:Datatype
> -----------------
> 
> Possible simplification:
> [[
> An rdfs:Datatype is defined as consisting of
> ]]

The restructured document is even shorter ;-)

   "An rdfs:Datatype consists of ..."

> 2.3 Typed Literal
> -----------------
> 
> [[
> A typed literal is a pair where the first element is a URI 
> Reference (or 
> implicit systemID) denoting a datatype and the second element 
> is a lexical 
> form (literal). Following from the nature of datatypes as 
> defined above, 
> this pairing of datatype and lexical form unambiguously identifies a 
> specific member of a datatype mapping and hence a specific 
> member of the 
> value space of the datatype.
> ]]
> What is an "implicit systemID"?
> 
> I think the second sentence is redundant and potentially 
> confusing - I 
> suggest removing it.
> 
> I think the final paragraph could be pared down to something like:
> [[
> The means for defining an rdfs:Datatype are not specified 
> here.  It is 
> presumed that an agent that needs to interpret a typed literal has 
> sufficient knowledge of the datatype used to do so.
> ]]
> In particular, I think the "implicit designation" is 
> incompatible with 
> current WG decisions.

N/A in restructured document (at least Part 1)

> 
> 3.1 Local Datatyping
> --------------------
> 
> I agree that the use of rdf:type here is unfortunate, and likely to 
> confuse.  

In what way? I *really* must be off in la la land on this one,
as the use of rdf:type seems to me to be the most correct,
concise, and accurate means to state that the, ahem, rdf:type
of the typed literal node is the datatype class specified.

Before I go any further, I will specify whatever term the WG
decides to use, but with my editor's hat off, I offer the following
comments.

I think there has been an agreed principle that vocabulary will
only be added when absolutely necessary. And I've yet to see any
convincing argument why use of rdf:type is not completely correct.

The use of any other term will require additional explanation
that, actually, it really means the *same* as rdf:type, but
for some reasons (which I've yet to grasp) we've just used some
other term.

Consider the following examples, which simply define "some"
typed resource:

   <bar rdf:type="http://booga.com/wiggyhoo"/>
   <bar foo:blarg="http://booga.com/wiggyhoo"/>

which give us

   ?s bar _:x .
   _:x rdf:type <http://booga.com/wiggyhoo> .

   ?s bar _:y .
   _:y foo:blarg "http://booga.com/wiggyhoo" .
  
respectively.

Now, clearly, rdf:type must be used here if we are
to get a URIref node for the attribute value, otherwise,
we get a literal containing the URI. If we introduce
an attribute other than rdf:type, we have to both
state that it has identical semantics to rdf:type
and treat its attribute value exceptionally as
a URIref rather than a literal, just like rdf:type. But
why? They would have identical meaning and purpose!

I hope it's obvious that using rdf:type here is clearer
and better than some new term that is otherwise identical
in every way to rdf:type.
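
A rough Python sketch of the parser behaviour I'm describing (the
class and function names are hypothetical, chosen only for this
illustration, and this is not a real RDF/XML parser): the rdf:type
attribute yields a URIref node, while any other property attribute
yields a literal unless we add a special case for it.

```python
from dataclasses import dataclass

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class URIRef:
    """A node that denotes a resource by URI reference."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain literal node holding a string."""
    text: str


def attribute_object(attr_name, attr_value):
    """Object node produced for a property attribute (illustrative only).

    Only rdf:type has its value read as a URI reference; every other
    attribute value becomes a literal string.
    """
    if attr_name == RDF_TYPE:
        return URIRef(attr_value)
    return Literal(attr_value)


typed = attribute_object("rdf:type", "http://booga.com/wiggyhoo")
other = attribute_object("foo:blarg", "http://booga.com/wiggyhoo")
```

Getting a URIref out of foo:blarg would mean adding a second branch
that duplicates the rdf:type one, which is the redundancy I'm
objecting to.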

OK, now let's consider two similar examples, with typed literals:

   <bar rdf:type="http://booga.com/wiggyhoo">79*7%22.191/22</bar>
   <bar foo:blarg="http://booga.com/wiggyhoo">79*7%22.191/22</bar>

which, if literals *could* be subjects (which they can't, and I'm
not suggesting here that they should), would give us the following
graph:

   ?s bar _:x"79*7%22.191/22" .
   _:x rdf:type <http://booga.com/wiggyhoo> .

   ?s bar _:y"79*7%22.191/22" .
   _:y foo:blarg "http://booga.com/wiggyhoo" .

Again, rdf:type does "the right thing" and provides us with the
correct semantics.

Now since we can't have literals as subjects, we use a compact
typed literal node representation, and to get that for foo:blarg
we would of course have to specify that foo:blarg also results in 
such a typed literal node representation, in which case, we'd get the
same result for either term:

   ?s bar <http://booga.com/wiggyhoo>"79*7%22.191/22" .

But we'd still have to define that foo:blarg takes a URIref rather
than a literal as its attribute value, otherwise, we'd get

   ?s bar "http://booga.com/wiggyhoo""79*7%22.191/22" .

which isn't what we want for a typed literal node.

Now, we know from the semantics of typed literal nodes that
the typed literal node above has an rdf:type of 
http://booga.com/wiggyhoo. So which serialization most accurately
reflects that? Clearly, I think, the one using rdf:type.
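
Sketched in Python (again with hypothetical names of my own, and
assuming the pair-based reading of a typed literal node discussed
above), the node is just a datatype URIref paired with a lexical
form, and the rdf:type statement falls straight out of the pair:

```python
from typing import NamedTuple


class TypedLiteral(NamedTuple):
    """Compact typed literal node: (datatype URIref, lexical form)."""
    datatype: str       # URI reference denoting the datatype
    lexical_form: str   # uninterpreted lexical form


def implied_type_triple(node):
    """Return the rdf:type statement implied by a typed literal node.

    Illustrative only: the triple is never asserted in the graph, since
    literals cannot be subjects; it just spells out the semantics.
    """
    return (node, "rdf:type", node.datatype)


node = TypedLiteral("http://booga.com/wiggyhoo", "79*7%22.191/22")
triple = implied_type_triple(node)
```

The only information needed to recover the type is the first member
of the pair, which is exactly what the rdf:type attribute supplies in
the serialization.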

And, since one can already now say

   <bar rdf:type="http://booga.com/wiggyhoo"/>

to express that the value of bar is "something" of type
http://booga.com/wiggyhoo, is it not obvious that using
a different term for the more explicit specification of
what that something is, e.g.

   <bar foo:blarg="http://booga.com/wiggyhoo">79*7%22.191/22</bar>

will be far more confusing and cumbersome to explain than
simply using the same term which already expresses
the actual intended semantics and is recognized by parsers
as taking a URIref attribute value rather than a literal
string value:

   <bar rdf:type="http://booga.com/wiggyhoo">79*7%22.191/22</bar>


Eh???

Am I the only one for whom this makes sense?


> For what I understand to be the current purpose, 
> I'd tend to 
> favour using XML schema syntax in a way that is fully 
> compatible with XML 
> schema processors, leading to things like:
> 
>    <age xsi:type="xsd:integer">25</age>
> 
> with appropriate namespace declarations.

The use of xsi:type is not possible, because rdfs:Datatype is
not constrained to the XML Schema specification.

> 
> 3.2 Global Datatyping
> ---------------------
> 
> I understand the WG has opted to defer this in favour of the 
> datatyped 
> literal approach; e.g. per 
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Aug/0135.html
> 
> Unless I've misunderstood, I think this entire section should 
> be removed.
> 
> I would be very wary about trying to incorporate complex XML 
> schema types 
> into RDF, as in the vCard example, because that leads to 
> different ways to 
> structure the same information.  I believe that would be damaging to 
> interoperability between RDF processors.

N/A, given restructured document.

> 
> Section 6
> ---------
> Section number jumps:  shouldn't this be 4?
> 
> I'm not reviewing this for now, because I don't think literal 
> subjects are 
> part of our current work package.


N/A, given restructured document.

> 
> 5. RDF Datatyping Model Theory
> ------------------------------
> 
> I think item (3) is outside the scope of our current goal


N/A, given restructured document.

> 
> 6. RDF Schema for Datatyping
> ----------------------------
> 
> I think this is outside the scope of our current goal

Eh? If we're defining an official RDF Class, why would we not
be expected to also provide a normative schema for that definition?

Is the RDFS schema outside the scope of the RDFS spec?

To be clear, if the WG doesn't want it included, I'll axe it,
but it seems to definitely be in scope to me.

Comments from others?


> 6.1.2 CC/PP
> -----------
> 
> I think this example, as stated, is outside the scope of our 
> current goal
> 
> ...

N/A, given restructured document.

--

Thanks for the feedback.

Cheers,

Patrick

Received on Saturday, 31 August 2002 04:35:25 UTC