Re: Comments on dt proposal from Patrick Stickler on 2002-08-30 (w3c-rdfcore-wg@w3.org from August 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Fri, 30 Aug 2002 17:03:55 +0300
To: "ext Jan Grant" <Jan.Grant@bristol.ac.uk>, "RDFCore Working Group" <w3c-rdfcore-wg@w3.org>
Message-ID: <pp6ZSkZ41Kgr.mPByPBRW@mail.nokia.com>
_____________Original message ____________
Subject:	Comments on dt proposal
Sender:	ext Jan Grant <Jan.Grant@bristol.ac.uk>
Date:		Fri, 30 Aug 2002 16:05:33 +0300


as of http://lists.w3.org/Archives/Public/www-archive/2002Aug/0111.html

Think the whole thing needs running through a spell-checker;
word-smithing comments not included.

	Well, I was mostly concerned about the technical
	accuracy. It's not meant to be published.

Para 1.1 "This means that RDF has no built-in knowledge about particular
datatypes such as strings or integers,"

this is _only_ true if the WG decide not to 'seed' RDF's DT knowledge.

Or if the WG has not made any decision to do so.


	Well, I'm not aware of an official WG decision
	to do such seeding, so I think the document 
	reflects the status quo, not a wish.

Section 1.4

(Stylistic) Whatever else might have been said about this, the use of
xml entities _is_ clear and well-explained, I think.


Section 1.5

"The parseType bit and xml:lang (if present) are irrelevant to RDF
Datatyping and to the meaning of the lexical form."

I think this is the wrong way to express things. RDF (as expressed in
RDF/XML) seems to come with several "built-in" datatypes (of debatable
worth) - unicode strings, unicode langstrings, xml structures
(fragments?) and lang-tagged xml structures. Better and more consistent,
I think, to consider the current RDF/XML syntax a "syntactic sugar" for
expressing literals with those types. That is,

	<eg:property xml:lang="en">foo</eg:property>

expresses the locally-typed literal

	(something:langstring, {en, foo})

	The xml:lang and parseType="Literal" flag
	are both "residue" of the XML serialization
	and not relevant to datatyping semantics.
	This appears to have been the majority
	view of the WG for a very, very long time.

	If this has changed, or would change, then I'm
	happy to say otherwise, or say the above more
	clearly.

	Or feel free to offer alternative wording that
	says the same (rather than some other view)


Section 2.1

"An rdfs:Datatype consists of

a set of distinct values, called its value space
a set of lexical representations or forms, called its lexical space
an N:1 mapping from the lexical space to the value space, called its
datatype mapping"

Stupid question/niggle, but what do you mean by N:1? Are the value
spaces of all datatypes constrained to be countable? (I don't think it's
a problem if they are, it's just slightly odd). Ah yes, section 2.2
backs this countable thing up, ok.

	I should add that N > 0 explicitly.
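
To illustrate the three-part definition quoted above, here is a minimal Python sketch. The Datatype class, its field names, and the regular expression standing in for the integer lexical space are all assumptions made for illustration; none of them come from the proposal or from XML Schema. The sketch only shows several lexical forms (N > 0) mapping to a single value.

```python
import re
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class Datatype:
    """Hypothetical model: a lexical space plus a mapping into a value space."""
    name: str
    in_lexical_space: Callable[[str], bool]  # membership test for lexical forms
    to_value: Callable[[str], Hashable]      # the N:1 datatype mapping

    def value_of(self, lexical_form: str) -> Hashable:
        # Only members of the lexical space have a value at all.
        if not self.in_lexical_space(lexical_form):
            raise ValueError(f"{lexical_form!r} is not in the lexical space of {self.name}")
        return self.to_value(lexical_form)


# Assumed, simplified lexical space: optional sign followed by decimal digits.
_INTEGER_FORM = re.compile(r"[+-]?[0-9]+")
integer = Datatype("integer", lambda s: _INTEGER_FORM.fullmatch(s) is not None, int)

# Three distinct lexical forms, one value: the mapping is N:1 with N = 3 here.
forms = ["10", "+10", "010"]
values = {integer.value_of(form) for form in forms}
```

Each value in this sketch has at least one lexical form, which is the N > 0 condition mentioned in the reply above.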

Section 3.

DO NOT USE RDF:TYPE! This is really important. Overloading the term is a
confusing mistake (imho).

	I don't understand where the overloading would
	occur. If we take two examples, both of which
	specify as the object "something" that is of
	type xsd:integer:

	<age rdf:type="&xsd;integer"/>
	?s age _:x .
	_:x rdf:type xsd:integer .
	
	<age rdf:type="&xsd;integer">10</age>
	?s age xsd:integer"10" .

	Now, both object nodes have the same type
	and the use of rdf:type has identical semantics.

	Where is the overloading?

	The fact that we have a more compact graph
	representation for typed literals (where the
	lexical form is specified) does not constitute
	overloading. Everything is working exactly as
	it should.
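
The claim that both object nodes carry the same type can be sketched in Python. This is only an illustration of the reading given above, not the WG's semantics: the tuple encoding of triples, the TypedLiteral class, the subject name eg:s (standing in for ?s) and the object_types helper are all hypothetical.

```python
from typing import NamedTuple

RDF_TYPE = "rdf:type"


class TypedLiteral(NamedTuple):
    """Hypothetical compact node: a datatype paired with a lexical form."""
    datatype: str
    lexical_form: str


def object_types(graph, subject, predicate):
    """Collect the types of every object of (subject, predicate) in the graph."""
    found = set()
    for s, p, o in graph:
        if s != subject or p != predicate:
            continue
        if isinstance(o, TypedLiteral):
            # Compact form: the type travels with the literal itself.
            found.add(o.datatype)
        else:
            # Expanded form: the type is stated by a separate rdf:type triple.
            found.update(t for s2, p2, t in graph if s2 == o and p2 == RDF_TYPE)
    return found


# First example: blank node object with an explicit rdf:type triple.
graph_a = [("eg:s", "eg:age", "_:x"), ("_:x", RDF_TYPE, "xsd:integer")]
# Second example: typed literal object, xsd:integer"10".
graph_b = [("eg:s", "eg:age", TypedLiteral("xsd:integer", "10"))]

types_a = object_types(graph_a, "eg:s", "eg:age")
types_b = object_types(graph_b, "eg:s", "eg:age")
```

Under these assumptions both graphs yield the same set of types for the object of age; only the second also fixes a lexical form.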

	

Section 3.1

OK, although the example of "conflicting" definitions for age might be
used to give a rule-of-thumb for defining properties (eg, divide out SI
units?). Dunno about that, but most datatyping practice will include
"standard practice" and some advice about this might be timely.

Of course, the property defined is <#age>; one might expect that where
properties are defined, care is taken to supply consistent range
constraints - even if those are spread across multiple RDF
schema documents.

	OK. I'll think on this.

Section 6.1.1

"Note that not defining any meaning to inline literals should not be
equated with interpreting inline literals to be strings (i.e.
self-denoting). It simply means that RDF does not say anything about
what an inline literal means, and leaves it up to each individual
application to decide whether a string or value is denoted by the
literal."

True; but I will note that such "syntactic" long-range typing is
perfectly doable, even with ranges of different properties defined in
multiple external documents, providing those external documents are
known of a priori [even if that knowledge is contained in some external
schema-element registry].

	IMO (and not as editor) I find the idea of syntactic
	mechanisms for implicit datatyping a royal can of
	worms and not at all practical in real use.

6.2.1, example 1.

"Alternately, controlled vocabularies and code sets such as dcq:MESH
could be denoted by URIs rather than typed literals, which would enable
each value to be qualified for type, label, etc."

This is the route I followed with EASEL. It works _reasonably_ well
although many "controlled vocabularies" (for example, UDC) are
theoretically recursive structures with parts chosen from fixed
taxonomies. I had no choice but to model these "in the graph" (ie, with
triples). Again, there might be some advice to be extracted here.




The question that arises, for me, when I read this, is: where do we stop
typing literals and acknowledge that the value of a property is a
_representation_?

	Like a URI is a representation?

    <eg:name> <rdfs:range> <xsd:string> .

    <eg:jan> <eg:name> <xsd:string>"jan grant" .

That isn't right, because my name _isn't_ a string. It can be written
down in unicode, sure, but that's just another representation. Why not
use the following datatyping:

    <eg:name> <rdfs:range> <eg:Name> .

    <eg:jan> <eg:name> <eg:Name>"jan grant".


- in other words, should "intention" leak into literal datatyping? And
if not, why not? CF. this example with the MESH literal in the DC
example.


	Well, if explicit datatyping is used, then the
	literal denotes a member of the value space
	of the datatype. If that value is supposed to
	be a string, or a name, or a person, then one
	had better specify the correct datatype.

	I see no ambiguity here. Though I could try
	to address this point in the implications
	and considerations section.
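
A small Python sketch of this point, assuming a hypothetical Name value class and a toy table of datatype mappings (neither appears in the proposal): the same lexical form "jan grant" denotes different values depending on which datatype is specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Hypothetical member of the eg:Name value space; distinct from a string."""
    written_form: str


# Assumed datatype mappings, keyed by the datatype names used in the examples.
DATATYPE_MAPPINGS = {
    "xsd:string": lambda lexical: lexical,    # strings denote themselves
    "eg:Name": lambda lexical: Name(lexical), # names are values of their own kind
}


def denotation(datatype, lexical_form):
    """Return the value a typed literal denotes under the given datatype."""
    return DATATYPE_MAPPINGS[datatype](lexical_form)


as_string = denotation("xsd:string", "jan grant")
as_name = denotation("eg:Name", "jan grant")
```

Here as_string is the string itself while as_name is a Name value, so choosing eg:Name rather than xsd:string changes what the literal denotes, not how it is written.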


Appendix B.

Again, falls foul of overloading rdf:type and it's a gratuitous syntax
extension. I wouldn't be against it technically if rdf:type was renamed,
but process issues arise when talking about this wrt. getting the
current syntax document out.

	Could you please elaborate? Exactly how is
	rdf:type being overloaded?

C.2

I like this, although I am _still_ not convinced that people would be
unable to live without the global implicit idiom; or that the global
implicit idiom would be used without a priori knowledge of schemas, as a
simple syntactic sugar.

	Well, I guess that's what the WG has to decide
	(or decide not to decide...)

	Cheers,

	Patrick
Received on Friday, 30 August 2002 10:04:56 UTC