Comments on dt proposal from Jan Grant on 2002-08-30 (w3c-rdfcore-wg@w3.org from August 2002)

From: Jan Grant <Jan.Grant@bristol.ac.uk>
Date: Fri, 30 Aug 2002 11:28:59 +0100 (BST)
To: RDFCore Working Group <w3c-rdfcore-wg@w3.org>
Message-ID: <Pine.GSO.4.44.0208301049420.5847-100000@mail.ilrt.bris.ac.uk>
as of http://lists.w3.org/Archives/Public/www-archive/2002Aug/0111.html

I think the whole thing needs running through a spell-checker;
word-smithing comments are not included.


Para 1.1 "This means that RDF has no built-in knowledge about particular
datatypes such as strings or integers,"

this is _only_ true if the WG decides not to 'seed' RDF with knowledge
of particular datatypes.


Section 1.4

(Stylistic) Whatever else might have been said about this, the use of
XML entities _is_ clear and well-explained, I think.


Section 1.5

"The parseType bit and xml:lang (if present) are irrelevant to RDF
Datatyping and to the meaning of the lexical form."

I think this is the wrong way to express things. RDF (as expressed in
RDF/XML) seems to come with several "built-in" datatypes (of debatable
worth): Unicode strings, Unicode langstrings, XML structures
(fragments?) and lang-tagged XML structures. It would be better and more
consistent, I think, to consider the current RDF/XML syntax as "syntactic
sugar" for expressing literals with those types. That is,

	<eg:property xml:lang="en">foo</eg:property>

expresses the locally-typed literal

	(something:langstring, {en, foo})
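To make the "syntactic sugar" reading concrete, here is a minimal Python sketch (mine, not from the proposal) that reads a plain-literal property element as a locally-typed literal. The datatype names `something:langstring` and `something:string` are placeholders, as in the example above; the value is a (lang, text) pair standing in for {en, foo}; and parseType content is not handled.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional, Tuple, Union

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


class TypedLiteral(NamedTuple):
    datatype: str                       # placeholder datatype name
    value: Union[str, Tuple[str, str]]  # text, or (lang, text)


def desugar(elem: ET.Element) -> TypedLiteral:
    """Read a plain-literal property element as a locally-typed literal.

    'something:langstring' / 'something:string' are placeholder names.
    Only simple text content is handled (no parseType, no child elements).
    """
    text = elem.text or ""
    lang: Optional[str] = elem.get(XML_LANG)
    if lang is not None:
        return TypedLiteral("something:langstring", (lang, text))
    return TypedLiteral("something:string", text)


elem = ET.fromstring(
    '<eg:property xmlns:eg="http://example.org/eg#" xml:lang="en">foo</eg:property>'
)
print(desugar(elem))
# TypedLiteral(datatype='something:langstring', value=('en', 'foo'))
```

Under this reading xml:lang is no longer "irrelevant to the meaning"; it is simply part of the literal's value.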


Section 2.1

"An rdfs:Datatype consists of

a set of distinct values, called its value space
a set of lexical representations or forms, called its lexical space
an N:1 mapping from the lexical space to the value space, called its
datatype mapping"

Stupid question/niggle, but what do you mean by N:1? Are the value
spaces of all datatypes constrained to be countable? (I don't think it's
a problem if they are; it's just slightly odd.) Ah yes, Section 2.2
backs up the countability point, OK.
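For concreteness, a minimal sketch of the three-part definition, using an approximation of xsd:integer (optional sign, then ASCII digits) as the assumed example. Several lexical forms map to the same value, which is what N:1 amounts to here. The `Datatype` class is hypothetical, not anything defined in the proposal.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    name: str
    lexical_space: re.Pattern[str]    # which strings are legal lexical forms
    mapping: Callable[[str], object]  # datatype mapping: lexical form -> value

    def value(self, lexical: str) -> object:
        """Apply the datatype mapping, rejecting strings outside the lexical space."""
        if not self.lexical_space.fullmatch(lexical):
            raise ValueError(f"{lexical!r} is not in the lexical space of {self.name}")
        return self.mapping(lexical)


# Approximation of xsd:integer: optional sign followed by one or more digits.
XSD_INTEGER = Datatype("xsd:integer", re.compile(r"[+-]?[0-9]+"), int)

# N:1 in action: four distinct lexical forms, one value.
forms = ["1", "01", "+1", "001"]
print({XSD_INTEGER.value(f) for f in forms})  # {1}
```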


Section 3.

DO NOT USE rdf:type! This is really important. Overloading the term is a
confusing mistake (IMHO).


Section 3.1

OK, although the example of "conflicting" definitions for age might be
used to give a rule of thumb for defining properties (e.g., divide out SI
units?). Dunno about that, but most datatyping in practice will rely on
conventions ("standard practice"), and some advice about this might be
timely.

Of course, the property defined is <#age>; one might expect that where
properties are defined, care is taken to supply consistent range
constraints - even if those are spread across multiple RDF
schema documents.
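A tool could at least flag such inconsistencies mechanically. The sketch below (hypothetical names throughout; triples are plain tuples and `eg:ageInYears` is made up) merges the rdfs:range statements from several schema documents and reports properties given more than one distinct range. Whether multiple ranges are an error or simply conjunctive depends on the RDFS semantics adopted, so this only picks out candidates for a human to check.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), names as plain strings


def range_conflicts(*schemas: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Merge rdfs:range statements from several schema documents and return
    each property that was given more than one distinct range."""
    ranges: Dict[str, Set[str]] = defaultdict(set)
    for schema in schemas:
        for s, p, o in schema:
            if p == "rdfs:range":
                ranges[s].add(o)
    return {prop: objs for prop, objs in ranges.items() if len(objs) > 1}


# Two hypothetical schema documents that disagree about #age.
schema_a = [("#age", "rdfs:range", "xsd:integer")]
schema_b = [("#age", "rdfs:range", "eg:ageInYears")]
print(sorted(range_conflicts(schema_a, schema_b)["#age"]))
# ['eg:ageInYears', 'xsd:integer']
```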

Section 6.1.1

"Note that not defining any meaning to inline literals should not be
equated with interpreting inline literals to be strings (i.e.
self-denoting). It simply means that RDF does not say anything about
what an inline literal means, and leaves it up to each individual
application to decide whether a string or value is denoted by the
literal."

True; but I will note that such "syntactic" long-range typing is
perfectly doable, even with the ranges of different properties defined in
multiple external documents, provided those external documents are
known a priori [even if that knowledge is contained in some external
schema-element registry].
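A minimal sketch of that kind of registry-driven typing, assuming the registry (property URI to datatype URI) has already been compiled from the relevant schema documents. `RDFLiteral`, `type_inline_literals` and the `eg:` names are hypothetical. Literals whose property has no registry entry are left untyped, i.e. left to the application, as the quoted text says.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class RDFLiteral(NamedTuple):
    lexical: str
    datatype: Optional[str] = None  # None: inline literal with no datatype given


Triple = Tuple[str, str, object]  # object is a URI string or an RDFLiteral


def type_inline_literals(triples: List[Triple], registry: Dict[str, str]) -> List[Triple]:
    """Give each untyped literal the datatype recorded for its property in
    `registry` (property -> datatype), assumed to be known a priori.
    Literals of unregistered properties, and already-typed literals, are kept as-is."""
    out: List[Triple] = []
    for s, p, o in triples:
        if isinstance(o, RDFLiteral) and o.datatype is None and p in registry:
            o = RDFLiteral(o.lexical, registry[p])
        out.append((s, p, o))
    return out


registry = {"eg:age": "xsd:integer"}  # e.g. gathered from rdfs:range statements
data = [("eg:someone", "eg:age", RDFLiteral("42")),
        ("eg:someone", "eg:nick", RDFLiteral("jg"))]
for triple in type_inline_literals(data, registry):
    print(triple)
```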

Section 6.2.1, example 1.

"Alternately, controlled vocabularies and code sets such as dcq:MESH
could be denoted by URIs rather than typed literals, which would enable
each value to be qualified for type, label, etc."

This is the route I followed with EASEL. It works _reasonably_ well,
although many "controlled vocabularies" (for example, UDC) are
theoretically recursive structures with parts chosen from fixed
taxonomies. I had no choice but to model these "in the graph" (i.e., with
triples). Again, there might be some advice to be extracted here.
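As a rough illustration of modelling such codes in the graph, here is a sketch that turns a recursive compound code into triples: each compound becomes a fresh blank node whose parts are listed, in order, with RDF's container membership properties (rdf:_1, rdf:_2, and so on). The `Compound` class and the `eg:` taxonomy terms are made up; this is not UDC's actual structure, nor necessarily how EASEL did it.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

Triple = Tuple[str, str, str]


@dataclass
class Compound:
    """A code built from ordered parts: taxonomy terms (URIs) or nested Compounds."""
    parts: List["Code"]


Code = Union[str, Compound]


def to_triples(code: Code, fresh: Iterator[str]) -> Tuple[str, List[Triple]]:
    """Return (node, triples) for a code. A taxonomy term is its own URI and
    needs no triples; each Compound gets a fresh blank node whose parts are
    attached in order via rdf:_1, rdf:_2, and so on."""
    if isinstance(code, str):
        return code, []
    node = next(fresh)
    triples: List[Triple] = []
    for i, part in enumerate(code.parts, start=1):
        part_node, part_triples = to_triples(part, fresh)
        triples.append((node, f"rdf:_{i}", part_node))
        triples.extend(part_triples)
    return node, triples


blank_nodes = (f"_:b{n}" for n in itertools.count())
code = Compound(["eg:subject/history",
                 Compound(["eg:place/france", "eg:period/c19"])])
node, triples = to_triples(code, blank_nodes)
for t in triples:
    print(*t)
```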




The question that arises, for me, when I read this, is: where do we stop
typing literals and acknowledge that the value of a property is a
_representation_?

    <eg:name> <rdfs:range> <xsd:string> .

    <eg:jan> <eg:name> <xsd:string>"jan grant" .

That isn't right, because my name _isn't_ a string. It can be written
down in Unicode, sure, but that's just another representation. Why not
use the following datatyping:

    <eg:name> <rdfs:range> <eg:Name> .

    <eg:jan> <eg:name> <eg:Name>"jan grant" .


- in other words, should "intention" leak into literal datatyping? And
if not, why not? Cf. the MESH literal in the DC example (Section 6.2.1,
example 1).



Appendix B.

Again, this falls foul of overloading rdf:type, and it's a gratuitous
syntax extension. I wouldn't be against it technically if rdf:type were
renamed, but that raises process issues with respect to getting the
current syntax document out.


Appendix C.2

I like this, although I am _still_ not convinced either that people would
be unable to live without the global implicit idiom, or that the idiom
would be used as simple syntactic sugar without a priori knowledge of
schemas.






-- 
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287088 Fax +44 (0)117 9287112 http://ioctl.org/jan/
Work #90: As many pseudo-intellectual sycophants as necessary to make one
inarticulate scotsman think he's a genius in command of The Profound.
Received on Friday, 30 August 2002 06:29:08 UTC