More data types in RDF from William Grosso on 2000-03-10 (www-rdf-interest@w3.org from March 2000)

From: William Grosso <grosso@SMI.Stanford.EDU>
Date: Fri, 10 Mar 2000 09:38:27 -0800
To: www-rdf-interest@w3.org
CC: Dan Brickley <danbri@w3.org>
Message-ID: <38C93313.5EB651F3@smi.stanford.edu>
Dan Brickley wrote:
> 
> There wasn't much in the way of discussion of that topic before my
> semi-arbitrary cut-off date of 2000-02-25, but I owe the list a summary of
> the msgs to date. I think the topic was somewhat swamped by the 'certain
> difficult' thread so if everyone who wants datatypes in RDF and hasn't yet
> commented could review DanC's msg, that would be valuable.
> 

<h2> What is needed: </h2>

A set of primitive data types, universally defined 
(e.g. as part of the RDF spec and blessed by the W3C).
And a validating parser that enforces the syntactic
restrictions associated with the data types (or, at least,
a statement that parsers should be validating :-).

We can quibble over what exactly belongs in the set
of primitive datatypes, of course.  But, the point is, 
someone is going to solve the problem illustrated by 
the Visa invoice DTD, where there are definitions like:

	<!ELEMENT InvoiceDate (#PCDATA)> 
	<!--String, 1..19 Character Datetime, (CCYY-MM-DDTHH:MM:SS)-->

If RDF doesn't solve this problem, and some other specification 
does, RDF is more likely to become a niche solution. 

This means RDF needs:

	(1) A set of types
	(2) A set of valid syntax formats for each type
	(3) Interpretations for the syntax formats

In short, what the XML Schema folks have already done. 
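To make the three requirements concrete, here is a minimal Python sketch of a datatype table: each entry pairs a type name with the lexical format it accepts and a function that interprets a valid literal. The type names and regular expressions are illustrative assumptions, not the XML Schema definitions; the `datetime` entry mirrors the CCYY-MM-DDTHH:MM:SS comment in the invoice DTD above.

```python
import re
from datetime import datetime

# Hypothetical datatype table: type name -> (lexical pattern, interpretation).
# The patterns are simplified assumptions, not the XML Schema rules.
DATATYPES = {
    "integer": (re.compile(r"[+-]?\d+"), int),
    "boolean": (re.compile(r"true|false"), lambda s: s == "true"),
    # CCYY-MM-DDTHH:MM:SS, as in the InvoiceDate comment
    "datetime": (
        re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
        lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S"),
    ),
}


def interpret(type_name: str, literal: str):
    """Check a literal against its type's syntax, then return its value."""
    pattern, to_value = DATATYPES[type_name]
    if not pattern.fullmatch(literal):
        raise ValueError(f"{literal!r} is not a valid {type_name}")
    # strptime also raises ValueError for well-formed but impossible dates
    return to_value(literal)


print(interpret("datetime", "2000-03-10T09:38:27"))  # 2000-03-10 09:38:27
print(interpret("integer", "-42"))                   # -42
```

A validating parser in this sense would run something like `interpret` on every typed literal and reject the document when it raises.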

<h2> A side comment about facets </h2>

A set of facets like Stefan Decker's isn't really 
a solution for this, for two reasons: 

	(1) Unless they live in a W3 namespace and have universal
	from-the-spec semantics, they're open to idiosyncratic 
	interpretations.

	(2) Facets are a complicated solution for this problem.
	The facet mechanism used in Protege-2000 (Download now! 
	http://www-smi.stanford.edu/projects/protege) allows us to
	restrict ranges and define value types. But it's also a
	general solution which allows us to assert many other 
	properties about a (property, resource) binding. Good stuff,
	no doubt about that. 

	But facets are not the most natural idea in the world. Even 
	if the underlying mechanism does look a lot like facets,
	there needs to be a layer of syntactic sugar on top of it
	for the simple cases ("this property has values which are
	integers"). Especially since properties are reified in 
	RDF.
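The difference between a general facet mechanism and syntactic sugar for the simple case might look like the following Python sketch. Facets are stored per (property, class) binding, so arbitrary constraints can be attached, while a hypothetical `values_are` helper covers the common "this property has integer values" case. The names and structure are assumptions for illustration only; they are not Protege-2000's actual API.

```python
from collections import defaultdict

# General mechanism: any number of named facets per (property, class) binding.
facets = defaultdict(dict)


def set_facet(prop: str, cls: str, name: str, value) -> None:
    facets[(prop, cls)][name] = value


# Hypothetical syntactic sugar for the simple case: "values are of type T".
def values_are(prop: str, cls: str, type_name: str) -> None:
    set_facet(prop, cls, "value-type", type_name)


def check(prop: str, cls: str, value) -> bool:
    """Check a value against whatever facets the binding carries."""
    f = facets[(prop, cls)]
    if f.get("value-type") == "integer" and not isinstance(value, int):
        return False
    if "min" in f and value < f["min"]:
        return False
    if "max" in f and value > f["max"]:
        return False
    return True


values_are("age", "Person", "integer")  # the common case, one call
set_facet("age", "Person", "min", 0)    # the general mechanism
print(check("age", "Person", 30))       # True
print(check("age", "Person", -1))       # False
print(check("age", "Person", "30"))     # False
```

Both calls end up in the same underlying table; the sugar just spares the author from knowing that a "value-type" facet exists.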


<h2> What I'm not clear on: </h2>

That all seems obvious. So the question then seems to be: 
should each type have an associated URI (presumably inside 
a W3C-owned namespace)?

I like that a little. Part of me likes the idea of treating
all data types in the same way, and likes the idea of being able
to make assertions about INTEGER. Also, the idea of subclassing
INTEGER (and getting at least partial validation from the parser) 
seems quite nice. 
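If each datatype were a resource with its own URI, subclassing could work roughly as in this Python sketch: a user-defined type declares a superclass, and the parser applies every syntax check it knows along the subClassOf chain, so a subtype of INTEGER gets the integer check for free. The URIs and the `SUBCLASS_OF` table are hypothetical; nothing here is a W3C-assigned identifier.

```python
import re

# Hypothetical URIs standing in for a W3C-owned namespace and a user namespace.
INTEGER = "http://example.org/types#INTEGER"
PERCENT = "http://example.org/mytypes#Percent"

# Syntax checks the parser knows about (only for primitive types).
PRIMITIVE_CHECKS = {INTEGER: re.compile(r"[+-]?\d+").fullmatch}

# Assertions about datatypes: subClassOf links, like any other RDF class.
SUBCLASS_OF = {PERCENT: INTEGER}


def validate(type_uri: str, literal: str) -> bool:
    """Walk up the subClassOf chain, applying each known syntax check."""
    current = type_uri
    while current is not None:
        check = PRIMITIVE_CHECKS.get(current)
        if check is not None and not check(literal):
            return False
        current = SUBCLASS_OF.get(current)
    # Partial validation only: Percent's own 0..100 rule is never checked.
    return True


print(validate(PERCENT, "42"))   # True
print(validate(PERCENT, "4.2"))  # False: fails the inherited INTEGER check
print(validate(PERCENT, "250"))  # True: only partially validated
```

The last case is the "at least partial" part: the parser can enforce what it inherits from INTEGER, but not the subtype's own restriction.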

But it also seems like conceptual baggage without much purpose. 
I mean, when I explain resources, I say things like "A resource 
is something you can make assertions about."

Now, most programmers are going to hear that, see that INTEGER
is a resource, and ask "Why would I want to make an assertion 
about INTEGER?"

And I'd wind up hemming and hawing and then saying something 
like "Well, in most cases you wouldn't. That's there for 
generality and to make everything symmetric."

And, believe it or not, that will taint the spec. It's not a 
huge taint, but it will give programmers pause. Programmers 
regard unnecessary generality with suspicion. And rightly so-- 
it's often a red flag that indicates designers haven't really 
addressed the right problem. 

This is especially true in this case, where the spec would
explicitly depart from majority practice (most programming 
languages have explicit primitive data types that are treated 
distinctly from classes). 

Which means that I would rather RDF not go down the path Dan Connolly
indicated when he wrote:

>I was thinking that it meant we would supply, explicitly, a 
>URI for each of boolean, float, double, etc. 


William Grosso
Received on Friday, 10 March 2000 12:38:30 UTC