RE: datatyping is not needed from Babich, Alan on 1998-07-21 (www-webdav-dasl@w3.org from July to September 1998)

From: Babich, Alan <ABabich@filenet.com>
Date: Mon, 20 Jul 1998 21:30:35 -0700
To: "'Jim Davis'" <jdavis@parc.xerox.com>, www-webdav-dasl@w3.org
Message-ID: <72B1992276A9D111A20E00805FEAC96D01324C7E@cm-expo1.filenet.com>
My comments interspersed.

Alan Babich

> -----Original Message-----
> From: Jim Davis [mailto:jdavis@parc.xerox.com]
> Sent: July 20, 1998 7:53 PM
> To: www-webdav-dasl@w3.org
> Subject: RE: datatyping is not needed
> 
> 
> At 05:58 PM 7/20/98 PDT, Babich, Alan wrote:
> >I presume that you mean to drop data type indications from 
> the query schema 
> >discovery, not just the query itself (select, where, and 
> sortby). I presume
> >that this also means you withdraw the offer of compromise 
> that you and I 
> >nearly reached on the QSD.
> 
> No, not yet anyway.  If it turns out that datatype is never used in
> queries, then I would consider removing it from QSD.  On the 
> other hand, I
> think there's a valid argument that providing it in QSD is 
> both very cheap
> and provides some real (even if small) benefit to clients, so 
> it might be
> worth providing as a *hint*.  But we need to settle the query 
> issue first,
> since clearly if datatyping *is* ever allowed in a query, 
> that strengthens
> the argument for putting it into QSD considerably.  Even if 
> datatype is not
> allowed in the query, it might still go into QSD.  Indeed, I can see
> incorporating even more metadata, along the lines of the
> PropertyDescription object in DMA, or perhaps the Metadata 
> Repository used
> by the Stanford digital library protocol.  So please consider 
> QSD an open
> issue for now, but let's settle the issue of whether it 
> belongs in queries.
ALAN BABICH: I don't understand two things: (1) What do you mean by
"use datatype in the query"? Does this amount to decorating 
literals? (2) What is the logical connection between datatyping
in the query and datatyping in the QSD? I don't see that it
being in the query strengthens or weakens the argument for
putting it in the QSD. (Having it in the QSD provides information
that can be used to create a better user experience, and it allows
generic clients to be written, because they can be driven
off of metadata, not hardcoded to a specific schema. That's
about it, I think. The server already has the metadata information, 
so it isn't required in the query for the 80% cases. We just might
all agree on the stuff within these parentheses.)

> 
> The remainder of your email message lays out a major premise I would
> paraphrase as "divergence from past accepted practise is 
> dangerous".  The
> minor premise I don't understand so clearly, it is something 
> like "past
> accepted practise is 'query conditions and order by on scalar 
> values that
> are of one of the 5 fundamental datatypes (integer, string, etc.)'".
ALAN BABICH: Yes #1. On the beaten path, if there were land mines,
somebody before you blew up, and the problem has been solved by now.
Once you get off the beaten path, you have to be extremely careful 
you don't go down a rathole that opens up a research area that blows 
out your schedule. Yes #2. I have been working with DBMS query
since before SQL was invented. Back in the early days, we still
only queried scalar values of basic data types. The basic datatypes
of SQL 92 are still very limited. Let me quote from the spec., 
section 4.1 "Data types": "SQL defines distinct data types named 
by the following <key words>s: CHARACTER, CHARACTER VARYING, BIT, 
BIT VARYING, NUMERIC, DECIMAL, INTEGER, SMALLINT, FLOAT, REAL, 
DOUBLE PRECISION, DATE, TIME, TIMESTAMP, and INTERVAL." That's 
an exhaustive list. You don't get to query structures, or arrays, 
for example, or XML valued properties, or ...

The string types are CHARACTER and CHARACTER VARYING. 
The bit string types are BIT and BIT VARYING.
The integer types are NUMERIC, DECIMAL, INTEGER, and SMALLINT with
a scale of zero. 
The non-integer types are NUMERIC and DECIMAL with a nonzero
value of scale (INTEGER and SMALLINT always have a scale of zero), plus
FLOAT, REAL, and DOUBLE PRECISION. 
The datetime datatypes are DATE, TIME, and TIMESTAMP. 
The INTERVAL type is a datatype for the difference between two 
datetime datatypes. 

Since that's all there are, that's all you can query in SQL 92.

DASL has simplified things by leaving out the bit string and interval
datatypes.
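
The grouping above can be written down as a small lookup. This is only a sketch of the classification described in this message, not anything defined by DASL or SQL 92 themselves. The function names family and in_dasl_subset and the family labels are made up for illustration.

```python
# Type families as grouped in this message (names taken from the SQL 92 list above).
STRING = {"CHARACTER", "CHARACTER VARYING"}
BIT_STRING = {"BIT", "BIT VARYING"}
EXACT_NUMERIC = {"NUMERIC", "DECIMAL", "INTEGER", "SMALLINT"}
APPROXIMATE_NUMERIC = {"FLOAT", "REAL", "DOUBLE PRECISION"}
DATETIME = {"DATE", "TIME", "TIMESTAMP"}


def family(type_name, scale=0):
    """Classify a SQL 92 type name; scale only matters for exact numerics."""
    name = type_name.upper()
    if name in STRING:
        return "string"
    if name in BIT_STRING:
        return "bit string"
    if name in EXACT_NUMERIC:
        return "integer" if scale == 0 else "non-integer"
    if name in APPROXIMATE_NUMERIC:
        return "non-integer"
    if name in DATETIME:
        return "datetime"
    if name == "INTERVAL":
        return "interval"
    raise ValueError(f"not a SQL 92 type name: {type_name!r}")


def in_dasl_subset(type_name, scale=0):
    """True unless the type falls in a family this message says DASL leaves out."""
    return family(type_name, scale) not in {"bit string", "interval"}
```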

> 
> Does this argue for datatyping in the QSD, or in the query? 
ALAN BABICH: I'm mostly concerned with the QSD having
datatype information plus the other information we
talked about, and then hopefully beefing it up later.

> If the former,
> I think you've stated the case for use in the QSD well 
> already, so please
> don't repeat it.  
ALAN BABICH: Thank you for saying so. I won't repeat it.

> If the latter though, it needs more explanation.
ALAN BABICH: OK.

> 
> Of course, all things being equal, trying anything new is 
> risky.  But in
> this case, we seem to not have the luxury of just repeating 
> the old and
> familiar.  If tried and true SQL (or Z39.50) were good 
> enough, then DASL
> could just be a Web encoding of one of those, and we'd be 
> done.  Like it or
> not, WebDAV has already made some design commitments, e.g. that
> resourcetype is an XML element not PCDATA.  We have to 
> support those.  DASL
> is, first and foremost, a query for the model that WebDAV 
> exposes, not for
> RDBMS.  We can't assume the underlying store is tables.
ALAN BABICH: Yes, you're right, we have to support the WebDAV
design commitments. So, yes, you're right, we do have to do
some things that are somewhat different (for example,
we don't have resource classes, so we can't do joins).
No, you're right, we can't assume the underlying store 
is always tables. However, we should make some effort to try
to accommodate that case, since we believe (or at least I
believe) it is an important use case. Accommodating that
case reasonably well will help DASL succeed. I don't see
any reason why we can't.

> 
> My emails have shown why datatype is not needed in queries 
> for "live" and
> "famous dead" properties, and why it's harmful if adopted for 
> the "obscure
> dead", and why there is scarcely any problem by treating the 
> dead as mere
> strings.  If you - or anyone else - have counter arguments 
> against any of
> these, please state them.
ALAN BABICH: As I understand them, "obscure dead" properties
are off the beaten path. I didn't state it explicitly (sorry), 
but the scalar properties of one of the 5 fundamental data types
are required to draw values from a well defined domain of
values in order to be on the beaten path. 
A domain of values is a subset of a basic datatype.
For example, house numbers are a subset of the integers.
Consequently, without a plausible use case that shows
me obscure dead properties are significant, I tend to think
they are not important for 1.0. Given the number of things I'm committed
to do, I look for things I do NOT have to drill down into.
Obscure dead properties fell into this category, so I haven't
thought about them in depth. So now I will.

So, how obscure are they? Can you put an integer into one
belonging to ordinary resource 1, put an ASCII string into one 
belonging to ordinary resource 2, and then put in a datetime value to
update the property on ordinary resource 1? In other words,
if the datatype of such a property is not predefined in
a schema for the collection (analogous to having an RDBMS schema
for a relational database), what constrains/defines the possible
values for the property? Apparently the datatype is determined
when you stuff a value into it? Thus, it seems PROPPATCH
has to determine the datatype. So, can it be any value at any time?

case 1: Yes, it can be any value any time. Then, you probably
want to be able to say "find me only resources having a value for
this property that is of datatype D, and compares (>, =, or whatever)
to this literal X (also of datatype D)". 
So now we have datatypes in the query.
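
A minimal sketch of what case 1 asks for, assuming the values of one property are held in memory with their native types. The store, the resource names, and the find function are hypothetical, and nothing here is DASL syntax. The point is only that the query has to name a datatype D before the comparison means anything.

```python
import operator
from datetime import datetime

# Hypothetical data: one untyped property, with a different kind of value on each resource.
VALUES = {
    "resource1": 42,
    "resource2": "forty-two",
    "resource3": datetime(1998, 7, 20),
    "resource4": 7,
}


def find(values, datatype, compare, literal):
    """Return resources whose value is of `datatype` and satisfies compare(value, literal)."""
    if type(literal) is not datatype:
        raise TypeError("the literal must be of the requested datatype")
    return sorted(
        name
        for name, value in values.items()
        if type(value) is datatype and compare(value, literal)
    )


print(find(VALUES, int, operator.gt, 10))  # prints ['resource1']
```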

case 2: No, just try to treat them as strings -- even floating
point numbers. A problem with this is that if there was
no datatype discipline in the first place, why would
there be discipline in the second place? Are people really
going to remember, and bother, to put leading zeroes in front
of integers, and to write floating point numbers without an
exponent, expanded into a fixed length field, just so that
greater than and equal will work?
Wow. You could get some really long floating point literals
that way. How does one know, in general, how many leading
zeroes (and trailing zeroes) are needed if there is no
schema? If you know nothing, you have to expand out to the
full range of the exponent in a double precision floating
point number. I don't think that approach is reasonable. If they 
were really strings, just say, "they're strings". Then the
numeric and datetime comparisons are irrelevant. (We assume
we can deal with string comparisons.) So, then, can we
say that the datatype of all obscure dead properties
is string? I don't think so, because I suspect that
people want fundamental-datatype-based comparisons, not string 
based comparisons for them.
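
To make the padding problem in case 2 concrete, here is a rough sketch. The helper names and the field sizes (700 characters wide, 350 decimal places) are assumptions, chosen only so that any non-negative double precision value fits without an exponent. Negative numbers would need further conventions that are not shown.

```python
def pad_int(n, width):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(n).zfill(width)


def expand_float(x, width=700, places=350):
    """Render a non-negative float with no exponent in a fixed-length field."""
    return format(x, f"0{width}.{places}f")


# Unpadded strings compare character by character, so "9" sorts after "10".
print("9" > "10")                      # True
print(pad_int(9, 3) < pad_int(10, 3))  # True: "009" < "010"
print(len(expand_float(1.5)))          # 700 characters for a single literal
```

Without a schema the writer cannot know that three digits are enough for the integers, which is why the floating point field has to cover the whole exponent range.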

One approach might be to store the datatype along with 
the property each time a value for the property is stored
on a resource. Then, the datatype would be discoverable at run 
time on a resource instance basis. Then, they might be better
characterized as dynamic datatype dead properties. Of
course, there are still issues to address, but hopefully the
solutions wouldn't be total kludges. Of course, this approach
might not be upward compatible with the existing WebDAV
spec. (e.g., can PROPPATCH specify the datatype of the value
to be stored?) It's very late, and this approach is off the top 
of my head. It wouldn't surprise me if it didn't fly when we
looked at it closer. If it doesn't fly, then I would ask,
do we really need to allow obscure dead properties?
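
A minimal sketch of the "store the datatype with the value" idea, assuming a simple in-memory store. PropertyStore, its methods, and the datatype labels are hypothetical. The existing WebDAV spec is not claimed to offer anything like this, and the sketch says nothing about how PROPPATCH would carry the label.

```python
from datetime import datetime


class PropertyStore:
    """Toy store that records a datatype label next to every property value."""

    def __init__(self):
        self._props = {}  # (resource, property name) -> (datatype label, value)

    def set(self, resource, name, datatype, value):
        # The datatype label is replaced along with the value on every update.
        self._props[(resource, name)] = (datatype, value)

    def datatype_of(self, resource, name):
        """Discover the datatype at run time, per resource instance."""
        entry = self._props.get((resource, name))
        return entry[0] if entry else None


store = PropertyStore()
store.set("resource1", "x", "integer", 42)
store.set("resource2", "x", "string", "forty-two")
store.set("resource1", "x", "datetime", datetime(1998, 7, 20))  # type changes on update
print(store.datatype_of("resource1", "x"))  # datetime
print(store.datatype_of("resource2", "x"))  # string
```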

This is all theoretical to me, because I don't have a believable
use case in mind. So I'm very suspicious about any arguments
and conclusions I make. I would very much appreciate it if someone
suggested a believable (to me) use case of some significance
for DASL 1.0 for obscure dead properties. Of course, maybe
I don't quite understand what they are, in which case I
would very much appreciate being enlightened.

> 
> best regards
> 
> Jim
> 
> 
> 
> 
> ------------------------------------
> http://www.parc.xerox.com/jdavis/
> 650-812-4301
>
Received on Tuesday, 21 July 1998 00:33:33 UTC