RE: datatyping is not needed

Jim: I will get to your proposed compromise at the end of this e-mail.

> DASL can provide, at most, the base datatype, which is like
> the bicycle, but nothing else.  (Even the XML Data drafts 
> don't go beyond, at most, some simple ranges limits.  Still 
> more like a Geo Metro.)  

DASL can, if it wants to, provide the following in the 
query schema information FOR EACH PROPERTY, because the 
server has this information available:

(1) property name.
    Required. With this, users don't have to guess the
    property names. (Guessing would be awful.) UIs can
    also give instant, pinpoint-accurate feedback, right
    where the error is made, when an incorrect property
    name is entered for any reason, including typos and
    spelling errors. A list of the legal properties could
    also be shown to the user on demand.

(2) data type.
    The 90% case is {numbers, strings, datetimes, Booleans}. DASL 
    should let the datatype be absent for properties of any other 
    type, with the understanding that an absent datatype means the 
    property is not a number, string, datetime, or Boolean. 
 
(3) textual property description
    Optional. 

(4) case insensitive comparisons or not, case preserved or not.
    Required for strings if upcased or downcased, or if case 
    insensitive comparisons are performed. If null, then not upcased 
    or downcased, and comparisons are case sensitive. Disallowed
    for other datatypes. This reflects the common practices of 
    more than a few systems.

(5) value constraints
    Optional. Ranges, enumerated list of values etc. If unspecified,
    the value is not constrained in this way. I don't think this is
    in the 90% case, so I don't think DASL needs to go this far.

(6) character set and collation sequence.
    Only allowable for strings. Collation sequence is defined to 
    include case sensitivity or not. This information is known by 
    the server and could be provided. However, I don't think we
    should go this far for DASL 1.0. 

I could continue with more information the server has available, 
but I've already gone past the point I think DASL should go for 1.0.
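
As a rough illustration only (this is not from any DASL draft, and 
every name in it is hypothetical), the per-property information in 
(1) through (4) could be modeled as a small record. The sketch 
assumes that an absent datatype is represented as None and that 
case handling is rejected for anything but strings:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BasicType(Enum):
    """The four basic datatypes of the 90% case."""
    NUMBER = "number"
    STRING = "string"
    DATETIME = "datetime"
    BOOLEAN = "boolean"


@dataclass(frozen=True)
class CaseRule:
    """Item (4): how a string property is stored and compared."""
    stored_case: str            # hypothetical values: "upper", "lower", "preserved"
    insensitive_compare: bool


@dataclass(frozen=True)
class PropertyInfo:
    name: str                                 # (1) required
    datatype: Optional[BasicType] = None      # (2) None means "not a basic type"
    description: Optional[str] = None         # (3) optional
    case_rule: Optional[CaseRule] = None      # (4) None means case sensitive, case preserved

    def __post_init__(self) -> None:
        # Item (4) is disallowed for non-string properties.
        if self.case_rule is not None and self.datatype is not BasicType.STRING:
            raise ValueError("case handling is only allowed for string properties")

    def is_basic(self) -> bool:
        return self.datatype is not None
```

A property such as PropertyInfo("thumbnail") then simply reports 
is_basic() as False, without the schema having to name its type.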

> Still more like a Geo Metro.

For the following example, let's say DASL provides just 
(1) and (2) above. With them, DASL can probably support the 
following system adequately; without them, it can't. This system 
has made billions of dollars, is an industry leader, and is still 
going strong.

Queries can optionally be restricted to a folder, and optionally 
be restricted to any subset of the document classes. The query 
condition is always a conjunction of zero or more simple binary 
operator expressions, where the first operand is a property name, 
and the second operand is a literal constant of the appropriate 
type for the property. The only operators supported are 
>, <, >=, <=, =, !=, and the SQL LIKE operator. (NOT and OR are
not used by this UI.) The properties must be of datatype number, 
string, or datetime. The system administrator can decide that a 
given specific string property is always upcased and that
comparisons are case insensitive.

When the user mistypes a property name, a clear error message 
pops up immediately. The user can retype it or use a pulldown of 
the property names. The operator can be typed or selected from a 
pulldown menu, whose contents depend upon the datatype of the 
property. If a non-applicable operator is typed, a clear error 
message pops up immediately. The constant must be typed, and a 
clear error message is given as to what is wrong with the value 
if an improperly formed constant is typed.
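
The checks described above need nothing more than the property name 
and datatype from the query schema. A minimal sketch, assuming the 
schema arrives as a plain mapping from property name to datatype 
name (None when absent), and assuming LIKE is offered for strings 
only (the description does not say which operators go with which 
datatype). All names are hypothetical, and the check on the constant 
is left out:

```python
from typing import Dict, List, Optional

COMPARISONS = frozenset({">", "<", ">=", "<=", "=", "!="})

# Assumption: LIKE applies to strings only.
OPERATORS_BY_TYPE = {
    "number": COMPARISONS,
    "datetime": COMPARISONS,
    "string": COMPARISONS | {"LIKE"},
}


def operators_for(schema: Dict[str, Optional[str]], prop: str) -> List[str]:
    """Contents of the operator pulldown for a property (empty if none apply)."""
    return sorted(OPERATORS_BY_TYPE.get(schema.get(prop), ()))


def check_condition(schema: Dict[str, Optional[str]], prop: str, op: str) -> Optional[str]:
    """Return an error message for the UI, or None if name and operator are fine."""
    if prop not in schema:
        known = ", ".join(sorted(schema))
        return "Unknown property %r. Known properties: %s" % (prop, known)
    allowed = operators_for(schema, prop)
    if not allowed:
        return "Property %r has no basic datatype, so no operator applies." % prop
    if op not in allowed:
        return "Operator %r does not apply to %r. Allowed: %s" % (
            op, prop, " ".join(allowed))
    return None
```

Both checks run entirely on the client, so the user sees the message 
before anything is sent to the server.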

The system can find a document among tens of millions of other 
documents quickly. Unsophisticated clerks deal with it nicely, 
because it is so simple, yet it meets their needs.

So, as to whether this system has a Geo Metro UI or a 
Cadillac UI, I don't think that characterization is 
important. (Also, this system has more than one UI.) I think that 
what is important is supporting at least the functionality in this 
time proven system and similar systems. This requires (1) and (2) 
above:

I assume that the need for (1) is self-evident.

Suppose we took away (2). Then the user could type an inappropriate
operator or constant for the property. It would be sent to the server,
to the DASL layer, perhaps down through the API layer, and perhaps 
even into the SQL RDBMS. (It depends on where the checking is done.)
The SQL RDBMS would give an error, the DMA layer would transform 
that into a 32-bit DMA error code, and, no matter where the error is
caught, the DASL layer would return an overloaded three-digit decimal 
error code (something vaguely indicating "improperly constructed 
query"). The user would have to wait to get the three-digit error 
code, and then guess which part of the query was in error. This is a 
bad user experience. It also reduces productivity significantly, 
because of the wait and because of the extra network traffic and 
server CPU cycles consumed.

>Smart UIs will use out of band information to get this metadata, 
>including, but not limited to, private agreements among implementors.
>They can still transmit the query using DASL.

I consider DASL 1.0 to be inadequate -- a failure -- if it can't
support basic systems similar to the above example with only 
in-band information. In future releases, the metadata DASL returns 
will have to be enhanced until the systems people think are 
important are adequately served. Let's let market demand tell us 
what those are.

>> However, for the sake of interoperability,
>>DASL must state that the datatype of the literal must be compatible
>>with the datatype of the property and give all the details.
>
>I am not sure what details you are asking for.  
>Would this require DASL to
>state the base datatypes for every RDBMS, OODB, and 
>DMS known to humanity?

No, we won't have to support every datatype known to humanity, 
thank goodness. For details, we just say that numbers must not have
extraneous garbage characters in them, datetimes must conform to 
date/time syntax (ISO 8601 or whatever), and Boolean constants must 
be "t" or "f". We can give a few more details, if we like, which 
would probably be a good idea. Maybe we could reference some 
standard(s) for the syntax of these literal values. That's all.
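
To make the literal rules above concrete, here is a minimal sketch 
of such syntax checks. The function name and the datatype names are 
hypothetical, the number pattern is only one plausible reading of 
"no extraneous garbage characters", and Python's 
datetime.fromisoformat stands in for whichever date/time syntax is 
finally referenced (it accepts a subset of ISO 8601 that varies by 
Python version):

```python
import re
from datetime import datetime

# Optional sign, digits with optional fraction, optional exponent.
_NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def literal_is_well_formed(datatype: str, text: str) -> bool:
    """Check that a literal's syntax is compatible with the property's datatype."""
    if datatype == "number":
        return _NUMBER_RE.fullmatch(text) is not None
    if datatype == "datetime":
        try:
            datetime.fromisoformat(text)
        except ValueError:
            return False
        return True
    if datatype == "boolean":
        return text in ("t", "f")
    if datatype == "string":
        return True  # any string literal is syntactically acceptable
    raise ValueError("not a basic datatype: %r" % datatype)
```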

OK. Now for the compromise.

>1) The syntax of properties in the Query Schema Discovery will allow
>for an optional datatype field (attribute or XML element, I don't
>care much).  The datatypes will be a subset of those defined in XML 
>Data sufficient for WebDAV properties.  The only use of datatyping 
>in DASL will be in schema discovery, not in query terms.
>
>2) No datatyping in DASL 1.0, but we promise (as a WG) that DASL 2.0 
>will incorporate XML Data in its full power and expressiveness 
>as soon as XML Data stabilizes (provided it does, and provided 
>there's not a better car available then.)
> ...
>Does either of these appeal to you?

Yes, (1) appeals to me. (I've said essentially that
in other e-mails.) Of course, when I read "optional datatype", 
I want it to mean optional in the sense of my number (2) 
above, i.e., that if the datatype is missing, that indicates 
that the datatype is NOT one of the basic datatypes DASL
references (integer, real, string, Boolean, datetime).
(That keeps us out of the business of being concerned
with numerous non basic datatypes that aren't in the
90% case.) I see no downside to this approach.

In other words, I view your number (1) as being close to
(or maybe even identical to, depending upon the semantics of
optionality) requiring my numbers (1) and (2) above
in the query schema. I consider that to be the absolute
minimum. I would also like to see my numbers (3) and (4)
above included in the query schema. However, I'm willing to
discuss the possibility of not including (3) and/or (4) 
in DASL 1.0, even though I don't see a downside.

As regards the syntax of the query schema, I think we should 
look at both alternatives -- using tags, and using 
attributes -- and pick the best approach.
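
To compare the two alternatives side by side, here is a small 
sketch. The element and attribute names ("schema", "property", 
"name", "datatype") are invented for illustration and come from no 
DASL or XML Data draft. Either form carries the same information, 
and in both an absent datatype simply comes back as None:

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute-based syntax.
ATTRIBUTE_FORM = """
<schema>
  <property name="title" datatype="string"/>
  <property name="thumbnail"/>
</schema>
"""

# Hypothetical tag-based syntax.
ELEMENT_FORM = """
<schema>
  <property><name>title</name><datatype>string</datatype></property>
  <property><name>thumbnail</name></property>
</schema>
"""


def parse_attribute_form(xml_text):
    """Map property name to datatype (None if absent), reading attributes."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("datatype") for p in root.iter("property")}


def parse_element_form(xml_text):
    """Map property name to datatype (None if absent), reading child elements."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("datatype") for p in root.iter("property")}
```

The attribute form is more compact, while the tag form leaves room 
for structured values later (for example, value constraints); which 
matters more is exactly the question to discuss.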

Alan Babich

Received on Thursday, 16 July 1998 17:48:44 UTC