- From: Babich, Alan <ABabich@filenet.com>
- Date: Mon, 20 Jul 1998 21:30:35 -0700
- To: "'Jim Davis'" <jdavis@parc.xerox.com>, www-webdav-dasl@w3.org
My comments interspersed.

Alan Babich

> -----Original Message-----
> From: Jim Davis [mailto:jdavis@parc.xerox.com]
> Sent: July 20, 1998 7:53 PM
> To: www-webdav-dasl@w3.org
> Subject: RE: datatyping is not needed
>
> At 05:58 PM 7/20/98 PDT, Babich, Alan wrote:
> >I presume that you mean to drop data type indications from the query
> >schema discovery, not just the query itself (select, where, and
> >sortby). I presume that this also means you withdraw the offer of
> >compromise that you and I nearly reached on the QSD.
>
> No, not yet anyway. If it turns out that datatype is never used in
> queries, then I would consider removing it from QSD. On the other
> hand, I think there's a valid argument that providing it in QSD is
> both very cheap and provides some real (even if small) benefit to
> clients, so it might be worth providing as a *hint*. But we need to
> settle the query issue first, since clearly if datatyping *is* ever
> allowed in a query, that strengthens the argument for putting it into
> QSD considerably. Even if datatype is not allowed in the query, it
> might still go into QSD. Indeed, I can see incorporating even more
> metadata, along the lines of the PropertyDescription object in DMA,
> or perhaps the Metadata Repository used by the Stanford digital
> library protocol. So please consider QSD an open issue for now, but
> let's settle the issue of whether it belongs in queries.

ALAN BABICH: I don't understand two things:

(1) What do you mean by "use datatype in the query"? Does this amount
to decorating literals?

(2) What is the logical connection between datatyping in the query and
datatyping in the QSD? I don't see that its being in the query
strengthens or weakens the argument for putting it in the QSD. (Having
it in the QSD provides information that can be used to create a better
user experience, and it allows generic clients to be written, because
they can be driven off metadata, not hardcoded to a specific schema.
That's about it, I think. The server already has the metadata
information, so it isn't required in the query for the 80% cases. We
just might all agree on the stuff within these parentheses.)

> The remainder of your email message lays out a major premise I would
> paraphrase as "divergence from past accepted practice is dangerous".
> The minor premise I don't understand so clearly, it is something like
> "past accepted practice is 'query conditions and order by on scalar
> values that are of one of the 5 fundamental datatypes (integer,
> string, etc.)'".

ALAN BABICH: Yes #1. On the beaten path, if there were land mines,
somebody before you blew up, and the problem has been solved by now.
Once you get off the beaten path, you have to be extremely careful you
don't go down a rathole that opens up a research area that blows out
your schedule.

Yes #2. I have been working with DBMS query since before SQL was
invented. Back in the early days, we still only queried scalar values
of basic data types. The basic datatypes of SQL 92 are still very
limited. Let me quote from the spec., section 4.1 "Data types":

"SQL defines distinct data types named by the following <key word>s:
CHARACTER, CHARACTER VARYING, BIT, BIT VARYING, NUMERIC, DECIMAL,
INTEGER, SMALLINT, FLOAT, REAL, DOUBLE PRECISION, DATE, TIME,
TIMESTAMP, and INTERVAL."

That's an exhaustive list. You don't get to query structures, or
arrays, for example, or XML valued properties, or ...

The string types are CHARACTER and CHARACTER VARYING. The bit string
types are BIT and BIT VARYING. The integer types are INTEGER and
SMALLINT, plus NUMERIC and DECIMAL with a scale of zero. The non
integer types are NUMERIC and DECIMAL with a nonzero value of scale,
plus FLOAT, REAL, and DOUBLE PRECISION. The datetime datatypes are
DATE, TIME, and TIMESTAMP. The INTERVAL type is a datatype for the
difference between two datetime values.

Since that's all there are, that's all you can query in SQL 92.
DASL has simplified things by leaving out the bit string and interval
datatypes.

> Does this argue for datatyping in the QSD, or in the query?

ALAN BABICH: I'm mostly concerned with the QSD having datatype
information plus the other information we talked about, and then
hopefully beefing it up later.

> If the former, I think you've stated the case for use in the QSD well
> already, so please don't repeat it.

ALAN BABICH: Thank you for saying so. I won't repeat it.

> If the latter though, it needs more explanation.

ALAN BABICH: OK.

> Of course, all things being equal, trying anything new is risky. But
> in this case, we seem to not have the luxury of just repeating the
> old and familiar. If tried and true SQL (or Z39.50) were good enough,
> then DASL could just be a Web encoding of one of those, and we'd be
> done. Like it or not, WebDAV has already made some design
> commitments, e.g. that resourcetype is an XML element not PCDATA. We
> have to support those. DASL is, first and foremost, a query for the
> model that WebDAV exposes, not for RDBMS. We can't assume the
> underlying store is tables.

ALAN BABICH: Yes, you're right, we have to support the WebDAV design
commitments. So, yes, you're right, we do have to do some things that
are somewhat different (for example, we don't have resource classes,
so we can't do joins). No, you're right, we can't assume the
underlying store is always tables. However, we should make some effort
to try to accommodate that case, since we believe (or at least I
believe) it is an important use case. Accommodating that case
reasonably well will help DASL succeed. I don't see any reason why we
can't.

> My emails have shown why datatype is not needed in queries for "live"
> and "famous dead" properties, and why it's harmful if adopted for the
> "obscure dead", and why there is scarcely any problem by treating the
> dead as mere strings.
> If you - or anyone else - have counter arguments against any of
> these, please state them.

ALAN BABICH: As I understand them, "obscure dead" properties are off
the beaten path. I didn't state it explicitly (sorry), but the scalar
properties of one of the 5 fundamental data types are required to draw
values from a well defined domain of values in order to be on the
beaten path. Domains of values are a subset of a basic datatype. For
example, house numbers are a subset of the integers.

Consequently, without a plausible use case that shows me they are
significant, I tend to think they are not important for 1.0. Given the
number of things I'm committed to do, I look for things I do NOT have
to drill down into. Obscure dead properties fell into this category,
so I haven't thought about them in depth. So now I will.

So, how obscure are they? Can you put an integer into one belonging to
ordinary resource 1, put an ASCII string into one belonging to
ordinary resource 2, and then put in a datetime value to update the
property on ordinary resource 1? In other words, if the datatype of
such a property is not predefined in a schema for the collection
(analogous to having an RDBMS schema for a relational database), what
constrains/defines the possible values for the property? Apparently
the datatype is determined when you stuff a value into it? Thus, it
seems PROPPATCH has to determine the value. So, can it be any value at
any time?

case 1: Yes, it can be any value any time. Then, you probably want to
be able to say "find me only resources having a value for this
property that is of datatype D, and compares (>, =, or whatever) to
this literal X (also of datatype D)". So now we have datatypes in the
query.

case 2: No, just try to treat them as strings -- even floating point
numbers. A problem with this is that if there was no datatype
discipline in the first place, why would there be discipline in the
second place?
Are people really going to remember and bother to put leading zeroes
in front of integers, and expand out floating point numbers to
eliminate the exponent and expand it out into a fixed length field
just so greater than and equal will work? Wow. You could get some
really long floating point literals that way. How does one know, in
general, how many leading zeroes (and trailing zeroes) are needed if
there is no schema? If you know nothing, you have to expand out to the
full range of the exponent in a double precision floating point
number. I don't think that approach is reasonable.

If they were really strings, just say, "they're strings". Then the
numeric and datetime comparisons are irrelevant. (We assume we can
deal with string comparisons.) So, then, can we say that the datatype
of all obscure dead properties is string? I don't think so, because I
suspect that people want fundamental-datatype-based comparisons, not
string based comparisons for them.

One approach might be to store the datatype along with the property
each time a value for the property is stored on a resource. Then, the
datatype would be discoverable at run time on a resource instance
basis. Then, they might be better characterized as dynamic datatype
dead properties. Of course, there are still issues to address, but
hopefully the solutions wouldn't be total kludges. Of course, this
approach might not be upward compatible with the existing WebDAV spec.
(e.g., can PROPPATCH specify the datatype of the value to be stored?)
It's very late, and this approach is off the top of my head. It
wouldn't surprise me if it didn't fly when we looked at it closer.

If it doesn't fly, then I would ask, do we really need to allow
obscure dead properties? This is all theoretical to me, because I
don't have a believable use case in mind. So I'm very suspicious about
any arguments and conclusions I make.
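
The two points above (string comparison mis-ordering numbers, and
storing a datatype alongside each value) can be sketched roughly as
follows. This is only an illustration under assumptions: the type tag
names, the `matches` helper, and the sample resources are hypothetical
and are not defined by WebDAV or DASL.

```python
from datetime import datetime

# Hypothetical type tags; each maps to a parser that turns the stored
# text into a value that compares correctly for that datatype.
PARSERS = {
    "integer": int,
    "float": float,
    "string": str,
    "datetime": datetime.fromisoformat,
}


def matches(stored, datatype, op, literal):
    """Case 1 style condition: the stored property must have datatype
    `datatype` and compare to `literal` (also of that datatype) under op.
    `stored` is a (datatype, text) pair kept with the resource."""
    stored_type, stored_text = stored
    if stored_type != datatype:
        return False  # a value of another datatype never matches
    parse = PARSERS[datatype]
    left, right = parse(stored_text), parse(literal)
    if op == "<":
        return left < right
    if op == "=":
        return left == right
    if op == ">":
        return left > right
    raise ValueError("unsupported operator: " + op)


# Treating numbers as plain strings orders them incorrectly unless
# every value is padded to a fixed width.
assert "9" > "10"      # string comparison
assert not (9 > 10)    # integer comparison

# The same property name holds different datatypes on different resources.
resources = {
    "res1": ("integer", "9"),
    "res2": ("string", "hello"),
    "res3": ("integer", "10"),
}
hits = sorted(
    name for name, prop in resources.items()
    if matches(prop, "integer", "<", "10")
)
print(hits)
```

With the tag stored per value, a condition such as "integer less than
10" skips res2 entirely instead of comparing "hello" as text, and no
zero padding is needed for res1 and res3 to compare as numbers.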
I would very much appreciate it if someone suggested a believable (to
me) use case of some significance for DASL 1.0 for obscure dead
properties. Of course, maybe I don't quite understand what they are,
in which case I would very much appreciate being enlightened.

> best regards
>
> Jim
>
> ------------------------------------
> http://www.parc.xerox.com/jdavis/
> 650-812-4301
>
Received on Tuesday, 21 July 1998 00:33:33 UTC