RE: Does DASL need to support structured queries? from David G. Durand on 1998-11-30 (w3c-dist-auth@w3.org from October to December 1998)

From: David G. Durand <dgd@cs.bu.edu>
Date: Mon, 30 Nov 1998 14:50:01 -0400
To: w3c-dist-auth@w3.org
Message-Id: <v0401170ab28892afe1f8@[24.0.249.126]>

Now that my illness is receding, I'm diving into this a bit. Mostly this is
still on the XML property front.

For the record, I think structured searching is pretty essential (it should
be in the goals document, for instance). I also think it could be tabled
for a 1.1 release, especially given the ongoing work on XML Querying, etc.
We should not re-create a wheel here, and even simple searching has value.

At 6:00 PM -0400 11/17/98, Babich, Alan wrote:
>Jim Amsden wrote:
>"... in order for DASL to be useful and effective, it must
>support structured searches."

>I disagree. It's obvious that that's not true.
>For example, neither SQL nor SQL 92 support structured
>data, let alone structured searches, and SQL is very widely
>used. Years of experience have shown that SQL is both useful
>and effective for many applications without the ability to
>define or query structures. Similarly, DASL would be very useful
>for many queries (including the most common queries) without
>the ability to query structures. (Of course, that doesn't
>imply that we should not add that ability in the future. I
>think we should.)

SQL is only useful with a database schema. We've so far not had any such
thing, only assertions that servers might have their own (hardwired)
schemas and that, based on the XML data they see, they are free to
transform client property requests to conform to these (non-public)
schemas. This sounds like a disaster-in-waiting to me.

>"So the same mechanism can be use to search structure properties
>since they are XML elements (on the wire) too."

>I disagree with the implicit assumption behind this statement.
>The assumption is that the serialization format (i.e., XML)
>is somehow equivalent to the data model, or somehow forces
>the data model to be the same.
>As far as searching, it is irrelevant what the serialization
>format is. Multiple different protocols could be used to access
>the same data source if that were desirable. A binary
>serialization format would be much more efficient than XML,
>as one example. What matters for query is the data model.

I think that this is true, but in a trivial way, because we seem to be
ignoring some critically important questions in the current view of things.
_If_ we are only using XML as a "serialization format", then we still need
a hard and fast definition of the data model that we are using it as a
format for. Unless this is in the DASL document, we don't have a data
model. We have discussed many situations where the string PROPPATCHed and
the value PROPFINDed may be different (dates, times, integers, floats (with
differing accuracies and decimal conversion algorithms?), etc.). If we
intend to allow such things for servers, we need to be clear about when
they can, or cannot, happen. If not, there's no way to know what is
actually going to happen when you put a property at a server.
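
To make the round-trip concern concrete, here is a minimal sketch in Python. It assumes a hypothetical server that parses a property value into a native type and re-serializes it on retrieval; the function names and the two type labels are illustrative only and come from neither the DASL nor the WebDAV drafts.

```python
def typed_round_trip(text, kind):
    """Simulate a hypothetical server that stores a parsed value, not the string.

    `kind` is an assumed type label ("integer" or "float"); a real server
    would need a published rule for choosing it.
    """
    parsers = {"integer": int, "float": float}
    value = parsers[kind](text)   # lexical form is discarded here
    return str(value)             # re-serialized in the server's own format


def verbatim_round_trip(text):
    """Simulate a server that preserves the client's string exactly."""
    return text


def is_lossless(text, kind):
    """True if the typed server would hand back exactly what was stored."""
    return typed_round_trip(text, kind) == text
```

Under these assumptions a client that stores "007" as an integer gets back a different string, which is exactly the case a data model definition would have to permit or forbid explicitly.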

I know that some feel that the only properties supported by servers will be
hardwired in, but I doubt that that assumption can hold up:

First of all, the current protocol is grossly overdesigned if clients
aren't to be allowed to set arbitrary properties (that a server may _not_
have special support for).

Secondly, we want this to be useful for metadata generally, and that means
that it ought to be compatible with the formats in use for metadata: at
this point, very clearly, this will be XML, in a wide variety of metadata
domains. Off the top of my head, I can think of at least the TEI (Text
Encoding Initiative), the EAD (Encoded Archival Description), and Dublin
Core -- and I know that there are others.

So the encoding format argument seems already wobbly to me.

>XML all by itself is not a data model.

There's a very simple data model for XML: multilingual strings, with
labelled bracketings and attached, unordered attributes. Attributes get
some people very excited, but that's fundamentally silly: they can easily
be represented as a special case of containment (internally, for servers
with more limited, purely hierarchical data models). For instance, any
attribute could be represented as a dummy element with a name beginning
with "#". Since # is not a legal element name start character, this is an
easy reversible transformation.
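
The attribute-to-element mapping described above can be sketched as follows. This is only an illustration using Python's standard xml.etree.ElementTree; the function names are hypothetical, and placing the dummy elements before the real children is an assumption of the sketch, not something the text specifies. ElementTree does not validate tag names in memory, so "#name" tags are usable internally, but the flattened tree is not well-formed XML and is not meant to be serialized.

```python
import xml.etree.ElementTree as ET


def attrs_to_children(elem):
    """Return a copy of elem with every attribute turned into a '#name' child."""
    out = ET.Element(elem.tag)
    out.text, out.tail = elem.text, elem.tail
    # Dummy elements come first; their text holds the attribute value.
    for name, value in elem.attrib.items():
        dummy = ET.SubElement(out, "#" + name)
        dummy.text = value
    for child in elem:
        out.append(attrs_to_children(child))
    return out


def children_to_attrs(elem):
    """Inverse mapping: fold '#name' children back into attributes."""
    out = ET.Element(elem.tag)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        if child.tag.startswith("#"):
            # No real element name can start with '#', so this is unambiguous.
            out.set(child.tag[1:], child.text or "")
        else:
            out.append(children_to_attrs(child))
    return out
```

Because "#" cannot begin a legal element name, the inverse function never confuses a dummy element with real content, which is what makes the mapping reversible.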

>Conventions would have
>to be added, starting with data types.

This is really only a problem for live properties, where servers already
have the needed leeway. I think servers should be required to preserve all
information in properties that they cannot interpret via the XML namespace
mechanisms. If the server is going to perform some selection process, then
we need to specify what that is, and since that is essentially a
modification of XML, we need a good justification as to why it's a good
idea.

>Then there is the
>whole question of metadata, central definition of data
>(as opposed to client program definition of data),
>administration of database schemas (retrieving and updating
>metadata), merging metadata when querying across multiple
>data sources, etc., none which are issues that XML or any
>other serialization format was intended to address directly.

These are all facilities that servers might want to offer, and might not.
They might make sense as additional layers and they might not. We don't
need to solve those (potential) problems to allow clients to request the
attachment of arbitrary properties, and require servers to deliver them
back unmodified (within the limits of XML). Servers can still reject such
requests if that is their policy; but they should not be allowed to accept
such a request and _modify the data_, either by ignoring or changing the
information the client provides. [In fact, modification is OK, but only if
we define the limits precisely -- I see no reason to restrict the model to
a subset of XML, and think that such an attempt will be confusing and silly.]

We chose a (good) notation for property values (certainly the one that is
being used in all current metadata efforts I know about). We should just go
ahead and admit that that's what we're doing.

>There are pros and cons for every design choice. For example,
>XML is a very good format for certain types of text based documents,
>and is a very poor format for others (e.g., image documents).
>Similarly, XML plus extensions wouldn't be the best property
>model.

Since XML (potentially plus extensions) is being used in so many metadata
efforts, I find the above statement rather odd.

What do you know that Dublin Core, EAD, the digital libraries folks, and
the TEI don't?

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Monday, 30 November 1998 14:50:52 UTC