- From: David G. Durand <dgd@cs.bu.edu>
- Date: Mon, 30 Nov 1998 14:50:01 -0400
- To: w3c-dist-auth@w3.org
Now that my illness is receding, I'm diving into this a bit. Mostly this is still on the XML property front.

For the record, I think structured searching is pretty essential (it should be in the goals document, for instance). I also think it could be tabled for a 1.1 release, especially given the ongoing work on XML Querying, etc. We should not reinvent the wheel here, and even simple searching has value.

At 6:00 PM -0400 11/17/98, Babich, Alan wrote:
>Jim Amsden wrote:
>"... in order for DASL to be useful and effective, it must
>support structured searches."
>I disagree. It's obvious that that's not true.
>For example, neither SQL nor SQL 92 support structured
>data, let alone structured searches, and SQL is very widely
>used. Years of experience have shown that SQL is both useful
>and effective for many applications without the ability to
>define or query structures. Similarly, DASL would be very useful
>for many queries (including the most common queries) without
>the ability to query structures. (Of course, that doesn't
>imply that we should not add that ability in the future. I
>think we should.)

SQL is only useful with a database schema. We've so far not had any such thing, only assertions that servers might have their own (hardwired) schemas and that, based on the XML data they see, they are free to transform client property requests to conform to these (non-public) schemas. This sounds like a disaster-in-waiting to me.

>"So the same mechanism can be use to search structure properties
>since they are XML elements (on the wire) too."
>I disagree with the implicit assumption behind this statement.
>The assumption is that the serialization format (i.e., XML)
>is somehow equivalent to the data model, or somehow forces
>the data model to be the same.
>As far as searching, it is irrelevant what the serialization
>format is. Multiple different protocols could be used to access
>the same data source if that were desirable.
>A binary serialization format would be much more efficient than XML,
>as one example. What matters for query is the data model.

I think that this is true, but in a trivial way, because we seem to be ignoring some critically important questions in the current view of things. _If_ we are only using XML as a "serialization format", then we still need a hard and fast definition of the data model that we are using it as a format for. Unless this is in the DASL document, we don't have a data model.

We have discussed many situations where the string set with PROPPATCH and the value retrieved with PROPFIND may be different (dates, times, integers, floats (with differing accuracies and decimal conversion algorithms?), etc.). If we intend to allow such things for servers, we need to be clear about when they can, or cannot, happen. If not, there's no way to know what is actually going to happen when you put a property at a server.

I know that some feel that the only properties supported by servers will be hardwired in, but I doubt that that assumption can hold up. First of all, the current protocol is grossly overdesigned if clients aren't to be allowed to set arbitrary properties (that a server may _not_ have special support for). Secondly, we want this to be useful for metadata generally, and that means that it ought to be compatible with the formats in use for metadata: at this point, very clearly, this will be XML, in a wide variety of metadata domains. Off the top of my head, I can think of at least the TEI (Text Encoding Initiative), the EAD (Encoded Archival Description), and Dublin Core -- and I know that there are others. So the encoding format argument seems already wobbly to me.

>XML all by itself is not a data model.

There's a very simple data model for XML (multilingual strings, with labelled bracketings and attached, unordered attributes).
Attributes get some people very excited, but that's fundamentally silly: they can easily be represented as a special case of containment (internally, for servers with more limited, pure hierarchical data models). For instance, any attribute name could be represented as a dummy element with a name beginning with "#". Since # is not a legal element name start character, this is an easy reversible transformation.

>Conventions would have
>to be added, starting with data types.

This is really only a problem for live properties, where servers already have the needed leeway. I think servers should be required to preserve all information in properties that they cannot interpret via the XML namespace mechanisms. If the server is going to perform some selection process, then we need to specify what that is, and since that is essentially a modification of XML, we need a good justification as to why it's a good idea.

>Then there is the
>whole question of metadata, central definition of data
>(as opposed to client program definition of data),
>administration of database schemas (retrieving and updating
>metadata), merging metadata when querying across multiple
>data sources, etc., none which are issues that XML or any
>other serialization format was intended to address directly.

These are all facilities that a server might want to offer, and might not. They might make sense as additional layers and they might not. We don't need to solve those (potential) problems to allow clients to request the attachment of arbitrary properties, and require servers to deliver them back unmodified (within the limits of XML). Servers can still reject such requests if that is their policy; but they should not be allowed to accept such a request and _modify the data_, either by ignoring or changing the information the client provides.
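
As an illustration of the attribute mapping mentioned above, here is a minimal Python sketch, not a proposal for the spec. It folds each attribute into a dummy child whose name begins with "#" and then reverses the mapping. The `Node` class and both function names are hypothetical, and text that follows a child element (mixed content) is ignored to keep the example short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """Pure hierarchical node: a name, leading text, and ordered children."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def to_hierarchy(elem):
    """Fold attributes into '#'-prefixed dummy children, then add real children."""
    node = Node(elem.tag, elem.text or "")
    # XML attributes are unordered, so sort them for a stable internal form.
    for name, value in sorted(elem.attrib.items()):
        node.children.append(Node("#" + name, value))
    for child in elem:
        node.children.append(to_hierarchy(child))
    return node


def to_element(node):
    """Reverse the mapping: '#'-prefixed children become attributes again."""
    elem = ET.Element(node.name)
    elem.text = node.text or None
    for child in node.children:
        if child.name.startswith("#"):
            # No real element name can start with '#', so this test is unambiguous.
            elem.set(child.name[1:], child.text)
        else:
            elem.append(to_element(child))
    return elem


source = ET.fromstring('<book id="b1" lang="en"><title>XML</title></book>')
internal = to_hierarchy(source)
restored = to_element(internal)
```

The "#" names exist only in the server's internal tree and are never serialized, which is why it does not matter that they are not legal XML names.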
[In fact, modification is OK, but only if we define the limits precisely -- I see no reason to restrict the model to less than full XML, and think that such an attempt will be confusing and silly.]

We chose a (good) notation for property values (certainly the one that is being used in all current metadata efforts I know about). We should just go ahead and admit that that's what we're doing.

>There are pros and cons for every design choice. For example,
>XML is a very good format for certain types of text based documents,
>and is a very poor format for others (e.g., image documents).
>Similarly, XML plus extensions wouldn't be the best property
>model.

Since XML (potentially plus extensions) is being used in so many metadata efforts, I find the above statement rather odd. What do you know that Dublin Core, EAD, the digital libraries folks, and the TEI don't?

  -- David
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Monday, 30 November 1998 14:50:52 UTC