Re: Comparing relationships XML vs. RDBs from Ivan Mikhailov on 2003-09-10 (www-ql@w3.org from July to September 2003)

From: Ivan Mikhailov <iv_an_ru@yahoo.com>
Date: Wed, 10 Sep 2003 10:48:03 -0700 (PDT)
To: brian@blumenfeld-maso.com, www-ql@w3.org
Message-ID: <20030910174803.48912.qmail@web41012.mail.yahoo.com>
--- Brian Maso <brian@blumenfeld-maso.com> wrote:
> I'm having thoughts peripherally related to XQuery. Specifically what types 
> of data patterns can be stored in an XDB vs. an RDB. I thought I'd try to 
> spark a discussion here since this list seems so quiet...

Your wish is granted.

It may be interesting to compare the costs of particular atomic operations in the two models.

1. Any data can be stored in either of these two formats: relational or XML.

E.g. it is possible to write an XML document that contains a list of all car models, a list of all types
of oil, and a list of CarID/OilID pairs.
But it looks 'ugly'.
OTOH it is still much more human-readable than pages of binary data from a database with three
tables holding the same data.
So why "ugly"?
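To make point 1 concrete, here is a minimal Python sketch that holds the same car/oil data both ways: as one XML document built with xml.etree.ElementTree and as three tables in an in-memory sqlite3 database. The element, attribute, table and column names (catalog, car, oil, car_oil, CarID, OilID) and the sample rows are hypothetical; only the three-list layout comes from the text above.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data in the three-list layout described above.
CARS = [(1, "Model A"), (2, "Model B")]
OILS = [(10, "5W-30"), (20, "10W-40")]
PAIRS = [(1, 10), (2, 10), (2, 20)]  # (CarID, OilID)


def to_xml():
    """One XML document: a list of cars, a list of oils, a list of pairs."""
    root = ET.Element("catalog")
    cars = ET.SubElement(root, "cars")
    for car_id, name in CARS:
        ET.SubElement(cars, "car", CarID=str(car_id), name=name)
    oils = ET.SubElement(root, "oils")
    for oil_id, grade in OILS:
        ET.SubElement(oils, "oil", OilID=str(oil_id), grade=grade)
    pairs = ET.SubElement(root, "pairs")
    for car_id, oil_id in PAIRS:
        ET.SubElement(pairs, "pair", CarID=str(car_id), OilID=str(oil_id))
    return root


def to_sqlite():
    """The same data as three relational tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE car (CarID INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE oil (OilID INTEGER PRIMARY KEY, grade TEXT)")
    conn.execute("CREATE TABLE car_oil (CarID INTEGER REFERENCES car, "
                 "OilID INTEGER REFERENCES oil)")
    conn.executemany("INSERT INTO car VALUES (?, ?)", CARS)
    conn.executemany("INSERT INTO oil VALUES (?, ?)", OILS)
    conn.executemany("INSERT INTO car_oil VALUES (?, ?)", PAIRS)
    conn.commit()
    return conn


if __name__ == "__main__":
    print(ET.tostring(to_xml(), encoding="unicode"))
```

Both hold exactly the same pairs; the difference only shows up when we start asking questions of them.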


2. We are not interested in storing data for its own sake. What matters is being able to answer
questions, by whatever means.

If I need log base 10 of 2, I can look it up in a table, calculate it, or just remember that it's 0.301.
The cheapest way is the best.
So we should not care about internals; the cost of answering questions is the key issue.
Note that the cost of updating data is not interesting in itself.
I can call my broker once and ask where SaTelecom closed on the NYSE yesterday, or I can
read the newspaper every morning. We can treat the cost of updates as part of the cost of answering questions.
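One way to read "the cost of updates is part of answering questions" is as an amortized cost: the price of one answer is its query cost plus the update work spread over the questions that work serves. The sketch below assumes that simple linear model; the unit costs for the broker and the newspaper are invented for illustration, not measured.

```python
def cost_per_answer(query_cost, update_cost, updates_per_query):
    """Cost of one answer, with update (maintenance) work amortized over answers.

    Assumed linear model: every question pays its own lookup cost plus its
    share of the updates performed between questions."""
    return query_cost + update_cost * updates_per_query


def cheapest(strategies):
    """Pick the strategy name with the lowest amortized cost per answer.

    `strategies` maps a name to (query_cost, update_cost, updates_per_query)."""
    return min(strategies, key=lambda name: cost_per_answer(*strategies[name]))


if __name__ == "__main__":
    # Hypothetical units: a broker call is expensive but needs no upkeep;
    # the newspaper is cheap to consult but must be read every morning.
    daily = {"broker": (5.0, 0.0, 0), "newspaper": (0.1, 1.0, 1)}
    monthly = {"broker": (5.0, 0.0, 0), "newspaper": (0.1, 1.0, 30)}
    print(cheapest(daily))    # newspaper
    print(cheapest(monthly))  # broker
```

Which strategy wins depends on how often the question is asked, not on how the data is stored internally.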


3. Definition: a data representation is [ugly] if popular (frequently used) data retrieval operations on it are costly.
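Read quantitatively, the definition weighs each retrieval operation by how popular it is. A minimal formalization, assuming a workload Q of questions with relative frequencies f_q and a per-question cost c_R(q) for a representation R (including the amortized update cost from point 2); these symbols are introduced here for illustration only:

```latex
\mathrm{Cost}(R) \;=\; \sum_{q \in Q} f_q \, c_R(q), \qquad \sum_{q \in Q} f_q = 1
```

Under this reading, R is ugly for a given workload when Cost(R) is much larger than Cost(R') for some alternative representation R' of the same data.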


4. XML is especially ugly when the lack of indexing becomes fatal.

The most expensive operation is a full scan of the data repository.
The typical way to avoid full scans is indexing.
Thus an XML document needs indexing like any other data repository.
Native XML 'ID' attributes are OK but not enough. XSLT 'key' is better but
still insufficient.
Advanced applications try to split a source XML document into relations and then use a sort of canonical
RDBMS indexing: tuples as keys, either in lexicographic order or with no order at all.
Maybe we should look for other sorts of indexes?
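A small Python sketch of the three cases in point 4, using xml.etree.ElementTree and the standard bisect module: a full scan over the pair elements, a hash index (the "no order at all" case), and a sorted list of (CarID, OilID) tuples searched by binary search (the lexicographic case). The document layout and function names are hypothetical, and both indexes are built in memory, which is exactly the limitation point 5 complains about.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical document using the CarID/OilID pair list from point 1.
DOC = """<catalog><pairs>
  <pair CarID="1" OilID="10"/>
  <pair CarID="2" OilID="10"/>
  <pair CarID="2" OilID="20"/>
</pairs></catalog>"""


def oils_full_scan(root, car_id):
    """No index: every pair element is visited for every question."""
    return [p.get("OilID") for p in root.iter("pair") if p.get("CarID") == car_id]


def build_hash_index(root):
    """'No order at all': a dict from CarID to its OilIDs (a hash index)."""
    index = {}
    for p in root.iter("pair"):
        index.setdefault(p.get("CarID"), []).append(p.get("OilID"))
    return index


def build_sorted_index(root):
    """Tuples (CarID, OilID) kept in lexicographic order, as a B-tree would."""
    return sorted((p.get("CarID"), p.get("OilID")) for p in root.iter("pair"))


def oils_sorted(index, car_id):
    """Binary-search the first tuple with this CarID, then read forward."""
    i = bisect.bisect_left(index, (car_id,))
    found = []
    while i < len(index) and index[i][0] == car_id:
        found.append(index[i][1])
        i += 1
    return found


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(oils_full_scan(root, "2"))                   # ['10', '20']
    print(build_hash_index(root).get("2", []))         # ['10', '20']
    print(oils_sorted(build_sorted_index(root), "2"))  # ['10', '20']
```

The full scan touches every pair for every question; both indexes pay once at build time and then go straight to the matching entries. The sorted variant can also serve ordered and prefix lookups on the tuple, which the dict cannot.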


5. An XML document may be stored locally in a smart way, but it becomes ugly when exported.

Obviously, it is not enough to create indexes on the fly while a query is in progress.
Obviously, one may wish to have persistent indexes.
But there is no way to declare such indexes in, e.g., XMLSchema, so an application can export
its data but not its best practices for indexing that data.
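The asymmetry in point 5 is easy to see with Python's standard library: sqlite3's Connection.iterdump() exports the schema together with its CREATE INDEX statements, while serializing the same rows with xml.etree.ElementTree carries the data and nothing else. The table, index and element names below are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

PAIRS = [(1, 10), (2, 10), (2, 20)]  # hypothetical CarID/OilID pairs


def relational_export():
    """Dump a small database; the index declaration travels with the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE car_oil (CarID INTEGER, OilID INTEGER)")
    conn.execute("CREATE INDEX car_oil_by_car ON car_oil (CarID, OilID)")
    conn.executemany("INSERT INTO car_oil VALUES (?, ?)", PAIRS)
    conn.commit()
    return "\n".join(conn.iterdump())


def xml_export():
    """Serialize the same pairs as XML; there is nowhere to say 'index CarID'."""
    root = ET.Element("pairs")
    for car_id, oil_id in PAIRS:
        ET.SubElement(root, "pair", CarID=str(car_id), OilID=str(oil_id))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(relational_export())
    print(xml_export())
```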


6. Natural restrictions can reduce the complexity of calculations by preventing us from
considering impossible cases.

XMLSchema has no notation for complicated natural restrictions on stored data.
E.g. if I export a three-dimensional map of a railway as XML data, I can set minimal and
maximal values for the 'elevation' attribute but not a restriction like 'the elevation difference
between any two neighboring points is less than 8 meters per 1000 meters of horizontal distance'.
I have to write, e.g., an XQuery function that checks such a restriction, but there is no standard place
for such a restriction in XMLSchema.
In a database schema, it's not a problem to write a validating trigger as part of the schema
declaration.
Natural restrictions are a vital part of the data schema for image recognition, geographical and
traffic models, linear programming, dynamic modeling, etc.
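Here is what the two places for the railway rule from point 6 might look like. The first function is the kind of out-of-schema check the text says has to be written separately (in Python here, standing in for an XQuery function); the second part declares the same rule as a trigger inside an SQLite schema. The XML layout, the assumption that consecutive point elements are track neighbors, and all names are hypothetical; the trigger only guards INSERT, so a real schema would also need a matching UPDATE trigger.

```python
import math
import sqlite3
import xml.etree.ElementTree as ET

MAX_PER_1000 = 8.0  # rule from the text: < 8 m rise per 1000 m horizontal

# Hypothetical export format: consecutive <point> elements are track neighbors.
TRACK = """<railway>
  <point x="0" y="0" elevation="100.0"/>
  <point x="1000" y="0" elevation="105.0"/>
  <point x="2000" y="0" elevation="115.0"/>
</railway>"""


def gradient_violations(root, max_per_1000=MAX_PER_1000):
    """Return indices i where points i and i+1 break the gradient rule."""
    pts = [(float(p.get("x")), float(p.get("y")), float(p.get("elevation")))
           for p in root.iter("point")]
    bad = []
    for i, (a, b) in enumerate(zip(pts, pts[1:])):
        horizontal = math.hypot(b[0] - a[0], b[1] - a[1])
        if abs(b[2] - a[2]) >= max_per_1000 / 1000.0 * horizontal:
            bad.append(i)
    return bad


# The same rule declared in a relational schema. Squares are compared so no
# sqrt() is needed (SQLite math functions are optional at build time);
# 0.000064 is (8/1000) squared.
SCHEMA = """
CREATE TABLE point (seq INTEGER PRIMARY KEY, x REAL, y REAL, elevation REAL);
CREATE TRIGGER point_gradient BEFORE INSERT ON point
BEGIN
  SELECT RAISE(ABORT, 'gradient exceeds 8 m per 1000 m')
  FROM point AS p
  WHERE p.seq IN (NEW.seq - 1, NEW.seq + 1)
    AND (NEW.elevation - p.elevation) * (NEW.elevation - p.elevation)
        >= 0.000064 * ((NEW.x - p.x) * (NEW.x - p.x) + (NEW.y - p.y) * (NEW.y - p.y));
END;
"""


def load_track(rows):
    """Insert (seq, x, y, elevation) rows in one transaction.

    Returns (rows_stored, error_message_or_None); a violation rolls back all rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    error = None
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany("INSERT INTO point VALUES (?, ?, ?, ?)", rows)
    except sqlite3.DatabaseError as exc:
        error = str(exc)
    stored = conn.execute("SELECT COUNT(*) FROM point").fetchone()[0]
    return stored, error
```

Feeding the sample track to load_track() rejects the third point (10 m of rise over 1000 m) and rolls back the whole batch, while gradient_violations() can only report the problem after the data already exists.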

7. There are other reasons that make XML repositories 'ugly'. But I'll follow an old rule: "If you
can count from one to ten, stop at seven".

Best Regards,
IvAn Mikhailov.

Received on Wednesday, 10 September 2003 13:48:05 UTC