Re: SQL to XQuery translation

Hi all,

FWIW, I work with Michael Rys at Microsoft, where among other things I
implemented the component (designed by Michael) in SQL Server 2000 that
translates a subset of XPath to SQL.  I've also written several XQuery
prototypes in the past, including two that translate it into SQL.

Microsoft has made widely available an implementation that translates XQuery
to SQL (to attendees of last year's PDC and through an ongoing beta
program), and presumably will eventually release it.  Most of the other
major software vendors are doing similar work (they're not on the W3C
committee for nothing) and you should expect to see a lot of it become
available over the next couple years as XQuery finalizes.

For the same reasons as Jim and Michael, I'm unable to provide details of
our implementations.  Unfortunately, most (but not all!) of the interesting
work in this area is commercial and covered by a mix of trade secrets and
patents.  (Depending on your situation, you might be able to do a patent
search...)

That said, there are some public documents that you may find helpful:

 - There are many published research papers about this subject.  Some of the
best ones are by Mary Fernandez and Dan Suciu, such as the ones on RXL.
There are also some VLDB and other papers in recent years, of varying
quality.  Google and Citeseer are your friends.

 - I wrote a couple of chapters in the book Professional XML Databases that
address translating XPath into SQL. It's now out of print and the publisher,
Wrox, is defunct, but you can still find it in some stores and secondhand
markets.  I also presented a talk at XMLDevCon Fall, 2000 on this subject,
and the Q&A session was lively.  (Many other people from this list also
presented there on similar topics.)  I think the company that ran that
conference has also gone belly-up, but maybe you can find an archive of the
presentation somewhere.

- You can always buy the products and examine them.  For SQL Server, just
run SQL Profiler to see the SQL queries generated for XPath queries.  (Of
course, the software's covered by patents and licenses and such, and it's
four-year-old technology.)


As for Michael's comment about the difficulty of relational semantics over
XML data, I won't presume to put words in his mouth, but it's clear that
there are major differences between the SQL and XML type systems and data
models.  User expectations also vary widely: Both domains commonly use the
same syntactic operators, such as +, but with different behaviors.  When
translating from one to the other, do you choose to preserve strict
semantics or user expectations?  Or provide both (using pragmas, extension
functions for all the operators from the other domain, or the like)?

Even simple XPath predicates, like string(b) = 'abcdef' can't be directly
transliterated into SQL. [and notice there's a significant difference
between string(b)='abcdef' and just b='abcdef']  Jim's example translation
below omits several important details, like NULL-handling and collation.  If
you map SQL NULL to XML absence, then you'd have to account for the
difference during translation (SQL would return NULL for the cast, but XPath
would return the empty string).  In my experience, databases are commonly
installed with collations that are case-insensitive, drop trailing spaces,
and other oddities -- but XML is case-sensitive and has its own whitespace
normalization rules.

And that's really just the tip of the iceberg. :-)



-- 
Michael Brundage
xquery@comcast.net

Writing as
Author, XQuery: The XML Query Language (Addison-Wesley, 2004)
Co-author, Professional XML Databases (Wrox Press, 2000)

not as
Technical Lead
Common Query Runtime/XML Query Processing
WebData XML Team
Microsoft



On 1/15/04 12:53 PM, "Jim Melton" <jim.melton@acm.org> wrote:
> David and Murali,
> 
> You've opened the door for some interesting conversations!
> 
> At 13:13 2004-01-15 Thursday, Michael Rys wrote:
> 
>> Some reasons why there is not much done in this area:
>> 
>> 1. SQL is not powerful enough alone to query true XML data where mixed
>> content, order, changing structure really matters.
> 
> I agree with Michael (even though I would have said "traditional SQL" or
> used some other clarifying adjective).
> 
>> You would have to
>> extend SQL fundamentally to allow this. There is some work going on at
>> the relational database companies and the ANSI/ISO standards level.
> 
> This effort is called "SQL/XML" and its first efforts just reached fruition
> on 9 December, 2003 with the publication of SQL:2003, particularly part 14,
> SQL/XML (ISO/IEC 9075-14:2003, for those of you who like such formalities).
> 
>> However, the focus there is mainly to leverage XPath and XQuery in the
>> context of SQL expressions. So mapping SQL to XQuery would also mean
>> that you define  relational semantics over the XML data, which you
>> cannot easily do for general XML documents (see the discussions of
>> edge/node-tables to represent XML relationally for some of the
>> complexity).
> 
> Again, I agree with Michael, although I am somewhat less pessimistic since
> I work for a company that has actually delivered products that accomplish a
> lot of this --- using proprietary technologies, I emphasize, and not
> standardized facilities.  So, my summary here is that it can be done, but
> that there is nothing already standardized or currently in the
> standardization pipeline to do it.
> 
>> 2. In order to query relational data marked up in XML, most commercial
>> systems provide a mapping level into XML (annotated schemata in SQL
>> Server 2000, XSD-based OR mapping in Oracle 9i, DADs in IBM DB2) and
>> then just let you use SQL to query it. So no need to map it to XQuery
>> since the primary implementation architecture is SQL.
> 
> Absolutely correct.
> 
>> If it would have been easy, I guess we would not have invested into
>> XQuery in the first place but just defined a relational semantics on XML
>> and be done with it...
> 
> Wouldn't that have been nice ;^)
> 
> My way of expressing the overall problem is that SQL was designed to manage
> data that is, by definition, fully described by "structural metadata" (the
> SQL schemas, table definitions, etc.) and is completely regular (no
> "missing" data elements, even though data can be missing and represented by
> null values).  By contrast, XML is used to represent semi-structured data
> in which entire components can be unrepresented in any way, and there is no
> inherent requirement that any descriptive (structural) metadata be
> available to describe the XML.
> 
> As a consequence, unextended SQL is simply the wrong tool...it's the hammer
> that shouldn't be used to tighten the bolt.  Similarly, XQuery is excellent
> for querying arbitrary XML documents, but cannot really "get at" SQL data
> unless that SQL data is somehow "published" into an XML form...it's the
> crescent wrench that shouldn't be used to drive that nail.
> 
> My personal favorite solution is, of course, to combine the two tools in a
> way that allows me to query both types of data from whichever viewpoint I
> need for a given application.  And that is the ultimate goal of SQL/XML --
> allowing the use of SQL and XML together.
> 
> Now, as it happens, I do know of at least one XQuery implementation that
> translates XQuery expressions into SQL (well, technically, it's not into
> actual SQL language as character strings, but into a parsed representation
> of SQL).  The translation seems to be pretty good overall, but the job of
> building that translation software was challenging ;^)  I have no idea
> whether that particular XQuery implementation will ever be commercialized,
> much less industrial strength; it was apparently done (I don't have any
> inside information...sorry) as a proof of concept and not as a prototype of
> a product.
> 
> At 13:26 2004-01-15 Thursday, Murali Mani wrote:
> 
> 
>> Michael,
>> 
>> Some questions regarding Query Language translations propped up recently.
>> This was one that we could not answer..
>> 
>> When you do XQuery -> SQL, how do you translate predicates?
>> 
>> a predicate in XPath such as
>> 
>> path [predicate]/ ...
>> 
>> we need existential semantics over a potential result set.. does this get
>> translated to a subquery in SQL???
> 
> I don't have the cycles to discuss this in detail, as it could be a long
> discussion.  However, predicates in XQuery/XPath do have reasonable
> correspondences to predicates in SQL.  If your "[predicate]" is something
> simple, such as "[string(b)='abcdef']", then that would have a directly
> corresponding analog in SQL, like "WHERE CAST (b AS CHARACTER VARYING(100))
> = 'abcdef'".  More complex XPath predicates would have more complex
> "translations" to SQL, but there are not many such XPath predicates that
> don't have pretty reasonably corresponding SQL predicates.
> 
> 
>> best, murali.
>> 
>> -------------------------
>> 
>> Note: for SQL -> XQuery might be easier (if we consider simple queries)..
>> 
>> for example,
>> SELECT <Arr1>
>> FROM <Arr2>
>> WHERE <Arr3>
>> 
>> can be translated to (the way we teach SQL semantics in DB I ..)
>> 
>> for <Arr2'>
>> where <Arr3'>
>> return <Arr1'>
>> 
>> i think it is not difficult to translate these Arr2 -> Arr2' etc..?? Are
>> there things that I am overlooking??
> 
> In simple cases, it is certainly not difficult.  Nested for loops in XQuery
> correspond to joins (or, equivalently, creative use of subqueries) in
> SQL.  XQuery's where clause corresponds to SQL's WHERE clause.  Many
> correspondences are obvious (or nearly so...there are often some
> complications), while others might be quite elusive.  It's inappropriate to
> expect a single generalized answer to this.
> 
> Hope this helps,
>   Jim
> 
> ========================================================================
> Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
> Oracle Corporation            Oracle Email: mailto:jim.melton@oracle.com
> 1930 Viscounti Drive          Standards email: mailto:jim.melton@acm.org
> Sandy, UT 84093-1063              Personal email: mailto:jim@melton.name
> USA                                                Fax : +1.801.942.3345
> ========================================================================
> =  Facts are facts.  However, any opinions expressed are the opinions  =
> =  only of myself and may or may not reflect the opinions of anybody   =
> =  else with whom I may or may not have discussed the issues at hand.  =
> ========================================================================
> 

Received on Friday, 16 January 2004 01:20:40 UTC