Re: [www-ql] FLWR->Algebra->FLWR from Michael Kifer on 2001-03-02 (www-ql@w3.org from January to March 2001)

From: Michael Kifer <kifer@cs.sunysb.edu>
Date: Fri, 02 Mar 2001 14:13:41 -0500
To: www-ql@w3.org
Message-Id: <200103021913.OAA04541@sbcs.cs.sunysb.edu>
Ingo> I understand there currently can not be a literal roundtrip
Ingo> FLWR->Algebra->FLWR. But unless this becomes feasible, FLWR-XQuery must
Ingo> not be considered orthogonal enough. This may require some heuristics to
Ingo> disambiguate the rewrite to FLWR syntax.

Jonathan> Can you unpack this a little - what problems might arise if more
Jonathan>  than one XQuery expression can map onto the same XML Query
Jonathan> Algebra expression? 

Let's not loose perspective here. The query language and the algebra are
independent things, and in the classical architecture the purpose of the
the algebra is to implement the query language.

Let us look at the following classical textbook issues (I am deliberately
avoiding specific names like FLWR or XML Algebra):

1. The mapping Query lang -> Algebra

   a. Does it exist?

      If it doesn't, this means the algebra is not complete with respect to
      the query language. Might indicate that the algebra is inadequate or
      that the query language is "too powerful".

   b. Is it unique?

      As I explained in a previous message (and which is a textbook truth
      that reflects the past 30 years), such a mapping is not unique 
      AND SHOULD NOT BE.

      Even in the relatively simple (compared to XML) relational domain it
      is not unique. Think "query optimization".

      The only thing we can hope for is to come up with a simple and dumb
      translation that shows that the query languge is implementable in the
      given algebra.

   c. Is it one-to-one? (leaving out "onto" for now)
   
      Why does it matter?
      SQL->algebra is not 1-1 (even for the common dumb translation).
      What kind of disaster is going to happen if two equivalent queries
      that differ syntactically in some way are mapped to the same
      algebraic expression?
      In fact, practical query languages typically have all kinds of
      syntactic sugar, which is discarded in translation. Should these be
      prohibited?

2. The mapping Algebra -> Query language.

   a. Does it exist?

      This is, again, the issue of completeness. Does it matter?
      This is really a theoretical question about the class of
      abstract queries defined as (Algebra) minus (query lang).
      Do we need to capture these queries in a declarative language?
      Can they be captured without sacrificing the ability to optimize the
      query language?
      Maybe the language is inadequate or maybe the algebra is too powerful.
      Maybe the difference should be captured using something like the
      language of PSM (persistent stored modules) -- a procedural
      (non-algebraic) programming language that is tightly integrated with
      the query language.

      All these issues are worth considering, but the absense of such a
      mapping doesn't automatically disqualify the algebra or the query
      language.

   b. Is it unique?

      Why does it matter? It doesn't matter in the least!
      As long as the different mapping are semantically sound this
      shouldn't be an issue.

   c. Is it 1-1?

      Doesn't matter as long as f(a)=f(b) implies that "a" and "b" are
      equivalent algebraic expressions.

3. Are the mappings Query lang-> Algebra and Algebra -> Query lang
   inverse functions.

   Again, why does it matter?
   In view of 1(b) and 2(b), this is impossible unless you declare
   certain dumb mappings to be canonical. However, I think it should be
   clear by now that searching for canonical mappings that are inverses of
   each other is a total waste of time. 

   Even in the formal theory the mappings

       rel calculus->rel algebra and rel algebra->rel calculus

   that one typically finds in textbooks are not inverses of each other. I
   believe that nobody seriously entertained this thought in case of SQL. I
   think (3) is a moot question.



	--michael
Received on Friday, 2 March 2001 14:14:19 UTC