Re: Use case: AR-2: "Federated query" from Eric Prud'hommeaux on 2004-03-23 (public-rdf-dawg@w3.org from January to March 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 22 Mar 2004 22:58:40 -0500
To: Alberto Reggiori <alberto@asemantics.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <20040323035840.GC26681@w3.org>
A bit of a fairy tale wrapped around the federated query ideal...

Joe Lamda (SP? is it like the Greek letter?) wonders which entertainers
had both top 10 movies and top 10 tunes.

Scenario 1: Joe Lamda is a geek:
  Joe Lamda knows that his data will come from IMDB and CDDB and
  writes his query accordingly:
QL requirements: local "unification" of different data sources, query targeting
  ask http://imdb.com/rdf (
    ?film :rating ?r {?r <= 10}.
    ?actron :staredIn ?film )
  ask http://cddb.com/rdf (
    ?tune :rating ?r {?r <= 10}.
    ?tune :performer ?actron )
  collect (?actron)
Apologies for presuming familiarity with the Algae syntax. It had the
level of expressivity I needed.
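
The local "unification" step in Scenario 1 can be sketched roughly as
follows, in Python rather than Algae. This is only an illustration under
assumptions: the result rows and identifiers such as "person:1" are made
up, each source is assumed to have applied its own rating constraint
already, and ?r is treated as local to each ask block, so only ?actron is
shared between the two result sets.

```python
def unify(left, right):
    """Join two lists of variable bindings on the variables they share."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            # Two rows are compatible when every shared variable agrees.
            if all(l[v] == r[v] for v in shared):
                joined.append({**l, **r})
    return joined


def collect(rows, var):
    """Project one variable, dropping duplicates but keeping order."""
    seen = []
    for row in rows:
        if row[var] not in seen:
            seen.append(row[var])
    return seen


# Hypothetical bindings returned by the two "ask" blocks.
imdb_rows = [
    {"?film": "film:A", "?actron": "person:1"},
    {"?film": "film:B", "?actron": "person:2"},
]
cddb_rows = [
    {"?tune": "tune:X", "?actron": "person:1"},
    {"?tune": "tune:Y", "?actron": "person:3"},
]

entertainers = collect(unify(imdb_rows, cddb_rows), "?actron")
print(entertainers)
```

With this made-up data only person:1 appears on both sides, so that is
the single ?actron collected.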

Scenario 2: Joe Lamda's query agent is clever:
  Joe Lamda writes the "intuitive" query and the agent handles federation:
QL requirements: local "unification" of different data sources
  ask http://localhost/myAgent (
    ?film :rating ?r {?r <= 10}.
    ?actron :staredIn ?film.
    ?tune :rating ?r {?r <= 10}.
    ?tune :performer ?actron )
  collect (?actron)

Then the agent splits this into the two queries listed above, does the
unification, and dutifully reports the results back to Joe Lamda in a
nice tabular fashion.
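
One way the agent might do the split, sketched in Python under loud
assumptions: the EXCLUSIVE table (which source alone serves which
predicate) is hypothetical, as is the rule that an ambiguous pattern such
as ":rating" follows the source that already mentions its subject. The
{?r <= 10} constraints are left out for brevity.

```python
# Hypothetical capability table: predicates served by exactly one source.
EXCLUSIVE = {
    ":staredIn": "http://imdb.com/rdf",
    ":performer": "http://cddb.com/rdf",
}


def split(patterns):
    """Assign each (subject, predicate, object) pattern to one source."""
    # First pass: patterns whose predicate pins them to a single source.
    anchored = {}
    for p in patterns:
        src = EXCLUSIVE.get(p[1])
        if src is not None:
            anchored.setdefault(src, []).append(p)

    # Second pass: build the plan in the original pattern order.
    plan = {src: [] for src in anchored}
    for s, pred, o in patterns:
        src = EXCLUSIVE.get(pred)
        if src is None:
            # Ambiguous predicate: follow the source that uses its subject.
            candidates = [c for c, ps in anchored.items()
                          if any(s in q for q in ps)]
            if len(candidates) != 1:
                raise ValueError("cannot route pattern: %r" % ((s, pred, o),))
            src = candidates[0]
        plan[src].append((s, pred, o))
    return plan


query = [
    ("?film", ":rating", "?r"),
    ("?actron", ":staredIn", "?film"),
    ("?tune", ":rating", "?r"),
    ("?tune", ":performer", "?actron"),
]

plan = split(query)
for source, patterns in plan.items():
    print("ask", source, patterns)
```

On the query above this yields the same two pattern groups as the
Scenario 1 ask blocks; the agent would then run each group at its source
and unify the results on the shared variables.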

On Wed, Mar 17, 2004 at 09:41:56PM +0100, Alberto Reggiori wrote:
> 
> == Use Case Name
> 
> Federated query
> 
> == Intent: Task & Roles
> 
> Actor/User Agent needs to seamlessly query/access/integrate related
> chunks/pieces of data coming from a set of decentralized heterogeneous
> sources, and be presented with a unified view over the whole
> result-set/data-set.
> 
> == Key Benefits / Value
> 
> Most existing DBMS query systems are centralized or presume some kind
> of central authority/control (and constraints) over their whole
> database architecture. What is needed over the Web is a system that
> allows fully federated queries over a set of distributed and
> heterogeneous sources/services/tables. Each source must be fully
> decoupled from the others, and must be able to retain its own workflow,
> schema and control/authority over its data. Each source only needs to
> be interfaced to the data federation through some kind of "proxy
> service" that maps its native data format or query language to a
> common query/data format, and maps results back as requested. In
> other words, with a single query statement, the user can access and
> join tables located across multiple data sources without needing to
> know the source location.
> 
> == Description
> 
> The Web itself is a good example of a federated system, providing
> dynamic, direct and easy access to many different and heterogeneous
> information sources; search engines, image galleries, online travel
> agencies, online newspapers and online shops (e.g. Amazon [1]) are
> examples. Anybody can contribute to the Web by simply writing a piece
> of HTML and publishing it at a specific URI. Links between related
> pages can be set up without any kind of centralized control, and with
> few "integrity constraints" other than naming "things" in a specific
> way; images can likewise be inlined in pages by simply pointing to
> their URI. A Web browser then takes care of aggregating and assembling
> the hypertext into a unified view over a set of physically
> decentralized pages and related images.
> 
> Similarly, most of the dynamic data held in DBMS systems is available
> on the Web. Unfortunately, in the process most of the semantics of
> the original database fields/tables is lost, and with it most of the
> benefits of using a DBMS [2]. Generally, apart from plain free-text
> search, only a limited set of search operations is made available to
> the end user. Web services try to overcome this problem with a more
> general XML-based solution, providing the user with an ad-hoc API
> that goes beyond simple human interpretation of HTML. On the other
> hand, this technology has not yet proved general and flexible enough
> to solve most database federation problems, and the approach is
> suited, but limited, to closed/vertical application domains.
> 
> By contrast, RDF provides a more general and powerful framework built
> on the Web for the Web - it is expected that people will start to
> annotate their pages/services with RDF descriptions, allowing a
> third-party application to transparently query/aggregate Web resources.
> 
> Despite the large set of solutions available to the user today, what
> is needed is a real federated query system that spans several virtual
> database tables/resources/services.
> 
> The query system must provide a user-friendly syntax and a standard
> API/protocol to express query statements over one or more distributed
> data sources - data sources might be Web pages, XML documents, DBMS,
> ad-hoc Web Services or any RDF metadata source. Each source might
> interface to the query federation system in many different ways [3-12].
> The query processing engine then has to split the input query into
> several sub-queries, one to be run on each system, apply the
> constraints, join the results and return them to the user. Each result
> must retain its full provenance/source information, so that the user
> can pose further queries later. In the simplest and most general case
> the query system will just provide a way to SELECT a certain number of
> fields/tables. Full DML functionality is better tackled in the
> original sources using existing DBMS tools. If any of the sub-queries
> cannot be run or fails to join in the main query, an empty result set
> is returned to the user.
> 
> == Other
> 
> === Notes
> 
> This use case presumes some extensive/systematic query optimization,
> caching and other important technical/technological aspects not
> considered here, as well as the need to globally and uniquely
> identify/name concepts/objects/relations and tables to make the model
> really fully federated (e.g. definition of a URI/URN scheme and
> resolution protocol). In relation to the DAWG work we are only/mostly
> interested in the data access/query syntax/protocol, rather than in the
> technical/architectural choices that a system designer/implementor
> would need to consider/stick to.
> 
> === Applicability/Scale
> 
> Real-time data, Legacy data/services, External services
> 
> === Related systems/cases
> 
> RDF Access to Relational Databases -  
> http://www.w3.org/2003/01/21-RDF-RDB-access/
> 
> == References
> 
> [1] http://www.amazon.com
> [2] http://www.igd.fhg.de/archive/1995_www95/proceedings/papers/54/darm.html
> [3] http://rdfweb.org/2002/02/java/squish2sql/intro.html
> [4] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
> [5] http://kaon.semanticweb.org/alphaworld/reverse/view
> [6] http://www.w3.org/2000/10/swap/dbork/dbview.py
> [7] http://www.openlinksw.com/virtuoso/
> [8] http://www.picdiary.com/triplequerying/
> [9] http://iconocla.st/~sderle/squish.pl
> [10] http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/
> [11] http://www.w3.org/2002/02/21-WSDL-RDF-mapping/
> [12] http://www.w3.org/DesignIssues/RDB-RDF.html

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 22 March 2004 22:58:40 UTC