- From: Alberto Reggiori <alberto@asemantics.com>
- Date: Wed, 17 Mar 2004 21:41:56 +0100
- To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
== Use Case Name

Federated query

== Intent: Task & Roles

An actor/user agent needs to seamlessly query, access and integrate related pieces of data coming from a set of decentralized, heterogeneous sources, and be presented with a unified view over the whole result set.

== Key Benefits / Value

Most existing DBMS query systems are centralized, or presume some kind of central authority/control (and constraints) over their whole database architecture. What is needed over the Web is a system that allows fully federated queries over a set of distributed and heterogeneous sources/services/tables. Each source must be fully decoupled from the others, and must be able to retain its own workflow, schema and control/authority over its data. Each source only needs to be interfaced to the data federation through some kind of "proxy service" that maps its native data format or query language to a common query/data format, and maps results back and forth as requested. In other words, with a single query statement, the user can access and join tables located across multiple data sources without needing to know where each source is located.

== Description

The Web itself is a good example of a federated system, providing dynamic, direct and easy access to many different and heterogeneous information sources; search engines, image galleries, online travel agencies, online newspapers and online shops (e.g. Amazon [1]) are examples. Anybody can easily contribute to the Web by simply writing a piece of HTML and publishing it at a specific URI. Links between related pages can be set up easily without any kind of centralized control, and with few "integrity constraints" other than naming "things" in a specific way; images can likewise be inlined in pages by simply pointing to their URI.
A Web browser then takes care of aggregating and assembling the hypertext into a unified view over a set of physically decentralized pages and related images.

Similarly, much of the dynamic data held in DBMS systems is available on the Web. Unfortunately, in the process most of the semantics of the original database fields/tables is lost, and with it most of the benefits of using a DBMS [2]. Generally only a limited set of search operations is made available to the end user, apart from plain free-text search. Web services try to overcome this problem with a more general XML-based solution, giving the user an ad-hoc designed API that goes beyond HTML meant for human interpretation. On the other hand, this technology has not yet proved to be general and flexible enough to solve most database federation problems, and the approach is suited, but limited, to closed/vertical application domains. In contrast, RDF provides a more general and powerful framework built on the Web for the Web - it is expected that people will start to annotate their pages/services with RDF descriptions, allowing a third-party application to transparently query/aggregate Web resources.

Despite the large set of solutions available to the user today, what is needed is a real federated query system which spans several virtual database tables/resources/services. The query system must provide a user-friendly syntax and a standard API/protocol to express query statements over one or more distributed data sources - data sources might be Web pages, XML documents, DBMSs, ad-hoc Web Services or any RDF metadata source. Each source might interface to the query federation system in many different ways [3-12]. The query processing engine then has to split the input query into several sub-queries, run each on the appropriate system, apply the constraints, join the results and return them to the user.
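
To make the split/run/join steps more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposed design: the `SourceProxy` class, the `federated_select` function and the `_sources` provenance field are hypothetical names, the sources are in-memory lists standing in for real DBMS/Web service/RDF back ends, and the join is a plain equi-join on one shared key. It also tags each row with the sources it came from and returns an empty result when a sub-query cannot be answered, as this use case requires.

```python
from typing import Dict, List

Row = Dict[str, object]


class SourceProxy:
    """Hypothetical proxy exposing one data source through a common interface."""

    def __init__(self, name: str, rows: List[Row]):
        self.name = name
        self.rows = rows
        self.fields = set(rows[0].keys()) if rows else set()

    def select(self, fields: List[str]) -> List[Row]:
        # Sub-query: project the requested fields and tag each row
        # with the source it came from (provenance).
        return [
            {**{f: r[f] for f in fields}, "_sources": [self.name]}
            for r in self.rows
        ]


def federated_select(fields: List[str], join_key: str,
                     proxies: List[SourceProxy]) -> List[Row]:
    """Split a SELECT over several sources, run the parts, join on join_key."""
    by_name = {p.name: p for p in proxies}

    # Split: assign each requested field to the first source that offers it.
    plan: Dict[str, List[str]] = {}
    for field in fields:
        if field == join_key:
            continue
        owner = next((p for p in proxies if field in p.fields), None)
        if owner is None:
            return []  # no source can answer this part of the query
        plan.setdefault(owner.name, []).append(field)

    # Run each sub-query and join the partial results on the shared key.
    joined = None
    for name, sub_fields in plan.items():
        try:
            rows = by_name[name].select([join_key] + sub_fields)
        except Exception:
            return []  # a failing sub-query yields an empty result set
        if joined is None:
            joined = rows
            continue
        index: Dict[object, List[Row]] = {}
        for r in rows:
            index.setdefault(r[join_key], []).append(r)
        joined = [
            {**left, **right, "_sources": left["_sources"] + right["_sources"]}
            for left in joined
            for right in index.get(left[join_key], [])
        ]
    return joined or []


# Example: two independent sources joined by a shared "id" field.
people = SourceProxy("people_db", [{"id": 1, "name": "Ada"},
                                   {"id": 2, "name": "Bob"}])
orders = SourceProxy("orders_ws", [{"id": 1, "item": "book"}])
result = federated_select(["name", "item"], "id", [people, orders])
missing = federated_select(["name", "price"], "id", [people, orders])
```

In this sketch the caller never names a source: the engine decides which proxy answers which field. A real system would replace the in-memory lists with proxies that translate the sub-query into each source's native query language.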
Each result must retain its full provenance/source information, so that the user can pose further queries later on. In the simplest and most general case the query system will simply provide a way to SELECT a certain number of fields/tables. Full DML functionality is better tackled in the original sources using existing DBMS tools. If any of the sub-queries cannot be run or fails to join in the main query, an empty result set is returned to the user.

== Other

=== Notes

This use case presumes some extensive/systematic query optimization, caching and other important technical/technological aspects not considered here, as well as the need to globally and uniquely identify/name concepts/objects/relations and tables to make the model really fully federated (e.g. definition of a URI/URN scheme and resolution protocol). In relation to the DAWG work we are mostly interested in the data access/query syntax/protocol, rather than in the technical/architectural choices which a system designer/implementor would need to consider or adhere to.

=== Applicability/Scale

Real-time data, Legacy data/services, External services

=== Related systems/cases

RDF Access to Relational Databases - http://www.w3.org/2003/01/21-RDF-RDB-access/

== References

[1] http://www.amazon.com
[2] http://www.igd.fhg.de/archive/1995_www95/proceedings/papers/54/darm.html
[3] http://rdfweb.org/2002/02/java/squish2sql/intro.html
[4] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
[5] http://kaon.semanticweb.org/alphaworld/reverse/view
[6] http://www.w3.org/2000/10/swap/dbork/dbview.py
[7] http://www.openlinksw.com/virtuoso/
[8] http://www.picdiary.com/triplequerying/
[9] http://iconocla.st/~sderle/squish.pl
[10] http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/
[11] http://www.w3.org/2002/02/21-WSDL-RDF-mapping/
[12] http://www.w3.org/DesignIssues/RDB-RDF.html
Received on Wednesday, 17 March 2004 15:42:02 UTC