- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 22 Mar 2004 22:58:40 -0500
- To: Alberto Reggiori <alberto@asemantics.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
a bit of fairy tale wrapped around the federated query ideal...

Joe Lamda (SP? is it like the greek letter?) wonders what entertainers
had top 10 films and tunes.

Scenario 1
Joe Lamda is a geek:
  Joe Lamda knows that his data will come from IMDB and CDDB and
  writes a query appropriately:
QL requirements:
  local "unification" of different data sources, query targeting

  ask http://imdb.com/rdf (
      ?film :rating ?r {?r <= 10}.
      ?actron :staredIn ?film )
  ask http://cddb.com/rdf (
      ?tune :rating ?r {?r <= 10}.
      ?tune :performer ?actron )
  collect (?actron)

Apologies for presuming familiarity with the algae syntax. It had the
level of expressivity I needed.

Scenario 2
Joe Lamda's query agent is clever:
  Joe Lamda writes the "intuitive" query and the agent handles federation:
QL requirements:
  local "unification" of different data sources

  ask http://localhost/myAgent (
      ?film :rating ?r {?r <= 10}.
      ?actron :staredIn ?film.
      ?tune :rating ?r {?r <= 10}.
      ?tune :performer ?actron )
  collect (?actron)

Then the agent splits this into the two queries listed above, does the
unification, and dutifully reports the results back to Joe Lamda in a
nice tabular fashion.

On Wed, Mar 17, 2004 at 09:41:56PM +0100, Alberto Reggiori wrote:
>
> == Use Case Name
>
> Federated query
>
> == Intent: Task & Roles
>
> Actor/User Agent needs to seamlessly query/access/integrate related
> chunks/pieces of data coming from a set of decentralized heterogeneous
> sources, and get presented a unified view over the whole
> result-set/data-set.
>
> == Key Benefits / Value
>
> Most existing DBMS query systems are centralized or assume
> some kind of central authority/control (and constraints) over their
> whole database architecture. What is needed over the Web is a system
> that allows fully federated queries over a bunch of distributed and
> heterogeneous sources/services/tables.
> Each source must be fully
> decoupled from the others, and must be able to retain its own workflow,
> schema and control/authority over its data. Each source only needs to
> be interfaced to the data federation through some kind of "proxy
> service" which maps its native data format or query language
> to a common query/data format, and maps results back and forth as
> requested. In other words, with a single query statement, the user can
> access and join tables located across multiple data sources without
> needing to know the source location.
>
> == Description
>
> The Web itself is a good example of a federated system, providing dynamic,
> direct and easy access to several different and heterogeneous
> information sources; search engines, image galleries, online travel
> agencies, online newspapers, online shops (e.g. Amazon [1]) are
> examples. Everybody can easily contribute to the Web by simply writing
> a piece of HTML and then publishing it at a specific URI location. Links
> between similar pages can be easily set up without requiring any kind
> of centralized control and requiring few "integrity constraints" beyond
> naming "things" in a specific way; images can likewise be inlined
> inside pages by simply pointing to their location URI. Then a
> specific Web browser application will take care of aggregating and
> assembling the hypertext into a unified view over a bunch of physically
> decentralized pages and related images.
>
> Similarly, most of the dynamic data held in DBMS systems is
> available on the Web. Unfortunately, in the process most of the
> semantics of the original database fields/tables is lost, and most of
> the benefits of using a DBMS are lost too [2]. Generally only a
> limited set of search operations is made available to the end user
> apart from plain free-text search.
> Web services are trying to overcome this
> problem with a more general XML-based solution, by providing the user
> with ad-hoc designed APIs that go beyond simple HTML human interpretation.
> On the other hand, such technology has not yet proved general and flexible
> enough to solve most of the database federation problems, and this
> approach is suited, but limited, to closed/vertical application
> domains.
>
> In contrast, RDF provides a more general and powerful framework built
> on the Web for the Web - it is expected that people will start to
> annotate their pages/services with RDF descriptions, allowing a
> third-party application to transparently query/aggregate Web resources.
>
> Despite such a large set of solutions available to the user today, what
> is needed is a real federated query system which spans several virtual
> database tables/resources/services.
>
> The query system must provide a user-friendly syntax and a standard
> API/protocol to express query statements over one or more distributed
> data sources - data sources might be Web pages, XML documents, DBMS,
> ad-hoc Web Services or any RDF metadata source. Each source might
> interface to the query federation system in many different ways [3-12].
> The query processing engine then has to split the input query into
> several sub-queries, run each on its own system, apply the
> constraints, join the results and return them to the user. Each result
> will then have to retain its full provenance/source information to
> allow the user to pose further queries later. In the
> easiest and most general case the query system will simply provide a
> way to SELECT a certain number of fields/tables. Full DML functionality
> will be better tackled in the original sources using existing DBMS
> tools. If any of the sub-queries cannot be run or fails to join in the
> main query, an empty result set is returned to the user.
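
The split/join/provenance behaviour described above can be illustrated
with a small sketch. This is only one possible reading, not a proposed
design: the function name federated_query, the shape of a "sub-query" (a
Python callable returning variable bindings), and the sample data are all
assumptions made up for illustration. Only the two source URLs and the
variable names come from the scenarios in this message.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Binding = Dict[str, str]
# A result keeps its variable bindings plus the set of sources that
# contributed to it (the "provenance" requirement in the use case).
Result = Tuple[Binding, FrozenSet[str]]


def federated_query(subqueries: Dict[str, Callable[[], List[Binding]]]) -> List[Result]:
    """Run each sub-query and join the answers on shared variables.

    Hypothetical helper: keys are source names, values are callables that
    return a list of bindings. Per the use case, a failing sub-query or an
    empty join yields an empty result set.
    """
    if not subqueries:
        return []
    results: List[Result] = [({}, frozenset())]  # identity element for the join
    for source, run in subqueries.items():
        try:
            rows = run()
        except Exception:
            return []  # a sub-query could not be run
        joined: List[Result] = []
        for bindings, sources in results:
            for row in rows:
                shared = bindings.keys() & row.keys()
                # Keep the pair only if every shared variable agrees.
                if all(bindings[v] == row[v] for v in shared):
                    joined.append(({**bindings, **row}, sources | {source}))
        results = joined
    return results


# Made-up sample answers standing in for the two "ask" clauses.
def imdb() -> List[Binding]:
    return [{"actron": "ex:a1", "film": "ex:f1"},
            {"actron": "ex:a2", "film": "ex:f2"}]


def cddb() -> List[Binding]:
    return [{"actron": "ex:a1", "tune": "ex:t1"}]


answers = federated_query({"http://imdb.com/rdf": imdb,
                           "http://cddb.com/rdf": cddb})
```

With this sample data only ex:a1 appears in both sources, so it is the
only binding for ?actron that survives the join, and its result records
both source URLs as provenance.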
>
> == Other
>
> === Notes
>
> This use case presumes some extensive/systematic query optimization,
> caching and other important technical/technological aspects not
> considered here, as well as the need to globally and uniquely identify/name
> concepts/objects/relations and tables to make the model really fully
> federated (e.g. definition of a URI/URN scheme and resolution
> protocol). In relation to the DAWG work we are only/mostly interested
> in the data access/query syntax/protocol rather than the
> technical/architectural choices which a system designer/implementor
> would need to consider/stick to.
>
> === Applicability/Scale
>
> Real-time data, Legacy data/services, External services
>
> === Related systems/cases
>
> RDF Access to Relational Databases -
> http://www.w3.org/2003/01/21-RDF-RDB-access/
>
> == References
>
> [1] http://www.amazon.com
> [2] http://www.igd.fhg.de/archive/1995_www95/proceedings/papers/54/darm.html
> [3] http://rdfweb.org/2002/02/java/squish2sql/intro.html
> [4] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
> [5] http://kaon.semanticweb.org/alphaworld/reverse/view
> [6] http://www.w3.org/2000/10/swap/dbork/dbview.py
> [7] http://www.openlinksw.com/virtuoso/
> [8] http://www.picdiary.com/triplequerying/
> [9] http://iconocla.st/~sderle/squish.pl
> [10] http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/
> [11] http://www.w3.org/2002/02/21-WSDL-RDF-mapping/
> [12] http://www.w3.org/DesignIssues/RDB-RDF.html

--
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 22 March 2004 22:58:40 UTC