- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Mon, 22 Mar 2004 22:58:40 -0500
- To: Alberto Reggiori <alberto@asemantics.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
a bit of fairy tale wrapped around the federated query ideal...
Joe Lamda (SP? is it like the Greek letter?) wonders which entertainers
had both top 10 movies and top 10 tunes.
Scenario 1 Joe Lamda is a geek:
Joe Lamda knows that his data will come from IMDB and CDDB and
writes a query appropriately:
QL requirements: local "unification" of different data sources, query targeting
ask http://imdb.com/rdf (
?film :rating ?r {?r <= 10}.
?actron :staredIn ?film )
ask http://cddb.com/rdf (
?tune :rating ?r {?r <= 10}.
?tune :performer ?actron )
collect (?actron)
Apologies for presuming familiarity with the algae syntax. It had the
level of expressivity I needed.
Scenario 2 Joe Lamda's query agent is clever:
Joe Lamda writes the "intuitive" query and the agent handles federation:
QL requirements: local "unification" of different data sources
ask http://localhost/myAgent (
?film :rating ?r {?r <= 10}.
?actron :staredIn ?film.
?tune :rating ?r {?r <= 10}.
?tune :performer ?actron )
collect (?actron)
Then the agent splits this into the two queries listed above, does the
unification, and dutifully reports the results back to Joe Lamda in a
nice tabular fashion.
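
A rough sketch of that unification step, in Python rather than algae. It
assumes each source has already answered its own sub-query with a list of
variable bindings, and that the join is on ?actron alone (so ?r is not
forced to match across sources). The rows, the ex: names and the helper
names project and unify_on are all made up for illustration; this is not
how any particular agent does it.

```python
# Hypothetical bindings; a real agent would fetch these from the two sources.
imdb_rows = [
    {"film": "ex:film1", "r": 3, "actron": "ex:alice"},
    {"film": "ex:film2", "r": 7, "actron": "ex:bob"},
]
cddb_rows = [
    {"tune": "ex:tune1", "r": 5, "actron": "ex:alice"},
    {"tune": "ex:tune2", "r": 2, "actron": "ex:carol"},
]


def project(rows, var):
    """Return the distinct values bound to `var`, in first-seen order."""
    seen = []
    for row in rows:
        if row[var] not in seen:
            seen.append(row[var])
    return seen


def unify_on(var, *result_sets):
    """Return the values of `var` that appear in every result set."""
    if not result_sets:
        return []
    common = project(result_sets[0], var)
    for rows in result_sets[1:]:
        other = set(project(rows, var))
        common = [value for value in common if value in other]
    return common


# The equivalent of "collect (?actron)" over both sources.
actrons = unify_on("actron", imdb_rows, cddb_rows)
print(actrons)
```

If either source comes back empty, the result is empty too, which is the
behaviour the use case below asks for when a sub-query fails to join.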
On Wed, Mar 17, 2004 at 09:41:56PM +0100, Alberto Reggiori wrote:
>
> == Use Case Name
>
> Federated query
>
> == Intent: Task & Roles
>
> Actor/User Agent needs to seamlessly query/access/integrate related
> chunks/pieces of data coming from a set of decentralized heterogeneous
> sources, and be presented with a unified view over the whole
> result-set/data-set.
>
> == Key Benefits / Value
>
> Most existing DBMS query systems are centralized or presume
> some kind of central authority/control (and constraints) over their
> whole database architecture. What is needed over the Web is a system
> that allows fully federated queries over a bunch of distributed and
> heterogeneous sources/services/tables. Each source must be fully
> decoupled from the others; and must be able to retain its own workflow,
> schema and control/authority over its data. Each source only needs to
> be interfaced to the data federation through some kind of "proxy
> service" which maps its native data format or query language
> to a common query/data format, and maps results back and forth as
> requested. In other words, with a single query statement, the user can
> access and join tables located across multiple data sources without
> needing to know the source location.
>
> == Description
>
> The Web itself is a good example of a federated system, providing
> dynamic, direct and easy access to several different and heterogeneous
> information sources; search engines, image galleries, online travel
> agencies, online newspapers, online shops (e.g. Amazon [1]) are
> examples. Everybody can easily contribute to the Web by simply writing
> a piece of HTML and then publishing it at a specific URI location. Links
> between similar pages can be easily set up without requiring any kind
> of centralized control, and with few "integrity constraints" beyond
> naming "things" in a specific way; images can likewise be inlined
> inside pages by simply pointing to their location URI. Then a
> specific Web browser application will take care of aggregating and
> assembling the hypertext in a unified view over a bunch of physically
> decentralized pages and related images.
>
> Similarly, most of the dynamic data held in DBMS systems is
> available on the Web. Unfortunately, while doing so most of the
> semantics of the original database fields/tables is lost and most of
> the DBMS usage benefits are lost too [2]. Generally only a
> limited set of search operations is made available to the end user
> apart from plain free-text search. Web services are trying to overcome this
> problem with a more general XML based solution, by providing the user
> an ad-hoc designed API to go beyond simple HTML human interpretation. On
> the other hand, such technology has not yet proven general and flexible
> enough to solve most of the database federation problems, and this
> approach is suited, but limited, to closed/vertical application
> domains.
>
> By contrast, RDF provides a more general and powerful framework built
> on the Web for the Web - it is expected that people will start to
> annotate their pages/services with RDF descriptions, allowing a
> third-party application to transparently query/aggregate Web resources.
>
> Despite such a large set of solutions available to the user today, what
> is needed is a real federated query system which spans several virtual
> database tables/resources/services.
>
> The query system must provide a user-friendly syntax and a standard
> API/protocol to express query statements over one or more distributed
> data sources - data sources might be Web pages, XML documents, DBMS,
> ad-hoc Web Services or any RDF metadata source. Each source might
> interface to the query federation system in many different ways [3-12].
> The query processing engine then has to split up the input query into
> several different sub-queries, to be run on each system, apply the
> constraints, join the results back and return them to the user. Each
> result will then have to retain its full provenance/source information
> to allow the user to pose further queries later on. In the
> easiest and most general case the query system will simply provide a
> way to SELECT a certain number of fields/tables. Full DML functionality
> will be better tackled in the original sources using existing DBMS
> tools. If any of the sub-queries cannot be run or fails to join in the
> main query, an empty result set is returned to the user.
>
> == Other
>
> === Notes
>
> This use case presumes some extensive/systematic query optimization,
> caching and other important technical/technological aspects not
> considered here, as well as the need to globally uniquely identify/name
> concepts/objects/relations and tables to make the model really fully
> federated (e.g. definition of a URI/URN scheme and resolution
> protocol). In relation to the DAWG work we are mostly interested
> in the data access/query syntax/protocol rather than the
> technical/architectural choices which a system designer/implementor
> would need to consider/stick to.
>
> === Applicability/Scale
>
> Real-time data, Legacy data/services, External services
>
> === Related systems/cases
>
> RDF Access to Relational Databases -
> http://www.w3.org/2003/01/21-RDF-RDB-access/
>
> == References
>
> [1] http://www.amazon.com
> [2] http://www.igd.fhg.de/archive/1995_www95/proceedings/papers/54/darm.html
> [3] http://rdfweb.org/2002/02/java/squish2sql/intro.html
> [4] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm
> [5] http://kaon.semanticweb.org/alphaworld/reverse/view
> [6] http://www.w3.org/2000/10/swap/dbork/dbview.py
> [7] http://www.openlinksw.com/virtuoso/
> [8] http://www.picdiary.com/triplequerying/
> [9] http://iconocla.st/~sderle/squish.pl
> [10] http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/
> [11] http://www.w3.org/2002/02/21-WSDL-RDF-mapping/
> [12] http://www.w3.org/DesignIssues/RDB-RDF.html
--
-eric
office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
Shonan Fujisawa Campus, Keio University,
5322 Endo, Fujisawa, Kanagawa 252-8520
JAPAN
+1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell: +1.857.222.5741 (does not work in Asia)
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Monday, 22 March 2004 22:58:40 UTC