Data Access Working Group Use Cases: WORKING DRAFT

Document Status
Introduction
Motivation
Problem Description
Use Cases
Candidate Technical Requirements
Related Technologies and Standards

1. Document Status

This document is an informal artifact of the W3C Data Access Working Group and, as such, has no formal status. The most recent version of this document is available at http://www.w3.org/2001/sw/DataAccess/UseCases. The current working version is marked:

$Id: UseCases.html,v 1.24 2004/04/16 21:44:17 kclark Exp $.

2. Introduction

DAWG members are using this document to structure and organize discussion about use cases related to RDF query language and data access standardization efforts.

3. Motivation

The Semantic Web effort is mature enough that the existing implementations of RDF data storage servers require a standardized query language and data access protocol in order to achieve widespread data interoperability. A standard RDF query language might coalesce the technology intended for querying RDF data in much the same way that SQL did for RDBMS data. A standard way to access remote RDF storage servers might accomplish for the Semantic Web and data interoperability much of what HTTP did for the Web itself.

4. Problem Description

4.1. Query

Because there are no formal standards in these areas, developers in industry and in open source projects have created many query languages for RDF data. These languages lack both a common syntax and a common semantics. In fact, the extant query languages cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. The existing languages also exhibit a range of extensibility features and built-in capabilities, including inferencing, distributed query, and domain-specific semantics.

4.2. Data Access

There are as many different methods of accessing remote RDF storage servers as there are distinct RDF storage server projects. Even where the basic access protocol is a standard—HTTP, SOAP, or XML-RPC—there isn't much ground upon which generic client support to access a wide variety of such servers might be developed.

5. Use Cases

Use cases determine and publicize the scope of the working group's technical work. Each use case describes a concrete application of the future DAWG recommendation, setting a user-oriented context in which the query language or protocol or both are used to solve a real problem. In this way the Working Group describes the principal benefits of DAWG, while at the same time creating a map of the problem space.

5.1. Personal Information Management

5.1.1. Finding an email address

Description

George wants to send an email message to John Smith. His personal address book, which includes John Smith's contact information, is stored in RDF using the FOAF vocabulary. George's email client queries his local address book service and, if there is only one match, sets the query result as the value of the "To:" field; otherwise it prompts George to choose the best match.

Benefits

Efficiency. First, it's more efficient for the programmer who develops George's email client to use a query language than to write custom code against a low-level RDF storage interface. Second, execution of the query may be more efficient because query language implementations are often able to achieve more aggressive optimizations.
Interoperability. Applications that use a standard query language and data access protocol can submit queries to a local address book service or to a remote RDF-aware directory service with no change other than pointing to a different directory service resource.
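The lookup in this use case can be sketched as follows. This is only an illustration, not a proposed DAWG design: the address book is modeled as a set of (subject, predicate, object) tuples, the sample entries and mailbox addresses are invented, and the helper names find_mailboxes and fill_to_field are hypothetical. Only the FOAF terms foaf:name and foaf:mbox come from the FOAF vocabulary.

```python
# Illustrative only: an address book modeled as (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"

# Invented sample data; the blank node labels and mailboxes are assumptions.
ADDRESS_BOOK = {
    ("_:p1", FOAF + "name", "John Smith"),
    ("_:p1", FOAF + "mbox", "mailto:john.smith@example.org"),
    ("_:p2", FOAF + "name", "Joan Smythe"),
    ("_:p2", FOAF + "mbox", "mailto:joan@example.org"),
}


def find_mailboxes(triples, name):
    """Return the sorted mailboxes of every person whose foaf:name equals name."""
    people = {s for (s, p, o) in triples if p == FOAF + "name" and o == name}
    return sorted(o for (s, p, o) in triples if s in people and p == FOAF + "mbox")


def fill_to_field(triples, name):
    """Return the single match, or None to signal that the user must choose."""
    matches = find_mailboxes(triples, name)
    return matches[0] if len(matches) == 1 else None
```

A client would call fill_to_field(ADDRESS_BOOK, "John Smith") and fall back to prompting George whenever the result is None.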

5.1.2. Regularly executing a query

Description

Marshall needs to update some personal financial information every day; he programs an off-the-shelf web agent program to execute a query every morning before he gets to work and every evening before he goes to bed. Marshall uses a wizard to formulate the query, which the web agent constructs as an HTTP URI. In order to fulfill Marshall's information gathering requirement, his web agent simply resolves the query URI.

Benefits

Preserving Investment. Expressing DAWG queries as HTTP URIs, and returning the results of the query as the representation retrieved by dereferencing the URI, preserves the existing investment in web infrastructure.
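One way to picture a query carried in an HTTP URI is the sketch below. The service address, the "query" parameter name, and the query text are all placeholders chosen for illustration; this document does not define how a DAWG query would actually be encoded.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_query_uri(service, query_text):
    """Encode query_text into a URI that a web agent can simply dereference."""
    # The parameter name "query" is an assumption made for this sketch.
    return service + "?" + urlencode({"query": query_text})


# Hypothetical service address and placeholder query text.
uri = build_query_uri("http://example.org/rdf-store", "balance of account 42")

# The receiving server can recover the original text from the URI.
recovered = parse_qs(urlsplit(uri).query)["query"][0]
```

Because the whole request is a single URI, Marshall's agent needs nothing beyond an ordinary HTTP GET, and intermediaries such as caches and proxies keep working unchanged.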

5.1.3. Monitoring news events

Description

Kate wants to be notified whenever there is a news item about her favorite television show. She is accustomed to visiting web sites every day to search for and read about news items that match her interests.

...

5.2. Web Publishing

5.2.1. Saying things about web resources

Description

Frannie and Zoe live in different countries and keep in daily contact via IRC. Zoe wrote an IRC bot that they use to keep track of things they say about web pages. Frannie wants to be able to republish some of the things she says in IRC on her weblog. So Zoe tells her about a server that accepts and agrees to host documents that describe what they say about web pages, and their IRC robot sends those documents periodically to the server.

Frannie programs her weblog software to query the server that hosts their annotation documents. The server returns all the assertions Frannie and Zoe have made about web pages that Frannie writes about in her weblog; Frannie's weblog software then publishes the things they've said as comments.

Benefits

Interoperability. Using RDF, the DAWG query language and data access protocol, Frannie and Zoe are able to build several different kinds of software systems by passing RDF documents as messages.
Content Reuse

5.2.2. Discovering what people say about news stories

Description

Abelard, an independent publisher of web publications, often needs to query an arbitrary list of RDF storage servers for assertions about a set of URIs he cares about; the URIs identify Abelard's web publications. The RDF storage servers are RSS feed aggregators. Abelard wants to use RDF to keep track of the things people say in weblogs about his publications.

Abelard's client software includes support for three different query languages. Abelard's client software connects to each RDF storage server and determines whether it supports one of the three query languages it knows about. Abelard's client software chooses, based on priorities set by Abelard, to send different queries to different servers.

Heloise, an aggregator of RSS feeds, publishes RDF (extracted from RSS feeds) on the Web using an RDF storage server. Heloise's server supports several RDF query languages.

Heloise's server publishes the list of query languages it supports in machine-readable form. It negotiates with clients in order to choose the most appropriate query language that they have in common.

Benefits

Automated Resource Annotation Discovery. Abelard can use software to automate the process of tracking the things people say on the Web about his publications.
Preserve existing human investment. Abelard can formulate queries in a variety of query languages, which preserves his existing investment. Abelard's software vendor and Heloise's software vendor can develop and sell extensible, relatively generic systems. In addition to client-server coordination about query languages, they can also negotiate other provisioning or service capabilities, including RDF serialization formats, query context support, query inferencing support, access control models, etc.
Frictionless Information Exchange. Abelard and Heloise are able to exchange third party data in an interoperable fashion without requiring out-of-band, human negotiation about capabilities.
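The negotiation between Abelard's client and Heloise's server might, in its simplest form, look like the sketch below. The function name choose_language and the language labels are hypothetical, and a real protocol would have to define how the server's list is published and retrieved; here it is just a Python set.

```python
def choose_language(client_priorities, server_supported):
    """Return the client's highest-priority language that the server supports.

    client_priorities is ordered from most to least preferred.
    Returns None when the two sides have no language in common.
    """
    for language in client_priorities:
        if language in server_supported:
            return language
    return None


# Hypothetical labels standing in for three client-side query languages.
abelard_priorities = ["lang-a", "lang-b", "lang-c"]
# Hypothetical list published by Heloise's server.
heloise_supported = {"lang-c", "lang-b", "lang-x"}

chosen = choose_language(abelard_priorities, heloise_supported)
```

Letting the client's priority order decide is only one possible policy; a server-side preference or a weighted scheme would fit the use case equally well.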

5.3. Financial Services

5.3.1. Tracking accounts and customers

Description

Bartleby manages an accounting firm that has aggressively adopted Web and Semantic Web technology. The firm stores information about its customers, other companies, and competing accounting firms in an RDF storage server; and it relates these entities via predicates acct:accountsFor and acct:hasCustomer. Bartleby wants to retrieve the names of all the firms which either maintain accounts for military suppliers or maintain accounts for the military itself.

Benefits

Real world queries. The ability to use disjunction or union is key for real world problems. Requiring users to join their own results guarantees that a DAWG-QL will be confined to programmatic interfaces as parts of large systems.
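To make the role of disjunction concrete, the sketch below evaluates Bartleby's question as the union of two patterns over an in-memory set of triples. The firm names, the ex: terms, and the use of rdf:type to mark military suppliers are assumptions made for illustration; only acct:accountsFor is taken from the description above.

```python
# Invented sample data; only the acct:accountsFor predicate comes from the use case.
TRIPLES = {
    ("ex:FirmA", "acct:accountsFor", "ex:SupplierX"),
    ("ex:FirmB", "acct:accountsFor", "ex:Military"),
    ("ex:FirmC", "acct:accountsFor", "ex:Bakery"),
    ("ex:SupplierX", "rdf:type", "ex:MilitarySupplier"),
}


def firms_for_military_or_suppliers(triples):
    """Union of two patterns: accounts for a supplier, or accounts for the military."""
    suppliers = {
        s for (s, p, o) in triples
        if p == "rdf:type" and o == "ex:MilitarySupplier"
    }
    branch_suppliers = {
        s for (s, p, o) in triples
        if p == "acct:accountsFor" and o in suppliers
    }
    branch_military = {
        s for (s, p, o) in triples
        if p == "acct:accountsFor" and o == "ex:Military"
    }
    # The disjunction is the set union of the two branches.
    return sorted(branch_suppliers | branch_military)
```

Without disjunction in the query language, Bartleby would have to run the two branches as separate queries and merge the results himself, which is the situation the benefit above warns against.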

5.4. Urban Planning

5.4.1. Exploring my neighborhood

Description

Jose learns that the U.S. Census Bureau provides some very interesting geographic data in its public domain Tiger database. Jose moves to a new home in the Thomas Circle neighborhood of Washington, DC. Jose wants to find out the latitude, longitude, name, and type of everything within 50 miles of his new home.

Rather than downloading all the Tiger database files, unzipping them, reading the docs, writing some software, and so on, Jose sends a DAWG-QL query to the Census Bureau's new RDF storage server and requests that the results be passed to an XSLT transformation service so that he can print the resulting XHTML.

Benefits

...

5.5. Intelligence

5.5.1. Finding unknown human persons

Description

Smiley works for a governmental intelligence agency. As part of his job as an analyst of raw human intelligence, he needs to be notified whenever the knowledge base contains information about people matching various properties: last known location, often visited web sites, and political associations.

Smiley uses his web browser to set up a regular query over several knowledge bases by filling out a web form. Whenever there are new matches for Smiley's query in the knowledge base, Smiley receives an email with URIs to resources about the new matches; and Smiley's personal RSS feed is also updated with the new matches, since he uses an RSS aggregator to gather news every day.

Since Smiley's query will operate over knowledge bases structured by several different ontologies, Karla, the staff programmer for Smiley's group, builds Smiley's query to look for rdfs:subPropertyOf foaf:Person (expecting to find properties like terror:RegisteredForeignAgent, terror:TerroristSuspect, and humint:UnidentifiedPerson). Smiley's staff programmer uses the DAWG-QL and the foaf:Person predicate, as well as several others, to formulate Smiley's query.

Benefits

Integration. Since the system that Smiley and Karla have access to sits in front of a constantly evolving, heterogeneous collection of knowledge bases, they don't want to have to update Smiley's query each time a new KB is available. They rely on DAWG-QL's support for rdfs:subPropertyOf to find knowledge rooted at foaf:Person, which government agencies have agreed to use as a common parent property to represent natural persons.
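The benefit described above depends on following rdfs:subPropertyOf links transitively, so that terms added by a new knowledge base are picked up without editing the query. A minimal sketch of that traversal follows; it uses the terms exactly as the use case names them, and the newkb:Alias entry is an invented example of a later addition.

```python
# Terms named in the use case, plus one invented term (newkb:Alias) from a later KB.
SCHEMA = {
    ("terror:RegisteredForeignAgent", "rdfs:subPropertyOf", "foaf:Person"),
    ("terror:TerroristSuspect", "rdfs:subPropertyOf", "foaf:Person"),
    ("humint:UnidentifiedPerson", "rdfs:subPropertyOf", "foaf:Person"),
    ("newkb:Alias", "rdfs:subPropertyOf", "humint:UnidentifiedPerson"),
}


def sub_terms(triples, root, relation="rdfs:subPropertyOf"):
    """Return every term reachable from root by following relation downwards."""
    found = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            # Skip terms already seen so that cycles cannot loop forever.
            if p == relation and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found
```

Here sub_terms(SCHEMA, "foaf:Person") also reaches newkb:Alias through humint:UnidentifiedPerson, which is the behavior Smiley and Karla rely on.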

5.6. Supply Chain Support

5.6.1. Finding information about motorcycle parts

Description

Endeavour, a Triumph Motorcycle dealer, maintains a database that describes spare and replacement parts, including their properties and relationships, needed to repair its motorcycles. Ev, a Triumph repair person, is working on a motorcycle and a diagnostic tool produces a report that identifies a faulty part.

Ev goes to a query interface to the vendor's parts database and asks "tell me about this part". In response, Ev receives a human-readable description of the part, which provides sufficient information to determine how to obtain the part and whether any other dependent parts must also be replaced at the same time.

Benefits

Manage complex class-property relations.

5.7. Software Development

5.7.1. Finding input and output documents for test cases

Description

Nada, a Semantic Web developer, has had a bug report from a valued user that indicates that a software tool is failing to handle correctly the N3 representation of some of the RDF core test cases. Nada wants to create a list of input and output documents for each of the approved test cases from the RDF core test suite. The list of tests resides in a single file.

Benefits

The value is the systematic processing of the RDF core manifest file, with a result of one line per input/output pair, so that a script can easily be written for the next stage: reading the input document, writing it, and checking it. Writing a query and feeding it to a query processor is much quicker than writing a custom program to do the same.

5.7.2. Describing software configurations

Description

Grace, an open source developer, is developing a new email client, and the system uses a lot of configuration settings and data, some of which may be relevant to any email client and some of which is specific to this particular one. Grace would like to record all configuration data as RDF. She understands the basic RDF data model and knows precisely the structure of the information she's interested in: local-username email:hasAccount account, account email:hasServer server, and server email:usesProtocol protocol. But she has no expertise in programming graph algorithms or manipulating RDF programmatically. She would like to retrieve those aspects of the configuration files in which she is interested, i.e. the server and its protocol for a particular account or username.

Benefits

Demonstrates the value of programmatic access to local RDF repositories.
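The three-step path Grace describes (username to account, account to server, server to protocol) is the kind of hand-written graph traversal a query language would spare her. The sketch below shows that traversal over an in-memory set of triples; the configuration values are invented, and only the three email: predicates come from the description.

```python
# Invented configuration data; the email: predicates are those named in the use case.
CONFIG = {
    ("grace", "email:hasAccount", "acct:work"),
    ("grace", "email:hasAccount", "acct:home"),
    ("acct:work", "email:hasServer", "mail.example.org"),
    ("acct:home", "email:hasServer", "imap.example.net"),
    ("mail.example.org", "email:usesProtocol", "IMAP"),
    ("imap.example.net", "email:usesProtocol", "POP3"),
}


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in triples."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def servers_and_protocols(triples, username):
    """Follow hasAccount, hasServer, usesProtocol and return (server, protocol) pairs."""
    pairs = set()
    for account in objects(triples, username, "email:hasAccount"):
        for server in objects(triples, account, "email:hasServer"):
            for protocol in objects(triples, server, "email:usesProtocol"):
                pairs.add((server, protocol))
    return sorted(pairs)
```

A query with a path of three edges would express the same request declaratively, which corresponds to candidate requirement CR-03.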

5.8. Transportation

5.8.1. Avoiding traffic jams

Description

Niel wants to drive, during heavy rush hour traffic in Atlanta, GA, from his home to his office. His new car has both Bluetooth and wireless internet access. His car makes three queries to public RDF storage servers on the Web: the first for an up-to-date description of Atlanta road conditions and construction projects; the second for an updated description of traffic jams; the third for an updated description of Atlanta roads suffering inclement weather.

Based on this information, Niel's car suggests a different route to work, cutting his commute time by 12%.

Benefits

...

5.8.2. Finding the cheapest flight from Boston to Chicago

Description

...

Benefits

5.9. Health Care

5.9.1. Ordering an x-ray

Description

Amy, an oncologist, enters an order for a chest x-ray. She works in a large, multi-campus hospital with multiple radiology departments. The hospital complex uses RDF to describe the properties of its departments and the relations between them. For example,

Campus A is a children's hospital
Radiology department B is part of hospital A
Radiology department C specializes in examinations of type D
Urgent requests should be handled on the same campus, given specialization constraints
Requesting department E has as its first collaboration choice department F

Amy doesn't know or care about all of these relations or rules. She only wants to place an order and then learn where it will be executed.

Benefits

Issues

This use case shows the need to add constraints in the query. It also shows the need not only to query for information but also to resolve a problem: given constraints, imposing a sort order on some criteria. The difficulty is that the 'criteria' are not necessarily data elements in the RDF document but are implied by the rules within the document.

5.10. Human Resources

5.10.1. Find employees by group

Description

A company classifies employees into three groups: management, support, and engineering. Each employee is assigned to exactly one of these groups. There exists an RDF store which encodes information about employees. This information also includes the office in which the employee works. For example, #David hasGroup #Engineering and #David worksInOffice #Carlsbad.

The information is not complete; e.g. there may be employees whose group is not explicitly stated in the RDF store. The company also builds an OWL ontology to supplement their RDF data with semantic information. Among other things, this ontology contains the assertion that certain corporate locations contain no engineers, only management and support. A user wishes to query the RDF store to find all employees who are in either the management or support groups (and print out their names).

While inferencing and OWL may be beyond the scope of this working group, this use case demonstrates the continuity from RDF queries to OWL queries. The user's question can be answered fairly well by a simple RDF store (with no OWL), but precisely the same query (in terms of the user's desire for information) can retrieve even better information if OWL data is available.

Note that this use case is *not* subsumed by any other we are considering. The use of disjunction makes the OWL ontology incompatible with the naive "inferred triples" model.
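The difference between the two answers can be illustrated with a small sketch. It is not an OWL reasoner: the single ontology assertion is hard-coded as a set of offices known to contain no engineers, and the inference step relies on the stated rule that every employee belongs to exactly one of the three groups. The employees #Erin and #Femi and the office #Boston are invented; #David and #Carlsbad come from the example above.

```python
# #David and #Carlsbad come from the use case; the other entries are invented.
STORE = {
    ("#David", "hasGroup", "#Engineering"),
    ("#David", "worksInOffice", "#Carlsbad"),
    ("#Erin", "hasGroup", "#Support"),
    ("#Erin", "worksInOffice", "#Carlsbad"),
    ("#Femi", "worksInOffice", "#Boston"),  # group not stated in the store
}


def management_or_support(triples, offices_without_engineers=frozenset()):
    """Employees known to be in management or support.

    With no ontology information only explicit hasGroup triples count.
    An employee in an office that contains no engineers must be in one
    of the other two groups, even if the store does not say which.
    """
    explicit = {
        s for (s, p, o) in triples
        if p == "hasGroup" and o in {"#Management", "#Support"}
    }
    inferred = {
        s for (s, p, o) in triples
        if p == "worksInOffice" and o in offices_without_engineers
    }
    return sorted(explicit | inferred)


rdf_only = management_or_support(STORE)
with_ontology = management_or_support(STORE, frozenset({"#Boston"}))
```

The second call returns #Femi although no triple states which of the two groups she belongs to, so the answer cannot be produced by adding a single inferred hasGroup triple to the store.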

6. Candidate Technical Requirements

DAWG use cases are a pool from which to extract technical requirements. These requirements frame the technical scope of the working group's activities, including the delivery of a strawman query language at the end of the first phase of the working group. The working group will use the strawman query language as the starting point for design work in its second phase.

6.1. General Requirements

6.2. Query Language Requirements

Queries with optional triples (Related discussion: 1 and 2.) [CR-01]
Disjunction [CR-02]
Queries with paths of two or more edges [CR-03]
Query results as graph entailment or treating the graph as a fixed object. [CR-04]
Queries expressing arbitrary RDF data types. [CR-05].
Queries expressible as URLs [CR-06]
Query results in user-selectable Internet Media Types [CR-07].
Query results in RDF (i.e., closure) [CR-08]
Queries written in RDF [CR-09]
Negation as failure. (Queries for the non-existence of one or more triples in a graph.) [CR-10]
Queries expressible in a syntax that is easily read and written by people. [CR-12]
Queries should be executable against a local RDF storage service without network support (i.e., queries independent of any network protocol). [CR-13]
Queries returning aggregate reports. [CR-14]
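Two of the candidate requirements above, optional triples [CR-01] and negation as failure [CR-10], can be illustrated over an in-memory set of triples. The sketch below is only one reading of those requirements; the sample data and function names are hypothetical.

```python
# Invented sample data: Bo has a name but no mailbox.
PEOPLE = {
    ("_:a", "foaf:name", "Ada"),
    ("_:a", "foaf:mbox", "mailto:ada@example.org"),
    ("_:b", "foaf:name", "Bo"),
}


def names_with_optional_mbox(triples):
    """Optional triples: every name matches; the mailbox is None when absent."""
    rows = []
    for s, p, o in triples:
        if p == "foaf:name":
            mboxes = sorted(
                o2 for (s2, p2, o2) in triples if s2 == s and p2 == "foaf:mbox"
            )
            # Only the first mailbox is reported, to keep the sketch short.
            rows.append((o, mboxes[0] if mboxes else None))
    return sorted(rows, key=lambda row: row[0])


def names_without_mbox(triples):
    """Negation as failure: names for which no foaf:mbox triple is present."""
    return [name for name, mbox in names_with_optional_mbox(triples) if mbox is None]
```

Note that names_without_mbox reports only that the store holds no mailbox triple for Bo; it says nothing about whether Bo has a mailbox in the world, which is the usual caveat with negation as failure.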

6.3. Protocol Requirements

some kind of extensibility
bandwidth efficiency: http://lists.w3.org/Archives/Public/public-rdf-dawg/2004AprJun/0095.html
limit, ordering use case: http://lists.w3.org/Archives/Public/public-rdf-dawg/2004AprJun/0096.html

7. Related Technologies and Standards

See the survey of existing RDF query language implementations: RDF Query and Rules Framework.

RDF Core
RDF Query languages
- SQL-like
- Rule-like
- Path
SQL
XQuery
XPointer
SOAP/XMLP and REST

If you have questions about specific problems or issues in this document, contact Kendall Grant Clark.
