Review of RDF Data Access Use Cases and Requirements

NOTE: These suggestions are based on my current knowledge of these working groups. As a new member, I may be repeating prior knowledge or discussions on these matters. Furthermore, I'm not entirely sure which set of 'best practices' we should respond to decidedly, but below are my general thoughts on query languages, the DAWG use case document, and their specific requests to us (the 2 questions).

query language practices 
------------------------

Regarding query languages and data models: many, if not all, query languages operate in conjunction with an underlying data model. The data model provides the known types for results and for the intermediate state produced by (and within) the query language. XPath, XQuery, SQL, etc. all have well-defined data models.

For example, XQuery attempts to accommodate XML Schema data types, but it also provides types that are not consistent with XML Schema, and not all data values within an XQuery represent qualified XML fragments. This has caused some unease when using XQuery in SOME situations. Generally, this kind of data type mismatch should be avoided early in the design of the query language and its data model. It can arise, however, as needs change over time and query languages are modified beyond their original scope. Some degree of accommodation as new insights unfold is to be expected.

XQuery was designed to provide rich functional control over searching and processing XML data. It is designed for XML, that is, hierarchically structured data, and it is best suited for that purpose.

In contrast, SQL is less functional and more declarative in nature, and is designed with tabular data in mind. The point here is that the physical representation of a particular data set and its query language are often intentionally designed to agree. This agreement makes it easier for folks who design (or use) the data model to query it in a similar and compatible idiom. For example, XQuery and XPath use convenient 'path' notations to index into XML documents, but a path does not make much sense when applied to a table.
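
To make the contrast between the two idioms concrete, here is a minimal sketch in Python using only the standard library. The sample data and the names (library, book, title, year) are invented for illustration; ElementTree supports only a limited subset of XPath, which is enough to show the idea.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration.
doc = ET.fromstring(
    "<library>"
    "<book><title>RDF Primer</title><year>2004</year></book>"
    "<book><title>XQuery Basics</title><year>2003</year></book>"
    "</library>"
)

# Hierarchical data: a path expression walks the tree structure.
xml_titles = [t.text for t in doc.findall("./book/title")]

# Tabular data: a declarative query names columns and conditions instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("RDF Primer", 2004), ("XQuery Basics", 2003)],
)
sql_titles = [
    row[0]
    for row in conn.execute("SELECT title FROM book ORDER BY year DESC")
]
conn.close()

print(xml_titles)
print(sql_titles)
```

The path "./book/title" has no natural counterpart in the table, and the SELECT has no notion of nesting; each idiom follows the shape of its data.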

That said, XQuery is a valid consideration for querying RDF/XML because of its XML embodiment (whether it is acceptable in practice is another issue). However, there are multiple embodiments of
pure (or abstract) RDF.

Generally, I would not attempt to apply a query language well suited to a particular embodiment (e.g. XML) across all embodiments, since their actual physical representation MAY clash with the query language facilities (e.g. axes, paths, etc.).

DAWG use cases
----------------

Some general thoughts on the "RDF Data Access Use Cases and Requirements" document.

- The high-level use patterns seem generally helpful and diligent in the design and intent of the related technologies. The cross-reference to the requirements is important since the use-cases are very high-level.
- Two things are mentioned early: 1) a query language and 2) a data access protocol (DAP). The DAP seems to be necessary for clients to talk to servers. Generally, I would keep the two decoupled so that they can change independently as needed. There is not much discussion of the DAP, specifically whether it is a transport/wire protocol (like HTTP, IIOP, etc.) or some higher-level protocol. This distinction is key. I would strongly recommend against creating a new wire protocol like IIOP, which could meet a similar fate. HTTP is a best practice, in my opinion, and should be considered as a transport protocol. Ideally, the transport would be derived from the URI scheme (e.g. http://, tcp://, udp://).

- It appears the data model for the query language will be a subset of XML schema.
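
As a rough illustration of deriving the transport from the URI scheme, as suggested in the second point above, here is a small Python sketch. The TRANSPORTS table and the transport_for function are hypothetical names made up for this example; nothing like them is defined in the DAWG document.

```python
from urllib.parse import urlsplit

# Hypothetical registry mapping URI schemes to transport names.
TRANSPORTS = {"http": "HTTP", "tcp": "raw TCP", "udp": "raw UDP"}


def transport_for(endpoint: str) -> str:
    """Pick a transport based only on the scheme of the endpoint URI."""
    scheme = urlsplit(endpoint).scheme.lower()
    if scheme not in TRANSPORTS:
        raise ValueError(f"no transport registered for scheme {scheme!r}")
    return TRANSPORTS[scheme]


print(transport_for("http://example.org/query"))
print(transport_for("tcp://example.org:9000"))
```

With this arrangement the query language never mentions the transport, so either side can change without touching the other.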


DAWG question responses
-----------------------

1. XQuery as one of the candidates for a "Human friendly Syntax". ref: 4.1

The XQuery syntax is relatively easy to read and quite expressive. If drawing from this query language to form a new RDF-specific language, I would recommend shedding all traces of XML operations/semantics, including the various XML axes (e.g. child(@att) ), and replacing them
with graph-specific ones (e.g. trans(@uri,'http://namespace/name#predicate') ).

NOTE: The intent of this question seems to be to use the syntax of XQuery to create a new query language, NOT to adopt XQuery in its entirety.
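
To give a feel for what a graph-specific axis such as the trans(...) example in the answer to question 1 might do, here is a minimal Python sketch. It is only one possible reading: I assume the axis follows a named predicate from a subject node, either one step or transitively. The function names, the example.org URIs and the partOf predicate are all invented for illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def follow(triples: Iterable[Triple], subject: str, predicate: str) -> Set[str]:
    """Return every object one `predicate` edge away from `subject`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def follow_transitive(
    triples: Iterable[Triple], subject: str, predicate: str
) -> Set[str]:
    """Return every node reachable from `subject` over `predicate` edges."""
    triples = list(triples)
    seen: Set[str] = set()
    frontier = [subject]
    while frontier:
        node = frontier.pop()
        for nxt in follow(triples, node, predicate):
            if nxt not in seen:  # guards against cycles in the graph
                seen.add(nxt)
                frontier.append(nxt)
    return seen


# Hypothetical graph: a is part of b, b is part of c.
PART_OF = "http://example.org/ns#partOf"
GRAPH = [
    ("http://example.org/a", PART_OF, "http://example.org/b"),
    ("http://example.org/b", PART_OF, "http://example.org/c"),
]

print(follow(GRAPH, "http://example.org/a", PART_OF))
print(follow_transitive(GRAPH, "http://example.org/a", PART_OF))
```

Note that nothing here refers to elements, attributes, or document order; the only notions are nodes and labeled edges.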

2. It should be possible for knowledge encoded in other semantic languages - for example: RDFS, OWL, and SWRL - to affect the results of queries executed against RDF graphs. ref: 4.6

To achieve this, the earlier recommendation NOT to use a query language designed for a particular RDF embodiment (e.g. XML) becomes necessary. Rather, the resulting RDF query language SHOULD operate over the abstract RDF triple format and be optimized for it. This suggests that perhaps a standard RDF abstract format should be declared, such that conformance to the abstract format (e.g. triples) directly supports the query language, independent of the originating embodiment or serialization of RDF. In this way, a separation of concerns is achieved between 'pure', conformant RDF, which agrees with the query language in terms of physical representation and data model, and the various serialization or deserialization schemes to and from RDF, which may vary over time. To some extent, RDQL addresses this.

In fact, this practice is already in use. For example, one can deserialize RDF/XML into an abstract
RDF data (or object) model (and vice versa) and query over it using RDQL or a similarly derived query language.
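
The separation described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real RDF parser: the RDF/XML reader handles only rdf:Description elements whose properties carry rdf:resource, the second reader handles only simple '<s> <p> <o> .' lines with URI terms, and the match function is a hypothetical stand-in for a triple-pattern query, not RDQL itself. All example.org names are invented.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def triples_from_rdfxml(text):
    """Toy reader: rdf:Description elements with rdf:resource properties only."""
    triples = set()
    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells a qualified name as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource")
            triples.add((subject, namespace + local, obj))
    return triples


def triples_from_lines(text):
    """Toy reader: lines of the form '<s> <p> <o> .' with URI terms only."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        s, p, o = line.rstrip(" .").split()
        triples.add((s.strip("<>"), p.strip("<>"), o.strip("<>")))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


RDFXML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/doc1">
    <ex:author rdf:resource="http://example.org/people/alice"/>
  </rdf:Description>
</rdf:RDF>"""

LINES = (
    "<http://example.org/doc1> <http://example.org/ns#author> "
    "<http://example.org/people/alice> ."
)

from_xml = triples_from_rdfxml(RDFXML)
from_lines = triples_from_lines(LINES)

# Both serializations yield the same abstract triples, so one query serves both.
print(from_xml == from_lines)
print(match(from_xml, p="http://example.org/ns#author"))
```

The query code never sees angle brackets or XML elements; only the readers know about the serialization, so they can be swapped or added without changing the query side.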

These are my thoughts.

Darren Govoni
Senior Architect
McDonald Bradley
dgovoni@mcdonaldbradley.com
703.375.6087

Received on Sunday, 3 October 2004 16:17:27 UTC