Towards an RDF Query standard: some proposed minimum requirements from NMP-MSW/Tampere on 2003-11-18 (www-rdf-interest@w3.org from November 2003)

From: NMP-MSW/Tampere <patrick.stickler@nokia.com>
Date: Tue, 18 Nov 2003 11:04:52 +0200
To: www-rdf-rules@w3.org
Message-Id: <BA4126DE-19AE-11D8-BAA5-000A95EAFCEA@nokia.com>
Reading through all the discussions arising from the proposed charter
for an RDF Query WG, and considering the various approaches that exist
to date to serve as input to such a process, I think it would be useful
to keep the following distinctions in mind when defining one or more
standardized RDF query protocols:

The first distinction is between discovery and submission, i.e.
between pull and push.

The second distinction is between a resource centric focus and a
broader, general, knowledge base focus.

Thus, given these two distinctions, I see the degree of need
(non-proportionally) ordered as follows:

                               Direction
                     -----------------------------
                     |      Pull    |    Push    |
        ------------------------------------------
        |  Resource  |       1      |     3      |
 Focus  |-----------------------------------------
        |     KB     |       2      |     4      |
        ------------------------------------------

I also see a huge gap in degree of need between pull
and push functionality, the latter being highly specialized
and nowhere near as acutely needed as the former.

Thus, if the first standardized RDF Query protocol(s) only
addressed pull, that would IMO address the lion's share of
the greatest immediate need, insofar as getting SW services
and agents widely deployed.

That is not to say that push is not very important. It is.
But when deciding on the reasonable scope/target for the WG,
if something has to go in order to have a timely first round,
then I suggest that push functionality is a reasonable
candidate for deferment.

Also, I think the industry would benefit more from several
short, focused standardization rounds than from one long,
comprehensive standardization round.

What I have in mind is a first round focusing on two pull protocols
(or two modes of behavior of the same pull protocol) -- one that is
resource centric and the other that is KB centric, but sharing a
common conceptual core -- aimed at providing a deployable standard
for the publication and access of knowledge within a single calendar
year. That would be followed up by a second round addressing push
functionality, which is a more involved task given all the
additional issues that come into play there (e.g. authentication,
organization/management of the KB, etc.).

Yet while the WG works on push functionality in a second
round, the SW can be happily purring along with the pull
functionality.

The resource centric protocol/mode will serve as a bootstrapping
mechanism for the knowledge base centric protocol/mode (as well
as any arbitrary web or semantic web service) such that for any
web authority, an agent can obtain a description of that server
based on its HTTP URI (e.g. http://example.org) from which it
can obtain information about all protocols, interfaces, services,
etc. -- including information relevant to general query services and,
when relevant, push functionality.
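
As a rough illustration of the bootstrapping step, the sketch below
derives the web authority URI (e.g. http://example.org) from an
arbitrary URI, which is the only thing an agent would need to know
before asking that server to describe itself. The function name
web_authority_uri is made up for this example, and how the description
is then actually requested is exactly what the to-be-defined standard
would specify, so that part is left out.

```python
from urllib.parse import urlsplit


def web_authority_uri(uri):
    """Return the scheme://authority part of a URI, e.g. http://example.org.

    Hypothetical helper for illustration only; it is not part of any
    existing or proposed standard.
    """
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        # URIs such as urn:... or mailto:... have no web authority component.
        raise ValueError(f"URI has no web authority component: {uri!r}")
    return f"{parts.scheme}://{parts.netloc}"


print(web_authority_uri("http://example.org/people/alice#me"))
```

URIs without an authority component (urn:, mailto:, etc.) are rejected
here, since the bootstrapping approach only applies to URIs that have
a web authority to ask.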

--

I recommend a first round focusing only on pull functionality.

I also recommend that the following minimal requirements be adopted
into the charter for such a first round:

1. For any arbitrary URI having a web authority component, for any
    arbitrary agent, the agent is able to obtain authoritative
    knowledge about the resource denoted by the URI from the web
    authority of the URI; and can do so solely based on the to-be-defined
    standard, and with no additional knowledge other than that URI.
    The knowledge returned should, by default, correspond to a concise
    bounded description of the resource, representing a fundamental
    feature for scalability and efficiency. I.e. something akin
    to http://sw.nokia.com/uriqa/URIQA.html#cbd.

2. For any arbitrary query service which conforms to the to-be-defined
    standard, for any arbitrary agent, the agent is able to submit
    a generalized query to be executed against the knowledge base
    exposed by that service and receive a subgraph of that knowledge
    base (possibly empty), corresponding to the knowledge matched by
    the query; and can do so solely based on the to-be-defined standard,
    and with no additional (mandatory) knowledge about the service,
    server, particular knowledge base, model, etc. specified either in
    the request or the query.

3. Existing web standards should be employed as much as possible;
    however, overloading of the semantics of existing web protocols
    should be avoided and the deployment and use of the to-be-defined
    standard should have zero impact on the present-day functioning of
    the web and should not introduce any confusion or ambiguity
    whatsoever regarding the interpretation of any existing web
    standards or proper web server or web client behavior.
    Semantic web standards should extend the web, not redefine it.

4. The form of expression for general queries should not impose any
    constraints on the scope of expression which are not imposed by RDF,
    such as failure to support arbitrary vocabularies or arbitrary
    datatypes used in the expression of typed literals, e.g. by limiting
    the query language to a select set of native datatypes. Particular
    implementations will only support a limited number of datatypes, and
    thus, some queries may not be resolvable by all conformant query
    services; however the query language itself should neither restrict
    nor discriminate against any particular datatype, but should be as
    datatype agnostic and vocabulary agnostic as RDF.
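
To make the "concise bounded description" of requirement 1 a little
more concrete, here is a minimal sketch over an in-memory set of
triples. It follows a simplified reading of the idea: take every
statement whose subject is the resource, and keep following blank node
objects, since those cannot be asked about separately. The URIQA
document linked above remains the authority on the actual definition;
this sketch ignores reifications, among other things. The tuple
encoding, the "_:" convention for blank nodes, and the ex: terms are
all assumptions made for the example.

```python
def concise_bounded_description(graph, resource):
    """Collect a simplified bounded description of `resource`.

    `graph` is a set of (subject, predicate, object) string triples.
    In this toy encoding any term starting with "_:" is a blank node.
    """
    description = set()
    pending = [resource]
    visited = set()
    while pending:
        node = pending.pop()
        if node in visited:
            continue
        visited.add(node)
        for subject, predicate, obj in graph:
            if subject == node:
                description.add((subject, predicate, obj))
                # Follow blank nodes only; named resources are left for
                # the agent to ask about in a separate request.
                if obj.startswith("_:"):
                    pending.append(obj)
    return description


# Placeholder data; the ex: terms are invented for illustration.
GRAPH = {
    ("http://example.org/doc", "dc:title", "A report"),
    ("http://example.org/doc", "dc:creator", "_:author"),
    ("_:author", "ex:name", "Pat"),
    ("_:author", "ex:worksFor", "http://example.org/org"),
    ("http://example.org/org", "dc:title", "Example Org"),
}

for triple in sorted(concise_bounded_description(GRAPH, "http://example.org/doc")):
    print(triple)
```

The description stops at http://example.org/org because that resource
has its own URI and can be asked about directly, which is what keeps
the response bounded.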

In addition to the above "absolute" requirements, I would also
recommend the following additional requirements/deliverables:

5. Input queries presented to the general query service should be
    expressed as RDF (not necessarily RDF/XML, see #8 below). This
    allows for standard RDF tools to be used in the description of,
    expression of, processing of, and reasoning about the queries
    themselves.

    (note that adoption of #5 gives you #4 "for free")

6. Output of query results should be expressed as RDF/XML.
    Note that variable bindings can also be returned expressed as RDF
    (see #7 below), which allows for both matched knowledge *and*
    bindings to be accommodated in query results in a consistent,
    standardized manner.

7. A standardized vocabulary for describing query services, for
    expressing queries and for expressing query results should be
    defined, along with a formal semantics. E.g. something like the
    union of http://sw.nokia.com/rdfq/RDFQ.html and
    http://www.w3.org/2003/03/rdfqr-tests/recording-query-results.html,
    with additional terms relating to the query service description,
    etc.

8. A subset of N3 should be defined/standardized/blessed as a
    keyboard-friendly RDF serialization, for use with query entry
    interfaces. Queries expressed in RDF using this subset of N3 can be
    as compact and readable as queries expressed in an SQL-like syntax.
    Cf. http://sw.nokia.com/rdfq/RDFQ.html#examples. This would provide
    for a pair of standardized RDF serializations: RDF/XML and
    "Compact RDF".

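To illustrate what requirements 2, 4 and 6 ask of a general query
service, the following sketch matches a few triple patterns against an
in-memory graph and returns both the matched subgraph (possibly empty)
and the variable bindings. It is only a toy: the tuple encoding, the
"?" convention for variables, and the ex: terms are assumptions made
for the example, and the patterns are plain Python tuples, not the
RDF-encoded queries of #5 or any particular query vocabulary. Typed
literals are (lexical form, datatype) pairs compared as opaque terms,
so the matcher needs no knowledge of any datatype.

```python
def unify(pattern_term, term, bindings):
    """Match one pattern term against one graph term, updating bindings."""
    if isinstance(pattern_term, str) and pattern_term.startswith("?"):
        if pattern_term in bindings:
            return bindings[pattern_term] == term
        bindings[pattern_term] = term
        return True
    # Constants, including typed literals, are compared as opaque terms.
    return pattern_term == term


def run_query(graph, patterns):
    """Return (matched subgraph, list of variable bindings)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in graph:
                candidate = dict(bindings)
                if all(unify(p, t, candidate) for p, t in zip(pattern, triple)):
                    extended.append(candidate)
        solutions = extended
    # The matched knowledge is every pattern with its variables filled in.
    subgraph = {
        tuple(b.get(term, term) for term in pattern)
        for b in solutions
        for pattern in patterns
    }
    return subgraph, solutions


# Placeholder data; ex:kilograms stands in for an arbitrary datatype.
GRAPH = {
    ("ex:report", "ex:issued", ("2003-11-18", "xsd:date")),
    ("ex:report", "ex:weight", ("1.2", "ex:kilograms")),
    ("ex:memo", "ex:issued", ("2003-11-17", "xsd:date")),
}

subgraph, solutions = run_query(GRAPH, [
    ("?doc", "ex:weight", ("1.2", "ex:kilograms")),
    ("?doc", "ex:issued", "?date"),
])
print(solutions)
print(sorted(subgraph))
```

Because literals are compared term by term, a service built this way
can answer queries over datatypes it does not understand, although it
will not recognize that two different lexical forms denote the same
value.
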
Anyway, those would be my recommendations.

Regards,

Patrick


--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Tuesday, 18 November 2003 05:07:39 UTC