Re: Some (v. late) Comments on the RDF Data Access Charter

I responded to many of Brian's more editorial comments. The later
ones involve more specification of deliverables and working group
logistics, so I need to figure out how flexible this charter is before
addressing them.

On Mon, Jan 05, 2004 at 04:33:46PM -0000, Brian McBride wrote:
> I have finally found some time to take a look over the RDF data access
> charter [1].  I realise these comments are very late, but am sending them in
> case they might still be useful.
> 
> 1.  The charter states that the principal task of the WG is to define a
> protocol for subgraph selection.  One of the justifications for this WG is
> that there are a number of RDF query languages about and now is a good time to
> standardize.  Whilst my knowledge is somewhat limited, my understanding is
> that many of the existing query languages focus on variable binding result
> sets rather than query subgraph extraction, which makes me wonder why this
> restriction is placed in the charter.
> 
> I appreciate there is a tricky balance between specifying a job the WG can
> do in a reasonable time and giving the WG design freedom, but it strikes me
> that perhaps the form of results required is something the WG might decide
> based on its requirements.

My hope:
Subgraph selection will not be seen as subgraph reporting. The
existing languages first select a subgraph. There is then a split
between those that report the subgraph (a minority) and those that
report rows of bindings from the subgraph. The WG will be able to
decide what to report. I expect it will be able to report both.
The ResultsSet work treads down this path a bit.
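To make the select/report split concrete, here is a minimal Python sketch. It assumes a graph held as a set of (subject, predicate, object) string triples, Squish-style "?"-prefixed variables, and illustrative "ex:"/"foaf:" names; all function names are hypothetical. One selection step matches the patterns, and the same solutions can then be reported either as rows of bindings or as the matched subgraph.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
Solution = Tuple[Binding, List[Triple]]

def is_var(term: str) -> bool:
    # Assumed convention: variables start with "?", as in Squish-style queries.
    return term.startswith("?")

def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different term
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended

def select(graph: Set[Triple], patterns: List[Triple]) -> List[Solution]:
    """Selection step: find every way the patterns match the graph.

    Each solution keeps both its bindings and the triples it used, so the
    caller can choose a reporting style without re-running the match.
    """
    solutions: List[Solution] = [({}, [])]
    for pattern in patterns:
        next_solutions: List[Solution] = []
        for binding, used in solutions:
            for triple in sorted(graph):
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append((extended, used + [triple]))
        solutions = next_solutions
    return solutions

def report_bindings(solutions: List[Solution]) -> List[Binding]:
    # Reporting style 1: rows of variable bindings.
    return [binding for binding, _ in solutions]

def report_subgraph(solutions: List[Solution]) -> Set[Triple]:
    # Reporting style 2: the selected subgraph itself.
    return {triple for _, used in solutions for triple in used}

graph = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
}
solutions = select(graph, [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")])
print(report_bindings(solutions))  # [{'?x': 'ex:alice', '?y': 'ex:bob', '?n': 'Bob'}]
print(report_subgraph(solutions))  # the two triples that matched
```

Keeping the used triples alongside each binding is what lets one selection serve both reporting styles.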

> 2.  Whilst section states that the principal task of the WG is to define a
> protocol, much (all?) of the emphasis of the rest of the document is on a
> query language.  As written the charter might be interpreted as implying
> that the protocol and query language are tightly bound together.  I suggest
> rewording to encourage the working group to adopt a modular design approach,
> separating the protocol, language design and result encoding.

My excuse:
The protocol stuff came in later. I was hoping to draw a circle around
query and do that first (à la SQL) and leave the protocol work for
after the QL was well under way, perhaps with a smiley and successful
WG recharting for more work. It makes some sense to do the QL first
(if not in parallel) as one can test queries from the command line and
from simple ad-hoc GET protocols. However, many folks wanted the whole
thing solved at once so we added protocol to the charter.

You are quite likely right that they could be more integrated. I
wouldn't mind working on this, but need to consider whether the
product would be enough improved that it would be worth the setback in
review time. A finite set of diffs is easier to pass to reviewers.

Perhaps a paragraph in the scope section espousing the virtues of
modularity will do. As a challenge to this, I see provenance issues
poking into all three modules:

  query language: subgraph selection
    - expressivity
      - provenance identification ?
      - individual statement identification ?
    - syntax

  result encoding: reporting the subgraph
    - sets of bindings &&|| statements
    - provenance identification ?
    - individual statement identification ?

  protocol: manipulate stores, convey queries and results
    - provenance identification ?
    - individual statement identification ?

> 3. 1.1
> 
> There should either also be a Protocol requirements document or a joint
> document covering both language and protocol requirements.

1.1 Query Language and Protocol Requirements Document

The working group will review existing applications and query language
implementations. It will document the requirements for an RDF query
language and data access protocol.

> 4. 1.3
> 
> The idea of an abstract syntax is an excellent one.  In an analogous manner,
> I suggest similar benefits accrue from defining a protocol abstraction which
> can then be bound to various underlying protocol layers, a practice adopted,
> I believe, by the XML protocol WG.

I added "The working group will produce an abstract syntax for the
query language and protocol." to 1.3.

>                                    Oh, that reminds me, section 1 should say
> XML protocol, not SOAP, right?

"SOAP" is the name of XMLP WG (not just a vestigial association).
Philippe requested 1.2, but that was awkward to word, so I added a link
to the 1.2 spec.
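As an illustration of the "abstract protocol plus bindings" idea, here is a minimal Python sketch. One abstract operation (which graph to ask, what to ask it) is bound once to an HTTP GET URL and once to a SOAP 1.2 envelope. The endpoint, the "graph"/"query" parameter names, the body element names and the "http://example.org/hypothetical-rdf-query" namespace are all assumptions; only the SOAP 1.2 envelope namespace is the real one.

```python
from dataclasses import dataclass
from urllib.parse import urlencode
from xml.sax.saxutils import escape

SOAP12_ENV = "http://www.w3.org/2003/05/soap-envelope"
QUERY_NS = "http://example.org/hypothetical-rdf-query"  # hypothetical namespace

@dataclass(frozen=True)
class QueryOperation:
    """Abstract protocol operation, independent of any transport."""
    graph: str
    query: str

def bind_http_get(op: QueryOperation, endpoint: str) -> str:
    # HTTP binding: abstract fields become query-string parameters.
    # The parameter names "graph" and "query" are assumptions.
    return endpoint + "?" + urlencode({"graph": op.graph, "query": op.query})

def bind_soap(op: QueryOperation) -> str:
    # SOAP 1.2 binding: the same fields carried in the envelope body.
    # Body element names are hypothetical; values are XML-escaped.
    return (
        f'<env:Envelope xmlns:env="{SOAP12_ENV}">'
        "<env:Body>"
        f'<q:query xmlns:q="{QUERY_NS}">'
        f"<q:graph>{escape(op.graph)}</q:graph>"
        f"<q:text>{escape(op.query)}</q:text>"
        "</q:query>"
        "</env:Body>"
        "</env:Envelope>"
    )

op = QueryOperation("http://example.org/g", "SELECT ?x")  # placeholder query text
print(bind_http_get(op, "http://example.org/rdfquery"))
print(bind_soap(op))
```

The query language and result encoding never see which binding carried the request, which is the modularity argued for in item 2.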

> 5.  1.5 Relationship to XQuery
> 
> The relationship to XQuery is important, however, I'm uncomfortable with the
> way the charter addresses this.
> 
> I suggest that the WG should be required to "account for the relationship
> between RDF data access and XML Query" and the WG should be encouraged to
> re-use XML Query and other W3C technologies where appropriate.
> 
> I am concerned that requiring the WG to produce a translation from the RDF data
> access query language abstract syntax to XML Query may require the WG to
> take on a task that is not essential to its principal task and may thus slow
> down its completion of that task.

Earlier criticism of this requirement was that the task might not
even be possible. In response to this, I added TreeHugger [1]
and XQueryFA [2] to the query survey. You, however, call into question
whether it is essential.

W3C membership has invested on the order of 100 man-years in the
XQuery specification, and more than that on implementation. There is a
rift between the RDF community and the XML community stemming not
only from different data models but from a clash of attitudes.

Practically, we can give the XQuery user access to RDF data in the
same manner that they have access to relational data via XQuery-SQL
mappings. This opens up RDF to a much larger set of clients and
takes some of the edge off the "RDF or XML" question.

One could argue that the XQuery working group should do this work
instead, but I think that the somewhat separatist RDF community will
have to do the work because the DAWG starts after XQuery finishes (and
the gesture would probably be welcome).

> 6. 1.6 Extensibility Mechanism
> 
> I suspect this is just a picky wordsmithing point.  To talk about an
> extensibility mechanism is to suggest that the WG should seek a single
> mechanism to support extensibility.  I think what the charter is trying to
> say is that the WG is strongly encouraged to produce a design that a) allows
> for extensibility and b) allows multiple extensions to be adopted
> simultaneously.

Yes, the orthogonal extensions bit comes from SOAP and, before that,
HTTP Extensions.

> What the current document does not say, though perhaps implies, is that the
> WG is encouraged not to try to "boil the ocean", but to limit its work to
> defining a 'small' design that can be extended to support additional
> requirements.  The WG might be encouraged to classify its requirements into
> 'core' and 'extension'.

This could be reflected in a conformance section where "conformant"
implementations provide the "core" functionality and have a defined
response when they receive requests for unsupported extensions. But
I'm not convinced that we can provide that guidance at this point.

A rule head is an example of something that may be an
"extension". This would allow the graceful definition and deployment
of a rules language.
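A minimal Python sketch of the "core plus defined response for unsupported extensions" idea, loosely in the spirit of SOAP's mustUnderstand. The "rule-head" extension name, the status codes and all class names are hypothetical; nothing here is defined by the charter.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List

@dataclass
class Request:
    query: str
    # Extensions the requester says the server must understand
    # (hypothetical identifiers such as "rule-head").
    required_extensions: List[str] = field(default_factory=list)

@dataclass
class Response:
    status: str  # "ok" or "unsupported-extension" (hypothetical codes)
    detail: str = ""

class Server:
    """A conformant server handles the core, plus whatever extensions it lists."""

    def __init__(self, extensions: FrozenSet[str] = frozenset()):
        self.extensions = extensions

    def handle(self, req: Request) -> Response:
        unsupported = [e for e in req.required_extensions
                       if e not in self.extensions]
        if unsupported:
            # The defined response: refuse and name what was not understood,
            # rather than silently answering only part of the query.
            return Response("unsupported-extension", ", ".join(unsupported))
        return Response("ok", f"evaluated: {req.query}")

core_only = Server()
with_rules = Server(frozenset({"rule-head"}))
print(core_only.handle(Request("q", ["rule-head"])).status)   # unsupported-extension
print(with_rules.handle(Request("q", ["rule-head"])).status)  # ok
```

Because each extension is checked independently, several can be declared on one request, which matches the "multiple extensions adopted simultaneously" point above.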

> 7. 1.7 Defined expressivity
> 
> I'm not sure what this is trying to say.  It says the WG will have to make
> tradeoffs.  Sure.  Is the charter trying to express some guidance to the WG
> about how to make those tradeoffs?

Added "The Requirements Document and test cases will provide the
working group with guidance in making these trade-offs." Reasonable?

> 8. 1.8 Derived Graphs
> 
> Again, I'm not sure why this is here.  Yes, some RDF graphs are infinite.
> This is bound to come out in the requirements.  Is there something special
> about this requirement that means it needs special mention in the charter?
> If so, I didn't understand the significance from the current text.

There are three objectives here:
  Don't tie the QL/protocol exclusively to either materialized graphs
  or to inference engines.
  Raise the specter of provenance.
  Something about not going to hell with infinite data sets if you
  can avoid it.

I don't know that this is such a practical distinction. SQL doesn't
give you infinite relations, but it *easily* gives you data sets that
would take years to transmit. I didn't put the word "infinite" in
there, but will lobby to remove it as I don't think it helps with the
important point that you may not be querying a simple list of
statements.

> 9. 2.1 RDF Schema/OWL semantics
> 
> This states that the fact that a graph is virtual (i.e. derived from
> inference) does not affect the protocol.  Is this intended to state that it
> should not affect the query language either, i.e. you can't say things like
> "find me all the ground facts such that ..."?

Yes. We're only querying stuff we can get to via a graph. Maybe you
could reify your query and tag particular statements as being ground
facts, but that would still mean that the protocol/QL is only
communicating a graph query.

> 10. RDF Rules
> 
> "will be useful in the later ..."
> 
> This implies that the rules WG will start after/later than the Query WG.
> Is that still the case?

Yes, though probably not after DAWG completes.

I see this ordering as conducive to good layering and to working out
some of the more agreeable areas in the DAWG before adding more
complication in the SWRL (rules WG).

> "is not expected to develop such a language"
> 
> That is weak language.  "Development of a rules language is out of scope for
> this WG".  Is there a clear boundary between rules and query?

While it is hoped that the product of the RDF Data Access Working
Group will be useful in later development of a rules language,
development of such a rules language is out of scope for this working
group.

> "the groups should ..."
> 
> This implies the groups are operating simultaneously.

Plan A is to have the SWRL start in about 6 months.

> "expend minimal effort"
> 
> That feels weak.  Is there a charter requirement that query and rules should
> work together or not?

Yes, but the chair is stuck deciding how much (probably with some
feedback from the Coordination Group). The idea is that someone spends a
couple of days thinking about how they'd stick a rules language into the
query language and, if there are no great sacrifices, leaves room for
that in the language. There is no requirement to produce something
that SWRL will use, merely to make a small effort in that direction.

> Does the statement under 1.5 Relationship with XQuery also apply here?  The
> WGs are strongly encouraged to support/use each other's stuff, but exactly
> what that means in practice will only become clear as the work progresses.

Yeah, I'd be afraid to nail that down any further.

> 11. 2.3 Cursors and Proofs.
> 
> I'm not sure what is meant by cursors here.  It could mean that the variable
> binding functionality of, say, Squish is out of scope, or it could be that
> session state between client and server is not supported.  I've commented on
> the former in section 1.

The session state is ruled out of scope. Perhaps the current wording,
"requester/server interactions for result set cursors or proofs", could
use some beefing up to make that clear. Suggestions?

>                           I'm not sure why the latter is ruled out of scope
> by the charter, even though I'm sympathetic to it being so. I wonder whether
> a general guideline of "keep it simple" and "don't boil the ocean" is
> sufficient discouragement that the WG can be relied on to do the right thing
> based on what it learns about requirements.

It is intended to "keep it simple". I also suspect that really
formally defining cursors is not a simple job.

> 12. 3. Deliverables and schedule
> 
> The deliverables are not clearly listed.
> 
> Phase 1 Deliverables:
> 
>   - protocol requirements, including what bindings will be specified in
> phase 2.
>   - Query language requirements
>   - An account of the relationship of the Query language to XML Query
>   - An account of the relationship of the Query language to RDF rules
>   - revised schedule
> 
> Requirements may indicate specific non-requirements and postponed
> requirements.
> 
> Phase 2 Protocol Deliverables:
> 
>   - an abstract protocol specification
> 
> At least one of (depending on requirements):
> 
>   - a binding of the abstract protocol to HTTP
>   - a binding of the abstract protocol to XML Protocol
> 
> Phase 2 Query language Deliverables:
> 
>   - an abstract syntax for the query language
>   - at least one concrete syntax for the query language
>   - semantics of the query language, i.e. a specification of the results
> that should be produced for any query against any graph
>   - test cases for the query language.  These are not expected to be a
> complete conformance test suite, but are expected to illustrate key aspects
> of the design, to be machine processable and to be a useful indication that
> compatible implementations exist at the request for PR.
>   - a list of postponed issues
>   - a *small* introductory primer
>   - a validator?
> 
> The WG is free to organise these deliverables into documents as it sees fit.
> The WG should consider the use of mathematical methods for specifying the
> semantics and should bear in mind both the benefits and the costs of the
> approach it adopts.
> 
> 12. 4. Relationship with other W3C activities
> 
> - I18N should be mentioned here.  The WG should be strongly encouraged to
> establish an internal champion for internationalization issues who should
> be/come an expert in internationalization issues, liaise on a regular basis
> with the I18N folks and champion internationalization concerns within the
> WG.
> 
> - QA might be mentioned here.  Similar to I18N, the WG might be encouraged
> to establish someone to liaise with the QA folks and champion QA issues
> within the WG.
> 
> 13. 5.1 Email communication
> 
> Does the charter need to specify the mailing list used?  Is www-rdf-rules
> appropriate?  What will the rules working group use?  Ah, the same one -
> that might be a good idea.  This is not the natural list to look at for
> query discussion.  I think the RDFCore organization worked well from my
> point of view.  I'd like to hear whether folks not on the WG felt we were/are
> too isolated.
> 
> RDFCore generated a lot of traffic on occasions - I'm not sure folks on the
> general lists really want all that stuff in their inbox.  Similarly, wider
> discussion can get pretty fierce and undisciplined too, e.g. the tag list.
> I'd suggest giving the WG(s) a public list of their own for technical
> discussion.
> 
> I'd suggest:
> 
>   - www-rdf-da-wg a public list for WG admin/technical discussion
>   - www-rdf-da-comment a public comments list for formal communication with
> the WG
> 
> 14. 5.4 Face-to-Face Meetings
> 
> RDFCore didn't have enough f2f meetings.  I wonder if every three months
> might be appropriate.
> 
> I'm wondering about the chair inviting observers to 'participate in decision
> making'.  Is there a rationale for this?
> 
> That's it on a quick read, apart from the timescale.  The requirements will
> be settled at a f2f in June, that's probably the end of June.  Can there be a
> WD by the end of August, given holidays etc.?  Yes, if there is a good
> enough starting point for whatever strawman is selected.  It's impossible to
> accurately estimate the development time, but what we have here looks
> impossibly tight to me.  The schedule from last call to REC also looks
> incredible to me.  I don't think this can be done in a year and it's only
> fair on folks who join to give them a reasonable estimate of the commitment
> they are making.
> 
> Brian
> 
> [1] http://www.w3.org/2003/10/RDF-Query-Charter
[2] http://www.w3.org/2004/01/07-DAWG-Comments.html

$Id: RDF-Query-Charter.html,v 1.78 2004/01/07 17:29:06 eric Exp $
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Wednesday, 7 January 2004 12:34:00 UTC