Re: Some (v. late) Comments on the RDF Data Access Charter

Sorry folks, I gave the wrong URI for XQueryFA [2].

[2] http://www.w3.org/2001/11/13-RDF-Query-Rules/#XQueryFA

On Wed, Jan 07, 2004 at 12:33:59PM -0500, Eric Prud'hommeaux wrote:
> I responded to many of Brian's more editorial comments. The latter
> ones involve more specification of deliverables and working group
> logistics so I need to figure out how flexible this charter is before
> addressing them.
> 
> On Mon, Jan 05, 2004 at 04:33:46PM -0000, Brian McBride wrote:
> > I have finally found some time to take a look over the RDF data access
> > charter [1].  I realise these comments are very late, but am sending them in
> > case they might still be useful.
> > 
> > 1.  The charter states that the principal task of the WG is to define a
> > protocol for subgraph selection.  One of the justifications for this WG is
> > that there are a number of RDF query languages about and now is a good time to
> > standardize.  Whilst my knowledge is somewhat limited, my understanding is
> > that many of the existing query languages focus on variable binding result
> > sets rather than query subgraph extraction, which makes me wonder why this
> > restriction is placed in the charter.
> > 
> > I appreciate there is a tricky balance between specifying a job the WG can
> > do in a reasonable time and giving the WG design freedom, but it strikes me
> > that perhaps the form of results required is something the WG might decide
> > based on its requirements.
> 
> My hope:
> Subgraph selection will not be seen as subgraph reporting. The
> existing languages first select a subgraph. There is then a split
> between those that report the subgraph (a minority) and those that
> report rows of bindings from the subgraph. The WG will be able to
> decide what to report. I expect it will be able to report both.
> The ResultsSet work treads down this path a bit.
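
To make the selection/reporting split concrete, here is a minimal sketch in
Python. Everything in it is an assumption for illustration only: the `ex:`
names, the toy graph and the `match` helper are made up and come from neither
the charter nor any existing query language. It matches a single triple
pattern and then reports the same selection two ways, as a subgraph and as
rows of bindings.

```python
# Hypothetical illustration: one selection step, two ways of reporting it.
def match(graph, pattern):
    """Match one triple pattern; terms starting with '?' are variables."""
    subgraph, rows = set(), []
    for triple in sorted(graph):  # sorted only to make the row order stable
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A repeated variable must bind to the same term each time.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            subgraph.add(triple)   # reporting option 1: the statements
            rows.append(binding)   # reporting option 2: rows of bindings
    return subgraph, rows


graph = {
    ("ex:brian", "ex:knows", "ex:eric"),
    ("ex:eric", "ex:knows", "ex:brian"),
    ("ex:eric", "ex:worksFor", "ex:w3c"),
}
subgraph, rows = match(graph, ("?who", "ex:knows", "?whom"))
```

Both results come out of the same matching pass, which is the sense in which
selection need not dictate the form of the report.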
> 
> > 2.  Whilst section states that the principal task of the WG is to define a
> > protocol, much (all?) of the emphasis of the rest of the document is on a
> > query language.  As written the charter might be interpreted as implying
> > that the protocol and query language are tightly bound together.  I suggest
> > rewording to encourage the working group to adopt a modular design approach,
> > separating the protocol, language design and result encoding.
> 
> My excuse:
> The protocol stuff came in later. I was hoping to draw a circle around
> query and do that first (à la SQL) and leave the protocol work for
> after the QL was well under way, perhaps with a smiley and successful
> WG rechartering for more work. It makes some sense to do the QL first
> (if not in parallel) as one can test queries from the command line and
> from simple ad-hoc GET protocols. However, many folks wanted the whole
> thing solved at once so we added protocol to the charter.
> 
> You are quite likely right that they could be more integrated. I
> wouldn't mind working on this, but need to consider whether the
> product would be enough improved that it would be worth the setback in
> review time. A finite set of diffs is easier to pass to reviewers.
> 
> Perhaps a paragraph in the scope section espousing the virtues of
> modularity will do. As a challenge to this, I see provenance issues
> poking into all three modules:
> 
>   query language: subgraph selection
>     - expressivity
>       - provenance identification ?
>       - individual statement identification ?
>     - syntax
> 
>   result encoding: reporting the subgraph
>     - sets of bindings &&|| statements
>     - provenance identification ?
>     - individual statement identification ?
> 
>   protocol: manipulate stores, convey queries and results
>     - provenance identification ?
>     - individual statement identification ?
> 
> > 3. 1.1
> > 
> > There should either also be a Protocol requirements document or a joint
> > document covering both language and protocol requirements.
> 
> 1.1 Query Language and Protocol Requirements Document
> 
> The working group will review existing applications and query language
> implementations. It will document the requirements for an RDF query
> language and data access protocol.
> 
> > 4. 1.3
> > 
> > The idea of an abstract syntax is an excellent one.  In an analogous manner,
> > I suggest similar benefits accrue from defining a protocol abstraction which
> > can then be bound to various underlying protocol layers, a practice adopted,
> > I believe, by the XML protocol WG.
> 
> I added "The working group will produce an abstract syntax for the
> query language and protocol." to 1.3.
> 
> >                                    Oh, that reminds me, section 1 should say
> > XML protocol, not SOAP, right?
> 
> "SOAP" is the name of XMLP WG (not just a vestigial association).
> Philippe requested 1.2 but that was awkward to word so I added a link
> to the 1.2 spec.
> 
> > 5.  1.5 Relationship to XQuery
> > 
> > The relationship to XQuery is important, however, I'm uncomfortable with the
> > way the charter addresses this.
> > 
> > I suggest that the WG should be required to "account for the relationship
> > between RDF data access and XML Query" and the WG should be encouraged
> > to re-use XML Query and other W3C technologies where appropriate.
> > 
> > I am concerned that requiring the WG to produce a translation for RDF data
> > access query language abstract syntax to XML Query may require the WG to
> > take on a task that is not essential to its principal task and may thus slow
> > down its completion of that task.
> 
> Earlier criticisms of this requirement were that the task was not
> necessarily possible. In response to this, I added TreeHugger [1]
> and XQueryFA [2] to the query survey. You, however, call into question
> whether it is essential.
> 
> W3C membership has invested on the order of 100 man years in the
> XQuery specification, and more than that on implementation. There is a
> rift between the RDF community and the XML community stemming from not
> only different data models, but a clash of attitudes.
> 
> Practically, we can give the XQuery user access to RDF data in the
> same manner that they have access to relational data via XQuery-SQL
> mappings. This opens up RDF to a much larger set of clients and
> reduces the "RDF or XML" question.
> 
> One could argue that the XQuery working group should do this work
> instead, but I think that the somewhat separatist RDF community will
> have to do the work because the DAWG starts after XQuery finishes (and
> the gesture would probably be welcome).
> 
> > 6. 1.6 Extensibility Mechanism
> > 
> > I suspect this is just a picky wordsmithing point.  To talk about an
> > extensibility mechanism is to suggest that the WG should seek a single
> > mechanism to support extensibility.  I think what the charter is trying to
> > say is that the WG is strongly encouraged to produce a design that a) allows
> > for extensibility and b) allows multiple extensions to be adopted
> > simultaneously.
> 
> Yes, the orthogonal extensions bit comes from SOAP, and before that,
> HTTP Extensions.
> 
> > What the current document does not say, though perhaps implies, is that the
> > WG is encouraged not to try to "boil the ocean", but to limit its work to
> > defining a 'small' design that can be extended to support additional
> > requirements.  The WG might be encouraged to classify its requirements into
> > 'core' and 'extension'.
> 
> This could be reflected in a conformance section where "conformant"
> implementations provide the "core" functionality and have a defined
> response when they receive requests for unsupported extensions.  But
> I'm not convinced that we can provide that guidance at this point.
> 
> A rule head is an example of something that may be an
> "extension". This would allow the graceful definition and deployment
> of a rules language.
> 
> > 7. 1.7 Defined expressivity
> > 
> > I'm not sure what this is trying to say.  It says the WG will have to make
> > tradeoffs.  Sure.  Is the charter trying to express some guidance to the WG
> > about how to make those tradeoffs?
> 
> Added "The Requirements Document and test cases will provide the
> working group with guidance in making these trade-offs." Reasonable?
> 
> > 8. 1.8 Derived Graphs
> > 
> > Again, I'm not sure why this is here.  Yes, some RDF graphs are infinite.
> > This is bound to come out in the requirements.  Is there something special
> > about this requirement that means it needs special mention in the charter?
> > If so, I didn't understand the significance from the current text.
> 
> There are three objectives here:
>   Don't tie the QL/protocol exclusively to either materialized graphs
>   or to inference engines.
>   Raise the specter of provenance.
>   Something about not going to hell with infinite data sets if you
>   can avoid it.
> 
> I don't know that this is such a practical distinction. SQL doesn't
> give you infinite relations, but it *easily* gives you data sets that
> would take years to transmit. I didn't put the word "infinite" in
> there, but will lobby to remove it as I don't think it helps with the
> important point that you may not be querying a simple list of
> statements.
> 
> > 9. 2.1 RDF Schema/Owl semantics
> > 
> > This states that the fact that a graph is virtual (i.e. derived from
> > inference) does not affect the protocol.  Is this intended to state that it
> > should not affect the query language either, i.e. you can't say things like
> > "find me all the ground facts such that ..."?
> 
> Yes. We're only querying stuff we can get to via a graph. Maybe you
> could reify your query and tag particular statements as being ground
> facts, but that would still mean that the protocol/QL is only
> communicating a graph query.
> 
> > 10. RDF Rules
> > 
> > "will be useful in the later ..."
> > 
> > This implies that the rules WG will start after/later than the Query WG.
> > Is that still the case?
> 
> Yes, though probably not after DAWG completes.
> 
> I see this ordering as conducive to good layering and working out some
> of the more agreeable areas in the DAWG before adding more complication
> in a SWRL (rules WG).
> 
> > "is not expected to develop such a language"
> > 
> > That is weak language.  "Development of a rules language is out of scope for
> > this WG".  Is there a clear boundary between rules and query?
> 
> While it is hoped that the product of the RDF Data Access Working
> Group will be useful in later development of a rules language,
> development of such a rules language is out of scope for this working
> group.
> 
> > "the groups should ..."
> > 
> > This implies the groups are operating simultaneously.
> 
> Plan A is to have the SWRL start in about 6 months.
> 
> > "expend minimal effort"
> > 
> > That feels weak.  Is there a charter requirement that query and rules should
> > work together or not?
> 
> Yes, but the chair is stuck saying how much (probably with some
> feedback from the Coordination Group). The idea is that some spend a
> couple days thinking about how they'd stick a rules language into the
> query language and if there are no great sacrifices, leave room for
> that in the language. There is no requirement to produce something
> that SWRL will use, merely make a small effort in that direction.
> 
> > Does the statement under 1.5 Relationship with XQuery also apply here?  The
> > WGs are strongly encouraged to support/use each other's stuff, but exactly
> > what that means in practice will only become clear as the work progresses.
> 
> Yeah, I'd be afraid to nail that down any further.
> 
> > 11 2.3 Cursors and Proofs.
> > 
> > I'm not sure what is meant by cursors here.  It could mean that the variable
> > binding functionality of say, Squish is out of scope, or it could be that
> > session state between client and server is not supported.  I've commented on
> > the former in section 1.
> 
> The session state is ruled out of scope. Perhaps the current wording,
> "requester/server interactions for result set cursors or proofs", could
> use some beefing up to make that clear. Suggestions?
> 
> >                           I'm not sure why the latter is ruled out of scope
> > by the charter, even though I'm sympathetic to it being so. I wonder whether
> > a general guideline of "keep it simple" and "don't boil the ocean" is
> > sufficient discouragement that the WG can be relied on to do the right thing
> > based on what it learns about requirements.
> 
> It is intended to "keep it simple". I also suspect that really
> formally defining cursors is not a simple job.
> 
> > 12 3. Deliverables and schedule
> > 
> > The deliverables are not clearly listed.
> > 
> > Phase 1 Deliverables:
> > 
> >   - protocol requirements, including what bindings will be specified in
> > phase 2.
> >   - Query language requirements
> >   - An account of the relationship of the Query language to XML Query
> >   - An account of the relationship of the Query language to RDF rules
> >   - revised schedule
> > 
> > Requirements may indicate specific non requirements and postponed
> > requirements.
> > 
> > Phase 2 Protocol Deliverables:
> > 
> >   - an abstract protocol specification
> > 
> > At least one of (depending on requirements):
> > 
> >   - a binding of the abstract protocol to HTTP
> >   - a binding of the abstract protocol to XML Protocol
> > 
> > Phase 2 Query language Deliverables:
> > 
> >   - an abstract syntax for the query language
> >   - at least one concrete syntax for the query language
> >   - semantics of the query language, i.e. a specification of the results
> > that should be produced for any query against any graph
> >   - test cases for the query language.  These are not expected to be a
> > complete conformance test suite, but are expected to illustrate key aspects
> > of the design, to be machine processable and to be a useful indication that
> > compatible implementations exist at the request for PR.
> >   - a list of postponed issues
> >   - a *small* introductory primer
> >   - a validator?
> > 
> > The WG is free to organise these deliverables into documents as it sees fit.
> > The WG should consider the use of mathematical methods for specifying the
> > semantics and should bear in mind both the benefits and the costs of the
> > approach it adopts.
> > 
> > 12 4. Relationship with other W3C activities
> > 
> > - I18N should be mentioned here.  The WG should be strongly encouraged to
> > establish an internal champion for internationalization issues who should
> > be/come an expert in internationalization issues, liaise on a regular basis
> > with the I18N folks and champion internationalization concerns within the
> > WG.
> > 
> > - QA might be mentioned here.  Similar to I18N, the WG might be encouraged
> > to establish someone to liaise with the QA folks and champion QA issues
> > within the WG.
> > 
> > 13. 5.1 Email communication
> > 
> > Does the charter need to specify the mailing list used?  Is www-rdf-rules
> > appropriate?  What will the rules working group use?  Ah, the same one -
> > that might be a good idea.  This is not the natural list to look at for
> > query discussion.  I think the RDFCore organization worked well from my
> > point of view.  I'd like to hear whether folks not on the WG felt we were/are
> > too isolated.
> > 
> > RDFCore generated a lot of traffic on occasions - I'm not sure folks on the
> > general lists really want all that stuff in their inbox.  Similarly, wider
> > discussion can get pretty fierce and undisciplined too, e.g. the tag list.
> > I'd suggest giving the WG(s) a public list of their own for technical
> > discussion.
> > 
> > I'd suggest:
> > 
> >   - www-rdf-da-wg a public list for WG admin/technical discussion
> >   - www-rdf-da-comment a public comments list for formal communication with
> > the WG
> > 
> > 14. 5.4 Face-to-Face Meetings
> > 
> > RDFCore didn't have enough f2f meetings.  I wonder if every three months
> > might be appropriate.
> > 
> > I'm wondering about the chair inviting observers to 'participate in decision
> > making'.  Is there a rationale for this?
> > 
> > That's it on a quick read, apart from the timescale.  The requirements will
> > be settled at a f2f in June, that's probably the end of June.  Can there be a
> > WD by the end of August, given holidays etc.?  Yes, if there is a good
> > enough starting point for whatever strawman is selected.  It's impossible to
> > accurately estimate the development time, but what we have here looks
> > impossibly tight to me.  The schedule from last call to REC also looks
> > incredible to me.  I don't think this can be done in a year and it's only
> > fair on folks who join to give them a reasonable estimate of the commitment
> > they are making.
> > 
> > Brian
> > 
> > [1] http://www.w3.org/2003/10/RDF-Query-Charter
> [2] http://www.w3.org/2001/11/13-RDF-Query-Rules/#XQueryFA
> 
> $Id: RDF-Query-Charter.html,v 1.78 2004/01/07 17:29:06 eric Exp $
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Thursday, 8 January 2004 06:47:29 UTC