Re: proposal to drop DESCRIBE from SPARQL from Bob MacGregor on 2005-01-17 (public-rdf-dawg-comments@w3.org from January 2005)

From: Bob MacGregor <bmacgregor@siderean.com>
Date: Mon, 17 Jan 2005 11:06:29 -0800
To: Dan Brickley <danbri@w3.org>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <41EC0CB5.2030001@siderean.com>
I would appreciate a high-level description/clarification of the
ramifications of dropping DESCRIBE.  The idea that each
system/implementation would decide what attributes to send back, and
what not, is clearly not a long-term winner.  However, the notion of
sending back graphs instead of "rows" is extremely important for some
applications.

We generate queries that have lots of OPTIONAL clauses that bind
to variables in the SELECT clause (the small example below was submitted
as a use case a while back).  The optional clauses, in combination with
multi-valued attributes, can generate very large Cartesian products.  We
routinely generate queries where 500 rows collapse into a single "tree"
(a portion of a graph rooted at a resource).  This means that much
space is wasted, transmission is therefore slow, and the routine that
does the "collapsing" is expensive.  Returning a graph containing the
set of "trees" would be much preferable.  We have implemented an extended
version of SPARQL that includes a facility for specifying the structures
we want back, and will be switching to that soon.
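
To make the blow-up concrete, here is a rough Python sketch of the
row-at-a-time result and the "collapsing" step.  It is an illustration
only, not our implementation: the function names expand_rows and
collapse_rows, the three-variable subset, and the attribute counts are
all made up for the example.

```python
from itertools import product


def expand_rows(book):
    """Simulate a row-at-a-time result for one book: one row per
    combination of its multi-valued attributes (None when unbound,
    as an OPTIONAL that fails to match would leave it)."""
    titles = book.get("title") or [None]
    coverages = book.get("coverage") or [None]
    creators = book.get("creator") or [None]
    return [
        {"Book": book["id"], "title": t, "coverage": c, "creator": a}
        for t, c, a in product(titles, coverages, creators)
    ]


def collapse_rows(rows):
    """Fold the flat rows back into one tree per ?Book, keeping each
    distinct value of each attribute once."""
    trees = {}
    for row in rows:
        tree = trees.setdefault(row["Book"], {})
        for var, value in row.items():
            if var == "Book" or value is None:
                continue
            values = tree.setdefault(var, [])
            if value not in values:
                values.append(value)
    return trees


# Hypothetical book with 5 titles, 10 coverages and 10 creators.
book = {
    "id": "book1",
    "title": [f"title{i}" for i in range(5)],
    "coverage": [f"coverage{i}" for i in range(10)],
    "creator": [f"creator{i}" for i in range(10)],
}

rows = expand_rows(book)      # 5 * 10 * 10 rows go over the wire
trees = collapse_rows(rows)   # one tree holding 5 + 10 + 10 values
```

With these invented counts, 25 attribute values are shipped as 500
rows, and the client has to scan every row to rebuild the single tree.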

Summarizing, the ability to retrieve tree-like structures containing a
large percentage of optional attributes is critical to some
applications.  There are no efficient ways of getting such results using
a SQL-like language that returns a row at a time.  When implemented
efficiently, this represents a step beyond the "competition" (relational
databases), and hence is a means for establishing a stronger argument
for using RDF technology in preference to existing tools.  Hence,
something like DESCRIBE needs to be part of SPARQL.

Cheers, Bob


SELECT ?Book, ?title, ?coverage, ?title2, ?isPartOf, ?description,
       ?date, ?creator, ?title3, ?nationality
WHERE (?Book, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,
              <http://www.siderean.com/bookdemo#Book>)
    AND (OPTIONAL (?Book, <http://purl.org/dc/elements/1.1/title>, ?title))
    AND (OPTIONAL ((?Book, <http://purl.org/dc/elements/1.1/coverage>, ?coverage)
                   AND (OPTIONAL (?coverage, <http://purl.org/dc/elements/1.1/title>, ?title2))
                   AND (OPTIONAL (?coverage, <http://purl.org/dc/terms/isPartOf>, ?isPartOf))))
    AND (OPTIONAL (?Book, <http://purl.org/dc/elements/1.1/description>, ?description))
    AND (OPTIONAL (?Book, <http://purl.org/dc/elements/1.1/date>, ?date))
    AND (OPTIONAL ((?Book, <http://purl.org/dc/elements/1.1/creator>, ?creator)
                   AND (OPTIONAL (?creator, <http://purl.org/dc/elements/1.1/title>, ?title3))
                   AND (OPTIONAL (?creator, <http://www.siderean.com/bookdemo#nationality>, ?nationality))))

Dan Brickley wrote:

>(this is a personal review comment, like my others; it shouldn't
>be mistaken for a SWBPD or SWIG request for changes to SPARQL).
>
>I hereby propose you drop the DESCRIBE construct from SPARQL, 
>and rework that part of the spec to show how queries can be 
>written which ask for RDF documents in terms of their 
>topics and other properties.
>
>Similar (perhaps in some ways better) functionality can be achieved by 
>simply asking SPARQL questions which give as their answers 
>references to RDF/XML documents.
>
>Refs: 
>published WD of 2004-10-12,
>http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/#describe
>
>live editor's copy at time of writing,
>http://www.w3.org/2001/sw/DataAccess/rq23/#describe
>
>The spec has been re-organized a bit, but the basic idea seems 
>the same as published in the October WD, which is to provide a 
>loose and flexible mechanism by which a server can return a 
>useful bundle of information about some entities identified 
>via query expressions. I should be clear here that I think this 
>is a valuable facility to have available in SPARQL; my only 
>objection is that it doesn't need a keyword in the language.
>
>Instead, I propose that queries which use the rdfs:seeAlso
>property could serve this purpose. If that property 
>doesn't quite meet your needs, please look into designs 
>which use similar properties.
>
>From http://www.w3.org/TR/rdf-schema/#ch_seealso
>
>	5.4.1 rdfs:seeAlso
>
>	rdfs:seeAlso is an instance of rdf:Property that is used to indicate a
>	resource that might provide additional information about the subject
>	resource.
>
>	A triple of the form:
>
>	    S rdfs:seeAlso O
>
>	states that the resource O may provide additional information about S.
>	It may be possible to retrieve representations of O from the Web, but
>	this is not required. When such representations may be retrieved, no
>	constraints are placed on the format of those representations.
>
>	The rdfs:domain of rdfs:seeAlso is rdfs:Resource. The rdfs:range of
>	rdfs:seeAlso is rdfs:Resource.
>
>There are also some notes in the ESW wiki, 
>http://esw.w3.org/topic/UsingSeeAlso which might be useful.
>
>Here's the first example from 10.3.2 editor's draft,
>
>	PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
>	DESCRIBE ?x
>	WHERE    (?x foaf:mbox <mailto:alice@org> )
>
>
>Here's the seeAlso'd form I propose:
>
>        PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
>	PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
>        SELECT ?doc
>        WHERE    (?x foaf:mbox <mailto:alice@org> )
>	WHERE    (?x rdfs:seeAlso ?doc)
>
>
>Some notes on pro/cons:
>
>1. the query is longer, but the query language is simpler
>
>2. implementations barely change; as with DESCRIBE particular datasets 
> and services may or may not have anything to offer in response
> to the query.
>
>3. the query-based design is incrementally extensible - different 
>types of description could be requested. We could constrain the 
>query by (?doc dc:format "text/rdf+n3") or by reference to the 
>rdf:type of the ?doc, eg. FOAF's "PersonalProfileDocument" or RSS1's 
>notion of a channel, or by characteristics of the source of the 
>describing statements (eg. we could ask for ?docs which have been 
>digitally signed by people who match some SPARQL expression).
>
>4. the main difference between the current keyword-based design and 
>the property-centric alternative I'm proposing seems to be that in 
>the former, we get the actual RDF; in the latter, we get a (potentially
>dangling, 404 etc) document reference. I'd expect a common setup to be 
>that these are simply references back into the same SPARQL service, 
>although similar queries could (depending on nature and whims of the
>dataset/service being asked) return rdfs:seeAlsos that point elsewhere 
>in the Web. The property-centric approach could then mean more
>round-trips to the server --- is this a problem? But it also allows 
>RDF to be used to describe the documents being referenced (eg. size in
>bytes, number of triples, etc) which could help clients be more 
>efficient and selective. 
>
>5. Can we SELECT from the results of a DESCRIBE in a single SPARQL 
>expression? I can see something like that making sense in terms of 
>partitioning work between a client and server, eg. a dumb server returns 
>generic book reviews; local client selects out prices and ratings. If this 
>sort of thing is important, perhaps it is evidence for the existing 
>keyword-based approach. But I'm not sure DESCRIBE works like that
>currently. There's also some relationship to N3's 'log:semantics'
>design, perhaps.  In N3, I believe I could query using rdfs:seeAlso
>expressions, and then dereference the RDF to populate a queriable 
>context. I expect that in SPARQL, such things will get done in 
>application code rather than in the QL.
>
>6. How to deal with multiple topics?
>
>Another example (from Editor's copy)
>
>[[	More than one URI or can be given:
>
>	PREFIX foaf:   <http://xmlns.com/foaf/0.1/>
>	DESCRIBE ?x ?y <http://example.org/>
>	WHERE    (?x foaf:knows ?y)
>]]
>
>This time we're asking for a chunk of RDF that describes 
>things that are in a foaf:knows relationship and a 
>document, <http://example.org/>. Triples that
>were "about" (in a loose sense, not just the rdf:subject sense)
>either ?x, or ?y, or the doc... would be relevant. Triples which
>were "about" the foaf:knows relationship connecting the two people
>might be even more relevant, etc etc. The query is pretty vague.
>
>Can this be reformulated using properties? I tried, with a 
>mess of optionals. I'm sure something could be cooked up 
>with seeAlso or other properties, or rdf:List or rdf:Alt. The 
>sense of the query is loose enough that I doubt it worth 
>trying to capture it more formally as a complicated bunch of 
>OPTIONALs. Doing so is somewhat at odds with the whole point of 
>DESCRIBE anyway.
>
>
>7. seeAlso is good for Semantic Web deployment
>
>Promoting seeAlso and RDF description of RDF *documents* is
>good for the Semantic Web, since it helps people find and 
>cross-reference pieces of RDF data. SPARQL's "DESCRIBE" 
>mechanism is purely internal to the language, currently. It 
>allows me to ask a service for RDF that describes some 
>particular thing, but all I get is the actual RDF. Encouraging 
>that RDF to be made available at GET-able URIs, and 
>described with rdfs:seeAlso and other RDF properties, is imho
>an important part of getting the Semantic Web deployed, 
>crawled, and indexed. It also directs some attention towards 
>the important problem of characterising broad classes of RDF 
>document, and the constraints (loose/prose or machine-readable, 
>RDF-level or XML-level, etc) associated with them.
>
>
>
>That's about it. To recap: please drop the DESCRIBE keyword and 
>replace the examples with ones that use rdfs:seeAlso. 
>
>cheers,
>
>Dan
>
>ps. I numbered my paragraphs to make my thoughts look ordered; don't  
>suppose I fooled anyone.
>
>pps. http://www.w3.org/1999/11/02-RDFServices/ was an attempt in 
>similar vein; now obsoleted by Annotea and SPARQL. I think there are 
>almost as many usecases for 'describe' functionality as there 
>are for Web pages...
>
>

-- 

Bob MacGregor
Chief Scientist

Siderean Software Inc
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245
bmacgregor@siderean.com
tel: +1-310 647-4266
fax: +1-310-647-3470
Received on Monday, 17 January 2005 19:07:11 UTC