Re: output-xslt good enough for sort objective? from Yoshio Fukushige on 2005-04-15 (public-rdf-dawg@w3.org from April to June 2005)

From: Yoshio Fukushige <fukushige.yoshio@jp.panasonic.com>
Date: Fri, 15 Apr 2005 13:14:05 +0900
To: "Seaborne, Andy" <andy.seaborne@hp.com>, "Dan Connolly" <connolly@w3.org>
Cc: "RDF Data Access Working Group" <public-rdf-dawg@w3.org>
Message-ID: <000901c54171$900f9f60$e593b985@IRONGEAR>

Hi

Andy:
>> For the "give me the most recent 10 articles" use case, it's
>> perhaps somewhat inefficient in the server... the SPARQL
>> ending would spit out all the articles and the XSLT would sort them and
>> perhaps truncate them. But to the client (and the network)
>> it looks just the same.
>
> In the case of my service, the impact on the network is huge - all the 
> results are transferred, then sorted, then truncated.  It does not look 
> the same to the client.
> Some problems arise with placing the sorting as a transformation of the 
> XML results:
>
> 1/ SORT-LIMIT-CONSTRUCT makes sense for an RDF graph of top 10 most recent 
> articles.

I agree.

I don't think it's a good idea to first fetch all 27,500,000 results and
hand them to an XSLT server just to get the 20 most recently updated Web
pages that contain "W3C", for example.
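
To make the difference concrete, here is a minimal Python sketch. It is
not SPARQL or XSLT; the data set, the field names ("id", "updated") and
the three function names are all made up for illustration. It only shows
that both routes yield the same top 20, while the amount of data leaving
the server differs.

```python
import heapq

# Hypothetical data set: 1000 articles with distinct "updated" stamps.
ARTICLES = [{"id": i, "updated": (i * 37) % 1009} for i in range(1000)]

def query_all(articles):
    """Server returns every match; sorting is left to a later step."""
    return list(articles)

def sort_and_truncate(rows, n):
    """Post-processing step (the XSLT role): sort by recency, keep n rows."""
    return sorted(rows, key=lambda r: r["updated"], reverse=True)[:n]

def query_sorted_limited(articles, n):
    """Server sorts and limits before serializing; only n rows are sent."""
    return heapq.nlargest(n, articles, key=lambda r: r["updated"])

transferred_all = query_all(ARTICLES)              # 1000 rows on the wire
top_via_postprocess = sort_and_truncate(transferred_all, 20)
top_via_server = query_sorted_limited(ARTICLES, 20)  # 20 rows on the wire
```

In this sketch the two top-20 lists are identical, but the first route
moves every row before discarding all but 20 of them.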

> I suggest that the primary role of the protocol is to transport query 
> requests to a server and transfer the results back.  It is the role of the 
> query processor (and hence the QL) to generate and manipulate the results, 
> result sets included.  This is why I don't think DISTINCT, LIMIT and SORT 
> have a place in the protocol - they happen before serialization.

> So "OK" to having an XSLT parameter but "no" to saying that meets the 
> sorting requirement.

+1

And I further propose (again) the option of returning only the number of
matches.

Think about a case where querying a SPARQL server is charged
in proportion to the number of answers (matches).
Retrieving all the matches up front should be avoided.

Patent search is one such case. In the system I use,
a user is charged even for retrieving the bibliographic information
(applicant's name, title, domain, etc.) in proportion to the number of
matches.

If SPARQL supported a give-me-only-the-number-of-matches function,
a user could refine his/her query until it yields a moderate number of
matches before asking for the bibliographic information (and being charged).
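
The workflow I have in mind can be sketched roughly as below, again in
Python. Everything here is an assumption for illustration: the records,
the keyword matching, the functions count_matches and fetch_matches, and
the charge of one unit per returned record. It is not a proposal for
concrete syntax.

```python
# Hypothetical patent records held by the service.
RECORDS = [
    {"title": "Fuel cell stack cooling"},
    {"title": "Fuel cell membrane"},
    {"title": "Fuel injector nozzle"},
    {"title": "Solar cell coating"},
    {"title": "Fuel cell stack housing"},
]

def is_match(record, keywords):
    """A record matches when its title contains every keyword."""
    title = record["title"].lower()
    return all(k in title for k in keywords)

def count_matches(keywords):
    """Hypothetical count-only request: a number comes back, no records."""
    return sum(1 for r in RECORDS if is_match(r, keywords))

def fetch_matches(keywords):
    """Hypothetical full request: returns the records and the charge
    (assumed here to be one unit per record)."""
    rows = [r for r in RECORDS if is_match(r, keywords)]
    return rows, len(rows)

def refine_then_fetch(candidate_queries, max_matches):
    """Try queries in order; fetch only once the count is moderate."""
    for keywords in candidate_queries:
        if 0 < count_matches(keywords) <= max_matches:
            return fetch_matches(keywords)
    return [], 0

rows, charge = refine_then_fetch(
    [["cell"], ["fuel", "cell"], ["fuel", "cell", "stack"]], max_matches=2
)
```

Here the user pays for 2 records, whereas fetching the results of the
first, broadest query right away would have cost 4.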

It may be the eleventh hour, or even 11:50, but I think getting only the
number of matches is a definite need for such services.

Should I write it up as a use case?

Best,
Yoshio
fukushige.yoshio@jp.panasonic.com
fuku@w3.org

Received on Friday, 15 April 2005 04:11:29 UTC