- From: Ron Alford <ronwalf@umd.edu>
- Date: Sat, 06 Aug 2005 10:53:31 -0400
- To: public-rdf-dawg-comments@w3.org
- Message-ID: <42F4CEEB.5090306@umd.edu>
I hate to do this after last call, but we only started discussing it on August 2.

The XML Results format specifies that unbound variables are represented as <binding><unbound/></binding>. This is relatively concise as long as it's not repeated too much in each row. However, that ends up not being the case with a common (at least for me) use case for UNION. There are two cases I'm running into that cause problems:

- UNION can be used to smash two queries into one request. Although this cuts down on some setup and teardown time, it basically doubles the number of <binding> elements that are returned.
- OPTIONAL in the wrong place can lead to a large fan-out of results[1]. If one uses UNION instead, it reduces the possible explosion of rows[2]. Unfortunately, this spreads the results out over many rows and leaves the majority of the variables blank.

There are at least two ways to trim the results back down with just syntax changes. The least intrusive change would be to drop the <unbound> tag and have it be implicit with <binding name=".."/>. More drastic is to drop the entire <binding> tag when the variable is unbound, since the information can be retrieved from the head.

To study the effects of these changes, I've picked up two large FOAF data sources. One is a scutter dump from mattb[3], and the other is an old Julie dump from Christopher Schmidt. I've placed copies at [4]. I used a query[5] to pick out every person, and optionally their name, mailbox, homepage, mbox_sha1sum, nick, and seeAlso links. They may have many more properties off of them (knows, surname, aim addresses, depictions, made, etc).

Using ARQ to generate the XML results, I made two result files[6]. The julie-dump XML results were 121 MB with 234K rows, and the scutter dump was 25 MB with 46K rows. Using some simple XSLT[7], I was able to create sample result sets with the unbounds stripped[8] and the bindings collapsed[9].
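
To make the three encodings concrete, here is a minimal Python sketch that serializes a single result row in each form. It is only an illustration, not the XSLT used for the measurements: the function name row_to_xml, the style names "full", "stripped" and "collapsed", and the sample variables are all hypothetical, and every bound value is written as a <literal> for simplicity.

```python
import xml.etree.ElementTree as ET


def row_to_xml(variables, row, style="full"):
    """Serialize one result row (hypothetical helper, for illustration only).

    style="full":      unbound variable -> <binding name="v"><unbound/></binding>
    style="stripped":  unbound variable -> <binding name="v"/>
    style="collapsed": unbound variable -> no <binding> element at all
    """
    result = ET.Element("result")
    for var in variables:
        value = row.get(var)
        if value is None:
            if style == "collapsed":
                continue  # the head still lists the variable
            binding = ET.SubElement(result, "binding", name=var)
            if style == "full":
                ET.SubElement(binding, "unbound")
        else:
            binding = ET.SubElement(result, "binding", name=var)
            # Assumption: all bound values are treated as plain literals.
            ET.SubElement(binding, "literal").text = value
    return ET.tostring(result, encoding="unicode")


# Sample row where two of three variables are unbound (made-up data).
variables = ["name", "mbox", "homepage"]
row = {"name": "Ron"}
for style in ("full", "stripped", "collapsed"):
    xml = row_to_xml(variables, row, style)
    print(style, len(xml), xml)
```

The more variables a row leaves unbound, the larger the gap between the three forms, which is the situation the UNION queries above produce.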

The stripped files were about 68% of the size of the originals, while the collapsed files were about 45% of the originals' size.

The parse time followed similarly. I used a dumb script[10] that timed how long it took for expat's xmlwf to complete. The stripped files took about 61% of the time needed to parse the complete files, and the collapsed files took about 42% of the time needed to parse the originals. The raw results are at [11].

There is a third possibility, much more remote, which would work independently of the previously suggested changes. That would be an operator like UNION that allows matching graph patterns to be presented on the same row. This would effectively fall somewhere between UNION and OPTIONAL, and would facilitate querying for multiple arity predicates. Query cascading might have similar benefits.

I wouldn't expect anything like this to be done in last call. However, I would like to see some discussion on the size of the results document.

-Ron

[1] See "Multiple Arity Predicates" on
    http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jun/0039.html
[2] Andy Seaborne's UNION comment on
    http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Jun/0044.html
[3] Originally from
    http://rdfig.xmlhack.com/2003/11/11/2003-11-11.html#1068578689.828260
[4] http://www.mindswap.org/2005/sparql/unbound/scutter.rdf
    http://www.mindswap.org/2005/sparql/unbound/julie-dump.rdf
[5] http://www.mindswap.org/2005/sparql/unbound/person.query
[6] http://www.mindswap.org/2005/sparql/unbound/results/julie-dump.xml
    http://www.mindswap.org/2005/sparql/unbound/results/scutter.xml
[7] http://www.mindswap.org/2005/sparql/unbound/results/strip.xslt
    http://www.mindswap.org/2005/sparql/unbound/results/collapse.xslt
[8] http://www.mindswap.org/2005/sparql/unbound/results/stripped/julie-dump.xml
    http://www.mindswap.org/2005/sparql/unbound/results/stripped/scutter.xml
[9] http://www.mindswap.org/2005/sparql/unbound/results/collapsed/julie-dump.xml
    http://www.mindswap.org/2005/sparql/unbound/results/collapsed/scutter.xml
[10] http://www.mindswap.org/2005/sparql/unbound/results/timing.sh
[11] http://www.mindswap.org/2005/sparql/unbound/results/results.txt
Received on Saturday, 6 August 2005 14:53:37 UTC