- From: Henry Story <henry.story@bblfish.net>
- Date: Mon, 10 Oct 2005 13:04:45 +0200
- To: Gareth Andrew <freega@freegarethandrew.org>, SWIG SWIG <semantic-web@w3.org>
On 10 Oct 2005, at 03:21, Gareth Andrew wrote:

> Hi Henry,
>
> I just posted a rebuttal at
> http://gingerhendrix.blogspot.com/2005/10/sparql-and-web-2.html
> but in the interests of discussion I am reposting it here (please
> excuse the third person).

No problem. Thanks for the feedback. I'll add a link to your post and
to this thread from my post. (It is easier to have a discussion on a
mailing list than on a blog.)

> [DISCLAIMER: I have no expertise in this area, I am just a lay
> commentator]

I think we are all explorers of this vast, new, unconquered land. :-)

> Henry Story has just posted describing SPARQL as a query language for
> Web 2.0. I think all his usage examples are good, but I think he's
> missed the point slightly. Henry suggests that Web 2.0 businesses will
> expose SPARQL endpoints over web services. This isn't going to happen,
> for several reasons:
>
> 1. Economics: There is a lot of value stored in the databases Henry
>    mentions, and most companies will not want competitors/users to
>    have unrestricted access to this data. Current web service APIs
>    are designed so that the expected value increase from user-derived
>    software is likely to exceed the loss of value in the data.

I am not saying "open all your databases, and all information in all
your databases". That would be crazy and often illegal (think of
information held about customers, for example). No, clearly the idea is
to expose only a subset of the data that the enterprise has available.

Enterprises should consider, though, that the most successful web
businesses are all (in one way or another) search engines. And search
engines have made it their business to open a huge amount of data to
the world.

> 2. Performance: Even if the data is completely open, and the
>    economics doesn't come into play, performance is a major issue.
>    SPARQL queries are designed to be written by Semantic Web
>    engineers, much as SQL queries are designed to be written by
>    database engineers.
> As an example, consider the following query:
>
>     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>     PREFIX dc: <http://purl.org/dc/elements/1.1/>
>     SELECT ?book
>     WHERE { ?book dc:creator ?who .
>             ?who foaf:name "J. K. Rowling" }
>
> This query (if the WHERE clause is evaluated top to bottom) is
> highly inefficient: it first searches for all triples with
> property dc:creator, then filters those such that the
> dc:creator's foaf:name is "J. K. Rowling". A much more efficient
> query reverses the patterns in the WHERE clause. I believe
> automated query rewriting is beyond the state of the art at the
> moment and will continue to be for the foreseeable future,

That is a good point. But this is going to be true whenever you open a
query interface to the web. I worked at AltaVista, and we had to deal
with exactly the same problem. If someone asks for "The cat of Danny
Ayers", none of the search engines first go and find all pages in which
the word "the" appears. There are just too many. Google even drops it.
They would first look for pages with Danny and Ayers, and then look for
which of those pages contain the word "cat". And search engines have
absolutely VAST indexes to search through, which most companies don't
have. So whatever query language you have, be it ever so simple as text
search, you will have the above problem.

Clearly there is a huge opportunity here for people to write SPARQL
drivers that optimize the queries for the database they are hooked
into, be it an RDF database or a plain old relational database. Given
that we now have a uniform interface to query databases, the demand for
such drivers will become very big, and so competition will work out the
details. Just think of Java servlets. You can get some nice and simple
implementations, and then you can get much more sophisticated ones that
reduce the number of threads by using pooling or the nio socket select
call.

By the way, I don't recommend waiting for such drivers to be available
to start working on this.
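To make the join-ordering point concrete, here is a toy, self-contained sketch (not a real SPARQL engine; all data, prefixes and names below are invented for illustration) showing that evaluating the selective foaf:name pattern first produces the same answers while touching far fewer intermediate rows:

```python
# Toy illustration of why triple-pattern order matters: match the most
# selective pattern first, then join. Not a real SPARQL engine.

triples = [
    ("book:hp1",    "dc:creator", "person:jkr"),
    ("book:hp2",    "dc:creator", "person:jkr"),
    ("book:lotr",   "dc:creator", "person:jrrt"),
    ("person:jkr",  "foaf:name",  "J. K. Rowling"),
    ("person:jrrt", "foaf:name",  "J. R. R. Tolkien"),
]

def match(pattern, bindings):
    """Yield extended variable bindings for one triple pattern.

    A term starting with '?' is a variable; anything else must match
    the triple position exactly. Already-bound variables must agree."""
    for triple in triples:
        new = dict(bindings)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if term in new and new[term] != value:
                    ok = False
                    break
                new[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def query(patterns):
    """Evaluate patterns left to right; count intermediate rows as 'work'."""
    rows, work = [{}], 0
    for p in patterns:
        rows = [b for r in rows for b in match(p, r)]
        work += len(rows)
    return rows, work

slow = [("?book", "dc:creator", "?who"),
        ("?who",  "foaf:name",  "J. K. Rowling")]
fast = list(reversed(slow))

print(query(slow))  # same answers, more intermediate rows
print(query(fast))  # same answers, fewer intermediate rows
```

A real driver would estimate selectivity from index statistics rather than by counting after the fact; the point is only that the same answers come back either way, at very different cost.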
The first to get there will have the advantage. Just start a service
like this as a low-key beta site. I did this at AltaVista with the
BabelFish machine translation service. We had no idea how successful it
would be. So we just put a C CGI script out there that did some pretty
awful things (like fetching web pages by forking lynx) -- though I did
make sure it did not do the worst (I used sockets to inform the
translators that there was a new file requiring translation, instead of
having it poll the file system, which could have been deadly with
volume). When it was clear that this was of interest, we removed the
lynx fork and improved a few other things that we could fix
immediately. Later I rewrote the CGI as a Java servlet, which was a lot
cleaner, less buggy, more scalable and of course Unicode enabled.

> especially when you consider the technical challenge of throwing
> inferencing into the mix, and the social challenge of open access
> (eg. consider the query "SELECT ?s ?p ?o WHERE { ?s ?p ?o }").

With respect to some queries requiring too much processing on the
server side, there are already numerous well-established techniques for
dealing with this on the web. You can simply return an error message,
explaining that the server does not allow that type of query. You can
cut the results up into little chunks. This is something to look into.
The best way is to try it out.

> That's not to say I don't see SPARQL becoming an integral part of Web
> 2.0. I envisage that the next generation of back-end storage products
> will be based on triples, inferencing, and rules. SPARQL will be the
> query language used to interface with the back end. At the web tier,
> services will continue to be built on RESTful principles; however,
> more services will expose data as RDF, and publish schemas based on
> RDFS, OWL etc. to enhance their meaning.
> At the client side, aggregators, smushers, inferencers and provers
> will be fundamental building blocks, with high-level special-purpose
> APIs written to interface with them (eg. BlogEd's RDF JavaBeans
> classes). I think there is room for SPARQL again at this level, but
> it's likely to be too general and complex for your average
> application programmer.

Mhh. I think that the SPARQL interface just makes life a lot easier for
client-side application programmers, and perhaps a little more
difficult for the server-side ones. I don't think writing SPARQL
queries is complex. But this is easy to test. We just need to start
opening a few databases out there, and see how it works out and how
developers take to it. There's nothing like empirical data.

In fact I'd like to post a blog entry pointing to a nice test database
with a well-laid-out ontology, so that developers can play with SPARQL.
Any good ideas?

Henry Story

> On Mon, 2005-10-10 at 00:46 +0200, Henry Story wrote:
>
>> I just posted this: http://blogs.sun.com/roller/page/bblfish/20051009

[snip]
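PS: for the curious, here is a minimal sketch of the two server-side safeguards mentioned above -- returning an error for over-broad queries, and cutting results into chunks via SPARQL's LIMIT/OFFSET modifiers. The function names, the crude variable check, and the page size are all invented for illustration; this is not from any actual SPARQL server.

```python
import re

PAGE_SIZE = 100  # invented chunk size for the example

def guard(sparql_query):
    """Reject a query whose WHERE clause is nothing but unbound
    variables, such as: SELECT ?s ?p ?o WHERE { ?s ?p ?o }.
    (A deliberately crude check, for illustration only.)"""
    body = re.search(r"WHERE\s*\{(.*)\}", sparql_query, re.S | re.I)
    if body is None:
        raise ValueError("no WHERE clause")
    terms = body.group(1).replace(".", " ").split()
    if terms and all(t.startswith("?") for t in terms):
        raise ValueError("query too broad: every term is a variable")
    return sparql_query

def paged(sparql_query, page):
    """Append LIMIT/OFFSET so clients fetch one chunk at a time."""
    return f"{sparql_query} LIMIT {PAGE_SIZE} OFFSET {page * PAGE_SIZE}"
```

A client would then walk pages 0, 1, 2, ... until a page comes back with fewer than PAGE_SIZE results.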
Received on Monday, 10 October 2005 11:04:58 UTC