- From: Henry Story <henry.story@bblfish.net>
- Date: Thu, 22 Dec 2005 00:12:14 +0100
- To: Russell Duhon <fugu13@mac.com>
- Cc: Joshua Allen <joshuaa@microsoft.com>, tim.glover@bt.com, fmanola@acm.org, semantic-web@w3.org
Hi Russell,

those are good points you make.

1. On complexity
----------------

The complexity you speak of can be solved, and it need only be solved
a few times, by the tool makers. Those that do will make a lot of
money, so I don't doubt for a second that this will be done. In fact
there is already a very good example available for the UniProt
database [1].

I used to work at AltaVista [2], and we had similar problems. The way
you deal with this is by slowing down requests that are too complex,
in order to give everyone a chance to have their queries answered. You
can even make money by putting your data online, by selling service
level guarantees to people who use your service a lot. In fact the
beauty of putting your data online for free is that it is a way to
find out who your customers are. This is what we are doing at Sun: you
can have all of the software we make for free, but if you want support
you have to pay. At AltaVista we sold search to Yahoo and to other
major players. Why would they pay? Well, service level guarantees are
worth a lot. Of course you should be honest and clear about your
policy. You would lose trust very quickly (and as I said, trust is 99%
of business) if you just decided to make people pay on a whim.

2. On opening up databases
--------------------------

That is not so difficult to do. You don't, of course, open all your
data. You open data that is of interest to people, and that will at
least include everything that is available online already. For
example, you can currently get all the train timetable information
from the SNCF by going to sncf.fr; here you would just be making it
available for machines to process. Same with Amazon: they know full
well that you can get all their information from their web site. By
making it easier for robots to access via their RESTful service, they
help people create businesses in markets they may never have the
ability to reach themselves. Data can be repurposed in many more ways
than any individual organization will ever be able to think of or
explore. Let others explore those, and cash in on the side effects.

3. On what ontology a database uses
-----------------------------------

If you want to find out what ontology a database uses, there are a few
simple tricks.

1. Put up an html page with links to the ontology on the query form
   [1] (duh!). Yes, put up a query form for engineers to play with;
   that will help. Engineers write the robots, don't forget.

2. Well designed ontologies are self descriptive. Yes! This is what is
   so cool about RDF, and it should be said loud and clear: we have
   self descriptive metadata here! Everything in RDF is based on the
   core element of the Web: the URI. If, furthermore, you design your
   ontologies with URLs, you can place a description of your classes
   and relations at the end of the URLs in question. Try
   http://xmlns.com/foaf/0.1/Person for example (see the sketch after
   this list).

3. You can make a SPARQL query to find out what relations are used, if
   you want:

     SELECT DISTINCT ?r
     WHERE { [] ?r ?y }

4. Perhaps it would be worth developing a simple ontology that every
   SPARQL endpoint should use, that would point people to the right
   place for human and machine readable information (a sketch of what
   that could look like follows below).
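To make trick 2 concrete: if you dereference the foaf:Person URL above
and look at the RDF that comes back, stripped down to a few lines it
says something along these lines (in N3):

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  # the vocabulary describes its own terms, in RDF
  foaf:Person
      rdfs:label      "Person" ;
      rdfs:comment    "A person." ;
      rdfs:subClassOf foaf:Agent .

So a robot that stumbles on a foaf:Person triple can follow the URL
and learn what the term means, with no out of band agreement.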
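And for trick 4, here is a minimal sketch of what such an endpoint
description could look like, again in N3. The ex: vocabulary is made
up purely for illustration; no such standard exists yet, which is
rather the point:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/endpoint#> .  # hypothetical vocabulary

  <http://example.org/sparql>
      a ex:SparqlEndpoint ;
      rdfs:label "train timetable data" ;
      ex:usesOntology <http://xmlns.com/foaf/0.1/> ;  # ontologies the data is expressed in
      ex:documentation <http://example.org/sparql/help.html> .  # for the humans

A robot could then ask the endpoint about itself:

  PREFIX ex: <http://example.org/endpoint#>
  SELECT ?ontology
  WHERE { <http://example.org/sparql> ex:usesOntology ?ontology }

which is pretty much what Russell asks for below.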
Anyway, none of the above is rocket science. Once a few people have
those services up, it won't take long for people to learn everything
they need about RDF. RDF is so much simpler than Java in many ways,
and there are 5 million Java developers. Where there is money, there
is the will. Sorry to be so materialistic. We are speaking about
emergence, "when the tires hit the road". And the road is a material
thing.

Henry Story

[1] http://blogs.sun.com/roller/page/bblfish?entry=262_million_facts_to_play
[2] http://bblfish.net/

On 21 Dec 2005, at 20:51, Russell Duhon wrote:

> I see two major hurdles for a SPARQLing web, and both are takes on
> complexity.
>
> First, reluctance by businesses (or organizations in general) to
> expose their information to such arbitrary queries. Sure, SQL's in
> wide usage, but not exposed to the general public!
>
> Part of this is due to difficulties in segregating public data from
> private data -- something that will hopefully be easier in the
> semantic web. Are there any efforts to create simple declarative
> "filters" for RDF data?
>
> But another part is due to understandable reluctance to allow
> queries of arbitrary complexity. Two things will distinctly
> ameliorate this: a switch you can turn on to reject queries that are
> "too complex" (for some value of complex that can be calculated for
> a given query), and a way to set a maximum resource usage per query
> in SPARQL endpoint configuration (along with rate restrictions on
> queries, of course).
>
> The second hurdle is also complexity related, but on the querying
> end. How do I know the sorts of queries meaningful to perform on an
> endpoint? Sure, look at the ontology, but that is not always useful
> -- for instance, some ontologies rely on the dublin core elements
> directly to carry meanings that are really more specific, and that
> often can't really be "seen" in an ontology (a good argument for
> subclassing properties). It may well be useful to create
> prescriptions for constructing queries from ontologies, and make a
> GUI allowing one to apply those prescriptions, and APIs.
>
> That means there's one missing point to the SPARQL story -- a way to
> denote both all ontologies the endpoint is using, and all ontologies
> (likely one or two) that are relevant to the "point" of the
> endpoint. I include this last because many SPARQL endpoints will
> expose data for a number of ontologies -- but the data most people
> querying the endpoint will be interested in will be from one or two
> ontologies.
>
> Actually, I haven't read the SPARQL specs in enough depth to know
> there /isn't/ a solution to this somewhere. One obvious one presents
> itself if none exists currently, though -- have RDF triples in there
> with the SPARQL endpoint URI as the subject, appropriate predicates
> for the various ontologies exposed, and using the ontology URIs
> (same as for imports) as objects. Then people can just query the
> SPARQL endpoint for the information. These triples would likely be
> "mixed in" with the datastore triples based on information in a
> config file.
>
> Even with these dealt with, there are major hurdles to widespread
> SPARQL adoption. I am not entirely convinced a significantly simpler
> query protocol (I hesitate to say language, as it would likely be
> too simple to obviously need the moniker), allowing the small class
> of most common RDF queries, would not have a better chance of
> success. We shall see, though.
>
> Russell
>
>>
>> [snip]
Received on Wednesday, 21 December 2005 23:12:37 UTC