- From: Henry Story <henry.story@bblfish.net>
- Date: Thu, 22 Dec 2005 00:12:14 +0100
- To: Russell Duhon <fugu13@mac.com>
- Cc: Joshua Allen <joshuaa@microsoft.com>, tim.glover@bt.com, fmanola@acm.org, semantic-web@w3.org
Hi Russell,

those are good points you make.

1. On complexity
----------------

The complexity you speak of can be solved, and it need only be solved
a few times, by the tool makers. Those that do will make a lot of
money, so I don't doubt for a second that this will be done. In fact
there is already a very good example available for the UniProt
database [1].

I used to work at AltaVista [2], and we had similar problems. The way
you deal with this is by slowing down requests that are too complex,
in order to give everyone a chance to have their queries answered. You
can even make money by putting your data online, by selling service
level guarantees to people who use your service a lot. In fact the
beauty of putting your data online for free is that it is a way to
find out who your customers are. This is what we are doing at Sun: you
can have all of the software we make for free, but if you want support
you have to pay. At AltaVista we sold search to Yahoo and to other
major players. Why would they pay? Well, service level guarantees are
worth a lot. Of course you should be honest and clear about your
policy. You would lose trust very quickly (and as I said, trust is 99%
of business) if you just decided to make people pay on a whim.

2. On opening up databases
--------------------------

That is not so difficult to do. You don't, of course, open all your
data. You open data that is of interest to people, and that will at
least include everything that is available online already. For
example, you can currently get all the train timetable information
from the SNCF by going to sncf.fr; here you would just be making it
available for machines to process. Same with Amazon: they know full
well that you can get all their information from their web site. By
making it easier for robots to access via their RESTful service, they
help people create businesses in markets they may never have the
ability to reach themselves. Data can be repurposed in many more ways
than any individual organization will ever be able to think of or
explore. Let others explore those, and cash in on the side effects.

3. On what ontology a database uses
-----------------------------------

If you want to find out what ontology a database uses, there are a few
simple tricks.

1. Put up an html page with links to the ontology on the query form
   [1] (duh!). Yes, put up a query form for engineers to play with;
   that will help. Engineers write the robots, don't forget.

2. Well designed ontologies are self descriptive. Yes! This is what is
   so cool about RDF, and it should be said loud and clear: we have
   self descriptive metadata here! Everything in RDF is based on the
   core element of the Web: the URI. If, furthermore, you design your
   ontologies with URLs, you can place a description of your classes
   and relations at the end of the URLs in question. Try
   http://xmlns.com/foaf/0.1/Person for example (see the sketch after
   this list).

3. You can make a SPARQL query to find out what relations are used, if
   you want:

     SELECT DISTINCT ?r
     WHERE { [] ?r ?y }

4. Perhaps it would be worth developing a simple ontology that every
   SPARQL endpoint should use, that would point people to the right
   place for human and machine readable information (a sketch of what
   that could look like follows below).
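To make trick 2 concrete: if you dereference the foaf:Person URL above
and look at the RDF that comes back, stripped down to a few lines it
says something along these lines (in N3):

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  # the vocabulary describes its own terms, in RDF
  foaf:Person
      rdfs:label      "Person" ;
      rdfs:comment    "A person." ;
      rdfs:subClassOf foaf:Agent .

So a robot that stumbles on a foaf:Person triple can follow the URL
and learn what the term means, with no out of band agreement.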
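And for trick 4, here is a minimal sketch of what such an endpoint
description could look like, again in N3. The ex: vocabulary is made
up purely for illustration; no such standard exists yet, which is
rather the point:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/endpoint#> .  # hypothetical vocabulary

  <http://example.org/sparql>
      a ex:SparqlEndpoint ;
      rdfs:label "train timetable data" ;
      ex:usesOntology <http://xmlns.com/foaf/0.1/> ;  # ontologies the data is expressed in
      ex:documentation <http://example.org/sparql/help.html> .  # for the humans

A robot could then ask the endpoint about itself:

  PREFIX ex: <http://example.org/endpoint#>
  SELECT ?ontology
  WHERE { <http://example.org/sparql> ex:usesOntology ?ontology }

which is pretty much what Russell asks for below.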
Anyway, none of the above is rocket science. Once a few people have
those services up, it won't take long for people to learn everything
they need about RDF. RDF is so much simpler than Java in many ways,
and there are 5 million Java developers. Where there is money, there
is the will. Sorry to be so materialistic. We are speaking about
emergence, "when the tires hit the road". And the road is a material
thing.

Henry Story

[1] http://blogs.sun.com/roller/page/bblfish?entry=262_million_facts_to_play
[2] http://bblfish.net/

On 21 Dec 2005, at 20:51, Russell Duhon wrote:

> I see two major hurdles for a SPARQLing web, and both are takes on
> complexity.
>
> First, reluctance by businesses (or organizations in general) to
> expose their information to such arbitrary queries. Sure, SQL's in
> wide usage, but not exposed to the general public!
>
> Part of this is due to difficulties in segregating public data from
> private data -- something that will hopefully be easier in the
> semantic web. Are there any efforts to create simple declarative
> "filters" for RDF data?
>
> But another part is due to understandable reluctance to allow
> queries of arbitrary complexity. Two things will distinctly
> ameliorate this: a switch you can turn on to reject queries that are
> "too complex" (for some value of complex that can be calculated for
> a given query), and a way to set a maximum resource usage per query
> in SPARQL endpoint configuration (along with rate restrictions on
> queries, of course).
>
> The second hurdle is also complexity related, but on the querying
> end. How do I know the sorts of queries meaningful to perform on an
> endpoint? Sure, look at the ontology, but that is not always useful
> -- for instance, some ontologies rely on the dublin core elements
> directly to carry meanings that are really more specific, and that
> often can't really be "seen" in an ontology (a good argument for
> subclassing properties). It may well be useful to create
> prescriptions for constructing queries from ontologies, and make a
> GUI allowing one to apply those prescriptions, and APIs.
>
> That means there's one missing point to the SPARQL story -- a way to
> denote both all ontologies the endpoint is using, and all ontologies
> (likely one or two) that are relevant to the "point" of the
> endpoint. I include this last because many SPARQL endpoints will
> expose data for a number of ontologies -- but the data most people
> querying the endpoint will be interested in will be from one or two
> ontologies.
>
> Actually, I haven't read the SPARQL specs in enough depth to know
> there /isn't/ a solution to this somewhere. One obvious one presents
> itself if none exists currently, though -- have RDF triples in there
> with the SPARQL endpoint URI as the subject, appropriate predicates
> for the various ontologies exposed, and using the ontology URIs
> (same as for imports) as objects. Then people can just query the
> SPARQL endpoint for the information. These triples would likely be
> "mixed in" with the datastore triples based on information in a
> config file.
>
> Even with these dealt with, there are major hurdles to widespread
> SPARQL adoption. I am not entirely convinced a significantly simpler
> query protocol (I hesitate to say language, as it would likely be
> too simple to obviously need the moniker), allowing the small class
> of most common RDF queries, would not have a better chance of
> success. We shall see, though.
>
> Russell
>
>>
>> [snip]
Received on Wednesday, 21 December 2005 23:12:37 UTC