Re: Managing Co-reference (Was: A Semantic Elephant?)

2008/5/15 Renato golin <renato@ebi.ac.uk>:
>
> Kendall Grant Clark wrote:
>>
>> You don't have to do it at query time. Owlgres does owl:sameAs
>> processing at load time and so the *query time* cost is negligible.
>> The usual caveats about tradeoffs and use cases apply, of course.
>
> Kendall,
>
> You're assuming you have ALL triples in your store. I think the discussion
> is broader and applies to the whole Web. As Jim said: "We need to go beyond
> just triple stores and get some fast inferencing at Web scales."

Latency on the Web rules out "fast inferencing" in any strict sense of
the term. Even if everyone published data behind SPARQL endpoints,
resolving a non-trivial query would more than likely require multiple
hits to each endpoint as you work deeper into the inference resolution
for that query. On top of that, RDF assumes an open world, so
consistent inferencing at Web scale is a myth. Don't take that the
wrong way and think that nothing can be done; just don't expect the
earth if you are only going to get a small city.
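
To make the latency point concrete, here is a minimal Python sketch
(using SPARQLWrapper) of what chasing owl:sameAs links across remote
endpoints costs: every hop in the closure is at least one more HTTP
round trip per endpoint. The endpoint URLs, the starting resource and
the three-hop cut-off are invented for illustration, not taken from
any existing system.

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoints; on the real Web there could be thousands.
ENDPOINTS = [
    "http://example.org/store-a/sparql",
    "http://example.org/store-b/sparql",
]

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

def sameas_closure(resource, max_hops=3):
    """Follow owl:sameAs links breadth-first across remote endpoints,
    counting network round trips as a crude latency measure."""
    seen, frontier, round_trips = {resource}, [resource], 0
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            for endpoint in ENDPOINTS:
                sparql = SPARQLWrapper(endpoint)
                sparql.setQuery(
                    "SELECT ?o WHERE { <%s> <%s> ?o }" % (uri, OWL_SAMEAS))
                sparql.setReturnFormat(JSON)
                round_trips += 1   # one HTTP request per endpoint per URI
                bindings = sparql.query().convert()["results"]["bindings"]
                for b in bindings:
                    alias = b["o"]["value"]
                    if alias not in seen:
                        seen.add(alias)
                        next_frontier.append(alias)
        frontier = next_frontier
    return seen, round_trips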

> I was saying that months ago on this list, but no one seemed to care too
> much... We need an index that takes other stores into account, pretty much
> as we have today for routing algorithms.

Distributed SPARQL is already practical within quad stores using Named
Graphs, provided you accept that a Named Graph name can resolve to an
actual entity outside the database. So far the names of graphs have
been treated as arbitrary, non-meaningful URIs, but what if they were
not, and could actually be utilised? It would be rather like the jump
from URI-shaped literals to resolvable URIs within RDF. If graph names
were resolvable in their own right, you wouldn't need a separate
index, which would inevitably be a hassle to get integrated and useful
for everyone.
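
As a rough sketch of what resolvable graph names could look like in
practice, assuming rdflib and a made-up graph URI
(http://example.org/people.rdf): the same URI is dereferenced to load
the data and then used as the Named Graph identifier in a scoped
query, with no separate index in between.

from rdflib import ConjunctiveGraph, URIRef

# Hypothetical graph name that is also a dereferenceable document URL.
GRAPH_NAME = URIRef("http://example.org/people.rdf")

store = ConjunctiveGraph()
# Resolve the graph name itself and load its triples under that name.
store.get_context(GRAPH_NAME).parse(str(GRAPH_NAME))

# The GRAPH keyword then scopes the query using the same resolvable URI.
query = """
SELECT ?person ?name WHERE {
  GRAPH <http://example.org/people.rdf> {
    ?person <http://xmlns.com/foaf/0.1/name> ?name .
  }
}
"""
for person, name in store.query(query):
    print(person, name)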

> All the technology we have today for storing RDF assumes all the data is in
> the same database, or at least in the same engine, so you can rely on fast
> local indexes. But when you start looking at remote webpages (personal
> websites included), it's obvious that you can't control what's in there nor
> how to access it.

Nor should you, as you aren't the authoritative source for that data.
If you want to mirror the information and perform trivial cleansing
procedures, that may be suitable, but you are then changing the data,
which is always dangerous.

> Instead, if we had a way to state the probability of information X about Y
> lying in some particular direction (as in connections to other datasets
> that, in turn, connect to other datasets), and updated those probabilities
> whenever a link is found, we could then infer where to go searching for it.

Look up the Distributed SPARQL literature for some more ideas on this
that have already been put forth. The whole area doesn't have to be
redeveloped.
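
For illustration only, here is a toy version of the routing index the
quoted paragraph describes; it is not lifted from that literature, and
the predicate and dataset URIs are invented. Every observed outbound
link bumps the weight of the dataset it points into, and queries are
routed to the highest-weighted datasets first.

from collections import defaultdict

class SourceRouter:
    def __init__(self):
        # predicate URI -> target dataset URI -> number of links observed
        self.links = defaultdict(lambda: defaultdict(int))

    def observe_link(self, predicate, target_dataset):
        """Called whenever a crawled triple points into another dataset."""
        self.links[predicate][target_dataset] += 1

    def rank_sources(self, predicate):
        """Datasets most likely to hold values for this predicate, best first."""
        counts = self.links[predicate]
        total = sum(counts.values()) or 1
        return sorted(((d, n / total) for d, n in counts.items()),
                      key=lambda pair: pair[1], reverse=True)

router = SourceRouter()
router.observe_link("http://xmlns.com/foaf/0.1/knows", "http://example.org/dataset-a")
router.observe_link("http://xmlns.com/foaf/0.1/knows", "http://example.org/dataset-b")
router.observe_link("http://xmlns.com/foaf/0.1/knows", "http://example.org/dataset-a")
print(router.rank_sources("http://xmlns.com/foaf/0.1/knows"))
# dataset-a ranks first with weight 2/3, dataset-b second with 1/3.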

> Of course, that would involve manual curation where you say whether you
> liked the result or not (network feedback), and smart algorithms that
> randomly choose to go in new directions when no result is acceptable
> (Markov chains, Monte Carlo optimizations), but both were summarized by
> Michael, so I understand that's pretty much accepted to happen in the near
> future anyway.

If the Named Graph in the quad store didn't resolve to an applicable
source, you could fall back to resolving the URI in question directly.
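
A minimal sketch of that fallback, again assuming rdflib and a
hypothetical graph name: consult the Named Graph in the local quad
store first, and only dereference the URI itself when nothing
applicable is found locally.

from rdflib import ConjunctiveGraph, Graph, URIRef

def graph_for(store, name):
    """Return the Named Graph `name` from the local store, or fetch it
    from the Web if the store holds nothing under that name."""
    local = store.get_context(name)
    if len(local) > 0:      # the Named Graph resolved to local data
        return local
    remote = Graph()        # fallback: resolve the URI in question
    remote.parse(str(name))
    return remote

store = ConjunctiveGraph()
g = graph_for(store, URIRef("http://example.org/people.rdf"))  # hypothetical
print(len(g), "triples")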
