Re: Subjects as Literals from Ivan Mikhailov on 2010-07-07 (semantic-web@w3.org from July 2010)

From: Ivan Mikhailov <imikhailov@openlinksw.com>
Date: Wed, 07 Jul 2010 11:08:13 +0700
To: Antoine Zimmermann <antoine.zimmermann@deri.org>
Cc: public-lod@w3.org, Semantic Web <semantic-web@w3.org>
Message-ID: <1278475693.2840.96.camel@octo.iv.dev.null>

Antoine, all,

On Tue, 2010-07-06 at 20:54 +0100, Antoine Zimmermann wrote:

> Not only there are volunteers to implement tools which allow literals as 
> subjects, but there are already implementations out there.
> As an example, take Ivan Herman's OWL 2 RL reasoner [1]. You can put 
> triples with literals as subject, and it will reason with them.
> Here in DERI, we also have prototypes processing generalised triples.

It is absolutely not a problem to add support for this in, e.g., Virtuoso as
well: 1 day for the non-clustered version + 1 more day for the cluster. But it
will naturally kill the scalability. Literals in subject position mean
either outlining all literals or switching from bitmap indexes to plain ones,
and at the same time they block important query rewriting.
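
One way to picture the trade-off above is a toy store in which every subject
gets a small integer ID and the subjects of each (predicate, object) pair are
kept as a bitmap over those IDs. This is only a sketch under that assumption,
not Virtuoso's actual storage layout; ToyStore and its methods are
hypothetical names. In such a design a literal used as a subject also needs an
ID of its own (it has to be "outlined" into the ID dictionary), or the bitmap
has to give way to a plain index over full values.

```python
class ToyStore:
    """Toy model only: subjects are dictionary-encoded to integer IDs,
    and each (predicate, object) key holds a bitmap of subject IDs."""

    def __init__(self):
        self.ids = {}      # term -> integer ID (dictionary encoding)
        self.bitmaps = {}  # (predicate, object) -> int used as a bitmap

    def term_id(self, term):
        # Assign the next free ID on first sight of a term.
        return self.ids.setdefault(term, len(self.ids))

    def add(self, s, p, o):
        # Set the bit that corresponds to the subject's ID.
        key = (p, o)
        self.bitmaps[key] = self.bitmaps.get(key, 0) | (1 << self.term_id(s))

    def subjects(self, p, o):
        # Decode the bitmap back into the set of subject terms.
        bitmap = self.bitmaps.get((p, o), 0)
        return {t for t, i in self.ids.items() if (bitmap >> i) & 1}


store = ToyStore()
store.add("ex:a", "ex:p", "ex:o")
store.add("ex:b", "ex:p", "ex:o")
# A literal in subject position must be given an ID like any other subject.
store.add('"42"', "ex:p", "ex:o")
```

The bitmap stays compact only while subject IDs are small and dense; that is
the property at stake once arbitrary literals can appear as subjects.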

We have seen triple store benchmark reports where a winner is up to 120
times faster than a loser and nevertheless all participants are in
widespread use. With these reports in mind, I can make two forecasts.

1. RDF is so young that even an epic fail like this feature would not
immediately throw an implementation out of the market.

2. It will throw it out later.

> Other reasoners are dealing with literals as subjects. RIF 
> implementations are also able to parse triples with literals as 
> subjects, as it is required by the spec.
...
> Some people mentioned scalability issues when we allow literals as 
> subject. It might be detrimental to the scalability of query engines 
> over big triple stores, but allowing literals as subjects is perfectly 
> scalable when it comes to inference materialisation (see recent work on 
> computing the inference closure of 100 billion triples [2]).
> 

Reasoners have to get data from some place and put the results in the same or
another place. There are three sorts of inputs: triple stores with real data,
dumps of real data, and synthetic benchmarks like LUBM. There are two
sorts of outputs: triple stores for real data and papers with nice
numbers. Without adequate triple store infrastructure at both ends (or
inside), any reasoner is simply unusable. [2] compares a reasoner that
cannot answer queries after preparing the result with a store that
works longer but is capable of doing something for its multiple clients
immediately after completing its work. If this is the best and most
complete result achieved so far, then volunteers are still required.

> Considering this amount of usage and use cases, which is certainly meant 
> to grow in the future, I believe that it is time to standardised 
> generalised RDF.

http://en.wikipedia.org/wiki/Second-system_effect

There were "generalised RDFs" before simple RDF came onto the scene. Minsky:
frames and slots. Winston: knowledge graphs that are only a bit
more complicated than RDF. The fate of these approaches is known: great
impact on science, little use in industry.

> A possible compromise would be to define RDF 2 as /generalised RDF + 
> named graphs + deprecate stuff/, and have a sublanguage (or profile) 
> RDF# which forbids literals in subject and predicate positions, as well 
> as bnodes in predicate position.
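
For concreteness, the split quoted above can be sketched as a position check
on triples. This is only an illustration of the restriction as worded in the
quote; the types IRI, BNode and Literal and the function
in_restricted_profile are hypothetical names, not taken from any spec or
library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def in_restricted_profile(s, p, o):
    """Return True if the triple obeys the quoted restrictions:
    no literal as subject or predicate, no bnode as predicate."""
    if isinstance(s, Literal):
        return False
    if isinstance(p, (Literal, BNode)):
        return False
    # The quote places no restriction on the object position.
    return True


ok = in_restricted_profile(IRI("ex:a"), IRI("ex:p"), Literal("1"))
generalised_only = not in_restricted_profile(Literal("1"), IRI("ex:p"), IRI("ex:a"))
```

Every triple passes as generalised RDF in this sketch; only the restricted
profile rejects anything.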

Breaking a small market into two incompatible parts is as bad as asking my
mom what she would like to use on her netbook, ALSA or OSS. She doesn't
know (neither do I) and she doesn't want to choose which half of her sound
applications will crash.

> Honestly, it's just about putting a W3C stamp on things that some people 
> are already using and doing.

If people are living in love and happiness without a stamp on a piece of
paper, it does not mean they are living in sin ;) Similarly, people may use
literals as subjects without asking others and without any stamp.

Best Regards,
Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

> [2] Jacopo Urbani, Spyros Kotoulas, Jason Maassen, Frank van Harmelen, 
> and Henri Bal. "OWL reasoning with WebPIE: calculating the closure of 
> 100 billion triples" in the proceedings of ESWC 2010.

Received on Wednesday, 7 July 2010 04:11:28 UTC