Re: FW: Datatyping Summary from Sergey Melnik on 2002-01-30 (w3c-rdfcore-wg@w3.org from January 2002)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Wed, 30 Jan 2002 12:33:09 -0800
To: Brian McBride <bwm@hplb.hpl.hp.com>
CC: Patrick Stickler <patrick.stickler@nokia.com>, RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <3C585885.58798599@db.stanford.edu>
Brian McBride wrote:
> 
> At 20:41 29/01/2002 +0200, Patrick Stickler wrote:
> [...]
> 
> >I definitely cannot live with an unnecessary multiplicity
> >of synonymous vocabularies just to accommodate RDF datatyping.
> 
> Noted
> 
> >
> >
> > > Issue B3:  the "duh" issue
> > > ==========================
> > >
> > > DanC is concerned that with TDL:
> > >
> > >  <mary> <haircolor> "red" .
> > >
> > > and a rule:
> > >
> > >  ?x <haircolor> "red" => ?x <rdf:type> <redhead> .
> > >
> > > one cannot conclude
> > >
> > >  <mary> <rdf:type> <redhead> .
> > >
> > > since one cannot conclude that both "red"'s denote the same thing.
> > >
> > > Jeremy has responded:
> > >
> > > From:
> > >
> > >  <mary> <haircolor> "red" .
> > >  <haircolor> <rdfs:range> <xsd:string> .
> > >
> > > and the same rule one can draw the required inference.
> > >
> > > DanC:  Does that solve the problem?  Do you withdraw that objection?
> > >
> > > Jeremy/Patrick:  Do you accept that without the range constraint, DanC is
> > > correct?
> >
> >I do not accept that this is correct.
> 
> The question was:
> 
> DanC asserts that under TDL, given
> 
>    <mary> <haircolor> "red" .
> 
>   and a rule:
> 
>    ?x <haircolor> "red" => ?x <rdf:type> <redhead> .
> 
>   one cannot conclude
> 
>    <mary> <rdf:type> <redhead> .
> 
> Is he correct?  Patrick responds:
> 
>    I do not accept that this is correct
> 
> and from the text that follows, I believe that Patrick means "yes, this is
> correct".
> Patrick, please can you confirm.
> 
> >A literal can only have globally unique meaning if some
> >application context defines it as such, but RDF must exist in
> >an environment where knowledge is expressed independent of
> >application context, therefore, even if "red" always means
> >the same thing to Dan's application, it may not mean that
> >same thing, or consistently some other thing, to my application.
> >
> >I also do not concur that S takes such a view, that a literal
> >always has the same meaning. The example in section 5 of Sergey's
> >document bears this out, with most of the literals denoting
> >different values, based on the mappings asserted by the
> >predicates of the statements.
> >
> >I do not accept Dan's view that literals are global constants,
> >as being valid for arbitrary global interchange and syndication
> >of RDF expressed knowledge. It reflects a closed system view
> >of an RDF graph.
> >
> > > Issue B4 - TDL breaks existing code
> > > ===================================
> > >
> > > This is similar to B2.  I've changed the example slightly from Sergey's.
> > > Consider the graph:
> > >
> > >  _:f <rdf:type> <film> .
> > >  _:f <dc:Title> "10" .
> > >  <mary> <age> "10" .
> > >
> > > Given a query:
> > >
> > >  (?x <dc:Title> ?y) & (?z <age> ?y)
> > >
> > > existing applications will return:
> > >
> > >  ?x = _:f, ?y = "10", ?z = <mary>
> > >
> > > Under TDL, they would return null.
> > >
> > > Sergey:  Does this version of the issue illustrate your point?
> > >
> > > Jeremy/Patrick:  Do you accept this analysis; would the query return null
> > > under TDL?
> >
> >I've provided a response to this in
> >
> >http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0365.html
> 
> My reading of your response is that you agree that under TDL, the query
> would return null.  Is that correct?
> 
> > > Issue B5:  Storage Requirements
> > > ===============================
> > >
> > > TDL requires significantly more storage to implement.
> > >
> > > Jeremy/Patrick:  do you accept this statement?
> >
> >No. There are many ways to optimize the implementation of
> >a triples store, even if literal nodes are untidy. It is
> >an issue of the interpretation/model being based on untidy
> >graphs, not the implementation.
> 
> Noted.  Sergey, do you accept that strings can be shared in a TDL
> implementation, avoiding a significant memory bloat, and that this issue
> can be removed from the list of 'show stoppers'?

Of course, an efficient implementation would utilize some kind of string
sharing, both for tidy and untidy graphs. A major problem that I pointed
out in [1] still persists: each literal gets a separate identifier in
the database. Consequently, the number of object IDs used in the
database increases by a significant factor. The result is that the
indexes used for query processing grow substantially, and larger indexes
mean slower queries. Furthermore, the database engine has to evaluate an
additional join condition (no matter how storage is implemented)
whenever literals are accessed. For example, comparing two literal
property values requires two extra joins.
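
As a rough illustration of the ID and join argument above, here is a
minimal sketch. It is not taken from [1]; the function names, the
in-memory tables, and the third sample triple are assumptions made up
for the example. It assigns object IDs to literals in a tidy and an
untidy fashion, and shows that comparing two untidy literals needs one
lookup per literal.

```python
from itertools import count


def assign_object_ids(triples, tidy):
    """Return (rows, literal_table) with integer IDs for literal objects.

    tidy=True:  equal literal strings share one node ID.
    tidy=False: each literal occurrence gets its own node ID
                (the string itself could still be stored only once).
    """
    next_id = count(1)
    shared = {}         # literal string -> node ID (tidy case only)
    literal_table = {}  # node ID -> literal string
    rows = []
    for subj, pred, lit in triples:
        if tidy and lit in shared:
            oid = shared[lit]
        else:
            oid = next(next_id)
            literal_table[oid] = lit
            if tidy:
                shared[lit] = oid
        rows.append((subj, pred, oid))
    return rows, literal_table


def same_literal(oid_a, oid_b, literal_table):
    """Compare two literal nodes by string: one lookup per node.

    In a relational store these two lookups would be the two extra joins.
    """
    return literal_table[oid_a] == literal_table[oid_b]


# Hypothetical sample data; "john" is not part of the original example.
TRIPLES = [
    ("_:f", "dc:Title", "10"),
    ("mary", "age", "10"),
    ("john", "age", "10"),
]

tidy_rows, tidy_lits = assign_object_ids(TRIPLES, tidy=True)
untidy_rows, untidy_lits = assign_object_ids(TRIPLES, tidy=False)

print(len(tidy_lits), len(untidy_lits))  # prints: 1 3
```

In the tidy case, equality of two literal objects is just equality of
their IDs in the triple rows; in the untidy case the IDs differ, so the
comparison has to go through the literal table.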

Moreover, as pointed out by Libby [2] and Andy [3], queries would need
to take schema information into account, which makes most queries
computationally prohibitive and very hard to specify (and they would
actually return empty results if portions of the schema information are
missing). From my perspective, the performance issue remains on the
table as one of the show stoppers for TDL (of course, it's a non-issue
for toy data sets, but may make million-triple data sets impractical).
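
To make the schema dependence concrete, here is a toy model of the B4
query evaluated over untidy literals. It is only a sketch under stated
assumptions, not TDL's formal semantics: the `ranges` dictionary stands
in for rdfs:range statements, `PARSE` is a made-up lexical-to-value
mapping, and two literals are treated as equal only if both datatypes
are known and the mapped values coincide.

```python
def title_age_matches(rows, ranges, parse):
    """Evaluate (?x dc:Title ?y) & (?z age ?y) over untidy literals.

    rows:   (subject, predicate, literal string) triples
    ranges: predicate -> datatype name (stand-in for rdfs:range facts)
    parse:  datatype name -> function mapping a string to a value
    """
    def value(pred, lit):
        dtype = ranges.get(pred)
        if dtype is None:
            return None  # no schema information: denotation unknown
        return (dtype, parse[dtype](lit))

    matches = []
    for x, p1, y1 in rows:
        if p1 != "dc:Title":
            continue
        for z, p2, y2 in rows:
            if p2 != "age":
                continue
            v1, v2 = value(p1, y1), value(p2, y2)
            if v1 is not None and v1 == v2:
                matches.append((x, y1, z))
    return matches


ROWS = [("_:f", "dc:Title", "10"), ("mary", "age", "10")]
PARSE = {"xsd:string": str, "xsd:integer": int}  # hypothetical mapping

no_schema = title_age_matches(ROWS, {}, PARSE)
both_strings = title_age_matches(
    ROWS, {"dc:Title": "xsd:string", "age": "xsd:string"}, PARSE)
mixed_types = title_age_matches(
    ROWS, {"dc:Title": "xsd:string", "age": "xsd:integer"}, PARSE)

print(no_schema)     # prints: []
print(both_strings)  # prints: [('_:f', '10', 'mary')]
print(mixed_types)   # prints: []
```

Under this model the query only matches when range information is
present for both properties and agrees, and every comparison has to
consult the schema first.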

Sergey


[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0293.html
[2] http://lists.w3.org/Archives/Public/www-rdf-comments/2002JanMar/0057.html
[3] http://lists.w3.org/Archives/Public/www-rdf-comments/2002JanMar/0058.html

> 
> Brian

-- 
E-Mail:      melnik@db.stanford.edu (Sergey Melnik)
WWW:         http://www-db.stanford.edu/~melnik
Tel:         OFFICE: 1-650-725-4312 (USA)
Address:     Room 438, Gates, Stanford University, CA 94305, USA
Received on Wednesday, 30 January 2002 15:16:29 UTC