Re: datatypes message - draft 2 from Dan Connolly on 2002-06-28 (w3c-rdfcore-wg@w3.org from June 2002)

From: Dan Connolly <connolly@w3.org>
Date: 28 Jun 2002 01:09:25 -0500
To: pat hayes <phayes@ai.uwf.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <1025244566.22220.169.camel@dirk>

On Fri, 2002-06-28 at 00:36, pat hayes wrote:
> >On Thu, 2002-06-27 at 01:40, Patrick Stickler wrote:
> >>
> >>
> >>  On 2002-06-26 20:38, "ext Brian McBride" <bwm@hplb.hpl.hp.com> wrote:
> >>
> >>
> >>  In the introduction, it would be good, I think, to add a comment
> >>  that the tidy/untidy issue has little to do with implementational
> >>  efficiency of triples stores,
> >
> >not true; see comments from Sergey a long time ago about
> >the cost of untidy literals in RDF/database systems.
> >
> >Sorry I don't have time to find it.
> 
> I recall. But Sergey was being alarmist about a worst-possible case 
> where every triple contained a literal

ok, density of literals is perhaps not that high, though
they're quite common, but...

> and every literal might or 
> might not be the same as any other.

... that's the definition of untidy literals, no?

You can't tell whether two literals denote the same thing,
so you have to keep them separate until you know more.

> That will never happen.


> >The swap/cwm implementation would suffer a significant
> >efficiency hit if literals weren't tidy.
> 
> Im not convinced of this. Sure, the code would get a bit more 
> complicated, but I bet that any efficiency cost could be kept 
> marginal, way below a linear factor in practice. If the graphs are 
> encoded in the extended Ntriples convention then there should be no 
> extra cost at all: literal nodes act just like bnodes, uniquely 
> specified by their nodeIDs.

???

That looks like tidy literals, to me; i.e. if literals
work like bnodes, then just as

	:Mary :age _:x.
	:movie :title _x.

entails

	:Mary :age _:y.
	:movie :title _y.

we would have

	:Mary :age "10".
	:movie :title "10".

entials

	:Mary :age _:y.
	:movie :title _y.

If literals aren't tidy, then we have to keep the two
occurences of "10" from matching each other.

We end up with nothing in our language that work
like integer numerals:
	x=y => s(x) = s(y)
and
	0 <> s(x)
i.e. a (countably) infinte supply if expressions where
     same expression => same denotation,
different expression => different denotation


> If they are encoded in RDF/XML then the 
> only cases which arise will be those that we introduce deliberately 
> to handle some XML idioms, which can be recognized during 
> round-tripping one triple at a time by looking at the property 
> urirefs. That is just a few extra operations per triple, at worst.

I don't follow that part.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Friday, 28 June 2002 02:08:55 UTC