Re: Datatyping Summary from Patrick Stickler on 2002-01-31 (w3c-rdfcore-wg@w3.org from January 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Thu, 31 Jan 2002 11:49:55 +0200
To: ext Sergey Melnik <melnik@db.stanford.edu>, Brian McBride <bwm@hplb.hpl.hp.com>
CC: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B87EDFE3.CCEE%patrick.stickler@nokia.com>

On 2002-01-30 22:33, "ext Sergey Melnik" <melnik@db.stanford.edu> wrote:

> From my perspective, the performance issue remains on the table as one of
> the show stoppers for TDL (of course, it's a non-issue for toy data
> sets, but may make million-triples data sets impractical).

Let me offer just one possible implementation approach for TDL
that is highly efficient:

Transform the untidy TDL graph into a graph where every TDL pairing
is expressed by a 'tdl:' URV. For literals that have no determinable
type, use a consistent 'uuid:' URI for the datatype in the 'tdl:'
URV.

Cf. http://ietf.org/internet-drafts/draft-pstickler-tdl-00.txt

E.g.

   OldTree ex:age _:1 .
   _:1 rdf:value "1984" .
   _:1 rdf:type xsd:integer .
   FavoriteBook dc:title "1984" .
   dc:title rdfs:range xsd:string .

becomes

   OldTree ex:age <tdl:(xsd:integer)1984> .
   FavoriteBook dc:title <tdl:(xsd:string)1984> .

[please forgive qnames in tdl: examples, they should
 be complete URIs...]

For each input query, perform the same transformation.
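
A minimal sketch of that transformation, for illustration only. It
assumes triples are plain (subject, predicate, object) string tuples,
that literals keep their double quotes as in the example, and that
rdfs:range triples are dropped once consumed (as in the example). The
names to_tdl, tdl_urv, is_literal and the UNKNOWN_TYPE placeholder are
hypothetical and are not taken from the TDL draft.

```python
# Sketch only. Triples are (subject, predicate, object) string tuples;
# plain literals keep their double quotes, as in the example above.

# Hypothetical placeholder: one fixed 'uuid:' URI meaning "type unknown".
UNKNOWN_TYPE = "uuid:00000000-0000-0000-0000-000000000000"


def is_literal(term):
    """True for terms written as "..." (a plain literal)."""
    return len(term) >= 2 and term[0] == '"' and term[-1] == '"'


def tdl_urv(datatype, lexical):
    """Build a 'tdl:' URV in the notation used in this message."""
    return "<tdl:(%s)%s>" % (datatype, lexical)


def to_tdl(triples):
    """Collapse both datatyping idioms into 'tdl:' URV object nodes."""
    # Range idiom: property -> datatype, taken from rdfs:range.
    ranges = {s: o for s, p, o in triples if p == "rdfs:range"}
    # Node idiom: node -> lexical form (rdf:value), node -> type.
    values = {s: o[1:-1] for s, p, o in triples
              if p == "rdf:value" and is_literal(o)}
    types = {s: o for s, p, o in triples if p == "rdf:type"}

    tidy = []
    for s, p, o in triples:
        if p == "rdfs:range":
            continue  # consumed above; dropped, as in the example
        if s in values and p in ("rdf:value", "rdf:type"):
            continue  # folded into the URV of the referring triple
        if o in values:
            o = tdl_urv(types.get(o, UNKNOWN_TYPE), values[o])
        elif is_literal(o):
            o = tdl_urv(ranges.get(p, UNKNOWN_TYPE), o[1:-1])
        tidy.append((s, p, o))
    return tidy


example = [
    ("OldTree", "ex:age", "_:1"),
    ("_:1", "rdf:value", '"1984"'),
    ("_:1", "rdf:type", "xsd:integer"),
    ("FavoriteBook", "dc:title", '"1984"'),
    ("dc:title", "rdfs:range", "xsd:string"),
]
for triple in to_tdl(example):
    print(" ".join(triple), ".")
```

The same function would be applied to a query graph before matching.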

Now you have a query graph that is highly compressed, completely
tidy for both lexical and URIref nodes, and provides value-based
matching reliably and efficiently.
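
To illustrate why matching on the tidy graph is cheap, here is a rough
sketch that indexes the two transformed triples from the example by
(predicate, object). Because each typed literal is a single URV
string, a lookup is one dictionary access, and the datatype takes part
in the comparison. The helper names build_index and subjects are
hypothetical.

```python
from collections import defaultdict

# Tidy graph from the example above (already transformed).
graph = [
    ("OldTree", "ex:age", "<tdl:(xsd:integer)1984>"),
    ("FavoriteBook", "dc:title", "<tdl:(xsd:string)1984>"),
]


def build_index(triples):
    """Index subjects by (predicate, object) for direct lookup."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index


def subjects(index, predicate, obj):
    """Subjects that have the given predicate and 'tdl:' URV object."""
    return index.get((predicate, obj), set())


index = build_index(graph)

# The integer 1984 and the string "1984" are distinct nodes, so a
# query for one does not match the other.
print(subjects(index, "ex:age", "<tdl:(xsd:integer)1984>"))
print(subjects(index, "ex:age", "<tdl:(xsd:string)1984>"))
```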

Note that, because typed data literals are now denoted by URIs, all
equivalent TDLs will be merged into a single node. That doesn't mean
that there will be a 1:1 correspondence between tdl: URVs and values,
since lexical forms can still be non-canonical -- but such a
1:1 correspondence could be achieved by converting, as part of the
transformation, all non-canonical lexical forms to canonical lexical
forms (presuming all datatypes are known/supported by the transforming
application).
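
The canonicalization step might look roughly like the following. This
is a sketch under stated assumptions: only xsd:integer and xsd:boolean
are handled, datatypes are written as qnames as in the examples, and
the function names canonical_lexical and canonical_urv are
hypothetical. Lexical forms of unknown datatypes are left untouched.

```python
import re


def canonical_lexical(datatype, lexical):
    """Return a canonical lexical form for the datatypes handled here."""
    if datatype == "xsd:integer":
        # Validate first: int() alone would also accept whitespace etc.
        if not re.fullmatch(r"[+-]?[0-9]+", lexical):
            raise ValueError("not an xsd:integer: %r" % lexical)
        return str(int(lexical))  # drops '+', leading zeros and '-0'
    if datatype == "xsd:boolean":
        if lexical in ("true", "1"):
            return "true"
        if lexical in ("false", "0"):
            return "false"
        raise ValueError("not an xsd:boolean: %r" % lexical)
    return lexical  # unknown datatype: leave the lexical form alone


def canonical_urv(datatype, lexical):
    """Build a 'tdl:' URV whose lexical form is canonical if possible."""
    return "<tdl:(%s)%s>" % (datatype, canonical_lexical(datatype, lexical))


# "+01984" and "1984" denote the same integer, so after this step
# they share one URV and therefore one node.
print(canonical_urv("xsd:integer", "+01984"))
print(canonical_urv("xsd:integer", "1984"))
```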

If you also wish to support queries based purely on string
equality of literals, then just examine the local, literal
portion of the 'tdl:' URVs during comparisons.
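
For example, a string-equality comparison could be sketched as below.
It assumes the '<tdl:(datatype)lexical>' notation used in this message
and that the datatype part contains no ')' character, which holds for
the qnames used here but is an assumption in general. The names
split_urv and same_string are hypothetical.

```python
def split_urv(urv):
    """Split '<tdl:(datatype)lexical>' into (datatype, lexical)."""
    prefix = "<tdl:("
    if not (urv.startswith(prefix) and urv.endswith(">")):
        raise ValueError("not a tdl: URV: %r" % urv)
    # Split at the first ')'; the lexical part may itself contain ')'.
    datatype, sep, lexical = urv[len(prefix):-1].partition(")")
    if not sep:
        raise ValueError("missing ')' in tdl: URV: %r" % urv)
    return datatype, lexical


def same_string(urv_a, urv_b):
    """Compare only the literal portions, ignoring the datatypes."""
    return split_urv(urv_a)[1] == split_urv(urv_b)[1]


# Different nodes in the tidy graph, but equal as plain strings.
print(same_string("<tdl:(xsd:integer)1984>", "<tdl:(xsd:string)1984>"))
```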

No bloat. No increased inefficiency. No problem.

As has been said before, there are ways to achieve highly
efficient implementations based on TDL.

Cheers,

Patrick

--

Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Thursday, 31 January 2002 04:48:50 UTC