Re: Trust, Context, Justification and Quintuples

I have developed and worked with small to medium-sized triple-based
databases since before RDF became a recommendation, and along the way
learned a number of practical details.  I have spent the last few weeks
putting them into a publishable format, but more on that later, when I
have a prototype ready.

What I found is that, if your goal is a database of accurate and useful
statements, it is critical to record two additional items with each
triple, and useful to record two others.  The first critical item is the
datatype (finally now part of RDF) or the language tag for the object of
the statement.  For the most part these two can be considered mutually
exclusive, so a single item covers both.
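
As a minimal sketch of that single item (the class and field names here are
hypothetical, not taken from any existing model), one slot can hold either a
datatype URI or a language tag.  It assumes a datatype is always an absolute
URI, so it contains a colon, while a language tag such as "en-US" never does:

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class TaggedObject:
    """Object of a statement plus one slot for datatype URI or language tag."""
    lexical: str
    tag: str = ""  # empty string: plain literal with neither

    def is_typed(self) -> bool:
        # Assumption: a datatype is an absolute URI (contains ':'),
        # while a language tag such as "en-US" never does.
        return ":" in self.tag

    def language(self) -> str:
        # The slot is read as a language tag only when it is not a datatype.
        return "" if self.is_typed() else self.tag


count = TaggedObject("42", XSD_INTEGER)
greeting = TaggedObject("merry", "en")
```

Here count.is_typed() is true and greeting.language() gives "en"; the colon
test is only a convenience for the sketch, not a rule from RDF itself.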

The second critical item to record about every statement is the source of
the statement, a URI pointing to where the statement came from.
Originally I even called this item the "context" of the statement, as
you do, but when standardizing the model I decided the formally defined
Dublin Core concept of "source" was appropriate.  When the statement is
original work not derived from some other resource, the source item
identifies the author (also by URI).

Although I originally assumed a statement ID would be necessary, it turns
out it is not, though it can be convenient to assign one in some limited
circumstances.  Specifically, if you need to formally track the creation,
editing, and deletion of statements, say in co-authoring, author/editor,
or draft/comment/revise/approve environments, statement IDs can be
helpful.  Otherwise, just keep in mind each triple is essentially its own
identifier.  Although it can be said many times and in many ways,
 { person:you ; holiday:Xmas ; funlevel:merry } 
is a unique sentiment.  Yes, it may be useful to specify all who made this
statement, and when and under what conditions this was asserted, but the
whole purpose of RDF is to unambiguously encode assertions.  (My apologies
for the example triple; I just copied it out of my recent notes, and it
uses XML namespace prefixes for URI abbreviation.)  Also, the dc:source
recorded for the triple may already point to a resource containing the
information you would otherwise cross-reference to the statement
identifier.
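
One way to picture a triple acting as its own identifier is a store keyed on
the triple itself, with the sources collected under that key.  This is only a
sketch under assumed names: the function, the dictionary layout, and the
example.org source URIs are all hypothetical.

```python
from collections import defaultdict

# Maps (subject, predicate, object) -> set of source URIs.
# The triple is the key, so no separate statement ID is needed.
statements = defaultdict(set)


def assert_statement(subject, predicate, obj, source):
    """Record that `source` asserted the given triple."""
    statements[(subject, predicate, obj)].add(source)


triple = ("person:you", "holiday:Xmas", "funlevel:merry")

# The same sentiment said twice, by two hypothetical sources.
assert_statement(*triple, source="http://example.org/card-from-alice")
assert_statement(*triple, source="http://example.org/card-from-bob")
```

After both calls the store still holds one statement, now with two sources
recorded against it.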

For myself, I am currently prototyping a draft/comment/revise/approve system
for RDF triples and will add a statement identifier to the model as soon
as it becomes necessary for the prototype to work well.  So far it doesn't
need it, but I have just started developing the editing functions.

The other two items I have found useful to record for each triple,
primarily for administrative purposes, are the dc:creator of the triple
itself, and the triple's date and time of creation (dcterms:created).  At
times it seemed a good idea to include other information, such as a
primitive ordinal item (RDF sorting is not fun at the triple level) and
other details, but ultimately it is these four additions that have proven
their value.
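
Put together, a record carrying the triple plus the four additional items
might look like the sketch below.  The class name, field names, and
example.org URIs are hypothetical; only the Dublin Core terms named in the
comments come from the description above.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    object_tag: str  # datatype URI or language tag; "" when the object is a resource
    source: str      # dc:source: where the statement came from, or its author
    creator: str     # dc:creator of the triple itself
    created: str     # dcterms:created, as a W3CDTF literal

    def triple(self):
        """Return the bare (subject, predicate, object) part."""
        return self[:3]


stmt = Statement(
    "person:you", "holiday:Xmas", "funlevel:merry",
    "",
    "http://example.org/notes",           # hypothetical source URI
    "http://example.org/people/author",   # hypothetical creator URI
    "2003-12-30T09:43:48+00:00",
)
```

Keeping every field a plain string matches the idea of using one format for
everything recorded, though a real model may well want richer types.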

Oh, I also found it very, very helpful to use the same format for all
resources recorded, including these four additional items and, if used,
the statement identifier.  (dcterms:created is a literal in
dcterms:W3CDTF date/time format.)
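
For the dcterms:created literal, a small sketch of producing a W3CDTF value
with the Python standard library follows.  The helper name w3cdtf is
hypothetical; the timestamp is just an example value.

```python
from datetime import datetime, timezone


def w3cdtf(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssTZD."""
    if moment.tzinfo is None:
        # W3CDTF requires a time zone designator whenever a time is given.
        raise ValueError("datetime must be timezone-aware")
    return moment.isoformat(timespec="seconds")


created = w3cdtf(datetime(2003, 12, 30, 9, 43, 48, tzinfo=timezone.utc))
# created == "2003-12-30T09:43:48+00:00"
```

The "+00:00" offset is one of the two designator forms W3CDTF allows for UTC,
the other being "Z".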

-David E. Wagner II

Chris Bizer asked for feedback about:
...
> we did some brainstorming about trust, context and the justification of
> query results and ended up with:
> - an extended RDF data model based on quintuples (a triple plus two
> additional elements: context and statement ID).
> - a trust-oriented query language for this data model
> - the concept of justification trees for tracking data provenance and
> data
> lineage.
...



Received on Tuesday, 30 December 2003 09:43:48 UTC