- From: Waggy <waggy@yahoo.com>
- Date: Mon, 29 Dec 2003 18:19:55 -0800 (PST)
- To: www-rdf-interest@w3.org
- Cc: Chris Bizer <chris@bizer.de>
I have developed and worked with small to medium-sized triple-based databases since before RDF became a recommendation, and along the way learned a number of practical details. I have been putting them into a publishable format for the last few weeks, but more on that later when I have a prototype ready.

What I found is that if your goal is a database of accurate and useful statements, it is critical to record two additional items with each triple, and useful to have two others.

The first critical item is the datatype (finally now part of RDF) or language tag for the object of the statement. For the most part these two can be considered mutually exclusive, so one item covers both.

The second critical item is the source of the statement: a URI pointing to where the statement came from. Originally I even called this item the "context" of the statement, as you do, but when standardizing the model I decided the formally defined Dublin Core concept of "source" was appropriate. When the statement is original work not derived from some other resource, the source item identifies the author (also by URI).

Although I originally assumed a statement ID would be necessary, it turns out it is not, though it can be convenient to assign one in some limited circumstances. Specifically, if you need to formally track the creation, editing, and deletion of statements, say in co-authoring, author/editor, or draft/comment/revise/approve environments, statement IDs can be helpful. Otherwise, just keep in mind that each triple is essentially its own identifier. Although it can be said many times and in many ways, { person:you ; holiday:Xmas ; funlevel:merry } is a unique sentiment. Yes, it may be useful to specify all who made this statement, and when and under what conditions it was asserted, but the whole purpose of RDF is to unambiguously encode assertions.
(My apologies for the last sentence, I just copied it out of my recent notes; it uses XML namespaces for URI abbreviation.) And the dc:source recorded for the triple may already point to a resource containing the information you would otherwise cross-reference to the statement identifier.

For myself, I am currently prototyping a draft/comment/revise/approve system for RDF triples and will add a statement identifier to the model as soon as it becomes necessary for the prototype to work well. So far it doesn't need one, but I have only just started developing the editing functions.

The other two items I have found useful to record for each triple, primarily for administrative purposes, are the dc:creator of the triple itself and the triple's date and time of creation (dcterms:created). At times it seemed a good idea to include other information, such as a primitive ordinal item (RDF sorting is not fun at the triple level) and other details, but ultimately these four additions have proven their value.

Oh, I also found it very, very helpful to use the same format for all resources recorded, including these four additional items and, if used, the statement identifier. (dcterms:created is a literal in dcterms:W3CDTF date/time format.)

-David E. Wagner II

Chris Bizer asked for feedback about:
...
> we did some brainstorming about trust, context and the justification of
> query results and ended up with:
> - an extended RDF data model based on quintuples (a triple plus two
>   additional elements: context and statement ID).
> - a trust-oriented query language for this data model
> - the concept of justification trees for tracking data provenance and
>   data lineage.
...
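
As a rough illustration of the record described in this message (a triple plus datatype or language tag, dc:source, dc:creator, dcterms:created, and an optional statement ID), here is a minimal Python sketch. The class name, field names, helper function, and example.org URIs are all hypothetical and chosen only for illustration; the message does not specify any storage format or implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    # The triple itself.
    subject: str
    predicate: str
    obj: str
    # Critical item 1: datatype URI or language tag for the object
    # (treated as mutually exclusive, so one field covers both).
    datatype_or_lang: str
    # Critical item 2: dc:source, a URI for where the statement came from
    # (or the author's URI when the statement is original work).
    source: str
    # Administrative items: dc:creator and dcterms:created (W3CDTF string).
    creator: str
    created: str
    # Optional; only useful when edits and deletions are formally tracked.
    statement_id: Optional[str] = None

    def key(self) -> Tuple[str, str, str]:
        """The triple acts as its own identifier."""
        return (self.subject, self.predicate, self.obj)


def w3cdtf_now() -> str:
    """Current UTC time in one of the W3CDTF forms (YYYY-MM-DDThh:mm:ssZ)."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical data: the same triple asserted by two different sources.
a = Statement(
    subject="http://example.org/person/you",
    predicate="http://example.org/holiday/funlevel",
    obj="merry",
    datatype_or_lang="en",
    source="http://example.org/docs/card-1",
    creator="http://example.org/people/alice",
    created=w3cdtf_now(),
)
b = Statement(
    subject=a.subject,
    predicate=a.predicate,
    obj=a.obj,
    datatype_or_lang="en",
    source="http://example.org/docs/card-2",
    creator="http://example.org/people/bob",
    created=w3cdtf_now(),
)

# Group records by triple: one assertion, several provenance records.
by_triple = {}
for s in (a, b):
    by_triple.setdefault(s.key(), []).append(s)
```

In this sketch the two records share a single key, since the triple is the identifier, while the source and creator fields keep track of who said it and where.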
Received on Tuesday, 30 December 2003 09:43:48 UTC