Terminology for RDF Statement Sets

There's a technique in object-oriented design where you listen to all
the different words people are using and then turn those words into
class names.  In the RDF community, there seem to be a small number of
concepts for which a large number of terms are used.  I'm going to
try to list the ones I've heard, say what I think they mean, and
propose that this go on the RDF Issues List.

As background, there are also various other terms in use for "RDF
statement".  I've heard (and used) "statement", "assertion", "triple",
"3-tuple", "tuple", "sentence", and "property statement", at least.
But I think "RDF Statement" is okay for the formal documents and for
this message.

The area I'm concerned about is sets (in the mathematical, set-theory
sense) of RDF statements.  Let me list some of the terms I've heard,
and see if I can organize them.

(set itself)
  statement set
  graph
  subgraph
  model 
  theory   (a set of theorems; RDF statements as simple theorems)
  infoset   (an RDF infoset, not an XML infoset)
  dataset
  corpus (a body of knowledge; term I coined some years back)
  world
  universe
  description
  semantic content  ("foo is in the semantic content of document bar")
  knowledge base

(set storage)
  triple store
  repository
  database  (or set itself; ambiguous)

(set encoding)
  context (in n3)
  logical formula 
  document   ("does RDF document foo include RDF statement bar?")
  text     (like document)

(set source)
  attribution 
  provenance 

(The term "model" deserves a special disambiguation: "*The* RDF Model"
is the architecture, technique, or method of building things we use in
the RDF community.  "*An* RDF Model" is a representation of some
knowledge as a collection of RDF statements (made according to *the* RDF
Model).  I would suggest "architecture" for the former sense, and the
latter sense is the subject of this message.)


 * "RDF" or "RDF Statement" Specializations

Some of these terms are well understood in some field, and we just
want a specialization.  We can prepend "RDF" to make our usage
precise if the context does not do so.  Terms like "RDF statement set"
or "RDF infoset" or "RDF statement repository" work this way.

Many of these terms are defined in the appropriate sense only in some
fairly narrow field or context.  For example, you need just the right
setting to have the phrase "an RDF theory" understood to mean a set of
RDF statements.

 * Confusing Information with its Identification

We sometimes conflate a set with the attributes of the set we use to
identify it, such as where it is stored and where we got it from.
Contrast terms for the information itself ("dataset"), the place it
exists ("repository"), the thing representing or encoding it
("document"), or the source of the information ("provenance").
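
To make the contrast concrete, here is a rough sketch in Python (my
choice of language for illustration).  The names Document, parse, and
Located are hypothetical, and the one-triple-per-line syntax is a toy
assumption, not a real RDF serialization.  It shows two different
documents encoding the same statement set:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """An encoding of a statement set (hypothetical toy type)."""
    text: str


def parse(doc: Document) -> frozenset:
    """Toy parser: one whitespace-separated triple per non-blank line.

    This line format is an assumption made for the sketch only.
    """
    return frozenset(
        tuple(line.split())
        for line in doc.text.splitlines()
        if line.strip()
    )


@dataclass(frozen=True)
class Located:
    """Keeps the set apart from the attributes used to identify it."""
    dataset: frozenset   # the information itself
    repository: str      # the place it exists
    provenance: str      # where we got it from


# Two distinct encodings: different order, different whitespace.
a = Document("ex:doc ex:author ex:sandro\nex:doc ex:topic ex:rdf\n")
b = Document("ex:doc ex:topic ex:rdf\n\nex:doc ex:author ex:sandro\n")

documents_differ = a != b
datasets_equal = parse(a) == parse(b)

# The same dataset held in two places is still one dataset.
x = Located(parse(a), "store-1", "mailing list")
y = Located(parse(b), "store-2", "mailing list")
records_differ = x != y
same_information = x.dataset == y.dataset
```

Under these assumptions the documents and the located records compare
unequal while the datasets compare equal, which is the distinction the
four groups of terms are trying to draw.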

Quite a bit could be said about this kind of confusion.  In common
usage, the term "database" is used for both a collection of data and
for a database management system (a running process, or the software).
Think of all the ways one might answer "What database did you use?" in
different situations.

This distinction is intentionally ignored in most programming systems.
In C, an "int" is a C object (an area of memory) which represents an
integer.  It is not actually an integer itself, of course.  In C this
is rarely a problem.

For us, though, it may be more pernicious.  In set theory, sets are
immutable.  But we programmers are used to Set.add(element) and
Set.remove(element) because we conflate mathematical sets with the
data structures which can be used to store information about set
membership.  To me, every term on the above list could be used in a
mutable sense, because I have the programmer's habit of naming data
structures (mutable or not) after the objects about which they store
data.  So what term can I use to unambiguously denote the
mathematically pure, immutable kind of set of RDF statements?
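
A minimal sketch of the two senses, in Python (chosen only for
illustration; Statement, pure, and store are hypothetical names).  A
frozenset stands in for the mathematical set, and an ordinary set for
the data structure that records membership:

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement as a (subject, predicate, object) triple."""
    subject: str
    predicate: str
    object: str


s1 = Statement("ex:doc", "ex:author", "ex:sandro")
s2 = Statement("ex:doc", "ex:topic", "ex:rdf")

# Immutable sense: a value.  "Adding" a statement yields a different set
# and leaves the original untouched.
pure = frozenset({s1})
bigger = pure | {s2}

# Mutable sense: a data structure whose membership changes over time.
store = {s1}
store.add(s2)

# Both now describe the same membership, but only one of them changed.
same_members = bigger == store
pure_unchanged = pure == frozenset({s1})
has_add = hasattr(pure, "add")   # frozenset offers no add() or remove()
```

The point of the sketch is only that the language has two names for the
two senses; the list of terms above does not.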

    -- sandro

Received on Monday, 9 April 2001 19:24:16 UTC