RDF history

(subject rejigged, cc: shrunk to the SemWeb list only)

On Thu, Jul 1, 2010 at 8:53 AM, Pat Hayes <phayes@ihmc.us> wrote:
(harry writes...)
>> The issue is here that RDF started as a metadata format to "describe"
>> data I believe, and at this point with the Linked Data is now being
>> transformed into a generic language *for* data, period.

Not really, no. It was always a general representation language,
albeit created to fill a gap in the metadata universe.

See http://www.w3.org/TR/NOTE-MCF-XML-970624/#sec1.
"""We believe the following principles to be central to making progress
in this area:

There is no useful distinction between the representational needs of
data and metadata. The kinds of information that need to be
represented in metadata and data are very similar. Furthermore, every
item of information, without exception, is likely to be regarded by
some applications as ancillary and never to be displayed, and by
others as core content that needs to be formatted, printed, or
For interoperability and efficiency, schemata designed to serve
different applications should share as much as possible in the way of
data structures, syntax, and vocabulary.
The consequence of the first principle is that it is simply incorrect
to reserve any special representation for use just in "metadata"."""


"The distinction between "data" and "metadata" is not an absolute one;
it is a distinction created primarily by a particular application, and
many times the same resource will be interpreted in both ways

(both documents are worth reading in full; some very clear points,
some murkiness, but both help explain how we got here...)

> Is this really the case? I wasn't part of the very first RDF initiative, but
> ever since I've been involved with it, its purpose was pretty explicitly
> supposed to be for representing information - call it data if you like -
> rather than anything "meta". I've never read anything that suggest that RDF
> is supposed to be describing data. It is supposed to be describing the
> world.

Yes. The RDF design was always about describing the world. In earlier
years this was couched (confusingly) as metadata, since initial use
cases were descriptions typically relating to some piece of Web
content. But even then it was clear we were describing worldly
entities (people, contact info, places) as part of the task of
describing documents. The terminology in the oldest RDF docs, about
'statements', makes this much clear.

Different efforts that came together to create RDF used different
terminology and thinking. And even in one contribution, Guha's MCF
(RDF's most direct ancestor, technically), you see a couple of
perspectives presented, and then synthesised.


"The goal of MCF is to abstract and standardize the representation of
the structures we use for organizing information."

"MCF is based on predicate logic and is hence very close to both
object oriented and relational databases"
"a set of n-tuples(typically triples), each consisting of a slot and
an ordered list of n-1 object references and a layer. These tuples are
called assertions. Each assertion also has a true/false value
associated with it. Assertions are said to be true/false in the layer
associated with them. An assertion that is true/false in a layer is
also true/false in all the superior layers, unless one of those also
contains the assertion with a different true/false value."
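
(To make that layered model concrete, here is a rough sketch of my own
in Python; it is not taken from the MCF documents. The names Layer,
assert_, truth and the "inResponseTo" slot are made up for
illustration. I also assume that when several lower layers disagree,
the first one found wins, which the quoted text does not specify.)

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Layer:
    """A hypothetical MCF-style layer holding assertions with truth values."""
    name: str
    # Layers that this layer is superior to (assumption: a simple list).
    inferiors: List["Layer"] = field(default_factory=list)
    # Maps (slot, obj1, ..., objN) to the true/false value in this layer.
    assertions: Dict[Tuple[str, ...], bool] = field(default_factory=dict)

    def assert_(self, slot: str, *objects: str, value: bool = True) -> None:
        """Record an n-tuple (slot plus object references) in this layer."""
        self.assertions[(slot, *objects)] = value

    def truth(self, slot: str, *objects: str) -> Optional[bool]:
        """Return the truth value here, inheriting from inferior layers
        unless this layer states the assertion itself."""
        key = (slot, *objects)
        if key in self.assertions:
            return self.assertions[key]
        for lower in self.inferiors:
            inherited = lower.truth(slot, *objects)
            if inherited is not None:
                return inherited
        return None  # unknown in this layer and everything below it


base = Layer("base")
site = Layer("site", inferiors=[base])

# A triple: the memo was written in response to an email message.
base.assert_("inResponseTo", "memo.txt", "msg-42")
inherited = site.truth("inResponseTo", "memo.txt", "msg-42")

# The superior layer contains the same assertion with a different value.
site.assert_("inResponseTo", "memo.txt", "msg-42", value=False)
overridden = site.truth("inResponseTo", "memo.txt", "msg-42")
```

(Here the value looked up in "site" is first inherited from "base",
and then replaced once "site" states the assertion itself; "base" is
unaffected either way.)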

These are drawn together though:

"The central concept is the use of rich, standard, structured,
extensible, compositable descriptions of information organization
structures as the core of information management systems."
"Structures such as file systems record very little information. They
typically allow a tree structure together with a small number of
attributes such as the author, creation date, modification date and
size. We cannot for example, tell the machine that a certain file is a
memo written in response to a certain email message. Furthermore, the
machine's ontology (the set of things it knows about) is severely
restricted. Even though much of the content is about people,
organizations, places, etc. the machine only knows about files,
folders, messages and such. MCF allows for semantically rich
descriptions of content and its relationship to objects such as
people, organizations and events."
"This is an extremely important point. In order to adequately describe
information organization structures, the system needs to be able to
express/reference more than just machine internal objects such as
files, folders and email messages. Entities such as people,
organizations and projects need to be first class citizens as well and
MCF provides for this."

This puts it as well as anyone has done since, imho. Information about
information is very often information about the associated real world
entities, and so a closed approach to metadata excludes us from
representing useful, interesting facts that will help us make better
use of that info.

Nothing much has changed on that front since 1996 (or 1886 for that
matter http://www.udcc.org/about.htm ...)



ps. as I'm about to send this, I see Sandro dug up a related MCF
quote yesterday. I'll send this as the earlier white paper has a bunch
more detail that didn't make it thru to the Netscape submission...

Received on Friday, 2 July 2010 09:32:59 UTC