Re: PRIMER: draft data model section

>Attached is a draft data model section.  (Thanks to Aaron for part 
>of the URI discussion).  As you will see, there are some questions 
>about how much should be covered in here, and how fast the material 
>should go (i.e., you may think it takes an awfully long time to 
>cover an awfully few number of ideas).  Anyway, I'd appreciate 
>comments on this sort of stuff (and anything else, of course).

Noble effort, for doing the first draft.

But... I feel that the intro is seriously misleading in the way it 
uses quotation. It seems to deliberately get things wrong in order to 
confuse the reader.

Many comments interspersed in the text below.

>Also, I apologize for no relative links to the diagrams, and other 
>messy details like that.  Having (relatively) recently changed to a 
>Windows environment (and Netscape 6.1), all my old familiar tools 
>are gone, and have yet to be replaced [any suggestions?], so I had 
>to use Word to produce the images (yuk!), and Netscape Composer 
>(which is still doing funny things).

You have my sympathy. I just recently managed to *avoid* using these 
things. N. Composer in particular is unusable, is my sad conclusion, 
after talking to several people who have tried to use it.

>
>--Frank
>
>
>--
>Frank Manola                   The MITRE Corporation
>202 Burlington Road, MS A345   Bedford, MA 01730-1420
>mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
>
>
>An Overview of the RDF Data Model
>
>Frank Manola, 14 October 2001
>
>1.  Introduction
>
>The World Wide Web provides people with an unprecedented capability 
>to share information across the globe. However, the sheer amount and 
>diversity of Web information now available makes finding and using 
>the right information more difficult. As the Web continues its 
>spectacular growth, the value of software that can find, filter, and 
>combine information in response to specified user requirements 
>greatly increases.  
>
>The basic difficulty in providing this software support is that the 
>Web was originally aimed at providing its resources to people, not 
>to other software, and so Web resources do not have descriptions of 
>their meanings or capabilities that software can understand.  For 
>example, the meaning of a Web page is determined by human 
>understanding of the screen content when the page is displayed in a 
>browser. This meaning is inaccessible to a piece of software.  As a 
>result, software such as search engines must rely on such techniques 
>as simple text matching, rather than being able to process Web 
>resources based on an understanding of their true relationships to a 
>user's intentions and needs.
>
>The Semantic Web is going to change all that.  A vision of Tim 
>Berners-Lee, the creator of the World Wide Web, the Semantic Web 
>will enhance the Web's inter-linked information and service 
>resources with software-interpretable descriptions of the resources' 
>meanings, capabilities, and inter-relationships.  These descriptions 
>will allow tools such as agents, search engines, or service brokers 
>to more automatically and reliably find, filter, and combine 
>appropriate resources in response to user requirements.

That seems to me to be rather a limited range of uses. It suggests 
that the primary goal is to make better web-page search engines, 
whereas I see the SW as creating the infrastructure for an entirely 
new class of agent-oriented activities.

>
>The W3C's Resource Description Framework (RDF) is, as its name 
>suggests, a framework (or approach) for describing Web resources.

I wonder, could you (or someone) actually SAY what is meant by a 'Web 
resource' ?

>The approach is based on some very simple ideas, but when those 
>ideas are taken together, and suitably generalized (as they are in 
>RDF), they provide a means for describing practically anything,

That is strictly true, but it is too easily read as meaning 'saying 
practically anything', which is emphatically not true.

>in a form that can be processed by software.  The use of RDF (and 
>richer approaches based on it) provides the basic technology for 
>providing the software-interpretable descriptions required to 
>support the Semantic Web.

Maybe this is being too sensitive, but I would really like to avoid 
the use of terms like 'basic' and 'based on'. If read strictly (as 
some readers will) these are false claims that will be red rags to 
some bulls. If intended politically, let us be explicit about the 
political nature of the claims by using phrases like 'W3C standard'.

>At the same time, the use of RDF does not necessarily involve the 
>use of esoteric AI technologies, as the term "semantic" might 
>suggest.

Oh, PLEASE. What a slew of inappropriate (and now archaic) academic 
prejudices lurk in that word 'esoteric'. Why not just come out and 
say "Don't be frightened, children".  Sorry, but you can hardly 
expect an ex-president of AAAI to let that go by without a murmur.

If you want to be really persnickety, RDF inference *is* an 'AI 
technique', if that silly phrase has any meaning at all, just a very 
simple one. So for that matter are optimizing compilers, temporal 
databases, logic programming, web search engines and a host of other 
now-industrial-standard techniques.

>  RDF also provides essential support for more straightforward

than what?

>applications, such as providing simple information about Web content 
>(provenance, content ratings), defining privacy policies, specifying 
>site maps, or supporting description-based Web service brokering.

Well, the only serious work I know of in that latter direction is the 
DAML-S effort, which is pushing the expressive boundaries of DAML+OIL 
and couldn't possibly be done in RDF.

>
>2.  The general idea
>
>The initial impetus for RDF was to provide a simple way to describe 
>the characteristics of (facts about) Web resources, e.g., Web pages. 
>For example, imagine that we want to record the fact that someone 
>named John Smith created a particular Web page. A straightforward 
>way to state this fact in English would be in the form of a simple 
>statement, e.g.:
>
>"the creator of [the particular Web page we're talking about] is 
>"John Smith" "

Er.....no. The NAME of the bloke is "John Smith"; the man himself is 
John Smith (no quotes). What you are saying here is that a string of 
10 characters created a web page, which seems unlikely. (And 
this really is what that means *in English*.)
You could say
"the creator of [the particular Web page we're talking about] is 
called: "John Smith" "
but that means exactly the same as
"the creator of [the particular Web page we're talking about] is John Smith"

>
>We've underlined parts of this statement to illustrate that, in 
>order to describe the characteristics of something, we need ways to 
>identify a number of things:
>
>We need a way to identify the thing we want to describe (the Web 
>page, in this case)
>We need a way to identify a specific characteristic (the creator) of 
>the thing that we want to describe
>We need a way to identify a specific value (the creator's name) we 
>want to assign to this characteristic, for the thing we want to 
>describe
>The Web page can be identified by its URL (Uniform Resource 
>Locator), say http://www.foobar.org/index.html, so we can now write 
>the statement as:
>
>"the creator of http://www.foobar.org/index.html is "John Smith" "

Erase quotes.

>In this statement, in addition to using a URL to identify the Web 
>page, we've used an English string, "creator", to identify the 
>characteristic we want to talk about, and another string, "John 
>Smith", to identify the value we want to assign to this 
>characteristic (the creator's name).

No, look, it's obvious that you can't possibly have done that, since 
those two strings don't occur in your sentence in the same way. The 
first one is a part of the sentence, the second one is quoted in your 
sentence. You are *using* "creator" and *mentioning* (not using) 
"John Smith".

I think I see the pedagogic point you are trying to make, but it 
would be better if you just used English normally (no internal 
quotes) and then said that you were taking *fragments* of the English 
sentences and using them as strings in the pseudo-RDF examples (as 
you in fact do for the properties, but not the nodes.) But I think it 
would be better not to start off with bad RDF as the first thing that 
people see.
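
To make the use/mention point concrete, here is a rough sketch in 
Python. It is my own illustration and not RDF syntax: the class names 
Resource and Literal are hypothetical, and the staff URI is borrowed 
from your later example. It just shows that a triple whose object is 
a string is a different statement from one whose object is the thing 
a name refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A node that stands for a thing, named by an identifier (hypothetical class)."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A node that simply is a string of characters (hypothetical class)."""
    text: str


page = Resource("http://www.foobar.org/index.html")

# Mentioning a name: the object of this triple is the string itself.
mention = (page, "creator", Literal("John Smith"))

# Using a name: the object is whatever the identifier refers to.
use = (page, "creator", Resource("http://www.foobar.org/staffid/85740"))

# The two objects are different kinds of node, so the statements differ.
print(mention[2] == use[2])
```

Either form may be what you want to say, but the text should make 
clear which one it is.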

>We can describe other characteristics of this Web page by writing 
>additional statements, using the same techniques

Which? You used two techniques and didn't say which you meant. Now 
this reader is completely confused.

>  to identify the page, the characteristics, and their values.  For 
>example, to specify the date the page was created, and the overall 
>subject of the page, we could invent additional strings for the 
>characteristics, and their values, and write the additional 
>statements:
>
>"the date-created of http://www.foobar.org/index.html is "August 16, 1999" "
>"the subject of http://www.foobar.org/index.html is "my home page" "

No no, this is impossibly confusing, and in any case wrong. Why did 
you put those quotes into the English sentences? They make better 
sense without them both as English and as RDF.

Also might be worth spending a few words to elaborate on that notion 
'date-created'. If a reader needs this much hand-holding, they might 
find that a little odd; and it's not English, for sure.

>RDF basically defines a model for making such statements that 
>describe resources, using specific kinds of identifiers for the 
>resources we want to describe, the characteristics we want to use to 
>describe them, and the values we want to assign to those 
>characteristics for each resource.  So far, we've seen the use of 
>URLs to identify the resources we want to talk about, and strings 
>for the characteristics and values.  Later, we'll see that more 
>general usage of RDF involves using more general types of 
>identifiers.  
>
>The RDF model uses a particular terminology for talking about the 
>various parts of statements.  Specifically, the thing the statement 
>is about (the resource, the Web page in this example) is called the 
>subject.  The attribute or characteristic of the subject that the 
>statement specifies

This is getting rather awkward. We now have two long words 
'attribute' and 'characteristic' which both seem to me to be less 
clear in meaning than the real word, which is 'property' . Why not 
just call them properties up front, and say that RDF assumes that 
things have properties which have values, and give some examples, 
before getting into RDF itself? I don't think that people will find 
that hard to follow, and it will set them thinking along the right 
lines before they have to wrestle with how to torture English syntax 
into RDF-triple format.

>  (creator or date-created in this case) is called the predicate 
>(borrowing a term from mathematical logic)

What?? I thought it was called a "property". It's not a predicate in 
the math-logic sense: it is a binary relation.  Predicates don't have 
values. I would much prefer to stick to 'property', but if we must 
use logical terminology then call it a relation, not a predicate. 
(The term 'p' in the s-p-o terminology is from simple linguistics, 
not from mathematical logic.)

>, and the value of the predicate (or characteristic) is called the 
>object.  So, taking the statement
>
>"the creator of http://www.foobar.org/index.html is "John Smith" "
>
>in RDF terminology:
>
>the subject is the resource identified by http://www.foobar.org/index.html
>the predicate is the characteristic identified by "creator"
>the object is the person identified by "John Smith"

Which illustrates why those inner quotes in the English sentences 
were so confusing. I thought that you were setting the scene to say 
that the object was a string; but now the object has magically become 
a person once more. (HOW??)

>There are a number of different ways of thinking about individual 
>RDF statements, and the collections of statements that might be used 
>to describe a given resource.  For example, we can think of the 
>statements describing a given resource as:
>
>entries in a simple record or catalog listing describing the 
>resource in a data processing system.
>rows in a simple relational database.
>simple assertions in mathematical logic (this will prove to be a 
>particularly important way of thinking about these statements, which 
>we'll discuss in more detail later). 
>

I don't like this way of talking.  RDF is a formal notation, not some 
vague thing that can be 'thought about' in any number of ways.

Did you mean to say something like: one can find things in many 
formats that could be treated or read as RDF triples? I agree 
that is worth pointing out, and makes it more accessible.

It would be better to just say 'logic' rather than 'mathematical 
logic' ( 'formal logic' if you want it to sound impressive.) There 
really isn't anything particularly mathematical about logic these 
days.

>RDF "officially"

So what would it mean to be "unofficial" in this context? I wouldn't 
know how to interpret this if I read it for the first time. It sounds 
like the writer is trying to avoid committing himself, and I would be 
wondering what he was trying to hide.

>models statements as nodes and arcs in a graph or diagram.  In this 
>model, a statement is represented by a node for the subject, a node 
>for the object, and a labeled arc between them for the predicate, as 
>in:
>
>
>
>Collections of statements are represented by collections of nodes 
>and arcs, the subjects and objects forming the nodes of the graph, 
>and the predicates (or characteristics) forming the arcs or links 
>between these nodes.

Don't say 'forming' the arcs (also not the nodes, actually, though 
it's not so misleading there). They don't 'form' them, they label 
them. Several arcs may have the same label.
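
A small sketch of what I mean, in Python (my own illustration; the 
second "creator" arc is an assumption added only to show a repeated 
label). Each arc is a separate item, and the property name is just 
the label it carries:

```python
from collections import Counter

# Each arc is (start node, label, end node).
arcs = [
    ("http://www.foobar.org/index.html", "creator", "John Smith"),
    # Assumed extra arc, not in the draft, to show a label used twice.
    ("http://www.barbaz.org/myprojects.html", "creator", "John Smith"),
    ("http://www.foobar.org/index.html", "subject", "my home page"),
]

# Count how many distinct arcs carry each label.
label_counts = Counter(label for _, label, _ in arcs)
print(label_counts["creator"])
```

Two arcs, one label: the label cannot be what 'forms' the arc.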

>  So the three statements we've given so far would be represented by 
>the following graph:
>
>
>
>The graph is technically a labeled directed graph, since the arcs 
>have labels, and are "directed" (point in a specific direction, from 
>subject to object). 
>
>Sometimes it is not convenient to draw graphs (and, in particular, 
>it is hard for machines to directly process them

I disagree. Machines process graphs very well indeed, cf. LISP.
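
As a rough illustration (a Python sketch of my own, not anything from 
the RDF specs; the names outgoing and values are hypothetical), the 
three statements can be held as a graph in which each node is stored 
once, and following an arc is a simple lookup:

```python
from collections import defaultdict

triples = [
    ("http://www.foobar.org/index.html", "creator", "John Smith"),
    ("http://www.foobar.org/index.html", "date-created", "August 16, 1999"),
    ("http://www.foobar.org/index.html", "subject", "my home page"),
]

# Index the arcs by their starting node, so each node appears once as a key.
outgoing = defaultdict(list)
for subject, label, obj in triples:
    outgoing[subject].append((label, obj))


def values(node, label):
    """Return the end nodes of all arcs leaving `node` that carry `label`."""
    return [obj for arc_label, obj in outgoing[node] if arc_label == label]


print(values("http://www.foobar.org/index.html", "creator"))
```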

>), so an alternative way of thinking about the statements, called 
>triples, can be used.  In the triples notation, each statement is 
>written as a simple triple of subject, predicate, and object, in 
>that order.  The triples representing the above three statements 
>would be written:
>
>http://www.foobar.org/index.html   "creator"           "John Smith"
>http://www.foobar.org/index.html   "date-created"  "August 16, 1999"
>http://www.foobar.org/index.html   "subject"          "my home page"

Why have you quoted the property names?

>Each triple corresponds to a single arc in the graph, complete with 
>the arc's beginning and ending nodes (the subject and object of the 
>statement).  Unlike the graph notation, the triple notation requires 
>that a node be separately identified for each statement it appears 
>in.  So, for example, http://www.foobar.org/index.html appears three 
>times (once in each triple) in the triple representation of the 
>graph, but only once in the actual graph. 
>
>In each of the statements we've considered so far, the object has 
>been something identified by a simple string value,

No, it's been identified *as being the string*, in the way you have 
presented it. (Or maybe that is what you mean by being identified by 
a simple string value??)

>  like "John Smith".  In RDF, the objects in statements may be 
>identified by any type of literal value.  More importantly, the 
>objects in RDF statements may also be other Web resources, 
>identified by their URLs.

I'm now (genuinely) confused about what you mean by 'object'. Does 
that refer to part of a triple (which is also a label in a graph), or 
to what that syntactic object represents or identifies? Is the object 
Mr. Smith, or the string "John Smith"? (And, by the way, which of 
these is a Web resource?) If "John Smith" is a literal, what is a 
'literal value'? (Is John Smith the literal value of the string "John 
Smith" ? If so, how can a literal *value* - as opposed to a literal - 
be said to identify anything?)

>  This allows us to represent not only the characteristics (or 
>properties) of individual resources, but also relationships between 
>those resources and others.  So, for example, we could represent the 
>fact that the resource http://www.barbaz.org/myprojects.html has the 
>same creator as the resource http://www.foobar.org/index.html, by 
>the statement
>
>http://www.foobar.org/index.html  "sameCreatorAs" 
>http://www.barbaz.org/myprojects.html
>
>And we could represent the fact that 
>http://www.foobar.org/index.html links to http://www.w3.org/ by the 
>statement
>
>http://www.foobar.org/index.html    "linksTo"   http://www.w3.org/

Wow, that's a very exotic example. This seems to be about the uriref 
itself (??) rather than about what it refers to. (?? Or do you mean 
that the URI "http://www.foobar.org/index.html" refers to a web page 
which linksTo the webpage http://www.w3.org/ ? That does make sense, 
but it's not the same kind of usage that occurs in most of the other 
examples, is it?)

>
>Adding these two additional statements to the original ones would 
>give us the graph shown below:
>
>
>This graph illustrates another aspect of the RDF graph notation: 
>nodes that represent resources are shown as ellipses, while nodes 
>that represent literal values are shown as boxes.

Is that graphical notation considered definitive? If so, we have 
*four* notations for RDF !! Might be better to just say that it's a 
handy convention for use in diagrams.

>
>3.  RDF identifiers
>
>So far, we've described some of the basic ideas behind RDF, showing 
>how simple statements composed of subjects, predicates, and objects 
>provide a way of describing Web resources.   However, to simplify 
>the initial presentation, we've oversimplified some of these ideas. 
>It's now time to develop these ideas more fully, since they provide 
>much of the potential power of RDF. 
>
>We first need to provide further detail about how RDF actually 
>identifies the subjects, predicates, and objects in statements.  So 
>far, the identifiers we've used are:
>
>URLs to identify Web pages (the only subjects we've talked about so far)
>either simple literal values or other URLs to identify objects.
>simple strings (like "creator") to identify predicates
>In principle we would like to be able to record information about 
>many things in addition to Web pages.  In particular, we'd like to 
>record information about lots of things that don't have URLs.  For 
>example, I don't have a URL, and yet my company needs to record all 
>sorts of things about me in order to pay my salary, keep track of 
>the work that I've been doing, and so on.  My doctor needs to record 
>other sorts of things about me in order to keep track of my medical 
>history:  tests that have been performed (and the results, who 
>performed them, and when), shots I've received, etc. 
>
>We've recorded information about lots of things that don't have URLs 
>using files (both manual and automated) for many years, and the way 
>we identify those things is by assigning them identifiers: values 
>that we uniquely associate with individual things.  The identifiers 
>we use to identify various kinds of things go by names like "Social 
>Security Number", "Part Number", "license number", "employee 
>number", "user-id", etc.  In some cases, these identifiers (such as 
>Social Security Numbers) are assigned by an official authority of 
>some kind.  In other cases, identifiers are generated by a private 
>organization or individual.  In some cases, these identifiers have a 
>national or international scope within which they are unique (a 
>Social Security Number has national scope), while in other cases 
>they may only be unique within a very limited scope (my employee 
>number is only unique among the numbers assigned by my specific 
>employer).  Nevertheless, these identifiers serve, if used properly, 
>to identify the things we want to talk about. 
>
>As we've seen, the Web already provides one form of identifier, the 
>Uniform Resource Locator (URL). A URL is a string that identifies a 
>Web resource by representing its primary access mechanism (e.g., its 
>network "location"). However, URLs are a subset of a more general 
>and powerful concept, the Uniform Resource Identifier (URI). A 
>Uniform Resource Identifier (URI) is defined [by RFC2396] as "a 
>compact string of characters for identifying an abstract or physical 
>resource".

Now, if I were reading this for the first time, I would be very 
puzzled, since that looks like a general definition of a name in just 
about *any* language. It sounds like the W3C has redefined the notion 
of a name, which is such an amazingly silly act of hubris that I 
would be either goggling in disbelief or rolling around with 
laughter.  But in fact RFC2396 doesn't say that this is the 
*definition* of a URI. In fact, the full definition of what counts as 
a URI can only be fully appreciated by reading all of RFC2396, but the 
best brief one-sentence version I've managed to find is in the 
earlier RFC1630, referred to in RFC2396, which reads: "A Universal 
Resource Identifier (URI) is a member of this universal set of names 
in registered name spaces and addresses referring to registered 
protocols or name spaces."

>Unlike the URL, the URI is not limited to identifying things that 
>have network locations, or use other computer access mechanisms.  In 
>fact, a URI can be assigned to identify anything, including
>
>network-accessible things, such as an electronic document, an image, 
>a service (e.g., "today's weather report for Los Angeles"), or a 
>collection of other resources.
>things that are not network-accessible, such as human beings, 
>corporations, and bound books in a library.

(Wriggles uncomfortably.) Be careful. While it may be true that one 
can assign a URI to "anything" in the sense that there is no a priori 
boundary on what can be assigned a URI,  it is far less clear that it 
is reasonable to claim that every single thing in the universe can be 
somehow "assigned" a URI which is sufficient to *identify* it, in any 
reasonable sense of 'identify'. A reader could easily take this to be 
saying that the W3C has somehow found a magical way to access the One 
True Name of everything in the universe. (Ever read Ursula Le Guin?) 
It would be much better, and more accurate, to just say that URIs 
constitute an infinite stock of names that can be used to refer to 
anything, and which have the valuable property that they can be 
uniquely traced to the origin of their use. It's not that the URI 
enables you to uniquely 'get at' the thing it refers to (URLs do, but 
not URIs; that is in general impossible without divine help) but what 
it does enable you to do is to get at the source of the naming usage, 
the first (and prime) use of that name, which is often (as in the 
Dublin Core example) the definitive source for the intended meaning 
of that name. This is only of utility if that source provides you 
with some clue as to the meaning, but in many cases that will be 
true; even if the only clue that it does provide is that there is a 
unique referent. Of course the machines often won't be able to 
actually get 'access' to the actual resource named by the URI - Mr. 
Smith, for example, who might be asleep or travelling or otherwise 
incommunicado - but just having a unique handle for it is often 
enough to support useful M2M communication.
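
A minimal sketch of the 'trace it to its source' idea, in Python, 
using only the standard urllib.parse module. It is an illustration 
under my own reading, not a claim about how any RDF tool works: 
splitting the Dublin Core URI shows which authority issued the name, 
and says nothing about how to reach the thing named.

```python
from urllib.parse import urlparse

uri = "http://purl.org/dc/elements/1.1/creator"
parts = urlparse(uri)

# The authority component points at the source of the naming usage,
# not at the thing (here, a property) that the name refers to.
print(parts.scheme)
print(parts.netloc)
print(parts.path)
```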

>Using URIs, we can generalize the resources we can talk about in 
>RDF.  Moreover, creation of URIs is decentralized; no one person or 
>organization controls who creates them or how they are used.  You 
>don't need any authority or permission to make a URI for something. 
>You can even create URIs for things you don't own (just as you can 
>use whatever name you like for things you don't own), or for 
>abstract concepts that don't physically exist.  The extensibility of 
>URIs allows the introduction of identifiers for any entity 
>imaginable.

That is a very nice overview. I think it would be better to do this 
first, though, and then the first examples of RDF could actually be 
legal RDF, and you could avoid all the use/mention confusion and bad 
quotes.

>Since the URI is such a general identification mechanism, capable of 
>identifying anything, it should not be surprising that RDF uses URIs 
>to identify all the resources it talks about.  Specifically, RDF 
>uses URIs to identify both the subjects and objects

Well, some of them.

>  in RDF statements (the objects in some statements, such as age 
>values or names, must

Well, can be

>  still be identified by literal values).  As we've just noted, using 
>URIs allows us to talk about things that aren't "network 
>accessible", and to define

Better say: state relationships between them. "Define" has a lot of 
baggage to both mathematically trained and programming-language-savvy 
readers.

>relationships between such things as well.   
>
>Now that we have URIs to identify resources, we can be more complete 
>and precise about recording information.  For example, instead of 
>identifying the creator of the Web page in our original example by 
>the string "John Smith", we can assign him a URI, say (using a URI 
>based on his employee number) http://www.foobar.org/staffid/85740. 
>The RDF statement stating this fact would then have the graph:
>
>
>or, in the triples notation:
>
>http://www.foobar.org/index.html   "creator" 
>http://www.foobar.org/staffid/85740
>
>One advantage of using a URI to identify the creator of the page in 
>this example is that we can be precise in our identification.  That 
>is, the creator of the page isn't the string "John Smith", or any 
>one of the thousands of people having "John Smith" as their name, 
>but the particular John Smith associated with that URI.

Right, *when* the URI source is sufficient to identify such a 
particular JS; which in many (most?) cases it will be.  But the 
success of the entire scheme depends on this (ultimately social!) 
fact about URI usage. URIs don't perform semantic miracles, they just 
enable software to get along in much the way that people do.

>  Moreover, since we have a URI for the creator of the page, it is a 
>full-fledged resource, and we can record additional information 
>about him, such as his name, and age, as in the graph
>
>
>or the triples
>
>http://www.foobar.org/index.html      "creator" 
>http://www.foobar.org/staffid/85740
>http://www.foobar.org/staffid/85740  "name"        "John Smith"
>http://www.foobar.org/staffid/85740  "age"         "27"
>
>However, in these latest examples, we've still oversimplified 
>something.  RDF doesn't just use  URIs to identify subjects and 
>objects in RDF statements;  RDF also uses URIs to identify the 
>predicates in RDF statements.  That is, rather than actually using 
>simple strings such as "creator" or "name" to identify predicates, 
>RDF uses URIs to identify them. 
>
>Using URIs to identify predicates

I'm increasingly uncomfortable with this usage of 'identify'. It 
sounds as though (the 'John Smith' example strongly suggests this) 
you are saying that simply using a URI *automatically* pins down the 
meaning of a property. That is a very large claim to make, and almost 
impossible to justify, so it would be better to avoid saying it if 
possible, or take care to disclaim this reading.

>is important for a number of reasons.  First, it allows us to 
>distinguish the predicates we use from the predicates someone else 
>may use that would otherwise be identified by the same text string. 
>For instance, in our example, foobar.org uses "name" to mean the 
>full name written out as a string (e.g., "John Smith"), but someone 
>else may intend it to mean only the surname (e.g., "Smith").  A 
>program encountering "name" on the Web wouldn't necessarily know 
>which was meant.  However, if foobar.org writes 
>http://www.foobar.org/terms/name for its "name" predicate, and the 
>other person writes http://geneology.org/terms/name for hers, we can 
>keep them straight.

Surely the point is that foobar.org can just use 'name' and so can 
the other person, and WE can still keep them straight. That is, we 
can keep them straight even if the other people are using the same 
name 'internally'; this mechanism doesn't require everyone to be 
well-behaved. Your prose sounds like it does.

Also, why does this point only apply to properties? Doesn't the same 
point apply to all uses of URIs?
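
Here is a small sketch of that, in Python. The expand function and 
the two namespace variables are hypothetical names of my own; the 
namespace strings are taken from your example. Both sites write plain 
'name' internally, and the full URIs still come out distinct:

```python
# Namespace prefixes for two independent sites (strings from the draft).
foobar_terms = "http://www.foobar.org/terms/"
genealogy_terms = "http://geneology.org/terms/"


def expand(namespace, local_name):
    """Turn a site's local name into a full URI by prefixing its namespace."""
    return namespace + local_name


# Each site uses the same local name, 'name', in its own documents.
a = expand(foobar_terms, "name")
b = expand(genealogy_terms, "name")

# The expanded names differ, so a third party can keep them straight.
print(a != b)
```

Nothing in this depends on the local name being a property name; it 
would work the same way for a name of a person or a page.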

>  Another reason why it is important to use URIs to identify 
>predicates is that it allows us to consider RDF predicates as 
>resources themselves.

Well, yes, in a sense, but again this could be very misleading. If I 
use 'name' on my website, and you refer to my use of 'name' and say 
something else about it that I didn't have in mind when I wrote my 
RDF, it's not at all clear that you and I are talking about the SAME 
property. In fact it's not clear that we even agree about whether 
there are one or two properties involved.  Properties aren't nice 
crisp things like people and websites, that have a clear identity 
criterion. So just saying that a property IS a resource is dicing 
with philosophical death. Also, it doesn't really say anything at all 
until you say what a 'resource' is; we've already established that 
this no longer means something that is web-accessible like a URL, so 
now we know even less about what it's supposed to mean.

The real point about RDF is that the above is really NOT A PROBLEM. 
Even if what you had 'in mind' when you wrote your triples isn't 
exactly what I had in mind when I used your property name, as long as 
everything that you say in your graph is still true when I think of 
it in my way, I can still use your URI safely. The RDF graph doesn't 
have to actually *define* a property, whatever that really means; it 
just has to record an area of agreement that we (and our programs) 
can utilize. Which is exactly what the model theory says, by the way. 
Your RDF graph and mine might have many possible interpretations, and 
you and I might be thinking about different ones; but it doesn't 
matter, because any valid entailment from the merge of our graphs 
will be true in ALL the interpretations, so both of us will agree 
that they are true. Agreeing about real meanings requires telepathy; 
but agreeing about propositions can be done in languages like English 
and RDF, and that's all we really need to be able to do.
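
A toy version of that argument, as a Python sketch under stated 
assumptions: the graphs contain no anonymous nodes, the 'ex:' names 
are made-up abbreviations, and entails is my own hypothetical helper 
that checks only the simplest case, namely that a graph entails each 
of its subgraphs.

```python
def entails(graph, conclusion):
    """Subgraph check: every triple concluded is already asserted in `graph`."""
    return conclusion <= graph


# Your graph and mine, written with made-up 'ex:' abbreviations.
yours = {("ex:page", "ex:creator", "ex:staff85740")}
mine = {("ex:staff85740", "ex:name", "John Smith")}

# With no anonymous nodes, merging two graphs is just set union.
merged = yours | mine

# Whatever each of us asserted still follows from the merge.
print(entails(merged, yours), entails(merged, mine))
```

Neither of us had to say what we 'had in mind' for ex:name; the merge 
only commits us to what is written down.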

>As a result, we can record descriptive information about the 
>predicates (e.g., the English description of what foobar.org means 
>by "name"), simply by making additional RDF statements about them.

About them, or using them? Or both? You haven't really yet clearly 
pointed out that the same thing can be both a predicate and a subject.
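
For instance, in this Python sketch (my own illustration; the 
'description' property URI is hypothetical and not from your draft), 
the same URI appears as the property in one triple and as the subject 
of another:

```python
NAME = "http://www.foobar.org/terms/name"

triples = [
    # NAME used as the property linking a person to a string.
    ("http://www.foobar.org/staffid/85740", NAME, "John Smith"),
    # NAME as the subject of a statement about the property itself.
    # The 'description' property URI is an assumption for illustration.
    (NAME, "http://www.foobar.org/terms/description", "full name, written out"),
]

used_as_property = {p for _, p, _ in triples}
used_as_subject = {s for s, _, _ in triples}

print(NAME in used_as_property and NAME in used_as_subject)
```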

>
>Finally, using URIs to identify predicates, as well as subjects and 
>objects, allows us to begin to develop and use a shared vocabulary 
>on the Web, reflecting (and creating) a shared understanding of the 
>concepts we talk about.  For example, now that we know to use URIs 
>(where we can) to identify all the parts of an RDF statement, we can 
>write the statement "the creator of http://www.foobar.org/index.html 
>is "John Smith" " as the triples
>
>http://www.foobar.org/index.html 
>http://purl.org/dc/elements/1.1/creator 
>http://www.foobar.org/staffid/85740
>http://www.foobar.org/staffid/85740 
>http://www.foobar.org/terms/name "John Smith"
>
>The URI http://purl.org/dc/elements/1.1/creator for the "creator" 
>predicate in the first triple is an unambiguous reference to the 
>"creator" attribute in the Dublin Core metadata attribute set, a 
>widely-used collection of attributes (predicates) for describing 
>information of all kinds.  The writer of this triple is effectively 
>saying that the relationship between the Web page (identified by 
>http://www.foobar.org/index.html ) and the creator of the page (a 
>distinct person, identified by http://www.foobar.org/staffid/85740 ) 
>is exactly the concept defined by 
>http://purl.org/dc/elements/1.1/creator.  Moreover, anyone else, or 
>any program, which understands 
>http://purl.org/dc/elements/1.1/creator will know exactly what is 
>meant by this relationship.
>@@ this could be developed a bit better;  we also might want to 
>mention, now that we're using URIs for all the parts of the 
>statements, that there is an "official" syntax for these triples, 
>called Ntriples. 
>
>4.  Complex Data

This section is very nice, but could be improved at the end.

>Things would be very simple if the only types of information we had 
>to record about things were obviously in the form of the simple RDF 
>statements we've illustrated so far.  However, most real-world data 
>involves structures that are more complicated than that, at least on 
>the surface.  For instance, in our original example, we recorded the 
>date the Web page was created as a simple string value.  However, 
>suppose we wanted to record the month, day, and year as separate 
>pieces of information?  Or, in the case of John Smith's personal 
>information, suppose we wanted to record his address.  We might 
>write the whole address out as a string, as in
>
>http://www.foobar.org/staffid/85740 
>http://www.foobar.org/terms/address   "1501 Grant Avenue, Bedford, 
>Massachusetts 01730"
>
>However, suppose we wanted to record the various pieces of 
>information about his address as separate street, city, state, and 
>Zip code values?  How do we do this using RDF? 
>
>In RDF, we represent such structured information by considering the 
>aggregate thing we want to talk about (like "address") as a separate 
>resource, and then make separate statements about that new resource. 
>So, in the RDF graph, in order to break up John Smith's address into 
>its component parts, we create a new node to represent the concept 
>"John Smith's address".  We then write RDF statements (create 
>additional arcs and nodes) with that node as the subject, to 
>represent the additional information, as in:
>
>
>
>We can represent the same information using the triples notation, 
>using a technique that relational databases have been using for many 
>years.  The idea is to assign a separate identifier to the aggregate 
>thing we want to talk about, and then record separate statements 
>about that aggregate thing, identifying the aggregate thing using 
>its newly-assigned identifier.  So, in order to break up John 
>Smith's address into its component parts, we assign a new URI to 
>identify "John Smith's address", and proceed to write RDF statements 
>with that URI as the subject (in effect, the new identifier 
>represents the node we added to the graph above to represent "John 
>Smith's address"), as in:
>
>http://www.foobar.org/staffid/85740 
>http://www.foobar.org/terms/address 
>http://www.foobar.org/addressid/85740
>http://www.foobar.org/addressid/85740 
>http://www.foobar.org/terms/street  "1501 Grant Avenue"
>http://www.foobar.org/addressid/85740 
>http://www.foobar.org/terms/city   "Bedford"
>http://www.foobar.org/addressid/85740 
>http://www.foobar.org/terms/state  "Massachusetts"
>http://www.foobar.org/addressid/85740 http://www.foobar.org/terms/Zip  "01730"
>
>Notice that, in the graph representation of this example, we did not 
>give the new node representing the concept of John Smith's address a 
>URI (although we could have), while in the triples representation we 
>had to assign an identifier to represent that concept.

I find this confusing, since there *is* a URI in the Ntriples.

I'd suggest doing this example with a URI in both the triples *and the 
graph* (like fig 6 but with the center node labelled), then pointing 
out that this URI really has no utility in the graph. Then leave it 
off the graph in another example to illustrate blank nodes (fig 6), 
and *then* point out that to represent this in Ntriples we need an 
'ntriples name' to identify the blank node, and do it in Ntriples 
using a nodeID instead of http://www.foobar.org/addressid/85740. 
This would illustrate the whole point of anonnodes very nicely, 
illustrate how to interpret the node-ID syntax, and it would also 
motivate that syntax properly by showing that it was entirely an 
Ntriples problem and that nodeIDs are not in the graph at all, at all.
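
Roughly what I have in mind, as a Python sketch. The serializer is a 
toy and not a conforming Ntriples writer: it assumes anything starting 
with "http://" is a URI, anything starting with "_:" is a blank node 
name, everything else is a literal, and it does no escaping. The name 
"_:addr" is arbitrary.

```python
STAFF = "http://www.foobar.org/staffid/85740"
ADDR = "http://www.foobar.org/addressid/85740"
TERMS = "http://www.foobar.org/terms/"

# Version 1: the address node carries a URI.
named = [
    (STAFF, TERMS + "address", ADDR),
    (ADDR, TERMS + "street", "1501 Grant Avenue"),
    (ADDR, TERMS + "city", "Bedford"),
    (ADDR, TERMS + "state", "Massachusetts"),
    (ADDR, TERMS + "Zip", "01730"),
]

# Version 2: the same graph with the address node left blank. The name
# "_:addr" exists only so the lines can refer to the same node.
blank = [tuple("_:addr" if t == ADDR else t for t in triple) for triple in named]


def term(t):
    """Write one term using the sketch's conventions (see lead-in)."""
    if t.startswith("_:"):
        return t
    if t.startswith("http://"):
        return "<" + t + ">"
    return '"' + t + '"'


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(named))
print()
print(to_ntriples(blank))
```

The two outputs differ only in how the address node is written, which 
is the point: the "_:addr" name is bookkeeping for the line-oriented 
syntax and is not part of the graph.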

>  The reason is that in the graph representation the node can stand 
>for itself, without any further identification, while in the triples 
>representation the only way to identify the thing being identified 
>(the address in this case) is by giving it an explicit identifier.

Nicely put.

>
>@@this could lead naturally into a discussion of "anonymous" 
>resources and bNodes;  but do we want to discuss that in the Primer?

Yes. It's not hard, you've essentially done it already, see above.

>  What else do we want to consider in the "data model"?

What about mentioning containers? Once you have blank nodes, it's 
pretty easy to motivate containers as a kind of general-purpose 
blank-node-with-properties-clustering device.

Also might be worth pointing out explicitly that it's OK to have 
properties of properties, and that the graphs can get tangled, just 
to emphasise that people can feel free to add triples all over the 
place. Just show them one reasonably 'big' graph and mention some of 
the triples in it and what they are supposed to mean.

>
>@@we also might want to say something to the effect that RDF 
>statements should be considered assertions or facts (by whoever 
>wrote the RDF), and lead gently into the model theory that way.

I think that is kind of said already, no?

Pat



-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes

Received on Monday, 15 October 2001 23:47:28 UTC