- From: <sw@semanticsoft.net>
- Date: Wed, 18 Jun 2008 21:03:38 +0300
- To: semantic-web@w3.org
Greetings!
I am sharing with this mailing list a treatment of blank nodes - both as
an exchange of experience and in order to get advice on whether this
treatment is completely compliant with the standards.
Blank nodes are of key importance to the Semantic Web. Without them, RDF
would be a simple decidable calculus, but with them, playing the role of
existentially quantified variables, the logical expressiveness of the
Semantic Web is raised to that of first-order predicate logic, which
together with set theory served as a strict foundation, first for
mathematics and, finally, for all sciences.
Without trying to shed some light on the currently somewhat "occult
knowledge" of blank nodes, it would be hard to live on the Semantic Web.
Therefore I am writing in as much detail as I can about blank nodes,
which I treat as "local names", and to be precise I have to cover the
larger context of "names" in general. All this makes the message long. I
would appreciate it if somebody had enough patience to read it and
shared other details on blank nodes, or commented on the compliance of
this treatment with the standards.
If by "RDF semantics" you understand the formal semantics described in
terms of set theory in http://www.w3.org/TR/rdf-mt/ , then I cannot see
how the treatment below can be non-compliant with RDF semantics. But if
by "RDF semantics" you understand what is expressed by the *wording* of
that document, then my treatment is different in some details. The first
understanding of semantics reflects the current state of deployed
things; the second might affect future versions, so it is also
important.
1. Names
After some "terminological reconciliation" of the foundational work on
semantics "On Sense and Reference" by Gottlob Frege, 1892, with modern
terminology, I treat a *name* as a triad (three things):
1. A *lexeme*, which is the set of lexical forms of a name
(http://en.wikipedia.org/wiki/Lexeme). *Lexeme* corresponds to *lexical
space* in RDF semantics, but in RDF, the term *lexical space* is applied
only to typed literals. Therefore, for the more general notion of *name*
I am using the term *lexeme*.
2. A *meaning* (or *sense*), which is a correlation of the name with
other names in the same language. According to Frege, *meaning* is
relevant only to a name built according to the rules of a language.
3. An *interpretation*, which corresponds to *lexical-to-value-mapping*
from RDF semantics, also limited to typed literals. I am using the term
*interpretation* from model theory, because the expression
*lexical-to-value-mapping* is too long, and because I am unwilling to
alter the meaning of a term in standards (in this case - by making it
more general).
Several comments on this definition follow. By using terminology
specific to languages (like lexeme) I place names into the context of a
language, which might be regarded as a limitation of generality. But I
would *not* call something which lies outside any language a *name*. For
such "standalone" things we can use terms like "sign" or "label". Also,
I treat *sense* and *meaning* as two different things: I treat *sense*
as the name of a property of the *name*, and *meaning* as the value of
that property. Such a distinction helps in lots of situations.
What's in a name? There might be lots of things, but, following Frege, I
treat the three things above as the essential properties a thing must
have in order to be called a "name". Generally, I refer to the
*essential properties* of a thing as its *attributes*. But what is an
"essential property"? In a definition, we say that something is X if it
has certain properties P1, P2,... It is P1, P2,... which I call the
essential properties of X. In short, I call a property essential for a
concept if it participates in the definition of the concept.
How mandatory (or optional) are these 3 things in a name? At first
glance, this question might look incorrect: all 3 things are essential
properties of a name and, obviously, what is essential must also be
mandatory! But the question starts making sense as soon as we see what
"mandatory" and "optional" apply to: they apply to the *values* of
properties and not to the properties themselves (even though they are
used as modifiers before the names of properties). It is like a form
which you have to fill in: if you find "optional" written against a
field, this does not mean that the field is not there, but that you may
leave it blank. Therefore, even though the three properties above are
essential for a name (and thus mandatory for any thing which you call a
name), some of their values might be missing!
If we admit that all three properties of a name are optional, then
something which has none of the three can be useful only in an "algebra
of names" as a fixed element, like a "unit" in an algebra with
multiplication, but it cannot be useful for an agent, which must have at
least one "handle" to "hold" the object which it works with. Therefore,
I regard the lexeme as the mandatory property of a name.
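
The triad, with the lexeme as the only property whose value is required,
can be sketched as a small data structure. This is only an illustration
in Python: the class name Name, its field names and the example lexical
forms are assumptions made for this sketch and are not part of any RDF
standard.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Optional


@dataclass(frozen=True)
class Name:
    """A name as a triad; only the lexeme must have a value."""

    # The set of lexical forms of the name (mandatory, must be non-empty).
    lexeme: FrozenSet[str]
    # Correlation with other names in the same language (value may be missing).
    meaning: Optional[str] = None
    # The value the name stands for (value may be missing).
    interpretation: Optional[Any] = None

    def __post_init__(self) -> None:
        # An agent needs at least one "handle", so an empty lexeme is rejected.
        if not self.lexeme:
            raise ValueError("a name needs at least one lexical form")

    def is_interpreted(self) -> bool:
        return self.interpretation is not None


# A name with two lexical forms and no meaning or interpretation yet.
something = Name(lexeme=frozenset({"_:foo", "something"}))
```

The properties meaning and interpretation are always present as fields,
like fields in a form, but their values may be left blank (None here).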
There is no reason to regard the other properties of a name as
mandatory, and this allows for useful special cases of the notion of
name. Say, I call a name whose meaning is unessential in the discourse a
*reference*, and I call a name which is meaningless in a discourse a
*label*. A label is a special case of a reference. Also, I treat a name
without interpretation as something *used as* a name. The Semantic Web
is said to employ the intensional approach (versus the extensional
approach): this means that the names used in descriptions are not
expected to be interpreted at the time when the description is created
(but they can obtain values and become interpreted afterwards). The
intensional approach is possible because the property *interpretation*
of a name is treated as optional, as we did above.
2. Blank Nodes as names.
It is important to place blank nodes inside a well understood framework,
and the concept *name* as a triad above provides such a framework.
The name "blank node" comes from RDF graphs, where a node is not labeled,
that is, the place where the label is expected is blank. An RDF graph
can be treated as a proposition in a graphic language (planar or 3D).
But all natural languages, spoken or written, are linear, and there the
term "node" sounds bizarre. Also, in a serialization of an RDF graph
with 2 or more blank nodes, you have to use a label, which makes the
word "blank" also sound bizarre. If they tell you that the label you use
is "temporary", this is not satisfying either, because you might have
chosen so appropriate a name that you want to keep it forever, or at
least longer than some short-lived URI references. Thus, the "temporary"
character of the names of "blank nodes" sounds incorrect.
To find good terminology we must return to natural languages, where
semantics is well studied and things are named most appropriately. But
before doing this, we must answer the question: can blank nodes be
treated as names at all? The meaningful elements in a description are
*not* intended by an agent, but are used by the agent to *intend* other
things (see the notion “intentionality”,
http://en.wikipedia.org/wiki/Intentionality). It is names (more
precisely, lexemes) that are used to intend objects. Since by graph
nodes an agent intends things generally different from graph nodes, we
are compelled to accept that blank nodes themselves *are* names. More
precisely, blank nodes in a graph are lexemes of names. Generally, a
node (blank or not blank) in a graph is both a graphical object (like an
oval or rectangle) and the textual label associated with it, and these
two things are just different lexical forms of the same lexeme. A blank
node is one lexical form of the lexeme of a name.
According to this terminology (and the conceptual framework behind it) a
blank node is a real name like any other name. Moreover, since a blank
node can be "renamed", its "lexical space" (which we called "lexeme") is
potentially infinite. How then can we describe in these terms the
situation when a description needs only one blank node and we choose to
make that node “anonymous”, that is, "without a name"? This situation
is described differently in a graphic language and in linear languages.
In a graphic language, like the language of RDF graphs, a blank node as
a graphical object is a lexical form of the lexeme. In linear languages,
like natural languages, where you have to use a string, this must be a
special string. For example, English uses the word "something" for this
purpose: "something" is the counterpart of "blank node" in English. In
any case, blank nodes treated as names have a non-void lexeme.
I regard the expression “name of a blank node” as incorrect, since it
sounds to me like “name of a name”, and likewise the expression
“renaming blank nodes”, which sounds to me like "renaming a name" (you
can "rename" the value, but not a name). What is actually meant in this
context by "renaming" is "using another lexical form of the same name".
Due to the arguments above, I prefer to say *local names* instead of
"blank nodes", as they do in N3. Generally, the terminology should be
independent of the dimensionality of the language. The term *local name*
satisfies this requirement, but *blank node* does not, because "nodes"
are possible only in languages with dimensionality 2 or higher.
Since RDF Semantics calls URI references and literals *names*, the first
benefit of treating blank nodes as names too is that a triple becomes a
sequence of three names: the name of a subject, the name of a predicate
and the name of an object. This brings us closer to the linguistics of
natural languages, which is concerned more with expressiveness than with
certain names being in certain places in a triple in order to obtain
"decidable" theories or to stay inside first-order predicate logic,
which are the concerns of an RDF standard that describes tools. By
changing the *wording* of the RDF standard document, we can make it
usable also by those who are more concerned with expressiveness than
with tools.
3. Local Names Scheme
Is there a scheme for local names? I treat the underscore “_” as the
name of this scheme, and this agrees with the fact that we write such
names like this: _:foo. The standards recommend this notation, but do
not say that we are proceeding according to a URI scheme. Therefore,
many people, like me, probably have to discover for themselves the
simple fact that they are dealing with a scheme. Why would this be so
difficult to discover? Because you first have to discover that blank
nodes are also *names* (and the standards exclude them from names for
the simple reason that they call only URIrefs and literals names).
If you look at the text of a real ontology (not an example), you find
that the overwhelming majority of names are local names starting with
underscores. The local names by far outnumber the global names, and a
scheme for them probably should be defined in a standard for names. It
is the URI standard which defines global names, and if it also brought
this scheme to attention in the same way as it does the URL and URN
schemes, so that it treated both local names and global names, then such
a more general standard could be named a standard of "universal names".
There is a difficulty about “local name” - the word “local” is
understood by most people as applicable also to the terms defined within
an ontology. To help solve this problem, I will develop a piece of
terminology.
What do you call a term defined in an ontology? They say “term or word
in the ontology vocabulary”, but English does not have one word for
this. In
Romanian there is a word *vocabula* which correlates a word with a
vocabulary, and I borrowed it into my English SemanticWeb terminology.
(Romanian is one of the main natural languages which preserved Latin
lexicology pretty “intact” and the experience of Romanian with the word
"vocabula" is really useful).
A vocabula of an ontology (that is, a vocabula in the vocabulary of an
ontology) is a lexeme which has two forms: one with the namespace name
“prepended”, used outside the ontology, and a “plain” one, without
anything prepended, used within the ontology. I call them the *full
name* form and the *citation* form, respectively (see again
http://en.wikipedia.org/wiki/Lexeme for this terminology). I used the
expression “to prepend the namespace name” and not “to prefix the
namespace name”, because the full name is not a qualified name but a URI
reference. The treatment of vocabulas as lexemes explains why Semantic
Web standards call both the URI references and the local names of
ontologies "vocabulary". Notice that rdf, rdfs, owl, and others are
*citation forms* of the names of certain ontologies.
As I said above, there is a difficulty with the modifier “local” in
“local name”. Really, why is a vocabula, which can be used in different
ontologies with different meanings (like Class in RDF and Class in OWL),
*not* also “local”? That is, why does “local name” not apply both to
blank node names and to vocabulas? The answer is that people treat
names, proper or common, as *lexemes*, not as lexical forms, and we
should also treat vocabulas as lexemes (with the two forms discussed
above). A vocabula is a global name, but it can have a “local part”, and
only the “local part” of a vocabula is governed by the ontology
namespace. On the other hand, a blank node name is a local name, no
matter in which lexical form it is used. This explanation should
eliminate any confusion between vocabulas and local names.
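
The two lexical forms of a vocabula can be illustrated with a short
Python sketch. The class Vocabula and its property names are
hypothetical, chosen only to mirror the terminology of this message; the
two namespace names are those of RDF Schema and OWL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vocabula:
    """A lexeme with two lexical forms: citation form and full name form."""

    namespace: str   # namespace name of the ontology
    local_part: str  # the part governed by the ontology namespace

    @property
    def citation_form(self) -> str:
        # "Plain" form, used within the ontology.
        return self.local_part

    @property
    def full_name_form(self) -> str:
        # Namespace name prepended: a URI reference, used outside the ontology.
        return self.namespace + self.local_part


rdfs_class = Vocabula("http://www.w3.org/2000/01/rdf-schema#", "Class")
owl_class = Vocabula("http://www.w3.org/2002/07/owl#", "Class")
```

Both vocabulas share the citation form Class, but their full name forms
differ, which is why a vocabula is a global name even though part of it
is "local".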
4. How can all this help?
The treatment of blank nodes set out above can help in merging RDF
graphs.
A local name such as _:foo looks like the *citation form* of a vocabula,
and we can look for a form which we could call its *full name* form. To
proceed as with a vocabula lexeme and prepend the ontology URI, say like
this: A#_:foo, where A is an ontology URI and _:foo is the local part,
is obviously *incorrect*, because the “_” scheme name cannot occur
inside a name of this scheme, and the full local name must be within the
same scheme as the citation form. Therefore, I propose the form
_:U:foo, where U is the URI of ontology A, as the full name lexical form
for local names.
I treat the _ scheme as placing no restrictions on the characters used
after the colon (:), fewer even than the syntax of “local parts”. This
allows URIs to be used as lexical forms of local names. But in the case
of local names, alongside the full name form and the citation form it
makes sense to use also a *qualified full name form*, which has a syntax
like this: _:C:foo, where C is the citation form of the ontology A. I
assume that this form will be the most useful one for local names.
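
The three lexical forms of a local name proposed here can be written
down as a tiny Python helper. This is a sketch only: the function name
local_name, the ontology URI http://example.org/A and the citation form
"a" are assumptions made for illustration.

```python
from typing import Optional

SCHEME = "_"  # the underscore treated as the name of the local names scheme


def local_name(local_part: str, ontology: Optional[str] = None) -> str:
    """Build a lexical form of a local name.

    Without `ontology` the result is the citation form (_:foo).
    With an ontology URI it is the full name form (_:U:foo).
    With an ontology citation form it is the qualified full name form (_:C:foo).
    """
    if ontology is None:
        return f"{SCHEME}:{local_part}"
    return f"{SCHEME}:{ontology}:{local_part}"


citation = local_name("foo")                           # _:foo
full = local_name("foo", "http://example.org/A")       # _:http://example.org/A:foo
qualified = local_name("foo", "a")                     # _:a:foo
```

All three strings stay inside the _ scheme, so they can be read as
lexical forms of one and the same local name.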
Now suppose we have two ontologies A and B, both using _:foo, and I want
to merge them. The recommendation in the standards is that for a merger
of two ontologies you rename the blank node in one of the ontologies.
But I would recommend this: *for a merger, use the full lexical forms of
local names in both ontologies*.
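
A merger along these lines can be sketched in Python, assuming that a
graph is simply a set of triples of strings and that each ontology has a
citation form ("a" and "b" below). The functions qualify and merge, the
ex: names and the sample data are all hypothetical and serve only to
illustrate the recommendation.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def qualify(term: str, ontology: str) -> str:
    """Turn a citation form _:foo into the qualified full form _:C:foo."""
    if term.startswith("_:"):
        return f"_:{ontology}:{term[2:]}"
    return term  # URI references and literals are left unchanged


def merge(graphs: Dict[str, Set[Triple]]) -> Set[Triple]:
    """Union of graphs after qualifying local names with their ontology."""
    merged: Set[Triple] = set()
    for ontology, triples in graphs.items():
        for s, p, o in triples:
            # RDF does not allow blank nodes as predicates, so p is kept as is.
            merged.add((qualify(s, ontology), p, qualify(o, ontology)))
    return merged


graph_a = {("_:foo", "ex:name", '"Alice"'), ("ex:doc", "ex:author", "_:foo")}
graph_b = {("_:foo", "ex:name", '"Bob"')}
merged = merge({"a": graph_a, "b": graph_b})
```

In the merged set the two uses of _:foo appear as _:a:foo and _:b:foo,
so they are not confused, and each still shows which ontology it came
from.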
Why is this better than “renaming”? First, this is close to natural
language: if John said “there exists something X which.....”, I can say
“There is something which John named X and which”. Since this is the
usage of a natural language, I would *not* call this recommendation a
“trick”, but rather a natural recommendation. On the other hand,
renaming blank nodes for a merger, as recommended by RDF, is obviously a
trick which solves a difficulty, but it goes against the nature of
natural languages, which keep names.
There are also pragmatic and logical reasons why RDF had better *not*
give the recommendation of “renaming blank nodes for a merger” to
chemists, mathematicians, linguists and other lay people. Really, in
real life they rarely use “unlimited quantifiers” like “there is
something, which”. A mathematician would use a “limited quantifier” and
say “there is a separable Hilbert space, which”, where 'separable
Hilbert space' is a term which has a very complex description. Now, how
would one mathematician take the recommendation to rename such names in
order to merge his ontology with another mathematician's ontology, if
both ontologies are masterpieces of mathematical and terminological
thought!? Instead of renaming the blank nodes of an ontology to be
merged with another ontology, I would recommend to “use *full* local
name forms in a merger” (something people do daily when they cite the
source of a notion which that source treats differently from how they
do).
The recommendation to rename blank nodes in a merger reflects a fact
from mathematical logic: you can rename variables in the scope of a
quantifier. But RDF governs another domain, namely development (of
software or of ontologies), and it does not govern the science of logic.
Therefore, I believe, the RDF standard should state a *development
specific* corollary of the logical fact, and this corollary is *use full
local names in a merger*.
As I said above, I don't see how this treatment of blank nodes can
affect the set-theoretic apparatus of the "RDF Semantics" document. It
only affects the wording. On the other hand, this treatment seems to me
to lead towards natural languages and, as I showed above, it helps in
merging RDF graphs, which is "integration of knowledge". I believe this
treatment can be considered for the next versions of the RDF standards.
Ioachim Drugus, Ph.D
ioachim.drugus@semanticsoft.net
Main Architect
SemanticSoft, Inc.
http://www.semanticsoft.net
Received on Wednesday, 18 June 2008 18:01:30 UTC