Local names URI scheme from sw@semanticsoft.net on 2008-06-18 (semantic-web@w3.org from June 2008)

From: <sw@semanticsoft.net>
Date: Wed, 18 Jun 2008 21:03:38 +0300
To: semantic-web@w3.org
Message-ID: <48594DFA.8000006@semanticsoft.net>
Greetings!

I am sharing with this mailing list a treatment of blank nodes - both as 
an exchange of experience and in order to get advice on whether this 
treatment is completely compliant with the standards.

Blank nodes are of key importance on the Semantic Web. Without them, RDF
would be a simple decidable calculus, but with them, playing the role of
existentially quantified variables, the logical expressiveness of the
Semantic Web is raised to that of first-order predicate logic, which,
together with set theory, served as a strict foundation first for
mathematics and eventually for all sciences.

Without shedding some light on the currently somewhat "occult
knowledge" of blank nodes, it would be hard to live on the Semantic Web.
Therefore I am giving as much detail as I can on blank nodes, which I
treat as "local names", and to be precise I have to say something about
the larger context of "names" in general. All this makes the message
long. I would appreciate it if somebody had the patience to read it and
shared other details on blank nodes, or comments on the compliance of
this treatment with the standards.

If by "RDF semantics" you understand the formal semantics described in 
terms of set theory in http://www.w3.org/TR/rdf-mt/ , then I cannot see 
how the treatment below can be non-compliant with RDF semantics. But if 
by "RDF semantics" you understand what is expressed by the *wording* of 
that document, then my treatment differs in some details. The first
understanding of semantics reflects the current state of deployed
systems; the second might affect future versions, and it is also
important.

1. Names

After some "terminological reconciliation" of the foundational work on 
semantics "On Sense and Reference" by Gottlob Frege, 1892, with modern 
terminology, I treat a *name* as a triad (three things):

1. A *lexeme*, which is the set of lexical forms of a name 
(http://en.wikipedia.org/wiki/Lexeme). *Lexeme* corresponds to *lexical 
space* in RDF semantics, but in RDF, the term *lexical space* is applied 
only to typed literals. Therefore, for the more general notion of *name* 
I am using the term *lexeme*.
2. A *meaning* (or *sense*), which is a correlation of the name with
other names in the same language. According to Frege, *meaning* is
relevant only to a name built according to the rules of a language.
3. An *interpretation*, which corresponds to the *lexical-to-value-mapping*
of RDF semantics, which is also limited to typed literals. I am using the
term *interpretation* from model theory, because the expression
*lexical-to-value-mapping* is too long, and because I am unwilling to
alter the meaning of a term in standards (in this case, by making it
more general).

Several comments on this definition follow. By using terminology
specific to languages (like lexeme) I place names into the context of a
language, which might be regarded as a limitation of generality. But I
would *not* call a *name* something which lies outside any language. For
such "standalone" things we can use terms like "sign" or "label". Also,
I treat *sense* and *meaning* as two different things: I treat *sense*
as the name of a property of the *name*, and *meaning* as the value of
that property. Such a distinction helps in many situations.

What's in a name? There might be lots of things, but, following Frege, I
treat the three things above as the essential properties a thing must
have in order to be called a "name". Generally, I call the *essential
properties* of a thing its *attributes*. But what is an "essential
property"? In a definition, we say that something is X if it has certain
properties P1, P2, ... It is P1, P2, ... that I call the essential
properties of X. In short, I call a property essential for a concept if
it participates in the definition of the concept.

How mandatory (or optional) are these 3 things in a name? At first
glance, this question might look ill-posed: all these 3 things are
essential properties of a name and, obviously, what is essential must
also be mandatory! But the question starts making sense as soon as we
notice what "mandatory" and "optional" apply to: they apply to the
*values* of properties, not to the properties themselves (even though
they are used as modifiers before the names of properties). It is like a
form you have to fill in: if a field is marked "optional", this does not
mean that the field is not there, but that you may leave it blank.
Therefore, even though the three properties above are essential for a
name (and thus mandatory for anything you call a name), some of their
values might be missing!

If we admit that all three properties of a name are optional, then
something which has none of the three can be useful only in an "algebra
of names", as a fixed element like the "unit" in an algebra with
multiplication, but it cannot be useful for an agent, which must have at
least one "handle" to "hold" the object it works with. Therefore, I
regard the lexeme as the mandatory property of a name.

There is no reason to regard the other properties of a name as
mandatory, and this allows for useful special cases of the notion of
name. For example, I call a *reference* a name whose meaning is
inessential in the discourse, and I call a *label* a name which is
meaningless in the discourse. A label is a special case of a reference.
Also, I treat a name without an interpretation as something *used as* a
name. The Semantic Web is said to employ the intensional approach (as
opposed to the extensional approach): this means that the names used in
descriptions are not expected to be interpreted at the time the
description is created (but they can obtain values and become
interpreted later). The intensional approach is possible because the
property *interpretation* of a name is treated as optional, as we did
above.
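
The triad, with the lexeme as the only mandatory value, can be sketched
as a record type. This is a minimal Python sketch under my own
assumptions, not part of any standard: the class `Name`, its field types
and the helpers `is_label` and `is_interpreted` are hypothetical, and a
*meaning* is reduced to an optional string.

```python
from dataclasses import dataclass, replace
from typing import Any, FrozenSet, Optional


@dataclass(frozen=True)
class Name:
    """A name as a triad: lexeme, meaning, interpretation (hypothetical model)."""

    lexeme: FrozenSet[str]                # mandatory: the set of lexical forms
    meaning: Optional[str] = None         # optional: correlation with other names
    interpretation: Optional[Any] = None  # optional: the value the name denotes

    def __post_init__(self) -> None:
        # The lexeme is the one mandatory value: an agent needs at least one "handle".
        if not self.lexeme:
            raise ValueError("a name must have a non-empty lexeme")

    def is_label(self) -> bool:
        # A label is a name that is meaningless in the discourse.
        return self.meaning is None

    def is_interpreted(self) -> bool:
        # Intensional approach: a name may be used before it has a value.
        return self.interpretation is not None


n = Name(frozenset({"_:foo"}))
print(n.is_label(), n.is_interpreted())  # True False

# Later, the same name acquires a value; its lexeme is unchanged.
m = replace(n, interpretation=42)
print(m.is_interpreted())  # True
```

The `replace` call mirrors the intensional approach: interpretation is
filled in after the description exists, without touching the lexeme.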


2. Blank Nodes as Names

It is important to place blank nodes inside a well understood framework, 
and the concept *name* as a triad above provides such a framework.

The name "blank node" comes from RDF graphs, where such a node is not
labeled: the place where a label is expected is blank. An RDF graph can
be treated as a proposition in a graphic language (planar or 3D). But
all natural languages, spoken or written, are linear, and there the term
"node" sounds bizarre. Also, in a serialization of an RDF graph with 2
or more blank nodes, you have to use labels, which makes the word
"blank" sound bizarre as well. If you are told that the label you use is
"temporary", this is not satisfying either, because you might have
chosen so apt a name that you want to keep it forever, or at least
longer than some short-lived URI references. Thus, the "temporary"
character attributed to the labels of blank nodes sounds incorrect.

To find good terminology we must return to natural languages, where
semantics is well studied and things are named most appropriately. But
before doing this, we must answer the question: can blank nodes be
treated as names at all? The meaningful elements in a description are
*not* intended by an agent, but are used by the agent to *intend* other
things (see the notion “intentionality”,
http://en.wikipedia.org/wiki/Intentionality). It is names (more
precisely, lexemes) that are used to intend objects. Since by means of
graph nodes an agent intends things that are generally different from
graph nodes, we are compelled to accept that blank nodes themselves
*are* names. More precisely, blank nodes in a graph are lexemes of
names. Generally, a node (blank or not) in a graph is both a graphical
object (like an oval or a rectangle) and the textual label associated
with it, and these two things are just different lexical forms of the
same lexeme. A blank node is one lexical form of the lexeme of a name.
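
In a linear serialization such as N-Triples, the graphical blank node
has to be given a textual lexical form, and two distinct blank nodes can
be kept apart only by distinct labels. A minimal Python sketch, assuming
IRIs are plain strings and blank nodes are objects with identity but no
label of their own (the class `BNode` and the function `to_ntriples` are
hypothetical, and literals are not handled):

```python
from typing import Dict, List, Tuple, Union


class BNode:
    """A blank node: it has identity but carries no label of its own."""


Term = Union[str, BNode]  # in this sketch, IRIs are plain strings


def to_ntriples(triples: List[Tuple[Term, str, Term]]) -> List[str]:
    """Serialize triples, giving each distinct blank node its own label."""
    labels: Dict[BNode, str] = {}

    def show(term: Term) -> str:
        if isinstance(term, BNode):
            # First appearance of this blank node: generate a fresh label.
            if term not in labels:
                labels[term] = f"_:b{len(labels)}"
            return labels[term]
        return f"<{term}>"

    return [f"{show(s)} {show(p)} {show(o)} ." for s, p, o in triples]


x, y = BNode(), BNode()
for line in to_ntriples([(x, "http://example.org/knows", y),
                         (y, "http://example.org/knows", x)]):
    print(line)
```

The labels `_:b0`, `_:b1` are generated in order of first appearance;
any other consistent choice would be another lexical form of the same
names.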

According to this terminology (and the concepts behind it), a blank node
is a real name like any other. Moreover, since a blank node can be
"renamed", its "lexical space" (which we called "lexeme") is potentially
infinite. How, then, can we describe in these terms the situation when a
description needs only one blank node and we choose that node to be
“anonymous”, that is, "without a name"? This situation is described
differently in a graphic language and in linear languages. In a graphic
language, like the language of RDF graphs, a blank node as a graphical
object is a lexical form of the lexeme. In linear languages, like
natural languages, where you have to use a string, this must be a
special string. For example, English uses the word "something" for this
purpose: "something" is the counterpart of "blank node" in English. In
any case, blank nodes treated as names have a non-void lexeme.
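
Since "renaming" a blank node only selects another lexical form, two
serializations that differ just by a one-to-one change of blank node
labels describe the same graph (roughly what RDF Concepts calls graph
equivalence). A brute-force Python sketch, assuming triples of strings
in which local names are exactly the terms starting with `_:`; the
helper names are hypothetical, and the search is exponential, so it
suits only tiny graphs:

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


def local_names(graph: Set[Triple]) -> List[str]:
    # Assumption: a term is a local name exactly when it starts with "_:".
    return sorted({t for triple in graph for t in triple if t.startswith("_:")})


def relabel(graph: Set[Triple], mapping: Dict[str, str]) -> FrozenSet[Triple]:
    # Replace each local name by its image under mapping; other terms stay as they are.
    return frozenset(tuple(mapping.get(t, t) for t in triple) for triple in graph)


def same_graph(g1: Set[Triple], g2: Set[Triple]) -> bool:
    """True if g2 is g1 written with other lexical forms of its local names."""
    l1, l2 = local_names(g1), local_names(g2)
    if len(l1) != len(l2):
        return False
    target = frozenset(g2)
    # Try every one-to-one correspondence between the two sets of labels.
    return any(relabel(g1, dict(zip(l1, perm))) == target
               for perm in permutations(l2))


g1 = {("_:foo", "http://example.org/p", "_:bar")}
g2 = {("_:x", "http://example.org/p", "_:y")}
print(same_graph(g1, g2))  # True
```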

I regard as incorrect the expression “name of a blank node”, which
sounds to me like “name of a name”, as well as the expression “renaming
blank nodes”, which sounds to me like "renaming a name" (you can
"rename" a value, but not a name). What is actually meant by "renaming"
in this context is "using another lexical form of the same name".

For the reasons above, I prefer to say *local names* instead of "blank
nodes", as is done in N3. Generally, terminology should be independent
of the dimensionality of the language. The term *local name* satisfies
this requirement, but *blank node* does not, because "nodes" are
possible only in languages of dimensionality 2 or higher.

Since RDF Semantics calls URI references and literals *names*, the
first benefit of treating blank nodes as names too is that a triple
becomes a sequence of three names: the name of a subject, the name of a
predicate and the name of an object. This brings us closer to the
linguistics of natural languages, which is concerned more with
expressiveness than with certain names occupying certain places in a
triple in order to obtain "decidable" theories or to stay within
first-order predicate logic; those are the concerns of the RDF standard,
which describes tools. By changing the *wording* of the RDF standard
documents, we can make them usable also by those who are more concerned
with expressiveness than with tools.
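
The difference between the two views can be stated as data. If all
three components of a triple are names, what remains specific to RDF
are its positional restrictions: in RDF as specified in 2004, the
subject is a URI reference or a blank node, the predicate is a URI
reference, and the object may be either of these or a literal. A minimal
Python sketch with hypothetical class names:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str


@dataclass(frozen=True)
class LocalName:
    label: str  # one lexical form of a blank node, e.g. "foo" for _:foo


Term = Union[URIRef, Literal, LocalName]  # every component is a name


def is_rdf_triple(s: Term, p: Term, o: Term) -> bool:
    """Check the positional restrictions of RDF (2004) on a triple of names."""
    subject_ok = isinstance(s, (URIRef, LocalName))  # no literal subjects
    predicate_ok = isinstance(p, URIRef)             # predicates are URI refs
    object_ok = isinstance(o, (URIRef, Literal, LocalName))
    return subject_ok and predicate_ok and object_ok


print(is_rdf_triple(LocalName("foo"), URIRef("http://example.org/p"), Literal("42")))
```

Seen this way, the three names form a triple in any case; RDF only adds
rules about which kind of name may stand in which place.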

3. Local Names Scheme

Is there a scheme for local names? I treat the underscore “_” as the
name of this scheme, which is consistent with the fact that we write
such names as _:foo. The standards recommend this notation, but do not
say that it follows a URI scheme. Therefore, probably, many people like
me have to discover for themselves the simple fact that they are dealing
with a scheme. Why would this be so difficult to discover? Because you
first have to discover that blank nodes are also *names* (and the
standards exclude them from names for the simple reason that they call
only URIrefs and literals names).

If you look at the text of a real ontology (not an example), you find
that the overwhelming majority of names are local names starting with
underscores. Local names far outnumber global names, so a scheme for
them should probably be defined in a standard for names. It is the URI
standard that defines global names; if it also covered the local name
scheme in the same way as it covers the URL and URN schemes, so that it
treated both local and global names, then such a more general standard
could be called a standard of "universal names".

There is a difficulty with “local name”: most people understand the
word “local” as applying also to the terms defined within an ontology.
To help solve this problem, I will develop a piece of terminology.


What do you call a term defined in an ontology? People say “term or word
in the ontology vocabulary”, but English does not have a single word for
this. Romanian has the word *vocabula*, which correlates a word with a
vocabulary, and I have borrowed it into my English Semantic Web
terminology. (Romanian is one of the main natural languages that have
preserved Latin lexicology fairly “intact”, and the experience of
Romanian with the word "vocabula" is really useful.)

A vocabula of an ontology (that is, a vocabula in the vocabulary of an
ontology) is a lexeme which has two forms: one with the namespace name
“prepended”, used outside the ontology, and a “plain” one, with nothing
prepended, used within the ontology. I call them the *full name* form
and the *citation* form, respectively (for this terminology, see again
http://en.wikipedia.org/wiki/Lexeme). I use the expression “to prepend
the namespace name” and not “to prefix the namespace name”, because the
full name is not a qualified name but a URI reference. The treatment of
vocabulas as lexemes explains why Semantic Web standards call
"vocabulary" both the URI references and the local names of ontologies.
Notice that rdf, rdfs, owl and others are *citation forms* of the names
of certain ontologies.
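
Prepending is plain string concatenation, which is why the full name is
a URI reference and not a qualified name. A minimal Python sketch; the
function names are hypothetical, while the RDF namespace name is the
real one:

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def full_name(namespace_name: str, citation: str) -> str:
    # "Prepending" is plain concatenation; the result is a URI reference.
    return namespace_name + citation


def citation_form(namespace_name: str, full: str) -> str:
    # Inverse: strip the namespace name to recover the citation form.
    if not full.startswith(namespace_name):
        raise ValueError(f"{full!r} is not in namespace {namespace_name!r}")
    return full[len(namespace_name):]


print(full_name(RDF_NS, "type"))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```

So the vocabula whose citation form is `type` in the rdf ontology has
the full name http://www.w3.org/1999/02/22-rdf-syntax-ns#type, and the
two strings are two lexical forms of one lexeme.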

As I said above, there is a difficulty with the modifier “local” in
“local name”. Indeed, why is a vocabula, which can be used in different
ontologies with different meanings (like Class in RDF Schema and Class
in OWL), *not* also “local”? That is, why does “local name” not apply
both to blank node names and to vocabulas? The answer is that people
treat names, proper or common, as *lexemes*, not as lexical forms, and
we should also treat vocabulas as lexemes (with the two forms discussed
above). A vocabula is a global name, but it can have a “local part”, and
only the “local part” of a vocabula is governed by the ontology
namespace. On the other hand, a blank node name is a local name, no
matter in which lexical form it is used. This explanation should
eliminate any confusion between vocabulas and local names.

4. How can all this help?

The treatment of blank nodes set out above can help in merging RDF
graphs.

A local name such as _:foo looks like the *citation form* of a
vocabula, so we can look for a form that could be called its *full
name* form. Proceeding as with a vocabula lexeme, i.e. prepending the
ontology URI as in A#_:foo (where A is an ontology URI and _:foo is the
local part), is obviously *incorrect*: the “_” scheme name cannot appear
inside a name of another scheme, and the full form of a local name must
belong to the same scheme as its citation form. Therefore, I propose the
form _:U:foo, where U is the URI of ontology A, as the full name lexical
form of local names.

I treat the _ scheme as unrestrictive regarding the characters used
after the colon (:), even less restrictive than the syntax of “local
parts”. This allows URIs to be used inside lexical forms of local
names. But for local names, alongside the full name form and the
citation form, it makes sense to use also a *qualified full name form*,
with a syntax like _:C:foo, where C is the citation form of the name of
ontology A. I expect this form to be the most useful for local names.
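
The three lexical forms of a local name (citation _:foo, full _:U:foo
and qualified full _:C:foo) can be built and taken apart mechanically.
A minimal Python sketch under stated assumptions: the helper names are
hypothetical, and since this scheme restricts the characters after the
colon even less than the syntax of local parts does, the sketch assumes
that the local part itself contains no colon and splits on the last
one, so that a qualifier such as an http URI may contain colons.

```python
from typing import NamedTuple, Optional


class LocalNameForm(NamedTuple):
    qualifier: Optional[str]  # ontology URI (U) or its citation form (C); None if absent
    local: str                # the local part, e.g. "foo"


def full_form(qualifier: str, local: str) -> str:
    # _:U:foo when qualifier is a URI, _:C:foo when it is a citation form.
    return f"_:{qualifier}:{local}"


def parse_local_name(text: str) -> LocalNameForm:
    if not text.startswith("_:"):
        raise ValueError(f"not in the '_' scheme: {text!r}")
    rest = text[2:]
    # Assumption: the local part has no ':', so split on the last colon;
    # the qualifier may contain colons (e.g. "http://example.org/A").
    qualifier, sep, local = rest.rpartition(":")
    return LocalNameForm(qualifier if sep else None, local)


print(parse_local_name("_:foo"))
print(parse_local_name(full_form("http://example.org/A", "foo")))
print(parse_local_name(full_form("A", "foo")))
```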

Now suppose we have two ontologies A and B, both using _:foo, and I want
to merge them. The recommendation in the standards is that, for a merger
of two ontologies, you rename the blank node in one of them. But I would
recommend this: *for a merger, use the full lexical forms of local names
in both ontologies*.
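
This recommendation can be sketched on graphs represented as sets of
string triples. It is a Python sketch of the proposal, not of any
standard merge procedure; it assumes that terms beginning with `_:` are
local names in citation form and that the two ontology URIs differ (the
function names are hypothetical):

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def to_full_local_names(graph: Set[Triple], ontology_uri: str) -> Set[Triple]:
    """Rewrite each citation-form local name _:foo as _:U:foo, U = ontology_uri."""
    def full(term: str) -> str:
        # Assumption: every term starting with "_:" is a local name in citation form.
        return f"_:{ontology_uri}:{term[2:]}" if term.startswith("_:") else term

    return {(full(s), full(p), full(o)) for s, p, o in graph}


def merge_with_full_local_names(a: Set[Triple], uri_a: str,
                                b: Set[Triple], uri_b: str) -> Set[Triple]:
    # Nothing is renamed: each local part is kept and qualified by its source.
    # Assumes uri_a != uri_b, otherwise local names of A and B could clash.
    return to_full_local_names(a, uri_a) | to_full_local_names(b, uri_b)


A = {("_:foo", "http://example.org/p", "http://example.org/x")}
B = {("_:foo", "http://example.org/p", "http://example.org/y")}
for t in sorted(merge_with_full_local_names(A, "http://example.org/A",
                                            B, "http://example.org/B")):
    print(t)
```

A plain union A | B would instead identify the two _:foo, which is the
clash that renaming is meant to avoid; with full local name forms, no
local part is changed.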

Why is this better than “renaming”? First, it is close to natural
language: if John said “there exists something X which ...”, I can say
“there is something which John named X and which ...”. Since this is
natural-language usage, I would *not* call this recommendation a
“trick”, but rather a natural recommendation. On the other hand,
renaming blank nodes for a merger, as recommended by RDF, is obviously a
trick which solves a difficulty, but it goes against nature (the nature
of natural languages, which keep names).

There are also pragmatic and logical reasons why RDF had better *not*
give the recommendation of “renaming blank nodes for a merger” to
chemists, mathematicians, linguists and other lay people. Indeed, in
real life they rarely use “unlimited quantifiers” like “there is
something which”. A mathematician would use a “limited quantifier” and
say “there is a separable Hilbert space which”, where 'separable Hilbert
space' is a term with a very complex description. Now, how would a
mathematician take the recommendation to rename such names in order to
merge his ontology with another mathematician's ontology, if both
ontologies are masterpieces of mathematical and terminological
thought!? Instead of renaming the blank nodes of an ontology to be
merged with another ontology, I would recommend to “use *full* local
name forms in a merger” (something they do daily when they cite the
source of a notion that is treated there differently from their own
treatment).

The recommendation to rename blank nodes in a merger reflects a fact
from mathematical logic: you can rename variables bound by a
quantifier. But RDF governs another domain, the development of software
or of ontologies, and it does not govern logic as a science. Therefore,
I believe, the RDF standard should state a *development-specific*
corollary of the logical fact, and this corollary is *use full local
names in a merger*.
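
The logical fact in question is the renaming of bound variables. In
first-order notation, with P and Q standing for what two graphs say
about their blank nodes (symbols chosen here only for illustration, and
y a fresh variable):

```latex
\exists x\,\varphi(x) \;\equiv\; \exists y\,\varphi(y)
  \qquad (y \text{ does not occur in } \varphi(x))

(\exists x\,P(x)) \land (\exists x\,Q(x))
  \;\equiv\; \exists x\,\exists y\,\bigl(P(x) \land Q(y)\bigr)
```

Identifying the two variables instead would yield ∃x (P(x) ∧ Q(x)), a
strictly stronger statement; keeping the variables apart, whether by
renaming or by full local names, preserves the meaning of the merge.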

As I said above, I don't see how this treatment of blank nodes can
affect the set-theoretic apparatus of the "RDF Semantics" document. It
only affects the wording. On the other hand, this treatment seems to me
to lead toward natural languages and, as I showed above, it helps in
merging RDF graphs, which is "integration of knowledge". I believe this
treatment can be considered for the next versions of the RDF standards.

Ioachim Drugus, Ph.D
ioachim.drugus@semanticsoft.net
Main Architect
SemanticSoft, Inc.

http://www.semanticsoft.net
Received on Wednesday, 18 June 2008 18:01:30 UTC