- From: <sw@semanticsoft.net>
- Date: Wed, 18 Jun 2008 21:03:38 +0300
- To: semantic-web@w3.org
Greetings!

I am sharing with this mailing list a treatment of blank nodes, both as an exchange of experience and in order to get advice on whether this treatment is fully compliant with the standards.

Blank nodes are of key importance on the Semantic Web. Without them, RDF would be a simple decidable calculus, but with them, playing the role of existentially quantified variables, the logical expressiveness of the Semantic Web is raised to that of first-order predicate logic, which together with set theory has served as a strict foundation, first for mathematics and finally for all sciences. Unless some light is shed on the currently somewhat "occult knowledge" of blank nodes, it will be hard to live on the Semantic Web. Therefore I am writing in as much detail as I can about blank nodes, which I treat as "local names", and to be precise I have to cover the larger context of "names" in general. All this makes the message long. I would appreciate it if somebody had enough patience to read it and shared other details on blank nodes, or commented on the compliance of this treatment with the standards.

If by "RDF semantics" you understand the formal semantics described in terms of set theory in http://www.w3.org/TR/rdf-mt/ , then I cannot see how the treatment below can be non-compliant with RDF semantics. But if by "RDF semantics" you understand what is expressed by the *wording* of that document, then my treatment differs in some details. The first understanding of semantics reflects the current state of deployed things; the second might affect future versions, and it is also important.

1. Names
After some "terminological reconciliation" of the foundational work on semantics, "On Sense and Reference" by Gottlob Frege (1892), with modern terminology, I treat a *name* as a triad (three things):

1. A *lexeme*, which is the set of lexical forms of a name (http://en.wikipedia.org/wiki/Lexeme).
*Lexeme* corresponds to *lexical space* in RDF semantics, but in RDF the term *lexical space* is applied only to typed literals. Therefore, for the more general notion of *name* I use the term *lexeme*.

2. A *meaning* (or *sense*), which is a correlation of the name with other names in the same language. According to Frege, *meaning* is relevant only to a name built according to the rules of a language.

3. An *interpretation*, which corresponds to the *lexical-to-value-mapping* of RDF semantics, also limited to typed literals. I use the term *interpretation* from model theory because the expression *lexical-to-value-mapping* is too long, and because I am unwilling to alter the meaning of a term in the standards (in this case, by making it more general).

Several comments on this definition follow.

By using terminology specific to languages (like lexeme) I place names into the context of a language, which might be regarded as a limitation of generality. But I would *not* call *name* something which lies outside any language. For such "standalone" things we can use terms like "sign" or "label". Also, I treat *sense* and *meaning* as two different things: I treat *sense* as the name of a property of the *name*, and *meaning* as the value of that property. Such a distinction helps in lots of situations.

What's in a name? There might be lots of things, but, following Frege, I treat the three things above as the essential properties a thing must have in order to be called a "name". Generally, I refer to the *essential properties* of a thing as its *attributes*. But what is an "essential property"? In a definition, we say that something is X if it has certain properties P1, P2,... It is P1, P2,... which I call the essential properties of X. In short, I say that a property is essential for a concept if it participates in the definition of the concept.

How mandatory (optional) are these three things in a name?
At first glance, this question might look incorrect: all three things are essential properties of a name and, obviously, what is essential must also be mandatory! But the question starts making sense as soon as we find what "mandatory" and "optional" apply to: they apply to the *values* of properties and not to the properties themselves (even though they are used as modifiers before the names of properties). It is like a form which you have to fill in: if a field is marked "optional", this does not mean that the field is not there, but that you may leave it blank. Therefore, even though the three properties above are essential for a name (and thus mandatory for any thing which you call a name), some of their values might be missing!

If we admit that all three properties of a name are optional, then something which has none of the three can be useful only in an "algebra of names" as a fixed element, like a "unit" in an algebra with multiplication, but it cannot be useful for an agent, which must have at least one "handle" to "hold" the object it works with. Therefore, I regard the lexeme as the mandatory property of a name. There is no reason to regard the other properties of a name as mandatory, and this allows for useful particular cases of the notion of name. Say, I call *reference* a name whose meaning is unessential in the discourse, and I call *label* a name which is meaningless in a discourse. A label is a particular case of a reference. Also, I treat a name without interpretation as something *used as* a name.

The Semantic Web is said to employ the intensional approach (versus the extensional approach). This means that the names used in descriptions are not expected to be interpreted at the time when the description is created (but they can obtain values and become interpreted afterwards). The intensional approach is possible because the property *interpretation* of a name is treated as optional, as we did above.

2. Blank Nodes as names.
It is important to place blank nodes inside a well understood framework, and the concept of *name* as a triad, described above, provides such a framework.

The name "blank node" comes from RDF graphs, where a node is not labeled, that is, the place where the label is expected is blank. An RDF graph can be treated as a proposition in a graphic language (planar or 3D). But all natural languages, spoken or written, are linear, and there the term "node" sounds bizarre. Also, in a serialization of an RDF graph with two or more blank nodes you have to use a label, which makes the word "blank" also sound bizarre. If they tell you that the label you use is "temporary", this is not satisfying either, because you might have chosen so appropriate a name that you want to keep it forever, or at least longer than some short-lived URI references. Thus, the "temporary" character of the names of blank nodes sounds incorrect. To find good terminology we must return to natural languages, where semantics is well studied and things are named most appropriately.

But before doing this, we must answer the question: can blank nodes be treated as names at all? The meaningful elements in a description are *not* intended by an agent, but are used by the agent to *intend* other things (see the notion “intentionality”, http://en.wikipedia.org/wiki/Intentionality). It is names (more precisely, lexemes) that are used to intend objects. Since by graph nodes an agent intends things generally different from graph nodes, this compels us to accept that blank nodes themselves *are* names. More precisely, blank nodes in a graph are lexemes of names. Generally, a node (blank or not blank) in a graph is both a graphical object (like an oval or a rectangle) and the textual label associated with it, and these two things are just different lexical forms of the same lexeme. A blank node is one lexical form of the lexeme of a name.
According to this terminology (and the conceptuality behind it), a blank node is a real name like any other name. Moreover, since a blank node can be "renamed", its "lexical space" (which we called "lexeme") is potentially infinite.

How then can we describe in these terms the situation when a description needs only one blank node and we choose that node to be “anonymous”, that is, "without a name"? This situation is described differently in a graphic language and in linear languages. In a graphic language like the language of RDF graphs, a blank node as a graphical object is a lexical form of the lexeme. In linear languages, like natural languages, where you have to use a string, this must be a special string. For example, English uses the word "something" for this purpose: "something" is the counterpart of "blank node" in English. In any case, blank nodes treated as names have a non-void lexeme.

I regard as incorrect the expression “name of a blank node”, which sounds to me like “name of a name”, and likewise the expression “renaming blank nodes”, which sounds to me like "renaming a name" (you can "rename" the value, but not a name). What is actually meant by "renaming" in this context is "using another lexical form of the same name".

Due to the arguments above, I prefer to say *local names* instead of "blank nodes", as they do in N3. Generally, the terminology should be invariant with respect to the dimensionality of the language. The term *local name* satisfies this requirement, but *blank node* does not, because "nodes" are possible only in languages of dimensionality 2 or higher.

Since RDF Semantics calls URI references and literals *names*, the first benefit of treating blank nodes also as names is that a triple becomes a sequence of three names: the name of a subject, the name of a predicate and the name of an object.
This brings us closer to the linguistics of natural languages, which is concerned more with expressiveness than with certain names being in certain places in a triple in order to obtain "decidable" theories or to stay inside first-order predicate logic, which are the concerns of an RDF standard that describes tools. By changing the *wording* of the RDF standard document, we can make it usable also by those who are more concerned with expressiveness than with tools.

3. Local Names Scheme
Is there a scheme for local names? I treat the underscore “_” as the name of this scheme, and this complies with the fact that we write such names like this: _:foo. The standards recommend this notation, but do not say that we are proceeding according to a URI scheme. Therefore, probably, many people like me have to discover for themselves the simple fact that they are dealing with a scheme. Why would it be so difficult to discover this? Because you first have to discover that blank nodes are also *names* (and the standards exclude them from names for the simple reason that they call only URI references and literals names).

If you look at the text of a real ontology (not an example), you find that the overwhelming majority of names are local names starting with underscores. The local names by far outnumber the global names, and a scheme for them should probably be defined in a standard for names. It is the URI standard which defines global names. If it also brought this scheme to attention in the same way as it does the URL and URN schemes, so that URI treated both local names and global names, then such a more general standard could be called a standard of "universal names".

There is a difficulty with “local name”: the word “local” is understood by most people as applicable also to the terms defined within an ontology. To help solve this problem, I will develop a piece of terminology. What do you call a term defined in an ontology?
They say “term or word in the ontology vocabulary”, but English does not have a single word for this. In Romanian there is a word, *vocabula*, which correlates a word with a vocabulary, and I borrowed it into my English Semantic Web terminology. (Romanian is one of the main natural languages which preserved Latin lexicology pretty much “intact”, and the experience of Romanian with the word "vocabula" is really useful.)

A vocabula of an ontology (that is, a vocabula in the vocabulary of an ontology) is a lexeme which has two forms: one with the namespace name “prepended”, used outside the ontology, and a “plain” one, without anything prepended, used within the ontology. I call them the *full name* form and the *citation* form, respectively (see again http://en.wikipedia.org/wiki/Lexeme for this terminology). I used the expression “to prepend the namespace name” and not “to prefix the namespace name”, because the full name is not a qualified name but a URI reference. The treatment of vocabulas as lexemes explains why the Semantic Web standards call both the URI references and the local names of ontologies "vocabulary". Notice that rdf, rdfs, owl and others are *citation forms* of the names of certain ontologies.

As I said above, there is a difficulty with the modifier “local” in “local name”. Really, why is a vocabula, which can be used in different ontologies with different meanings (like Class in RDF and Class in OWL), *not* also “local”? That is, why does “local name” not apply both to blank node names and to vocabulas? The answer is that people treat names, proper or common, as *lexemes* and not as lexical forms, and we should also treat vocabulas as lexemes (with the two forms discussed above). A vocabula is a global name, but it can have a “local part”, and only the “local part” of a vocabula is governed by the ontology namespace. On the other hand, a blank node name is a local name, no matter in which lexical form it is used.
This explanation should eliminate any confusion between vocabulas and local names.

4. How can all this help?
The treatment of blank nodes set out above can help in RDF graph mergers. A local name such as _:foo looks like the *citation form* of a vocabula, and we can look for a form which we could call its *full name* form. To proceed as with a vocabula lexeme and prepend the ontology URI, say like this: A#_:foo, where A is an ontology URI and _:foo is the local part, is obviously *incorrect*, because the “_” scheme name cannot be inside a name of this scheme, and the full name of a local name must be within the same scheme as its citation form. Therefore, I propose the form _:U:foo, where U is the URI of ontology A, as the full name lexical form for local names. I treat the "_" scheme as non-restrictive regarding the characters used after the colon (:), even less restrictive than the syntax of “local parts”. This allows URIs to be used within lexical forms of local names. But in the case of local names, alongside the full name form and the citation form it makes sense to use also a *qualified full name form*, which has a syntax like this: _:C:foo, where C is the citation form of the ontology A. I assume that this form will be the most useful one for local names.

Now suppose we have two ontologies A and B, both using _:foo, and I want to merge them. The recommendation in the standards is that for a merger of two ontologies you rename the blank node in one of the ontologies. But I would recommend this: *for a merger, use full local name lexical forms in both ontologies*.

Why is this better than “renaming”? First, this is close to natural language: if John said “there exists something X which.....”, I can say “There is something which John named X and which”. Since this is the usage of a natural language, I would *not* call this recommendation a “trick”, but rather a natural recommendation.
On the other hand, the renaming of blank nodes for a merger recommended by RDF is obviously a trick which solves a difficulty, but it goes against the nature of natural languages, which keep names.

There are also pragmatic and logical reasons why RDF had better *not* give the recommendation of “renaming blank nodes for a merger” to chemists, mathematicians, linguists and other lay people. In real life they rarely use “unlimited quantifiers” like “there is something, which”. A mathematician would use a “limited quantifier” and say “there is a separable Hilbert space, which”, where 'separable Hilbert space' is a term with a very complex description. Now, how would a mathematician take the recommendation to rename such names in order to merge his ontology with another mathematician's ontology, if both ontologies are masterpieces of mathematical and terminological thought!? Instead of renaming the blank nodes of an ontology to be merged with another ontology, I would recommend to “use *full* local name forms in a merger” (something they do daily when they reference the source of a notion which that source treats differently from how they treat it themselves).

The recommendation to rename blank nodes in a merger reflects a fact from mathematical logic: you can rename variables in the scope of a quantifier. But RDF governs another domain, development (of software or of ontologies), and it does not govern the science of logic. Therefore, I believe the RDF standard should state a *development specific* corollary of the logical fact, and this corollary is *use full local names in a merger*.

As I said above, I don't see how this treatment of blank nodes can affect the set-theoretic apparatus of the "RDF Semantics" document. It only affects the wording. On the other hand, this treatment looks to me as leading toward natural languages and, as I showed above, it helps in RDF graph mergers, which are "integration of knowledge".
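
As an illustration only, the recommendation *use full local names in a merger* could be sketched in Python as follows. This is a minimal sketch under assumptions that are not part of any standard: a graph is modeled as a plain set of string triples, the function names qualify and merge_with_full_local_names, the ex: names and the qualifiers A and B are all hypothetical, and the resulting _:C:foo strings follow the notation proposed in this message, which concrete RDF syntaxes may not accept as blank node labels.

```python
from typing import Dict, Set, Tuple

# A triple is modeled as three names (subject, predicate, object), all strings.
Triple = Tuple[str, str, str]

LOCAL_PREFIX = "_:"  # the "_" scheme used for local names in this message


def qualify(term: str, qualifier: str) -> str:
    """Turn a citation form such as _:foo into the qualified form _:C:foo.

    Global names (anything not starting with "_:") are returned unchanged.
    """
    if not term.startswith(LOCAL_PREFIX):
        return term
    return LOCAL_PREFIX + qualifier + ":" + term[len(LOCAL_PREFIX):]


def merge_with_full_local_names(graphs: Dict[str, Set[Triple]]) -> Set[Triple]:
    """Merge graphs keyed by a qualifier (hypothetical: the ontology's
    citation form), qualifying every local name instead of renaming it."""
    merged: Set[Triple] = set()
    for qualifier, triples in graphs.items():
        for triple in triples:
            s, p, o = (qualify(term, qualifier) for term in triple)
            merged.add((s, p, o))
    return merged


# Hypothetical example data: both ontologies use the local name _:foo.
graph_a = {("_:foo", "ex:knows", "ex:John")}
graph_b = {("_:foo", "ex:knows", "ex:Mary")}

merged = merge_with_full_local_names({"A": graph_a, "B": graph_b})
for triple in sorted(merged):
    print(triple)
```

In this sketch the two occurrences of _:foo stay distinct after the merger because each carries the qualifier of its source ontology, and the original local part foo is still readable in both.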
I believe, this treatment can be considered for the next versions of RDF standards.

Ioachim Drugus, Ph.D
ioachim.drugus@semanticsoft.net
Main Architect
SemanticSoft, Inc.
http://www.semanticsoft.net
Received on Wednesday, 18 June 2008 18:01:30 UTC