Re: A valuable lesson on the difference between XML Schemas and ontologies (UNCLASSIFIED) from Cheney, Edward A SSG RES USAR USARC on 2011-11-04 (xmlschema-dev@w3.org from November 2011)

From: Cheney, Edward A SSG RES USAR USARC <austin.cheney@us.army.mil>
Date: Fri, 04 Nov 2011 12:11:02 -0500
To: "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-ID: <7690dd8639ca9.4eb3d656@us.army.mil>
Classification: UNCLASSIFIED
Roger,

I do not agree, although I certainly would from the perspective of an English speaker.  I disagree because of the complexities of linguistic use and of linguistic determinism.  There are two considerations to be aware of if your native language is an Indo-European language.

Generally, Indo-European languages start from a commonly known, or perhaps agreed-upon, subject.  From that subject, context-driven elaboration is applied, first in the form of statements and then in the multidimensional form of paragraphs.  This is how we come to know context.  From a data perspective context is frustrating, because data is stale, so models must exist to map the relationships that bind metadata sufficiently to allow a model of context.

Other languages do not work like that.  Take Swahili, for example, and likely other Bantu languages.
http://en.wikipedia.org/wiki/Swahili_language#Semantic_motivation

In the case of Swahili, context is less necessary, because breaking a single word down into its prefix(es) yields the metadata that directly describes the word.  Here the description and the thing itself are both known from the formation of the single word, so contextual mappings are partially or completely irrelevant compared with what an English speaker needs to reach the same understanding.  The result is that if a descriptive structure were created with absolute and commonly agreed-upon certainties of grammar, it may be safe to claim that such a structure is truly semantic, because its components and types are already known without additional inference.

Another fault of many Indo-European languages lies in their written expression, namely that they rely upon an alphabet.  Alphabets are certainly technologically superior to static pictographs.  I come to this conclusion because pictographic writing systems take substantially longer to learn, even if they are possibly more expressive and potentially more compact than alphabetic ones.  The measure by which I claim technological superiority is the shorter time needed to learn a writing system without sacrificing equivalence to the oral communication of the language.

Alphabetic writing is problematic for several reasons.  I am solely an English speaker, so I will only speak from an English perspective.  In an alphabet, each written glyph represents a sound.  In English, words are not formed of individual sounds but of syllables, each of which may contain several sounds.  Furthermore, the English language has something like 42 sounds but only 26 letters in its alphabet, some of which are phonetically redundant.  This is horribly confusing.

Fortunately, there is something technologically superior to an alphabet: the syllabary.  A syllabary is a writing system in which each glyph represents a full syllable.  I have concluded that a syllabary is technologically superior to an alphabet because it takes about 9 to 11 months for an illiterate person to learn to read English from its alphabet.  In contrast, it takes roughly 3 to 5 months for an illiterate person to learn to read Cherokee.

This incompleteness of alphabets, particularly in regard to English, is astounding.  It does not merely lengthen the time it takes to learn to read; it continues to hamper comprehension forever, because the complexities and confusion never go away.  We attempt to mask this complexity with memorization and knowledge of context.  I have already argued above that context is not the most efficient means of supplying metadata at an extremely foundational level, yet here we apply that inefficiency to mask the numerous inconsistent conventions of our English writing system.  Because we are so hopelessly dependent upon context to figure out even a single instance of a common word, I have no choice, at least from an English perspective, but to agree that data structures are not capable of being ontology mappings.  With a proper understanding of taxonomies and a limited knowledge of linguistics, it does not have to be this way.

It is certainly possible for a descriptive data structure to be a complete and stable ontology system on its own.  You just need the proper conventions and a solid, primitive foundation to start from.

Austin Cheney, CISSP

On 11/04/11, "Costello, Roger L."  <costello@mitre.org> wrote:

> Hi Folks,
> 
> This week I learned a valuable lesson on the difference between XML Schemas and ontologies. I think you will find it of interest.
> 
> Warning: in the following two sections I will lead you down a path and attempt to persuade you that everything is reasonable and logical. Then, in the two sections after that I will change my position 180 degrees and attempt to persuade you that what I said previously is unreasonable and illogical.
> 
> ---------------------------------------------------------------------------------------
> The Problem: Element Has No Information About The Kind Of Thing It Is
> ---------------------------------------------------------------------------------------
> 
> In this section and the next I will attempt to persuade you to connect every element in your XML Schema to a semantic identifier.
> 
> --- 
> 
> Some XML Schemas declare elements and do not associate them with anything. That is, there is no indication of what kind of thing an element is. For example, in the following XML Schema there is no indication of what kind of thing the title element is:
> 
>       <element name="title" type="string" />
> 
> That element declaration states the name of the thing (title), the type of data that the thing can have (string), but it says nothing about what kind of thing it is.
> 
> More ... http://www.xfront.com/What-Kind-Of-Thing-Is-It.pdf 
> 
> Comments welcome.
> 
> /Roger
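The title declaration in the quoted message can be checked mechanically. The Python sketch below uses only the standard-library xml.etree.ElementTree module. It wraps the declaration in a minimal schema (the wrapper and the default-namespace setup are assumptions) and reports, for each top-level element declaration, its name, its type, and whether it has an xs:annotation child. Treating xs:annotation as the place where a "kind of thing" might be recorded is an illustrative assumption. It is not the approach described in the linked PDF, and describe_elements is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The declaration from the quoted message, wrapped in a minimal schema so it
# parses (the wrapper is an assumption made for this illustration).
SCHEMA = f"""<schema xmlns="{XSD_NS}">
  <element name="title" type="string" />
</schema>"""

def describe_elements(schema_text):
    """Hypothetical helper: return (name, type, has_annotation) for each
    top-level element declaration in the schema text."""
    root = ET.fromstring(schema_text)
    results = []
    for decl in root.findall(f"{{{XSD_NS}}}element"):
        # Assumption: an xs:annotation child is where extra information about
        # the kind of thing might live; the declaration alone has none.
        has_annotation = decl.find(f"{{{XSD_NS}}}annotation") is not None
        results.append((decl.get("name"), decl.get("type"), has_annotation))
    return results

for name, typ, annotated in describe_elements(SCHEMA):
    print(name, typ, annotated)
```

Run against the declaration above, it prints `title string False`. The name and the data type are present, but nothing in the declaration says what kind of thing a title is.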
Received on Friday, 4 November 2011 17:11:44 UTC