Information Resources, language and media-type, an ontology design question from Reto Bachmann-Gmür on 2006-02-10 (semantic-web@w3.org from February 2006)

From: Reto Bachmann-Gmür <reto@gmuer.ch>
Date: Fri, 10 Feb 2006 17:37:13 +0100
To: semantic-web@w3.org
Message-ID: <43ECC139.7060101@gmuer.ch>

Hello

I'd like to hear opinions on how to design an ontology to describe 
something like:

"The thing that results from interpreting the text as English that 
results from interpreting the byte-sequence with hash 87987987 as a 
utf-8 character sequence."

A variant with Interpretation Properties [1] could look like this:

[
    :englishVersion [
        :utf8encoded [
            :hash urn:hash:87987987
        ]
    ]
]

Another variant, without interpretation properties, could look like this:

[
    :languageVersion [
        :language "en";
        :encodingVersion [
            :encoding "utf-8";
            :bytes [
                :hash urn:hash:87987987
            ]
        ]
    ]
]
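
To make the layering in both variants concrete, here is a rough Python
sketch of the same chain: bytes identified by a hash, a character
sequence obtained by decoding them, and a language reading on top. The
class names (ByteSequence, CharSequence, LanguageReading) are made up
for illustration, and SHA-256 is only an assumption, since the example
hash above does not name a hash function.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteSequence:
    """The bottom layer: raw bytes, identified by a hash."""
    data: bytes

    def hash_urn(self) -> str:
        # Assumption: SHA-256. The urn:hash: form is copied from the
        # example above, which does not say which hash function is meant.
        return "urn:hash:" + hashlib.sha256(self.data).hexdigest()


@dataclass(frozen=True)
class CharSequence:
    """The bytes interpreted under a character encoding."""
    source: ByteSequence
    encoding: str

    def text(self) -> str:
        return self.source.data.decode(self.encoding)


@dataclass(frozen=True)
class LanguageReading:
    """The character sequence interpreted as text in a given language."""
    source: CharSequence
    language: str


raw = ByteSequence("pain".encode("utf-8"))
chars = CharSequence(raw, "utf-8")

# The same character sequence supports two different language readings.
as_english = LanguageReading(chars, "en")
as_french = LanguageReading(chars, "fr")
```

Each layer only points at the one below it, so the two readings share
one character sequence and one byte sequence; the language is a property
of the reading, not of the bytes.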

Also, I would welcome suggestions on how best to specify that the result 
of the utf-8 decoding should be interpreted as XHTML. Should this be done 
with an additional level of indirection, or rather be folded into a single 
encoding/media-type property? One relevant difference, it seems to me, is 
that plain text has no inherent language while XHTML may have one: the 
text "pain" is not French (though it may successfully be interpreted as 
such), but for the XHTML snippet <span xml:lang="fr">pain</span> the 
statement "dc:language 'fr'" seems to be inferable. Opinions may diverge 
on whether [:vorbisAudioEncoded [:hash urn:hash:987098]] is French or 
just happens to be successfully interpretable as French: if the audio is 
long, perfectly understandable and makes sense in French, it is highly 
unlikely that it is or could be another language, whereas for a short and 
unclear utterance the uncertainty can be higher.
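
The plain-text versus XHTML difference can be illustrated with a small
Python sketch using only the standard library. The function name
declared_language is hypothetical; it only reports a language that the
markup itself declares on its root element and makes no attempt to guess
the language of undeclared text.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree exposes the predefined "xml:" prefix under this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def declared_language(data: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Return the xml:lang declared on the root element, if any.

    Returns None when the decoded text is not well-formed XML or when
    the root element carries no xml:lang attribute.
    """
    text = data.decode(encoding)
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Plain text such as "pain" declares nothing about its language.
        return None
    return root.get(XML_LANG)


plain = declared_language(b"pain")
marked_up = declared_language(b'<span xml:lang="fr">pain</span>')
```

Here plain is None while marked_up is "fr", which matches the intuition
that the language statement is inferable only in the XHTML case.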

reto



[1] http://esw.w3.org/topic/InterpretationProperties

Received on Friday, 10 February 2006 16:37:24 UTC