Re: xml without rdf, but with an ontology [0] from Henry Story on 2005-01-14 (www-rdf-interest@w3.org from January 2005)

From: Henry Story <henry.story@bblfish.net>
Date: Fri, 14 Jan 2005 16:40:56 +0100
To: atom-owl@googlegroups.com
Cc: www-rdf-interest@w3.org, jsled@asynchronous.org, bloged <users@bloged.dev.java.net>
Message-Id: <AE55CAE0-6642-11D9-9C7C-000A95D9FA7A@bblfish.net>
On 14 Jan 2005, at 15:27, Ian Davis wrote:
> On 14/01/2005 12:51, Henry Story wrote:
>> [snip]
>> I don't think any heuristics are needed. Just an ontology for the
>> tags and attributes plus a systematic way to map xml into graph space
>> which I have described in this thread.
>
> The point I was attempting to get across in my original email was that  
> I could think of no heuristic that would give a consistent mapping  
> using only the instance document. The mappings that Josh came up with  
> aren't consistent: I would need to use a different query against the  
> triple store for each form to extract the content (for display,  
> perhaps).

I agree. But my point is that you won't have just the instance document.
You have an instance document + an ontology. An ontology should be
available at the URI of the element or attribute in question.

So for example you could have the following extract

<feed xmlns:foaf="http://xmlns.com/foaf/0.1/"
       xmlns="http://atom.org/">
    ...
    <entry>
       <author>
          <email>henry.story@bblfish.net</email>
          <foaf:homepage>http://bblfish.net</foaf:homepage>
          <foaf:aimChatID>aim:unbabelfish</foaf:aimChatID>
       </author>
     ...
     </entry>
</feed>

so we can find the definition of foaf:homepage at

http://xmlns.com/foaf/0.1/homepage

You can verify this by sending the HTTP request by hand with
telnet:

-----------------8<----------------------------------------
hjs@bblfish:0$ telnet  xmlns.com 80
Trying 82.32.5.17...
Connected to 82-32-5-17.cable.ubr01.azte.blueyonder.co.uk.
Escape character is '^]'.
GET http://xmlns.com/foaf/0.1/ HTTP/1.0
Accept: application/rdf+xml

-----------------8<----------------------------------------
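
The same request can be sketched in Python with the standard library.
This is only a minimal sketch: the function names are made up for
illustration, and what the server returns for this Accept header today
is not guaranteed.

```python
import urllib.request

FOAF_NS = "http://xmlns.com/foaf/0.1/"


def build_ontology_request(namespace_uri: str) -> urllib.request.Request:
    """Build a GET request that asks for the RDF/XML form of a vocabulary."""
    return urllib.request.Request(
        namespace_uri,
        headers={"Accept": "application/rdf+xml"},
    )


def fetch_ontology(namespace_uri: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (needs network)."""
    request = build_ontology_request(namespace_uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage (performs a real HTTP request, so it is left commented out):
# print(fetch_ontology(FOAF_NS)[:200])
```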

As a result you will get an OWL file with the definition of homepage
in machine-readable form.

----------------8<-------------------------------------------------
   <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/homepage"
                 vs:term_status="stable" rdfs:label="homepage"
                 rdfs:comment="A homepage for some thing.">
     <rdfs:subPropertyOf rdf:resource="http://xmlns.com/foaf/0.1/page"/>
     <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#InverseFunctionalProperty"/>
     <!--  previously: rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Agent" -->
     <rdfs:domain rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
     <rdfs:range rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
     <rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/"/>
   </rdf:Property>
----------------8<-------------------------------------------------

(( ok they screwed up a little here, because they did not use the same
URL for fetching the human-readable web page definition and for fetching
the RDF definition -- but forget about that ))

Here we see that the domain of foaf:homepage is pretty much anything,
and that the range is a foaf:Document. (I suppose they are thinking that
toothbrushes may have their home page one day too.)

Since the Person construct is a thing, it falls within that domain too,
and so the homepage relation can apply to it.
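
To make the "machine readable" part concrete, here is a small sketch
that pulls the domain and range out of a trimmed copy of the definition
above. The function name is hypothetical, and the code only handles
this particular shape of RDF/XML, not the full syntax; a real RDF
parser would be the right tool for anything more.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Trimmed copy of the foaf:homepage definition, wrapped in rdf:RDF so
# that the namespace prefixes are declared.
SNIPPET = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/homepage">
    <rdfs:domain rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
    <rdfs:range rdf:resource="http://xmlns.com/foaf/0.1/Document"/>
  </rdf:Property>
</rdf:RDF>
"""


def domain_and_range(rdf_xml, property_uri):
    """Return (domain, range) URIs for one rdf:Property, or (None, None)."""
    root = ET.fromstring(rdf_xml)
    for prop in root.iter(f"{{{RDF}}}Property"):
        if prop.get(f"{{{RDF}}}about") != property_uri:
            continue

        def target(tag):
            # Read the rdf:resource attribute of the rdfs:<tag> child.
            element = prop.find(f"{{{RDFS}}}{tag}")
            return None if element is None else element.get(f"{{{RDF}}}resource")

        return target("domain"), target("range")
    return None, None


print(domain_and_range(SNIPPET, "http://xmlns.com/foaf/0.1/homepage"))
```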


Here is the definition of aimChatID:

----------------8<-------------------------------------------------
  <rdf:Property rdf:about="http://xmlns.com/foaf/0.1/aimChatID"
                vs:term_status="testing" rdfs:label="AIM chat ID"
                rdfs:comment="An AIM chat ID">
     <rdfs:isDefinedBy rdf:resource="http://xmlns.com/foaf/0.1/"/>
     <rdfs:subPropertyOf rdf:resource="http://xmlns.com/foaf/0.1/nick"/>
     <rdfs:domain rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
     <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
     <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#InverseFunctionalProperty"/>
  </rdf:Property>
----------------8<-------------------------------------------------

This has a domain of foaf:Agent. Presumably foaf:Agent will somewhere be
defined to be closely related to atom:Person. In any case we know that,
if the xml is to make sense, the Person object also has to be a
foaf:Agent object.
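
That last step is the ordinary RDFS domain rule: if a property has
domain C, then whatever appears as the subject of that property is a C.
A minimal sketch, with triples as plain tuples and "_:author" as a
made-up blank node label for the author element:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
FOAF = "http://xmlns.com/foaf/0.1/"


def infer_types_from_domain(triples):
    """Apply the rdfs:domain rule once and return the inferred type triples."""
    # Collect (property, class) pairs declared with rdfs:domain.
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    inferred = set()
    for s, p, _ in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred


triples = {
    # From the ontology: aimChatID has domain foaf:Agent.
    (FOAF + "aimChatID", RDFS_DOMAIN, FOAF + "Agent"),
    # From the instance document: the author has an aimChatID.
    ("_:author", FOAF + "aimChatID", "aim:unbabelfish"),
}

print(infer_types_from_domain(triples))
```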



>
> With some kind of schema annotation one could declare that the content  
> of <extension> is always an XMLLiteral.

In my proposal the ontology plays the role of schema annotation. It is
an OO and declarative type of schema annotation, I think (I can't tell
for sure because I don't yet know schemas inside out).

> For the format that provoked this discussion (Atom) it's expected that  
> authors will extend the format in new and exciting ways. Giving the  
> parser knowledge of all possible extensions up front in the form of a  
> schema for each is rather defeating the point of the mapping.

EXACTLY.
What is needed to make sense of arbitrary extensions is their OWL files
+ the minimal procedure for mapping xml to a graph that I sketched in
this thread.
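
The procedure itself is in the earlier messages, not here, so the
following is only one assumed reading of it and not a definitive
version: an element that contains other elements becomes a new node,
and each child element becomes a property whose URI is the namespace
plus the local name. The function names and blank node labels are made
up, and the FOAF namespace is written as it appears in the rdf:about
values quoted above.

```python
import itertools
import xml.etree.ElementTree as ET

DOC = """\
<feed xmlns="http://atom.org/" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <entry>
    <author>
      <email>henry.story@bblfish.net</email>
      <foaf:homepage>http://bblfish.net</foaf:homepage>
    </author>
  </entry>
</feed>
"""


def property_uri(tag):
    """Turn ElementTree's '{namespace}local' spelling into one URI string."""
    if not tag.startswith("{"):
        return tag  # element without a namespace: leave the name as-is
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def to_triples(xml_text):
    """Map an XML document to (subject, predicate, object) tuples."""
    counter = itertools.count()
    triples = []

    def visit(element, subject):
        for child in element:
            predicate = property_uri(child.tag)
            if len(child):  # has child elements: treat it as a new node
                node = f"_:b{next(counter)}"
                triples.append((subject, predicate, node))
                visit(child, node)
            else:  # leaf element: take its text as the object
                triples.append((subject, predicate, (child.text or "").strip()))

    visit(ET.fromstring(xml_text), "_:feed")
    return triples


for triple in to_triples(DOC):
    print(triple)
```

Note that this sketch treats every leaf as a plain string. Deciding
that the object of foaf:homepage should be a resource rather than a
literal is exactly the kind of thing the ontology (its rdfs:range) is
there to tell the mapper.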


> So, there need to be some sort of structural rules that define what  
> types of triples should be produced. These rules should be consistent  
> and work from the instance document. For the specific case of Atom the  
> following suggested equivalences between markup and NTriples might
> serve:
>

Before I go into these examples, can you tell me how my proposal does
NOT deal with them?



> 1. Element is empty.
>
> <item>
>   <ex:extension/>
> [snip]


> Is this worth writing up as a Pace for Atom?

I think we may be onto a much more general thing here,
applicable to all xml. I think what atom certainly needs
is a good accompanying Ontology. I have produced one for
the previous format. I can put it up on the wiki.

Henry Story

> Ian
>
Received on Friday, 14 January 2005 15:41:00 UTC