RE: yet another strawman, was Re: generic XML to RDF triple mapping from Dan Brickley on 2000-09-09 (www-rdf-interest@w3.org from September 2000)

From: Dan Brickley <danbri@w3.org>
Date: Sat, 9 Sep 2000 10:52:32 -0400 (EDT)
To: James Tauber <JTauber@bowstreet.com>
cc: "'Jonathan Borden'" <jborden@mediaone.net>, www-rdf-interest@w3.org
Message-ID: <Pine.LNX.4.21.0009091008300.11062-100000@tux.w3.org>
(sorry, this is a bit long.)

Summary: infoset vs application dataset distinction is important. Some
excerpts from Cambridge Communique to this effect. Speculation about using
Schematron for the application dataset mapping problem.

On Fri, 8 Sep 2000, James Tauber wrote:

> > Suppose we have arbitrary XML
> > 
> > <person>
> >     <name type='full'>
> >             <first>John</first>
> >             <last>Doe</last>
> >     </name>
> >     <name type='nickname'>Johnny Dee</name>
> > </person>
> > 
> > is <name> a property of the person, or is name an instance of 
> > a class which has properties <first> and <last>?
> 
> That's up to the designer of the XML schema. It could be one, the other or
> both. What I am suggesting is the designer of the XML schema is the one that
> specifies how an instance maps to RDF triples.

I think that's right, if we want the triples to represent some
meaningful entity/relationship style model rather than simply be an
edge-labelled graph version of the DOM. I think Ora raised a similar point
a week or two back; we should be wary of mechanically shovelling any/all
XML into RDF and expecting something meaningful at the end.

One slippery point here is that quite a few people have (very
productively) been looking at the latter scenario as well. Some of the
early XML Query proposals projected arbitrary XML into an RDF-like graph
model, *without taking into account the intentions behind the XML
vocabularies being used*. While I can see value in both approaches, it's
important to distinguish between them. 

The Cambridge Communique gives us some conceptual machinery that might
help here. 

	Excerpts from...
	http://www.w3.org/TR/1999/NOTE-schema-arch-19991007
	The Cambridge Communique
	W3C NOTE 7 October 1999 

	1. The XML data model is the XML Information Set being specified by
	the XML Information Set Working Group. Other data models exist, both
	generic and application-specific. RDF is an example of one such
	generic data model. [...]

	2. An XML Schema schema document will be able to hold declarations
	for validating instance documents. It should also be able to hold
	declarations for mapping from instance document XML infosets to
	application-oriented data structures. [...]

	4. The extension mechanism should be appropriate for use to
	incorporate declarations ("mapping declarations") to aid the
	construction of application-oriented data structures (e.g. ones
	implementing the RDF model) as part of the schema-validation and XML
	infoset construction process. This facility should not be exclusive
	to RDF, but should also be useable to guide the construction of data
	structures conforming to other data models, e.g. UML.

	5. Such mapping declarations should ideally also be useable by other
	schema processors to map in the other direction, i.e. from
	application-oriented data structures to XML infosets.
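
As a rough illustration of points 2 and 5 above, here is a minimal Python
sketch of a single set of mapping declarations driving both directions. It
is not taken from the Communique or from any schema language: the MAPPING
table, the ex: property names, the "_:person" subject and both function
names are hypothetical, chosen only to show the idea on James's example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping declarations: element path under <person> -> property.
MAPPING = {
    "name/first": "ex:firstName",
    "name/last": "ex:lastName",
}

def to_triples(root, subject="_:person"):
    """XML infoset -> application-level (subject, property, value) triples."""
    return [(subject, prop, root.findtext(path))
            for path, prop in MAPPING.items()
            if root.find(path) is not None]

def to_xml(triples):
    """Triples -> XML, driven by the same declarations read in reverse."""
    inverse = {prop: path for path, prop in MAPPING.items()}
    root = ET.Element("person")
    for _, prop, value in triples:
        parent = root
        steps = inverse[prop].split("/")
        for step in steps[:-1]:
            # Reuse an intermediate element if it already exists.
            found = parent.find(step)
            parent = found if found is not None else ET.SubElement(parent, step)
        ET.SubElement(parent, steps[-1]).text = value
    return root

doc = ET.fromstring(
    "<person><name><first>John</first><last>Doe</last></name></person>")
triples = to_triples(doc)
rebuilt = to_xml(triples)
```

The round trip only works here because the declarations are a plain
one-to-one table; anything lossy in the forward direction (dropped
attributes, merged elements) could not be recovered this way.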



By now it has become pretty clear that *both* the XML infoset data
structures (elements + attributes stuff) *and* application-oriented data
structures (e.g. entity-relationship models, UML, RDF models) can
be represented in edge-labelled graphs.

The thing that we need to be most careful about is talk of turning
'any arbitrary XML into RDF', as if there were a sole, simple answer to
this challenge. ('Colloquial XML' is one phrase I've heard used btw).
I can think of lots of RDF-ifications of any chunk of 'colloquial' XML. In
particular, two broad categories: one where we reflect infoset
constructs directly into RDF, another where we reflect the
XML-encoded "application data structures" into RDF without preserving
details of that encoding. The latter seems to me to be one holy grail
of web-data aggregation: we might have two differently serialized
chunks of application data that were talking about the same stuff, and use
Cambridge Communique-style mapping techniques to form a common
representation. The alternative approach, infoset-over-RDF, has its uses
too, so long as we don't make the mistake of assuming that nodes and arcs
are an end in themselves...
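
To make the two broad categories concrete, here is a minimal Python sketch
that produces both kinds of triples from the <person> example quoted
earlier. Everything named here is an assumption for illustration: the
infoset: and ex: property names, the blank-node labels, and the choice to
treat the name parts as properties of the person (which is just one of the
readings a schema designer might pick).

```python
import xml.etree.ElementTree as ET
from itertools import count

XML = """<person>
    <name type='full'>
        <first>John</first>
        <last>Doe</last>
    </name>
    <name type='nickname'>Johnny Dee</name>
</person>"""

def infoset_triples(root):
    """Reflect elements, attributes and text directly as triples."""
    ids = count(1)
    triples = []

    def walk(elem, parent):
        node = f"_:n{next(ids)}"
        triples.append((node, "infoset:elementName", elem.tag))
        if parent is not None:
            triples.append((parent, "infoset:child", node))
        for key, value in elem.attrib.items():
            triples.append((node, f"infoset:attr/{key}", value))
        text = (elem.text or "").strip()
        if text:
            triples.append((node, "infoset:text", text))
        for child in elem:
            walk(child, node)

    walk(root, None)
    return triples

def application_triples(root):
    """One designer-chosen reading: names are properties of the person."""
    person = "_:person"
    triples = [(person, "rdf:type", "ex:Person")]
    for name in root.findall("name"):
        if name.get("type") == "full":
            triples.append((person, "ex:firstName", name.findtext("first")))
            triples.append((person, "ex:lastName", name.findtext("last")))
        elif name.get("type") == "nickname":
            triples.append((person, "ex:nickname", (name.text or "").strip()))
    return triples

root = ET.fromstring(XML)
infoset = infoset_triples(root)
app = application_triples(root)
```

The first function needs no knowledge of the vocabulary and keeps every
detail of the encoding; the second throws the encoding away and only works
because someone who understands the vocabulary wrote it.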


So, I look forward to seeing how Redfoot shapes up. I'm wondering if
Schematron might be an interesting model to follow, at least in its
broad approach to using XSLT. See 
 http://www.ascc.net/xml/resource/schematron/schematron.html and
the paper at http://www.ascc.net/xml/resource/schematron/Schematron2000.html
(which the former page mentions as in need of corrections, but
is still a good read). In particular, Schematron-RDF is intriguing. This
"creates RDF statements for each detected pattern in a schema"...
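
The general shape of "a statement per detected pattern" can be sketched in
a few lines of Python. This is not Schematron and does not use XSLT; it
swaps in ElementTree's limited path syntax, and the PATTERNS table, the ex:
names and the "_:doc" subject are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern rules: (pattern name, ElementTree path to look for).
PATTERNS = [
    ("ex:HasFullName", ".//name[@type='full']"),
    ("ex:HasNickname", ".//name[@type='nickname']"),
    ("ex:HasEmail", ".//email"),
]

def detected_statements(root, subject="_:doc"):
    """Emit one (subject, property, pattern) triple per pattern that matches."""
    return [(subject, "ex:matchesPattern", name)
            for name, path in PATTERNS
            if root.find(path) is not None]

doc = ET.fromstring(
    "<person>"
    "<name type='full'><first>John</first><last>Doe</last></name>"
    "<name type='nickname'>Johnny Dee</name>"
    "</person>")
statements = detected_statements(doc)
```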

Dan



ps. another reference to a SOAP/RDF thread from xml-dev some time back; 
http://lists.w3.org/Archives/Public/www-rdf-interest/2000May/0114.html
quoting a helpful clarification from Andrew Layman that's (temporarily I
hope) 404-ing at http://xml.org/archives/xml-dev/2000/05/0335.html
Received on Saturday, 9 September 2000 10:52:34 UTC