An analytic framework for (Re: comparing XML and RDF data models) from Bijan Parsia on 2008-07-04 (semantic-web@w3.org from July 2008)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Fri, 4 Jul 2008 09:57:07 +0100
To: Semantic Web <semantic-web@w3.org>
Message-Id: <1738D627-A2EA-4ADE-B369-C506E72A859E@cs.man.ac.uk>

Hmm. I wonder if your framework is salvageable.

Let a data model be a standard model that results from parsing a  
linear syntax according to standard rules (this is generalizable of  
course, but let's keep it simple for the moment). An RDF/XML document  
can be parsed to a DOM/Infoset (one model) and to a set of triples  
(another).
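
To make the "one syntax, two models" point concrete, here is a minimal sketch in Python using only the standard library. The document, the example.org names, and the functions tree_model and triple_model are all made up for illustration, and triple_model only handles a tiny subset of RDF/XML (rdf:Description with rdf:about and literal-valued property children); it is not a real RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML document; the ex: vocabulary is invented.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/doc1">
    <ex:title>Data models</ex:title>
    <ex:author>Bijan</ex:author>
  </rdf:Description>
</rdf:RDF>"""


def tree_model(text):
    """Model 1: the element tree (a stand-in for a DOM/Infoset)."""
    return ET.fromstring(text)


def triple_model(text):
    """Model 2: a set of (subject, predicate, literal) triples.

    Assumption: only rdf:Description elements with rdf:about and
    literal-valued property children are handled.
    """
    root = ET.fromstring(text)
    triples = set()
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local";
            # dropping the braces gives the predicate URI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            triples.add((subject, predicate, prop.text))
    return triples


tree = tree_model(DOC)
triples = triple_model(DOC)
```

The tree has nesting and document order (four elements, with the properties as children of the description); the triple set has neither, just two unordered statements about one subject.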

1) Structurally identical models
	Two data models are structurally identical iff there is a  
"straightforward" (model respecting?) isomorphism between them.  
Intuitively, whatever parts and structures we have, they have to line  
up. Thus attributes map to attributes and triples to triples. Note  
that this can be strict (the names align, in which case we are  
identical, although perhaps we have some slight differences, "01"  
instead of "1") or merely structural (names don't align).

	If there's a merely structural identity between two models, we can  
reuse the same queries to get the same model parts from each as long  
as we have a renaming function or ignore names.
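
A toy sketch of the merely structural case, in Python. The two models, the RENAME table, and all function names are hypothetical; the models are just sets of triples whose shapes line up but whose predicate names differ.

```python
# Two models as sets of (subject, predicate, object) triples.
MODEL_A = {("d1", "title", "Data models"), ("d1", "author", "Bijan")}
MODEL_B = {("d1", "name", "Data models"), ("d1", "creator", "Bijan")}

# Hypothetical renaming function from MODEL_A's names to MODEL_B's.
RENAME = {"title": "name", "author": "creator"}


def query(model, predicate):
    """Return every (subject, object) pair for the given predicate."""
    return {(s, o) for (s, p, o) in model if p == predicate}


def query_renamed(model, predicate, rename):
    """Reuse the same query against a model that uses different names."""
    return query(model, rename.get(predicate, predicate))


def merely_structurally_identical(a, b, rename):
    """True if the renaming is one-to-one and maps a exactly onto b."""
    one_to_one = len(set(rename.values())) == len(rename)
    renamed = {(s, rename.get(p, p), o) for (s, p, o) in a}
    return one_to_one and renamed == b
```

With the renaming in hand, query(MODEL_A, "title") and query_renamed(MODEL_B, "title", RENAME) return the same model parts.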

2) Structurally compatible models
	I think this is easiest to see in extension cases. Take an XML  
document under some schema. Now suppose you *extend* that document in  
accordance with the schema. There's a fairly large class of  
(positive) XPath queries that will return all the old answers plus  
maybe some new ones.
	XPath is actually more robust than that, since I can change  
hierarchy levels pretty easily. That is, I can write queries that are  
fairly insensitive to a class of structural changes. This is harder  
to do in SPARQL (in part because of the lack of a transitive closure  
operator, I think).
	There are a *lot* of variables that can affect structural  
compatibility including features of the data model, features of the  
schema/ontology language, mapping languages, and features of the  
query language. Each variable can compensate (to some degree) for the  
deficiencies of the others.
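
The point about queries that are insensitive to hierarchy changes can be sketched with the limited XPath subset in Python's xml.etree.ElementTree. The two documents and the function names are invented for illustration; the second document pushes the same book two levels deeper.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: same content, different nesting depth.
FLAT = ET.fromstring("<library><book><title>A</title></book></library>")
NESTED = ET.fromstring(
    "<library><shelf><section>"
    "<book><title>A</title></book>"
    "</section></shelf></library>"
)


def titles_fixed_path(root):
    """A query tied to one exact hierarchy: library/book/title."""
    return [t.text for t in root.findall("./book/title")]


def titles_any_depth(root):
    """A query using the descendant step, indifferent to nesting depth."""
    return [t.text for t in root.findall(".//title")]
```

The fixed-path query finds the title only in FLAT, while the descendant query gives the same answer for both documents.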
	
3) Structurally incompatible models
	E.g., using an attribute instead of an element. Such shifts require a transform  
mapping or a change to the query. In other words, the query can't  
give the same answers through this shift in the model.
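
A small Python sketch of the attribute-vs-element case, showing both ways out: changing the query, or applying a transform. The documents and function names are hypothetical, and promote_attributes is just one possible transform (it turns every attribute into a child element).

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical documents: the same fact, two representational choices.
AS_ELEMENT = ET.fromstring("<library><book><title>A</title></book></library>")
AS_ATTRIBUTE = ET.fromstring('<library><book title="A"/></library>')


def titles(root):
    """The original query: titles stored as child elements of book."""
    return [t.text for t in root.findall("./book/title")]


def titles_changed_query(root):
    """Option 1: change the query to read the attribute instead."""
    return [b.get("title") for b in root.findall("./book[@title]")]


def promote_attributes(root):
    """Option 2: transform a copy so each attribute becomes a child element."""
    root = copy.deepcopy(root)
    for elem in list(root.iter()):  # snapshot, so new children are not revisited
        for name, value in list(elem.attrib.items()):
            child = ET.SubElement(elem, name)
            child.text = value
            del elem.attrib[name]
    return root
```

Run unchanged, titles() comes back empty on AS_ATTRIBUTE; it only gives the old answer after the transform, or after being swapped for titles_changed_query().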

Now, both XML and RDF have some pushes toward structurally  
incompatible models. XML has some representational choice (e.g.,  
attributes vs. elements). RDF has some (e.g., containers vs.  
collections vs. properties vs. home-grown stuff). XML has syntactic  
context, RDF doesn't (without data URIs, literals, or reification).

XML tends to solve structural incompatibility via *normalization*,  
i.e., a transformation. RDF/OWL tends to solve it by *inference*/ 
augmentation. Relational databases with or without ontologies (e.g.,  
for distributed query) tend to solve it by *mapping* (i.e., Global as  
View, etc.).

(ETL, in all cases, works by normalization, obviously.)
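
To contrast normalization with inference/augmentation, here is a toy Python sketch over sets of triples. The data, the single SUB_PROPERTY_OF statement, and the function names are all made up; augment() applies one subproperty step only and is nowhere near a real RDFS/OWL reasoner.

```python
# Hypothetical data using two different names for the same relationship.
DATA = {("d1", "creator", "Bijan"), ("d2", "author", "Alice")}

# Hypothetical schema statement: every "creator" is also an "author".
SUB_PROPERTY_OF = {"creator": "author"}


def authors(model):
    """The query we want to keep working: all author statements."""
    return {(s, o) for (s, p, o) in model if p == "author"}


def normalize(model):
    """Normalization: rewrite the data, replacing creator with author."""
    return {(s, SUB_PROPERTY_OF.get(p, p), o) for (s, p, o) in model}


def augment(model):
    """Inference: keep every original triple and add the entailed ones."""
    inferred = {
        (s, SUB_PROPERTY_OF[p], o) for (s, p, o) in model if p in SUB_PROPERTY_OF
    }
    return model | inferred
```

Both routes make the authors() query return both documents, but normalization discards the original "creator" triple while augmentation keeps it alongside the inferred one.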

It would be interesting to develop a set of both toy and realistic  
examples illustrating the various issues and techniques for handling  
them. That could serve as a reasonable basis, perhaps, for discussing  
technology choices.

Cheers,
Bijan.

Received on Friday, 4 July 2008 08:57:58 UTC