RE: comparing XML and RDF data models from tim.glover@bt.com on 2008-07-02 (semantic-web@w3.org from July 2008)

From: <tim.glover@bt.com>
Date: Wed, 2 Jul 2008 14:42:52 +0100
To: <bparsia@cs.man.ac.uk>, <semantic-web@w3.org>
Message-ID: <AEF15555D64C494CA393778177A3A1710457E3E6@E03MVC1-UKBR.domain1.systemhost.net>
OK, my final tuppence. Apologies to those for whom this is trivial and
trite. 

1.
	<Person/> 
or 
	<Person></Person>

These differ in syntax but are equivalent in XML (same parse tree) and
have the same RDF representation.
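
A quick way to check the "same parse tree" claim is sketched below, using
Python's standard xml.etree.ElementTree and assuming the well-formed
spelling <Person></Person> for the second form. The helper name shape is
made up for this illustration.

```python
import xml.etree.ElementTree as ET

def shape(elem):
    """Reduce an element to a comparable (tag, attributes, text, children) tuple."""
    text = (elem.text or "").strip()
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        text,
        tuple(shape(child) for child in elem),
    )

# The two spellings from case 1.
empty_tag = ET.fromstring("<Person/>")
open_close = ET.fromstring("<Person></Person>")

# Both reduce to the same structure once parsed.
same_tree = shape(empty_tag) == shape(open_close)
```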


2.
	<Person name="John"/>
or
	<Person personName="John"/> 

These are distinguishable in XML and have different RDF representations,
but differ only in the label used for the name, and if the label names
are both free in the context, they refer to the same abstract model.
This is the level at which I think trivial *views* are useful.


3.
	<Person name="John"/>
or 
	<Person><name>John</name></Person>

These have different meanings in XML (different parse trees), but
intuitively refer to the same abstract model and have the same RDF
representation.  This is the level at which I think XML is inferior to
RDF, because of multiple representations of the same thing (yes I know
the second syntax allows multiple name elements, but that's not my point
right now). 
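
To make case 3 concrete, here is a rough sketch in Python (standard
library only). The function to_triples and the blank-node label "_:p" are
assumptions for illustration, not a standard XML-to-RDF mapping: it simply
treats an attribute and a text-only child element as the same property.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<Person name="John"/>')
as_child = ET.fromstring("<Person><name>John</name></Person>")

def to_triples(person, subject="_:p"):
    """Hypothetical mapping: attributes and text-only children both become properties."""
    triples = {(subject, "rdf:type", person.tag)}
    for key, value in person.attrib.items():
        triples.add((subject, key, value))
    for child in person:
        # Only leaf children with text are treated as simple properties.
        if len(child) == 0 and child.text:
            triples.add((subject, child.tag, child.text.strip()))
    return triples

# The XML-level lookups differ between the two forms...
name_from_attribute = as_attribute.get("name")
name_from_child = as_child.findtext("name")

# ...but under this mapping both documents yield the same set of triples.
same_triples = to_triples(as_attribute) == to_triples(as_child)
```

The point of the sketch is that the XML consumer needs two different
lookups, while the triple-level consumer needs one.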


4.
	<Person><name>John</name></Person>  
or 
	<Person><name value="John"/></Person>

These intuitively refer to different abstract models, 

	{Person name John}
or
	{Person name X . X value John} 

However, there is an obvious mapping between the models. You could write
a view that hid the difference, but this would involve reasoning, not
just label renaming. I think this is the kind of thing OWL should be
able to do. 
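
The "obvious mapping" in case 4 can be sketched as a small rewrite over
triples. This is plain Python, not an OWL formulation, and the function
name collapse_value_nodes is made up for the example; it only shows that
the step is a structural rewrite rather than a label rename.

```python
def collapse_value_nodes(triples):
    """Rewrite {s p X . X value v} into {s p v}, leaving other triples alone."""
    # Map each intermediate node to the literal hanging off its "value" property.
    values = {s: o for (s, p, o) in triples if p == "value"}
    result = set()
    for s, p, o in triples:
        if p == "value":
            continue  # the intermediate triple disappears
        result.add((s, p, values.get(o, o)))
    return result

direct = {("Person", "name", "John")}
indirect = {("Person", "name", "X"), ("X", "value", "John")}

collapsed = collapse_value_nodes(indirect)
```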



5.
	<Person name="John Doe"/> 
or 
	<Person firstName="John" lastName="Doe"/>

These are different at the syntactic and semantic level in any
representation. There is a simple intuitive mapping between them, but it
is beyond the scope of OWL or views, because it involves using functions
on the data values. 
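
For case 5, the mapping has to compute on the literal values themselves.
A minimal sketch in Python follows; the function names and the dictionary
keys are illustrative, and it assumes the naive rule that the first space
separates first name from last name, which real names often break.

```python
def split_name(full_name):
    """Split 'John Doe' into parts at the first space (a naive assumption)."""
    first, _, last = full_name.partition(" ")
    return {"firstName": first, "lastName": last}

def join_name(first, last):
    """Inverse direction: build the single-attribute form from the two parts."""
    return f"{first} {last}"

parts = split_name("John Doe")
rejoined = join_name(parts["firstName"], parts["lastName"])
```

Neither direction is a rename or a structural rewrite; both need string
functions, which is the distinction being drawn.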


Conclusion: Thanks for listening, it helped clarify things for me
anyway. 

Tim. 


 

-----Original Message-----
From: Bijan Parsia [mailto:bparsia@cs.man.ac.uk] 
Sent: 02 July 2008 13:31
To: Glover,T,Tim,CXR3 R; Semantic Web
Subject: Re: comparing XML and RDF data models

(back to the list because I think the discussion is valuable;

it would be a *very* good idea to get our concrete stories straight; to
share them; and to criticize them *in house*; indeed, I think it's very
important for us to "police our own" in the wider world; don't let
people make wooly statements; don't get upset when a fellow traveller
critiques you; sure, be sensible and try a private message to work out a
good public strategy if you feel up to it)

On 2 Jul 2008, at 12:15, <tim.glover@bt.com> wrote:

> Bijan, thanks for your reply. I am replying off-thread to reduce 
> traffic, but feel free to post to the list if you wish.
>
>
>> John hasType Person.
>> John name "John"
>
> (Yes, I was sloppy with my RDF, but I think my point stands)

That wasn't the point. "hasType" isn't rdf:type. It's easy to think of
dozens of different ways to represent this in rdf. They may be hacky,
but again, you've picked a sweet spot.

Consider representing that john has name "John" at time t1. All of the
XML examples you give handle that more gracefully than RDF.


>> I see type columns in RDBMSs all the time.
>> Now, you might want to say that this isn't an "obvious"
>> representation, and I'd agree. But we need to be very careful about
>> cherry picking examples that work well for RDF and not so well for
>> XML without considering counterexamples. (Think about representing
>> ordered collections :))
>
> Yes, but you have now changed the *data model*.

I don't think we mean by data model the same thing. Without a clear
definition we'll talk past each other.

> My point (which I think
> you accept from your previous reply to the thread) is that XML has 
> different representations for the *same model*.

So does RDF. XML may have more and less guidance (as I pointed out in my
goodness post), but picking a single example won't show this.

> Different models may be
> appropriate for different purposes. Perhaps it is useful sometimes to 
> have an explicit type - but this is a *model* change.

john isAPersonWithName "john".

I don't believe this changes the model even in your lights.

john isa _:x typeOfNamedThing Person;
	_:x withName "john".

Etc.

This is sticking with your example. If we go to places where XML is more
natural, things will look worse. For example, my xpath queries for a
name will remain unchanged when I go from:

<person>
	<name>john</name>
</person>

to

<person>
	<name>john</name>
	<atTime>t</atTime>
</person>

So for the use case where we want to *extend* models (i.e., change
them!) for some classes of model and query, XML does much much better
than RDF (as a first approximation).
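
A small sketch of the point about the query staying put, using the limited
XPath support in Python's xml.etree.ElementTree. The documents below use
matching <name>...</name> tags, and the constant NAME_PATH is just a label
for this example.

```python
import xml.etree.ElementTree as ET

before = ET.fromstring("<person><name>john</name></person>")
after = ET.fromstring(
    "<person><name>john</name><atTime>t</atTime></person>"
)

# The same path expression is used against both versions of the document.
NAME_PATH = "./name"

name_before = before.findtext(NAME_PATH)
name_after = after.findtext(NAME_PATH)
```

Adding the atTime element extends the document without touching the path
that existing consumers rely on.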

So, RDF can have more than one representation for the same model even in
your simple case. And for some cases when you update your model, RDF
forces a more radical change.

My experience with OWL RDF syntax really backs this up. Adding
annotations to OWL axioms is trivial in the XML, really really really
hard, perhaps practically insoluble in the RDF. Certainly involves a
lot of work, just look at this thread:
	http://www.w3.org/mid/484FF27E.8010007@oracle.com

This (and *way more*) is all spouted off over whether to include a triple
with a reified triple when it's not semantically wrong to do so (i.e.,
doesn't work for negated class assertions). Brutal! And so weirdly
trivial.

We couldn't have data/object punning because we had to radically change
our model (to incorporate new vocabulary) because there's no syntactic
context for occurrences of URIs.


>>> With more complicated data, the possible XML representations vary in
>>> different ways, and increase exponentially w.r.t. the number of
>>> atoms of information.
>
>> Do they really increase *exponentially*? How do you identify an atom
>> of information?
>
> By an atom of information I mean a triple, which is the lowest common 
> denominator for these systems.

I'd need an argument for that.

> Yes, they increase exponentially. There are two ways (at least) of 
> representing one triple in XML.  With two triples, each may be 
> represented in 2 ways (that's 4 ways). With 3 triples there are 8 
> ways. Etc. That's just with this one simple change in representation.
> In practice the exponent is bigger than 2.

I need to think about it. But then RDF is no better off as soon as you
reach variance. I think triples is a biasing starting point as well.
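
The counting argument in the quoted text (2, 4, 8, ...) can be reproduced
with a short enumeration. This is only a sketch of that arithmetic in
Python; the function name encodings and the two choice labels are
assumptions, and it takes for granted that each triple is encoded
independently of the others.

```python
from itertools import product

def encodings(n_triples, choices=("attribute", "child element")):
    """List every way to pick one encoding per triple, independently."""
    return list(product(choices, repeat=n_triples))

# One entry per number of triples: len(choices) ** n_triples combinations.
counts = [len(encodings(n)) for n in (1, 2, 3)]
```

With k encoding choices per triple the count is k ** n, so the 2 in the
quoted figures is the base of the growth and n is the exponent.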

> (OK, in practice it would be VERY eccentric to use different 
> representations for different peoples names :)

I think affordances are key.

> And anyway, these myriad
> representations could be captured by a single query. But my main point
> stands. For real, complicated data, there are many representations of 
> the *same model*, which require different queries.

Yeah, but I think we've amply shown that you can get radically
different representations of the same model in RDF. In practice, you
don't have to get too many different representations for it to be a
problem.

Plus, if you normalize/map, things are simpler.

>>>   To extract the data from the XML we have to know the detailed 
>>> representation chosen. Saying we can UNION different queries misses 
>>> the point - we still have to write 3 queries. Saying we can use 
>>> transformations misses the point - we still have to write 
>>> transformations.
>
>> Even if this is true for this example, I've given several (and Paul's
>> given an in principle) where RDF has similar problems. It seems that
>> at best XML would be polynomially better (which can be significant,
>> obviously). In the SVG argument, I pointed out that if you are in a
>> sweet spot for something, then that something often (but not always)
>> wins.
>
> RDF has similar problems if you *change the model*, e.g. use "Name"
> instead of "name" for the name property.

Why is this a change in the model, whereas using an attribute is not?
Most of the time, "data model" refers to the actual structure of the
data, not to the conceptual model it represents.

> The problems with XML are in addition
> to these model changes.

Perhaps XML trades a bit of representation confusion between arbitrary
representations for better evolvability.

>>> The issue here is that XML fails to abstract the data from the 
>>> representation as effectively as RDF and RDBMS. In this sense, RDF
>>> and RDBMS are better data representations than XML.
>
>> So, even if I accept this example, we need more to make the
>> generalization work. In principle, we need to make sure we've not
>> cherry picked.
>
>> (But, big kudos for making a sensible, rational attempt.)
>
> Thanks :)

Thank *you*.

>> Cheers,
>> Bijan.
>
> Thanks for the discussion

Cheers,
Bijan.
Received on Wednesday, 2 July 2008 13:43:53 UTC