Re: comparing XML and RDF data models from Bijan Parsia on 2008-07-02 (semantic-web@w3.org from July 2008)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Wed, 2 Jul 2008 13:30:34 +0100
To: tim.glover@bt.com, Semantic Web <semantic-web@w3.org>
Message-Id: <2DE1A161-28F8-4B25-A0E4-1EBC38B6A348@cs.man.ac.uk>
(back to the list because I think the discussion is valuable;

it would be a *very* good idea to get our concrete stories straight;  
to share them; and to criticize them *in house*; indeed, I think it's  
very important for us to "police our own" in the wider world; don't  
let people make wooly statements; don't get upset when a fellow  
traveller critiques you; sure, be sensible and try a private message  
to work out a good public strategy if you feel up to it)

On 2 Jul 2008, at 12:15, <tim.glover@bt.com> wrote:

> Bijan, thanks for your reply. I am replying off-thread to reduce
> traffic, but feel free to post to the list if you wish.
>
>
>> John hasType Person.
>> John name "John"
>
> (Yes, I was sloppy with my RDF, but I think my point stands)

That wasn't the point. "hasType" isn't rdf:type. It's easy to think  
of dozens of different ways to represent this in rdf. They may be  
hacky, but again, you've picked a sweet spot.

Consider representing that john has name "John" at time t1. All of  
the XML examples you give handle that more gracefully than RDF.


>> I see type columns in RDBMSs all the time.
>> Now, you might want to say that this isn't an "obvious"
>> representation, and I'd agree. But we need to be very careful about
>> cherry picking examples that work well for RDF and not so well for
>> XML without considering counterexamples. (Think about representing
>> ordered collections :))
>
> Yes, but you have now changed the *data model*.

I don't think we mean by data model the same thing. Without a clear  
definition we'll talk past each other.

> My point (which I think
> you accept from your previous reply to the thread) is that XML has
> different representations for the *same model*.

So does RDF. XML may have more and less guidance (as I pointed out in  
my goodness post), but picking a single example won't show this.

> Different models may be
> appropriate for different purposes. Perhaps it is useful sometimes to
> have an explicit type - but this is a *model* change.

john isAPersonWithName "john".

I don't believe this changes the model even in your lights.

john isa _:x typeOfNamedThing Person;
	_:x withName "john".

Etc.

This is sticking with your example. If we go to places where XML is  
more natural, things will look worse. For example, my xpath queries  
for a name will remain unchanged when I go from:

<person>
	<name>john</name>
</person>

to

<person>
	<name>john</name>
	<atTime>t</atTime>
</person>

So for the use case where we want to *extend* models (i.e., change  
them!) for some classes of model and query, XML does much much better  
than RDF (as a first approximation).
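A minimal sketch of the point about queries surviving the extension, assuming Python's standard-library ElementTree (which supports only a subset of XPath) rather than whatever XPath engine one would really use. The name `NAME_PATH` and the helper `names` are made up for illustration; the two documents are the `<person>` examples, written as well-formed XML.

```python
import xml.etree.ElementTree as ET

# The two <person> documents: the original and the extended one.
before = ET.fromstring("<person><name>john</name></person>")
after = ET.fromstring(
    "<person><name>john</name><atTime>t</atTime></person>"
)

# One path expression, evaluated relative to the <person> root element.
NAME_PATH = "./name"


def names(person):
    """Return the text of every <name> child of a <person> element."""
    return [el.text for el in person.findall(NAME_PATH)]


# The same query is run against both documents without modification.
print(names(before))
print(names(after))
```

The added `<atTime>` sibling is simply not selected by the path, which is why the query needs no change.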

So, RDF can have more than one representation for the same model even  
in your simple case. And for some cases when you update your model,  
RDF forces a more radical change.
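A rough sketch of the "more radical change", assuming triples are modelled as plain Python tuples and that the time is attached through an intermediate node. That is only one possible encoding; reification, for instance, could keep the original triple. Every identifier here (`ex:naming`, `ex:value`, `ex:atTime`, `_:n1`, and both helper functions) is hypothetical.

```python
# Triples as (subject, predicate, object) tuples; the vocabulary is invented.
before = {("ex:john", "ex:name", "john")}

# Extended model: the name now hangs off an intermediate node so that a
# time can be attached to it.
after = {
    ("ex:john", "ex:naming", "_:n1"),
    ("_:n1", "ex:value", "john"),
    ("_:n1", "ex:atTime", "t"),
}


def direct_names(graph, subject):
    """Query written for the original shape: subject --ex:name--> literal."""
    return sorted(o for s, p, o in graph if s == subject and p == "ex:name")


def nary_names(graph, subject):
    """Query rewritten for the extended shape, via the intermediate node."""
    nodes = {o for s, p, o in graph if s == subject and p == "ex:naming"}
    return sorted(o for s, p, o in graph if s in nodes and p == "ex:value")


print(direct_names(before, "ex:john"))  # original query, original data
print(direct_names(after, "ex:john"))   # original query, extended data
print(nary_names(after, "ex:john"))     # rewritten query, extended data
```

Under this encoding the original query finds nothing in the extended graph, so every consumer of the name has to be rewritten.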

My experience with OWL RDF syntax really backs this up. Adding  
annotations to OWL axioms is trivial in the XML, really really really  
hard, perhaps practically insoluble in the RDF. Certainly involves a  
lot of work, just look at this thread:
	http://www.w3.org/mid/484FF27E.8010007@oracle.com

This (and *way more*) was all set off by the question of whether to  
include a triple alongside its reified triple when it's not  
semantically wrong to do so (i.e., it doesn't work for negated class  
assertions). Brutal! And so weirdly trivial.

We couldn't have data/object punning because we had to radically  
change our model (to incorporate new vocabulary) because there's no  
syntactic context for occurrences of URIs.


>>> With more complicated data, the possible XML representations vary in
>>> different ways, and increase exponentially w.r.t. the number of atoms
>>> of information.
>
>> Do they really increase *exponentially*? How do you identify an atom
>> of information?
>
> By an atom of information I mean a triple, which is the lowest common
> denominator for these systems.

I'd need an argument for that.

> Yes, they increase exponentially. There
> are two ways (at least) of representing one triple in XML.  With two
> triples, each may be represented in 2 ways (that's 4 ways). With 3
> triples there are 8 ways. Etc. That's just with this one simple change
> in representation.  In practice the exponent is bigger than 2.

I need to think about it. But then RDF is no better off as soon as  
you reach variance. I think triples is a biasing starting point as well.
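The counting argument in the quoted text can be sketched as follows, assuming exactly two independent encodings per triple. The labels "attribute" and "child-element" are placeholders, not something either of us specified.

```python
from itertools import product

# Two hypothetical ways to write a single triple in XML.
ENCODINGS = ("attribute", "child-element")


def representations(n_triples):
    """Enumerate every way to pick one encoding per triple."""
    return list(product(ENCODINGS, repeat=n_triples))


# With k encodings per triple and n triples there are k**n combinations.
for n in (1, 2, 3):
    print(n, len(representations(n)))
```

This only counts free per-triple choices; it says nothing about how many of those combinations anyone would use in practice.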

> (OK, in practice it would be VERY eccentric to use different
> representations for different peoples names :)

I think affordances are key.

> And anyway, these myriad
> representations could be captured by a single query. But my main point
> stands. For real, complicated data, there are many representations of
> the *same model*, which require different queries.

Yeah, but I think we've amply shown that you can get radically  
different representations of the same model in RDF. In practice, you  
don't have to get too many different representations for it to be a  
problem.

Plus, if you normalize/map, things are simpler.

>>>   To extract the data from the XML we have to know the detailed
>>> representation chosen. Saying we can UNION different queries misses
>>> the point - we still have to write 3 queries. Saying we can use
>>> transformations misses the point - we still have to write
>>> transformations.
>
>> Even if this is true for this example, I've given several (and Paul's
>> given an in principle) where RDF has similar problems. It seems that at
>> best XML would be polynomially better (which can be significant,
>> obviously). In the SVG argument, I pointed out that if you are in a
>> sweet spot for something, then that something often (but not always)
>> wins.
>
> RDF has similar problems if you *change the model* eg use "Name"
> instead of "name" for the name property.

Why is this a change in the model, whereas using an attribute is not?  
Most of the time, "Data model" refers to the actual structure of  
the data, not to the conceptual model it represents.

> The problems with XML are in addition
> to these model changes.

Perhaps XML trades a bit of representation confusion between  
arbitrary representations for better evolvability.

>>> The issue here is that XML fails to abstract the data from the
>>> representation as effectively as RDF and RDBMS. In this sense, RDF
>>> and RDBMS are better data representations than XML.
>
>> So, even if I accept this example, we need more to make the
>> generalization work. In principle, we need to make sure we've not
>> cherry picked.
>
>> (But, big kudos for making a sensible, rational attempt.)
>
> Thanks :)

Thank *you*.

>> Cheers,
>> Bijan.
>
> Thanks for the discussion

Cheers,
Bijan.
Received on Wednesday, 2 July 2008 12:43:43 UTC