Re: comparing XML and RDF data models

On 2 Jul 2008, at 14:42, <tim.glover@bt.com> wrote:

>
> OK, my final tuppence. Apologies to those for whom this is trivial and
> trite.
>
> 1.
> 	<Person/>
> or
> 	<Person></Person>
>
> These differ in syntax but have the same meaning in XML (same parse
> tree) and the same RDF representation.

I don't know what that represents in RDF.
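The parse-tree half of point 1 is easy to check mechanically. A minimal sketch using Python's standard `xml.etree.ElementTree` (my choice of parser for illustration; `same_tree` is a hypothetical helper, not a library function). It uses the fully spelled-out `<Person></Person>`, since an empty `</>` end tag is not well-formed XML.

```python
import xml.etree.ElementTree as ET


def same_tree(a, b):
    """Structural comparison of two parsed elements: tag, attributes, text, tail, children."""
    if (a.tag, a.attrib, a.text, a.tail) != (b.tag, b.attrib, b.text, b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


empty_tag = ET.fromstring("<Person/>")
start_end = ET.fromstring("<Person></Person>")

# Both parse to an element named "Person" with no attributes, no text, no children.
print(same_tree(empty_tag, start_end))  # True
```

This only shows the two documents parse alike; it says nothing about what, if anything, they mean in RDF.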

> 2.
> 	<Person name="John"/>
> or
> 	<Person personName="John"/>
>
> These are distinguishable in XML and have different RDF representations,
> but differ only in the label used for the name, and if the label names
> are both free in the context, they refer to the same abstract model.

Same structural model, maybe. But suppose "name" just indicates that
"Person" is a name of type "john"? That is, the conceptual model can
be completely divorced from the structure.

(In type theory, compare "structural types" vs. "named types".)

Plus, this has a ripple effect. Considering one triple in isolation  
is one thing. But now if you have multiple possible mappings and you  
may be mapping subgraphs in different dependent ways...

> This is the level at which I think trivial *views* are useful.

Sure. But a view is a map in this case, so we've not avoided mapping :(
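To make "a view is a map" concrete, here is a minimal Python sketch; the table `NAME_VIEW` and the function `apply_view` are hypothetical names for illustration. The two documents only come out equal once somebody writes down the correspondence; nothing in the data itself says `name` and `personName` mean the same thing.

```python
import xml.etree.ElementTree as ET

# Hypothetical view: an explicit table of which attribute labels we *decide* to treat as the same.
NAME_VIEW = {"name": "name", "personName": "name"}


def apply_view(elem, view):
    """Rename attributes through the view; labels the view doesn't mention pass through unchanged."""
    return {view.get(k, k): v for k, v in elem.attrib.items()}


a = ET.fromstring('<Person name="John"/>')
b = ET.fromstring('<Person personName="John"/>')

print(a.attrib == b.attrib)                                  # False: the raw data differ
print(apply_view(a, NAME_VIEW) == apply_view(b, NAME_VIEW))  # True, but only because of the map
```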

> 3.
> 	<Person name="John"/>
> or
> 	<Person><name>John</name></Person>
>
> These have different meanings in XML (different parse trees), but
> intuitively refer to the same abstract model

I don't know. Different conceptual model perhaps. But in the first,  
you can only ever have one name, and order can never be relevant. Not  
so in the second.
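That asymmetry is baked into XML itself: well-formedness forbids repeating an attribute on one element (and attribute order is not significant), while child elements may repeat and keep document order. A minimal sketch with Python's `xml.etree.ElementTree`, assuming its default expat-backed parser, which reports the duplicate attribute as a `ParseError`:

```python
import xml.etree.ElementTree as ET

# Attribute form: a second name cannot even be written down.
try:
    ET.fromstring('<Person name="John" name="Jack"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

# Element form: any number of <name> children, in document order.
person = ET.fromstring("<Person><name>John</name><name>Jack</name></Person>")
names = [n.text for n in person.findall("name")]

print(duplicate_rejected)  # True
print(names)               # ['John', 'Jack']
```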

> and have the same RDF
> representation.

No. For example, if you wanted to capture the nesting, you'd need  
something like:

	john type person.
	john hasNames ("John").

Now john can have more than one name.
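Spelled out, the `("John")` in the `john hasNames ("John")` triple is an RDF collection, which is shorthand for a chain of `rdf:first`/`rdf:rest` triples ending in `rdf:nil`. A minimal Python sketch of that expansion, with triples as plain tuples and blank nodes as generated `_:bN` strings (both representation choices are assumptions for illustration, not any particular library's API; `expand_collection` is a hypothetical name):

```python
from itertools import count

RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"


def expand_collection(subject, predicate, items, fresh=None):
    """Expand `subject predicate (i1 i2 ...)` into plain triples, as Turtle does for collections.

    Pass a shared `fresh` counter to keep blank-node labels unique across calls.
    """
    fresh = fresh if fresh is not None else count()
    if not items:
        return [(subject, predicate, RDF_NIL)]  # the empty collection () is rdf:nil
    nodes = [f"_:b{next(fresh)}" for _ in items]
    triples = [(subject, predicate, nodes[0])]
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((node, RDF_FIRST, item))
        triples.append((node, RDF_REST, rest))
    return triples


for t in expand_collection("john", "hasNames", ['"John"']):
    print(t)
# ('john', 'hasNames', '_:b0')
# ('_:b0', 'rdf:first', '"John"')
# ('_:b0', 'rdf:rest', 'rdf:nil')
```

With two names the chain just gets one more blank node, which is where both the ordering and the room for more than one name live.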

Actual data models have subtle differences based on what you *can* do
with them, not just what you *did* do with them. You presume that the
intent in the XML case was to model the single-name case.

Actually, I can't even get that with RDF. I need to block alternative  
names.

>   This is the level at which I think XML is inferior to
> RDF, because of multiple representations of the same thing (yes I know
> the second syntax allows multiple name elements, but that's not my point
> right now).

But why not? These seem critical to the model. Take the first case  
where a node can have only one name. You need a fair bit of OWL to  
get that in the RDF. So why isn't that relevant?

Two people, working independently, could easily think these represent
different abstract models.
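To see what the attribute form gives you for free, here is the "at most one name" constraint written as an explicit check over triples. This is a closed-world sketch with a hypothetical `violates_single_name` helper; it is not OWL semantics. In OWL you would declare something like a functional property or a max-cardinality-1 restriction, and under the open-world assumption a second name yields an inference or an inconsistency rather than a simple "invalid" flag.

```python
from collections import Counter


def violates_single_name(triples, prop="name"):
    """Return subjects with more than one value for `prop` (closed-world reading).

    The graph is treated as a set, so a repeated identical triple counts once.
    """
    counts = Counter(s for s, p, _ in set(triples) if p == prop)
    return sorted(s for s, n in counts.items() if n > 1)


graph = [
    ("john", "type", "Person"),
    ("john", "name", "John"),
    ("john", "name", "Jack"),
    ("mary", "type", "Person"),
    ("mary", "name", "Mary"),
]

print(violates_single_name(graph))  # ['john']
```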

> 4.
> 	<Person><name>John</name></Person>
> or
> 	<Person><name value="John"/></Person>
>
> These intuitively refer to different abstract models,
>
> 	{Person name John}
> or
> 	{Person name X . X value John}

Depends on the level of abstraction. I think they don't because I'm  
aware of a certain pattern.

> However, there is an obvious mapping between the models. You could write
> a view that hid the difference, but this would involve reasoning,

Transformation at least.

> not
> just label renaming.

Depends on what you think is a label :)

> I think this is the kind of thing OWL should be
> able to do.

The above? Not easily.
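What "transformation at least" amounts to here, as a minimal Python sketch over triples written as plain tuples. `lift_values` and `lower_values` are hypothetical names; the sketch mints blank nodes as `_:bN` strings. Going from the flat form to the nested one has to invent new nodes, and going back has to assume each intermediate node carries exactly one `value` and nothing else; neither step is a relabelling.

```python
from itertools import count


def lift_values(triples, prop="name", value_prop="value", fresh=None):
    """Flat -> nested: rewrite each (s, prop, v) as (s, prop, _:bN) plus (_:bN, value_prop, v)."""
    fresh = fresh if fresh is not None else count()
    out = []
    for s, p, o in triples:
        if p == prop:
            node = f"_:b{next(fresh)}"  # a new node has to be invented here
            out += [(s, p, node), (node, value_prop, o)]
        else:
            out.append((s, p, o))
    return out


def lower_values(triples, prop="name", value_prop="value"):
    """Nested -> flat: collapse (s, prop, x) + (x, value_prop, v) into (s, prop, v).

    Assumes every intermediate node x has exactly one value_prop and nothing
    else; if x had two values, the dict below would silently keep only one.
    """
    values = {s: o for s, p, o in triples if p == value_prop}
    linked = {o for _, p, o in triples if p == prop and o in values}
    out = []
    for s, p, o in triples:
        if p == value_prop and s in linked:
            continue  # folded into the collapsed triple
        out.append((s, p, values[o]) if p == prop and o in linked else (s, p, o))
    return out


flat = [("p1", "type", "Person"), ("p1", "name", "John")]
nested = lift_values(flat)
print(nested)  # [('p1', 'type', 'Person'), ('p1', 'name', '_:b0'), ('_:b0', 'value', 'John')]
print(lower_values(nested) == flat)  # True
```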

> 5.
> 	<Person name="John Doe"/>
> or
> 	<Person firstName="John" lastname="Doe"/>
>
> These are different at the syntactic and semantic level in any
> representation.

Not really, especially if the first is treated as an xsd list.

> There is a simple intuitive mapping between them, but it
> is beyond the scope of OWL or views, because it involves using functions
> on the data values.

I don't think that's a meaningful difference. Functions on structure,
functions on strings...it's all the same :)
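If the value of `name` is read as an xsd list (items separated by XML whitespace), the mapping to `firstName`/`lastname` is just a function over that list. A minimal Python sketch; `as_xsd_list`, `split_name`, and `join_name` are hypothetical helpers, the attribute names follow the example as written, and the exactly-two-items rule is an assumption of the sketch.

```python
import re


def as_xsd_list(lexical):
    """Split a lexical form into items the way an XML Schema list type does:
    on runs of XML whitespace (space, tab, CR, LF), ignoring leading/trailing whitespace."""
    return [item for item in re.split(r"[ \t\r\n]+", lexical) if item]


def split_name(name):
    """name="John Doe" -> firstName/lastname attributes (hypothetical; two items only)."""
    items = as_xsd_list(name)
    if len(items) != 2:
        raise ValueError(f"expected exactly two name items, got {items!r}")
    first, last = items
    return {"firstName": first, "lastname": last}


def join_name(attrs):
    """The reverse direction: firstName/lastname -> a single two-item list value."""
    return f"{attrs['firstName']} {attrs['lastname']}"


attrs = split_name("John Doe")
print(attrs)             # {'firstName': 'John', 'lastname': 'Doe'}
print(join_name(attrs))  # John Doe
```

The `ValueError` branch is where the two models genuinely part ways: a three-part name needs a policy decision that neither form supplies.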

> Conclusion: Thanks for listening, it helped clarify things for me
> anyway.

Good attempt. I think the underlying flaw in your reasoning is that  
you presume that there is "a" abstract model and we all have access  
to it. So you presume variance where there may be none and *impose*  
variance where there may be one. If I choose the models a bit  
differently, I get different results.

Cheers,
Bijan.

Received on Wednesday, 2 July 2008 14:14:05 UTC