- From: <tim.glover@bt.com>
- Date: Wed, 2 Jul 2008 14:42:52 +0100
- To: <bparsia@cs.man.ac.uk>, <semantic-web@w3.org>
OK, my final tuppence. Apologies to those for whom this is trivial and trite.

1. <Person/> or <Person></Person>

These differ in syntax but have the same meaning in XML (same parse tree) and the same RDF representation.

2. <Person name="John"/> or <Person personName="John"/>

These are distinguishable in XML and have different RDF representations, but they differ only in the label used for the name, and if both label names are free in the context, they refer to the same abstract model. This is the level at which I think trivial *views* are useful.

3. <Person name="John"/> or <Person><name>John</name></Person>

These have different meanings in XML (different parse trees), but intuitively refer to the same abstract model and have the same RDF representation. This is the level at which I think XML is inferior to RDF, because of multiple representations of the same thing (yes, I know the second syntax allows multiple name elements, but that's not my point right now).

4. <Person><name>John</name></Person> or <Person><name value="John"/></Person>

These intuitively refer to different abstract models, {Person name John} or {Person name X . X value John}. However, there is an obvious mapping between the models. You could write a view that hid the difference, but this would involve reasoning, not just label renaming. I think this is the kind of thing OWL should be able to do.

5. <Person name="John Doe"/> or <Person firstName="John" lastName="Doe"/>

These are different at the syntactic and semantic level in any representation. There is a simple intuitive mapping between them, but it is beyond the scope of OWL or views, because it involves using functions on the data values.

Conclusion: Thanks for listening, it helped clarify things for me anyway.

Tim.
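For anyone who wants to poke at cases 1, 3 and 5 concretely, here is a minimal Python sketch. It is only an illustration under stated assumptions: `shape`, `to_triples` and `split_full_name` are hypothetical helpers that do not come from the thread, and the rule "attributes and text-only child elements both become triples" is an assumed convention, not a standard XML-to-RDF mapping.

```python
import xml.etree.ElementTree as ET


def shape(elem):
    """Return a hashable summary of an element's parse tree."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(shape(child) for child in elem),
    )


def to_triples(elem, subject="_:p"):
    """Assumed mapping: attributes and text-only children become triples."""
    triples = {(subject, "rdf:type", elem.tag)}
    for name, value in elem.attrib.items():
        triples.add((subject, name, value))
    for child in elem:
        # Only leaf children with text and no attributes are mapped here.
        if len(child) == 0 and not child.attrib and child.text:
            triples.add((subject, child.tag, child.text.strip()))
    return triples


def split_full_name(full_name):
    """Case 5: a function on the data value (naive: splits on first space)."""
    first, last = full_name.split(" ", 1)
    return {"firstName": first, "lastName": last}


empty_a = ET.fromstring("<Person/>")
empty_b = ET.fromstring("<Person></Person>")
attr_form = ET.fromstring('<Person name="John"/>')
elem_form = ET.fromstring("<Person><name>John</name></Person>")

# Case 1: different syntax, same parse tree.
same_tree_case1 = shape(empty_a) == shape(empty_b)
# Case 3: different parse trees, same triples under the assumed mapping.
same_tree_case3 = shape(attr_form) == shape(elem_form)
same_triples_case3 = to_triples(attr_form) == to_triples(elem_form)
```

Case 2 would need a renaming table on top of `to_triples`, and case 4 a rule that collapses the intermediate node, which is where the "reasoning, not just label renaming" distinction shows up.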
-----Original Message-----
From: Bijan Parsia [mailto:bparsia@cs.man.ac.uk]
Sent: 02 July 2008 13:31
To: Glover,T,Tim,CXR3 R; Semantic Web
Subject: Re: comparing XML and RDF data models

(back to the list because I think the discussion is valuable; it would be a *very* good idea to get our concrete stories straight; to share them; and to criticize them *in house*; indeed, I think it's very important for us to "police our own" in the wider world; don't let people make wooly statements; don't get upset when a fellow traveller critiques you; sure, be sensible and try a private message to work out a good public strategy if you feel up to it)

On 2 Jul 2008, at 12:15, <tim.glover@bt.com> wrote:

> Bijan, thanks for your reply. I am replying off-thread to reduce
> traffic, but feel free to post to the list if you wish.
>
>> John hasType Person.
>> John name "John"
>
> (Yes, I was sloppy with my RDF, but I think my point stands)

That wasn't the point. "hasType" isn't rdf:type. It's easy to think of dozens of different ways to represent this in rdf. They may be hacky, but again, you've picked a sweet spot. Consider representing that john has name "John" at time t1. All of the XML examples you give handle that more gracefully than RDF.

>> I see type columns in RDBMSs all the time.
>> Now, you might want to say that this isn't an "obvious"
>> representation, and I'd agree. But we need to be very careful about
>> cherry picking examples that work well for RDF and not so well for
>> XML without considering counterexamples. (Think about representing
>> ordered collections :))
>
> Yes, but you have now changed the *data model*.

I don't think we mean by data model the same thing. Without a clear definition we'll talk past each other.

> My point (which I think
> you accept from your previous reply to the thread) is that XML has
> different representations for the *same model*.

So does RDF.
XML may have more and less guidance (as I pointed out in my goodness post), but picking a single example won't show this.

> Different models may be
> appropriate for different purposes. Perhaps it is useful sometimes to
> have an explicit type - but this is a *model* change.

john isAPersonWithName "john".

I don't believe this changes the model even in your lights.

john isa _:x typeOfNamedThing Person; _:x withName "john".

Etc.

This is sticking with your example. If we go to places where XML is more natural, things will look worse. For example, my xpath queries for a name will remain unchanged when I go from:

<person>
  <name>john</name>
</person>

to

<person>
  <name>john</name>
  <atTime>t</atTime>
</person>

So for the use case where we want to *extend* models (i.e., change them!) for some classes of model and query, XML does much much better than RDF (as a first approximation).

So, RDF can have more than one representation for the same model even in your simple case. And for some cases when you update your model, RDF forces a more radical change.

My experience with OWL RDF syntax really backs this up. Adding annotations to OWL axioms is trivial in the XML, really really really hard, perhaps practically insoluble in the RDF. Certainly involves a lot of work, just look at this thread:

http://www.w3.org/mid/484FF27E.8010007@oracle.com

This (and *way more*) is all spouted off over whether to include a triple with a reified triple when it's not semantically wrong to do so (i.e., doesn't work for negated class assertions).

Brutal! And so weirdly trivial.

We couldn't have data/object punning because we had to radically change our model (to incorporate new vocabulary) because there's no syntactic context for occurrences of URIs.

>>> With more complicated data, the possible XML representations vary in
>>> different ways, and increase exponentially w.r.t. the number of
>>> atoms of information.
>
>> Do they really increase *exponentially*?
>> How do you identify an atom of information?
>
> By an atom of information I mean a triple, which is the lowest common
> denominator for these systems.

I'd need an argument for that.

> Yes, they increase exponentially. There are two ways (at least) of
> representing one triple in XML. With two triples, each may be
> represented in 2 ways (that's 4 ways). With 3 triples there are 8
> ways. Etc. That's just with this one simple change in representation.
> In practice the exponent is bigger than 2.

I need to think about it. But then RDF is no better off as soon as you reach variance. I think triples is a biasing starting point as well.

> (OK, in practice it would be VERY eccentric to use different
> representations for different people's names :)

I think affordances are key.

> And anyway, these myriad
> representations could be captured by a single query. But my main point
> stands. For real, complicated data, there are many representations of
> the *same model*, which require different queries.

Yeah, but I think we've amply shown that you can get radically different representations of the same model in RDF. In practice, you don't have to get too many different representations for it to be a problem. Plus, if you normalize/map things are simpler.

>>> To extract the data from the XML we have to know the detailed
>>> representation chosen. Saying we can UNION different queries misses
>>> the point - we still have to write 3 queries. Saying we can use
>>> transformations misses the point - we still have to write
>>> transformations.
>
>> Even if this is true for this example, I've given several (and Paul's
>> given an in principle) where RDF has similar problems. It seems that
>> at best XML would be polynomially better (which can be significant,
>> obviously). In the SVG argument, I pointed out that if you are in a
>> sweet spot for something, then that something often (but not always)
>> wins.
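Tim's counting argument earlier in this message (two encodings per triple, so 4 for two triples and 8 for three) can be checked with a short Python sketch. The property names and values below are made up for illustration, and `encodings` is a hypothetical helper; it assumes the only freedom is the attribute-versus-child-element choice, made independently per property.

```python
from itertools import product
import xml.etree.ElementTree as ET


def encodings(properties):
    """Yield one XML string per attribute/element assignment of properties."""
    for choices in product(("attribute", "element"), repeat=len(properties)):
        person = ET.Element("Person")
        for (name, value), choice in zip(properties, choices):
            if choice == "attribute":
                person.set(name, value)
            else:
                ET.SubElement(person, name).text = value
        yield ET.tostring(person, encoding="unicode")


# Hypothetical data: three "atoms of information" about one person.
props = [("name", "John"), ("email", "john@example.org"), ("phone", "555-0100")]
docs = list(encodings(props))
distinct_count = len(set(docs))  # 2 ** len(props) distinct serializations
```

With k encodings per triple instead of two, the same enumeration gives k ** n documents, so the growth is in the base rather than the exponent. As Tim concedes, real documents rarely mix encodings per property, so this is an upper bound on the variation, not a typical case.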
>
> RDF has similar problems if you *change the model*, e.g. use "Name"
> instead of "name" for the name property.

Why is this a change in the model, whereas using an attribute is not? Most of the time, "data model" refers to the actual structure of the data, not to the conceptual model it represents.

> The problems with XML are in addition
> to these model changes.

Perhaps XML trades a bit of representation confusion between arbitrary representations for better evolvability.

>>> The issue here is that XML fails to abstract the data from the
>>> representation as effectively as RDF and RDBMS. In this sense, RDF
>>> and RDBMS are better data representations than XML.
>
>> So, even if I accept this example, we need more to make the
>> generalization work. In principle, we need to make sure we've not
>> cherry picked.
>
>> (But, big kudos for making a sensible, rational attempt.)
>
> Thanks :)

Thank *you*.

>> Cheers,
>> Bijan.
>
> Thanks for the discussion

Cheers,
Bijan.
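Bijan's evolvability point (the XPath query for a name is untouched by adding <atTime>, while the RDF shape has to change) can be sketched in Python as follows. This is an illustration under assumptions, not anyone's actual data: triples are modelled as plain tuples rather than with an RDF library, and the "naming" / "value" / "atTime" node is just one hypothetical n-ary encoding among the many possible.

```python
import xml.etree.ElementTree as ET

before_xml = ET.fromstring("<person><name>john</name></person>")
after_xml = ET.fromstring(
    "<person><name>john</name><atTime>t</atTime></person>"
)


def xml_names(person):
    """The same path expression is used before and after the extension."""
    return [e.text for e in person.findall("./name")]


# Plain-tuple triples. The time-qualified form is an assumed encoding that
# routes the name through an intermediate blank node.
before_rdf = {("john", "name", "john")}
after_rdf = {
    ("john", "naming", "_:n"),
    ("_:n", "value", "john"),
    ("_:n", "atTime", "t"),
}


def direct_names(triples, subject):
    """Original query pattern: subject --name--> value."""
    return [o for s, p, o in triples if s == subject and p == "name"]


def qualified_names(triples, subject):
    """Rewritten pattern: subject --naming--> node --value--> value."""
    nodes = {o for s, p, o in triples if s == subject and p == "naming"}
    return [o for s, p, o in triples if s in nodes and p == "value"]
```

Here `xml_names` returns the same result for both documents, whereas `direct_names` finds nothing in `after_rdf` and has to be replaced by `qualified_names`. A different RDF encoding would shift where the query breaks, which is rather the point about multiple representations.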
Received on Wednesday, 2 July 2008 13:43:53 UTC