Re: comparing XML and RDF data models

2008/7/3 Bijan Parsia <bparsia@cs.man.ac.uk>:
>
> On Jul 2, 2008, at 11:02 PM, Peter Ansell wrote:
>
>> 2008/7/2 Bijan Parsia <bparsia@cs.man.ac.uk>:
>>>
>>> On 2 Jul 2008, at 12:19, Mark Birbeck wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> I'm not sure that this is where the differences lie.
>>>>
>>>> In my view the key point is that with RDF we have unique identifiers
>>>> for concepts--whether that is the things we're talking about, or the
>>>> vocabulary we're using to talk about them.
>>>
>>> [snip]
>>>
>>> I stop reading here.
>>>
>>> Here's why. At best we have unambiguous (not unique) identifiers and we
>>> don't have those either.
>>
>> I don't think it is too hard in scientifically based/real world
>> ontology instances to determine whether things are in fact unique or
>> common names for a universal thing.
>
> I think you radically underestimate this. Talking with biologists and
> bioinformatics people reveals otherwise.

It depends on the level at which you are trying to determine
uniqueness: i.e., whether things have to be the same thing with all the
same properties, functionality and context, or whether you classify
them as being the same if they are different things with a shared
property or functionality.

> It's also basic logico-mathematical fact. We can't get unique and
> unambiguous names for *integers*.

Why are we naming integers? There are multiple dimensions along which
people want to describe things, such as time and place, and RDF allows
you to describe them independently according to one schema, without
restricting what can be known beyond that schema if you are actually
interested in determining equality in other dimensions.

> So, I guess it is not too hard, for any name, you can say, "it's in fact not
> unique" :)
>
>>> This isn't even a coordination issue. In a single ontology it's highly
>>> nontrivial to establish formal uniqueness (i.e., two names aren't
>>> equivalent/equal; requires lots of reasoning) and even harder to
>>> establish
>>> intended uniqueness (I might coin a term twice because I didn't recognize
>>> them to be the same).
>>
>> Are you talking about establishing conclusively that two names are
>> definitely not referring to the same universal thing?
>
> I don't know what a universal thing is.

Sorry, I thought that would be obvious. In my realist view there are
things that people are trying to describe using RDF and those things
have identities, which we are labelling with names, or describing
through identifying properties. These are universal things. If you
want to establish equality you are likely to be doing it at this
level, unless you assume that markup is everything and a thing is only
in existence because there is a description of it, and the description
of the thing causes the thing to exist.

> But yes, I don't think it's easy to establish either that two people
> intended their distinct names to refer to different things, and it's
> certainly difficult to establish that equivalence in a model (OWL reasoning
> is difficult!). And even if you declare two classes to be disjoint...that
> might be *wrong* (and one might find an isomorphism between them).

Isomorphisms can easily be found at the markup level without implying
that they also translate into the universal things being similar.

>> With the open
>> world assumption you can never deny that they might refer to the same
>> thing,
>
> Well, you can say they are unequal or disjoint. (Or infer that.) But in an
> alignment situation, you might find that certain structurally identical but
> disjoint classes actually "ought" to denote the same thing (i.e., your model
> is wrong).

That is not a problem with being able to define something as being
equal or not. That is a modelling issue.

>> possibly in an inconsistent way according to the ontology, but
>> if you establish a consistent ontology and they do not match up given
>> your chosen reasoning rules then you have to make the best assumption
>> you can and say that they are not likely statistically to be the same
>> and run with that. RDF doesn't provide a solution to dirty data,
>> although it can be used to trace it better.
>
> Please see:
>
>  <http://www.w3.org/mid/44444506-BC65-4BB9-AF6B-01FE434C6C3A@cs.man.ac.uk>

Not sure how that helps... I thought I was saying pretty much the same
thing.

>> The point I think Mark may have been trying to make is that with
>> certain property combinations, InverseFunctionalProperty's can be
>> utilised in order to determine this uniqueness or non-uniqueness,
>
> ? IFPs aren't about identifiers per se. And they can't determine uniqueness!
>  In fact, generally you are trying to say that two names refer to the same
> thing (i.e., aren't unique).

Their non-equality can be used to determine non-uniqueness, and I don't
see a reason why, within a particular set of data that you are
reasoning about, you can't infer that two names are not unique.
You are right though, I am not talking about unique naming. I thought
that was clear earlier on. Talking about things being the same even if
they have different names is possible using IFPs.
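
To make the IFP point concrete, here is a minimal sketch in plain
Python (no RDF library). The ex: names and the sample triples are
invented for illustration; foaf:mbox is used because FOAF declares it
inverse functional. It only infers that names co-refer within the data
you give it; it says nothing about names being distinct.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:a_smith", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:alice", "foaf:name", "Alice"),
]

# Properties we choose to treat as inverse functional.
INVERSE_FUNCTIONAL = {"foaf:mbox"}


def same_as_groups(triples, ifps):
    """Group subject names that share a value for an inverse functional property."""
    parent = {}

    def find(name):
        # Union-find lookup with path halving.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_subject = {}
    for subject, predicate, obj in triples:
        find(subject)  # register every subject, even without an IFP value
        if predicate in ifps:
            key = (predicate, obj)
            if key in first_subject:
                # Same IFP value on two names: they denote the same thing.
                union(first_subject[key], subject)
            else:
                first_subject[key] = subject

    groups = defaultdict(set)
    for name in parent:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


print(same_as_groups(TRIPLES, INVERSE_FUNCTIONAL))
```

Here ex:alice and ex:a_smith end up in one group because they share a
mailbox, while ex:bob stays on his own. Under the open world assumption
that last part is only "not shown to be the same", not "different".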

>> along with it being easier in RDF to distinguish between your
>> acknowledged terms, and outside terms,
>
> This is contentless for me. I literally have no idea what you mean.

In XML you have to accept entire documents, unless one schema
explicitly allows you to enrich documents of its type with elements
from other namespaces. In RDF you can acknowledge that you only want
to deal with a specific part of the graph, without having to follow
the implications that come from the definition of every element you
know about. I.e., you "import" ontologies, while still being able to
utilise other terms as names without importing their schema
explicitly.
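
A rough sketch of what I mean, again in plain Python with made-up
data. The "other:" vocabulary and the ACKNOWLEDGED prefix set are
assumptions for the example, not part of any real schema. The point is
only that unknown terms can be set aside, or kept as opaque names,
without rejecting the rest of the graph.

```python
# Hypothetical set of namespace prefixes whose vocabulary we "import".
ACKNOWLEDGED = {"dc:"}

# Hypothetical graph mixing acknowledged and outside terms.
GRAPH = [
    ("ex:doc1", "dc:title", "Comparing XML and RDF"),
    ("ex:doc1", "dc:creator", "ex:peter"),
    ("ex:doc1", "other:reviewStatus", "other:Pending"),
]


def split_by_vocabulary(triples, prefixes):
    """Separate triples whose predicate we acknowledge from the rest."""
    known, unknown = [], []
    for triple in triples:
        predicate = triple[1]
        if any(predicate.startswith(prefix) for prefix in prefixes):
            known.append(triple)
        else:
            # Outside terms are kept as plain names; nothing is rejected.
            unknown.append(triple)
    return known, unknown


known, unknown = split_by_vocabulary(GRAPH, ACKNOWLEDGED)
print(len(known), "acknowledged triples;", len(unknown), "left as opaque names")
```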

>> which in XML can result in you
>> not acknowledging that two infoset elements are unique because one
>> contains an unknown namespaced property against the desires of one
>> schema, and you need to validate XML before you can work with it at
>> this level.
>
> Again, see:
>
>  <http://www.w3.org/mid/44444506-BC65-4BB9-AF6B-01FE434C6C3A@cs.man.ac.uk>
>
> I hope you don't find this offensive, but this just seems like *babble* to
> me. Perhaps a concrete example will help.

I don't really find it offensive. I am not sure why you keep
referring back to that message though. It doesn't actually add
anything to this discussion IMO.

> Tim Glover really did a heroic job of trying to make a case. I commend his
> effort as a model. The problem with getting clear is that one is likely to
> be wrong :( That sucks, but it's better to get clear and find out what we
> *can* usefully claim.
>
> Again, I urge semanticwebbians to be *ruthless* in our scrutiny of the sorts
> of claims we make.  We have a *bad* reputation for Koolaidoisity, and, I'm
> afraid, it's well deserved. We're not going to win over unconvinced people
> by grandiose and vague magical claims.

I don't make any of those claims. Acknowledging that two names aren't
going to be unique, while explicitly being able to say that two names
refer to the same thing, isn't grandiose; it's quite pragmatic.

> ...or maybe we will. Plenty of stuff works that way. But I don't like it.
> I'd rather say true and extremely intelligible things.

I am not sure what you mean by extremely intelligible. Settling for
the basics, at least initially, is a good thing. Some people will use
RDF just for documenting things without having to first write up a
schema, others will use it because they have complex models and rules
that they want to represent. Both should be allowable within semantic
applications, and constant debate over whether a particular string
should be universally unique as a way of talking about a particular
universal thing doesn't help that goal at all. It is true that using
common names can optimise decision making about the classical
consistency of the "ontology" but people don't necessarily need or
want to know about decision making for RDF to be useful for them.

Cheers,

Peter

Received on Wednesday, 2 July 2008 23:28:12 UTC