Re: Tag ontology RFC from Richard Newman on 2005-04-06 (semantic-web@w3.org from April 2005)

From: Richard Newman <r.newman@reading.ac.uk>
Date: Wed, 6 Apr 2005 18:44:57 +0100
To: Stefano Mazzocchi <stefanom@mit.edu>
Cc: semantic-web@w3.org
Message-Id: <33ba87b821b97c8de9c67369762317e5@reading.ac.uk>
>> Having some way to express relationships between tags is useful, I 
>> think. E.g. I use data from two sources, one American and one 
>> English; the former tags with "humor" and the latter with "humour". 
>> Someone else might not equate the two, but I might wish to integrate 
>> them. There are various uses for less-strict relationships, 
>> right down to relatedTag (which, as you note, is pretty much 
>> synonymous with the existence of any property). If it's going into 
>> ad-hoc ontologies, I may as well make them less ad-hoc and put them 
>> on the same page, no?
>
> Slap! Stop thinking like a librarian! :-)
>
> No, seriously, I hear you, but that is the Plato in you, get over it: 
> 'tag:related' does not add any semantics other than "this is related 
> to that"... which is the same information that you get when you have 
> *any* property between the two nodes: the 'tag:related' is a statement 
> that can *always* be inferred from the existence of another property 
> between two nodes, so it's totally redundant.

I did agree with you on tag:related, y'know :)
However, with reified relationships it's necessary to have _something_ 
to put in as the least-specific relation. I don't really care what it 
is.

> If you specify more "solid" relationships, you enter the platonic 
> space where *you* think that you can identify the relationships that 
> your users would want to use.
>
> Sure, we could do that, we could create a schema, an ontology for 
> those properties, what's the big deal?

Well, someone will have to do it... if in my tagging software I can 
allow a user to personally equate two tags, or mark one as broader, 
then I'm going to have to define those properties (or reuse them). I'm 
not suggesting that those links be global, but the terms must exist. Or 
do you think users should mint tag relationships as well as tags?

> Well, the *WHOLE* point of tagging is to let users pick their things 
> and move from forcing people into the same mindset to promoting 
> personalized ontological mappings.
>
> Let's do this simple exercise:
>
>  "humour"@en-uk -(???)-> "humor"@en-us

<snip>

> Well, there is a challenge for you people: fill the above blanks and 
> send me the results privately (so that you don't influence one 
> another).  I will post the results, if we can get a reasonable 
> agreement without even making compromises, those deserve to be in the 
> ontology, if not, not.

I think you misunderstand: I don't think _any_ such relationships 
should be in a tag ontology -- it's not correct to relate tags 
objectively under any circumstances (unless you're the URI owner on two 
different systems, in which case owl:sameAs should work fine).

What I think should be in the ontology are _reified_ properties, such 
that I assert the following tuple:

(tags:equivalentTag tag:humour tag:humor rich:Richard "2005-04-06")

i.e. Richard asserts that humour and humor are equivalent in his view. 
Highly personal, and it doesn't affect anyone else unless they want it 
to.
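
A minimal sketch of how such a reified assertion might be held and queried, assuming a plain Python record rather than any real RDF library. The `TagRelation` class, the `equivalents` helper, and the idea of a `trusted` set of asserters are all hypothetical names introduced for illustration; treating the relation as symmetric is also an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TagRelation:
    """One reified assertion: who related which tags, how, and when."""
    relation: str      # e.g. "tags:equivalentTag"
    subject: str       # e.g. "tag:humour"
    object: str        # e.g. "tag:humor"
    asserted_by: str   # e.g. "rich:Richard"
    asserted_on: date


def equivalents(tag, relations, trusted):
    """Return tags equivalent to `tag`, counting only asserters in `trusted`.

    Equivalence is treated as symmetric here, which is an assumption.
    """
    found = set()
    for r in relations:
        if r.relation != "tags:equivalentTag" or r.asserted_by not in trusted:
            continue
        if r.subject == tag:
            found.add(r.object)
        elif r.object == tag:
            found.add(r.subject)
    return found


relations = [
    TagRelation("tags:equivalentTag", "tag:humour", "tag:humor",
                "rich:Richard", date(2005, 4, 6)),
]

# Someone who opts in to Richard's view sees the two tags as equivalent;
# someone who does not is unaffected.
opted_in = equivalents("tag:humor", relations, trusted={"rich:Richard"})
opted_out = equivalents("tag:humor", relations, trusted=set())
```

The assertion only takes effect for readers who list its author in `trusted`, which is the "doesn't affect anyone else unless they want it to" property.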

(There is an alternative way to do it that totally loses 
interoperability -- if I want to equate tag:humor and tag:humour, I 
mint a new property -- richprops:equivalentOn20050406 -- with the 
explicit semantics of "Richard thought that these two tags were 
equivalent on 2005-04-06". It works, but it's horrible. This is fixing 
an n-ary relationship into a triple by encoding the additional actors 
into the term.)

>>> :taggedResource is useless: it can be easily inferred.
>> You mean as the inverse of tags:tag? I kept both in at this stage to 
>> provide an option for discussion. I can see two possible uses:
>> 1. you're focused on a resource, and wish to tag it. tags:tag 
>> pointing to a reification makes most sense.
>> 2. you're focused on a tag, or a user, and wish to model their 
>> tagging. The tagged resource isn't primary, so you point to it rather 
>> than from it.
>> The same applies for retrieval. Any thoughts from other interested 
>> parties? Which way should the Tagging <-> resource relationship 
>> point?
>
> Whichever way you pick, somebody will want the other way. My 
> suggestion: define both, and make one the inverse of the other, so 
> that no matter what people use, the inferencer can go the other way as 
> well.

That's what I was aiming for when I left both in... did I miss out the 
owl:inverseOf? :) I wasn't suggesting that people explicitly use both.
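
As a rough sketch of what the inferencer would do with that declaration, assuming triples are plain Python tuples and that the only rule applied is owl:inverseOf. The `apply_inverses` function and the `ex:` identifiers are hypothetical; this is not how any particular reasoner is implemented.

```python
INVERSE_OF = "owl:inverseOf"


def apply_inverses(triples):
    """Add the reverse-direction triple for every property with a declared inverse."""
    triples = set(triples)
    inverses = {}
    for s, p, o in triples:
        if p == INVERSE_OF:
            # owl:inverseOf is itself symmetric, so record both directions.
            inverses[s] = o
            inverses[o] = s
    inferred = {(o, inverses[p], s) for s, p, o in triples if p in inverses}
    return triples | inferred


data = {
    ("tags:tag", INVERSE_OF, "tags:taggedResource"),
    # Only one direction is stated explicitly (hypothetical identifiers).
    ("ex:page", "tags:tag", "ex:tagging1"),
}

closed = apply_inverses(data)
```

Whichever direction a publisher states, a consumer that applies the rule can query in the other direction.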

>>> Honestly, I don't think the complexity is worth the value of 
>>> modelling the 'act of tagging',
>> I would disagree with that... even if one takes the simplest kind of 
>> collaborative tagging, del.icio.us, exporting that database to RDF 
>> requires dealing with that problem. del.icio.us's database has two 
>> dates, an author, a resource, and a set of tags in addition to all 
>> the bookkeeping. Try fitting that lot in a triple :)
>
> That is not just a problem of del.icio.us exporting; it is a problem that 
> *everybody* has in the RDF space: how to model provenance (both in 
> space and time).
>
> Given the complexity of the problem (see the Harmony ABC ontology for 
> an example), I would strongly suggest not modelling it inside the 
> tag ontology, but rather joining forces with those trying to solve the 
> problem for RDF in general.

Despite the fact that it's not going to be practicable any time soon? 
"RDF-based interoperability between del.icio.us, Flickr, del.irio.us -- 
coming soon!"

If RDF used quads, or any similar system that allowed annotation of 
statements, I'd agree with you; it would make me reassess the modelling 
problem. It's a shame that RDF doesn't do it.

>> That's quite an interesting point. In an ideal world, this work 
>> indeed wouldn't be necessary at all; RDF would have shipped in '99 
>> with quads/named graphs/signing/WoT etc. as necessary, and we'd be 
>> able to simply tag a resource with some RDF and figure out when and 
>> who did the tagging. But we can't. RDF as it stands can't do 
>> generalised annotation of statements within the model.
>
> why can't you use named graphs for modelling provenance? [just curious]

Non-standard. AFAIK none of the tools I'm familiar with -- cwm, 
Redland, and Wilbur -- support named graphs, and I'm only peripherally 
aware of the Jena NG extension. As I think I mentioned, most stores use 
quads or similar, but RDF itself can't do it -- so you get a completely 
useless chunk of data when it hits the wire.
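
To illustrate the loss, here is a small sketch that assumes a store holding quads as 4-tuples (subject, predicate, object, graph) and a wire format that carries only triples. The `flatten` and `asserted_in` helpers, the `ex:taggedWith` property, and the graph names are hypothetical.

```python
# Two people make the same statement; the store keeps them apart by graph.
quads = {
    ("ex:page", "ex:taggedWith", "tag:monkey", "graph:richard"),
    ("ex:page", "ex:taggedWith", "tag:monkey", "graph:stefano"),
}


def flatten(quads):
    """Drop the graph component, as a triples-only serialisation would."""
    return {(s, p, o) for s, p, o, _graph in quads}


def asserted_in(quads, triple):
    """Graphs in which `triple` appears; answerable only while quads are kept."""
    return {g for s, p, o, g in quads if (s, p, o) == triple}


triples = flatten(quads)
before = asserted_in(quads, ("ex:page", "ex:taggedWith", "tag:monkey"))
```

Inside the store the question "who said this?" has two answers; after flattening, a single triple remains and the question can no longer be asked.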

> Sure. I'm not concerned with the complexity of the model, I'm 
> concerned with the fact that you are modelling provenance and since 
> *everybody* has this problem, not just you, it seems like a waste of 
> time to model it in different efforts.

I see your point, but until something gets standardised we're 
unfortunately going to see a lot of domain-specific approaches. One can 
either see tagging as a special case of provenance/attribution (i.e. 
"user tag 'monkey'" being attributed to a person on a date), or as an 
n-ary relation (person, resource, tag, date). You are of the former 
persuasion; I see your point but I think it's the latter (mostly down 
to the importance of the person). Without _pervasive_ support for quads 
or named graphs, provenance isn't going to be solved any time soon, so 
I think the n-ary relation approach works best, even if it's less 
general.

The best practices for n-ary relations were on my mind when modelling 
the reifications.
<http://www.w3.org/TR/swbp-n-aryRelations/>
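
A minimal sketch of the n-ary pattern, assuming one intermediate "tagging" node per (person, resource, tag, date) and plain tuples for triples. Only tags:tag comes from the draft under discussion; the `ex:taggedBy`, `ex:associatedTag`, and `ex:taggedOn` properties, the node identifiers, and both helper functions are hypothetical.

```python
from datetime import date


def tagging_to_triples(node, person, resource, tag, on):
    """Expand one n-ary tagging into triples hanging off an intermediate node."""
    return [
        (resource, "tags:tag", node),          # resource points at the tagging
        (node, "ex:taggedBy", person),         # hypothetical property
        (node, "ex:associatedTag", tag),       # hypothetical property
        (node, "ex:taggedOn", on.isoformat()), # hypothetical property
    ]


def taggings_by(person, triples):
    """Tagging nodes made by `person`, oldest first."""
    nodes = [s for s, p, o in triples if p == "ex:taggedBy" and o == person]
    dates = {s: o for s, p, o in triples if p == "ex:taggedOn"}
    # ISO 8601 date strings sort chronologically as plain strings.
    return sorted(nodes, key=lambda n: dates[n])


triples = []
triples += tagging_to_triples("_:t1", "rich:Richard", "ex:page",
                              "tag:monkey", date(2005, 4, 6))
triples += tagging_to_triples("_:t2", "ex:Stefano", "ex:page",
                              "tag:monkey", date(2005, 4, 5))
triples += tagging_to_triples("_:t3", "rich:Richard", "ex:other",
                              "tag:humour", date(2005, 4, 1))

mine = taggings_by("rich:Richard", triples)
```

Because the person and date hang off the tagging node, ordinary triples are enough to separate one person's taggings from another's and to sort them by date.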

> But RDF is not XML, even if you model it in the ontology, it doesn't 
> mean that I have to use it ;-)
>
>> The reification is pretty handy in practice --- I've already put 
>> together a system that knows about related nodes through shared 
>> authors/tags/resources, and can do all the other stuff that one would 
>> expect a tagging system to do. A system relying on provenance etc. at 
>> the moment wouldn't even be able to separate my tags from yours, or 
>> sort my taggings by date, which is unfortunate. If it could, it would 
>> be through using a non-standard technology like Named Graphs, Redland 
>> context nodes, etc.
>
> Non-standard? SPARQL introduced named graphs as a first-order concept.

I thought SPARQL was still in Public Working Draft? "Nearly standard" :)
Please correct me if I'm wrong (I haven't looked into it thoroughly), 
but I don't think that one can maintain multiple named graphs in a flat 
RDF serialisation, even as a result from a SPARQL query. One can query 
based on the attributes of a graph, but you can't keep the provenance 
attributes in the result. No dice :(

-R
Received on Wednesday, 6 April 2005 17:45:06 UTC