Re: Tag ontology RFC from Stefano Mazzocchi on 2005-04-07 (semantic-web@w3.org from April 2005)

From: Stefano Mazzocchi <stefanom@mit.edu>
Date: Wed, 06 Apr 2005 21:26:33 -0400
To: Richard Newman <r.newman@reading.ac.uk>, semantic-web@w3.org
Message-ID: <42548C49.6030101@mit.edu>

Richard Newman wrote:
> 
>>> Having some way to express relationships between tags is useful, I 
>>> think. E.g. I use data from two sources, one American and one 
>>> English; the former tags with "humor" and the latter with "humour". 
>>> Someone else might not equate the two, but I might wish to integrate 
>>> them. Various uses can be applied for less-strict relationships, 
>>> right down to relatedTag (which, as you note, is pretty much 
>>> synonymous with the existence of any property). If it's going into 
>>> ad-hoc ontologies, I may as well make them less ad-hoc and put them 
>>> on the same page, no?
>>
>>
>> Slap! Stop thinking like a librarian! :-)
>>
>> No, seriously, I hear you, but that is the Plato in you, get over it: 
>> 'tag:related' does not add any semantics other than "this is related 
>> to that"... which is the same information that you get when you have 
>> *any* property between the two nodes: the 'tag:related' is a statement 
>> that can be *always* inferred from the existence of another property 
>> between two nodes, so it's totally redundant.
> 
> I did agree with you on tag:related, y'know :)
> However, with reified relationships it's necessary to have _something_ 
> to put in as the least-specific relation. I don't really care what it is.

Ok, I see. You are reifying RDF :-)

>> If you specify more "solid" relationships, you enter the platonic 
>> space where *you* think that you can identify the relationships that 
>> your users would want to use.
>>
>> Sure, we could do that, we could create a schema, an ontology for 
>> those properties, what's the big deal?
> 
> Well, someone will have to do it... if in my tagging software I can 
> allow a user to personally equate two tags, or mark one as broader, then 
> I'm going to have to define those properties (or reuse them). I'm not 
> suggesting that those links be global, but the terms must exist. Or do 
> you think users should also mint tag relationships as well as tags?

yep, that's my entire point.

>> Well, the *WHOLE* point of tagging is to let users pick their things 
>> and move from forcing people in the same mindsets, to promoting 
>> personalized ontological mappings.
>>
>> Let's do this simple exercise:
>>
>>  "humour"@en-uk -(???)-> "humor"@en-us
> 
> 
> <snip>
> 
>> Well, there is a challenge for you people: fill the above blanks and 
>> send me the results privately (so that you don't influence one 
>> another).  I will post the results, if we can get a reasonable 
>> agreement without even making compromises, those deserve to be in the 
>> ontology, if not, not.
> 
> 
> I think you misunderstand: I don't think _any_ such relationships should 
> be in a tag ontology -- it's not correct to relate tags objectively 
> under any circumstances (unless you're the URI owner on two different 
> systems, in which case owl:sameAs should work fine).
> 
> What I think should be in the ontology are _reified_ properties, such 
> that I assert the following tuple:
> 
> (tags:equivalentTag tag:humour tag:humor rich:Richard "2005-04-06")
> 
> i.e. Richard asserts that humour and humor are equivalent in his view. 
> Highly personal, and it doesn't affect anyone else unless they want it to.
> 
> (There is an alternative way to do it that totally loses 
> interoperability -- if I want to equate tag:humor and tag:humour, I mint 
> a new property -- richprops:equivalentOn20050406 -- with the explicit 
> semantics of "Richard thought that these two tags were equivalent on 
> 2005-04-06". It works, but it's horrible. This is fixing an n-ary 
> relationship into a triple by encoding the additional actors into the 
> term.)

I agree, this is horribly hacky :-) Encoding semantics in the URIs to 
avoid reification is terrible practice.
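
A minimal sketch of the reified form Richard describes above, assuming the 5-tuple is expanded into plain triples hanging off one fresh node. The `ex:` property names and the `_:assertionN` node labels are hypothetical, chosen only for illustration; they are not terms from the proposed ontology.

```python
from itertools import count

_ids = count(1)  # source of fresh blank-node labels (illustrative only)


def reify_tag_relation(relation, tag_a, tag_b, asserted_by, asserted_on):
    """Expand an n-ary tag relation into triples about one fresh node."""
    node = f"_:assertion{next(_ids)}"
    return [
        (node, "rdf:type", "ex:TagRelationAssertion"),  # hypothetical class
        (node, "ex:relation", relation),                # hypothetical property
        (node, "ex:subjectTag", tag_a),                 # hypothetical property
        (node, "ex:objectTag", tag_b),                  # hypothetical property
        (node, "ex:assertedBy", asserted_by),           # hypothetical property
        (node, "ex:assertedOn", asserted_on),           # hypothetical property
    ]


triples = reify_tag_relation(
    "tags:equivalentTag", "tag:humour", "tag:humor", "rich:Richard", "2005-04-06"
)
for triple in triples:
    print(triple)
```

Who asserted the relation and when are ordinary property values here, so no meaning has to be packed into a property URI.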

>>>> :taggedResource is useless: it can be easily inferred.
>>>
>>> You mean as the inverse of tags:tag? I kept both in at this stage to 
>>> provide an option for discussing. I can see two possible uses:
>>> 1. you're focused on a resource, and wish to tag it. tags:tag 
>>> pointing to a reification makes most sense.
>>> 2. you're focused on a tag, or a user, and wish to model their 
>>> tagging. The tagged resource isn't primary, so you point to it rather 
>>> than from it.
>>> The same applies for retrieval. Any thoughts from other interested 
>>> parties? Which way should the Tagging <-> resource relationship point?
>>
>>
>> Whichever way you pick, somebody will want the other way. My 
>> suggestion: define both, and make one the inverse of the other, so 
>> that no matter what people use, the inferencer can go the other way as 
>> well.
> 
> That's what I was aiming for when I left both in... did I miss out the 
> owl:inverseOf? :) I wasn't suggesting that people explicitly use both.

Oh, all right, maybe *I* missed it :-)
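
A rough sketch of what "the inferencer can go the other way" amounts to, assuming tags:tag and tags:taggedResource are declared inverses of each other. The resource `ex:page` and the node `_:tagging1` are made-up examples, and a real OWL reasoner does far more than this one rule.

```python
# Declared inverse pairs; the pairing follows the thread, the data is made up.
INVERSE_OF = {"tags:tag": "tags:taggedResource"}


def with_inverses(triples, inverse_of=INVERSE_OF):
    """Return the triples plus the statement implied by each inverse pair."""
    both_ways = dict(inverse_of)
    both_ways.update({v: k for k, v in inverse_of.items()})
    closed = set(triples)
    for s, p, o in triples:
        if p in both_ways:
            closed.add((o, both_ways[p], s))
    return closed


stated = [("ex:page", "tags:tag", "_:tagging1")]
inferred = with_inverses(stated)
print(sorted(inferred))
```

Whichever direction a producer writes, a consumer applying this rule sees both.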

>>>> Honestly, I don't think the complexity is worth the value of 
>>>> modelling the 'act of tagging',
>>>
>>> I would disagree with that... even if one takes the simplest kind of 
>>> collaborative tagging, del.icio.us, exporting that database to RDF 
>>> requires dealing with that problem. del.icio.us's database has two 
>>> dates, an author, a resource, and a set of tags in addition to all 
>>> the bookkeeping. Try fitting that lot in a triple :)
>>
>>
>> That is not just a problem of delicious exporting, it is a problem that 
>> *everybody* has in the RDF space: how to model provenance (both in 
>> space and time).
>>
>> Given the complexity of the problem (see the Harmony ABC ontology for 
>> an example), I would strongly suggest not modelling it inside the 
>> tag ontology, but rather join forces with those trying to solve the 
>> problem for RDF in general.
> 
> Despite the fact that it's not going to be practicable any time soon? 
> "RDF-based interoperability between del.icio.us, Flickr, del.irio.us -- 
> coming soon!"

I don't think the world is going to stop bashing RDF just because we 
have a way to model folksonomies and jump on the bandwagon tomorrow.

Actually, I really hope they don't: we still have a lot of work to do on 
the foundations of the semantic web before we can stand the pressure of 
real-life usages at a massive global scale.

> If RDF used quads, or any similar system that allowed annotation of 
> statements, I'd agree with you; it would make me reassess the modelling 
> problem. It's a shame that RDF doesn't do it.

Here I cannot agree more.

It feels like HTTP/1.0 and HTTP/1.1: the first was a cool idea, but 
missed a few things that were needed in the real world (virtual hosts, 
keep-alive, better proxy controls); the second added them and made it 
scale to what it is today.

We need an RDF 1.1 that introduces those things that are needed.... the 
problem is that there is no agreement on how to do those things, so, I 
changed my mind: let's go modelling all the tagging events and let's see 
how far we can go with that.

>>> That's quite an interesting point. In an ideal world, this work 
>>> indeed wouldn't be necessary at all; RDF would have shipped in '99 
>>> with quads/named graphs/signing/WoT etc. as necessary, and we'd be 
>>> able to simply tag a resource with some RDF and figure out when and 
>>> who did the tagging. But we can't. RDF as it stands can't do 
>>> generalised annotation of statements within the model.
>>
>>
>> why can't you use named graphs for modelling provenance? [just curious]
> 
> 
> Non-standard. AFAIK none of the tools I'm familiar with -- cwm, Redland, 
> and Wilbur -- support named graphs, and I'm only peripherally aware of 
> the Jena NG extension. As I think I mentioned, most stores use quads or 
> similar, but RDF itself can't do it -- so you get a completely useless 
> chunk of data when it hits the wire.

SPARQL defines named graphs as first class citizens of the RDF world and 
you can be sure that every triple-store that is worth this name will 
implement it, once recommended (and I don't think NG are going away 
between now and then)
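
To make the quad idea concrete, here is a small sketch assuming each statement carries a fourth element naming its graph, with provenance recorded against the graph name. The graph names, `ex:taggedWith`, `ex:Stefano` and all the data are hypothetical; this is not any particular store's API.

```python
# Each statement is (subject, predicate, object, graph); all values are made up.
quads = [
    ("ex:page1", "ex:taggedWith", "tag:humour", "ex:graphRichard"),
    ("ex:page1", "ex:taggedWith", "tag:humor", "ex:graphStefano"),
]

# Provenance is attached to the graph name, not to individual triples.
graph_info = {
    "ex:graphRichard": {"author": "rich:Richard", "date": "2005-04-06"},
    "ex:graphStefano": {"author": "ex:Stefano", "date": "2005-04-05"},
}


def statements_by(author, quads, graph_info):
    """Select the triples whose containing graph is attributed to `author`."""
    return [(s, p, o) for s, p, o, g in quads if graph_info[g]["author"] == author]


def flatten(quads):
    """Drop the graph element, as a plain triple serialisation would."""
    return {(s, p, o) for s, p, o, _ in quads}


print(statements_by("rich:Richard", quads, graph_info))
print(sorted(flatten(quads)))
```

After `flatten`, nothing in the data says who tagged what, which is the loss Richard is worried about once the statements hit the wire.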

>> Sure. I'm not concerned with the complexity of the model, I'm 
>> concerned with the fact that you are modelling provenance and since 
>> *everybody* has this problem, not just you, it seems like a waste of 
>> time to model it in different efforts.
> 
> I see your point, but until something gets standardised we're 
> unfortunately going to see a lot of domain-specific approaches. 

Very true.

> One can 
> either see tagging as a special case of provenance/attribution (i.e. 
> "user tag 'monkey'" being attributed to a person on a date), or as an 
> n-ary relation (person, resource, tag, date). You are of the former 
> persuasion; I see your point but I think it's the latter (mostly down to 
> the importance of the person). Without _pervasive_ support for quads or 
> named graphs, provenance isn't going to be solved any time soon, so I 
> think the n-ary relation approach works best, even if it's less general.

All right, let's try with this.
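
A minimal sketch of the n-ary view (person, resource, tag, date), assuming each tagging is kept as one record. The class name, field names and sample data are hypothetical; the point is only that per-person filtering and date ordering come straight from the record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tagging:
    """One tagging event: who applied which tag to what, and when."""
    person: str
    resource: str
    tag: str
    when: date


# Made-up sample data for illustration.
taggings = [
    Tagging("rich:Richard", "ex:page1", "tag:humour", date(2005, 4, 6)),
    Tagging("ex:Stefano", "ex:page1", "tag:humor", date(2005, 4, 5)),
    Tagging("rich:Richard", "ex:page2", "tag:monkey", date(2005, 4, 1)),
]


def taggings_by(person, records):
    """Keep one person's taggings, oldest first."""
    return sorted((t for t in records if t.person == person), key=lambda t: t.when)


mine = taggings_by("rich:Richard", taggings)
for t in mine:
    print(t.when.isoformat(), t.resource, t.tag)
```

The trade-off raised in the thread still applies: this shape is specific to tagging, whereas a general provenance mechanism would cover any statement.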

> The best practices for n-ary relations were on my mind when modelling the 
> reifications.
> <http://www.w3.org/TR/swbp-n-aryRelations/>

ok

>> But RDF is not XML, even if you model it in the ontology, it doesn't 
>> mean that I have to use it ;-)
>>
>>> The reification is pretty handy in practice --- I've already put 
>>> together a system that knows about related nodes through shared 
>>> authors/tags/resources, and can do all the other stuff that one would 
>>> expect a tagging system to do. Expecting to use provenance etc. at 
>>> the moment wouldn't even be able to separate my tags from yours, or 
>>> sort my taggings by date, which is unfortunate. If it could, it would 
>>> be through using a non-standard technology like Named Graphs, Redland 
>>> context nodes, etc.
>>
>>
>> Non standard? SPARQL introduced named graphs as a first order concept.
> 
> 
> I thought SPARQL was still in Public Working Draft? "Nearly standard" :)

eheh

> Please correct me if I'm wrong (I haven't looked into it thoroughly), 
> but I don't think that one can maintain multiple named graphs in a flat 
> RDF serialisation, even as a result from a SPARQL query. One can query 
> based on the attributes of a graph, but you can't keep the provenance 
> attributes in the result. No dice :(

Ok, let's go with the full modelling then and see where this leads us.

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Thursday, 7 April 2005 01:26:36 UTC