Re: Tag ontology RFC from Stefano Mazzocchi on 2005-04-06 (semantic-web@w3.org from April 2005)

From: Stefano Mazzocchi <stefanom@mit.edu>
Date: Wed, 06 Apr 2005 12:56:17 -0400
To: Richard Newman <r.newman@reading.ac.uk>
Cc: semantic-web@w3.org
Message-ID: <425414B1.5060707@mit.edu>

Richard Newman wrote:
> 
> Stefano,
>   Thank you for your comments. Replies below.

You are welcome.

>> First of all, tagging is the idea of allowing people to choose their 
>> own 'things' instead of relying on somebody else's concepts... and 
>> here you are, defining things like
>>
>>  :equivalentTag
>>  :relatedTag
>>
>> that are exactly those idealized concepts that fit your mindset but 
>> might not fit mine. I suggest removing those altogether and letting 
>> ad-hoc ontologies handle the relationships between tags. Why? Well, 
>> there is no difference between tag:relatedTag and a general-purpose 
>> RDF property anyway.
> 
> Firstly, I have been drifting towards reifying tag relations, too, as I 
> think I added to that document. I am of the belief that both taggings 
> and interrelationships between tags are user-centric, and so must be 
> reified to capture this additional information. Danny has eloquently put 
> forth a companion view, that similar goals can be accomplished by 
> distinguishing between users' tags through namespaces.

The idea of 'grouping-via-namespace' is very XMLish, and I'm not sure it 
applies very well here. At the end of the day, though, what's important 
for interoperability is that the identifiers for your tags are globally 
unique; the rest is just personal taste.

> Having some way to express relationships between tags is useful, I 
> think. E.g. I use data from two sources, one American and one English; 
> the former tags with "humor" and the latter with "humour". Someone else 
> might not equate the two, but I might wish to integrate them. Various 
> uses can be applied for less-strict relationships, right down to 
> relatedTag (which, as you note, is pretty much synonymous with the 
> existence of any property). If it's going into ad-hoc ontologies, I may 
> as well make them less ad-hoc and put them on the same page, no?

Slap! Stop thinking like a librarian! :-)

No, seriously, I hear you, but that is the Plato in you; get over it. 
'tag:related' does not add any semantics other than "this is related to 
that", which is the same information that you get when you have *any* 
property between the two nodes. The 'tag:related' statement can *always* 
be inferred from the existence of another property between two nodes, 
so it's totally redundant.
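
To make the redundancy concrete, here is a minimal sketch in plain Python (no RDF library). The `ex:` identifiers, the toy rule for spotting literals, and the `infer_related` helper are all made up for illustration; none of them come from the proposed ontology. The sketch derives a "related" pair from every triple whose object is another node:

```python
# Triples as (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = {
    ("ex:humor", "ex:americanSpellingOf", "ex:humour"),
    ("ex:humor", "ex:name", '"humor"@en'),   # literal object, not a node
    ("ex:irony", "ex:narrowerThan", "ex:humor"),
}

def is_node(term):
    # Toy convention for this sketch only: literals start with a double quote.
    return not term.startswith('"')

def infer_related(triples):
    """Return the unordered node pairs linked by at least one property."""
    return {
        frozenset((s, o))
        for s, _p, o in triples
        if is_node(o) and s != o
    }

related = infer_related(TRIPLES)
```

Under these assumptions, every pair in `related` is already implied by some other statement, so asserting it separately adds nothing a consumer could not compute.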

If you specify more "solid" relationships, you enter the platonic space 
where *you* think that you can identify the relationships that your users 
would want to use.

Sure, we could do that, we could create a schema, an ontology for those 
properties, what's the big deal?

Well, the *WHOLE* point of tagging is to let users pick their own things, 
and to move from forcing people into the same mindset to promoting 
personalized ontological mappings.

Let's do this simple exercise:

  "humour"@en-uk -(???)-> "humor"@en-us
  "humor"@en -(???)-> "humors"@en
  "humor"@?x -(???)-> "humor"@?y where ?x != ?y
  "humor"@en -(???)-> "Humor"@en
  "humor"@en -(???)-> "umore"@it

These seem pretty objective relationships to define, unlike

  "humor"@en -(???)-> "irony"@en

which, as we all understand, could be pretty subjective.

Well, here is a challenge for you people: fill in the blanks above and 
send me the results privately (so that you don't influence one another). 
I will post the results. If we can get a reasonable agreement without 
even making compromises, those relationships deserve to be in the 
ontology; if not, not.

>> The only relationships that should be put in a tag ontology are those 
>> that are objective to the tags themselves, for example "collidesWith" 
>> if they share at least one label. The rest should be left to the users 
>> to decide (whether they are equivalent, related, or in what kind of 
>> relation they are).
> 
> I think you're solely commenting on the single-triple tag 
> interrelations. Yes, I quite agree... I was of a mind to remove 
> equivalentTag for that reason. However, :relatedTag can be considered as 
> objective as any statement (e.g. shared tagged object = related?).

Like I said before, :relatedTag is totally redundant, since its existence 
can be inferred from the existence of any other property between nodes.
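
By contrast, the "collidesWith" relation quoted above depends only on the labels, so it can be computed mechanically. A rough sketch in plain Python follows, assuming each tag is a URI with a set of plain-text labels. The data and the `collides_with` helper are hypothetical, and exact string comparison is just one possible choice:

```python
# Each tag URI maps to its set of plain-text labels (hypothetical data).
LABELS = {
    "ex:tag1": {"humor", "comedy"},
    "ex:tag2": {"humor"},
    "ex:tag3": {"humour"},
}

def collides_with(labels):
    """Return unordered pairs of distinct tags sharing at least one label."""
    tags = sorted(labels)
    return {
        (a, b)
        for i, a in enumerate(tags)
        for b in tags[i + 1:]
        if labels[a] & labels[b]   # non-empty intersection of label sets
    }

collisions = collides_with(LABELS)
```

Here only `ex:tag1` and `ex:tag2` collide. Whether "humour" or "Humor" should also count as a collision is deliberately left open by this sketch.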

>> Minor, but I think :tagName should be :name, let the namespace provide 
>> the context.
> 
> Mmm.
> 
>> :taggedResource is useless: it can be easily inferred.
> 
> You mean as the inverse of tags:tag? I kept both in at this stage to 
> provide an option for discussing. I can see two possible uses:
> 
> 1. you're focused on a resource, and wish to tag it. tags:tag pointing 
> to a reification makes most sense.
> 2. you're focused on a tag, or a user, and wish to model their tagging. 
> The tagged resource isn't primary, so you point to it rather than from it.
> 
> The same applies for retrieval. Any thoughts from other interested 
> parties? Which way should the Tagging <-> resource relationship point?

Whichever way you pick, somebody will want the other way. My suggestion: 
define both, and make one the inverse of the other, so that no matter 
what people use, the inferencer can go the other way as well.
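
As a sketch of what "the inferencer can go the other way" could look like, assuming tags:tag and tags:taggedResource are declared inverses of each other (in OWL this would be an owl:inverseOf statement). The `add_inverses` helper and the `ex:` identifiers are hypothetical:

```python
# Declared inverse pairs; property names follow the discussion above,
# but treating them as exact inverses is an assumption of this sketch.
INVERSES = {"tags:tag": "tags:taggedResource"}
# Make the lookup work in both directions.
INVERSES.update({v: k for k, v in list(INVERSES.items())})

def add_inverses(triples):
    """Return the triples plus the inverse statement for each known property."""
    closed = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            closed.add((o, INVERSES[p], s))
    return closed

asserted = {("ex:page", "tags:tag", "ex:tagging1")}
closed = add_inverses(asserted)
```

Whichever direction a producer writes, a consumer that applies this rule sees both.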

>> Honestly, I don't think the complexity is worth the value of modelling 
>> the 'act of tagging',
> 
> I would disagree with that... even if one takes the simplest kind of 
> collaborative tagging, del.icio.us, exporting that database to RDF 
> requires dealing with that problem. del.icio.us's database has two 
> dates, an author, a resource, and a set of tags in addition to all the 
> bookkeeping. Try fitting that lot in a triple :)

That is not just a problem of exporting del.icio.us; it is a problem that 
*everybody* has in the RDF space: how to model provenance (both in space 
and time).

Given the complexity of the problem (see the Harmony ABC ontology for an 
example), I would strongly suggest not modelling it inside the tag 
ontology, but rather joining forces with those trying to solve the 
problem for RDF in general.

>> but in any case I definitely disagree with rss:Item rdfs:subClassOf 
>> tags:Tagging .
>>
>> If you go down this path, pretty much any action related to adding RDF 
>> to something has to be a subClass of tagging... and pretty soon you end 
>> up modelling provenance, trust and all that yourself.
> 
> We (Seth Russell, Danny, and myself) had a fair amount of discussion of 
> this, which led to my conclusion:
> - some rss:items might be considered taggings (those that annotate a 
> resource with some categories)
> - but it's far from a perfect match.

Well, I see no difference between tagging and any other creation of 
an RDF statement. And associating the publishing of an item with the 
creation of the metadata about it feels wrong to me.

> I summarised this by saying "Personally, I would tend to go for the "tag 
> the rss:item" approach." i.e. it's not a close enough match to formalise.
> 
>> There is really nothing different between tagging and adding RDF. The 
>> only difference is that the inference needed to extract :collidesWith 
>> is different enough that it requires me to type it.
>>
>> Anything else is just the exact same modelling that applies to any RDF 
>> creation action, so we should just build on the shoulders of those who 
>> are working on provenance and trust, instead of reinventing the wheel 
>> every single time.
> 
> 
> That's quite an interesting point. In an ideal world, this work indeed 
> wouldn't be necessary at all; RDF would have shipped in '99 with 
> quads/named graphs/signing/WoT etc. as necessary, and we'd be able to 
> simply tag a resource with some RDF and figure out when and who did the 
> tagging. But we can't. RDF as it stands can't do generalised annotation 
> of statements within the model.

Why can't you use named graphs for modelling provenance? [just curious]

> As such, this little ontology came about with the goal of modelling the 
> output of something like del.icio.us --- a system which allows _users_ 
> to apply text-string tags to URIs. The complexity of the data model (use 
> of reification) is rather irrelevant. To create a tag: mint a URI and 
> point it to the given label (optionally language tagged). Tagging a 
> resource: encode who and when, and dump them out as a few triples. The 
> user interface hides the rest.

Sure. I'm not concerned with the complexity of the model, I'm concerned 
with the fact that you are modelling provenance and since *everybody* 
has this problem, not just you, it seems like a waste of time to model 
it in different efforts.

But RDF is not XML, even if you model it in the ontology, it doesn't 
mean that I have to use it ;-)

> The reification is pretty handy in practice --- I've already put 
> together a system that knows about related nodes through shared 
> authors/tags/resources, and can do all the other stuff that one would 
> expect a tagging system to do. Expecting to use provenance etc. at the 
> moment wouldn't even be able to separate my tags from yours, or sort my 
> taggings by date, which is unfortunate. If it could, it would be through 
> using a non-standard technology like Named Graphs, Redland context 
> nodes, etc.

Non-standard? SPARQL introduced named graphs as a first-class concept.
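
For illustration only, here is a rough sketch of how named graphs could separate one user's tags from another's and sort taggings by date without reifying each tagging. It uses plain Python quads rather than any particular store, and every name in it (`ex:taggedWith`, the graph names, the `taggings_by` helper) is hypothetical:

```python
# Quads: (graph, subject, predicate, object). All names are hypothetical.
QUADS = [
    ("ex:g1", "ex:page", "ex:taggedWith", "ex:humor"),
    ("ex:g2", "ex:page", "ex:taggedWith", "ex:humour"),
    ("ex:g3", "ex:other", "ex:taggedWith", "ex:irony"),
]

# Statements about the graphs themselves: who asserted them, and when.
GRAPH_INFO = {
    "ex:g1": {"author": "ex:richard", "date": "2005-04-02"},
    "ex:g2": {"author": "ex:stefano", "date": "2005-04-05"},
    "ex:g3": {"author": "ex:richard", "date": "2005-03-30"},
}

def taggings_by(author):
    """Return (date, resource, tag) rows for one author, oldest first."""
    rows = [
        (GRAPH_INFO[g]["date"], s, o)
        for g, s, p, o in QUADS
        if p == "ex:taggedWith" and GRAPH_INFO[g]["author"] == author
    ]
    return sorted(rows)  # ISO 8601 dates sort correctly as strings

mine = taggings_by("ex:richard")
```

The provenance lives on the graph, not in the tag vocabulary, so the same mechanism would serve any other kind of RDF statement.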

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Wednesday, 6 April 2005 16:56:20 UTC