- From: Mark Birbeck <mark.birbeck@x-port.net>
- Date: Fri, 10 Jun 2005 15:34:44 +0100
- To: "'Al Gilman'" <Alfred.S.Gilman@IEEE.org>
- Cc: <www-rdf-interest@w3.org>, <semantic-web@w3.org>, <www-html@w3.org>, <iptc-metadata@yahoogroups.com>, "'Misha Wolf'" <Misha.Wolf@reuters.com>, <dc-general@jiscmail.ac.uk>
Hi Al,

In response to my point:

> >Misha and the IPTC's interest though is in putting subject codes in
> >the form of QNames into the object part of the statement, not the
> >predicate part.

you said:

> That is not how I read the question. As I read the question,
> that is not what they want to do; rather that is *where they
> found a roadblock in the path to a solution that they were
> going down.* I don't think you did the "if that's not the
> answer, what's the question?" step to back off to the real
> requirement.

I see no need to be as rude as you were, but if you're unable to
suppress it, why not at least make sure you are correct?

Anyway...enough moaning...


THE PROBLEM

Misha clarified his initial issue in a follow-up email, like this:

> Indeed, XHTML2 lets us define any element or string to be, say,
> "dc:creator" or "dc:subject". What it does *not* do is let us
> specify that the value of dc:subject is, say, "foo:bar", where
> "foo" identifies a vocabulary and "bar" a term in that vocabulary.

That seems pretty clear to me. His initial raising of the question was
attached to your email, and went like this:

> >> >The News Architecture Working Party of the International Press
> >> >Telecommunications Council (IPTC) is investigating the use of
> >> >XHTML2 for expressing DC and other metadata. A major problem for
> >> >us is the lack of support in the current XHTML2 draft (as in
> >> >RDF/XML) for the use of QNames to express terms in controlled
> >> >vocabularies (aka values of properties).
> >> >
> >> >At the moment, the XHTML2 @content attribute takes PCDATA and the
> >> >@href attribute takes IRIs. There is no attribute available for
> >> >QNames.
> >> >
> >> >We want to be able to use, eg, <dc:subject> with a QName as a
> >> >value (ie the object of the RDF statement). The reasons include
> >> >legibility and compactness. News items (and news headlines) often
> >> >carry numerous subject codes, hence the need for compactness.

That also seems pretty clear.

Anyway, far from 'not taking a step back', I've been thinking about
nothing else since Misha and Laurent Le Meur explained their
requirements to me in London on Sunday (and explained them again to both
myself and Steven on Tuesday, during Steven's presentation to their
AGM). As I said at the meeting, just when we think we have the metadata
story wrapped up, along comes another requirement! But as I've stressed
in these discussions before, the HTML WG is fully committed to solving
such issues as far as we can (at least without turning the syntax into
something ridiculous).

So, on to the issue itself. You said:

> What I believe they want is a compact notation by which they
> can associate content blocks with well-known (much re-used)
> subject categories.

Well, that it should be compact is pretty obvious. But if we take your
advice and 'take a step back', you'll find that the much bigger issue is
that they want to make statements about statements *not* statements
about subject codes. That isn't how the question was posed, but that is
the stumbling block for what they want to do, as I'll try to show.


COMPACTNESS

Let's go through the issue, beginning with Misha's summary, above:

> News items (and news headlines) often carry numerous
> subject codes, hence the need for compactness.

Their primary concern is the sheer quantity of subject codes that they
want to have for a document. Since the news organisations may sometimes
just send a simple headline, they are worried that the amount of
metadata transmitted will dwarf the article itself.

The reason that this can happen is that the IPTC have a requirement that
a document should contain pretty much all of the metadata that it would
need. They don't want a document to have just a small number of codes,
and then at some point in the process go off to another database to
look-up some inferred values--they want as much information as possible
to be in the document there and then.
(They may of course use inference databases to do the initial
population.)

The main motivation for this is to reduce network bandwidth as well as
speed up any processing that will take place on that document as it
moves through the system. It also means that you only have to send one
package to a consumer of your news.

Anyway, whatever the motivation, this certainly means that the syntax
for expressing the subject codes needs to be succinct.


META-METADATA

However, if it was just a matter of the quantity of subject codes then
things might be tolerable. But the problem is greatly compounded by the
fact that each subject code might have information about who added it to
the document. As a document moves through the system, different codes
may be added on the way, by different organisations and
individuals--they want to record the who, the when and the why.

But what they are adding is metadata about the editing process--not the
"subject categories", as you imply--and this leads into good old
reification! I'll go through their examples to show what the *real*
problem with their mark-up is.

Laurent and Misha had the following 'fantasy mark-up':

  <iptc:contributor val="afp:llm">Laurent Le Meur</iptc:contributor>
  <iptc:subject val="srs:15000000" assignee="afp:llm"
   date="2005-06-06" xml:lang="fr">Sport</iptc:subject>

To even begin to accommodate this we've already agreed that we need
QNames, somewhere, somehow. In the following examples I'll use the
'['/']' form that I referred to in another post, but whether it's a set
of new QName attributes, or some 'adorned' syntax doesn't matter...let's
assume for now that we have somehow solved the issue.

In addition to this tweak to XHTML 2, we need a further enhancement
which is to allow attributes in other namespaces to serve as predicates.

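To make the '['/']' idea a little more concrete, here is a minimal sketch in Python of how a processor might tell a bracketed QName from a plain IRI. It is an illustration only: the function name `expand_bracketed` is made up for this example, and the namespace URIs in `PREFIXES` are placeholders, not the real IPTC, AFP or SRS namespaces.

```python
# Hypothetical prefix table: these namespace URIs are placeholders
# invented for the example, not real IPTC/AFP/SRS namespaces.
PREFIXES = {
    "iptc": "http://example.org/iptc#",
    "afp": "http://example.org/afp/",
    "srs": "http://example.org/srs/",
}


def expand_bracketed(value, prefixes=PREFIXES):
    """Expand '[prefix:local]' to a full IRI; return other values unchanged."""
    if value.startswith("[") and value.endswith("]"):
        prefix, colon, local = value[1:-1].partition(":")
        if not colon:
            raise ValueError("not a QName: %r" % value)
        if prefix not in prefixes:
            raise KeyError("undeclared prefix: %r" % prefix)
        return prefixes[prefix] + local
    # No brackets: treat the value as an ordinary IRI and leave it alone.
    return value
```

With this, `expand_bracketed("[afp:llm]")` is resolved against the declared prefix, while an unbracketed @href value passes through untouched, so existing documents keep their meaning.
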
So, using the strawman proposals for QNames and allowing additional
predicates, we now have:

  <link rel="[iptc:contributor]" href="[afp:llm]">Laurent Le Meur</link>
  <link rel="[iptc:subject]" href="[srs:15000000]"
   iptc:assignee="[afp:llm]" iptc:date="'2005-06-06'"
   xml:lang="fr">Sport</link>

Note a couple of additional things:

* the date is quoted to show that it is a string literal;
* the @rel and @rev values should now have '['/']' for consistency.


FINALLY...THE 'REAL' PROBLEM

However, the 'real' problem with this for the IPTC is that in XHTML 2
syntax, all of the predicates in this example are about the *document*,
and we want them to be about the assignment of the subject code. Note
the wording there--it's not about the subject code, but about the
*assignment* of the subject code. So it's not a simple matter of making
statements that have the subject code as the 'subject':

  <meta about="[srs:15000000]" iptc:assignee="[afp:llm]"
   iptc:date="'2005-06-06'" xml:lang="fr" />

since these are *not* statements about its *assignment*. (Hopefully now
you can see why I wanted to think about it a bit more before I made any
suggestions.)

So the 'real' problem is ultimately to do with having a means to refer
to statements. In the current draft you can nest statements in such a
way that they refer to their parent statement. It might look like this
(assuming the new additions to XHTML 2, described above):

  <link rel="[iptc:contributor]" href="[afp:llm]">Laurent Le Meur</link>
  <link id="a" rel="[iptc:subject]" href="[srs:15000000]">
    <meta iptc:assignee="[afp:llm]" dc:date="'2005-06-06'"
     xml:lang="fr">Sport</meta>
  </link>

The nested statements refer to the containing statement, not the
document, giving you this:

  <> iptc:contributor afp:llm .
  <> dc:subject srs:15000000 .
  <#a> iptc:assignee afp:llm .
  <#a> dc:date "2005-06-06" .

('srs:15000000' won't actually work as a format, but ignore that for
now.)

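The nesting rule can be sketched roughly in Python. This is only an illustration under stated assumptions: the dict layout (`rel`, `href`, `props`, `children`) and the `triples` function are invented for this example and are not part of any XHTML 2 draft, and the sketch assumes the containing element carries an @id. It also ignores xml:lang and the element content.

```python
def triples(element, subject="<>"):
    """Return (subject, predicate, object) tuples for one element.

    `element` is a plain dict standing in for a <link> or <meta>:
    'rel'/'href' give one statement, 'props' holds attribute
    predicates, and 'children' holds nested elements.
    """
    found = []
    if "rel" in element:
        found.append((subject, element["rel"], element["href"]))
    for predicate, value in element.get("props", {}).items():
        found.append((subject, predicate, value))

    children = element.get("children", [])
    if children:
        if "id" not in element:
            raise ValueError("this sketch needs an @id on the containing element")
        # Nested elements describe the containing statement, not the document.
        statement = "<#" + element["id"] + ">"
        for child in children:
            found.extend(triples(child, statement))
    return found


# The nested <link id="a"> example, written as a dict.
link = {
    "id": "a",
    "rel": "iptc:subject",
    "href": "srs:15000000",
    "children": [
        {"props": {"iptc:assignee": "afp:llm", "dc:date": '"2005-06-06"'}},
    ],
}

for s, p, o in triples(link):
    print(s, p, o, ".")
```

The point of the design is that the subject changes as you descend: the outer statement hangs off the document, and everything nested inside it hangs off the statement's own identifier.
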
However, you still don't know what #a refers to, and this is where we
almost certainly need to introduce reification. We probably need the
presence of an @id to 'explode' the statement (similar to @ID in
RDF/XML), as follows:

  <> iptc:contributor afp:llm .
  <> dc:subject srs:15000000 .
  <#a> rdf:type rdf:Statement .
  <#a> rdf:subject <> .
  <#a> rdf:predicate dc:subject .
  <#a> rdf:object srs:15000000 .
  <#a> iptc:assignee afp:llm .
  <#a> dc:date "2005-06-06" .

I'll leave it there, since this may have frightened the life out of most
people on the HTML list, and perhaps we can continue this thread over on
the RDF lists. (HTML people won't miss anything, since whether we add a
sprinkling of reification or not has consequences for the RDF processing
side, and not the mark-up.)

> So I claim that creating a short QName notation that
> (sufficiently formally and authoritatively) expands to "has
> 'subject' (per Dublin Core) of 'sports' (per IPTC)" is solving
> precisely the problem that they face.

Hopefully now you can see why your 'claim' is wrong...that was the
easy-peasy problem to solve.

> And if we were to make 'role' M-ary
> rather than unary, this would flow neatly in there.

Having multiple @role values is certainly desirable, but it doesn't
solve their problem either.

I didn't follow most of the remaining parts of your email, so forgive me
for not responding to them (I think they concerned mapping types with
rdf:type). I have, however, some comments on the final part about
@class, which I'll try to get down at some point, in a separate email.

Regards,

Mark


Mark Birbeck
CEO
x-port.net Ltd.

e: Mark.Birbeck@x-port.net
t: +44 (0) 20 7689 9232
w: http://www.formsPlayer.com/
b: http://internet-apps.blogspot.com/

Download our XForms processor from
http://www.formsPlayer.com/

Received on Friday, 10 June 2005 15:38:48 UTC