- From: Mark Birbeck <Mark.Birbeck@x-port.net>
- Date: Fri, 1 Aug 2003 12:35:16 +0100
- To: www-html@w3.org
- Cc: 'Tantek Çelik' <tantek@cs.stanford.edu>, Nigel Peck - MIS Web Design <nigel@miswebdesign.com>, Jeroen Budts <jbudts@mail.be>
Nigel wrote:

> I think RDF would be the best solution to this, a much richer syntax
> can be provided in this way and I believe it is already in place.

Tantek replied:

> I think RDF tries to be a solution to this, a more (needlessly?) complex,
> hard to write, read or understand syntax can be provided in this way and
> it is rumored that some folks actually do this.

Mmm ... seems RDF has as few friends as XLink ;-)

I think it's worth clarifying some concepts, and then looking at Jeroen's and Tantek's proposals. I'd then like to suggest a couple of small changes to XHTML 2.0 which I think would facilitate their proposals. First, some of the concepts I'll need for my argument.

CONCEPTS

1. RDF doesn't have a prescribed "syntax" that is complex - "needlessly" or otherwise. True, there is RDF/XML, which is one way of transporting RDF using XML, but <meta> inside HTML is another way of transporting RDF via XML. ("What! I've been using the dreaded RDF and no-one told me ...")

2. So what is RDF, if not XML? It's simply a way of clarifying common features of meta data. It draws from things like database theory and AI, and tries to say "what can we find that is common across different types of meta data". Some of the conclusions are:

   * that meta data can be regarded as a set of statements, for example "this email was written by Mark", "this document has a title of 'My Life'", "this dog is called Fido";

   * that those statements break down into the thing you are making the statement about ("this email", "this dog", "this document") ...

   * ... the aspect of the object that you are describing ("author", "title", "name") ...

   * ... and the contents of that property ("Mark", "My Life", "Fido").

3. Since all the meta data we could want can be expressed using this simple three-part structure (usually called "triples"), we now have a pretty powerful cross-discipline way of conveying information. (More complex structures can be converted to these basic triples.)

4. One particularly powerful aspect of this is that you can 'make statements' about other people's data. So a central library database could maintain an index of books by storing triples that indicate the title, ISBN number, and so on. Then we could set up a series of triples that say whether we thought the book was good or bad, without having to have any control over the original data.

THE meta ATTRIBUTE

OK - if we agree that RDF isn't really going to take off our children in the night, and we also agree that RDF's power is in the triples *not* in the RDF/XML syntax (which is, as we said, only one of the many ways that triples could be expressed), let's go back to Jeroen's proposition. Jeroen suggested we allow a @meta attribute on elements, and gave an example:

> <blockquote xml:lang="en-us" meta="#AndyQuote">
>   The most beautiful thing in Tokyo is McDonald's.
>   The most beautiful thing in Stockholm is McDonald's.
>   The most beautiful thing in Florence is McDonald's.
>   Peking and Moscow don't have anything beautiful yet.
> </blockquote>

Whilst this is a good suggestion, it loses one of the aspects of our RDF triples, which is that the statements about the data can be made 'outside' of that data. In order to attribute the quote you have had to modify the quote.

But if we think using RDF triples is a good idea (and a lot of people who know a lot about meta data seem to think so) we need to come up with a way of doing what Jeroen wants, using triples. Essentially we are asking: how do we 'carry' the triples in XHTML 2.0, and how do we 'apply' them to the quote?

THE META ELEMENT

We already have one way of carrying meta data, which is <meta>.
Jeroen's example shows:

> <meta id="AndyQuote">
>   <meta name="author">Andy Warhol</meta>
>   <meta name="DC.Language">en-us</meta>
>   <meta name="DC.Title">THE Philosophy of Andy Warhol</meta>
>   <meta name="chapter">4 - Beauty</meta>
>   <meta name="page">71</meta>
> </meta>

The problem, though, if we think back to our triples, is that we're missing a piece of information - we don't know what these statements are about. We know that 'something' was written by Andy Warhol, and whatever it was, was written in American English. But we don't know what this 'something' is yet.

TANTEK'S PROPOSAL

One way of connecting this set of incomplete statements to the thing they are about is simply to nest them inside the object they are about. This was Tantek's proposal, illustrated as follows:

> <blockquote xml:lang="en-us">
>   <!-- and the following only about the quote -->
>   <meta>
>     <meta name="author">Andy Warhol</meta>
>     <meta name="DC.Language">en-us</meta>
>     <meta name="DC.Title">THE Philosophy of Andy Warhol</meta>
>     <meta name="chapter">4 - Beauty</meta>
>     <meta name="page">71</meta>
>   </meta>
>   The most beautiful thing in Tokyo is McDonald's.
>   The most beautiful thing in Stockholm is McDonald's.
>   The most beautiful thing in Florence is McDonald's.
>   Peking and Moscow don't have anything beautiful yet.
> </blockquote>

I think that this proposal would be very powerful, and should definitely find its way into XHTML - the idea of 'meta' anywhere. However, we are still not quite there - we still want to be able to make statements without modifying the data that we are making statements about. For this - as Nigel says - we might have to use RDF/XML!

THE RDF/XML SOLUTION

So what would be the 'orrible RDF/XML solution? Just to recap, we're saying that we want to express our meta data as a set of statements 'about something', and that we want to be able to say what 'thing' the statements are about, without having to modify 'the thing'.
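
The recap above can be made concrete with a small sketch. This is not RDF tooling, just an illustration in Python under assumptions of my own: the `Triple` type, the `statements_about` helper and the "#Quote" identifier are hypothetical names, and the property names are borrowed from Jeroen's example. What it tries to show is that the statements live in their own collection and refer to the quote by identifier, so the quote itself is never modified.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """One statement: the thing described, the aspect, and its value."""
    subject: str
    predicate: str
    value: str


# The data being described. It carries an identifier and its text only.
quote = {"id": "Quote", "text": "The most beautiful thing in Tokyo is McDonald's."}

# Statements kept outside the data; "#Quote" (hypothetical) points at it.
statements = [
    Triple("#Quote", "author", "Andy Warhol"),
    Triple("#Quote", "DC.Language", "en-us"),
    Triple("#Quote", "DC.Title", "THE Philosophy of Andy Warhol"),
    Triple("#Quote", "chapter", "4 - Beauty"),
    Triple("#Quote", "page", "71"),
]


def statements_about(subject: str, triples: List[Triple]) -> Dict[str, str]:
    """Collect every property/value pair whose subject matches."""
    return {t.predicate: t.value for t in triples if t.subject == subject}


about_quote = statements_about("#" + quote["id"], statements)
print(about_quote["author"])
```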
Well, there are many ways to express it in RDF/XML, but here's one. (Turn away now if you don't want to be scared!):

    <!--
      xmlns:x is some book related namespace. x:author should ideally
      derive from dc:Creator. Note that really we should have a
      separate layer for 'source of the quote', distinct from things
      like its language and author.
    -->
    <rdf:Description rdf:about="#Quote">
      <x:author>Andy Warhol</x:author>
      <dc:Language>en-us</dc:Language>
      <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
      <x:chapter>4 - Beauty</x:chapter>
      <x:page>71</x:page>
    </rdf:Description>

    <blockquote xml:lang="en-us" id="Quote">
      The most beautiful thing in Tokyo is McDonald's.
      The most beautiful thing in Stockholm is McDonald's.
      The most beautiful thing in Florence is McDonald's.
      Peking and Moscow don't have anything beautiful yet.
    </blockquote>

The RDF/XML reads as follows:

    There is an object identified by "#Quote"
      Which has a property of Author
        And the value of that property is "Andy Warhol";
      Which has a property of Language
        And the value of that property is "en-us";

As you can see, the only things we have had to introduce are <rdf:Description> and @rdf:about.

How might we express this in XHTML 2.0? Well, if the RDF/XML were embedded in the XHTML document in either of the following ways, an RDF/XML parser would have no problems:

    <head>
      <rdf:Description rdf:about="#Quote">
        <x:author>Andy Warhol</x:author>
        <dc:Language>en-us</dc:Language>
        <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
        <x:chapter>4 - Beauty</x:chapter>
        <x:page>71</x:page>
      </rdf:Description>
    </head>

or:

    <head>
      <rdf:Description rdf:about="#Quote"
        x:author="Andy Warhol"
        dc:Language="en-us"
        dc:Title="THE Philosophy of Andy Warhol"
        x:chapter="4 - Beauty"
        x:page="71"
      />
    </head>

However, whilst an RDF/XML parser would have no problems with this, it may not be so desirable for XHTML 2.0 browsers.
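
As a rough check of the claim that the two embeddings carry the same statements, here is a minimal Python sketch that reads both forms with the standard library's xml.etree.ElementTree and compares what it finds. It is a deliberately simplified reading and not a real RDF/XML parser: the `triples` helper is hypothetical, the namespace URI bound to `x:` is a made-up example.org address, and only rdf:about plus literal property elements and property attributes are handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_DECLS = (
    f'xmlns:rdf="{RDF}" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:x="http://example.org/book#"'  # hypothetical namespace for x:
)

ELEMENT_FORM = f"""
<head {NS_DECLS}>
  <rdf:Description rdf:about="#Quote">
    <x:author>Andy Warhol</x:author>
    <dc:Language>en-us</dc:Language>
    <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
    <x:chapter>4 - Beauty</x:chapter>
    <x:page>71</x:page>
  </rdf:Description>
</head>
"""

ATTRIBUTE_FORM = f"""
<head {NS_DECLS}>
  <rdf:Description rdf:about="#Quote"
      x:author="Andy Warhol"
      dc:Language="en-us"
      dc:Title="THE Philosophy of Andy Warhol"
      x:chapter="4 - Beauty"
      x:page="71" />
</head>
"""


def triples(xml_text):
    """Return (subject, property, value) triples from rdf:Description elements."""
    about = f"{{{RDF}}}about"
    found = set()
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(about)
        # Property attributes: every attribute except rdf:about.
        for name, value in desc.attrib.items():
            if name != about:
                found.add((subject, name, value))
        # Property elements: each child element with literal text content.
        for child in desc:
            found.add((subject, child.tag, (child.text or "").strip()))
    return found


same = triples(ELEMENT_FORM) == triples(ATTRIBUTE_FORM)
print(same)
```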
One technique would be to allow the RDF/XML inside <meta>:

    <head>
      <meta>
        <rdf:Description rdf:about="#Quote">
          <x:author>Andy Warhol</x:author>
          <dc:Language>en-us</dc:Language>
          <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
          <x:chapter>4 - Beauty</x:chapter>
          <x:page>71</x:page>
        </rdf:Description>
      </meta>
    </head>

and then require that XHTML allow any elements inside <meta>. Unfortunately, this can cause some problems with validation in some systems. So, a simple trick would be to allow any attributes on <meta>:

    <meta rdf:about="#Quote"
      x:author="Andy Warhol"
      dc:Language="en-us"
      dc:Title="THE Philosophy of Andy Warhol"
      x:chapter="4 - Beauty"
      x:page="71"
    />

If you run this through an RDF/XML validator (with the suitable namespaces added) you'll find that this is perfectly valid RDF/XML, and expresses exactly what we want (if you are an RDF 'expert' then you will spot an extra triple, but I think we can live with that).

So, to summarise:

* I agree with Tantek that we should allow <meta> anywhere.

* I think that <meta> should allow any attributes from any namespace, not just @name.

* <meta> should allow @rdf:about as a means of specifying what the 'statements' are about.

Thoughts and comments would be most welcome, since I do think it is important to 'crack' the meta data problem in XHTML 2.0, in a way that works with RDF.

Regards,

Mark

Mark Birbeck
Co-author Professional XML and Professional XML Meta Data,
both by Wrox Press

Managing Director
x-port.net Ltd.
4 Pear Tree Court
London
EC1R 0DS
E: Mark.Birbeck@x-port.net
W: www.x-port.net
T: +44 (20) 7689 9232
Received on Friday, 1 August 2003 07:39:01 UTC