RE: [XHTML2] meta attribute from Mark Birbeck on 2003-08-01 (www-html@w3.org from August 2003)

From: Mark Birbeck <Mark.Birbeck@x-port.net>
Date: Fri, 1 Aug 2003 12:35:16 +0100
To: www-html@w3.org
Cc: 'Tantek Çelik' <tantek@cs.stanford.edu>, Nigel Peck - MIS Web Design <nigel@miswebdesign.com>, Jeroen Budts <jbudts@mail.be>
Message-ID: <E3ED00A7C285EE408679DE2A26D1C7810148F661@S007.x-port.net>

Nigel wrote:
> I think RDF would be the best solution to this, a much richer syntax 
> can be provided in this way and I believe it is already in place.

Tantek replied:
> I think RDF tries to be a solution to this, a more (needlessly?) complex,
> hard to write, read or understand syntax can be provided in this way and
> it is rumored that some folks actually do this.


Mmm ... seems RDF has as few friends as XLink ;-)



I think it's worth clarifying some concepts, and then looking at Jeroen's
and Tantek's proposals. I'd then like to suggest a couple of small changes
to XHTML 2.0 which I think would facilitate their proposals.

First, some of the concepts I'll need for my argument. 



CONCEPTS

1. RDF doesn't have a prescribed "syntax" that is complex - "needlessly" or
otherwise. True, there is RDF/XML, which is one way of transporting RDF using
XML, but <meta> inside HTML is another way of transporting RDF via XML.
("What! I've been using the dreaded RDF and no-one told me ...")

2. So what is RDF, if not XML? It's simply a way of clarifying common
features of meta data. It draws from things like database theory and AI, and
tries to say "what can we find that is common across different types of meta
data". Some of the conclusions are:

* that meta data can be regarded as a set of statements, for example "this
  email was written by Mark", "this document has a title of 'My Life'",
  "this dog is called Fido";

* that those statements break down into the thing you are making the
  statement about ("this email", "this dog", "this document") ...

* ... the aspect of the object that you are describing ("author", "title",
  "name") ...

* ... and the contents of that property ("Mark", "My Life", "Fido").

3. Since all the meta data we could want can be expressed using this simple
three-part structure (usually called "triples"), we now have a pretty
powerful cross-discipline way of conveying information. (More complex
structures can be converted to these basic triples.)
 
4. One particularly powerful aspect of this is that you can 'make
statements' about other people's data. So a central library database could
maintain an index of books by storing triples that indicate the title, ISBN
number, and so on. Then we could set up a series of triples that say whether
we thought the book was good or bad, without having to have any control over
the original data.
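
To make points 3 and 4 concrete, here is a minimal Python sketch of triples
as plain tuples, with two independent sets of statements about the same
book. The identifier "book:1", the property names, the placeholder ISBN and
the describe() helper are all made up for illustration; none of this is an
RDF API.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str  # the thing the statement is about
    prop: str     # the aspect of it being described
    value: str    # the contents of that property


# Statements held by the central library (hypothetical data).
library = {
    Triple("book:1", "title", "My Life"),
    Triple("book:1", "isbn", "placeholder-isbn"),
}

# Our own statements about the same book, kept entirely separately.
reviews = {
    Triple("book:1", "rating", "good"),
}


def describe(subject, *sources):
    """Gather every (property, value) pair about `subject` from all sources."""
    return sorted(
        (t.prop, t.value)
        for source in sources
        for t in source
        if t.subject == subject
    )


print(describe("book:1", library, reviews))
```

The review triples never touch the library's set; they only share the
subject identifier, which is what lets the two be combined later.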



THE meta ATTRIBUTE

OK - if we agree that RDF isn't really going to carry off our children in the
night, and we also agree that RDF's power is in the triples *not* in the
RDF/XML syntax (which is, as we said, only one of the many ways that triples
could be expressed), let's go back to Jeroen's proposition.

Jeroen suggested we allow a @meta attribute on elements, and gave an
example:

>       <blockquote xml:lang="en-us" meta="#AndyQuote">
>         The most beautiful thing in Tokyo is McDonald's.
>         The most beautiful thing in Stockholm is McDonald's.
>         The most beautiful thing in Florence is McDonald's.
>         Peking and Moscow don't have anything beautiful yet.
>       </blockquote>

Whilst this is a good suggestion it loses one of the aspects of our RDF
triples, which is that the statements about the data can be made 'outside'
of that data. In order to attribute the quote you have had to modify the
quote.

But if we think using RDF triples is a good idea (and a lot of people who
know a lot about meta data seem to think so) we need to come up with a way
of doing what Jeroen wants, using triples. Essentially we are asking: how do
we 'carry' the triples in XHTML 2.0, and how do we 'apply' them to the
quote?



THE META ELEMENT

We already have one way of carrying meta data, which is <meta>. Jeroen's
example shows:

>   <meta id="AndyQuote">
>     <meta name="author">Andy Warhol</meta>
>     <meta name="DC.Language">en-us</meta>
>     <meta name="DC.Title">THE Philosophy of Andy Warhol</meta>
>     <meta name="chapter">4 - Beauty</meta>
>     <meta name="page">71</meta>
>   </meta>

The problem, though, is that if we think back to our triples we're missing a
piece of information - we don't know what these statements are about. We know
that 'something' was written by Andy Warhol, and that whatever it was, it was
written in American English. But we don't know what this 'something' is yet.



TANTEK'S PROPOSAL

One way of connecting this set of incomplete statements to the thing they
are about is simply to nest them inside the object they are about. This was
Tantek's proposal, illustrated as follows:

>    <blockquote xml:lang="en-us">
>     <!-- and the following only about the quote -->
>     <meta>
>      <meta name="author">Andy Warhol</meta>
>      <meta name="DC.Language">en-us</meta>
>      <meta name="DC.Title">THE Philosophy of Andy Warhol</meta>
>      <meta name="chapter">4 - Beauty</meta>
>      <meta name="page">71</meta>
>     </meta>
>      The most beautiful thing in Tokyo is McDonald's.
>      The most beautiful thing in Stockholm is McDonald's.
>      The most beautiful thing in Florence is McDonald's.
>      Peking and Moscow don't have anything beautiful yet.
>    </blockquote>

I think that this proposal would be very powerful, and should definitely
find its way into XHTML - the idea of 'meta' anywhere.
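
A consumer of this markup can recover complete triples by taking the
enclosing element as the subject. Here is a minimal Python sketch of that
reading, using only the standard library. The function name and the cut-down
document are assumptions for illustration, and the element's tag name stands
in for a proper identifier.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the nested example (no namespaces, for brevity).
DOC = """
<body>
  <blockquote>
    <meta>
      <meta name="author">Andy Warhol</meta>
      <meta name="page">71</meta>
    </meta>
    The most beautiful thing in Tokyo is McDonald's.
  </blockquote>
</body>
"""


def nested_meta_triples(root):
    """Treat the parent of each <meta> container as the subject."""
    triples = []
    for parent in root.iter():
        if parent.tag == "meta":
            continue  # a <meta> is never itself the subject here
        for container in parent.findall("meta"):
            for prop in container.findall("meta"):
                value = (prop.text or "").strip()
                triples.append((parent.tag, prop.get("name"), value))
    return triples


for triple in nested_meta_triples(ET.fromstring(DOC)):
    print(triple)
```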

However, we are still not quite there - we still want to be able to make
statements without modifying the data that we are making statements about.
For this - as Nigel says - we might have to use RDF/XML!



THE RDF/XML SOLUTION

So what would be the 'orrible RDF/XML solution? Just to recap, we're saying
that we want to express our meta data as a set of statements 'about
something', and that we want to be able to say what 'thing' the statements
are about, without having to modify 'the thing'.

Well there are many ways to express it in RDF/XML, but here's one. (Turn
away now if you don't want to be scared!):

    <!--
        xmlns:x is some book related namespace. x:author should ideally
        derive from dc:Creator. Note that really we should have a
        separate layer for 'source of the quote', distinct from things
        like its language and author.
    -->

    <rdf:Description rdf:about="#Quote">
      <x:author>Andy Warhol</x:author>
      <dc:Language>en-us</dc:Language>
      <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
      <x:chapter>4 - Beauty</x:chapter>
      <x:page>71</x:page>
    </rdf:Description>

    <blockquote xml:lang="en-us" id="Quote">
      The most beautiful thing in Tokyo is McDonald's.
      The most beautiful thing in Stockholm is McDonald's.
      The most beautiful thing in Florence is McDonald's.
      Peking and Moscow don't have anything beautiful yet.
    </blockquote>

The RDF/XML reads as follows:

    There is an object identified by "#Quote"
      Which has a property of Author
        And the value of that property is "Andy Warhol";
      Which has a property of Language
        And the value of that property is "en-us";
      ... and so on for the title, chapter and page.

As you can see, the only things we have had to introduce are
<rdf:Description> and @rdf:about.
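
The same reading can be done mechanically. The Python sketch below walks
each rdf:Description and emits one triple per child element. It is a
hand-rolled reader built on the standard library, not a conforming RDF/XML
parser, and the "x" namespace URI is a placeholder I have invented; the
property names are copied from the example as they stand.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# http://example.org/book# is a hypothetical namespace for the "x" prefix.
DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:x="http://example.org/book#">
  <rdf:Description rdf:about="#Quote">
    <x:author>Andy Warhol</x:author>
    <dc:Language>en-us</dc:Language>
  </rdf:Description>
</rdf:RDF>
"""


def to_uri(qname):
    """Turn ElementTree's '{namespace}local' form into namespace + local."""
    namespace, local = qname[1:].split("}", 1)
    return namespace + local


def description_triples(root):
    """One triple per child element of each rdf:Description."""
    triples = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            value = (prop.text or "").strip()
            triples.append((subject, to_uri(prop.tag), value))
    return triples


for triple in description_triples(ET.fromstring(DOC)):
    print(triple)
```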

How might we express this in XHTML 2.0? Well, if the RDF/XML was embedded in
the XHTML document in either of the following ways an RDF/XML parser would
have no problems:

    <head>
      <rdf:Description rdf:about="#Quote">
        <x:author>Andy Warhol</x:author>
        <dc:Language>en-us</dc:Language>
        <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
        <x:chapter>4 - Beauty</x:chapter>
        <x:page>71</x:page>
      </rdf:Description>
    </head>

  or:

    <head>
      <rdf:Description
        rdf:about="#Quote"
        x:author="Andy Warhol"
        dc:Language="en-us"
        dc:Title="THE Philosophy of Andy Warhol"
        x:chapter="4 - Beauty"
        x:page="71"
      />
    </head>


However, whilst an RDF/XML parser would have no problems with this, it may
not be so desirable for XHTML 2.0 browsers. One technique would be to allow
the RDF/XML inside <meta>:

    <head>
      <meta>
        <rdf:Description rdf:about="#Quote">
          <x:author>Andy Warhol</x:author>
          <dc:Language>en-us</dc:Language>
          <dc:Title>THE Philosophy of Andy Warhol</dc:Title>
          <x:chapter>4 - Beauty</x:chapter>
          <x:page>71</x:page>
        </rdf:Description>
      </meta>
    </head>

and then require that XHTML allow any elements inside <meta>. Unfortunately,
this can cause some problems with validation in some systems.

So, a simple trick would be to allow any attributes on <meta>:
 
    <meta
      rdf:about="#Quote"
      x:author="Andy Warhol"
      dc:Language="en-us"
      dc:Title="THE Philosophy of Andy Warhol"
      x:chapter="4 - Beauty"
      x:page="71"
    />

If you run this through an RDF/XML validator (with the suitable namespaces
added) you'll find that this is perfectly valid RDF/XML, and expresses
exactly what we want (if you are an RDF 'expert' then you will spot an extra
triple, but I think we can live with that).
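
As a rough Python sketch of how the attribute form reads: each namespaced
attribute other than the rdf: ones becomes a property of the subject named
by rdf:about. I am assuming here that the 'extra' triple is the rdf:type
statement that RDF/XML derives from the element name itself. Both the "x"
namespace and the default namespace given to <meta> are placeholders of
mine, and this is not a conforming RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Both example.org namespaces below are hypothetical placeholders.
DOC = """
<meta xmlns="http://example.org/xhtml2#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:x="http://example.org/book#"
      rdf:about="#Quote"
      x:author="Andy Warhol"
      x:page="71" />
"""


def to_uri(qname):
    """Turn ElementTree's '{namespace}local' form into namespace + local."""
    namespace, local = qname[1:].split("}", 1)
    return namespace + local


def meta_attribute_triples(elem):
    """Read namespaced attributes on one element as property triples."""
    subject = elem.get("{%s}about" % RDF_NS)
    # Assumed 'extra' triple: the element name becomes the subject's type.
    triples = [(subject, RDF_NS + "type", to_uri(elem.tag))]
    for name, value in elem.attrib.items():
        if not name.startswith("{"):
            continue  # attributes without a namespace are ignored
        if name.startswith("{%s}" % RDF_NS):
            continue  # rdf:about names the subject, it is not a property
        triples.append((subject, to_uri(name), value))
    return triples


for triple in meta_attribute_triples(ET.fromstring(DOC)):
    print(triple)
```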

So, to summarise:

* I agree with Tantek that we should allow <meta> anywhere.

* I think that <meta> should allow any attributes from any namespace, not
  just @name.

* <meta> should allow @rdf:about as a means of specifying what the
  'statements' are about.

Thoughts and comments would be most welcome, since I do think it is
important to 'crack' the meta data problem in XHTML 2.0, in a way that works
with RDF.

Regards,

Mark


Mark Birbeck
Co-author Professional XML and
Professional XML Meta Data,
both by Wrox Press

Managing Director
x-port.net Ltd.
4 Pear Tree Court
London
EC1R 0DS

E: Mark.Birbeck@x-port.net
W: www.x-port.net
T: +44 (20) 7689 9232
Received on Friday, 1 August 2003 07:39:01 UTC