XHTML2: Meta Data Use Cases / Requirements document? from Bjoern Hoehrmann on 2004-09-07 (www-html-editor@w3.org from July to September 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 07 Sep 2004 18:43:19 +0200
To: www-html-editor@w3.org
Cc: public-rdf-in-xhtml-tf@w3.org
Message-ID: <4184c275.318109327@smtp.bjoern.hoehrmann.de>

Dear HTML Working Group,
Dear RDF-in-XHTML Task Force,

  I've been looking around for your discussions on use cases for meta
data in XHTML 2.0, candidates for mandatory ontologies, XHTML 2.0 user
agent requirements relevant to meta data, etc. but I have been unable to
find such material. I would expect that you have carefully considered
the requirements for meta data in XHTML 2.0, but it seems there is no
requirements document that records your consensus on this matter.

For example, I would expect a document that lists use cases such as
authors must be able to specify

  * the location of the imprint for the site
  * copyright and other legal information for the document
  * the date of the last semantic modification of the document
  * the author of the document
  * contact information for the author of the content
  * contact information for the current maintainer of the document
  * robots may (or not) follow the links in the document
  * others should not re-publish the document
  * the IA Archiver robot may archive and re-publish the document
  * the content of the document is a part of a mailing list archive
  * the location of help information for the current document
  * the location of a resource from which the current document or
    parts thereof can be edited
  * which software was used to create the document
  * the current document is outdated and kept as historic information,
    along with the location of a more current document
  * Microsoft Internet Explorer should not use the Smart Tags feature
    for this document
  * the category the document belongs to in my site's structure
  * the location of an overview document for that category
  * which documents should be read prior to this document
  * alternate and common misspellings of words in the document
  * for a specific quote that it cites someone named "x y"
  * for a specific element that it is an example
  * for a specific element that it is an abstract
  * robots may index the XHTML document but not the images it refers to
  * the location of a compressed archive of docs that includes this one
  * ...

The first example seems to be a very good one. IANAL, but at least here
in Germany most web sites are required by national law to make an
imprint for the site easily reachable and usable from each document on
the site. Encoding the location of the imprint into the normal
navigation facilities is not necessarily a good solution; many authors
would rather specify it as meta data and depend on a requirement that
user agents must make this information available through other means.
W3C, so I am told, also has a policy that certain documents must carry
legal information such as what you find in <p class="copyright"> on the
current W3C homepage. I would have expected that you have investigated
how XHTML 2.0 could help to provide a more satisfactory solution to the
problems this is trying to solve.

There is for example the definition of the 'rel' attribute in section
19.1 of the current draft, which includes a 'copyright' value, but this
is just a copy of what we had in HTML 3.2, so it does not seem to be an
improvement in this area. Maybe you could explain why XHTML 2.0 made no
apparent progress in this area? Maybe there is member-only information
that I have missed?

Then there is the author use case. Section 19.1 has an example

[...]
  This example refers to a hypothetical profile that defines useful
  properties for document indexing. The properties defined by this
  profile -- including "author", "copyright", "keywords", and "date" --
  have their values set by subsequent meta declarations.

  <html ... xmlns:mp="http://www.example.com/profiles/rels">
  <head>
  <title>How to complete Memorandum cover sheets</title>
  <link rel="profile" resource="http://www.example.com/profiles/rels" />
  <meta property="mp:author">John Doe</meta>
  <meta property="mp:copyright">&copy; 2004 Example Corp.</meta>
  <meta property="mp:keywords">corporate,guidelines,cataloging</meta>
  <meta property="mp:date">1994-11-06T08:49:37+00:00</meta>
[...]

Why is this "hypothetical"? Section 20.2.1.1 then has an example

[...]
  <head>
    <meta property="author">Mark Birbeck</meta>
    <meta property="created" content="2004-03-20" />
  </head>
[...]

I do not really understand why these are different. I am also not sure
where I would find a definition of this "author" "property"; maybe there
is something missing in the example? Section 20.2.1.2 has

[...]
  <head>
    <meta property="author" content="Albert Einstein" />
[...]

This seems to be yet another way to specify the author of the document,
and it still does not say where this "author" thing is defined. Then
there is section 20.2.2 with

[...]
  <head>
  <link rel="author"
        resource="http://example.com/people/MarkBirbeck/654" />
  </head>
[...]

I believe this is an error in the document, the specified URL returns
404. So far, all examples used the <meta> element to specify meta data,
I am not sure why this uses <link>. According to the definition in 19.1
however, this is not a legal fragment, cf.

[...]
  Users may extend this collection of relationships. However, extensions
  must be defined in their own namespace, and the relationship names
  must be referenced in documents as qualified names (e.g., dc:creator
  for the Dublin Core "creator" relationship).
[...]

And "author" is not in the collection of relationships. It is also
generally not clear to me why one would use a URL to specify the
author of the document; shouldn't this be e.g. an "author-homepage"
relationship? Then there is section 20.6 which has

[...]
  <meta property="Author">Steven Pemberton</meta>
[...]

This is similar to the example in 20.2.1.1 except that it uses "Author"
rather than "author". I was also unable to find a definition for
"Author" in the document. I am not sure whether "author" and "Author"
are actually different. It is even less clear for the <link> example in
the specification, in XHTML 1.1 <link rel="Author" ... /> and <link
rel="author" ... /> would be equivalent, I am not sure whether this is
the case in XHTML 2.0?

Then there is section 20.2.3 which has

[...]
  Best practice for specifying metadata is to try as much as possible to
  make use of common property names. This can often be achieved by using
  lists in use by other document authors within a similar field. There
  are many such lists for different sectors and industries, but for our
  examples here we will use Dublin Core[DCORE].

  To replace the term 'author' with the more widely used Dublin Core
  term 'creator', we would need to not only substitute 'creator' for
  'author', but also to indicate which list we are using. We achieve the
  latter by using XML namespaces:

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta property="dc:creator">Mark Birbeck</meta>
  </head>
[...]

So this is yet another way to specify the author of the document? I am
also confused that this section suggests that this example actually
represents best practice; if so, why are there so many other examples
for the same thing?

And then one comes across section 20.4. which has

[...]
  <html xmlns:dc="http://purl.org/dc/elements/1.1/">
    <head />
    <body>
      <blockquote id="q1">
        <link rel="dc:source" resource="urn:isbn:0140449132">
          <link rel="dc:creator">
            <meta property="con:motherTongue">rus</meta>
          </link>
        </link>
[...]

and then

[...]
  <blockquote id="q1">
    <link about="#q1" rel="dc:source" resource="urn:isbn:0140449132">
      <meta property="dc:creator">
        <meta property="con:motherTongue">rus</meta>
      </meta>
    </link>
    <p>...</p>
  </blockquote>
[...]
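
One problem with the first of these two fragments is easy to miss: the
"con" prefix appears only inside an attribute value, so an ordinary
namespace-aware XML parser accepts the fragment without complaint. A
minimal sketch of a check, in Python with only the standard library,
might look like the following. It is my own illustration, not anything
the draft defines: the function name, the choice of rel and property as
the QName-valued attributes, and the closing tags I added so that the
truncated fragment parses are all assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: these are the attributes whose values are QNames.
QNAME_ATTRS = ("rel", "property")

def undeclared_prefixes(xml_text):
    """Return prefixes used in QName-valued attributes that have no
    in-scope xmlns declaration, in order of first appearance."""
    in_scope = []  # stack of currently declared prefixes
    missing = []
    events = ("start-ns", "end-ns", "start")
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            in_scope.append(payload[0])  # payload is (prefix, uri)
        elif event == "end-ns":
            in_scope.pop()
        else:  # "start": payload is the element, attributes are set
            for name in QNAME_ATTRS:
                prefix, colon, _ = payload.get(name, "").partition(":")
                if colon and prefix not in in_scope and prefix not in missing:
                    missing.append(prefix)
    return missing

# The first fragment from section 20.4; the closing tags after the
# inner links are added here only so that the text is well-formed.
FRAGMENT = """\
<html xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head />
  <body>
    <blockquote id="q1">
      <link rel="dc:source" resource="urn:isbn:0140449132">
        <link rel="dc:creator">
          <meta property="con:motherTongue">rus</meta>
        </link>
      </link>
    </blockquote>
  </body>
</html>
"""

print(undeclared_prefixes(FRAGMENT))
```

A check of this kind has to be written separately from XML parsing,
which may be part of why such errors slip into examples.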

The first example is clearly non-conforming for a wide range of
reasons, for example the "prefix" "con" is not declared. It is generally
very disappointing to see that the HTML Working Group pays so little
attention to the conformance of the examples in the document. The prose
for these examples notes that they are semantically different. The
document referenced for these "dc" things notes

  Examples of a Creator include a person, an organisation, or a service.
  Typically, the name of a Creator should be used to indicate the
  entity.

It seems that the latter example is saying that "<meta
property="con:motherTongue">rus</meta>" is the name of the "creator"
here? This does not seem to be best practice then. I am also not sure
why this is not, for example,

  <blockquote cite="urn:isbn:0140449132">

what is this <link> with "dc:source" about? Section 20.6 notes

  Note. The meta element is a generic mechanism for specifying meta
  data. However, some XHTML elements and attributes already handle
  certain pieces of meta data and may be used by authors instead of
  meta to specify those pieces: the title element, the address element,
  the edit and related attributes, the title attribute, and the cite
  attribute.

Does this not apply to the link element as well?

Say I want to express that a quote is from urn:isbn:0140449132 and the
author is either Fyodor Dostoyevsky or David McDuff, would I write this
as

  <blockquote cite = 'urn:isbn:0140449132'>
    <meta property = 'author'>Fyodor Dostoyevsky</meta>
    <meta property = 'author'>David McDuff</meta>
    ...
  </blockquote>

Does this say the author of "..." is both Fyodor and David or does it
say that the author of urn:isbn:0140449132 is both Fyodor and David or
what does this mean exactly?

  <blockquote>W3C is where the future of the Web is made.</blockquote>

This is from Tim Berners-Lee, to encode this information, would I do

  <blockquote>
    <meta property = 'author'>Tim Berners-Lee</meta>
    W3C is where the future of the Web is made.
  </blockquote>

or something else? There seem to be so many ways to do it already for
the document, I am afraid there might be even more for this quote
example. Do you really expect search engines to implement all the
various ways to express such things?
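
To make the burden concrete, here is a rough sketch, in Python with
only the standard library, of what a consumer would need just to
collect the document author from the variants quoted above. It is my
own illustration and not an algorithm from the draft: the function
name, the tuple format, and the idea of treating all these spellings
as equivalent are assumptions, and prefixes are compared as literal
strings rather than resolved against namespace declarations.

```python
import xml.etree.ElementTree as ET

# Every spelling of "author" used in the draft's examples quoted above.
# Comparing prefixes as literal strings is naive: a real consumer would
# have to resolve them against the in-scope xmlns declarations.
AUTHOR_PROPERTIES = {"author", "Author", "mp:author", "dc:creator"}
AUTHOR_RELS = {"author"}

def find_authors(xml_text):
    """Return ("name", text) or ("resource", uri) pairs for every
    author statement found, in document order."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any namespace
        if local_name == "meta" and element.get("property") in AUTHOR_PROPERTIES:
            # The value may be in the content attribute or the element text.
            value = element.get("content") or (element.text or "").strip()
            if value:
                found.append(("name", value))
        elif local_name == "link" and element.get("rel") in AUTHOR_RELS:
            resource = element.get("resource")
            if resource:
                found.append(("resource", resource))
    return found

# The variants from sections 20.2.1.1, 20.2.1.2, 20.6, 20.2.3 and 20.2.2,
# gathered into one head element purely for this illustration.
SAMPLE = """\
<head xmlns:dc="http://purl.org/dc/elements/1.1/">
  <meta property="author">Mark Birbeck</meta>
  <meta property="author" content="Albert Einstein" />
  <meta property="Author">Steven Pemberton</meta>
  <meta property="dc:creator">Mark Birbeck</meta>
  <link rel="author" resource="http://example.com/people/MarkBirbeck/654" />
</head>
"""

for kind, value in find_authors(SAMPLE):
    print(kind, value)
```

Even this sketch covers only the document-level cases; the blockquote
cases above would need further rules about which resource each
statement is about.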

I could not find a clue how to implement the "robots may (or not) follow
the links in the document" use case, how would I do that exactly? This
is something that I believe is important to many web authors, does W3C
disagree here?

There are many other things to say about this proposal. For example,
where would I find the rationale for removing the possibility to specify
multiple relationships for links? Surely there might be a document
documenting the privacy and copyright information along with an imprint,
so specifying <link rel = "privacy copyright imprint" ... /> would make
a lot of sense to me. Why is this forbidden now? I am also concerned
about which things get specific language elements and which can only
use "generic" means that seem to be optional. For example, the type,
restype, hreftype, and hreflang attributes (and even more so proposed
attributes like citelang, citetype, srclang, reslang, etc.): why do
these have special means?
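
On the multiple-relationship point above: if only one relationship per
link is allowed, the single HTML 4 style link has to be expanded into
one link per relationship. A small Python sketch of that expansion
follows. The function names and the resource URL are hypothetical, and
I am assuming the 'resource' attribute as used in the draft's own link
examples.

```python
from xml.sax.saxutils import quoteattr

def one_link_per_relationship(rel, resource):
    """Split an HTML 4 style space-separated rel value into one
    (relationship, resource) pair per relationship."""
    return [(name, resource) for name in rel.split()]

def as_markup(pairs):
    """Render each pair as a link element; quoteattr adds the quotes
    and escapes the attribute values."""
    return ["<link rel=%s resource=%s />" % (quoteattr(name), quoteattr(uri))
            for name, uri in pairs]

# Hypothetical location of a combined legal notice.
pairs = one_link_per_relationship("privacy copyright imprint",
                                  "http://www.example.com/legal")
for line in as_markup(pairs):
    print(line)
```

That is three elements repeating the same location where one used to
be enough.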

I have also looked for a general purpose XHTML 2.0 Requirements and/or
Use Cases document and could not find one. Personally, I find myself way
more often in a situation where I want to specify the author of a
document than the language of referenced resources, I would say this is
indeed more common, yet there is no author attribute and/or element. So
maybe the HTML Working Group collected statistics about such usage on
the web? But this might be more relevant to linking for which

[...]
  This version also does not address the issues revolving around the use
  of [XLINK] by XHTML 2.0. Those issues are being worked independent of
  the evolution of this specification. Those issues should, of course,
  be resolved as quickly as possible, and the resolution will be
  reflected in a future draft.
[...]

There is no solution yet. If I understand correctly, the group is
still researching solutions for keyboard navigation, alternate style
sheets, scripting, frames, conformance, ... so maybe I am looking at
these things too early? Most of XHTML 2.0 seems to be in an early and
rough working draft state...

So maybe I am just confused by your roadmap at
http://www.w3.org/MarkUp/xhtml-roadmap/ which says that you expect to
issue a last call for comments in two weeks? I was confused about this
before, when in August 2003 the roadmap said last call would be in
October 2003 and I asked for a new Working Draft (which I was told
should be published soon, and indeed, about a year later it
happened...)

Or maybe I misunderstand your charter, 

[...]
  The main scope of this charter is to complete the transition from HTML
  to XHTML, carried over from the previous charter. This includes
  finishing work on XHTML 2.0, the next generation of XHTML whose design
  goal is to use generic XML technologies as much as possible.
[...]

maybe I should be very concerned about this if I consider designing a
language that meets current and future needs of web authors and web
application developers best more important than re-using as much XML
technology as possible? That would explain that the meta data module
is essentially just a confusingly documented funny way to re-use RDF in
XHTML 2.0 without much practical benefit. But I might be missing
something here...

Thanks for any light you can shed on this.
Received on Tuesday, 7 September 2004 16:44:03 UTC