Re: metadata content from Dr. Olaf Hoffmann on 2009-01-04 (public-html@w3.org from January 2009)

From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
Date: Sun, 4 Jan 2009 20:15:05 +0100
To: public-html@w3.org
Message-Id: <200901042015.05246.Dr.O.Hoffmann@gmx.de>
Ian Hickson:
> On Wed, 19 Nov 2008, Dr. Olaf Hoffmann wrote:
> > > On Tue, 18 Nov 2008, Dr. Olaf Hoffmann wrote:
> > > > I would only like to know, if I interprete this correct, that authors
> > > > finally can put metadata elements from other namespaces like RDF in
> > > > the head element of a HTML5 document without any conflicts?
> >
> > Ian Hickson:
> > > In the XML serialisation, yes.
> >
> > Ok, XML formats provide anyway much more possibilities for authors than
> > HTML4 or HTML5. For XML it is not really a problem anyway, because in
> > doubt one can always use a compound document with a root-element from a
> > language with sufficient capabilities.
> >
> > To avoid disappointments, it might be a good idea to say something
> > like:
> >
> > "In the XML serialisation elements from other namespaces whose semantics
> > are primarily metadata-related (e.g. RDF) are also metadata content."
>
> We can't say exactly that, because even in other serialisations (e.g. the
> DOM), it's still true.
>
> I've added an example though. Let me know if it's clear enough.
>

Yes.


> > The case for the HTML5 variant looks more problematic, because currently
> > the profile attribute is removed too, which had the capability to
> > produce something like a defined subject-predicate-object construction
> > together with meta elements.
>
> You can still do anything, in HTML5, that profile="", as defined in HTML4,
> allowed in HTML4, since we now allow registrations of meta names.


Take DCMI, for example:
http://dublincore.org/documents/2008/08/04/dc-html/

I think you would then have to define all these rel values for link
as well, because any prefix can be defined as part of the name
value of a meta element.
This may get very complex if other organisations 'foo' and
'bar' want or need to have additional names defined under their
own responsibility.

Such open-endedness is what is ultimately required, because
one can never solve all problems with just one approach
(up to now, there is not even a world formula that describes
everything, at least in theory ;o)


>
> > The profile attribute seems to be used for example by DCMI, or
> > 'microformats' uses it to map class value items to specific meanings.
>
> In practice, microformats don't really use profile="".

Well, then the class values no longer have predefined meanings
(without an a/link with rel="profile"), because it is not possible to
identify who defined a microformat, for which meaning and
purpose, or whether a value is only an arbitrary class name with no
relation to a microformat at all.


>
> > What seems to be left for HTML5 is the a/link with rel="profile".
>
> As far as I can tell, that would be as useful as HTML4's profile="", which
> is to say, not useful. Just use the features you want, without declaring
> that you're going to use them. If name clashes are a concern, use meta
> names that have domain components (e.g. "org.example.family.parent" or
> whatever).

Another concern is to have short, unambiguous class names, as is
possible for example with CURIEs for role and XHTML+RDFa. There
one can define a namespace with a URI somewhere and then
use something short like role="l:tip" or property="l:poem".
Even the latter is already pretty long. Imagine someone wants to write
a poem with XHTML+RDFa; the result already looks like this
(assuming a namespace definition for l somewhere):

<div property="l:poem">
<h1 property="l:h">Found Poetry</h1>
<div property="l:st"> 
  <div property="l:sl">Found poetry created,</div> 
  <div property="l:sl">recycled or "untreated"</div> 
  <div property="l:sl">makes a philosophical comment</div> 
  <div property="l:sl">by altering the rearrangement</div> 
</div> 
 
<div property="l:st"> 
  <div property="l:sl">words, phrases,</div> 
  <div property="l:sl">and sometimes whole passages</div> 
  <div property="l:sl">contain clever ironic contradictions</div> 
  <div property="l:sl">or a visual collage of juxtapositions</div> 
</div>
</div>

With your approach you get this:

<div class="org.example.family.parent.poem">
<h1 class="org.example.family.parent.h">Found Poetry</h1>
<div class="org.example.family.parent.st"> 
  <div class="org.example.family.parent.sl">Found poetry created,</div> 
  <div class="org.example.family.parent.sl">recycled or "untreated"</div> 
  <div class="org.example.family.parent.sl">makes a philosophical comment</div> 
  <div class="org.example.family.parent.sl">by altering the rearrangement</div> 
</div> 
 
<div class="org.example.family.parent.st"> 
  <div class="org.example.family.parent.sl">words, phrases,</div> 
  <div class="org.example.family.parent.sl">and sometimes whole passages</div> 
  <div class="org.example.family.parent.sl">contain clever ironic contradictions</div> 
  <div class="org.example.family.parent.sl">or a visual collage of juxtapositions</div> 
</div>
</div>


I think even the XHTML+RDFa approach is not very convenient
for authors, but your microformat approach makes HTML5 almost
unusable for such applications. If this is intended, one should note
that such a microformat class name needs at least something
like a prefix of more than 50 arbitrary characters, plus a
central registration, to ensure that it is unique for each document ;o)
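
To illustrate why the CURIE notation stays short, here is a minimal
sketch in Python. The prefix table and the URI
http://example.org/literature# are assumptions of mine for
illustration only; they are not taken from any specification. The long
URI is written once in the prefix table, and every attribute value
carries only the short prefix:

```python
# Hypothetical prefix table: the URI bound to "l" is an assumption,
# chosen only to illustrate the idea.
PREFIXES = {"l": "http://example.org/literature#"}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a CURIE such as 'l:poem' into a full URI.

    Raises ValueError if the value has no prefix or the prefix
    is not declared in the prefix table.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


if __name__ == "__main__":
    # The short names from the poem example, expanded to full URIs.
    for name in ("l:poem", "l:h", "l:st", "l:sl"):
        print(name, "->", expand_curie(name))
```

With the domain-component approach there is no such table, so the
full qualifier has to be repeated in every single class value.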


>
> > The main problem of these possible HTML4 or HTML5 constructions I can
> > see is, that it is difficult (or impossible) to address a specific
> > element with metadata content as with 'about' from RDF for example.
>
> I don't really follow. Could you describe the problem in more detail?


What often happens is that editors join or jam together different content
(from different authors) in one document, for example with PHP.
The simple, convenient and safe approach would be to have
something like the metadata element from SVG in each of these
fragments, containing all the metadata for that fragment.
This avoids loss of information, and it is simple to identify which
metadata belongs to which content.

The other, less convenient but still technically sufficient method
would be to collect all meta information of a page in one element
like the html:head and to indicate which meta information belongs
to which fragment. Obviously, it is better to be able to group and
structure meta elements in the head somehow for this purpose.
The potential risk of this distributed approach (even worse would be
to put the meta information in a completely separate RDF document
somewhere else) is that the meta information is lost if fragments
are extracted again, either by other programs or by interested readers
(copy and paste). Where author rights, copyrights etc. are concerned,
this is a nasty and time-consuming problem for both authors
and people who want to republish those fragments.
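
The difference between the two methods can be sketched roughly as
follows, in Python. All names here (Fragment, extract_embedded,
extract_with_head) are hypothetical and only model the idea; they do
not correspond to any existing API:

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A piece of content from one author (hypothetical model)."""
    ident: str
    text: str
    # Method 1: metadata embedded in the fragment itself.
    metadata: dict = field(default_factory=dict)


def find(fragments, ident):
    """Return the fragment with the given identifier."""
    return next(f for f in fragments if f.ident == ident)


def extract_embedded(fragments, ident):
    """Method 1: the metadata travels with the fragment automatically."""
    frag = find(fragments, ident)
    return frag.text, frag.metadata


def extract_with_head(fragments, head, ident):
    """Method 2: metadata sits in one central table (like html:head),
    addressed by fragment identifier. Whoever extracts the fragment
    has to remember to look it up; a plain copy and paste of the
    text alone would lose it."""
    frag = find(fragments, ident)
    return frag.text, head.get(ident, {})


if __name__ == "__main__":
    embedded = [Fragment("poem1", "Found Poetry", {"rights": "author A"})]
    print(extract_embedded(embedded, "poem1"))

    bare = [Fragment("poem1", "Found Poetry")]
    head = {"poem1": {"rights": "author A"}}
    print(extract_with_head(bare, head, "poem1"))
```

In the second case the result is only complete as long as the
extracting program knows about the central table and the addressing
scheme.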

>
> > The other problem is, that it is difficult (or impossible) to use more
> > than one resource for the definition of metadata and there are several,
> > not just DCMI or microformats or my 'collection'.
>
> I don't follow. Why can't you do what users of Microformats do?
>

I have not yet seen a real solution for this with microformats (maybe it
is hidden somewhere, I don't know; or maybe they did not try to
solve the problem at all, or they simply ignored it because there is
no clean solution within HTML4).

DCMI defines something like prefixes within a profile (see the URI above).
Obviously one can do something similar to define some kind of
prefix for class names too, to state unambiguously, for example, that
all class names starting with l_ belong to a specific profile and those
with m_ belong to another or, vice versa, that all names with a
prefix o_ do not belong to the current profile and all others do
(this simply separates purely technical class names from semantically
relevant ones). But this requires a profile attribute or a rel="profile"
and a mechanism to relate the rel/profile to different meta elements.
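
A minimal sketch of such a prefix convention, in Python. Only the
prefixes l_, m_ and o_ come from the description above; the profile
URIs and the function names are assumptions made up for illustration:

```python
# Hypothetical profile URIs; only the prefixes themselves are
# taken from the text.
PROFILES = {
    "l_": "http://example.org/profiles/l",
    "m_": "http://example.org/profiles/m",
}

# Prefix for purely technical class names without semantic meaning.
TECHNICAL_PREFIX = "o_"


def profile_for(class_name):
    """Variant 1: each prefix maps to its own profile.

    Returns the profile URI, or None if the class name carries
    no declared prefix.
    """
    for prefix, uri in PROFILES.items():
        if class_name.startswith(prefix):
            return uri
    return None


def in_current_profile(class_name):
    """Variant 2: everything belongs to the current profile
    except names marked with the technical prefix."""
    return not class_name.startswith(TECHNICAL_PREFIX)


if __name__ == "__main__":
    for name in ("l_poem", "m_track", "o_clearfix", "poem"):
        print(name, profile_for(name), in_current_profile(name))
```

Either variant only works if the document itself declares which
profile a prefix refers to, which is exactly the part that is missing
without profile or rel="profile".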
 
XHTML+RDFa has CURIEs and new attributes to separate
different problems and to solve such relation problems, at
least technically.

Currently the XHTML+RDFa (+role) approach seems to be the
most elegant way to tune (X)HTML at least somewhat for the next few
years, if the author does not want to use a more specialised
XML format, because current browsers still do not really have the
capability to interpret compound documents (using, for
example, XLink to get some hyperlink functionality or to
add some metadata or poetry markup to an ordinary XHTML
document).


> > Therefore some advanced metadata strategy like RDF or the RDFa approach
> > could be very helpful for authors, even if they do not use already the
> > XML serialisation, but still the 'HTML5 serialisation'.
>
> RDFa in HTML5 is an open issue.
>
>    http://www.whatwg.org/issues/#rdfa
>

The page only notes:
"This page will only be useful to you if you have a relatively modern 
JavaScript-enabled browser."

I know which browsers I use, and I know that JavaScript is switched
off for unknown pages, but I cannot see any relation to RDFa here ...
If this is the main information about RDFa, accessibility seems to
be the main open issue for http://www.whatwg.org currently ;o)



> > Is there any idea for a convenient use of structured metadata in HTML5?
>
> It depends on what kind of metadata you mean. Could you elaborate on your
> precise use case?
>

See above: for example author rights, copyrights, links to related
information, descriptions for raster images perhaps, (maybe
machine-readable) information about the author of a fragment, the creation
date, contact information, the (literary) genre or type of
text or art, classification schemes like the APS:PACS, unique
identifiers like PURL, ISBN, DOI etc., information about publication,
and relations to alternatives or other publications of the same content.

Or, if we have a work distributed over several documents (URIs),
we need a list of links, an index with relations.
Currently in HTML4 we have relations like chapter, section and
subsection, but because the link elements in the head cannot be
structured, it is not obvious which subsection belongs to
which section, which section to which chapter, etc. This needs
a much better structure. The current interpretation in a browser
like SeaMonkey or Iceape is already usable, but still ambiguous.
This may be one reason why this link navigation works only more
or less usefully in these browsers, is only somewhat available in newer
Opera versions, and is absent in several other browsers.
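
To show where the ambiguity comes from, here is a rough sketch in
Python of how a program could guess a hierarchy from a flat list of
link elements. The function name and the sample hrefs are hypothetical,
and the rule "a section belongs to the nearest preceding chapter" is an
assumption of the sketch; nothing in HTML4 guarantees it:

```python
# Nesting depth assumed for the HTML4 link types named above.
LEVELS = {"chapter": 1, "section": 2, "subsection": 3}


def nest_links(links):
    """Guess a tree from a flat list of (rel, href) pairs.

    Uses document order only: each link is attached to the nearest
    preceding link of a higher level. This is a guess, not something
    the markup states.
    """
    root = []
    stack = []  # entries are (level, list of children of that node)
    for rel, href in links:
        level = LEVELS[rel]
        node = {"rel": rel, "href": href, "children": []}
        # Drop everything that is not an ancestor of the new node.
        while stack and stack[-1][0] >= level:
            stack.pop()
        parent_children = stack[-1][1] if stack else root
        parent_children.append(node)
        stack.append((level, node["children"]))
    return root


if __name__ == "__main__":
    flat = [
        ("chapter", "c1.html"),
        ("section", "c1s1.html"),
        ("subsection", "c1s1a.html"),
        ("chapter", "c2.html"),
        ("section", "c2s1.html"),
    ]
    for chapter in nest_links(flat):
        print(chapter["href"], [s["href"] for s in chapter["children"]])
```

If an author lists all chapters first and all sections afterwards, the
same guess attaches every section to the last chapter, which is why an
explicit structure would be needed.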


> > Is there any idea to structure metadata with HTML5 elements or to adopt
> > some RDF approach to avoid to reinvent the wheel?
>
> I don't intend to introduce a generic mechanism, no; generally, the more
> abstract a mechanism, the less it is actually used in practice. Instead,
> we should address problems on a case by case basis. Thus <video> rather
> than <object>, or <p> rather than <div class="prose.grammar.paragraph">.

Well, in recent years I got the opposite impression from the WG, concerning
poetry for example ;o) Maybe it is more the cowpath model, in contrast to
the assumption of some people that mankind has the intellectual
capabilities and technical possibilities to provide solutions with somewhat
more overview and foresight ;o)
And as an author I cannot see a big technical difference between audio,
video and object. In SMIL, for example, the difference between such elements
is semantic, not technical, because they all cover the same technical
functionality, which is more helpful for authors than the SVG Tiny 1.2
approach or the current HTML5 approach.
Most authors only know that they want to embed 'some stuff', and maybe
they understand that a music clip of a current pop song is mainly audio
with additional bonus video (use audio for it, not video, even if it contains
some visual information; or don't use it at all, because there are some
nasty copyright problems). They understand, too, that the souvenir with the
sunset from the last holidays is mainly video (use video for it, even if
there is some trashy romantic music along with it). But in the end they need
the same technical functionality for both; it is more a problem for the poor
browser implementor to work out which codecs are required
(and why they are not available on the current computer).

I think what XHTML+RDFa provides is more of a compromise. If an author
discovers a lot of these attributes and constructions within his documents,
it is time to look for a more convenient and more advanced format than
(X)HTML to mark up the document. However, if it is just used as a minor
addition or as a transitional solution until more advanced formats than HTML
are well interpreted by common browsers, this is the best approach I have
seen up to now.
I started to mark up two projects with XHTML+RDFa in the last two months
and compared it to an approach with XML (+XLink), and I think this is
what authors can currently do while still having something that can be
presented and styled with current browsers for the common audience
at the same time, without any XSLT or other tricky server-side things
like nasty user-agent sniffing (which for XHTML+RDFa is only useful for
MSIE and the Google robot, because neither likes XHTML ;o)



Olaf
Received on Sunday, 4 January 2009 19:22:56 UTC