Re: Marking up links to alternative versions of content from Sander Tekelenburg on 2007-08-01 (public-html@w3.org from August 2007)

From: Sander Tekelenburg <st@isoc.nl>
Date: Wed, 1 Aug 2007 20:30:52 +0200
To: public-html@w3.org
Message-Id: <p06240640c2d65aad5c2e@[192.168.0.101]>
At 16:52 +1000 UTC, on 2007-07-31, Lachlan Hunt wrote:

> Sander Tekelenburg wrote:

[...]

> I amended the text on that page by appending "of the following
> transcript" after the link to the audio recording. It now reads:
>
>    "You may also download the presentation slides (PowerPoint) and
>     audio recording (Ogg Vorbis) of the following transcript."
>
> I believe that indicates, using natural language, that the audio and
> transcript are indeed equivalents.  I also added rel="alternate" to the
> link, though I'm not sure what real practical benefit (if any) that will
> achieve.

Skipping for a moment whether UAs actually do anything with <a
rel="alternate">, I wonder what it should *mean* in this case. The markup
doesn't define what it is an alternate for. So I'd guess that it could only
be interpreted as an alternate for the entire document.

If, for example, <http://lachy.id.au/dev/presentation/future-of-html/>
contained only the transcript, then <link rel="alternate"> *would* indicate
that the linked resource is an alternate. That would satisfy the need for an
"explicit indication of equivalents", I think. However, how can we satisfy
that same need for parts of a document? (More on this below.)

[... 'non-explicit relationship']

>> [1] No program could make use of that.
>> - Not an indexing bot.
>
> When discussing applications that can't determine the association of
> being alternatives from natural language, it's necessary to explain why
> it would be beneficial for that particular application to do so.

You might be searching for audio about x. A search engine can only tell you
that <http://domain.example/file.ogg> is about x if it has some way of knowing
that the file is about x. Potential techniques to achieve that:
- the audio file format allows authors to include metadata (ID3 tags, for
  instance)
   con: each file format would have to provide for that, and an indexing bot
        would have to know all those formats
   con: we'll always have secret file formats, limiting who can read their
        contents
   con: the search engine will probably need to grab the entire file and
        parse it
   con: the author needs a special tool to enter the metadata
   con: the author needs to be capable of providing 'good', useful metadata
        (think how hard it is to write proper alt text)
   con: inserting that metadata into the file will probably require a
        special tool, given that different file formats all use different
        metadata formats, and especially given secret file formats
   con: it seems unlikely to me that, for instance, an audio file format
        that provides for such metadata would have room for an entire
        transcript
   con: etc. :)
- HTML allows authors to define the relationship between equivalents
   pro: the search engine can understand a textual equivalent to be the
        equivalent of the audio. So for a search for "x as audio" the search
        engine can know that some document contains the perfect search
        result, and that it provides an audio equivalent of it. "I feel
        lucky" becomes usable.

> As far as an indexing bot is concerned, presumably for a search engine,
> is it not enough that it can determine a relationship between them
> (based on the presence of the link), even if it doesn't know exactly
> what kind of relationship?

No, it isn't :) Consider a page that contains multiple objects, each with
multiple equivalents available. How would a search engine understand what
goes with what?

That aside, authors won't like to write "A <a>transcript</a>, a
<a>captioned</a>, and an <a>audio-only</a> version of the movie below are
also available". And rightly so. Such content distracts from the actual
content. Many users will only be interested in one format of the content.
Their ideal experience is to be confronted only with that (many users will
likely be confused by the options). And authors want to provide that ideal
experience, not distract users from the content.

One more thing concerning indexing bots: they don't all have Google's
budget. It'd be much cheaper to make use of an explicit "this is an
equivalent" than to parse context and try to deduce something meaningful.

[... 'non-explicit relationship']

>> - Not a tool that helps authors judge the universality/accessibility of
>>   their document.
>
> I'm not sure how useful such a tool would really be in this case.
> Consider the alt attribute for images.  A tool can only tell you if it
> is or is not present.  It can't tell you whether or not the alternative
> text provided is indeed appropriate for the image.

Exactly. So if you say "we don't need @alt; just type some textual equivalent
close to the image", then a tool cannot help authors even to *that* extent.

Btw, as to the quality of the alternative: an authoring tool can (and IIRC
should, according to ATAG) provide a summary for a document. So it could
present the author with an overview, listing each non-textual object one by
one with the provided equivalent next to it. That would, for instance, be
useful when you have a group of authors entering content, and a
universality/accessibility specialist checking and improving that aspect of
the document. (Much like a book or newspaper editor has a use for tools that
help improve the quality/richness of what someone else provided.)

A tool can also suggest possible equivalents for specific types of
objects.

You can say that that is general information that a web publisher 'should
know', but we all know that the majority of web publishers just 'learn when
they need to'. So being given such tips at the exact moment they become
relevant is valuable.

> So even if there was a way to markup an explicit association between
> alternatives, the tool could only tell you that the alternative is
> present.  It can't tell you anything about the quality of that
> alternative.  So while there may be some benefit to having the tool say
> yes, an alternative has been provided, that in itself doesn't really
> tell you that much about the overall accessibility.

Here's a situation that doesn't seem at all far-fetched to me: you have a
group of people entering content through one publishing system. You know they
are educated enough to know that they should provide decent equivalents, and
how to. But you also know they're only human and so will sometimes simply
forget something. It would then be helpful to configure the publishing system
to alert the user before publishing a document that lacks an equivalent for
one or more objects.

HTML validators already do this, but only for @alt.
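To make that publishing-system check a bit more concrete, here is a minimal sketch of the part that is possible today: refusing to publish when an <img> has no @alt at all. It assumes Python's standard html.parser; find_images_missing_alt is a hypothetical name. There is nothing comparable to check for video or audio, since HTML currently has no markup that explicitly associates them with their equivalents.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> that has no alt attribute at all.

    An empty alt="" counts as present: it is a deliberate author choice,
    not an omission.
    """

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names, and also calls
        # this method for self-closing tags such as <img ... />.
        if tag == "img":
            names = {name for name, _ in attrs}
            if "alt" not in names:
                self.missing.append(dict(attrs).get("src") or "(no src)")


def find_images_missing_alt(html):
    """Return the src of each <img> lacking @alt, in document order."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.missing


if __name__ == "__main__":
    page = '<p><img src="a.png" alt="Chart"> <img src="b.png"> <img src="c.png" alt=""></p>'
    problems = find_images_missing_alt(page)
    if problems:
        print("Not publishing; missing @alt on:", ", ".join(problems))
```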

>> - Not an authoring tool that needs to help the author to not mess up what a
>> previous author carefully added to try to help certain accessibility
>> situations.
>
> Are there any tools that can do that reliably for existing embedded
> content, or any other type of association?
>
> For example, an author could move a form control to a different location
> on the page.  Are there any authoring tools that let the user know
> they've forgotten to move the label with it?  In theory, that sounds
> somewhat useful, but is it possible to implement in a practical and
> usable way?

I don't see why it would not be possible. Thanks to the explicitness of the
markup, an authoring tool can know that an existing label belongs to an
existing form control, so it can, for instance, allow them to be moved only
as one unit.
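A rough sketch of how a tool can see that association, assuming only explicit <label for> / id pairs (labels that wrap their control are ignored here) and Python's html.parser. check_labels and CONTROL_TAGS are hypothetical names. An editor could use the same map to move a label and its control together, or to warn when one has been left behind.

```python
from html.parser import HTMLParser

# Simplification: treat these as the labelable form controls.
CONTROL_TAGS = {"input", "select", "textarea", "button"}


class LabelMap(HTMLParser):
    """Records explicit <label for="..."> targets and form control ids."""

    def __init__(self):
        super().__init__()
        self.label_targets = set()  # for= values found on <label>
        self.control_ids = set()    # id= values found on form controls

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label" and a.get("for"):
            self.label_targets.add(a["for"])
        elif tag in CONTROL_TAGS and a.get("id"):
            self.control_ids.add(a["id"])


def check_labels(html):
    """Return (labels pointing at no control, controls with no label)."""
    parser = LabelMap()
    parser.feed(html)
    parser.close()
    orphaned = sorted(parser.label_targets - parser.control_ids)
    unlabelled = sorted(parser.control_ids - parser.label_targets)
    return orphaned, unlabelled


if __name__ == "__main__":
    # The <input id="email"> was deleted, but its label was left behind.
    page = '<label for="email">E-mail</label> <input id="name">'
    print(check_labels(page))  # (['email'], ['name'])
```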

Such behaviour would impose a limit on what the author can edit, so it
wouldn't be appropriate for all users in all situations. But that applies to
all tools. The Mac OS X Finder, for instance, hides file name extensions by
default. It's always about picking the right tool for the job. And tools can
make such things configurable. A Web Publishing System should IMO allow the
admin to impose certain limits on certain users (disallowing moving a form
control without its label), and allow only certain other users to edit the
label.

[...]

> I'm not sure exactly what you mean by "consistency across sites" in this
> context.

For the general idea of "consistency across sites", read my article about
<link>: <http://www.euronet.nl/~tekelenb/WWW/LINK/>.

Applied to the case at hand:

If a UA indicates longdesc through a specific cursor change on hover, and
allows access to the resource longdesc points to through a certain item in
its contextual menu, then that works the same across sites.

If you leave things up to individual authors, users will have no such
consistency across sites. They'll have to figure things out on each and every
site. It's like 802.11g working everywhere vs. having to read a manual at
each hotspot before you can connect to it.

Really, this is no different from UAs indicating the existence of an RSS
feed in the chrome. It helps users, because that consistency makes RSS feeds
easily identifiable as being RSS feeds.

> So I don't understand what the problem is, why it is a real
> problem, nor whether the problem would really be solved if explicit
> markup were available.

The authoring of <link rel="alternate" type="application/rss+xml" href="URL"
title="blah">, in combination with UAs providing access to it through the
chrome, has clearly solved a real problem for users. Why else would UA vendors
have bothered? I honestly don't see why it is so hard to understand that the
same principle, "consistency across sites", applies to many other aspects of
the Web.
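For illustration, a minimal sketch of how a UA might find such feeds (the common feed autodiscovery convention), assuming Python's html.parser and urllib.parse. discover_feeds is a hypothetical name and the example.org URLs are placeholders; this is not a description of how any particular browser implements it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedFinder(HTMLParser):
    """Collects (title, absolute URL) for each feed advertised via <link>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        # rel holds a space-separated list of case-insensitive keywords.
        rels = a.get("rel", "").lower().split()
        if ("alternate" in rels
                and a.get("type", "").lower() in FEED_TYPES
                and a.get("href")):
            self.feeds.append((a.get("title", ""), urljoin(self.base_url, a["href"])))


def discover_feeds(html, base_url):
    finder = FeedFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds


if __name__ == "__main__":
    head = ('<link rel="stylesheet" href="style.css">'
            '<link rel="alternate" type="application/rss+xml" '
            'href="/news.rss" title="News">')
    print(discover_feeds(head, "http://example.org/blog/"))
    # [('News', 'http://example.org/news.rss')]
```

Because every site marks feeds up the same way, one small routine like this works across all of them, which is exactly the "consistency across sites" point.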

Btw, it seems that RSS is only used by a relatively small minority, and yet
UA vendors were willing to not only implement it, but even give it a
prominent place in the chrome, have it on by default and not even provide an
option to switch it off.

Seems relevant to the usual complaints that something is too much work,
serves too few users, introduces problems for too many users, is too hard for
authors, etc.

[...]

> In my opinion, although this is slightly off topic for this thread,
> screen readers need to do a lot more to improve the way they handle
> forms in tables.

Agreed (and not just for forms). That is why it is so incredibly important
that the developers of such tools participate here.

[...]

> But anyway, perhaps the question is, what is the simplest way to express
> an association that a tool can understand? [...]
>
> This is what we've got, and what I think may be useful in this situation:
>
> * <section> or <article>
> * <aside>
> * <figure> and <legend>
> * rel=alternate and the proposed rel=longdesc
>
> Here's an example:
>
> <article>
>    <h1>Movie Title</h1>
>    <figure>
>      <video src="movie">
>        <a href="movie">Download the movie</a>
>      </video>
>      <legend>Brief description or caption.</legend>
>    </figure>
>    <aside>
>      <p>Metadata for the video (author, description, tags, etc.)</p>

Note that this is tempting, but it doesn't explicitly mark that content as metadata.

>      <p><a href="transcript" rel="alternate">Transcript</a></p>
>    </aside>
> </article>

Perhaps, yes. It depends on how exactly <figure> is defined. From the current
definition it isn't clear to me that <figure> is only for a single object,
for instance. Also, <aside> is currently defined as representing a tangential
relation. My dictionary translates "tangential" as "hardly touching a
matter".

That aside, how would you mark-up multiple equivalents then? Perhaps simply
like the below:

<article>
   <h1>Movie Title</h1>
   <figure>
     <video src="movie">
       <a href="movie">Download the movie</a>
     </video>
     <legend>Brief description or caption.</legend>
   </figure>
   <aside>
     <table><caption>Metadata</caption>
     <tr><th>title</th><td></td></tr>
     <tr><th>director</th><td></td></tr>
     <tr><th>year</th><td></td></tr>
     <tr><th>starring</th><td></td></tr>
     </table>
     <p><a href="transcript" rel="alternate">Transcript</a></p>
     <p><a href="audio" rel="alternate">audio-only version</a></p>
     <p><a href="captioned" rel="alternate">captioned video</a></p>
   </aside>
</article>

While this approach might indeed get us closer to markup that allows
equivalents to be defined explicitly, I don't yet see how it would allow a UA
to default to a certain equivalent.

I'm also tempted to suggest we introduce rel="equivalent", because
rel="alternate" doesn't truly cover equivalents. rel="alternate" is used to
indicate a version of the same resource in another language, for instance, or
even an RSS feed that's only vaguely related.

> Note: That example kind of misuses the alternate relationship, since
> it's currently defined to represent the alternative for the entire
> document, and not the alternative of the article or section it is in.

Right. Perhaps we could define that within <figure> the scope of <a rel> is
limited to that <figure>. (The same would probably be needed for <article> and
<section>.)
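A sketch of how a tool (an indexing bot, say) might apply such a scoping rule, assuming Python's html.parser and well-nested markup. The rule itself (nearest enclosing <figure>, <article> or <section>, else the whole document) is only the hypothetical one floated here, not anything in the current draft, and scope_alternates is an illustrative name. Note that with the <aside>-based markup, the links end up scoped to the <article>, not to the <figure>.

```python
from html.parser import HTMLParser

# Hypothetical rule: a rel="alternate" link applies to the nearest enclosing
# element in SCOPES, or to the whole document if there is none.
SCOPES = {"figure", "article", "section"}


class AlternateScoper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []       # open scoping elements, innermost last
        self.counts = {}      # how many of each scoping tag seen so far
        self.alternates = {}  # scope label -> hrefs of rel="alternate" links

    def handle_starttag(self, tag, attrs):
        if tag in SCOPES:
            self.counts[tag] = self.counts.get(tag, 0) + 1
            self.stack.append(f"{tag}#{self.counts[tag]}")
        elif tag == "a":
            a = {name: (value or "") for name, value in attrs}
            if "alternate" in a.get("rel", "").lower().split() and a.get("href"):
                scope = self.stack[-1] if self.stack else "document"
                self.alternates.setdefault(scope, []).append(a["href"])

    def handle_endtag(self, tag):
        # Assumes well-nested markup; a real tool would need error recovery.
        if tag in SCOPES and self.stack:
            self.stack.pop()


def scope_alternates(html):
    scoper = AlternateScoper()
    scoper.feed(html)
    scoper.close()
    return scoper.alternates


if __name__ == "__main__":
    page = """
    <article>
      <h1>Movie Title</h1>
      <figure>
        <video src="movie"><a href="movie">Download the movie</a></video>
      </figure>
      <aside>
        <p><a href="transcript" rel="alternate">Transcript</a></p>
        <p><a href="audio" rel="alternate">audio-only version</a></p>
      </aside>
    </article>
    <p><a href="print" rel="alternate">Printable version</a></p>
    """
    print(scope_alternates(page))
    # {'article#1': ['transcript', 'audio'], 'document': ['print']}
```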

> Perhaps, longdesc would be a better value.

I'm not sure. That would seem a big change to how @longdesc is currently
defined.


-- 
Sander Tekelenburg
The Web Repair Initiative: <http://webrepair.org/>
Received on Wednesday, 1 August 2007 18:40:09 UTC