Re: draft HTML5: Techniques for the provision of text alternatives from Dr. Olaf Hoffmann on 2010-01-19 (public-html@w3.org from January 2010)

From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
Date: Tue, 19 Jan 2010 17:21:47 +0100
To: public-html@w3.org
Message-Id: <201001191721.47996.Dr.O.Hoffmann@gmx.de>
Steven Faulkner:
> hi Olaf, thanks for the feedback.
>
>
> The example was based pretty much on this example from the HTML5 spec:
> http://dev.w3.org/html5/spec/text-level-semantics.html#a-purely-decorative-image-that-doesn-t-add-any-information
> which is where the use of the image
> the poem and the use of <p> for poetry comes from. I guessed since it was
> written by the editor it was conforming HTML5.

I know this, but that is no reason to keep using improper markup just
because a previous editor made a mistake or did not pay much attention to
the example. Obviously a conformance checker does not understand the text
content, so it will not question whether the elements used are
meaningful for the content or not ;o)
If the idea is that HTML5 should not cover poetry/literature/text at all,
the draft should not use examples from domains HTML5 is apparently not
intended for. Alternatively, now that there are drafts for HTML5+RDFa, this
is available as a mechanism to fix the problem. This option was not
available two or three years ago, when Ian Hickson wrote the sample, but now
there is a new chance to improve it, even without adding new elements. I
think this was the solution suggested by several members of the group in
the past. Now that HTML5+RDFa and maybe aria-roles are available, one can
do more about the sample than just say that it could be fixed in theory
with some feature in the future.
I consider RDFa and aria-roles an accessibility approach for gaps and
things HTML5 does not want to care about. If a user-agent knows the
referenced vocabulary, it can fix the problem and work around the gaps
of HTML5, especially for a non-visual presentation (authors tend
to care more about fixing gaps for visual presentation on their own, even
with improper markup, if they are forced to do this).

As always, the present and the future offer more options than worrying
about the past.

>
> I did a little digging and found this post about poetry in HTML5
> http://blog.signified.com.au/a-poem-element-for-html5/
> in which a thread from the what wg mailing list on the subject and your
> discussions  are cited.

I know this too, which is why I looked around for more suitable markup
languages for literature and text, and collected and combined them here:
http://purl.oclc.org/net/hoffmann/lml/

Basically this means: either do not use HTML5 to mark up poetry, plays, or
complex combinations of literature and art, or fix the gaps using a
mechanism like RDFa in (X)HTML. And if a draft about HTML5 contains, for
whatever reason, pieces of literature that HTML5 is not intended for, there
is no way around some method to fix the gap of HTML5 not being capable
of marking up such kinds of text. Otherwise the example will not help
authors; it will just lead them to wrong conclusions about the capabilities
of HTML5.
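
A minimal sketch of what "divs with RDFa" could look like for a stanza.
This is only an illustration under assumptions: the "lit:" prefix, its
namespace address and the terms Poem, Stanza, Line and title are
hypothetical placeholders and not terms of an existing vocabulary, the
poem text is replaced by dummy lines, and the way the prefix is declared
depends on the RDFa version in use.

```html
<!-- Sketch only: "lit:" and its terms are hypothetical placeholders. -->
<div prefix="lit: http://example.org/lit#" typeof="lit:Poem">
  <h2 property="lit:title">Title of the poem</h2>
  <!-- Each stanza and each line is a div whose role is given by typeof,
       so a user-agent that knows the vocabulary need not treat it as prose. -->
  <div typeof="lit:Stanza">
    <div typeof="lit:Line">First line of the stanza,</div>
    <div typeof="lit:Line">second line of the stanza,</div>
    <div typeof="lit:Line">third line of the stanza.</div>
  </div>
</div>
```

A user-agent without knowledge of the vocabulary still gets one line per
div; one that knows it could, for example, choose a different aural
rendering for lines and stanzas.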

> I would encourage you to follow Laura's advice (in the following
> email) about filing a bug on the W3C html5 spec in regards to poetry
> markup.
>

As explained already in my reply to Laura's email, either this is more
than a bug, or it is not relevant for the HTML5 WG because HTML5 is
not intended to cover arbitrary types of text.
Whether HTML5 needs to be able to mark up text or not is a general
question about the responsibilities of the HTML5 WG, not a question
of a bug.

At least HTML up to version 4 was a markup language for
(hyper)text, therefore it was responsible for the markup of text as
well - with known large gaps.
A new version of the language may address obvious gaps or
may continue to ignore them - that is the choice of the designers of the
language (and maybe a matter of limited ability to find a proper name for
a language which is now focused on other things than marking up text).

> >My suggestion is to use either another format to markup
> >literature/text properly or to use divs with RDFa or some other
> >mechanism to indicate the role of the divs.
> >Especially for a non visual representation it is for many
> >people pretty confusing/depressing, if poetry is presented as prose
> >(I know this personally, because one of my nephews tends to
> >recite poetry much like prose ;o)
>
> As this example is about text alternatives for images not about how to mark
> up poetry, I would prefer not to include mark up that is extraneous to the
> purpose.
> I would also suggest that using divs and RDFa would not currently provide
> any benefit for non-visual representations.
>

Obviously it is a useful real-world example in which poetry and an image
are combined. It contains contributions from at least three authors and one
publisher. It can already be considered a test case for whether HTML5 has
the capabilities to solve real problems. Currently the sample fails, and not
just due to the poetry problem. One could simply replace the poetry with
some prose telling the same story using only some paragraphs of
text, and the problem would still persist.
The sample does not indicate which author belongs to which part of it
(poem, image, text alternative for the image). For a real-world sample this
can be the core problem of img and alt attributes when the author of the
image does not also provide the text alternative, as in this case of a
reproduction of an old painting. The problem appears both for a graphical
rendering of the sample and for any other representation. However, because
the image is presented in one case and the text alternative in the other,
the author indication changes too, and suddenly identifying the
author of the text alternative or the image becomes an accessibility problem
or a possible source of confusion for the audience.

Benefits for presentation appear only if the presentation depends somehow
on the RDFa, or on whatever mechanism is used to fix the format gaps.
If a user-agent knows the mechanism and the referenced fix for the gap,
the audience can of course benefit from an improved default presentation
when the author does not provide specific style sheets for visual, aural,
tactile etc. presentations.
If you spend some money, I'm sure you will find someone who writes
a user-agent with advanced capabilities to present literature in a convenient
aural way - if it is properly indicated as such. Otherwise one has to spend
even more money to pay human actors, readers or interpreters to do it
right, with or without proper markup or accessibility alternatives for images,
videos etc. In the end this could be an argument against accessibility
features - with enough money there is always some access available for any
tag soup, even for advertising pillars.
However, proper markup provides the option to get it for less, now or in
the future.

> >Another problem may occur with the relation of h1, h2, image
> >and stanza. The current order implies more or less, that
> >Alfred Lord Tennyson is the author of the poem, the image
> >and the alternative text - is this really true?
> >Some metadata (RDF) might be necessary to put the
> >relations correct.
>
>  I will try to modify the example to make it clearer, the use of RDF in the
> context of this example i consider is out of scope.
>

Of course there are always several options to put the relations right;
however, HTML5+RDFa is a draft of this group too and is therefore available
to solve such problems as well. If it works without it in HTML5 - even better.
Keep it simple, but not simpler than required to get it right.

> >This problem is only slightly better handled with example 6.2
> >due to the hyperlink (not only because it points to an error 404 page).
>
> the example URL is made up, the actual URL was, I considered, too long and
> was not needed for the example code, but it is linked in the explanation
> above the example code [
> http://www.tate.org.uk/servlet/ViewWork?cgroupid=-1&workid=15984&searchid=false&roomid=false&tabview=text&texttype=10 ]
>
> do you think it would be better to include this link in the code?
>

Either a valid address (which could unfortunately change within
ten or twenty years) or only a relative address; within an example, the
latter already signals that the link itself will not really work under any
condition.

> >Is there a mechanism currently to relate metadata to the
> >value of an attribute like alt? If not, it might be better to
> >replace the old img with a new element with the
> >possibility to contain the alternative text as element content,
> >including metadata about the content ;o)
>
> Not that I know of. If we replace the image then it is no longer an example
> of providing text alternatives for images (<img>) :-)

As mentioned above: if the author of the image is someone other than
the author of the alternative text, you have to switch the author relation
as well. Well, for public domain documents one can mess up a
lot without being hated by the already dead authors ;o)
Publishers may have newer samples with authors who are still alive;
for those they cannot benefit from such a public domain sample
if authors' rights are not honoured.

>
> >The sample seems to be already old enough to be public
> >domain, therefore it is at least not really problematic for the
> >draft to blur all these relations. However, if the sample is
> >intended to be useful for current works, one has to put those
> >relations somewhere due to copyright restrictions - and even
> >without, I think, the works of authors should be always
> >honoured by putting the relations correctly.
>
> i understand your concern, do you think it may be better not to use this
> image and associated poetry?
>

It depends. If the intent of the draft is to solve real-world problems,
it should contain such delicate samples, but should mark them up
properly.

Of course one could start with a simpler example, where
publisher, prose-text-author, author-of-the-image and
author-of-the-text-alternative are the same person, who does
not have to care about sophisticated problems ;o)

But showing what really has to happen when works of different
authors are combined, including raster images (with no
embedded meta information), is an interesting task, and
I think it would really help to have such a well-considered
markup sample. This can give authors some ideas of how
to solve such problems within the limitations of HTML without
messing everything up in tag soup. And in the best case
it could be a starting point for new ideas on how
to put together even more complex combinations
of art works in an accessible and understandable way -
understandable for anyone, including those not familiar with
Wikipedia or other sources of information needed to interpret the
tag soup properly.

Indeed the current sample is so interesting because
it raises several questions a publisher has to face:
- how to make poetry (or other gap areas) accessible with
proper markup, as long as HTML5 does not provide specific
elements for the purpose?
- how to combine works from different authors in an understandable way,
adding some kind of (meta) information about which work belongs to
which author?
- how to add a text alternative from another author as a
replacement for the raster image, including some meta information
stating that the image is from a different person than
the text alternative?
- is it relevant for the audience of the text alternative that the
image has a different author than the text they are reading? Might
it be interesting for them to identify work and author, so they can find
other interpretations of the image on their own? Is it essential that
those people do not have to rely on the choice of the publisher
to find out what the image presents? Obviously, if the audience
has the option to identify the image by title and author, and to learn
that the text alternative is an interpretation by another author, this is
an improvement for their autonomy.
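
As a rough illustration of the second and third question, the author
relations could be stated next to each work. This is only a sketch under
assumptions: it uses the Dublin Core terms dc:title and dc:creator, the
fragment identifiers #poem, #painting and #description and the file name
painting.jpg are made up for the example, and all titles and names except
that of the poet are placeholders. It does not attach metadata to the alt
attribute value itself; the statement about the text alternative has to
live beside the image.

```html
<!-- Sketch only: identifiers, file name and placeholder names are
     hypothetical; only dc:title and dc:creator are existing terms. -->
<div prefix="dc: http://purl.org/dc/elements/1.1/">
  <div about="#poem">
    <h2 property="dc:title">Title of the poem</h2>
    <p>Poem by <span property="dc:creator">Alfred Lord Tennyson</span></p>
  </div>
  <div about="#painting">
    <img src="painting.jpg" alt="Text alternative written for the painting">
    <p><span property="dc:title">Title of the painting</span>, painted by
      <span property="dc:creator">Name of the painter</span></p>
  </div>
  <!-- The text alternative is treated as a work of its own with its own
       author, so the audience of either representation can tell who wrote what. -->
  <p about="#description">Text alternative for the painting by
    <span property="dc:creator">Name of the author of the description</span></p>
</div>
```

Because the author statements are ordinary text content, they remain
available whether the image or its text alternative is presented.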


Olaf
Received on Tuesday, 19 January 2010 16:53:54 UTC