RE: Coments - last call draft (design forward first?)

At 12:19 AM -0400 4/12/05, Glenn A. Adams wrote:
>A few comments inline below.

Some of these are questions or implied questions.
Answered bottom-up.  This changes the order.

[GA]
>
>Could you give us some examples of what types of adaptation you might
>see applying to pure textual content? [which is presently the case for
>DFXP since it does not specify image or any other types of embedded
>content formats for which alternatives may be preferred].

[AG]

Types of adaptation might include, but are not limited to, the following:

For cognitive disabilities, the caption window could be embellished with
a persona icon which migrates around the corners of the layout region.
[Simultaneous presentation of text and speech is a key technique for
those with difficulty reading or learning to read.  Fire-exit training
materials for group homes for people with learning disabilities are an
example of content that it is important to get to everyone who would
benefit from adapted presentation along these lines.]

When rendering into Braille, speaker changes might be expressed by
what CSS calls 'generated content': a parenthetical speaker-identity
remark is inserted, making the text read more like a script.

This is needed because Braille offers a general scarcity of applicable
font and alignment effects.
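To make the generated-content idea concrete, here is a small sketch
(Python, standard library only) of the kind of transform a Braille-side
transcoder might apply.  The namespaces, the use of the ttm:agent value
directly as a display name, and the sample dialog are all assumptions
for illustration, not anything the DFXP spec requires.

```python
# Sketch: turn DFXP-style paragraphs into a script-like transcript
# with parenthetical speaker remarks, as a Braille transcoder might.
# Namespaces below are assumed for illustration.
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"            # assumed content namespace
TTM = "http://www.w3.org/ns/ttml#metadata"  # assumed metadata namespace

SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <body><div>
    <p ttm:agent="anna">Is the exit on the left?</p>
    <p ttm:agent="ben">No, turn right at the stairs.</p>
    <p ttm:agent="ben">Then straight ahead.</p>
  </div></body>
</tt>"""

def script_lines(dfxp_xml):
    """Emit '(speaker) text' lines, naming the speaker only on a change."""
    root = ET.fromstring(dfxp_xml)
    lines, last = [], None
    for p in root.iter(f"{{{TT}}}p"):
        agent = p.get(f"{{{TTM}}}agent")
        text = "".join(p.itertext()).strip()
        if agent != last:            # a speaker-change event
            lines.append(f"({agent}) {text}")
            last = agent
        else:
            lines.append(text)
    return lines

for line in script_lines(SAMPLE):
    print(line)
```

A real transcoder would look the agent up in the head metadata to get a
human-readable name; the point is only that speaker-change semantics,
once captured, can be re-expressed without fonts or alignment.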

For the Hard of Hearing, a relatively futuristic technique is to
morph the video of the actual speakers to provide the logical
equivalent of 'cued speech.' In existing cued speech, hand signs
convey not the full symbolization of the idea but just the missing
phonemes -- the ones that are relatively hard to perceive. This
provides a reinforced visual scene so speech reading can succeed
where the missing phonemes would otherwise cause it to fail. In
the processed-video technique, highlights are applied to the
speaker's face to cue the hard-to-hear details.

For this, the speakers would need to be identified both in the video,
so the morpher knows which face to perturb, and in the text, which
identifies what is being said so that the phonemes to cue can be
extracted.

In summary, if there are multiple speakers in a scene, who says
what is usually very significant to understanding the dialog.  So
having this information captured and communicated to the
re-renderer is important.

[AG before]
>  >
>>  Note that in interactive Braille as the delivery context,
>>  right-justification and color are not appropriate as speaker-change
>>  cues.  So we
>>  need the speaker-change semantics available, separable from any
>>  particular visual-presentation effects.  DFXP gives the author the
>>  capability to express this, but will the information be there in
>>  instances?
>
>[GA] I'm not sure how to interpret your question. Are you suggesting
>that the spec mandate the appearance of specific information, such as
>mandating the presence of ttm:agent or ttm:role attributes or something
>else? If so, that seems a little unusual for a technical specification,
>which, from my view, is merely a definition of a tool set, and not a
>policy about the use of the tool set.

[AG] Yes, I am suggesting that the specification should perhaps
mandate or otherwise foster the capture and transmission of key
information. It may seem unusual, but it's not inappropriate. HTML4
mandates the use of the ALT attribute, and SSML requires xml:lang
information.
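By analogy, a conformance check for such a mandate need not be
elaborate.  The sketch below (Python standard library only; the
namespaces and sample document are assumptions for illustration) flags
caption paragraphs that carry no speaker attribution, much as an HTML4
validator flags an IMG with no ALT.

```python
# Sketch of a lint pass enforcing a hypothetical "speaker attribution
# required" rule.  Namespaces are assumed, not taken from the spec.
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTM = "http://www.w3.org/ns/ttml#metadata"

def missing_agents(dfxp_xml):
    """Return the text of every <p> that carries no ttm:agent."""
    root = ET.fromstring(dfxp_xml)
    return ["".join(p.itertext()).strip()
            for p in root.iter(f"{{{TT}}}p")
            if p.get(f"{{{TTM}}}agent") is None]

doc = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata">
  <body><div>
    <p ttm:agent="chair">The motion carries.</p>
    <p>Any objections?</p>
  </div></body>
</tt>"""

print(missing_agents(doc))   # the unattributed paragraph(s)
```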

Toolkits should not be finalized without attention to the usage that
they foster. Leading the Web to its full potential includes those
things at intermediate-tool levels that are required to assure
successful interoperation on an end-to-end, that is to say
person-to-person basis.  Properties that are big swingers for
perception and comprehension merit careful cultivation.

While W3C does "technology, not policy," it is not policy-blind
technology. W3C develops technologies with the purpose of expanding
the sharing of information.

QA and PF are there to try to see that we walk this talk.

[GA]
>
>Do you see adaptation occurring along some semantic axis, e.g., select
>only the text of some specific agent?
>

Adaptation occurring along semantic axes? Absolutely. Not so much
filtering by speaker, but finding a delivery-context-appropriate
representation for the semantic content. Adaptation must be aware of
the semantic axes to preserve the semantics and not inadvertently
change the meaning.

Mostly it's the need for the new presentation to represent the semantics.

Another user-scenario example illustrates another semantic that is
very important: typed data. Typically this includes dates, money, and
addressing information.

One of the things that people who use screen readers in conjunction
with a refreshable Braille display do is to snatch things for which
spelling is critical out of a passing audio stream into the Braille
display and thence to a text scratchpad. Compare with the use of text
chat alongside video chat by Deaf people.

A hearing, blind user would want to dip into the timed-text stream
for spelling-critical items while following the broadcast audio with
their ears. For this purpose, knowledge of the data types is valuable
in the retail representation available to the user agent.
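As an illustration of why that type knowledge matters: without it, a
user agent is reduced to guessing at spelling-critical items by
pattern matching, roughly as sketched below.  The patterns and the
function name are invented for this example; DFXP itself defines no
data-type markup.

```python
# Sketch: scavenge spelling-critical "typed data" (money, dates,
# e-mail-like tokens) from passing caption text, so a Braille user
# could pin them to a scratchpad.  The patterns are rough,
# illustration-only stand-ins for authored type metadata.
import re

TYPED = [
    ("money", re.compile(r"[$€£]\s?\d[\d,]*(?:\.\d{2})?")),
    ("date",  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")),
]

def snatch(caption_text):
    """Return (type, token) pairs worth pinning to the scratchpad."""
    hits = []
    for kind, pat in TYPED:
        hits += [(kind, m.group(0)) for m in pat.finditer(caption_text)]
    return hits

# Finds the money amount, the address, and the date in one line.
print(snatch("Wire $1,200.00 to ops@example.org by 4/15/2005."))
```

Authored type metadata would of course beat heuristics like these,
which is exactly why having it in the retail representation matters.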

Al

>  > -----Original Message-----
>>  From: Al Gilman [mailto:Alfred.S.Gilman@IEEE.org]
>>  Sent: Monday, April 11, 2005 10:50 AM
>>  To: public-tt@w3.org
>>  Cc: Charles McCathieNevile
>>  Subject: RE: Coments - last call draft (design forward first?)
>>
>>
>>  At 10:02 AM -0500 4/1/05, Glenn A. Adams wrote:
>>  >Thanks Al, this is useful input. I am drafting a longer response on the
>>  >issue of defining UA Behavior that I expect to send this evening or
>>  >tomorrow. However, in the mean time, I want to point out that it was NOT
>>  >a requirement for AFXP or DFXP that it be delivered to a user agent. In
>>  >particular, the system model for TTAF does not include a user agent;
>>  >rather, it posits a subsequent transformation process to an actual
>>  >distribution format that is wedded to a UA. At the same time, neither
>>  >AFXP nor DFXP are precluded from direct distribution, nor from direct
>>  >presentation by a UA; on the other hand, the task of defining such a
>>  >usage was not included in the requirements, and is, at present,
>>  >considered to be largely out of scope for the current chartered work.
>>
>>  Thank you for offering us the chance to refine our input for a little
>>  longer.
>>
>>  Since you are meeting face-to-face, let me offer the following thoughts
>>  of an individual and preliminary nature.
>>
>>  Key thoughts:
>>
>>  - if the user can receive the content on a programmable device, we
>>  need to develop the [Web] distribution options and content
>>  constraints [with format support] to serve alternative (adaptive)
>>  presentation for individuals.
>>
>>  - there is going to be a lot of content that sees the light of
>>  intra-broadcast-industry pipelines in DFXP encoding.  Deferring
>>  adaptive use to the availability of an AFXP spec is not necessarily
>>  an acceptable policy from the standpoint of disability access.
>>
>>  While the DFXP specification may not define a CPE player for the format
>>  per se, there is still reason to consider use cases for people with
>>  disabilities which require an alternate presentation of the material.
>>
>>  Just because there is no anticipation that the DFXP would be used
>>  directly in mass-market set-top-box processes, it doesn't mean that
>>  there aren't authoring-time requirements on the content that should
>>  be supported in the intermediate form i.e. the DFXP.
>>
>>  Making the DFXP available to a transcoder of the user's choice is
>>  one way that the content encoded in the DFXP could be served to
>>  a person with a disability requiring alternate presentation.
>>
>>  Or the content could be browsed offline using a mainstream XML reader
>>  and a schema-aware assistive technology.
>>
>>  [start use scenario]
>>
>>  Here is a scenario sketch to illustrate what I mean:
>>
>>  There is a meeting held by videoconference over a corporate extranet.
>>  To serve strategic partners in other countries and technology
>>  platforms, Internet technologies are used including subtitles generated
>>  in real time and distributed using DFXP as an intermediate form.
>>
>>  One of the people whose job requires interacting with the content of
>>  the meeting is Deaf and blind.  So a complete log of the meeting is
>>  kept for this participant's offline review.
>>
>>  supposition:  The DFXP, as an XML format, is the dataset of choice
>>  on which to base this person's browse of what transpired in the
>>  session.  Not just the formal statement of the decisions that were
>>  reached, but the dialog that led to the decisions.
>>
>>  This would mean that the DFXP would be spooled and archived with
>>  the audio and video.  Quite possibly there would be a SMIL wrapper
>>  created as a replay aid.  But the deaf-blind user would be reviewing
>>  this through a refreshable Braille device and primarily reviewing the
>>  timed text as transcript.
>  >
>>  Note that in interactive Braille as the delivery context,
>>  right-justification and color are not appropriate as speaker-change
>>  cues.  So we
>>  need the speaker-change semantics available, separable from any
>>  particular visual-presentation effects.  DFXP gives the author the
>>  capability to express this, but will the information be there in
>>  instances?
>
>[GA] I'm not sure how to interpret your question. Are you suggesting
>that the spec mandate the appearance of specific information, such as
>mandating the presence of ttm:agent or ttm:role attributes or something
>else? If so, that seems a little unusual for a technical specification,
>which, from my view, is merely a definition of a tool set, and not a
>policy about the use of the tool set.
>
>>  So regardless of whether a collated transcript is created by a
>>  transcoder, or the several text streams are browsed as is with an
>>  adaptive user agent, the availability of speaker identification in
>>  the DFXP instance, the working base for the adapted use, or at a
>>  minimum speaker-change events if the identity of the speakers was not
>>  captured, would be important in affording this user comparable quality
>>  of content as those receiving the same information as real-time
>>  display integrated with the video and audio.
>>
>>  [end use scenario]
>>
>>  This is just to illustrate that there are people with disabilities for
>>  whom
>>  the introduction of something like the DFXP into the content pipelines
>>  of broadcast happenings reflects an opportunity that should not be
>>  wasted to raise the level of service and lower the cost of delivering
>>  that service.
>>
>>  In particular, the use cases for adapted presentation do not necessarily
>>  presume that the DFXP would be pushed to all consumers in the
>>  broadcast bundle.  The distribution protocol might be on an ask-for
>>  or 'pull' basis.   And the user interaction might be in non-real-time
>>  after the fact and not at speed.
>>
>>  But the non-availability of the AFXP format as a "source in escrow"
>>  format for adapted uses means that the user needs the DFXP that
>>  gets produced to be as fit an adaptation basis as we can make it.
>>  This will be true while the AFXP is undefined, and will still be true
>>  for those situations where a copy of the DFXP can be obtained
>>  and a copy of a standard, XML source for that content cannot.  The
>>  latter is likely to be common even after the AFXP has been
>>  specified by W3C.
>
>[GA] Another way to "produce .. as fit an adaptation basis" is to
>generate all permutations of DFXP from an original source, whatever it
>may be (AFXP or something else), in accordance with whatever adaptation
>parameter space applies to the original content.
>
>Could you give us some examples of what types of adaptation you might
>see applying to pure textual content? [which is presently the case for
>DFXP since it does not specify image or any other types of embedded
>content formats for which alternatives may be preferred].
>
>Do you see adaptation occurring along some semantic axis, e.g., select
>only the text of some specific agent?
>
>>  Thank you (the whole group) for bringing this important technology
>>  this far.  Best wishes for your meeting.
>>
>>  Al
>>
>>  >Regards,
>>  >Glenn
>>  >
>>  >>  -----Original Message-----
>>  >>  From: Al Gilman [mailto:Alfred.S.Gilman@IEEE.org]
>>  >>  Sent: Friday, April 01, 2005 9:40 AM
>>  >>  To: public-tt@w3.org
>>  >>  Cc: Charles McCathieNevile
>>  >>  Subject: Re: Coments - last call draft (design forward first?)
>>  >>
>>  >>
>>  >>  At 11:46 AM +1000 3/31/05, Charles McCathieNevile wrote:
>>  >>  >1. Meeting requirements
>>  >>  >
>>  >>  >[[[
>>  >>  >
>>  >>  >It is intended that a more feature-rich profile, known presently as
>>  >>  >the Authoring Format Exchange Profile (AFXP), be developed and
>>  >>  >published to address the full set of documented requirements.
>>  >>  >
>>  >>  >]]]
>>  >>  >
>>  >>  >Is there any concrete reason to believe this will take place? The
>>  >>  >group has had its charter extended already, just to produce this
>>  >>  >restricted draft. Is the group working on this more complete version
>>  >>  >already? Or is this just a hope?
>>  >>
>>  >>   From an accessibility perspective, the following are not unreasonable
>>  >>  to expect:
>>  >>
>>  >>  1 [constraint] The DFXP, if processed through disability-adaptive
>>  >>  presentation transforms, must achieve a functional user experience.
>>  >>
>>  >>  2 [preference] The DFXP should contain, fully modeled, all the
>>  >>  information in the AFXP needed for deriving alternate look and feel
>>  >>  bindings adaptive for diverse people with disabilities [not just
>>  >>  sighted, Deaf people].
>>  >>
>>  >>  3 [prognosis] Deriving disability-adaptive look and feel from the
>>  >>  AFXP will produce more usable user experiences than deriving
>>  >>  disability-adaptive look and feel from the DFXP, given the current
>>  >>  order of freezing the profiles.
>>  >>
>>  >>  4 [prognosis] Experience with disability-adaptive alternative
>>  >>  presentations of AFXP content will make clear places where we should
>>  >>  have done things differently in the DFXP.
>>  >>
>>  >>  So some concern about the freezing of the DFXP to a PR before
>>  >>  completing the CR experience with the AFXP is natural from an
>>  >>  accessibility perspective.
>>  >>
>>  >>  In terms of meeting requirements, the possibly under-explored use
>>  >>  case is one where the DFXP is delivered directly to a player (User
>>  >>  Agent) running on a Customer Premises Equipment computer -- is the
>>  >>  display and control of the text stream suitably adaptable in this
>>  >>  case?
>>  >>
>>  >>  Al
>>  >>
>>  >>
>>  >>
>>
>>

Received on Tuesday, 12 April 2005 16:42:52 UTC