RE: Recently discovered issue with WCAG2ICT definition of "document" - suggesting a new note to clarify from Michael Pluke on 2013-07-04 (public-wcag2ict-tf@w3.org from July 2013)

From: Michael Pluke <Mike.Pluke@castle-consult.com>
Date: Thu, 4 Jul 2013 06:49:07 -0400
To: Loïc Martínez Normand <loic@fi.upm.es>, Gregg Vanderheiden <gv@trace.wisc.edu>
CC: Peter Korn <peter.korn@oracle.com>, David MacDonald <david100@sympatico.ca>, "public-wcag2ict-tf@w3.org" <public-wcag2ict-tf@w3.org>, Gregg Vanderheiden <ez1testing@gmail.com>, "kirsten@can-adapt.com" <kirsten@can-adapt.com>
Message-ID: <5735ED0D92A3E6469F161EB41E7C28A86797CBDB05@MAILR001.mail.lan>
Dear all

My proposal was trying to stick very closely to the basic definition that a document is an "assembly of content". Our definition does not say that a document is exclusively content, so it could legitimately include things that are not content as well as content. However the definition makes it clear that if it has no content then it is not a document. So, if our definition is correct my proposal is a warning and a clarification of what does and does not fit.

However, there is an alternate approach that Loïc has taken in his email (below). This picks up on another part of the definition. However I have some slight concerns about how to interpret the words "that is not part of software" (that is part of a definition). I fear that this concept is open to interpretation with:


-          some arguing that this virus definition file is exclusively used by the software as part of the way the software works - therefore it is "part of the software";

-          whilst others will argue that the software application is one file and the virus definitions are in another file - therefore the virus definitions are separate from the software and they are not "part of the software".

I think that Gregg has correctly addressed this ambiguity in his much longer last attempt to solve our dilemma. He says that:


-          "But they function as, and would be evaluated as, part(s) of the software and not as separate entities or as documents."

I think that if we take Loïc's much simpler and shorter note and add in Gregg's point we could end up with something that effectively removes or significantly reduces the ambiguity:

(New) Note 3: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, are examples of files that function as part of software and thus are not examples of documents.

If we can accept "that function as" here instead of "are" we should be OK. If not we might have to throw the spotlight on the word "are" in our main definition - and I'd rather not go there!!

Best regards

Mike


From: Loïc Martínez Normand [mailto:loic@fi.upm.es]
Sent: 04 July 2013 11:07
To: Gregg Vanderheiden
Cc: Peter Korn; David MacDonald; public-wcag2ict-tf@w3.org; Gregg Vanderheiden; kirsten@can-adapt.com
Subject: Re: Recently discovered issue with WCAG2ICT definition of "document" - suggesting a new note to clarify

Dear all,

What a discussion! I went to bed just minutes after receiving the first email... and bang! I woke up to a very long thread.

I think that things are getting overcomplicated as the discussion has progressed and I'm going to try to simplify.

But first I need to go back to the origin of the discussion. We have the definitions of "content" and "document":

 *   content (non-web content): information and sensory experience to be communicated to the user by means of software, including code or markup that defines the content's structure, presentation, and interactions.
 *   document (as used in WCAG2ICT): assembly of content, such as a file, set of files, or streamed media that is not part of software and that does not include its own user agent
First, let's not forget that the definition of content includes the code or markup that defines the structure, presentation and interactions. That means that we can have a file written in a markup language that can be considered to be a document.

Second, the important bit of the definition of document for this discussion is that a document "is not part of software". I think that the files that Peter has been talking about (configuration files, virus definition files, internal databases) are in fact, part of software and thus are not documents.

So my proposal for the new (shorter) note is:

(New) Note 3: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, are examples of files that are part of software and thus are not examples of documents.

What do you think? I don't think that we need to add text to explain that it is the software that "contains" these files that needs to be considered, do we?

Best regards,
Loïc

On Thu, Jul 4, 2013 at 8:41 AM, Gregg Vanderheiden <gv@trace.wisc.edu> wrote:
Wow -- this is getting long.

I think I see another way around the problem.    (see below )

First - what was the problem.
- the problem comes from talking about a file that is "separate from the software" (such as an update file or database) that is used by the software and subsequently causes information not in the software to be displayed. Is this a "document"?
- the concern was that if the software doesn't know of the contents of the file in advance, then any new non-text content of the file that gets presented to a user  cannot be made accessible by the software.  nohow.   So the file needs to follow the SC and itself provide the alternate form of the non-text content just like any html file for example.

The language below (and previous versions) did not cover this -- and said that the software was responsible and the file did not need to follow the SC.   This is a problem.

HOWEVER - I think we can get where you want to be by talking about the virus update etc. as an UPDATE to the software rather than a separate piece of content or 'document'.

Something like this:


Note 3: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, that are part of a software package, or an update to part of the software package are not examples of documents.  As with any update, if they include new non-text information for presentation to users, they would be expected to include accompanying alternate text presentations if the software doesn't already have them or the ability to create them.  But they function as, and would be evaluated as, part(s) of the software and not as separate entities or as documents.


Does that address the problem - without creating a new one?



Gregg
--------------------------------------------------------
Gregg Vanderheiden Ph.D.
Director Trace R&D Center
Professor Industrial & Systems Engineering
and Biomedical Engineering University of Wisconsin-Madison
Technical Director - Cloud4all Project - http://Cloud4all.info
Co-Director, Raising the Floor - International - http://Raisingthefloor.org
and the Global Public Inclusive Infrastructure Project -  http://GPII.net

On Jul 4, 2013, at 12:56 AM, Peter Korn <peter.korn@oracle.com> wrote:

Gregg, David,

I think where we are getting tripped up is around the common-sense concept of what a document is, vs. files that could contain information that in some fashion gets displayed to a user, at some point, by software.

I think about files used internally by some software to persist the user interface (see the last example paragraph in SC 4.1.1<http://www.w3.org/WAI/GL/wcag2ict/#ensure-compat-parses>: "Examples of markup used internally for persistence of the software user interface that are never exposed to assistive technology include: XUL, GladeXML, and FXML. In these examples assistive technology only interacts with the user interface of generated software.").  These files define a software program's user interface - the contents of the menus and toolbars and dialog boxes.  But for the fact that they happen to exist as a separate file on disk, they are simply part of the software program as shipped, and we don't treat them as documents.  If instead of being encoded in ASCII/UNICODE, they were in binary form, nobody would be the wiser that these files weren't executable programs.  We don't think of these XML UI definition files as "documents" for the purposes of WCAG2ICT.  We don't attempt to apply all of the success criteria to them separately; they are simply a part of the software program and they are covered through the evaluation of the software program.  If there is something missing in them needed for accessibility (e.g. ALT text for the icon in the toolbar), and that causes the software to fail a success criterion, then the software simply fails the SC.

Similarly, a virus definition file that had embedded within it the names of known viruses and the names of places they appear - which may get displayed to the user when a virus is found - is really part of the anti-virus application (as periodically updated by the vendor).  If they were binary files that were delivered as "software patches" we wouldn't think of them as documents.  That they happen to have filenames encoded in ASCII/UNICODE should make no difference.  As with the XUL/GladeXML/FXML example in the paragraph above, they are simply a part of the software program and they are covered through the evaluation of the software program.  If there is something missing in them, and that causes the software to fail a success criterion, then the software simply fails the SC.  It doesn't matter from which software file the failure arises.

Finally, if someone were to write a program (and defined the accompanying database) that stored & retrieved documents, the fact that the storage mechanism is in a database file (or collection of files) is no different than if instead the "file" was a filesystem on a disk drive.  If you have ever run virtualization software like VirtualBox, you may notice that the "hard drive" that gets created for your virtual machine is in fact a file in the filesystem of the underlying platform.  That "hard drive" file will contain any number of documents (and programs and so forth).  That doesn't make the hard drive file itself a document (anymore than a database into which someone has stored documents thereby becomes itself a document).  We don't apply WCAG2ICT's success criteria to the VirtualBox hard drive file in the underlying platform.


So... assuming we all agree with those three paragraphs above, the question becomes how best to state this.

Gregg - the approach you are advocating puts a constraint on the types of files: they avoid being called "documents" only if they "do not present information to users through a user agent" (this is because of where you have placed the comma).  But since we have redefined content from what it was in WCAG - to remove the term "user agent" from it - we have content being any "information and sensory experience to be communicated to the user by means of software".  So we have something that is circular.


Maybe I can get at this another way: by making clear that where files that are simply part of software happen to contain "information and sensory experience to be communicated to the user", you don't consider those separate files to be documents, but instead apply WCAG2ICT to that software (and the content rendered by it, wherever it may have come from).  See the new 2nd sentence below:
Note 3: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, are not examples of documents.  If and where software retrieves "information and sensory experience to be communicated to the user" from such files, those files contribute to content<http://www.w3.org/WAI/GL/wcag2ict/#keyterms_content> that occurs in software (and WCAG2ICT applies to that software<http://www.w3.org/WAI/GL/wcag2ict/#keyterms_software>).

David - in the case of your example of a database containing documents... if the document is never available separately (e.g. the software program that stores/retrieves/displays the document from the database is the only way a user can ever read & interact with the document), then I claim it isn't a document.  If this were a closed system (e.g. a kiosk) displaying canned information stored entirely inside it (not retrieved over the web), we would only evaluate it as software (with closed functionality).  We wouldn't attempt to say that the kiosk's information was contained in one or more documents that can be separately evaluated - that information is opaque to us.

Now, if/when a document is retrieved from a database and emitted into a stand-alone form that can separately be retrieved and presented by a user agent (e.g. I've obtained a Word file from Microsoft Sharepoint and stored a snapshot of it on my local hard drive), then that becomes a document and it can be separately evaluated as such.  But the datastore maintained by Microsoft Sharepoint (containing any number of documents and document revisions, in any number of snapshots and states), isn't itself a document.  It is a file that is internal to the application.


Peter

On 7/3/2013 6:40 PM, Gregg Vanderheiden wrote:

On Jul 3, 2013, at 8:22 PM, Peter Korn <peter.korn@oracle.com> wrote:


Gregg,

Your suggestion leads to circular reasoning.

The problem with this route is then any time we have some information in some file somewhere, and that information is the source in some fashion of "content", the software that presents it becomes a user agent?  And the file becomes a document?

If the information is displayed to users -- it IS content.  And if the database contains the text and images to display -- then it HAS to contain the alternate text for the images. (The app displaying the data can't add alt text itself - it doesn't know what the data is until display time.)

So this is exactly what we WANT it to say.



So if my virus definition file contains the names of viruses, and those names are displayed in my anti-virus program, the anti-virus program is now a user agent?  And the virus definition file is now a document?

Absolutely.   And if the virus definition files used icons instead of text to 'name' the viruses - the virus definition file would have to have alt text for those icons.

And if there is any other non-text information to be displayed to the user-- the virus definition file would need to have the text alternative so the application could provide that text as well in a programmatically determinable way.




That makes no sense.

Make sense now?


G










Peter
On 7/3/2013 6:18 PM, Gregg Vanderheiden wrote:
how about instead of raw - we pick up on the key distinction.


(New) Note 3: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, that do not present information to users through a user agent are not examples of documents.  Such files are not "information and sensory experience to be communicated to the user" and therefore are not considered content

If a database IS just data that a user agent displays- then it WOULD be covered.  One could argue that an html file is sourcecode for the page rendering.  Certainly the javascript is.


Gregg

On Jul 3, 2013, at 7:50 PM, Peter Korn <peter.korn@oracle.com> wrote:


David,

What makes a file "raw"?  I view the situation of a program retrieving data from somewhere and presenting it within its user interface as "content" that is displayed in software.  Said content must be accessible.  Said content could come from a database file.  Said content could be a persisted user interface (cf. SC 4.1.1<http://www.w3.org/WAI/GL/wcag2ict/#ensure-compat-parses>).  And just like the 4.1.1 case (addressing your PS in the following e-mail), there could be information in that file that helps with accessibility (e.g. the database contains images and also ALT text for those images).

But we aren't losing anything here - whatever is in the database that winds up being presented in a user interface is content that must be accessible.  If it isn't accessible when presented in software, WCAG2ICT catches it.

But it doesn't make sense to try to apply all of WCAG to a database file as if it was a web page or a word processing file.  That's the point here.


Peter
On 7/3/2013 5:43 PM, David MacDonald wrote:
Just one nit...

Can we add the word "raw" or some other word to make it clearer...

... raw storage files such as databases

I'm a little nervous it might make the pendulum swing the other way and some administrators might think it's not a document if a user agent is serving up content from a database on the backend...

Cheers
David MacDonald

CanAdapt Solutions Inc.
  Adapting the web to all users
            Including those with disabilities
www.Can-Adapt.com<http://www.can-adapt.com/>

From: Peter Korn [mailto:peter.korn@oracle.com]
Sent: July-03-13 6:59 PM
To: public-wcag2ict-tf@w3.org<mailto:public-wcag2ict-tf@w3.org> Force
Subject: Recently discovered issue with WCAG2ICT definition of "document" - suggesting a new note to clarify

Hi gang,

As part of a wider review of WCAG2ICT (asking colleagues who aren't on the Task Force to look at it), I just discovered an issue with the definition of "document<http://www.w3.org/WAI/GL/wcag2ict/#keyterms_document>".  The issue is that readers will see the term "document" and think "file", and therefore try to apply WCAG requirements to all manner of files (virus definition files and programming files were two specific concerns that came up from colleagues).

While our definition of "document" is based on the term "content<http://www.w3.org/WAI/GL/wcag2ict/#keyterms_content>" (which is scoped to "information and sensory experience to be communicated to the user"), I fear this fact is too easily missed.  Therefore, I propose that we add an additional Note to clarify this:
Note: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, are not examples of documents.  Such files are not "information and sensory experience to be communicated to the user" and therefore are not considered content.
I have added that note in context, as proposed "(New) Note 3" in red text as part of the full definition of document, below:
document (as used in WCAG2ICT)

assembly of content<http://www.w3.org/WAI/GL/wcag2ict/#keyterms_content>, such as a file, set of files, or streamed media that is not part of software and that does not include its own user agent

Note 1: A document always requires a user agent to present its content to the user.

Note 2: Letters, spreadsheets, emails, books, pictures, presentations, and movies are examples of documents.

(New) Note 3: Software configuration and storage files such as databases and virus definitions, as well as computer instruction files such as source code, batch/script files, and firmware, are not examples of documents.  Such files are not "information and sensory experience to be communicated to the user" and therefore are not considered content.

Note 4: Anything that can present its own content without involving a user agent, such as a self playing book, is not a document but is software.

Note 5: A single document may be composed of multiple files such as the video content, closed caption text, etc. This fact is not usually apparent to the end-user consuming the document / content. This is similar to how a single web page can be composed of content from multiple URIs (e.g. the page text, images, the JavaScript, a CSS file etc.).


I would like to propose this edit as part of the WCAG WG review next Tuesday July 9th, so it can get into the 3rd/final public draft that we publish later in July.

Any thoughts/edits before I do this as part of my WCAG WG "Ultimate? Survey"<https://www.w3.org/2002/09/wbs/35422/Ultimate/> response?


Peter
--
Peter Korn | Accessibility Principal
Phone: +1 650 5069522
500 Oracle Parkway | Redwood City, CA 94064
Oracle is committed to developing practices and products that help protect the environment



--
---------------------------------------------------------------
Loïc Martínez-Normand
DLSIIS. Facultad de Informática
Universidad Politécnica de Madrid
Campus de Montegancedo
28660 Boadilla del Monte
Madrid
---------------------------------------------------------------
e-mail: loic@fi.upm.es
tfno: +34 91 336 74 11
---------------------------------------------------------------
Received on Thursday, 4 July 2013 10:49:52 UTC