RE: web resource and terminology from Matt Garrish on 2015-09-28 (public-digipub-ig@w3.org from September 2015)

From: Matt Garrish <matt.garrish@bell.net>
Date: Mon, 28 Sep 2015 11:43:01 -0400
To: "'Ivan Herman'" <ivan@w3.org>
Cc: "'Bill Kasdorf'" <bkasdorf@apexcovantage.com>, "'Leonard Rosenthol'" <lrosenth@adobe.com>, "'W3C Digital Publishing IG'" <public-digipub-ig@w3.org>
Message-ID: <CB19E215A6C149F9811857A88010BF7B@mgarrish>
Rather than pick at what I think are ambiguous words, here’s a quick rundown
of changes I’d suggest:

Don't use "content" in the web resource definition because content means
something else in the context of a document you read. You already say it is
a digital resource, so state that it can be accessed and leave it at that:

A Web Resource is a digital resource that can be uniquely addressed by a
Uniform Resource Identifier (URI) [URI], and _that_ can be accessed through
standard protocols like HTTP, FTP, the File Protocol, etc.

Include the WCAG definition, perhaps tweaked to:

content (Web content)
information and sensory experience to be communicated to the user by means
of a user agent, including all _web resources_ that define the
content's structure, presentation, and interactions

Then move essential content and functionality under this definition where
they're more tightly bound.

State what a web document represents (single document/publication), not only
that it is a web resource. You start the definition saying it is a web
resource, so why end it saying it's to be considered as one?

Sorry if I'm being pedantic about these definitions, but I think they make
the concepts harder to understand than they need to be.

Anyway, I'm coming down with yet another cold, so if I go silent for a while
I'm not just ignoring people.

Matt

________________________________________
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: September-28-15 10:13
To: Matt Garrish
Cc: Bill Kasdorf; Leonard Rosenthol; W3C Digital Publishing IG
Subject: Re: web resource and terminology


On 28 Sep 2015, at 16:01 , Matt Garrish <matt.garrish@bell.net> wrote:

Getting rid of "collate" is a useful step toward clarity, but I'm not
suggesting dropping resources for content. What I'm saying is that you're
already using content without any clear definition of what you mean when you
use it, and that's equally confusing.

I misunderstood you. I really thought you wanted to remove resources in
favour of content.

Just to be clear, would it be o.k. with you and others if I

- copied the WCAG definition for content into the definition, i.e.:

• content (Web content): information and sensory experience to be
communicated to the user by means of a user agent, including code or markup
that defines the content's structure, presentation, and interactions
- changed collate to aggregate

I am fine making these changes, unless somebody stops me:-)

Ivan


 
To run through your definitions again, from web resource:
 
> and whose content can be accessed
 
What does content mean here? A style sheet has content that can be accessed
by any protocol. It's like you're trying to scope the RDF meaning of web
resource here without stating why this even matters. There's a difference
between the content of a file and the content that gets consumed by a user.
WCAG recognizes this, but you took two sub-definitions and omitted stating
what content is. It leaves me having to read between the lines.
 
> Essential Content of a Web Resource: if removed, would fundamentally
> change the information or functionality of the content.
 
Here content becomes "essential", but the only "content" mentioned so far is
the data of the resource. Isn't all the data of a single resource
fundamental? Why would any user agent be removing bytes of data? This
statement makes no sense unless I go off on my own tangent and assume that
you don't really mean the data of the resource anymore but (perhaps) other
resources that are referenced by the resource (e.g., images, audio, video,
etc.).
 
I'd ask in that case why essential content isn't defined under web document,
since the impact is on the document, whether or not it affects a particular
resource. If you remove certain essential resources, I can follow that you
break the fundamental information/functionality expressed by the document.
 
Anyway, I'm running out of steam. Most casual readers I suppose skip right
over terminology, anyway, and read whatever meaning they want into documents
from their titles and loose skimming of the content...
 
Matt
 
From: Ivan Herman [mailto:ivan@w3.org] 
Sent: September 28, 2015 7:40 AM
To: Matt Garrish <matt.garrish@bell.net>
Cc: Bill Kasdorf <bkasdorf@apexcovantage.com>; Leonard Rosenthol
<lrosenth@adobe.com>; W3C Digital Publishing IG <public-digipub-ig@w3.org>
Subject: Re: web resource and terminology
 
(A common response to the thread, not only to this mail.)
 
- I must admit I do not have the same feeling about "resource" v.a.v.
"content". I guess everyone comes with a different baggage that influences
our reactions. For me (and I think it was Deborah who brought this into the
discussion) the term 'resource' is very generic and I was primarily
influenced by the term as used in RDF[1], although we intentionally
restricted the RDF term to Web resources (in RDF, conceptually, I can also
be considered as a resource:-).
 
Also, to be awfully pedantic: the "content" of a resource is not the same as
the resource itself. If I remove some content from a resource, it is still
the same resource, though with a different content. I.e., I do not think
relying exclusively on the concept of 'content' would cut it either.
 
- I accept the criticism on "collation". I must admit I did not realize it
has the concept of ordering in it but I obviously yield to my Anglo-Saxon
colleagues (and the Merriam Webster entry:-).
 
Trying to retrace the history in the thread[2], the way we got to this term
(and not only use 'set') is, primarily, because we wanted to differentiate
between a random set of resources bound together and something with a clear
intention of expressing something. The term 'curated' did come up, but there
was a sense that the term has a jargon meaning in museums or libraries, i.e.,
we should avoid using it. "Collated" came into the picture, expressing the
intentionality. Another term that did come up during the discussion is
"aggregated"; maybe that term is better than "collated". I just checked in
Merriam Webster, and this term does not suggest ordering, so I am happy to
change that if people agree.
 
Thanks
 
Ivan
 
 
 
[1] http://www.w3.org/TR/rdf11-concepts/#resources-and-statements
[2] http://j.mp/1O8eB6g
 
On 28 Sep 2015, at 01:45 , Matt Garrish <matt.garrish@bell.net> wrote:
 
I just hate nuances, and web document and html document are often used
interchangeably without consideration that a web document isn't restricted
to being an html document. It's clear that html documents aren't the only
content-carrying resources allowed, but outside an audience well-versed in
web terminology I expect the difference will get lost. I did a quick search
and after an initial Wikipedia entry that got it right, every use equated
web document with html page.
 
But I get there is also awkwardness when you do, in fact, only want to
represent a single html document.
 
Go with Portable Web Content and no one wins... ;)
 
Matt
 
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com] 
Sent: September 27, 2015 7:11 PM
To: Matt Garrish <matt.garrish@bell.net>; 'Leonard Rosenthol'
<lrosenth@adobe.com>; 'W3C Digital Publishing IG' <public-digipub-ig@w3.org>
Subject: RE: web resource and terminology
 
On "curation," I wasn't actually recommending it, I was just speculating
that perhaps that was what was meant rather than "collation." I agree, a
more neutral term, something like "assemble" or "collect" or their noun
forms might be best. "Assemble" has the connotation of a bunch of stuff
intended to work together, whereas "collect" really just connotes "gather
together."
 
I like the direction you're going with the definition, but I still have a
problem calling it a Web Document instead of a Web Publication. I have a
hard time thinking of a big complex collection of resources as a document,
but I don't have a hard time thinking of a simple standalone document as a
publication.
 
--Bill K
 
From: Matt Garrish [mailto:matt.garrish@bell.net] 
Sent: Sunday, September 27, 2015 7:04 PM
To: Bill Kasdorf; 'Leonard Rosenthol'; 'W3C Digital Publishing IG'
Subject: RE: web resource and terminology
 
I agree that's better than collation, but curation is still odd. Do you
curate your epub file? Do you curate a web page to make a portable
representation of it?
 
Using "curation" also suggests strong ties with digital curation, and, while
that activity might use this portable format as part of the larger
process of curation, it seems like unnecessary baggage to saddle the
definition with.
 
Is how the resources came to be collected together of any importance
compared to what they're intended to represent? That point is currently hard
to discern, but why not something like "A Web Document is a set of
interrelated Web Resources that is intended to be considered as a single
document or publication."?
 
Matt
 
From: Bill Kasdorf [mailto:bkasdorf@apexcovantage.com] 
Sent: September 27, 2015 4:53 PM
To: Leonard Rosenthol <lrosenth@adobe.com>; Matt Garrish
<matt.garrish@bell.net>; 'W3C Digital Publishing IG'
<public-digipub-ig@w3.org>
Subject: RE: web resource and terminology
 
Actually, three of the four non-religious definitions of "collate" in
Merriam Webster are about arranging in a proper order, and people in
publishing almost always associate it with ordering. So although you're
technically correct that it doesn't always mean ordering, most of the time
it does.
 
My guess was that possibly "curation" was meant, not "collation," which has
more of a sense of a purposeful gathering together.
 
 
 
From: Leonard Rosenthol [mailto:lrosenth@adobe.com] 
Sent: Sunday, September 27, 2015 1:55 PM
To: Matt Garrish; 'W3C Digital Publishing IG'
Subject: Re: web resource and terminology
 
Matt – let me see if I can help.  (and, anyone else, feel free to correct
me)
 
You are correct that a style sheet and a script (or a font) are as much
resources as HTML is.  That is as it should be, because in the context of a
web document, they aren’t necessarily different.  There is no reliance on a
“primary resource” (as there is with EPUB, for example).  
 
Essential content is what would be displayed to the user and/or machine
processor – depending on the context.   So it might be displaying text, or a
.csv of spreadsheet data or … But it’s not a font, for example, that
wouldn’t (necessarily) change the content itself (granted there are
exceptions to that rule as well, but…)
 
Collation is simply a grouping – it has nothing to do with ordering. 
 
I don’t recall if “web content” was suggested or not, but from your
description, I don’t think it fits our model (or at least mine).  There are
things that fit into a PWD that are neither “web content” nor “rendering
resource” - for example, my .csv in the previous example.  But that is a
perfectly valid web resource.  I think web resource is a more generic form
of both – and maybe we could define it that way, if necessary. (though I
don’t see the necessity right now).
 
I think the single page vs. multiple page – or the general problem of
“sectioning” a web document – hasn’t yet been raised.
 
Leonard
 
From: Matt Garrish
Date: Sunday, September 27, 2015 at 8:52 AM
To: 'W3C Digital Publishing IG'
Subject: web resource and terminology
Resent-From: <public-digipub-ig@w3.org>
Resent-Date: Sunday, September 27, 2015 at 8:52 AM
 
I've been trying to read through the terminology and find there's a
confusing reliance on "web resource" to mean both the content of the
document/publication and the resources needed to render the document.
 
The definition of web resource seems reasonable enough, in that anything
that can be referenced by a URI is a resource. By that definition, an HTML
document is a web resource, but so is a style sheet, script, etc. Stating
that the content of the resource can be retrieved by a protocol doesn't mean
that a resource has content in the readable content of the document sense
(e.g., a style sheet's "content" is all the rules defined in it).
 
The two sub-bullets then start to make an unstated distinction between types
of web resources, however, as an html document will have "essential
content", but a style sheet or script wouldn't appear to.
 
The confusion grows in the web document definition, as now web resources are
"collated." Is it really the case that fonts, scripts, etc. are combined
into a specific ordering? I didn't follow the entire email chain,
unfortunately, but I do recall seeing this in relation to an ordering of the
content in the web document. Collation makes sense in that context, as it is
analogous to the epub spine.
 
And finally, web resource reappears in its more general sense in the third
bullet, but here suggesting "essentiality" of certain resources but not
others (I take from the discussions this has to do with not every resource
impacting the overall readability).
 
Long story short, was consideration given to including a definition of "web
content" (as also exists in WCAG) to disambiguate these many uses of "web
resource" for both content and rendering resources? Essential web content
and functionality is clearer than stated now for resources. A web document
as a collation of web content is also clearer, and it being a web resource
is less confusing. Portability would depend on the ability to present the
content, even if some rendering resources aren't available.
 
Anyway, just wanted to share that thought I had while reading. The
definitions are very nuanced right now without the context of the email
discussions.
 
And as a side note, if "web document" is the ultimate choice for this then
it might be good to bump up in importance that web document != html document
from the last sub-bullet of the web document definition. I expect the terms
are read as synonymous by many people, in which case having a web document
made up of resources makes it sound like you're defining portability only
for single pages.
 
Matt
 

----
Ivan Herman, W3C 
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704



 


Received on Monday, 28 September 2015 15:44:13 UTC