- From: Matt Garrish <matt.garrish@gmail.com>
- Date: Fri, 30 Jul 2021 12:02:04 -0300
- To: "'Ivan Herman'" <ivan@w3.org>, "'John Foliot'" <john@foliot.ca>
- Cc: "'W3C EPUB 3 Working Group'" <public-epub-wg@w3.org>, "'Avneesh Singh'" <avneesh.sg@gmail.com>, "'Dan Lazin'" <dlazin@google.com>
- Message-ID: <004f01d78553$dbd04c00$9370e400$@gmail.com>
Ivan’s already made the most salient points, so the only thing I have to add is that not invalidating existing content is the highest-priority requirement of our charter:

“The primary goal of EPUB 3.X is to remain compatible with existing content. Any existing valid EPUB 3.2 should remain valid under EPUB 3.X, unless it relies on features discovered to have serious issues (such as a security bug).”

Yes, warnings aren’t errors, and warnings aren’t even technically a violation of a specification, but when the rubber hits the road in the real world, specification nuances like this are the lesser concern. It's a real cost to publishers to have to remediate their publications to republish them, so we have a high bar for justifying any new issue that they must fix. General improvements like the ones we’re discussing for landmarks don’t rise to that bar.

I’m sure everyone here has at least one pet peeve they’d love to drop or do over, but we’re stuck in a kind of holding pattern until an EPUB 4 rolls around to fix the things that already have authoring uptake.

Matt

From: Ivan Herman <ivan@w3.org>
Sent: July 30, 2021 11:03 AM
To: John Foliot <john@foliot.ca>
Cc: Matt Garrish <matt.garrish@gmail.com>; W3C EPUB 3 Working Group <public-epub-wg@w3.org>; Avneesh Singh <avneesh.sg@gmail.com>; Dan Lazin <dlazin@google.com>
Subject: Re: Looking for some clarification

On 30 Jul 2021, at 15:43, John Foliot <john@foliot.ca> wrote:

Hi Matt

> The problem is this probably invalidates every existing EPUB

Hmmm... that's not how I understand SHOULD w.r.t. RFC 2119 <https://datatracker.ietf.org/doc/html/rfc2119>:

SHOULD - This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
Legacy content that does not meet this requirement would be "exempt" using the 'valid reason' justification (it's why I suggested SHOULD in the first place) - i.e. the document was published prior to the addition of the specification requirement, so it is 'grandfathered' (the valid reason), yet it also sets expectations going forward.

John, this is not an RFC issue but one of industry practice. This community is fairly strict in validating the content it produces (as opposed to average Web site creation), and that is a good thing. It relies on epubcheck[1], which is, afaik, known by pretty much all serious publishers. And the policy of epubcheck is to issue a warning for the violation of SHOULD statements. What happens is that a publisher submits an EPUB publication to a company that gets the EPUB instance to the market; that company runs epubcheck and may decide to reject any publication that has even a warning, and of course any that has an error (I am probably oversimplifying). Overall, this is a good thing I believe, but the price we have to pay is to be very careful with SHOULD statements…

[1] https://www.w3.org/publishing/epubcheck/ (Before you ask: epubcheck is not a W3C project!)

Additionally, large publishers (with legacy systems or simply time and volume constraints) have the ability to adopt this over time: again the 'valid reason' justification is the legacy nature of the requirement. (Conceptually, it's sort of like a reverse 'deprecation' <https://en.wikipedia.org/wiki/Deprecation> :-) )

At any rate, it is but a suggestion from the new guy: I will defer to the long-time experts in the group.
Sorry John, you are not new any more :-)

Ivan

JF

On Fri, Jul 30, 2021 at 9:13 AM Matt Garrish <matt.garrish@gmail.com> wrote:

> I asked about potentially "extracting" that list from the actual spec, but instead including it by reference, and looking to host the normative *Definitions* external to the spec, but nonetheless in a normative TR space at the W3C.

That’s the way it works right now. The official structure vocab is currently at https://idpf.github.io/epub-vocabs/structure/, not in the EPUB 3.2 specification, so that it could be maintained as a registry not bound to revision cycles.

We also tried keeping definitions outside the specification while retaining usage requirements inside, specifically for fixed layouts. See the fixed layout vocabulary <https://idpf.github.io/epub-vocabs/rendition/> versus the fixed layout package definitions <http://idpf.org/epub/31/spec/epub-packages.html#sec-package-metadata-rendering>.

Part of overhauling the specifications for this revision was to merge all the vocabularies back into the authoring specification to avoid the problem of having information scattered across a lot of documents. That said, I’m still not convinced the SSV belongs in the specification; it was the last one we integrated. I also don’t believe we had the option of a registry when the revision began, as the details for their creation and maintenance had not been fully worked out. It’s something to consider, if perhaps not for every vocabulary we have.

> EPUB documents SHOULD include, as a minimum, the following landmarks: frontmatter, bodymatter, backmatter, any of loi, lot, lov, loa, etc when/where applicable

The problem is this probably invalidates every existing EPUB, which we can’t do. It’s an old and annoying problem we face in publishing, but many vendors won’t accept files with warnings, so we’re handcuffed in what we can recommend or require when it’s clear we’d be introducing a backwards incompatibility.
I don’t see how we’ll be able to offer more than advice on what to use given that constraint.

Matt

From: John Foliot <john@foliot.ca>
Sent: July 30, 2021 9:30 AM
To: Ivan Herman <ivan@w3.org>
Cc: Matt Garrish <matt.garrish@gmail.com>; W3C EPUB 3 Working Group <public-epub-wg@w3.org>; Avneesh Singh <avneesh.sg@gmail.com>; John Foliot <john@foliot.ca>; Dan Lazin <dlazin@google.com>
Subject: Re: Looking for some clarification

Hello All,

First, thanks to all who've contributed to this; I have a much clearer understanding of at least this part of the EPUB 3.x spec. I have a whole bunch of thoughts here, based on the discussion, but the most significant (I think) is clustered around those epub:type definitions, and usage in EPUB publications.

Normative versus Non-Normative Term Definitions - FWIW, I think there is a distinction between normative *definitions* versus normative *usage requirements*. In other words, from what I can tell, there is no harm (that I can see) and conversely a real advantage to 'hardening' the definitions of the listed terms, without at the same time specifically mandating their use in EPUB publications. In one of my earlier posts, I asked about potentially "extracting" that list from the actual spec, but instead including it by reference, and looking to host the normative *Definitions* external to the spec, but nonetheless in a normative TR space at the W3C.

Some back-story here may be helpful for some: there are an increasing number of instances at the W3C where specs are sharing or referencing vocabularies/definitions and/or similar types of content.
The WCAG WG (now AG WG) stumbled across this issue when updating to WCAG 2.1, where we introduced a new SC (1.3.5 Purpose of Input <https://www.w3.org/TR/WCAG21/#identify-input-purpose>), which essentially leveraged an existing HTML5 attribute (@autocomplete) and its normative 'taxonomy' of attribute values and definitions. However, that original taxonomy is/was hosted in the HTML5 specification, which we all know is now "Living" in WHATWG space, and there was a concern that *if* the WHATWG modified or changed that taxonomy, it would have a knock-on (but negative) effect on *OUR* spec. And so the solution was to duplicate that list in WCAG 2.1 (see Section 7: Input Purposes for User Interface Components <https://www.w3.org/TR/WCAG21/#input-purposes>). This solved the larger issue at the time, but also bloated our spec somewhat, and also introduced a small level of 'brittleness' - a level deemed 'acceptable', but brittle nonetheless.

However, during that time, it led us (me) to investigate the potential of a 'normative' registry where groups could host 'fragments' of common items, like that taxonomy; it turned out that not only had the idea been batted around before, but that multiple WGs at the W3C had a similar need, and so AFAIK, today there is a serious movement to actually create such a Registry <https://github.com/w3c/w3process/pull/335>.

And so, returning to the topic at hand, I will ask directly whether moving those terms and definitions into such a Registry would be of any value? As I suggested, having normative definitions of all of those terms is, overall, a net win in my books, yet at the same time, the EPUB spec can note that usage is not mandated (BUT, if you choose to use a term, choose the right one that aligns to the normative definition).

> We generally have an issue at W3C on how we would define testing and implementation for vocabulary terms (I explicitly cc Dan here, who is our testing champion…).
The generally accepted approach is that for vocabulary items the term "implementation" is a misnomer and it should be considered to be an alias to the term "usage" for our CR process.

I will also suggest that seeking to use a Registry here may be a way around Ivan's noted concern regarding "implementation" versus "usage", as I think a registry such as the one being proposed would not have that strict implementation need.

> Personally, I see the most value in the landmarks nav if it is limited to major reference sections, like lists of tables/examples/illustrations, indexes, bibliographies, and glossaries, plus the major document partitions (front, body and back matter). Even if we can’t limit it to those, some guidance in that direction would improve understanding.

Because (thankfully) the EPUB 3.3 Rec includes a direct reference to RFC 2119, perhaps the way forward would be to leverage SHOULD and MAY here. That way the section could more clearly define some expectations (SHOULDs), without specifically mandating usage anywhere. In a hand-wavey, not-too-much-thought suggestion, perhaps something like:

* EPUB documents MUST include a set of (epub:type) landmarks
* EPUB documents SHOULD include, as a minimum, the following landmarks: frontmatter, bodymatter, backmatter, any of loi, lot, lov, loa, etc when/where applicable
* EPUB documents MAY include any of the other (normatively defined) landmarks
* EPUB documents MUST NOT introduce new landmark values not included in the normative list of definitions
* When associated with an internal link, EPUB landmark link text SHOULD be written to be consumed by users, in the native language of the associated document. (?? This is a first-pass attempt to ensure that links are not to "backmatter", but rather something closer to "Appendix". Perhaps also reference the following: https://www.w3.org/TR/coga-usable/#clear-language-written-or-audio-user-story, and specifically the following: "I need to understand the meaning of the text.
I do not want unexplained, implied or ambiguous information...")

Thoughts?

> ...the question on how we would handle all the metadata vocabularies in the spec

Would the use of a Registry be a way forward there, Ivan?

> The edupub terms might also fall into the same category, except having the terms in the SSV allows their use outside of edupub-specific publications. I wouldn’t be surprised to find them in use.

...also makes the case for extracting the terms and definitions to a common normative registry IMHO.

JF

On Fri, Jul 30, 2021 at 6:50 AM Ivan Herman <ivan@w3.org> wrote:

On 30 Jul 2021, at 12:39, Matt Garrish <matt.garrish@gmail.com> wrote:

> However: would all SSV terms pass such a requirement?

Maybe not all, but probably more than you think if we’re going by use in content.

I am known to be pessimistic :-) But I am happy to be proven wrong.

Landmarks are a niche use of the vocabulary, for sure, but use in content (esp. in publisher workflows) is not. I can’t speak to who’s using what right now, but in the past Hachette, Pearson and others were making extensive use of the vocabulary.

Great. I would think it is easy to get this information for documentation.

But what exactly would be the difference between a “normative” term and an “informative” one? They’re not going to “do” anything in the vast majority of cases, anyway, so what would we be trying to signal to authors by adding labels?

In practice… not much, you are right. In this case it would be nothing more than some sort of a quality stamp, proving that there is a consensus behind that specific term (or not).

Do informative ones generate warnings, or what makes them different?

RS-s may decide to warn for, or simply ignore, non-standard terms. But that is out of our hands.
If the idea is we deprecate the ones we can’t find support for, how can we be sure we’ve captured the complete picture of use given how many more authors/publishers there are than reading systems? RS feature checking is far less complicated.

It’d also be problematic to call the whole vocabulary informative when it’s a normative default of the epub:type attribute, which is normatively required in places like the navigation document. It’d create a cascading failure of informativeness!

It is probably not a good idea to declare the whole thing informative. But maybe we have to divide the vocabulary into standard and non-standard terms; I am sure there are ones that would pass the bar for normativeness. It would be good to gather this information asap, using the relationships WG members have with publishers.

Ivan

As to the other vocabularies, I recall we did a review of their properties back in 3.1, which is why a number of them, like meta-auth and the various link metadata record types, are now deprecated. Their use is a bit more assured.

Matt

From: Ivan Herman <ivan@w3.org>
Sent: July 30, 2021 4:52 AM
To: W3C EPUB 3 Working Group <public-epub-wg@w3.org>
Cc: Avneesh Singh <avneesh.sg@gmail.com>; Matt Garrish <matt.garrish@gmail.com>; John Foliot <john@foliot.ca>; Dan Lazin <dlazin@google.com>
Subject: Re: Looking for some clarification

(First of all, thanks to John for starting this thread. Never ever apologize for asking relevant questions!)

Editorial issues put aside, Matt is right that the current spec describes the SSV as normative. However, is this realistic in terms of the W3C process?

We generally have an issue at W3C on how we would define testing and implementation for vocabulary terms (I explicitly cc Dan here, who is our testing champion…).
The generally accepted approach is that for vocabulary items the term "implementation" is a misnomer and it should be considered to be an alias to the term "usage" for our CR process. Indeed, there is no RS behavioral requirement for any of the SSV entries, so traditional requirements on implementations would not apply. Instead we may, for example, define "usage" of a specific vocabulary term, for our CR process, to mean that there are at least two publishers out there who use the term in production (hoping that at least some reading systems do something sensible with it). Such, or similar, ways of "testing" (per W3C process) for vocabularies were used in other specifications in the past.

However: would all SSV terms pass such a requirement? All the answers to John's mail suggest that the answer would be no (and Tzviya made this point explicitly), because many terms have been introduced to the spec as part of an aspiration for something. I.e., we may be creating a problem for ourselves. Shouldn't we mark the SSV vocabulary as non-normative overall while, possibly, explicitly marking a few terms as normative because we know they are accepted by the community (e.g., landmarks)? Note that if we keep all the terms as normative and then we fail on the CR tests, the usual expectation would be to remove them altogether, which we do not want (I presume).

This raises (again?) the question on how we would handle all the metadata vocabularies in the spec (Matt, please, do not kill me for raising this problem again…)?

I realize that, eventually, the right place to discuss this will be a github issue.
But a preliminary discussion on this thread would do no harm (and our fearless chairs may want to put it on our next WG call's agenda…)

Ivan

On 29 Jul 2021, at 20:30, Matt Garrish <matt.garrish@gmail.com> wrote:

> rereading this and I now understand that only section D.8.1 is non-normative … it was a nuance that I missed on first read

Ah, the subsection labels again! We wrote out some of these sections earlier this revision in the package vocabularies to avoid a similar kind of issue, as I recall. We could probably do the same here and merge the paragraphs under the Structural Semantics heading to get rid of that subsection (and label), as all the other “About this vocabulary” sections are now gone. I’ll open an issue to fix this. Thanks!

Matt

From: John Foliot <john@foliot.ca>
Sent: July 29, 2021 2:18 PM
To: Matt Garrish <matt.garrish@gmail.com>
Cc: John Foliot <john@foliot.ca>; W3C EPUB 3 Working Group <public-epub-wg@w3.org>; Avneesh Singh <avneesh.sg@gmail.com>; Ivan Herman <ivan@w3.org>
Subject: Re: Looking for some clarification

Matt and Tzviya,

Thank you, this is most helpful.

Matt, to your last question, I apologize in that I saw this:

    D.8 Structural Semantics Vocabulary
    D.8.1 About this Vocabulary
    This section is non-normative.

...and got ahead of myself - rereading this and I now understand that only section D.8.1 is non-normative; D.8.2 through D.8.19 are not tagged as non-normative (so, ergo, normative), but it was a nuance that I missed on first read. As simply a suggestion, perhaps moving that around a bit may help (but not a hill to die on):

    D.8 Structural Semantics Vocabulary
    Note: Section D.8.1 is non-normative, all other sections are normative.
    D.8.1 About this Vocabulary

(...just a thought: if I missed it, it's possible others might as well...)
JF

On Thu, Jul 29, 2021 at 12:39 PM Matt Garrish <matt.garrish@gmail.com> wrote:

Hi John,

> Looking specifically at usage in <nav epub:type="landmarks">, is there a minimum set or collection of landmark-values expected in a publication?

No. We’ve looked at this issue in the past, but the landmarks are for reading system use and there’s no requirement that reading systems implement any functionality based on landmarks. I believe the only behaviour that’s been documented as having some uptake is including a link with epub:type=bodymatter so that reading systems can automatically skip the front matter when opening a publication. But that’s not universal and not something we’d want to enforce now.

> I am *presuming* that the Partition value (actually, any epub:type's value) should also use a "Human Readable" label (accessible name),

It would be helpful if the links were exposed to users, yes, but the landmarks links are the more central part. The original idea was that the reading system could use these links to implement its UI (e.g., have a dedicated button to open a glossary or index), so the text of the links would most likely be discarded to provide a consistent interface. Listing all kinds of general content destinations in the landmarks for users is largely redundant with the table of contents.

> Document Partitions vs. Document Sections and Components, does one category have stronger or more important semantics in practical usage?

Front, body and back matter are conceptual divisions of a publication that overarch the content. Front matter in most English texts, for example, is demarcated by roman numerals and contains title pages, tables of contents, dedications, forewords, etc. You don’t necessarily have to pick only one semantic, in other words, as this isn’t the role attribute where only one is recognized. All of the listed semantics are applicable.
This is also because the semantics (and epub:type) were designed first for publishers wanting to use the semantics in internal workflows. They then took on a life of their own and have been used (and abused) in a variety of ways for different purposes in EPUB. Placing them on links in the landmarks is a useful hack, for example, but what does it mean that a link is the front matter and/or a foreword? That the semantics describe what is at the end of the link is a creation that only exists for the landmarks, I believe.

Structuring the list of landmarks probably does nothing but further complicate an underutilized feature of EPUB. You’re starting to turn it back into a table of contents. You could put two semantics on a single item if it makes sense (e.g., a foreword is also your first piece of front matter), but otherwise I’d keep the list flat. I’m kind of surprised the restriction to a flat list of entries, as we have for the page list, isn’t also defined for the landmarks.

> why is the Structural Semantics Vocabulary non-normative in the Recommendation?

The appendixes are normative unless marked otherwise and I don’t see a non-normative label on the vocabulary. Where are you seeing it is informative?

Matt

From: John Foliot <john@foliot.ca>
Sent: July 29, 2021 12:45 PM
To: W3C EPUB 3 Working Group <public-epub-wg@w3.org>; Avneesh Singh <avneesh.sg@gmail.com>; Ivan Herman <ivan@w3.org>
Subject: Looking for some clarification

Hi All,

As I continue to digest the current EPUB 3.3 Rec, I'd like to ask some specific questions (if I may) regarding Structural Semantics Vocabulary*.

Specifically, I am looking to understand the relationship/similarities/differences between Document Partitions <https://www.w3.org/TR/epub-33/#partitions> and Document Sections and Components <https://www.w3.org/TR/epub-33/#sections>, as they appear (to me) to be very similar in function.
(But, for example, would/could a Section or Component be a child of a Partition? Or are they hierarchically equal?)

* Looking specifically at usage in <nav epub:type="landmarks">, is there a minimum set or collection of landmark-values expected in a publication?
  * if yes, what are they?
  * if no, should there be? (why/why not?)
* Additionally, from an accessibility perspective, while the Rec is currently silent on this specific scenario, based on the following supplied code example <https://www.w3.org/TR/epub-33/#example-33> I am *presuming* that the Partition value (actually, any epub:type's value) should also use a "Human Readable" label (accessible name), as seen here with the epub:type of bodymatter, where the label is Start of Content:

    <nav epub:type="landmarks">
      <h2>Guide</h2>
      <ol>
        <li><a epub:type="toc" href="#toc">Table of Contents</a></li>
        <li><a epub:type="loi" href="content.html#loi">List of Illustrations</a></li>
        <li><a epub:type="bodymatter" href="content.html#bodymatter">Start of Content</a></li>
      </ol>
    </nav>

  I ask this, because currently I am seeing (in sample books I am reviewing) that in many instances the epub:type value is being echoed as the human-readable label as well (i.e. <a epub:type="Frontmatter" href="...">Frontmatter</a>), which my gut is cringing at, as being less than useful for some users with different forms of cognitive disability.

  (It's a bit of a stretch to be sure, but WCAG SC 3.1.3 Unusual Words <https://www.w3.org/TR/WCAG21/#unusual-words> (Level AAA) states: A mechanism is available for identifying specific definitions of words or phrases used in an unusual or restricted way, including idioms and jargon. - and to *my* mind at least, Frontmatter is fairly "jargony" <https://www.w3.org/TR/WCAG21/#dfn-jargon> on the surface - it's clearly not a common term in regular public usage AFAIK. Ditto "backmatter".) (@Avneesh?)
* Returning to Document Partitions vs.
Document Sections and Components, does one category have stronger or more important semantics in practical usage? i.e. both Frontmatter and Forward feel very similar (synonymous?) to each other conceptually - If I had to choose just one, which would/should I choose? (and why?) Or, as I asked previously, could I/would I seek to do something like this?

    <nav epub:type="landmarks">
      <h2>Guide</h2>
      <ol>
        <li><a epub:type="Frontmatter" href="content.html#frontmatter">Frontmatter</a> (* ya, yech)
          <ol>
            <li><a epub:type="Forward" href="content.html#forward">Forward</a></li>
            <li><a epub:type="Preface" href="content.html#preface">Preface</a></li>
          </ol>
        </li>
        <li><a epub:type="loi" href="content.html#loi">List of Illustrations</a></li>
        <li><a epub:type="bodymatter" href="content.html#bodymatter">Start of Content</a></li>
      </ol>
    </nav>

(Or am I overthinking this?)

Thanks in advance for any insights you can provide me.

JF

(* At the risk of asking too many questions, why is the Structural Semantics Vocabulary non-normative in the Recommendation? It appears to be furnishing specific definitions to multiple value terms. As a standards wonk, it strikes me that those definitions would probably want to be normative - or, again, am I missing something here? @Ivan - has there been any discussion of moving those definitions into the proposed W3C Registry <https://github.com/w3c/w3process/pull/335>?)

--
John Foliot | Senior Industry Specialist, Digital Accessibility
"I made this so long because I did not have time to make it shorter."
- Pascal
"links go places, buttons do things"

----
Ivan Herman, W3C
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
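
The label-echo concern raised in the original message (a landmarks link whose visible text simply repeats its epub:type value, such as "Frontmatter") lends itself to a quick mechanical check. What follows is only a rough sketch under stated assumptions, not anything taken from epubcheck or the specification: the names `LandmarkLabelChecker`, `echoed_labels`, and `SAMPLE` are hypothetical, the sample markup is adapted from the examples in the thread, and it leans on Python's standard `html.parser` rather than a real XHTML/EPUB toolchain.

```python
from html.parser import HTMLParser


class LandmarkLabelChecker(HTMLParser):
    """Collect (epub:type values, label) pairs for links in the landmarks nav."""

    def __init__(self):
        super().__init__()
        self.in_landmarks = False   # inside <nav epub:type="landmarks">?
        self.nav_depth = 0          # tracks nested <nav> elements
        self.current_types = None   # epub:type values of the open <a>, if any
        self.current_text = []      # text chunks seen inside the open <a>
        self.links = []             # collected (types, label) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav":
            if self.in_landmarks:
                self.nav_depth += 1
            elif "landmarks" in (attrs.get("epub:type") or "").split():
                self.in_landmarks = True
                self.nav_depth = 1
        elif tag == "a" and self.in_landmarks:
            # epub:type may carry several space-separated semantics
            self.current_types = (attrs.get("epub:type") or "").split()
            self.current_text = []

    def handle_data(self, data):
        if self.current_types is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_types is not None:
            label = " ".join("".join(self.current_text).split())
            self.links.append((self.current_types, label))
            self.current_types = None
        elif tag == "nav" and self.in_landmarks:
            self.nav_depth -= 1
            if self.nav_depth == 0:
                self.in_landmarks = False


def echoed_labels(xhtml):
    """Return labels that merely repeat one of the link's epub:type values."""
    checker = LandmarkLabelChecker()
    checker.feed(xhtml)
    checker.close()
    return [
        label
        for types, label in checker.links
        if label.lower() in {t.lower() for t in types}
    ]


# Hypothetical sample, adapted from the markup quoted in the thread.
SAMPLE = """
<nav epub:type="landmarks">
  <h2>Guide</h2>
  <ol>
    <li><a epub:type="frontmatter" href="content.html#frontmatter">Frontmatter</a></li>
    <li><a epub:type="loi" href="content.html#loi">List of Illustrations</a></li>
    <li><a epub:type="bodymatter" href="content.html#bodymatter">Start of Content</a></li>
  </ol>
</nav>
"""

if __name__ == "__main__":
    print(echoed_labels(SAMPLE))  # ['Frontmatter']
```

A check like this could at most flag candidates for review; whether a label is actually understandable to readers remains an editorial judgment, and nothing in the thread suggests such a check exists in any validator today.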
Received on Friday, 30 July 2021 15:02:22 UTC