RE: Some significant items for discussion on "What is a Web Publication?"

+1 to all of Ivan's comments.

Bill Kasdorf

VP and Principal Consultant | Apex CoVantage

p: 734-904-6252  m: 734-904-6252

ISNI: http://isni.org/isni/0000000116490786

ORCiD: https://orcid.org/0000-0001-7002-4786


From: Ivan Herman [mailto:ivan@w3.org]
Sent: Monday, January 23, 2017 7:30 AM
To: Leonard Rosenthol
Cc: W3C Digital Publishing IG
Subject: Re: Some significant items for discussion on "What is a Web Publication?"
Importance: High

Leonard & al,

(I have the advantage of being on the other side of the pond, i.e., you have already discussed some of the issues while I was asleep :-)

On 22 Jan 2017, at 17:16, Leonard Rosenthol <lrosenth@adobe.com> wrote:

While working on the PWP document today, I ran into a few things that I’d like to raise for discussion (either via email or phone tomorrow, or both).

Let’s start right up front with the definition of a Web Publication ☺.   It currently reads “A Web Publication (WP) is a bounded collection of resources, envisioned and created as a whole”.  I would like to review the second half of that sentence – about the envisioned and created as a whole.  In the world of documents, the most popular feature of processing applications is the ability to combine parts of other documents together to create a new one.  In that use case, the resources weren’t “envisioned and created as a whole”.  You could say that the author/publisher envisioned that collection and intentionally collated those resources together – but that’s different from what is here.  I would also put forth that the application of annotations to a WP can create a new WP that also was not “envisioned and created as a whole”.


I tend to agree with David, that the definition elsewhere in the document ("A Web Publication (WP) is a collection of one or more constituent resources, organized together in a uniquely identifiable grouping, and presented using standard Open Web Platform technologies.") is more precise. I would probably retain that as a "formal" definition of a WP. (As an aside, we should indeed avoid double definitions.)

That being said, suppose I do not consider the statement above to be the formal definition of a WP, but rather a higher-level description of what we are after. Leonard, you seem to read the half sentence ("envisioned and created as a whole") as referring to the resources. I actually read the same half sentence as referring to the collection, and not to the individual resources. If one reads the sentence that way, then I do not see any problem with the rest of the issues you describe: the additional act of creating the WP is to collect all the resources into one coherent resource which is then considered as a whole. Whether the individual resources were thought to be part of that collection from the start or not is then beside the point.

What about something like:

"A Web Publication (WP) is a bounded collection of resources, where the collection is envisioned and created as a whole"

Maybe it sounds a bit more convoluted, but it is more precise.



There is a requirement that “The package must include the unique identifier of the manifestation—a Web Publication’s origin is essential information if a PWP becomes portable”.  Two paragraphs later it goes into further detail about the origin inclusion and even mentions trust. Unfortunately, that requirement seems to imply some potential implementation considerations that the WebPackaging work is proving to not be feasible – see https://github.com/dimich-g/webpackage/issues/7.  I would like to remove the second half of that sentence (about the origin) and also remove the bit about trust from the later paragraph.  Let’s just leave it open that we want a unique identifier, but that’s it, and that the origin is not necessarily related to the identifier.


I am not sure what you mean by "Let's just leave it open that we want a unique identifier". To make it clear, I believe we do have to say that a WP must have a unique identifier. But I am not sure that the word 'origin' is to be taken literally here, ie, that the origin must be an HTTP address. It can be a generic URN of some sort, a DOI in http form, whatever. It is an 'origin' in the abstract sense. I do not see how that would imply any implementation issue apart from the fact that it must be available.

Maybe we should remove or change the word 'origin'? Simply say "a Web Publication's identity is essential information…"



Here’s the one where George, Charles and others are going to scream – but I believe it is an extremely important point – you can’t mandate accessibility in a WP (i.e., “A Web Publication must be accessible to the broadest possible range of readers”). We should make it a strong recommendation (a “should” vs. a “shall” in ISO terminology) and do all we can to promote this direction.  However, given our goals to support not only curated publications but also ad-hoc publications, it is not reasonable to expect them all to be accessible.  Just as not every page on the web is accessible, web publications need not be either.


I am torn on that one, to be honest. Just as, I believe, we do not say anywhere at W3C that a Web Page MUST be accessible, I wonder whether we can do anything more for a WP in general. After all, the goal of this (and subsequent) work is to make WPs first-class entities on the Web, minimizing the step it takes to go from a 'traditional' web site to a WP.

We may get into a different discussion if we were to impose a MUST on EPUB4, for example, but I tend to agree with Leonard on this one for general WPs.



Another area that we cannot mandate – but should make a strong recommendation – is that “A Web Publication must be available and functional while the user is offline”. An author may produce a publication that is only designed to be used online – for example, one that connects to an online system. We don’t wish to prevent the development of such a publication.


I think offline/online is one of the essential features that differentiate WPs from average Web pages. I believe the term 'functional' is sufficiently (and intentionally) vague: it does not imply that the WP should have exactly the same features offline, it only says that it should not be impossible to consume the WP's content when it is offline. (The, by now evergreen, example of different fonts comes to mind.) If it connects to an online system, it depends on what that means. Obviously, a gmail application cannot really function offline (although one could imagine a complicated caching system), i.e., it would not be a WP. I am fine with that. On the other hand, a mathematical publication reaching out, say, to a Maple server to run examples if necessary, but where the core of the publication is simply the mathematical theory, is fine; the examples that need the server should just make it clear to the user that he/she must be online to use them.

Ie, we may want to make it clearer what 'functional' means, but I would stick with the rest of the sentence.


Finally, I think we say too much about the use of the manifest.  It says “We also introduce the abstract concept of a manifest, which serves to carry information about the constituent resources of the publication, their sequence, and presentation”.  I think we should only say that it carries the resources and not mention sequence and presentation. This is consistent with our statement, earlier in the same section, about how we aren’t going to define “manifest” (and leave it in the generic FRBR sense).


I am not sure I agree. I think it is an editorial issue; the term manifest in "which must be “manifested” (in the FRBR [frbr<http://w3c.github.io/dpub-pwp/#bib-frbr>] sense) by having files on a Web server" seems to be used in a very different way than "concept of a manifest, which serves to carry information about the constituent resources of the publication, their sequence, and presentation". I guess the first occurrence of the term clearly refers to FRBR only, and that is different. Maybe we should use a different term in the second usage although, unfortunately, the terminology is there.

Terminology put aside, I do not understand what the problem is with the description of a manifest (in the more technological sense).

Talk to you later!

Ivan






Leonard


----
Ivan Herman, W3C
Digital Publishing Technical Lead
Home: http://www.w3.org/People/Ivan/

mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

Received on Monday, 23 January 2017 17:00:48 UTC