Re: definition of Web Publication

Hi all,

> How does this differ from a website? The manifest. What is a manifest?

This question points to the core of my annoyance with the process we’re going through. This discussion on how to define web publication alternates between simply describing websites as they currently function and describing websites as they cannot realistically function (e.g. immutable web pages).

For example, the boundedness attribute is pretty meaningless. A website is bounded; it has only a limited set of HTML files within its origin. A web app is bounded by its scope; pages within that scope belong to it. And, as Leonard pointed out, packaging explicitly bounds whatever it packages. So ‘bounded’ on its own tells us nothing. It needs to be replaced with an explicit description of what people here mean by bounded. Without actual detail on the technical consequences of ‘bounded’, the definition is not only unhelpful but actively counter-productive because everybody will interpret it differently: web devs will assume it’s just describing something like a web app, digital publishing people will assume it implies packaging, and humanities types will infer that we’re talking about a level of fixity in web publications as a form or medium.

I could go through the proposed definitions attribute by attribute and do the same: all of the attributes or characteristics either describe the web as it already exists or could be interpreted as dictating multiple completely incompatible technical requirements and will _cause_ conflict later in the standardisation process. I may agree with these attributes in principle but that’s only because I consider them to be describing websites as they exist today. Others will agree to them because they see them as an explicit description of a web-based ePub. Those two points of view are not compatible when it comes to hashing out actual technical details later on.

This is a very bad foundation for collaboration.

Even the algorithm—we already have an algorithm where a bounded web publication, composed of multiple resources, with a default reading order is discovered from a single identifier: a single web page. The definitions proposed so far apply equally to a regular web page or a packaged publication. The only thing that brings clarity to the algorithm (or, indeed, any part of the definition) is the idea of a manifest being the source of publication-ness.
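To illustrate the point, here is a minimal sketch of how an ordinary web page already yields a bounded, ordered set of resources. Everything in it is an assumption made for illustration: the page URL, the markup, and the choice to treat same-origin links in document order as the "default reading order". It is not an algorithm anybody on this list has proposed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href values of <a> elements in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_origin(a, b):
    """True when two absolute URLs share scheme and host (including port)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def reading_order(page_url, html):
    """Return same-origin link targets in document order, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    ordered = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if same_origin(page_url, absolute) and absolute not in ordered:
            ordered.append(absolute)
    return ordered


# Hypothetical page and URLs, for illustration only.
PAGE_URL = "https://example.org/book/index.html"
PAGE_HTML = """
<nav>
  <a href="chapter-1.html">One</a>
  <a href="chapter-2.html">Two</a>
  <a href="https://other.example/ad">Elsewhere</a>
  <a href="chapter-1.html">One again</a>
</nav>
"""

order = reading_order(PAGE_URL, PAGE_HTML)
print(order)
```

Nothing here needs a new concept: the single identifier is the page URL, the bound is the origin, and the order is the document order of the links.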

Or take the publication date requirement, which is going to be ignored in practice because a key characteristic of the web is that, unlike print, which tends to work with abstractions of print (e.g. PDFs with crop marks, doc files), websites in progress _are still websites_. If we keep this in the definition then we end up with the frankly weird quirk where a draft, in-progress web publication magically doesn’t exist as a web publication until it gets a publication date.

I can guess what people here mean by these suggestions because of how long I’ve worked in the ePub scene, but most of this will be interpreted by the web community (who don’t really have many representatives here on this list) as either a description of the status quo (with maybe some pending specs) or an outline of unattainable academic ideals.

If this is supposed to guide future technical work, then we need much more specificity and explicit detail in the definition, with references to pre-existing concepts in HTTP and HTML. We don’t have to reference specific implementations but we definitely need to be referencing specific concepts. Much in the same way that Tzviya brought in references to Functional Requirements for Bibliographic Records to help bring clarity to the idea of manifestation, we need to be working with references to the HTTP and HTML architectural concepts that we will be building on. Otherwise the definition will be interpreted in hundreds of different ways and cause substantial conflict and disagreement throughout the specifying process.

And I’m of the opinion that the damage has already begun. The discussion in the GitHub issues, and how it has turned into a slog, clearly demonstrates that there are substantial divides between various groups as to what a web publication is supposed to be. A vague, high-level, and abstract definition will just exacerbate those divides.

AFAICT, the only concrete detail proposed that is relatively unambiguous and has technical implications is the idea that a web publication should be defined by a self-identifying manifest that can be external to the HTML files that compose the publication and whose URL identifies the publication as a whole. But I’m not sure we have consensus even on that detail.
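As a rough sketch of what that detail could mean in practice, assume a JSON manifest living at its own URL. The field names (`name`, `readingOrder`, `resources`) and the URLs are hypothetical and not taken from any agreed spec; the only ideas carried over from the discussion are that the manifest's URL identifies the publication and that the manifest, not the HTML, lists what belongs to it.

```python
import json
from urllib.parse import urljoin

# Hypothetical manifest location and content; field names are illustrative only.
MANIFEST_URL = "https://example.org/book/manifest.json"
MANIFEST_TEXT = """
{
  "name": "An Example Book",
  "readingOrder": ["chapter-1.html", "chapter-2.html"],
  "resources": ["style.css", "cover.jpg"]
}
"""


def load_publication(manifest_url, manifest_text):
    """Build a publication record whose identifier is the manifest URL."""
    data = json.loads(manifest_text)

    def resolve(ref):
        # Relative references are resolved against the manifest's own URL.
        return urljoin(manifest_url, ref)

    return {
        "id": manifest_url,
        "name": data["name"],
        "reading_order": [resolve(r) for r in data["readingOrder"]],
        "resources": [resolve(r) for r in data.get("resources", [])],
    }


def belongs_to(publication, url):
    """A resource is inside the publication only if the manifest lists it."""
    return url in publication["reading_order"] or url in publication["resources"]


publication = load_publication(MANIFEST_URL, MANIFEST_TEXT)
print(publication["id"])
print(belongs_to(publication, "https://example.org/book/chapter-2.html"))
print(belongs_to(publication, "https://example.org/blog/post.html"))
```

In this reading, 'bounded' has a testable meaning: a URL is part of the publication if and only if the manifest lists it, regardless of origin or packaging.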

- best
- Baldur Bjarnason
  baldur@rebus.foundation



> On 26 Jul 2017, at 15:16, Siegman, Tzviya - Hoboken <tsiegman@wiley.com> wrote:
> 
> Hi all,
>  
> We seem to be coming to something like consensus around working definitions of Web Publication. The idea of this exercise is not to agree on every single word but to make sure that we agree on basic concepts so that we are all having the same conversation. Please only comment if you disagree in principle not specifics of terminology.
>  
> Here is a summary:
>  
>  • A Web Publication is an explicitly authored/created collection of one or more constituent resources, bound together through a manifest into a single logical work with a defined though not necessarily required reading order. The Web Publication is uniquely identifiable, presentable using Open Web Platform technologies, and available online or off. (from Greg/Matt)
>  
> (from Luc)
>  • A Web Publication is defined by a “boundary”
>  • The boundary is defined by a curator/author
>  • There is a creator
>  • It is more than organized. In the FRBR [1] sense, it brings the idea of a manifestation of a work.
>  • The creator's intent leads him or her to create and/or choose content that represents, for him or her, an intellectual idea: the work.
>  • A Web Publication is one way to manifest this work in digital form
> Other notes:
>  • A Web Publication has an Identifier or “uniquely identifiable grouping”
>  • A WP, even outside the context of books, has a publication date. It doesn’t exist before that date, and afterwards it can be referred to as « published at » that date, along with other metadata like « by » for the creator/author.
>  • Requirement: possible to create bibliographic references
>  • Self-declaration of Web Publication-ness
>  • Offline-ability
> How does this differ from a website? The manifest. What is a manifest?
>  • A manifest is structured information about a Web Publication, such as informative metadata and the default reading order of its primary constituents.  (Laurent)
>  • There will be an algorithm that, starting with a given URI (that of a manifest, for example) can locate all of the component resources reliably, which is not always true of a Web site. There is a predefined, default reading order for the resources – again this is not always true of Web sites that span multiple Web pages. It seems to be more about providing guarantees rather than about having unique features which only Web publications possess. (Jason)
> To be determined:
>  • More specific definitions of boundaries
>  • Updating, revisions, versioning
> [1] http://archive.ifla.org/archive/VII/s13/frbr/frbr_current_toc.htm
>  
> Tzviya Siegman
> Information Standards Lead
> Wiley
> 201-748-6884
> tsiegman@wiley.com
>  
> From: George Kerscher [mailto:kerscher@montana.com] 
> Sent: Wednesday, July 26, 2017 10:56 AM
> To: 'Matt Garrish' <matt.garrish@gmail.com>; 'AUDRAIN LUC' <LAUDRAIN@hachette-livre.fr>; 'Avneesh Singh' <avneesh.sg@gmail.com>; 'Garth Conboy' <garth@google.com>; 'Laurent Le Meur' <laurent.lemeur@edrlab.org>
> Cc: 'Leonard Rosenthol' <lrosenth@adobe.com>; 'Greg Albers' <GAlbers@getty.edu>; public-publ-wg@w3.org
> Subject: RE: definition of Web Publication
>  
> I too like what is being expressed. One point:
>  
> I want to be able to create a bibliographic reference. I should be able to do this with confidence that the item will not disappear in the next update. Perhaps identifying the content that is part of the intrinsic publication is worth exploring.
>  
> Yes, yes, I know I am getting into details that we don’t want to get into right now, but this is a +1 for documents that can change, but with some restrictions.
>  
> Best
> George
>  
> From: Matt Garrish [mailto:matt.garrish@gmail.com] 
> Sent: Wednesday, July 26, 2017 6:03 AM
> To: 'AUDRAIN LUC' <LAUDRAIN@hachette-livre.fr>; 'Avneesh Singh' <avneesh.sg@gmail.com>; 'Garth Conboy' <garth@google.com>; 'Laurent Le Meur' <laurent.lemeur@edrlab.org>
> Cc: 'Leonard Rosenthol' <lrosenth@adobe.com>; 'Greg Albers' <GAlbers@getty.edu>;public-publ-wg@w3.org
> Subject: RE: definition of Web Publication
>  
> A general +1 to everything you've said, Luc. I also prefer Greg's original wording. I only wonder if it would make sense to be even more explicit that we're creating a work out of the resources, and that's what makes a publication unique. For example:
>  
> A Web Publication is an explicitly authored/created collection of one or more constituent resources, bound together through a manifest into a single logical work with a defined though not necessarily required reading order. The Web Publication is uniquely identifiable, presentable using Open Web Platform technologies, and available online or off.
>  
> (As a side note, I hate acronyms in specifications and would prefer we avoid WP as a shorthand, even if we use it for simplicity in discussions.)
>  
> Matt
>  
> From: AUDRAIN LUC [mailto:LAUDRAIN@hachette-livre.fr] 
> Sent: July 26, 2017 4:32 AM
> To: Avneesh Singh <avneesh.sg@gmail.com>; Matt Garrish <matt.garrish@gmail.com>; 'Garth Conboy' <garth@google.com>; 'Laurent Le Meur' <laurent.lemeur@edrlab.org>
> Cc: 'Leonard Rosenthol' <lrosenth@adobe.com>; 'Greg Albers' <GAlbers@getty.edu>;public-publ-wg@w3.org
> Subject: Re: definition of Web Publication
>  
> Hi,
>  
> Boundedness/boundaries and creator intent: Work
> This is where the library FRBR model brought us in IG to speak about « “manifested” (in the FRBR [frbr] sense) ».
> There is a boundary around what has been chosen, curated, included in the WP by the creator/editor. 
>  • I use creator and not author, so that we don’t think it is only for books… IMO, it is also relevant for any document
>  • I think it is more than « organized ». In the « FRBR sense », it brings the idea of a manifestation of a work.
>  • The creator's intent leads him or her to create and/or choose content that represents, for him or her, an intellectual idea: the work.
>  • A WP is one way to manifest this work in digital form
>  
> This supports the idea that a WP differs from a website by its manifest (which should somehow reflect the boundaries of the manifestation)
> => I support Greg’s wording « A Web Publication (WP) is a[n explicitly authored/created] collection of one or more constituent resources, bound together »
>  
> Controlled updating:
> We shouldn’t limit these boundaries to « static content ».
> I like here the idea brought by Jason « an algorithm »: WP content should be updatable under the control of a creator algorithm.
> This kind of updating includes the dynamic view of the web in the boundaries of the WP.
>  
> Out of bounds (IMO): a generic link to a Web page that may disappear in time
> Within bounds: an internal process included by the creator in a WP that makes a call to a controlled set of data from a reliable source
>  
> => the WP definition should somehow reflect this essential processable nature of a WP, perhaps by adding that algorithms are among the primary resources?
>  
> Best,
> Luc
>  
> From: Avneesh Singh <avneesh.sg@gmail.com>
> Date: Wednesday, 26 July 2017 at 05:57
> To: Matt Garrish <matt.garrish@gmail.com>, 'Garth Conboy' <garth@google.com>, 'Laurent Le Meur' <laurent.lemeur@edrlab.org>
> Cc: 'Leonard Rosenthol' <lrosenth@adobe.com>, 'Greg Albers' <GAlbers@getty.edu>, "public-publ-wg@w3.org" <public-publ-wg@w3.org>
> Subject: Re: definition of Web Publication
> Resent-From: <public-publ-wg@w3.org>
> Resent-Date: Wednesday, 26 July 2017 at 05:57
>  
> We have developed a lot of use cases on the basis of the current state of the publishing industry, which is good.
> At the same time, the publishing industry is likely to evolve with time, and soon we may see publications that are updated on a weekly or even daily basis.
> I see the following differences between publications and webpages.
> 1. Publisher-defined boundaries and reading order for at least the primary resources.
> 2. Well-defined information about major and minor updates.
> 3. Well-defined metadata (point 2 is also related to it).
> 4. Online as well as offline access.
>  
> With regards
> Avneesh
> From: Matt Garrish
> Sent: Wednesday, July 26, 2017 05:09
> To: 'Garth Conboy' ; 'Laurent Le Meur' 
> Cc: 'Leonard Rosenthol' ; 'Greg Albers' ; public-publ-wg@w3.org
> Subject: RE: definition of Web Publication
>  
> The phrase "intentional curation" sounds more like what web publications enable than a characteristic of the content, although I appreciate what is being sought with it.
>  
> And leaving out boundedness from the definition while it was heavily emphasized in the vision document doesn't make a lot of sense to me. What makes publications unique from web pages is the idea that they represent a bounded work, even if the bound is a single document. If that's not true, then can we really call these "web publications" or are they just "identifiable document sets on the web"?
>  
> Matt
>  
> From: Garth Conboy [mailto:garth@google.com] 
> Sent: July 25, 2017 5:12 PM
> To: Laurent Le Meur <laurent.lemeur@edrlab.org>
> Cc: Leonard Rosenthol <lrosenth@adobe.com>; Greg Albers <GAlbers@getty.edu>;public-publ-wg@w3.org
> Subject: Re: definition of Web Publication
>  
> And to a certain extent these "bounds" could also be the part of the publication that is published on the publication date, and can be expected not to change without a new publication.  This lack of change after publication is key to me (or at least some way to get back to the "originally published content") -- signatures may play a role here.
>  
> Best,
>    Garth
>  
> On Tue, Jul 25, 2017 at 1:34 PM, Laurent Le Meur <laurent.lemeur@edrlab.org> wrote:
> The bounds of a WP are IMO the resources that will be packaged when a PWP is created. Take the example of an HTML page (a primary resource of a WP) containing a video hosted on YouTube. The video content will stay outside the boundaries of the PWP. We can package some constituents of a WP, not all of them.
>  
> Laurent
>  
> On 25 Jul 2017, at 22:20, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>  
> I don’t understand how a user would ever know (or care) about the “bounds” of a WP.  Can you give an example?
>  
> At its simplest, a PWP is a WP that has been packaged up into a single physical container of content (ala EPUB).  Beyond that, we still have lots of work to do to understand how (if at all) it would differ from a WP.
>  
> On the “states” issue, we spent a *lot* of time in the IG trying to use that states model and when we presented it to the rest of the W3C it was too confusing for many as it’s a very complex grid.   It’s also not clear whether we actually need all the various differences in that grid given many things going on with the OWP itself…
>  
> Leonard
>  
> From: Greg Albers <GAlbers@getty.edu>
> Date: Tuesday, July 25, 2017 at 3:30 PM
> To: Leonard Rosenthol <lrosenth@adobe.com>
> Cc: Laurent Le Meur <laurent.lemeur@edrlab.org>, "public-publ-wg@w3.org" <public-publ-wg@w3.org>
> Subject: Re: definition of Web Publication
>  
> Thanks all. Glad to be here and I think, now that I gave the w3c permission to archive my posts, they'll show up here normally.
>  
> Leonard, good thoughts, thanks! On this though:
>  
>  • “bound” vs. organized:  The word bound, to me, feels more like packaging – and so I think we should avoid it for now.  But it’s a good word for when we get to PWP
>  
> I would argue that a Web Publication, whether packaged or not, must have a sense of boundedness, and that those boundaries, and a user's implicit or explicit understanding of them, are key to exactly what distinguishes a web publication from a website. That is particularly true from a user's (reader's) perspective, whereas yes, I think from a user agent's perspective, it is the manifest. That makes a lot of sense to me.
>  
> A related question I had for you all was around the distinction between a WP and a PWP. To me packaging is a state of a WP, not a separate entity from it. And even our charter describes the PWP as something that we might define and spec out but that we might not, depending on activities elsewhere in the W3C. Shouldn't our definition of a WP then encompass its states more holistically: online v offline, packaged v not packaged, with everything v only with essential resources, etc.?
>  
> Thanks,
> Greg
>  
>  
> 
> 
> On Jul 25, 2017, at 10:54 AM, Leonard Rosenthol <lrosenth@adobe.com> wrote:
> 
> Greg had an excellent point about curation, so let me try to add that in using a term that we’ve been trying out here (so feedback on that welcome too)
>  
> A Web Publication (WP) is an intentionally curated collection of one or more Web resources organized together through a manifest and presented to users using Open Web Platform technologies.
>  
> There were some other things in the suggestion that I didn’t take and I’d like to explain
>  • “bound” vs. organized:  The word bound, to me, feels more like packaging – and so I think we should avoid it for now.  But it’s a good word for when we get to PWP
>  • “uniquely identifiable grouping”: As we have discussed, identification of a WP is a separate issue so that doesn’t belong in the definition
>  • “reading order”: Having this in the manifest definition, I saw no need to duplicate it in the WP definition.
>  
> Leonard
>  
> From: Leonard Rosenthol <lrosenth@adobe.com>
> Date: Tuesday, July 25, 2017 at 1:34 PM
> To: Laurent Le Meur <laurent.lemeur@edrlab.org>, "public-publ-wg@w3.org" <public-publ-wg@w3.org>
> Subject: Re: definition of Web Publication
> Resent-From: <public-publ-wg@w3.org>
> Resent-Date: Tuesday, July 25, 2017 at 1:34 PM
>  
> Laurent - good rewrites, but let me play with it a bit…
>  
> Do we really need the middle sentence? It doesn’t say anything useful (IMO). The first and third, however, are good. We can then put it all together as:
>  
> A Web Publication (WP) is a collection of one or more Web resources organized together through a manifest and presented to users using Open Web Platform technologies.
>  
> Now to apply some simplification to the Manifest definition:
>  
> A manifest is structured information about a Web Publication, such as informative metadata and the default reading order of its primary constituents.
>  
> I’m not thrilled with that since it’s still not clear to me if we want all that stuff (metadata + resources + reading order + ….) in a single “manifest” *or* we will end up with multiple ones (but even then, it may still conceptually be a manifest).
>  
> Thoughts?
>  
> Leonard
>  
> From: Laurent Le Meur <laurent.lemeur@edrlab.org>
> Date: Tuesday, July 25, 2017 at 11:38 AM
> To: "public-publ-wg@w3.org" <public-publ-wg@w3.org>
> Cc: W3C Publishing Working Group <public-publ-wg@w3.org>
> Subject: Re: definition of Web Publication
> Resent-From: <public-publ-wg@w3.org>
> Resent-Date: Tuesday, July 25, 2017 at 11:38 AM
>  
> The current definition is facing a large set of comments. From these comments, I tried a variant of Matt's proposal:
>  
> A Web Publication (WP) is a collection of one or more Web resources organized together through a manifest. The content of a Web Publication can take a wide variety of forms, from formal artistic and intellectual works to ad hoc documents and memos. Web Publications are presented to end-users using Open Web Platform technologies.
>  
> A manifest is the structured information necessary for the proper identification and description of a Web Publication, plus the default reading order of its primary constituents.
>  
> Laurent

Received on Thursday, 27 July 2017 14:04:45 UTC