- From: Nick Ruffilo <nickruffilo@gmail.com>
- Date: Mon, 1 Jun 2015 11:04:13 -0400
- To: Brady Duga <duga@google.com>
- Cc: Ivan Herman <ivan@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ralph Swick <swick@w3.org>
- Message-ID: <CA+Dds58a+mCWRAK=G=ABf9A33mu0JkaEgxPKgBKTb0ka3uC3xA@mail.gmail.com>
Brady,

No matter what, there is still a package file. The real question is: is that package file a "zip", as it is today, or some other logical grouping? I'm conceptually unable to figure out a situation in which there is not a "package file". Even if it's a directory, that is just nomenclature: a directory is still a package.

On Mon, Jun 1, 2015 at 10:58 AM, Brady Duga <duga@google.com> wrote:

> If there is no package file, do these problems still exist? It seems like
> that is mentioned in the original email, but I am not sure if any of these
> cons apply to it.
>
> On Mon, Jun 1, 2015 at 7:12 AM Nick Ruffilo <nickruffilo@gmail.com> wrote:
>
>> What if we leave it up to the client/server to determine what the root of
>> the package is and handle it appropriately?
>>
>> So, an epub-web object (or whatever we call it) might live at:
>> //my/item/awesome.epub
>>
>> To address a specific FILE in that, you go to
>> //my/item/awesome.epub/text/chap2.html
>>
>> To get to a fragment, you just use # in reference to whatever the
>> fragment is:
>> //my/item/awesome.epub/text/chap2.html#first_header
>> //my/item/awesome.epub#SomeCrazyTextRangeIdentifier
>>
>> If run on a server, it would be the server's job to extract the
>> appropriate package files (when thinking about epub, the OPF for example)
>> and provide that to the client, who can then determine the resources it
>> needs and request them from the server.
>>
>> When run LOCALLY, the client will simply extract the package files
>> directly. Otherwise there is no duplication of work or resources, etc.
>>
>> There was a note about the fragment (things after the #) not being sent
>> to the server. If that is truly the case - and not just that the server
>> ignores it - then we would need a DIFFERENT marker. Which one, I have no
>> idea...
>>
>> -Nick
>>
>> On Mon, Jun 1, 2015 at 6:22 AM, Ivan Herman <ivan@w3.org> wrote:
>>
>>> Hi all,
>>>
>>> my sincere apologies for the length of this mail, but I thought it would
>>> be worthwhile to get some issues written down to clarify our discussions...
>>>
>>> At the F2F meeting I made the claim that the identifier/fragment issue
>>> may be the most tricky one facing us around EPUB-WEB. I thought it was
>>> worth writing this down; maybe somebody can also prove me wrong and show
>>> that this is not such a complex issue after all. Actually, what is below
>>> is a summary of a very short email/personal discussion Markus, Tzviya,
>>> and I had on the matter after the F2F. (At some point it is probably
>>> worth writing down the conclusions of this thread somewhere on the wiki.)
>>>
>>> With that, here is where I see a real problem.
>>>
>>> Let us consider a Packaged Document. The URL of this document is
>>> http://www.example.org/doc. The document includes, among others,
>>> chapter 2 in file chap2.html. This has a section whose ID is 'sec' (for the
>>> sake of simplicity, I consider here the simplest and best known fragment
>>> used in an HTML file, ie, using the @id attribute on, say, an <h1> element).
>>> The question arising is: what is the full URI for that section? Or, to be
>>> more exact, what is the full, *canonical* URI for that section, ie, a URI
>>> that is independent of whether the document is off-line or on-line?
>>>
>>> An Aside: How do URI-s work?
>>> ----------------------------
>>>
>>> Tzviya told me privately that not everyone on the group may know how
>>> exactly URI-s and fragments work in browsers and on the Web. So maybe just
>>> a few words may be relevant here. If you know this, my apologies, you can
>>> just skip this part.
>>>
>>> A URL consists of, roughly, two parts:
>>>
>>> - A "primary" address that identifies the resource somewhere on the web.
>>>   Say, 'http://xyx.example.com/mydoc'
>>> - A "fragment", that is added after the '#' sign, which identifies
>>>   something *within* the resource; say, 'mysection'
>>>
>>> There are two points to take into account in handling this:
>>>
>>> - There can be *only one fragment id in a URL*, ie, only one occurrence
>>>   of '#'. What is after the '#' is interpreted in accordance with a
>>>   corresponding specification that is bound to the media type of the
>>>   resource
>>>
>>> - A Web browser interprets the fragment locally. Ie, if it gets
>>>   'http://xyx.example.com/mydoc#mysection' it
>>>      1. strips the fragment
>>>      2. issues a request, through the HTTP protocol, for '/mydoc'
>>>         to the 'http://xyx.example.com' server
>>>      3. gets the full resource and then uses the fragment (i.e.,
>>>         'mysection') to identify something within the returned resource.
>>>
>>>
>>> What is the URI with fragment for section 'sec' in a package?
>>> -------------------------------------------------------------
>>>
>>> (For the sake of this discussion I refer to the way the packaging
>>> specification works in terms of fragments.)
>>>
>>> 1. If http://www.example.org/doc refers to a real, physical package on
>>> the Web, the URL for accessing 'sec' in chap2.html, using the current
>>> fragment specification in the packaging document, would be:
>>>
>>> http://www.example.org/doc#url=/chap2.html;fragment=sec
>>>
>>> meaning:
>>>      1. The client retrieves the package http://www.example.org/doc
>>>      2. It unpacks the package in a local cache (or equivalent)
>>>      3. It interprets the fragment 'url=/chap2.html;fragment=sec'
>>>         (per the current specification of packaging) by
>>>            3.1. identifying the 'part' within the package, yielding
>>>                 'chap2.html'
>>>            3.2. 'chap2.html' is an HTML file, and the client
>>>                 knows how to identify something within the file with a
>>>                 fragment, ie, it gets to section 'sec'
>>>
>>> It is important to realize that, in this model, the 'unpackaging' is
>>> done by the client (the browser, i.e., the reading system)
>>>
>>> 2. If the package is just 'virtual', ie, all documents are on the Web,
>>> then there is of course a much simpler approach. The URL of the section is
>>>
>>> http://www.example.org/doc/chap2.html#sec
>>>
>>> meaning
>>>      1. The client retrieves the HTML document
>>>         http://www.example.org/doc/chap2.html
>>>      2. It knows how to identify something within the HTML file with
>>>         a fragment, ie, it gets to section 'sec'
>>>
>>>
>>> Back to the original question: what is the 'canonical' URI with fragment?
>>> -------------------------------------------------------------------------
>>>
>>> It should be one of the two above. However, both have issues:
>>>
>>> A. http://www.example.org/doc/chap2.html#sec
>>>
>>> Pro: this is the 'natural', Web way.
>>>
>>> Con 1: *if* the document is, in fact, a real package then there are two
>>> possible approaches to handle this:
>>>
>>> Con 1.1: The *server* handles the unpackaging. Ie, it should be in a
>>> position to analyze the URL it receives, realize that there is a 'package'
>>> in between and do an unpackaging. What this would mean is that the client
>>> would have to make requests for all chapters separately, which is not
>>> optimal (although it can of course be cached).
>>>
>>> Con 1.2: The *client* handles unpackaging. This would require a
>>> different server-client protocol, namely:
>>>      1. The client issues a request to
>>>         'http://www.example.org/doc/chap2.html'
>>>      2. The server returns 'http://www.example.org/doc/' as a
>>>         package instead of the original chap2.html file (ie, the server
>>>         should know that this is part of a package through some
>>>         redirection)
>>>      3. The client should then unpack and locate the chap2.html file
>>>         in the package
>>>      4. the fragment should be identified and handled.
>>>
>>> Steps 1-2-3 are not the current practice on the Web in terms of Web
>>> Architecture: a client does not 'decompose' the 'primary' part of a URL
>>> (beyond separating the server's identification from the part within that
>>> server). It is unclear whether changing that is viable/acceptable for the
>>> browsers, and for the overall Web Architecture; it certainly requires a
>>> discussion with the TAG.
>>>
>>> Con 2: If the URL is, in fact, a file:///... type one, this means that,
>>> for that case, the unpackaging must be done on the client. Ie, there may be
>>> duplication of functionality between the server and the client, which is
>>> not optimal.
>>>
>>> B. http://www.example.org/doc#url=/chap2.html;fragment=sec
>>>
>>> Pro: this works for a package.
>>>
>>> For a document on the Web, it may also work if there is a 'conceptual'
>>> entity on the Web for the document. I.e., http://www.example.org/doc
>>> returns some sort of information to the client that this is, in fact, a
>>> 'virtual' package, and then the client can issue a new request to
>>> http://www.example.org/doc/chap2.html and take it from there.
>>>
>>> (Note that, regardless of the original issue, having a 'conceptual'
>>> package handle for a document may not be a bad thing!)
>>>
>>> Con: The URL form is (much) more complex, and may be in danger of being
>>> ignored for documents that are on the Web only.
>>>
>>> Personally, I do not have a clear solution in my head. Hence this mail,
>>> trying to see how we can move on...
>>>
>>> Let me also add another remark, coming originally from Tzviya, just to
>>> add it to the mix: "We need to think about situations such as multiple
>>> authors creating one package or peer review (one or many authors + one or
>>> many editors submit article + data set to journal for review. It undergoes
>>> peer review by one or many reviewers. Journal rejects the article.
>>> Something happens to the reviews, and the package is submitted to a second
>>> journal) and so on. In scenarios like this, the concepts of versioning and
>>> revisioning are a lot more important. It may be covered by OA. I don’t know
>>> that we can resolve versioning with an identifier."
>>>
>>> (Again, apologies for being so verbose…)
>>>
>>> Ivan
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>
>>
>> --
>> - Nick Ruffilo
>> @NickRuffilo
>> http://Aerbook.com
>> http://ZenOfTechnology.com <http://zenoftechnology.com/>
>>

--
- Nick Ruffilo
@NickRuffilo
http://Aerbook.com
http://ZenOfTechnology.com <http://zenoftechnology.com/>
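
For readers comparing the two candidate URI forms discussed in this thread, here is a minimal Python sketch of how a client might split each one into "what is requested from the server" and "what is resolved locally". The helper names (parse_package_fragment, resolve) are hypothetical, and the 'url=...;fragment=...' handling is only a guess based on the single example in the thread, not an implementation of the packaging specification.

```python
from urllib.parse import urldefrag


def parse_package_fragment(fragment):
    """Split 'url=/chap2.html;fragment=sec' into key/value pairs.

    Hypothetical helper: the syntax is inferred from the one example
    in the thread, not taken from the packaging specification.
    """
    fields = {}
    for item in fragment.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip items without '=', e.g. a plain HTML id
            fields[key] = value
    return fields


def resolve(url):
    """Return what the client requests and what it resolves locally."""
    # The fragment never reaches the server; the client strips it first.
    base, fragment = urldefrag(url)
    fields = parse_package_fragment(fragment)
    if "url" in fields:
        # Form B: fetch the package, then locate the part and the target.
        return {
            "request": base,
            "part": fields["url"],
            "target": fields.get("fragment", ""),
        }
    # Form A: fetch the HTML file itself; the fragment is a plain id.
    return {"request": base, "part": None, "target": fragment}


print(resolve("http://www.example.org/doc#url=/chap2.html;fragment=sec"))
print(resolve("http://www.example.org/doc/chap2.html#sec"))
```

In both cases only the "request" value would go over HTTP; everything else is interpreted on the client, which is why form A needs either server-side unpackaging or a changed client-server protocol when the document is a real package.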
Received on Monday, 1 June 2015 15:04:41 UTC