- From: Nick Ruffilo <nickruffilo@gmail.com>
- Date: Mon, 1 Jun 2015 11:03:04 -0400
- To: Ivan Herman <ivan@w3.org>
- Cc: W3C Digital Publishing IG <public-digipub-ig@w3.org>, Ralph Swick <swick@w3.org>
- Message-ID: <CA+Dds5_+O1jWXu6erTfTakUWjd=4DLG6k++B+yzfmd7-KGx9Vw@mail.gmail.com>
Ivan,

Yes, I'm saying that the server would unpack and provide only the manifest, OPF, etc. It would then be the client that requests specific items (or all items) within the package. While a server may not be able to do that out of the box today, there's nothing stopping a server from unzipping that epub and doing it today... There are also existing Apache extensions that let you use a ZIP file as a directory, so while they need to be installed, they do work and enable people to use that information today.

The ability to send just the manifest does not exist, but I am not sure that's a real issue. You could gracefully degrade:

1) If the server CANNOT handle sending the manifest, send the entire epub to the client and let the client deal with it.
2) If the server CAN handle sending the manifest, awesome.

#1 isn't a great solution for very large epubs, but services that are releasing very large epubs can just ensure their servers support sending the manifest.

On Mon, Jun 1, 2015 at 10:58 AM, Ivan Herman <ivan@w3.org> wrote:

> > On 01 Jun 2015, at 16:11, Nick Ruffilo <nickruffilo@gmail.com> wrote:
> >
> > What if we leave it up to the client/server to determine what the root of the package is and handle it appropriately?
> >
> > So, an epub-web object (or whatever we call it) might live at:
> > //my/item/awesome.epub
> >
> > To address a specific FILE in that, you go to
> > //my/item/awesome.epub/text/chap2.html
> >
> > To get to a fragment, you just use # in reference to whatever the fragment is:
> > //my/item/awesome.epub/text/chap2.html#first_header
> > //my/item/awesome.epub#SomeCrazyTextRangeIdentifier
> >
> > If run on a server, it would be the server's job to extract the appropriate package files (when thinking about epub, the OPF for example) and provide that to the client, who can then determine the resources it needs and request them from the server.
>
> I am not sure what you mean. Do you mean the server
>
> - unpacks the package and returns the html file (chap2.html); or
> - sends the full package to the client that would then take care of unpackaging?
>
> The first one is what I referred to in Con 1.1; the second is Con 1.2.
>
> Actually, an extra 'Con' to 1.1 is that we would need special servers. Ie, a person cannot 'just' put an EPUB-WEB document on a Web site (eg, a self publisher) unless he/she installs an extension to the server. And that may be a drag.
>
> > When run LOCALLY, the client will simply extract the package files directly. Otherwise there is no duplication of work or resources, etc.
> >
> > There was a note about the fragment (things after the #) not being sent to the server. If that is truly the case (and not just that the server ignores it), we would need a DIFFERENT marker; what, I have no idea…
>
> I do not think the fragment is really a problem. Once the client has the resource, it can take care of it.
>
> Ivan
>
> > -Nick
> >
> > On Mon, Jun 1, 2015 at 6:22 AM, Ivan Herman <ivan@w3.org> wrote:
> > Hi all,
> >
> > my sincere apologies for the length of this mail, but I thought it would be worthwhile to get some issues written down to clarify our discussions...
> >
> > On the F2F meeting I made the claim that the identifier/fragment issue may be the most tricky one facing us around EPUB-WEB. I thought it is worth writing this down; maybe somebody can also prove me wrong that this is not such a complex issue after all. Actually, what is below is a summary of a very short email/personal discussion Markus, Tzviya, and I had on the matter after the F2F. (At some point it is probably worth writing down the conclusions of this thread somewhere on the wiki.)
> >
> > With that, here is where I see a real problem.
> >
> > Let us consider a Packaged Document. The URL of this document is http://www.example.org/doc. The document includes, among others, chapter 2 in file chap2.html. This has a section whose ID is 'sec' (for the sake of simplicity, I consider here the simplest and best known fragment used in an HTML file, ie, using the @id attribute on a, say, <h1> element). The question arising is: what is the full URI for that section? Or, to be more exact, what is the full, *canonical* URI for that section, ie, a URI that is independent of whether the document is off-line or on-line?
> >
> > An Aside: How do URI-s work?
> > ----------------------------
> >
> > Tzviya told me privately that not everyone on the group may know how exactly URI-s and fragments work in browsers and on the Web. So maybe just a few words may be relevant here. If you know this, my apologies, you can just skip this part.
> >
> > A URL consists of, roughly, two parts:
> >
> > - A "primary" address that identifies the resource somewhere on the web. Say, 'http://xyx.example.com/mydoc'
> > - A "fragment", that is added after the '#' sign, which identifies something *within* the resource; say, 'mysection'
> >
> > There are two points to take into account in handling this:
> >
> > - There can be *only one fragment id in a URL*, ie, only one occurrence of '#'. What is after the '#' is interpreted in accordance with a corresponding specification that is bound to the media type of the resource
> >
> > - A Web browser interprets the fragment locally. Ie, if it gets 'http://xyx.example.com/mydoc#mysection' it
> >     1. strips the fragment
> >     2. issues a request, through the HTTP protocol, for '/mydoc' to the 'http://xyx.example.com' server
> >     3. gets the full resource and then uses the fragment (i.e., 'mysection') to identify something within the returned resource.
> >
> > What is the URI with fragment for section 'sec' in a package?
> > -------------------------------------------------------------
> >
> > (For the sake of this discussion I refer to the way the packaging specification works in terms of fragments.)
> >
> > 1. If http://www.example.org/doc refers to a real, physical package on the Web, accessing 'sec' in chap2.html, using the current fragment specification in the packaging document, would be:
> >
> >     http://www.example.org/doc#url=/chap2.html;fragment=sec
> >
> > meaning:
> >     1. The client retrieves the package http://www.example.org/doc
> >     2. Unpackages the package in a local cache (or equivalent)
> >     3. It interprets the fragment 'url=/chap2.html;fragment=sec' (per the current specification of packaging) by
> >         3.1. identifying the 'part' within the package, yielding 'chap2.html'
> >         3.2. 'chap2.html' is an HTML file; because the client knows how to identify something within the file with a fragment, it gets to section 'sec'
> >
> > It is important to realize that, in this model, the 'unpackaging' is done by the client (the browser, i.e., the reading system)
> >
> > 2. If the package is just 'virtual', ie, all documents are on the Web, then there is of course a much simpler approach. The URL of the section is
> >
> >     http://www.example.org/doc/chap2.html#sec
> >
> > meaning
> >     1. The client retrieves the HTML document http://www.example.org/doc/chap2.html
> >     2. It knows how to identify something within the HTML file with a fragment, ie, it gets to section 'sec'
> >
> > Back to the original question: what is the 'canonical' URI with fragment?
> > -------------------------------------------------------------------------
> >
> > It should be one of the two above. However, both have issues:
> >
> > A. http://www.example.org/doc/chap2.html#sec
> >
> > Pro: this is the 'natural', Web way.
> >
> > Con 1: *if* the document is, in fact, a real package then there are two possible approaches to handle this:
> >
> > Con 1.1: The *server* handles the unpackaging. Ie, it should be in a position to analyze the URL it receives, realize that there is a 'package' in between and do an unpackaging. What this would mean is that the client would have to make requests for all chapters separately, which is not optimal (although it can of course be cached).
> >
> > Con 1.2: The *client* handles unpackaging. This would require a different server-client protocol, namely:
> >     1. The client issues a request to 'http://www.example.org/doc/chap2.html'
> >     2. The server returns 'http://www.example.org/doc/' as a package instead of the original chap2.html file (ie, the server should know that this is part of a package through some redirection)
> >     3. The client should then unpack and locate the chap2.html file in the package
> >     4. The fragment should be identified and handled.
> >
> > Steps 1-2-3 are not the current practice on the Web in terms of Web Architecture: a client does not 'decompose' the 'primary' part of a URL (beyond separating the server's identification from the part within that server). It is unclear whether changing that is viable/acceptable for the browsers, and for the overall Web Architecture; it certainly requires a discussion with the TAG.
> >
> > Con 2: If the URL is, in fact, a file:///... type one, this means that, for that case, the unpackaging must be done on the client. Ie, there may be duplication of functionality with the server and the client, which is not optimal.
> >
> > B. http://www.example.org/doc#url=/chap2.html;fragment=sec
> >
> > Pro: this works for a package.
> >
> > For a document on the Web, it may also work if there is a 'conceptual' entity on the Web for the document. I.e., http://www.example.org/doc returns some sort of information to the client that this is, in fact, a 'virtual' package, and then the client can issue a new request to http://www.example.org/doc/chap2.html and take it from there.
> >
> > (Note that, regardless of the original issue, having a 'conceptual' package handle for a document may not be a bad thing!)
> >
> > Con: The URL form is (much) more complex, and may be in danger of being ignored for documents that are on the Web only.
> >
> > Personally, I do not have a clear solution in my head. Hence this mail, trying to see how we can move on...
> >
> > Let me also add another remark, coming originally from Tzviya, just to add it to the mix: "We need to think about situations such as multiple authors creating one package or peer review (one or many authors + one or many editors submit article + data set to journal for review. It undergoes peer review by one or many reviewers. Journal rejects the article. Something happens to the reviews, and the package is submitted to a second journal) and so on. In scenarios like this, the concepts of versioning and revisioning are a lot more important. It may be covered by OA. I don’t know that we can resolve versioning with an identifier."
> >
> > (Again, apologies to be so verbose…)
> >
> > Ivan
> >
> > ----
> > Ivan Herman, W3C
> > Digital Publishing Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > ORCID ID: http://orcid.org/0000-0003-0782-2704
> >
> > --
> > - Nick Ruffilo
> > @NickRuffilo
> > http://Aerbook.com
> > http://ZenOfTechnology.com
>
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704

--
- Nick Ruffilo
@NickRuffilo
http://Aerbook.com
http://ZenOfTechnology.com
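
As a rough illustration of the addressing idea discussed in this thread (a package URL, a path to a file inside the package, and a fragment that never reaches the server), here is a minimal Python sketch. It is not part of any specification or of the original messages: the function names, the '.epub' suffix test, the www.example.org host, and the in-memory demo package are all assumptions made for the example. It only relies on the standard urllib.parse and zipfile modules.

```python
import io
import zipfile
from urllib.parse import urldefrag, urlsplit

PACKAGE_SUFFIX = ".epub"  # assumption: a package is recognised by its file suffix


def split_package_url(url):
    """Split a URL into (package path, path inside the package, fragment).

    The fragment is removed first, mirroring what a browser does before it
    issues an HTTP request; it is only used later, on the client side.
    """
    base, fragment = urldefrag(url)
    segments = urlsplit(base).path.split("/")
    for index, segment in enumerate(segments):
        if segment.endswith(PACKAGE_SUFFIX):
            package_path = "/".join(segments[: index + 1])
            inner_path = "/".join(segments[index + 1:])
            return package_path, inner_path, fragment
    # No package segment found: treat the whole path as an ordinary resource.
    return "/".join(segments), "", fragment


def read_member(package_bytes, inner_path):
    """Return one file from a ZIP-based package without unpacking it to disk."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read(inner_path)


def build_demo_package():
    """Build a tiny in-memory ZIP standing in for a real package (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("text/chap2.html", '<h1 id="first_header">Chapter 2</h1>')
    return buffer.getvalue()


if __name__ == "__main__":
    url = "http://www.example.org/my/item/awesome.epub/text/chap2.html#first_header"
    package_path, inner_path, fragment = split_package_url(url)
    print(package_path, inner_path, fragment)
    print(read_member(build_demo_package(), inner_path).decode("utf-8"))
```

Whether this splitting runs on the server (Con 1.1) or on the client after it has received the whole package (Con 1.2) is exactly the open question above; the sketch takes no position on that and only shows that the decomposition itself is mechanical.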
Received on Monday, 1 June 2015 15:03:33 UTC