On Fri, Feb 22, 2013 at 12:55 AM, Tom Morris <tfmorris@gmail.com> wrote:
> PG does all kinds of weird stuff. They insisted on 7-bit ASCII for ages
> after everyone else moved to ISO Latin-1. They strip all edition
> information claiming that they are creating new editions (which means none
> of the citations would be any good anyway since you can't match them up
> with the correct edition).
>
> If you look at the millions of PD books in the Internet Archive,
> HathiTrust, Google Books, etc, you'll see that they certainly do include
> page information. It's only the few thousand in the quirky Project
> Gutenberg which don't (and even PG has that information at the beginning of
> the process until they intentionally throw it away).
>
It is not only a PG issue: many other digital libraries don't signal page
breaks, or don't use any standard method to indicate them. Even on
Wikisource there are many transcribed texts that mention the edition but
carry no pagination information. One possible solution could be to offer
several scoping options (default: whole document, page number, CSS
fragment, paragraph+delimiter, etc.) and then apply a finer text selection
within that area (character count or quote selector).
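
To make the two-step idea a bit more concrete, here is a rough sketch in
Python. All names (Scope, resolve_scope, select_quote, select_offsets) are
hypothetical and only meant as illustration. It covers just the "whole
document" and "paragraph+delimiter" scopes, since page-number and CSS
scopes depend on markup that a plain transcription may not have.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Scope:
    """Coarse area to search in (hypothetical structure, not a standard)."""
    kind: str = "document"    # "document" or "paragraph"
    index: int = 0            # 0-based paragraph number, for "paragraph" only
    delimiter: str = "\n\n"   # paragraph delimiter, for "paragraph" only


def resolve_scope(text: str, scope: Scope) -> Tuple[int, int]:
    """Return the (start, end) character offsets of the scoped area."""
    if scope.kind == "document":
        return 0, len(text)
    if scope.kind == "paragraph":
        start = 0
        # Skip over `index` delimiters to reach the requested paragraph.
        for _ in range(scope.index):
            pos = text.find(scope.delimiter, start)
            if pos == -1:
                raise ValueError("paragraph index out of range")
            start = pos + len(scope.delimiter)
        end = text.find(scope.delimiter, start)
        return start, (len(text) if end == -1 else end)
    raise ValueError(f"unsupported scope kind: {scope.kind}")


def select_quote(text: str, scope: Scope, exact: str) -> Optional[Tuple[int, int]]:
    """Quote selector: first occurrence of `exact` inside the scope, or None."""
    start, end = resolve_scope(text, scope)
    pos = text.find(exact, start, end)
    if pos == -1:
        return None
    return pos, pos + len(exact)


def select_offsets(text: str, scope: Scope, offset: int, length: int) -> Tuple[int, int]:
    """Character-count selector: `length` characters at `offset` within the scope."""
    start, end = resolve_scope(text, scope)
    begin = start + offset
    if offset < 0 or length < 0 or begin + length > end:
        raise ValueError("selection falls outside the scope")
    return begin, begin + length


text = (
    "Call me Ishmael.\n\n"
    "Some years ago, never mind how long.\n\n"
    "Call me again."
)

# The same quote resolves to different places depending on the scope.
whole = select_quote(text, Scope(), "Call me")
third = select_quote(text, Scope("paragraph", index=2), "Call me")

# Character-count refinement inside the second paragraph.
s, e = select_offsets(text, Scope("paragraph", index=1), offset=5, length=5)
word = text[s:e]
```

The point of the scope is that an ambiguous quote ("Call me" appears twice
here) becomes unambiguous once the search area is narrowed, and character
offsets stay small and less fragile because they are counted from the start
of the scope instead of the start of the document.
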
Btw, if anyone has a contact at PG, I'd love to talk with them.
David