- From: Ben De Meester <ben.demeester@ugent.be>
- Date: Thu, 28 Jan 2016 11:11:27 +0100
- To: "DPUB mailing list (public-digipub-ig@w3.org)" <public-digipub-ig@w3.org>
- Message-ID: <CAJ-O9TvS_u-0obtg5t9B8K9xjC5EzbTAysOTDW-gcDcOmsV1BA@mail.gmail.com>
Hi all,

Based on the discussion yesterday, I have been musing, and drafted my thoughts below. It is insanely long, sorry for that. The short version is that I make the following statements:

* A PWP locator can be absolute or relative.
* The relative locator allows linking to resources once you know where the PWP is located
* and can be derived using the PWP manifest
* The absolute locator consists of the relative locator and the PWP locator.
* The PWP locator is always in a certain state (e.g., locally unpacked, or hosted packed, or ...)
* However, all instantiations of the PWP link back to the state-less, abstract PWP, via its Canonical URL
* and that Canonical URL needs to point to at least one instantiation of a PWP.
* Thus, a PWP can be referenced using its specific instantiation, or via its Canonical URL.

All of these statements are open to debate of course :).

Also: @Romain: could you give an update on the current state of the use cases, and how we can help you?

Greetings,
Ben

## States - scope

As per the current state of the PWP WD, we scope this work specifically to the case where a PWP can have different states (packed/unpacked, protocol/file), but the contents of the PWP are otherwise exactly the same across those states.

Locating content between PWPs that have different contents (e.g., in another language, or an earlier version) is currently out of scope. Things such as the FRBR model are out of scope, as this is more about identifiers than about locators.

Also, by locators, I mean locators of (entire) PWPs and/or individual resources inside the PWP. For more fine-grained locations (e.g., the second paragraph of document X), other efforts are going on, e.g., in the annotation working group.

## Remark: Absolute vs Relative

As far as I see it, it is possible to have relative and absolute locators, where relative locators will mostly (exclusively?)
be used inside the PWP, and absolute locators might be used for internal links, but probably mostly for external sources linking to the PWP. As such, I think of a locator as having two parts:

[PWP locator]*[resource locator]

In the case of a relative locator, the [PWP locator] is missing, and needs to be derived from context.

### Internal links

Inside the PWP
> i.e., inside the 'container' that holds all contents of the PWP,
> for a packed PWP, this is straightforward, i.e., inside the package,
> for an unpacked PWP,
> I mean inside the subfolder, whether it is file or protocol state

`<p>See <a href="[resource locator]">Section 2</a> for more info.</p>`

Q1: Is this locator the same when

* (Q1a. section 2 is in the same file)
* Q1b. section 2 is a different file, but within the same PWP
* Q1c. the PWP is opened protocol/unpacked
* Q1d. the PWP is opened file/packed
* Q1e. the PWP is opened protocol/packed
* Q1f. the PWP is opened file/unpacked
* Q1g. the PWP is opened in a different protocol (e.g., via http or https or ftp)
* Q1h. the PWP is moved/copied protocol-wise (e.g., from example.com to books.org)
* Q1i. the PWP is moved/copied file-wise (e.g., from /usr/home/ben/ to /user/home/bjdmeest/)
* Q1j. the PWP is packed vs unpacked

### External links

From an (online) website / (offline) paper / ...

`<p>John et al. describe an <a href="[PWP locator][resource locator]">interesting algorithm</a> for this problem.</p>`

Q2: Is this locator the same when

* Q2a. the referring document is actually inside the PWP
* Q2b. the referred PWP is accessed protocol/unpacked
* Q2c. the referred PWP is accessed file/packed
* Q2d. the referred PWP is accessed protocol/packed
* Q2e. the referred PWP is accessed file/unpacked
* Q2f. the referred PWP is accessed in a different protocol (e.g., via http or https or ftp)
* Q2g. the referred PWP is moved/copied protocol-wise (e.g., from example.com to books.org)
* Q2h.
the referred PWP is moved/copied file-wise (e.g., from /usr/home/ben/ to /user/home/bjdmeest/)
* Q2i. the referred PWP is packed vs unpacked

## Idea

Personally, I see this as two different problems, i.e., the [PWP locator] depends on the protocol the PWP is in, whereas the [resource locator] depends on how the packed vs unpacked PWP should be accessed.

To me, the [resource locator] is more technical, i.e., once you have the PWP, you can (probably via the manifest) access and link to the individual resources.

Given the discussion yesterday, I see the following high-level model, to solve the [PWP locator]:

1. Most importantly, a PWP consists of a Canonical URL and some resources.
2. The identifiers of a PWP are, e.g., ISBN numbers, but could coincide with this Canonical URL.
3. The Canonical URL is the reference to the abstract PWP, whereas different State URLs refer to specific instantiations of that PWP.
4. The Canonical URL does not need to be in the same online place as the actual PWP (cf. DOI).
5. The State URLs could be, e.g., the packed version on the publisher's website, the unpacked version on the publisher's website, or the URL of the local copy of the downloaded PWP.

When referencing a publication, the user can reference the Canonical URL or the State URL. When referencing the State URL, the Canonical URL can still be found, as it is part of the PWP.

### (technical) TODOs

* Systems need to be in place to make sure the Canonical URL can refer to at least one State URL, as otherwise only the abstract PWP exists, but no real content.
* It should be specified how a PWP references the Canonical URL.
* It should be specified how to access and link to specific resources in a PWP, via some kind of manifest.

### Fun things

Fun thing #1: the most minimal website can already be a PWP, namely: the Canonical URL is also a State URL to the unpacked protocol version of the PWP.
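To make the model and Fun thing #1 a bit more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not something taken from the PWP draft: the names `PWP`, `has_content` and `absolute_locator` are hypothetical, the example paths are made up, and it only covers the unpacked states, where a relative [resource locator] can be resolved with ordinary URL joining. How to resolve a resource inside a packed PWP is still an open question.

```python
from dataclasses import dataclass, field
from urllib.parse import urljoin


@dataclass
class PWP:
    """Abstract PWP: one Canonical URL plus the State URLs of its instantiations."""
    canonical_url: str
    state_urls: list[str] = field(default_factory=list)

    def has_content(self) -> bool:
        # Without at least one State URL only the abstract PWP exists.
        return len(self.state_urls) > 0


def absolute_locator(pwp_locator: str, resource_locator: str) -> str:
    """Combine a [PWP locator] with a relative [resource locator].

    Assumption: the PWP is unpacked, so standard relative URL
    resolution is enough.
    """
    return urljoin(pwp_locator, resource_locator)


# Fun thing #1: the Canonical URL is also the (only) State URL.
minimal = PWP(
    canonical_url="https://example.com/books/algorithms/",
    state_urls=["https://example.com/books/algorithms/"],
)

# The same relative locator, resolved against two different states.
relative = "chapters/section2.html"
hosted = absolute_locator(minimal.state_urls[0], relative)
local = absolute_locator("file:///usr/home/ben/algorithms/", relative)
print(hosted)
print(local)
```

The point of the sketch is that the relative locator stays the same in both cases; only the [PWP locator] it is resolved against changes with the state.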
Fun thing #2: a user can remix the local PWP as much as he likes -- e.g., stripping out all the videos to create a 'slim' PWP and republishing it -- and the remixed PWP could still refer to the 'official' PWP via its Canonical URL. The publisher still keeps authority over 'correct' PWPs, as the Canonical URL does not need to refer to the remixed PWP, but only to the authorized PWPs. Add in checksums etc., and any user can verify whether a received PWP is the same as the published PWP.

### Bad things

Bad thing #1: there is an insane amount of pressure on the Canonical URL. If this URL dies, then all instantiations of the PWP are disconnected.

Ben De Meester
Researcher Semantic Web
Ghent University - iMinds - Data Science Lab | Faculty of Engineering and Architecture | Department of Electronics and Information Systems
Sint-Pietersnieuwstraat 41, 9000 Ghent, Belgium
t: +32 9 331 49 59 | e: ben.demeester@ugent.be | URL: http://users.ugent.be/~bjdmeest/
Received on Thursday, 28 January 2016 10:12:20 UTC