Re: TheWebConf 2018 Trip Report from Henry Story on 2018-05-01 (semantic-web@w3.org from May 2018)

From: Henry Story <henry.story@bblfish.net>
Date: Tue, 1 May 2018 17:52:53 +0100
To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
Cc: "me@harshp.com" <me@harshp.com>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-Id: <83A3287B-D270-4DCE-A482-F104F19FC0C6@bblfish.net>
> On 1 May 2018, at 17:31, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk> wrote:
> 
> Your ePub is great! And very much more tablet-friendly than PDFs and even web HTMLs (which are hard to make offline).
> 
> And I would say much better than DOIs to landing pages to PDFs, but it is not ideal from a Linked Research perspective..
> 
> 
> I can't annotate or share inner URIs like file:///run/user/1000/gvfs/archive:host=file%25253A%25252F%25252F%25252Ftmp%25252Fmozilla_stain0%25252FEpistemology%25252520In%25252520The%25252520Cloud.epub/OPS/chapter-1.xhtml  (puh)

Does it help if I unzip it?

     http://bblfish.net/blog/2018/04/21/epub/

At least that makes for dereferenceable URLs (and allows people here to inspect that epub
online to better follow your points below).


> 
> We can however combine ePub (and other Web Packaging efforts, see https://w3c.github.io/publ-cg/ ) with a URI scheme like my proposed arcp:// https://tools.ietf.org/id/draft-soilandreyes-arcp-03.html to reference files inside:
> 
> UUID-v5 from your location
> 
>>>> arcp.arcp_location("http://bblfish.net/blog/2018/04/21/Epistemology%20In%20The%20Cloud.epub", "OPS/chapter1.xhtml")
> 'arcp://uuid,925c4933-ebba-52e4-a3df-0d0a0138c7c6/OPS/chapter1.xhtml'
> 
> Or, if you are ePub aware, using the existing 3A7D8D27-320B-4325-93A1-87655295E39A bookID inside epb.opf after parsing the rootfile - as it happens at least in this case to look like a UUIDv4 (the Python method will verify):
> 
>>>> arcp.arcp_random(uuid="3A7D8D27-320B-4325-93A1-87655295E39A", path="OPS/chapter1.xhtml")
> 'arcp://uuid,3a7d8d27-320b-4325-93a1-87655295e39a/OPS/chapter1.xhtml'
> 
> In this approach anyone getting your epub from anywhere (e.g. Dropbox) will still get the same URI.
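
For anyone following along without the arcp package installed, here is a minimal sketch of the two UUID-based forms using only the Python standard library. It assumes, from my reading of the arcp draft, that the location-based form is a UUID v5 of the archive URL in the URL namespace. The function names are made up for illustration and are not the arcp API.

```python
import uuid
from urllib.parse import quote


def arcp_from_location(location, path="/"):
    """Name-based form: UUID v5 of the archive URL (assumed URL namespace)."""
    archive_id = uuid.uuid5(uuid.NAMESPACE_URL, location)
    return "arcp://uuid,%s/%s" % (archive_id, quote(path.lstrip("/")))


def arcp_from_uuid(existing, path="/"):
    """Reuse an identifier already carried by the archive, e.g. the book ID."""
    archive_id = uuid.UUID(existing)  # raises ValueError if it is not a UUID
    return "arcp://uuid,%s/%s" % (archive_id, quote(path.lstrip("/")))


if __name__ == "__main__":
    epub = "http://bblfish.net/blog/2018/04/21/Epistemology%20In%20The%20Cloud.epub"
    print(arcp_from_location(epub, "OPS/chapter1.xhtml"))
    print(arcp_from_uuid("3A7D8D27-320B-4325-93A1-87655295E39A", "OPS/chapter1.xhtml"))
```

The first form only depends on where the epub was fetched from, the second only on what the epub says about itself, which is why the second survives a copy to another host.
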
> 
> 
> This depends on which uniqueness constraints you want, obviously, the very locked-down one would use hash of the epub file:
> 
>>>> arcp.arcp_hash(bytes, path="OPS/chapter1.xhtml")
> 'arcp://ni,sha-256;kqdD0q0zGAx7WmucTW36ZJlKA_QUPvVVy6I-z_YPfKg/OPS/chapter1.xhtml'
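
The hash-based form can be sketched the same way with the standard library. This assumes the authority is "ni," followed by the RFC 6920 algorithm name and the unpadded base64url SHA-256 digest of the archive bytes, which is how I read the example above. `arcp_from_bytes` is a hypothetical helper name, not part of the arcp package.

```python
import base64
import hashlib


def arcp_from_bytes(data, path="/"):
    """Hash-based arcp URI: SHA-256 of the archive, base64url without padding."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "arcp://ni,sha-256;%s/%s" % (encoded, path.lstrip("/"))


if __name__ == "__main__":
    # "book.epub" is a placeholder for a local copy of the archive.
    with open("book.epub", "rb") as f:
        print(arcp_from_bytes(f.read(), "OPS/chapter1.xhtml"))
```

Any change to the zip, even re-compressing the same files, gives a different URI, so this is the strictest of the three options.
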
> 
> 
> 
> However in all of these you won't know where to get the epub from given that chapter1 URI. So that is the main problem with archive formats like epub.. you don't get retrievable URLs.
> 
> 
> For arcp you would need to either keep track of known locations, or use, as above, the RFC6920 /.well-known/ni/ protocol to look up the hash in some known epub repository.
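
A rough sketch of that lookup, assuming some repository exposes the RFC 6920 well-known path. The host `epub.example.org` is invented for illustration, and `well_known_ni_url` is a hypothetical helper.

```python
def well_known_ni_url(arcp_uri, repository):
    """Map a hash-based arcp URI to an RFC 6920 .well-known/ni URL on a host."""
    prefix = "arcp://ni,"
    if not arcp_uri.startswith(prefix):
        raise ValueError("not a hash-based arcp URI: %r" % arcp_uri)
    # The authority runs up to the first "/" and looks like "sha-256;<digest>".
    authority = arcp_uri[len(prefix):].split("/", 1)[0]
    algorithm, sep, value = authority.partition(";")
    if not sep or not value:
        raise ValueError("missing digest in %r" % arcp_uri)
    return "https://%s/.well-known/ni/%s/%s" % (repository, algorithm, value)


if __name__ == "__main__":
    chapter = ("arcp://ni,sha-256;kqdD0q0zGAx7WmucTW36ZJlKA_QUPvVVy6I-z_YPfKg"
               "/OPS/chapter1.xhtml")
    # epub.example.org is a made-up repository host.
    print(well_known_ni_url(chapter, "epub.example.org"))
```

This only locates the archive as a whole. The path inside it (OPS/chapter1.xhtml) still has to be resolved after download.
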
> 
> 
> So we can combine this with ORE to say where the epub archive, our aggregation, came from. 
> 
> 
> <arcp://uuid,925c4933-ebba-52e4-a3df-0d0a0138c7c6/OPS/chapter1.xhtml> ore:isAggregatedBy <arcp://uuid,925c4933-ebba-52e4-a3df-0d0a0138c7c6/>
> 
> and PROV to say the aggregation is a variant of the epub file (it might also be elsewhere, so I won't do specialization hierarchy)
> 
> <arcp://uuid,925c4933-ebba-52e4-a3df-0d0a0138c7c6/> prov:alternateOf <http://bblfish.net/blog/2018/04/21/Epistemology%20In%20The%20Cloud.epub> .
> 
> 
> More specific with PAV, particularly if you are doing byte snapshots:
> 
> <arcp://uuid,925c4933-ebba-52e4-a3df-0d0a0138c7c6/> pav:retrievedFrom <http://bblfish.net/blog/2018/04/21/Epistemology%20In%20The%20Cloud.epub> ;
>  pav:retrievedOn "2018-05-01T17:30:11+01:00"^^xsd:dateTime;
>  pav:retrievedBy <https://orcid.org/0000-0001-9842-9718> .

So perhaps one also needs a relation from the unzipped epub folder to the zipped epub file.
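
To make that concrete, here is a small sketch that writes the statements above as N-Triples with plain string formatting (no RDF library). The relation from the unzipped folder to the epub file is my assumption: prov:wasDerivedFrom is one possible choice, not something agreed in this thread. `describe_archive` is a hypothetical helper.

```python
ORE = "http://www.openarchives.org/ore/terms/"
PROV = "http://www.w3.org/ns/prov#"
PAV = "http://purl.org/pav/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value):
    return "<%s>" % value


def triple(subject, predicate, obj):
    return "%s %s %s ." % (iri(subject), iri(predicate), obj)


def describe_archive(base, source, paths, retrieved_on, unzipped=None):
    """Return N-Triples lines linking an arcp base URI to its epub source."""
    lines = [triple(base + p, ORE + "isAggregatedBy", iri(base)) for p in paths]
    lines.append(triple(base, PROV + "alternateOf", iri(source)))
    lines.append(triple(base, PAV + "retrievedFrom", iri(source)))
    lines.append(triple(base, PAV + "retrievedOn",
                        '"%s"^^<%sdateTime>' % (retrieved_on, XSD)))
    if unzipped:
        # Assumption: the unzipped folder is modelled as derived from the epub.
        lines.append(triple(unzipped, PROV + "wasDerivedFrom", iri(source)))
    return lines


if __name__ == "__main__":
    for line in describe_archive(
            "arcp://uuid,925c4933-ebba-52e4-a3df-0d0a0138c7c6/",
            "http://bblfish.net/blog/2018/04/21/Epistemology%20In%20The%20Cloud.epub",
            ["OPS/chapter1.xhtml"],
            "2018-05-01T17:30:11+01:00",
            unzipped="http://bblfish.net/blog/2018/04/21/epub/"):
        print(line)
```
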

> 
> 
> And then of course once you have all of that, THEN you can finally start using Web Annotation Model to start annotating paragraphs, figures, etc.
> 
> 
> But if it's straight on the web.. a bit simpler. :) 

So perhaps to get started all one would need is a little epub JS viewer that could fetch
the unzipped files and show them?
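
The part such a viewer needs before it can show anything is the reading order. A sketch of that step in Python (a browser viewer would do the same in JS), assuming the standard epub layout: META-INF/container.xml names the package file, whose spine lists manifest items in order. The function names are hypothetical, and fetching the two XML files is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def rootfile_path(container_xml):
    """Return the package file path declared in META-INF/container.xml."""
    root = ET.fromstring(container_xml)
    return root.find("c:rootfiles/c:rootfile", CONTAINER_NS).attrib["full-path"]


def reading_order(opf_xml, opf_url):
    """Resolve the spine of the package file to absolute chapter URLs."""
    package = ET.fromstring(opf_xml)
    hrefs = {item.attrib["id"]: item.attrib["href"]
             for item in package.findall("opf:manifest/opf:item", OPF_NS)}
    return [urljoin(opf_url, hrefs[ref.attrib["idref"]])
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)]
```

With the unzipped folder as base, `urljoin(base, rootfile_path(container_xml))` gives the package URL, and `reading_order` then yields one dereferenceable URL per chapter.
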

It seems like one could immediately then use the web annotation model.
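
For instance, an annotation on a passage of an unzipped chapter could look roughly like the following. The JSON-LD context, `TextualBody` and `TextQuoteSelector` come from the Web Annotation Data Model. The helper name and the sample values are made up, and a real annotation server would also assign the annotation its own `id`.

```python
import json


def text_annotation(target_url, exact, comment, creator=None):
    """Build a Web Annotation (as a dict) commenting on a quoted passage."""
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {
            "source": target_url,
            "selector": {"type": "TextQuoteSelector", "exact": exact},
        },
    }
    if creator:
        annotation["creator"] = creator
    return annotation


if __name__ == "__main__":
    # The quoted passage and the comment are placeholders.
    print(json.dumps(text_annotation(
        "http://bblfish.net/blog/2018/04/21/epub/OPS/chapter-1.xhtml",
        "some quoted passage",
        "A comment on this passage.",
        creator="https://orcid.org/0000-0001-9842-9718"), indent=2))
```
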

Henry

> 
> --
> Stian Soiland-Reyes
> The University of Manchester
> http://www.esciencelab.org.uk/
> http://orcid.org/0000-0001-9842-9718
> 
> 
> ________________________________________
> From: Henry Story [henry.story@bblfish.net]
> Sent: 01 May 2018 12:23
> To: me@harshp.com
> Cc: semantic-web@w3.org
> Subject: Re: TheWebConf 2018 Trip Report
> 
>> On 1 May 2018, at 01:27, Harshvardhan J. Pandit <me@harshp.com> wrote:
>> 
>> Great report Sarven, it is very well put!
>> With PDF (vs HTML), the issue is PDF being (seen as) an end or viewing format. It abstracts out the underlying complexity of representations, and lots of tools can spew out a PDF. With HTML, it is like asking everyone to write their work in LaTeX only (which isn't bad in itself).
>> Maybe a point of convergence could be to get everything to produce decent HTML which isn't mangled by inherent messy CSS+JS native to the product/tool?
>> I (still) see "Microsoft Word..." in the title-bars of PDFs people submit, so I assume that a lot of people cannot or don't want to write their papers in LaTeX or other tools.
>> How to work towards exporting this text-based writing approach to native web formats?
> 
> Actually Apple's Pages produces epub format which is html plus some extra packaging
> zipped together in one file.
> 
> I put my "Epistemology in the Cloud" up in PDF and epub format here
> http://bblfish.net/blog/2018/04/21/
> 
> Is there a way to get browsers to turn into an ePub reader? Then I could
> just unzip it and make it visible as html.
> 
> Henry
> 
> 
>> 
>> On Monday 30 April 2018 10:29 PM, Sarven Capadisli wrote:
>>> Remark: PDF can be still welcomed because fundamentally there shouldn't
>>> be any discrimination on how someone wants to communicate their work
>>> (*). If a "Linked Data" researcher feels that PDF is the best way to
>>> communicate and disseminate their knowledge, that's their call. So, I
>>> think we shouldn't set that restriction, but then we are damned to make
>>> it hard on ourselves. Ohwell, let's see how else we can move things
>>> forward... perhaps more how-tos and stuff - something the other chairs
>>> suggested before the event even.
>> 
>> --
>> ---
>> Harshvardhan J. Pandit
>> PhD Researcher
>> ADAPT Centre, Trinity College Dublin
>> https://harshp.com
>> GPG: D81BF4F31D31B413
>> 
>> 
> 
>
Received on Tuesday, 1 May 2018 16:53:25 UTC