Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Johannes Wilm on 2017-10-19 (public-scholarlyhtml@w3.org from October 2017)

From: Johannes Wilm <mail@johanneswilm.org>
Date: Thu, 19 Oct 2017 15:13:27 +0200
To: Mike Perlman <perlmanm@me.com>
Cc: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>, sebastien <sebastien.ballesteros@gmail.com>, "public-scholarlyhtml@w3.org" <public-scholarlyhtml@w3.org>
Message-ID: <CABkgm-Q-LprxrUJSy+Drx9Jtrp=i4BHvNa-Y1m9uvH2VhctMFg@mail.gmail.com>
On Thu, Oct 19, 2017 at 1:31 PM, Mike Perlman <perlmanm@me.com> wrote:

> Hi
>
> I solved this issue a long time ago - use a CMS like WordPress to develop
> the HTML
>


But then you are simply using WordPress to filter it down to WordPress's own
restricted version of HTML [1], aren't you? Using WordPress-HTML is probably
a good idea in many cases simply because WordPress has become so popular,
but it is not arbitrary HTML.


In the two cases mentioned here, Dokieli and Substance.io/eLife: Dokieli
seems not to filter the HTML (much?), so if I take arbitrary content, for
example by copying the Guardian front page and pasting it into Dokieli, I
get a lot of garbage, margins I cannot control, etc. Substance, by contrast,
filters the HTML down to what that application can handle.
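
To make "filtering down" concrete, here is a minimal sketch of an allowlist filter using only the Python standard library. The tag and attribute lists are my own assumptions for illustration; they are not what WordPress, Substance or Dokieli actually use.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag name -> attributes the editor keeps.
ALLOWED = {
    "p": set(),
    "em": set(),
    "strong": set(),
    "h1": set(),
    "h2": set(),
    "a": {"href"},
}
# Elements whose text content is dropped together with the tags.
DROP_CONTENT = {"script", "style"}


class AllowlistFilter(HTMLParser):
    """Keep allowlisted tags and attributes; unwrap everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def filter_html(source):
    """Return `source` reduced to the allowlisted subset of HTML."""
    parser = AllowlistFilter()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Everything outside the allowlist (the wrapping div, the class and onclick attributes, the script) disappears, which is exactly why the result is no longer "arbitrary HTML".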

The conventional logic is that unless you clearly define what restricted
version of HTML you permit, you cannot really create an editor that is able
to handle it all. But it sounds like the science.ai people have been able
to go beyond this. Is that correctly understood?



> and then export it into a single-file HTML document complete with
> JavaScript, CSS and media converted to base64. This file can be easily
> accessed in a browser via the old (over a quarter century) reliable file URL.
>
> If it works in WP it can work as a stand alone HTML file.
>
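
For readers who have not seen this done, a rough sketch of the base64 step: local media are replaced by data: URIs so that the HTML file carries them itself. The function names and the `assets` mapping are hypothetical, and this says nothing about how any WordPress export plugin actually does it.

```python
import base64
import mimetypes
import re


def to_data_uri(data, filename):
    """Encode raw bytes as a base64 data: URI; the MIME type is guessed from the name."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_assets(html, assets):
    """Replace src="..." references found in `assets` with data: URIs.

    `assets` maps a path exactly as written in the HTML to that file's bytes.
    """
    def replace(match):
        path = match.group(2)
        if path not in assets:
            return match.group(0)  # leave remote or unknown references alone
        return f'{match.group(1)}{to_data_uri(assets[path], path)}"'

    return re.sub(r'(\bsrc=")([^"]*)"', replace, html)
```

A regular expression is enough for a sketch; a real exporter would parse the document and would also have to handle stylesheets, scripts and fonts.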


[1] https://en.support.wordpress.com/code/#html-tags


>
> Cheers
> Mike
>
> On 19 Oct 2017, at 13:26, Johannes Wilm <mail@johanneswilm.org> wrote:
>
> Hey,
>
> not limiting HTML within an editor sounds fascinating.
>
> For many here this will be obvious, but the main reason why editors like
> Fidus Writer have so far not liked the idea of arbitrary HTML is that
> the task of programming an editor that can deal with any HTML/CSS
> combination seems close to impossible: there are so many different
> ways the same visual output can be expressed, and we really cannot be
> entirely sure whether our editor can work with any particular combination
> until we have tried out what the browser supports natively and then added
> a lot of our own code to get around the bugs we find. For example, a few
> years ago I spent a lot of time trying to get the caret to move around
> inline canvas elements. Inline canvas elements may not be terribly common,
> but they do exist, and we used them for something at the time. We have
> since given up on that and only use more common elements (and we
> have switched to a library rather than trying to tackle the problem of caret
> movement by ourselves). But still, we continue to have a lot of code that
> tries to standardize HTML. This starts with simple tasks such as handling
> paste data coming from Google Docs or Microsoft Word: all the import
> filters I have seen so far attempt to find as much useful information in
> the HTML as possible, and then throw the rest away.
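
As an illustration of "many ways to express the same visual output", here is a small sketch of a paste filter that maps three spellings of bold onto one canonical element and throws all other markup away. It is an assumption-laden toy (well-formed input, bold only, a made-up style rule), not the code of Fidus Writer or of any real import filter.

```python
import re
from html import escape
from html.parser import HTMLParser

# Inline styles that render as bold (hypothetical rule; real filters check more).
BOLD_STYLE = re.compile(r"font-weight\s*:\s*(bold|[6-9]00)", re.IGNORECASE)
# Void elements never get an end tag, so they must not go on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class PasteNormalizer(HTMLParser):
    """Rewrite <b>, <strong> and bold <span>s as <strong>; drop other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # one entry per open element: True if it was bold

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        style = dict(attrs).get("style") or ""
        is_bold = tag in ("b", "strong") or (
            tag == "span" and bool(BOLD_STYLE.search(style))
        )
        self.stack.append(is_bold)
        if is_bold:
            self.out.append("<strong>")

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        if self.stack.pop():
            self.out.append("</strong>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def normalize(pasted):
    """Return pasted HTML with bold canonicalized and other markup removed."""
    parser = PasteNormalizer()
    parser.feed(pasted)
    parser.close()
    return "".join(parser.out)
```

Even this toy needs a special case for void elements and assumes properly nested tags, which hints at how quickly the real problem grows.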
>
> Of course, we can try to write an editor that is as forgiving as possible,
> but without a limit on what HTML/CSS we allow, I don't think we can really
> "guarantee" to any of our users that they can use the output with the next
> tool in the pipeline.
>
> It would be really interesting to hear how you guys have overcome this
> issue.
>
> On Tue, Oct 17, 2017 at 1:11 PM, Stian Soiland-Reyes <
> soiland-reyes@manchester.ac.uk> wrote:
>
>> This looks really good, Sebastien!  I agree that using structured
>> RDFa/JSON-LD in free HTML is much preferable to trying to limit ourselves
>> to a subset of HTML – as we see in this thread it is hard to reach
>> agreement without also limiting future publication styles.  We should not
>> be aiming to replicate 1960s-style computer science papers with the odd
>> hyperlink as the only enhancement.
>>
>>
>>
>> I like how well you have given full, yet clear examples for each concept,
>> and re-used JSON-LD and schema.org.  This should be quite compatible
>> with the effort of http://bioschemas.org/ which has a lot of traction in
>> the biology/bioinformatics community (but many of their standards are
>> general to academia)  – perhaps Publication could be added there based on
>> your effort and then propagate into schema.org? I recommend you get in
>> touch – see http://bioschemas.org/howtojoin/
>>
>>
>>
>> I think the science.ai approach has lots of overlap with not just
>> Scholarly HTML, but also our work on http://www.researchobject.org/ - in
>> particular our Research Object Bundle https://w3id.org/bundle/ which
>> also has a JSON-LD-based manifest
>> https://researchobject.github.io/specifications/bundle/#manifest – there
>> we didn’t attempt to “deconstruct” the publication, but focused more on
>> the supporting data and software sources to go along with the black-box
>> publication in the RO. Combining it with your approach would allow
>> embedding rich structured metadata that can then easily be extracted (say
>> into separate annotations) using off-the-shelf RDFa/JSON-LD tools.
>>
>>
>> There’s also concurrent work such as eLife’s Reproducible Document Stack
>> https://elifesciences.org/labs/7dbeb390/reproducible-document-stack-supporting-the-next-generation-research-article
>> - although that is working with JATS XML as the base format it has similar
>> archiving considerations, and I’ve been pushing for them to add some kind
>> of Scholarly HTML as an embedded format.
>>
>>
>> One challenge as usual is how to squeeze the structured metadata out of
>> the authors. eLife are working on interactive editors for this. A similar
>> HTML-based approach is of course the previously mentioned
>> https://dokie.li, whose WYSIWYG editor allows you to add microdata
>> anywhere (as well as generating structural microdata for paragraphs etc).
>>
>>
>>
>>
>> Side-note for manifest people:
>>
>> I see in https://nightly.science.ai/documentation/archive#graph-content
>>  you have quite a minimal manifest (good!) as a @graph, but without
>> relating the contained resources to the (implied) aggregation. This can
>> make it hard to understand what is part of the aggregation (e.g. what you
>> directly list under @graph), and what is just a sub-resource (like your
>> DataDownload example). Is there a reason why you didn’t use a property to
>> list these? We reused OAI-ORE ore:aggregates for this purpose (mapped
>> through our JSON-LD context) – I think your archive is also in effect
>> making an ore:Aggregation or even an ro:ResearchObject – so perhaps reuse
>> of those would be beneficial.
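
A small sketch of what the manifest point could look like, written as a Python dict that serializes to JSON-LD. The identifiers, the context mapping and the helper are all made up for illustration; this is neither the science.ai archive format nor the Research Object Bundle manifest layout.

```python
import json

# Hypothetical manifest: the aggregation node lists what belongs to it,
# while the DataDownload is only reachable as a sub-resource of the dataset.
manifest = {
    "@context": {
        "@vocab": "http://schema.org/",
        "ore": "http://www.openarchives.org/ore/terms/",
        "aggregates": {"@id": "ore:aggregates", "@type": "@id"},
    },
    "@graph": [
        {
            "@id": "#archive",
            "@type": "ore:Aggregation",
            "aggregates": ["#article", "#dataset"],
        },
        {"@id": "#article", "@type": "ScholarlyArticle", "name": "Example article"},
        {"@id": "#dataset", "@type": "Dataset", "distribution": {"@id": "#download"}},
        {"@id": "#download", "@type": "DataDownload", "contentUrl": "data.csv"},
    ],
}


def aggregated_ids(doc):
    """Return the @ids that the aggregation node lists explicitly."""
    for node in doc["@graph"]:
        if node.get("@type") == "ore:Aggregation":
            return list(node.get("aggregates", []))
    return []


serialized = json.dumps(manifest, indent=2)  # what would be written to disk
```

With the explicit property, a consumer no longer has to guess from position in @graph: `aggregated_ids(manifest)` names the article and the dataset, and the download stays a sub-resource.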
>>
>>
>>
>> Happy to set up a call if you'd like to discuss further!
>>
>>
>> --
>> Stian Soiland-Reyes, eScience Lab
>> School of Computer Science, The University of Manchester
>> http://orcid.org/0000-0001-9842-9718
>>
>>
>> *From: *sebastien <sebastien.ballesteros@gmail.com>
>> *Sent: *16 October 2017 10:15
>> *To: *public-scholarlyhtml@w3.org
>> *Subject: *Re: html for scholarly communication: RASH, Scholarly HTML or
>> Dokieli?
>>
>> Hello,
>>
>> A quick update on science.ai documentation effort.
>>
>> As Robin mentioned we have been iterating quite a lot on scholarly
>> HTML internally. What we learned along the way (working with several
>> established players in the field) is that trying to standardize or
>> define constraints at the HTML level is somewhat too constraining (we
>> are planning to provide more context on that soon).
>>
>> In our case, agreeing on a vocabulary and using RDFa and / or JSON-LD
>> to express it (without additional constraints) has proven to be more
>> productive.  For us, schema.org (and the process in place to extend
>> it) provides enough basis to make that work. For that reason we are
>> now mostly focused on exposing and documenting schema.org patterns
>> that are useful in the context of scholarly publishing.
>>
>> I will post an updated link when our documentation hits our production
>> website but in the meantime feel free to check out
>> https://nightly.science.ai/documentation/archive if you are curious
>> about what we have been doing since the days of
>> http://scholarly.vernacular.io/.  If you look, don't pay too much
>> attention to the archive stuff, but the JSON-LD / RDFa examples should
>> provide a good idea of the schema.org patterns that we have found
>> useful in the context of scholarly publishing.
>>
>> Sebastien
>>
>>
>>
>
>
> --
> Johannes Wilm
> http://www.johanneswilm.org
> tel: +1 (520) 399 8880
>
>
>


-- 
Johannes Wilm
http://www.johanneswilm.org
tel: +1 (520) 399 8880
Received on Thursday, 19 October 2017 13:13:56 UTC