Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Sarven Capadisli on 2017-10-19 (public-scholarlyhtml@w3.org from October 2017)

From: Sarven Capadisli <info@csarven.ca>
Date: Thu, 19 Oct 2017 09:43:45 -0600
To: public-scholarlyhtml@w3.org
Message-ID: <3eee5f41-de3c-ca9a-ee9c-fdb242087178@csarven.ca>
On 2017-10-19 09:20, Johannes Wilm wrote:
> On Thu, Oct 19, 2017 at 4:59 PM, Sarven Capadisli <info@csarven.ca> wrote:
> 
>     On 2017-10-19 07:13, Johannes Wilm wrote:
>     > In the two cases mentioned here, Dokieli and Substance.io/eLife, Dokieli
>     > seems not to filter the HTML (much?), so if I take arbitrary content, for
>     > example copying the Guardian front page and pasting it into Dokieli, I get a
>     > lot of garbage plus margins I cannot control, etc. In the case of
>     > Substance, it filters the HTML down to what that application can handle.
> 
> 
>     You are pasting "garbage", so you are seeing "garbage". What's the use
>     case for pasting "garbage"? dokieli is not intended to handle "garbage"
>     pasting.
> 
> 
> 
> Sorry, this was not meant to say that Dokieli is garbage. When I pasted
> the HTML from the Guardian frontpage, what ends up in the document is
> content such as 
> 
> "'); hiddenDoc.close();
> })(); {"uid":1,"hostPeerName":"https://www.theguardian.com","initialGeometry":"{\"windowCoords_t\":0,\"windowCoords_r\":1920,\"windowCoords_b\":1053,\"windowCoords_l\":0,\"frameCoords_t\":3277,\"frameCoords_r\":1905,\"frameCoords_b\":3277,\"frameCoords_l\":0,\"styleZIndex\":\"auto\",\"allowedExpansion_t\":0,\"allowedExpansion_r\":0,\"allowedExpansion_b\":0,\"allowedExpansion_l\":0,\"xInView\":0,\"yInView\":0}""
> 
> 
> (Basically code). This is what I tried to describe as "garbage". There
> is nothing wrong with Dokieli, but like all the other editors we have
> seen so far, it cannot just handle arbitrary HTML in a smooth way
> without filtering it down to a restricted subset of HTML of some kind.
> 
> If I try a less absurd paste, and instead just copy the contents of an
> article and then paste it into Dokieli, the margins are all strange, and
> the controls for headlines, etc. don't have any effect.


Cleaning of arbitrary HTML pastes is not one of dokieli's features.
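
For readers unfamiliar with what "filtering down to a restricted subset" would involve, here is a minimal sketch in Python. It is not code from dokieli or Substance; the allowlist, the `PasteFilter` class, and the `filter_paste` function are hypothetical names chosen for illustration, and a real editor would pick its own subset and also vet URL schemes.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist; each editor would define its own subset.
ALLOWED_TAGS = {"h1", "h2", "h3", "p", "ul", "ol", "li", "em", "strong", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
# Elements whose text content is discarded along with the tags.
DROP_CONTENT = {"script", "style"}


class PasteFilter(HTMLParser):
    """Keeps allowlisted tags/attributes, unwraps everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            keep = ALLOWED_ATTRS.get(tag, set())
            attr_text = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs
                if name in keep
            )
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text of unknown elements is kept; only the wrapper tags are removed.
        if self.skip_depth == 0:
            self.out.append(escape(data, quote=False))


def filter_paste(html_text):
    parser = PasteFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

With this sketch, inline scripts and presentational attributes (the source of the stray code and odd margins described above) are dropped, while headings, paragraphs, and links survive. The cost is that anything outside the allowlist is lost, which is the trade-off under discussion.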

>     > The conventional logic is that unless you clearly define what restricted
>     > version of HTML you permit, you cannot really create an editor that is
>     > able to handle it all. But it sounds like the science.ai <http://science.ai>
>     > people have been able to go beyond this. Is that
>     > correctly understood?
> 
>     The HTML(+RDFa) patterns in Scholarly HTML, dokieli, and science.ai
>     are very similar. The focus is mostly on RDFa for data reuse/exchange, as
>     opposed to HTML. The observed HTML patterns just happen to be best
>     practices. The CSS and JavaScript try to make the best of what's
>     available in their respective ways. This doesn't mean that this approach
>     is infinitely flexible or flawless. It just means that the constraints
>     and the handling are elsewhere.
> 
> 
> Ok, so the question here is: how do we avoid the issues mentioned above
> if we do not restrict the HTML? Because basically, if each one of our
> different tools has its own idea about which tags to allow and each
> creates slightly differently structured code, I cannot quite see how
> we are going to build integrated systems in which a document can
> move from, say, an editing app to a conversion app, to a printed version,
> and to a commented online version.


I still don't think there is a one-size-fits-all solution for editors to
handle arbitrary HTML. Editors can do whatever they see fit.
RDFa is the focus for exchange, and that's where we interop. Trying to
create a "Scholarly HTML" that handles a range of scholarly information
strictly around certain HTML elements/attributes (without an RDF
focus) is flawed IMO. I've stressed this several times on this mailing list.
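
As a rough illustration of interop at the RDFa level rather than the element level, the Python sketch below reads `property` attributes from two differently structured HTML fragments and arrives at the same statements. It is not a conforming RDFa processor (it ignores `typeof`, `resource`, prefixes, and subjects), and the class name, function name, sample markup, and `schema:` terms are assumptions made up for the example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects (property, value) pairs; NOT a conforming RDFa processor."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []
        self.stack = []  # open elements as [tag, property or None, text parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop and "content" in attrs:
            # Value given explicitly, e.g. <meta property=... content=...>.
            self.pairs.append((prop, attrs["content"] or ""))
            prop = None
        self.stack.append([tag, prop, []])

    def handle_data(self, data):
        # Text contributes to every open element that carries a property.
        for _, prop, parts in self.stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return  # stray end tag, ignore
        while self.stack:
            name, prop, parts = self.stack.pop()
            if prop:
                self.pairs.append((prop, "".join(parts).strip()))
            if name == tag:
                break


def collect_properties(html_text):
    collector = PropertyCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(collector.pairs)


# Two hypothetical tools emitting different HTML for the same information.
doc_a = ('<article typeof="schema:ScholarlyArticle">'
         '<h1 property="schema:name">On Pasting</h1>'
         '<p>By <span property="schema:author">A. Author</span></p></article>')
doc_b = ('<div typeof="schema:ScholarlyArticle"><header>'
         '<p class="title" property="schema:name">On Pasting</p></header>'
         '<meta property="schema:author" content="A. Author"></div>')

print(collect_properties(doc_a) == collect_properties(doc_b))
```

The element names differ between the two fragments (`h1` versus `p`, `span` versus `meta`), yet a consumer that reads only the RDFa attributes sees the same data from both.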

Neither is it the case that information is strictly about articles
("papers"). So, if this CG is interested in moving the discussion
towards RDF in HTML patterns for *scholarly information*,
https://dokie.li/docs already has some patterns documented. See also:

https://lists.w3.org/Archives/Public/public-scholarlyhtml/2017Sep/0046.html

-Sarven
http://csarven.ca/#i
Received on Thursday, 19 October 2017 15:44:26 UTC