Re: html for scholarly communication: RASH, Scholarly HTML or Dokieli? from Johannes Wilm on 2017-10-19 (public-scholarlyhtml@w3.org from October 2017)

From: Johannes Wilm <mail@johanneswilm.org>
Date: Thu, 19 Oct 2017 17:54:12 +0200
To: Sarven Capadisli <info@csarven.ca>
Cc: Scholarly HTML community group <public-scholarlyhtml@w3.org>
Message-ID: <CABkgm-TpkCGsOUAPi89KoUz5tOdcVNXfz0z=nq39L6nO6-pzMA@mail.gmail.com>
On Thu, Oct 19, 2017 at 5:43 PM, Sarven Capadisli <info@csarven.ca> wrote:

> On 2017-10-19 09:20, Johannes Wilm wrote:
> > On Thu, Oct 19, 2017 at 4:59 PM, Sarven Capadisli <info@csarven.ca> wrote:
> >
> >     On 2017-10-19 07:13, Johannes Wilm wrote:
> >     > In the two cases mentioned here, Dokieli and Substance.io/eLife,
> >     > Dokieli does not seem to filter the HTML (much?), so if I take
> >     > arbitrary content, for example by copying the Guardian front page
> >     > and pasting it into Dokieli, I get a lot of garbage, plus margins I
> >     > cannot control, etc. Substance, in contrast, filters the HTML down
> >     > to what that application can handle.
> >
> >
> >     You are pasting "garbage", so you are seeing "garbage". What's the
> >     use case for pasting "garbage"? dokieli is not intended to handle
> >     "garbage" pasting.
> >
> >
> >
> > Sorry, this was not meant to say that Dokieli is garbage. When I pasted
> > the HTML from the Guardian front page, what ended up in the document was
> > content such as
> >
> > "'); hiddenDoc.close();
> > })(); {"uid":1,"hostPeerName":"https://www.theguardian.com",
> > "initialGeometry":"{\"windowCoords_t\":0,\"windowCoords_r\":1920,
> > \"windowCoords_b\":1053,\"windowCoords_l\":0,\"frameCoords_t\":3277,
> > \"frameCoords_r\":1905,\"frameCoords_b\":3277,\"frameCoords_l\":0,
> > \"styleZIndex\":\"auto\",\"allowedExpansion_t\":0,\"allowedExpansion_r\":0,
> > \"allowedExpansion_b\":0,\"allowedExpansion_l\":0,\"xInView\":0,
> > \"yInView\":0}""
> >
> >
> > (Basically, code.) This is what I was trying to describe as "garbage".
> > There is nothing wrong with Dokieli, but like all the other editors we
> > have seen so far, it cannot handle arbitrary HTML smoothly without
> > filtering it down to some kind of restricted subset of HTML.
> >
> > If I try a less absurd paste and instead just copy the contents of an
> > article into Dokieli, the margins are all strange, and the controls for
> > headlines, etc. have no effect.
>
>
> Cleaning of arbitrary HTML pastes is not one of dokieli's features.
>

That may be fine for a purely experimental editor. But in a production
environment, even one that only produces web output for a blog, authors will
be copying and pasting back and forth between different tools, won't they?
And if they do, don't we need to handle that in some way?
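As a rough illustration of the kind of paste handling meant here, the following is a minimal Python sketch of allowlist filtering, assuming a hypothetical editor whose supported elements and attributes are listed in `ALLOWED_TAGS` and `ALLOWED_ATTRS` (both invented for this example). Script and style contents are discarded, unsupported wrappers such as `div` or `span` are unwrapped so their text survives, and attributes outside the allowlist (inline styles, event handlers) are dropped. It does not describe how Dokieli or Substance actually implement anything.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: the elements and attributes this editor is assumed to support.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "em", "strong", "a", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href"}}
# Elements whose contents are discarded entirely instead of being unwrapped.
DROP_CONTENT = {"script", "style", "noscript", "iframe"}


class PasteFilter(HTMLParser):
    """Re-serializes pasted HTML, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unwrap: drop the tag itself, keep its text content
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value)}"'
            for name, value in attrs
            if name in allowed and value is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def filter_paste(html: str) -> str:
    parser = PasteFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


pasted = ('<div style="margin:40px"><h1>Title</h1>'
          '<script>hiddenDoc.close();</script>'
          '<p>Hello <span>world</span></p></div>')
print(filter_paste(pasted))  # <h1>Title</h1><p>Hello world</p>
```

Unwrapping rather than deleting unknown elements keeps the author's text while discarding the layout that caused the uncontrollable margins; the trade-off is that any meaning carried only by the dropped markup is lost.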


>
> >     > The conventional logic is that unless you clearly define what
> >     > restricted version of HTML you permit, you cannot really create an
> >     > editor that is able to handle it all. But it sounds like the
> >     > science.ai people have been able to go beyond this. Is that
> >     > correctly understood?
> >
> >     The HTML(+RDFa) patterns in Scholarly HTML, dokieli, and science.ai
> >     are very similar. The focus is mostly on RDFa for data reuse/exchange,
> >     as opposed to HTML. The observed HTML patterns just happen to be best
> >     practices. The CSS and JavaScript try to make the best of what's
> >     available in their respective ways. This doesn't mean that this
> >     approach is infinitely flexible or flawless. It just means that the
> >     constraints and the handling are elsewhere.
> >
> >
> > Ok, so the question here is: how do we avoid the issues mentioned above
> > if we do not restrict the HTML? If each of our different tools has its
> > own idea about which tags to allow, and each produces slightly
> > differently structured code, I cannot quite see how we are going to
> > build integrated systems in which a document can move from, say, an
> > editing app, to a conversion app, to a printed version and to a
> > commented online version.
>
>
> I still don't think there is a one-size-fits-all solution for editors to
> handle arbitrary HTML. Editors can do whatever they see as appropriate.
>

Ok, but then if we want to connect tool A and tool B, and we don't want
information to get lost in an HTML cleanup by either tool, don't tool A and
tool B at least need to agree on a subset of HTML/CSS/etc.? Or what's the
trick there? I can see that adding more information through RDFa, etc. can
help tools better understand the meaning of the incoming data, but won't
there still be a cleanup issue when, say, tool A uses canvas elements
extensively, whereas tool B has not even considered that anyone might be
using them?
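To make the tool A / tool B concern concrete, here is a minimal Python sketch, assuming two hypothetical tools that each declare the set of elements they support (`TOOL_A` and `TOOL_B` are invented here and do not correspond to any real product). The subset both tools can round-trip is simply the intersection of the two sets, and scanning a document written by tool A shows which elements tool B would have to clean up.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical element sets declared by two tools; neither describes a real product.
TOOL_A = {"section", "h1", "h2", "p", "em", "a", "figure", "figcaption", "canvas"}
TOOL_B = {"section", "h1", "h2", "p", "em", "a", "figure", "figcaption", "img"}


class TagCounter(HTMLParser):
    """Counts how often each element occurs in a document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def unsupported_elements(html: str, supported: set[str]) -> dict[str, int]:
    """Return the elements in `html` that a tool supporting only `supported` would strip."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {tag: n for tag, n in counter.tags.items() if tag not in supported}


shared = TOOL_A & TOOL_B  # the subset both tools can round-trip safely
doc_from_a = ("<section><h1>Results</h1><figure><canvas></canvas>"
              "<figcaption>Fig. 1</figcaption></figure></section>")
print(sorted(TOOL_A - shared))                   # ['canvas']
print(unsupported_elements(doc_from_a, TOOL_B))  # {'canvas': 1}
```

A check like this only detects the loss; it says nothing about what the canvas meant, which is the part that semantic annotations such as RDFa would have to carry.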



> RDFa is the focus for exchange and that's where we interop. Trying to
> create "Scholarly HTML" that handles a range of scholarly information
> that's strictly around certain HTML elements/attributes (without RDF
> focus) is flawed IMO. I've stressed this several times in this mailing
> list.
>
> Neither is it the case that information is strictly about articles
> ("papers"). So, if this CG is interested in moving the discussion
> towards RDF in HTML patterns for *scholarly information*,
> https://dokie.li/docs already has some patterns documented. See also:
>
> https://lists.w3.org/Archives/Public/public-scholarlyhtml/2017Sep/0046.html
>
> -Sarven
> http://csarven.ca/#i
>
>


-- 
Johannes Wilm
http://www.johanneswilm.org
tel: +1 (520) 399 8880
Received on Thursday, 19 October 2017 15:54:36 UTC