Re: Authoring versus Interchange from Ivan Herman on 2015-12-02 (public-scholarlyhtml@w3.org from December 2015)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 2 Dec 2015 16:34:19 +0100
To: Robin Berjon <robin@berjon.com>
Cc: Johannes Wilm <johanneswilm@vivliostyle.com>, Florian Rivoal <florian@rivoal.net>, W3C Scholarly HTML CG <public-scholarlyhtml@w3.org>
Message-Id: <FCFC259D-CB7D-4E88-AC27-ED39B1EC0BDE@w3.org>

> On 2 Dec 2015, at 16:05, Robin Berjon <robin@berjon.com> wrote:
> 
> On 02/12/2015 03:12 , Johannes Wilm wrote:
>> Could you think of an example of when this would happen? If you write
>> the document by hand, and it needs to be written with the level of
>> specificity as what is later needed for interchange, wouldn't you need
>> to write it as complex as well?
> 
> In practice, no.
> 
> I think that the big difference between an authoring and an interchange
> format is whether a transformation step is required before the content
> can be consumed by a relatively generic processor (say, something that
> understands HTML + DPUB ARIA + RDFa + schema.org) so as to obtain the
> same semantics.
> 
> One example from our SH is the markup used for authors:
> http://scholarly.vernacular.io/#authors. The use of schema.org roles,
> the indirection for affiliations, and the high markup-to-text ratio don't
> make this friendly to type by any metric. If I expected this to be
> hand-authored, I would not wish this on anyone.
> 
> The semantics are, however, correct even without knowing that this is
> SH. A general purpose crawler is able to look at that and fully
> understand what you're talking about without ever knowing that this is SH.
> 
> By contrast, were I designing an authoring format for this I would have
> gone with something more like:
> 
> <script type=json/authors>
> [
>  {
>    "given": "Robin",
>    "family": "Berjon",
>    "url": "http://berjon.com/",
>    "org": { "name": "SA", "url": "http://science.ai/" },
>    "corresponding": true
>  },
>  ...
> ]
> </script>
> 
> And indeed: http://www.w3.org/respec/guide.html#editors-authors.
> 
> That's a lot easier to remember, in fact I know I've used the ReSpec
> syntax for that a million times without having to go back to the docs.
> But to general-purpose processors, it is meaningless.
> 
> We get better interoperability by reusing what exists rather than
> reinventing more convenient syntaxes. That's what makes it an
> interchange format more than an authoring format.

There may be a middle way for this specific aspect, though. The RDFa markup relies on a specific RDF vocabulary (schema.org + some things). It is perfectly possible to embed the same RDF into the HTML using JSON-LD and the script tag. I.e., the author can choose to use that instead of RDFa, and the resulting data is identical. Schema.org (afaik) understands both approaches, and many RDFa processors (e.g., Python's RDFLib) have parsers for both RDFa and embedded JSON-LD.
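
As a rough sketch of that middle way, the simple author record from Robin's example could be mapped mechanically to schema.org terms and embedded as JSON-LD. The function names and the field mapping below are assumptions made for illustration, not something defined by SH or ReSpec. The "corresponding" flag is dropped here because expressing it would need the role indirection discussed above.

```python
import json


def to_jsonld_person(author):
    """Map one simplified author record to a schema.org Person (hypothetical mapping)."""
    person = {
        "@type": "Person",
        "givenName": author["given"],
        "familyName": author["family"],
    }
    if "url" in author:
        person["url"] = author["url"]
    org = author.get("org")
    if org:
        affiliation = {"@type": "Organization", "name": org["name"]}
        if "url" in org:
            affiliation["url"] = org["url"]
        person["affiliation"] = affiliation
    # Note: "corresponding" is intentionally not mapped in this sketch.
    return person


def to_script_element(authors):
    """Wrap the authors in a JSON-LD script element that can be placed in the HTML."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "author": [to_jsonld_person(a) for a in authors],
    }
    # Escape "</" so the JSON can never close the script element early.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


authors = [
    {
        "given": "Robin",
        "family": "Berjon",
        "url": "http://berjon.com/",
        "org": {"name": "SA", "url": "http://science.ai/"},
        "corresponding": True,
    }
]

print(to_script_element(authors))
```

The author still types something close to the simple record, while the embedded result stays readable by a generic JSON-LD consumer that knows nothing about SH.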

Ivan



> 
> From the perspective of tools, it does not make a huge difference. The
> HTML+RDFa version is a little bit harder (seriously harder if you want
> to support round-tripping, but that's not required) but overall you can
> have the same form-based UI in which authors can enter a list of people
> defined by straightforward fields.
> 
>> Some of the markup will have to be somewhat complex -- for example
>> citations that have both text before and after them and that need to be
>> able to specify something else than pages as reference. Every few months
>> someone seems to try to invent a new dialect of markdown for academics
>> to get away from the difficulty of writing latex, but once they run into
>> citations they end up either not being able to support most of the
>> required features or defining something that is as complex as latex. So
>> users who choose to write it by hand will have to look the variable
>> names up when using them.
> 
> It is a law of wikitext syntaxes that they will grow in complexity until
> they have the full flexibility of HTML, only much, much uglier. The same
> applies to Markdown.
> 
> --
> • Robin Berjon - http://berjon.com/ - @robinberjon
> • http://science.ai/ — intelligent science publishing
> •
> 


----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

Received on Wednesday, 2 December 2015 15:34:35 UTC