Re: Authoring versus Interchange from Robin Berjon on 2015-12-02 (public-scholarlyhtml@w3.org from December 2015)

From: Robin Berjon <robin@berjon.com>
Date: Wed, 2 Dec 2015 10:05:15 -0500
To: Johannes Wilm <johanneswilm@vivliostyle.com>, Florian Rivoal <florian@rivoal.net>
Cc: public-scholarlyhtml@w3.org
Message-ID: <565F08AB.8080600@berjon.com>

On 02/12/2015 03:12 , Johannes Wilm wrote:
> Could you think of an example of when this would happen? If you write
> the document by hand, and it needs to be written with the level of
> specificity as what is later needed for interchange, wouldn't you need
> to write it as complex as well?

In practice, no.

I think that the big difference between an authoring and an interchange
format is whether a transformation step is required before the content
can be consumed by a relatively generic processor (say, something that
understands HTML + DPUB ARIA + RDFa + schema.org) so as to obtain the
same semantics.

One example from our SH is the markup used for authors:
http://scholarly.vernacular.io/#authors. The use of schema.org roles,
the indirection for affiliations, and the high markup-to-text ratio
don't make this friendly to type by any metric. If I expected this to
be hand-authored, I would not wish this on anyone.

The semantics are, however, correct on their own. A general-purpose
crawler is able to look at that and fully understand what you're
talking about without ever knowing that this is SH.

By contrast, were I designing an authoring format for this I would have
gone with something more like:

<script type=json/authors>
[
  {
    "given": "Robin",
    "family": "Berjon",
    "url": "http://berjon.com/",
    "org": { "name": "SA", "url": "http://science.ai/" },
    "corresponding": true
  },
  ...
]
</script>

And indeed: http://www.w3.org/respec/guide.html#editors-authors.

That's a lot easier to remember; in fact I know I've used the ReSpec
syntax for that a million times without having to go back to the docs.
But to general-purpose processors, it is meaningless.

We get better interoperability by reusing what exists rather than
reinventing more convenient syntaxes. That's what makes it an
interchange format more than an authoring format.
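
To make the transformation step concrete, here is a rough sketch in
Python of what a tool might do to turn the authoring-style JSON above
into HTML + RDFa that a generic processor can read. This is not the
actual SH markup: the function names are hypothetical, the output is a
simplified guess using schema.org's Person, Organization, givenName,
familyName, url, affiliation and name terms, and the "corresponding"
flag is dropped because expressing it properly needs the schema.org
roles mentioned earlier.

```python
from html import escape


def author_to_rdfa(author: dict) -> str:
    """Render one authoring-format record as a simplified HTML + RDFa item.

    Assumption: this is an illustrative shape, not the real SH markup.
    The "corresponding" key is ignored (it would need schema.org roles).
    """
    name = (
        f'<span property="givenName">{escape(author["given"])}</span> '
        f'<span property="familyName">{escape(author["family"])}</span>'
    )
    if "url" in author:
        # The link carries the person's url; the name spans stay
        # attached to the enclosing Person.
        name = f'<a property="url" href="{escape(author["url"])}">{name}</a>'

    parts = [name]
    org = author.get("org")
    if org:
        # The affiliation becomes its own typed resource.
        parts.append(
            '<span property="affiliation" typeof="Organization">'
            f'<a property="url" href="{escape(org["url"])}">'
            f'<span property="name">{escape(org["name"])}</span></a></span>'
        )
    return f'<li property="author" typeof="Person">{" ".join(parts)}</li>'


def authors_to_rdfa(authors: list) -> str:
    """Wrap all authors in a list that declares the schema.org vocabulary."""
    items = "\n".join("  " + author_to_rdfa(a) for a in authors)
    return f'<ol vocab="http://schema.org/">\n{items}\n</ol>'


if __name__ == "__main__":
    print(authors_to_rdfa([
        {
            "given": "Robin",
            "family": "Berjon",
            "url": "http://berjon.com/",
            "org": {"name": "SA", "url": "http://science.ai/"},
            "corresponding": True,
        }
    ]))
```

The point of the sketch is only that the convenient form needs a step
like this before anything generic can make sense of it, whereas the
output can be consumed as-is.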

From the perspective of tools, it does not make a huge difference. The
HTML+RDFa version is a little bit harder (seriously harder if you want
to support round-tripping, but that's not required) but overall you can
have the same form-based UI in which authors can enter a list of people
defined by straightforward fields.

> Some of the markup will have to be somewhat complex -- for example
> citations that have both text before and after them and that need to be
> able to specify something else than pages as reference. Every few months
> someone seems to try to invent a new dialect of markdown for academics
> to get away from the difficulty of writing latex, but once they run into
> citations they end up either not being able to support most of the
> required features or defining something that is as complex as latex. So
> users who choose to write it by hand will have to look the variable
> names up when using them. 

It is a law of wikitext syntaxes that they will grow in complexity until
they have the full flexibility of HTML, only much, much uglier. The same
applies to Markdown.

-- 
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ — intelligent science publishing

Received on Wednesday, 2 December 2015 15:05:54 UTC