- From: Robin Berjon <robin@berjon.com>
- Date: Mon, 30 Nov 2015 23:13:50 -0500
- To: public-scholarlyhtml@w3.org
With the focus on interchange in mind, some design considerations follow.

The first is the scope of the data model for SH. I think that it should be the article. Don't get me wrong, I'm just as excited as the next person about liberating knowledge not just from antiquated formats but also from the shackles of the article form. The current insistence on narratives can be toxic: it can be very limiting, it slows down some areas of science, and it stymies reuse.

Having said that, the article is not going away any time soon, and I would expect not ever. Narratives are also very useful when called for (which is often). And of course, there is a *lot* of existing content in article form.

This does not at all mean that projects to apply linked data to research, of which Linked Research is one great example, are wrong. On the contrary, I think SH should be designed in such a way that it can integrate well with them. This enables pipelines such as legacy content -> content mine -> LR. The result is likely not as great as if we got every researcher to use LR for everything from the get-go, but that seems unlikely. With this approach, we can upgrade gradually.

With articles as the scope, the choice of HTML as the baseline format should (hopefully) be obvious. HTML can be quite a mess though, so we can't just say "HTML" and expect anything to work. You don't want <applet> and <marquee> of course, but you probably also don't want just a flat list of styled paragraphs (e.g. Word's data model). We need a specifically structured subset of HTML.

HTML is also limited in its semantics. It has a few things for scholarly content (such as paragraphs and sections) but that only takes you so far. Thankfully, that's where DPUB-ARIA kicks in. But we likely don't want all that's in DPUB-ARIA either, nor do we want it used arbitrarily. It is designed to also support exotic content, say books, that might be out of scope (or it might not, that is up for discussion).
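
To make the "structured subset" idea concrete, here is a minimal sketch of what checking a document against such a subset could look like. Everything in it is an assumption for illustration: the element and role allowlists are hypothetical and not drawn from any SH draft, and the names `SubsetChecker` and `check_subset` are invented here. It only uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical allowlists, for illustration only. Which elements and which
# DPUB-ARIA roles belong in the subset is exactly what is up for discussion.
ALLOWED_ELEMENTS = {
    "html", "head", "title", "meta", "body", "article", "section", "header",
    "footer", "h1", "h2", "h3", "p", "a", "em", "strong", "cite", "code",
    "pre", "sup", "sub", "blockquote", "ul", "ol", "li", "figure",
    "figcaption", "img", "table", "thead", "tbody", "tr", "th", "td",
}
ALLOWED_ROLES = {
    "doc-abstract", "doc-acknowledgments", "doc-bibliography", "doc-footnote",
}


class SubsetChecker(HTMLParser):
    """Collects every element or doc-* role that falls outside the allowlists."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in ALLOWED_ELEMENTS:
            self.problems.append(f"element <{tag}> is outside the subset")
        # A role attribute may be absent or valueless; treat both as empty.
        role_value = dict(attrs).get("role") or ""
        for role in role_value.split():
            if role.startswith("doc-") and role not in ALLOWED_ROLES:
                self.problems.append(f"role {role!r} is outside the subset")


def check_subset(markup):
    """Return a list of human-readable problems found in the markup."""
    checker = SubsetChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

Calling `check_subset` on markup containing `<marquee>` or a book-oriented role such as `doc-chapter` would report both, while a plain `<article>` with sections and paragraphs would pass. A real profile would also need to constrain structure (nesting, required parts), which a flat allowlist like this does not attempt.
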
We probably want to rely on a prescriptive subset of DPUB-ARIA.

Then there are parts that DPUB-ARIA doesn't cover because it is generic to publishing and we are specialised to scholarly content (e.g. capturing sources of funding). For those parts we need to avail ourselves of semantic extension mechanisms like Microdata or RDFa (I would say more likely the latter if we prefer to use a format that isn't half-abandoned, though both have issues).

This then opens the question of which ontology or ontologies to choose. My contention, which I know is not universally shared, is that semantics are only as useful as they are shared. Obviously, this has limits. My 6yo asked me the other day why we bothered having words like "house" when we could just as well get away with building-people-live-in, and we had a fun time regressing that into impossibly long words. If the most broadly understood vocabularies don't have a concept that *roughly* fits, then we can look into less used ones, and failing that we can invent something. Our SH currently makes use of an ad hoc ontology[0] but we consider that a bug, and we plan to replace it entirely.

Semantic overlays required by the spec should also be restricted by use cases. Ideally there should be a common interoperable baseline that one can always expect to find, and then people who want to can go crazy on top of that. That enables interoperability and freedom at the same time.

So essentially, I propose that SH be composed entirely of subsets of existing standards, with simple extensibility rules that dictate what can be guaranteed to interoperate, and what can be added safely but might not be universally understood. This is relatively easy to get right.

[0] https://github.com/scienceai/scholarly-article/

--
• Robin Berjon - http://berjon.com/ - @robinberjon
• http://science.ai/ - intelligent science publishing
•
Received on Tuesday, 1 December 2015 04:14:18 UTC