Re: scientific publishing process (was Re: Cost and access)

Sarven, hello.

On 2014 Oct 7, at 13:13, Sarven Capadisli <info@csarven.ca> wrote:

> On 2014-10-07 11:39, Norman Gray wrote:
>> The original spark to the thread was a lament that SW and LD conferences don't mandate something XMLish for submissions because X(HT)ML is clearly better for... well ... dammit, it's Better.
> 
> Straw man argument. Please stop that now!
> 
> I will spell out the main proposal and purpose for you because it sounds like you are completely oblivious to them. Let me know if anything is unclear.

My remark was intended as facetious rather than fractious, but if you feel I misjudged the balance, I apologise.

I want to clarify what I meant, because on reflection it explains (at least to me) why I'm participating in this thread at such length.  My intention was to indicate that I don't feel that HTML is as central as you, amongst others, seem to assert it is.

I characterise the web as:

  1. URIs for addressing things,
  2. HTTP for retrieving things (other protocols exist, but...),
  3. a downloadable format which clients can parse to obtain more URIs, with a 'follow this' semantic.

Now, the obvious candidate for (3) is of course HTML; but on the web, and _especially_ on the Semantic Web, it can be anything: RDF in one or other format, XML+GRDDL, some discipline-specific format which has a link semantic in it, or even a PDF file with a standardised lump of RDF/XMP inside it.  That RDF may be immediately present, or it may require some sort of heuristic or deterministic extraction (as Kingsley has discussed).
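
As a rough illustration of point (3), here is a minimal Python sketch of a client that obtains "follow this" URIs from two different formats, HTML and N-Triples, through the same interface.  The function names, the `EXTRACTORS` table and the example.org URIs are all invented for illustration, and the N-Triples handling is deliberately naive: it treats every angle-bracketed IRI as a candidate link, including predicates and datatype IRIs, and does not guard against angle brackets inside literals.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects the href values of <a> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_from_html(base_uri, text):
    """Return absolute URIs for every <a href> in an HTML document."""
    collector = _HrefCollector()
    collector.feed(text)
    # Relative references are resolved against the URI we retrieved.
    return [urljoin(base_uri, href) for href in collector.hrefs]


# In N-Triples, IRIs are written between angle brackets.
_NT_IRI = re.compile(r"<([^<>\s]+)>")


def links_from_ntriples(base_uri, text):
    """Return the IRIs mentioned in an N-Triples document, in order.

    N-Triples IRIs are absolute, so base_uri is accepted only to keep
    the two extractors interchangeable.  Naive: assumes no '<' or '>'
    appears inside a literal.
    """
    found = []
    for iri in _NT_IRI.findall(text):
        if iri not in found:
            found.append(iri)
    return found


# Assumed dispatch table: media type -> link extractor.
EXTRACTORS = {
    "text/html": links_from_html,
    "application/n-triples": links_from_ntriples,
}


def follow_candidates(base_uri, media_type, text):
    """Parse a retrieved document and return the URIs it points at."""
    return EXTRACTORS[media_type](base_uri, text)
```

The point of the dispatch table is that HTML is just one row in it: a FITS or PDF/XMP extractor would slot in the same way, given some agreed link semantic for that format.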

All of these are web-native technologies, and I'd go as far as to say that the _least_ interesting thing you can find at the end of a URI is an HTML file.

The big deal, for me, in the idea of the Semantic Web, and the RDF world, is the realisation that the RDF model is sufficiently general that you can turn almost any structured data into RDF, put it into a big bucket, and start inferencing, querying, linking, and so on.  That generation/extraction of RDF is probably easier if the stuff is already pointy-bracketed for you, but that's only a detail.
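
To make "turn almost any structured data into RDF" concrete, here is a small Python sketch that maps a flat key/value record (loosely in the spirit of a file header) onto N-Triples, using Dublin Core terms as predicates.  The record keys, the mapping and the subject URI are assumptions made up for the example, not taken from any real dataset; every value is emitted as a plain string literal, and keys with no mapping are simply dropped.

```python
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape a string for use inside an N-Triples quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_uri, record, mapping):
    """Turn a flat record into a list of N-Triples lines.

    mapping says which record key corresponds to which predicate URI;
    keys without an entry in mapping are skipped.
    """
    lines = []
    for key, value in record.items():
        predicate = mapping.get(key)
        if predicate is None:
            continue
        literal = escape_literal(str(value))
        lines.append(f'<{subject_uri}> <{predicate}> "{literal}" .')
    return lines


# Hypothetical header-like record and key-to-predicate mapping.
record = {"TITLE": "Survey field 12", "AUTHOR": "N. Gray", "NAXIS": 2}
mapping = {"TITLE": DCTERMS + "title", "AUTHOR": DCTERMS + "creator"}

for line in record_to_ntriples("http://example.org/obs/1", record, mapping):
    print(line)
```

Nothing here depends on the source being pointy-bracketed; the only format-specific part is whatever produces `record` in the first place.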

The interesting thing, for me, is just how the web as a whole can go about collectively managing or facilitating this generation/extraction in a way which balances faithfulness to the original with interoperable meaning (Dublin Core and FOAF are truly wonderful things).  That is why I do feel that -- especially in this SW/LD community --

    HTML is a bit of a sideshow.

HTML is a splendid thing for all the reasons that you know and I know, but if it's seen as central, if all questions turn into "what does that look like in HTML?", if it's so in-our-face that we can't see round it, then we miss the interesting questions.  So it's not that I've a particular downer on HTML, or a particular enthusiasm for PDF, but I think that "what does that look like in PDF?" and "what does that look like in FITS?" (the format of choice in my area) are more interesting.

(Or, put another way, I don't think that HTML is the SW/LD community's dogfood to eat -- for WHATWG, yes; for us, no.)

The sub-threads here about practicalities are amongst those questions, because they pick up the questions of "how does semantics get attached to documents in practice?", "why would authors bother?", "how does that information get passed around faithfully?"  It would be more interesting and productive if (and I don't mean this completely unseriously) the SW/LD community _forbade_ HTML from its conferences and journals.

So this is where the opposite end of the spectrum from your position lies, and I hope it makes a little more sense of what I've been saying.

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Received on Tuesday, 7 October 2014 17:14:23 UTC