Re: Cost and access (Was Re: [ESWC 2015] First Call for Paper)

Hi Phillip, Eric, et al.
--------------------------------------------
On Fri, 10/3/14, Phillip Lord <phillip.lord@newcastle.ac.uk> wrote:


 
 Eric Prud'hommeaux
 <eric@w3.org>
 writes:
 
 > Let's work through the requirements and a plausible migration plan. We need:
 >
 > 1 persistent storage: it's hard to beat books for a feeling of persistence.
 > Contracts with trusted archival institutions can help but we might also
 > want some assurances that the protocols and formats will persist as well.
 [snip] 
 Protocols and formats, yes, truly a problem. I think in an argument between HTML and PDF,
 it's hard to see that one has the advantage over the other. My experience is that HTML is easier
 to extract text from, which is always going to be the baseline.
---------------------------------------
Easier still is (X)HTML or XML written in plain text with character entities hex-escaped (numeric character references such as &#xE9;).  Clipboards are "owned" by the OS, and for ordinary users syntax errors are fatal: bread and butter (full employment) for help desks.  Personally, I am not fond of that ideology.  XSLT 2.0 has a (flawless) translation mechanism which eases user pain.  I've used it several times for StratML projects.  If you want a copy of the transform, contact me offline.
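
To make "hex escaped" concrete, here is a minimal Python sketch. It is not the XSLT 2.0 transform mentioned above, and the function name hex_escape is made up for illustration. It assumes the goal is simply to replace every non-ASCII character with an XML hexadecimal character reference, so that the text survives clipboards and editors that mangle encodings. It does not escape markup characters such as & or <.

```python
import html


def hex_escape(text: str) -> str:
    """Replace each non-ASCII character with an XML hex character reference.

    Hypothetical helper for illustration only; ASCII characters
    (including markup such as '&' and '<') are passed through unchanged.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


# "caf\u00e9" is "cafe" with an acute accent on the final letter.
escaped = hex_escape("caf\u00e9")
print(escaped)

# The standard library can reverse the escaping, so nothing is lost.
print(html.unescape(escaped) == "caf\u00e9")
```

The escaped form is pure ASCII, which is the property that matters for copy and paste, and any XML or HTML parser will turn the references back into the original characters.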
 ---------------------------------------
 For what it is worth, there are archiving solutions, including archive.org and arxiv.org, both of which leap to mind.
 ---------------------------------------
The archiving solutions work well for the persistence of protocols and formats.  Persistence of Linked Data depends upon the ability of an archive to reduce <owl:sameAs> and <rdfs:*> to their *export* standards.  Professional credibility in all disciplines relies on how well one hefts the lingo, that is, applies the schema labels to shared concepts.  Publishers are very sensitive to this concern, and it may be Linked Data that has the deaf ear.
----------------------------------------
[snip]
 Okay. I would like to know who made the decision that HTML is not acceptable and why.
----------------------------------------
This is a related issue.  The "decision" to ignore the separation-of-concerns issue mentioned above is a user-acceptance impediment when "protocols and formats" are the only parameters considered.  In a few decades perhaps we will have real AI, Turing Machines, and academic disciplines will have their own Ontologies which speak to them.  As a container, I think HTML is fine.  I am not comfortable with RDFa "decorations" or /html/head metadata as absentee ownership of documents.

In the meantime, Archives will have to develop methods to recycle and reduce rdfs:label values, and they will have to be (uncharacteristically) ruthless.  The statistics of RDF rely on a well-known "paradox" (http://en.wikipedia.org/wiki/Birthday_problem).  Close matches between namespaces and Ontologies have an extreme bias toward "high probability" identification.  In the end, the probability is just a number, but it intimidates ordinary partial fractions who believe it is the "smartest guy in the room".  That is rather a bad thing.
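
For anyone who has not met the birthday problem, the arithmetic can be sketched in a few lines of Python. This shows only the textbook calculation, assuming n items drawn uniformly and independently from d equally likely values. Treating labels in a vocabulary as such draws is an assumption made for illustration, not a claim about any real ontology, and the function name is hypothetical.

```python
def collision_probability(n: int, d: int) -> float:
    """Probability that at least two of n uniform draws from d values coincide.

    Textbook birthday-problem formula: 1 - prod_{k=0}^{n-1} (d - k) / d.
    """
    if n > d:
        return 1.0  # pigeonhole: more draws than values forces a collision
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (d - k) / d
    return 1.0 - p_all_distinct


# Classic case: 23 people, 365 possible birthdays.
print(round(collision_probability(23, 365), 4))  # 0.5073
```

The point of the "paradox" is how quickly the probability climbs: a coincidence becomes more likely than not long before n gets anywhere near d.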

Cheers,
Gannon 


 
 Phil
 
 

Received on Friday, 3 October 2014 17:16:42 UTC