- From: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Date: Thu, 25 Apr 2013 14:25:29 +0100
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: Sarven Capadisli <info@csarven.ca>, "public-lod@w3.org" <public-lod@w3.org>
You might be interested in this:

http://bio-ontologies.knowledgeblog.org/table-of-contents

These are papers from a workshop that I used to organise. The content, as you can see, is in HTML and includes images and so forth. What is perhaps less obvious is that the source data in most cases is a Word doc. All the content, including the images, was posted from Word. We did have to do a little reformatting (the conference template is really the most unhelpful that it could be -- my fault, I wrote it). It takes around 5-10 minutes a paper on average (there is quite a wide variance).

And more, the content has some semantic markup. The journal, publication date, authors, and title are all clearly described in the HTML; you can retrieve this metadata as RDF also, if you like. This metadata was not added independently; it was present in the underlying Word doc. We added this by simply adding a little markup using shortcodes ([author]Phillip Lord[/author]).

Of course this is entirely horrible, to the point that a reviewer of my last grant called it a "drunk under a lamppost idea". But it does work without requiring any modification of Word. And it works for Wikipedia. Besides, nothing wrong with being drunk under a lamppost occasionally.

It's also possible to combine the web and PDF. So, for instance, this link:

http://www.russet.org.uk/blog/2366

is my OWLED paper. In this case, the title, author, and date come from arXiv, and the abstract is transcluded from there. In short, it's an overlay journal (article). The English summary and reviews are independent, subsidiary content. In this case, the knowledge comes from arXiv, where it has been added independently. I took this route because, sad though it is to say, getting a Word doc on the web is much easier than getting a LaTeX document up.

We even have this working for CEUR-WS, although in this case we lose the abstracts; I would describe how we achieved this, but really, you don't want to know.
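To make the shortcode idea above a little more concrete, here is a minimal sketch of how markup such as [author]Phillip Lord[/author] could be pulled out of a document's text. It is only an illustration under stated assumptions: the function name `extract_shortcodes`, the lowercase-tag pattern, and the dictionary layout are hypothetical, and this is not the code the knowledgeblog site actually runs.

```python
import re

# Assumed shortcode shape: [tag]value[/tag], with a lowercase tag name.
# Only [author] appears in the message; other tag names are hypothetical.
SHORTCODE = re.compile(
    r"\[(?P<tag>[a-z]+)\](?P<value>.*?)\[/(?P=tag)\]",
    re.DOTALL,
)


def extract_shortcodes(text):
    """Return (metadata, cleaned_text) for a text containing shortcodes.

    metadata maps each tag name to the list of values found for it;
    cleaned_text is the input with the shortcode brackets removed and
    the enclosed values left in place.
    """
    metadata = {}
    for match in SHORTCODE.finditer(text):
        value = match.group("value").strip()
        metadata.setdefault(match.group("tag"), []).append(value)
    # Keep the human-readable value, drop only the surrounding markup.
    cleaned = SHORTCODE.sub(lambda m: m.group("value"), text)
    return metadata, cleaned
```

Calling `extract_shortcodes` on the text saved from a Word document would give the metadata as a dictionary, which could then be serialised as RDF or embedded in the HTML, while the cleaned text is what a reader sees.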
The HTML is messy, of course, and dependent on the underlying tool. The use of shortcodes is unprincipled and hideous. But it does work. And we can add as much semantics as authors can be bothered with. Given that the latter will be the limiting factor, I don't think it's a bad way forward.

Phil

Hugh Glaser <hg@ecs.soton.ac.uk> writes:

> I hate PDF with a passion, by the way, but in the socio thingy of
> being an editor of a proceedings, it can be an enormous pain when
> people submit HTML that has local links to images, etc., even from MS
> Word documents.
>
> Cheers
>
> On 24 Apr 2013, at 18:23, Sarven Capadisli <info@csarven.ca>
> wrote:
>
>> On 04/24/2013 05:37 PM, Andrea Splendiani wrote:
>>> There are two main issues in moving beyond pdf.
>>>
>>> One, probably minor, is that there are larger constraints. Some
>>> people need their work to be somewhere "understood" by their
>>> organization. This is a bit less relevant for conferences than for
>>> journals, but still an issue.
>>>
>>> The other is that some bits of a research paper can lend themselves
>>> to formalization. But there is a lot of variability. In some cases
>>> you are closer to what web languages can represent. E.g.: a finding
>>> in RDF, some algorithm shown in JavaScript,... But what if somebody
>>> is publishing a description of an information system? It may get so
>>> far from a standard way to talk about things that you won't gain
>>> much with a structured representation.
>>>
>>> pdf + other technologies, when it applies, could be a good idea,
>>> though.
>>
>> I can't quite make out the core of the issues that you are trying to
>> describe. So, from what I understand:
>>
>> We could maybe at least give this HTML thing a try. And, later worry
>> about semantic alignments?
>>
>> IMHO, there is no compelling reason to research and try PDF + other
>> technologies, when we have HTML+RDF + other technologies already in place
>> and staring right at us.
>>
>> -Sarven
>>
>

--
Phillip Lord,                           Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics,             Email: phillip.lord@newcastle.ac.uk
School of Computing Science,            http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower,               skype: russet_apples
Newcastle University,                   twitter: phillord
NE1 7RU
Received on Thursday, 25 April 2013 13:25:54 UTC