Re: Formats and icing (Was Re: [ESWC 2015] First Call for Paper) from Colin Maudry on 2014-10-02 (public-lod@w3.org from October 2014)

From: Colin Maudry <colin@maudry.com>
Date: Thu, 02 Oct 2014 16:02:17 +0200
To: John Walker <john.walker@semaku.com>, Norman Gray <norman@astro.gla.ac.uk>, Luca Matteis <lmatteis@gmail.com>
CC: Linked Data community <public-lod@w3.org>
Message-ID: <542D5AE9.9070708@maudry.com>

Hi all,

Thanks John for the references to my project.

It seems you need a solution that satisfies both those who want a PDF
to fit existing processes and those who want a machine-readable format
for better Web accessibility.

The DITA
<https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita>
standard is an OASIS standard, like OpenDocument. It is an XML framework
for creating documents by assembling reusable content components, called
topics. Think of it as an evolved DocBook. The Wikipedia page
<https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture>
is a good introduction.

In the DITA ecosystem, a processing engine has been developed by the
community, the DITA Open Toolkit <http://dita-ot.github.io/>. Through
its plugin system, it enables the publication of DITA content to a
myriad of output formats:

  * PDF
  * Simple HTML
  * HTML WebHelp (fancy example <http://purl.org/dita/ditardf-project>)
  * EPUB and Kindle (through the DITA for Publishers plugin
    <http://dita4publishers.sourceforge.net/>)
  * ...and RDF/XML through the plugin that is part of the DITA RDF project
    <http://purl.org/dita/ditardf-project>. The plugin extracts the
    metadata of the documentation (author, title, creation date, links,
    variables), not the meaning of the content (output example
    <https://github.com/ColinMaudry/dita-rdf/blob/ditaot-plugin/dita2rdf/demo/out/ditaot-userguide.rdf>).
    It could be extended to extract certain facts from the content.

DITA has a nice feature: its core vocabulary can be extended via
"specialization", so that it can support specific purposes: learning
content, troubleshooting documents, etc.
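
For readers new to specialization: as far as I understand the mechanism,
every DITA element carries a class attribute listing its ancestry from the
most general type to the most specific, for example
"- topic/topic concept/concept " for a concept topic. A processor that only
knows the base type can still handle the specialized element. The sketch
below shows that lookup in Python; the function names are hypothetical and
chosen for this example only.

```python
def specialization_chain(class_attr):
    """Split a DITA class attribute into (module, element) pairs.

    The first token ("-" for structural types, "+" for domain types)
    is dropped; the remaining tokens run from general to specific.
    """
    tokens = class_attr.split()
    return [tuple(token.split("/", 1)) for token in tokens[1:]]


def is_a(class_attr, module_element):
    """True if the element is, or is specialized from, module_element."""
    return module_element in class_attr.split()[1:]


if __name__ == "__main__":
    concept = "- topic/topic concept/concept "
    print(specialization_chain(concept))
    print(is_a(concept, "topic/topic"))
```

With this kind of check, a generic tool can treat a concept as a plain
topic, which is what lets specialized vocabularies reuse existing output
plugins.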

Those who want a PDF would make a PDF rendition and those who want
machine-readable formats would use a flavour of HTML or give me a hand
with the RDF output.

What do you think?

Colin

On 02/10/2014 11:08, John Walker wrote:
> Hi All,
>  
> I know Latex is the norm in academic circles, but the DITA XML
> standard is widely used in industry and gaining traction in publishing.
>  
> Colin Maudry ( @CMaudry) has a project for extracting RDF metadata
> from DITA content [1].
> Seems to be attracting interest from Marklogic and HarperCollins [2]
> and others [3].
>  
> Cheers,
> John
>  
> [1] http://purl.org/dita/ditardf-project
> [2]  http://files.meetup.com/1645603/meetup-2014-08-12.pptx
> [3] http://de.slideshare.net/TheresaGrotendorst/towards-dynamic-and-smart-content-semantic-technologies-for-adaptive-technical-documentation
>
>
> > On October 2, 2014 at 12:03 AM Norman Gray <norman@astro.gla.ac.uk>
> wrote:
> >
> >
> >
> > Greetings.
> >
> > On 2014 Oct 1, at 22:36, Luca Matteis <lmatteis@gmail.com> wrote:
> >
> > > So forget PDF. Perhaps we can add markup to Latex documents and make
> > > them linked data friendly? That would be cool. A Latex RDF
> > > serialization :)
> >
> > There exists
> <http://www.siegfried-handschuh.net/pub/2007/salt_eswc2007.pdf>:
> >
> > > SALT: Semantically Annotated LATEX Tudor Groza Siegfried Handschuh
> Hak Lae Kim
> > >
> > > Digital Enterprise Research Institute
> > > IDA Business Park, Lower Dangan
> > > Galway, Ireland
> > > {tudor.groza, siegfried.handschuh, haklae.kim}@deri.org
> > >
> > > ABSTRACT
> > >
> > > Machine-understandable data constitutes the basis for the Semantic
> Desktop. We provide in this paper means to author and annotate
> Semantic Documents on the Desktop. In our approach, the PDF file
> format is the basis for semantic documents, which store both a
> document and the related metadata in a single file. To achieve this we
> provide a framework, SALT that extends the Latex writing environment
> and supports the creation of metadata for scientific publications.
> SALT lets the scientific author create metadata while putting together
> the content of a research paper. We discuss some of the requirements
> one has to meet when developing such an ontology-based writing
> environment and we describe a usage scenario.
> >
> > That describes a very thorough approach to embedding some semantics
> within LaTeX documents.
> >
> > Yes, 'thorough'; very thorough; verging on the intimidating.
> >
> > I dimly recall that there was a rather more lightweight approach
> which was used for proceedings in ISWC or ESWC -- I remember marking
> up a LaTeX document in something less comprehensive than SALT -- but I
> can't remember enough to be able to re-find it.
> >
> > All the best,
> >
> > Norman
> >
> >
> > --
> > Norman Gray : http://nxg.me.uk
> > SUPA School of Physics and Astronomy, University of Glasgow, UK
> >
> >
Received on Friday, 3 October 2014 07:52:56 UTC