Re: Plain textual bodies - summary of arguments and possible solutions from Stian Soiland-Reyes on 2013-02-04 (public-openannotation@w3.org from February 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 4 Feb 2013 15:07:47 +0000
To: Bob Morris <morris.bob@gmail.com>
Cc: Antoine Isaac <aisaac@few.vu.nl>, public-openannotation@w3.org
Message-ID: <CAPRnXtnc6b4p3jOT5VJTOW6LswZeX_FFoXCf8Hbef31cX2w5Fg@mail.gmail.com>

On Mon, Feb 4, 2013 at 5:05 AM, Bob Morris <morris.bob@gmail.com> wrote:

> I'm not sure what counts as "released", but the project reported in
> [1] uses cnt  via its sister "Http in RDF"[2].  They appear to use it
> in a fashion rather consistently with a remark in the closing of an
> issue [3] in PROV declining to make it part of PROV itself

Just a quick note: the issue you link to above is not about declining to
use HTTP-in-RDF in PROV itself, but about not using it in PROV-AQ, a note
that describes a simple REST service for finding provenance resources.
PROV-AQ has nothing to do with describing the provenance of HTTP
exchanges, so HTTP-in-RDF was found not relevant there.

However, the HTTP (and Content)-in-RDF vocabulary could, as you said,
well be used for describing such provenance. A description of HTTP
provenance would be a specialized use of the general PROV model, so if
it were to be defined by the WG (we have not had any such requests)
it would be as a separate vocabulary/extension.

I have personally used Content-in-RDF in our
<http://ns.taverna.org.uk/2012/tavernaprov/> vocabulary to represent
the text content of data values in provenance from a scientific workflow
system, while also allowing the content to be identified by a (relative)
URI/file reference. So here the content is almost like a snapshot of
the file at the time of writing:

:data1 a wfprov:Artifact, prov:Entity ;
    prov:wasGeneratedBy :workflowStepActivity ;
    tavernaprov:content <out/data1.txt> .

<out/data1.txt> a tavernaprov:Content, cnt:ContentAsText ;
    cnt:chars "The textual content of data1" ;
    cnt:characterEncoding "UTF-8" ;
    tavernaprov:sha1 "6df85f16fbbebfa171a0b223910269817938ce58" ;
    tavernaprov:byteCount 28 .

Here the cnt:characterEncoding is also valuable, because it says which
encoding was used to write cnt:chars to the file (or, conversely, to read
them from the file for inputs), which also affects the sha1 checksum.
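
To illustrate why the encoding matters for the checksum, here is a minimal
Python sketch. It is not how Taverna computes these values; the helper name
describe_content is hypothetical and only the standard hashlib module is
used. The same characters yield different bytes, and hence a different
byte count and SHA-1, under a different encoding:

```python
import hashlib


def describe_content(chars: str, encoding: str = "UTF-8") -> dict:
    """Encode chars and return the byte count and SHA-1 of the bytes.

    Hypothetical helper for illustration; the keys loosely mirror
    cnt:characterEncoding, tavernaprov:byteCount and tavernaprov:sha1.
    """
    data = chars.encode(encoding)
    return {
        "encoding": encoding,
        "byteCount": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


text = "The textual content of data1"
utf8 = describe_content(text, "UTF-8")
utf16 = describe_content(text, "UTF-16")

print(utf8["byteCount"])              # 28, matching tavernaprov:byteCount
print(utf8["sha1"] != utf16["sha1"])  # True: checksum depends on encoding
```

Without cnt:characterEncoding, a consumer could not reliably reproduce the
checksum from cnt:chars alone.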

We added the tavernaprov:content indirection from the prov:Entity rather
than putting cnt:chars directly on :data1, because internally in our
workflow system, data is a set of references, which could be resolved
or transformed to potentially give bytes/strings. Thus we don't
consider activities in our system to be producing strings; they
produce at best references to such strings. I did not find any good
existing term for this link to a cnt:Content, but perhaps it could
be a subproperty of prov:alternateOf.

Naturally we only want to include cnt:chars when that string is not
massive, but the tavernaprov:Content instance still survives with its
checksum and byte count, indicating that there is a cnt:Content; we just
don't know it or don't represent it in RDF.
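
As a rough sketch of that rule (an illustration only, not Taverna's actual
code), the Python below always records the checksum and byte count but
inlines cnt:chars only below a size limit. The function name
content_properties, the MAX_INLINE_CHARS threshold and the choice to add
the cnt:ContentAsText type only when the text is inlined are all
assumptions made for this example:

```python
import hashlib

MAX_INLINE_CHARS = 1000  # hypothetical threshold, not taken from Taverna


def content_properties(chars: str, encoding: str = "UTF-8",
                       max_inline: int = MAX_INLINE_CHARS) -> dict:
    """Return property/value pairs describing one content resource."""
    data = chars.encode(encoding)
    # Checksum and byte count are always recorded.
    props = {
        "rdf:type": ["tavernaprov:Content"],
        "tavernaprov:sha1": hashlib.sha1(data).hexdigest(),
        "tavernaprov:byteCount": len(data),
    }
    # The text itself is only included when it is small enough.
    if len(chars) <= max_inline:
        props["rdf:type"].append("cnt:ContentAsText")
        props["cnt:chars"] = chars
        props["cnt:characterEncoding"] = encoding
    return props


small = content_properties("The textual content of data1")
large = content_properties("x" * 5000)

print("cnt:chars" in small)  # True
print("cnt:chars" in large)  # False, but sha1 and byteCount remain
```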

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Monday, 4 February 2013 15:08:38 UTC