Re: Examples towards Embedded Body discussion

Rob,

See comments below...

On 27 Oct 2014, at 21:11 , Robert Sanderson <azaroth42@gmail.com> wrote:

> 
> Examples towards a discussion on the topic tomorrow morning at TPAC:
> 
> 1.  Just a string literal:
>     {"hasBody": "This is the comment"}
> 
> 2.  String literal and Language
>     {"hasBody": {"@value" : "This is the comment", 
>                          "@language": "en"}}
> 
> 3.  String literal and format as data type
>     {"hasBody": {"@value": "<span>This is the comment</span>", 
>                          "@type": "rdf:HTML"}}
> 
> 4.  Embedded string, as a resource:
>     {"hasBody": {"rdf:value": "This is the comment"}}
> 
> 5.  Embedded String and format as media type
>     {"hasBody": {"rdf:value": "<span>This is the comment</span>",
>                          "dc:format": "text/html"}}
> 
> 6.  Embedded String and language as property rather than tag
>   {"hasBody": {"rdf:value": "This is the comment",
>                        "dc:language": "en"}}
> 
> 7.  Embedded String, format and language
>    {"hasBody": {"rdf:value": "<span>This is the comment</span>",
>                         "dc:format": "text/html",
>                         "dc:language": "en"}}
> 
> 8a.  Simple URI when string literals are not allowed (4-7)
>     {"hasBody" : "http://example.org/index.html"}
> 
> 8b.  Simple URI when string literals are allowed (1-3)
>     {"hasBody" : {"@id": "http://example.org/index.html"}}
> 
> 
> Notes:
> * 7 cannot be done with @value/@type/@language, as RDF does not allow datatype and language tag on the same literal.  Thus 7 is the only possible model for when all three are required at once.
> * 3 requires a URI for the format, whereas 5 and 7 require a media type registration.  Some content may have neither, such as Markdown.
> * For 1,2 and 3 the body is a literal. For 4, 5, 6 and 7 the body is a resource.  Literals cannot have other properties associated with them, such as creator, created date, or other provenance, and thus these must use the resource pattern.  When the literal is used by itself (1-3), *it has no provenance information* beyond that of the graph, which is likely not correct.
> * If the value of the string in 1-3 was a URI, then it is *not* the resource identified by that URI, it is just a string that happens to look like a URI.  For the URI case, it would have to be as per 8b.
> 
> 
> The consideration is, in my opinion:
> 
> Does the simplicity of 1 outweigh the complexity of having to deal with all of the options, and especially requiring the structure always be present when the body is a resource with its own URI? If String literals are not allowed, then the consistent pattern is that of 4 through 8a.  If they are, then the client must deal with all 1-7 plus 8b.

I am arguing that pattern #1 needs to be allowed. Putting my CSVW WG member's hat on :-), I think the issue is that the annotation may be human edited. Let me also describe the use case for those who are not familiar with the background. Sorry if it is a bit long.

The CSVW WG defines metadata for CSV files. I.e., alongside the CSV content proper, CSV data publishers can produce a separate file that describes the data, providing information like creation dates and the structure of the data (column and, possibly, row names, data types for columns or rows or for individual cells, etc.). The metadata is a JSON file. If you are interested, the latest version is at [1]. What is important to note here is that the file is not (necessarily) machine generated, but written by humans, possibly using a simple text editor.

The metadata includes a term "notes". This may include information like, say, the name of a statistical method used to generate a particular row, the name of the satellite that produced the meteorological information in a column, that sort of thing. These are clearly annotations, and we would like to make them OA compatible.

There is an RFC for fragment IDs in CSV files, so anchoring is well defined (although it is not robust, but let us put that aside for now). The current OA model would mean something like:

"notes" : [{
		"hasTarget" : "URIforCSV#row=1234",
		"hasBody"   : { "@value" : "My favourite stats method is used for this" }
	},
	...
	}]

Of course, there may be annotations that require more complex bodies, and then the structure above is o.k. But it would be really hard to convince people to use the structure above instead of the more obvious:

"notes" : [{
		"hasTarget" : "URIforCSV#row=1234",
		"hasBody"   : "My favourite stats method is used for this"
	},
	...
	}]

I am almost sure that most data publishers will get this wrong and will simply do it the simple way which, let us face it, is the 'usual' JSON way (i.e., to have either a string or an object in such a situation). I also believe this is not specific to CSV metadata: the same issue will arise wherever the annotation/note is produced by a human and not by some sort of application. Hence the need, in the CSVW WG's view at least, to make that type of structure o.k. for OA as well.

(The term 'hasBody' in this context is not that intuitive either, b.t.w.; we may think about using some alias.)

As for "simplicity of 1 outweigh the complexity of having to deal with all of the options": the question to ask is really whether the "simplicity of 1 _for users_ outweighs the complexity _for implementers_ of having to deal with all the options". Put this way, the answer seems clear to me: we should definitely allow for option 1...
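
To give a feel for the implementer side of that trade-off, here is a minimal Python sketch of a client that accepts the body shapes from Rob's list (1-7 plus 8b, i.e., assuming string literals are allowed) and reduces them to one internal form. This is only an illustration, not something defined by OA or CSVW: the function name `normalize_body` and the output keys "kind", "value", "language" and "format" are made up for this example.

```python
def normalize_body(body):
    """Reduce body patterns 1-7 and 8b to a single internal dict.

    The output keys ("kind", "value", "language", "format") are
    hypothetical and are not OA terms.
    """
    if isinstance(body, str):
        # Pattern 1: a plain string literal.
        return {"kind": "literal", "value": body}
    if not isinstance(body, dict):
        raise TypeError("body must be a string or an object")
    if "@id" in body:
        # Pattern 8b: a resource identified by its own URI.
        return {"kind": "uri", "value": body["@id"]}
    if "@value" in body:
        # Patterns 2 and 3: literal with a language tag or a datatype.
        result = {"kind": "literal", "value": body["@value"]}
        if "@language" in body:
            result["language"] = body["@language"]
        if "@type" in body:
            result["format"] = body["@type"]
        return result
    if "rdf:value" in body:
        # Patterns 4 to 7: embedded resource carrying the string.
        result = {"kind": "embedded", "value": body["rdf:value"]}
        if "dc:language" in body:
            result["language"] = body["dc:language"]
        if "dc:format" in body:
            result["format"] = body["dc:format"]
        return result
    raise ValueError("unrecognised body structure")
```

The branching is about a dozen lines written once by an implementer, whereas the extra object wrapper would have to be typed by every human author of every note.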

B.t.w., to turn more technical: from an RDF point of view, it means relaxing the requirements, i.e., 'oa:hasBody' should not be defined as an object property (which is an OWL notion anyway; RDF does not have this). Its value would simply be defined as an RDF Resource (a Literal is also an RDF Resource). The only consequence is that the OA data would not be OWL DL compliant (OWL DL requires a strict separation of object and data properties, and even OWL 2 DL's new punning features do not help with that). The question is whether it is a requirement that OA data be usable by DL reasoners. Personally, I do not think that should be a requirement, and we can simply make it clear in the documentation that if somebody uses that type of punning (i.e., 'hasBody' with a literal value) then the data is not DL compatible. But it should still be 'legal' OA data.

Another possibility (I am making this up while writing this...) is that the current 

	annotation->body->hasBody->@value
	annotation->target->hasTarget->

'triangle' could be relaxed, conceptually, for the simple case where the body is really just a literal, into something like

	annotation->@value
	annotation->target->hasTarget->

i.e., the separate resource for the body may be missing altogether, replaced by a direct value. In JSON terms, our example would then become something like:

"notes" : [{
		"hasTarget" : "URIforCSV#row=1234",
		"@value"    : "My favourite stats method is used for this"
	},
	...
	}]
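
A consumer could map this relaxed form back onto the current structure mechanically. The following Python sketch shows one way this might look; the function name `expand_note` is made up, and treating a note that has both "@value" and "hasBody" as an error is my assumption, not something any spec says.

```python
def expand_note(note):
    """Return a copy of the note with a direct "@value" moved under "hasBody".

    Notes without a direct "@value" are returned as an unchanged copy.
    """
    if "@value" not in note:
        return dict(note)
    if "hasBody" in note:
        # Assumption: a note cannot carry both forms at once.
        raise ValueError('note has both "@value" and "hasBody"')
    expanded = {key: val for key, val in note.items() if key != "@value"}
    expanded["hasBody"] = {"@value": note["@value"]}
    return expanded
```

Applied to the note above, this would yield the same "hasTarget"/"hasBody" object as in the first example of this mail.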

I am sorry if this turned out to be a bit long. Unfortunately, I cannot be at the F2F later today...

Thanks!

Ivan

[1] http://w3c.github.io/csvw/metadata/index.html


> 
> 
> Thanks, and see many of you tomorrow :)
> 
> Rob
> 
> -- 
> Rob Sanderson
> Technology Collaboration Facilitator
> Digital Library Systems and Services
> Stanford, CA 94305


----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me

Received on Tuesday, 28 October 2014 12:33:19 UTC