- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Tue, 14 Oct 2014 16:38:06 +0100
- To: Robert Sanderson <azaroth42@gmail.com>
- Cc: Annotation WG <public-annotation@w3.org>
First of all, I like the simplicity of your approach.

Ideally I would have hoped for Content-in-RDF to be adopted by this WG so that we could just put it out (~ as-is) and make it official for anyone else - but I guess that is not allowed by our charter.

One feature of the Content-in-RDF model is that it allows both string and binary representations concurrently, indicating the character set for interpreting the bytes. This is very useful for embedding content from non-web resources (e.g. files on a USB stick or stdout from a command line tool), as one cannot always be sure about the "stringiness" of the value, and in particular about the character set of the bytes:

    :value1 a cnt:ContentAsText, cnt:ContentAsBase64 ;
        cnt:bytes "SGVsbG8gd29ybGQ="^^xsd:base64Binary ;
        cnt:chars "Hello world" ;
        cnt:characterEncoding "ASCII" .

Your approach uses rdf:value, where only one representation is possible. A duality of representations might not be needed as much in annotation systems, and could alternatively be asserted using prov:alternateOf (and prov:wasDerivedFrom) statements to a secondary oa:Content - I must admit this makes the direction of provenance clearer:

    :value1 a oa:Content ;
        rdf:value "Hello world" ;
        prov:alternateOf :value1Bytes ;
        prov:wasDerivedFrom :value1Bytes .

    :value1Bytes a oa:Content ;
        rdf:value "SGVsbG8gd29ybGQ="^^xsd:base64Binary ;
        prov:atLocation <file:///tmp/annotation.txt> .

(Describing the character set, checksums etc. would require additional vocabularies and perhaps a PROV activity.)

The challenge of embedding resources which have alternative representations (e.g. image/svg+xml and image/png) was mentioned earlier in another forum. This might be a better way to handle any representation dualities - should oa:Content describe such relations?

dc:format is a fairly weak property. It has been commonly used with IANA media types as in your example, but it is very poorly defined. Other valid dc:format strings are "book", "VHS" and "poster".
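To make the weakness concrete: a consumer that receives a dc:format literal has to guess whether it is a media type at all. The following is only a minimal sketch of such a guess in Python; the function name looks_like_media_type is hypothetical, and the pattern only loosely follows the "restricted-name" grammar of RFC 6838 for a bare type/subtype pair.

```python
import re

# Assumption: loosely follows the RFC 6838 "restricted-name" grammar,
# i.e. an alphanumeric first character followed by up to 126 more
# characters from a small set. Parameters are deliberately not handled.
_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126}"
_MEDIA_TYPE = re.compile(rf"{_NAME}/{_NAME}")


def looks_like_media_type(value: str) -> bool:
    """Return True if value has the bare type/subtype shape."""
    return _MEDIA_TYPE.fullmatch(value) is not None


if __name__ == "__main__":
    for literal in ["text/plain", "image/svg+xml", "book", "VHS", "poster"]:
        print(literal, looks_like_media_type(literal))
```

A shape check like this says nothing about whether the type is actually registered, which is part of why a plain literal is hard to rely on.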
It is also unclear if parameters can be included, e.g. "application/ld+json; profile=http://example.com/p1".

If a type is known and identifiable with a URI, but not officially registered with IANA, it might be odd for a third party to mint an x- type. An example of such a type is the Systems Biology Markup Language (SBML), whose identifier includes the version, compliance level, etc.:

    http://identifiers.org/combine.specifications/sbml.level-3.version-1.core.release-1

With dc:format pointing to a literal, we would have to resort to an xsd:anyURI literal, which does not make it Linked Data. (On the other hand, dc:format is so poorly defined that you could also use it as an object property pointing to a resource!)

I have previously used dct:format instead. You lose some niceness, as it gets a bit more verbose if you want it to be complete. From https://gist.github.com/stain/4635250:

    <http://example.com/page.html> dcterms:format <http://purl.org/NET/mediatypes/text/html> .

    <http://purl.org/NET/mediatypes/text/html> a dcterms:FileFormat ;
        dcam:memberOf dcterms:IMT ;
        rdf:value "text/html" ;
        rdfs:isDefinedBy <http://mediatypes.appspot.com/dump.rdf> ;
        rdfs:label "HTML document" .

(note - rdf:value there again)

Secondly, the URIs for IANA media types are not quite in this century yet - see my suggestion to IANA:

    http://www.ietf.org/mail-archive/web/media-types/current/msg00617.html

But overall I would much prefer the use of dct:format (or a sub-property oa:format which we say has range dcterms:FileFormat), so that the format is a resource that:

a) I can extend with additional properties
b) Can be a non-IANA type, e.g. http://identifiers.org/combine.specifications/sbml.level-3.version-1.core.release-1
c) Can have a classical IANA media type main/sub value with rdf:value (TODO: parameters allowed?)
d) Can have a human-readable rdfs:label - e.g. "Microsoft Word document"
e) Has a common URI pattern for any registered IANA type - e.g. http://www.iana.org/assignments/media-types/application/pdf (if they agree) or http://purl.org/NET/mediatypes/text/plain

On the other hand, I think dct:language (with range dct:LinguisticSystem) would be too verbose, so I would keep dc:language as long as we also recommend RFC 4646 for identifying the language.

On 12 Oct 2014 21:43, "Robert Sanderson" <azaroth42@gmail.com> wrote:
>
> One of the most significant changes that we need to make is what to do about the use of the seemingly abandoned Content in RDF specification.
>
> The issue:
> * https://github.com/w3c/web-annotation/issues/3
> * http://www.w3.org/annotation/track/issues/1
>
> The proposal in the github issue is to create two new classes for embedded plain text and embedded base64 encoded text, corresponding to cnt:ContentAsText and cnt:ContentAsBase64 respectively.
>
> These classes would use the properties:
> * rdf:value -- for recording the content (required)
> * dc:format -- for the media type of the content (optional)
> * dc:language -- for the language of the content (optional)
>
> In JSON-LD this might look like:
>
> {
>   "@type": "oa:Content",
>   "value": "I love this book!",
>   "format": "text/plain",
>   "language": "en"
> }
>
> Comments?
>
> Thanks!
>
> Rob
>
> --
> Rob Sanderson
> Technology Collaboration Facilitator
> Digital Library Systems and Services
> Stanford, CA 94305
Received on Tuesday, 14 October 2014 15:38:55 UTC