- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Tue, 14 Oct 2014 09:08:54 -0700
- To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Cc: Annotation WG <public-annotation@w3.org>
- Message-ID: <CABevsUFNkYZ8QEmeaT8NeLBZPL9LmsjfrsVMajA5L5nXyN6gFQ@mail.gmail.com>
Hi Stian,

Thanks for the detailed response!

On Tue, Oct 14, 2014 at 8:38 AM, Stian Soiland-Reyes
<soiland-reyes@cs.manchester.ac.uk> wrote:

> At first, I like the simplicity of your approach.
>
> Ideally I would have hoped for Content-in-RDF to be adapted by this WG
> so that we could just put it out (~ as-is) and make it official for
> anyone else - but I guess that is not allowed by our charter.

I'm not sure that we'd be allowed to co-opt their namespace regardless
of whether they're doing anything with it.

> One feature of the Content-in-RDF model is that it allows both string
> and binary representations concurrently, indicating character set for
> interpreting the bytes.

My concerns with both representations at once are:

1. What is a system supposed to do when they're different?
2. When would a system ever use the Base64 version when they have the
   decoded characters already?

If the answer to these is that bytes and chars can be different
representations, then I think that's a bug rather than a feature.

And my concerns with characterEncoding:

3. The serialization should have a character encoding, not a single
   literal within the graph.
4. The encoding should be UTF-8 regardless.
5. I'm not sure that there would be many systems that actually did
   anything with the value, if it was supplied.

> This is very useful for embedding content from
> non-web resources (e.g. files on a USB stick or stdout from a command
> line tool), as one cannot always be sure about the "stringiness" of
> the value and in particular of the character set of the bytes:

I have some sympathy, but I find it hard to construct a convincing use
case in the context of annotation.

> Your approach is using rdf:value, where only one representation is
> possible.
> A duality of representations might not be needed as much in
> annotation systems, and could alternatively be asserted using
> prov:alternateOf (and prov:wasDerivedFrom) statements to a secondary
> oa:Content - I must admit this makes the direction of provenance
> clearer:

+1 to splitting into two resources as below.

> :value1 a oa:Content ;
>     rdf:value "Hello world" ;
>     prov:alternateOf :value1Bytes ;
>     prov:wasDerivedFrom :value1Bytes .
>
> :value1Bytes a oa:Content ;
>     rdf:value "SGVsbG8gd29ybGQ="^^xsd:base64Binary ;
>     prov:atLocation <file:///tmp/annotation.txt> .
>
> The challenge of embedding resources which have alternative
> representations (e.g. image/svg+xml and image/png) was mentioned
> earlier in another forum. This might be a better way to handle any
> representation dualities - should oa:Content describe such relations?

We have oa:Choice for handling that. It would be, currently, though see
https://github.com/w3c/web-annotation/issues/2:

{
  "@type": "oa:Choice",
  "default" : {
    "@type": "oa:Content",
    "format" : "image/svg",
    "value": "<svg:svg ...>"
  },
  "item" : {
    "@type": "oa:ContentAsBase64",
    "format" : "image/png",
    "value" : "91843709tuhasdfglkjdhfg..."
  }
}

> dc:format is a fairly weak property. It has been commonly used with
> IANA media types as in your example, but it is very poorly defined.
> Other valid dc:format strings are "book", "VHS" and "poster".

Agreed. The question to me is whether it's going to conflict with other
systems making "VHS" assertions? Alternatively, is there a better
property that already exists, or would we be minting our own?

> It is also unclear if parameters can be included, e.g.
> "application/ld+json; profile=http://example.com/p1".

This is more problematic. I would like this to be possible. Given the
looseness of dc:format, I think it's okay?

> If a type is known and identifiable with a URI, but not officially
> registered with IANA, it might be odd for a third party to mint a
> x-type.
> An example of such a type is the Systems Biology Markup Language
> (SBML), whose identifier includes the version, compliance level, etc.:
>
> http://identifiers.org/combine.specifications/sbml.level-3.version-1.core.release-1

What would a client system do with this information? And when would
such a thing be embedded in an annotation?

> I have previously used dct:format instead. You lose some niceness as
> it gets a bit more verbose if you want it to be complete:
>
> <http://example.com/page.html> dcterms:format
>     <http://purl.org/NET/mediatypes/text/html> .
>
> <http://purl.org/NET/mediatypes/text/html> a dcterms:FileFormat ;
>     dcam:memberOf dcterms:IMT ;
>     rdf:value "text/html" ;
>     rdfs:isDefinedBy <http://mediatypes.appspot.com/dump.rdf> ;
>     rdfs:label "HTML document" .

In the minimal case where we want to only record media type...

{
  "dct:format": {
    "value" : "text/html"
  }
}

Yes? And then this pattern allows other systems to include URIs as it's
a resource.

> But overall I would much prefer the use of dct:format (or a
> sub-property oa:format which we say has range dcterms:FileFormat) to
> be able to have a resource that:
>
> a) I can extend with additional properties

+0 ... unless there's a use case (other than label)?

> b) Can be a non-IANA type, e.g.
>
> http://identifiers.org/combine.specifications/sbml.level-3.version-1.core.release-1

+0 ... not sure what a client would do with this, but I see the
attraction.

> c) Can have a classical IANA media type main/sub value with rdf:value
> (TODO: parameters allowed?)

+1, and +1 to parameters being allowed ... or in a separate property?

> d) Can have a human-readable rdfs:label - e.g. "Microsoft Word document"

+1 to this property, this is almost convincing by itself.

> e) A common URI pattern for any registered IANA type - e.g.
> http://www.iana.org/assignments/media-types/application/pdf (If they
> agree) or http://purl.org/NET/mediatypes/text/plain

-1 out of scope for us to fix this if IANA don't care?
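
As a rough illustration of points (c) and (d) above, this Python sketch
splits a media type string into its main/sub value and its parameters,
then builds a format resource shaped like the minimal dct:format example.
The function names split_media_type and format_resource, and the
"parameters" and "label" keys, are hypothetical; neither the thread nor
Dublin Core defines them, and the parser is deliberately simplistic.

```python
def split_media_type(text):
    """Split 'type/subtype; key=value; ...' into (media_type, parameters)."""
    parts = [part.strip() for part in text.split(";")]
    media_type = parts[0].lower()
    parameters = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing ';'
        key, _, value = part.partition("=")
        # Strip optional surrounding double quotes from the value.
        parameters[key.strip().lower()] = value.strip().strip('"')
    return media_type, parameters


def format_resource(text, label=None):
    """Build a dct:format resource; key names are assumptions for illustration."""
    media_type, parameters = split_media_type(text)
    node = {"value": media_type}
    if parameters:
        # Hypothetical "separate property" option for parameters.
        node["parameters"] = parameters
    if label is not None:
        node["label"] = label
    return {"dct:format": node}


if __name__ == "__main__":
    print(format_resource("application/ld+json; profile=http://example.com/p1"))
    print(format_resource("text/html", label="HTML document"))
```

Keeping the bare main/sub string in "value" means a client that only
understands plain media types can ignore the extra keys.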

> On the other side I think dct:language (with range
> dct:LinguisticSystem) would be too verbose, so I would keep
> dc:language as long as we also recommend RFC 4646 for identifying the
> language.

Yes, though 5646 [1] obsoletes 4646, so with that slight tweak, I agree.
However, to play devil's advocate, there is some use of lexvo for linked
data languages, similarly to include labels and so forth. This would not
be too ugly given a sensible JSON-LD context that hides the
complexity...

{"dct:language" : "lang:en"}

Thanks Stian!

Rob

[1] http://tools.ietf.org/html/rfc5646

> On 12 Oct 2014 21:43, "Robert Sanderson" <azaroth42@gmail.com> wrote:
> > One of the most significant changes that we need to make is what to
> > do about the use of the seemingly abandoned Content in RDF
> > specification.
> >
> > The issue:
> > * https://github.com/w3c/web-annotation/issues/3
> > * http://www.w3.org/annotation/track/issues/1
> >
> > The proposal in the github issue is to create two new classes for
> > embedded plain text and embedded base64 encoded text, corresponding
> > to cnt:ContentAsText and cnt:ContentAsBase64 respectively.
> >
> > These classes would use the properties:
> > * rdf:value -- for recording the content (required)
> > * dc:format -- for the media type of the content (optional)
> > * dc:language -- for the language of the content (optional)
> >
> > In JSON-LD this might look like:
> >
> > {
> >   "@type": "oa:Content",
> >   "value": "I love this book!",
> >   "format": "text/plain",
> >   "language": "en"
> > }
> >
> > Comments?
> >
> > Thanks!
> >
> > Rob
> >
> >
> > --
> > Rob Sanderson
> > Technology Collaboration Facilitator
> > Digital Library Systems and Services
> > Stanford, CA 94305
>

--
Rob Sanderson
Technology Collaboration Facilitator
Digital Library Systems and Services
Stanford, CA 94305
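
For reference, the proposal quoted above (a required value plus optional
format and language) and the question of what happens when a text and a
Base64 representation disagree can be sketched in Python as follows. The
helper names text_content, base64_content and representations_agree are
hypothetical, and the default of UTF-8 follows the preference stated in
this message rather than any agreed rule.

```python
import base64


def text_content(value, fmt=None, language=None):
    """Embedded text: 'value' is required, 'format' and 'language' optional."""
    if not isinstance(value, str):
        raise ValueError("value is required and must be a string")
    node = {"@type": "oa:Content", "value": value}
    if fmt is not None:
        node["format"] = fmt
    if language is not None:
        node["language"] = language
    return node


def base64_content(data, fmt=None):
    """Embedded bytes, stored as a Base64 string."""
    node = {
        "@type": "oa:ContentAsBase64",
        "value": base64.b64encode(data).decode("ascii"),
    }
    if fmt is not None:
        node["format"] = fmt
    return node


def representations_agree(text_node, bytes_node, encoding="utf-8"):
    """True if decoding the Base64 bytes yields exactly the text value."""
    raw = base64.b64decode(bytes_node["value"])
    try:
        return raw.decode(encoding) == text_node["value"]
    except UnicodeDecodeError:
        # Bytes that are not valid in the assumed encoding cannot match.
        return False


if __name__ == "__main__":
    text = text_content("Hello world", "text/plain", "en")
    raw = base64_content(b"Hello world", "text/plain")
    print(text, raw, representations_agree(text, raw))
```

A check like representations_agree is only needed if both forms are
stored side by side; with a single rdf:value per resource the question
does not arise within one resource.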
Received on Tuesday, 14 October 2014 16:09:23 UTC