- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 24 Aug 2008 22:17:23 +0300
On Aug 23, 2008, at 18:16, Dan Brickley wrote:
> It may not be obvious to those who haven't followed the history, or
> who were at school at the time, but many of us did indeed invest a
> lot of time and effort using name/value metadata structures in HTML.
> For example, the Dublin Core project began with this technology base
> beginning back in 1994/5, and the experience of metadata
> implementors using it was one of the drivers for the creation of
> RDF. At the time there was no WHATWG to talk to, but the metadata
> community *did* talk to W3C.

I don't doubt that there's metadata that doesn't fit into name-value pairs nicely. However, the title of the work, the license for the work as a whole, attribution wish (a natural-language string with potentially multiple names, commas and "and"s) and a single attribution URL all fit into name-value pairs, so for CC licensing, a graph seems like overkill.

Of course, there's the issue of conveying that data for each subwork of a larger work that remixes many works. But can we expect John Q. Public to convey that data so that there's something to be DRY with in a case where the subworks aren't independent files that could carry their own metadata? That is, if the larger work remixes multiple photos in a single Theora video stream or into one large JPEG file, can we really expect tools (or John Q. Public manually) to be able to address into the larger work in such a way that any syntax other than natural language identifies which subwork had which license and attribution requirement?

> Does the very loosely defined Dublin Core really qualify as a
> "standard" that can be read and processed programmatically?

Thanks for the pointers to history. I wasn't aware that the Dublin Core community had itself documented this fundamental problem with Dublin Core so early on.
I have run into this problem myself when, in a past project, I inherited a metadata spec that my predecessors had modeled after Dublin Core without having experience of developing software.

> DC.creator.phone.1
> +44 227 462062

In this particular instance, it seems to me that the main problem isn't that the metadata doesn't fit into key-value pairs but that the metadata that doesn't fit probably doesn't *really* need to be recorded as metadata. If you are creating a document search engine, does the user ever want to search documents by the authors' phone numbers? If the user searches by other criteria, does the phone number *really* need to be extractable for display in search results?

I realize it sounds offensive to suggest that someone doesn't need the metadata they say they need, but when I worked (briefly) on metadata for long-term preservation of digital files in the National Archives of Finland, it became apparent pretty quickly that at least some metadata specs aren't driven by considering what absolutely *must* be there to satisfy realistic use cases but by modeling what *could* be said about the domain and inventing fields for everything *just in case*.

> Looking at this example,
>
> <div id="license" about="#license" typeof="rdf:Property">
> <h4>cc:license</h4>
> A <a rel="rdfs:domain" href="#Work">Work</a> <span
> property="rdfs:label">has license</span> a <a rel="rdfs:range"
> href="#License">License</a>. <br />
>
> (a <a rel="rdfs:subPropertyOf" href="http://purl.org/dc/terms/license
> ">subproperty of dc:license</a>, <a rel="owl:sameAs" href="http://www.w3.org/1999/xhtml/vocab#license
> ">the same as xhtml:license</a>)
> </div>
>
>
> Actually we can do a fair bit more than simply have human readable
> strings. For example from the CC case, we've got a sub-property
> relationship between cc:license and dc:license. [...]
> So while it is useful to have human readable strings (including
> translations) we also get simple relationships between independently
> defined vocabulary terms.

And in www-archive:

On Aug 23, 2008, at 23:59, Ben Adida wrote:
> Henri Sivonen wrote:
>> Also, in this case, the prefix cc is actually more persistent than the
>> URI, since Creative Commons has changed the namespace URI of its RDF
>> vocabulary without changing the canonical prefix (from
>> http://web.resource.org/cc/ to http://creativecommons.org/ns#).
>
> Highly misleading statement, since we are also creating equivalences
> between the old and new namespace. That's the power of RDF.

How common is it that user-facing applications that use RDF metadata dereference namespace URIs, load declarations of equivalence or subclass relationships between properties and successfully map vocabularies created after the creation of the application to the vocabulary understood by the application? Are there known instances of applications that were programmed to process http://web.resource.org/cc/ metadata in an XML-wise correct way (i.e. not using regular expressions matching on "cc:") and that automatically processed http://creativecommons.org/ns# metadata right by autodiscovering the equivalence? (These are not rhetorical questions. I really don't know and am curious. My intuition suggests that this wouldn't be a common occurrence.)

Where some see "the power of RDF", others see "the RDF tax". There's a tradeoff between making the common case simple and making things powerful for the less common and more complex cases. The simple case is finding out what license a document is under.
Compared to looking up a string value by an unstructured opaque string key from within the file, it's very different to extract an RDF graph from a file, dereference all namespace URIs using a network connection (relying on hosts being reachable), load data describing equivalence and subclass relations--perhaps recursively--and simplify until the application sees a value connected to a property it is programmed to know about.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
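To make the contrast in the last paragraph concrete, here is a rough Python sketch of the two kinds of lookup. It is only an illustration under stated assumptions: the function names, the `#work` subject, the license value and the in-memory `equivalences` list are all made up for the example, and the equivalence data is hard-coded where a real RDF consumer would have to fetch it over the network by dereferencing the namespace URIs. It also handles only equivalence, not subclass or subproperty relations.

```python
# All data below is hypothetical and exists only for illustration.
OLD_NS = "http://web.resource.org/cc/"
NEW_NS = "http://creativecommons.org/ns#"
EXAMPLE_LICENSE = "http://creativecommons.org/licenses/by/3.0/"

# Case 1: flat name-value metadata, read by an opaque string key.
flat_metadata = {"license": EXAMPLE_LICENSE}


def license_from_pairs(metadata):
    """Look up the license by its string key; no network, no inference."""
    return metadata.get("license")


# Case 2: a graph of (subject, property, value) triples that uses the new
# namespace, read by an application that only knows the old property URI.
triples = [("#work", NEW_NS + "license", EXAMPLE_LICENSE)]

# Assumption: in practice these statements would be loaded remotely;
# here they are supplied by hand.
equivalences = [(NEW_NS + "license", OLD_NS + "license")]


def equivalent_properties(prop, equivalences):
    """Collect every property linked to `prop` by equivalence statements,
    following the links in both directions and transitively."""
    seen = {prop}
    pending = [prop]
    while pending:
        current = pending.pop()
        for a, b in equivalences:
            for src, dst in ((a, b), (b, a)):
                if src == current and dst not in seen:
                    seen.add(dst)
                    pending.append(dst)
    return seen


def license_from_graph(triples, equivalences, subject, known_property):
    """Return the value of any property equivalent to `known_property`."""
    accepted = equivalent_properties(known_property, equivalences)
    for s, p, o in triples:
        if s == subject and p in accepted:
            return o
    return None
```

With the equivalence statement available, `license_from_graph(triples, equivalences, "#work", OLD_NS + "license")` finds the license even though the data uses the new namespace. With an empty `equivalences` list (say, because the host was unreachable) the same call returns `None`, whereas `license_from_pairs(flat_metadata)` has no such dependency.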
Received on Sunday, 24 August 2008 12:17:23 UTC