Re: [Moderator Action] Case on Europeana data and named graphs

Hi Dave,

The list is very short indeed, so I'm not 100% sure. As I wrote below, "endorse" is probably the closest thing, from a business perspective (output required from Europeana).
But from an infrastructure perspective, "Shared Web Crawler" has some common points with what we do: our input is not RDF, but we do harvest metadata files from many providers. And it could be RDF, one day.

Best,

Antoine


> Hi all,
>
> Antoine, would you mind telling us how much of your case description would be covered by our short list of use cases [1]? What are the gaps? Thank you.
>
> Regards,
> Dave
>
> [1] http://www.w3.org/2011/rdf-wg/wiki/Why_Graphs
>
>
> On Apr 10, 2012, at 05:19, Ivan Herman wrote:
>
>> Antoine has no write permission to our working group mailing list, so forwarding this
>>
>> Ivan
>>
>> Begin forwarded message:
>>
>>> From: Antoine Isaac <aisaac@few.vu.nl>
>>> Subject: [Moderator Action] Case on Europeana data and named graphs
>>> Date: April 9, 2012 15:43:25 GMT+02:00
>>> To: <public-rdf-wg@w3.org>
>>>
>>> Dear all,
>>>
>>> A while ago, Guus asked me to send a case description to the group, illustrating some requirements from the Europeana project, and the way we could tackle them with graphs.
>>> The technical side of our network is very interested in the topic of expressing data containment. Graphs could be a good way to do so, but until the new RDF group started to work on this, the situation was unclear.
>>> I hope we can at least contribute a usage example you could refer to, even though I have some doubts we can exemplify all the subtleties your Group has been discussing lately!
>>>
>>> Best,
>>>
>>> Antoine
>>> ---
>>> Antoine Isaac
>>> Scientific Coordinator, Europeana
>>> http://www.few.vu.nl/~aisaac/
>>>
>>>
>>> ==== Background: requirements, EDM, ORE proxies
>>>
>>> The Europeana Data Model [1] is Europeana's new framework for harvesting and exposing metadata on objects (books, paintings, films, maps, etc.) gathered from hundreds of cultural institutions at europeana.eu.
>>>
>>> An important requirement for EDM is the ability to distinguish between data provided by different actors for the same "real-world" object (a book, a painting). In a traditional information science context, one would speak of situations where several "records" [2] are provided for one object.
>>> This happens quite often in the library domain, for books. Here, the data providers would like data consumers to be able to see which "authority" created which part of the data.
>>> Another case is when someone creates a "semantic enrichment" of an object using automatic tools, which can sometimes be wrong. Many services would thus like to be able to distinguish the authoritative data supplied by data providers from the enrichments.
>>>
>>> EDM currently envisions tackling this using "proxies" from OAI-ORE [3]. When designing EDM, we were already using OAI-ORE for other purposes, and its proxies are aimed at representing a specific "view" over an object, so we went for it.
>>> However, ORE proxies have several disadvantages, among them making the data more complex and making semantic inference much more difficult to implement. We are thus very interested in RDF graphs.
>>>
>>>
>>> ==== Current EDM approach, using OAI-ORE proxies
>>>
>>>
>>> Here is an EDM example with a subset of the data Europeana has for a map at the British Library (all the data we have for it can be retrieved linked-data style at data.europeana.eu, our linked data prototype).
>>>
>>>
>>> @prefix edm: <http://www.europeana.eu/schemas/edm/> .
>>> @prefix dc: <http://purl.org/dc/elements/1.1/> .
>>> @prefix dcterms: <http://purl.org/dc/terms/> .
>>> @prefix ore: <http://www.openarchives.org/ore/terms/> .
>>>
>>> <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> a edm:ProvidedCHO .
>>> # the object (a map of London)
>>>
>>> <http://data.europeana.eu/proxy/provider/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> a ore:Proxy ;
>>> # the British Library proxy for the map, holding data created by the British Library
>>> ore:proxyFor <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> ;
>>> dc:title "the Cittie of London 31" ;
>>> dc:subject "London (England) -– Maps" ;
>>> dcterms:created "1633" ;
>>> dcterms:spatial "London, City of London" .
>>>
>>> <http://data.europeana.eu/proxy/europeana/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> a ore:Proxy ;
>>> # the Europeana proxy for the map, holding data created by automatic enrichment by Europeana
>>> ore:proxyFor <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> ;
>>> edm:hasMet <http://www.eionet.europa.eu/gemet/concept/5011> ;
>>> edm:hasMet <http://sws.geonames.org/2643741/> .
>>> # edm:hasMet is a property we use for enrichment, because we lose track of which is the original property (dc:subject, dcterms:spatial) from which we extracted the semantic links :-/
>>>
>>>
>>> ==== Test with Graphs
>>>
>>> In the following I'm using the TriG syntax from the draft dated today at [4].
>>>
>>> @prefix edm: <http://www.europeana.eu/schemas/edm/> .
>>> @prefix dc: <http://purl.org/dc/elements/1.1/> .
>>> @prefix dcterms: <http://purl.org/dc/terms/> .
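>>> @prefix ore: <http://www.openarchives.org/ore/terms/> .
>>> @prefix : <http://example.org/graphs/> .
>>> # placeholder namespace (an assumption) so that the graph names :G1 and :G2 below resolve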
>>>
>>> <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> a edm:ProvidedCHO .
>>>
>>> :G1 {
>>> <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> dc:title "the Cittie of London 31" ;
>>> dc:subject "London (England) -– Maps" ;
>>> dcterms:created "1633" ;
>>> dcterms:spatial "London, City of London" . }
>>>
>>> :G2 {
>>> <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03> edm:hasMet <http://www.eionet.europa.eu/gemet/concept/5011> ;
>>> edm:hasMet <http://sws.geonames.org/2643741/> . }
>>>
>>> Of course one could then start expressing some meta-level data on who created the graphs, such as
>>>
>>> :G1 dc:creator "British Library" .
>>> :G2 dc:creator "Europeana" .
>>>
>>> But I won't go further; I suppose this is not the core interest of this group.
>>> By the way I noted in the draft at [4] the following issue:
>>>> Examples should not refer to TriX vocabularies.
>>>
>>> While I understand the case for properties like assertedBy or quotedBy in the abstract, I can only support the issue: your spec should not go as far as recommending specific properties for provenance/quotation.
>>>
>>>
>>> ==== Discussion / request for feedback?
>>>
>>>
>>> Named graphs have the positive aspect of allowing RDF data consumers to ignore them, if they so wish. With the second solution, a data consumer can access all the Europeana data for the map, without bothering about integrating the data scattered across different proxies. I trust that a query for the pattern:
>>>
>>> ?X dcterms:created "1633" ; edm:hasMet <http://sws.geonames.org/2643741/> .
>>>
>>> would return me <http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03>, no?
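
Whether that pattern matches depends on how the graphs are combined at query time. The following minimal Python sketch (an illustration only, with the quads held as plain tuples and a hypothetical helper `subjects_matching`) shows the pattern matching when evaluated over the union of :G1 and :G2, and failing when evaluated against :G1 alone. In SPARQL, which graphs make up the default graph is decided by the query or by the store, so union behaviour is an assumption here, not a guarantee.

```python
ITEM = "http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03"

# (graph, subject, predicate, object) tuples for a subset of :G1 and :G2.
QUADS = [
    ("G1", ITEM, "dc:title", "the Cittie of London 31"),
    ("G1", ITEM, "dcterms:created", "1633"),
    ("G2", ITEM, "edm:hasMet", "http://sws.geonames.org/2643741/"),
]

# The two-triple pattern from the query, with ?X as the shared subject.
PATTERN = [
    ("dcterms:created", "1633"),
    ("edm:hasMet", "http://sws.geonames.org/2643741/"),
]


def subjects_matching(quads, pattern, graph=None):
    """Return the subjects that carry every (predicate, object) pair in `pattern`.

    graph=None evaluates over the union of all graphs (graph names ignored);
    otherwise only the statements of the named graph are considered.
    """
    triples = {(s, p, o) for (g, s, p, o) in quads if graph is None or g == graph}
    subjects = {s for (s, _, _) in triples}
    return {s for s in subjects if all((s, p, o) in triples for (p, o) in pattern)}


union_hits = subjects_matching(QUADS, PATTERN)            # graphs ignored
g1_hits = subjects_matching(QUADS, PATTERN, graph="G1")   # provider data only
```

Under these assumptions `union_hits` contains the map's URI, while `g1_hits` is empty because the `edm:hasMet` statement lives in :G2.
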
>>>
>>> One problem I have, however, is the potential complexity of graphs. When Guus asked me to do this exercise, he pointed me to the "solution-design" wiki page [5].
>>>
>>> But I'm not sure I understand all the distinctions there, and what my case would require. There could be some terminological issues: I find the terminology in an expression like "the state of Web resources depends on time" hard to understand (is it about "asserted statements in data about a web resource that change over time"?). But I guess I have not been following closely enough the discussion on graphs in the past couple of years.
>>>
>>> Anyway, as I get it, [5] is partly about the need for having a graph-as-changing-data-source, where each snapshot of such a "diachronic graph" could contain different statements over time.
>>> I understand this need, but my case is more about the mere ability to reify a bunch of statements using graphs, in a more "elegant" (and in fact, efficient) way than the current RDF statement reification.
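
The efficiency point can be made concrete by counting statements. The sketch below is a rough Python illustration (the helper `reify` and the tuple representation are assumptions for the example): it attributes the four :G1 statements to the British Library once through standard RDF reification (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and once through a named graph.

```python
ITEM = "http://data.europeana.eu/item/92037/25F9104787668C4B5148BE8E5AB8DBEF5BE5FE03"

# The four statements held in :G1, as (subject, predicate, object) tuples.
G1_STATEMENTS = [
    (ITEM, "dc:title", "the Cittie of London 31"),
    (ITEM, "dc:subject", "London (England) -– Maps"),
    (ITEM, "dcterms:created", "1633"),
    (ITEM, "dcterms:spatial", "London, City of London"),
]


def reify(statement, index):
    """Return the four reification triples describing one statement."""
    s, p, o = statement
    node = f"_:stmt{index}"  # hypothetical blank node label
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]


# Reification: four triples per statement, plus one provenance triple each.
reified = []
for i, statement in enumerate(G1_STATEMENTS):
    reified.extend(reify(statement, i))
    reified.append((f"_:stmt{i}", "dc:creator", "British Library"))

# Named graph: the statements themselves as quads, plus one triple about the graph.
named_graph = [("G1",) + statement for statement in G1_STATEMENTS]
graph_metadata = [("G1", "dc:creator", "British Library")]
```

With these four statements the reified form needs 20 triples, and the original statements would still have to be asserted separately, whereas the named graph form needs 4 quads and 1 metadata triple.
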
>>>
>>> Without a doubt, my :G1, for example, contains data that is contributed by the British Library at a certain point in time. But as far as I can see from our requirements now, I'd be happy not to make the distinction between "snapshots" and "diachronic graphs".
>>>
>>> Our case is perhaps similar to "endorsement" in the mail from Sandro at [6]. But then I don't find it easy to represent it as a "TriG/state" like in [5], with the rdf:StaticGraphContainer type. The "TriG/equals" solution from [6] could be easier, syntax-wise. Or am I wrong? Of course I've not thought the issue through thoroughly enough to get all the consequences of the owl:sameAs statement at [6] and other subtleties.
>>>
>>> Or maybe I'm too biased by my own case. I am in fact puzzled that RDF should embed some very fine-grained considerations on what data sources are on the web, which would influence the way to treat the more basic needs. It seems to me that this could be handled at a higher level than the RDF syntax itself, which seems confirmed by all these new classes and properties in the rdf: namespace that I am now discovering in [6].
>>> Maybe the more recent "6.1" proposal [7] would fit this position well, even though I'm a bit puzzled by having the equality relation result from a "rdf:type rdf:Graph" statement (with the open world assumption this type statement could be asserted after the first "<u1> { <a> <b> <c> }" statement has been asserted).
>>>
>>> But anyway, I'm not in the Group, so maybe it's time to shut up ;-)
>>>
>>>
>>> [1] http://pro.europeana.eu/edm-documentation
>>> [2] http://en.wikipedia.org/wiki/Bibliographic_record
>>> [3] http://www.openarchives.org/ore/datamodel
>>> [4] http://dvcs.w3.org/hg/rdf/raw-file/default/trig/index.html
>>> [5] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs-Designs, current version http://www.w3.org/2011/rdf-wg/wiki/index.php?title=TF-Graphs&oldid=1757
>>> [6] http://lists.w3.org/Archives/Public/public-rdf-wg/2012Jan/0021.html
>>> [7] http://www.w3.org/2011/rdf-wg/wiki/Graphs_Design_6.1, current version http://www.w3.org/2011/rdf-wg/wiki/index.php?title=Graphs_Design_6.1&oldid=1966
>>>
>>
>>
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>>
>>
>>
>>
>

Received on Tuesday, 10 April 2012 20:41:38 UTC