Re: [Dbpedia-discussion] intended semantics of fourth column values in DBpedia N-quads (ACTION-144)

From: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Date: Wed, 22 Feb 2012 14:55:18 -0500
Cc: dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>, RDF WG <public-rdf-wg@w3.org>
Message-Id: <7B0FA4CF-318B-46D8-9B1E-B7B809F1CCE3@openlinksw.com>
To: Jimmy O'Regan <joregan@gmail.com>

On Feb 8, 2012, at 03:28 PM, Jimmy O'Regan wrote:
> On 8 February 2012 19:23, Ted Thibodeau Jr <tthibodeau@openlinksw.com> wrote:
>>   <http://dbpedia.org/resource/Academy_Award_for_Best_Art_Direction>
>>   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>   <http://www.w3.org/2002/07/owl#Thing>
>>   <http://en.wikipedia.org/wiki/Academy_Award_for_Best_Art_Direction#absolute-line=1>
>>   .
>> 
>> 
>> This appears to suggest that the { ?s ?p ?o } triplet was extracted
>> from the resource at the URI in the ?c position -- but the fragment
>> identifier breaks that suggestion, as the above triple simply
>> doesn't come from line 1 of either the Wikipedia markup source --
>> 
>>   {{Infobox award
> 
> That's the line it was generated from. The mapping for this template
> (http://mappings.dbpedia.org/index.php/Mapping:Infobox_award) has a
> 'map to class' for
> http://mappings.dbpedia.org/index.php/OntologyClass:Award, which in
> turn has rdfs:subClassOf owl:Thing



Ah!  So "absolute-line=1" is meant to refer to the line *of the 
DBpedia mapping* which caused the triple to be generated?  

It may surprise you to hear that none of us in the RDF-WG were able 
to figure that out from context.  

But... Line 1 of that mapping is just --

   {{TemplateMapping

Line *2* holds --

   | mapToClass = Award

-- so it seems that may need at least a little adjustment.


It further seems to me that there are at least three factors which
provide context for any triple produced by the DBpedia extractors,
all of which should somehow be made available through the fourth
position of the N-quads dump --

1. URI of source document
2. URI of mapping rules
3. timestamp at which the mapping rules were applied to the source
   document, resulting in generation of the triple (or rather,
   its enclosing graph)

(The timestamp in #3 might be sufficient to nail down the revisions
of #1 and #2, or it might not... If not, then at least two more 
factors must be made available through the fourth position.)

Melding all these factors into a single string or URI would be ugly 
at best, so perhaps there should be an "extraction ontology" which 
is used to describe the RDF Graphs produced by the extractors. 

I would suggest that the fourth column of the N-quads dumps should 
hold a DBpedia URI, perhaps something like --

   <http://dbpedia.org/graph/Academy_Award_for_Best_Art_Direction/
    20120222Z125218.123456#this>

This URI identifies the RDF Graph (a/k/a "G-snap") produced by the
mapping against the source document, by a combination of the source 
document's wikiword and the timestamp of the graph's production.  The 
RDF Graph can then itself be described with sourceURI, mappingURI, 
timeStamp, etc. -- whatever other metadata may make sense.
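
As a rough illustration only, here is a minimal Python sketch of how such
a graph URI might be minted and then described.  The timestamp layout is
inferred from the single example URI above; the "extraction ontology"
namespace (http://example.org/extraction#) and the helper names are
hypothetical, since no such ontology has been defined yet.  Only the
property names sourceURI, mappingURI, and timeStamp come from the
proposal itself.

```python
from datetime import datetime, timezone

# Base taken from the example graph URI in the proposal.
GRAPH_BASE = "http://dbpedia.org/graph/"
# Hypothetical namespace: no extraction ontology exists yet.
EX = "http://example.org/extraction#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def graph_uri(wikiword, produced):
    """Build a graph URI from the wikiword and the production time (UTC)."""
    # Layout assumed from the example: YYYYMMDD, "Z", HHMMSS, ".", microseconds.
    stamp = produced.astimezone(timezone.utc).strftime("%Y%m%dZ%H%M%S.%f")
    return f"{GRAPH_BASE}{wikiword}/{stamp}#this"


def describe_graph(wikiword, produced, source_uri, mapping_uri):
    """Return N-Triples lines describing the graph with the three factors."""
    g = graph_uri(wikiword, produced)
    when = produced.astimezone(timezone.utc).isoformat()
    return [
        f"<{g}> <{EX}sourceURI> <{source_uri}> .",
        f"<{g}> <{EX}mappingURI> <{mapping_uri}> .",
        f'<{g}> <{EX}timeStamp> "{when}"^^<{XSD_DATETIME}> .',
    ]


produced = datetime(2012, 2, 22, 12, 52, 18, 123456, tzinfo=timezone.utc)
lines = describe_graph(
    "Academy_Award_for_Best_Art_Direction",
    produced,
    "http://en.wikipedia.org/wiki/Academy_Award_for_Best_Art_Direction",
    "http://mappings.dbpedia.org/index.php/Mapping:Infobox_award",
)
print("\n".join(lines))
```

If the timestamp alone turns out not to pin down the revisions of the
source document and the mapping, further properties could be added to
the same description without changing the graph URI.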

Users could then 

- get one or more complete RDF Graphs, as produced on chosen date(s),
  associated with a given wikiword -- whether current or historic;

- compare these RDF Graphs over time;

- compare the results of different mappings against the same source
  document (wikiword), as each extraction should produce a differently 
  timestamped RDF Graph
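
The comparisons listed above could be as simple as a set difference
once quads are grouped by their fourth column.  A minimal Python
sketch, assuming the quads have already been parsed into
(subject, predicate, object, graph) tuples; the two graph URIs and the
sample quads are made up for illustration, and the helper names are
hypothetical.

```python
from collections import defaultdict


def group_by_graph(quads):
    """Collect the triples of each named graph from (s, p, o, g) tuples."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return dict(graphs)


def diff_graphs(old, new):
    """Return the triples added and removed between two graph snapshots."""
    return {"added": new - old, "removed": old - new}


S = "http://dbpedia.org/resource/Academy_Award_for_Best_Art_Direction"
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://dbpedia.org/graph/Academy_Award_for_Best_Art_Direction/"
G_OLD = BASE + "20120208Z000000.000000#this"  # illustrative earlier extraction
G_NEW = BASE + "20120222Z125218.123456#this"  # illustrative later extraction

# Sample data only: not taken from an actual DBpedia dump.
QUADS = [
    (S, TYPE, "http://www.w3.org/2002/07/owl#Thing", G_OLD),
    (S, TYPE, "http://www.w3.org/2002/07/owl#Thing", G_NEW),
    (S, TYPE, "http://dbpedia.org/ontology/Award", G_NEW),
]

graphs = group_by_graph(QUADS)
changes = diff_graphs(graphs[G_OLD], graphs[G_NEW])
print(changes)
```

The same diff works whether the two graphs differ because the source
document changed or because a different mapping was applied, since
either case yields a separately timestamped graph.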


What do you think?

Ted



--
A: Yes.                      http://www.guckes.net/faq/attribution.html
| Q: Are you sure?
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.           //               voice +1-781-273-0900 x32
Evangelism & Support         //        mailto:tthibodeau@openlinksw.com
                             //              http://twitter.com/TallTed
OpenLink Software, Inc.      //              http://www.openlinksw.com/
         10 Burlington Mall Road, Suite 265, Burlington MA 01803
     Weblog   -- http://www.openlinksw.com/blogs/
     LinkedIn -- http://www.linkedin.com/company/openlink-software/
     Twitter  -- http://twitter.com/OpenLink
     Google+  -- http://plus.google.com/100570109519069333827/
     Facebook -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers







Received on Wednesday, 22 February 2012 19:55:44 UTC
