Re: Newbie LOD Questions :) from Nathan on 2009-10-28 (public-lod@w3.org from October 2009)

From: Nathan <nathan@webr3.org>
Date: Wed, 28 Oct 2009 23:48:13 +0000
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: public-lod@w3.org
Message-ID: <4AE8D83D.9080906@webr3.org>
Kingsley Idehen wrote:
> Nathan wrote:
>> Hi All,
>>
>> Apologies if this is the wrong place to ask questions about Linked 
>> Data; I'm just not sure where else to turn at the minute! And apologies 
>> again, as it's quite a long list.
>>
>>
>> Worth noting the following link, which is relevant to most of the 
>> questions below:
>> http://sameas.org/text?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FLondon
>>
>>
>> 1] Let's say I'm writing an article about London, England; which one 
>> of the many URIs do I reference as the thing my data is "about"?
>>
>> 2] Would there be scope for a single globally unique identifier / URI 
>> to represent "London, England"? One which, rather than holding 
>> information about London (like http://dbpedia.org/resource/London), 
>> essentially held a set of sameAs links which everyone could use when 
>> publishing data "about" "London, England" (like the data at the 
>> sameas.org link above).
>>
>> 3] If sameAs indicates that two URI references contain information 
>> about the same thing, how do we assert that two URIs contain the same 
>> information about the same thing (i.e. identical data)?
> You don't want to assert that they have the same data. You are asserting 
> co-reference, i.e., that the URIs are about the same entity. Thus, you can 
> then perform union-style expansion from the co-referent URIs to get a 
> bigger picture of a given entity, e.g., London, from a variety of data sources.
> 
> Examples:
> 
> - About Me (compact) [1]
> - About Me (expanded via explicit co-reference of the kind delivered by 
> owl:sameAs) [2]
> - About Me (expanded via fuzzier co-reference, via a rule that asserts, 
> in this context, that foaf:name is an inverse functional property, i.e., 
> its values are indirect identifiers) [3]
> 
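A minimal sketch of the "union-style expansion" over explicit owl:sameAs co-reference, assuming the data is already loaded as (subject, predicate, object) tuples. Apart from the DBpedia London URI, the URIs and the example.org/ns#country predicate are hypothetical; a real RDF store would normally do this via owl:sameAs inference or a query rather than a hand-rolled union-find. The "compact" view corresponds to sameAs expansion off, the "expanded" view to it being on.

```python
from collections import defaultdict

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX_COUNTRY = "http://example.org/ns#country"  # hypothetical predicate


def coreference_groups(triples):
    """Group URIs linked directly or transitively by owl:sameAs (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == OWL_SAME_AS:  # sameAs is symmetric, so direction does not matter
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                parent[root_o] = root_s
    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


def expand(uri, triples):
    """Union of (predicate, object) pairs stated about uri or any co-referent URI."""
    aliases = next((g for g in coreference_groups(triples) if uri in g), {uri})
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


# Hypothetical sample data: three URIs for London, chained by owl:sameAs.
A = "http://dbpedia.org/resource/London"
B = "http://example.net/places#London"
C = "http://example.com/geo/London"
triples = [
    (A, OWL_SAME_AS, B),
    (C, OWL_SAME_AS, B),
    (A, RDFS_LABEL, "London"),
    (C, EX_COUNTRY, "England"),
]

compact = {(p, o) for s, p, o in triples if s == A and p != OWL_SAME_AS}
expanded = expand(A, triples)  # also picks up C's data via A = B = C
```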
>>
>> 3a] As with [3]: mirrors are common on the net (us1.domain, us2.domain, 
>> etc.), each one containing the same information. As above, how would one 
>> indicate that the data is the same, considering that...
>> - the data is identical, so there is no way to inject an owl:sameAs into the RDF
> RDFizer middleware can use custom (context-specific) rules to make its 
> own assertions as part of the RDFization processing pipeline. For 
> instance, SPARQL is an effective rules language (head and body just 
> happen to be on a vertical as opposed to a horizontal visual plane), so it 
> is possible for such engines to perform constrained forward-chaining 
> (with the generated triples written to a specific graph that is used in a 
> specific context).
>> - one 3rd party may reference the URI 
>> http://us1.domain.com/something.rdf whilst another 3rd party 
>> references http://us2.domain.com/something.rdf;
>> both are the same data, but no correlation between the two exists 
>> anywhere to say they are the same thing.
> See comment above.
>> - it stands to reason that the ideal is a single endpoint with mirrors 
>> behind the scenes, without any HTTP 30x redirects ever being returned 
>> to the client; however, this won't always be the case, so what syntax 
>> can we use in this scenario?
> A single one cannot exist.
> 
> Context is all that exists.
> 
> Within a given context (always inherently subjective) certain assertions 
> can be made about co-reference be it explicit (owl:sameAs) or fuzzy 
> (e.g. IFP based rules).
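A minimal sketch of the context-specific forward-chaining described above, assuming triples are grouped into named graphs held in a plain dict. The rule treats foaf:mbox_sha1sum as an inverse functional property (the FOAF vocabulary declares it as one); the same function could be pointed at foaf:name to mimic the fuzzier, context-only rule. Graph names, profile URIs and hash values are hypothetical, and in a real store this would more likely be a SPARQL rule writing into a named graph.

```python
from collections import defaultdict
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_MBOX_SHA1SUM = "http://xmlns.com/foaf/0.1/mbox_sha1sum"


def apply_ifp_rule(graphs, source, target, ifp):
    """Forward-chain one rule: subjects in graph `source` that share a value of
    the inverse functional property `ifp` are co-referent. Inferred owl:sameAs
    triples go into graph `target`, so the rule only takes effect in contexts
    that include that graph; the source graph is left untouched."""
    subjects_by_value = defaultdict(set)
    for s, p, o in graphs[source]:
        if p == ifp:
            subjects_by_value[o].add(s)
    inferred = graphs.setdefault(target, set())
    for subjects in subjects_by_value.values():
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, OWL_SAME_AS, b))
    return inferred


# Hypothetical crawled data; the hash values are placeholders, not real SHA-1 sums.
P1 = "http://example.org/profile1#me"
P2 = "http://example.net/profile2#me"
P3 = "http://example.com/other#me"
graphs = {
    "urn:example:crawled": {
        (P1, FOAF_MBOX_SHA1SUM, "hash-aaa"),
        (P2, FOAF_MBOX_SHA1SUM, "hash-aaa"),
        (P3, FOAF_MBOX_SHA1SUM, "hash-bbb"),
    }
}
inferred = apply_ifp_rule(graphs, "urn:example:crawled",
                          "urn:example:ifp-rules", FOAF_MBOX_SHA1SUM)
```

Only P1 and P2 share a value, so a single owl:sameAs triple is inferred, and it lives in its own graph rather than being mixed into the crawled data.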
>>
>> [4] Are there any conventions or guidelines for combining data and 
>> resolving discrepancies? For instance, to get all data about London one 
>> would theoretically have to combine all the data from the URIs 
>> referenced at the sameas.org link mentioned above; but surely if you 
>> combined all the data together you'd get both duplicates and 
>> differences in the data, so which is fact?
>>
>> [4a] Likewise with people: I have multiple social profiles all about 
>> "me", and surely in the near future multiple URIs will each represent 
>> #me. I think we can safely say that not all of these will be linked 
>> with sameAs; and, further still, which one should person X use when 
>> referencing information about "me"?
>>
>> [4b] Is there any method to mark which is the preferred source of 
>> information (and verify it)? At the minute it seems like it would be 
>> very simple to publish a vast amount of inaccurate data as triples, and 
>> the current mentality appears to be to take it for granted that 
>> the information IS fact.
>>
>>
>> DC vs ctag and FOAF
>>
>> For RDFa we have ctag:means and foaf:maker, which to me seem very exact:
>> <span rel="ctag:means" resource="http://dbpedia.org/resource/Washington"/>
>> <span rel="foaf:maker" resource="http://faviki.com/person/example#me"/>
>>
>> but in Dublin Core we have the very loose
>> <span property="dc:subject">Washington</span>
>> <span property="dc:creator">Example</span>
>>
>> I'm aware one can use both DC and ctag/FOAF together in RDFa, but should 
>> we be replacing DC values wherever possible with the more precise 
>> ctag/FOAF terms (and indeed in our standard RDF data)?
>>
>> A quick question about the usage of RDFa: previously I had always 
>> envisioned RDFa documents containing a lot of inline RDF markup. I'm 
>> aware of the problems in picking out a term in the middle of a block of 
>> text and wrapping it in the appropriate notation; however, my question 
>> is: am I wrong in thinking this is the main use/advantage? In most 
>> cases where I've seen XHTML+RDFa (like URIBurner etc.) it's been more a 
>> case of using RDFa to display human-readable RDF, as opposed to a 
>> human-targeted article with RDFa embedded in place / inline. Does anybody 
>> have any examples of a full RDFa demo site, not just with the usual 
>> dc/foaf and tags but fully enriched with detected semantic terms 
>> highlighted, linked and wrapped in RDFa, inline?
>>
>>
>> And finally, any info on creating a set/document which comprises, 
>> includes or references items in other datasets? (I may really show my 
>> newbie-ness here.) What I mean is: say I'm making an RDFa page about 
>> London, and in it I mention the population. I don't want to have the 
>> population in the document or in the RDF; I do, however, want to link 
>> through to the triple which holds the population for London in DBpedia 
>> or a geo dataset, and have that in my RDFa. So where I could have:
>> (s–p–o)
>> london-population-7556900
>>
>> I'd rather have:
>> london-population-{some link to the dbpedia-owl:populationTotal value in 
>> DBpedia's RDF for London}
>>
>> Thus I'm saying that London's population is {found here}, and it'd be 
>> nice if it could also be pulled in and displayed in an 
>> XHTML+RDFa document, possibly via 
>> content="URI#dbpedia-owl:populationTotal" or suchlike.
>> Not sure if I explained that properly; perhaps, simply: how do I 
>> reference a single triple rather than a full RDF set? Or am I way off 
>> target?
>>
>> Many thanks in advance for any answers, comments, etc., and apologies 
>> again if it's the wrong place to ask!
>>
>> Nathan
>>
> I've used my person entity URI instead of "London", for maximum effect, 
> i.e., there are lots of URIs associated with me.
> 
> Links:
> 
> 1. http://tr.im/DoCA -- compact description (&sas=no implies no 
> "owl:sameAs" context rule)
> 2. http://tr.im/DoD4 -- owl:sameAs expansion (&sas=yes implies 
> "owl:sameAs" smushing/meshing/expansion/explosion context on)
> 3. http://tr.im/DoIi -- shows a UI that provides a holistic view of the 
> data space, i.e., you can see, via the indirect co-reference, the effect of 
> an IFP rule re. foaf:name and foaf:mbox_sha1sum (note: there is a bug I 
> hit while writing this mail, and you will most likely hit it if you click 
> on the IFP tab URIs)
> 4. http://tr.im/DoNv -- the above, using London from the larger data 
> corpus (8 Billion) at: http://lod.openlinksw.com (just visit the tabs 
> for the different co-reference URIs).
> 
> 

Thanks Kingsley!

Something just clicked and half of my questions are now irrelevant. To 
summarise my current understanding:

Let's say I'm doing the simplest report ever, where I want to display 
the sentence "The population of X is Y" in XHTML+RDFa,
  - where X is the current name of "London" and Y is the current 
population, even if in 50 years the name changes to "New London" and the 
population drops to 54;
  - and I want the data to always be up to date (or as up to date as 
sources allow).
Then all I need to do is:

1 - find a resource which holds RDF information about London
2 - SPARQL-query said resource to pick out only the name and population nodes
3 - (optional) assign a nice endpoint to display the results of the query 
as RDF
4 - XSLT-transform the results into an XHTML+RDFa document ["about" URI 
for London], [by foaf:Person/dc:creator me], where name and population 
are injected into X & Y respectively.
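The four steps above can be sketched in a few lines, assuming the RDF about London has already been fetched into (subject, predicate, object) tuples. Here `select` is a hypothetical stand-in for the SPARQL SELECT of step 2, and Python string templating is swapped in for the XSLT of step 4. rdfs:label for the name is an assumption; dbpedia-owl:populationTotal and the 7556900 figure are taken from the example earlier in the thread, used purely as sample data.

```python
from html import escape

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
POPULATION = "http://dbpedia.org/ontology/populationTotal"  # dbpedia-owl:populationTotal
LONDON = "http://dbpedia.org/resource/London"


def select(triples, subject, predicate):
    """Stand-in for step 2: return the first object matching (subject, predicate, ?o)."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    raise LookupError(f"no {predicate} for {subject}")


# XHTML+RDFa fragment; prefixes are declared inline with xmlns, as in RDFa 1.0.
TEMPLATE = (
    '<p xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" '
    'xmlns:dbpedia-owl="http://dbpedia.org/ontology/" about="{about}">'
    'The population of <span property="rdfs:label">{name}</span> is '
    '<span property="dbpedia-owl:populationTotal">{population}</span>.</p>'
)


def render(triples, subject):
    """Stand-in for step 4: inject X (name) and Y (population) at render time."""
    name = select(triples, subject, RDFS_LABEL)
    population = select(triples, subject, POPULATION)
    return TEMPLATE.format(about=escape(subject), name=escape(name),
                           population=escape(str(population)))


# Sample data standing in for whatever the source returns at render time.
triples = [(LONDON, RDFS_LABEL, "London"), (LONDON, POPULATION, 7556900)]
html = render(triples, LONDON)
```

Because nothing is stored in the page itself, re-rendering after the source data changes yields the new name and population automatically.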

That makes sense, and I'll assume the following:

to ensure the info is always available I'd need to query (not sure what or 
where here) to get all resources which describe the city of London, then use 
all resources from that query as my sources for steps 1-4 above; thus, 
should DBpedia die, my report will still work(?)
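A minimal sketch of the "should DBpedia die" fallback, assuming the co-referent URIs have already been collected (for example from owl:sameAs links or a sameas.org lookup) and that each can be fetched and parsed into a simple dict. `first_available`, `fake_fetch`, the mirror URIs and the dict shape are all hypothetical; a real client would dereference each URI over HTTP and parse the returned RDF.

```python
from typing import Callable, Iterable, Optional


def first_available(sources: Iterable[str],
                    fetch: Callable[[str], Optional[dict]],
                    key: str):
    """Try each co-referent source in order and return (source, value) from the
    first one that responds and actually has `key`."""
    for uri in sources:
        try:
            data = fetch(uri)
        except OSError:  # unreachable source, e.g. the site is down
            continue
        if data and key in data:
            return uri, data[key]
    raise LookupError(f"no source could supply {key!r}")


# Hypothetical stand-in for dereferencing and parsing each URI.
DBPEDIA = "http://dbpedia.org/resource/London"
EMPTY = "http://example.net/empty#London"
MIRROR = "http://example.org/mirror#London"
FAKE_WEB = {MIRROR: {"population": 7556900}, EMPTY: {}}


def fake_fetch(uri: str) -> Optional[dict]:
    if uri == DBPEDIA:
        raise ConnectionError("simulating DBpedia being down")  # an OSError subclass
    return FAKE_WEB.get(uri)


source, population = first_available([DBPEDIA, EMPTY, MIRROR], fake_fetch, "population")
```

Ordering the source list by preference doubles as a crude way of saying which source you trust first.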

I guess that also argues for the decentralized nature of LOD and the use of 
sameAs; but, on the other hand, doesn't it suggest a need for a single point of 
entry / initial search into the "cloud"?

And to check that I've "got it":

Essentially, I could write the following sentence in a document: 
"'Kingsley Idehen' wrote a post entitled 'name of post'", where 
'name of post' is injected automatically at render time, directly from 
the RDF title of your post; so if you change the title, my sentence 
remains accurate.
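A minimal sketch of that render-time lookup, assuming the post's RDF is available as a set of (subject, predicate, object) tuples using dc:title, dc:creator and foaf:name. The post and author URIs are hypothetical. The point it shows is that the sentence is rebuilt from the live data each time, so retitling the post changes the next render without touching the citing document.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical URIs for the post and its author.
POST = "http://example.org/blog/post-1"
AUTHOR = "http://example.org/people/author#this"


def value(graph, subject, predicate):
    """Return the object of the first (subject, predicate, ?o) triple found."""
    return next(o for s, p, o in graph if s == subject and p == predicate)


def sentence(graph, post):
    """Build the sentence at render time; nothing is copied into the document."""
    author = value(graph, post, DC_CREATOR)
    return (f"'{value(graph, author, FOAF_NAME)}' wrote a post entitled "
            f"'{value(graph, post, DC_TITLE)}'")


graph = {
    (POST, DC_CREATOR, AUTHOR),
    (AUTHOR, FOAF_NAME, "Kingsley Idehen"),
    (POST, DC_TITLE, "name of post"),
}
before = sentence(graph, POST)

# The author retitles the post in their own data...
graph.discard((POST, DC_TITLE, "name of post"))
graph.add((POST, DC_TITLE, "a new title"))
after = sentence(graph, POST)  # ...and the next render picks it up.
```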

All that's left is:

- my question regarding "combining data and resolving discrepancies" 
(unless I find the answer upon closer analysis of the provided links);

- which ontologies are preferred when trying to be very 
specific about a subject (rather than dc:subject, dc:creator, etc., which 
are essentially free-text based rather than URI based).
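On the "combining data and resolving discrepancies" question, one possible approach (an assumption, not an established LOD convention) is to merge the descriptions from co-referent URIs while keeping provenance, so that duplicates collapse and conflicts stay visible together with their sources, and only then apply an explicit resolution policy such as a trust order. The sources, property names and the second population figure below are hypothetical, made-up sample data.

```python
from collections import defaultdict


def merge_with_provenance(descriptions):
    """descriptions maps source URI -> {property: value} for co-referent URIs.
    Returns {property: {value: [sources]}}: duplicates collapse into one value,
    while conflicting values stay side by side with where they came from."""
    merged = defaultdict(lambda: defaultdict(list))
    for source, props in descriptions.items():
        for prop, val in props.items():
            merged[prop][val].append(source)
    return merged


def discrepancies(merged):
    """Properties for which the sources disagree (more than one distinct value)."""
    return {prop: dict(vals) for prop, vals in merged.items() if len(vals) > 1}


def resolve(merged, trust_order):
    """One possible policy, not a standard: per property, pick the value backed
    by the most trusted source; among equally trusted values, prefer the one
    attested by more sources. Sources not in trust_order rank last."""
    rank = {src: i for i, src in enumerate(trust_order)}

    def score(item):
        _, sources = item
        return (min(rank.get(s, len(rank)) for s in sources), -len(sources))

    return {prop: min(vals.items(), key=score)[0] for prop, vals in merged.items()}


# Hypothetical sources; the second population figure is a made-up placeholder.
DBP = "http://dbpedia.org/resource/London"
GEO = "http://example.org/geo#London"
CITY = "http://example.net/cities#London"
descriptions = {
    DBP: {"name": "London", "population": 7556900},
    GEO: {"name": "London", "population": 7000000},
    CITY: {"name": "London"},
}
merged = merge_with_provenance(descriptions)
conflicts = discrepancies(merged)        # only "population" disagrees
resolved = resolve(merged, trust_order=[DBP])
```

Keeping provenance through the merge is what makes a later "which source do I prefer?" decision possible at all; it does not by itself verify that any source is correct.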

Regards & many thanks,

Nathan
Received on Wednesday, 28 October 2009 23:49:17 UTC