
Re: Fwd: Knowledge Graph links to Freebase

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 05 Jun 2012 07:03:44 -0400
Message-ID: <4FCDE790.50807@openlinksw.com>
To: lotico-list@googlegroups.com, "public-lod@w3.org" <public-lod@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>
On 6/4/12 8:12 PM, odoncaoa wrote:
> Semantic Colleagues,
>
> Being intrigued by the weekend 'geek find', I couldn't help but
> exercise the example. The information provided makes the 'unique
> identifier' motivation among Google and Freebase data
> management systems pretty obvious.
>
> >  wondering why they add the base64 gzip and redirect overhead though.
>
> Step 1) odoncaoa@waterford$ echo "1936-06-24,Steve Jobs" | gzip -c | base64 -i
>         H4sIAIcxzU8AAzO0NDbTNTDTNTLRCS5JLUtV8MpPKuYCABCyf7YWAAAA
>
> Step 2) someonelse@gatewaylt$ echo "H4sIAIcxzU8AAzO0NDbTNTDTNTLRCS5JLUtV8MpPKuYCABCyf7YWAAAA" | base64 -d | gunzip
>         1936-06-24,Steve Jobs
>
> Unique identifiers can be produced with the g[un]zip and base64
> commands. These make it possible to translate text-string
> identifiers, in a particular orthographic digital encoding, into
> unique representations (step 1), i.e. by applying data compression
> and encoding methods. The unique identifiers can then be included
> in the production of URIs, and employed by sundry autonomous data
> management systems.
>
> The unique identifiers embody a semantic binding between the 
> orthographic encoding, and the unique entity, for which they are 
> representative. Such unique identifiers can then also be
> employed/incorporated into disparate data management systems. In this 
> way, it is also possible to change the data management paradigm 
> employed, from one which is relational, to one which is graph based; 
> while employing both, at the same time.
>
> Moreover, with the use of base64 and gunzip, it is possible to
> exercise an inverse application of data decoding and decompression,
> upon the identifiers; in order to re-produce the original orthographic 
> digital encoding (text strings) (step 2).  In this manner, the 
> references, and relations can be coordinated, among the various, 
> otherwise unrelated, data management systems.

Yes to all of that. And it can be implemented in a manner that gels
nicely with the existing architecture of the World Wide Web.
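
For anyone who would rather script the two quoted steps than run them in a shell, here is a minimal Python sketch of the same round trip. The function names are my own, not from the thread, and the sketch assumes Python 3.8+ for the mtime argument. One thing worth noting: the token in step 1 carries a gzip timestamp in its header, so repeating that command later yields a different string. Pinning mtime to 0, as below, is one way to make the token repeatable.

```python
import base64
import gzip


def encode_identifier(text: str) -> str:
    """Gzip the UTF-8 bytes of `text`, then base64-encode the result."""
    # mtime=0 keeps the current time out of the gzip header, so the same
    # input always produces the same token (Python 3.8+).
    compressed = gzip.compress(text.encode("utf-8"), mtime=0)
    return base64.b64encode(compressed).decode("ascii")


def decode_identifier(token: str) -> str:
    """Reverse encode_identifier: base64-decode, then gunzip."""
    # Restore any '=' padding that was stripped from the token.
    padded = token + "=" * (-len(token) % 4)
    return gzip.decompress(base64.b64decode(padded)).decode("utf-8")


# echo appends a newline, so include one to mirror step 1.
token = encode_identifier("1936-06-24,Steve Jobs\n")
print(token)
print(decode_identifier(token), end="")
```

The token printed here will not match the one in step 1 byte for byte, since the gzip header fields differ, but decode_identifier should accept either.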

Note, you've just provided an alternative narrative for what Linked Data
is all about. Basically, you are explaining the virtues of denotation
(names) and associated descriptor resource (data object) binding, packed
into 'super keys' that function at Web scale.

Identifiers (e.g. HTTP URIs) endowed with entity denotation and resource
identification duality are the keys to transcendent open data access
and integration across heterogeneous data sources.
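
To make the identifier-to-URI step from the quoted messages concrete, here is a rough Python sketch that takes the "stick" value from the Knowledge Graph link, unpacks it, and forms the Freebase URI. The function name is hypothetical. It assumes the value is base64 (possibly the URL-safe variant, possibly with its padding stripped) over gzip, and that the decompressed payload may carry a few extra bytes around the /m/... key, so it searches for the key rather than treating the whole payload as text.

```python
import base64
import gzip
import re

# Namespace as given in the quoted message.
FREEBASE_NS = "http://rdf.freebase.com/ns"


def stick_to_freebase_uri(stick: str) -> str:
    """Unpack a 'stick' value and return the matching Freebase URI."""
    # Restore any '=' padding dropped from the URL parameter.
    padded = stick + "=" * (-len(stick) % 4)
    payload = gzip.decompress(base64.urlsafe_b64decode(padded))
    # Assumption: the key looks like /m/ followed by lowercase
    # alphanumerics or underscores, as in the sample from the thread.
    match = re.search(rb"/m/[0-9a-z_]+", payload)
    if match is None:
        raise ValueError("no Freebase key found in payload")
    return FREEBASE_NS + match.group().decode("ascii")


print(stick_to_freebase_uri("H4sIAAAAAAAAAONgVuLQz9U3MKs0LgIAXXSnTQwAAAA"))
```

For the sample value from the thread, this should give the same rdf.freebase.com URI that Andreas arrived at by hand.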

It's all coming together :-)

Kingsley
>
>
> Douglas Donahue
> PCL Institute
> odoncaoa@gmail.com
>
>
> On Sun, Jun 3, 2012 at 3:32 PM, Kingsley Idehen
> <kidehen@openlinksw.com> wrote:
>
>     On 6/3/12 2:20 PM, Marco Neumann wrote:
>
>         Here is an interesting geek find, and further evidence of how
>         Google uses unique identifiers directly from the Freebase
>         database.
>
>         I mean, we all knew that Google bought Freebase, so it's fair
>         play that they use their assets. Wondering why they add the
>         base64 gzip and redirect overhead though.
>
>         Maybe Google hashes them to index them for another store. Vini, is
>         Google working on the Pregel store?
>
>         Marco
>
>
>
>
>         ---------- Forwarded message ----------
>         From: Andreas Thalhammer <andreas.thalhammer@sti2.at>
>         Date: Sun, Jun 3, 2012 at 10:05 AM
>         Subject: Knowledge Graph links to Freebase
>         To: semantic-web@w3.org
>
>
>         Dear all,
>
>         I want to take the opportunity to present my findings after
>         analysing the patterns of Google's Knowledge Graph.
>
>         Each summary has a unique identifier. This identifier is used when
>         linking to other entities, e.g.
>
>         H4sIAAAAAAAAAONgVuLQz9U3MKs0LgIAXXSnTQwAAAA
>
>         stands for the summary of Steve Jobs.
>
>         The URI to get the summary is
>         https://www.google.com/search?hl=en&sa=X&q=steve+jobs&stick=H4sIAAAAAAAAAONgVuLQz9U3MKs0LgIAXXSnTQwAAAA
>
>         I found out that this key is created with 2 tools, namely
>         base64 and gzip.
>
>         We can use this key to find out what the original content was
>         (hoping to find a link to Freebase).
>
>         The way to go is the following:
>
>         1. Store the identifier above (H4sl..) in a file, e.g. id.b64
>         2. console:$ base64 -d id.b64 > id.gz
>         3. console:$ gunzip id.gz
>         4. console:$ cat id
>
>         -->  /m/06y3r
>
>         Now, add the freebase namespace to that:
>
>         http://rdf.freebase.com/ns/m/06y3r
>
>         This redirects to:
>
>         http://www.freebase.com/view/en/steve_jobs
>
>         Have a nice Sunday!
>
>         Andreas
>
>         --
>         Andreas Thalhammer
>         PhD Student
>         Semantic Technology Institute
>         University of Innsbruck
>         http://www.sti2.at/
>
>         phone: +43 (0) 512507 6454
>         email: andreas.thalhammer@sti2.at
>
>
>
>
>
>     FYI
>
>     It speaks volumes all by itself. Thus, I return to my contemptuous
>     silence re., this matter :-)
>
>     -- 
>
>     Regards,
>
>     Kingsley Idehen
>     Founder & CEO
>     OpenLink Software
>     Company Web: http://www.openlinksw.com
>     Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>     Twitter/Identi.ca handle: @kidehen
>     Google+ Profile: https://plus.google.com/112399767740508618350/about
>     LinkedIn Profile: http://www.linkedin.com/in/kidehen
>
>
>
>
>
>
>


-- 

Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Tuesday, 5 June 2012 11:04:36 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 21:45:49 GMT