Re: Vote for my Semantic Web presentation at SXSW from Kingsley Idehen on 2011-08-18 (public-lod@w3.org from August 2011)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Thu, 18 Aug 2011 09:01:31 -0400
To: public-lod@w3.org, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <4E4D0D2B.4060003@openlinksw.com>
On 8/17/11 9:30 PM, Sampo Syreeni wrote:
> On 2011-08-17, ProjectParadigm-ICT-Program wrote:
>
>> Google just bought Motorola Mobility and Microsoft is rumored to buy 
>> Nokia. The killer apps for the semantic web will be apps for mobile 
>> devices.
>
> But once again, is that because you cheer for SemWeb, or because you 
> have some specific application in mind which would be better served 
> by, say, RDF, than the existing technology like RDBMS+CSV? If you have 
> the latter in mind, why aren't you rich already?

At whom is that comment aimed? Irrespective, what does "rich" mean?

>
> Again, I really do like the idea of a Semantic Web (architecture) and 
> Linked Data (data). But even after I mentioned some FOAF derivative 
> being a potential "killer", the only real proposal for an application 
> turned out to be "structured profiles". 

Maybe the meaning of "application" is part of the problem. Ditto the 
preoccupation with "killer" delivered via a prospective moniker.

When folks started using email, I don't think they deemed it the killer 
application of the Internet, prospectively.

When TimBL started blogging (eons before Dave Winer) I don't think he 
thought: this is the killer application of the WWW.

In all cases, the focus has to be on demonstrating the utility of a new 
technology by application to known problems. In many cases personal 
problems trigger the development of new technologies, so the architects 
test (dogfood) these technologies themselves.

The problem with the Semantic Web Project re. the above is that it got 
caught up in the furor of the WWW bootstrap. By this I mean adding "The" 
to "Semantic Web Project" was very much like saying: you are all using a 
WWW variant that is about to become obsolete. This happened at a time 
when companies behind a plethora of technologies had placed their bets 
on the WWW. Compounding the problem was the fact that RDF markup also 
got entangled with XML, which most simply didn't understand, so natural 
human instincts kicked in re. self preservation etc. Of course, the 
narratives that followed RDF on the anecdotal fronts (typically devoid 
of working examples addressing real problems) just compounded matters.

> That is, a FOAF derivative. 

FOAF is just a vocabulary. RDF is just markup for expressing semantics 
enabling the construction of vocabularies like FOAF. For the most part 
these are infrastructure oriented parts of the puzzle rather than what 
you would consider "applications".

> As for linked data, it was shown that yes, it is as useful as ever. 
> But I didn't see a *hint* of a real life application where some other, 
> existing technology couldn't fare as much or better than the current 
> W3C sanctioned SemWeb framework. Nothing I would invest in, because it 
> lowers the costs, gets things done, brings happiness to the masses, or 
> even hold any heretofore undiscovered functionality or bling over the 
> competition.

Linked Data addresses many real world problems. The trouble is that 
problems are subjective. If you haven't experienced a problem it doesn't 
exist. If you don't understand a problem it doesn't exist. If you don't 
know a problem exists then again it doesn't exist in your context.

For the umpteenth time here are three real world problems addressed 
effectively by Linked Data courtesy of AWWW (Architecture of the World 
Wide Web):

1. Verifiable Identifiers -- as delivered via WebID (leveraging Trust 
Logic and FOAF)
2. Access Control Lists -- an application of WebID and Web Access 
Control Ontology
3. Heterogeneous Data Access and Integration -- basically taking us 
beyond the limits of ODBC, JDBC etc.

Let's apply the items above to some contemporary solutions that 
illuminate the costs of not addressing the above:

1. G+ -- the "real name" debacle is WebID 101 re. pseudonyms, synonyms, 
and anonymity
2. Facebook -- all the privacy shortcomings boil down to not 
understanding the power of InterWeb scale verifiable identifiers and 
access control lists
3. Twitter -- inability to turn Tweets into structured annotations that 
are basically nano-memes
4. Email, Comment, Pingback SPAM -- a result of not being able to verify 
identifiers
5. Precision Find -- going beyond the imprecision of Search Engines by 
using subject attributes and properties to contextually discover 
relevant things (explicitly or serendipitously).

The problem isn't really a shortage of solutions, far from it.

>
> This might be a tired topic already, but it's going to stay relevant 
> till we actually have something to show the world; or until the whole 
> idea just dies a slow death. 

But anyone can pop up and simply repeat everything you've just said. In 
most cases they will, and they will ignore all the examples outlined 
above, for the very reasons I explained earlier re. problem context.

> If I had some real, final answers here, I too would already be rich. 
> But I'm not. Then my ideas too stay rather (wannabe-) academic. Them 
> being:
>
> 1)  URI based naming of shared concepts is the biggest part. A shared,
>     extensible, completely distributed and unambiguous namespace is
>     something new and *highly* variable. This is pretty much the only
>     new part we're delivering, so let's concentrate on that.
>

We are done with that already. We just need better educational material 
that leverages history re. what's actually going on. Positioning the 
concept of URIs devoid of history simply leads to confusion.

Separating the WWW from distributed object computing was a mistake. 
Using "resource" instead of "object" was a mistake. Those marketing 
comms items are critical and costly errors.

> 2)  RDF/XML is just bad. The folks who came up with that should be shot.
>     Repeatedly. NTriples is more like it for an early adopter, if even
>     that.

It is bad, and should never have been the marquee of the meme for so 
long. It is gradually being downgraded or removed from entry level 
training material, especially that from the W3C.

>
> 2a) Standards only help if there is just one. All of the slower, messier
>     and "more correct" ones should be dropped wholesale once a simpler
>     one shows signs of catching on.

You should never prescribe standards. That's the problem. The W3C should 
take de facto standards and turn them into real standards via a formal 
standardization process.
>
> 3)  Triples are a neat model for semistructured data. What we actually
>     need though is structured data. There n-ary instead of binary (yes,
>     RDF is basically binary, and not ternary) works much better.

EAV/SPO triples are fine. N-arity is the problem. Remember, the goal is 
to represent data objects in graph form, at global scale.
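
A rough sketch of that point, not taken from any particular product: one 
n-ary record is split into EAV/SPO triples that share a subject URI. The 
example.org URIs, the vocabulary prefix and the row_to_triples helper 
are all hypothetical names chosen for illustration.

```python
def row_to_triples(subject, row, vocab="http://example.org/vocab#"):
    """Split one n-ary record into (subject, predicate, object) triples.

    Each column name becomes a predicate URI under the (hypothetical)
    vocabulary prefix; each cell value becomes the object.
    """
    return [(subject, vocab + column, value) for column, value in row.items()]


# A two-column record; the employer value is itself a URI, i.e. a
# reference to another data object rather than a plain literal.
row = {"name": "Alice", "employer": "http://example.org/org/acme"}
triples = row_to_triples("http://example.org/person/alice", row)

for triple in triples:
    print(triple)
```

Adding a column only adds one more triple per record, so no schema 
change is needed, and any object that is a URI can be followed to 
another graph.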
>
> 3a) This is reflected in the current query language, SPARQL. It's a
>     total mess for any query you'd usually use for Big Data. 

Really? I happily use it to do things against a 29 Billion+ data set 
that I have yet to see in any other DBMS realm (RDBMS or basic NoSQL).

You see, you are expressing a syntactic gripe rather than addressing the 
core problem addressed by SPARQL, which boils down to this: a declarative 
query language for graph DBMS and lightweight data stores. One that 
shares a degree of syntactic commonality with SQL as a bridge mechanism 
between the RDBMS and Graph DBMS and Store realms.
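
To make the syntactic commonality concrete, here is a small side-by-side 
sketch. The person table, the ex: vocabulary and the sample rows are 
made up for illustration. Only the SQL is executed (against an in-memory 
SQLite database); the SPARQL text is shown for comparison and would need 
a SPARQL engine to run.

```python
import sqlite3

# SQL over a relational table.
SQL = "SELECT name FROM person WHERE age > 30 ORDER BY name"

# The same question asked of a graph. The ex: vocabulary is hypothetical.
SPARQL = """
PREFIX ex: <http://example.org/vocab#>
SELECT ?name
WHERE { ?person ex:name ?name ; ex:age ?age . FILTER (?age > 30) }
ORDER BY ?name
"""


def run_sql(rows):
    """Load the sample rows into an in-memory table and run the SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
    result = [name for (name,) in conn.execute(SQL)]
    conn.close()
    return result


sql_names = run_sql([("Alice", 41), ("Bob", 25), ("Carol", 33)])
print(sql_names)
print(SPARQL)
```

Both queries are declarative and share the SELECT / WHERE / ORDER BY 
shape. The difference is that the SPARQL WHERE clause matches triple 
patterns rather than naming a table.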

The reason why OODBMS and ORDBMS engines failed (pre Web era) boiled 
down to the fact that they didn't have a declarative query language like 
SQL. When the ODMG finally created OQL (which actually looks like 
SPARQL) the OODBMS game was over.

SPARQL is an example of learning from the ODMG mistake. A mistake that 
delivered a lost decade to the DBMS realm and led to the RDBMS staying 
at the fore of data management way beyond its sell-by date.

> For the
>     latter you'd *always* use some variant of relational algebra, not
>     the equivalent path query. That's just wrong, since SemWeb + Linked
>     Data was supposed to deal with formally interpretable data overall,
>     and not just the easiest kind of human-produced metadata, like
>     manually input bibliographic references mandated by an academic's
>     superior.
>
> 4)  We're about semantics, so why do we not preferentially target the
>     problem areas where semantics are and have been a problem in the
>     past? One simple problem I've bumped into in my daily database work
>     is that it's amazingly difficult and time-consuming to import and
>     export stuff from/to an RDBM, because even the lowest level type
>     semantics can't be carried by most export formats. Where's the
>     SemWeb solution to that? That's for certain a problem that is being
>     experienced every day by at least tens of thousands of people, it
>     has to do with (granted, low level) semantics, yet there is no
>     commonly accepted solution.

Look, much much simpler than that. Simple example:

In the ODBC realm you have desktop productivity tools (spreadsheets, 
word processors, presentation packages, email etc.). These applications 
can transparently bind to a plethora of RDBMS engines via ODBC drivers. 
The developer of the desktop productivity tool works with the ODBC API. 
The developer of the ODBC driver also works with the ODBC API (interface 
implementation side).  All an application does is bind to a Data Source 
Name.

In Linked Data you have URIs as Data Source Names. Unlike ODBC they are 
protocol agnostic and separate data access from data representation -- 
something you get gratis with HTTP scheme URIs. Thus, you no longer need 
to install an OS specific ODBC driver manager on a machine en route to 
connecting desktop productivity tools to backend data sources. You just 
use a SPARQL protocol URL, and the job is done. You don't have to worry 
about data formats because those are now negotiable.
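
As a minimal sketch of what "a SPARQL protocol URL plus negotiable 
formats" can look like from client code: the endpoint 
http://example.org/sparql and the sparql_request helper are assumptions 
for illustration, and the request is only constructed, not sent. The 
"query" parameter and the media types come from the SPARQL Protocol and 
result-format specifications.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) an HTTP GET request for a SPARQL query.

    The query text travels in the URL's "query" parameter; the Accept
    header asks the server for a particular result format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

# Same URL, two negotiated formats: JSON results or CSV for a spreadsheet.
json_req = sparql_request("http://example.org/sparql", QUERY)
csv_req = sparql_request("http://example.org/sparql", QUERY, accept="text/csv")

print(json_req.full_url)
print(json_req.get_header("Accept"))
print(csv_req.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the 
actual call against a live endpoint. No driver manager is involved, only 
a URL and an Accept header.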

A Linked Data URI delivers the ultimate Data Source Name. The problem is 
that many that push Linked Data and the Semantic Web vision remain 
totally unfamiliar with ODBC and its vast use. At best, there is some 
knowledge of JDBC, but even that doesn't really match ODBC since it is 
programming language constrained, i.e., Java specific.

>
>     You'll probably have many other examples like that. Which is good.
>     What is bad is that we don't seem to be targeting/solving them right
>     now. Even now, it seems to be more about the infrastructure than
>     the final application.

Application and Infrastructure depend totally on the eyes of the beholder.

>
> 5)  As another example of how SemWeb could make a difference, it's
>     pretty high on distributed extensibility. Compared to the
>     alternatives like plain XML, and in particular most of the lesser
>     protocols. Can we not find the *concrete* fields where that is at
>     demand? EAV/CR already pretty much addressed that with polymorphic
>     medical records, very much in the vain of heterogeneous
>     triple-relation vein. So why aren't we following and bettering that
>     approach, actively?

Good points. Back to what I said about data access and integration. In 
this case without the constraints of flat or hierarchical data 
structures. We have the power of directed graphs instead. Plus the 
ability to exploit URIs as reference value types. That's basically what 
you get implicitly from a DBMS that groks the power of Linked Data as a 
vehicle for OODBMS or ORDBMS 3.0 (since 2.0 got lost in XML).

>
> 6)  If we're doing metadata, why can't we do meta-metadata and beyond
>     more effectively? Why is the reification issue so bogged down? I
>     mean, there's a huge use case for temporal (even bitemporal) data
>     out there, provenance, (cryptographically certified, or
>     PKI/WoT-derived) trust, disjunctive knowledge representation, or
>     whatnot, out there.
>
>     I sort of think, after the quad vs. triple debates, that much of
>     this could be dissolved simply by abandoning the triple model, while
>     staying with a shared, distributed, vocabulary for predicates
>     (triples)/column headers (the n-ary relational model).
>
> And so on. I'm pretty sure that we could do better even at the 
> infrastructure level of SemWeb. It's just that first and foremost we'd 
> need some real applications which are well targeted, and can then 
> drive the rest of the work. Both in money, and in user feedback. Not 
> perhaps "killer apps" per se, but useful apps which uniquely leverage 
> the semantic web and couldn't exist without it.

I think Linked Data is littered with useful solutions. The problem is 
accepting what one sees or looking at things via appropriate "context 
lenses" :-)


-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Received on Thursday, 18 August 2011 13:02:25 UTC