- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Thu, 18 Aug 2011 09:01:31 -0400
- To: public-lod@w3.org, "semantic-web@w3.org" <semantic-web@w3.org>
- Message-ID: <4E4D0D2B.4060003@openlinksw.com>
On 8/17/11 9:30 PM, Sampo Syreeni wrote:
> On 2011-08-17, ProjectParadigm-ICT-Program wrote:
>
>> Google just bought Motorola Mobility and Microsoft is rumored to buy
>> Nokia. The killer apps for the semantic web will be apps for mobile
>> devices.
>
> But once again, is that because you cheer for SemWeb, or because you
> have some specific application in mind which would be better served
> by, say, RDF, than the existing technology like RDBMS+CSV? If you have
> the latter in mind, why aren't you rich already?

At whom is that comment aimed? Irrespective, what does "rich" mean?

>
> Again, I really do like the idea of a Semantic Web (architecture) and
> Linked Data (data). But even after I mentioned some FOAF derivative
> being a potential "killer", the only real proposal for an application
> turned out to be "structured profiles".

Maybe the meaning of "application" is part of the problem. Ditto the
preoccupation with "killer" delivered via a prospective moniker.

When folks started using email, I don't think they deemed it the killer
application of the Internet, prospectively. When TimBL started blogging
(eons before Dave Winer) I don't think he thought: this is the killer
application of the WWW.

In all cases, the focus has to be demonstrating the utility of a new
technology by application to known problems. In many cases personal
problems trigger the development of new technologies, so the architects
test (dogfood) these technologies themselves.

The problem with the Semantic Web Project re. the above is that it got
caught up in the furor of the WWW bootstrap. By this I mean adding "The"
to "Semantic Project" was very much like saying: you are all using a WWW
variant that is about to become obsolete. This happened at a time when
companies behind a plethora of technologies had placed their bets on the
WWW. Compounding the problem was the fact that RDF markup also got
entangled with XML, which most simply didn't understand, so natural
human instincts kicked in re. self-preservation etc. Of course, the
narratives that followed RDF on the anecdotal fronts (typically devoid
of working examples addressing real problems) just compounded matters.

> That is, a FOAF derivative.

FOAF is just a vocabulary. RDF is just markup for expressing semantics,
enabling the construction of vocabularies like FOAF. For the most part
these are infrastructure-oriented parts of the puzzle rather than what
you would consider "applications".

> As for linked data, it was shown that yes, it is as useful as ever.
> But I didn't see a *hint* of a real life application where some other,
> existing technology couldn't fare as much or better than the current
> W3C sanctioned SemWeb framework. Nothing I would invest in, because it
> lowers the costs, gets things done, brings happiness to the masses, or
> even hold any heretofore undiscovered functionality or bling over the
> competition.

Linked Data addresses many real world problems. The trouble is that
problems are subjective. If you haven't experienced a problem, it
doesn't exist. If you don't understand a problem, it doesn't exist. If
you don't know a problem exists then, again, it doesn't exist in your
context.

For the umpteenth time, here are three real world problems addressed
effectively by Linked Data courtesy of AWWW (Architecture of the World
Wide Web):

1. Verifiable Identifiers -- as delivered via WebID (leveraging Trust
   Logic and FOAF)
2. Access Control Lists -- an application of WebID and Web Access
   Control Ontology
3. Heterogeneous Data Access and Integration -- basically taking us
   beyond the limits of ODBC, JDBC etc.

Let's apply the items above to some contemporary solutions that
illuminate the costs of not addressing the above:

1. G+ -- the "real name" debacle is WebID 101 re. pseudonyms, synonyms,
   and anonymity
2. Facebook -- all the privacy shortcomings boil down to not
   understanding the power of InterWeb scale verifiable identifiers and
   access control lists
3. Twitter -- inability to turn Tweets into structured annotations that
   are basically nano-memes
4. Email, Comment, Pingback SPAM -- a result of not being able to verify
   identifiers
5. Precision Find -- going beyond the imprecision of Search Engines,
   whereby subject attributes and properties are used to contextually
   discover relevant things (explicitly or serendipitously).

The problem isn't really a shortage of solutions, far from it.

>
> This might be a tired topic already, but it's going to stay relevant
> till we actually have something to show the world; or until the whole
> idea just dies a slow death.

But anyone can pop up and simply repeat everything you've just said. In
most cases they will, and they will ignore all the examples outlined
above, for the very reasons I explained earlier re. problem context.

> If I had some real, final answers here, I too would already be rich.
> But I'm not. Then my ideas too stay rather (wannabe-) academic. Them
> being:
>
> 1) URI based naming of shared concepts is the biggest part. A shared,
>    extensible, completely distributed and unambiguous namespace is
>    something new and *highly* variable. This is pretty much the only
>    new part we're delivering, so let's concentrate on that.
>

We are done with that already. We just need better educational material
that leverages history re. what's actually going on. Positioning the
concept of URIs devoid of history simply leads to confusion. Separating
the WWW from distributed object computing was a mistake. Using
"resource" instead of "object" was a mistake. Those marketing comms
items are critical and costly errors.

> 2) RDF/XML is just bad. The folks who came up with that should be shot.
>    Repeatedly. NTriples is more like it for an early adopter, if even
>    that.

It is bad, and should never have been the marquee of the meme for so
long. It is gradually being downgraded or removed from entry level
training material. Especially those from the W3C.
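
As a rough illustration of why N-Triples tends to be easier on newcomers
than RDF/XML, here is a minimal Python sketch that writes a few FOAF
statements as N-Triples lines. The helper `ntriple` and the
`example.org` identifiers are hypothetical and exist only for this
example; the escaping covers just backslashes, double quotes and
newlines, so it is not a complete serializer.

```python
# Minimal sketch: emit FOAF statements as N-Triples, one statement per line.
# The example.org URIs and the ntriple() helper are illustrative assumptions.

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ntriple(subject, predicate, obj, literal=False):
    """Format one subject/predicate/object statement as an N-Triples line."""
    if literal:
        # Escape only the characters this sketch cares about.
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


me = "http://example.org/people/alice#me"  # hypothetical identifier
lines = [
    ntriple(me, RDF_TYPE, FOAF + "Person"),
    ntriple(me, FOAF + "name", "Alice", literal=True),
    ntriple(me, FOAF + "knows", "http://example.org/people/bob#me"),
]
print("\n".join(lines))
```

Each line is a complete statement on its own, which is much of the
appeal: no nesting, no namespace prefixes, nothing to abbreviate.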
>
> 2a) Standards only help if there is just one. All of the slower, messier
>     and "more correct" ones should be dropped wholesale once a simpler
>     one shows signs of catching on.

You should never prescribe standards. That's the problem. The W3C should
take de facto standards and turn them into real standards via a formal
standardization process.

>
> 3) Triples are a neat model for semistructured data. What we actually
>    need though is structured data. There n-ary instead of binary (yes,
>    RDF is basically binary, and not ternary) works much better.

EAV/SPO triples are fine. N-arity is the problem. Remember, the goal is
to represent data objects in graph form, at global scale.

>
> 3a) This is reflected in the current query language, SPARQL. It's a
>     total mess for any query you'd usually use for Big Data.

Really? I happily use it to do things against a 29 Billion+ data set
that I am yet to see in any other DBMS realm (RDBMS or basic NoSQL).

You see, you are expressing a syntactic gripe rather than addressing the
core problem addressed by SPARQL, which boils down to this: a
declarative query language for graph DBMS and lightweight data stores.
One that shares a degree of syntactic commonality with SQL as a bridge
mechanism between the RDBMS and Graph DBMS and Store realms.

The reason why OODBMS and ORDBMS engines failed (pre Web era) boiled
down to the fact that they didn't have a declarative query language like
SQL. When the ODMG finally created OQL (which actually looks like
SPARQL) the OODBMS game was over. SPARQL is an example of learning from
the ODMG mistake. A mistake that delivered a lost decade to the DBMS
realm and led to the RDBMS staying at the fore of data management way
beyond its sell-by date.

>     For the
>     latter you'd *always* use some variant of relational algebra, not
>     the equivalent path query. That's just wrong, since SemWeb + Linked
>     Data was supposed to deal with formally interpretable data overall,
>     and not just the easiest kind of human-produced metadata, like
>     manually input bibliographic references mandated by an academic's
>     superior.
>
> 4) We're about semantics, so why do we not preferentially target the
>    problem areas where semantics are and have been a problem in the
>    past? One simple problem I've bumped into in my daily database work
>    is that it's amazingly difficult and time-consuming to import and
>    export stuff from/to an RDBM, because even the lowest level type
>    semantics can't be carried by most export formats. Where's the
>    SemWeb solution to that? That's for certain a problem that is being
>    experienced every day by at least tens of thousands of people, it
>    has to do with (granted, low level) semantics, yet there is no
>    commonly accepted solution.

Look, much much simpler than that. Simple example:

In the ODBC realm you have desktop productivity tools (spreadsheets,
word processors, presentation packages, email etc.). These applications
can transparently bind to a plethora of RDBMS engines via ODBC drivers.
The developer of the desktop productivity tool works with the ODBC API.
The developer of the ODBC driver also works with the ODBC API (interface
implementation side). All an application does is bind to a Data Source
Name.

In Linked Data you have URIs as Data Source Names. Unlike ODBC they are
protocol agnostic and separate data access from data representation --
something you get gratis with HTTP scheme URIs. Thus, you no longer need
to install an OS specific ODBC driver manager on a machine en route to
connecting desktop productivity tools to backend data sources. You just
use a SPARQL protocol URL, and the job is done. You don't have to worry
about data formats because those are now negotiable etc.

A Linked Data URI delivers the ultimate Data Source Name.
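
To make the "SPARQL protocol URL as Data Source Name" idea concrete,
here is a minimal Python sketch using only the standard library. It
builds (but deliberately does not send) an HTTP GET request in the shape
the SPARQL Protocol describes: the query text goes in the `query`
parameter and the desired result format is negotiated via the `Accept`
header. The endpoint `http://example.org/sparql` and the helper
`sparql_request` are hypothetical placeholders, not a real service or
library API.

```python
# Minimal sketch: a SPARQL protocol URL plus content negotiation.
# The endpoint URL and sparql_request() helper are illustrative assumptions.
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint, query, accept="application/sparql-results+json"):
    """Build a GET request for a SPARQL endpoint without sending it."""
    # The query is URL-encoded into the standard "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # The Accept header states which result serialization the client wants.
    return Request(url, headers={"Accept": accept})


ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would execute it against
a live endpoint. Asking for a different format, for example
`accept="text/csv"`, changes only the header: the URL, which plays the
role of the Data Source Name, stays the same.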
The problem is that many who push Linked Data and the Semantic Web
vision remain totally unfamiliar with ODBC and its vast use. At best,
there is some knowledge of JDBC, but even that doesn't really match ODBC
since it is programming-language constrained, i.e., Java specific.

>
>    You'll probably have many other examples like that. Which is good.
>    What is bad is that we don't seem to be targeting/solving them right
>    now. Even now, it seems to be more about the infrastructure than
>    the final application.

Application and Infrastructure depend totally on the eyes of the
beholder.

>
> 5) As another example of how SemWeb could make a difference, it's
>    pretty high on distributed extensibility. Compared to the
>    alternatives like plain XML, and in particular most of the lesser
>    protocols. Can we not find the *concrete* fields where that is at
>    demand? EAV/CR already pretty much addressed that with polymorphic
>    medical records, very much in the vain of heterogeneous
>    triple-relation vein. So why aren't we following and bettering that
>    approach, actively?

Good points. Back to what I said about data access and integration. In
this case without the constraints of flat or hierarchical data
structures. We have the power of directed graphs instead. Plus the
ability to exploit URIs as reference value types. That's basically what
you get implicitly from a DBMS that groks the power of Linked Data as a
vehicle for OODBMS or ORDBMS 3.0 (since 2.0 got lost in XML).

>
> 6) If we're doing metadata, why can't we do meta-metadata and beyond
>    more effectively? Why is the reification issue so bogged down? I
>    mean, there's a huge use case for temporal (even bitemporal) data
>    out there, provenance, (cryptographically certified, or
>    PKI/WoT-derived) trust, disjunctive knowledge representation, or
>    whatnot, out there.
>
>    I sort of think, after the quad vs. triple debates, that much of
>    this could be dissolved simply by abandoning the triple model, while
>    staying with a shared, distributed, vocabulary for predicates
>    (triples)/column headers (the n-ary relational model).
>
> And so on. I'm pretty sure that we could do better even at the
> infrastructure level of SemWeb. It's just that first and foremost we'd
> need some real applications which are well targeted, and can then
> drive the rest of the work. Both in money, and in user feedback. Not
> perhaps "killer apps" per se, but useful apps which uniquely leverage
> the semantic web and couldn't exist without it.

I think Linked Data is littered with useful solutions. The problem is
accepting what one sees or looking at things via appropriate "context
lenses" :-)

--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Thursday, 18 August 2011 13:02:25 UTC