RE: Test of Independent Invention: RDF

Harry,

From an information integration point of view, the problem with application
software is that every system has its own logic, naming conventions, and
shortcuts (lots of implicit information).
RDF and one or more vocabularies can be matched to one such application,
but where that application is an "island of automation", so is that RDF
implementation.
That is fine if it serves a purpose, but not if your data span the
information generated by a large number of applications, as is the case
in the process industries (oil, chemical, food, etc.).

I am an ISO 15926 evangelist (http://15926.org), and from that point of view
I think that another barrier is the lack of rigorous and generic
data-driven modeling.
IMHO schema.org is simply insufficient, and as you said: "....this is an
area where we need real work."
We have done such work for the last two decades, starting with a foundation
model as shown at http://www.15926.org/topics/data-model/index.htm

ISO 15926 provides the modeling for a lingua franca in combination with a
Reference Data Library that covers the definitions of the concepts that
apply in a given domain.
It can be used for interoperability purposes and for archiving the lifecycle
information of a facility (plant, ship, airplane, etc.), a kind of black box.
Contributing applications must have an adapter that maps their data from the
internal format to the 15926 format (in Turtle) and maps SPARQL query results
back to the internal format.
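
Just to make that concrete, here is a sketch of the outbound half of such an
adapter in Python with rdflib. The class, predicate, and namespaces are
invented for illustration; a real adapter would use proper ISO 15926
Reference Data Library identifiers.

    from rdflib import Graph, Literal, Namespace, RDF

    # Hypothetical namespaces, NOT real ISO 15926 reference data URIs.
    RDL = Namespace("http://example.org/rdl/")
    PLANT = Namespace("http://example.org/plant/")

    def internal_to_turtle(record):
        """Map one internal application record to Turtle."""
        g = Graph()
        g.bind("rdl", RDL)
        item = PLANT[record["tag"]]
        g.add((item, RDF.type, RDL.CentrifugalPump))
        g.add((item, RDL.hasDesignPressure,
               Literal(record["design_pressure_bar"])))
        return g.serialize(format="turtle")

    print(internal_to_turtle({"tag": "P-101", "design_pressure_bar": 16.0}))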

RDF + SPARQL give us a handy format for storing our data as triples and for
fetching information back.
It's a good match, although we still lack rich support for inheritance,
validation, and rules.
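
And a sketch of the inbound half, with the same invented vocabulary, mapping
SPARQL bindings back to internal-format records:

    from rdflib import Graph

    def fetch_pumps(g):
        """Rebuild internal-format records from SPARQL bindings."""
        rows = g.query("""
            PREFIX rdl: <http://example.org/rdl/>
            SELECT ?item ?pressure WHERE {
                ?item a rdl:CentrifugalPump ;
                      rdl:hasDesignPressure ?pressure .
            }
        """)
        return [{"tag": str(row.item).rsplit("/", 1)[-1],
                 "design_pressure_bar": float(row.pressure)}
                for row in rows]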

Cheers,
Hans

==========================================

-----Original Message-----
From: Harry Halpin [mailto:hhalpin@ibiblio.org] 
Sent: Monday 11 May 2015 21:48
To: Tim Berners-Lee
Cc: Melvin Carvalho; Bob DuCharme; SW-forum Web
Subject: Re: Test of Independent Invention: RDF

Quick top-posting: I do realize that the last post was overly negative, but
as someone who has also been evangelizing the Semantic Web and feels it's
actually rather close to taking off into mainstream development, I wanted to
take the question of whether or not there has been a million dollar start-up
quite seriously. I do think it's possible, so let's turn my criticism into a
positive programme for action. In brief:

1) The largest deployment barrier is probably IMHO the lack of a fast,
scalable, open-source triple-store that can replace PostgreSQL or the NoSQL
solutions like MongoDB that people know and use. After fixing RDF/XML with
Turtle and JSON, that's probably the largest barrier.

2) If we actually are going to see massive use of 'follow your nose', we'll
need real caching infrastructure (a tiny sketch follows below). Luckily,
Google is on top of schema.org, but this is an area where we need real work.

3) Let's not pretend there have been million dollar Semantic Web start-ups
where there haven't been, as in the case of Garlik. That's not necessarily
bad news. There's still room for a million dollar start-up, and health-care
or decentralized RDF networking are probably good bets IMHO. Also,
infrastructure companies could be useful - if the established players like
Google and Oracle don't move there first.

4) URIs aren't decentralized in terms of ownership - and are rented from
ICANN. So the Semantic Web community needs to start paying attention to the
Internet Governance debates and think of technical ways of decentralizing
URIs and preserving their content (Memento).
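
On point 2, to make the caching idea concrete: even a thin client-side cache
that honours Cache-Control max-age would blunt the accidental-DOS side of
'follow your nose'. A minimal Python sketch, illustrative only (a real
deployment would need shared caches, ETags, and so on):

    import time
    import urllib.request

    _cache = {}  # url -> (expiry_timestamp, body)

    def dereference(url, default_ttl=3600):
        """Fetch a vocabulary URI, honouring a crude max-age cache."""
        now = time.time()
        if url in _cache and _cache[url][0] > now:
            return _cache[url][1]
        req = urllib.request.Request(url, headers={"Accept": "text/turtle"})
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            ttl = default_ttl
            for part in resp.headers.get("Cache-Control", "").split(","):
                part = part.strip()
                if part.startswith("max-age="):
                    ttl = int(part.split("=", 1)[1])
        _cache[url] = (now + ttl, body)
        return body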


On Tue, May 5, 2015 at 5:20 AM, Tim Berners-Lee <timbl@w3.org> wrote:
>
> On 2015-05 -03, at 03:38, Harry Halpin <hhalpin@ibiblio.org> wrote:
>
>> On Wed, Apr 29, 2015 at 3:53 AM, Melvin Carvalho 
>> <melvincarvalho@gmail.com> wrote:
>>>
>>>
>>> On 29 April 2015 at 03:11, Harry Halpin <hhalpin@ibiblio.org> wrote:
>>>>
>>>> Not convinced. From my conversations with engineers there like 
>>>> Mischa Tuffield, I believe the answer is "yes" it could have been 
>>>> done without the Semantic Web and
>
> I'd heard meanwhile that Garlik (pre-Experian) did benefit very much from
> being very schema-free and being able to throw more data into the triple
> store at a moment's notice without SQL-like schema design. Maybe we should
> check with Mischa.
>
>>>> *the part of the company Experian
>>>> bought*,
>
> Did they not buy the whole company?  Pointer to that fact?

I got this from discussing with Mischa Tuffield ages ago. It would make
sense for an identity company to buy a honeypot/identity-fraud company.
If Experian has moved any of their infrastructure to RDF, that would also be
great, but I haven't seen evidence of that.

>
>>>> i.e. the honeypot for identity fraud, the main part of the
>>>> business was done without RDF. Thus, Experian is not
>>>> maintaining the RDF infrastructure (at least 4store).
>>>>
>>>> So, I still haven't seen RDF used in any start-ups that have
>>>> succeeded yet. I suspect there are probably some that *will*
>>>> succeed in the healthcare space. However, in general there are
>>>> major flaws in the entire Semantic Web concept ("follow your nose"
>>>> URIs lead to accidental denial of service attacks,
>
> You quote below a problem -- a major bug -- with a Microsoft XML system,
> not an RDF system at all.
> Or are you against using URIs for anything?

I think the Semantic Web community should get involved in fixing the URI
issue by making it more decentralized, which is related to the domain name
system issue. There's no reason we can't move URIs off of such a tight
dependency on DNS, although Namecoin and other alternatives seem too extreme
and not workable due to domain name squatting.

The Web is the collective memory of humanity, and the fact that our data
can disappear and is essentially 'rented' from ICANN is probably not an
infrastructure for the future. In fact, the solution to the DNS issue may
not be purely technical, but may require new kinds of URIs to be purchased
from ICANN, such as a .sem that has a better governance structure and
permanence guarantees. I know the TAG was looking into this, but never saw a
solution.

I'll drag up a proposal from Slim Amamou (a Tunisian) in terms of p2p
caching of content and DNS that could possibly be enabled by WebRTC.

>
>>>> basic CS tells us graphs will
>>>> always be slower than hash tables, etc.)
>
> The first reason why that is nonsense is that graphs are hash tables
> inside.

Although there is still a cost to traversing the graph. That could probably
be optimized, but I don't see how it's going to be as fast as a single
lookup. That being said, graphs do map many kinds of data better than
associative arrays. Nonetheless, I still think we need an open-source
triplestore that can compete with relational DBs and JSON-optimized NoSQL
stores.
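
To make the 'hash tables inside' point concrete, here is a toy sketch (mine,
not how any particular store works) of why a single triple pattern is O(1)
hash lookups, while the cost shows up in the joins across patterns:

    from collections import defaultdict

    class ToyTripleStore:
        """Toy SPO/POS/OSP-indexed store: each one-variable pattern is
        answered by hash lookups; multi-pattern SPARQL queries pay for
        the joins, which is where real stores get slow."""

        def __init__(self):
            self.spo = defaultdict(lambda: defaultdict(set))
            self.pos = defaultdict(lambda: defaultdict(set))
            self.osp = defaultdict(lambda: defaultdict(set))

        def add(self, s, p, o):
            self.spo[s][p].add(o)
            self.pos[p][o].add(s)
            self.osp[o][s].add(p)

        def objects(self, s, p):
            return self.spo[s][p]   # O(1), like any hash table

        def subjects(self, p, o):
            return self.pos[p][o]   # O(1) as well

    store = ToyTripleStore()
    store.add("garlik", "usedTech", "4store")
    print(store.subjects("usedTech", "4store"))  # {'garlik'}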

>
>>>> that will likely prevent it
>>>> from ever occupying the place XML or JSON has IMHO. That being
>>>> said, it will likely continue to be useful in niche markets
>>>> involving data merger with dynamic schemas
>>>
>>>
>>> Couldn't every statement you made above about the web of data be
>>> applied to the web of documents, and be contrary to experience?
>>
>> Melvin - which is why Google exists.
>
> Google.  Yes, that's the company which brought its search engine into the
> modern age using a huge internal Knowledge Graph, which now drives much of
> its operations.  Yes, it does not share it in general -- but then, would
> you?

Yes, although that's close to DBPedia. If the Semantic Web is a common good,
maybe governments and others should be paying for an RWW version of DBPedia
that can handle massive querying and 'follow your nose' look-ups.

>
> That's the company which, by reading semantic web data in microformats and
> RDFa in its crawl, has prompted approaching 30% of web pages on the entire
> WWW to contain RDF-equivalent data?
>
>> The reason why Semantic Web stuff
>> doesn't scale in most real-world apps would be that you would 
>> basically need a Google-style infrastructure.
>
> You glibly roll that off without any consideration of what sort of an app,
> what sort of a problem, and what sort of a scale, none of which are simple
> questions with one-line answers.

I was thinking of large-scale apps that a million-dollar start-up would want
to make, which would need to scale to millions of users.
Think Reddit or Twitter but for the Semantic Web.

>
>> Yet search over Linked
>> Data seems to have stopped working (Sindice) and I haven't heard of 
>> real-world caching. But for a non-SemWeb example of "follow-your-nose"
>> failing hard, when W3C made XML processors think the XML DTD had to 
>> be retrieved from w3.org,
>
> - W3C (XML Schema) did NOT specify for XML processors that the DTD
> should be retrieved.
> - That event was only one buggy implementation which did.
> - You are talking about a non-Semantic-Web implementation anyway.
>
>> the server basically couldn't handle this well-meaning DOS attack :)
>
>
>> However, I do think the DOS attack
>> problem/caching/searching are very solvable.
>>
>
> So you are a fan of follow-your-nose or not?

Yes, assuming we can solve the caching/denial of service issue in a way
ordinary web developers can use. I believe this is solvable.

>
>> Another  reason why Semantic Web stuff doesn't actually scale is 
>> basic computer science and so isn't likely solvable
>
> I'd like to see an actual elaborated argument there rather than political
> rhetoric.

My argument is that graphs are naturally slower than hash tables, and that
this is currently the largest roadblock to Semantic Web adoption. I know this
as someone who took a huge performance hit when deploying Semantic Web
infrastructure on open-source backends :( Thus, we probably need to get more
database academics involved.

>
>> -  and the reason we are
>> seeing JSON take off (rather than RDF) as the lingua franca of the
>> Web: key-value pairs map well to hash tables and what programming
>> languages actually do. I would be shocked if graph DBs (see
>> travelling salesman problem) ever got nearly as fast as hash tables
>> (O(1) vs NP-complete), thus in general I think as a core
>> technology the main problem with moving to RDF is a huge performance
>> loss.
>
> You are confusing two things.  It is true that JSON is appealing because
> it is a native data structure to the current functional language of the
> day. That is natural.  It also is superior to XML for hierarchical data in
> that it has numbers as well as strings.  It is inferior to (say) Turtle in
> that it has the JS problem of only having one number type, whereas in
> Turtle 2, 2.0 and 2.0e0 are distinct typed numbers which are generally
> expected in the data world, python, etc etc.

Agreed, Turtle is better in many respects. Actually, I think that had we
pushed Turtle as the default syntax into the original RDF stack pre-JSON, we
would probably have lots more developers, and maybe JSON would never have
happened.
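
To illustrate the number-type point with a small example (the vocabulary is
made up), Turtle's 2, 2.0 and 2.0e0 parse to three distinct datatyped
literals, a distinction plain JSON cannot express:

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix : <http://example.org/> .
        :a :value 2 .        # xsd:integer
        :b :value 2.0 .      # xsd:decimal
        :c :value 2.0e0 .    # xsd:double
    """, format="turtle")

    for _, _, o in g:
        print(o, o.datatype)  # three different XSD datatypes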

>
> Then you are claiming RDF is inferior because graph problems are in
> general harder to solve than tree problems.  This is extremely
> disingenuous. The traveling salesman problem on an arbitrary graph is hard
> to solve no matter what data format and model you use.  It is just going
> to be easier to code using a language which handles graphs.  A tree-like
> query, on the other hand, will be fast whether you write the tree in JSON
> or Turtle.
>
> A triple store is just hashes inside.
> Yes, you pay a bit of penalty for having the extra possibilities but only
> a bit.
>

Then the question - which I'd love to see some illumination on - is why are
triplestores so slow? How can we make them faster?

>> Also, it would be useful if Semantic Web people really thought 
>> through decentralization. URIs are not decentralized, they are rented 
>> from ICANN, which runs a number of quite centralized name-servers. 
>> Yes, once you buy one you can mint infinite URIs, but that's pretty 
>> far from decentralized - and TimBL has said as much: "We could 
>> decentralize everything but this"
>
> Please don't  quote me out of context.
>
> Note I am really involved in a lot of work on re-decentralizing the web
> which does use URIs with domain names in. How so?  Because the value of
> that is massively more than the damage currently inflicted by the DNS.
> You mint a new URI with every loop in a program. You only actually need to
> create a new domain name rarely, such as once in a project.  Yes, the
> philosophical basis of the naming, or the commercial arrangements, may not
> be perfect, but that argument is a million miles beneath that of the
> benefits of RDF.

I actually think it's a philosophically sound use of URIs, and it's one of
the strokes of genius behind RDF. However, in order for RDF to become a
long-term infrastructure for the Web, we need to pay serious attention to
DNS and decentralization, and not just assume it will always work. The
fragmentation of the Internet, caused by a lack of trust in ICANN, could for
example mean that a URI would retrieve different data in, say, China and
Germany. That would be a net loss for the Web.

>
> You go away and replace DNS with something sounder politically and
> commercially and RDF will use it straight away of course.  But keep your
> campaigns against RDF and DNS separate.

To me, figuring out how to fix DNS and make it more decentralized is one of
the most pressing research issues. I think Vint Cerf is skeptical of any
solutions, but agrees that something should be done.
Not sure about the rest of the internet governance circles, but maybe the
Semantic Web community could press the point?  Would the 'magna carta' help
somehow?

>
>
>> That being said, I agree with Juan - in specialized cases involving 
>> data merger and a natural graph structure, Linked Data makes tons of 
>> sense. I think the domain of health care is likely to work out in 
>> real companies, and likely social network analysis for the 
>> military-industrial-surveillance complex. Can't think of too many 
>> other domains where it makes tons of sense off the top of my head, 
>> but would be happy to hear more and hope to see many SemWeb related 
>> start-ups make the next million bucks.
>>
>
> Well, if they see the encouragement you have given them in this thread,
> they will probably roll over and die, but I note your faint praise for
> them.

I am just pointing out that the problem is hard, but again, these problems
*are* solvable. The RDF community successfully standardized Turtle, solving
the great RDF/XML catastrophe. We've done a good job with interop via
JSON-LD. Yet until we can offer performance similar to non-RDF web apps and
make sure our URIs keep resolving properly, I think we won't get mainstream
developer acceptance. The reverse is also true: we *can* get mainstream
developer acceptance if these problems are solved.

However, the long-term prospects for RDF are stronger than ever, and we
should see a million dollar start-up sooner rather than later.

>
>>
>>   cheers,
>>      harry
>>>
>>>>
>>>>
>>>> And as a source of academic papers :)
>>>>
>>>>
>>>> On Tue, Apr 28, 2015 at 8:58 PM, Bob DuCharme <bob@snee.com> wrote:
>>>>> I never said that they were purchased "due to RDF." Sampo asked 
>>>>> about "a company or consortium out there which has made 1-10 
>>>>> million bucks applying technology, which couldn't have been 
>>>>> without the Semantic Web." Garlik applied this technology and made 
>>>>> a million bucks, so they were an obvious answer to Sampo's 
>>>>> question.
>>>>>
>>>>> Could they have done it without RDF technology? See what their CTO 
>>>>> Steve Harris said at
>>>>>
>>>>>
>>>>> http://stackoverflow.com/questions/9159168/triple-stores-vs-relational-databases.
>>>>>
>>>>> Bob
>>>>>
>>>>>
>>>>>
>>>>> On 4/28/2015 5:51 PM, Harry Halpin wrote:
>>>>>
>>>>> On Apr 28, 2015 9:59 AM, "Bob DuCharme" <bob@snee.com> wrote:
>>>>>>
>>>>>> On 4/27/2015 5:08 PM, Sampo Syreeni wrote:
>>>>>>>
>>>>>>> All of this Semantic Web stuff has existed for a while now. One 
>>>>>>> would expect that there is a company or consortium out there 
>>>>>>> which has made 1-10 million bucks applying technology, which 
>>>>>>> couldn't have been without the Semantic Web.
>>>>>>
>>>>>>
>>>>>> If you're looking for a dramatic success story in which one 
>>>>>> company is 100% about semantic web technology and then makes a 
>>>>>> million dollars, here's
>>>>>> one: http://www.dataversity.net/experian-acquires-garlik-ltd/
>>>>>>
>>>>>
>>>>> Bob, they were not purchased due to RDF. Their triplestore and use
>>>>> of RDF was at best support for their main project. They were
>>>>> purchased because they would use honeypots to identify identity
>>>>> fraud. It's possible they used RDF to help combat identity fraud,
>>>>> but they were not purchased because of RDF.
>>>>> That's like saying a social networking company was purchased 
>>>>> because they were using this thing called a SQL database :)
>>>>>
>>>>> That being said, there's more investment in RDF than there used to be.
>>>>> Has the technology hit a home run like XML and taken over the
>>>>> industry?
>>>>>
>>>>> The honest answer is "no, not yet." And XML is rapidly being 
>>>>> eroded by JSON and Javascript. Who knows what will be next?
>>>>>
>>>>>   cheers,
>>>>>         harry
>>>>>
>>>>>
>>>>>
>>>>>> Companies such as TopQuadrant, Franz, and Cambridge Semantics are 
>>>>>> doing just fine, and more importantly, their customers are doing 
>>>>>> quite well using this technology. I think the more interesting 
>>>>>> thing to look at is the number of well-known companies that while 
>>>>>> not devoting themselves 100% to this technology, are still 
>>>>>> getting more and more work done with it:
>>>>>> http://www.snee.com/bobdc.blog/2014/05/experience-in-sparql-a-plus.html
>>>>>>
>>>>>> It's been interesting to see different divisions of Bloomberg 
>>>>>> joining these ranks lately.
>>>>>>
>>>>>> Bob DuCharme
>>>>>> @bobdc
>>>>>> snee.com/bobdc.blog
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>



