RE: Test of Independent Invention: RDF from Mark Wallace on 2015-05-07 (semantic-web@w3.org from May 2015)

From: Mark Wallace <mwallace@modusoperandi.com>
Date: Thu, 7 May 2015 13:22:30 +0000
To: adasal <adam.saltiel@gmail.com>, Florence Amardeilh <florence.amardeilh@mondeca.com>
CC: Tim Berners-Lee <timbl@w3.org>, Harry Halpin <hhalpin@ibiblio.org>, Melvin Carvalho <melvincarvalho@gmail.com>, Bob DuCharme <bob@snee.com>, SW-forum Web <semantic-web@w3.org>
Message-ID: <DM2PR08MB398F1CD61ECE14AB729E9BAC3DF0@DM2PR08MB398.namprd08.prod.outlook.com>
The TOII I’m looking at is here: http://www.w3.org/DesignIssues/Evolution.html#TOII

 

I can think of parts of RDF that fare well against this test.  E.g.:

 

1)      RDF is independent of any specific serialization, but because it has an XML serialization, and SPARQL results do too, it can easily be transformed into other formats using things like XSLT.  I have used this in multiple solutions.  [1]  This allows it to be used in systems and in ways not necessarily known at the time RDF was initially invented.

2)      Being a graph model, RDF can be used, or transformed and then used (by e.g. Neo4J) to do graph analytics.  

3)      Given that it uses URIs for identifiers, it can talk about (refer to, "decorate") any web resources that can be described by the URI scheme (the scheme called out in the TOII article linked above as a good example of passing the TOII test).
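
Point 1 can be illustrated with a minimal sketch. It uses Python's standard library in place of the XSLT mentioned above (a swapped-in technique, chosen only to keep the example self-contained) to turn a document in the SPARQL Query Results XML Format into CSV. The sample data, the variable names and the function names `sparql_xml_to_rows` and `rows_to_csv` are assumptions made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical result document; variable names and values are made up.
SAMPLE_RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="agent"/>
    <variable name="port"/>
  </head>
  <results>
    <result>
      <binding name="agent"><uri>http://example.org/agent/1</uri></binding>
      <binding name="port"><literal>8080</literal></binding>
    </result>
    <result>
      <binding name="agent"><uri>http://example.org/agent/2</uri></binding>
    </result>
  </results>
</sparql>
"""


def sparql_xml_to_rows(xml_text):
    """Return (variable_names, rows); unbound variables become empty strings."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", SPARQL_NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = dict.fromkeys(variables, "")
        for binding in result.findall("sr:binding", SPARQL_NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text or ""
        rows.append([row[name] for name in variables])
    return variables, rows


def rows_to_csv(variables, rows):
    """Serialize the table as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(variables)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    names, table = sparql_xml_to_rows(SAMPLE_RESULTS)
    print(rows_to_csv(names, table), end="")
```

The same table could just as well be rendered as JSON or as a configuration file, which is the sense in which the XML serialization lets RDF data reach systems that know nothing about RDF.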

 

These are just a few quick thoughts.

-Mark

 

[1] http://www.dataversity.net/leveraging-owl-dl-sparql-and-xslt-to-automate-java-agent-configuration/ 

 

--

Mark Wallace

Principal Engineer, Semantic Applications

MODUS OPERANDI, INC.

 

www.modusoperandi.com

 

From: adasal [mailto:adam.saltiel@gmail.com] 
Sent: Thursday, May 07, 2015 8:26 AM
To: Florence Amardeilh
Cc: Tim Berners-Lee; Harry Halpin; Melvin Carvalho; Bob DuCharme; SW-forum Web
Subject: Re: Test of Independent Invention: RDF

 

I've lost track of the original question.

Sarven Capadisli asked:-

Looking at the past, present and future, what is the state of RDF in the Test of Independent Invention?

 

-Sarven

http://csarven.ca/#i

--------------------------

I don't know what TOII actually means; is there a rigorous definition?

However, on this list we are notified of

 

SemStats 2015 Call for Papers

=============================

 

Third International Workshop on Semantic Statistics (SemStats 2015)

....

which Sarven is chairing.

 

Looking through it, I would have thought the field pointed at in the conference supplies answers to the question asked, if what is really meant is "What does the SemWeb field look like to the list over the next few years?"

That seems to be the question others on the list are responding to, anyway.

The idea of combining SemWeb technologies and principles with the work of statisticians must be meeting a perceived need.

I find myself interested in what those needs might be.

At some point, conceptually, the topic must deal with what I understand to be two fundamentally different ways of forming related concepts: ontology modelling starts with some categories formed ab initio, as axiomatic and sui generis, whereas statistically formed categories depend on the implied assumption that the background can be computed in such a way as to reveal a foreground.

Be that as it may, the theme of the upcoming conference is possibly only one example of how two different professional fields may be combined, but it does seem to be one for which a compelling case can be made.

If we are talking seriously about compelling cases then, of course, nothing in response to the question on this list so far succeeds either way.

A moment's reflection on the coming conference suggests that the field we are interested in is complex and probably doesn't lend itself to breakout successes.

We enter into the business and economic illusion I think respondents are subject to here.

Most business is not about breakout success, so it is questionable what that is a measure of.

As with most businesses, this business (SemWeb in the widest definition) exists in a complex interdependent environment, each separate part of which needs to have evolved in a certain way in order for this business to make sense. We may think of the interdependent environment as, in part, the raw materials of the field, in this case data.

It is common in these most complex arenas for there to be multiple solutions addressing subtly different aspects of the business domain, so I would expect multiple almost parallel solutions of a SemWeb type within the field of health care, as an example.

 

However, I do think that respondents here may have other concerns, where the big players are mentioned. As far as I understand it, the more money a company has and generates, the more they 'own' the internet, and that of course means ownership of the direction of protocols, their adoption, and just basic convention and common practice.

The very idea that the form and function of data (how it may be transmitted, by whom, and how it is otherwise circumscribed) is being competed for is fascinating.

 

It never seems a good idea to me not to be able to recognise a threat as such when surveying a domain of operation. The answer as to whether Google et al. are more or less useful will depend on the business proposition. Each will entail its own battles. Whether there is a broader existential threat, to the SemWeb (as I call it) or fundamental principles of the internet, or perhaps to ourselves in that the internet may not embody the best protocols or practices for ourselves, is all open to question.

But I don't think that was the question here?

 

Adam

 

On 6 May 2015 at 09:42, Florence Amardeilh <florence.amardeilh@mondeca.com> wrote:

Hi guys,

We won't roll over and die at Mondeca since we are not a start-up anymore,
with our 15 years of experience in SemWeb. Even if we are not earning
millions and billions (but at least 1 to 1.5 million each year, enough to
hire 20 people who make their living from semweb), we promote semweb
technologies and philosophy in our everyday activities. Each year new major
companies put their trust in our semweb tools, not only because we have good
salesmen but mainly because of the flexibility, interoperability and new
kinds of capability (such as reasoning) that these tools bring to their
applications.

Is it all about healthcare systems? Well, it is an important business
market for us, that's true, but not the only one. We also target industry,
media, e-tourism, government, etc.

Could these projects have been done without semweb technology? Probably;
nothing is impossible. But I believe that if our clients have chosen us to
do the job, it is because we sell semweb methodologies and tools and find
innovative solutions with them.

Is it so easy? No, of course not, and we must fight every day to exist, but
over the past 15 years we have witnessed a change in our clients, who are
more open-minded and aware of the semweb stack and the benefits they can get
from it. And we hope this will continue to grow in the forthcoming years.

So I think we can definitely reply "yes" to the primary question. And even
if we know that we will never become as big as Google, there is room for
each of us. We are pleased to see how Google has become interested and
involved through the Knowledge Graph and the use of microformats in its
indexing processes. But I don't think we serve the same goal anyway.

Just my 2cents on this discussion.

Florence.


Dr. Florence Amardeilh
Research & Development Director
--------------------------------------------------------
Mondeca                             
35 bd de Strasbourg, 75010 Paris, France
www.mondeca.com
Follow us on Twitter : @mondecanews
--------------------------------------------------------



-----Original Message-----
From: Tim Berners-Lee [mailto:timbl@w3.org]
Sent: Tuesday, 5 May 2015 11:20
To: Harry Halpin
Cc: Melvin Carvalho; Bob DuCharme; SW-forum Web
Subject: Re: Test of Independent Invention: RDF



On 2015-05-03, at 03:38, Harry Halpin <hhalpin@ibiblio.org> wrote:

> On Wed, Apr 29, 2015 at 3:53 AM, Melvin Carvalho
> <melvincarvalho@gmail.com> wrote:
>>
>>
>> On 29 April 2015 at 03:11, Harry Halpin <hhalpin@ibiblio.org> wrote:
>>>
>>> Not convinced. From my conversations with engineers there like
>>> Mischa Tuffield, I believe the answer is "yes" it could have been
>>> done without the Semantic Web and

I'd heard meanwhile that Garlik (pre-Experian) did benefit very much from
being schema-free and being able to throw more data into the triple store at
a moment's notice without SQL-like schema design. Maybe we should check with
Mischa.

>>> *the part of the company Experian
>>> bought*,

Did they not buy the whole company?  Pointer to that fact?

>>> i.e. the honeypot for identity fraud,  the main part of the business
>>> was done without RDF. Thus, Experian is not maintaining the RDF
>>> infrastructure (at least 4store).
>>>
>>> So, I still haven't seen RDF used in any start-ups that have
>>> succeeded yet. I suspect there are probably some that *will*
>>> succeed in the healthcare space. However, in general there are major
>>> flaws in the entire Semantic Web concept ("follow your nose" URIs
>>> lead to accidental denial of service attacks,

You quote below a problem -- a major bug -- with a Microsoft XML system,
not an RDF system at all.
Or are you against using URIs for anything?



>>> basic CS tells us graphs will
>>> always be slower than hash tables, etc.)

The first reason why that is nonsense is that graphs are hash tables inside.

>>> that will likely prevent it
>>> from ever occupying the place XML or JSON has IMHO. That being said,
>>> it will likely continue to be useful in niche markets involving
>>> data merger with dynamic schemas
>>
>>
>> Couldn't every statement you made above about the web of data be
>> applied to the web of documents, and be contrary to experience?
>
> Melvin - which is why Google exists.

Google.  Yes, that's the company which brought its search engine into the
modern age using a huge internal Knowledge Graph, which now drives much of
its operations.  Yes, it does not share it in general -- but then, would
you?

That's the company which by reading semantic web data in microformats and
RDFa in its crawl has prompted approaching 30% of web pages on the entire
WWW to contain RDF-equivalent data?

> The reason why Semantic Web stuff
> doesn't scale in most real-world apps would be that you would
> basically need a Google-style infrastructure.

You glibly roll that off without any consideration of what sort of an app,
what sort of a problem, and what sort of a scale, none of which are simple
questions with one-line answers.

> Yet search over Linked
> Data seems to have stopped working (Sindice) and I haven't heard of
> real-world caching. But for a non-SemWeb example of "follow-your-nose"
> failing hard, when W3C made XML processors think the XML DTD had to be
> retrieved from w3.org,

- W3C (XML Schema) did NOT specify for XML processors that the DTD should
be retrieved.
- That event was only one buggy implementation which did.
- You are talking about a non-semantic-web implementation anyway.

> the server basically couldn't handle this well-meaning DOS attack :)


> However, I do think the DOS attack
> problem/caching/searching are very solvable.
>

So you are a fan of follow-your-nose or not?

> Another  reason why Semantic Web stuff doesn't actually scale is basic
> computer science and so isn't likely solvable

I'd like to see an actual elaborated argument there rather than political
rhetoric.

> -  and the reason we are
> seeing JSON take off (rather than RDF) as the lingua franca of the
> Web: array-values pairs map well to hash tables and what programming
> languages actually do. I would be shocked if graph DBs (see travelling
> salesman problem) ever got nearly as fast as hash tables (O(1) vs NP
> complete), so thus in general I think as a core technology the main
> problem with moving to RDF is a huge performance loss.

You are confusing two things.  It is true that JSON is appealing because it
is a native data structure to the current functional language of the day.
That is natural.  It also is superior to XML for hierarchical data in that
it has numbers as well as strings.  It is inferior to (say) Turtle in that
it has the JS problem of only having one number type, whereas in Turtle 2,
2.0 and 2.0e0 are distinct typed numbers which are generally expected in the
data world, Python, etc.
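
The distinction between the three numeric forms can be sketched as follows. This is a minimal illustration, assuming the INTEGER, DECIMAL and DOUBLE token productions of the Turtle grammar; the function name `turtle_numeric_datatype` is hypothetical, and the sketch classifies bare numeric tokens only, it is not a Turtle parser.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Patterns modelled on the INTEGER, DECIMAL and DOUBLE productions of Turtle.
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
DOUBLE = re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+")


def turtle_numeric_datatype(token):
    """Return the XSD datatype IRI for a Turtle numeric token, or None."""
    if INTEGER.fullmatch(token):
        return XSD + "integer"
    if DECIMAL.fullmatch(token):
        return XSD + "decimal"
    if DOUBLE.fullmatch(token):
        return XSD + "double"
    return None


if __name__ == "__main__":
    for token in ("2", "2.0", "2.0e0"):
        print(token, turtle_numeric_datatype(token))
```

In JavaScript, by contrast, all three tokens evaluate to the same value of the single Number type, so the distinction is lost once the data is read.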

Then you are claiming RDF is inferior because graph problems are in general
harder to solve than tree problems.  This is extremely disingenuous. The
traveling salesman problem on an arbitrary graph is hard to solve no matter
what data format and model you use.  It is just going to be easier to code
using a language which handles graphs.   A tree-like query, on the other
hand, will be fast whether you write the tree in JSON or Turtle.

A triple store is just hashes inside.
Yes, you pay a bit of a penalty for having the extra possibilities but only
a bit.
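
A minimal sketch of what "just hashes inside" could look like, assuming a toy in-memory store built on Python dicts. The class name `TinyTripleStore`, its methods and the `ex:` identifiers are hypothetical, and this is not a description of how any particular triple store is implemented.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory store: every lookup goes through dict (hash table) indexes."""

    def __init__(self):
        # subject -> predicate -> set of objects
        self._spo = defaultdict(lambda: defaultdict(set))
        # predicate -> object -> set of subjects
        self._pos = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        """Store one triple in both indexes."""
        self._spo[s][p].add(o)
        self._pos[p][o].add(s)

    def objects(self, s, p):
        """All o such that (s, p, o) is stored: two hash lookups."""
        return set(self._spo.get(s, {}).get(p, ()))

    def subjects(self, p, o):
        """All s such that (s, p, o) is stored: two hash lookups."""
        return set(self._pos.get(p, {}).get(o, ()))

    def path(self, start, predicates):
        """Follow a fixed chain of predicates from one node (a tree-like query)."""
        frontier = {start}
        for p in predicates:
            frontier = {o for s in frontier for o in self.objects(s, p)}
        return frontier


if __name__ == "__main__":
    store = TinyTripleStore()
    store.add("ex:alice", "ex:knows", "ex:bob")
    store.add("ex:bob", "ex:worksFor", "ex:acme")
    print(store.path("ex:alice", ["ex:knows", "ex:worksFor"]))
```

In this sketch the extra cost relative to a single hash table is the second index and the nested lookups; each step of a fixed-shape query remains a constant number of hash operations on average.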

> Also, it would be useful if Semantic Web people really thought through
> decentralization. URIs are not decentralized, they are rented from
> ICANN, which runs a number of quite centralized name-servers. Yes,
> once you buy one you can mint infinite URIs, but that's pretty far
> from decentralized - and TimBL has said as much: "We could
> decentralize everything but this"

Please don't  quote me out of context.

Note I am really involved in a lot of work on re-decentralizing the web
which does use URIs with domain names in. How so?    Because the value of
that is massively more than the damage currently inflicted by the DNS.  You
mint a new URI with every loop in a program. You only actually need to
create a new domain name rarely, such as once in a project.  Yes, the
philosophical basis of the naming, or the commercial arrangements may not be
perfect but that argument is a million miles beneath that of the benefits of
RDF.

You go away and replace DNS with something sounder politically and
commercially and RDF will use it straight away of course.   But keep your
campaigns against RDF and DNS separate.


> That being said, I agree with Juan - in specialized cases involving
> data merger and a natural graph structure, Linked Data makes tons of
> sense. I think the domain of health care is likely to work out in real
> companies, and likely social network analysis for the
> military-industrial-surveillance complex. Can't think of too many
> other domains where it makes tons of sense off the top of my head, but
> would be happy to hear more and hope to see many SemWeb related
> start-ups make the next million bucks.
>

Well, if they see the encouragement you have given them in this thread, they
will probably roll over and die, but I note your faint praise for them.

>
>   cheers,
>      harry
>>
>>>
>>>
>>> And as a source of academic papers :)
>>>
>>>
>>> On Tue, Apr 28, 2015 at 8:58 PM, Bob DuCharme <bob@snee.com> wrote:
>>>> I never said that they were purchased "due to RDF." Sampo asked
>>>> about "a company or consortium out there which has made 1-10
>>>> million bucks applying technology, which couldn't have been without
>>>> the Semantic Web." Garlik applied this technology and made a
>>>> million bucks, so they were an obvious answer to Sampo's question.
>>>>
>>>> Could they have done it without RDF technology? See what their CTO
>>>> Steve Harris said at
>>>>
>>>>
>>>> http://stackoverflow.com/questions/9159168/triple-stores-vs-relational-databases.
>>>>
>>>> Bob
>>>>
>>>>
>>>>
>>>> On 4/28/2015 5:51 PM, Harry Halpin wrote:
>>>>
>>>> On Apr 28, 2015 9:59 AM, "Bob DuCharme" <bob@snee.com> wrote:
>>>>>
>>>>> On 4/27/2015 5:08 PM, Sampo Syreeni wrote:
>>>>>>
>>>>>> All of this Semantic Web stuff has existed for a while now. One
>>>>>> would expect that there is a company or consortium out there
>>>>>> which has made 1-10 million bucks applying technology, which
>>>>>> couldn't have been
>>>>>> without the Semantic Web.
>>>>>
>>>>>
>>>>> If you're looking for a dramatic success story in which one
>>>>> company is 100% about semantic web technology and then makes a
>>>>> million dollars, here's
>>>>> one: http://www.dataversity.net/experian-acquires-garlik-ltd/
>>>>>
>>>>
>>>> Bob, they were not purchased due to RDF. Their triplestore and use
>>>> of RDF was at best support for their main project. They were
>>>> purchased because they would use honeypots to identify identity
>>>> fraud. It's possible they used RDF to help combat identity fraud,
>>>> but they were not purchased because of RDF.
>>>> That's like saying a social networking company was purchased
>>>> because they were using this thing called a SQL database :)
>>>>
>>>> That being said, there's more investment in RDF than there used to be.
>>>> Has the technology hit a home-run like XML and taken over the industry?
>>>>
>>>> The honest answer is "no, not yet." And XML is rapidly being eroded
>>>> by JSON and Javascript. Who knows what will be next?
>>>>
>>>>   cheers,
>>>>         harry
>>>>
>>>>
>>>>
>>>>> Companies such as TopQuadrant, Franz, and Cambridge Semantics are
>>>>> doing just fine, and more importantly, their customers are doing
>>>>> quite well using this technology. I think the more interesting
>>>>> thing to look at is the number of well-known companies that while
>>>>> not devoting themselves 100% to this technology, are still getting
>>>>> more and more work done with it:
>>>>> http://www.snee.com/bobdc.blog/2014/05/experience-in-sparql-a-plus.html
>>>>>
>>>>> It's been interesting to see different divisions of Bloomberg
>>>>> joining these ranks lately.
>>>>>
>>>>> Bob DuCharme
>>>>> @bobdc
>>>>> snee.com/bobdc.blog
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
Received on Thursday, 7 May 2015 13:23:04 UTC