Re: Converting RDF to JSON-LD : shared lists between graphs from David Booth on 2014-07-25 (public-rdf-comments@w3.org from July 2014)

From: David Booth <david@dbooth.org>
Date: Fri, 25 Jul 2014 01:57:54 -0400
To: Pat Hayes <phayes@ihmc.us>
CC: Kingsley Idehen <kidehen@openlinksw.com>, public-rdf-comments@w3.org
Message-ID: <53D1F1E2.9000903@dbooth.org>

Hi Pat,

On 07/24/2014 11:48 PM, Pat Hayes wrote:
>
> On Jul 23, 2014, at 2:21 PM, David Booth <david@dbooth.org> wrote:
>
>> Hi Kingsley,
>>
>> On 07/23/2014 10:13 AM, Kingsley Idehen wrote:
>>> On 7/23/14 6:46 AM, Dan Brickley wrote:
>>>>> How so?  It seems to me that there is an inherent tension
>>>>> between being nice to RDF consumers (by using URIs for things
>>>>> that others might want to refer to, as AWWW recommends) and
>>>>> author convenience, which leads to bnode use.
>>>> Yes, that's a real tension, although bnodes are just one
>>>> aspect. My point was to question the "clearly" in  "the use of
>>>> blank nodes clearly violates the web architectural good
>>>> practice that anything of importance should be given a URI".
>>>> Using bnodes is consistent with the things the bnodes represent
>>>> having URIs, so nothing is violated. The reason btw we renamed
>>>> them "bnodes" instead of the earlier (1997-2000, e.g.
>>>> http://www.w3.org/2000/03/rdf-tracking/#rdfms-identity-anon-resources)
>>>> phrase "anonymous nodes" was this point: the things are not
>>>> anonymous / nameless. Only particular descriptions of them.
>>>>
>>>
>>> +1
>>>
>>> Using pronouns (from natural language) to explain the nature of
>>> blank nodes helps a lot.
>>
>> Maybe, but pronouns are used *very* differently than blank nodes,
>> so it really isn't an accurate comparison.  Normally when a pronoun
>> is used, the corresponding noun is *also* used, so the reader can
>> easily determine the intended noun.  ("When *Jack* got to the bank,
>> *he* stopped.")  But that is not usually the case with blank nodes.
>> Usually if a blank node is used in an RDF document, no equivalent
>> URI is given for that node.  But still, I can see how the analogy
>> could help sometimes.
>
> The correct analogy is with indefinite pronouns like "someone" or
> "something". But many uses of blank nodes are in fact more like
> indefinite noun phrases, eg a bnode with an rdf:type link to
> http://dbpedia.org/ontology/tree is almost an exact rendering of the
> English phrase "a tree".
>
> BTW, recent work crawling the actual existing semantic web shows that
> about 40% of deployed RDF uses blank nodes. So apparently this
> "confusing" aspect of RDF is not confusing to a fair number of users.
> IMO all this noise about 'good' vs. 'bad' RDF is just that, noise.
> Actual users of RDF, as opposed to writers of technical blogs, seem
> to be able to handle RDF quite well.

I don't think that's a very helpful claim.  First, the fact that RDF 
users are handling blank nodes does *not* mean that blank nodes do not 
add complexity to their work.  For example, I'm an actual user of RDF, 
and I have many times been frustrated at the extra effort that was 
required in dealing with RDF because of blank nodes.  Of course, blank 
nodes can also make the job *easier* in some cases -- they have pros and 
cons -- hence the desire to have the best of both worlds, which was the 
motivation behind the idea of Well Behaved RDF.  Second, that claim 
suffers from self-selection bias: obviously those who "couldn't handle 
RDF quite well" abandoned it in favor of JSON or something else they 
found easier, so they are no longer "actual users of RDF".

I would like to increase the pool of RDF users -- not limit it to some 
elite who are "able to handle" it.  I've been coming to the conclusion 
that RDF is harder than it needs to be to support the Semantic Web.  But 
there's a big vested interest among those in the existing RDF community 
-- particularly among RDF tool developers -- that makes it hard to look 
past the current specs and semantics, to what we *could* have.  Like the 
Innovator's Dilemma,
http://en.wikipedia.org/wiki/The_Innovator%27s_Dilemma
I'm worried that RDF may be eclipsed by something else that is simpler, 
but still does the job well enough to support the Semantic Web. 
(Something JSON-based? or JSON-LD?)

>
>>>
>>> Over the years there's been a tendency to tag vital aspects of
>>> RDF as bad, for a variety of reasons that always boil down to
>>> assuming that users (end-users and developers) can't figure this
>>> stuff out.
>>
>> I think we have quite a lot of experience indicating that that is
>> the case.  Even though the basic idea of triples representing
>> simple assertions is easy -- and David Wood has a really nice Dr
>> Seuss-inspired introduction
>> http://www.slideshare.net/3roundstones/rdf-explained-by-suess-and-me
>> -- RDF also has subtleties that cause complexity and grief in
>> practice and IMO inhibit adoption.  Blank nodes are prime culprits
>> here.
>
> Do you have any actual evidence for this claim, David? I have never
> seen any.

Uh, yes.  Try explaining to a newbie why cmp cannot be used to check 
whether two graphs are the "same", even when those graphs are serialized 
as ntriples and sorted.  Try explaining to a newbie why a blank node 
returned from a SPARQL query cannot be referenced in a subsequent query, 
because a blank node label is a name that isn't really a name, it's just 
a temporary label for a "node", which isn't really the same thing as a 
node that has a URI, because a node that has a URI maps to a 
*particular* thing (in every interpretation), whereas a blank node 
merely maps to the *existence* of a thing.  See also the discussion of 
poll results in "Everything You Always Wanted to Know About Blank Nodes":
http://www.websemanticsjournal.org/index.php/ps/article/download/365/387
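
To make the cmp point concrete, here is a minimal Python sketch (an illustration only, not taken from any RDF tool). The two N-Triples documents, their blank node labels, and the example.org URIs are made up, and the brute-force isomorphic() helper is a hypothetical stand-in for a real graph isomorphism check. The sorted lines differ, so a byte-wise comparison reports a difference, even though both documents describe the same graph up to blank node renaming.

```python
from itertools import permutations

# Two serializations of the same graph; only the blank node labels differ.
# Labels and example.org URIs are invented for this illustration.
DOC_A = """\
_:b0 <http://example.org/name> "Jack" .
_:b1 <http://example.org/knows> _:b0 .
"""
DOC_B = """\
_:x <http://example.org/knows> _:y .
_:y <http://example.org/name> "Jack" .
"""


def sorted_lines(doc):
    """What 'sort' followed by 'cmp' would effectively compare."""
    return sorted(line.strip() for line in doc.splitlines() if line.strip())


def parse(doc):
    """Naive N-Triples reader: drop the final '.', split into (s, p, o)."""
    triples = set()
    for line in sorted_lines(doc):
        body = line[:line.rindex(".")].strip()
        s, p, o = body.split(" ", 2)
        triples.add((s, p, o))
    return triples


def bnodes(triples):
    """All blank node labels used as subject or object."""
    return sorted({t for s, _p, o in triples for t in (s, o) if t.startswith("_:")})


def isomorphic(g1, g2):
    """Brute force: try every bijection between the blank node labels."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(g1) != len(g2) or len(b1) != len(b2):
        return False
    for perm in permutations(b2):
        m = dict(zip(b1, perm))
        if {(m.get(s, s), p, m.get(o, o)) for s, p, o in g1} == g2:
            return True
    return False


byte_equal = sorted_lines(DOC_A) == sorted_lines(DOC_B)
same_graph = isomorphic(parse(DOC_A), parse(DOC_B))
print(byte_equal, same_graph)  # False True
```

Trying every permutation is only workable for a handful of blank nodes; it is used here because it keeps the idea visible, not because it is how a real comparison should be done.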

> Most of the noise about blank nodes seems to have been
> generated by people who dislike them and write a lot.

Well of course.  Who else would be motivated to complain about them? 
And there's a reason why they dislike them, and it has nothing to do 
with their color or font: blank nodes have significant downsides.

> People who
> understand them and use them easily seem to just get on with the job
> and don't write spleenish complaints about them.

Of course.  Again, that's self-selection bias.

>
>> And my main argument in this paper
>> http://dbooth.org/2013/well-behaved-rdf/Booth-well-behaved-rdf.pdf
>> is that if we constrain how blank nodes are used, by eliminating
>> explicit blank nodes while retaining implicit blank nodes, we can
>> simplify RDF usage while retaining the main benefits of blank nodes
>> -- getting the best of both.
>>
>>> In my experience, a little flexibility on the narrative and
>>> anecdotes front can lead to clarity, appreciation, and
>>> adoption.
>>
>> Yes, that definitely helps too.  And I appreciate all the work you
>> have done over the years to ease that path.  But I still think RDF
>> is harder than it should be, because of these complexities, and we
>> would gain more adoption if we made it simpler.
>
> "We" are not going to change it at all in the foreseeable future,
> after getting RDF 1.1 done. Writing papers recommending that people
> stop using it properly according to its specs, the way that it is
> already being used, does not achieve anything very positive, IMO.

I beg your pardon.  That is a *gross* mischaracterization.  That paper 
on Well Behaved RDF does not in any way advocate that RDF be used 
improperly or in violation of the RDF specs, nor does it advocate that 
RDF be used differently than it is already being used **in the vast 
majority of cases**.  According to "Everything you always wanted to know 
about blank nodes" by Aidan Hogan, Marcelo Arenas, Alejandro Mallea, and 
Axel Polleres,
http://www.websemanticsjournal.org/index.php/ps/article/download/365/387
[[
We conclude that the majority of documents surveyed
contain acyclical blank node structures. Furthermore,
with a low average in-degree of 1.07, we conclude that
blank nodes mostly tend to form directed trees from
subject to object. However, unlike observations for
previous datasets [47], we see a significant number of
blank-node components (37.7%) containing cycles. Of the
1,258,774 with a treewidth of 2, we found that 1,257,229
of these (99.9%) originated from a single domain,
data.gov.uk, which is in fact the largest producer of
blank nodes in our data (cf. Table 2). Aside from this
one domain, the vast majority of blank nodes form
acyclical graph structures.
]]
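
As a rough illustration of what "acyclical blank node structures" means in practice, the following Python sketch checks whether the blank nodes in a set of triples link to each other in a directed cycle. It is a simplification under stated assumptions: it looks only at directed cycles among blank-node-to-blank-node edges, which is not necessarily the measure the paper uses (the excerpt speaks of treewidth), and the triples, labels, and example.org URIs are made up.

```python
from collections import defaultdict


def bnode_edges(triples):
    """Keep only edges whose subject and object are both blank nodes."""
    graph = defaultdict(set)
    for s, _p, o in triples:
        if s.startswith("_:") and o.startswith("_:"):
            graph[s].add(o)
    return graph


def has_directed_cycle(triples):
    """Depth-first search with three colours over the blank node edges."""
    graph = bnode_edges(triples)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        color[node] = GREY  # node is on the current search path
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:
                return True  # edge back into the current path: a cycle
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through this node
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


P = "<http://example.org/p>"  # hypothetical predicate

# Blank nodes forming a small directed tree: _:a -> _:b, _:a -> _:c
tree = [("_:a", P, "_:b"), ("_:a", P, "_:c"), ("_:b", P, '"leaf"')]

# Two blank nodes pointing at each other
cyclic = [("_:a", P, "_:b"), ("_:b", P, "_:a")]

print(has_directed_cycle(tree), has_directed_cycle(cyclic))  # False True
```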

However, the paper on Well Behaved RDF *does* advocate that 
*problematic* uses of blank nodes should be avoided, and it proposes a 
very simple and practical rule for ensuring that they are.  You may not 
see that as leading to anything positive, but I certainly do.

David

>
> Pat
>
>
>>
>> David
>>
>>>
>>> Here are two important aspects of RDF that typically end up
>>> being labeled as problematic:
>>>
>>> [1] blank nodes -- pronouns
>>> [2] statement reification -- statements are things too!
>>>
>>> The items above enable important bridging across:
>>>
>>> 1. Open Data -- basic structured data
>>> 2. Linked Open Data -- structured data constructed using Linked
>>> (Open) Data principles
>>> 3. Semantically enhanced Linked Open Data -- structured data
>>> constructed using RDFS, OWL, etc., in conjunction with Linked
>>> Open Data principles.
>>>
>>>
>>> In most cases, rather than reification, RDF user agents will
>>> leverage blank nodes as objects of relations that associate
>>> embedded structured data with their container documents.
>>>
>>> Example:
>>>
>>> [1]
>>> http://linkeddata.uriburner.com/about/id/entity/http/thenextweb.com/apple/2014/07/23/apple-release-information-ios-response-claims-backdoor-data-collection/
>>> -- Statements are reified en route to creating an RDF based Linked
>>> Data document that's close to self-explanatory re. follow-your-nose
>>> pattern
>>>
>>> [2] http://bit.ly/rdf-statement-reficiation-fyn  -- alternative
>>> view for deeper follow-your-nose exploration
>>>
>>> [3] https://twitter.com/thalhamm/status/471994633573380096 --
>>> tweet about blank node use that's also a segue to a usecase
>>> demo.
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903   home
> 40 South Alcaniz St.                     (850)202 4416   office
> Pensacola                                (850)202 4440   fax
> FL 32502                                 (850)291 0667   mobile (preferred)
> phayes@ihmc.us       http://www.ihmc.us/users/phayes
>
Received on Friday, 25 July 2014 05:58:22 UTC