Re: blank nodes …. from David Booth on 2013-04-03 (public-semweb-lifesci@w3.org from April 2013)

From: David Booth <david@dbooth.org>
Date: Wed, 03 Apr 2013 16:43:06 -0400
To: Jeremy J Carroll <jjc@syapse.com>
CC: public-semweb-lifesci@w3.org
Message-ID: <515C945A.3060107@dbooth.org>

Hi Jeremy,

I worked with genomic data very similar to yours, and initially used 
blank nodes for exactly the purpose that you describe below.  However, I 
regretted doing so when I realized that every time I loaded some of the 
same source data, duplicate triples were added, because the system did 
not recognize them as representing the same n-ary tuple.  (This problem 
could have been avoided by regenerating everything from scratch every 
time I wanted to reload some data, but there were reasons for not doing 
that.)  I also discovered that the presence of the blank nodes made it 
harder to manipulate the data using SPARQL query/update, because blank 
nodes are not stable across queries.  When producing n-ary tuples, I now 
favor generating URIs based on the constituent values in the tuple. 
Usually one or more of the values form a usable key (or composite key). 
To avoid superficially different URIs when alleles are swapped, you 
can sort the alleles.
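
A minimal sketch of the key-based URI approach follows, in Python. It is 
an illustration under stated assumptions, not the code used on the 
original data: the base URI, the ex: property names, the function names, 
and the choice of (chromosome, position, reference allele, alternate 
alleles) as the composite key are all hypothetical. A plain set stands in 
for the triple store. It also assumes every allele has a string form, 
which does not hold for all alleles.

```python
import hashlib

# Hypothetical base URI, for illustration only.
BASE = "http://example.org/variant/"


def tuple_uri(chrom, pos, ref, alts):
    """Mint a URI from the values that form the tuple's composite key."""
    # Sort the alleles so that "G,T" and "T,G" yield the same URI.
    key = "|".join([str(chrom), str(pos), ref, ",".join(sorted(alts))])
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return BASE + digest


def load(store, chrom, pos, ref, alts):
    """Add the triples for one record to a set-based stand-in for a store."""
    node = tuple_uri(chrom, pos, ref, alts)
    store.add((node, "ex:chrom", str(chrom)))
    store.add((node, "ex:pos", str(pos)))
    store.add((node, "ex:ref", ref))
    for alt in alts:
        store.add((node, "ex:alt", alt))
    return node


store = set()
first = load(store, 20, 1110696, "A", ["G", "T"])
size_after_first_load = len(store)

# Reload the same record with the alleles swapped.
second = load(store, 20, 1110696, "A", ["T", "G"])

# Same URI both times, and no new triples from the second load.
print(first == second, len(store) == size_after_first_load)
```

Because the subject URI is derived from the key rather than freshly 
allocated, reloading the same source data adds nothing new, and the node 
can be referenced in later queries and updates.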

But really the problem here is the lack of native RDF support for 
n-tuples.  We're just using bnodes and fabricated URIs to work around 
that gap.  And that forces every one of us to reinvent our own 
idiosyncratic ways to do it, and it prevents tools from recognizing 
n-tuples for what they really are.

David


On 04/03/2013 03:12 PM, Jeremy J Carroll wrote:
>
>
> One question that I didn't really answer today was about my choice to
> use blank nodes extensively following
>
> http://www.w3.org/TR/swbp-n-aryRelations/
>
> [[ we did not give meaningful names to instances of properties or to
> the classes used to represent instances of n-ary relations, but
> merely label them _:Temperature_Observation_1, Purchase_1, etc. In
> most cases, these individuals do not stand on their own but merely
> function as auxiliaries to group together other objects. Hence a
> distinguishing name serves no purpose. ]]
>
> This is an old chestnut, with there being two sides with some people
> (e.g. some notable names in the linked data community) having a
> strong dislike for blank nodes, and others (including myself) seeing
> them as one of those necessary things, which you could abolish only
> to have to reinvent.
>
> I found this old message on the web fairly easily:
>
> http://lists.w3.org/Archives/Public/www-rdf-interest/2003Jul/0166.html
>
>  [[
>
>>> Q: Why are blank nodes necessary?
>
>> There is, I think, a sustainable argument that blank nodes are not
>> *necessary*.
>
> I beg to differ. Blank nodes are absolutely necessary.
>
> Consider translating an XML file into RDF, where typically none of
> the incoming resource nodes have URI's. You have two choices, you can
> use blank nodes to represent them, or you can use (globally unique)
> URI's. If you use URI's, then you need a scheme for generating them
> so that (1) you don't clash with other uniquely generated nodes, (2)
> you need to figure out how to label the nodes each of the subsequent
> times that you load the same graph, (3) you still need a scheme to
> know that these nodes are semantically "blank", so that your
> application can avoid generating "pointers" to them. It's not safe to
> reference the URI's of blank nodes, since typically they won't recur
> the next time you load, or if they do recur, there is no way to
> guarantee that they denote the same node they did the first time.
>
> So, you can have blank nodes, or you can have a maintenance
> nightmare. The choice is yours.
>
> Cheers, Bob
>
> ]]
>
> Hmmm, nearly 10 years old.
>
>
> In particular any process we have for creating URIs for the blank
> nodes on my slide 16 "INFO about Alleles"
> http://lists.w3.org/Archives/Public/www-archive/2013Apr/att-0002/W3C-JJC-LifeSci.pdf
>
>  would need to be wise to the fact that whether the VCF line
> (extract) reads
>
>
> #CHROM  POS      REF  ALT  INFO
> 20      1110696  A    G,T  AF=0.333,0.667;AA=T;DB
>
> or swapping the order of the alleles
>
> #CHROM  POS      REF  ALT  INFO
> 20      1110696  A    T,G  AF=0.667,0.333;AA=T;DB
>
> is inconsequential and should not result in any semantic difference,
> and in addition that we cannot use the base sequence in the
> generation of a URI for an allele since not all alleles have a base
> sequence, e.g. a deletion of only some approximate length
>
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
>
Received on Wednesday, 3 April 2013 20:43:35 UTC