- From: David Booth <david@dbooth.org>
- Date: Wed, 03 Apr 2013 16:43:06 -0400
- To: Jeremy J Carroll <jjc@syapse.com>
- CC: public-semweb-lifesci@w3.org
Hi Jeremy,

I worked with genomic data very similar to yours, and initially used blank nodes for exactly the purpose that you describe below. However, I regretted doing so when I realized that every time I loaded some of the same source data, duplicate triples were added, because the system did not recognize them as representing the same n-ary tuple. (This problem could have been avoided by regenerating everything from scratch every time I wanted to reload some data, but there were reasons for not doing that.) I also discovered that the presence of the blank nodes made it harder to manipulate the data using SPARQL query/update, because a blank node cannot be reliably referenced from one query or update to the next.

When producing n-ary tuples, I now favor generating URIs based on the constituent values in the tuple. Usually one or more of the values form a usable key (or composite key). To avoid superficially different URIs when alleles are swapped, you can sort the alleles.

But really the problem here is the lack of native RDF support for n-tuples. We're just using bnodes and fabricated URIs to work around that gap. That forces every one of us to reinvent our own idiosyncratic ways to do it, and it prevents tools from recognizing n-tuples for what they really are.

David

On 04/03/2013 03:12 PM, Jeremy J Carroll wrote:
>
> One question that I didn't really answer today was about my choice to
> use blank nodes extensively, following
>
> http://www.w3.org/TR/swbp-n-aryRelations/
>
> [[ we did not give meaningful names to instances of properties or to
> the classes used to represent instances of n-ary relations, but
> merely label them _:Temperature_Observation_1, Purchase_1, etc. In
> most cases, these individuals do not stand on their own but merely
> function as auxiliaries to group together other objects. Hence a
> distinguishing name serves no purpose. ]]
>
> This is an old chestnut with two sides: some people (e.g. some
> notable names in the linked data community) have a strong dislike
> for blank nodes, and others (including myself) see them as one of
> those necessary things, which you could abolish only to have to
> reinvent.
>
> I found this old message on the web fairly easily:
>
> http://lists.w3.org/Archives/Public/www-rdf-interest/2003Jul/0166.html
>
> [[
>
> >>> Q: Why are blank nodes necessary?
>
> >> There is, I think, a sustainable argument that blank nodes are not
> >> *necessary*.
>
> I beg to differ. Blank nodes are absolutely necessary.
>
> Consider translating an XML file into RDF, where typically none of
> the incoming resource nodes have URIs. You have two choices: you can
> use blank nodes to represent them, or you can use (globally unique)
> URIs. If you use URIs, then (1) you need a scheme for generating them
> so that you don't clash with other uniquely generated nodes, (2) you
> need to figure out how to label the nodes each of the subsequent
> times that you load the same graph, and (3) you still need a scheme
> to know that these nodes are semantically "blank", so that your
> application can avoid generating "pointers" to them. It's not safe to
> reference the URIs of blank nodes, since typically they won't recur
> the next time you load, or if they do recur, there is no way to
> guarantee that they denote the same node they did the first time.
>
> So, you can have blank nodes, or you can have a maintenance
> nightmare. The choice is yours.
>
> Cheers, Bob
>
> ]]
>
> Hmmm, nearly 10 years old.
>
> In particular, any process we have for creating URIs for the blank
> nodes on my slide 16, "INFO about Alleles",
> http://lists.w3.org/Archives/Public/www-archive/2013Apr/att-0002/W3C-JJC-LifeSci.pdf
>
> would need to be wise to the fact that whether the VCF line
> (extract) reads
>
> #CHROM POS REF ALT INFO
> 20 1110696 A G,T AF=0.333,0.667;AA=T;DB
>
> or, swapping the order of the alleles,
>
> #CHROM POS REF ALT INFO
> 20 1110696 A T,G AF=0.667,0.333;AA=T;DB
>
> is inconsequential and should not result in any semantic difference.
> In addition, we cannot use the base sequence in the generation of a
> URI for an allele, since not all alleles have a base sequence, e.g. a
> deletion of only some approximate length.
>
> Jeremy J Carroll
> Principal Architect
> Syapse, Inc.
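A minimal Python sketch of the approach David describes (URIs derived from a composite key of the tuple's values, with the alleles sorted), applied to Jeremy's VCF extract. It is an illustration under stated assumptions, not anyone's actual implementation: the namespaces `BASE` and `EX`, the property names, the helper functions, and the choice of CHROM, POS, REF and the sorted ALT alleles as the key (with AF kept as a plain attribute) are all hypothetical. Triples are modelled as Python tuples in a set, and the blank-node variant simulates a store that assigns fresh labels on every load. It does not solve Jeremy's last point: a symbolic allele such as a deletion of approximate length is keyed only by its textual ALT value, so two such alleles at the same site would collide.

```python
import hashlib
import itertools

# Hypothetical namespaces, chosen only for illustration.
BASE = "http://example.org/vcf/"
EX = "http://example.org/vocab#"


def parse_extract(line):
    """Parse a whitespace-separated VCF extract with columns CHROM POS REF ALT INFO."""
    chrom, pos, ref, alt, info = line.split()
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else None  # flags such as DB carry no value
    alts = alt.split(",")
    afs = fields["AF"].split(",")  # one AF value per ALT allele, in ALT order
    return chrom, pos, ref, dict(zip(alts, afs))


def mint(*parts):
    """Derive a stable URI from key values: the same parts always give the same URI."""
    key = "\x1f".join(parts)  # separator character unlikely to appear in the values
    return BASE + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def to_triples(line):
    """Key-based URIs: reloading the same data yields identical triples."""
    chrom, pos, ref, af_by_allele = parse_extract(line)
    alleles = sorted(af_by_allele)  # sorting makes the ALT column order irrelevant
    record = mint(chrom, pos, ref, ",".join(alleles))
    triples = {(record, EX + "chrom", chrom),
               (record, EX + "pos", pos),
               (record, EX + "ref", ref)}
    for allele in alleles:
        node = mint(chrom, pos, ref, allele)  # composite key for one allele
        triples.add((record, EX + "allele", node))
        triples.add((node, EX + "alt", allele))
        triples.add((node, EX + "alleleFrequency", af_by_allele[allele]))
    return triples


_fresh = itertools.count()


def to_triples_bnodes(line):
    """Same shape, but every load mints fresh blank-node labels."""
    chrom, pos, ref, af_by_allele = parse_extract(line)
    record = f"_:b{next(_fresh)}"
    triples = {(record, EX + "chrom", chrom),
               (record, EX + "pos", pos),
               (record, EX + "ref", ref)}
    for allele, af in af_by_allele.items():
        node = f"_:b{next(_fresh)}"
        triples.add((record, EX + "allele", node))
        triples.add((node, EX + "alt", allele))
        triples.add((node, EX + "alleleFrequency", af))
    return triples


LINE_A = "20 1110696 A G,T AF=0.333,0.667;AA=T;DB"
LINE_B = "20 1110696 A T,G AF=0.667,0.333;AA=T;DB"

if __name__ == "__main__":
    print(to_triples(LINE_A) == to_triples(LINE_B))  # True: allele order is irrelevant
    graph = to_triples(LINE_A) | to_triples(LINE_A)  # load the same line twice
    print(len(graph))  # 9: the reload adds nothing
    bgraph = to_triples_bnodes(LINE_A) | to_triples_bnodes(LINE_A)
    print(len(bgraph))  # 18: the reload duplicates the whole tuple
```

Because each AF value stays attached to its own allele in `af_by_allele`, swapping the ALT order (and the AF order with it) produces the same triples, while the blank-node version doubles in size on every reload, which is the duplication David describes.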
Received on Wednesday, 3 April 2013 20:43:35 UTC