- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Mon, 31 Jan 2011 23:10:32 -0600
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: RDB2RDF Working Group WG <public-rdb2rdf-wg@w3.org>
- Message-ID: <AANLkTimoOjuGwGSA71maepRr-bqey27zX=81nHmZfF9=@mail.gmail.com>
late night, so I won't respond to all...

On Mon, Jan 31, 2011 at 1:34 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Juan Sequeda <juanfederico@gmail.com> [2011-01-31 10:11-0600]
> > This may be something we have talked about, so sorry if I'm asking about
> > something that already has an answer.
> >
> > We assume that a table that does not have a primary key will have a blank
> > node as the Row identifier for each tuple.
> >
> > But what happens if the table does not have a primary key but does have a
> > candidate key(s)? Are we still generating a blank node as the Row identifier
> > for each tuple? Or could we consider building an IRI with the candidate
> > keys?
>
> That would make the rule a bit more complicated to explain to users
> and would lead to some design questions: Which candidate key would
> dominate when there were several to choose from (e.g. the Projects
> table)? How would the dominant key's value be available when
> generating reference triples which link to a non-dominant key?

Excellent point! This would just open a can of worms. So I guess we can
keep it simple and state that if a table does not have a pk, then even if
it has 1 or more candidate keys, we stick with generating Blank Nodes.

> One use case I want to be sure to address is that of a typical
> warehouse merging data from multiple sources, re-populated at a
> regular interval (say 3am daily). Sometimes they don't have a primary
> key (the candidate keys serve for linking purposes) because those keys
> would change every day. Sometimes they do have a primary key but its
> volatility dictates that the key is a secret used only by the import
> scripts.
>
> > Consider the following example
> >
> > Schema
> > Projects(lead, name, deptName, deptCity) where UNIQUE(name, deptName,
> > deptCity)
>
> I read this as a superkey encompassing the two candidate keys
> described in
> <http://www.w3.org/2001/sw/rdb2rdf/directMapping/#ref-no-pk>.
>
> > Instances
> > Projects(8, pencil survey, accounting, cambridge)
> > Projects(8, eraser survey, accounting, cambridge)
> >
> > For each tuple we could create a fresh blank node, or we could create a Row
> > IRI for each tuple using the candidate key:
> >
> > <Projects/name=pencil survey,deptName=accounting,deptCity=cambridge>
> > <Projects/name=eraser survey,deptName=accounting,deptCity=cambridge>
> >
> > These IRIs are unique because they come from unique keys.
> >
> > What is the consensus here? I do not think this case is covered in the
> > current direct mapping doc (right Eric?)
>
> The modeling you're exploring isn't used in the direct mapping doc,
> but the use case is addressed. "Referencing tables with empty primary
> keys" includes the table with two unique keys and no primary keys that
> you describe above. The generated graph maintains referential
> integrity by labeling the triples from one row of the Projects table
> as _:c and using that as the object of all arcs which reference that
> row.
>
> I think the simple consistency of the current rule will appeal more to
> users and implementers. We now have two cases:
>
>   table has a primary key → row node is a function of that primary
>   key value.
>
>   table has no primary key → row node is a new blank node.
>
> We will otherwise have three cases:
>
>   table has a primary key (and any number of candidate keys) → row
>   node is a function of that primary key value.
>
>   table has no primary key and no candidate key → row node is a new
>   blank node.
>
>   table has no primary key and some candidate keys → row node is a
>   function of those candidate key values.
>
> > Cheers
> >
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> >
> >
> > On Fri, Jan 21, 2011 at 2:41 PM, RDB2RDF Working Group Issue Tracker
> > <sysbot+tracker@w3.org> wrote:
> >
> > >
> > > ISSUE-9 (bn_directmapping): Generate Blank Nodes for duplicate tuples
> > > [Direct Mapping]
> > >
> > > http://www.w3.org/2001/sw/rdb2rdf/track/issues/9
> > >
> > > Raised by: Juan Sequeda
> > > On product: Direct Mapping
> > >
> > > Given a table that does not have a primary key, which has duplicate tuples,
> > > a different blank node must be created for each tuple.
> > >
> > > In the Direct Mapping as rules section of the Direct Mapping document, we
> > > described this scenario by using all the values of the tuple to create the
> > > blank node [1] [2]. However, there is a bug, raised by Alexandre [3]. The
> > > issue is that datalog cannot deal with duplicates. Consequently, Marcelo
> > > raised the point that we can use simple versions of datalog that can deal
> > > with duplicate solutions.
> > >
> > > Possible solutions:
> > >
> > > 1) assume that each table implicitly has a row id which is part of its set
> > > of attributes. The row id is unique.
> > > 2) associate with each tuple an annotation that corresponds to the
> > > multiplicity of the tuple in the database. This annotation function
> > > corresponds to the function card in the definition of the semantics of
> > > SPARQL
> > >
> > >
> > > [1]
> > > http://www.w3.org/TR/2010/WD-rdb-direct-mapping-20101118/#rules_table_triples_no_pk
> > > [2]
> > > http://www.w3.org/TR/2010/WD-rdb-direct-mapping-20101118/#rules_literal_triples_no_pk
> > > [3]
> > > http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Jan/0044.html
> > >
>
> --
> -ericP
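
For illustration only, the two-case rule settled on in this thread (primary key → row node is a function of the key; no primary key → fresh blank node, even when candidate keys exist) can be sketched in Python. The function name `row_node`, the blank node label format, and the percent-encoding of values are assumptions made for this sketch; they are not taken from the Direct Mapping document.

```python
from itertools import count
from urllib.parse import quote

# Hypothetical counter used to mint fresh blank node labels.
_bnode_ids = count(1)

def row_node(table, row, primary_key=None):
    """Return the subject node for one row.

    row is a dict of column name -> value; primary_key is a list of
    column names, or None when the table declares no primary key.
    Candidate (UNIQUE) keys are deliberately ignored.
    """
    if primary_key:
        # Case 1: the row node is a function of the primary key value.
        parts = ",".join(
            f"{col}={quote(str(row[col]), safe='')}" for col in primary_key
        )
        return f"<{quote(table, safe='')}/{parts}>"
    # Case 2: no primary key, so every row gets a new blank node.
    return f"_:b{next(_bnode_ids)}"

# The Projects table from the thread: a UNIQUE key but no primary key.
projects = [
    {"lead": 8, "name": "pencil survey", "deptName": "accounting", "deptCity": "cambridge"},
    {"lead": 8, "name": "eraser survey", "deptName": "accounting", "deptCity": "cambridge"},
]
project_nodes = [row_node("Projects", r) for r in projects]

# A hypothetical table with a declared primary key, for contrast.
person_node = row_node("People", {"ID": 7, "fname": "Bob"}, primary_key=["ID"])
```

Because the blank node is minted once per row, any triple that references a Projects row has to reuse that same label, which is how the graph keeps referential integrity without consulting the candidate keys.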
Received on Tuesday, 1 February 2011 06:16:25 UTC