Re: Plain SQL representation(s) of SKOS

On Fri, Jun 26, 2009 at 9:56 AM, Dan Brickley <danbri@danbri.org> wrote:

>
> Hi folks
>
> While SKOS is an application of RDF, and makes use of various RDF features,
> I'd like to ask if any of your have built (or know about) plain SQL
> implementations of SKOS. This is to help advise on some work that is based
> around elaboration of a simpler SQL taxonomy structure. I would like some
> examples to show that are SKOS-friendly, SKOS-centric or SKOS-based ... but
> which doesn't make the complete jump to full unconstrained tables of RDF
> triples. Anything around?


Dan -

SQL isn't really ideal, but this approach kinda sorta works: it is hard to
keep things   clustered, and there can be a lot of database overhead for
short rows.


---

Four tables - Concept, Label, Relationship, and Note

*Concept*
For plain SKOS, Concept just carries a unique concept ID; approaches that
require subclasses of Concept can extend this table.

CREATE TABLE Concept (
       id int PRIMARY KEY
);




*Label*

Label is used to store the labels associated with a concept. Every label is
marked with a flag to indicate the language it is encoded in, and the kind
of label it is - for example, preferred, alternate, and historical). There
are two uniqueness constraints that we need to enforce.

One is the restriction that a concept can only have one preferred label per
concept per language; the other must ensure that a given string is not used
as label for more than one concept, *unless* the label is no longer current.
In postgresql, these constraints can be implemented using partial indexes.

Because of the uniqueness contraints, Label should not be used for
classification codes.

Keeping all the labels in a single table makes it easier to search for a
label when the type is not known.  The best order to arrange this table
depends on how the database is likely to be used;  if a typical use involves
displaying an entry (thus retrieving all the labels for a concept, it's
better to order the table by concept; if a more typical use is displaying a
tree of preferred labels, then it is better to order the table by kind.   In
both situations, labels should ideally be output in as close to the order in
which they will be accessed as possible (breadth-first is probably better
than depth first).

CREATE TABLE Label (
       concept_id int  REFERENCES Concept(id)  ON DELETE CASCADE NOT NULL,
       kind char(1) NOT NULL, -- Preferred, Alternate, Historical
       lang text  NOT NULL,
       label text NOT NULL
);
CREATE INDEX Label_label_idx ON Label (label);
CREATE INDEX Label_concept_id_label_type_idx ON Label (concept_id,kind);
CREATE UNIQUE INDEX Label_preferred_label_idx ON Label (concept_id,lang)
       WHERE kind = 'P';
CREATE UNIQUE INDEX Label_not_historic_idx ON Label (label,lang)
       WHERE kind <> 'H';

*Relationship*

Relationship contains three elements; a source, a destination, and a flag
indicating the kind of relationship being marked  This reminds me of
something - can't thing what.   An entry should be created in each
direction, with the table clustered by source concept id;  this groups
together all the relationships for a given concept.

CREATE TABLE SemanticRelationship (
       src int REFERENCES Concept(id)  ON DELETE CASCADE NOT NULL,
       dst int REFERENCES Concept(id)  ON DELETE CASCADE NOT NULL,
       kind char(1) NOT NULL, -- Broader,Narrower,Related,
       UNIQUE (kind,src,dst)
);

CREATE INDEX SemanticRelationship_src_idx ON SemanticRelationship (src);

*Note*

Note is used to carry all of the plain text notes associated with a
concept.  All kinds of notes are stored in the same table, allowing them to
be searched using a single index.

CREATE TABLE Note (
       id int PRIMARY KEY,
       concept_id int REFERENCES Concept(id)  ON DELETE CASCADE NOT NULL,
       kind char(1) NOT NULL, -- definition, scope note, etc.
       lang TEXT NOT NULL,
       note TEXT NOT NULL
);

CREATE INDEX Note_concept_id ON Note (concept_id);
CREATE INDEX Note_note_idx ON Note (note);

Received on Friday, 26 June 2009 17:43:07 UTC