Re: How to link to Companies House (UK) data?

On 16 January 2017 at 14:17, Matt Wallis <matt@solidarityeconomics.org>
wrote:

> Hi Phil,
> Thank you very much for your reply.
> Your suggestion is very interesting, and it points to a common issue I
> have in data modelling in the LOD world: I come from a programming
> background, and it is very easy for me unhelpfully to fall back on my
> experience of strictly typed languages (like C++), in which classes are
> user-defined data types, and in which each object is 'typed' by being an
> instance of exactly one Class. But I'm now finding it more helpful to see
> classes in RDF as mathematical sets - an object can belong to more than one
> set.
>
> My programming background also leads me to ponder the efficiency of
> queries made on my dataset. For example, with your suggestion, by declaring
>
> <http://business.data.gov.uk/id/company/08209948> a
> org:FormalOrganization .
>
> it means that one needs to look inside
> <http://business.data.gov.uk/id/company/08209948> to
> find out if it has the properties of a rov:RegisteredOrganization. But then
> I guess that is always the case - even if I had declared
>
> <http://business.data.gov.uk/id/company/08209948> a
> rov:RegisteredOrganization
>
> there is still no guarantee that the properties that I'm interested in
> will have been defined for that particular rov:RegisteredOrganization
> instance. Again, I am getting my head around the difference between LOD
> data models and user-defined types in programming languages.
>
> I may well ask another question on this list about how best to organize a
> triple store to provide efficient queries: in particular, how much data
> from other datasets would one replicate in one's own triple store, and are
> there standard best practices for caching that avoid explicit replication.
>

Things in RDF can have any number of types.  Some are declared explicitly
and some are inferred: for example, if I say <#alice> foaf:knows <#bob>,
then from the FOAF vocabulary you can infer that alice and bob are each a
foaf:Person, and therefore also a foaf:Agent.
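
As a rough sketch of how that inference works, here is a few lines of
Python over plain tuples.  The three lookup tables restate what the FOAF
vocabulary declares (domain and range of foaf:knows, and foaf:Person being
a subclass of foaf:Agent); the function name infer_types is made up for
this example, and a real triple store or reasoner would do this for you.

```python
# Minimal sketch of RDFS-style type inference; not a real reasoner.
# Schema facts as declared by the FOAF vocabulary.
DOMAIN = {"foaf:knows": "foaf:Person"}        # rdfs:domain
RANGE = {"foaf:knows": "foaf:Person"}         # rdfs:range
SUBCLASS_OF = {"foaf:Person": "foaf:Agent"}   # rdfs:subClassOf


def infer_types(triples):
    """Return {resource: set of types}, both explicit and inferred."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":                   # explicitly declared type
            types.setdefault(s, set()).add(o)
        if p in DOMAIN:                       # subject gets the domain class
            types.setdefault(s, set()).add(DOMAIN[p])
        if p in RANGE:                        # object gets the range class
            types.setdefault(o, set()).add(RANGE[p])
    # Walk up the subclass chain for every type found so far.
    for classes in types.values():
        pending = list(classes)
        while pending:
            parent = SUBCLASS_OF.get(pending.pop())
            if parent and parent not in classes:
                classes.add(parent)
                pending.append(parent)
    return types


data = [("#alice", "foaf:knows", "#bob")]
inferred = infer_types(data)
```

Neither <#alice> nor <#bob> has a declared type in the input, yet both end
up in two sets at once, which is the "classes as sets" view rather than the
"exactly one class" view.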

There's also an important distinction in RDF between documents and the
things they describe, normally marked by the # character.  The basic reason
is that a document has a lot of HTTP metadata associated with it, and you
ideally want to keep that metadata cleanly separate from the data about the
thing itself.  Otherwise you run the risk of collisions, which can turn up
in unexpected ways and are difficult to fix later on, mainly because URLs
are hard to change once published on the web.
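
To make the split concrete, here is a small Python sketch using only the
standard library.  The URL and the two property values are illustrative
assumptions, not real data.

```python
from urllib.parse import urldefrag

# Hypothetical hash URI naming a thing (a company), not a document.
thing = "https://example.com/company1#this"

# Dropping the fragment gives the URL of the document that describes it.
# An HTTP client only ever requests this part; the fragment is not sent.
document, fragment = urldefrag(thing)

# Because the two identifiers differ, statements about the document
# (HTTP-style metadata) cannot collide with statements about the company.
statements = {
    document: {"dcterms:modified": "2017-01-16"},    # about the file
    thing: {"rdf:type": "org:FormalOrganization"},   # about the company
}
```

With a single shared URL, the modification date and the type would both
attach to the same name, and a consumer could not tell whether it was the
file or the company that was last modified.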

The pattern I tend to use personally (which I don't think Companies House
is using) is as follows.

Have a document (non-hash) URL as a 'container' for some data.  Think of a
URL as a name.  This is perhaps easier to see with files: we know that
Person.cpp is a filename and not a class.  The same is true on the web.

So let's say you have

Container:  https://example.com/company1

I'd then use the primary topic pattern to say which is the main item inside
that URL.  Often the fragment <#this> is used, similar to the 'this' keyword
in programming languages.  I'd suggest each HTTP document having one main
item, as a clean way to model.

So you'll have

<> foaf:primaryTopic <#this> .

<#this> a Class1, Class2 ;
  key1 value1 ;
  key2 value2 ;
  key3 value3 .
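
For anyone who wants to see how the relative references resolve, here is a
short Python sketch against the container URL above.  The two class names
stand in for Class1 and Class2, and the helper names resolve and
types_of_primary_topic are made up for this example.

```python
from urllib.parse import urljoin

BASE = "https://example.com/company1"   # the container (document) URL


def resolve(ref):
    """Resolve a relative reference such as '' or '#this' against BASE."""
    return urljoin(BASE, ref)


# The same statements as the pattern, with <> and <#this> made absolute.
triples = [
    (resolve(""), "foaf:primaryTopic", resolve("#this")),
    (resolve("#this"), "rdf:type", "org:FormalOrganization"),
    (resolve("#this"), "rdf:type", "rov:RegisteredOrganization"),
]


def types_of_primary_topic(doc_url, triples):
    """Follow foaf:primaryTopic from a document, then collect its types."""
    topics = {o for s, p, o in triples
              if s == doc_url and p == "foaf:primaryTopic"}
    return {o for s, p, o in triples
            if s in topics and p == "rdf:type"}


found = types_of_primary_topic(BASE, triples)
```

Starting from the document URL alone, a consumer can find the one main item
and every class it belongs to without knowing the fragment name in advance.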

Perhaps this is helpful to someone familiar with C++?



>
> Best regards
> Matt
>
>
> On 13/01/2017 10:48, Phil Archer wrote:
>
> Hi Matt,
>
> Welcome along. I think in the case you give, the predicate you're looking
> for is 'a', i.e. your object is both a my:Thing and a
> rov:RegisteredOrganization (or org:FormalOrganization as you wish).
>
> The latter is a reference to the ORG Ontology,
> https://www.w3.org/TR/vocab-org/ which is what you probably want. So I'd
> do this:
>
> <http://business.data.gov.uk/id/company/08209948> a
> org:FormalOrganization, my:Thing .
>
> Make sense?
>
> Phil
>
>
> On 12/01/2017 13:28, Matt Wallis wrote:
>
> As a relative newcomer to LOD (first post here), I have a very basic
> question: How to link from an object specified in RDF to a Companies
> House URI for data about a particular registered company?
>
> Suppose, for example that I have a class, my:Thing, and that some of
> these Things are also registered companies. I want to provide a link
> from an instance of my:Thing to the data held by Companies House.
> Let's suppose that the Companies House URI is
> http://business.data.gov.uk/id/company/08209948.
>
> Is there an existing predicate that I can simply add to my resource
> description? Like this:
>
>    my:object a my:Thing .
>    my:object predicate <http://business.data.gov.uk/id/company/08209948> .
>
> Or do I need to modify the definition of the my:Thing class in order to
> provide this link? If so how?
>
> An extra requirement is that I don't want the mechanism to be
> UK-specific. I see from the Companies House data model
> (http://business.data.gov.uk/companies/docs/data-model-reference.html)
> that it uses the Registered Organization Vocabulary
> (https://www.w3.org/TR/vocab-regorg/) which is not UK-specific. In
> particular:
>
>    http://business.data.gov.uk/companies/def/terms/RegisteredCompany
>    rdfs:subClassOf http://www.w3.org/ns/regorg#RegisteredOrganization
>
> So I'm hoping that there's a straightforward way for the linkage
> mechanism to work without it being UK-specific.
>
> In case it is relevant, my:Thing is actually
> http://purl.org/solidarityeconomics/experimental/essglobal/vocab/SSEInitiative .
>
>
>
>
> --
> Matt Wallis
> Institute for Solidarity Economics http://www.solidarityeconomics.org
>
>

Received on Monday, 16 January 2017 16:56:07 UTC