Comments on use case "representing entities and facts extracted from text"

Hi Davide ... more good work on the "representing entities and facts extracted from text" use case <http://w3c.github.io/csvw/use-cases-and-requirements/#UC-RepresentingEntitiesAndFactsExtractedFromText>.

Editorial comments:

Paragraph 3: I think you mean "rows" with different numbers of fields - by their nature, triples don't vary in length :-)

There are a number of other interesting things to pick out here ...

1)      Each row describes a specific entity - in the example, this entity is identified as ":e4". ":e4" appears to be a primary key - but how do we know this?

2)      ":e4" is an ambiguous local identifier (as are ":e7" and ":e9") - how can one make the identifier an unambiguous URI?

3)      Entity ":e4" references entities ":e7" and ":e9" - these appear to be (foreign key) references to other entities described in this table? (or externally?)

4)      Entity ":e4" is the subject of _many_ rows - meaning that many rows can be combined to make a composite set of statements about this entity (I don't think any of our other examples do this yet)

5)      The identifiers used for the predicates (e.g. type, mention, per:siblings etc.) are ambiguous local identifiers - how can one make the identifier an unambiguous URI?

6)      "PER" appears to be a term from a controlled vocabulary. How do we know which controlled vocabulary it is a member of and what its authoritative definition is?

7)      The identifiers "D00124" and "D00101" are ambiguous local identifiers ... (OK, you get the idea!)

8)      Page number ranges (e.g. "180-181") are clearly valid only in the context of the preceding document identifier. The interesting assertion about provenance is the reference (document plus page range). Thus we might want to give the _reference_ a unique identifier comprising the document ID and page range (e.g. D00124#180-181).

9)      A single row in the table comprises a triple (subject-predicate-object), one or more provenance references and an optional certainty measure. The provenance references have been normalised for compactness (e.g. so they fit on a single row). However, each provenance statement has the same target triple so one could unbundle the composite row into multiple simple statements that have a regular number of columns (see below)

{snip}
:e4 per:age      "10"    D00124 180-181 173-179 182-191 0.9
:e4 per:parent   :e9     D00124 180-181 381-380 399-406 D00101 220-225 230-233 201-210
{snip}

{snip}
:e4 per:age      "10"    D00124 180-181 0.9
:e4 per:age      "10"    D00124 173-179 0.9
:e4 per:age      "10"    D00124 182-191 0.9
:e4 per:parent   :e9     D00124 180-181
:e4 per:parent   :e9     D00124 381-380
:e4 per:parent   :e9     D00124 399-406
:e4 per:parent   :e9     D00101 220-225
:e4 per:parent   :e9     D00101 230-233
:e4 per:parent   :e9     D00101 201-210
{snip}
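
As a rough illustration of the unbundling described in point 9, here is a minimal Python sketch. It assumes that fields are whitespace-separated, that the object contains no embedded whitespace, that a page range always looks like digits-hyphen-digits, and that a trailing number between 0 and 1 is the certainty measure. None of these assumptions come from the use case itself, and the names `unbundle` and `reference_id` are hypothetical. `reference_id` shows the document-plus-page-range identifier suggested in point 8.

```python
import re

# Assumption: a page range is two integers joined by a hyphen, e.g. "180-181".
PAGE_RANGE = re.compile(r"^\d+-\d+$")
# Assumption: the optional certainty is a trailing number between 0 and 1.
CERTAINTY = re.compile(r"^(?:0(?:\.\d+)?|1(?:\.0+)?)$")


def unbundle(row):
    """Split one composite row into one simple row per (document, page range)."""
    subject, predicate, obj, *rest = row.split()

    # Peel off the optional certainty measure from the end of the row.
    certainty = None
    if rest and CERTAINTY.match(rest[-1]):
        certainty = rest.pop()

    simple_rows = []
    document = None
    for token in rest:
        if PAGE_RANGE.match(token):
            if document is None:
                raise ValueError(f"page range {token!r} has no document identifier")
            fields = [subject, predicate, obj, document, token]
            if certainty is not None:
                fields.append(certainty)
            simple_rows.append(fields)
        else:
            # Anything that is not a page range starts a new document context.
            document = token
    return simple_rows


def reference_id(document, page_range):
    """Build a single identifier for a provenance reference (hypothetical scheme)."""
    return f"{document}#{page_range}"


if __name__ == "__main__":
    row = ':e4 per:age      "10"    D00124 180-181 173-179 182-191 0.9'
    for fields in unbundle(row):
        print(" ".join(fields))
```

Applied to the per:age row above, this yields three rows of six fields each, and reference_id("D00124", "180-181") gives "D00124#180-181".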

I think that there are a number of requirements present in this example that are yet to be mentioned.

Finally, I don't see how requirement MissingValueDefinition<http://w3c.github.io/csvw/use-cases-and-requirements/#R-MissingValueDefinition> is relevant here.

Regards, Jeremy

Received on Monday, 10 March 2014 10:59:15 UTC