- From: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>
- Date: Mon, 30 Jan 2012 14:30:10 +0200
- To: Government Linked Data Working Group WG <public-gld-wg@w3.org>
All, hi.

Just some random ranting on the possibility of more flexible
underspecified dates than what has been proposed so far. I lean towards
recommending Approach 3 myself, although Approach 1 has the merit of
perfectly fitting the current practice of using the first day of the
month to mean any time during the month.

Best,
Stasinos


Let us say that we want to use the property gld:birthDate to assert that
http://konstant.gr/#stasinos was born between Aug 15th and Sep 15th,
1973, but we are unable to provide a specific date.


Approach 1

We define one specificity property for each property that ranges over
dates that can potentially be underspecified. This property ranges over
xsd:duration and means that the value of the original property should be
understood as an unknown xsd:date instance that lies within the interval
starting on the date shown by the original property and lasting for the
duration shown by the specificity property.

In our example we define

  gld:birthDateSpecificity rdfs:range xsd:duration

and specify that, if it is present, the value of gld:birthDate should be
understood as an unknown xsd:date instance that lies within the interval
starting on the date shown by gld:birthDate and lasting for the duration
shown by gld:birthDateSpecificity:

  <http://konstant.gr/#stasinos>
      gld:birthDate "1973-08-15"^^xsd:date ;
      gld:birthDateSpecificity "P1M"^^xsd:duration .

Pros:
- It is simple to explain and populate.
- It is consistent with the current practice of using midnight of the
  first day of the month (year) to mean an unknown date during that
  month (year). We can make explicit that a given "1973-01-01" value is
  actually meant to mean "sometime during 1973" WITHOUT retracting any
  statements, simply by adding a specificity statement of
  "P1Y"^^xsd:duration.

Cons:
- It requires a new property for each property that we want to treat.
- It distributes a meaning over two properties that are not nested
  within the same pattern, but are at the same level as other, related
  properties of the same resource.


Approach 2

Both cons above can be treated by introducing blank nodes (shudders) or
genids or whatever name is more palatable as values of gld:birthDate.
Such nodes would have properties of their own, restricting the range of
possible concrete values they can assume:

  <http://konstant.gr/#stasinos>
      gld:birthDate [ rdf:type gld:underSpecifiedDate ;
                      gld:startDate "1973-08-15"^^xsd:date ;
                      gld:specificity "P1M"^^xsd:duration ] .

We can, if so inclined, add more formal rigour by making explicit that
gld:underSpecifiedDate is an instance inside an interval and not the
whole interval:

  <http://konstant.gr/#stasinos>
      gld:birthDate [ rdf:type gld:date ;
                      gld:within [ rdf:type gld:dateInterval ;
                                   gld:startDate <http://dates.org/1973/08/15> ;
                                   gld:specificity "P1M"^^xsd:duration ] ] .

Note that the range of gld:birthDate is now a resource (since it has
properties of its own), so this breaks compatibility with using xsd:date
values when the date is known exactly. Exact dates would have to be
either date URIs or blank nodes with a data property ranging over
xsd:date:

  <http://konstant.gr/#stasinos>
      gld:birthDate <http://dates.org/1973/09/02> .

or

  <http://konstant.gr/#stasinos>
      gld:birthDate [ rdf:type gld:date ;
                      gld:hasValue "1973-09-02"^^xsd:date ] .

Pros:
- Relatively simple to explain.
- It defines a handful of types and properties that can be used for any
  property that ranges over dates. gld:date does not need to be a new
  type, but can be the type of any existing date URI schema.
- It collects all the triples about the underspecified date under a
  single resource.

Cons:
- Harder to populate than Approach 1.
- It breaks compatibility with current practice, even for fully known
  dates.


Approach 3

We define gld:birthDate as a datatype property that ranges over the
union of xsd:date and gld:underspecifiedDate.
gld:underspecifiedDate is a simple datatype, derived by restricting
xsd:string to:

  DD(SS)?

where DD is the lexical space of xsd:date and SS is the lexical space of
xsd:duration. The semantics are start date and specificity, as above. SS
is optional and, if missing, defaults to "P1D" (one day).

Examples:

  <http://konstant.gr/#stasinos>
      gld:birthDate "1973-08-15P1M"^^gld:underspecifiedDate .

The following values are equal (although not identical, so functional
properties can have only one):

  "1973-09-02P1D"^^gld:underspecifiedDate .
  "1973-09-02"^^gld:underspecifiedDate .
  "1973-09-02"^^xsd:date .

The following values are not equal, as per the definition of
xsd:duration, which states that no relationship exists between months
and days:

  gld:birthDate "1973-08-15P1M"^^gld:underspecifiedDate .
  gld:birthDate "1973-08-15P31D"^^gld:underspecifiedDate .

Pros:
- Relatively simple to explain and populate.
- It maintains compatibility with xsd:date, although it is inconsistent
  with the practice of using midnight of the first day of the month
  (year) to mean an unknown date during that month (year), as all
  xsd:date values are interpreted as exact dates.

Cons:
- Harder to index, as "1973-08-15P1M", "1973-08-15P15D", and
  "1973-08-15" are all different values. Searching for all documents
  related to "1973-08-15" requires full-text search with globs. This is
  not hard to support (e.g., Solr does prefix* globs), but it is less
  efficient than searching for exact values.
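
To make the DD(SS)? lexical form concrete, here is a minimal Python
sketch of how a consumer might turn such a value into a date interval.
It is not part of the proposal: the function name parse_underspecified
is hypothetical, only the date-only subset PnYnMnD of xsd:duration is
handled, time zones are ignored, and treating the interval as half-open
(end date excluded) is my assumption, since the text above does not say
whether the end point is included.

```python
import calendar
import re
from datetime import date, timedelta

# Assumed lexical form: an xsd:date followed by an optional duration.
# Only the PnYnMnD subset of xsd:duration is recognised in this sketch.
_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?:P(?=\d)(?:(?P<y>\d+)Y)?(?:(?P<m>\d+)M)?(?:(?P<d>\d+)D)?)?$"
)


def parse_underspecified(lexical):
    """Return (start, end) for a DD(SS)? value; end is exclusive (assumption)."""
    match = _PATTERN.match(lexical)
    if match is None:
        raise ValueError("not a DD(SS)? value: %r" % lexical)
    start = date.fromisoformat(match.group("date"))

    parts = [match.group(name) for name in ("y", "m", "d")]
    if all(part is None for part in parts):
        # SS is missing, so the specificity defaults to P1D.
        years, months, days = 0, 0, 1
    else:
        years, months, days = (int(part or 0) for part in parts)

    # Add years and months first, clamping the day to the length of the
    # target month, then add the days.
    month_index = start.year * 12 + (start.month - 1) + years * 12 + months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    end = date(year, month, day) + timedelta(days=days)
    return start, end


print(parse_underspecified("1973-08-15P1M"))
print(parse_underspecified("1973-09-02"))
```

Under these assumptions "1973-08-15P1M" and "1973-08-15P31D" happen to
resolve to the same interval once anchored at Aug 15th, even though they
remain distinct values of the datatype, since P1M and P31D are not
comparable as durations in general.
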


Approach 4

One rather cumbersome solution using existing OWL 2 constructs is to not
make a direct gld:birthDate assertion, but instead restrict the possible
values of this property for this resource, should a value ever be
discovered:

  ClassAssertion(
    DataAllValuesFrom( gld:birthDate
      DatatypeRestriction( xsd:dateTime
        xsd:minInclusive "1973-08-15T00:00:00Z"^^xsd:dateTime
        xsd:maxExclusive "1973-09-16T00:00:00Z"^^xsd:dateTime ))
    <http://konstant.gr/#stasinos> )

and as RDF triples:

  <http://konstant.gr/#stasinos>
      rdf:type [ rdf:type owl:Restriction ;
                 owl:onProperty gld:birthDate ;
                 owl:allValuesFrom
                   [ rdf:type rdfs:Datatype ;
                     owl:onDatatype xsd:dateTime ;
                     owl:withRestrictions (
                       [ xsd:minInclusive "1973-08-15T00:00:00Z"^^xsd:dateTime ]
                       [ xsd:maxExclusive "1973-09-16T00:00:00Z"^^xsd:dateTime ]
                     ) ] ] .

The use of midnight values of xsd:dateTime instead of xsd:date is
mandated by the fact that xsd:date does not permit the
xsd:minInclusive/xsd:maxExclusive restriction facets.

This is, obviously, not something any sane person would suggest that GLD
recommends, but it goes to show that it is quite possible to formalize a
human-readable underspecified date format by transforming it to
equivalent OWL 2 data.
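
As a rough illustration of what the two facets in the restriction mean
for a candidate value, the following Python sketch checks an
xsd:dateTime lexical value against the same bounds. It only mimics the
minInclusive/maxExclusive comparison and is not an OWL reasoner; the
function name satisfies_restriction is hypothetical.

```python
from datetime import datetime

# Bounds copied from the DatatypeRestriction above.
MIN_INCLUSIVE = "1973-08-15T00:00:00Z"
MAX_EXCLUSIVE = "1973-09-16T00:00:00Z"


def _parse(lexical):
    # datetime.fromisoformat does not accept a trailing "Z" on older
    # Python versions, so it is rewritten as an explicit UTC offset.
    return datetime.fromisoformat(lexical.replace("Z", "+00:00"))


def satisfies_restriction(lexical):
    """True if the value lies in [MIN_INCLUSIVE, MAX_EXCLUSIVE)."""
    value = _parse(lexical)
    return _parse(MIN_INCLUSIVE) <= value < _parse(MAX_EXCLUSIVE)


print(satisfies_restriction("1973-09-02T00:00:00Z"))
print(satisfies_restriction("1973-09-16T00:00:00Z"))
```

The exclusive upper bound at midnight of Sep 16th is what lets any time
during Sep 15th satisfy the restriction.
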
Received on Monday, 30 January 2012 12:30:46 UTC