Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Hugh Glaser on 2020-07-23 (semantic-web@w3.org from July 2020)

From: Hugh Glaser <hugh@glasers.org>
Date: Thu, 23 Jul 2020 23:58:07 +0100
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Message-Id: <2B0A412D-F27C-46AB-9A77-882E3EA3150C@glasers.org>
If I understand correctly, I will need to add a UCUM parser to my system to be able to process these datatypes if people send them to me in their RDF?
In fact, I will need a UCUM-to-RDF converter to be able to properly "understand" what they "mean"?
Does such an animal exist?

It looks to me as though UCUM is quite a large vocabulary of units, for a start. What would the URI for the "liter" unit of measurement be, for example?

I'm very happy to have widely adopted standards like this; I just want to keep my Semantic Web processing in the Semantic Web (RDF), and as simple as possible.
Or at least be helped to do that.

Cheers 

> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
>> Regarding physical quantities, such as "5 inches", etc., my colleague Maxime
>> Lefrançois and I coauthored a specification for a datatype for physical
>> quantities [1]. It is quite simple: we reuse the Unified Code for Units of
>> Measurement (UCUM), a standard that is used in many scientific applications,
>> and combine it with a number:
>> 
>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
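A minimal sketch of how a lexical form matching the grammar above could be split into its number and its unit, in Python. It assumes that SPACES means one or more ASCII spaces and treats the UCUM code as any run of non-whitespace characters; checking that the code is actually valid UCUM would need a UCUM library, which is not shown. The function name `parse_quantity` is hypothetical.

```python
import re
from decimal import Decimal

# Lexical pattern of xsd:decimal (XML Schema Part 2).
DECIMAL = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)"
# Optional exponent: 'e' or 'E' followed by an xsd:integer.
EXPONENT = r"(?:[eE][+-]?[0-9]+)?"
# Assumption: SPACES is one or more ASCII spaces, and the UCUM code is any
# run of non-whitespace characters (it is NOT validated against UCUM here).
QUANTITY = re.compile(rf"({DECIMAL}{EXPONENT}) +(\S+)")


def parse_quantity(lexical: str) -> tuple[Decimal, str]:
    """Split a quantity lexical form such as '2e11 mm' into (number, UCUM code)."""
    m = QUANTITY.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a valid quantity literal: {lexical!r}")
    return Decimal(m.group(1)), m.group(2)


print(parse_quantity("2.7e3 m3/[acr_us]"))
```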
>> 
>> Since UCUM has well-defined semantics, so does our datatype. Better, since
>> UCUM is implemented in many programming languages, my colleague Maxime could
>> easily integrate it into Jena and its SPARQL engine [2].
>> 
>> So, with our Jena fork, one can write:
>> 
>> SELECT ?planet WHERE {
>>  ?planet a ex:Planet;
>>    ex:diameter ?s .
>>  FILTER(?s > "2e11 mm"^^cdt:ucum)
>> }
> 
> I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? For example:
> 
> SELECT ?planet WHERE {
>  ?planet a ex:Planet;
>    ex:diameter ?s .
>  FILTER(?s > "2e11"^^ucum:mm)
> }
> 
> It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
> 
> In either case we probably have a union type in the lexical value, so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is moved to the end of the datatype URL.
> 
> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
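A sketch of the mapping Eric describes, from a UCUM code to a per-unit datatype IRI and back. The namespace `http://example.org/ucum#` and the function names are placeholders, not anything defined in this thread. Instead of the `_` substitution suggested above, the sketch swaps in percent-encoding via Python's `urllib.parse.quote`, because `_` already occurs inside UCUM codes such as `[lk_us]`, so a plain substitution may not be reversible in general.

```python
from urllib.parse import quote, unquote

# Hypothetical namespace for per-unit datatypes (a placeholder IRI).
UCUM_NS = "http://example.org/ucum#"


def unit_to_datatype(ucum_code: str) -> str:
    """Map a UCUM code to a datatype IRI by percent-encoding unsafe characters.

    quote() with safe="" keeps letters, digits and '_.-~' and escapes the rest,
    so 'm/s.s' becomes 'm%2Fs.s' and '[lk_us]' becomes '%5Blk_us%5D'.
    """
    return UCUM_NS + quote(ucum_code, safe="")


def datatype_to_unit(datatype_iri: str) -> str:
    """Inverse mapping: recover the UCUM code from a per-unit datatype IRI."""
    if not datatype_iri.startswith(UCUM_NS):
        raise ValueError(f"not a per-unit UCUM datatype: {datatype_iri}")
    return unquote(datatype_iri[len(UCUM_NS):])


def split_literal(lexical: str) -> tuple[str, str]:
    """Turn a cdt:ucum-style lexical form like '2e11 mm' into ('2e11', datatype IRI)."""
    number, unit = lexical.split(maxsplit=1)
    return number, unit_to_datatype(unit)


print(split_literal("2e11 mm"))
print(unit_to_datatype("m/s.s"))
```

As far as I can tell, percent escapes are allowed in Turtle and SPARQL prefixed local names and are not decoded on expansion, so the encoded unit could be written as `ucum:m%2Fs.s`; that is worth checking against the grammars before relying on it.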
> 
> 
>> This works if the size of the planet is encoded as a cdt:ucum, no matter
>> what unit one is using. One can even use "link for Gunter's chain" (unit
>> "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which are
>> both units of length.
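Why "cubic meters per acre" counts as a length can be checked with plain dimensional analysis: m3 has dimension L^3 and an acre is an area (L^2), so the quotient is L^1, the same dimension as a link. A toy sketch, assuming a hand-written table containing only the three atoms needed here, and applying `.` and `/` strictly left to right (an assumption; the UCUM specification gives the exact rules for operators, prefixes and annotations, none of which are handled here).

```python
import re
from collections import Counter

# Toy table (hypothetical): dimension of each atom as exponents of base dimensions.
ATOM_DIMENSIONS = {
    "m": {"L": 1},         # metre
    "[lk_us]": {"L": 1},   # link for Gunter's chain: a length
    "[acr_us]": {"L": 2},  # US survey acre: an area
}

# An atom followed by an optional signed integer exponent, e.g. 'm3' or 's-1'.
TERM = re.compile(r"(.*?)([+-]?\d+)?")


def dimension(code: str) -> Counter:
    """Dimension of a simple UCUM code built with '.' (multiply) and '/' (divide).

    Simplified: no prefixes, parentheses, numeric factors or annotations;
    operators are applied left to right (an assumption, see the lead-in).
    """
    result = Counter()
    sign = 1  # +1 after '.', -1 after '/'
    for token in re.split(r"([./])", code):
        if token == ".":
            sign = 1
        elif token == "/":
            sign = -1
        elif token:
            atom, exp = TERM.fullmatch(token).groups()
            power = sign * int(exp or 1)
            for base, e in ATOM_DIMENSIONS[atom].items():
                result[base] += power * e
    # Drop base dimensions whose exponents cancelled out.
    return Counter({b: e for b, e in result.items() if e != 0})


print(dimension("m3/[acr_us]"), dimension("[lk_us]"))
```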
>> 
>> With some of our industrial partners, we are using this for energy data, and
>> they seem to be very pleased with this approach, compared to an
>> ontology-based approach.
>> 
>> 
>> [1] https://w3id.org/lindt/custom_datatypes#ucum
>> [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
>> [3] Try this query in the playground:
>> 
>> """
>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>> PREFIX ex: <http://example.org/>
>> 
>> SELECT ?length ?normalized
>> 
>> WHERE {
>> 
>>  VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
>>  # convert to meters
>>  BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
>> 
>> }
>> """
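The arithmetic behind that query can be reproduced by hand. The sketch below uses exact fractions and, as I understand them, the UCUM definitions of the US survey units ([ft_us] = 1200/3937 m, [rd_us] = 16.5 [ft_us], [ch_us] = 4 [rd_us], [lk_us] = [ch_us]/100, [acr_us] = 160 [rd_us]2); treat those constants as assumptions to verify against the UCUM tables. It has not been compared against the playground's output.

```python
from fractions import Fraction

# Exact metre values of US survey units (assumed UCUM definitions, see lead-in).
FT_US = Fraction(1200, 3937)        # [ft_us] in m
RD_US = Fraction(33, 2) * FT_US     # [rd_us] = 16.5 [ft_us]
CH_US = 4 * RD_US                   # [ch_us] = 4 [rd_us]
LK_US = CH_US / 100                 # [lk_us] = 1/100 [ch_us]
ACR_US = 160 * RD_US ** 2           # [acr_us] = 160 [rd_us]2, in m2


def per_acre_to_metres(value: Fraction) -> Fraction:
    """Convert a value in m3/[acr_us] to m by dividing by the acre's area in m2."""
    return value / ACR_US


normalized = per_acre_to_metres(Fraction(2700))
print(float(normalized))  # roughly 0.667 (metres)
print(float(LK_US))       # one link is roughly 0.201 m
```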
>> 
>> --AZ
>> 
>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
>>> Yeah, the atomicity of the chunk is the point. This even applies to
>>> quantities. 25.4mm is *identical* to 1” – they are the same thing. Any
>>> engine that operates with quantities needs to understand that. ‘25.4’
>>> and ‘mm’ cannot be separated. Coordinates are slightly more complex but
>>> it comes down to the same thing. A single element within a set of
>>> coordinates that describes a position in space is not independent of the
>>> other numbers in the tuple, or of the coordinate reference system within
>>> which they are expressed. One value should *never* be used independent
>>> of the others. Exactly the same position on the earth will be denoted by
>>> three different numbers if embedded in a different coordinate reference
>>> system. You can only ‘reason’ over them as a group, not individually.
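Simon's point that a number and its unit cannot be separated maps naturally onto an immutable value type whose equality is defined on the magnitude rather than on the (number, unit) pair. A minimal sketch, assuming a hypothetical three-entry conversion table in place of a real UCUM library ([in_i] is UCUM's international inch, exactly 25.4 mm):

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical stand-in for a UCUM library: metre value of each unit.
TO_METRES = {
    "m": Fraction(1),
    "mm": Fraction(1, 1000),
    "[in_i]": Fraction(254, 10000),  # international inch = 0.0254 m exactly
}


@dataclass(frozen=True, eq=False)  # eq=False: we define magnitude-based __eq__ ourselves
class Quantity:
    """A number and its unit kept together as one immutable value."""
    value: Fraction
    unit: str

    def in_metres(self) -> Fraction:
        return self.value * TO_METRES[self.unit]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Quantity):
            return NotImplemented
        # Equal if the magnitudes agree, whatever units they are written in.
        return self.in_metres() == other.in_metres()

    def __lt__(self, other: "Quantity") -> bool:
        return self.in_metres() < other.in_metres()

    def __hash__(self) -> int:
        # Consistent with __eq__: equal quantities hash equally.
        return hash(self.in_metres())


print(Quantity(Fraction(254, 10), "mm") == Quantity(Fraction(1), "[in_i]"))
```

Because `Quantity` is frozen, the unit cannot be changed independently of the number, and the two quantities in the usage line compare equal. Ordering on the same normalized magnitude is the kind of comparison a FILTER such as the one in Antoine's planet query needs.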
>>> 
>>> *From:* Dan Brickley <danbri@danbri.org>
>>> *Sent:* Thursday, 16 July, 2020 23:58
>>> *To:* Jeen Broekstra <jeen@fastmail.com>
>>> *Cc:* Semantic Web <semantic-web@w3.org>
>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
>>> existential variables?]
>>> 
>>> …
>>> 
>>> I believe the big appeal of putting it all into the zone we call
>>> "literals" is that you get a kind of atomicity; that chunk of data is
>>> either there, or not there; it is asserted, or not asserted. With a
>>> triples-based (description of a) data structure you have to be
>>> constantly on your guard that every subset of the full graph pattern is
>>> at least sensible and harmless, even when subsetting these chunks is
>>> often confusing or misleading for data consumers. I can't help wondering
>>> whether notions of graph shapes from shacl, shex (and sparql) could be
>>> exploited to create an RDF-based data format which had atomicity at the
>>> level of entire shapes.
>>> 
>>> Dan
>>> 
>>>    Jeen
>>> 
>> 
>> -- 
>> Antoine Zimmermann
>> Institut Henri Fayol
>> École des Mines de Saint-Étienne
>> 158 cours Fauriel
>> CS 62362
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tel: +33(0)4 77 42 66 03
>> Fax:+33(0)4 77 42 66 66
>> http://www.emse.fr/~zimmermann/
>> Member of team Connected Intelligence, Laboratoire Hubert Curien

-- 
Hugh
023 8061 5652
Received on Thursday, 23 July 2020 22:58:29 UTC