RE: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Cox, Simon (L&W, Clayton) on 2020-07-26 (semantic-web@w3.org from July 2020)

From: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
Date: Sun, 26 Jul 2020 10:52:14 +0000
To: Eric Prud'hommeaux <eric@w3.org>
CC: Hugh Glaser <hugh@glasers.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Message-ID: <ME2PR01MB2882D0335B0CFB26F38314B288750@ME2PR01MB2882.ausprd01.prod.outlook.com>
> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .

This style is a bit like language tags on string-literals - but here it is "unit tags" on number-literals. 
It looks like a nice parallel: "air"@en denotes a very different thing to "air"@ms,
and "2.5"^^ucum:mi is very different to "2.5"^^ucum:km.
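
To make the parallel concrete, here is a minimal Python sketch. It is not taken from any library: the Literal tuple, the KM_PER_UNIT table and to_km are names made up for this note, and the only conversion factor assumed is the international mile (1.609344 km). It treats the datatype as a unit tag and shows that one lexical form under two tags denotes two different lengths.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical: str   # e.g. "2.5"
    datatype: str  # e.g. "ucum:mi" (hypothetical unit-as-datatype name)


# Kilometres per unit; only the two units used in this thread.
KM_PER_UNIT = {"ucum:km": 1.0, "ucum:mi": 1.609344}


def to_km(lit: Literal) -> float:
    """Read the lexical form as a number and scale it by its unit tag."""
    return float(lit.lexical) * KM_PER_UNIT[lit.datatype]


miles = Literal("2.5", "ucum:mi")
kilometres = Literal("2.5", "ucum:km")

# Same lexical form, different values once the tag is taken into account.
print(to_km(miles), to_km(kilometres))
```

Unlike language tags, unit tags on the same dimension can be compared after conversion, which is where a units library would have to come in.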

But do we need to remember and avoid the processing issues that came from language tags? 

Simon 

> -----Original Message-----
> From: Eric Prud'hommeaux <eric@w3.org>
> Sent: Saturday, 25 July, 2020 01:38
> To: Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au>
> Cc: Hugh Glaser <hugh@glasers.org>; Antoine Zimmermann
> <antoine.zimmermann@emse.fr>; Semantic Web <semantic-web@w3.org>;
> Maxime Lefrançois <maxime.lefrancois@emse.fr>
> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> existential variables?]
> 
> You'll need a microparsing regardless, and, as Simon points out, it's not that
> onerous. My point was just that having to microparse union types out of the
> same literal as a UCUM type is more complicated than parsing the UCUM
> type out of the literal's datatype. Having numeric types separated from their
> units would allow SPARQL 1.1 queries to avoid cracking the literal form, e.g.
> 
> Data:
> [[
> PREFIX ex: <http://a.example/astro#>
> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> PREFIX wipe: <https://en.wikipedia.org/wiki/>
> 
> wipe:mercury ex:diameter "4879.4"^^ucum:km , "3031.9"^^ucum:mi .
> wipe:saturn  ex:diameter "1.1e5"^^ucum:km , "72367"^^ucum:mi .
> wipe:jupiter ex:diameter "139822"^^ucum:km , "86881"^^ucum:mi .
> ]]
> 
> Query:
> [[
> PREFIX ex: <http://a.example/astro#>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> PREFIX ucum: <http://ucum.nlm.nih.gov/#>
> 
> SELECT ?planet WHERE {
>  ?planet ex:diameter ?d .
>  FILTER(datatype(?d) = ucum:km
>      && xsd:float(str(?d)) > 1E5)
> }
> ]]
> 
> Results:
> ┌─────────────────────────────────────────┐
> │ ?planet                                 │
> │ <https://en.wikipedia.org/wiki/saturn>  │
> │ <https://en.wikipedia.org/wiki/jupiter> │
> └─────────────────────────────────────────┘
> 
> Here, SPARQL takes care of parsing value types for us ("1.1e5" or "139822")
> so the query is safe (datatype(?d) = ucum:km) and pretty easy to compose.
> The same query is substantially more tedious with data like:
>   wipe:saturn  ex:diameter "1.1e5 km"^^cdt:ucum , "72367 mi"^^cdt:ucum .
> which is likely to lead to unsafe shortcuts.
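
As a rough illustration of why the split form is easy to process, the following Python sketch mirrors the FILTER in the query above over plain tuples. The rows are copied from the Data block; the DIAMETERS list and the planets_wider_than function are hypothetical names, and no RDF library is involved.

```python
# (subject, lexical form, datatype) rows copied from the Data block.
DIAMETERS = [
    ("wipe:mercury", "4879.4", "ucum:km"),
    ("wipe:mercury", "3031.9", "ucum:mi"),
    ("wipe:saturn", "1.1e5", "ucum:km"),
    ("wipe:saturn", "72367", "ucum:mi"),
    ("wipe:jupiter", "139822", "ucum:km"),
    ("wipe:jupiter", "86881", "ucum:mi"),
]


def planets_wider_than(rows, unit, threshold):
    """Mirror FILTER(datatype(?d) = unit && xsd:float(str(?d)) > threshold)."""
    return [
        subject
        for subject, lexical, datatype in rows
        # The unit check is a plain equality test on the datatype;
        # float() handles both "1.1e5" and "139822" without custom parsing.
        if datatype == unit and float(lexical) > threshold
    ]


print(planets_wider_than(DIAMETERS, "ucum:km", 1e5))
```

This selects saturn and jupiter, matching the Results table. No unit conversion happens here: values in other units are skipped, not compared.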
> 
> So Maxime and Antoine, how hard would it be to transplant your semantics
> to apply directly to ucum (presuming their cooperation)?
> 
> 
> On Fri, Jul 24, 2020 at 12:12:34AM +0000, Cox, Simon (L&W, Clayton) wrote:
> > Yes you would need a UCUM parser.
> >
> > Note however that UCUM is not a "large vocabulary".
> > There is a relatively small set of terminals here
> > http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine these
> > into a countably infinite set.
> > The rule is described here:
> > http://unitsofmeasure.org/ucum.html#section-Syntax-Rules
> >
> > There are a number of implementations listed here
> > https://unitsofmeasure.org/trac at 'Implementation Support'.
> > This documentation has not been updated for about 3 years, so some of
> > the links might be stale, and there may be others.
> >
> > A units-of-measure library, with UCUM support, that was available to be
> > integrated into RDF applications would be a significant contribution to the
> > community.
> >
> > Simon
> >
> > > -----Original Message-----
> > > From: Hugh Glaser <hugh@glasers.org>
> > > Sent: Friday, 24 July, 2020 08:58
> > > To: Eric Prud'hommeaux <eric@w3.org>
> > > Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
> > > <semantic-web@w3.org>; Maxime Lefrançois <maxime.lefrancois@emse.fr>
> > > Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
> > > existential variables?]
> > >
> > > If I understand correctly.
> > > I will need to add a UCUM parser to my system to be able to process
> > > these datatypes, if people send them to me in their RDF?
> > > In fact, I will need a UCUM to RDF converter to be able to "understand"
> > > properly what they "mean"?
> > > Does such an animal exist?
> > >
> > > It looks to me that UCUM is quite a large vocabulary of units, for a
> > > start - what would the URI for the "liter" unit of measurement be, for
> > > example?
> > >
> > > I'm very happy to have widely adopted standards like this - I just
> > > want to keep my Semantic Web processing in the Semantic Web (RDF),
> > > and as simple as possible.
> > > Or at least be helped to do that.
> > >
> > > Cheers
> > >
> > > > On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
> > > >
> > > > On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
> > > >> Regarding physical quantities, such as "5 inches", etc., my
> > > >> colleague Maxime Lefrançois and myself coauthored a specification
> > > >> for a datatype for physical quantities [1]. It is quite simple:
> > > >> we reuse the Unified Code for Units of Measurement (UCUM), a
> > > >> standard that is used in many scientific applications, and combine it
> > > >> with a number:
> > > >>
> > > >> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
> > > >> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
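
For readers wondering what the micro-parsing amounts to, here is a small Python sketch of the grammar above. It is only an approximation under stated assumptions: <SPACES> is taken to be one or more space characters, the UCUM code is accepted as any run of non-space characters without validation, and parse_quantity is a hypothetical helper, not part of the Jena fork or of any UCUM library.

```python
import re
from decimal import Decimal
from typing import Tuple

# <NUMBER> ::= xsd:decimal (('e'|'E') xsd:integer)?
NUMBER = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

# Assumption: <SPACES> is one or more spaces; the UCUM code is not validated.
QUANTITY = re.compile(rf"({NUMBER}) +(\S+)")


def parse_quantity(lexical: str) -> Tuple[Decimal, str]:
    """Split a cdt:ucum lexical form into (number, UCUM code)."""
    match = QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a <QUANTITY>: {lexical!r}")
    return Decimal(match.group(1)), match.group(2)


print(parse_quantity("2e11 mm"))
print(parse_quantity("2.7e3 m3/[acr_us]"))
```

Splitting the string is the easy part. Checking that the code is valid UCUM and converting between units still needs a real UCUM implementation.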
> > > >>
> > > >> Since UCUM has a well defined semantics, so does our datatype.
> > > >> Better, since UCUM is implemented in many programming languages,
> > > >> my colleague Maxime could easily integrate it into Jena and its
> > > >> SPARQL engine [2].
> > > >>
> > > >> So, with our Jena fork, one can write:
> > > >>
> > > >> SELECT ?planet WHERE {
> > > >>  ?planet a ex:Planet;
> > > >>    ex:diameter ?s .
> > > >>  FILTER(?s > "2e11 mm"^^cdt:ucum) }
> > > >
> > > > I applaud the work to extend XSD's numeric types so that RDF can
> > > > have standard measurement types. But why not leverage your work by
> > > > adding SPARQL support for UCUM types? e.g.
> > > >
> > > > SELECT ?planet WHERE {
> > > >  ?planet a ex:Planet;
> > > >    ex:diameter ?s .
> > > >  FILTER(?s > "2e11"^^ucum:mm)
> > > > }
> > > >
> > > > It feels cleaner to me to embed the entire type of the data in the
> > > > literal's datatype rather than spreading it across an aggregator type
> > > > (cdt:ucum) and the lexical value (" mm").
> > > >
> > > > In either case we probably have a union type in the lexical value
> > > > so we'd have to micro-parse doubles, decimals and integers, but the
> > > > parsing is easier if the measurement unit is broken out into the end
> > > > of the datatype URL.
> > > >
> > > > There are a few UCUM units that aren't viable localnames (e.g.
> > > > "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a
> > > > way that still makes ucum: a practical namespace for datatypes.
> > > >
> > > >
> > > >> This works if the size of the planet is encoded as a cdt:ucum, no
> > > >> matter what unit one is using. One can even use "link for
> > > >> Gunter's chain" (unit "[lk_us]"), or "cubic meters per acre"
> > > >> (unit "m3/[acr_us]") [3], which are both units of length.
> > > >>
> > > >> With some of our industrial partners, we are using this for
> > > >> energy data, and they seem to be very pleased with this approach,
> > > >> compared to an ontology-based approach.
> > > >>
> > > >>
> > > >> [1] https://w3id.org/lindt/custom_datatypes#ucum

> > > >> [2] You can try it at
> > > >> https://ci.mines-stetienne.fr/lindt/playground.html

> > > >> [3] Try this query in the playground:
> > > >>
> > > >> """
> > > >> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
> > > >> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> > > >> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
> > > >> PREFIX ex: <http://example.org/>
> > > >>
> > > >> SELECT ?length ?normalized
> > > >>
> > > >> WHERE{
> > > >>
> > > >>  VALUES ?length { "2.7e3 m3/[acr_us]"^^cdt:ucum }
> > > >>  # convert to meters
> > > >>  BIND("0 m"^^cdt:ucum + ?length AS ?normalized )
> > > >>
> > > >> }
> > > >> """
> > > >>
> > > >> --AZ
> > > >>
> > > >> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
> > > >>> Yeah, the atomicity of the chunk is the point. This even applies
> > > >>> to quantities. 25.4mm is *identical* to 1” – they are the same thing.
> > > >>> Any engine that operates with quantities needs to understand that.
> > > >>> ‘25.4’ and ‘mm’ cannot be separated. Coordinates are slightly more
> > > >>> complex but it comes down to the same thing. A single element
> > > >>> within a set of coordinates that describes a position in space
> > > >>> is not independent of the other numbers in the tuple, or of the
> > > >>> coordinate reference system within which they are expressed. One
> > > >>> value should *never* be used independent of the others. Exactly
> > > >>> the same position on the earth will be denoted by three
> > > >>> different numbers if embedded in a different coordinate
> > > >>> reference system. You can only ‘reason’ over them as a group,
> > > >>> not individually.
> > > >>>
> > > >>> *From:*Dan Brickley <danbri@danbri.org>
> > > >>> *Sent:* Thursday, 16 July, 2020 23:58
> > > >>> *To:* Jeen Broekstra <jeen@fastmail.com>
> > > >>> *Cc:* Semantic Web <semantic-web@w3.org>
> > > >>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes
> > > >>> semantics
> > > >>> - existential variables?]
> > > >>>
> > > >>> …
> > > >>>
> > > >>> I believe the big appeal of putting it all into the zone we call
> > > >>> "literals" is that you get a kind of atomicity; that chunk of
> > > >>> data is either there, or not there; it is asserted, or not
> > > >>> asserted. With a triples-based (description of a) data
> > > >>> structure you have to be constantly on your guard that every
> > > >>> subset of the full graph pattern is at least sensible and
> > > >>> harmless, even when subsetting these chunks is often confusing
> > > >>> or misleading for data consumers. I can't help wondering whether
> > > >>> notions of graph shapes from shacl, shex (and
> > > >>> sparql) could be exploited to create an RDF-based data format
> > > >>> which had atomicity at the level of entire shapes.
> > > >>>
> > > >>> Dan
> > > >>>
> > > >>>    Jeen
> > > >>>
> > > >>
> > > >> --
> > > >> Antoine Zimmermann
> > > >> Institut Henri Fayol
> > > >> École des Mines de Saint-Étienne
> > > >> 158 cours Fauriel
> > > >> CS 62362
> > > >> 42023 Saint-Étienne Cedex 2
> > > >> France
> > > >> Tél:+33(0)4 77 42 66 03
> > > >> Fax:+33(0)4 77 42 66 66
> > > >> http://www.emse.fr/~zimmermann/

> > > >> Member of team Connected Intelligence, Laboratoire Hubert Curien
> > >
> > > --
> > > Hugh
> > > 023 8061 5652
> > >
> >
Received on Sunday, 26 July 2020 10:52:52 UTC