Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?] from Hugh Glaser on 2020-07-24 (semantic-web@w3.org from July 2020)

From: Hugh Glaser <hugh@glasers.org>
Date: Fri, 24 Jul 2020 19:43:31 +0100
To: "Cox, Simon (L&W, Clayton)" <Simon.Cox@csiro.au>
Cc: Eric Prud'hommeaux <eric@w3.org>, Antoine Zimmermann <antoine.zimmermann@emse.fr>, Semantic Web <semantic-web@w3.org>, Maxime Lefrançois <maxime.lefrancois@emse.fr>
Message-Id: <6815E4DF-F8D3-4B75-B355-F8072E19F817@glasers.org>
Thanks.

So now I am trying to get my head around how this will fit into my Linked Data processing.
And also to understand just what the more general thing being proposed here is.
In particular, where does it stop?
Could we keep encoding knowledge into literals like this, with the same arguments about atomicity etc., until there is little RDF left?

I will need a micro-parser, so that when I am processing RDF, the code can either interpret these literals directly, or convert them and assert them into RDF (possibly in a canonical unit form?), so they can just be processed in the same way as all the other knowledge I am processing.
This seems quite a big change to the system to me, but maybe the value is worth it.
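
To make that concrete, here is a rough sketch of the kind of micro-parser I have in mind, in Python. It only splits the lexical form according to the <NUMBER> <SPACES> <UCUMCODE> grammar quoted further down; it does not validate the unit code, which would still need a real UCUM parser. The two ex: predicate IRIs are made up for illustration, and I am assuming <SPACES> means one or more plain space characters.

```python
import re
from decimal import Decimal

# Hypothetical predicate IRIs, used only for illustration.
EX_VALUE = "http://example.org/numericValue"
EX_UNIT = "http://example.org/unitCode"

# NUMBER ::= xsd:decimal (('e'|'E') xsd:integer)?  as in the quoted grammar.
# Assumption: SPACES means one or more U+0020 characters.
_QUANTITY = re.compile(
    r"(?P<number>[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)"
    r" +"
    r"(?P<unit>\S+)"
)


def split_quantity(lexical):
    """Split a cdt:ucum lexical form into (Decimal value, UCUM code string)."""
    match = _QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a <NUMBER> <SPACES> <UCUMCODE> string: {lexical!r}")
    # The unit code is returned as-is; checking it needs a real UCUM parser.
    return Decimal(match.group("number")), match.group("unit")


def quantity_triples(node, lexical):
    """Describe the quantity as two plain triples about the given node."""
    value, unit = split_quantity(lexical)
    return [(node, EX_VALUE, value), (node, EX_UNIT, unit)]
```

So quantity_triples("http://example.org/q1", "2e11 mm") would give me two ordinary triples I can assert alongside everything else. Converting to a canonical unit is a separate step that this sketch does not attempt.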

Also, what about other literals?
Presumably, by applying the same approach and implementation structure, we can have:
Literals for telephone numbers (ITU compliant),
ISO 19160 or whatever it is for addresses.
Librarians would presumably like a structured literal for people's names.
Can I have a similar process for URIs as literals, perhaps?

In addition to a micro-parser from the UCUM literal to RDF, presumably the inverse would be really useful: given a well-formed (in UCUM ontology terms) RDF structure, for example as the result of resolving a Linked Data URI, I would need to create the UCUM literal that it corresponds to.
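
Again only as a sketch, the inverse direction might look something like this in Python. The "RDF structure" is stood in for by a plain dict keyed on two made-up ex: predicate IRIs (I do not know what a real UCUM ontology would call them), and the datatype IRI is taken from the cdt: prefix in the quoted query below. It does not check that the unit code is valid UCUM.

```python
from decimal import Decimal

# Datatype IRI built from the cdt: prefix used in the quoted query.
CDT_UCUM = "http://w3id.org/lindt/custom_datatypes#ucum"

# Hypothetical predicate IRIs, used only for illustration.
EX_VALUE = "http://example.org/numericValue"
EX_UNIT = "http://example.org/unitCode"


def to_ucum_literal(description):
    """Build an N-Triples style cdt:ucum literal from a quantity description.

    `description` maps the two hypothetical predicates above to a numeric
    value and a UCUM code string.
    """
    value = Decimal(str(description[EX_VALUE]))
    unit = description[EX_UNIT]
    if not value.is_finite():
        raise ValueError("NaN and infinities have no xsd:decimal lexical form")
    # Reject characters that would break the literal or the lexical grammar.
    if not unit or any(ch.isspace() or ch in '"\\' for ch in unit):
        raise ValueError(f"unusable unit code: {unit!r}")
    return f'"{value} {unit}"^^<{CDT_UCUM}>'
```

For example, to_ucum_literal({EX_VALUE: "25.4", EX_UNIT: "mm"}) would produce the literal "25.4 mm" typed as cdt:ucum.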

Am I on the right lines?

> On 24 Jul 2020, at 01:12, Cox, Simon (L&W, Clayton) <Simon.Cox@csiro.au> wrote:
> 
> Yes you would need a UCUM parser. 
> 
> Note however that UCUM is not a "large vocabulary". 
> There is a relatively small set of terminals here http://unitsofmeasure.org/ucum-essence.xml , and a rule to combine these into a countably infinite set. 
> The rule is described here: http://unitsofmeasure.org/ucum.html#section-Syntax-Rules 
> 
> There are a number of implementations listed here https://unitsofmeasure.org/trac at 'Implementation Support'. 
> This documentation has not been updated for about 3 years, so some of the links might be stale, and there may be others.   
> 
> A units-of-measure library, with UCUM support, that was available to be integrated into RDF applications would be a significant contribution to the community. 
> 
> Simon 
> 
>> -----Original Message-----
>> From: Hugh Glaser <hugh@glasers.org>
>> Sent: Friday, 24 July, 2020 08:58
>> To: Eric Prud'hommeaux <eric@w3.org>
>> Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>; Semantic Web
>> <semantic-web@w3.org>; Maxime Lefrançois
>> <maxime.lefrancois@emse.fr>
>> Subject: Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
>> existential variables?]
>> 
>> If I understand correctly.
>> I will need to add a UCUM parser to my system to be able to process these
>> datatypes, if people send them to me in their RDF?
>> In fact, I will need a UCUM to RDF converter to be able to "understand"
>> properly what they "mean"?
>> Does such an animal exist?
>> 
>> It looks to me that UCUM is quite a large vocabulary of units, for a start -
>> what would the URI for the "liter" unit of measurement be, for example?
>> 
>> I'm very happy to have widely adopted standards like this - I just want to
>> keep my Semantic Web processing in the Semantic Web (RDF), and as simple
>> as possible.
>> Or at least be helped to do that.
>> 
>> Cheers
>> 
>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
>>> 
>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
>>>> Regarding physical quantities, such as "5 inches", etc., my colleague
>>>> Maxime Lefrançois and myself coauthored a specification for a
>>>> datatype for physical quantities [1]. It is quite simple: we reuse
>>>> the Unified Code for Units of Measurement (UCUM), a standard that is
>>>> used in many scientific applications, and combine it with a number:
>>>> 
>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
>>>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
>>>> 
>>>> Since UCUM has a well defined semantics, so does our datatype.
>>>> Better, since UCUM is implemented in many programming languages, my
>>>> colleague Maxime could easily integrate it into Jena and its SPARQL engine
>>>> [2].
>>>> 
>>>> So, with our Jena fork, one can write:
>>>> 
>>>> SELECT ?planet WHERE {
>>>> ?planet a ex:Planet;
>>>>   ex:diameter ?s .
>>>> FILTER(?s > "2e11 mm"^^cdt:ucum)
>>>> }
>>> 
>>> I applaud the work to extend XSD's numeric types so that RDF can have
>>> standard measurement types. But why not leverage your work by adding
>>> SPARQL support for UCUM types? e.g.
>>> 
>>> SELECT ?planet WHERE {
>>> ?planet a ex:Planet;
>>>   ex:diameter ?s .
>>> FILTER(?s > "2e11"^^ucum:mm)
>>> }
>>> 
>>> It feels cleaner to me to embed the entire type of the data in the literal's
>>> datatype rather than spreading it across an aggregator type (cdt:ucum) and
>>> the lexical value (" mm").
>>> 
>>> In either case we probably have a union type in the lexical value so we'd
>>> have to micro-parse doubles, decimals and integers, but the parsing is easier
>>> if the measurement unit is broken out into the end of the datatype URL.
>>> 
>>> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"),
>>> but I think we can encode around that (e.g. "m_s.s") in a way that still makes
>>> ucum: a practical namespace for datatypes.
>>> 
>>> 
>>>> This works if the size of the planet is encoded as a cdt:ucum, no
>>>> matter what unit one is using. One can even use "link for Gunter's
>>>> chain" (unit "[lk_us]"), or "cubic meters per acre" (unit
>>>> "m3/[acr_us]") [3], which are both units of length.
>>>> 
>>>> With some of our industrial partners, we are using this for energy
>>>> data, and they seem to be very pleased with this approach, compared
>>>> to an ontology-based approach.
>>>> 
>>>> 
>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
>>>> [2] You can try it at
>>>> https://ci.mines-stetienne.fr/lindt/playground.html
>>>> [3] Try this query in the playground:
>>>> 
>>>> """
>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>>>> PREFIX ex: <http://example.org/>
>>>> 
>>>> SELECT ?length ?normalized
>>>> 
>>>> WHERE{
>>>> 
>>>> VALUES ?position { "2.7e3 m3/[acr_us]"^^cdt:ucum }
>>>> # convert to meters
>>>> BIND("0 m"^^cdt:ucum + ?position AS ?normalized )
>>>> 
>>>> }
>>>> """
>>>> 
>>>> --AZ
>>>> 
>>>> Le 17/07/2020 à 01:57, Cox, Simon (L&W, Clayton) a écrit :
>>>>> Yeah, the atomicity of the chunk is the point. This even applies to
>>>>> quantities. 25.4mm is *identical* to 1” – they are the same thing.
>>>>> Any engine that operates with quantities needs to understand that. ’25.4’
>>>>> and ‘mm’ cannot be separated. Coordinates are slightly more complex
>>>>> but it comes down to the same thing. A single element within a set
>>>>> of coordinates that describes a position in space is not independent
>>>>> of the other numbers in the tuple, or of the coordinate reference
>>>>> system within which they are expressed. One value should *never* be
>>>>> used independent of the others. Exactly the same position on the
>>>>> earth will be denoted by three different numbers if embedded in a
>>>>> different coordinate reference system. You can only ‘reason’ over them
>>>>> as a group, not individually.
>>>>> 
>>>>> *From:*Dan Brickley <danbri@danbri.org>
>>>>> *Sent:* Thursday, 16 July, 2020 23:58
>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics
>>>>> - existential variables?]
>>>>> 
>>>>> …
>>>>> 
>>>>> I believe the big appeal of putting it all into the zone we call
>>>>> "literals" is that you get a kind of atomicity; that chunk of data
>>>>> is either there, or not there; it is asserted, or not asserted. With
>>>>> a triples-based (description of a) data structure you have to be
>>>>> constantly on your guard that every subset of the full graph pattern
>>>>> is at least sensible and harmless, even when subsetting these chunks
>>>>> is often confusing or misleading for data consumers. I can't help
>>>>> wondering whether notions of graph shapes from shacl, shex (and
>>>>> sparql) could be exploited to create an RDF-based data format which
>>>>> had atomicity at the level of entire shapes.
>>>>> 
>>>>> Dan
>>>>> 
>>>>>   Jeen
>>>>> 
>>>> 
>>>> --
>>>> Antoine Zimmermann
>>>> Institut Henri Fayol
>>>> École des Mines de Saint-Étienne
>>>> 158 cours Fauriel
>>>> CS 62362
>>>> 42023 Saint-Étienne Cedex 2
>>>> France
>>>> Tél:+33(0)4 77 42 66 03
>>>> Fax:+33(0)4 77 42 66 66
>>>> http://www.emse.fr/~zimmermann/
>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>> 
>> --
>> Hugh
>> 023 8061 5652
>> 
> 

-- 
Hugh
023 8061 5652
Received on Friday, 24 July 2020 18:43:56 UTC