Re: Blank nodes must DIE! [ was Re: Blank nodes semantics - existential variables?]

On 27/07/2020 at 22:02, Martynas Jusevičius wrote:
> Hi Antoine,
> 
> you have a vocabulary that you'd like users to adopt. But not only
> that, they would also have to adopt a software library specific to
> that vocabulary (the Jena fork), because your syntax does not use
> standard SPARQL datatype URIs. Am I getting it right?

Half right. You do not depend on the Jena fork at all. We have a 
specification of a datatype that anyone can implement in their own way, 
and we have a proof-of-concept implementation that happens to be a Jena 
fork. If you usually do semantic web stuff with Jena, integrating our 
fork is really straightforward, but it is at your own risk. We are 
not selling a product with guarantees, nor even with strong quality 
control. It works just well enough to support our claims about 
the usefulness of the datatype. We hope that the datatype, rather than its 
implementation, gets enough attention to be implemented by true 
professional developers who make a living from their coding skills.
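For readers who want to try the datatype without the Jena fork, here is a minimal Python sketch of the first step any independent implementation needs: splitting a cdt:ucum lexical form into its number and its UCUM code, following the grammar quoted further down in this thread (<NUMBER> <SPACES> <UCUMCODE>). The function name parse_ucum_literal is hypothetical, and the sketch does not check the unit code against the UCUM specification.

```python
import re
from decimal import Decimal

# Hypothetical helper, assuming the grammar given for cdt:ucum:
#   <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
#   <NUMBER>   ::= xsd:decimal(('e'|'E')xsd:integer)?
# xsd:decimal is [+-]?([0-9]+(.[0-9]*)?|.[0-9]+); the exponent is an xsd:integer.
_NUMBER = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
# One or more spaces, then the UCUM code (UCUM codes contain no whitespace).
_QUANTITY = re.compile(rf"({_NUMBER})\s+(\S+)")


def parse_ucum_literal(lexical: str) -> tuple[Decimal, str]:
    """Return (value, ucum_code), or raise ValueError on a malformed literal.

    The UCUM code is returned as-is; validating it is left to a UCUM library.
    """
    match = _QUANTITY.fullmatch(lexical)
    if match is None:
        raise ValueError(f"not a cdt:ucum lexical form: {lexical!r}")
    number, unit = match.groups()
    # Decimal keeps the exact value of the xsd:decimal part (no float rounding).
    return Decimal(number), unit
```

For instance, parse_ucum_literal("2e11 mm") gives (Decimal('2E+11'), 'mm'), and "2.7e3 m3/[acr_us]" splits into 2700 and the compound code m3/[acr_us].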

> I'm all for new vocabularies, but having to depend on your Jena fork
> in our projects would be a no-go for me. If you want broad adoption,
> standard SPARQL is the only way IMO.

GeoSPARQL has broad adoption, I believe. It integrates additional 
datatypes and special functions, so you need a special implementation to 
work with it. However, it is an OGC standard, so it is backed by 
organisations and companies, and its implementations were not put 
together in just a few days as proofs of concept. You can trust them 
more than our own cdt:ucum implementation.

But as far as I'm concerned, what I am defending here is the datatype 
cdt:ucum, not our implementation. A specification alone does not 
demonstrate anything, though, so we needed the implementation to show the 
datatype in action. You're free to distrust our Jena fork, even abhor it, 
but you may still consider integrating cdt:ucum into your environment 
somehow (though I think Maxime did a great job that is extremely 
effective, even if it was put together very quickly).

> 
> A side question (haven't followed the thread very much): how does UCUM
> compare to QUDT? https://qudt.org/

UCUM is the Unified Code for Units of Measure. It is a way to write 
units (all units) so that they can be transmitted electronically (for 
example in an email, a CSV file, JSON, Excel, or an IP packet sent by a 
sensor). QUDT is a Web ontology for the domain of measurements, units of 
measure, quantities, and related concepts. It does not define any codes 
for units and could, by the way, be used together with UCUM. But it can 
also express the precision of a measurement, the tools used to measure a 
quantity, and much more.

cdt:ucum is only a way of expressing physical quantities in a literal, 
and nothing else.
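As an illustration of what keeping the number and the unit together in one literal buys, here is a minimal Python sketch that normalises length quantities to metres and then compares plain numbers, in the spirit of the FILTER example and of Simon's "25.4mm is identical to 1 inch" remark quoted below. The conversion table and the function names are hypothetical and cover only a handful of units; a real implementation would delegate to a full UCUM library that handles the whole unit algebra.

```python
from decimal import Decimal

# Hypothetical, deliberately tiny table: metres per unit for a few UCUM
# length codes. A full UCUM library would derive these from the unit algebra
# (and also handle codes such as m2 or m3/[acr_us]).
METRES_PER_UNIT = {
    "m": Decimal("1"),
    "mm": Decimal("0.001"),
    "cm": Decimal("0.01"),
    "km": Decimal("1000"),
    "[in_i]": Decimal("0.0254"),  # international inch, exactly 2.54 cm
}


def to_metres(value: Decimal, unit: str) -> Decimal:
    """Normalise a length quantity to metres; unknown units raise KeyError."""
    return value * METRES_PER_UNIT[unit]


def same_quantity(a: tuple[Decimal, str], b: tuple[Decimal, str]) -> bool:
    """Two lengths are equal iff they denote the same number of metres."""
    return to_metres(*a) == to_metres(*b)
```

With this, (25.4, "mm") and (1, "[in_i]") compare as equal, and a filter such as ?s > "2e11 mm" becomes a comparison against 2e8 metres. The design point is that the value and the unit travel as one atomic literal, so no consumer can pick up the number without its unit.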

--AZ


> 
> Martynas
> 
> On Fri, Jul 24, 2020 at 9:49 PM Antoine Zimmermann
> <antoine.zimmermann@emse.fr> wrote:
>>
>> Hugh,
>>
>> I will answer as if you are asking about our own proposal (cdt:ucum),
>> and not Eric's proposal (ucum:m, ucum:W, ucum:s, etc).
>>
>> On 24/07/2020 at 00:58, Hugh Glaser wrote:
>>> If I understand correctly.
>>> I will need to add a UCUM parser to my system to be able to process these datatypes, if people send them to me in their RDF?
>>
>> Yes, exactly.
>>
>>> In fact, I will need a UCUM to RDF converter to be able to "understand" properly what they "mean"?
>>
>> No, you need an RDF toolkit with support for cdt:ucum. It is the same as
>> when you want to use geo:wktLiteral from the GeoSPARQL standard. Other
>> RDF parsers will still work, but they won't be able to understand what
>> the value is.
>>
>>> Does such an animal exist?
>>
>> Yes. My colleague Maxime Lefrançois implemented it as a fork of Apache
>> Jena. You can try it at
>> https://ci.mines-stetienne.fr/lindt/playground.html. There are example
>> queries to play with.
>>
>>> It looks to me that UCUM is quite a large vocabulary of units, for a start - what would the URI for the "liter" unit of measurement be, for example?
>>
>> There are infinitely many units. E.g., meters (m), square meters (m2),
>> cubic meters (m3), meters to the fourth (m4), etc. There is an infinite
>> algebra of units of measures. Nothing to be afraid of: there are
>> infinitely many integers, and yet xsd:integer is well supported.
>>
>>> I'm very happy to have widely adopted standards like this - I just want to keep my Semantic Web processing in the Semantic Web (RDF), and as simple as possible.
>>> Or at least be helped to do that.
>>
>> The alternative approaches always require specific code. If you exclude
>> new datatypes (sticking to XSD and RDF datatypes), you need code that
>> interprets some vocabulary, e.g.:
>>
>> <quantity> :numericValue 10;
>>     :unit <inches> .
>>
>> You have to retrieve the relevant triples to reconstruct the quantity.
>> You have to deal with the cases where there are missing triples:
>>
>> <quantity> :numericValue 10 .
>>
>> or multiple values:
>>
>> <quantity> :numericValue 10;
>>     :unit <inches> .
>> <quantity> :numericValue 25.4;
>>     :unit <meters> .
>>
>> It is complicated to use and implement, and cumbersome to query.
>>
>>
>> --AZ
>>
>>>
>>> Cheers
>>>
>>>> On 23 Jul 2020, at 23:06, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>>
>>>> On Tue, Jul 21, 2020 at 02:35:02PM +0200, Antoine Zimmermann wrote:
>>>>> Regarding physical quantities, such as "5 inches", etc., my colleague Maxime
>>>>> Lefrançois and I coauthored a specification for a datatype for physical
>>>>> quantities [1]. It is quite simple: we reuse the Unified Code for Units of
>>>>> Measurement (UCUM), a standard that is used in many scientific applications,
>>>>> and combine it with a number:
>>>>>
>>>>> <QUANTITY> ::= <NUMBER> <SPACES> <UCUMCODE>
>>>>> <NUMBER> ::= xsd:decimal(('e'|'E')xsd:integer)?
>>>>>
>>>>> Since UCUM has well-defined semantics, so does our datatype. Better, since
>>>>> UCUM is implemented in many programming languages, my colleague Maxime could
>>>>> easily integrate it into Jena and its SPARQL engine [2].
>>>>>
>>>>> So, with our Jena fork, one can write:
>>>>>
>>>>> SELECT ?planet WHERE {
>>>>>    ?planet a ex:Planet;
>>>>>      ex:diameter ?s .
>>>>>    FILTER(?s > "2e11 mm"^^cdt:ucum)
>>>>> }
>>>>
>>>> I applaud the work to extend XSD's numeric types so that RDF can have standard measurement types. But why not leverage your work by adding SPARQL support for UCUM types? e.g.
>>>>
>>>> SELECT ?planet WHERE {
>>>>    ?planet a ex:Planet;
>>>>      ex:diameter ?s .
>>>>    FILTER(?s > "2e11"^^ucum:mm)
>>>> }
>>>>
>>>> It feels cleaner to me to embed the entire type of the data in the literal's datatype rather than spreading it across an aggregator type (cdt:ucum) and the lexical value (" mm").
>>>>
>>>> In either case we probably have a union type in the lexical value so we'd have to micro-parse doubles, decimals and integers, but the parsing is easier if the measurement unit is broken out into the end of the datatype URL.
>>>>
>>>> There are a few UCUM units that aren't viable localnames (e.g. "m/s.s"), but I think we can encode around that (e.g. "m_s.s") in a way that still makes ucum: a practical namespace for datatypes.
>>>>
>>>>
>>>>> This works if the size of the planet is encoded as a cdt:ucum, no matter
>>>>> what unit one is using. One can even use "link for Gunter's chain" (unit
>>>>> "[lk_us]"), or "cubic meters per acre" (unit "m3/[acr_us]") [3], which are
>>>>> both units of length.
>>>>>
>>>>> With some of our industrial partners, we are using this for energy data, and
>>>>> they seem to be very pleased with this approach, compared to an
>>>>> ontology-based approach.
>>>>>
>>>>>
>>>>> [1] https://w3id.org/lindt/custom_datatypes#ucum
>>>>> [2] You can try it at https://ci.mines-stetienne.fr/lindt/playground.html
>>>>> [3] Try this query in the playground:
>>>>>
>>>>> """
>>>>> PREFIX iter: <http://w3id.org/sparql-generate/iter/>
>>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>>> PREFIX cdt: <http://w3id.org/lindt/custom_datatypes#>
>>>>> PREFIX ex: <http://example.org/>
>>>>>
>>>>> SELECT ?length ?normalized
>>>>>
>>>>> WHERE{
>>>>>
>>>>>    VALUES ?length { "2.7e3 m3/[acr_us]"^^cdt:ucum }
>>>>>    # convert to meters
>>>>>    BIND("0 m"^^cdt:ucum + ?length AS ?normalized )
>>>>>
>>>>> }
>>>>> """
>>>>>
>>>>> --AZ
>>>>>
>>>>> On 17/07/2020 at 01:57, Cox, Simon (L&W, Clayton) wrote:
>>>>>> Yeah, the atomicity of the chunk is the point. This even applies to
>>>>>> quantities. 25.4mm is *identical* to 1” – they are the same thing. Any
>>>>>> engine that operates with quantities needs to understand that. ‘25.4’
>>>>>> and ‘mm’ cannot be separated. Coordinates are slightly more complex but
>>>>>> it comes down to the same thing. A single element within a set of
>>>>>> coordinates that describes a position in space is not independent of the
>>>>>> other numbers in the tuple, or of the coordinate reference system within
>>>>>> which they are expressed. One value should *never* be used independent
>>>>>> of the others. Exactly the same position on the earth will be denoted by
>>>>>> three different numbers if embedded in a different coordinate reference
>>>>>> system. You can only ‘reason’ over them as a group, not individually.
>>>>>>
>>>>>> *From:*Dan Brickley <danbri@danbri.org>
>>>>>> *Sent:* Thursday, 16 July, 2020 23:58
>>>>>> *To:* Jeen Broekstra <jeen@fastmail.com>
>>>>>> *Cc:* Semantic Web <semantic-web@w3.org>
>>>>>> *Subject:* Re: Blank nodes must DIE! [ was Re: Blank nodes semantics -
>>>>>> existential variables?]
>>>>>>
>>>>>> …
>>>>>>
>>>>>> I believe the big appeal of putting it all into the zone we call
>>>>>> "literals" is that you get a kind of atomicity; that chunk of data is
>>>>>> either there, or not there; it is asserted, or not asserted. With a
>>>>>> triples-based (description of a) data structure, you have to be
>>>>>> constantly on your guard that every subset of the full graph pattern is
>>>>>> at least sensible and harmless, even when subsetting these chunks is
>>>>>> often confusing or misleading for data consumers. I can't help wondering
>>>>>> whether notions of graph shapes from shacl, shex (and sparql) could be
>>>>>> exploited to create an RDF-based data format which had atomicity at the
>>>>>> level of entire shapes.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>      Jeen
>>>>>>
>>>>>
>>>>> --
>>>>> Antoine Zimmermann
>>>>> Institut Henri Fayol
>>>>> École des Mines de Saint-Étienne
>>>>> 158 cours Fauriel
>>>>> CS 62362
>>>>> 42023 Saint-Étienne Cedex 2
>>>>> France
>>>>> Tél:+33(0)4 77 42 66 03
>>>>> Fax:+33(0)4 77 42 66 66
>>>>> http://www.emse.fr/~zimmermann/
>>>>> Member of team Connected Intelligence, Laboratoire Hubert Curien
>>>
>>
>>

Received on Monday, 27 July 2020 20:47:19 UTC