Re: Prefixes in S-expressions

This is addressed in the GitHub develop branch of the SXP, SPARQL and related gems, as described in https://github.com/dryruby/sxp.rb/issues/20#issuecomment-996277338.

There will be a 1.2 release of SXP and 3.2 release of SPARQL coming once other outstanding issues are addressed, likely by the end of the year.

Gregg Kellogg
gregg@greggkellogg.net

> On Dec 6, 2021, at 11:11 AM, Gregg Kellogg <gregg@greggkellogg.net> wrote:
> 
>> On Dec 6, 2021, at 6:10 AM, Daniel Hernandez <daniel@degu.cl> wrote:
>> 
>> 
>> Hi,
>> 
>> I am surprised because URIs are translated to S-expressions in different
>> ways.  That means that I can get:
>> 
>> uri1 == uri2  #=> true
>> uri1.to_sxp   #=> wdt:P31
>> uri2.to_sxp   #=> <http://www.wikidata.org/prop/direct/P31>
>> 
>> In this case uri1 is translated using a prefix.  To create a URI that is
>> translated using a prefix, the URI has to be inside a query using that
>> prefix.  For instance, the following query:
> 
> When parsing SPARQL, the parser records any original PName/CURIE in the URI instance using RDF::URI#lexical, defined in the SXP gem. This is to ease the process of re-serializing the parsed URI to SXP using the original parsed PNAME_LN production. If you serialize just the URI outside of the query context, you’ll lose the prefix definition.
> 
>> query = <<~QUERY
>>  PREFIX wdt: <http://www.wikidata.org/prop/direct/>
>>  PREFIX wd: <http://www.wikidata.org/entity/>
>>  SELECT *
>>  WHERE { ?person wdt:P31 wd:Q5 }
>> QUERY
>> exp = SPARQL.parse(query).to_sxp_bin
>> 
>> => [:prefix,
>> [[:"wdt:", #<RDF::URI:0xfe4 URI:http://www.wikidata.org/prop/direct/>],
>>  [:"wd:", #<RDF::URI:0xff0 URI:http://www.wikidata.org/entity/>]],
>> [:bgp,
>>  [:triple,
>>   #<RDF::Query::Variable:0xfb8(?person)>,
>>   #<RDF::URI:0x1004 URI:http://www.wikidata.org/prop/direct/P31>,
>>   #<RDF::URI:0xfbc URI:http://www.wikidata.org/entity/Q5>]]]
>> 
>> Then I can print the query as an S-expression:
>> 
>> puts exp.to_sxp
>> 
>> => "(prefix ((wdt: <http://www.wikidata.org/prop/direct/>)
>>             (wd: <http://www.wikidata.org/entity/>))
>>            (bgp (triple ?person wdt:P31 wd:Q5)))"
>> 
>> And I can print a part of the query:
>> 
>> puts exp[2].to_sxp
>> 
>> => "(bgp (triple ?person wdt:P31 wd:Q5))"
>> 
>> However, the second sxp is wrong, because it does not define the prefixes.
>> I also noticed that two equivalent URIs can be printed differently:
> 
> Yes, as noted above. Perhaps a more sophisticated SXP serializer would see if the expression includes the prefix definition before using the stored lexical value, or the serializer would include something in the recursive context so that when serializing the URI, it would know if the saved lexical representation could be re-used. This is all done in SXP::Writer and/or SXP::Generator (see https://github.com/dryruby/sxp.rb/blob/develop/lib/sxp/writer.rb#L161).
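
To make the idea in the paragraph above concrete, here is a minimal sketch of a scope-aware serializer in plain Ruby. It does not use SXP::Writer or SXP::Generator: `Iri` is a hypothetical stand-in for RDF::URI with its stored lexical value, `scoped_sxp` is a made-up name, and expressions are modelled as nested Arrays. The stored PName is reused only when an enclosing (prefix ...) form defines its prefix.

```ruby
# Hypothetical stand-in for RDF::URI: the full IRI plus the PName it was parsed from.
Iri = Struct.new(:value, :lexical)

# Serialize a nested Array expression to an SXP-like string.
# `scope` holds the prefix labels (e.g. "wdt:") defined by enclosing (prefix ...) forms.
def scoped_sxp(node, scope = [])
  case node
  when Iri
    prefix = node.lexical && node.lexical[/\A[^:]*:/]
    # Reuse the stored PName only when its prefix is actually in scope.
    prefix && scope.include?(prefix) ? node.lexical : "<#{node.value}>"
  when Array
    if node.first == :prefix
      # node is [:prefix, [[label, iri], ...], body]; extend the scope for its children.
      scope += node[1].map { |label, _iri| label.to_s }
    end
    "(" + node.map { |child| scoped_sxp(child, scope) }.join(" ") + ")"
  else
    node.to_s
  end
end

p31 = Iri.new("http://www.wikidata.org/prop/direct/P31", "wdt:P31")
q5  = Iri.new("http://www.wikidata.org/entity/Q5", "wd:Q5")
bgp = [:bgp, [:triple, :"?person", p31, q5]]
exp = [:prefix,
       [[:"wdt:", Iri.new("http://www.wikidata.org/prop/direct/")],
        [:"wd:", Iri.new("http://www.wikidata.org/entity/")]],
       bgp]

puts scoped_sxp(exp) # whole query: PNames are kept
puts scoped_sxp(bgp) # fragment only: full IRIs are written
```

With this approach the fragment stays valid on its own, at the cost of threading the scope through every recursive call.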
> 
> Alternatively, the SPARQL query optimizer explicitly clears these lexical representations recursively. See https://github.com/ruby-rdf/sparql/blob/b100da5f2c43cf8ada56cd114fef6ee32fa0696e/lib/sparql/algebra/extensions.rb#L457-L474.
> 
>> uri1 = exp[2][1][2]
>> uri2 = RDF::URI.new uri1.to_s
>> 
>> uri1 == uri2  #=> true
>> uri1.to_sxp   #=> wdt:P31
>> uri2.to_sxp   #=> <http://www.wikidata.org/prop/direct/P31>
>> 
>> I see two ways to fix the second pattern:
>> 
>> 1. Generate the sxp without prefixes.
>>   (i.e., print <http://www.wikidata.org/prop/direct/P31>)
>> 
>> 2. Add the prefixes to the second expression
>>   (i.e., add the prefix wdt: <http://www.wikidata.org/prop/direct/>
>>   to the expression).
> 
> What we don’t have is a comprehensive way to add or remove a prefix definition to an S-Exp and recursively re-write the embedded URIs to take this into consideration, which could be useful.
> 
>> Hence, I have two questions:
>> 
>> 1) Can I get the representation of an expression without prefixes?
>>   For instance, with a parameter like this:
>> 
>>   exp[2].to_sxp(with_prefixes: false)
>> 
>>   => (bgp (triple ?person
>>                   <http://www.wikidata.org/prop/direct/P31>
>>                   <http://www.wikidata.org/entity/Q5>))
> 
> As mentioned above, probably the right way to handle this would be to have methods for specifically managing prefix and base definitions within an S-Exp and leave it to that logic to do the right thing. Something which explicitly managed clearing all prefix and base definitions in an S-Exp to return the full expanded form would be generally interesting. That would go somewhere in the SXP gem.
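
As a rough illustration of the "full expanded form" described above, the following plain-Ruby sketch strips every (prefix ...) wrapper from a nested Array expression and clears the stored PNames. `ParsedIri` and `expand_prefixes` are hypothetical names, not part of the SXP gem, and base definitions are not handled here.

```ruby
# Hypothetical stand-in for RDF::URI: the full IRI plus the PName it was parsed from.
ParsedIri = Struct.new(:value, :lexical)

# Return a copy of `node` with every (prefix ...) wrapper removed and every
# stored PName cleared, so that only full IRIs remain.
def expand_prefixes(node)
  case node
  when ParsedIri
    ParsedIri.new(node.value, nil)
  when Array
    if node.first == :prefix
      # node is [:prefix, declarations, body]; keep only the expanded body.
      expand_prefixes(node.last)
    else
      node.map { |child| expand_prefixes(child) }
    end
  else
    node
  end
end

exp = [:prefix,
       [[:"wdt:", ParsedIri.new("http://www.wikidata.org/prop/direct/")],
        [:"wd:", ParsedIri.new("http://www.wikidata.org/entity/")]],
       [:bgp,
        [:triple, :"?person",
         ParsedIri.new("http://www.wikidata.org/prop/direct/P31", "wdt:P31"),
         ParsedIri.new("http://www.wikidata.org/entity/Q5", "wd:Q5")]]]

expanded = expand_prefixes(exp)
```

The function returns a new tree rather than mutating the input, so the original expression can still be serialized with its prefixes.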
> 
>> 2) Can I get the prefix that each URI uses, as follows:
>> 
>>   uri1.get_prefix  #=> [:"wdt:", #<RDF::URI:0xfe4 URI:http://www.wikidata.org/prop/direct/>]
> 
> I think you found one way to do this in a follow-on by using RDF::URI#qname, although this is somewhat archaic (though not deprecated), and RDF::Vocabulary.find_term might be more efficient.
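
If the prefix table is already at hand, the lookup asked about in question 2 can also be sketched without any gem: pick the declared namespace that is the longest leading match of the IRI. `find_prefix` is a hypothetical helper, not the RDF::URI#qname or RDF::Vocabulary.find_term implementation, and the extra `wikidata:` entry is invented to show the longest-match rule.

```ruby
# Return the [label, namespace] pair whose namespace is the longest
# leading match of `iri`, or nil when no declared namespace matches.
def find_prefix(iri, prefixes)
  prefixes
    .select { |_label, namespace| iri.start_with?(namespace) }
    .max_by { |_label, namespace| namespace.length }
end

prefixes = {
  "wikidata:" => "http://www.wikidata.org/", # invented entry, shorter match
  "wdt:"      => "http://www.wikidata.org/prop/direct/",
  "wd:"       => "http://www.wikidata.org/entity/"
}

p find_prefix("http://www.wikidata.org/prop/direct/P31", prefixes)
```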
> 
> I created an issue for this: https://github.com/dryruby/sxp.rb/issues/20. It would be great if you were up to submitting a PR for this.
> 
> Gregg
> 
>> Thanks,
>> Daniel

Received on Thursday, 16 December 2021 23:45:15 UTC