Re: Diff'ing RDF files

Hi,

Curious that this is coming up just as we're working to get consistent 
formatting for RDF (TTL for now) out the door on behalf of QUDT.

We looked into canonicalization, but the downside you mention is a 
non-starter if you want to track changes with a version control system, 
so we're just reproducing the input order of blank nodes by hacking into 
the Jena TTL parser.
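
To illustrate the idea, here is a toy sketch in Python. It is not what 
turtle-formatter or Jena actually does: the function names and the 
example data are made up, and it assumes simple N-Triples-style lines 
with no "_:" sequences inside literals. Blank nodes are renamed in order 
of first appearance, so a small edit leaves the earlier labels alone and 
a plain line diff stays small.

```python
import difflib
import re

# Matches blank node labels such as _:b0 or _:genid42.
# Simplification: assumes "_:" never occurs inside a literal or IRI.
BNODE = re.compile(r"_:[A-Za-z0-9]+")


def relabel_by_first_appearance(lines):
    """Rename blank nodes to _:b0, _:b1, ... in order of first occurrence."""
    mapping = {}

    def rename(match):
        label = match.group(0)
        if label not in mapping:
            mapping[label] = f"_:b{len(mapping)}"
        return mapping[label]

    return [BNODE.sub(rename, line) for line in lines]


def crude_diff(old_lines, new_lines):
    """Line diff of the two relabeled documents; keeps only added/removed lines."""
    old = relabel_by_first_appearance(old_lines)
    new = relabel_by_first_appearance(new_lines)
    return [
        line
        for line in difflib.ndiff(old, new)
        if line[:2] in ("+ ", "- ")
    ]


old_doc = [
    '_:x <http://example.org/name> "Alice" .',
    '_:y <http://example.org/name> "Bob" .',
]
# Same data with different parser-assigned labels, plus one appended triple.
new_doc = [
    '_:g1 <http://example.org/name> "Alice" .',
    '_:g2 <http://example.org/name> "Bob" .',
    '_:g2 <http://example.org/age> "42" .',
]

changes = crude_diff(old_doc, new_doc)
print("\n".join(changes))
```

Only the appended triple shows up as a change here. The trade-off is 
that labels now depend on statement order, so unlike canonicalization 
this does not detect isomorphic graphs whose statements were reordered.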

code: https://github.com/atextor/turtle-formatter

which is being plugged into

https://github.com/diffplug/spotless/ (maven plugin for now)

Bottom line: you'll be able to format TTL consistently with the spotless 
maven plugin soonish. Maybe one day, you won't even lose your comments.

Reach out if you want to help make it work for other formats or if you 
want a Gradle/sbt plugin.

Best regards,
Florian

On 2024-09-13 16:18, Pierre-Antoine Champin wrote:
> Dear all,
> 
> yesterday during the RDF-star working group call, I mentioned that RDF 
> canonicalization [1] can be used to build a crude RDF "diff" tool, and 
> that I was using a small script that I wrote for that. Other 
> participants expressed interest in this script, so I cleaned it up a 
> bit and published it here:
> 
> https://gist.github.com/pchampin/7017fa5ff607e5bedf65e2f9954cfd46
> 
> As indicated at the top, it relies on my Sophia library [2] for parsing 
> and canonicalizing, but it can be easily adapted to use other 
> command-line tools (for a while, I was using Gregg Kellogg's Ruby 
> implementation [3]).
> 
> Note that I describe it as a *crude* tool because
> 
> - if the two graphs/datasets are isomorphic (i.e. identical modulo blank 
> node labels), it will show no difference,
> - BUT if there is only the slightest difference, the tool may report a 
> lot of changes, not all of them relevant.
> 
> This is due to the fact that even a small difference can cause the 
> canonicalization to relabel blank nodes in a completely different way. 
> So even blank nodes that were not impacted by the change may end up 
> with different names, and so the text diff applied to the canonical 
> form will report those as changes.
> 
> But despite these "false positives", I find it quite useful, and you 
> might too. In particular, if the changes only impact triples/quads on 
> IRIs and literals, the diff will be "exact".
> 
>   best
> 
> [1] https://github.com/w3c/rdf-canon
> [2] https://github.com/pchampin/sophia_rs
> [3] https://ruby-rdf.github.io/

Received on Friday, 13 September 2024 18:10:40 UTC