- From: Dan Brickley <danbri@danbri.org>
- Date: Fri, 11 Jun 2021 10:16:32 +0100
- To: David Booth <david@dbooth.org>
- Cc: semantic-web@w3.org
- Message-ID: <CAFfrAFq+87oGULWSo-c5jzk=vr88EkETuxRsPHZd2T8EAxjQMQ@mail.gmail.com>
It looks like some formatting of this email may not be showing up properly, but it seems readable at https://lists.w3.org/Archives/Public/semantic-web/2021Jun/0102.html

Apologies for the noise!

Dan

On Fri, 11 Jun 2021 at 10:08, Dan Brickley <danbri@danbri.org> wrote:

> (Sorry, this is long.)
>
> On Fri, 11 Jun 2021 at 00:19, David Booth <david@dbooth.org> wrote:
>
>> On 6/10/21 11:08 AM, Ivan Herman wrote:
>>>> On 10 Jun 2021, at 16:13, David Booth <david@dbooth.org
>>>> I still feel like I am somehow missing a fundamental assumption that others are making and I have not yet been able to identify.
>
> Know the feeling.
>
> […]
>
>> The other thing that I still fundamentally do not yet grasp about the proposed charter is this: Why is it restricted to RDF source documents? Clearly the canonicalization algorithm is about RDF, so that much I understand. But for the digital signature vocabulary, why wouldn't it also be useful to be able to sign, say, a PDF document? Why should the RDF signing vocabulary be limited to talking about RDF documents? Or am I misunderstanding the intent here? Perhaps if there were a simple, complete example, it would help. Again, I feel like I am missing some of the assumed context.
>
> This is a very reasonable and pertinent question.
>
> The vast, vast majority of content on the web is not best considered as being purely RDF, even if you can often project out an RDF view of it. And the bits that are RDF-based rarely end up in anything most folks would consider an “RDF store”. But the distinction is a strange one, as explored below.
>
> There is a *lot* of non-RDF data out there. I can hardly believe this needs emphasis in 2021, but here we are.
>
> This content (or rather, the billions of web users touched by it) deserves ways of being assured by modern, W3C-standard signatures.
>
> We should not gloss over the scale of this. We are talking about petabytes of data here. Literally the entire contents of the web, for starters.
>
> There are lots of other data-related formats out there on the web: CSS, CSV, SQL dumps, the email mbox format, iCalendar, vCard, non-RDF property graphs, MARC, YAML, XML itself of course, Protobufs, Apache Arrow, microformats, HTML5, HTML-anything, SGML, MIDI files, MP3, WAV, Prolog/Datalog, OWL, N3, RIF, … package formats like ZIP, JAR, … image formats like PNG, JPG, GIF, … The SVG case is interesting (“this is definitely our logo”), but the others have embedded metadata too, EXIF not being RDF whilst XMP is RDF. Maybe OWL/RIF/N3 could use their “compiled down to triples” view, but should we extend that argument to everyone else? RDF-star? What about SPARQL queries? Windows .ini files, …? PDFs, Flash, video file formats? In an age of misinformation facilitated by the web, it is not obvious that W3C’s next Data Signature WG should cover only “Linked” RDF data and ignore media formats. What about robots.txt files? CBOR? The .sna format for snapshots of ZX Spectrum games? VMware images? .iso disk images?
>
> I could go on, but https://en.m.wikipedia.org/wiki/List_of_file_formats exists. We didn’t even touch coding languages (JS, Java, JVM, WASM, GLSL, COBOL), or notebook formats. Or .rtf files.
>
> Even if you are solely concerned with some more restricted notion of “data”, there is still a lot out there; e.g. take a look around using https://datasetsearch.research.google.com/ to see what is showing up from research, science, government, etc.
>
> Should Protein Data Bank files be RDFized before they fall within the scope of this new WG’s mission? https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format) And if so, why?
>
> With that potentially huge scope established, what does W3C do in this area?
>
> Remember that RDF was introduced as a metadata system: its founding purpose was exactly to describe the sprawling chaos of the above file format diversity, not to replace it all with triples.
>
> I understand that XML Signature can (or could) sign any format via detached signatures, but it may also be showing its age as a standard, having been created 21+ years ago. Maybe it is time for it to be superseded by something more modern from W3C, with the option to drop in format-specific canonicalization steps, such as bnode labelling etc. for RDF? Why not create *that* WG rather than this one, given W3C’s limited resources?
>
> https://www.w3.org/TR/xmldsig-core2/#sec-Introduction tells us: “This document specifies XML syntax and processing rules for creating and representing digital signatures. XML Signatures can be applied to any digital content (data object) <https://www.w3.org/TR/xmldsig-core2/#def-DataObject>,”
>
> The last actual W3C REC says the same: https://www.w3.org/TR/xmldsig-core1/#sec-Introduction
>
> So that’s a W3C Recommendation that W3C currently says is up to the job. It is old, its flaws are well known, and it isn’t clear whether it has been abandoned or is just in maintenance mode, but it remains a recommended standard for now.
>
> Please indulge a thought experiment.
>
> For all the non-RDF formats I touched on above, should they (a) use XML detached signatures, (b) go through the lengthy and painful process of trying to create a rich RDFS/OWL-facilitated model of their content as an RDF graph, so they can sign it using Linked Data Signatures, or, my suggestion below, (c) stick it all in one triple in an easily round-trippable way? Let’s explore (c).
>
> Apache HTTPd server logs could trivially be mapped into RDF’s data model too, as could anything. Would this be in scope of the new WG per its charter? (I am ignoring input docs for now, as Ivan has advised.)
>
> Let’s define such a mapping from bytes to a one-triple graph, and call it the “Retro Graph Mapping” (RGM). It retrospectively maps any byte sequence into an RDF graph. It is similar in spirit to the idea of RDF graph literals, perhaps.
>
> For any sequence of bytes ‘bs’, create a corresponding space-separated sequence of lowercase hex-encoded character pairs, one pair per byte. RGM graphs have one triple, which varies only in its literal value and datatype. For example:
>
>     <file:/dev/🦖/RGMv1> rdf:value "hex sequence here"^^wikidata:Q5153426 .
>
> A formal spec will follow, but the basic idea is a triple-ization in which the subject and predicate URIs are fixed, there is no language tag, and everything is in a single value. The rdf: and wikidata: prefixes and 🦖 are used here for a simple example; the datatype is an optional format identifier that could be discarded. A similar approach could be used to generate multi-graph datasets. A discardable filename preference could be packed into the file: URI too, if desired. For text-oriented content it might be worth considering a more readable representation than hex codes, but RGM is not designed for humans to read. RGM is not especially useful for data access, SPARQL etc., but it will make *any* kind of data signable by the work about to be chartered by W3C.
>
> RGM can reflect any data format into a single-triple, very efficiently sorted, bnode-free RDF graph. This is both stupid and powerful.
>
> Is the coordinated, lead-the-web-to-its-full-potential W3C view here that both of the following are simultaneously true?
>
> 1. XML Signature is good enough for all of the world’s files and data except for the RDF-graph cases.
> 2. XML Signature is so inappropriate and/or outdated that it is barely mentioned for the case of signing RDF in the proposed charter and explainer.
>
> If the truth is that XML Signature is a pain point for W3C in 2021, then the fact that W3C is about to spin up a WG that can do some of the same things deserves more attention than the zero mentions the draft charter gives the topic.
>
> The draft Signed Linked Data WG explainer says “roughly, the same approach as for XML [xmldsig-core1 <https://w3c.github.io/lds-wg-charter/explainer.html#bib-xmldsig-core1>].”, and yet the suggested charter itself does not mention XML, let alone W3C’s huge piece of work in this area, XML Signature. This is despite the fact that, for any piece of data on the web, W3C offers both XML Signature and (via RGM’s retro-graph mapping into a triple) Linked Data Signatures as potentially relevant technologies.
>
> We know that all web content can be turned into a normalized triple via RGM, as I sketch above. Or it could be signed with XML Signature.
>
> For cases like CSV, YAML, SVG, is there *anything* to be gained in the signature world from doing a more careful and fine-grained mapping into RDF, beyond just avoiding having to use 20-year-old XML-flavoured signature technology? Is RGM too stupid to use, leaving those formats behind?
>
> Why should RDF content get modernized web-standard signature tech first? Why not make something modern for the content of the entire World Wide Web and then plug in the bnode-labelling preprocessor for the RDF special case?
>
> The fact that W3C proposes to do new REC-track work on RDF Signature, while simultaneously leaving its ancient XML Signature Recommendation roaming the earth like an undead dinosaur, ought to ring alarm bells here. What are the prospects of this new RDF work being carefully maintained by W3C in 20 years? It feels like this essentially general-purpose piece of new work is being put through as a Linked Data thing because, when evaluated by the wider set of stakeholders, it would attract more skepticism than enthusiasm.
>
> It is always easier to create new things than to curate old messes, and it is always easier to scope things tightly than to risk a design by committee that nearly-kinda meets everyone’s goals. The idea of entangling this new set of work items with XML Signature ought to be slightly terrifying, but cross-domain standards coordination is W3C’s core duty and strength.
>
> And yet any/all web content can be trivially brought into the scope of the new WG via RGM, which puts us substantially in the same territory as that currently occupied by the existing XML Signature W3C REC.
>
> I know it is annoying to introduce new terminology, but I do so here in pursuit of consensus. Since any data can (via RGM, or via hard work ontologizing) become “linked data” sufficiently to be signable via the proposed new standards, we can ask ourselves whether custodians of data in non-RDF formats would gain anything by doing so. If they would, the WG scope should be admitted to be data signing, not linked-data signing. If not, I’d like to understand why. Of course I understand the general benefits of using RDF more wholeheartedly, but for the case of signing specifically, the picture does not yet feel clear.
>
> Dan
>
>> Thanks,
>> David Booth
Received on Friday, 11 June 2021 09:17:24 UTC