Re: GRDDL for BigData or CSVW for Avro?

> On 3. Jun 2022, at 17:51, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> 
> As a starting proposition, you'd probably want to apply some sort of striping assumption and map avro:type to rdf:type. (Our goal is not to have an RDF representation of the schema, but a defined RDF graph for any instance data conforming to the schema.)
> 
> [[
> { <-- S is a fresh BNode (or steal @id conventions from JSON-LD)
> "type": "record", <-- emit { S rdf:type <record> .} (outer-most frame only)
> "namespace": "example.avro", <-- no effect; used for avro resolution

It took me a while to understand what you wrote there, as my Apple
Mail client mangled all the whitespace, so it ended up looking like
the picture I posted on Twitter here:
https://twitter.com/bblfish/status/1533823391148425216

I tried reformatting it in my IDE, but because of the comments the
above is not valid JSON. So I rewrote it in YAML, which has a syntax
for comments. I put the minimal YAML here:
https://gist.github.com/bblfish/7d7bc62c0c5612649b8bc2135633226e
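
For anyone who wants to see the JSON problem for themselves, here is a
minimal sketch in Python using only the standard json module. The
two-line snippet is a cut-down stand-in for the annotated schema above,
not the full thing.

```python
import json

# A cut-down stand-in for the annotated schema: the "<--" note is not JSON.
annotated = '''{
  "type": "record",  <-- emit { S rdf:type <record> . }
  "namespace": "example.avro"
}'''

try:
    json.loads(annotated)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("not valid JSON:", err)

# The same content without the annotation parses fine.
plain = json.loads('{"type": "record", "namespace": "example.avro"}')
print(plain["type"])
```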

That too won’t display well for others using Apple Mail, because
block-style YAML relies on indentation to delimit blocks.

But after asking on the YAML Gitter channel I found a tool,
https://github.com/pantoniou/libfyaml, that can transform my
easy-to-read YAML into mail-ready flow-style YAML
by running

$ fy-dump --comment -mflow ericP.avro2bin.yaml > ericP.avro2bin.mailready.yaml

[[
{
  type: record, # <-- emit { S rdf:type <record> .} (outer-most frame only)
  namespace: example.avro, # <- no effect; used for avro resolution
  name: array_union, # <- no effect (i don't even know what this is for)
  fields: # <-- implies a list of nested statements with subject S
  [
    {
      name: study i, # <-- V for a complex type is a fresh BNode
      "@id": http:...name, # <-- emit { S <http:...study> V . }
      type: # <-- S := V
      {
        name: study, # <- no effect
        type: record, # <-- no effect
        fields: # <-- implies a list of nested statements with subject V
        [
          {
            name: name, # <-- validates a value V
            type: string, # <-- with DT := xsd:string
            "@id": http:...name # <-- emit { S <http:...name> V^^DT . }
          },
          {
            name: corpus, # <-- V is a fresh BNode
            "@id": http:...corpus, # <-- P :=
            type: [
              null, # <-- No triple emitted if null (at least, that was the DirectMapping choice)
              {
                type: array, # <-- in a type 'array', so keep track of tail of list: TAIL := {
                             #                  S <http:...corpus>  @TBD . }'
                name: corpus_name_0,
                items: # <-- for each item
                {
                  name: _name_0, # <-- no effect
                  type: record, # <-- LI := fresh BNode;
                                # emit TAIL with LI substituted in for @TBD
                                # emit { LI rdf:first foaf:name S };
                                # TAIL := { LI rdf:rest @TBD . }
                  fields: # <-- S is a fresh BNode
                  [
                    {
                      name: name, # <-- validates a value V
                      type: string, # <-- with DT := xsd:string
                      "@id": foaf:name # <-- emit { S foaf:name V^^DT };
                    },
                    {
                      name: status, # <-- validates a value V
                      "@id": http:...status,
                      type: {
                        name: StatusType,
                        type: enum, # <-- with termType := iri
                        symbols: [
                          enrolled, # <-- if matched, emit { S <http:...status> <enrolled> . }
                          initiated, # <-- if matched, emit { S <http:...status> <initiated> . }
                          completed # <-- if matched, emit { S <http:...status> <completed> . }
                                    # at end of items, emit TAIL with rdf:nil substituted in for @TBD.'
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    }
  ]
}
]]
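
To check that I read the annotations correctly, here is a minimal
sketch in Python of the walk they describe: records become fresh
BNodes, strings become xsd:string literals, a null union branch emits
nothing, arrays become rdf:first/rdf:rest lists and enum symbols become
relative IRIs. This is only my reading, not a defined mapping. The
example.org IRIs are hypothetical stand-ins for the elided http:...
ones, Emitter and to_triples are names I made up, literals are not
escaped, and a union simply takes its first non-null branch.

```python
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"  # hypothetical stand-in for the elided http:... IRIs


class Emitter:
    """Collects (subject, predicate, object) triples as N-Triples-like strings."""

    def __init__(self):
        self.triples = []
        self._ids = count()

    def bnode(self):
        return f"_:b{next(self._ids)}"

    def record(self, schema, data, subject):
        # 'fields' implies a list of nested statements about `subject`.
        for field in schema["fields"]:
            self.value(field["type"], data.get(field["name"]),
                       subject, f"<{field['@id']}>")

    def value(self, schema, data, subject, predicate):
        if isinstance(schema, list):  # union such as ["null", {...}]
            if data is None:
                return  # no triple emitted if null
            schema = next(s for s in schema if s != "null")
        kind = schema if isinstance(schema, str) else schema["type"]
        if kind == "string":
            self.triples.append((subject, predicate, f'"{data}"^^<{XSD_STRING}>'))
        elif kind == "enum":  # termType := iri
            if data not in schema["symbols"]:
                raise ValueError(f"{data!r} is not one of {schema['symbols']}")
            self.triples.append((subject, predicate, f"<{data}>"))
        elif kind == "record":  # V for a complex type is a fresh BNode
            node = self.bnode()
            self.triples.append((subject, predicate, node))
            self.record(schema, data, node)
        elif kind == "array":
            tail = (subject, predicate)  # triple still waiting for its object
            for item in data:
                li = self.bnode()
                self.triples.append((*tail, li))
                self.value(schema["items"], item, li, f"<{RDF}first>")
                tail = (li, f"<{RDF}rest>")
            self.triples.append((*tail, f"<{RDF}nil>"))
        else:
            raise NotImplementedError(kind)


def to_triples(schema, data):
    emitter = Emitter()
    s = emitter.bnode()
    emitter.triples.append((s, f"<{RDF}type>", "<record>"))  # outer-most frame only
    emitter.record(schema, data, s)
    return emitter.triples


STATUS = {"name": "StatusType", "type": "enum",
          "symbols": ["enrolled", "initiated", "completed"]}
ITEM = {"name": "_name_0", "type": "record", "fields": [
    {"name": "name", "type": "string", "@id": FOAF + "name"},
    {"name": "status", "@id": EX + "status", "type": STATUS}]}
STUDY = {"name": "study", "type": "record", "fields": [
    {"name": "name", "type": "string", "@id": EX + "name"},
    {"name": "corpus", "@id": EX + "corpus",
     "type": ["null", {"type": "array", "items": ITEM}]}]}
SCHEMA = {"type": "record", "name": "array_union", "fields": [
    {"name": "study", "@id": EX + "study", "type": STUDY}]}

DATA = {"study": {"name": "trial-1",
                  "corpus": [{"name": "Alice", "status": "enrolled"}]}}

triples = to_triples(SCHEMA, DATA)
for triple in triples:
    print(*triple, ".")
```

The sample data is made up too. Setting "corpus" to None in it drops the
whole list, which is the "no triple emitted if null" choice.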

Yes, so we are thinking of Avro-dl here.
But perhaps what we also want is simply to keep the annotations
(i.e. RDF URIs for classes and relations) around for as long
as possible, so that we can build tools
that add those annotations to Java, Scala or other code, which can
manipulate the data efficiently and transform it to RDF triples
only at the last moment, if needed.
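
To make that idea a bit more concrete, here is a rough sketch, in
Python rather than Java or Scala just to keep it short. The property
IRIs ride along as metadata on ordinary class fields, the code works on
plain objects, and triples are produced only when asked for. The
Participant class, the to_triples helper and the example.org IRI are
all hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field, fields

EX = "http://example.org/"  # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"


@dataclass
class Participant:
    # The "@id" metadata carries the RDF property IRI for each field.
    name: str = field(metadata={"@id": FOAF + "name"})
    status: str = field(metadata={"@id": EX + "status"})


def to_triples(subject, obj):
    """Turn an annotated object into triples, only when they are needed."""
    return [(subject, f.metadata["@id"], getattr(obj, f.name))
            for f in fields(obj) if "@id" in f.metadata]


people = [Participant("Alice", "enrolled"), Participant("Bob", "completed")]

# Ordinary, efficient manipulation on native objects; no RDF involved yet.
enrolled = [p for p in people if p.status == "enrolled"]

# Only at the last moment, if needed, transform to triples.
triples = to_triples("_:p0", enrolled[0])
for triple in triples:
    print(triple)
```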

An advantage of RDF here is simply having hyperlinkable URLs that
link to definitions of the concepts, following the linked data principles,
so that people can understand what the data is meant to mean.

Henry Story

PS. Does anyone have a trick to get Apple Mail to display plain-text
messages more cleanly?

https://co-operating.systems
WhatsApp, Signal, Tel: +33 6 38 32 69 84
Twitter: @bblfish

Received on Monday, 6 June 2022 16:36:48 UTC