- From: Henry Story <henry.story@bblfish.net>
- Date: Sun, 5 Jun 2022 19:18:16 +0200
- To: Eric Prud'hommeaux <eric@w3.org>
- Cc: Joshua Shinavier <josh@fortytwo.net>, semantic-web@w3.org
> On 3. Jun 2022, at 17:51, Eric Prud'hommeaux <eric@w3.org> wrote:
>
> On Fri, Jun 03, 2022 at 04:19:49PM +0200, Henry Story wrote:
>>
>>
>>> On 3. Jun 2022, at 12:58, Eric Prud'hommeaux <eric@w3.org> wrote:
>>>
>>>>
>>>> I’ll look into Schema Salad
>>>> https://www.commonwl.org/v1.0/SchemaSalad.html
>>>
>>> In principle, an accompanying JSON-LD @context does this for you, e.g.
>>> AVRO schema:
>>
>> Thanks Eric for those very helpful examples. (I think the data you
>> gave for the second example does not quite fit the schema, but I
>> get the point).
>
> Yeah, I had .name as a sibling of .study .

Before looking at your ideas on avro-dl I wanted to look at Salad, as it
has Avro in the title: "Semantic Annotations for Linked Avro Data".
The problem it is trying to solve is the number of different files doing
nearly the same thing, which is something you pointed out earlier in this
thread too, I think.

To understand Salad I worked on transforming your first example to the Salad
YAML format. I was more interested in getting it to work than in being faithful
to your structure. So, for example, I renamed the "name" fields to "dname"
and "fname" because of name clashes. There is likely a way to solve that, but
it would be something to do next.

The data files became the following.

## Trial.data.yaml

This is just the JSON data you gave me earlier, but now in YAML format,
with the names disambiguated (to start with) and a base added (may not be needed).

[[
$base: "https://mrna.com/"
study:
  dname: PARAMEDIC2
  corpus:
    - fname: Kathleen Cleaver
      status: initiated
    - fname: Fredricka Newton
      status: enrolled
]]

## Trial_schema.yaml

The Schema YAML file brings together both the Avro schema and the JSON-LD
markup, and also allows one to add comments.
(Note: I started off with the complex nested structure you had but I could
not get the jsonldPredicate to work that way so I decomposed it in a flatter
hierarchy that also makes it easier to read)

[[
$base: "https://salad.egg/"
$namespaces:
  ex: "http://example.org/ns/rct#"
  foaf: "http://xmlns.com/foaf/0.1/"
  doap: "http://usefulinc.com/ns/doap#"
$graph:
- name: Trial
  type: record
  documentRoot: true
  # namespace: example.avro <- not needed
  fields:
    - name: study
      jsonldPredicate: "ex:study"
      type: Study
- name: ParaMedic
  type: record
  fields:
    - name: fname  # was "name", changed to avoid name clash
      jsonldPredicate: "foaf:name"
      type: string
    - name: status
      jsonldPredicate: "ex:status"
      type: StatusType
- name: StatusType
  type: enum
  symbols:
    - "enrolled"
    - "initiated"
    - "completed"
- name: Study  # change from 'study' to avoid nameclash
  type: record
  fields:
    - name: dname  # was "name", changed to avoid name-clash
      type: "string"
      doc: "name of study"
      jsonldPredicate: "doap:name"
    - name: corpus
      doc: "the body of the study (made of people)"
      jsonldPredicate:
        "_id": "ex:corpus"
        "_container": "@list"
      type:
        type: array
        items: ParaMedic
]]

After installing schema-salad-tool I can use those python tools to do the following

## Extract the RDFS from the Salad Schema

[[
$ schema-salad-tool --print-rdfs Trial_schema.yaml
/Users/hjs/Library/Python/3.8/bin/schema-salad-tool Current version: 8.3.20220525163636
@prefix doap: <http://usefulinc.com/ns/doap#> .
@prefix ex: <http://example.org/ns/rct#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<https://salad.egg/#ParaMedic> a rdfs:Class .

<https://salad.egg/#StatusType> a rdfs:Class .

<https://salad.egg/#Study> a rdfs:Class .

<https://salad.egg/#Trial> a rdfs:Class .

ex:status a rdf:Property ;
    rdfs:domain <https://salad.egg/#ParaMedic> .

ex:study a rdf:Property ;
    rdfs:domain <https://salad.egg/#Trial> .

doap:name a rdf:Property ;
    rdfs:domain <https://salad.egg/#Study> .

foaf:name a rdf:Property ;
    rdfs:domain <https://salad.egg/#ParaMedic> .
]]

## Extract the Avro JSON schema from the Salad Schema

[[
$ schema-salad-tool --print-avro Trial_schema.yaml
/Users/hjs/Library/Python/3.8/bin/schema-salad-tool Current version: 8.3.20220525163636
[
  {
    "name": "egg.salad.Trial",
    "type": "record",
    "documentRoot": true,
    "fields": [
      {
        "name": "study",
        "jsonldPredicate": "ex:study",
        "type": {
          "name": "egg.salad.Study",
          "type": "record",
          "fields": [
            {
              "name": "dname",
              "type": "string",
              "doc": "name of study",
              "jsonldPredicate": "doap:name"
            },
            {
              "name": "corpus",
              "doc": "the body of the study (made of people)",
              "jsonldPredicate": {
                "_id": "http://example.org/ns/rct#corpus",
                "_container": "@list"
              },
              "type": {
                "type": "array",
                "items": {
                  "name": "egg.salad.ParaMedic",
                  "type": "record",
                  "fields": [
                    {
                      "name": "fname",
                      "jsonldPredicate": "foaf:name",
                      "type": "string"
                    },
                    {
                      "name": "status",
                      "jsonldPredicate": "ex:status",
                      "type": {
                        "name": "egg.salad.StatusType",
                        "type": "enum",
                        "symbols": [
                          "enrolled",
                          "initiated",
                          "completed"
                        ]
                      }
                    }
                  ]
                },
                "name": ""
              }
            }
          ]
        }
      }
    ]
  }
]
]]

## Extract the json-ld context

This gives us the JSON-LD context that one can use with the YAML data
Trial.data.yaml to produce RDF.
[[
$ schema-salad-tool --print-jsonld-context Trial_schema.yaml
/Users/hjs/Library/Python/3.8/bin/schema-salad-tool Current version: 8.3.20220525163636
{
  "@context": {
    "ParaMedic": "https://salad.egg/#ParaMedic",
    "StatusType": "https://salad.egg/#StatusType",
    "Study": "https://salad.egg/#Study",
    "Trial": "https://salad.egg/#Trial",
    "completed": "https://salad.egg/#StatusType/completed",
    "corpus": {
      "@container": "@list",
      "@id": "http://example.org/ns/rct#corpus"
    },
    "dname": "doap:name",
    "doap": "http://usefulinc.com/ns/doap#",
    "enrolled": "https://salad.egg/#StatusType/enrolled",
    "ex": "http://example.org/ns/rct#",
    "fname": "foaf:name",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "initiated": "https://salad.egg/#StatusType/initiated",
    "status": "ex:status",
    "study": "ex:study"
  }
}
]]

## Transform the Data using the schema to RDF

One can do the transformation to rdf directly with the yaml data

[[
$ schema-salad-tool --print-rdf Trial_schema.yaml Trial.data.yaml
/Users/hjs/Library/Python/3.8/bin/schema-salad-tool Current version: 8.3.20220525163636
@prefix doap: <http://usefulinc.com/ns/doap#> .
@prefix ex: <http://example.org/ns/rct#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

[] ex:study [
    ex:corpus (
        [ ex:status "initiated" ;
          foaf:name "Kathleen Cleaver" ]
        [ ex:status "enrolled" ;
          foaf:name "Fredricka Newton" ]
    ) ;
    doap:name "PARAMEDIC2"
] .
]]

## other options

I have not yet found if one could use this now directly to do something with
Avro binary data.
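Going back to the JSON-LD context printed earlier: the following is a minimal Python sketch (standard library only, not part of schema-salad) of how that context turns the short keys of Trial.data.yaml into full property IRIs. The `expand_term` and `expand_keys` helpers are hypothetical names introduced here for illustration, the `CONTEXT` dict is a trimmed copy of the tool's output, and the handling is deliberately simplified compared with a real JSON-LD processor (no `@list` or `$base` handling, no value typing).

```python
import json

# Trimmed copy of the context printed by --print-jsonld-context (assumption:
# only the entries needed for the data file are kept).
CONTEXT = {
    "corpus": {"@container": "@list", "@id": "http://example.org/ns/rct#corpus"},
    "dname": "doap:name",
    "doap": "http://usefulinc.com/ns/doap#",
    "ex": "http://example.org/ns/rct#",
    "fname": "foaf:name",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "status": "ex:status",
    "study": "ex:study",
}

# The content of Trial.data.yaml as a Python structure ($base omitted).
DATA = {
    "study": {
        "dname": "PARAMEDIC2",
        "corpus": [
            {"fname": "Kathleen Cleaver", "status": "initiated"},
            {"fname": "Fredricka Newton", "status": "enrolled"},
        ],
    }
}


def expand_term(term, context):
    """Resolve a context term to a full IRI (hypothetical, simplified helper)."""
    value = context.get(term, term)
    if isinstance(value, dict):
        # Expanded term definition: the IRI is under "@id".
        value = value["@id"]
    prefix, sep, local = value.partition(":")
    # A compact IRI such as "doap:name" uses a prefix defined in the context;
    # an absolute IRI such as "http://..." is returned unchanged.
    if sep and not local.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + local
    return value


def expand_keys(node, context):
    """Recursively replace dictionary keys by their full IRIs; values are kept."""
    if isinstance(node, dict):
        return {expand_term(k, context): expand_keys(v, context) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    return node


expanded = expand_keys(DATA, CONTEXT)
print(json.dumps(expanded, indent=2))
```

The expanded keys correspond to the predicates in the Turtle produced by --print-rdf (ex:study, ex:corpus, doap:name, foaf:name, ex:status), while the status values stay plain strings, as they do in that output.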
[[
$ schema-salad-tool -h
usage: schema-salad-tool [-h] [--rdf-serializer RDF_SERIALIZER] [--skip-schemas]
                         [--strict-foreign-properties] [--print-jsonld-context]
                         [--print-rdfs] [--print-avro] [--print-rdf] [--print-pre]
                         [--print-index] [--print-metadata] [--print-inheritance-dot]
                         [--print-fieldrefs-dot] [--codegen language]
                         [--codegen-target CODEGEN_TARGET]
                         [--codegen-examples directory]
                         [--codegen-package dotted.package]
                         [--codegen-copyright copyright_string]
                         [--codegen-parser-info parser_info] [--print-oneline]
                         [--print-doc] [--strict | --non-strict]
                         [--verbose | --quiet | --debug] [--only ONLY]
                         [--redirect REDIRECT] [--brand BRAND]
                         [--brandlink BRANDLINK] [--brandstyle BRANDSTYLE]
                         [--brandinverse] [--primtype PRIMTYPE] [--version]
                         [schema] [document]

positional arguments:
  schema
  document

optional arguments:
  -h, --help            show this help message and exit
  --rdf-serializer RDF_SERIALIZER
                        Output RDF serialization format used by --print-rdf(one
                        of turtle (default), n3, nt, xml)
  --skip-schemas        If specified, ignore $schemas sections.
  --strict-foreign-properties
                        Strict checking of foreign properties
  --print-jsonld-context
                        Print JSON-LD context for schema
  --print-rdfs          Print RDF schema
  --print-avro          Print Avro schema
  --print-rdf           Print corresponding RDF graph for document
  --print-pre           Print document after preprocessing
  --print-index         Print node index
  --print-metadata      Print document metadata
  --print-inheritance-dot
                        Print graphviz file of inheritance
  --print-fieldrefs-dot
                        Print graphviz file of field refs
  --codegen language    Generate classes in target language, currently
                        supported: python, java, typescript
  --codegen-target CODEGEN_TARGET
                        Defaults to sys.stdout for python and ./ for Java
  --codegen-examples directory
                        Directory of example documents for test case generation
                        (Java only).
  --codegen-package dotted.package
                        Optional override of the package name which is other
                        derived from the base URL (Java only).
  --codegen-copyright copyright_string
                        Optional copyright of the input schema.
  --codegen-parser-info parser_info
                        Optional parser name which is accessible via resulted
                        parser API (Python only)
  --print-oneline       Print each error message in oneline
  --print-doc           Print HTML schema documentation page
  --strict              Strict validation (unrecognized or out of place fields
                        are error)
  --non-strict          Lenient validation (ignore unrecognized fields)
  --verbose             Default logging
  --quiet               Only print warnings and errors.
  --debug               Print even more logging
  --only ONLY           Use with --print-doc, document only listed types
  --redirect REDIRECT   Use with --print-doc, override default link for type
  --brand BRAND         Use with --print-doc, set the 'brand' text in nav bar
  --brandlink BRANDLINK
                        Use with --print-doc, set the link for 'brand' in nav
                        bar
  --brandstyle BRANDSTYLE
                        Use with --print-doc, HTML code to link to an external
                        style sheet
  --brandinverse        Use with --print-doc
  --primtype PRIMTYPE   Use with --print-doc, link to use for primitive types
                        (string, int etc)
  --version, -v         Print version
]]

Hope some of you find this helpful.

Henry Story

https://co-operating.systems
WhatsApp, Signal, Tel: +33 6 38 32 69 84
Twitter: @bblfish
Received on Sunday, 5 June 2022 17:18:32 UTC