- From: Marta Villegas <marta.villegas@gmail.com>
- Date: Thu, 5 Feb 2015 12:15:10 +0100
- To: Penny Labropoulou <penny@ilsp.gr>
- Cc: "John P. McCrae" <jmccrae@cit-ec.uni-bielefeld.de>, public-ld4lt@w3.org
- Message-ID: <CAPq_VFnBfUFao0OLC1YwEyB0336QNtuYHbmEb0BzgaKZz__G0w@mail.gmail.com>
Hi again,

I forgot the attached files (sorry). I hope this helps!

2015-02-05 12:12 GMT+01:00 Marta Villegas <marta.villegas@gmail.com>:

> Hi John, Penny and all
>
> I'm sending you the xsl file we use to analyse the MS schema and the
> output we get.
>
> The xsl script generates all possible 'xpaths' from the root element
> (resourceInfo) to all terminal nodes.
> Each node corresponds to an xml element. The output looks like:
>
> */resourceInfo[n](1)/identificationInfo[](11)/resourceName[n](6)@*
>
> where [] collects the element's cardinality & () = count(element + siblings).
> In this example, *resourceName[n](6)* means resourceName is unbounded
> and has 5 siblings.
>
> Terminal nodes (ending with @) are simple typed elements that become data
> type properties.
>
> Non terminal nodes correspond to 'embedded' XML elements. They are all
> complex elements and, generally, they have @type or @ref attributes in the
> schema. A few are locally described (complex elements with neither @type
> nor @ref).
>
> In principle, complex elements (all non terminal ones) should generate a
> Class + an Object Property. This, however, 'over generates' the resulting
> graph. Nodes marked with [] suggest it is better not to follow the rule.
>
> - *Nodes with [](1) and [](*)* can be removed (these are nodes with
> cardinality 1 and no siblings)
>
> for example:
>
> *.../creationInfo[](17)/creationTool[unbounded](4)/targetResourceNameURI[](1)@*
>
> where: X creationTool url.
>
> is better than: X creationTool [y targetResourceNameUri uri ] .
>
> similarly in:
>
> *.../evaluationReport[n](9)/documentInfo[](*)/title[unbounded](20)@*
>
> X evaluationReport [ Y a documentType ; title 'some title' ]
>
> is better than
>
> X evaluationReport [y a documentationInfoType ; documentInfo [ z a
> documentInfoType ; title 'some title']].
>
> Compare the following lines:
>
> *.../licenceInfo[n](5)/licensor[n](11)/personInfo[](*)/surname[n](6)@*
>
> *.../metadataInfo[](11)/metadataCreator[n](9)/surname[n](6)@*
>
> In the first case the personInfo node is superfluous (here *licensor* is
> typed as Actor, which in turn is defined as a choice between Person and
> Organisation. XML does not help!!)
>
> In the second case, *metadataCreator* is typed as Person. The
> corresponding XML instances show this 'problem':
>
> <metadataInfo>
>   <metadataCreationDate>2006-05-04</metadataCreationDate>
>   <metadataCreator>
>     <surname lang="en-US">surname0</surname>
>     <givenName lang="en-US">givenName0</givenName>
>     <sex>male</sex>
>
> <licenceInfo>
>   <licence>CC-BY</licence>...
>   <licensor>
>     <personInfo>
>       <surname lang="en-US">surname0</surname>
>       <givenName lang="en-US">givenName0</givenName>
>       <sex>male</sex>
>
> These are XML problems which can be easily addressed in owl, by having
> something like: an Actor super class with subclasses for Person &
> Organisation; one metadataCreator property with range Person and one
> licensor property with range Actor
>
> X metadataCreator [ y a Person ; surname ?surname ] .
> X licensor [ y a Person ; surname ?surname ] .
>
> - *other [] nodes* need careful revision.
>
> for example:
>
> /resourceInfo[n](1)/*identificationInfo[]*(11)/resourceName[n](6)@
> /resourceInfo[n](1)/*identificationInfo[]*(11)/description[n](6)@
> /resourceInfo[n](1)/*identificationInfo[]*(11)/resourceShortName[n](6)@
> /resourceInfo[n](1)/*identificationInfo[]*(11)/url[n](6)@
> /resourceInfo[n](1)/*identificationInfo[]*(11)/metaShareId[](6)@
> /resourceInfo[n](1)/*identificationInfo[]*(11)/identifier[n](6)@
>
> also look at:
>
> /resourceInfo[n](1)/resourceComponentType[](11)/toolServiceInfo[](*)/toolServiceEvaluationInfo[](9)/evaluationReport[n](9)/documentInfo[](*)/documentType[](20)@
>
> 2015-01-28 13:40 GMT+01:00 Penny Labropoulou <penny@ilsp.gr>:
>
>> On Tue, Jan 27, 2015 at 10:03 PM, Penny Labropoulou <penny@ilsp.gr>
>> wrote:
>>
>> Hi John and all.
>>
>> Thanx for the quick work!
>>
>> Below are a few comments/replies in between the lines.
>>
>> 1) Some names have been shortened, e.g.,
>> 'ConformanceToBestStandardsAndPractices' ->
>> 'StandardsBestPractices', should we accept such names or stay true to
>> MetaShare?
>> I think we should decide this on a case-by-case basis; although some
>> names are long, they are self-explanatory. In general, at ld4lt we have
>> changed some names (e.g. resource to language resource) when it was agreed
>> that the new label is better.
>>
>> Hmm... change for the sake of change is difficult, particularly when it
>> is only a small part of the vocabulary, that creates gotchas.
>>
>> There are some typos that we have spotted and also comments raised by
>> various people.
>>
>> 2) A lot of MetaShare names have (unnecessarily) the words 'Info', 'Type'
>> or 'InfoType', we could eliminate these.
>> All “info” elements are in fact component names: in accordance with the
>> CMDI principles, elements (and other components) are grouped into
>> semantically coherent components.
>> For instance, the identificationInfo
>> groups together elements that are used for the identification of a
>> resource, such as the resourceId, a url used as landing page, the
>> resourceName and shortName, the description etc. If I have understood well,
>> this structure is not needed/not a good practice for RDF and this is why
>> they have been eliminated already at the IULA/UPF mapping.
>>
>> “type” elements are used in MetaShare for components that can be re-used:
>> e.g. persons can be licensors, contact points, resource creators etc., but
>> in all cases they are encoded using the personInfoType, which groups
>> together given name, surname, communication information etc. Again, I think
>> this is not mapped in RDF as such, if I understand well.
>>
>> Yeah that is my feeling too, I would like to shorten the names, however
>> it seems hard to do this consistently as it would create clashes, e.g.,
>> ActualUse/ActualUseInfo, DocumentType/DocumentInfo
>>
>> 3) IULA have split the AnnotationType class into 5 subclasses
>> (DiscourseAnnotation, etc.)
>>
>> That’s an improvement from the original model and I suggest we stick to
>> it.
>>
>> 4) There are many properties suggested by IULA or in the 'DISTRIBUTION'
>> model that have no correspondence in the MetaShare data... we should
>> discuss these on a case-by-case basis, right?
>>
>> We have already discussed with Victor the distribution and licensing
>> module and have come up with a proposal re-introducing some of the original
>> MetaShare elements that were not mapped in the IULA/UPF version and using
>> the odrl (mainly) and cc vocabularies; the general ideas are to be found
>> at
>> https://www.w3.org/community/ld4lt/wiki/Metashare_vocabulary_for_licenses
>> and https://www.w3.org/community/ld4lt/wiki/Examples and the mappings
>> were documented in the previous googlesheet. I will add these to the new
>> googlesheet by next week.
>>
>> I incorporated all the functional (non-documentary) information from the
>> distribution model already... or at least I tried, let me know if I missed
>> anything.
>>
>> Ok; to be checked
>>
>> 5) The Prev. Google Doc proposed mapping to both SWRC and BIBO, do we
>> need to do BIBO as well (SWRC seems sufficient)?
>>
>> 6) I added the license modelling that LingHub does in ODRL, could one of
>> our ODRL experts look at it and fix the last one?
>>
>> Please, see also the two wikis on licensing, especially the examples. And
>> as discussed, together with Victor we will provide a file with the RDF
>> representations in odrl of the licenses used in MetaShare (of course, only
>> of those that have not already been RDFized).
>>
>> This refers to "R4 To neatly represent conditions of use"... but I
>> couldn't find the structured definitions of conditions of use so I wrote my
>> own in the sheet titled "License Modelling"
>>
>> To be checked and finalized by Monday.
>>
>> 7) Some property values, especially *resource types*, such as *ontology*
>> or *corpus* were created as classes in the Google Doc, shall we confirm
>> this usage pattern?
>>
>> This needs some more thinking, checking the various cases. Is there a
>> list of these?
>>
>> This seems to be individuals of the classes 'ResourceType' and
>> 'LexicalConceptualResourceType', approximately, here are the lists for
>> reference:
>>
>> In Prev.
>> Google Doc: BabelNet*, ComputationalLexicon, Corpus,
>> CorpusAudio*, CorpusCollection*, CorpusImage*, CorpusText*,
>> CorpusTextNgram*, CorpusTextNumerical*, CorpusVideo*, Framenet,
>> LexicalConceptualResource, Lexicon, MachineReadableDictionary, Ontology,
>> TerminologicalResource, Thesaurus, ToolService*, WordList, WordNet
>>
>> From Metashare: computationalLexicon, framenet, lexicon,
>> machineReadableDictionary, ontology, other*, terminologicalResource,
>> thesaurus, wordList, wordnet, corpus, languageDescription*,
>> lexicalConcepturalResource
>>
>> *Unique elements
>>
>> We might need a telco discussion for this, but first let me check the
>> current mappings.
>>
>> 8) *See attached diagram.* There is a big difference in granularity
>> between the XSD and IULA-UPF's ontology. For example, there are 4 tags
>> between the resource and its actual usage in the XML, e.g.,
>>
>> <resourceInfo> ...
>>   <usageInfo> ...
>>     <actualUsageInfo> ....
>>       <useNLPspecific>parsing</useNLPspecific> ....
>>
>> Whereas in the IULA model this is considerably simplified to
>>
>> :resource a ms:Resource ;
>>     ms:actualUse ms:parsing
>>
>> This would be great, but it also loses information, for example, the IULA
>> schema associates the *availability* with the *Resource*. However, the
>> XSD schema associates an *availability* with each *Distribution*
>> (download file). In fact, there are resources that have different
>> availability for different downloads (e.g., BabelNet), so there is
>> information loss here. Thus, LingHub is very conservative and sticks to the
>> XSD, e.g.,
>>
>> :resource a ms:ResourceInfo ;
>>     ms:usageInfo [
>>         ms:actualUsageInfo [
>>             ms:useNLPspecific ms:parsing ] ]
>>
>> What shall we recommend here?
>>
>> Again, discuss on a case-by-case basis. For instance, for availability,
>> we have re-introduced the distribution element, as otherwise we lose in
>> semantics.
>> For other cases, I think we should see them more closely. The
>> grouping into components made sense in XSD because it brought together
>> elements. I will have to look at them more closely and explain for each
>> case why this grouping was meant, so that we can decide if this should also
>> remain in the RDF mapping. Is there an easy way of spotting these cases?
>>
>> OK, we should discuss this in a telco.
>>
>> A final question: how will we add the comments/decisions from the
>> previous googlesheet to the current one? As said, I can do this for the
>> distribution/licensing module elements but for the rest?
>>
>> Add any comments you want (possibly copied from previous doc). Apart from
>> that I would like to keep the sheet itself clean until the next ld4lt
>> telco at least
>>
>> Regards,
>> John
>>
>> Best,
>>
>> Penny
>
> --
> Marta Villegas
> marta.villegas@gmail.com
>

--
Marta Villegas
marta.villegas@gmail.com
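
For readers without the attachments, the path notation described in the quoted message (name[cardinality](count), with a trailing @ on terminal nodes) and the "[](1) and [](*) nodes can be removed" rule can be sketched in Python roughly as follows. This is only an illustrative sketch, not the attached XSL: the names Node, parse_path, is_removable and simplify are hypothetical, and the code assumes that an empty [] means cardinality 1, as the message states.

```python
import re
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    """One path segment, e.g. creationTool[unbounded](4)."""
    name: str
    cardinality: str  # text inside [], e.g. "", "n", "unbounded"
    count: str        # text inside (), e.g. "1", "11", "*"


# Hypothetical pattern for a single segment: name[cardinality](count)
SEGMENT = re.compile(r"^(\w+)\[([^\]]*)\]\(([^)]*)\)$")


def parse_path(line: str) -> Tuple[List[Node], bool]:
    """Split one output line into nodes; report whether it ends with @."""
    text = line.strip()
    terminal = text.endswith("@")
    text = text.rstrip("@")
    nodes = []
    for segment in text.split("/"):
        if segment in ("", "..."):  # leading slash or elided prefix
            continue
        match = SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unrecognised segment: {segment!r}")
        nodes.append(Node(*match.groups()))
    return nodes, terminal


def is_removable(node: Node) -> bool:
    """Assumed reading of the rule: empty [] with (1) or (*)."""
    return node.cardinality == "" and node.count in ("1", "*")


def simplify(nodes: List[Node]) -> List[str]:
    """Names of the nodes that survive the removal rule."""
    return [node.name for node in nodes if not is_removable(node)]


nodes, terminal = parse_path(
    ".../evaluationReport[n](9)/documentInfo[](*)/title[unbounded](20)@"
)
print(simplify(nodes), terminal)
```

Applied to the evaluationReport line from the message, the sketch drops documentInfo and keeps evaluationReport and title, which matches the shorter of the two RDF shapes given there. The remaining [] nodes, such as identificationInfo[](11), are deliberately left untouched, since the message says they need careful revision.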
Attachments
- text/plain attachment: MSxpath.txt
- text/xml attachment: MSxsdxpath.xsl
Received on Thursday, 5 February 2015 11:15:54 UTC