Re: Semantic File Inspector

 Hello!

I was not aware of Strigi; it looks interesting, but it seems Recoll has
replaced it. I might add support for this and similar systems like SPARQL
Anything or Extractor in the future as extensions. The format support is
definitely going to become more modular.
As for NEPOMUK, I have used their vocabularies extensively; there is
nothing better to describe file system hierarchies, as far as I can tell.
:-)

Unrelated: I have also received feedback through the form regarding a few
things, which might be good to address publicly here:
- Input file name is not included in the output ‒ this is true only if the
option -d/--data-only is set, and only for the top-level input file.
Removing this option from the default arguments line adds the file node
into the output, at the cost of not having a stable URI to identify it.
- Relative URIs are used ‒ this happens in the default Turtle formatter to
make the output more readable, but it might not be correctly parsed in
other software (I had issues with Virtuoso for example). You can add
-u/--ugly or -b/--buffered to disable it, or use NTriples as the output
format if there are other formatting issues.
- Duplicate triples are produced ‒ this is not a bug, but a result of
multiple analyzers describing the same input (images may be processed by up
to 3 analyzers!). By default, triples are serialized directly to the
output, so the tool does not remember whether a triple has already been
produced. If this is undesirable, it can be avoided using -b/--buffered.
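
To illustrate the difference between streaming and buffered output, here is a minimal Python sketch. It is not the tool's actual code: the function names and the sample triples are hypothetical, and it assumes the output is line-based N-Triples in which duplicates are serialized identically.

```python
from typing import Dict, Iterable, Iterator, List


def stream_triples(triples: Iterable[str]) -> Iterator[str]:
    """Pass each triple through as soon as it arrives; repeats survive."""
    for line in triples:
        yield line


def buffered_triples(triples: Iterable[str]) -> List[str]:
    """Collect all triples first, dropping exact repeats (first-seen order kept)."""
    seen: Dict[str, None] = {}
    for line in triples:
        seen.setdefault(line.strip(), None)
    return list(seen)


# Hypothetical output of two analyzers describing the same image.
sample = [
    '<urn:example:img> <http://purl.org/dc/terms/format> "image/png" .',
    '<urn:example:img> <http://purl.org/dc/terms/format> "image/png" .',
    '<urn:example:img> <http://purl.org/dc/terms/extent> "1024" .',
]

streamed = list(stream_triples(sample))
buffered = buffered_triples(sample)
print(len(streamed), len(buffered))  # prints "3 2"
```

Comparing whole lines only catches triples that are written identically, so a real implementation would more likely compare parsed triples. The trade-off is the same either way: buffering needs memory for everything seen so far, while streaming needs none.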

IS4

On Mon, 10 Apr 2023 at 7:54, Egon Willighagen <
egon.willighagen@gmail.com> wrote:

>
> Hi IS4,
>
> do you know Strigi/Nepomuk? See e.g.:
>
> - https://dot.kde.org/2007/04/11/road-kde-4-strigi-and-file-information-extraction
> - https://dot.kde.org/2007/05/08/interview-flavio-castelli-strigi-developer
> - https://neksa.blogspot.com/2007/08/strigi-plugins-tokenizers-and-ontology.html
>
> Looking forward to seeing your code online.
>
> With kind regards,
>
> Egon
>
>
> On Thu, 30 Mar 2023 at 15:58, IS4 <illidans4@gmail.com> wrote:
>
>> Hello!
>>
>> I have made a tool that can describe any file or piece of data, including
>> its formats and contents, in RDF: the Semantic File Inspector, available at
>> https://sfi.is4.site/ (requires WebAssembly)! It currently supports over
>> 30 different formats, including common media formats, archives, executables
>> and documents. It collects rich metadata, including common file properties
>> and format-specific properties such as image dimensions, and it can compute
>> hashes using various algorithms to describe and identify the data. All of
>> this is encoded in RDF using common vocabularies found on the semantic web,
>> and the result can be saved in one of the many RDF serialization formats or
>> queried with SPARQL to extract information or data.
>>
>> Additional cool abilities:
>> - It can represent and describe files at different levels of abstraction,
>> for example: the file node itself, its binary/text content, the XML
>> document it encodes, and the object it represents.
>> - It can derive multiple formats from a single file, for example both ISO
>> and UDF from images.
>> - It looks recursively into archives or other resources storing resources.
>> - It can emulate MS-DOS executables and store their output as
>> dcterms:description.
>> - All components can be configured or disabled, if needed. Plugins may be
>> developed for additional functionality.
>> - No data is sent anywhere; all runs in the browser.
>>
>> I would be happy for any feedback. Code will be available soon.
>>
>> Enjoy!
>>
>
>
> --
> Binding affinities can be predicted for each protein variant
> with a new QSAR model that takes into account the amino acid change:
> https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00701-3
>
> --
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: https://egonw.github.io/
> Blog: https://chem-bla-ics.blogspot.com/
> Mastodon: https://scholar.social/@egonw
> PubList: https://orcid.org/0000-0001-7542-0286
>

Received on Wednesday, 12 April 2023 14:55:19 UTC