Re: How can TDMRep be used in PDF?

Laurent, your mention of ChatGPT is interesting, because it is not legally clear whether “use of data for the purposes of training an AI/ML model” is the same thing as “data mining”. Currently, the industry position is that the two are very different, and that mining restrictions do not prevent training (and vice versa).

Leonard

From: Laurent Le Meur <laurent@edrlab.org>
Date: Friday, March 10, 2023 at 12:22 PM
To: public-tdmrep@w3.org <public-tdmrep@w3.org>
Subject: Re: How can TDMRep be used in PDF?

Hi Vincent,

The community group will stay alive, and we certainly should set up a call to check for renewed interest in this project, now that ChatGPT has shown what an AI agent can get from freely available content on the Web.

The work done by the group was initially scoped by the desire to keep things simple: we used robots.txt as an example of something sufficiently simple to be universally accepted. This led to the three proposed techniques (HTTP headers, a central file, or HTML meta tags). A content provider has a choice between these techniques, but a TDM Actor (an AI Agent) MUST support all three.
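As an illustration of what supporting more than one technique means for a TDM Actor, here is a minimal sketch in Python that reads the two properties from HTTP response headers and from HTML meta tags. The function and class names are hypothetical, the central-file technique is left out for brevity, and letting header values override embedded ones is an assumption of this sketch, not something stated in this thread.

```python
from html.parser import HTMLParser

TDM_NAMES = ("tdm-reservation", "tdm-policy")


class TdmMetaParser(HTMLParser):
    """Collect tdm-* <meta> tags found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in TDM_NAMES:
            self.found[name] = attributes.get("content")


def read_tdm_signals(headers, html_text):
    """Return the TDM properties visible for one fetched resource.

    headers   -- mapping of HTTP response header names to values
    html_text -- body of the resource, if it is HTML
    """
    parser = TdmMetaParser()
    parser.feed(html_text)
    signals = dict(parser.found)
    # Assumption of this sketch: a header value overrides an embedded one.
    for key, value in headers.items():
        if key.lower() in TDM_NAMES:
            signals[key.lower()] = value.strip()
    return signals


if __name__ == "__main__":
    page = '<html><head><meta name="tdm-reservation" content="1"></head></html>'
    print(read_tdm_signals({}, page))
```

Each additional embedding format (PDF, EPUB) would need its own parser of this kind on the TDM Actor side.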

I could argue that using an HTTP header or a central file was sufficient: this means acting at the level of the transfer protocol, before the AI Agent gets the content. The possibility of embedding information in an HTML resource was added essentially because this was a known practice for web indexing (cf. robots.txt). But now that we have this metadata embedding capability, I understand that publishers would like to use it on non-HTML resources, namely PDF and EPUB.

So, OK, we can study this evolution. We should have good reasons to do so, because if we go there, AI Agents will have to (= MUST) support these additional techniques, and this may become an issue.

Note: If this group decides to use XMP for embedding metadata in PDF (and other formats supporting XMP), I think that these new properties should be equivalent to the properties currently embedded in HTML pages and elsewhere, i.e. "tdm-reservation" (0 or 1) and "tdm-policy" (a URL).
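To make the XMP idea concrete, here is a rough sketch in Python of what such a metadata fragment might look like when serialized as RDF/XML. Nothing here has been decided by the group: the namespace URI, the "tdm" prefix, the property names "reservation" and "policy", and the function name are all hypothetical placeholders chosen to mirror the HTML properties, and the xpacket wrapper a real XMP packet needs is omitted.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
X_NS = "adobe:ns:meta/"
# Hypothetical placeholder namespace; the group has not defined one.
TDM_NS = "https://example.org/ns/tdm#"


def build_tdm_xmp(reservation, policy_url=None):
    """Serialize the two TDM properties as an XMP-style RDF/XML fragment."""
    if reservation not in (0, 1):
        raise ValueError("reservation must be 0 or 1")

    # Use readable prefixes instead of the auto-generated ns0, ns1, ...
    ET.register_namespace("x", X_NS)
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("tdm", TDM_NS)

    xmpmeta = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""}
    )
    # Equivalent of "tdm-reservation" (0 or 1).
    ET.SubElement(description, f"{{{TDM_NS}}}reservation").text = str(reservation)
    # Equivalent of "tdm-policy" (a URL), which is optional.
    if policy_url:
        ET.SubElement(description, f"{{{TDM_NS}}}policy").text = policy_url
    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    print(build_tdm_xmp(1, "https://example.com/policy.json"))
```

A publisher that already embeds XMP in its PDF files could add such properties to an existing rdf:Description block instead of creating a separate one.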

Best regards
Laurent
(currently co-chair of the TDMRep CG)



On 9 March 2023, at 15:56, Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com> wrote:

Dear TDMRep community,

There was a recent notice from W3C that this community group might automatically close soon, so this seems like the time to get a question in for this group. I work for a publisher that distributes content in multiple formats including web pages, EPUB, and PDF. The PDF format is still very popular among our readers. The PDF files that we publish contain embedded XMP metadata that provides the title, authors, digital object identifier (DOI), version (using NISO JAV terminology), copyright, and license information. As we consider the possibility of someday having to implement TDMRep for our content, we note that the TDMRep Final Community Report shows how TDMRep metadata can be encoded in HTML meta tags, which could be used in web pages and EPUB files. It is unclear whether, or how, TDMRep metadata can be encoded in RDF/XML to be included in the XMP metadata of a PDF file. Is this use case far outside the intended scope of TDMRep? If this is a reasonable use of TDMRep, can you provide any guidance on how to encode TDMRep metadata in the metadata of a PDF file?

Thank you,
Vincent

______________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
530 Walnut St., Suite 850, Philadelphia, PA 19106
E-Mail: vincent.lizzi@taylorandfrancis.com
Web: www.tandfonline.com

Taylor & Francis is a trading name of Informa UK Limited,
registered in England under no. 1072954

"Everything should be made as simple as possible, but not simpler."




Received on Friday, 10 March 2023 17:25:23 UTC