
Re: page about robots.txt

From: Almalibri <claudio.tubertini@almalibri.it>
Date: Thu, 25 Feb 2021 16:05:43 +0000
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Brendan Quinn <brendan@cluefulmedia.com>, "laurent.lemeur@edrlab.org" <laurent.lemeur@edrlab.org>, "public-tdmrep@w3.org" <public-tdmrep@w3.org>
Message-ID: <2KCfMyL3dP1dkQO0_j9dHFZf249EOv1AxZU0zkwbsOXVCZR61oOHn85GWCBghj7GnQbA0ce9qxx2ILeJ5AYRj0R_7j5nV9jwbKOhW-LG7B4=@almalibri.it>
Hi everybody. Sorry I didn't get to yesterday's meeting, but I would like to add something about robots.txt anyway. Leonard is right about the lack of reference to the individual assets in web pages as they are described in robots.txt: we should be able to refer to single copyrighted items in a much more detailed way. The only thing I could think of is using JSON-LD together with something like schema.org. This JSON-LD file could sit next to robots.txt in the same directory and offer all the necessary details about every single asset in a page.
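For instance, a single entry in such a file might look like this (purely illustrative; the type, properties and URLs are placeholders, not a proposal):

    {
      "@context": "https://schema.org",
      "@type": "ImageObject",
      "contentUrl": "https://example.com/articles/2021/photo-01.jpg",
      "copyrightHolder": {
        "@type": "Organization",
        "name": "Example Publisher"
      },
      "license": "https://example.com/licenses/tdm-reserved"
    }

What do you think?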

=======================
Claudio Tubertini
Almalibri
mob +39 327 1503898

------- Original Message -------
On Thursday, February 25, 2021 3:58 PM, Leonard Rosenthol <lrosenth@adobe.com> wrote:

> As I mentioned on the call, the biggest problem with robots.txt (and the others that Brendan mentions) is that they are completely detached from the assets that they refer to. This means that a user can simply move an asset from one server to another, and all TDM information/rights will no longer apply. This, IMO, makes it a non-starter as an option.
>
> Leonard
>
> From: Brendan Quinn <brendan@cluefulmedia.com>
> Date: Thursday, February 25, 2021 at 9:02 AM
> To: "laurent.lemeur@edrlab.org" <laurent.lemeur@edrlab.org>
> Cc: "public-tdmrep@w3.org" <public-tdmrep@w3.org>
> Subject: Re: page about robots.txt
> Resent-From: <public-tdmrep@w3.org>
> Resent-Date: Thursday, February 25, 2021 at 9:02 AM
>
> Thanks Laurent, that looks good.
>
> It's probably worth mentioning that there are some provider-specific extensions to robots.txt used in the wild, e.g. sitemap:, used by "Google, Bing, and other major search engines".
>
> https://developers.google.com/search/reference/robots_txt#google-supported-non-group-member-lines
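>
> For illustration, the sitemap extension is just an extra top-level line in a robots.txt file (example.com is a placeholder):
>
>     Sitemap: https://example.com/sitemap.xml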
>
> I guess we should also document the .well-known folder, with spec here: https://tools.ietf.org/html/rfc8615 and the quite extensive "well-known URI repository" at https://www.iana.org/assignments/well-known-uris/well-known-uris.xhtml
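>
> If we went that route, a TDM policy file could live under a registered well-known URI, e.g. (the name below is hypothetical, nothing is registered):
>
>     https://example.com/.well-known/tdmrep.json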
>
> Also see IAB's ads.txt initiative: https://iabtechlab.com/ads-txt/
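>
> For reference, an ads.txt file is just comma-separated lines of ad system domain, publisher account ID, relationship, and an optional certification authority ID, e.g. (placeholder values):
>
>     adexchange.example, pub-12345, DIRECT, abc123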
>
> Sorry I missed the call on Tuesday. I hope it was fruitful.
>
> Best regards,
>
> Brendan.
>
> On Thu, 25 Feb 2021 at 15:29, Laurent Le Meur <laurent.lemeur@edrlab.org> wrote:
>
>> Dear participants,
>>
>> I have added a page to the GitHub repo which tries to summarize what robots.txt is and how it is used. During our last call, Ivan Herman described robots.txt as a possible source of inspiration.
>>
>> https://github.com/w3c/tdm-reservation-protocol/blob/main/docs/robots.md
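>>
>> As a reminder, the core format is a simple list of per-crawler rules, e.g.:
>>
>>     User-agent: *
>>     Disallow: /private/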
>>
>> Best regards
>>
>> Laurent Le Meur
Received on Thursday, 25 February 2021 16:09:35 UTC
