Websites that don't want to be indexed by Google and other search engines have long been able to use robots.txt files, which tell crawlers that they are not welcome. No law requires crawlers to comply, but Google, Yahoo, Bing, and others have always followed the convention.
Since OpenAI released ChatGPT and kicked off the AI gold rush, robots.txt has also been used to ask AI companies not to scrape websites' content to train their large language models. But AI companies don't seem to share the moral compass of search engine developers. Reuters reports that many of them simply choose to ignore the files and the wishes of website owners.
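To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the `may_fetch` helper, the example.com URL, and the `ExampleSearchBot` name are assumptions made up for illustration; `GPTBot` is the user agent name OpenAI publishes for its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one AI crawler to stay out of the
# whole site while leaving it open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print("GPTBot allowed:", may_fetch("GPTBot", page))
    print("ExampleSearchBot allowed:", may_fetch("ExampleSearchBot", page))
```

The check only returns an answer. Nothing in the file or the parser stops a crawler from skipping the check or disregarding the result, which is why compliance depends entirely on the crawler's operator.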
The revelation follows a letter from Tollbit, a company that mediates between website publishers and AI developers to broker content licensing agreements. Wired previously accused Perplexity of ignoring robots.txt files on its site and on other Condé Nast sites. According to Business Insider, Anthropic and OpenAI also ignore the files, despite previously saying they would respect them.
Aravind Srinivas, CEO of Perplexity, told Fast Company that the company's own bots do not ignore robots.txt files, but that it buys material from other companies that have done so. When the reporter asked him whether the company would now ask the partner to start respecting the files, he replied, “It is complicated.”