Perplexity is allegedly scraping websites it’s not supposed to, again

0 2 minutes read

Web crawlers deployed by Perplexity to scrape websites are allegedly skirting restrictions, according to a new report published by Cloudflare. Specifically, the report claims that the company's bots appear to be "stealth crawling" sites by disguising their identity to get around robots.txt files and firewalls.

Robots.txt is a simple file that websites host to let web crawlers know whether they can scrape the site's content or not. Perplexity's official web crawling bots are "PerplexityBot" and "Perplexity-User." In Cloudflare's tests, Perplexity was still able to display the content of a new website, even when those specific bots were blocked by robots.txt. The behavior extended to websites with specific Web Application Firewall (WAF) rules that also restricted web crawlers.
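To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL and the browser-style user-agent string are illustrative assumptions, not taken from Cloudflare's report; the two bot names are Perplexity's declared crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the two declared crawlers
# but leaves the site open to everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder page on a reserved example domain.
url = "https://example.com/article"

# Illustrative generic browser string (an assumption, not a captured header).
generic_browser = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0 Safari/537.36"

declared_bot_allowed = parser.can_fetch("PerplexityBot", url)
declared_user_allowed = parser.can_fetch("Perplexity-User", url)
generic_allowed = parser.can_fetch(generic_browser, url)

print(declared_bot_allowed, declared_user_allowed, generic_allowed)
```

The file only names user agents, and compliance is voluntary: a crawler that honors robots.txt checks the rules under its own name, while a request that presents itself as an ordinary browser simply falls under the catch-all rule.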

A simplified diagram by Cloudflare illustrating the different ways Perplexity's web crawlers attempt to access a website's content.

(Cloudflare)

Cloudflare believes that Perplexity is getting around those obstacles by using "a generic browser intended to impersonate Google Chrome on macOS" when its declared bots are blocked by robots.txt. In Cloudflare's tests, the company's undeclared crawler could also rotate through IP addresses not listed in Perplexity's official IP range to get through firewalls. Cloudflare says that Perplexity appears to be doing the same thing with autonomous system numbers (ASNs), an identifier for IP addresses operated by the same business, writing that it spotted the crawler switching ASNs "across tens of thousands of domains and millions of requests per day."
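The sketch below illustrates why this combination is hard to filter with simple rules. It is not Cloudflare's detection method; the IP ranges are reserved documentation addresses standing in for a published crawler range, and `classify_request` is a hypothetical helper.

```python
import ipaddress

# Assumption: placeholder ranges (RFC 5737 documentation blocks) standing in
# for a crawler operator's published IP list.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
DECLARED_AGENTS = ("PerplexityBot", "Perplexity-User")


def classify_request(user_agent: str, source_ip: str) -> str:
    """Classify a request using only its user agent and source IP."""
    ip = ipaddress.ip_address(source_ip)
    in_declared_range = any(ip in net for net in DECLARED_RANGES)
    ua = user_agent.lower()
    claims_declared_agent = any(name.lower() in ua for name in DECLARED_AGENTS)

    if claims_declared_agent and in_declared_range:
        return "declared crawler"
    if claims_declared_agent:
        return "claims declared agent from unlisted IP"
    # A generic browser string from an unlisted IP looks like any other
    # visitor to these two checks.
    return "unidentified"


print(classify_request("PerplexityBot/1.0", "192.0.2.10"))
print(classify_request("PerplexityBot/1.0", "203.0.113.5"))
print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0", "203.0.113.5"))
```

A crawler that changes its user agent, its IP addresses and its ASN at once lands in the last branch, which is why blocklists built on names and address ranges alone do not catch it.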

Engadget has reached out to Perplexity for comment on Cloudflare's report. We'll update this article if we hear back.

Up-to-date information from websites is vital to companies training AI models, especially as services like Perplexity are used as replacements for search engines. Perplexity has also been caught in the past circumventing the rules to stay current. Multiple websites reported in 2024 that Perplexity was still accessing their content despite being forbidden from doing so in robots.txt, something the company blamed on the third-party web crawlers it was using at the time. Perplexity later partnered with multiple publishers to share revenue earned from ads displayed alongside their content, seemingly as a make-good for its past behavior.

Stopping companies from scraping content from the web will likely remain a game of whack-a-mole. In the meantime, Cloudflare has removed Perplexity's bots from its list of verified bots and implemented a way to identify and block Perplexity's stealth crawler from accessing its customers' content.


2025-08-04 21:11:00
