Wikipedia Is Making a Dataset for Training AI Because It’s Overwhelmed by Bots

Artificial intelligence developers have essentially forced Wikipedia to provide them with training data. On Wednesday, the Wikimedia Foundation announced that it is partnering with Google-owned Kaggle, a popular data science community platform, to release a version of Wikipedia optimized for training artificial intelligence models. Starting with English and French, the foundation will offer stripped-down versions of raw Wikipedia text, excluding any references or Markdown code.

As a nonprofit, volunteer-led platform, Wikipedia is funded largely through donations and does not own the content it hosts, so anyone may use and remix content from the platform. It is fine with other organizations using its vast collection of knowledge for all sorts of purposes. Kiwix, for example, is an offline version of Wikipedia that has been used to smuggle information into North Korea.

But a flood of bots crawling its website for artificial intelligence training has led to a surge in non-human traffic to Wikipedia, which the foundation has been keen to address because of the high costs involved. Earlier this month, the foundation said that bandwidth consumption has increased by 50% since January 2024. That is not great for an organization that does not monetize its website and instead depends on regular donation drives. Offering a standard, JSON-formatted version of Wikipedia should dissuade artificial intelligence developers from bombarding its website.

“As the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation’s data,” a Kaggle representative said. “Kaggle is excited to play a role in keeping this data accessible, available, and useful.”

It is no secret that technology companies fundamentally do not respect content creators and place little value on any individual's creative work. There is a growing school of thought in the artificial intelligence industry that all content should be free and can be taken from anywhere on the web to train an artificial intelligence model.

But someone has to create the content in the first place, which is not cheap, and artificial intelligence startups have been all too willing to ignore previously accepted norms about respecting a site’s wishes not to be crawled. Language models that produce human-like outputs need to be trained on huge amounts of material, and training data has become something akin to oil in the artificial intelligence boom. It is well known that the leading models are trained using copyrighted works, and many artificial intelligence companies are still in litigation over the issue.

Some Wikipedia contributors may dislike having their content used to train artificial intelligence. All writing on the site is licensed under the Creative Commons Attribution-ShareAlike license, which allows anyone to freely share, adapt, and build upon a work, even commercially, as long as they credit the original creator and license their derivative works under the same terms. It is not clear how Wikimedia will ensure that artificial intelligence companies respect these requirements, but Gizmodo has reached out for comment.

2025-04-17 16:15:00
