Bluesky dataset for AI training removed from Hugging Face
Mission LiFE

Social media

The creator of the dataset issued an apology to concerned users in a post on Bluesky.

A dataset of 1m Bluesky posts that was uploaded to machine learning platform Hugging Face earlier this week has been removed.

On November 26, Daniel van Strien, machine learning librarian at Hugging Face, uploaded a dataset of 1m public posts and their accompanying metadata from Bluesky’s Firehose API. The dataset card explained that it was “intended for machine learning research and experimentation with social media data”.

However, after facing a backlash, van Strien removed the Bluesky data and apologized yesterday (November 27).

“I removed the Bluesky data from the repo,” van Strien posted on Bluesky.

“Although I wanted to support the development of tools for the platform, I recognize that this approach violated the principles of transparency and consent in data collection. I apologize for this mistake.”

He said he had left the public repository (to which the dataset had been uploaded) online so that users can continue to provide feedback.

As 404 Media noted, the data was not anonymized: each post was listed along with the user’s decentralized identifier.

While many commenters said there should be an opt-out option for data collection, others argued that Bluesky’s data is publicly available anyway and that compiling it into a dataset is therefore fair use.

Discourse on data

There is an increasing amount of discourse about the use of people’s data for artificial intelligence (AI) training without the users’ consent.

X, the social media site that Bluesky competes with, found itself in hot water earlier this year when a security expert said the Elon Musk-owned platform is “exceeding the boundaries of digital ownership” by defaulting users into sharing their posts, interactions and even conversations with its Grok AI chatbot for the purpose of AI development.

A month later, the Irish Data Protection Commission (DPC) said X had agreed to suspend the processing of EU users’ personal data to train Grok after the commission took legal action against it.

Meta, the parent company of WhatsApp, Facebook and Instagram, also faced complaints about its plans to use personal data for AI earlier in the year.

Instead of asking users for their consent, Meta argued that it had a “legitimate interest” in collecting and processing this data. The company used this same legal basis for its personalized advertising policies, but the European Court of Justice rejected this basis last year.

Earlier this month, Bluesky saw a surge in new users following a mass exodus of users from X, which even caused a brief outage for the site.

Open source champion Kelsey Hightower, known for his work with Kubernetes and Google, spoke to SiliconRepublic.com about Bluesky’s promise as a decentralized platform.

He said we have been given a new opportunity to get social media right, but that we all have a responsibility to make sure this happens.
