Popular LAION-5B artificial intelligence training dataset contains images of child sexual abuse, study finds

By: Bohdan Kaminskyi | 21.12.2023, 14:17


The LAION-5B artificial intelligence training dataset contains references to at least 1,679 images of child sexual abuse material (CSAM).

Here's What We Know

Researchers at the Stanford Internet Observatory began analysing the LAION datasets in September 2023. They checked image hashes against specialised CSAM-detection platforms, and the Canadian Centre for Child Protection carried out an additional check.

According to its website, LAION is an index of images found on the internet, not a repository that stores them. Nevertheless, experts warn that the presence of CSAM in it is dangerous: AI models trained on such data can learn to generate harmful content.

The researchers recommended discontinuing the use of AI models trained on LAION-5B. Among them is Stability AI's Stable Diffusion, which was partially trained on this data.

Google also used an earlier version of LAION to train Imagen but later stopped using that data.

Source: The Verge