Generative AI training data sets are now trackable – and often legally complicated

A new online tool allows users to identify, track and learn about the legal status of training data sets for generative AI, and a quick glance shows that many may have licensing issues.

The tool, dubbed the Data Provenance Explorer, is the result of a joint effort between machine learning and legal experts from MIT, generative AI API provider Cohere, and 11 other organizations, including Harvard Law School, Carnegie Mellon University and Apple. It lets researchers, journalists and anyone else search through thousands of AI training databases and trace the “lineage” of widely used data sets.

Source: Computerworld