I was just listening to Kenny Mac's song The rise of Ai (https://songcrafters.org/forum/index.php?topic=35766.0), and I wondered whether Songcrafters has any kind of policy on, or prohibition against, Large Language Models (LLMs) scraping content from this site.
Have we considered whether we will allow LLMs to use Songcrafters pages – posts, lyrics, music – as training data?
I'm in favor of it – blocking LLMs that is. Although, the horse may be out of the barn. I also know that the methods I can detect (i.e. by looking at robots.txt for directives like these (https://robotstxt.com/ai) or at the HTML for robots directives in meta tags) are exactly the kind of directives that AI scrapers are known to ignore completely (articles 1 (https://www.theregister.com/2024/07/30/taming_ai_content_crawlers/), 2 (https://www.theverge.com/2024/7/25/24205943/anthropic-ai-web-crawler-claudebot-ifixit-scraping-training-data), 3 (https://www.theregister.com/2024/07/03/cloudflare_ai_blocks/), 4 (https://www.engadget.com/ai-companies-are-reportedly-still-scraping-websites-despite-protocols-meant-to-block-them-132308524.html)). So perhaps there are already measures in place that I can't detect – such as server configurations or firewall rules.
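For anyone who wants to try the robots.txt check themselves, here is a minimal Python sketch using only the standard library. The sample robots.txt text is hypothetical (it is not Songcrafters' actual file), and the crawler names are an assumed, non-exhaustive list of names that AI companies have published.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# Assumed list of AI crawler names; not exhaustive.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot"]


def blocked_crawlers(robots_txt, crawlers, url):
    """Return the crawler names that robots.txt asks not to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in crawlers if not parser.can_fetch(name, url)]


result = blocked_crawlers(
    SAMPLE_ROBOTS_TXT,
    AI_CRAWLERS,
    "https://songcrafters.org/forum/index.php",
)
print(result)
```

This only reports what the file asks crawlers to do. It says nothing about whether a given crawler actually obeys the request, which is the concern raised in the articles above.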
We put a lot of effort into our songs and lyrics, and while some of us freely share our work, others are more protective of their IP. If AI models train on our content without consent, they could reproduce or repurpose our creativity without credit or compensation. If we want to try to keep LLMs off this site, we should consider leaning towards the more restrictive end of the spectrum.
I'm curious to hear what the administrators and members think about this. Have any steps already been taken? If not, is this something we might want to discuss?
I suspect that this is something that I should be very concerned about, but I have very little to add that might prevent it. If it were quick and easy then I guess I would be in favour of stopping malicious individuals from doing dodgy things with my music. However, if this hampered other Songcrafters members from listening to my stuff then probably not. I often post my stuff on other sites too, so even if it were possible to make this site totally secure, they (whoever "they" are) would probably find a way to get access elsewhere.
I try very hard not to worry about things that I cannot affect, and since I have not made loads of money from my music I haven't lost anything if someone else does.
I hope that doesn't sound too apathetic. :)
Quote from: chapperz66 on February 19, 2025, 08:28:00 AM
If it were quick and easy then I guess I would be in favour of stopping malicious individuals from doing dodgy things with my music. However, if this hampered other Songcrafters members from listening to my stuff then probably not.
The measures against LLM scraping wouldn't stop any humans from accessing the site. Aside from the polite measures that ask the bots to go away, the more aggressive measures would detect bots and block them. And it may be the case that these measures are already in place on Songcrafters, but I can't detect any. (Not that I'm a specialist in this stuff.)
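As a rough idea of what "detect bots and block them" can mean in its simplest form, here is a small Python sketch that checks a request's User-Agent header against a list of known crawler names. The marker list and the function name are hypothetical, and this is not a description of anything Songcrafters actually runs.

```python
# Hypothetical list of lowercase substrings that identify AI crawlers.
AI_BOT_MARKERS = ("gptbot", "claudebot", "ccbot")


def should_block(user_agent: str) -> bool:
    """Return True if the User-Agent header names a listed AI crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in AI_BOT_MARKERS)


# Example User-Agent strings, made up for illustration.
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0"
crawler = "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"

print(should_block(browser))
print(should_block(crawler))
```

A check like this leaves ordinary browsers alone, which is why human visitors wouldn't notice it. It only catches crawlers that identify themselves honestly. A scraper that pretends to be a normal browser would pass straight through, so real deployments tend to combine it with firewall rules or other server-side detection.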
Today I learned about poisoning AI.
Quote from: euronews.com
Visual artists are now exploring how to "poison" models (https://www.euronews.com/culture/2023/03/27/from-lawsuits-to-tech-hacks-heres-how-artists-are-fighting-back-against-ai-image-generatio) by adding a layer of data acting as a decoy for AI and therefore, preserving their artistic style by making it harder to mimic by genAI.
https://www.euronews.com/next/2025/03/26/trapped-in-an-ai-labyrinth-one-companys-plan-to-stop-to-bots-scraping-content-for-ai-train
Quote from: https://newatlas.com
Researchers at the University of Tennessee, Knoxville and Lehigh University have now developed a new tool that could help musicians protect their work from being fed into the machine. It's called HarmonyCloak, and it works by effectively embedding a new layer of noise into music that human ears can't detect but AI 'ears' can't tune out.
This extra noise is dynamically created to blend into the specific characteristics of any given piece of music, remaining below the human hearing threshold. But any errant AI models that scrape the music can't figure out which bits to ignore, so it kind of poisons the well and ruins their attempts at recreation.
https://newatlas.com/ai-humanoids/harmonycloak-music-protection-ai/
To implement something like this on Songcrafters is probably not feasible. It would require that every MP3 be altered. But I love that these efforts are being taken.