Proposed Bill Is Another Turning Point in the Tussle Between Generative AI and Human Creators
Since the release of DALL-E 2 and ChatGPT in 2022, the popularity of generative AI has exploded. Turning a single sentence or prompt into endless text, a picture, a song, or more is a novelty unlike anything we’ve seen before. Gen AI tools have been adopted by hundreds of millions of people for everything from silly memes to experimental art projects, shortcuts at work, and (often ill-advised) replacement of whole positions.
Generative AI is controversial, though. Many content creators and marketers have rejected AI-generated content for a variety of reasons: legal uncertainty, inauthenticity, conflict with stated values, a desire to prioritize funding human creators, and more. Dove, for example, pledged never to use AI to portray people in its ads, as doing so could contradict the message of its long-running “Real Beauty” campaign. Under Armour, by contrast, received intense criticism after running an “AI-powered” commercial. Many contracts with ad agencies now contain AI restrictions to prevent controversies and preserve authentic, resonant content.
And now, generative AI might face its biggest challenge yet – from the legal sphere. A bill introduced in the US Congress by California Congressman Adam Schiff would require companies to disclose any copyrighted material used to train generative AI models that are released for commercial use. The bill follows several lawsuits filed by copyright holders claiming that training an AI on their works, or using the model produced by that training, violates their copyright. These lawsuits have often been dismissed, partly because there is little legal precedent on whether such training really is copyright infringement. And while YouTube has said that training on its videos would be a “clear violation” of its terms of service, OpenAI won’t confirm what its video-generation model Sora was trained on. That points to the other factor: it’s notoriously hard to prove definitively whether any given model was trained on any given work. This bill could change all of that.
Whether or not this bill passes, a day of reckoning could soon come for gen AI’s purpose-built commercial offerings. At the state level, Tennessee has enacted legislation that enshrines an artist’s voice as their intellectual property, preventing AI-driven vocal knockoffs from being commercially viable. A legal precedent establishing that training is copyright infringement could knock over dozens of lawsuit dominoes, especially if AI companies are forced to disclose the copyrighted material in their training data. At the same time, the biggest technology companies, such as Google, Microsoft, and Meta, have invested untold billions into AI – they won’t back down without a fight.
Let’s look at the possible scenarios and what each might mean for your own content creation and marketing campaigns.
Scenario One: The Bill Doesn’t Pass
One scenario is that the bill fails to gather the support it needs, and the day of reckoning for gen AI is delayed. OpenAI and other tech giants certainly have the lobbying muscle to sway Congress, so this is a real possibility. It depends on whether their lobbyists can match the might of those representing the RIAA, SAG-AFTRA, and other powerful entertainment industry organizations that support the bill.
Although the failure of this bill wouldn’t prevent a similar bill from appearing in the future, it could easily embolden these companies to train even more on copyrighted data. Currently, most models are trained on gigantic corpora that may or may not include copyrighted data – the collections are often simply too large to determine exactly what they contain, which offers a defense against accusations that copyrighted data is being used deliberately. If there’s no chance of that usage coming to light, AI companies may switch to openly training on datasets that definitely do contain copyrighted material, such as YouTube videos, Instagram posts, or commercial novels. As the pool of “safe” datasets – ones that can claim, however dubiously, not to use copyrighted works – runs dry, AI companies will be eager to expand into new sources for future training.
If you’re a content creator concerned about your material being used in training, or about models emulating your style, the failure of bills like this should put you on alert. Wherever your content is displayed, it may end up in a training set – without AI companies ever needing to disclose it. Countermeasures such as “poisoning” your art against training can give you some protection, but supporting legal efforts to restrict training is the most impactful path.
If you use gen AI content in your marketing campaigns, you should be wary too. Although the failure of bills like this would safeguard your ability to keep doing so, the PR backlash you face may also intensify. Consistently and visibly hiring artists, while reserving AI for specific cases, may help you enjoy the efficiency of gen AI without losing support.
Scenario Two: Gen AI Becomes Weaker
If the bill passes and AI companies have to disclose all copyrighted works used in their training data, they’ll be caught between a rock and a hard place. If they disclose the copyrighted material, they may open themselves up to lawsuits from the copyright holders. Instead, they may choose to retrain their models exclusively on material that is in the public domain or licensed for such use. Much of that material is “stock content” – by its very nature designed for use across many types of campaigns, and unlikely to produce anything dynamic, unique, or surprising. Models trained only on stock visuals tend to be limited and overly staged, lacking real human intervention and storytelling genius.
As mentioned before, the huge training corpora AI companies use are often far too large to sift through for copyrighted works. Instead, they’d have to manually build “whitelisted” datasets free of unlicensed copyrighted material, which would necessarily be much smaller. Smaller datasets mean weaker models: capable of a narrower range of tasks, with less variety in response to each prompt, and with less detail and accuracy. In fact, OpenAI told the UK House of Lords in a written submission that it would be impossible to train tools like ChatGPT without copyrighted material.
If you’re a content creator who doesn’t want your work trained on or emulated, this scenario should spur you to be even more vigilant about protecting your content from unauthorized use. AI companies will push the limits of what technically falls outside copyright, so make sure you control things like thumbnails and reposts of your work. If you’re financially motivated – as most of us are – you should also see this as a big business opportunity. Many people who were previously happy with AI output will likely pivot back to human-made content, and may be eager to reach out.
For marketers who have been taking advantage of gen AI, this scenario should reinforce an existing best practice: don’t get complacent. Even with the ordinary churn of AI model updates, output quality can get significantly better or worse from one day to the next. Carefully review and edit anything made by an AI before you publish it, especially when there’s news of retraining. You’ll need to fact-check copy for hallucinations, and AI content often cries out for personality and an injection of human thought and opinion.
Scenario Three: AI Companies Start Paying Copyright Holders
AI companies certainly don’t want to gut their models by proactively removing all copyrighted works, nor do they want to be sued and forced to remove them. One option they’re likely to consider is buying the rights to train on works from copyright holders. In fact, they’ve already started: Reddit is selling its comment sections as training data for a reported $60 million a year, and Apple has been exploring deals to use news publishers’ archives as training data.
Although this wouldn’t yield as much training data as the current “anything goes” approach, it doesn’t necessarily mean weaker models. Companies that can purchase the highest-quality, most relevant data for their specific uses may be able to build even more powerful models than today’s. It will, however, cost them a lot more. Rather than the current arms race between AI companies to build the most powerful models, we’ll likely see more strategy and discretion about which capabilities matter most to users.
Content creators shouldn’t expect this to be an instant payday. Unfortunately, you may have to become an expert in the terms of service of every website you post on. Because training has no real legal precedent, current terms often don’t mention it explicitly, and it may be ambiguous how training fits into the other uses they outline. Sites may rapidly update these terms to put control of that usage, and the payout, in the hands of site owners rather than creators. As always, it’s a time for vigilance, and for supporting efforts to put rights securely in the hands of creators.
For marketers who use AI models, this scenario is a mixed bag. This “copyright marketplace” era may lead to exciting new models, but the added costs to AI companies will almost certainly be passed on to users. Make sure you’re prepared for AI budgets to increase, and be ready and willing to pivot to human solutions if the cost becomes too high.
We hope this exploration of the future of generative AI has been helpful, whether you’re an AI advocate, a skeptic, or watching from the sidelines to see how it plays out. At Catch+Release, we strongly advocate for AI training data to be public and credited. Moreover, we advocate for all content in all contexts to be attributable and licensable so that creators retain control. If your content is being used in any capacity, whether to train a model or directly as marketing material, your consent should be required and you should be compensated.
No matter what happens with gen AI, traditional, human-created content isn’t going away. It remains the best way to give your campaigns uniqueness, resonance, and authenticity. It’s also the best, and pretty much the only, way to train any AI model worth using. Check out our Creator Community to see the wealth of amazing content created by real humans, available now for licensing and use in campaigns.
P.S. Like the blog’s thumbnail? Shoutout to @bryanbanducci from our Creator Community!