We must protect creatives from AI freeloading
This is the problem with building your business model on suspected theft.
I have spent a long time now quietly watching the development of large language models (LLMs) and the growing furore around accusations that their owners used copyrighted data on an industrial scale to train them.
And I’ve been very dubious.
It all seemed wrong.
I’d spent years sinking tens of thousands of pounds into courses, books, and subscriptions to content providers to inform my writing.
That these private corporations could potentially just hoover it all up without so much as a please or a thank you (let alone some actual compensation) seemed immoral at best and illegal at worst.
I’m not the only person to find this troubling.
The Financial Times, DMG Media and PRS for Music have all reported that their content has been used to train large language models without their permission, with PRS for Music describing the current situation as:
“…industrial scale infringement”
A member of the House of Lords similarly described it as:
“…state sanctioned theft”
And everyone from musicians to copyright experts, award-winning journalists and even an OpenAI whistleblower (who tragically died, reportedly by suicide, though serious questions remain) has come out to say this is not okay.
Meanwhile, an investigation by The Atlantic alleges Meta may have accessed a “shadow library” containing millions of books and academic papers taken without permission to train its large language model, Llama.
Anyone going to put a stop to this?
A legal grey area
The issue of whether large language model owners broke the law when they took copyrighted data is contested.
People who defend the practice claim that taking copyright-protected works is justified under a principle called “fair use”, which permits copyrighted works to be used without permission in certain circumstances, such as for education.
However, critics argue this is nonsense and that “fair use” laws were never designed for companies to scrape huge quantities of data to input into profit-making LLMs.
I agree.
Fair use was designed for teachers and journalists to reference copyrighted works in ways that contribute to human learning and development, not to allow profit-making corporations with the creative abilities of a lump of rock to rip off design visionaries.
Thankfully, there’s evidence people in positions of power are thinking similarly.
In a US court case from earlier this year, the court ruled that AI firm Ross Intelligence was in breach of copyright law when it used content from Thomson Reuters to train an LLM that undercut Thomson Reuters’ market.
For all the lofty claims that AI will democratise creativity, underneath it all lies plenty of good old fashioned corporate theft.
Meanwhile in the UK, the House of Lords has repeatedly blocked the government’s attempts to give AI companies blanket permission to use copyrighted data unless copyright holders opt out.
Concerns that the law will come down on the side of copyright owners are so great that Republicans in the US have proposed a ten-year ban on lawsuits being brought against AI companies.
The fact that a budding autocracy is furiously trying to prevent an entire industry from being held to account suggests something is very off.
Welcome to dystopia
The UK has no dedicated rules governing the development of AI.
This is a problem.
Because AI won’t just be used in ways that benefit us; it will also be used to harm us.
Nearly three decades since the start of the internet era, we are finally getting a sense of how dramatically technology is changing all our lives, for good and bad.
Yes, social media has connected us, educated us and empowered us. It has also made many of us isolated, misinformed, distracted and vulnerable to dangerous and predatory influences.
And some of those harms are truly grave. Democracy is under serious threat. Childhood vaccination rates against preventable diseases are falling. Young men are being radicalised against women.
Already we’re seeing AI being used in similarly dangerous ways.
For example, judges have begun citing fake AI-generated cases in legal proceedings, a sign of how AI is set to supercharge misinformation.
Predictive policing is here and, despite years of warnings, it’s already targeting racial minorities because the data it runs on encodes racist stereotypes.
Companies are building starter kits that let violent men commit rape and incest on AI avatars, with evidence of this spilling over into real-world violence.
And who’s going to be footing the bill to deal with the fallout from all those problems? Taxpayers, obviously.
Oh, and let’s not forget the whole extinction threat.
The idea that, left to its own devices, AI is going to deliver some kind of utopia is ridiculous.
This is why we urgently need regulation. We need innovation that is constrained by ethics. Allowing companies to do whatever they want is not progress, it is dangerous.
Could enforcing copyright law be one part of the solution?
Copyright owners as guardians
Recently, the UK government has been consulting on how to approach copyright law and AI.
Their preferred solution, allowing AI developers to access data unless copyright holders object, has been roundly criticised by many across the creative industries.
But there’s something else worth noting.
This proposal carries a tacit assumption that it’s a good thing AI developers can access large quantities of training data, and that the only contentious issue is how to balance that access against the rights of owners.
But let’s take a step back for a minute.
Is it really a good thing AI developers can get all the data they want unless someone objects?
What about the harms AI could generate from precisely that ease of data access?
Perhaps we should think about data access not just as an exercise in balancing the rights of creatives against the ambitions of AI companies, but as a tool that can act as a check on harmful or unprofitable businesses that shouldn’t be in operation in the first place.
I am happy to donate my data to models that will power medical innovations and free apps that teach children to read and write. I am not happy for it to be used in models that disseminate propaganda.
Seen in this way, empowering data owners to make decisions about who gets their data becomes akin to regulation: a form of quality control.
And if all these pesky restrictions kill the industry, as OpenAI CEO Sam Altman claims they will, then we can set up a publicly owned entity to build models on behalf of the British people, in our interests.
But private markets will be constrained, as they should be.
Don’t we want our markets to consist of high-quality products, rather than being flooded with cheap, shoddy and potentially dangerous ones because the entry price for data is zero?
I do.
By empowering data owners to decide for themselves who gets their data, we may create a market in which only companies that truly add value to our lives are able to succeed.
Technology used in ways that truly benefit us.
Doesn’t that sound nice?
This isn’t progress
In some ways, my concerns about the copyright debate are more existential than legal. From our earliest years, we are taught that stealing is wrong and yet that is arguably what is being facilitated and defended on a mass scale today.
Call it innovation.
Call it progress.
Call it democratising art.
You’ve taken something that wasn’t yours to take.
I find it chilling how many intelligent people I’ve met who seem willing to defend the indefensible. I find it even more chilling that so many so-called “innovators” seem to have lost the connection to the part of themselves that holds more value than any other: their integrity.
People talk about how artificial intelligence is progress. But progress towards where, exactly?
Towards a world in which our values — the glue that holds societies together — are being consistently undermined by people who would willingly sacrifice anything at the altar of innovation and greed?
For the first time in my lifetime, I am no longer confident that the world the next generation inherits will be better than the one gifted to me, and it’s not because of climate change or wars.
It’s because so many people with wealth and power no longer seem to know right from wrong, and yet, instead of reprimanding them and demanding they do better, we revere them.
Our values are our most precious inheritance, and AI corporations, in their flagrant transgression of long-standing norms, are placing that legacy in doubt.
Rein them in.