15 June 2023

Create a public alternative to the tech giants

Feature article

Language models can lead to greater inequality and less freedom. Conversely, they can increase our productivity, promote the green transition and alleviate labour shortages. What can we do to reap the benefits without falling on our arse?

Feature article in Politiken on 15 June 2023 by Professor Anders Søgaard (UCPH), Professor Sune Lehmann (DTU and UCPH), Professor Rebecca Adler-Nissen (UCPH), Professor Ole Winther (DTU and UCPH) and Professor Michael Bang Petersen (AU).

HISTORY CONTAINS many crossroads. Some are insignificant to the course of history. Others are decisive: the path you choose shapes history, and thus the lives of all of us. The last time we stood at such a crossroads with digital technology was with social media.

Back then, we went down a path that allowed a small number of companies to take control of the online networks that connect us. And ever since, we've suffered the consequences: hateful discussions, rapid sharing of misinformation and a constant drain on our attention.

Today, developments in artificial intelligence mean we're at a new and even more crucial crossroads. What do we choose this time? Which way do we go?

OpenAI struck at the right time, invested a lot of time and money, backed by Microsoft and others, and trained an excellent language model, GPT-4. A language model that was significantly better than previous iterations.

A general language model that, like a Swiss army knife, can seemingly solve almost all of our problems at once. It can write your divorce contract, pass a legal exam, fabricate convincing misinformation, carry out an effective hacker attack and invent a new logo for your company. And a whole lot more. And now Google has pulled off the same feat with PaLM-2.

There are language models out there that are better than GPT-4 and PaLM-2 at some things, but there are probably no language models that are as good at so many things. That may be coming soon. In a couple of weeks. Maybe a few months.

While training a GPT-4 or a PaLM-2 is expensive, the payoff is huge. And dozens of American and Chinese companies have already signed up. The question, of course, is whether they can catch up with those who got a head start. GPT-4 in particular has the advantage of being first among the biggest. More on that in a moment.

THE FIRST artificial intelligence researchers also dreamed of technology that, like a Swiss army knife, could do everything at once. In the 50s, researchers worked on a type of artificial intelligence called the General Problem Solver. The project failed and the researchers went their separate ways: some worked on machine translation; others on chess computers.

Now, 70 years later, they - or their scientific descendants - are reunited. Artificial intelligence has once again become a unified field of research. And we now have a tool that, like Google's search engine, has become the first thing we reach for when in doubt. A tool that, like Google's search engine, we can quickly become dependent on. A tool that can become part of our critical infrastructure. But also: a tool that is unsafe and possibly illegal for many to use.

A clerk in the municipality who wants help from artificial intelligence and language models currently falls between two stools.

Like many civil servants, she is probably under some time pressure and perhaps even encouraged to use artificial intelligence and general language models to streamline her workflows. But which technology, which language model, should she choose? She seems to face a choice between two options: using GPT-4 (or PaLM-2), even though it is probably on the edge of the law, or other, far inferior technology.

There are hundreds of other language models, including for Danish. Some are better than GPT-4 and PaLM-2 for specific tasks, but GPT-4 does almost everything pretty well - a powerful technology with a very natural user interface.

Instead of a toolbox of language models for different purposes, GPT-4 and PaLM-2 are Swiss Army knives. One tool for all your tasks.

But our clerk at the municipality can't use GPT-4 or PaLM-2 for much without running afoul of the law. As a data controller, you must ensure an adequate level of security for the processing of personal data. And OpenAI and Google do not yet guarantee this, even though you can now opt in to have your data deleted after 30 days.

We also know that language models work better for some people than for others. Politicians are rattling their sabres and considering requiring companies to at least declare whether their chatbots, speech recognition software or facial recognition systems achieve more useful results for some population groups than for others.

This may also prevent our clerk from using GPT-4 or PaLM-2, as she is obviously not allowed to discriminate against citizens. Finally, there are unresolved issues around transparency and accountability.

Our clerk is just one example. Thousands will be tempted by this technology: home carers, school teachers, lawyers, doctors. And thousands will be prevented from using it - and will run afoul of the law if they do.

LANGUAGE MODELS put us all between two stools: as described in our first three feature articles, language models can become a powerful tool in the hands of authoritarian regimes and criminal organisations and, in the hands of the tech giants, lead to ever more intense consumption of empty entertainment and social media.

Language models can also shift global and local power balances, contributing to greater inequality and less freedom for individuals. Conversely, they can increase our productivity, promote the green transition, alleviate labour shortages and compensate for disabilities in certain population groups. And a whole lot more. What can we do to reap the benefits without falling on our arse?

More and more people believe that the tech giants' artificial intelligence needs firm regulation. That we should insist their products fulfil a wide range of requirements that we decide together - as citizens of Denmark, the Nordic region, Europe or the entire world. China is the first country to introduce special legislation in this area, but both Brussels and Washington are working on new legislation. There is momentum for regulation, an open window, and regulation is necessary. But it is not enough. We should also develop a public, safer alternative.

Many might ask: is it possible to create a competitive public alternative to the commercial chatbots GPT-4 and PaLM-2? A public alternative to OpenAI and Google? First, let's look at how the production of competitive alternatives to GPT-4 and PaLM-2 is actually progressing.

Right now, many are debating whether OpenAI and Google will stay one step ahead or whether open source alternatives will conquer the market. Open source gives everyone access to the underlying code, the right to create new versions and the right to pass the program on.

Over the past few months, such alternatives to GPT-4 have flourished. Many of them are created on small budgets.

The models are extensions of existing open source language models, such as Meta's Llama or EleutherAI's Pythia. Using new techniques for adapting language models, together with online collections of good dialogues with ChatGPT and GPT-4, you can quickly create your own chatbot - for just a few thousand dollars.

Open source solutions open a lot of doors - including for the development of more secure public alternatives to GPT-4 - but it is still a long way from Alpaca and Vicuna to GPT-4 and PaLM-2. And there are at least three reasons why open source solutions are unlikely to catch up with GPT-4 on their own: Firstly, it will be difficult for open source solutions to reach users. OpenAI/Microsoft and Google have huge marketing budgets and privileged access to users through Microsoft's software and Google's search engine.

Secondly, OpenAI has so many users already that they can improve their chatbots much faster than their competitors. Here they have a huge advantage, also compared to Google. Thirdly, OpenAI - through Microsoft - has access to an almost infinite number of computers. The same goes for Google.

WHAT WOULD IT TAKE to produce a public, competitive alternative? A CERN of language modelling, if you will.

The public sector has one thing going for it: privileged access to citizens - and to infrastructure that makes it easier to fulfil requirements such as age verification, data security and fairness. A pan-European alternative could quickly reach a user base the size of OpenAI's.

Two challenges remain: people and computers. OpenAI and Google have many talented researchers and developers. How can the public sector attract equally skilled employees?

The short answer: many OpenAI and Google employees actually want to work for the public sector (if the conditions are favourable).

At universities, we are seeing more and more applicants from those companies. Many have been left with a bad taste in their mouths in recent months and are leaving in droves. In addition, it is getting easier to train great language models every day.

Denmark has one of the world's best research and teaching environments for this kind of artificial intelligence. And what about computers?

Whether a public - Danish or pan-European - language model needs to be trained from scratch or not, the training costs are the least of the challenges. It's maintenance and daily operations that are the most expensive.

Running this type of technology is expensive. Just like running the power grid, roads, public transport, hospitals, etc.

The question is how important we think this is. How important it is to have access to this kind of technology. And how important it is to prevent our access to this crucial technology from lying in the hands of others - dependent on their every whim. We are, we might add, not the only ones to have had this idea: on openpetition.eu, a petition to establish a CERN for the development of open source language models is underway. So far, 3,500 people have signed it.

WHEN REGULATING the use of commercial language models and investing in public alternatives, it is crucial - as outlined in the previous three feature articles - that a) language models do not increase inequality and undermine human rights, b) language models do not drain the attention of children and young people - indeed, of all of us - and c) language models do not make us more vulnerable to criminal and geopolitical threats.

There are many ways to meet these challenges. Several of the legislative packages being discussed in the US, EU and China have proposed declaration requirements that make it clear to users if, for example, the technology works better for some population groups than others.

In the American debate, these are called 'nutrition labels'. Right now, more and more people are suggesting limiting screen time, but you could also consider banning engagement optimisation, infinite scroll, streaks and other technologies designed to hold attention.

As OpenAI has pointed out, it is not a social media company and doesn't optimise for engagement - but its language models do exactly that in the hands of others, such as Snapchat.

If we adopt commercial language models, we also need to demand stability of delivery and access, so that we can understand and control what goes on in the models that will shape our lives, public discourse and our national security. We need to make sure that criminal activity can be tracked without leaking sensitive personal information. And we need to fight market monopolies.

Regulation of language models will of course apply to both commercial and public providers. It can also serve as a kind of requirements specification for the development of a public alternative.

The EU is on the way with regulation, but that regulation risks being too late and inadequate. The AI Regulation will be adopted over the next year, but allows for a 24-month transition period from its entry into force so that businesses and everyone else can adapt.

THIS MEANS that we won't have legislation in this area until late 2025 or early 2026 at the earliest. Until then, the technology is regulated solely by existing legislation and critical consumers. That's why education about these technologies is crucial.

The scientists who are currently sounding the alarm are not only worried about what will happen in two or three years, but also about what will happen in the next few months.

We propose that a committee be set up with representatives from all those who benefit most from the technology and those who are most challenged by it: dyslexics, engineers, the elderly, health care assistants, primary school teachers, etc. And that the committee assesses whether the EU AI Regulation and existing Danish legislation are sufficient to ensure that no one is left behind.

We also suggest that everyone who can should contribute to a shared critical awareness of the challenges and opportunities of artificial intelligence.

And that we pass the hat around - among the country's major foundations, private donors and politicians.

Perhaps even beyond the country's borders. And find out whether we can scrape together the money to run a competitive alternative to a technology that currently places our fate in the hands of a company in San Francisco or Seattle.

And no, of course Denmark shouldn't go it alone. In 1958, a public meeting on nuclear disarmament was held in Westminster, leading to demonstrations in Downing Street. The organisers also initiated the first Aldermaston March, a four-day protest march.

Thousands took part. The marches continued for almost ten years. Politicians soon began to get involved. In 1961, US President John F. Kennedy gave a speech in front of the UN General Assembly, announcing the United States' intention to challenge the Soviet Union, not to an arms race, but to a peace race.

Denmark should not go it alone. But even if Denmark should not go it alone, we can, like Great Britain in 1958, take the lead and initiate a race to develop safe, public artificial intelligence.


In keeping with the topic, this article has been translated from Danish by a neural machine translation service.
