Roundup: Will generative AI have many long-term benefits?

A lamb in the foreground with 2 + 2 = 5 in the background — A cute lamb that can’t do math. Photo: Unsplash/Elimende Inagella

Remember the exciting days of early 2023 when Sam Altman and his fellow AI boosters were promising chatbots would change the world for the better? They’d be our doctors and teachers, they said, and become assistants for all manner of tasks. Well, the evidence is slowly coming in and it’s showing those claims were a load of bullshit, as many predicted.

Let’s start with education. Some researchers at the University of Pennsylvania decided to see whether ChatGPT really improved students’ math skills. In a study of Turkish high school students, they found students used the chatbot as a crutch, and rather than improving outcomes, they concluded that ChatGPT could actually harm students’ learning. The researchers gave some students access to ChatGPT while others prepared without it. On practice problems, the students using ChatGPT answered 48% more problems correctly, but when it was time to take the test without the tool they did 17% worse than the students who never used a chatbot. A third group got a version of ChatGPT that only provided hints rather than answers. While that group answered 127% more practice problems correctly, they did no better on the actual test than students who didn’t use a chatbot at all.

How about medicine? One of the most common examples we hear for AI implementation (and not just the generative variety) in healthcare is for cancer and x-ray screenings, but a recent study of automation bias calls that into question. Two radiologists typically assess a mammogram to determine whether a patient has breast cancer, but in the study the second set of eyes was replaced by an AI assistant, and researchers ensured the AI would provide an incorrect answer some of the time. They found that radiologists ended up putting so much faith in the system’s answers that their accuracy got much worse as a result. The accuracy of inexperienced and moderately experienced radiologists dropped from around 80% to about 22% when the AI provided an incorrect result, and even very experienced radiologists saw their accuracy drop from 80% to 45%. Not a great outcome!

What about their capability as assistants? Maybe they’re good at some niche tasks, but certainly not at doing everything. A new analysis conducted by Amazon for the Australian Securities and Investments Commission, the country’s corporate regulator, found that generative AI models were worse than humans at summarizing documents in every conceivable way. In a test that involved summarizing submissions to a parliamentary inquiry into audit and consultancy firms, the analysis found the generative AI tools may even create more work for humans. It’s probably no surprise, then, that a survey recently released by the Federal Reserve Bank of New York found not many firms that have adopted AI have actually cut a lot of jobs as a result.

As concern grows about the state of the AI bubble, studies like these further call into question what, if any, benefits will actually come of the hype of the past couple of years. Sure, some investors and executives will make off with boatloads of money, but what will have been the impact on the rest of us?

Earlier this week, I wrote about the arrest of Pavel Durov and Brazil’s decision to suspend Twitter/X because the company refuses to appoint a legal representative. It signals an important shift in internet politics and further shows the need for us to reassess the framings we’ve used to understand what happens online. The internet is not exceptional, most of us do not hold libertarian politics, and tech platforms should not be held to a different standard than everything else.

Pavel Durov and Elon Musk are not free speech champions

In the roundup this week, I decided to place the focus on a piece about the Durov arrest that goes much deeper on why he was arrested. I’ve noticed some people online saying we don’t know enough about why Durov was arrested, but if you actually read into it, it becomes pretty clear authorities had evidence of child predators using the platform to go after children and exchange illegal explicit images of minors — and the platform was not aiding in addressing it. (In fact, up until a few days ago, it would brag about not responding to requests.)

Plus, the roundup has the usual labor updates and other tech news you might have missed from this past week!

Over on Tech Won’t Save Us, I spoke to Taylor Welling and Kathryn Friesen, video game workers at Bethesda and Blizzard whose teams recently formed wall-to-wall unions. We talked about what it takes to unionize in the video game industry and their thoughts on the state of the industry after all the recent layoffs.

Have a great week!

— Paris