LLMs’ AI-generated code remains wildly insecure

By Robert Lemos • Aug 04, 2025

The code generated by large language models (LLMs) has improved somewhat over time, with more modern LLMs producing code that has a greater chance of compiling, but it is stagnating in other ways: Security in particular continues to fall short, especially for AI-generated Java code. Aside from introducing vulnerabilities, LLMs remain prone to errors, such as hallucinating software libraries that don’t exist, and are susceptible to problems like the malicious poisoning of their datasets.

In a study of more than 100 LLMs published this week, application security firm Veracode tested whether AI chatbots could produce syntactically correct code in four languages, then scanned that code for vulnerabilities. While the AI-generated code improved greatly in syntax (more than 90% of the code created by LLMs released in the last year compiled without error, compared with less than 20% prior to June 2023), only 55% of the code passed subsequent security scans, a number that has stubbornly refused to change over time.

“LLMs are not giving anybody a free pass out of doing the security work that they need to do,” says Jens Wessling, chief technology officer at Veracode, adding: “The code that LLMs are learning from is syntactically correct, but most developers — particularly for non-enterprise or non-open source projects — don’t really understand the security ramifications of the decisions they make, so those linger out there. LLMs’ output is modeled on actual code, and the actual code has security vulnerabilities in it.”