Vibe coding produces software riddled with insecurities. Will risk management and regulatory compliance, too, fall victim to the vibes?
In 2023, Andrej Karpathy, co-founder of OpenAI, prophesied that “[t]he hottest new programming language is English,” envisioning a future in which artificial intelligence (AI) would generate code from natural language instructions. Now he is heralding the era of “vibe coding,” in which software is developed entirely by prompting AI agents. You can “fully give in to the vibes (…) and forget that the code even exists,” as Karpathy put it.
Vibe coding is a novel approach to developing software: Based on prompts, large language models (LLMs) take over the entire software development process, rather than simply supporting it. This innovation is possible because of agentic AI: systems of multiple AI agents that collaborate to solve complex, multi-step tasks. “Coding agents”—such as CodeGPT, Cursor, or Claude Opus 4—assist software developers in churning out ready-to-run code. They design the architecture of a program, generate code, find appropriate open-source components, test software, and fix errors. They can also explain the code to help their users understand the logic of the software or the purpose of different functions and variables. Because users can rely on coding agents without any programming knowledge or experience, this technology makes programming accessible to everyone. Vibe coding is the ideal solution if you have “a vision that you can’t execute but AI can,” as one MIT Media Lab researcher put it.
At the same time, the ease of vibe coding introduces severe risks: This approach implies accepting AI-generated code without thorough code review or manual testing. All quality assurance occurs through additional prompts rather than traditional verification methods, such as (static) code analysis or dynamic testing. As a result, vibe coders—according to the very definition of vibe coding—do “not need to understand how or why the code works, and often will have to accept that a certain number of bugs and glitches will be present.”
Vibe coding breaks with established software development practices in a radical way. Although AI assistants have become standard tools for software engineers, they traditionally complement rather than replace human developers. GitHub Copilot exemplifies this approach: It works like an “autocomplete” system, suggesting lines of code (although it now also offers an “agent mode” that can generate, test, and run code). Nonetheless, regardless of AI involvement, all code should be (manually) reviewed and tested prior to deployment (a rule that also applies to code copied and pasted from other sources).
The idea of vibe coding is diametrically opposed to that: Vibe coders do not directly interact with code and are consequently unable to assess—much less guarantee—code quality. This lax attitude toward software quality assurance is fraught with substantial risks.
Sacrificing Security for the Vibes
Vibe coding and security do not go well together, or, as Greg Kedzierski put it: “The S in ‘vibe coding’ stands for security.”
Generative AI has a tendency to “make up” facts due to its autocomplete-like functioning. For example, when asked legal questions, generative AI often comes up with court cases that appear plausible but don’t actually exist. These untrue results are often referred to as “hallucinations.” When AI generates code, it might be riddled with references to plausible-sounding software packages and libraries (that is, other software resources) that do not actually exist. In the lucky case, the referenced package simply does not exist and the code fails to run. But these “hallucinations” can become truly dangerous: Vibe coding services often output the same “hallucinated” package names over and over again. What ensues is a modern form of typosquatting (which preys on common typos, for example, by registering domain names such as gooogle.com rather than google.com). In a practice recently dubbed “slopsquatting,” malicious actors can register these nonexistent package names and hide malware behind them. AI coding services might then unknowingly incorporate such malicious software packages into code, compromising the vibe-coded application and its users. To avert the threat of automatically integrating malware camouflaged as (useful) software packages, one would have to verify each and every included software package (until “hallucinations” are curbed). But that would be antithetical to vibe coding, since its very purpose is to operate without human involvement.
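To illustrate what even minimal verification would look like, here is a small sketch (illustrative only; it assumes a Python project with a plain requirements.txt and uses the public PyPI JSON API). It merely flags dependencies that do not exist on PyPI at all; it cannot tell a legitimate package from a squatted one, which is why human review remains necessary.

```python
# Minimal sketch: flag dependencies that do not exist on PyPI.
# Assumes a plain requirements.txt with bare names or pinned "name==version"
# entries (an illustrative simplification). Existence on PyPI is a necessary
# check, not proof that a package is trustworthy.
import urllib.error
import urllib.request


def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI serves metadata for the given package name."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False


def check_requirements(path: str = "requirements.txt") -> None:
    with open(path) as f:
        for line in f:
            name = line.split("==")[0].strip()
            if not name or name.startswith("#"):
                continue
            if not package_exists_on_pypi(name):
                print(f"WARNING: '{name}' not found on PyPI - possible hallucinated dependency")


if __name__ == "__main__":
    check_requirements()
```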
Additional security issues exist. Since AI-coding assistants are trained on data from open-source repositories such as GitHub, their models might embed widespread but insecure coding practices and “deprecated”—that is, outdated—libraries. Coding agents often generate new code instead of reusing or refactoring existing code, leading to “code bloat” and technical debt. Many software developers report a lack of contextual understanding, especially as complexity grows, which can create further vulnerabilities.
Moreover, the coding agents themselves present another attack vector. AI systems are vulnerable to various attacks: In poisoning attacks, for example, adversaries manipulate training data and thereby influence the model and its outputs. Because AI coding models are mostly trained on data scraped from publicly available code repositories, adversaries can plant poisoned samples in exactly the places this training data comes from. Even poisoning rates of 0.001 percent may suffice to manipulate model behavior. If an AI model is open source, malicious actors could either modify the source code or provide their own models—free of charge and with backdoors. Agentic AI systems face additional risks: Attackers can compromise a single agent to spread malicious outputs throughout the entire multi-agent system, or manipulate coordination between agents.
Another potential attack surface arises with Anthropic’s new Model Context Protocol (MCP). MCP offers a new standard for connecting AI systems with data troves across different repositories, tools, and development environments. Some describe it as the “USB-C port for AI applications.” This relatively novel technology prioritizes usability over security and, as one AI researcher put it, “lacks authentication standard, context encryption and ways to verify tool integrity.”
It is just a matter of time until vibe code seeps into widely distributed applications: Some companies plan to replace software engineers with AI coding systems, and software developers might increasingly turn to vibe coding for quick results. Vibe coding has already led to some damage. For example, one AI coding assistant wiped out the database of a company. More such incidents will happen in the future.
Will Law Keep Vibe Code in Check?
The inherent security risks and lack of software quality assurance of vibe coding may lead to legal risks. In the EU, the Cyber Resilience Act requires manufacturers of software-based products to implement comprehensive cybersecurity requirements such as developing products according to secure-by-design principles, conducting mandatory risk assessments, and providing ongoing security updates to fix vulnerabilities for at least five years. In the U.S., no such general cybersecurity-related product regulation exists. There are only sector-specific regulatory regimes or other regulatory tools like liability regimes. But security is still an important component of software development contracts in the private sector—even though software companies often try to escape liability for insecure or flawed software.
Regardless of the specific legal regime—be it regulatory, tort-based, or contractual—the basic structure of cybersecurity requirements is similar:
Software should implement methods and tools to protect the confidentiality, integrity, and availability of information, systems, and services.
The necessary level of protection should be evaluated with a risk assessment.
Since software is complex and its environment is ever-changing, a process for detecting and handling vulnerabilities is needed.
These three components can be part of regulatory regimes (they are requirements according to the EU Cyber Resilience Act) or contracts—and, if violated, they may trigger liability.
The Cybersecurity Baseline: The CIA Triad and Risk Management
Security and vibe coding are at odds. Amid risks such as package hallucinations and code bloat, asking coding agents such as Cursor, Bolt, or Claude to write “secure” software isn’t enough. These vulnerabilities fly under the radar and are difficult to identify. From an external perspective (that of customers or regulatory agencies), they are even harder to detect—in particular if the software is “closed source,” so researchers cannot inspect the source code directly but have to reverse engineer the software first.
A good starting point to assess software for risks is technical documentation. It typically contains a general description of the software, its purpose, design, and development, but may also include the software bill of materials (a list of all third-party components), an assessment of cybersecurity risks, and reports of tests (see, for example, Annex VII of the Cyber Resilience Act).
However, this documentation could also be generated by coding agents, turning it into a simulacrum: It would contain the content typically expected of such documentation and would look plausible, but it would not reflect reality. Security is not about having the right paperwork; it’s about measures and processes that are actually implemented.
One remedy could be a software bill of materials, that is, an inventory of all components of the software in question (such as code blocks or software libraries). This list can be aggregated and maintained automatically. It could help with identifying malicious software resources—for example, software packages that a “hallucinating” AI coding service integrated into the software. However, the list still requires manual verification by software engineers, because malicious software is not easy to spot. Again, automation can facilitate software engineering but cannot replace human review.
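As a rough illustration of how such an inventory can be aggregated automatically, the sketch below lists every Python package installed in the current environment with its version. This is a minimal example, not a full SBOM in the CycloneDX or SPDX sense; a human reviewer would still need to check the resulting list against trusted sources.

```python
# Minimal sketch of an automatically aggregated component inventory.
# A real SBOM format (e.g., CycloneDX or SPDX) carries far more metadata;
# this only captures installed Python distributions and their versions
# so that a human can review them.
import json
from importlib.metadata import distributions


def build_inventory() -> list[dict]:
    components = []
    for dist in distributions():
        components.append({
            "name": dist.metadata["Name"],
            "version": dist.version,
        })
    return sorted(components, key=lambda c: (c["name"] or "").lower())


if __name__ == "__main__":
    print(json.dumps(build_inventory(), indent=2))
```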
AI-Generated Risk Assessment?
Risk assessments present the same issues that arise with technical documentation. Software companies should (or often have to) conduct risk assessments to identify relevant threats and to evaluate the probability and likely impact of potential harmful events. The next step is to implement “appropriate” measures to prevent or mitigate damages. The appropriateness relates to both the identified risks and the costs of reducing them, leading to a cost-benefit analysis. For example, a company may spend little effort and money on protecting its website from cyberattacks that would only cause the website to be unavailable. The cybersecurity of a medical device, however, would require more substantial effort to avoid physical harm or danger to life. Proponents of vibe coding might wonder: Is it possible to prompt an AI agent to generate a risk assessment, including recommended actions to manage the risks posed by the product?
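To make the cost-benefit logic concrete, here is a toy calculation using the common annualized loss expectancy (ALE) heuristic: a protective measure is worth implementing if the expected annual loss it prevents exceeds its annual cost. All numbers are illustrative assumptions, not figures from any real assessment.

```python
# Toy cost-benefit check for a single risk, using the common
# annualized loss expectancy heuristic:
#   ALE = single loss expectancy (SLE) x annual rate of occurrence (ARO)
# All figures below are illustrative assumptions.

def annualized_loss_expectancy(sle: float, aro: float) -> float:
    return sle * aro


def control_is_worthwhile(sle: float, aro: float, residual_aro: float,
                          annual_control_cost: float) -> bool:
    """A control pays off if the annual loss it removes exceeds its annual cost."""
    risk_reduction = (annualized_loss_expectancy(sle, aro)
                      - annualized_loss_expectancy(sle, residual_aro))
    return risk_reduction > annual_control_cost


# Example: a marketing website that could merely go offline ...
print(control_is_worthwhile(sle=5_000, aro=0.5, residual_aro=0.1,
                            annual_control_cost=20_000))   # False: not worth it
# ... versus a medical device where an incident could cause physical harm.
print(control_is_worthwhile(sle=2_000_000, aro=0.2, residual_aro=0.02,
                            annual_control_cost=50_000))   # True: clearly worth it
```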
Meta seems to think so. According to internal documents obtained by NPR, the company reportedly plans to automate more than 90 percent of its risk assessments with AI. Anthropic’s Model Context Protocol could facilitate this approach by enabling Meta’s local data to be integrated into code-generating LLMs. Local data gives the LLM more context for its risk assessment, so the result is more closely tailored to the company.
The complex process of risk assessment entails more than checking boxes; it involves understanding potential threats and dangers, estimating resulting damages, comparing them to preventive measures, and choosing a sufficient level of protection. Risk assessments are a distinctly human endeavor, requiring an understanding of context and human values. Furthermore, they constitute the basis for responsibility and accountability (as well as potentially liability). Outsourcing this process will render it hollow and miss the point.
So far, regulatory risk assessment requirements neither explicitly preclude automated risk assessments nor demand manual intellectual involvement. Rather, the relevant criterion concerns the outcome of the assessment: compliance or noncompliance. Do damages occur or not? An AI-generated risk assessment does not accomplish risk management—rather, it is “vibe compliance.” It simulates the implementation of a process. Nevertheless, it may support and improve risk management: Software developers can review AI-generated risk assessments, scrutinize them, and implement the recommended measures. In fact, these recommendations may be especially helpful to small or medium-sized companies without much cybersecurity expertise. All the same, human action is still required.
Automated Vulnerability Handling
“Security is a process” that never ends: Even after software is released, software developers should regularly test for and track vulnerabilities in databases such as CVE or EUVD. Any exploitable vulnerability must be fixed—in some instances, addressing a vulnerability requires a software update.
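As a small example of what such ongoing tracking can look like in practice, the sketch below queries known vulnerabilities for one specific package version via the public OSV.dev API, which aggregates advisories including CVE entries. The package name and version are illustrative; a real process would iterate over the full software bill of materials and re-run regularly, since new advisories appear after release.

```python
# Minimal sketch: query the public OSV.dev database for known vulnerabilities
# affecting one specific package version. The package and version below are
# illustrative examples only.
import json
import urllib.request


def query_osv(package: str, version: str, ecosystem: str = "PyPI") -> list:
    payload = json.dumps({
        "version": version,
        "package": {"name": package, "ecosystem": ecosystem},
    }).encode()
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as response:
        return json.load(response).get("vulns", [])


for vuln in query_osv("requests", "2.19.1"):
    print(vuln["id"], "-", vuln.get("summary", "no summary available"))
```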
A vibe coder may contemplate using AI to automatically run tests, detect vulnerabilities, document them, and then generate a remediating security update. Yet again, vibe-coding manufacturers cannot guarantee a sufficient level of security. Automated software testing is already a common practice and can detect certain vulnerabilities by simulating attacks or by analyzing the source code to identify insecure coding practices like leaked credentials. But it can test only according to predefined standards, tends to produce lots of false-positive results, and may miss complex or novel vulnerabilities.
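To illustrate both the usefulness and the limits of such predefined checks, here is a toy static scan that flags hardcoded credentials by pattern matching. The patterns are hypothetical and far cruder than those in real secret scanners: the scan catches only what its rules anticipate, misses anything outside them, and will flag some harmless lines as well.

```python
# Toy static check: flag lines that look like hardcoded credentials.
# The patterns are illustrative; real scanners ship many more rules.
# Anything outside these predefined patterns goes undetected, while
# innocuous matches show up as false positives.
import re
import sys

PATTERNS = {
    "possible password assignment": re.compile(r"password\s*=\s*['\"].+['\"]", re.IGNORECASE),
    "possible AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "possible private key header": re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
}


def scan(path: str) -> None:
    with open(path, encoding="utf-8", errors="ignore") as f:
        for lineno, line in enumerate(f, start=1):
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"{path}:{lineno}: {label}")


if __name__ == "__main__":
    for source_file in sys.argv[1:]:
        scan(source_file)
```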
Turning to AI to fix vulnerabilities (“self-debugging”) likewise fails to yield the desired results. Vibe coders often complain that AI coding services struggle to maintain context across many files and, when prompted to restructure and improve the code, introduce errors or add new, unrequested features.
The promise of AI security tools automating security testing is nothing new: The market is riddled with security tools that promise all-encompassing security but underdeliver or fall into the “snake oil” category of deceptive marketing. Moreover, many automated testing tools produce reports to prove the efficacy of their security processes, contributing to “security theater” with visible, often performative security checks—irrespective of their accuracy or effectiveness. Automation cannot replace human oversight and control: The highly limited testing capabilities of AI complement, but do not supersede, expert review. Unfortunately, vibe code tends to be much longer than code developed by humans, making debugging tedious or even futile—or, as one programmer put it: “Create 20,000 lines in 20 minutes, spend 2 years debugging.”
Simulating Compliance Based on Vibes
Vibe coding comes with significant compliance risks: Vibe coders face consequences such as liability for damages. They may simulate compliance by having their coding agent generate technical and risk assessment documentation, but that has no real impact on actual risk exposure.
The only path to secure software runs through manual review. Current AI technology is too riddled with shortcomings, especially when it comes to factoring in context. Finding bugs and vulnerabilities in code cannot be achieved through prompting. The inherent opacity of AI makes it hard to anticipate and detect errors.
In short, the security of vibe code cannot be guaranteed, but compliance can be simulated. Just as vibe coders outsource coding to AI, they can outsource generating compliance paperwork.
Will the Software Industry “Self-Heal”?
Unless legislators and government agencies take action against vibe coding in software products, two mechanisms could disincentivize vibe coding.
The first is deterrence through potential liability for damages. This threat may discourage vibe coders from placing products on the market without rigorous quality assurance. However, when it comes to enforcement, liability for software defects has so far not been as effective as some may have hoped. Vulnerabilities in software products are still very much commonplace, although somewhat in decline according to the CVE database. Moreover, many companies have found ways to evade liability, for example, by limiting warranties.
The second deterrent is the free market and the law of demand. Customers of software products (especially in business-to-business transactions) demand transparency and reliability. They want to be able to understand and review the code underlying their commissioned development projects, or at least to rely on its security. Bloated vibe code that AI has “self-debugged” comes with dubious security standards and will not pass this benchmark. In time, customers may shun vibe-coded software and prefer developers with more rigorous standards. Admittedly, the demand for secure software has so far had scant effect on the actual security level of products. A huge issue is information asymmetry: Customers cannot fully assess the cybersecurity of a product, and consumers often do not prioritize security over other features, leading to a “market for lemons.”
The market’s “self-healing” powers depend on whether customers recognize the inherent risk of vibe code and shift toward software created based on the principles of transparency and accountability.
***
Agentic AI allows its users to abstract away complexity while generating ever more complex software. It is no surprise that the software industry is slowly shifting to VibeOps, with some vibe coders trying to run entire software business operations on AI.
At the same time, vibe coding comes with unpredictable risks. Rigorous review and testing of any code remain necessary to guarantee security—and LLMs cannot deliver that. Yet. Consequently, cheap, fast vibe code can become unusable and, in the event of damages, very expensive.
Will vibe coding replace software developers? Hopefully not, and if it does, it will be to the detriment of security. Nevertheless, AI is a powerful tool that offers quick solutions. The most successful teams will be those that can move fast and break fewer things—combining generative AI with the expertise of human software developers. Over the coming months and years, AI-assisted coding and vibe coding will likely converge: AI will take over more and more code generation, while software engineers’ tasks will focus on code review and testing. Thus, the future isn’t about AI replacing developers; it’s about developers and AI becoming the ultimate coding duo.
– Carolin Kemper is a lawyer and researcher at the National Institute for Public Administration Germany. She investigates the intersection of law, technology, and security. Published courtesy of Lawfare.