How Automated AI Code Analysis Can Scale Application Security

James Chiappetta
Published in better appsec · Oct 19, 2023 · 13 min read

A look at how Generative Artificial Intelligence (Gen AI) tools can scale Application Security’s (AppSec) code analysis and threat modeling workflows.

By: James Chiappetta with contributions from Robert Prast

Important Disclaimers:

  • The opinions stated here are the authors’ own, not necessarily those of past, current, or future employers.
  • Mentions of books, tools, websites, and/or products/services are not formal endorsements, and the authors have not taken any incentives to mention them.
  • Recommendations are based on past experiences and not a reflection of future ones.
  • Artificial Intelligence (AI) is a rapidly developing space and therefore any recommendations, opinions, or examples are based on present knowledge and subject to change.
  • When using public AI tools, we don’t recommend putting sensitive information into them. You should consult your organization’s policy on their use and always validate the output for accuracy.

Background

Application Security (AppSec) Engineers walk a tightrope. They often need to balance being a jack-of-all-trades for an entire application landscape while still being subject matter experts on a few flagship applications.

This means they need to intricately understand individual codebases for specific business applications -and- how those components interact with the general environment such as central Authentication and Authorization services.

This leads to a large sprawl of code and interfaces whose behavior an AppSec Engineer must understand before they can begin shaping security policies or threat models. Couple this with code changes happening asynchronously, and it becomes easy for a basic dependency to change, a large part of the project to change, or both at the same time. This can inevitably introduce undetected, critical vulnerabilities.

Inherently, a consistent pattern emerges where AppSec needs to meet with teams to understand the changes, read code changes directly, or read documentation that summarizes the code changes. This is doable in smaller organizations, but they are unlikely to even have an AppSec team. And it is definitely not scalable for large organizations where features are continually added.

This is where we believe Generative Artificial Intelligence (Gen AI) tools can help scale AppSec Engineers and make them more effective. This post will dive into the details and illustrate how tools like ChatGPT can help AppSec teams create efficiency that scales beyond themselves.

AppSec & GenAI

First Things First

We have written several posts before about the highest value workflows that an AppSec team performs, with the two most impactful being threat modeling and secure code review. Both have always been manual processes that depend on developer engagement and take a decent amount of knowledge, time, and motivation to complete. This, to us, feels like a real opportunity for Gen AI.

So, can Gen AI help AppSec engineers perform these work streams more efficiently? We first explored this thought here and we think yes. However, we will need to dig a bit deeper and see how we get there practically.

In this post, we will take a closer look at a few crucial areas:

  1. Using AI for continuous application code change analysis
  2. Using AI for interactive threat modeling and solutioning
  3. Additional areas we think AI tools can and will have an impact on in the short term

One last piece of housekeeping: We will be using ChatGPT (3.5) to help us illustrate the workflows. It is worth mentioning that, at the time of writing, there are differences between v3.5 and v4. There are also alternatives out there to consider. Also, please refer to our recommendations on how your organization can guardrail and control risk associated with using AI products.

OK, now, let’s get to it!

Continuous Application or Feature Changes Analysis

Walking the AppSec tightrope

Often as AppSec Engineers, our knowledge of core applications in an organization will lag behind by a few releases and, even worse, for new applications our knowledge will be practically zero. The main problem is not that changes occurred, but knowing when to refresh our knowledge and then needing to dedicate considerable time to deciphering the changes.

Enter repo- and code-aware AI solutions. These products offer AI pair-programmer, autocomplete-style recommendations while you code. They are usually baked into an IDE like VS Code, so you can engage them either by starting to write code or by writing a natural language comment describing what you want the code to do. They analyze the context in the file you are editing, as well as related files, and offer suggestions in the IDE.

With repo-aware AI solutions we have tested, such as Sourcegraph Cody, GitHub Copilot, and Amazon CodeWhisperer, you can quickly summarize entire files within seconds. And, in our last post we used ChatGPT to do quite a bit of this with CFSSL. Cool beans, but let’s put together a real world scenario and see how ChatGPT can help.

Example workflow

Let’s hypothetically say one of your development teams is coming up with a new feature for an e-commerce application. They plan to integrate a new third-party payment gateway service called ‘SuperCoolPay’. As an AppSec Engineer, you are unfamiliar with SuperCoolPay’s API and the potential security challenges it might introduce. You’re tasked with comprehending and securing the application’s interactions with this new service. This is where generative AI’s summarization abilities come into play.

For illustrative purposes, let’s use ChatGPT as our tool of choice. At the heart of its functionalities lies an expansive dataset encompassing a variety of topics, including coding patterns and common vulnerabilities. You can engage in a conversational manner with ChatGPT to query the service’s core functionalities or to ask about any potential vulnerabilities associated with similar services. It can guide you through key concepts and pitfalls, bolstering your understanding and facilitating informed decision-making.

Example AppSec AI workflow for automated threat modeling through code analysis

AI/LLM Limitations

  1. It may not always generate the most accurate or detailed information specific to the new service. This is because it works based on patterns and analogies.
  2. The developer would need to copy and paste code to give specific context for either your application or SuperCoolPay’s interfaces. The prompt size limitation and lack of concrete embeddings for your code further hamper the ability to get a big picture of how specific code is used across the entire application. For true introspection of a repo, you would need to look to other tools.
  3. External dependencies on other APIs/services, which a human would likely call out as part of a generated threat model, would very likely be missed, since the repo itself is solely used for context derivation.

Closing the Gaps

AI coding assistants like Copilot or CodeWhisperer can provide a complementary approach. These tools can summarize SuperCoolPay’s API interfaces directly from the codebase, breaking down complex structures into digestible snippets. They can crawl through the new service’s API documentation, providing you with a comprehensive, contextual summary of each endpoint, complete with method descriptions, input/output expectations, and any related authorization flow. Note: Summarization depends on the quality of documentation the developer publishes for new code.

Using these summaries, you would understand the API interactions better, allowing you to secure your application more efficiently. For example, if SuperCoolPay’s API has an endpoint for transaction data retrieval that lacks appropriate rate limiting, your AI-assisted understanding could lead to implementing protective measures on your end, such as encapsulating this API call in a custom service that enforces rate limiting.
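
To make that concrete, here is a minimal sketch of what such a protective wrapper could look like in Node.js. The SuperCoolPay URL, the rate limit, and the function names are all hypothetical; you would tune them to the real API’s documented limits.

// Minimal sketch: throttle outbound calls to a third-party endpoint so your
// application enforces its own rate limit even if the provider does not.
const MIN_INTERVAL_MS = 200; // hypothetical limit: at most ~5 calls per second

let nextAllowedAt = 0;

async function rateLimitedFetch(url, options) {
  const now = Date.now();
  const waitMs = Math.max(0, nextAllowedAt - now);
  nextAllowedAt = now + waitMs + MIN_INTERVAL_MS; // reserve the next call slot
  if (waitMs > 0) {
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  return fetch(url, options); // global fetch (Node 18+)
}

// Hypothetical usage for the transaction retrieval endpoint:
function getTransactions(accountId) {
  return rateLimitedFetch(`https://api.supercoolpay.example/transactions?account=${accountId}`);
}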

Example prompts to use:

  • Summarize the repository into use cases
  • Summarize the technologies used in the repo
  • What external and/or internal functions are imported and used
  • Trace the function call that <insert an action>. Add 10 line code snippets before and after the function call
  • Are there any user controlled inputs and if yes, then are they properly validated or sanitized
  • Are there any functions that call potentially insecure libraries or other functions
  • Which methods or functions are authenticated and which are not
  • If there are any JWTs used, then are they validated? How long do the JWT sessions last?
  • What HTTP methods are used in this repo and are there any missing security settings on cookies?
  • What API calls are made and group them by internal and external based on <insert internal DNS context>

These prompts just scratch the surface and could easily be cooked into a prompt template and applied to any repository. Furthermore, major organizations such as DARPA are moving the community forward with using LLMs to find and fix security vulnerabilities in open source.
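
To show what cooking these into a prompt template might look like in practice, here is a minimal Node.js sketch. The repository path, file names, and trimmed-down question set are hypothetical; the resulting prompt would go to whichever chat model your organization has approved.

// Minimal sketch: stitch a subset of the questions above into a reusable
// review prompt and fill it with source pulled from a repo.
const fs = require('fs');
const path = require('path');

const QUESTIONS = [
  'Summarize the repository into use cases.',
  'Summarize the technologies used in the repo.',
  'Are there any user controlled inputs, and if yes, are they properly validated or sanitized?',
  'Which methods or functions are authenticated and which are not?',
];

function buildReviewPrompt(repoDir, files) {
  const sources = files
    .map((f) => `FILE: ${f}\n${fs.readFileSync(path.join(repoDir, f), 'utf8')}`)
    .join('\n\n');
  return [
    'You are assisting an AppSec engineer reviewing the code below.',
    'Answer each question, referencing specific functions where possible.',
    '',
    sources,
    '',
    'QUESTIONS:',
    ...QUESTIONS.map((q, i) => `${i + 1}. ${q}`),
  ].join('\n');
}

// Hypothetical usage:
// const prompt = buildReviewPrompt('./supercoolpay-integration', ['src/payments.js', 'src/routes/user.js']);
// console.log(prompt); // paste into ChatGPT or send through your approved API client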

Putting a fine point on this concept

AI coding assistants have their strengths, including the ability to learn from millions of lines of code, thus offering best practices and warnings against common pitfalls. However, all these LLM-based solutions also have limitations. They don’t inherently understand context or the purpose of the code, leading to potential misinterpretations. This is primarily because the models are trained on large amounts of text, and GPT-3.5 is not fine-tuned for this task in its original state. Additionally, AI cannot replace human ingenuity, especially when it comes to discerning unique threat vectors. Logic bugs are one common class of issue that LLMs will struggle to identify.

Using a combination of conversational AI like ChatGPT and AI coding assistants like Copilot or CodeWhisperer can significantly streamline the process of getting familiar with and securing new services in your application’s environment. As these tools continue to evolve, they will undoubtedly become invaluable allies for AppSec Engineers and Developers alike, assisting in navigating the ever-changing landscape of application security.

Pro Tips:

  • As an awesome workflow, take the swagger.yml from commit A and the swagger.yml from commit B and ask ChatGPT to summarize the changes (a minimal sketch of this follows these tips). This has led to a massive productivity increase in tracking how backend APIs have evolved since we last looked at them. With OpenAPI JSON files, comprehension tends to be really strong.
  • Check out GenAI Security Analyst assisted user story analysis as a means to identify security issues while product owners build out product features.
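
Here is the promised sketch for the swagger diff workflow, using git show to pull the same spec from two commits and wrapping both versions in a summarization prompt. The repo path, spec path, and commit IDs are hypothetical.

// Minimal sketch: build a ChatGPT prompt that compares two versions of an
// OpenAPI/Swagger spec pulled straight from git history.
const { execSync } = require('child_process');

function buildSwaggerDiffPrompt(repoDir, specPath, commitA, commitB) {
  const show = (commit) =>
    execSync(`git show ${commit}:${specPath}`, { cwd: repoDir, encoding: 'utf8' });
  return [
    'Compare these two versions of an OpenAPI/Swagger spec.',
    'Summarize added, removed, and changed endpoints, plus any auth or schema changes that could affect security.',
    '',
    `--- VERSION FROM COMMIT ${commitA} ---`,
    show(commitA),
    `--- VERSION FROM COMMIT ${commitB} ---`,
    show(commitB),
  ].join('\n');
}

// Hypothetical usage:
// console.log(buildSwaggerDiffPrompt('./ecommerce-api', 'docs/swagger.yml', 'a1b2c3d', 'HEAD'));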

Automated Code Generation for Threat Model Mitigations using Context

Context matters

From our experience, threat models are not intuitive for developers to understand, and they often find it difficult to translate a proposed mitigation into code. This process often raises more questions than it provides guidance. It’s great that a security team can leverage a threat model to point to locations where user input should be sanitized before being consumed by a downstream flow; however, those same models do little to say how that sanitization should be implemented.

As AppSec Engineers who produce the threat model, we don’t have the time to delve into every function and give prescriptive solutions as a line by line recommendation. This is actually counter to the goals of the threat model in the first place. In order to mitigate a threat, a developer is going to need to add specific, security centered code that meets the needs of the product and follows existing patterns.

Analogizing here, AppSec in most cases is giving Developers a flashlight to uncover where all the scary code monsters are hiding, but then we leave and let them deal with removing the monsters themselves in the dark. This isn’t the fault of the AppSec team or Development team, but instead is a manifestation of AppSec playing the advisor role to code Developers, rather than the implementers of code.

This system’s inherent flaw is the AppSec Engineer’s lack of context to know exactly how the threats they identified are realized in code -and- the Developer’s lack of context on security principles in practice. This is what makes translating mitigations into code difficult.

Adding the context so Gen AI can produce a higher quality response

AI and LLMs can help with both facets, but only when they are provided the right context. Let’s take a look at an example of how this can come together. Imagine a world where an AppSec Engineer provides a prompt template whose rough pseudo-code (or pseudo-prompt) looks like:

----------START CHATGPT prompt
---------------------------
SECURITY Context:
---------------------------
DESCRIPTION: An SQL Injection is a code injection technique that attackers use to insert malicious SQL code into input from client to application. This vulnerability exists when the application includes unfiltered user input in an SQL query.
THREAT: This can lead to data breaches, unauthorized access, or potential damage to your database.
MITIGATION: Use parameterized queries or prepared statements to mitigate this threat. Always sanitize and validate user inputs before using them in your SQL statements.
---------------------------
DEVELOPER Context
---------------------------
1. API REQUEST HANDLER:
// **Provide your code for handling the request parameters or POST body here**
[YOUR REQUEST HANDLING CODE HERE]

// Example:
/*
const express = require('express');
const app = express();
app.use(express.json());

app.post('/user', (req, res) => {
  const { username, password } = req.body;
  // More code here...
});
*/

2. BUSINESS LOGIC FUNCTION:
// **Provide your code for how these variables are passed to a function to perform business logic here**
[YOUR BUSINESS LOGIC FUNCTION CODE HERE]

// Example:
/*
function createUser(username, password) {
  // More code here...
}
*/

3. SQL STATEMENT CREATION:
// **Provide your code for creating the SQL statement here**
[YOUR SQL STATEMENT CREATION CODE HERE]

// Example:
/*
let sql = `INSERT INTO users (username, password) VALUES ('${username}', '${password}')`;
*/
-----------------------
Remediation
-----------------------
Please now combine both the security and developer context to recommend changes to the developers code to best mitigate the security threat.
----------END CHATGPT prompt

Example response from ChatGPT showing hybrid approach giving in context solutions to our threat model finding:

Once the Developer has filled out the prompts, you would combine this information with the AppSec Engineer’s remediation guidance and use it as input for ChatGPT. The result would be a detailed and now context-aware recommendation for mitigating the SQL Injection threat in the specific codebase.
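
As a rough illustration of the kind of context-aware output to expect, the recommendation for the example code in the template above should land on a parameterized query, along the lines of the sketch below. The mysql2 client and connection details are our assumptions for illustration and not part of the original template.

// Illustrative sketch of the suggested mitigation: replace string-built SQL
// with a parameterized query so user input never becomes part of the SQL text.
const mysql = require('mysql2/promise');
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'shop' }); // hypothetical config

async function createUser(username, password) {
  // Placeholders (?) are bound as parameters, mitigating SQL Injection.
  const sql = 'INSERT INTO users (username, password) VALUES (?, ?)';
  const [result] = await pool.execute(sql, [username, password]);
  return result.insertId;
}

// Hypothetical wiring into the request handler from the template:
// app.post('/user', async (req, res) => {
//   const { username, password } = req.body; // still validate inputs before use
//   await createUser(username, password);
//   res.sendStatus(201);
// });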

An AppSec Engineer could also fill in the template on behalf of the Developer. They could then clone the repo down, create a new branch, implement those changes and open an MR. By doing this the AppSec Engineer and Developer can review both the threat model and a proposed code solution. This shifts the conversation from a report meeting to a solutions meeting, which should earn the AppSec team brownie points as development enablers rather than blockers.

Wrapping this up

This approach promises improved context-awareness for both the Developer and the AppSec Engineer. However, it shouldn’t be relied on as the sole security measure. AI tools, while improving, may still overlook intricate nuances of a specific codebase or may not be fully up-to-date with emerging security threats. While tempting, never blindly accept AI output; always validate the response before use.

We have highlighted this already in this post and in our last AI focused blog post, and others (1)(2) out there are with us, but it bears saying again: For the time being, GenAI tools are merely an aid in helping humans get their hard work done more efficiently. We are not recommending that you and your colleagues go for a hike while AI does your job. Right now we see AI helping security engineers go from a hand-driven screwdriver to a power drill, but we are not yet at a fully robotic, arm-driven power drill on an assembly line.

Pro Tips:

  • Always use additional security practices like regular code reviews and automated vulnerability scans, and maintain a security-minded development culture.
  • Many people find prompting a little weird/scary since it’s so open-ended and get discouraged if a given prompt doesn’t work immediately. https://learnprompting.org/ shows a lot of techniques for pretty strongly changing the behavior of the model for a given use case. Some people have even automated the prompt generation step to compare the quality of different prompts to see varying performance.

Going Beyond Just Code

While the bulk of this post has been about Threat Modeling and code generation to mitigate identified flaws, LLM tools may also be able to short-circuit the knowledge gathering in the design phase to flag important issues or give faster feedback and advice.

AppSec has often had to parse through tickets or read through design docs when responding to issues, or comb through backlogs for items of high enough criticality/importance to warrant a review.

This is a place GenAI could also help by parsing through this content and:

  1. giving smarter, more prompt feedback/advice to developers seeking it,
  2. proactively commenting on designs that might be introducing flaws or might need some additional security design review, or
  3. pointing developers to best practices when anti-patterns or deviations from paved roads are found.

In addition, security engineers spend quite a bit of time on administrative work like updating tickets and status docs, and eventually AI tools could remove some of that manual effort. The same goes for vulnerability triage from security tools, but more on all of this in the future.

Takeaways

  1. The combination of conversational AI like ChatGPT and AI coding assistants like Copilot or CodeWhisperer can significantly streamline the process of getting familiar with and securing new services in your application’s environment.
  2. Artificial intelligence (AI) tools can help scale AppSec Engineers and make them more effective, BUT for now they should be treated as tools and not the end-all-be-all for teams. Always use additional security practices like regular secure code reviews and automated vulnerability scans, and maintain a security-minded development culture.
  3. Using purpose-built prompts for Developers and AppSec Engineers to fill in can result in detailed, context-aware threat models and remediation recommendations. The challenge is having the know-how and context to build them.
  4. AI tools will continue to evolve and they will undoubtedly become invaluable allies for AppSec Engineers and Developers alike, assisting in navigating the ever-changing landscape of AppSec. LLMfuzz is a great example of this. We will continue to research and provide insights as we learn how to best use AI technology in this very specific space.
  5. There are many other areas we have yet to explore where AI tools can help security, product, and developers alike. Knowledge parsing, summarizing, and communication in early stages of the SDLC is one area we need to further explore, as well as vulnerability triage in the later stages.

Words of Wisdom

Gen AI will ultimately enable humans to get hard work done faster, but it will take a lot of hard work to do that with absolute precision. This is something we need to embrace as a new reality and simply come together to achieve.

“Our destiny is not written for us, it’s written by us” — Barack Obama

Contributions and Thanks

A special thanks to those who helped peer review and make this post as useful as it is: Shane Caldwell, Aditya Sharma, Jeremy Shulman, Abhishek Patole, Luke Matarazzo, Brandon Wu, Benjamin Heiskell, and Vishal Jindal.

A special thanks to you, the reader. We hope you benefited from reading this in some way and we want everyone to be successful at this. While these posts aren’t a silver bullet, we hope they get you started.

Please do follow our page if you enjoy this and our other posts. More to come!
