How to Build a Cloud Native Data Security Program

James Chiappetta
better appsec
18 min read · Mar 21, 2024

A guide on how to achieve cloud & application security data protection and maturity in a cloud native way.

By: James Chiappetta

Important Disclaimers:

  • The opinions stated here are the authors’ own, not necessarily those of past, current, or future employers.
  • The mentioning of books, tools, websites, and/or products/services are not formal endorsements and the authors have not taken any incentives to mention them.
  • Recommendations are based on past experiences and not a reflection of future ones.
  • Artificial Intelligence (AI) is a rapidly developing space and therefore any recommendations, opinions, or examples are based on present knowledge and subject to change.

Background

The cloud computing market is seeing a projected compound annual growth rate (CAGR) of 18.3%. The value of the cloud systems we create is tied directly to the data within them. Therefore, it shouldn’t be surprising that within cloud service providers (AWS, GCP, Azure, etc.) and third party software as a service (SaaS) offerings, the most highly subscribed services are Data Warehouses and Relational Databases.

Protecting these vital data stores is mission critical and increasing in difficulty. The rapid proliferation of large language models (LLMs) and artificial intelligence (AI) will further change the threat landscape. Getting ahead starts with the security fundamentals, and that is exactly what we will cover here.

First Things First

Without data, many companies would be irrelevant. Data is their ticket to success, and protecting it should be a first order problem. There is no single best way to keep data safe, but establishing a baseline of what to do and consider is a vital first step.

Cloud Data Security Program Maturity

So, here is what we will be covering today:

  • Mapping out the cloud environment with a Data Classification, Cloud Architecture, Cloud Threat Model, and Data Store Inventory.
  • Performing a data storage configuration and data access pattern analysis.
  • Understanding what data access pattern analysis means in practice. Hint: principle of least privilege.
  • Establishing an anomaly detection, alerting, and response pipeline and mechanism.
  • Exploring how AI and automation lead to continuous improvement.
  • Navigating data risks with the growing use of Generative AI (GenAI) in SaaS/Third Party products.
  • Choosing metrics and measures of success for your cloud data protection program.

Let’s get to it!

Data Inventory & Data Classification

If you live in a well run and organized town or city, it is likely that there is a map of it. That map is probably broken down into plots of land and classified by some sort of zoning standard. Based on that zoning standard, dwellings or buildings are constructed, and those buildings have blueprints and floor plans.

Why?

Many reasons of course. One of the primary reasons is for critical infrastructure planning. Without this level of organization, it would be difficult for any locality to provide basic needs such as education, emergency services, waste management, etc. to its inhabitants.

Much like a city, your organization’s cloud environment needs a map, a zoning framework, and an inventory of plots of land with blueprints. Without these, how would you know where the most sensitive structures are and what controls you need to protect them?

Depending on where your organization is with its cloud adoption, your approach to building a central inventory will vary. But, for the sake of simplicity, let’s assume for the rest of this post that you have existing cloud infrastructure built and you need to build from within the confines of that.

Establish a Data Classification

Your legal and compliance team will very likely have a way to categorize data and assets for regulatory and compliance reasons. If not, then this is an opportunity to partner and build one.

Build a Cloud Architecture Diagram

This is where you build out a high level map of your organization’s cloud infrastructure. The map should also reference networking, availability zones, central/shared applications or services (authentication, CI/CD, logging, etc.), and access patterns. The last and most important component is outlining where the data stores live. You can use a free tool like draw.io to build this.

Pro Tip: Creating data flow diagrams for sensitive or critical data will also give you a more service/application view of the world, in addition to the infrastructure view.

Create a Threat Model

With the Cloud Architecture Diagram built, it’s time to run some what-if scenarios. This means you and your teams ask questions like: how could someone gain unauthorized access to x, y, or z? Then ask: how can we prevent that bad action? The answers are your threats and the security controls that will be needed to prevent data breaches. See our example in How to kick start your AppSec program with ease and the diagram below:

Threat Model Example

Establish a Data Store Inventory

One of the best things about using a Cloud Service Provider is that they are all built with automation in mind. This means they have APIs you can use to create, read, update, and delete resources, so you can query for an inventory of data stores. In AWS, some common ones would be: S3, RDS, EBS, EFS, Secrets Manager, Parameter Store, Redshift, DynamoDB, and Aurora. Check out this comparison guide for Azure and GCP offerings.

There are many ways to build an inventory. A simple Python script that queries each account with boto3 can serve as a start (see the sketch below), and open source projects like CloudFox, ScoutSuite, or Cartography have existed for years and work well too. Services such as AWS Config and Amazon Macie can also help, but come at a cost, as do modern CSPM solutions that now offer DSPM. This inventory work may uncover things like AWS access from third parties. While that might be fine for things like infrastructure or billing tools, it is a big risk when it comes from unsanctioned services you are not aware of.
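
To make this concrete, here is a minimal sketch of the boto3 approach. It assumes AWS credentials and a default region are already configured, and it only covers S3, RDS, and DynamoDB; extend it to the other services (EFS, Redshift, Secrets Manager, etc.) as needed.

```python
# Minimal sketch of a data store inventory script using boto3.
# Assumes AWS credentials and a default region are already configured;
# extend the list of services (EFS, Redshift, Secrets Manager, etc.) as needed.
import boto3

def inventory_data_stores():
    inventory = {"s3_buckets": [], "rds_instances": [], "dynamodb_tables": []}

    # S3 buckets are account-wide, so a single call covers them.
    s3 = boto3.client("s3")
    inventory["s3_buckets"] = [b["Name"] for b in s3.list_buckets()["Buckets"]]

    # RDS instances and DynamoDB tables are regional; paginate to catch everything.
    rds = boto3.client("rds")
    for page in rds.get_paginator("describe_db_instances").paginate():
        inventory["rds_instances"] += [db["DBInstanceIdentifier"] for db in page["DBInstances"]]

    dynamodb = boto3.client("dynamodb")
    for page in dynamodb.get_paginator("list_tables").paginate():
        inventory["dynamodb_tables"] += page["TableNames"]

    return inventory

if __name__ == "__main__":
    print(inventory_data_stores())
```

Run something like this per account (or per role you can assume) and write the results to a central location so the inventory stays queryable over time.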

You may be asking yourself: what about applications and APIs? Those are important to inventory as well, but it may be harder to get the full denominator. You can scrape DNS (e.g. Route53 or Active Directory), but that won’t give you the full picture of what is calling what. Unfortunately, that leaves things like web server logs, VPC Flow Logs, or network firewall logs, which can be tedious to scrape. You may want to consider a basic DNS validator function that checks whether internal DNS records return valid HTTP responses (see the sketch below). Then there are ALBs that may front things like Lambda functions; you will need to do similar things there. This is hard, and we understand that, so our recommendation is to start simple, track things manually at first, and grow from there.
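
Here is a rough sketch of that DNS validator idea. It assumes you have already exported a list of internal hostnames (for example from Route53); the hostnames shown are illustrative placeholders only.

```python
# Basic DNS/HTTP validator sketch: checks whether internal DNS records answer HTTP(S).
# Assumes you have already exported a list of internal hostnames (e.g. from Route53);
# the hostnames below are illustrative placeholders.
import requests

def validate_records(hostnames, timeout=5):
    results = {}
    for host in hostnames:
        for scheme in ("https", "http"):
            try:
                resp = requests.get(f"{scheme}://{host}", timeout=timeout)
                results[host] = f"{scheme} {resp.status_code}"
                break  # stop at the first scheme that answers
            except requests.RequestException:
                results[host] = "no response"
    return results

if __name__ == "__main__":
    print(validate_records(["internal-app.example.corp", "reports.example.corp"]))
```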

Pro Tip: The four artifacts listed above are free to produce. If you have a budget, there are services you can pay for, such as AWS Cloud Map, which can help you get there more quickly.

Configuration & Access Pattern Analysis

Once you have done the hard work of building out your inventory and all the supplements like a cloud architecture diagram, then it’s time to analyze the state of things. The AWS guide on data protection is a good place to start for understanding how you stack up.

Configuration Analysis

There are many types of data storage and each comes with its own set of gotchas. The most classic cloud example is a public S3 bucket. But are S3 bucket encryption and versioning enabled? Depending on your organization’s disaster recovery requirements, you may need to back up not just S3 buckets but also file stores (EFS) and databases (RDS or DynamoDB). You may also need to take into account the network and IAM policies.

The obvious solution for this is to use a framework like the CIS Benchmark. We would recommend using that. However, for the sake of conversation, here are some basic checks to consider for AWS specifically.

Encryption at Rest & In Transit: Server-side encryption (SSE) is now enabled by default for S3, but you may want to take additional measures by using a Customer Managed Key (CMK) in AWS Key Management Service (KMS). You will need to determine what makes the most sense from a cost-benefit perspective for KMS. Additionally, check that all connections to any data store are made over a secured (SSL/TLS) connection.

Access Controls: Check that the AWS Identity and Access Management (IAM) policies controlling access to storage resources aren’t egregiously set (e.g. wildcard principals or actions). The next section covers least privilege in more detail, but it is crucial to data security.

Network Security: Check Virtual Private Cloud (VPC) configurations, security groups (SGs), and network ACLs. Using VPC endpoints or VPC peering for private connectivity is usually preferred. If you discover any RDS instance, S3 bucket, or other data storage open to the internet, quickly escalate to understand why and remove the access if it’s unjustified. You will also want to understand whether that data has actually been accessed.

Logging and Monitoring: Check to see if Amazon CloudWatch, AWS CloudTrail, or enhanced monitoring features are enabled. This is a prerequisite for the next section.

Patching and Updates: Check your RDS databases to make sure software is up to date with the latest patches and security updates. If not, you are likely going to need to patch and come up with a regular patch cadence.

Backups: Check to see if automated backups and snapshots are enabled. This will help facilitate quick recovery in case of security or operational incidents. For S3, check to see if versioning on S3 buckets is on. This will allow you to retain multiple versions of objects and protect against accidental deletion or modification of data. You may also want a separate and dedicated AWS account for data backups.

Again, CIS provides a more comprehensive guide for what to check for but these are some of the very basic configuration settings to consider.
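
For illustration, here is a minimal boto3 sketch of a few of the S3 checks above. It assumes read-only permissions on every bucket in the account and is in no way a substitute for a full CIS Benchmark scan or a CSPM tool.

```python
# Minimal sketch of a few of the S3 configuration checks described above.
# Assumes read-only permissions (e.g. s3:GetEncryptionConfiguration,
# s3:GetBucketVersioning, s3:GetBucketPublicAccessBlock) on every bucket.
import boto3
from botocore.exceptions import ClientError

def check_bucket(s3, name):
    findings = []
    try:
        s3.get_bucket_encryption(Bucket=name)
    except ClientError:
        findings.append("no default encryption configuration")

    versioning = s3.get_bucket_versioning(Bucket=name)
    if versioning.get("Status") != "Enabled":
        findings.append("versioning disabled")

    try:
        pab = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        if not all(pab.values()):
            findings.append("public access block not fully enabled")
    except ClientError:
        findings.append("no public access block configuration")
    return findings

if __name__ == "__main__":
    s3 = boto3.client("s3")
    for bucket in s3.list_buckets()["Buckets"]:
        issues = check_bucket(s3, bucket["Name"])
        if issues:
            print(bucket["Name"], "->", ", ".join(issues))
```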

Access Pattern Analysis

The next most important step is understanding access patterns. There may be paved roads in your city but that doesn't mean people are driving down them. There may also be some distinct hot spots where there is a lot of traffic. You should analyze traffic patterns to reduce congestion and keep traffic flowing well.

The same holds true for access analysis. This exercise is important because it will help you find access to remove and show you where to focus your protection efforts. Understanding normal activity is also key to detecting abnormal activity.

How could you do this? Let’s examine two examples for AWS environments: S3 and RDS. This is where your prior configuration analysis pays off, since you will need S3 bucket access logging and CloudTrail enabled (and similarly for RDS). This will allow you to look back at access patterns. If you have multiple AWS accounts and aren’t a glutton for punishment, you are going to want to centralize this data. Sticking with AWS, Athena can help you query it with SQL (see the sketch below). AWS IAM Access Analyzer is another way, but there are quotas to be cognizant of.
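
As a hedged example, here is what a basic access pattern query could look like with Athena via boto3. It assumes you have already created an Athena table over your centralized S3 server access logs following the AWS documentation; the database, table, and results bucket names are placeholders you would replace with your own.

```python
# Sketch of an access pattern query over S3 server access logs via Athena.
# Assumes an Athena table built from S3 server access logs per the AWS docs;
# the database, table, and query results bucket names are placeholders.
import boto3

QUERY = """
SELECT requester, operation, COUNT(*) AS request_count
FROM s3_access_logs_db.mybucket_logs
WHERE parse_datetime(requestdatetime, 'dd/MMM/yyyy:HH:mm:ss Z')
      > current_timestamp - interval '30' day
GROUP BY requester, operation
ORDER BY request_count DESC
LIMIT 50
"""

athena = boto3.client("athena")
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "s3_access_logs_db"},       # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},  # placeholder bucket
)
print("Query started:", execution["QueryExecutionId"])
```

A query like this surfaces the noisiest principals and operations, which is a useful starting point for spotting both unnecessary access and the hot spots worth protecting first.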

Note: Performing these steps is rather important for the subsequent sections.

Pro Tips:

  • For GCP there is Cloud Audit Logs, which provides visibility into actions taken on resources within Google Cloud, including API calls, administrative actions, and data access. You can then use BigQuery to run SQL queries against those logs and perform similar analysis.
  • For Azure there is Azure Monitor, which provides logging and monitoring capabilities for Azure resources, capturing activity logs, diagnostic logs, and security-related events across the Azure environment. The Azure Data Lake Analytics service allows you to run SQL queries and analytics.

Principle of Least Privilege

It comes down to some key questions (non-exhaustive list):

  • What data stores and access patterns are necessary to the core operations of a business system?
  • Do the identities and data resources have the least amount of privilege needed to perform those operations?
  • Do humans need direct access to production data stores? If they do then can they use a “break glass” mechanism?
  • Should individual data stores be encrypted with their own key and if yes then where does that make sense to do? This is a cost versus security problem.
  • How would or could you identify sensitive data types and stores accurately?
  • Do all non human identities have the least amount of privileges and how can you remove unused permissions?

Unfortunately, the “how” behind answering these questions can be quite complicated. We recommend doing some basic things to account for this:

  1. Perform secure design reviews and threat models for critical business systems. This will help iron out threats and security requirements. More on this in our post: How to Kickstart your AppSec Program.
  2. Perform regular identity and resource access recertifications. A cost effective way to do this is generating reports from the data collected during your configuration and access analysis. AWS IAM, AWS Config, IAM Access Analyzer, and AWS CloudTrail will give you information about IAM users, groups, roles, and their associated permissions and access activity (see the sketch after this list). Compiling this and sending it to owners monthly or quarterly in an Excel file is basic, but it’s a start.
  3. Build a Remediation Plan & Backlog: There is a high likelihood that you will need to action misconfigurations and excessive privileges. This is your “tech debt” and backlog of work. You will also need to look for means to prevent these issues centrally. See our post on the fundamentals of vulnerability elimination, as this will help you here. Taking things off the table like wildcards or unneeded internet/external access policies is always a good place to start.
  4. Build strong threat detections, alerts, and response playbooks. More on this next.
  5. Implement proactive & preventive measures: Build automation into the CI/CD pipeline to check for excessive permissions before deployment.
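
As referenced in item 2, here is a minimal sketch of pulling IAM “service last accessed” data for a single role with boto3. This kind of data can feed a recertification report or highlight unused permissions to remove. The role ARN is a placeholder; in practice you would iterate over every role and compile the results into the report you send to owners.

```python
# Sketch of a basic access recertification input: which services has a role
# actually used recently? Uses IAM "service last accessed" data via boto3.
# The role ARN is a placeholder; loop over all roles in a real report.
import time
import boto3

iam = boto3.client("iam")
ROLE_ARN = "arn:aws:iam::123456789012:role/example-app-role"  # placeholder

job_id = iam.generate_service_last_accessed_details(Arn=ROLE_ARN)["JobId"]

# Poll until the report is ready.
while True:
    details = iam.get_service_last_accessed_details(JobId=job_id)
    if details["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(2)

for svc in details.get("ServicesLastAccessed", []):
    last_used = svc.get("LastAuthenticated", "never")
    print(f'{svc["ServiceName"]}: last accessed {last_used}')
```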

Pro Tip: Sensitive Data such as application secrets or PII can often end up in places that may have wide access, such as source code, container images, and log aggregation platforms. Don’t forget to take a wider look at the environment and factor this into your program.

Anomaly Detection, Alerting, and Response

Detection

Based on your access analysis, you may want to set up alerts when things deviate from what you expect. Before you create an alert, what are some things you might want to monitor and alert for?

Unauthorized Access Attempts:

  • Use of invalid credentials or unauthorized IAM roles. This would materialize in failed attempts to access S3 buckets or RDS instances.
  • Access attempts from suspicious IP addresses or geographical locations that are atypical.
  • Access from new or rarely seen user agents, applications, IPs/networks, or devices that may indicate unauthorized or unexpected access.

Data Exfiltration Attempts:

  • Large amounts of data being transferred out of S3 buckets or RDS databases, especially if it happens outside of normal business hours or from unusual sources (see previous section).
  • Unusual access patterns from human identities. Look for data access or downloads from S3 buckets or RDS instances that hold sensitive data (e.g. employee compensation or non-public data) based on your data classification. Also check whether the users or roles are acting consistently with their usual behavior or job responsibilities. If not, this should trigger an alert, especially (again) if it is happening outside business hours or from untrusted locations.

Data Tampering Attempts:

  • Suspicious changes to S3 object permissions, metadata, or content that indicate unauthorized modifications or tampering with data stored in S3 buckets.
  • Unauthorized or odd modifications to database schemas, tables, or records stored in RDS instances that could compromise data integrity.
  • Uncommon or infrequently used API calls and/or commands being executed against S3 buckets or RDS instances that are out of the ordinary, given what the typical workflow looks like.

Privilege Escalations Attempts:

  • Changes to IAM policies or roles that grant any elevated permissions or privileges beyond what is necessary for normal operations.
  • Changes to security group rules or network configurations (e.g. VPC NACLs) that could potentially expose S3 buckets or RDS instances to unauthorized access.

Malware Activity or Exploitation Attempts:

  • Detection of malware signatures or exploit attempts targeting S3 buckets or RDS instances, such as malicious file uploads or SQL injection attacks. Amazon GuardDuty can do this relatively well.

Unauthorized Changes (Drift):

  • Changes to S3 bucket policies, access control lists (ACLs) or object permissions that are inconsistent with established security policies, or access controls that are set as source code (e.g. Terraform or CloudFormation).
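
Many of these detections can be bought rather than built. As one example of leaning on a managed service, here is a small sketch that pulls recent high severity GuardDuty findings with boto3 so they can be fed into the alerting pipeline described next. It assumes GuardDuty is already enabled in the account and region.

```python
# Sketch: pull high-severity GuardDuty findings (e.g. exfiltration or credential
# misuse findings) so they can be routed into your alerting pipeline.
# Assumes GuardDuty is already enabled in the account/region.
import boto3

guardduty = boto3.client("guardduty")

for detector_id in guardduty.list_detectors()["DetectorIds"]:
    # Only the first page of findings is handled here; paginate for a real pipeline.
    finding_ids = guardduty.list_findings(
        DetectorId=detector_id,
        FindingCriteria={"Criterion": {"severity": {"Gte": 7}}},  # high severity and above
    )["FindingIds"]
    if not finding_ids:
        continue
    findings = guardduty.get_findings(DetectorId=detector_id, FindingIds=finding_ids)
    for f in findings["Findings"]:
        print(f["Severity"], f["Type"], f["Title"])
```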

Alerting

Great, now onto figuring out what the alerting mechanism will be. In AWS, common options include CloudWatch alarms on metric filters, EventBridge rules that route CloudTrail events to SNS or your SIEM, and findings surfaced through GuardDuty or Security Hub.
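
For example, here is a hedged sketch of the EventBridge option: a rule that matches CloudTrail events for S3 bucket policy or ACL changes and publishes them to an SNS topic. The rule name and topic ARN are placeholders, CloudTrail management events must be enabled, and the SNS topic needs a resource policy that allows EventBridge to publish to it.

```python
# Sketch of one alerting option: an EventBridge rule that matches CloudTrail
# events for S3 bucket policy/ACL changes and sends them to an SNS topic.
# The rule name and SNS topic ARN are placeholders.
import json
import boto3

events = boto3.client("events")
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"  # placeholder

event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutBucketPolicy", "PutBucketAcl", "DeleteBucketPolicy"],
    },
}

events.put_rule(Name="s3-policy-change-alert", EventPattern=json.dumps(event_pattern))
events.put_targets(
    Rule="s3-policy-change-alert",
    Targets=[{"Id": "sns-security-alerts", "Arn": SNS_TOPIC_ARN}],
)
```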

Response

Regardless of how you set up the alerts, you are going to want to do two additional things:

  1. Make sure your monitoring and alerting works on a continuous basis. Think of this similarly to test driven development (TDD).
  2. Make sure that there are humans on the other end of the alerts that know how to respond accordingly. This is where you are going to want to partner with your SOC or Insider Risk team.

AI & Automation: Continuous Improvement

Every time we write one of these posts, we know there is a wide spectrum of needs. Some people/teams/organizations are small and can easily act quickly on the guidance we provide. Those in larger teams and organizations may have massive challenges with the guidance due to scale. Then there are those in the middle that can cherry pick the low hanging fruit or validate approaches and projects they already have in flight.

This is where we strongly believe AI will help in all scenarios, not just addressing scale for a single company but scale across the industry. While there are open source LLMs, this is the one area that comes at a higher cost than others; we would argue that cost is easily offset by the value. So, for this topic in particular, let’s look at three distinct areas of impact.

Data classification of structured and unstructured data

Data classification is not something that can easily be done by humans at scale. This however is exactly where AI/ML can help and has been for quite some time. ML is used today for commonplace things like: spam email identification and email filtering, document scanning, sensitive data redaction in web traffic, fraud detections, social post moderation, data leakage prevention tools, and network filtering.
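
As one concrete (and hedged) example of the managed ML services in this space, here is a small sketch using Amazon Comprehend’s PII entity detection on a text snippet; Amazon Macie plays a similar role for data at rest in S3. The sample text is illustrative, and for large corpora you would use the asynchronous batch APIs instead.

```python
# Sketch: ML-assisted classification of unstructured text using Amazon Comprehend's
# PII entity detection (Amazon Macie plays a similar role for data at rest in S3).
# Works on small text snippets; use the async batch APIs for large corpora.
import boto3

comprehend = boto3.client("comprehend")

text = "Contact Jane Doe at jane.doe@example.com, SSN 123-45-6789."  # illustrative sample
response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

for entity in response["Entities"]:
    snippet = text[entity["BeginOffset"]:entity["EndOffset"]]
    print(f'{entity["Type"]} (score {entity["Score"]:.2f}): {snippet}')
```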

Triaging of alerts and detections

With major advancements including adoption of LLMs, we can use prompt engineering to power the triage of alerts from whatever solutions are in place. ML has been helping in this space but LLMs can help engineers in ways ML never could. For global teams, an LLM can help provide alert context and potential resolution steps in their native language. It also breaks down knowledge barriers. Have you ever used ChatGPT to explain something as if you were: a 5th grader, a CISO, a developer, etc.? It works pretty darn well with some thought (1)(2)(3).

LLMs can analyze data with greater levels of comprehension and accuracy than traditional rule-based or keyword-based approaches. These accuracy and efficiency gains reduce the effort for manual review by engineers.
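
To make this tangible, here is a minimal sketch of LLM-assisted triage using the OpenAI Python SDK. The model name, prompt, and alert payload are all illustrative; the same pattern applies to any hosted or open source model, and you should never send sensitive alert data to a provider that hasn’t been approved for it.

```python
# Minimal sketch of LLM-assisted alert triage using the OpenAI Python SDK.
# The model name, prompt, and alert payload are illustrative only; swap in
# whatever approved model/provider your organization uses.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

alert = {
    "source": "CloudTrail",
    "event": "PutBucketPolicy",
    "principal": "arn:aws:iam::123456789012:user/contractor-temp",  # placeholder
    "details": "Bucket policy on finance-reports changed to allow public read.",
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": (
            "You are a cloud security triage assistant. Summarize the alert, "
            "rate its severity (low/medium/high), and suggest next steps.")},
        {"role": "user", "content": str(alert)},
    ],
)
print(response.choices[0].message.content)
```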

SaaS & Third Party Data Risk Management

Businesses big and small typically have vendors that have access to important data, such as source code, internal tickets or docs, or customer data. In an age of rapid change and product development, it’s become commonplace to see these SaaS vendors change their data handling and privacy policies. Unless your organization has negotiated specific contract terms, your relationship with that vendor might be exposed to a shifting legal framework.

Exacerbating the problem, it’s been quoted that 77% of vendors implemented generative AI features in 2023. Normally, we are accustomed to and comfortable with vendor contract language associated with the use of user data by the vendor to improve the platform. However, with LLM-related data leakage concerns as cited by many (e.g. MITRE), it is clear that elevated scrutiny must now be put on vendor practices regarding AI model training.

New systems (e.g. Amazon Q or Microsoft Copilot) might affect your legal, privacy, cyber risk, and compliance posture. To manage this risk, you need to inventory AI feature usage across your vendors.

Start by identifying your high risk vendors: tag the ones that have access to your most sensitive data. While enterprise data protection is the focus of this article, it is crucial to call out the importance of other AI-specific data concerns. You will therefore need clarity on each vendor’s approach to responsibly governing and deploying AI. Partner with your third party risk team and ask vendors questions like these:

  • Do you offer features that leverage large language models?
  • Is the user notified that they are interacting with an AI feature?
  • Are there any controls over the use of the feature?
  • Which model and version is used?
  • What data is it trained on?
  • What guardrails have you put in place to protect from LLM training bias?
  • If it’s trained on customer data, how does a customer opt out?
  • How is consent obtained and managed?
  • Are any of these disclosures represented in legally binding documents like the Terms of Service or Privacy Policy?

These questions will help you triage high risk relationships for further due diligence and remediation. Conducting this exercise across a large SaaS vendor footprint can be hard, but it’s well worth the effort to mitigate rapidly evolving security and compliance concerns with third party AI systems.

Pro Tip: If you are looking for a primer on AI for cloud data security then we recommend checking out Marco Lancini’s CloudSecDocs.

Measures of Success

What’s the metric we should be targeting when it comes to measuring effective Cloud Security? Should we be indexing toward a percentage of total cloud costs? If yes then how do third parties factor in? How would you even begin to know what’s the right level of initial investment and how do you insulate yourself from unscalable controls and costs as your organization grows? There are so many questions and very little out there on what’s (most) right. We believe it can be simplified to: how much can you do with minimal cost to protect organizational data?

The fundamentals of security can provide a lot for a little but here are a few things to consider:

  • Total number of data incidents by threat actor (e.g. malicious insider) and cost per incident
  • Total number of data assets by service/storage type
  • Total number of compliance violations (e.g. missing encryption, logging, or backup) opened/closed. Even better: prevented by a code pipeline policy
  • Percentage of CIS Benchmark adherence or total number of violations
  • Number of data storage locations accurately identified by data type (e.g. S3 buckets or RDS instances with PII)
  • Percentage of cloud resources (e.g. S3 buckets or RDS instances) with properly configured controls and/or permissions
  • Number of third-party vendors or service providers with data connectivity assessed for data protection and security controls

Pro Tip: We unfortunately didn’t cover auto remediation in this article and will visit the topic in a separate one. It is something to consider when solving remediation, data ownership, auto removal of permissions, data deletion, etc, at scale. This is a crucial metric area because it demonstrates active attack surface management in a cost effective way.

Takeaways

  • There are incredibly cost effective ways to get started and build out a Cloud Security Data Program. Getting started is as easy as being able to draw a picture, because even with a rudimentary Cloud Architecture on paper you can begin building a threat model, a data inventory, and data classification.
  • In order to truly know what is going on in your environment, you need to look around. This means understanding how your object stores, databases, file shares, etc. are configured and accessed. This analysis is key to identifying the priority work items that address your most obvious risks.
  • Point in time assessments have a short half life in the value department. Effective Data Governance means finding ways to ensure the principle of least privilege is upheld continuously for humans, systems, and third parties. Architecture changes that remove wide swaths of access and risk are also highly impactful, but they need organizational buy-in.
  • Knowing what is typical helps you detect the bad. When you detect something bad, then having a strong security response is vital. There are many prerequisites for this but a Security Operations team can help pave the road for successful detections, alerts, and response.
  • In today’s world, you need AI and automation to scale just about anything. Data protection is no different but you have to keep a keen eye on your important SaaS vendors and what their GenAI services do with your data.
  • Protecting data in a cloud native environment is more or less the same challenge as protecting it elsewhere, but only after you overcome the knowledge barrier of operating with your Cloud Service Provider’s offerings. That said, you can start with very little cost and bring assurance to your organization that its data is safe. Critically, measuring and tracking performance will help you get what you need to advance your mission.

Words of Wisdom

You got to the end of this post, congrats! We are just at the beginning of this journey to share our knowledge and experience. Securing anything, let alone data, is a significant challenge for any organization and we encourage you to help us by not letting what you read here sit on the shelf.

“If everyone is thinking alike, then no one is thinking.” — Benjamin Franklin

Contributions and Thanks

A special thanks to those who helped peer review and make this post as useful as it is: Steve Francolla, Henry Stanley, Brandon Wu, Rahul Sharma, Kyle Suero, Tim Lam, Keshav Malik, Sebastian Rojas, Luke Matarazzo, and Eric Ormes.

A special thanks to you, the reader. We hope you benefited from reading this in some way and we want everyone to be successful at this. While these posts aren’t a silver bullet, we hope they get you started.

Please do follow our page if you enjoy this and our other posts. More to come!
