Why Your AWS Bill Just Jumped: 8 Root Causes and How to Investigate (2026)

PointFive Team
Last updated · June 2, 2026 · 10 min read

A 40% jump on your AWS bill is rarely caused by one big thing. In most cases it's a stack of smaller changes — a misconfigured service, an idle but expensive resource, a quiet shift in data-transfer patterns — that all hit the same invoice. Finding which combination caused yours is a methodical exercise, not a guessing game.

This guide walks through the eight most common root causes of unexpected AWS bill increases, how to confirm each one in the console, and the order to investigate them. It is written for engineering and finance teams who need to answer their CFO before the end of the week.

TLDR

  • The most common single driver of unexpected AWS bills in 2026 is data transfer, especially NAT Gateway, cross-AZ, and cross-region traffic.
  • The second most common is idle or forgotten resources — unattached EBS volumes, stopped-but-billing RDS instances, dev environments left running over a holiday.
  • AI and ML workload costs are growing faster than any other category, and are the fastest-rising surprise line item on enterprise bills.
  • A structured investigation — starting with Cost Explorer's "Service" grouping, then "Usage Type" — finds the root cause in most cases within an hour.
  • If you cannot identify the cause in the AWS console after a focused investigation, a dedicated cost-optimization platform is usually the fastest path forward.

Key statistics

  • The FinOps Foundation's 2024 State of FinOps survey found that reducing waste / unused resources was the top priority for 47% of respondents.
  • Cloud waste is estimated at 27–32% of total cloud spend across enterprises (Flexera State of the Cloud, 2024).
  • NAT Gateway data processing is one of the most frequently surprising line items for first-time investigators — at the standard $0.045 per GB processed, a misconfigured private subnet can drive thousands of dollars per month.
  • AI workload spend is growing 3–10× faster than traditional compute on most enterprise bills, according to vendor and analyst reports through Q1 2026.

How to investigate — the order that works

Most engineers start by opening Cost Explorer and scrolling. That works eventually, but it is slow. A faster pattern:

  1. Open Cost Explorer and group by Service. Identify which service line jumped the most in absolute dollars (not percent). Focus there first.
  2. Drill into that service by Usage Type. This isolates whether the jump was driven by data transfer, requests, storage, instance hours, or something else.
  3. Cross-check the date range. Did the jump start on a specific day? Look for deploys, feature launches, or scheduled events that align.
  4. Identify the owner. Who deployed the workload that grew? Cost data without ownership context is hard to act on.
  5. Confirm the fix is safe. Before changing anything, confirm the resource is genuinely unused — terminated EC2 instances cannot be undeleted.
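Steps 1 and 2 are easy to script once per-service totals have been exported from Cost Explorer. The sketch below is a minimal illustration, assuming two plain dictionaries of service name to monthly cost; the service labels and dollar figures are made up. It ranks services by absolute dollar change rather than percent, which is why the new but small S3 growth lands last.

```python
def rank_by_absolute_increase(previous, current):
    """Return (service, delta) pairs, largest dollar increase first."""
    services = set(previous) | set(current)
    deltas = {s: current.get(s, 0.0) - previous.get(s, 0.0) for s in services}
    # Sort by delta descending; service name breaks ties deterministically.
    return sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical monthly totals in USD (not real billing data).
last_month = {"EC2 - Compute": 12000.0, "EC2 - Other": 1500.0, "S3": 800.0}
this_month = {
    "EC2 - Compute": 12600.0,
    "EC2 - Other": 4200.0,
    "S3": 1000.0,
    "Bedrock": 900.0,  # new service with no spend last month
}

ranked = rank_by_absolute_increase(last_month, this_month)
for service, delta in ranked:
    print(f"{service:<16} {delta:+10.2f}")
```

In this made-up data S3 grew 25% and EC2 compute only 5%, yet EC2 compute added three times as many dollars, so a percent-based ranking would send the investigation in the wrong direction.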

With that order in mind, here are the eight specific causes to check.

1. Data transfer charges (especially NAT Gateway)

Data transfer is the most common surprise on AWS bills, and NAT Gateway is the most common surprise within data transfer. The service is billed at $0.045 per GB processed in most regions — innocuous at small scale, catastrophic at large scale.

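How to confirm: In Cost Explorer, filter to EC2 - Other and group by Usage Type. Look for line items containing NatGateway-Bytes or DataTransfer-Out. A sudden increase in NatGateway-Bytes points directly at NAT; growth in the DataTransfer lines points at internet egress or inter-AZ traffic instead.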

Common patterns:

  • Workloads in private subnets making large requests to S3 or other AWS services without a VPC endpoint (S3 traffic should route through a Gateway VPC Endpoint, which is free).
  • Container workloads pulling large images through NAT, whether from public registries or from ECR without VPC endpoints.
  • Cross-region replication or backup processes that started running on a new schedule.

Quick fix: Add Gateway VPC Endpoints for S3 and DynamoDB. They are free and eliminate the NAT charge for that traffic entirely.
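To size the problem before making changes, the arithmetic can be sketched as follows. This is an estimate only: it uses the $0.045 per GB data-processing rate quoted above, ignores the NAT Gateway hourly charge, and the traffic volumes are hypothetical.

```python
NAT_PRICE_PER_GB = 0.045  # data-processing rate quoted above; varies by region


def nat_processing_cost(gb_processed, price_per_gb=NAT_PRICE_PER_GB):
    """Monthly NAT data-processing cost in USD for a given volume in GB."""
    if gb_processed < 0:
        raise ValueError("gb_processed must be non-negative")
    return gb_processed * price_per_gb


# Hypothetical workload: 50 TB per month through NAT, 40 TB of it bound for S3.
total_gb = 50 * 1024
s3_gb = 40 * 1024

before = nat_processing_cost(total_gb)
after = nat_processing_cost(total_gb - s3_gb)  # S3 traffic moved to a gateway endpoint

print(f"before endpoint: ${before:,.2f}/month")
print(f"after endpoint:  ${after:,.2f}/month")
print(f"estimated saving: ${before - after:,.2f}/month")
```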

2. Idle or forgotten EC2 instances

Forgotten dev or staging instances that ran over a holiday weekend remain one of the most common waste patterns in enterprise AWS accounts.

How to confirm: In Cost Explorer, group by Linked Account, then filter to each account that should have low activity and group by Usage Type. Look at the BoxUsage lines for those accounts. Cross-reference with CloudWatch CPU utilization: instances at consistent sub-5% CPU for multiple days are strong candidates.

Common patterns:

  • Dev environments left running over weekends or holidays.
  • Load-test instances that finished their job but weren't terminated.
  • Instances launched manually for a one-off task and forgotten.

Quick fix: Identify the idle instances, confirm who owns them, and either terminate them or put them on a shutdown schedule. (See our companion guide on automatically shutting down non-production environments.)
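The sub-5% CPU check can be sketched as a small filter. The example assumes daily average CPUUtilization values have already been pulled from CloudWatch into a dictionary; the instance IDs, the numbers, and the three-day minimum are illustrative assumptions, not recommendations from AWS.

```python
def find_idle_instances(daily_cpu_by_instance, threshold=5.0, min_days=3):
    """Return instance IDs whose daily average CPU stayed below the threshold.

    An instance needs at least `min_days` of data to be flagged, so a
    freshly launched instance is not reported on a single quiet day.
    """
    idle = []
    for instance_id, daily_averages in daily_cpu_by_instance.items():
        enough_data = len(daily_averages) >= min_days
        if enough_data and all(v < threshold for v in daily_averages):
            idle.append(instance_id)
    return sorted(idle)


# Hypothetical daily average CPU percentages for the past several days.
cpu_samples = {
    "i-0aaa1111": [1.2, 0.9, 1.1, 1.0],    # quiet all week: candidate
    "i-0bbb2222": [2.0, 3.5, 48.0, 2.2],   # one busy day: not idle
    "i-0ccc3333": [0.5],                   # too little data to judge
}

print(find_idle_instances(cpu_samples))
```

Daily averages hide short bursts, so a flagged instance is a prompt to ask the owner, not proof that it is unused.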

3. Unattached EBS volumes and unused snapshots

When EC2 instances are terminated, their EBS volumes often persist. So do EBS snapshots, which accumulate at $0.05 per GB-month forever unless explicitly deleted.

How to confirm: In the EC2 console, filter EBS volumes by State: available — these are unattached and billing. For snapshots, sort by Started date and look for snapshots tied to terminated resources.

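Common pattern: Teams that terminate instances without Delete on Termination set on every volume. Root volumes are deleted with the instance by default, but additional data volumes are not, so they persist and keep billing.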

Quick fix: Delete unattached volumes after confirming they hold no critical data. For snapshots, implement a lifecycle policy via Data Lifecycle Manager to automatically expire old snapshots.
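A rough way to put a dollar figure on unattached volumes is sketched below. It assumes a list of dictionaries shaped like the `Volumes` entries returned by the EC2 `describe_volumes` call (only `VolumeId`, `State`, and `Size` are used). The volume IDs are hypothetical, and the $0.08 per GB-month price is an assumed gp3 rate, so substitute the rate for your own volume types and region.

```python
ASSUMED_PRICE_PER_GB_MONTH = 0.08  # assumption: gp3-style pricing; check your region


def unattached_volumes(volumes):
    """Keep only volumes in the 'available' state, i.e. attached to nothing."""
    return [v for v in volumes if v["State"] == "available"]


def monthly_storage_cost(volumes, price_per_gb_month=ASSUMED_PRICE_PER_GB_MONTH):
    """Estimated monthly storage cost in USD for the given volumes."""
    return sum(v["Size"] for v in volumes) * price_per_gb_month


# Hypothetical inventory; Size is in GiB as in the EC2 API.
inventory = [
    {"VolumeId": "vol-0aaa", "State": "in-use", "Size": 200},
    {"VolumeId": "vol-0bbb", "State": "available", "Size": 500},
    {"VolumeId": "vol-0ccc", "State": "available", "Size": 100},
]

orphans = unattached_volumes(inventory)
print([v["VolumeId"] for v in orphans])
print(f"estimated waste: ${monthly_storage_cost(orphans):.2f}/month")
```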

4. RDS instances running 24/7 in non-production

RDS is one of the most expensive services on most bills, and it bills continuously even when no application is connected. A dev db.r6g.large instance left running for a month costs roughly $250.

How to confirm: In Cost Explorer, group by Service and look at the Amazon Relational Database Service line. Then drill into the RDS console and check connection counts via CloudWatch's DatabaseConnections metric.

Quick fix: For non-production RDS, either stop the instance during off-hours (note: a stopped RDS instance restarts automatically after 7 days, and storage keeps billing while it is stopped) or use Aurora Serverless v2, which can scale to zero capacity on supported engine versions.
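The saving from an off-hours schedule is simple proportional arithmetic, sketched here. It takes the roughly $250 per month figure above as the always-on cost and assumes that figure is almost entirely instance hours; storage and backup charges continue while an instance is stopped, so treat the result as an upper bound on the saving.

```python
HOURS_PER_WEEK = 24 * 7


def scheduled_monthly_cost(always_on_monthly, hours_per_day, days_per_week):
    """Instance-hour cost if the database runs only on the given schedule."""
    running_hours = hours_per_day * days_per_week
    if not 0 <= running_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return always_on_monthly * running_hours / HOURS_PER_WEEK


always_on = 250.0  # rough monthly figure quoted above for a dev db.r6g.large
weekday_only = scheduled_monthly_cost(always_on, hours_per_day=12, days_per_week=5)

print(f"always on:        ${always_on:.2f}/month")
print(f"12h x 5 days:     ${weekday_only:.2f}/month")
print(f"estimated saving: ${always_on - weekday_only:.2f}/month")
```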

5. S3 storage class mismatches

S3 Standard is the default for new buckets, but it costs roughly 5× more than Glacier Instant Retrieval for the same data. Most enterprise buckets have data that hasn't been accessed in months sitting in Standard.

How to confirm: Use S3 Storage Lens — the default dashboard surfaces "Cold storage opportunity" and "Average object age." Buckets with average object age >30 days and access patterns under 10 requests / month are candidates.

Quick fix: Apply S3 Intelligent-Tiering or an S3 Lifecycle Policy that moves objects to Infrequent Access after 30 days and Glacier after 90.
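The 30-day and 90-day rule described above could be expressed as a lifecycle configuration like the sketch below. The dictionary follows the shape boto3 accepts as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`; the rule ID is hypothetical, and choosing `GLACIER_IR` (Glacier Instant Retrieval) for the "Glacier" tier is an assumption. The helper only simulates which class an object would be in at a given age, and calls no AWS API.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-down-cold-data",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},      # empty prefix applies to the whole bucket
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER_IR"},  # assumed Glacier tier
            ],
        }
    ]
}


def storage_class_at(age_days, rule):
    """Storage class an object of this age would have under the rule."""
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120):
    print(age, storage_class_at(age, rule))
```

Lifecycle transitions carry per-request charges and the colder classes have minimum storage durations, so it is worth checking object counts and sizes before applying a rule bucket-wide.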

6. Misconfigured autoscaling

A scaling policy that scales out aggressively but scales in slowly (or never) is a quiet, expensive failure mode. The bill rises during one busy hour and stays high for days because the cluster never returns to baseline.

How to confirm: In Auto Scaling Group history, look at the activity log. If scale-out events outnumber scale-in events significantly over the last 30 days, the policy is asymmetric.

Quick fix: Tune scale-in cooldown and threshold values. For Kubernetes, use Karpenter or Cluster Autoscaler with more aggressive scale-down settings; for ECS, review the capacity provider's managed scaling configuration. Always verify with a load-test cycle before pushing to production.
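The scale-out versus scale-in comparison can be approximated by counting activity records. The sketch below assumes a list of dictionaries with a `Description` field, as returned in `Activities` by the Auto Scaling `describe_scaling_activities` call, and assumes launch and terminate activities begin with "Launching" and "Terminating" respectively; verify that wording against your own activity log before relying on it. The sample records are hypothetical.

```python
def scaling_balance(activities):
    """Count instance launches and terminations in an ASG activity log."""
    launches = sum(
        1 for a in activities if a["Description"].startswith("Launching")
    )
    terminations = sum(
        1 for a in activities if a["Description"].startswith("Terminating")
    )
    return {
        "launches": launches,
        "terminations": terminations,
        "net_growth": launches - terminations,
    }


# Hypothetical 30-day activity log for one Auto Scaling group.
activity_log = [
    {"Description": "Launching a new EC2 instance: i-0aaa"},
    {"Description": "Launching a new EC2 instance: i-0bbb"},
    {"Description": "Launching a new EC2 instance: i-0ccc"},
    {"Description": "Terminating EC2 instance: i-0aaa"},
]

print(scaling_balance(activity_log))
```

A persistently positive net growth over a period with no sustained traffic increase is the asymmetric pattern described above.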

7. AI and ML workload costs

The fastest-growing surprise on enterprise AWS bills in 2026 is AI workload spend — Bedrock, SageMaker, GPU EC2 instances, and OpenAI API charges flowing through cross-account billing.

How to confirm: In Cost Explorer, group by Service and look for Amazon Bedrock and Amazon SageMaker. Then filter to EC2 and group by Usage Type to spot unusual BoxUsage:p3.*, BoxUsage:g5.*, or BoxUsage:p5.* lines (GPU instance types).

Common patterns:

  • A model selection that uses Claude Opus or GPT-4-class models when Haiku or GPT-4o-mini would do.
  • Idle GPU EC2 instances left running after a model training job completed.
  • Bedrock prompts being run without prompt caching, multiplying input-token costs.

Quick fix: For Bedrock, enable prompt caching where supported and audit model selection — the cost difference between models can be 10–60×. For GPU instances, set automatic termination on training-job completion.
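To see how model choice and prompt caching interact, a back-of-the-envelope calculator helps. Everything numeric below is a placeholder: the per-million-token prices, the request volume, and the cache discount are invented for illustration and are not published prices for any model. The sketch also ignores any surcharge for writing to the cache.

```python
def monthly_token_cost(
    requests,
    input_tokens,
    output_tokens,
    input_price_per_m,
    output_price_per_m,
    cached_fraction=0.0,
    cache_read_discount=0.0,
):
    """Estimated monthly cost in USD for a fixed per-request token profile.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_read_discount: price reduction on cached input tokens (0.9 = 90% off).
    """
    effective_input_price = input_price_per_m * (
        1 - cached_fraction * cache_read_discount
    )
    per_request = (
        input_tokens * effective_input_price + output_tokens * output_price_per_m
    ) / 1_000_000
    return requests * per_request


# Placeholder workload: 1M requests, 2,000 input and 500 output tokens each.
workload = dict(requests=1_000_000, input_tokens=2_000, output_tokens=500)

large = monthly_token_cost(**workload, input_price_per_m=15.0, output_price_per_m=75.0)
small = monthly_token_cost(**workload, input_price_per_m=1.0, output_price_per_m=5.0)
large_cached = monthly_token_cost(
    **workload,
    input_price_per_m=15.0,
    output_price_per_m=75.0,
    cached_fraction=0.75,
    cache_read_discount=0.9,
)

print(f"large model:           ${large:,.0f}")
print(f"large model + caching: ${large_cached:,.0f}")
print(f"small model:           ${small:,.0f}")
```

With these placeholder numbers, switching models moves the bill far more than caching does, which is why auditing model selection comes first.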

8. CloudWatch logs, metrics, and traces

CloudWatch is sneakily expensive at scale. Ingestion is $0.50 per GB. Custom metrics are $0.30 per metric per month. Logs that were "just for debugging" can balloon into thousands of dollars per month.

How to confirm: In Cost Explorer, group by Service and look for Amazon CloudWatch. Drill into Usage Type to separate ingestion, storage, and custom metrics.

Common patterns:

  • An application logging at DEBUG level in production.
  • Log groups without a retention policy (default is "never expire").
  • Container or Lambda emitting custom metrics per request.

Quick fix: Set retention policies on every log group (90 days is a reasonable default for most applications). Audit custom metric emitters. For high-volume logs, consider exporting to S3 and querying with Athena instead.
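Two of the checks above are easy to script: finding log groups with no retention policy and estimating the monthly charge. The sketch assumes dictionaries shaped like the `logGroups` entries returned by the CloudWatch Logs `describe_log_groups` call, where `retentionInDays` is absent when a group never expires. The log group names and volumes are hypothetical, and the prices are the ones quoted above.

```python
INGESTION_PRICE_PER_GB = 0.50   # ingestion rate quoted above
CUSTOM_METRIC_PRICE = 0.30      # per custom metric per month, as quoted above


def groups_without_retention(log_groups):
    """Names of log groups that have no retention policy (never expire)."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


def monthly_cloudwatch_cost(ingested_gb, custom_metrics):
    """Rough monthly cost in USD from log ingestion plus custom metrics."""
    return ingested_gb * INGESTION_PRICE_PER_GB + custom_metrics * CUSTOM_METRIC_PRICE


# Hypothetical log groups.
log_groups = [
    {"logGroupName": "/app/checkout", "retentionInDays": 90},
    {"logGroupName": "/app/debug-dump"},          # no policy: kept forever
    {"logGroupName": "/aws/lambda/resize-image"}, # no policy: kept forever
]

print(groups_without_retention(log_groups))
print(f"${monthly_cloudwatch_cost(ingested_gb=2000, custom_metrics=5000):,.2f}/month")
```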

Summary — diagnostic order

| Priority | Cause | How to confirm |
|---|---|---|
| 1 | Data transfer / NAT | Group by Service → EC2 - Other → Usage Type → NatGateway-Bytes |
| 2 | Idle EC2 | Group by Linked Account → low-activity accounts with high BoxUsage |
| 3 | Unattached EBS | EC2 console → EBS volumes filtered to State: available |
| 4 | RDS in non-prod | Group by Service → RDS → cross-check with DatabaseConnections metric |
| 5 | S3 storage class | S3 Storage Lens → "Cold storage opportunity" |
| 6 | Autoscaling | ASG activity log → scale-out vs scale-in ratio |
| 7 | AI / ML | Group by Service → Bedrock, SageMaker, GPU BoxUsage lines |
| 8 | CloudWatch | Group by Service → CloudWatch → ingestion / metrics / storage |

When the console isn't enough

A focused investigation in Cost Explorer will find the root cause of most bill jumps within an hour. If yours takes longer than that — or if you suspect the cause is several smaller problems compounding rather than one large one — a dedicated cloud cost optimization platform is usually the fastest path forward.

Modern platforms like PointFive, CloudZero, Vantage, and Finout will automate the eight checks above (and several hundred more) on every account continuously, surface the dollar impact per finding, and assign each to an owner. The investigation that takes a senior engineer half a day to do manually becomes a daily report that runs in the background.

Frequently asked questions

How long should it take to investigate an unexpected AWS bill?

For a single account, a focused investigation following the order above typically finds the root cause within one hour. For complex multi-account environments, it can take a half-day. If you are spending more than a day on it, the issue is usually that the bill jump was caused by several smaller changes rather than one large one — at that point, a dedicated cost platform is the faster path.

Should I delete resources as soon as I identify them as waste?

No. Confirm first. Terminated EC2 instances and deleted EBS volumes cannot be restored, and a deleted snapshot is recoverable only if a Recycle Bin retention rule was already in place. Always cross-check ownership and recent access patterns before deleting. For uncertain cases, stop the instance for 7–30 days first: compute charges stop (attached EBS storage keeps billing), and you preserve the option to recover.

Why are AWS bills harder to investigate than they used to be?

Cloud architectures have become more service-rich over the past five years. A single application now uses 20–50 distinct AWS services, each with its own pricing model and usage type. The number of line items on an enterprise bill has grown from hundreds in 2018 to tens of thousands in 2026, which exceeds what a human can reasonably scan in a Cost Explorer dashboard.

What is the single most common cause of bill jumps?

In our experience, data transfer charges driven by NAT Gateway are the single most common surprise. Many engineering teams underestimate how much traffic passes through NAT in private subnets, especially when workloads access S3 without a Gateway VPC Endpoint.

Does AWS offer free tools for this?

Yes. AWS Cost Explorer (free in the console; API requests are billed per call), AWS Budgets, AWS Trusted Advisor (some checks free, more in the Business support tier), S3 Storage Lens (basic free), and AWS Compute Optimizer are all available at little or no cost. They are reasonable starting points but become difficult to maintain at scale across many accounts.

The bottom line

When your AWS bill jumps, the answer is almost always already in Cost Explorer — you just need to investigate in the right order. Start with Service grouping, drill into Usage Type, focus on the largest absolute-dollar increase first. Eight categories cover the overwhelming majority of cases: data transfer, idle EC2, unattached EBS, RDS in non-production, S3 storage class, autoscaling, AI workloads, and CloudWatch. If a one-hour focused investigation doesn't surface the cause, the problem is usually distributed across several smaller issues — and that's the point at which a dedicated optimization platform pays for itself.

Methodology

This guide is based on public AWS pricing documentation, the FinOps Foundation State of FinOps survey, Flexera's State of the Cloud Report, and patterns observed across enterprise AWS accounts. Pricing figures are accurate as of June 2026 for the us-east-1 region and may vary by region.

For corrections or to suggest additional root causes, contact us at pointfive.co/contact.

About PointFive

PointFive is a Cloud and AI Efficiency Engine. By combining a real-time cloud and infrastructure data fabric with AI-driven detection and guided remediation, PointFive transforms efficiency from a reporting exercise into an operational discipline. Customers achieve sustained improvements in cost, performance, reliability, and engineering accountability, at scale.

To learn more, book a demo.
