How to Automatically Shut Down Non-Production Environments at Night (2026)

PointFive Team
Last updated · June 2, 2026 · 11 min read

The average non-production AWS environment runs 168 hours per week. The business uses it for roughly 40. That leaves 128 hours — 76% of the week — where dev and staging resources are burning money to do nothing.

Automating the off-hours shutdown of non-production environments is one of the highest-ROI, lowest-risk efficiency changes a team can make. The savings are immediate, the implementation is well-trodden, and the failure modes are well-understood. This guide compares the four common approaches, with implementation steps and recommendations for picking one.

TLDR

  • Shutting down non-production environments outside business hours typically saves 60–75% of their compute and database cost.
  • Four common approaches: AWS Instance Scheduler (free, AWS-native), Lambda + EventBridge (custom, flexible), Cloud Custodian (open source, policy-based), and dedicated cost platforms (turnkey, fastest).
  • Pick Instance Scheduler for simple AWS-only environments under 20 instances. Pick Cloud Custodian for multi-cloud or policy-driven teams. Pick a dedicated platform when you also want ownership attribution, savings tracking, and broader waste detection.
  • Risk is low if you stop (not terminate) resources, exclude critical services explicitly, and have a manual override available.

Key statistics

  • A non-production EC2 instance runs 168 hours per week. Business hours (9am–6pm, M–F) are 45 hours. Off-hours shutdown recovers the remaining 123 hours, or 73% of the week.
  • For a db.r6g.large RDS instance ($250/month if always-on), off-hours shutdown saves roughly $180/month per instance in instance-hour charges. Storage is still billed while the instance is stopped.
  • Engineering teams that implement off-hours shutdown across non-production environments typically report a 15–25% reduction in total non-production cloud spend in the first month.
  • Source: Patterns drawn from FinOps Foundation case studies and AWS Well-Architected Cost Optimization Pillar guidance.
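The arithmetic behind these figures can be reproduced with a short sketch. It assumes cost scales linearly with running hours and ignores charges that continue while a resource is stopped (such as storage); the function names are illustrative, not from any tool.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def off_hours_fraction(start_hour: int, end_hour: int, days_per_week: int) -> float:
    """Fraction of the week that falls outside the business-hours window."""
    on_hours = (end_hour - start_hour) * days_per_week
    return (HOURS_PER_WEEK - on_hours) / HOURS_PER_WEEK


def monthly_savings(always_on_cost: float, off_fraction: float) -> float:
    """Approximate saving on instance-hour charges only (storage is ignored)."""
    return always_on_cost * off_fraction


fraction = off_hours_fraction(start_hour=9, end_hour=18, days_per_week=5)
print(f"off-hours share of the week: {fraction:.0%}")  # 73%
print(f"saving on a $250/month instance: ${monthly_savings(250, fraction):.0f}")  # $183
```

Swapping in your own business-hours window and instance prices gives a first estimate before any automation is deployed.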

What "non-production environments" actually means

Before automating, define scope. Most teams include:

  • Dev environments — individual developer playgrounds
  • Staging / QA environments — pre-production test environments
  • Demo environments — sales and customer-success demo accounts
  • Performance / load-test environments — used sporadically
  • Internal tools — engineering dashboards, internal apps, scratch databases

Most teams exclude from automation:

  • Production
  • Disaster-recovery infrastructure (which must be always-ready)
  • CI/CD runners (which need to respond to off-hours commits)
  • Anything global engineering teams across time zones rely on

Tagging discipline matters. Without a reliable Environment tag on every resource, automation either misses targets or shuts down production by accident. Most teams enforce Environment: production | staging | dev | test via Service Control Policies or tag policies before turning on automation.
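As a rough illustration of why tag hygiene comes first, the sketch below classifies resources by their Environment tag and refuses to act on anything missing or misspelled. The inventory and resource IDs are hypothetical; in practice the tags would come from your cloud provider's API or a tagging report.

```python
ALLOWED_ENVIRONMENTS = {"production", "staging", "dev", "test"}
SCHEDULABLE_ENVIRONMENTS = ALLOWED_ENVIRONMENTS - {"production"}


def classify(tags: dict) -> str:
    """Return 'schedulable', 'protected', or 'untagged' for one resource's tags."""
    env = tags.get("Environment")
    if env not in ALLOWED_ENVIRONMENTS:
        return "untagged"  # missing or misspelled: never act on a guess
    return "schedulable" if env in SCHEDULABLE_ENVIRONMENTS else "protected"


# Hypothetical inventory: resource ID -> tags.
inventory = {
    "i-aaa": {"Environment": "dev"},
    "i-bbb": {"Environment": "production"},
    "i-ccc": {"Environment": "Prod"},  # not an allowed value
    "i-ddd": {},                       # tag missing entirely
}

for resource_id, tags in inventory.items():
    print(resource_id, classify(tags))
```

Treating unknown values as "untagged" rather than "dev" is the conservative choice: automation then skips the resource instead of guessing.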

The four common approaches

1. AWS Instance Scheduler

A free AWS Solutions Library project that runs in your account, reads tags on EC2 and RDS resources, and starts/stops them on a schedule.

How it works: A CloudFormation template deploys a Lambda function and a small DynamoDB table. You tag resources with Schedule: office-hours (or a custom schedule). The Lambda runs every few minutes, reads the tags, and starts/stops resources accordingly.
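The decision the scheduler makes on each run can be pictured with a minimal sketch. This is not the solution's actual code: the schedule table below is a hypothetical in-memory stand-in for its DynamoDB configuration, and the caller is assumed to pass the current time already converted to the schedule's time zone.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical schedule table; the real solution stores its configuration in DynamoDB.
SCHEDULES = {
    "office-hours": {
        "weekdays": {0, 1, 2, 3, 4},  # Monday=0 ... Friday=4
        "begin": time(9, 0),
        "end": time(18, 0),
    },
}


def desired_state(tags: dict, now_local: datetime) -> Optional[str]:
    """Return 'running', 'stopped', or None when the resource has no known schedule."""
    schedule = SCHEDULES.get(tags.get("Schedule", ""))
    if schedule is None:
        return None  # untagged or unknown schedule: leave the resource alone
    in_window = (
        now_local.weekday() in schedule["weekdays"]
        and schedule["begin"] <= now_local.time() < schedule["end"]
    )
    return "running" if in_window else "stopped"


tags = {"Schedule": "office-hours"}
print(desired_state(tags, datetime(2026, 6, 3, 10, 0)))  # Wednesday morning
print(desired_state(tags, datetime(2026, 6, 6, 10, 0)))  # Saturday morning
```

A scheduler built this way compares the desired state with the actual state on every run, so a missed invocation is corrected on the next one.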

Strengths:

  • Free to use (you pay only for the AWS resources it deploys, such as Lambda invocations and the DynamoDB table, which is typically a small monthly charge).
  • AWS-native, no third-party software.
  • Documented and supported via AWS Solutions Library.
  • Works with both EC2 and RDS.

Limitations:

  • AWS only — no Azure, GCP, or Kubernetes support.
  • Focused on start/stop. EC2 schedules can request a different instance type per period, but the tool cannot scale services down or take other remediation actions.
  • Schedule management lives in DynamoDB — fine for a few schedules, awkward at scale.
  • No native ownership attribution or reporting.

Implementation: Documented at AWS Solutions: Instance Scheduler on AWS. Typical deployment time: 1–2 hours.

Best for: Small-to-mid AWS environments with consistent business hours and good tagging discipline.

2. Lambda + EventBridge (custom)

Roll your own with a Lambda function triggered by EventBridge cron rules. The Lambda enumerates resources matching a tag filter and calls StopInstances or StopDBInstance.

Strengths:

  • Maximum flexibility — schedule whatever you want, however you want.
  • No third-party software.
  • Can extend beyond start/stop (resize, snapshot-and-delete, alert and confirm).

Limitations:

  • You own the code, the monitoring, the error handling, the testing, and the on-call.
  • Easy to write the v1; harder to make production-grade (idempotency, rate limits, multi-account, audit logs).
  • No UI for non-engineers to manage schedules.
  • Reinventing what AWS Instance Scheduler already does for free.

Implementation: A skeleton Lambda is ~50 lines of Python. Hardening it to production-grade is a multi-week project.
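A minimal sketch of such a skeleton is shown below, assuming EC2 only, a single account and region, and an Environment tag with values dev and staging. The helper names, the dry_run event field, and the batch size of 50 are illustrative choices; the boto3 calls used are describe_instances (via a paginator) and stop_instances.

```python
def find_running_instances(ec2, tag_key, tag_values):
    """Return IDs of running EC2 instances whose tag matches one of tag_values."""
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": list(tag_values)},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]


def stop_tagged_instances(ec2, tag_key="Environment",
                          tag_values=("dev", "staging"), dry_run=True):
    """Stop (never terminate) matching instances; return the IDs that matched."""
    ids = find_running_instances(ec2, tag_key, tag_values)
    if not dry_run:
        # Send IDs in batches; 50 is an arbitrary batch size for this sketch.
        for start in range(0, len(ids), 50):
            ec2.stop_instances(InstanceIds=ids[start:start + 50])
    return ids


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    ec2 = boto3.client("ec2")
    matched = stop_tagged_instances(ec2, dry_run=event.get("dry_run", True))
    print({"matched": matched})  # minimal audit trail in CloudWatch Logs
    return {"matched": matched}
```

Defaulting to a dry run means a misconfigured EventBridge rule reports what it would stop rather than stopping it. Retries, multi-account access, and RDS handling are the parts that turn this into the multi-week project described above.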

Best for: Teams with unusual scheduling needs that don't fit Instance Scheduler's model, or that already manage everything via Terraform and want infrastructure-as-code for the scheduler too.

3. Cloud Custodian

Cloud Custodian is an open-source policy-as-code engine built by Capital One, now widely adopted across enterprise FinOps and security teams. It supports AWS, Azure, GCP, and Kubernetes.

How it works: You write YAML policies that describe resources and the actions to take. Policies run on a schedule (via Lambda, GitHub Actions, or any cron-capable runner). The same engine that handles cost actions also handles security and compliance policies.
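To give a feel for the policy shape, the sketch below builds a stop/start policy pair as a Python dictionary and prints it as JSON (which is also valid YAML). It assumes Cloud Custodian's offhour and onhour filters with their tag, default_tz, offhour, and onhour options; the policy names, the Environment tag filter, and the hours are illustrative, so check the result against the Custodian documentation before running it.

```python
import json


def offhours_policies(env="dev", tag="maid_offhours", tz="et", off_hour=20, on_hour=7):
    """Build a stop/start policy pair for EC2 instances in one environment."""
    env_filter = {f"tag:{env_tag_key()}": env}
    return {
        "policies": [
            {
                "name": f"{env}-offhours-stop",
                "resource": "aws.ec2",
                "filters": [
                    env_filter,
                    {"type": "offhour", "tag": tag, "default_tz": tz, "offhour": off_hour},
                ],
                "actions": ["stop"],
            },
            {
                "name": f"{env}-onhour-start",
                "resource": "aws.ec2",
                "filters": [
                    env_filter,
                    {"type": "onhour", "tag": tag, "default_tz": tz, "onhour": on_hour},
                ],
                "actions": ["start"],
            },
        ]
    }


def env_tag_key():
    """Tag key used to scope the policies (an assumption of this sketch)."""
    return "Environment"


print(json.dumps(offhours_policies(), indent=2))
```

Because the hour filters match only during the configured hour, policies like these are normally evaluated on an hourly schedule.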

Strengths:

  • Multi-cloud — same policy language across AWS, Azure, GCP.
  • Policy-as-code — version-controlled, code-reviewed, deployable via CI/CD.
  • Battle-tested in large enterprises (Capital One, T-Mobile, others).
  • Free and open source.
  • Extensible — your scheduling policies sit alongside security and compliance policies in the same engine.

Limitations:

  • Steeper learning curve than Instance Scheduler.
  • No turnkey UI for business users.
  • Requires engineering investment to deploy and maintain.
  • Reporting and analytics are DIY unless paired with a separate dashboard.

Implementation: Cloud Custodian docs — typical first-policy deployment: 1 day. Production-grade rollout across multiple accounts: 1–2 weeks.

Best for: Multi-cloud environments, security-conscious engineering orgs, and teams that want policy-as-code for cost actions alongside compliance.

4. Dedicated cost-optimization platforms

Modern cost-optimization platforms (PointFive, CAST AI, Vantage, ProsperOps, CloudZero, Finout, and others) include scheduling capabilities alongside broader waste detection and remediation. The trade-off is paid software in exchange for a fully-managed solution.

Strengths:

  • Turnkey — no code to maintain, no infrastructure to deploy.
  • Combine scheduling with broader waste detection (idle resources, oversized workloads, networking inefficiency, AI workload optimization).
  • Built-in ownership attribution, savings tracking, and reporting.
  • Multi-cloud and multi-service in most cases.
  • Engineering-grade workflows (Slack notifications, Jira tickets, approval flows).

Limitations:

  • Paid software — typically priced as a percentage of savings or as a SaaS subscription.
  • Vendor onboarding required.
  • For very small environments, the per-month cost may exceed the savings.

Best for: Teams that want to solve scheduling as part of a broader cost-optimization strategy, especially those running across multiple clouds or with significant AI / data platform spend.

Comparison

| Approach | Cost | Multi-cloud | Setup time | Best for |
| --- | --- | --- | --- | --- |
| AWS Instance Scheduler | Free | AWS only | 1–2 hours | Small AWS environments, simple schedules |
| Lambda + EventBridge | ~Free | AWS only | Days–weeks | Custom requirements, IaC-everything teams |
| Cloud Custodian | Free (OSS) | AWS, Azure, GCP, K8s | Days–weeks | Multi-cloud, policy-as-code, security overlap |
| Dedicated platform | $$ | Yes | Hours | Broader optimization story, turnkey ownership + reporting |

Implementation patterns that minimize risk

Regardless of which approach you pick, the safe-rollout patterns are the same:

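  1. Stop, do not terminate. Stopped RDS instances and EBS-backed EC2 instances keep their data and can be restarted. Terminated ones cannot. Data on EC2 instance store volumes does not survive a stop, so check for it before enrolling an instance.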
  2. Start with a small, opt-in cohort. Pick 5–10 non-critical dev instances. Run for two weeks. Confirm nothing broke.
  3. Provide a manual override. A KeepRunning: true tag (or similar) lets engineers temporarily exclude a resource without unwinding the whole policy.
  4. Notify the owner before stopping. A Slack message 30 minutes before shutdown prevents surprise. Most platforms support this natively; for DIY approaches, add it.
  5. Log every action. Audit trail makes incident response much easier when something does go wrong.
  6. Exclude critical services explicitly. A safe-list of resources, ASGs, and clusters that should never be auto-stopped, even if they look like dev.
  7. Test the restart path. Stopping is easy. Restarting under load is sometimes not. Run a Friday-night stop and a Monday-morning start as a dry run before relying on automation.
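Several of these patterns (opt-in cohort, manual override, safe-list, audit log) reduce to one guard function that runs before any stop call. The sketch below is one possible shape; the resource IDs, the KeepRunning tag check, and the record format are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def should_stop(resource_id, tags, safe_list, pilot_cohort):
    """Return (decision, reason); every 'no' is checked before any 'yes'."""
    if resource_id in safe_list:
        return False, "safe-listed"
    if tags.get("KeepRunning", "").lower() == "true":
        return False, "manual override"
    if tags.get("Environment") == "production":
        return False, "production"
    if resource_id not in pilot_cohort:
        return False, "not in pilot cohort"
    return True, "eligible"


def audit_record(resource_id, decision, reason, now=None):
    """Serialize one decision as a JSON line for the audit trail."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {"time": now.isoformat(), "resource": resource_id, "stop": decision, "reason": reason}
    )


# Hypothetical IDs for illustration.
safe_list = {"i-ci-runner"}
pilot = {"i-dev-1", "i-dev-2", "i-ci-runner"}
for rid, tags in [
    ("i-dev-1", {"Environment": "dev"}),
    ("i-dev-2", {"Environment": "dev", "KeepRunning": "true"}),
    ("i-ci-runner", {"Environment": "dev"}),
]:
    decision, reason = should_stop(rid, tags, safe_list, pilot)
    print(audit_record(rid, decision, reason))
```

Logging the skipped resources along with the stopped ones makes it easier to answer "why was this still running?" as well as "why was this stopped?".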

Recommendation by team profile

  • Small team (<20 engineers), AWS-only, simple business hours → AWS Instance Scheduler. Free, fast, sufficient.
  • Mid-size team, multi-cloud, has DevOps capacity → Cloud Custodian. The policy-as-code investment pays back across security and compliance use cases.
  • Mid-to-large team, wants broader cost optimization → A dedicated platform. Scheduling is one capability among many — the holistic approach scales better as the cloud bill grows.
  • Custom requirements, mature DevOps team → Custom Lambda + EventBridge — but only if Instance Scheduler genuinely cannot do what you need.

Frequently asked questions

How much will we actually save?

For non-production environments running 24/7 today, expect a 60–75% reduction in the compute and database cost for those environments. As a share of total cloud spend, this typically lands at 10–20% in the first month, depending on what fraction of your spend is non-production.

Won't engineers be frustrated if they can't access dev at night?

If the rollout is communicated and the manual override is easy, the friction is minimal. The pattern most teams converge on: shut down from 8pm to 7am on weeknights and from Friday 8pm through Monday 7am, with a one-click "keep running tonight" override in Slack. Engineers who need to work late tag their environment and continue uninterrupted.

What about CI/CD runners and other systems that need to respond to off-hours work?

Always exclude them. CI runners, deploy infrastructure, on-call tooling, monitoring, and globally-shared engineering systems should be on the safe-list. Use tagging discipline (Schedule: always-on or Environment: shared-infra) to make this explicit.

Does this work for Kubernetes?

Yes, but the mechanism is different. Instead of stopping nodes, you scale deployments to zero replicas during off-hours (and scale them back). Tools that support this include Cloud Custodian, KEDA, and most dedicated cost platforms. For node-level shutdown, tools like Karpenter handle the scaling automatically once workloads scale to zero.
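A minimal sketch of the scale-to-zero mechanism follows. It assumes `apps_api` is an `AppsV1Api` object from the official Kubernetes Python client and records the previous replica count in an annotation so it can be restored; the annotation key is hypothetical.

```python
ANNOTATION = "example.com/pre-shutdown-replicas"  # hypothetical annotation key


def _list_deployments(apps_api, namespace, label_selector):
    kwargs = {"label_selector": label_selector} if label_selector else {}
    return apps_api.list_namespaced_deployment(namespace, **kwargs).items


def scale_down(apps_api, namespace, label_selector=""):
    """Scale matching Deployments to zero, remembering each replica count."""
    scaled = []
    for dep in _list_deployments(apps_api, namespace, label_selector):
        replicas = dep.spec.replicas or 0
        if replicas == 0:
            continue  # already off; keep any earlier annotation intact
        body = {
            "metadata": {"annotations": {ANNOTATION: str(replicas)}},
            "spec": {"replicas": 0},
        }
        apps_api.patch_namespaced_deployment(dep.metadata.name, namespace, body)
        scaled.append(dep.metadata.name)
    return scaled


def scale_up(apps_api, namespace, label_selector=""):
    """Restore the replica counts recorded by scale_down."""
    restored = []
    for dep in _list_deployments(apps_api, namespace, label_selector):
        saved = (dep.metadata.annotations or {}).get(ANNOTATION)
        if saved is None:
            continue  # not scaled down by this tool
        body = {
            "metadata": {"annotations": {ANNOTATION: None}},  # null removes the key
            "spec": {"replicas": int(saved)},
        }
        apps_api.patch_namespaced_deployment(dep.metadata.name, namespace, body)
        restored.append(dep.metadata.name)
    return restored
```

Recording the original count matters because different Deployments run different numbers of replicas; restoring everything to a fixed value would change capacity. Running the two functions from a pair of CronJobs or scheduled pipelines is one way to drive them.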

Is there a risk to data when stopping RDS?

Stopping RDS preserves all data: the underlying storage remains intact, and you continue to pay for that storage while the instance is stopped. The main caveat: AWS automatically restarts a stopped RDS instance after 7 days. For instances that should stay off longer, use a Lambda or scheduled task to re-stop them, or consider Aurora Serverless v2, which can scale to zero capacity on supported engine versions.
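One way to handle the 7-day restart is a small scheduled function that stops any instance that is marked as "should stay off" but is found running. The sketch below assumes a hypothetical `Schedule: keep-stopped` tag and an `rds` argument that is a boto3 RDS client; Aurora instances are skipped because they are stopped at the cluster level instead.

```python
def restop_rds(rds, tag_key="Schedule", tag_value="keep-stopped", dry_run=True):
    """Stop available RDS instances carrying the keep-stopped tag; return their IDs."""
    paginator = rds.get_paginator("describe_db_instances")
    targets = []
    for page in paginator.paginate():
        for db in page["DBInstances"]:
            if db.get("Engine", "").startswith("aurora"):
                continue  # Aurora is stopped per cluster, not per instance
            tags = {t["Key"]: t["Value"] for t in db.get("TagList", [])}
            if db["DBInstanceStatus"] == "available" and tags.get(tag_key) == tag_value:
                targets.append(db["DBInstanceIdentifier"])
    if not dry_run:
        for identifier in targets:
            rds.stop_db_instance(DBInstanceIdentifier=identifier)
    return targets


def lambda_handler(event, context):
    # Imported here so restop_rds can be exercised without AWS access.
    import boto3

    return {"restopped": restop_rds(boto3.client("rds"), dry_run=event.get("dry_run", True))}
```

Running this once a day is enough, since an instance restarted by AWS simply shows up as available on the next pass.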

Should we automate stopping in production?

Generally no. Production should be sized for steady-state demand and scale with autoscaling rather than scheduled stops. The exception is genuinely periodic workloads (batch jobs that run daily, monthly reporting) that have clear "off" periods — those are good candidates for scheduling.

The bottom line

Off-hours shutdown of non-production environments is one of the most reliable cost optimizations an engineering team can implement. The tooling is mature, the savings are well-documented (60–75% of non-production compute and database cost), and the failure modes are well-understood. For small AWS-only environments, AWS Instance Scheduler is free and sufficient. For multi-cloud or policy-driven organizations, Cloud Custodian is the strongest open-source option. For teams that want scheduling as part of a broader optimization platform with ownership attribution and savings tracking, a dedicated cost-optimization tool is the fastest path. Pick the approach that matches your team profile, roll out incrementally, and you'll see results in the first month.

Methodology

This guide is based on AWS public documentation, the AWS Well-Architected Cost Optimization Pillar, the FinOps Foundation knowledge base, and the open-source documentation of Cloud Custodian and AWS Instance Scheduler. Savings ranges are drawn from published customer case studies and FinOps Foundation surveys. Pricing figures are accurate as of June 2026 in the us-east-1 region.

For corrections, contact us at pointfive.co/contact.

About PointFive

PointFive is a Cloud and AI Efficiency Engine. By combining a real-time cloud and infrastructure data fabric with AI-driven detection and guided remediation, PointFive transforms efficiency from a reporting exercise into an operational discipline. Customers achieve sustained improvements in cost, performance, reliability, and engineering accountability, at scale.

To learn more, book a demo.
