FinOps Advisory - Azure Service Event

October 29, 2025 Global Incident

Report Overview

On October 29, 2025, Microsoft Azure experienced a service disruption caused by an Azure Front Door configuration change. Access to the Azure portal degraded, and users reported widespread outages across several Azure services.

Beyond downtime, this incident may have financial implications for Azure customers worldwide.

This advisory will help you:

  • Understand the cost implications of this event
  • Identify any anomalies in your usage data
  • Understand Azure SLAs and navigate the service credits process

What Happened

On October 29, 2025, beginning at approximately 12:00 PM ET (16:00 UTC), Microsoft Azure experienced global service degradation following an Azure Front Door (AFD) configuration change.

Microsoft provided regular updates through its Azure Status Page and official communication channels.

Affected Azure services include, but are not limited to: App Service, Azure Databricks, Azure SQL Database, Container Registry, Microsoft Defender External Attack Surface Management, Microsoft Entra ID, Microsoft Purview, and Microsoft Sentinel. The disruption had downstream effects on Microsoft 365, Virtual Desktop, Xbox Live, Minecraft, and numerous third-party platforms.

Organizations around the world reported temporary disruptions. Microsoft initiated mitigation by deploying a "last known good" configuration and began rerouting traffic through healthy infrastructure. The incident was fully mitigated by approximately 00:05 UTC on October 30, 2025 (approximately 8:05 PM ET on October 29).

This means: If you run infrastructure on Azure, some workloads may have experienced reduced performance or complete unavailability during the event window, potentially resulting in temporary cost anomalies. These may appear as billing errors; they are actually a normal side effect of large-scale service interruptions.

How This Affects Your Cloud Costs

Review your cloud analytics for the following cost patterns, which may have resulted from the event.

  1. Charges for Limited-Use Services

Resources that kept running but couldn't serve traffic:

  • Virtual Machines with no connectivity
  • Idle databases and storage accounts
  • Unused Application Gateways and Front Door instances
  • App Services unable to handle requests

You might be billed for services that delivered minimal business value during the outage window.

  2. Retry Storm Cost Spikes

Applications automatically retrying failed requests created artificial cost explosions:

  • Azure Functions timing out and re-executing
  • API Management request multiplications
  • Data transfer overages due to retry logic
  • Application Insights error logging surges
  • Event Hub and Service Bus message retries

These patterns represent system-resilience mechanisms, not intentional use, and can be analyzed to isolate nonproductive costs.
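
As a rough illustration of how retries inflate metered usage, the sketch below estimates billed executions when failed requests are retried a fixed number of times. The function name, request volume, failure rate, and retry count are all hypothetical, and the model assumes every retry also fails; substitute figures from your own telemetry.

```python
def billed_executions(requests: int, failure_rate: float, max_retries: int) -> int:
    """Estimate metered executions when each failed request exhausts its retries.

    Illustrative assumption: every failing request is retried max_retries
    times and every retry also fails, as it might during a full outage.
    """
    failed = round(requests * failure_rate)
    succeeded = requests - failed
    # A failed request is metered once for the original attempt plus once per retry.
    return succeeded + failed * (1 + max_retries)


# Hypothetical hour: 100,000 requests, 90% failing, 3 retries each.
baseline = 100_000
during_outage = billed_executions(baseline, failure_rate=0.9, max_retries=3)
amplification = during_outage / baseline
print(f"Billed executions: {during_outage} ({amplification:.1f}x normal volume)")
```

Even a modest retry policy can multiply metered executions several times over, which is why these charges are worth isolating from ordinary traffic.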

  3. Failover & Recovery Costs

If you triggered disaster recovery to other regions or on-premises infrastructure, you may see unexpected charges for:

  • Cross-region data transfer and replication
  • Temporary resource scaling in secondary regions
  • Backup restoration and disaster recovery services
  • Traffic Manager and Front Door routing changes

What to Look For

Map your usage anomalies to distinguish outage-driven costs from legitimate business activity.

Look for the following:

Outage-driven spikes: Functions invocations/timeouts, APIM 4xx/5xx errors, storage transaction surges, and data transfer consistent with retry loops/failures.

Idle resource charges: Normal metering on resources that were effectively unreachable (compute, databases, gateways, Front Door) during the window.

Real usage: Legitimate business activity that occurred outside the outage window or in unaffected services.

Look for short-term deviations: customer environments often report 5-20x baseline levels during an outage period (here, 12:00 PM - 7:40 PM ET). These patterns can provide valuable context when discussing potential SLA adjustments with Microsoft.
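
One simple way to surface such deviations is to compare each hour against a pre-outage baseline. The sketch below is a minimal example under stated assumptions: the metric values are invented, the 5x threshold is taken from the lower end of the range mentioned above, and the helper name flag_spikes is hypothetical.

```python
from statistics import median


def flag_spikes(hourly: dict[str, float], baseline_hours: list[str],
                factor: float = 5.0) -> list[str]:
    """Return the hours whose value is at least `factor` times the baseline median."""
    baseline = median(hourly[h] for h in baseline_hours)
    if baseline <= 0:
        return []  # no meaningful baseline to compare against
    return [hour for hour, value in hourly.items() if value >= factor * baseline]


# Hypothetical hourly invocation counts (UTC) around the start of the event.
invocations = {
    "13:00": 100, "14:00": 110, "15:00": 95,   # pre-outage hours
    "16:00": 620, "17:00": 1800, "18:00": 450,
}
spikes = flag_spikes(invocations, baseline_hours=["13:00", "14:00", "15:00"])
print(spikes)
```

Using the median rather than the mean keeps a single unusual pre-outage hour from distorting the baseline.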

Next Steps

1. Document Your Impact (This Week)

  • Export Service Health events per subscription/region from the Azure Portal (primary tenant-specific evidence)
  • Export hourly costs for October 29, 2025 (12:00 PM - 7:40 PM ET / 16:00-23:40 UTC)
  • Capture metrics showing service unavailability (Azure Monitor alerts, Application Insights errors, timeouts)
  • Calculate costs during the outage vs. normal operations
  • Document business impact (lost transactions, customer complaints, revenue impact)
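
To compare outage costs with normal operations, one option is to sum the hourly cost buckets that overlap the outage window and subtract the same window from a reference day. The sketch below assumes you already have hourly cost rows as (bucket start in UTC, cost) pairs; all figures, the choice of October 22 as the reference day, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def window_cost(rows, start, end):
    """Sum the cost of hourly buckets that overlap the interval [start, end)."""
    total = 0.0
    for bucket_start, cost in rows:
        bucket_end = bucket_start + timedelta(hours=1)
        if bucket_start < end and bucket_end > start:
            total += cost
    return total


def day_rows(day, costs_by_hour):
    """Build (bucket_start, cost) rows for one day of October 2025 (UTC)."""
    return [(datetime(2025, 10, day, hour, tzinfo=UTC), cost)
            for hour, cost in costs_by_hour.items()]


# Hypothetical hourly costs; replace with your exported cost data.
outage_rows = day_rows(29, {15: 40.0, 16: 55.0, 17: 90.0, 18: 85.0, 19: 80.0,
                            20: 70.0, 21: 60.0, 22: 50.0, 23: 45.0})
reference_rows = day_rows(22, {hour: 40.0 for hour in range(15, 24)})

# Window from this advisory: 16:00-23:40 UTC.
outage_total = window_cost(outage_rows,
                           datetime(2025, 10, 29, 16, 0, tzinfo=UTC),
                           datetime(2025, 10, 29, 23, 40, tzinfo=UTC))
reference_total = window_cost(reference_rows,
                              datetime(2025, 10, 22, 16, 0, tzinfo=UTC),
                              datetime(2025, 10, 22, 23, 40, tzinfo=UTC))
excess = outage_total - reference_total
print(f"Outage window: {outage_total:.2f}, reference: {reference_total:.2f}, excess: {excess:.2f}")
```

Keeping every timestamp in UTC avoids off-by-one-hour mistakes when the evidence is later matched against Microsoft's incident timeline.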

2. File Your SLA Credit Claim

Microsoft defines SLA thresholds in its Service Level Agreements for individual Azure services.

CRITICAL DEADLINE: You must submit your claim within two months from the end of the billing month in which the incident occurred. For an October incident, the deadline is approximately December 31, 2025 (two months after October 31).
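
The deadline arithmetic can be sketched as follows. This assumes the billing month coincides with the calendar month, which may not hold for every agreement, and the function name claim_deadline is hypothetical; confirm the actual date against your own SLA terms.

```python
import calendar
from datetime import date


def claim_deadline(incident: date, months_allowed: int = 2) -> date:
    """Return the last day of the month that is `months_allowed` months
    after the month in which the incident occurred.

    Assumption: billing months align with calendar months.
    """
    month_index = incident.month - 1 + months_allowed
    year = incident.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


deadline = claim_deadline(date(2025, 10, 29))
print(deadline)  # 2025-12-31
```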

How to File:

  1. Open a support request in the Azure Portal
  2. Navigate to: Help + Support > Create a support request
  3. Issue type: Billing
  4. Subject: "Azure SLA Credit Request – [Service Name] Outage October 29, 2025"
  5. Include:
    • Dates and times of impact
    • Affected resources (Resource IDs, subscription IDs)
    • Outage Incident ID from Service Health Dashboard
    • Azure Monitor logs, Application Insights data showing unavailability
    • Cost comparison data
    • Business impact documentation

Expected Credits:

Azure SLAs vary by service but typically range from 95% to 99.99% uptime guarantees:

  • 99.9% SLA: Allows ~43 minutes downtime/month
  • 99.95% SLA: Allows ~22 minutes downtime/month
  • 99.99% SLA: Allows ~4 minutes downtime/month
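
The allowances above follow from simple arithmetic on the number of minutes in a month. A minimal sketch, assuming a 30-day month (the helper name is hypothetical):

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a month can absorb before the SLA percentage is missed."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% SLA: about {allowed_downtime_minutes(sla):.1f} minutes/month")
```

For a 30-day month this gives roughly 43.2, 21.6, and 4.3 minutes, matching the rounded figures in the list.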

Based on the reported 7.5-hour outage window, services with 99.9% or higher SLAs likely experienced breaches. Typical service credit tiers:

Monthly Uptime %    Service Credit
< 99.9%             10%
< 99%               25%
< 95%               100%

Credits appear as deductions on your next billing cycle and apply only to the affected services.
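
To see where a 7.5-hour outage might land in these tiers, the sketch below converts downtime into a monthly uptime percentage and looks up the credit. It assumes the resource was completely unavailable for the whole window and uses the typical tiers listed above; actual downtime definitions and tiers differ per service SLA, and both function names are hypothetical.

```python
def monthly_uptime_percent(downtime_hours: float, days_in_month: int = 31) -> float:
    """Uptime percentage for a month with the given hours of downtime."""
    total_hours = days_in_month * 24
    return 100 * (total_hours - downtime_hours) / total_hours


def service_credit_percent(uptime_percent: float) -> int:
    """Map uptime to a credit using the typical tiers quoted in this advisory."""
    if uptime_percent < 95:
        return 100
    if uptime_percent < 99:
        return 25
    if uptime_percent < 99.9:
        return 10
    return 0


uptime = monthly_uptime_percent(7.5)  # October has 31 days (744 hours)
credit = service_credit_percent(uptime)
print(f"Uptime: {uptime:.2f}%, indicative credit: {credit}%")
```

Under these assumptions the uptime falls just under 99%, which would place a fully affected service in the 25% tier. Resources that were only partially affected would likely land in a lower tier or none.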

3. Recover Beyond-SLA Costs

Retry storms and failover costs aren't automatically credited. The following require separate Azure Support tickets:

  • Azure Functions retry overages
  • Data transfer spikes between regions
  • API Management cost explosions
  • Traffic Manager and Front Door failover expenses
  • Application Gateway and Load Balancer retry costs
  • Storage transaction surges from retry logic

Reference the October 29 incident and present cost comparisons showing these charges directly resulted from Azure infrastructure failure.

4. Verify Credits Applied

Monitor your next 1-2 billing cycles to confirm credits are reflected. Microsoft typically processes claims within 45 days of submission. If credits don't appear, re-open your support case for review.

If you use PointFive: Open Anomaly Detection for Oct 29, 12:00 PM–7:40 PM ET (16:00–23:40 UTC) across all Azure regions, validate anomalies using Data Explorer, then export the evidence for Microsoft.

Summary & Key Takeaways

File SLA claims by approximately December 31, 2025 (two months after the end of the October billing month) – this deadline is non-negotiable

  • Standard SLAs provide 10-100% credit on affected services, depending on uptime percentage
  • Retry storms and failover costs require separate claims through Azure Support escalation
  • Document everything ASAP – logs and cost data become harder to retrieve over time
  • Architectures deployed across Availability Zones or multiple regions can qualify for higher SLA commitments (for example, 99.99% vs 99.9%)
  • Each Azure service has its own SLA – review individual service agreements for your workloads
  • Microsoft does NOT automatically issue SLA credits – you must submit a claim with proper documentation

Microsoft has a strong track record of transparency and responsiveness in these processes. Be sure to engage your Microsoft account team or Cloud Solution Architect early for direct support.

Resources & Contact

Microsoft Official Resources:

PointFive Contact:

📧 Email: hello@pointfive.co
🌐 Website: pointfive.co
💼 Schedule Consultation: pointfive.co/contact

About PointFive

PointFive pioneered Cloud Efficiency & Performance Management (CEPM) to help organizations proactively optimize cloud resources and identify hidden cost inefficiencies, including outage-related anomalies that traditional tools miss. Our proprietary DeepWaste™ engine surfaces these inefficiencies and the cost-saving opportunities behind them. PointFive integrates into your engineering workflows with minimal setup.

This advisory is provided for informational purposes. While we strive for accuracy, customers should verify all information with Microsoft documentation and their Microsoft account team. SLA terms and credit eligibility are determined solely by Microsoft.

Published: October 29, 2025 | Incident Date: October 29, 2025 | Affected Region: Global
