FinOps Advisory - Azure Service Event

Yael Cinamon

October 30, 2025

October 29, 2025 Global Incident

Report Overview

On October 29, 2025, Microsoft Azure experienced a service disruption caused by an Azure Front Door configuration change. Access to the Azure portal degraded, and users reported widespread outages affecting several Azure services.

Beyond downtime, this incident may have financial implications for Azure customers worldwide.

This advisory will help you:

Understand the cost implications of this event
Identify any anomalies in your usage data
Understand Azure SLAs and navigate the service credits process

What Happened

On October 29, 2025, beginning at approximately 12:00 PM ET (16:00 UTC), Microsoft Azure experienced global service degradation following an Azure Front Door (AFD) configuration change.

Microsoft provided regular updates through its Azure Status Page and official communication channels.

Affected Azure services include, but are not limited to: App Service, Azure Databricks, Azure SQL Database, Container Registry, Microsoft Defender External Attack Surface Management, Microsoft Entra ID, Microsoft Purview, Microsoft Sentinel and more. This resulted in downstream effects to Microsoft 365, Virtual Desktop, Xbox Live, Minecraft, and numerous third-party platforms.

Organizations around the world reported temporary disruptions. Microsoft initiated mitigation by deploying a "last known good" configuration and began rerouting traffic through healthy infrastructure. The incident was fully mitigated by approximately 00:05 UTC on October 30, 2025 (8:05 PM ET on October 29).
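Because Azure evidence is usually timestamped in UTC while this advisory quotes Eastern Time, it helps to convert the window once and reuse it. The sketch below assumes Eastern Time was EDT (UTC-4) on October 29, 2025 (US daylight saving time ended on November 2, 2025) and uses the 16:00 UTC start and 00:05 UTC mitigation time stated above; the constant and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Time on Oct 29, 2025 is EDT (UTC-4); DST ended Nov 2, 2025.
EDT = timezone(timedelta(hours=-4), "EDT")

# Outage window as stated in this advisory (UTC).
OUTAGE_START_UTC = datetime(2025, 10, 29, 16, 0, tzinfo=timezone.utc)
OUTAGE_END_UTC = datetime(2025, 10, 30, 0, 5, tzinfo=timezone.utc)


def describe_window(start: datetime, end: datetime) -> str:
    """Render a UTC window in Eastern Time together with its duration in hours."""
    start_et = start.astimezone(EDT)
    end_et = end.astimezone(EDT)
    hours = (end - start).total_seconds() / 3600
    return f"{start_et:%Y-%m-%d %H:%M %Z} to {end_et:%Y-%m-%d %H:%M %Z} ({hours:.2f} h)"


print(describe_window(OUTAGE_START_UTC, OUTAGE_END_UTC))
# 2025-10-29 12:00 EDT to 2025-10-29 20:05 EDT (8.08 h)
```

Adjust the two constants if your own Service Health events show a different start or end for your subscriptions.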

What this means: if you run infrastructure on Azure, some workloads may have experienced reduced performance or complete unavailability during the event window, potentially resulting in temporary cost anomalies. These may look like billing errors, but they are a normal side effect of large-scale service interruptions.

How This Affects Your Cloud Costs

Review your cloud analytics for the following cost patterns, which may have resulted from the event.

Charges for Limited-Use Services

Resources that kept running but couldn't serve traffic:

Virtual Machines with no connectivity
Idle databases and storage accounts
Unused Application Gateways and Front Door instances
App Services unable to handle requests

You might be billed for services that delivered minimal business value during the outage window.

Retry Storm Cost Spikes

Applications that automatically retried failed requests may have generated artificial cost spikes:
Azure Functions timing out and re-executing
Multiplied API Management request volumes
Data transfer overages due to retry logic
Application Insights error logging surges
Event Hub and Service Bus message retries

These patterns represent system-resilience mechanisms, not intentional use, and can be analyzed to isolate nonproductive costs.
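To see why retries inflate bills, consider a minimal model, assuming each attempt fails independently with the same probability and every attempt is metered. The function names and the per-attempt price used below are hypothetical, not Azure pricing.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per logical request under a simple retry policy.

    Assumes each attempt fails independently with probability `failure_rate`
    and the client retries up to `max_retries` times after the first attempt.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # Attempt k+1 happens only if the first k attempts all failed: probability p**k.
    return sum(failure_rate ** k for k in range(max_retries + 1))


def retry_amplified_cost(logical_requests: int, cost_per_attempt: float,
                         failure_rate: float, max_retries: int) -> float:
    """Billed cost when every attempt, successful or not, is metered."""
    return logical_requests * cost_per_attempt * expected_attempts(failure_rate, max_retries)


# Hypothetical example: 10,000 logical calls at a made-up $0.0002 per attempt,
# with every attempt failing (failure_rate=1.0) and 4 retries allowed.
print(expected_attempts(1.0, 4))  # 5.0
print(retry_amplified_cost(10_000, 0.0002, 1.0, 4))
```

During a hard outage the failure rate approaches 1.0, so each logical request costs up to `1 + max_retries` attempts. Retries stacked across layers multiply: a client that retries 4 times through a gateway that also retries 4 times can drive up to 5 × 5 = 25 backend attempts per logical request.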

Failover & Recovery Costs

If you triggered disaster recovery to other regions or on-premises infrastructure, you may see unexpected charges for:

Cross-region data transfer and replication
Temporary resource scaling in secondary regions
Backup restoration and disaster recovery services
Traffic Manager and Front Door routing changes

What to Look For

Map your usage anomalies to distinguish outage-driven costs from legitimate business activity.

Look for the following:

Outage-driven spikes: Azure Functions invocations and timeouts, API Management (APIM) 4xx/5xx errors, storage transaction surges, and data transfer consistent with retry loops or failures.

Idle resource charges: Normal metering on resources that were effectively unreachable (compute, databases, gateways, Front Door) during the window.

Real usage: Legitimate business activity that occurred outside the outage window or in unaffected services.

Look for short-term deviations: customer environments often report 5-20x baseline levels during an outage period (here, roughly 12:00 PM to 8:05 PM ET on October 29). These patterns can provide valuable context when discussing potential SLA adjustments with Microsoft.
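One way to find these deviations is to compare each in-window hour with a per-service baseline taken from hours outside the window. This sketch assumes you have already parsed an hourly cost export into `(hour_start_utc, service, cost)` tuples; that shape, the median baseline, and the default 5x threshold (the low end of the range quoted above) are illustrative choices. The window covers the hourly buckets from 16:00 UTC on October 29 through the 00:00 UTC bucket on October 30, which contains the 00:05 UTC mitigation time.

```python
from datetime import datetime, timezone
from statistics import median

# Hourly buckets treated as "in the outage window" (assumption: 16:00 UTC Oct 29
# through the 00:00 UTC bucket on Oct 30, which contains the 00:05 UTC mitigation).
WINDOW_START = datetime(2025, 10, 29, 16, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 10, 30, 1, tzinfo=timezone.utc)


def in_window(hour: datetime) -> bool:
    return WINDOW_START <= hour < WINDOW_END


def flag_spikes(rows, threshold: float = 5.0):
    """Flag in-window hours whose cost is >= threshold x the service's baseline.

    rows: iterable of (hour_start_utc, service, cost) tuples (hypothetical shape).
    Baseline: median hourly cost per service outside the window.
    Returns a list of (hour, service, cost, ratio) tuples.
    """
    rows = list(rows)
    baseline = {}
    for hour, service, cost in rows:
        if not in_window(hour):
            baseline.setdefault(service, []).append(cost)

    flagged = []
    for hour, service, cost in rows:
        samples = baseline.get(service)
        if not in_window(hour) or not samples:
            continue  # skip baseline hours and services with no baseline
        base = median(samples)
        if base > 0 and cost / base >= threshold:
            flagged.append((hour, service, cost, round(cost / base, 1)))
    return flagged
```

A median baseline is less sensitive to one-off spikes on the reference days than a mean; if your workloads have strong daily cycles, comparing against the same hours on previous days may give a fairer baseline.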

Next Steps

1. Document Your Impact (This Week)

Export Service Health events per subscription/region from the Azure Portal (primary tenant-specific evidence)
Export hourly costs for October 29-30, 2025, covering the outage window (16:00 UTC on October 29 to 00:05 UTC on October 30; 12:00 PM to 8:05 PM ET)
Capture metrics showing service unavailability (Azure Monitor alerts, Application Insights errors, timeouts)
Calculate costs during the outage vs. normal operations
Document business impact (lost transactions, customer complaints, revenue impact)
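For the outage-vs.-normal cost comparison, a simple approach is to project each service's average baseline hourly cost across the outage window and subtract it from what was actually billed. The sketch below assumes one row per service per hour (zero-cost hours included) in a hypothetical `(hour_start_utc, service, cost)` shape; treat the resulting excess as a starting point for your claim evidence, not as a figure Microsoft will accept automatically.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumed window: hourly buckets from 16:00 UTC Oct 29 up to (not including) 01:00 UTC Oct 30.
WINDOW_START = datetime(2025, 10, 29, 16, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 10, 30, 1, tzinfo=timezone.utc)
WINDOW_HOURS = int((WINDOW_END - WINDOW_START).total_seconds() // 3600)  # 9 buckets


def excess_cost_by_service(rows):
    """Compare billed cost in the window with a projected baseline, per service.

    rows: iterable of (hour_start_utc, service, cost) tuples, assumed to contain
    one row per service per hour, with zero-cost hours included.
    """
    actual = defaultdict(float)
    baseline_sum = defaultdict(float)
    baseline_hours = defaultdict(int)
    for hour, service, cost in rows:
        if WINDOW_START <= hour < WINDOW_END:
            actual[service] += cost
        else:
            baseline_sum[service] += cost
            baseline_hours[service] += 1

    report = {}
    for service, billed in actual.items():
        if baseline_hours[service] == 0:
            continue  # no baseline data for this service
        expected = baseline_sum[service] / baseline_hours[service] * WINDOW_HOURS
        report[service] = {
            "actual": round(billed, 2),
            "expected": round(expected, 2),
            "excess": round(billed - expected, 2),
        }
    return report
```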

2. File Your SLA Credit Claim

Microsoft defines SLA thresholds in its Service Level Agreements for individual Azure services.

CRITICAL DEADLINE: You must submit your claim within two months from the end of the billing month in which the incident occurred. For an October incident, the deadline is approximately December 31, 2025 (two months after October 31).
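The deadline arithmetic can be checked with a few lines of code. This sketch assumes your billing month matches the calendar month and reads the rule as "by the end of the second month after the incident month"; confirm both against your agreement before relying on the date. The function name is illustrative.

```python
import calendar
from datetime import date


def claim_deadline(incident: date, months_after: int = 2) -> date:
    """Last day of the month `months_after` months after the incident's month.

    Assumes the billing month equals the calendar month.
    """
    month_index = incident.month - 1 + months_after
    year = incident.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


print(claim_deadline(date(2025, 10, 29)))  # 2025-12-31
```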

How to File:

Open a support request in the Azure Portal
Navigate to: Help + Support → Create a support request
Issue type: Billing
Subject: "Azure SLA Credit Request – [Service Name] Outage October 29, 2025"
Include:
- Dates and times of impact
- Affected resources (Resource IDs, subscription IDs)
- Outage Incident ID from Service Health Dashboard
- Azure Monitor logs, Application Insights data showing unavailability
- Cost comparison data
- Business impact documentation
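If you are preparing claims for several services, a small checklist object can keep the required evidence consistent. The `SlaClaim` class below is a hypothetical helper, not part of any Azure SDK; it only formats the subject line suggested above and reports which of the listed items are still empty.

```python
from dataclasses import dataclass, field


@dataclass
class SlaClaim:
    """Hypothetical helper for assembling the evidence listed above."""
    service_name: str
    impact_window: str = ""
    resource_ids: list[str] = field(default_factory=list)
    subscription_ids: list[str] = field(default_factory=list)
    incident_id: str = ""  # Service Health incident/tracking ID
    monitoring_evidence: list[str] = field(default_factory=list)  # exported logs, screenshots
    cost_comparison: str = ""  # e.g. path to your outage-vs-baseline export
    business_impact: str = ""

    def subject(self) -> str:
        # Subject line format suggested in this advisory.
        return f"Azure SLA Credit Request – {self.service_name} Outage October 29, 2025"

    def missing_items(self) -> list[str]:
        """Names of checklist items that are still empty."""
        checks = {
            "dates and times of impact": self.impact_window,
            "resource IDs": self.resource_ids,
            "subscription IDs": self.subscription_ids,
            "outage incident ID": self.incident_id,
            "monitoring evidence": self.monitoring_evidence,
            "cost comparison": self.cost_comparison,
            "business impact": self.business_impact,
        }
        return [name for name, value in checks.items() if not value]
```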

Expected Credits:

Azure SLAs vary by service but typically range from 95% to 99.99% uptime guarantees:

99.9% SLA: Allows ~43 minutes downtime/month
99.95% SLA: Allows ~22 minutes downtime/month
99.99% SLA: Allows ~4 minutes downtime/month
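These allowances follow directly from the uptime percentage and the length of the month. The sketch assumes a 30-day month (43,200 minutes), which is how the rounded figures above work out; Azure SLAs generally measure against the minutes in the actual billing month, so a 31-day October allows slightly more. The function name is illustrative.

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Minutes of downtime permitted before the monthly uptime drops below `sla_percent`."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} min/month")
```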

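Based on the reported outage window (roughly eight hours, from about 16:00 UTC on October 29 to 00:05 UTC on October 30), services with 99.9% or higher SLAs may have breached their commitments, depending on how each service's SLA defines downtime and how long your resources were actually affected. Typical service credit tiers: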

Monthly Uptime %	Service Credit
< 99.9%	10%
< 99%	25%
< 95%	100%

Credits appear as deductions on your next billing cycle and apply only to the affected services.
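To estimate which tier might apply, compute monthly uptime and compare it with the thresholds in the table. This sketch hard-codes the typical tiers shown above as an assumption (check your service's own SLA), and the 485-minute example assumes a service whose SLA counted the full 16:00 to 00:05 UTC window as downtime in a 31-day October; your measured downtime may be shorter.

```python
def monthly_uptime_percent(downtime_minutes: float, days_in_month: int) -> float:
    """Uptime as a percentage of all minutes in the month."""
    total = days_in_month * 24 * 60
    return 100 * (total - downtime_minutes) / total


def credit_percent(uptime: float,
                   tiers: tuple = ((95.0, 100), (99.0, 25), (99.9, 10))) -> int:
    """Service credit for a given uptime, using the typical tiers (lowest threshold first)."""
    for threshold, credit in tiers:
        if uptime < threshold:
            return credit
    return 0  # SLA met: no credit


uptime = monthly_uptime_percent(485, 31)
print(f"{uptime:.2f}% uptime -> {credit_percent(uptime)}% credit")  # 98.91% uptime -> 25% credit
```

Because credits are a percentage of the affected service's monthly charges, a 25% credit on a small line item may be worth less than the retry and failover costs discussed next.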

3. Recover Beyond-SLA Costs

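Retry storms and failover costs aren't covered by SLA credits, which Microsoft's SLAs describe as the sole remedy for availability issues. You can request a review of the following charges through separate Azure Support tickets; any adjustment is at Microsoft's discretion: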

Azure Functions retry overages
Data transfer spikes between regions
API Management cost explosions
Traffic Manager and Front Door failover expenses
Application Gateway and Load Balancer retry costs
Storage transaction surges from retry logic

Reference the October 29 incident and present cost comparisons showing these charges directly resulted from Azure infrastructure failure.

4. Verify Credits Applied

Monitor your next 1-2 billing cycles to confirm credits are reflected. Microsoft typically processes claims within 45 days of submission. If credits don't appear, re-open your support case for review.


If you use PointFive: open Anomaly Detection for Oct 29, 12:00 PM to 8:05 PM ET (16:00 to 00:05 UTC), across all Azure regions, validate anomalies using Data Explorer, then export the evidence for Microsoft.

Summary & Key Takeaways

File SLA claims by December 31, 2025 (the end of the second month after the incident month) – this deadline is non-negotiable

Standard SLAs provide 10-100% credit on affected services, depending on uptime percentage
Retry storms and failover costs require separate claims through Azure Support escalation
Document everything ASAP – logs and cost data become harder to retrieve over time
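Deployments spread across Availability Zones can qualify for higher SLA commitments (for example, 99.99% for Virtual Machines with two or more instances across zones vs. 99.9% for a single instance on Premium SSD)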
Each Azure service has its own SLA – review individual service agreements for your workloads
Microsoft does NOT automatically issue SLA credits – you must submit a claim with proper documentation

Microsoft has a strong track record of transparency and responsiveness in these processes. Be sure to engage your Microsoft account team or Cloud Solution Architect early for direct support.

Resources & Contact

Microsoft Official Resources:

Azure SLAs: azure.microsoft.com/en-us/support/legal/sla
Azure Status: status.azure.com
Azure Support: portal.azure.com → Help + Support
Service Health Dashboard: Azure Portal → Service Health

PointFive Contact:

📧 Email: hello@pointfive.co
🌐 Website: pointfive.co
💼 Schedule Consultation: pointfive.co/contact

About PointFive

PointFive pioneered Cloud Efficiency & Performance Management (CEPM) to help organizations proactively optimize cloud resources and identify hidden cost inefficiencies, including outage-related anomalies that traditional tools miss. Our proprietary DeepWaste™ engine surfaces these hidden inefficiencies and the significant cost-saving opportunities behind them. PointFive integrates seamlessly into your engineering workflows with minimal setup, ensuring immediate, measurable impact.

This advisory is provided for informational purposes. While we strive for accuracy, customers should verify all information with Microsoft documentation and their Microsoft account team. SLA terms and credit eligibility are determined solely by Microsoft.

Published: October 29, 2025 | Incident Date: October 29, 2025 | Affected Region: Global
