Overview
Client: Gong
Industry: Revenue Intelligence / B2B SaaS
Cloud Provider: AWS
Challenge: High-scale AI inference in shared clusters made it difficult to identify underutilized workloads and their root causes, resulting in significant waste in critical production systems.
Solution: PointFive's detection engine and visualization dashboard continuously monitor inference service behavior, surface high-impact inefficiencies, and provide remediation context.
Results at a Glance
- $25K/month savings from a single AI inference workload
- Over 50% cost reduction on targeted inference services
- 16.8x ROI year-to-date
- Additional 3x savings already identified and in progress
- Custom dashboard for continuous monitoring of critical AI inference workloads
Background
Gong is a revenue intelligence platform that analyzes customer interactions at scale. Its AI inference services run large language models for sentiment analysis and concept identification on recorded conversations. The company handles 100M requests per day on production-critical infrastructure.
With AI workloads running inside shared Kubernetes clusters, traditional cost tools could not isolate the cost of an individual service or show where GPU resources were underutilized. Gong needed a solution that could look past cluster-level spend and into the behavior of each inference service.
Objectives
- Detect hidden inefficiencies that standard cost reporting misses in shared AI inference clusters
- Drive continuous optimization with visibility into inference behavior and utilization trends
- Prioritize and remediate the highest-ROI inefficiencies while maintaining product velocity
- Create repeatable workflows that scale across multiple AI services and future multi-cloud expansion
Challenges
Shared cluster complexity: Multiple services and models run in blended clusters where costs and utilization signals are intertwined. Standard per-service attribution was not possible with existing tooling.
Workload criticality: These systems power Gong's core AI features — sentiment analysis, concept identification, and conversation intelligence. Cost reduction could not come at the expense of functionality or reliability.
Missing GPU signals: Standard cloud cost tools do not identify over-provisioned inference services at the model level. GPU utilization data was either unavailable or aggregated beyond usefulness.
Manual investigation limitations: Engineers needed contextualized data with clear remediation paths — not just alerts or spending dashboards.
Solution
PointFive gave Gong detection models designed to:
- Track cluster costs and inference service behavior over time
- Identify deployments with the highest cost relative to utilization
- Provide remediation context including underutilization details, dependencies, and savings opportunities
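As an illustration of ranking deployments by cost relative to utilization, here is a minimal Python sketch. It is not PointFive's detection model: the `Deployment` fields, the `estimated_waste` heuristic (monthly cost multiplied by the unused share of GPU capacity), and all figures are hypothetical assumptions chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical per-deployment record; field names are assumptions."""
    name: str
    monthly_cost_usd: float
    avg_gpu_utilization: float  # fraction between 0.0 and 1.0


def estimated_waste(d: Deployment) -> float:
    """Assumed heuristic: cost attributable to unused GPU capacity."""
    return d.monthly_cost_usd * (1.0 - d.avg_gpu_utilization)


def rank_by_waste(deployments, min_cost_usd=0.0):
    """Return deployments sorted by estimated waste, largest first."""
    candidates = [d for d in deployments if d.monthly_cost_usd >= min_cost_usd]
    return sorted(candidates, key=estimated_waste, reverse=True)


# Made-up example data, not figures from the case study.
deployments = [
    Deployment("inference-a", 10_000.0, 0.25),
    Deployment("inference-b", 6_000.0, 0.80),
    Deployment("inference-c", 2_000.0, 0.10),
]

ranked = rank_by_waste(deployments)
for d in ranked:
    print(f"{d.name}: ~${estimated_waste(d):,.0f}/month potentially unused")
```

A ranking like this puts an expensive, lightly used deployment ahead of a cheap one with similar utilization, which is the prioritization the bullets describe. A real detector would also need per-service cost attribution within the shared cluster and utilization measured over time, neither of which this sketch attempts.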
Key Discovery: One high-cost inference workload comprising approximately 30 models was costing around $40K/month and showed clear underutilization signals. The workload included internal models such as DeBERTa, used for concept identification in conversations. With PointFive's contextualized insights, engineers improved utilization and reduced costs without impacting performance or reliability.
Results
$25K/month in realized savings from a single AI inference workload — representing over 50% cost reduction on that service.
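The "over 50%" figure can be checked against the numbers reported in this case study. The short Python sketch below uses the approximate $40K/month baseline and the $25K/month savings. The annualized value is an assumption that the monthly savings hold steady for twelve months; it is not a reported result.

```python
# Figures taken from the case study (both are approximate).
baseline_monthly_usd = 40_000   # workload cost before optimization
savings_monthly_usd = 25_000    # realized monthly savings

# Share of the original monthly cost that was eliminated.
reduction = savings_monthly_usd / baseline_monthly_usd

# Implied monthly cost of the workload after optimization.
remaining_monthly_usd = baseline_monthly_usd - savings_monthly_usd

# Assumption: savings stay constant for a full year.
annualized_savings_usd = savings_monthly_usd * 12

print(f"Cost reduction: {reduction:.1%}")
print(f"Remaining monthly cost: ${remaining_monthly_usd:,}")
print(f"Annualized savings (assumed steady): ${annualized_savings_usd:,}")
```

With these inputs the reduction works out to 62.5%, consistent with the "over 50%" claim.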
16.8x ROI year-to-date — the investment in PointFive paid for itself many times over within the first engagement.
3x additional savings identified — further optimization opportunities are already in the pipeline, extending the impact across additional inference services.
Continuous monitoring established — a custom dashboard now provides ongoing visibility into inference workload efficiency, enabling Gong's engineering team to catch regressions and new optimization opportunities as they emerge.
Conclusion
Gong's AI inference workloads represent a growing class of cloud spend that traditional tools cannot effectively optimize: shared clusters running dozens of models where waste hides beneath aggregate metrics.
PointFive's detection engine cut through the noise by isolating specific underutilized models, quantifying the savings opportunity, and giving engineers the context they needed to act confidently. The result was immediate, measurable impact with a clear path to continued optimization.
About PointFive
PointFive redefines how enterprises continuously optimize cloud, infrastructure, and AI environments. By combining a real-time cloud and infrastructure data fabric with AI-driven detection and guided remediation, PointFive transforms efficiency from a reporting exercise into an operational discipline. Customers achieve sustained improvements in cost, performance, reliability, and engineering accountability at scale.