Building PointFive: The Human Side of Being On-Call
Ilai Fallach
April 22, 2025
At PointFive, we’re all about building systems that scale, processes that support our work, and a culture that empowers our team. But there’s one area that bridges all three: the on-call process. This isn’t just about handling alerts; it’s about how we, as engineers and teammates, respond to challenges, learn from them, support each other, and build a better platform and organization along the way.
In this post, we’ll focus on the human side of on-call, exploring how we prioritize team well-being, foster growth, and create a sustainable approach to incident response.
Why Being On-Call Matters
When things go wrong, the way you respond defines your team, your culture, and ultimately, your product. But being on-call is about more than keeping the system running:
It Builds Resilience: Every alert is a stress test—not just for the platform but for the processes and tools we’ve built to support it.
It Drives Ownership: On-call engineers are the first line of defense, which creates a sense of responsibility for the systems they work on.
It Encourages Learning: Each incident offers a chance to understand what went wrong, how to fix it, and how to prevent it from happening again.
It Strengthens Trust: Knowing that someone is always on watch during on-call hours builds trust with our customers and within our team.
The Toolbox We Use to Manage On-Call
The right tools make a big difference. Here’s what we use:
incident.io: Handles scheduling, paging, and incident coordination. Engineers can declare, manage, and resolve incidents from Slack.
Grafana Cloud: Our central source of truth for alerting and observability. Engineers use it to monitor system health, dig into metrics, and trace issues quickly. At the end of each day, they write up what they observed, along with any ideas for improvement, in what we call the On-Call Captain’s Log.
Sentry: Error monitoring and performance tracking. It helps us identify and diagnose bugs, track exceptions, and measure application performance across our codebase in real time, so engineers can respond quickly to issues affecting users.
These tools empower the team, but the real magic is in how we use them.
How We Support Our Engineers
Being on-call is demanding, and at PointFive, we’ve worked hard to make it manageable and even fulfilling. Here’s how we create an environment where engineers can thrive:
1. Clear Expectations
Defined Responsibilities: While on call, an engineer handles incidents, refines alerts, and responds to customer issues instead of doing regular sprint work. This keeps them focused and avoids context switching.
Structured Handovers: Weekly rotations and a Monday-morning handover meeting ensure continuity and shared knowledge.
2. Reducing Noise
Alert Optimization: We filter out duplicate and low-priority alerts to minimize interruptions, so that every alert that reaches the on-call engineer is actionable.
Runbooks for Clarity: Each common incident is paired with a step-by-step guide, so engineers don’t have to reinvent the wheel under pressure.
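To make the noise-reduction idea concrete, here is a minimal sketch of the filtering step: drop alerts below a severity threshold and collapse repeats of the same problem. This is a hypothetical illustration, not our production setup. The `Alert` type, its `fingerprint` and `severity` fields, the three severity levels, and the `alerts_to_page` helper are all assumptions made for the example, and in practice this kind of grouping and routing usually lives in the alerting and paging tools’ own configuration rather than in hand-written code.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity levels; higher values are more urgent.
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape. The fingerprint identifies "the same problem"
    # (for example, rule name plus affected service) so repeats can be collapsed.
    fingerprint: str
    severity: Severity
    summary: str


def alerts_to_page(alerts: list[Alert], min_severity: Severity = Severity.WARNING) -> list[Alert]:
    """Drop low-priority alerts and keep only the first alert per fingerprint."""
    seen: set[str] = set()
    result: list[Alert] = []
    for alert in alerts:
        if alert.severity < min_severity:
            continue  # low priority: leave it on a dashboard instead of paging
        if alert.fingerprint in seen:
            continue  # duplicate of an alert already surfaced
        seen.add(alert.fingerprint)
        result.append(alert)
    return result


if __name__ == "__main__":
    incoming = [
        Alert("api-latency:checkout", Severity.CRITICAL, "p99 latency above threshold"),
        Alert("api-latency:checkout", Severity.CRITICAL, "p99 latency above threshold"),
        Alert("disk-usage:batch-worker", Severity.INFO, "disk 61% full"),
    ]
    for alert in alerts_to_page(incoming):
        print(alert.fingerprint, alert.severity.name)  # api-latency:checkout CRITICAL
```

Keeping the threshold as a parameter reflects the idea that refining alerts is ongoing work: an on-call engineer can tighten or loosen what pages without touching the deduplication logic.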
3. Fostering Autonomy
Engineers are empowered to make decisions, whether it’s fixing an issue on the spot or escalating to a broader group.
They’re encouraged to improve alerts, dashboards, and runbooks during their shift, leaving the system better than they found it.
4. Prioritizing Well-Being
Working-Hours Boundaries: Shifts run only during working hours to respect personal time and reduce burnout.
Support from Peers: Engineers know when to ask for help, whether from their own team or by escalating to pod leaders.
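The weekly rotation with a Monday handover and working-hours-only shifts can be sketched as a small routing function. This is a hypothetical illustration, not how incident.io models schedules: the engineer names, the rotation start date, the 09:00 to 18:00 Monday-to-Friday window, and the `on_call_engineer`, `should_page`, and `route_alert` helpers are assumptions, and time zones are ignored for brevity.

```python
from datetime import date, datetime, time, timedelta
from typing import Optional

# Hypothetical values for illustration only.
ENGINEERS = ["alice", "bob", "carol", "dan"]
ROTATION_START = date(2025, 1, 6)  # a Monday; shifts change on Mondays, matching the handover
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def on_call_engineer(day: date) -> str:
    """Return who holds the weekly on-call shift that covers `day`."""
    # Snap back to the Monday of the week containing `day`.
    monday = day - timedelta(days=day.weekday())
    weeks_elapsed = (monday - ROTATION_START).days // 7
    # Python's modulo is non-negative here, so dates before the start also work.
    return ENGINEERS[weeks_elapsed % len(ENGINEERS)]


def should_page(now: datetime) -> bool:
    """Page only on weekdays (Monday=0 ... Friday=4) during working hours."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def route_alert(now: datetime) -> Optional[str]:
    """Return the engineer to page, or None if the alert should wait for working hours."""
    return on_call_engineer(now.date()) if should_page(now) else None


if __name__ == "__main__":
    print(route_alert(datetime(2025, 1, 15, 10, 30)))  # bob (Wednesday of the second week)
    print(route_alert(datetime(2025, 1, 15, 22, 0)))   # None (outside working hours)
```

Deriving the owner from the date, rather than storing a list of shifts, keeps the rotation predictable: anyone can work out who is on call for a given week.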
The Growth Opportunities of Being On-Call
On-call isn’t just a responsibility; it’s a growth accelerator.
Deep System Knowledge: There’s no faster way to understand the intricacies of a system than being on the front lines during an incident.
Decision-Making Under Pressure: On-call engineers learn to assess, prioritize, and act quickly—skills that are invaluable in any role.
Continuous Improvement: Post-incident retrospectives encourage engineers to reflect, learn, and propose changes to prevent similar problems in the future.
A Day in the Life of an On-Call Engineer
To give you a sense of what this looks like, here’s a typical day:
Daytime: Handle alerts, update runbooks, refine Grafana dashboards, and support customer teams.
End of Day: Update the On-Call Captain’s Log, a living document where each on-call engineer captures what they saw, thought, did, and learned. This helps us identify friction, improve tooling, and evolve the system.
The Culture of On-Call at PointFive
At the heart of our on-call process is our culture of trust and collaboration. We know that every engineer has their strengths, and we trust them to make decisions, solve problems, and improve the system. And when things get tough, we rely on each other to step in and help.
We also celebrate the wins—whether it’s resolving a major incident, implementing a smarter alerting system, or simply making the next on-call shift easier for the team.
Closing Thoughts
Being on-call is an opportunity to make an immediate impact, learn deeply, and build resilience—not just for the system but for yourself and your team. At PointFive, we see it as a cornerstone of our engineering culture and an integral part of building a scalable, reliable platform.
What does your on-call process look like? How do you balance the challenges and opportunities? We’d love to hear your thoughts!
What’s next?
In our next post, we’ll discuss how to successfully onboard new team members in a complex environment.