Enhancement

Stress Tested at Scale: 2,000+ Components

Erik Osterman

CEO & Founder of Cloud Posse

April 3, 2026



Battle Tested with 2,000+ Affected Components

We pushed Atmos Pro to its limits by running a single PR that affected over 2,000 components — the kind of monorepo-scale change that stress tests every layer of the system. The goal was to validate that dispatch, GitHub API interaction, and status reconciliation all hold up under real-world load. They didn't, at first. So we fixed them.

What We Found and Fixed

The original dispatch architecture processed all workflow dispatches sequentially in a single step. At 2,000+ stacks, that exceeded serverless function timeouts and left hundreds of stacks permanently stuck. GitHub's secondary rate limits compounded the problem by silently blocking PR comment updates after dozens of rapid edits.

We redesigned the dispatch pipeline around a fan-out architecture. A thin coordinator step now emits one event per stack, and independent workers handle each dispatch with their own timeout budget, retries, and error isolation. Per-repository concurrency limits prevent GitHub API rate limiting, and adaptive comment debounce scales update frequency based on deployment size — faster feedback for small changes, throttled updates for large ones.
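The fan-out pattern described above can be sketched roughly as follows. This is an illustrative model, not Atmos Pro's actual code: the names `dispatch_workflow`, `coordinator`, and the retry/backoff parameters are all assumptions.

```python
import asyncio

# Hypothetical sketch of the fan-out dispatch pipeline: a thin coordinator
# emits one independent task per stack, and each worker dispatches under a
# per-repository concurrency limit with its own retries and error isolation.

MAX_CONCURRENT_PER_REPO = 20  # assumed per-repository concurrency limit
MAX_RETRIES = 3               # assumed retry budget per stack

async def dispatch_workflow(stack: str) -> None:
    """Placeholder for a GitHub workflow_dispatch API call."""
    await asyncio.sleep(0)  # stands in for network I/O

async def dispatch_with_retries(stack: str, sem: asyncio.Semaphore, results: dict) -> None:
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            async with sem:  # bound in-flight GitHub API calls per repo
                await dispatch_workflow(stack)
            results[stack] = "dispatched"
            return
        except Exception:
            await asyncio.sleep(2 ** attempt)  # exponential backoff
    results[stack] = "failed"  # failure is isolated to this one stack

async def coordinator(stacks: list[str]) -> dict:
    """Fan out: one independent task per stack."""
    sem = asyncio.Semaphore(MAX_CONCURRENT_PER_REPO)
    results: dict = {}
    await asyncio.gather(*(dispatch_with_retries(s, sem, results) for s in stacks))
    return results

results = asyncio.run(coordinator([f"stack-{i}" for i in range(50)]))
```

Because each stack gets its own task, a timeout or API error in one dispatch never blocks the rest — the property the redesign was after.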

The reconciliation system now catches stacks in every status, including those that were never dispatched due to a crash or lost event. A periodic sweep resolves stale runs by checking their actual status on GitHub and updating the PR comment accordingly.
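The sweep logic might look something like this sketch. The record shape, status names, and `get_actual_status` helper are hypothetical stand-ins, not the real schema:

```python
# Hypothetical sketch of the periodic reconciliation sweep: any run record
# whose local status is stale is checked against GitHub's actual run status;
# records with no run ID at all were never dispatched (crash or lost event).

STALE_STATUSES = {"pending", "dispatching", "unknown"}

def get_actual_status(run_id: int) -> str:
    """Placeholder for GET /repos/{owner}/{repo}/actions/runs/{run_id}."""
    return "completed"

def reconcile(records: list[dict]) -> int:
    """Resolve stale runs; returns how many records were updated."""
    resolved = 0
    for rec in records:
        if rec["status"] in STALE_STATUSES:
            if rec.get("run_id") is None:
                rec["status"] = "never_dispatched"  # lost event or crash
            else:
                rec["status"] = get_actual_status(rec["run_id"])
            resolved += 1
    return resolved  # caller would then refresh the PR comment

records = [
    {"status": "pending", "run_id": 42},
    {"status": "completed", "run_id": 7},
    {"status": "dispatching", "run_id": None},
]
resolved = reconcile(records)
```

Running this sweep on a schedule means no stack can stay stuck forever: GitHub, not the local database, is treated as the source of truth.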

Performance at Scale

Processing 2,000 stacks now completes in roughly 100 seconds with a concurrency limit of 20 workers. Each worker independently dispatches a single workflow, updates the database, and triggers a debounced comment refresh. If any individual dispatch fails, only that stack is affected — every other stack continues without interruption.
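The reported numbers are internally consistent. Assuming each dispatch averages about one second (an inference from the figures above, not a stated measurement), 2,000 stacks at a concurrency of 20 works out to roughly 100 waves of parallel dispatches:

```python
# Back-of-the-envelope check of the reported throughput.
stacks = 2000
concurrency = 20          # worker concurrency limit from the post
seconds_per_dispatch = 1.0  # assumed average, inferred from the total

waves = stacks / concurrency            # 100 waves of parallel dispatches
total_seconds = waves * seconds_per_dispatch
print(total_seconds)  # 100.0 seconds, matching the reported ~100s figure
```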