The Codingdash Test: 12 Steps to Better Engineering

A 2026 update to Joel Spolsky's classic test for evaluating software team health.

Ritesh Shrivastav

How do you know if an engineering team is healthy? In 2000, Joel Spolsky—co-founder of Stack Overflow and Trello—proposed a deceptively simple answer: 12 yes/no questions you could answer in 3 minutes. Do you use source control? Can you make a build in one step? Do programmers have quiet working conditions? The “Joel Test” became a quick litmus test for engineering culture, widely used in hiring and team assessments for over two decades.

Twenty-six years after Joel Spolsky published his famous test, software development looks nothing like the Y2K era. Source control isn’t optional—it’s oxygen. “Daily builds” sound quaint when elite teams deploy multiple times per hour. The original Joel Test helped a generation of developers spot dysfunction, but asking “do you use source control?” in 2026 is like asking if you have electricity.

This post brings the same spirit—simple yes/no questions you can answer in 3 minutes—to modern cloud-native, platform-engineered, AI-augmented software teams. I’m calling it the Codingdash Test (after this blog), though the name matters less than whether it helps you spot problems. A score of 12 is excellent. 10-11 is solid. Below 10, and you’ve got work to do.

The Codingdash Test

  1. Can you deploy to production in under 15 minutes from a code commit?
  2. Do you practice trunk-based development with branches living less than a day?
  3. Is your infrastructure defined as code and versioned alongside application code?
  4. Do developers have self-service access to environments, deployments, and observability?
  5. Can a new developer ship code to production on their first day?
  6. Do you have automated security scanning in every CI/CD pipeline?
  7. Can you trace a single request end-to-end across your entire system?
  8. Do you have documented SLOs with error budgets that actually influence prioritization?
  9. Do you conduct blameless postmortems for every significant incident?
  10. Are code reviews completed within 24 hours with PRs under 200 lines?
  11. Do you have AI coding tools available to all developers who want them?
  12. Is technical debt tracked and allocated dedicated engineering time?

Why These 12 Questions Matter

1. Can you deploy to production in under 15 minutes from a code commit?

This is the heartbeat of modern software delivery. The 2024 DORA report shows elite teams have lead times under one hour from commit to production—some under 15 minutes. Low performers? They measure in weeks.

This question replaces Joel’s “Can you make a build in one step?” because building is no longer the bottleneck. Deployment is the moment of truth. If your code takes days to reach production, you’re flying blind on whether it actually works. You’re also paying the compound cost of slow feedback loops—bugs that could have been caught in minutes fester for weeks.

The 15-minute threshold isn’t arbitrary. It’s the point where deployment becomes cheap enough to do many times per day, which fundamentally changes how teams write software. Small batches. Fast feedback. Fewer rollbacks.

A “yes” requires: automated CI/CD pipelines, minimal manual approvals, infrastructure that can receive deployments quickly, and tests that run in minutes, not hours.
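As a rough illustration of how the 15-minute bar can be checked, here is a minimal Python sketch that computes commit-to-production lead time from two timestamps. The timestamps are hypothetical; in practice they would come from your CI/CD system's API or deploy log.

```python
from datetime import datetime, timezone

LEAD_TIME_BUDGET_MIN = 15  # the threshold from question 1

def lead_time_minutes(commit_at: datetime, deployed_at: datetime) -> float:
    """Commit-to-production lead time in minutes."""
    return (deployed_at - commit_at).total_seconds() / 60

# Hypothetical timestamps; real values come from your pipeline's events.
commit_at = datetime(2026, 1, 5, 10, 2, tzinfo=timezone.utc)
deployed_at = datetime(2026, 1, 5, 10, 14, tzinfo=timezone.utc)

minutes = lead_time_minutes(commit_at, deployed_at)
print(f"Lead time: {minutes:.1f} min "
      f"({'within' if minutes <= LEAD_TIME_BUDGET_MIN else 'over'} the 15-minute budget)")
```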

2. Do you practice trunk-based development with branches living less than a day?

DORA research confirms that elite performers are 2.3x more likely to use trunk-based development than low performers. Long-lived feature branches are where velocity goes to die.

When branches live for days or weeks, you’re just delaying the pain of integration. The conflicts pile up. The merge becomes a project unto itself. Teams that merge small changes to trunk frequently see roughly 10-hour reductions in PR time compared to teams relying on long-lived branches.

This question captures multiple modern best practices in one: continuous integration (real CI, not “we run tests eventually”), small batch sizes, and the discipline to keep trunk deployable at all times.

A “yes” requires: feature flags to ship incomplete work safely, a culture where developers merge at least daily, and automated testing comprehensive enough to keep trunk green.
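Feature flags are what make daily merges safe when the work behind them is unfinished. The sketch below is a deliberately minimal, hand-rolled flag check (the flag name and the env-var config source are hypothetical); real teams typically use a flag service such as LaunchDarkly, Unleash, or OpenFeature rather than rolling their own.

```python
import json
import os

# Hypothetical flag store: an env var holding JSON. In production this would
# be a flag service, not an environment variable.
def is_enabled(flag: str) -> bool:
    flags = json.loads(os.environ.get("FEATURE_FLAGS", "{}"))
    return bool(flags.get(flag, False))

def legacy_checkout(cart: list) -> str:
    return f"legacy checkout of {len(cart)} items"

def new_checkout(cart: list) -> str:
    return f"new checkout of {len(cart)} items"

def checkout(cart: list) -> str:
    # The incomplete code path merges to trunk daily but stays dark
    # until the flag is flipped for real users.
    if is_enabled("new_checkout_flow"):
        return new_checkout(cart)
    return legacy_checkout(cart)

print(checkout(["book", "pen"]))
```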

3. Is your infrastructure defined as code and versioned alongside application code?

The 2024 DORA report found that organizations at higher levels of technical maturity “maintain infrastructure definitions with similar rigor to application code”—meaning version control, code review, automated testing, and documentation.

Infrastructure as Code isn’t just a convenience. It’s an audit trail, a disaster recovery plan, and a communication mechanism all in one. When infrastructure lives in Terraform, Pulumi, or CloudFormation alongside your app code, new team members can understand your architecture by reading the repository. Environments can be reproduced exactly. Drift becomes detectable.

The “versioned alongside” part matters. Infrastructure buried in a separate wiki or click-ops console doesn’t count. If a developer can’t see what infrastructure their code will touch in the same PR, you’ve created a coordination bottleneck.

A “yes” requires: Infrastructure defined in Terraform, Pulumi, CDK, or similar tools; stored in version control; and subject to code review before changes go live.
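To make “infrastructure in the same repository” concrete, here is a minimal Pulumi sketch in Python (assuming the `pulumi` and `pulumi_aws` packages and configured AWS credentials; the bucket name and tags are illustrative). Because it is ordinary code living next to the application, it goes through the same PR review as the change that needs it.

```python
# __main__.py in a Pulumi project that lives alongside the application code.
import pulumi
import pulumi_aws as aws

# An S3 bucket for build artifacts, reviewed and versioned like any other change.
artifacts = aws.s3.Bucket(
    "build-artifacts",
    tags={"team": "platform", "managed-by": "pulumi"},
)

# Exported so pipelines can look the bucket up instead of hard-coding its name.
pulumi.export("artifacts_bucket", artifacts.id)
```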

4. Do developers have self-service access to environments, deployments, and observability?

Platform engineering has gone from emerging trend to table stakes. Gartner predicts 80% of large software organizations will have platform teams by 2026, up from 45% in 2022. The reason is simple: cognitive load kills velocity.

When developers need to file tickets to get a test environment, wait for DevOps to check their logs, or ask permission to deploy their own code, you’re adding days of latency and interrupting the people who should be building platform capabilities, not handling requests.

Self-service doesn’t mean chaos. It means golden paths with guardrails—pre-approved workflows that let developers move fast while staying safe. Spotify’s internal developer platform made users 2.3x more active and able to deploy twice as often. That’s the target.

A “yes” requires: developers can spin up environments, trigger deployments, and access logs/traces/metrics without tickets or waiting for another team.
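What self-service looks like varies by platform; the sketch below assumes a hypothetical internal platform API (the `platform.internal` host, the `/environments` endpoint, and the payload fields are made up) just to show the shape of a golden path: one authenticated call, guardrails enforced server-side, no ticket.

```python
import os
import requests  # third-party: pip install requests

# Hypothetical internal platform API; endpoint and payload are illustrative only.
PLATFORM_API = "https://platform.internal/api/v1"

def create_preview_environment(service: str, branch: str) -> str:
    """Ask the platform for an ephemeral environment. Guardrails (TTL, size,
    network policy) are enforced by the platform, not by the caller."""
    resp = requests.post(
        f"{PLATFORM_API}/environments",
        json={"service": service, "branch": branch, "ttl_hours": 24},
        headers={"Authorization": f"Bearer {os.environ['PLATFORM_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["url"]

# print(create_preview_environment("checkout", "feat/new-flow"))
```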

5. Can a new developer ship code to production on their first day?

This is the ultimate litmus test for developer experience. The State of Developer Experience 2024 found that 69% of developers lose 8+ hours weekly to inefficiencies—that’s one full day. New developers feel this pain most acutely.

Elite teams target first-day commits, with first production PR by day 2 or 3. This isn’t hazing—it’s a forcing function for everything else: documentation quality, environment setup, onboarding processes, and codebase navigability.

If your new developers are still setting up their laptops after a week, your entire developer experience is probably in rough shape. The first-day-to-production metric exposes broken onboarding, missing documentation, and environment complexity all at once.

A “yes” requires: pre-configured development environments (containers, cloud dev environments, or one-command setup), clear “first task” onboarding paths, and documentation good enough to navigate without tribal knowledge.
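One-command setup is easiest to keep honest with a small “doctor” script run on day one. This is a hypothetical sketch (the tool list is an assumption; adjust it to your own golden path) that simply verifies the prerequisites exist before the new developer touches anything else.

```python
import shutil
import sys

# Hypothetical prerequisite list; replace with whatever your setup actually needs.
REQUIRED_TOOLS = ["git", "docker", "kubectl", "terraform"]

def doctor() -> int:
    """Print a status line per tool and return a non-zero exit code if any are missing."""
    missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    for tool in REQUIRED_TOOLS:
        print(f"{tool:10s} {'missing' if tool in missing else 'ok'}")
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(doctor())
```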

6. Do you have automated security scanning in every CI/CD pipeline?

The days of security as a phase are over. The average data breach now costs $4.88 million. Fixing a vulnerability in design costs roughly $80 versus $7,600 post-deployment. Shift-left security isn’t optional anymore—it’s basic hygiene.

This question captures several modern requirements: SAST (static analysis), dependency scanning (SCA), secret detection, and container scanning. The EU Cyber Resilience Act now legally requires SBOMs for products sold in Europe. Supply chain attacks increased 156% year-over-year through 2024, with over 700,000 malicious packages identified.

The “every pipeline” part matters. Security scanning that only runs occasionally, or requires manual invocation, finds vulnerabilities too late. When it’s automated and blocking, developers fix issues before they merge.

A “yes” requires: automated SAST, dependency scanning, and secret detection running on every build, with blocking thresholds for critical vulnerabilities.
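Dedicated scanners (SAST, SCA, secret detection) do the real work here; the sketch below is only a toy regex-based secret check to show the blocking shape of the gate: exit non-zero and the pipeline stops. The patterns are illustrative, not a substitute for a real tool such as gitleaks or trufflehog.

```python
import re
import sys
from pathlib import Path

# Illustrative patterns only; real secret scanners ship far richer rule sets.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                         # AWS access key id shape
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),   # private key header
    re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def scan(root: str = ".") -> int:
    """Return the number of suspicious matches found under root."""
    findings = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for pattern in SECRET_PATTERNS:
            if pattern.search(text):
                findings += 1
                print(f"possible secret in {path}")
    return findings

if __name__ == "__main__":
    sys.exit(1 if scan() else 0)  # non-zero exit blocks the pipeline
```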

7. Can you trace a single request end-to-end across your entire system?

Modern observability is the difference between “something is slow” and “this specific database query in this specific service is slow because of this specific data pattern.” Distributed tracing is no longer a nice-to-have—it’s how you debug microservices.

OpenTelemetry has emerged as the industry standard, now the second largest CNCF project behind Kubernetes. Organizations investing in both Prometheus and OpenTelemetry are betting on vendor-neutral telemetry as the foundation.

Traditional monitoring tells you what is broken. Observability tells you why. When your architecture spans dozens of services and you can’t trace requests across them, you’re debugging blindfolded.

A “yes” requires: distributed tracing implemented across all critical services, with the ability to follow a single user request through your entire system.
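Here is a minimal tracing sketch using the OpenTelemetry Python SDK (the `opentelemetry-api` and `opentelemetry-sdk` packages). It prints spans to the console for illustration; a real setup would export via OTLP to a collector and propagate trace context across service boundaries through request headers.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Console exporter for illustration; production setups export OTLP to a collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

def handle_checkout(order_id: str) -> None:
    # Parent span for the request; child spans share its trace id,
    # which is what lets you follow one request end-to-end.
    with tracer.start_as_current_span("handle_checkout") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge_card"):
            pass  # call the payment service here
        with tracer.start_as_current_span("reserve_inventory"):
            pass  # call the inventory service here

handle_checkout("order-42")
```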

8. Do you have documented SLOs with error budgets that actually influence prioritization?

“100% reliability is the wrong target,” Google’s SRE book famously states. “Even if you could achieve it, your customers won’t experience 100% reliability. The chain between you and them is long and complex.”

SLOs (Service Level Objectives) with error budgets transform reliability from a vague goal into a concrete decision-making framework. A 99.9% SLO gives you roughly 43 minutes of downtime budget per month. When you’ve burned through it, feature work stops and reliability work takes priority. When budget remains, you can take risks.

The “actually influence prioritization” clause is critical. Many teams have SLOs written down somewhere that no one looks at. Real SLOs create tension between speed and stability that gets resolved through explicit tradeoffs, not heroics.

A “yes” requires: documented SLOs for key user journeys, tracked error budgets, and a written policy for what happens when budgets are exhausted.
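The arithmetic behind error budgets is simple enough to sketch. Assuming a 30-day window, a 99.9% availability SLO leaves roughly 43 minutes of downtime per month, and the remaining budget is just that allowance minus observed bad minutes.

```python
WINDOW_DAYS = 30
WINDOW_MINUTES = WINDOW_DAYS * 24 * 60  # 43,200 minutes in a 30-day window

def error_budget_minutes(slo: float) -> float:
    """Allowed 'bad' minutes for the window, e.g. slo=0.999 -> ~43.2 minutes."""
    return WINDOW_MINUTES * (1 - slo)

def budget_remaining(slo: float, downtime_minutes: float) -> float:
    return error_budget_minutes(slo) - downtime_minutes

slo = 0.999
print(f"Budget: {error_budget_minutes(slo):.1f} min/month")
print(f"Remaining after a 20-minute incident: {budget_remaining(slo, 20):.1f} min")
# When remaining goes negative, the written policy kicks in: reliability work
# takes priority over feature work until the budget recovers.
```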

9. Do you conduct blameless postmortems for every significant incident?

Google’s SRE book states it plainly: “For a postmortem to be truly blameless, it must focus on identifying contributing causes without indicting any individual for bad behavior.”

Blameless postmortems aren’t about being soft. They’re about being effective. When people fear blame, they hide mistakes. Hidden mistakes become systemic failures. Airlines and hospitals learned this decades ago; software is catching up.

The “every significant incident” part matters. Teams that only do postmortems for spectacular failures miss the patterns that predict the next spectacular failure. Triggers should include: user-visible downtime, data loss, manual intervention required, or monitoring failing to detect issues.

A “yes” requires: a documented postmortem process, action items tracked to completion, and postmortems shared broadly (not buried in wikis).

10. Are code reviews completed within 24 hours with PRs under 200 lines?

Code review effectiveness “nosedives” once PRs cross 400 lines, and optimal PRs are around 50-200 lines. Meanwhile, 50% of PRs sit idle for half their lifetime waiting for review. Both patterns kill velocity.

This question combines two practices that research shows matter: keeping PRs small, and reviewing them quickly. Meta’s research found that 75th-percentile review times correlate tightly with developer happiness. Elite teams target sub-6-hour turnaround on reviews.

The 24-hour threshold is a ceiling, not a goal; alert when pickup time exceeds a day. The 200-line limit forces large changes to be decomposed into reviewable chunks, which also improves the quality of the review itself.

A “yes” requires: median PRs under 200 lines of meaningful code changes, and median review completion within 24 hours.
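Both halves of this question are measurable from data your Git host already has. The sketch below computes median PR size and median review-completion time from a hypothetical list of PR records (the field names are illustrative, not any particular API).

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; in practice pulled from your Git host's API.
prs = [
    {"lines": 120, "opened": datetime(2026, 1, 5, 9, 0), "approved": datetime(2026, 1, 5, 15, 0)},
    {"lines": 340, "opened": datetime(2026, 1, 6, 10, 0), "approved": datetime(2026, 1, 8, 10, 0)},
    {"lines": 80,  "opened": datetime(2026, 1, 7, 11, 0), "approved": datetime(2026, 1, 7, 13, 30)},
]

median_size = median(pr["lines"] for pr in prs)
median_review_hours = median(
    (pr["approved"] - pr["opened"]).total_seconds() / 3600 for pr in prs
)

print(f"Median PR size: {median_size:.0f} lines (target: under 200)")
print(f"Median review completion: {median_review_hours:.1f} h (target: under 24)")
```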

11. Do you have AI coding tools available to all developers who want them?

97% of developers have used AI coding tools at some point, and 76% use them at least weekly. GitHub reports 55% faster task completion with Copilot. Ignoring AI tools in 2026 is like refusing to use an IDE in 2000.

The caveat is “who want them”—this isn’t about mandating AI usage, but making it available. Some tasks benefit enormously (boilerplate, test generation, documentation). Others require human judgment (architecture, security-critical code). Developers should choose.

AI also creates new risks. Studies show 30-45% of AI-generated code contains vulnerabilities without human oversight. AI-assisted code needs review tailored to AI-specific failure modes—hallucinated dependencies, security antipatterns, subtle logic errors.

A “yes” requires: AI coding assistants available to all developers, with clear guidelines on appropriate use and review requirements for AI-generated code.

12. Is technical debt tracked and allocated dedicated engineering time?

Stripe research found engineers spend roughly one-third of their time managing technical debt instead of building features. McKinsey reports tech debt accounts for 20-40% of technology estate value. This is not a problem you can ignore.

High-performing organizations dedicate 20-25% of engineering capacity to debt reduction—Marty Cagan calls it “taking 20% off the top for engineering to spend as they see fit.” The key is tracking debt explicitly (not just grousing about it) and allocating real time (not “someday”).

Debt isn’t inherently bad—it’s a choice with tradeoffs. But untracked debt compounds invisibly until it becomes a crisis. Teams that make debt visible and systematically pay it down avoid the sudden rewrites that derail roadmaps.

A “yes” requires: technical debt tracked in the same system as feature work, with a defined percentage of capacity reserved for addressing it.
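Tracking debt in the same system as feature work also makes the capacity split trivially measurable. The sketch below computes the share of completed work labeled as debt from a hypothetical list of tickets (the labels, point values, and 20% target are illustrative).

```python
# Hypothetical completed tickets for one quarter, exported from the same tracker
# that holds feature work; labels and point values are illustrative.
tickets = [
    {"points": 5, "labels": ["feature"]},
    {"points": 3, "labels": ["tech-debt"]},
    {"points": 8, "labels": ["feature"]},
    {"points": 2, "labels": ["tech-debt", "refactor"]},
]

TARGET_DEBT_SHARE = 0.20  # the 20-25% capacity discussed above

total = sum(t["points"] for t in tickets)
debt = sum(t["points"] for t in tickets if "tech-debt" in t["labels"])
share = debt / total

print(f"Debt share this quarter: {share:.0%} (target: at least {TARGET_DEBT_SHARE:.0%})")
```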


What the Original Joel Test Got Right

Joel’s original test worked because it was simple, opinionated, and immediately actionable. You didn’t need a consultant to interpret your score. You didn’t need to gather metrics for six months. You could answer 12 yes/no questions in three minutes and know whether you had serious problems.

The specific questions reflected the common failure modes of 2000: teams without source control, builds that required mysterious incantations, developers without good chairs. Some questions were prescient; “do programmers have quiet working conditions?” has only become more relevant as open offices proved disastrous.

Other questions haven’t aged well. “Do you have testers?” assumes dedicated QA roles that modern teams have largely absorbed into developer responsibilities. “Do you have a spec?” clashes with iterative development. “Do you use source control?” is no longer distinguishing—it’s universal.

Several people have attempted updates over the years, including The Jonathan Test focused on DevOps practices and various modernizations addressing CI/CD and cloud workflows.


What’s Different About Modern Teams

The 2026 engineering landscape differs from 2000 in several fundamental ways:

Cloud-native architecture means infrastructure is code, deployment is continuous, and systems are distributed. The feedback loop from commit to production should be measured in minutes, not weeks.

Platform engineering has emerged to manage complexity. With microservices, Kubernetes, observability stacks, and security requirements, individual developers can’t be experts in everything. Platform teams provide self-service abstractions—golden paths that embed best practices.

AI has entered development as a genuine productivity multiplier—when used well. The 55% task completion improvements are real, but so are the security vulnerabilities and quality issues. Teams need policies, not just access.

Reliability has become measurable through SLOs, error budgets, and DORA metrics. We can now quantify engineering health in ways Joel couldn’t in 2000.

Security is non-negotiable. Supply chain attacks, data breach costs, and regulatory requirements (like the EU Cyber Resilience Act) have moved security from “nice to have” to legal obligation.


The Benchmarks Behind the Questions

Area | Elite Performance | Source
Deployment frequency | Multiple times per day | DORA 2024
Lead time for changes | Under 1 hour | DORA 2024
Change failure rate | Under 15% | DORA 2024
Recovery time | Under 1 hour | DORA 2024
Branch lifetime | Under 1 day | Graphite
PR review turnaround | Under 6 hours | Augment Code
Time to first commit (new developer) | Under 1 day | DevEx benchmarks
Developer time lost to inefficiency | Under 4 hours/week | Atlassian/DX
Engineering capacity for tech debt | 20-25% | Practical Engineering

How to Use This Test

Like Joel’s original, the Codingdash Test is a quick diagnostic, not a comprehensive maturity model. A score of 12 doesn’t guarantee success—you might have a great engineering culture building a product nobody wants. A score of 6 doesn’t mean you’re doomed—plenty of successful companies started with rough practices and improved.

The real value is identifying specific gaps. Each “no” points to a concrete improvement area. Unlike vague advice to “improve engineering culture,” these questions have actionable remedies:

  • Failing on deployments? Start with your pipeline bottlenecks.
  • Failing on developer experience? Time your new developer setup process and fix what you find.
  • Failing on observability? Pick OpenTelemetry and start instrumenting.

The test also works as a hiring signal. Candidates asking about your Codingdash score (or equivalent) are the ones who’ve seen dysfunction and know what to avoid. Companies willing to share their scores are signaling transparency about engineering culture.


Scoring Your Team

Give yourself 1 point for each “yes.” Be honest—partial credit is cheating yourself.

  • 12 points: Excellent. You’re among the elite. Focus on maintaining your edge and mentoring others.
  • 10-11 points: Solid foundation with room to improve. You’re probably competitive for top engineering talent.
  • 7-9 points: Functional but struggling. Your best engineers are likely frustrated. Pick two areas and invest seriously.
  • 4-6 points: Significant dysfunction. Development is probably painful. Consider dedicated improvement initiatives.
  • 0-3 points: Critical condition. Engineering is likely a bottleneck for the business. This needs executive attention.

Final Thought

Joel wrote in 2000 that “most software organizations are running with a score of 2 or 3, and they need serious help, because companies like Microsoft run at 12 full-time.”

Twenty-six years later, the baseline has shifted. What was elite in 2000 is table stakes now. But the fundamental insight remains: good engineering practices compound, and you can spot their presence or absence with simple questions.

The teams that will win in 2026 aren’t the ones with the most sophisticated tools or the biggest budgets. They’re the ones that do the basics exceptionally well—deploying frequently, keeping code healthy, learning from incidents, and making developers productive from day one.

That’s what the Codingdash Test measures. How’s your score?


References

  1. Spolsky, Joel. “The Joel Test: 12 Steps to Better Code.” Joel on Software, August 2000.

  2. Google Cloud. “Announcing the 2024 DORA Report.” Google Cloud Blog, 2024.

  3. Atlassian. “DORA Metrics: How to Measure DevOps Success.” Atlassian DevOps Guide.

  4. Graphite. “Trunk-based Development: Why You Should Stop Using Feature Branches.” Graphite Blog.

  5. Roadie. “Platform Engineering in 2026: Why DIY Is Dead.” Roadie Blog.

  6. Sonatype. “10th Annual State of the Software Supply Chain Report.” GlobeNewswire, October 2024.

  7. OpenTelemetry. “Adopters.” OpenTelemetry Documentation.

  8. Dynatrace. “OpenTelemetry Trends: Catching Up with OpenTelemetry in 2025.” Dynatrace Blog.

  9. Google. “Implementing SLOs.” Site Reliability Engineering Workbook.

  10. Google. “Error Budget Policy.” Site Reliability Engineering Workbook.

  11. Google. “Postmortem Culture: Learning from Failure.” Site Reliability Engineering Book.

  12. Atlassian. “How to Run a Blameless Postmortem.” Atlassian Incident Management Guide.

  13. Graphite. “Code Review Best Practices.” Graphite Blog.

  14. Augment Code. “Code Review Best Practices That Actually Scale.” Augment Code Guides.

  15. GitHub. “Survey: The AI Wave Continues to Grow on Software Development Teams.” GitHub Blog.

  16. Peng et al. “The Impact of AI on Developer Productivity: Evidence from GitHub Copilot.” arXiv, 2023.

  17. Kaspersky. “Security Risks of Vibe Coding and LLM Assistants for Developers.” Kaspersky Blog, 2025.

  18. Leanware. “Technical Debt Management: Strategies & Best Practices.” Leanware Insights.

  19. Practical Engineering Management. “Optimizing Your 20% Time for Technical Debt.” Substack.

  20. Hall, Jonathan. “The Jonathan Test: 12 Steps to Better DevOps.” jhall.io, May 2021.

  21. Bennett, Steve. “An Updated Joel Test.” stevebennett.co.

  22. JSOC IT Blog. “DevSecOps in 2025: Shifting Security Left Without Slowing Down.” Medium.
