Diosh Lequiron
Systems Thinking · 14 min read

Why Resilient Systems Beat Efficient Systems Under Real Conditions

Efficiency optimizes against modeled conditions. Resilience survives real ones. Why resilience-first design outperforms in delivery operations, with evidence from enterprise programs and portfolio operations.

In 19 years of directing program delivery across more than ten countries — enterprise IT at OpenText and HPE, outsourced operations at Full Potential Solutions, an Australian agency network in structural distress, a US health and nutrition brand in recovery, and a US startup scaling from eight thousand dollars a month to over five hundred thousand — I have observed one operational pattern with unusual consistency. The systems that were designed to be efficient almost always failed under real conditions. The systems that were designed to be resilient almost always survived them, even when their efficiency numbers looked worse on paper.

This is not a criticism of efficiency. Efficiency is a real property, and in stable conditions it produces real value. The problem is that the conditions under which operational systems actually run — staff turnover, market volatility, supplier disruption, leadership change, scope shift — are not stable. The conditions against which most efficiency optimizations are designed exist in the modeling spreadsheets and almost nowhere else. When the system encounters real operational conditions, the efficiency-maximized design is exactly the one that is least equipped to absorb them.

Resilience looks wasteful in a spreadsheet. It has redundancy, slack, documented escalation paths, cross-training, and margins that an efficiency-focused review will flag as overhead. Under stable conditions, the efficiency-optimized system wins. Under real conditions, it does not. The efficient system produces better numbers right up until the moment it does not produce at all, and then someone has to explain what happened.

This article explains why efficiency-first design produces brittleness, what resilience actually looks like in operational practice, the evidence from delivery operations across multiple industries, and the boundary conditions where efficiency is still the correct objective.


Why Efficiency-First Design Fails

Three structural patterns account for most of the operational failures I have diagnosed where the root cause was efficiency optimization in the wrong direction. They appear in enterprise programs, agency operations, and venture portfolios with very similar mechanics.

The Removed-Margin Failure. Efficiency optimization almost always removes margin. The logic is: this margin is underutilized, therefore it can be removed without affecting output. The logic is correct under the conditions it was modeled against. The margin exists, however, not because the current load requires it, but because variance in load requires it. When the variance arrives — and it always arrives — the margin that was removed is the margin that would have absorbed the spike. The system that had two hours of slack in its delivery cadence operates fine until the week a critical staff member is unavailable, and then the two hours that had been called "underutilized capacity" turn out to have been the entire absorption capacity for unplanned variance.

At one of the multi-million-dollar programs I directed, the delivery team had been through three consecutive efficiency reviews. Each review had removed a different margin — slack in the sprint cadence, redundancy in the integration team, buffer in the release calendar. Each review's removal had been justified against the current load. The fourth quarter of that year, a single vendor delivery slipped by two weeks, and the program missed two major milestones in sequence because there was no longer any margin anywhere in the system to absorb the slip. The efficiency reviews had not made the program faster. They had made it fragile. The program still ran at the same rate. It just no longer had any ability to handle a deviation.

The Optimized-Handoff Failure. Efficiency optimization tends to tighten handoffs. The logic is: there is waiting time between stages, therefore the stages can be coupled more tightly. The tighter coupling does reduce waiting time under normal conditions. It also removes the isolation that would have prevented a failure in one stage from cascading into the next. A loosely coupled pipeline has waiting time and absorbs failures. A tightly coupled pipeline has less waiting time and propagates failures at the speed of the coupling.

I observed this pattern in the Australian agency network before the structural intervention. Individual offices had optimized their project handoffs internally — design straight to development, development straight to delivery, delivery straight to close — with minimal buffer between stages. The handoff time had been compressed. So had the failure-absorption capacity. When a single design deliverable was late, the entire downstream pipeline for that project slipped, and the office's capacity for other projects in parallel collapsed for weeks. The internal efficiency metrics looked good. The portfolio stability was gone. The aggregate result of exactly this pattern playing out at scale was losses of twenty to sixty percent across offices.

The Shared-Resource Contention Failure. Efficiency optimization often increases utilization of shared resources. The logic is: a resource at fifty percent utilization can be pushed to eighty percent without adding capacity, which is more efficient. The logic is correct under average conditions. Under peak conditions — which arrive regularly in any real operation — the highly utilized shared resource becomes a contention point. Every process that needs it queues, the queue lengthens sharply and without bound as utilization approaches one hundred percent, and the average-case numbers conceal a worst case that is orders of magnitude worse than the average.

This is a classic queuing theory result and it plays out constantly in operational systems. A shared delivery team at ninety percent utilization produces reasonable average throughput and catastrophic variance. The same team at seventy percent utilization produces lower peak throughput and dramatically better variance. Which one is more efficient depends entirely on whether efficiency is measured by average or by reliability. Most efficiency reviews measure by average, and most operational failures occur in the tail that the average conceals.
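The queuing effect behind this is standard. A minimal M/M/1 sketch — an idealized model with random arrivals and a single server, not a claim about any specific team — makes the utilization tradeoff concrete:

```python
def mm1_wait(utilization: float, service_time: float = 1.0) -> float:
    """Average time a job waits in queue for an M/M/1 server.

    Wq = rho * service_time / (1 - rho): wait grows without bound
    as utilization rho approaches 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization * service_time / (1 - utilization)

# The same team, measured at two utilization targets:
for rho in (0.70, 0.90):
    print(f"utilization {rho:.0%}: average queue wait = {mm1_wait(rho):.2f}x service time")
```

At seventy percent utilization the average wait is about 2.3 service times; at ninety percent it is 9 — roughly a fourfold worsening for a twenty-point utilization gain, and the tail of the wait distribution degrades faster than the average.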

These three patterns do not respond to more efficiency optimization. Adding another efficiency review cycle removes more margin, tightens more handoffs, and pushes utilization higher. The problem is not insufficient optimization. The problem is that optimization in this direction reduces the system's ability to handle the conditions it actually operates under.

The problem is architectural.


The Resilience Architecture

Resilience is not the absence of efficiency. It is a different objective function. An efficiency-optimized system is designed to maximize output under expected conditions. A resilience-optimized system is designed to maintain acceptable output under the full distribution of conditions the system will actually encounter — including the ones that were not in the model.

There are four design principles that consistently produce resilient operational systems, and three of them look like overhead until the moment they stop looking like overhead.

Preserved Margin as Structural Policy

Margin is the capacity a system holds in reserve against variance. Efficiency optimization targets margin because it appears underutilized. Resilience-optimized systems treat margin as a structural policy: a specified percentage of capacity is reserved against variance, and that percentage is maintained even when short-term pressure suggests it could be removed.

The critical design decision is that the margin policy is enforced structurally, not negotiated per decision. If the policy is that integration teams maintain twenty percent capacity in reserve, the reserve is protected at the staffing and scheduling level — not at the quarterly review level. If margin protection requires a human to say no to a load request every cycle, the margin will eventually be negotiated away. If margin protection is the default output of the scheduling architecture, it persists.
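A minimal sketch of what structural enforcement can look like — the reserve fraction and the scheduling function are hypothetical illustrations, not the actual framework:

```python
RESERVE_FRACTION = 0.20  # illustrative policy: 20% of capacity held against variance

def can_schedule(committed_hours: float, requested_hours: float,
                 total_capacity_hours: float) -> bool:
    """Structural margin check: a load request is rejected by default if it
    would push committed load past the (1 - reserve) ceiling. No human has
    to say no each cycle; the scheduler's default answer enforces the policy."""
    ceiling = total_capacity_hours * (1 - RESERVE_FRACTION)
    return committed_hours + requested_hours <= ceiling

# A 100-hour team with 75 hours committed: a 10-hour request would breach
# the 80-hour ceiling and is rejected; a 5-hour request fits.
print(can_schedule(75, 10, 100))  # False
print(can_schedule(75, 5, 100))   # True
```

The point of the sketch is where the decision lives: the reserve is a constant in the scheduling path, so eroding it requires changing policy, not winning an argument.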

In the governance framework I run across the 18 ventures under HavenWizards 88 Ventures OPC, the operational reserves are policy, not discretion. Each venture runs with explicit reserve at the staffing level and at the operational-cadence level. The reserve is visible, it is named, and it is structurally protected. Individual pressure events can consume it temporarily. The default state it returns to is the reserve-intact state. This is what lets the portfolio absorb the volatility that a portfolio of 18 ventures produces continuously.

Loose Coupling Between Stages

The second principle is that stages in an operational pipeline should be loosely coupled, with explicit buffers, explicit handoffs, and explicit fallback paths. Loose coupling reduces efficiency in the normal case because it adds waiting time and buffer content. It increases resilience in the abnormal case because a failure in one stage does not cascade at the speed of the coupling.

The design question is not whether to couple. It is how loosely to couple, and on which interfaces the loose coupling matters. Internal stages of a single team, operating on a single artifact, may be tightly coupled without cost. Cross-team handoffs, cross-organizational boundaries, and interfaces with external dependencies should be loosely coupled by default. The rule of thumb I have used consistently: the more the interface crosses an authority boundary, the more loose coupling pays off.
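The loose-coupling principle can be sketched as a bounded handoff buffer between stages — sizes and names here are illustrative:

```python
from queue import Queue, Full

# Cross-team handoff modeled as a bounded buffer. A bounded queue gives the
# downstream stage slack, and turns an upstream surge into explicit
# backpressure at the boundary instead of a cascade downstream.
handoff: Queue = Queue(maxsize=3)

def hand_off(item: str) -> bool:
    """Non-blocking handoff: returns False when the buffer is full, so the
    upstream stage sees the pressure and can invoke its fallback path."""
    try:
        handoff.put_nowait(item)
        return True
    except Full:
        return False

accepted = [hand_off(f"deliverable-{i}") for i in range(5)]
print(accepted)  # first three accepted, then explicit backpressure
```

The buffer size is the design dial: zero is the tightly coupled pipeline that propagates every slip, and a larger buffer trades waiting time for absorption capacity at exactly the interfaces that cross authority boundaries.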

This principle is visible in the delivery governance framework I put in place for the Australian agency recovery. Internal office operations retained efficient coupling within the office — designer to developer to QA inside the same team. Cross-office handoffs, which had previously been coupled tightly with no buffer, were restructured with explicit buffer, explicit handoff contracts, and explicit escalation paths. The internal office efficiency did not change much. The cross-office resilience changed dramatically, and the portfolio stability followed.

Documented Escalation Paths

The third principle is that every operational system needs documented escalation paths — explicit routes by which a problem moves from the place it was detected to the place it can be resolved, without depending on individual judgment about who to call. An efficiency-optimized system tends to rely on informal escalation because informal is faster. A resilience-optimized system pays the cost of documented escalation because documented is structural.

The payoff arrives when the informal network breaks. A key person leaves. A team reorganizes. A crisis arrives when the people who know the informal routes are unavailable. The informal system, which had been running fine, suddenly has no way to escalate because the routes were in people's heads and the people are gone. The documented system operates identically during the crisis as during normal conditions, because the routes are structural rather than personal.

I have seen this pattern across every enterprise program I have directed. The programs that had documented escalation paths survived leadership changes, staff rotations, and external disruptions. The programs that relied on informal escalation — even when the informal escalation was operating well — collapsed when the people running the informal network were no longer present. The documented path looks like bureaucracy until it looks like the only mechanism that is still working.
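One way to make an escalation path structural rather than personal is to express it as data the organization owns; the issue types and roles below are hypothetical:

```python
# A documented escalation path as a routing table rather than knowledge in
# someone's head. It answers "who gets this next" identically whether or
# not the people who designed it are still present.
ESCALATION = {
    "delivery_slip":  ["delivery_lead", "program_manager", "portfolio_director"],
    "vendor_failure": ["vendor_manager", "program_manager", "portfolio_director"],
    "security_issue": ["security_officer", "portfolio_director"],
}

def escalate(issue_type: str, level: int) -> str:
    """Return the role that owns the issue at the given escalation level,
    saturating at the top of the documented route."""
    route = ESCALATION[issue_type]
    return route[min(level, len(route) - 1)]

print(escalate("delivery_slip", 0))   # delivery_lead
print(escalate("delivery_slip", 10))  # portfolio_director
```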

Cross-Training as Structural Policy

The fourth principle is that operational systems should structurally require cross-training — at a minimum, every critical role has at least one backup who can execute the role adequately during a disruption. Efficiency-optimized systems resist cross-training because it requires investment in capacity that is not currently being used. Resilience-optimized systems treat cross-training as a structural requirement, enforced at the staffing and scheduling level.

The structural requirement is what makes this stick. If cross-training is a nice-to-have, it will be deferred under pressure — because it looks exactly like the kind of overhead that efficiency optimization targets. If cross-training is a policy that every role must have structural coverage, the coverage persists even during periods when it is being eroded by other pressures.

In scaling the US startup I supported from eight thousand dollars a month to over five hundred thousand, one of the structural interventions was that no critical operational role could be staffed without an explicit cross-training plan for at least one backup. The growth rate would have made any single point of failure catastrophic. Enforcing cross-training as a staffing policy produced a distribution of operational knowledge that absorbed the disruptions an eighteen-month 6,150% growth cycle guaranteed would occur.
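A staffing gate of that kind can be sketched as a simple coverage check — the roster, roles, and names are hypothetical:

```python
# Cross-training as a policy check: every critical role must have at least
# one trained backup, enforced as a gate rather than a nice-to-have.
roster = {
    "release_engineer": {"primary": "A. Cruz", "backups": ["J. Reyes"]},
    "integration_lead": {"primary": "M. Tan",  "backups": []},
}

def uncovered_roles(roster: dict) -> list:
    """Return critical roles with no cross-trained backup — the single
    points of failure the staffing policy exists to eliminate."""
    return [role for role, staff in roster.items() if not staff["backups"]]

print(uncovered_roles(roster))  # ['integration_lead']
```

Run as part of staffing review, an empty result is the precondition for approving the plan; a non-empty result names exactly which roles still carry single-point-of-failure risk.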


Operational Evidence

Scale. Across 18 ventures operating under HavenWizards 88 Ventures OPC — including Bayanihan Harvest with its 66 modules, CapitalWizards, 143BasketballHaven, SawasdeeTalk, the Autism Parenting System, WhimsyAI, and more — the shared operating model enforces all four resilience principles as structural policy. Margin is named and protected. Handoffs across venture boundaries are explicitly buffered. Escalation paths are documented. Cross-training is a staffing requirement. The efficiency of any individual venture is measurably lower than it could be if the resilience policies were removed. The portfolio-level stability that those policies produce is what allows 18 ventures to operate in parallel without a failure in one cascading into the others.

Recovery. The Australian digital agency network had been optimized for efficiency across multiple offices — tight internal coupling, high utilization of shared resources, minimal margin in delivery schedules — and had been losing between twenty and sixty percent across those offices for more than a year. The intervention that reversed those losses into profits of forty to sixty percent was structurally a resilience intervention. Cross-office handoffs were loosely coupled with explicit buffer. Shared resources were brought down from above ninety percent utilization to seventy to eighty percent. Escalation was documented. The internal efficiency metrics of individual offices declined slightly. The portfolio profitability changed direction within a single intervention cycle.

Prevention. The US health and nutrition brand that recovered from losses of forty percent to profits of sixty percent had a similar structural diagnosis. The fulfillment operation had been optimized for cost efficiency with minimal inventory margin, tight supplier coupling, and no cross-trained backup for key operational roles. Each of those decisions looked efficient on paper. Each of them was producing failure in the real operational conditions — supplier delays, demand spikes, staff turnover — the operation actually encountered. The intervention was not a cost intervention. It was a resilience intervention. Cost efficiency dropped slightly. Operational stability recovered. The financial recovery followed from the operational stability, not the other way around.

Compounding. Over the first eighteen months of operating 18 ventures under shared resilience architecture, the number of ventures that experienced a disruption requiring portfolio-level attention decreased meaningfully, even as the number of ventures and their aggregate operational load increased. The architecture was not preventing disruptions. Disruptions arrive regardless. The architecture was absorbing them at the venture level before they escalated to the portfolio level. This is the compounding effect of resilience design: the portfolio's capacity to handle volatility increases over time even as the portfolio's exposure to volatility increases. An efficiency-optimized portfolio does the opposite — its capacity to handle volatility decreases as its complexity increases, which is why efficiency-optimized portfolios have a natural size ceiling that resilience-optimized portfolios do not.


Where This Does Not Apply

Resilience architecture has costs. It is not the right default for every context, and using it well requires knowing where efficiency is still the correct objective.

Stable, commoditized operations. Some operations run in genuinely stable conditions against predictable demand. High-volume commoditized production with long-established supply chains and minimal variance can be optimized for efficiency with reasonable confidence that the efficiency gains will hold. The resilience premium is real overhead in those conditions, and paying for it produces no proportional return. The question is whether the stability is genuine or assumed. Most operations that appear stable are stable only against the variance they have already encountered, which is not the same as stable against the variance they will encounter.

Short time horizons. Resilience architecture pays off over time. If the operation only needs to run for a short defined period — a time-boxed campaign, a one-off event, a short-duration engagement — efficiency optimization may be the correct objective, because the variance that resilience absorbs may not arrive within the operating horizon. The cost of resilience is real during the period the operation runs. If the period is short enough, the resilience investment may not recover its cost before the operation ends.

Reversible, low-stakes decisions. Resilience architecture is disproportionately valuable when the cost of failure is high and the cost of recovery is high. For operations where failure is cheap and recovery is fast — prototypes, experiments, reversible decisions — efficiency is the correct objective. The cost of a failure is bounded and the recovery is immediate. Adding resilience architecture to an experimental context is overhead that does not correspond to the actual risk profile.

Competitive constraints that make efficiency binding. In some operational contexts, cost efficiency is the binding constraint of the market. Margins are thin enough that carrying resilience overhead is not an option — the operation must run at the efficiency frontier to remain viable at all. In those contexts, the response is not to impose resilience architecture against the market. It is to recognize that the operation is running against a binding efficiency constraint, to accept that the system will be brittle against variance, and to make the brittleness explicit and strategic rather than hidden.


The Principle

Efficiency is measured against modeled conditions. Resilience is measured against real conditions. The two diverge — sometimes modestly, sometimes catastrophically — depending on how far the real conditions deviate from the modeled ones. In any operational context with meaningful variance, the efficiency-optimized system produces better numbers under the modeled conditions and worse outcomes under the real ones.

The discipline is to recognize that variance is not an exception. It is a permanent property of the environment the system operates in. Staff turn over. Markets shift. Suppliers miss deadlines. Leadership changes. Scope moves. A system that is designed only against the modeled base case is a system that is designed not to encounter the conditions it will actually run under. Optimizing that system further, against the same modeled base case, makes it worse — not better — at handling the environment it will actually meet.

The operators I trust most are the ones who can look at a proposed efficiency gain and ask, quietly: what variance is this removing the system's ability to absorb, and what is the cost of the first event that the removed absorption would have handled? That question is not resistance to improvement. It is the only question that separates optimization that improves the system from optimization that compresses it into a shape that cannot survive the conditions it was built for.

Diosh Lequiron
Systems Architect · 19+ years designing operating systems for complexity across technology, education, agriculture, and governance.
