Can you actually build implementation capacity (across states) in K-12 education?
My notes on a paper 10 years in the making.
In “Developing Implementation & Scaling Capacity in Education,” Dean L. Fixsen, Caryn S. Ward, and Karen A. Blase document a decade working with 10 states to build implementation capacity - the organizational infrastructure needed to support sustained education reform. They measured progress repeatedly and documented what worked.
If implementation capacity can be systematically built, it could explain why some reforms succeed while others with identical policies fail.
The Active Implementation Research Network (AIRN) worked with states in two phases using “usability testing”: work with a small group, identify problems, refine the approach, then test with the next group. It’s straight from Deming’s Plan-Do-Study-Act cycle, originally developed for industrial quality control.
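A minimal sketch of that cohort-by-cohort loop, in Python purely for illustration; the function names and the “problem log” are entirely hypothetical, not anything from AIRN’s actual process:

```python
# Hypothetical sketch of the usability-testing / Plan-Do-Study-Act loop.
# Nothing here comes from AIRN's tooling; it only makes the cycle concrete.

def run_cohort(cohort, approach):
    """Plan/Do: apply the current approach to a small cohort of states.
    Returns a list of problems observed (placeholder logic)."""
    return [f"{state}: barrier found while applying {approach['name']}"
            for state in cohort]

def refine(approach, problems):
    """Study/Act: fold what was learned into the next version of the approach."""
    revised = dict(approach)
    revised["revisions"] = approach.get("revisions", 0) + 1
    revised["lessons"] = approach.get("lessons", []) + problems
    return revised

approach = {"name": "implementation capacity development"}
for cohort in [["State A", "State B"], ["State C", "State D", "State E"]]:
    problems = run_cohort(cohort, approach)  # work with a small group
    approach = refine(approach, problems)    # refine before the next cohort

print(len(approach["lessons"]), "problems folded back into the approach")
```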
Study 1 (2007-2012): Five states
All started at 0-40% implementation capacity
Two quit after 20-23 months
One stayed under 20% for five years
Two reached 60-70% capacity after five years
Study 2 (2014-2017): Five states with refined approach
All reached 60-75% capacity within 24 months
More consistent progress across different contexts
One ended after 26 months due to leadership change
The intervention involved building implementation teams at state, regional, and district levels. Monthly 2-3 day site visits. Sustained coaching. Twice-yearly capacity assessments.
They measured progress using a State Capacity Assessment (SCA) tracking three areas:
SMT (State Management Team) Investment: Leadership commitment, coordination, resources
System Alignment: Official guidance documents, design team functioning
Commitment to Regional Implementation: Resources and support for regional teams
Scores range from 0% (nothing in place) to 100% (everything fully implemented). Benchmarks at 60% for “acquisition” and 80% for “proficiency” based on recommendations from fidelity assessment research.
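The paper doesn’t reproduce the scoring rubric itself, so here’s a minimal sketch of how a percentage-plus-benchmark assessment like this could be computed; the three-level item scoring and the item values are my assumptions, not the SCA’s actual instrument:

```python
# Hypothetical SCA-style scoring. Assumption: each item is rated
# 0 (not in place), 1 (partially in place), or 2 (fully in place),
# and a score is points earned over points possible.

SUBSCALES = {
    "SMT Investment": [2, 1, 2, 0, 1],                   # leadership, coordination, resources
    "System Alignment": [1, 0, 1, 1],                    # guidance documents, design teams
    "Commitment to Regional Implementation": [0, 1, 0],  # regional resources and support
}

def percent(items, max_per_item=2):
    """Subscale or total score as a percentage of points possible."""
    return 100 * sum(items) / (max_per_item * len(items))

for name, items in SUBSCALES.items():
    print(f"{name}: {percent(items):.0f}%")

all_items = [score for items in SUBSCALES.values() for score in items]
overall = percent(all_items)
status = ("proficiency" if overall >= 80
          else "acquisition" if overall >= 60
          else "below benchmark")
print(f"Overall: {overall:.0f}% ({status})")
```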
The Core Idea: From Ghost Systems to Host Systems
The paper frames education reform around a useful concept: most systems are “ghost systems.”
Ghost system: Official policies exist but aren’t actually implemented. You have policies requiring evidence-based reading instruction, but teachers get one-day workshops and no ongoing support. Standards exist on paper; infrastructure doesn’t.
Host system: Infrastructure exists to actually implement what policies say should happen. Literacy coaches in schools. Sustained training programs. Regular assessment and adjustment. The organizational capacity to do what you say you’re doing.
The researchers argue most education systems are ghost systems. They tried to build host systems.
This reframes the Mississippi question. It’s not just whether Mississippi used better reading practices (phonics vs whole language), but whether Mississippi built a host system that could actually implement those practices with fidelity while other states remained ghost systems with policies on paper.
The complexity science angle:
The paper roots this in complexity theory: “Systems need internal flexibility to match external complexity.”
Meaning: 14,000 school districts across 50 states, each with different histories, politics, demographics, resources. A centralized mandated solution (like Common Core) can’t adapt to all that variation. But a system of linked teams that can respond to local conditions while maintaining coherent structure theoretically can.
If this theory is right, it (partially) explains why standardized reforms fail: not because the practices are wrong, but because complex systems can’t implement standardized solutions without the local infrastructure to adopt them in the first place.
What Failed, What Worked, and Why the Sequence Matters
The paper provides specific failure patterns from Study 1:
State #1 (stayed under 20% for 5 years):
State Transformation Specialists employed outside the education system
No access to internal communications or meetings
No authority to call meetings or approach units independently
Consultants without actual power
States #2 and #3 (quit after 20-23 months):
“Little or no access” to State Management Team (superintendent and cabinet)
Specialists segregated in particular units (Special Education, turnaround division)
Limited collaboration across the broader system
Isolated specialists without executive support
States #4 and #5 (reached 60-70% but took 5 years):
“Loosely configured regional groups” with “no meaningful role in supporting programs”
Only developed functional regional agencies in years 4-5
No middle layer between state and district
The Study 2 adjustments based on these failures:
Made executive commitment a precondition. Required monthly meetings with superintendent and cabinet before starting work.
Required specialists be full-time state employees in the superintendent’s office with access to all internal systems. No external consultants, no segregated units.
Verified functional regional agencies existed and were willing to participate. Turned away two interested states without viable regional infrastructure (possible selection bias!).
Shifted to “just enough, just in time” teaching methods instead of comprehensive front-loaded training.
The results: All five Study 2 states reached 60%+ capacity within 24 months versus the 5-year timeline in Study 1.
What the data show about sequencing:
Looking at how capacity develops in Study 2:
State #9 trajectory:
Month 5: Leadership 46%, System Alignment 20%, Regional 0%
Month 13: Leadership 100%, System Alignment 50%, Regional 25%
Month 22: Leadership 88%, System Alignment 40%, Regional 69%
Notice the sequence: Leadership commitment enables regional development, which eventually enables system-wide alignment.
This makes sense when you think about what each measures:
Leadership investment includes: Does the management team meet regularly? Do they provide resources? Do State Transformation Specialists have executive access? This can happen quickly once committed.
Regional commitment includes: Are regions allocated staff time? Do they have implementation teams? This requires negotiating with separate agencies and reallocating resources.
System alignment includes: Does the state have written guidance documents describing implementation supports? Do official policies require regional agencies to provide implementation support to districts? Policy changes applying statewide, beyond the transformation zone.
If this sequence holds across contexts, and Study 2 suggests it might, you can’t skip steps. You need executive commitment before you can build regional capacity. You need regional capacity before you can align system-wide policies.
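To make the sequencing claim concrete, here’s a small sketch using the State #9 numbers above. The 60% “acquisition” benchmark comes from the paper; the first-crossing logic is just my way of expressing “leadership first, then regional, then alignment”:

```python
# State #9 subscale scores from the paper, keyed by month of assessment.
trajectory = {
    5:  {"Leadership": 46,  "System Alignment": 20, "Regional": 0},
    13: {"Leadership": 100, "System Alignment": 50, "Regional": 25},
    22: {"Leadership": 88,  "System Alignment": 40, "Regional": 69},
}

ACQUISITION = 60  # SCA "acquisition" benchmark

def first_crossing(subscale):
    """First assessment month at which a subscale reached 60%, if any."""
    for month in sorted(trajectory):
        if trajectory[month][subscale] >= ACQUISITION:
            return month
    return None

for subscale in ("Leadership", "System Alignment", "Regional"):
    month = first_crossing(subscale)
    print(subscale, "->", f"month {month}" if month else "not within the reported window")
# Leadership crosses first (month 13), Regional later (month 22),
# and System Alignment hasn't crossed yet in the data shown.
```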
The Transformation Zone Strategy
Rather than attempting statewide change immediately, they start with a “transformation zone” - a vertical slice from classroom to capitol:
State leaders and staff
Three regions
Three districts per region
Three schools per district
All teachers and students in participating schools
Why three? The paper gives us an example: “Three factorial (3!) is 6 (1×2×3) and four factorial is 24. Six problems to solve each day may be manageable while 24 may be overwhelming.”
The logic: you need enough variation (three regions with different challenges) to learn and test adaptations quickly, but not so much variation that you can’t respond to problems effectively. Once you’ve solved problems in the transformation zone, you add the next cohort. This is Ashby’s Law of Requisite Variety from management cybernetics.
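A quick back-of-the-envelope on the rule of three; the unit counts follow the transformation zone described above, and the factorial framing mirrors the paper’s own example rather than any formal model:

```python
import math

# Transformation zone as described: 3 regions, 3 districts per region,
# 3 schools per district, plus the state team on top.
regions = 3
districts = regions * 3      # 9 districts
schools = districts * 3      # 27 schools
print(f"Zone size: {regions} regions, {districts} districts, {schools} schools")

# The paper's factorial intuition: the coordination burden grows explosively
# with the number of parallel units you try to change at once.
for n in range(2, 6):
    print(f"{n} units -> {math.factorial(n)} daily problems/interactions to juggle")
# 3 -> 6 (manageable); 4 -> 24 (overwhelming), per the paper's example.
```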
How change supposedly works - “behaving a path”:
Here’s the mechanism they’re testing: “There really is no a priori way to ‘analyze a path’ to system change. However, there is a way to ‘behave a path’ to transformative system change.”
You can’t plan complex system change in advance because you don’t know which parts will resist or how they’ll interact. But you can disturb the system by trying to change it, observe what happens, and respond.
The paper describes this: “As implementation teams are developed and include members of various departments who have not traditionally worked together, previous ways of work are threatened... The results of those disturbances reveal apparent and previously unknown connections and lack of connections among system components. Previously unknown proponents and detractors suddenly appear... As soon as the reactions are known, actual facilitators can be strengthened and relevant impediments can be resolved.”
They call this “practice-policy communication cycle” - implementation teams at regional and district levels attempt to implement, encounter barriers (lack of resources, conflicting policies, unclear authority), surface those problems to state leadership, who then modify policies and resources.
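Here’s a minimal sketch of that cycle as a loop; the barriers and the policy responses are invented for illustration, since the paper doesn’t document specific instances:

```python
# Hypothetical practice-policy communication cycle: teams attempt implementation,
# hit barriers, surface them, and state leadership adjusts policy or resources.
from collections import deque

barriers = deque([
    {"level": "district", "issue": "no protected time for coaching"},
    {"level": "regional", "issue": "conflicting policies over who can convene schools"},
])

state_policy = {"coaching_time_protected": False, "convening_authority_clarified": False}

def attempt_implementation():
    """Do the work; return the next barrier encountered, if any."""
    return barriers.popleft() if barriers else None

def modify_policy(barrier):
    """Leadership responds to a surfaced barrier by changing policy or resources."""
    if "coaching" in barrier["issue"]:
        state_policy["coaching_time_protected"] = True
    else:
        state_policy["convening_authority_clarified"] = True

while (barrier := attempt_implementation()) is not None:
    print(f"Surfaced by {barrier['level']} team: {barrier['issue']}")
    modify_policy(barrier)

print("Policy after the cycle:", state_policy)
```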
If this mechanism works as described, it would explain why Mississippi succeeded. They discovered and resolved implementation barriers over a decade rather than mandating a solution upfront and wondering why it didn’t work.
Whether it actually works that way, we can’t tell from this paper. They describe the theory but don’t document specific examples - like a barrier that surfaced through regional teams, the state policy that changed in response, and the implementation that improved as a result.
What the Evidence Actually Shows
The paper explicitly states it’s testing three predictions. Let’s assess each:
Prediction 1: “Purposeful development of implementation capacity is possible in complex state education systems”
What they showed: All Study 2 states improved from 20-40% baseline to 60-75% within 24 months with intensive support. The State Capacity Assessment administered repeatedly showed consistent improvement.
Meaning: Organizational capacity metrics improved with intensive support across different state contexts.
What we don’t know: Whether this organizational capacity enabled better implementation of educational practices or improved student learning. The paper doesn’t measure student outcomes or list what was being implemented.
That’s a limitation but not necessarily disqualifying. If you’re testing whether organizational capacity can be built at all, demonstrating it can be built is the first step. Connecting it to outcomes is the next step.
Prediction 2: “Purposeful capacity development can be replicated across departments of education in states that are unique in terms of history, size, and operations”
What they showed: Study 2 states ranged from 8 to 56 regions and 175 to 590 districts across Western, Midwestern, and Northeastern US. All showed similar improvement trajectories.
Meaning: Success appeared more replicable in Study 2 than Study 1, suggesting the refined approach worked across different contexts.
What we don’t know: Whether this replicates without intensive AIRN support. All successful states had monthly consulting visits over years. Can capacity develop without that level of external support? The paper doesn’t track what happens after AIRN leaves.
Prediction 3: “Repeated assessments of state capacity development can be conducted in education”
What they showed: The SCA was administered 2-6 times per state over multiple years with consistent data collection.
Meaning: They clearly demonstrated repeated assessment is feasible in education systems.
Caveat: The assessment was developed and administered by the intervention team, scored by participant consensus, with no independent validation. That creates potential for bias. But the 5-year data from Study 1 suggests repeated testing probably didn’t artificially inflate scores (some states showed no progress despite repeated assessment).
The Cost Question
The paper describes implementation capacity development as “nearly cost-neutral” because states “repurposed” existing staff.
What was repurposed:
State education staff became State Transformation Specialists
Regional staff formed Regional Implementation Teams
Staff time reallocated from previous duties to implementation roles
What this doesn’t include:
AIRN consultant fees for monthly site visits over years
Federal contract funding
Opportunity costs of staff time reallocation
Someone funded the consultants. I can’t figure out from this paper what building implementation capacity actually costs. That matters for scaling.
The context they provide:
Failed reforms with poor implementation:
Comprehensive School Reform: $2+ billion, 8,000 schools, no impact
School Improvement Grants: $3+ billion, no significant outcomes
Common Core: $15.8 billion estimated state costs, no measurable improvement
If implementation infrastructure prevents wasting billions on failed reforms, even substantial investment could be rational. Mississippi invested heavily in literacy coaches. Alabama’s Numeracy Act invested $114 million annually in math coaches. DoD schools pay teachers $88,000 versus $31,900 in some states.
Improvement isn’t free. But transparency about costs helps assess feasibility.
The Mississippi Connection
The framework AIRN developed sounds a lot like what Mississippi actually built during its reading reforms (2013-2023):
Literacy coaches in every school
Multi-year sustained teacher training programs
Individual student reading plans for systematic tracking
Third-grade retention with extensive intervention
Stable leadership committed for a decade
This is exactly the host system infrastructure the paper describes. Mississippi didn’t just adopt phonics curricula but built comprehensive implementation capacity.
Why other states with similar policies got smaller results:
The paper’s explanation would be that other states didn’t fully build out host systems (at best), but most likely remained ghost systems. They had policies requiring phonics instruction and literacy coaches on paper but lacked the implementation infrastructure to ensure fidelity.
This helps reinforce what the paper suggests: implementation quality matters as much as what you implement. Quality is “difficult to codify in policy checklists.”
The unanswered question:
The similarities are interesting. Mississippi built host system capacity and measured student outcomes throughout. AIRN built organizational capacity metrics but didn’t track whether states were implementing similar things or whether students learned more.
Questions This Leaves Unresolved
After working through this paper, several questions remain:
About mechanism: The practice-policy feedback cycle sounds promising in theory. But I’d like to see specific examples documented. Show me a concrete barrier that surfaced through regional teams, the specific state policy that changed in response, and measurable improvement in implementation as a result. That would make the theory tangible rather than abstract.
About measurement: The SCA tracks organizational changes: teams exist, meetings happen, documents get written. That’s capacity to have implementation infrastructure. But is that the same as capacity to actually implement practices with fidelity?
Mississippi had organizational infrastructure AND measured implementation fidelity of reading instruction AND tracked student reading outcomes. This paper measured the first part without connecting to the other two.
About sustainability: Study 2 worked better than Study 1. That’s legitimate learning. But all successful states had intensive monthly consulting support. What happens when that ends? One state saw work collapse after leadership change. If capacity requires permanent external support, that changes what “capacity building” means.
About standardization: The paper argues repeatedly against standardization while promoting Active Implementation Frameworks™ (note the trademark) with a State Capacity Development Plan that guides work month-by-month for 36 months.
Maybe that’s not a contradiction. Maybe the framework provides structure for local adaptation without becoming rigid. But every failed reform thought it was flexible too. How does this one avoid becoming the next standardized solution that fails for the reasons the paper identifies?
About the “ways of work” claim: The paper says implementation teams “change their ‘ways of work’ to increase organizational capacity.” This means changing how the organization functions, not just adding programs. Repurposing staff into implementation team roles. Changing meeting structures to include feedback loops. Modifying data systems to track fidelity, not just outcomes. Shifting from one-off workshops to sustained coaching.
That’s organizational transformation. But the paper describes this more than documents it. What specific organizational routines changed? How did day-to-day work differ before and after? The theory makes sense but the concrete examples would strengthen it.
What This Tells Us About Implementation
The paper addresses a real problem: we know many interventions that work in controlled settings but fail at scale. Understanding how to build implementation capacity systematically could matter.
The genuine contributions:
The failure analysis from Study 1 is valuable even if you’re skeptical about the broader framework. External consultants without system access fail. Segregated specialists without authority fail. Weak regional infrastructure predicts failure. Lack of executive commitment stalls progress.
These are specific, avoidable mistakes with decent evidence behind them, especially considering the history of education consultants.
The theoretical framework has real explanatory power. The complexity science perspective, essentially Ashby’s Law of Requisite Variety (systems need internal flexibility to match external complexity), helps explain why standardized reforms fail. The ghost system vs host system distinction clarifies what’s missing in most reform efforts.
The subscale sequence data suggests implementation capacity develops in stages: leadership enables regional development, which enables system alignment. If that holds across contexts, it tells us something about how to sequence capacity building.
What would strengthen this:
The obvious next step: Connect capacity to outcomes. This is feasible retrospectively. Go back to those 10 states, get the NAEP data and graduation rates from the years AIRN worked with them. Analyze whether higher-capacity states showed better student outcomes.
If higher capacity correlates with better outcomes, that validates the approach. If it doesn’t, that’s important to know too. Either way, connecting organizational capacity to educational results matters.
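The analysis isn’t exotic; with the SCA scores and public NAEP data in hand it’s roughly the sketch below, where every number is invented purely to show the shape of the test:

```python
# Hypothetical retrospective check: do higher-capacity states show larger NAEP gains?
# All values below are made up for illustration; requires Python 3.10+ for correlation().
from statistics import correlation

final_capacity = [62, 70, 75, 68, 65]        # final SCA scores for five states (invented)
naep_gain      = [3.1, 4.0, 5.2, 2.8, 3.5]   # 4th-grade reading gains, same period (invented)

r = correlation(final_capacity, naep_gain)   # Pearson's r
print(f"Capacity vs. NAEP gain: r = {r:.2f}")
# A strong positive r would support the approach; a null result would matter too.
```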
Additional evidence that would help: independent assessment of capacity, for one. Documentation of sustainability after intensive support ends. Detailed cost accounting, including consultant fees and opportunity costs. Specific examples of the practice-policy feedback cycle working. Evidence that organizational capacity enables actual implementation fidelity.
The realistic assessment:
This represents serious work on a genuinely difficult problem. The researchers learned from initial failures and improved their methods. They documented organizational changes in complex systems with repeated measurement over a decade. They acknowledge limitations openly.
The evidence shows intensive consulting support can improve organizational capacity metrics across different state contexts. Whether this organizational capacity enables better educational implementation, and whether it persists without intensive support, remains unclear.
That doesn’t make the work worthless. It makes it incomplete. Both things can be true. We just need more work to find out, well, what works!
What this might explain about Mississippi:
The framework aligns with what Mississippi did: sustained infrastructure investment, literacy coaches, multi-year training, systematic implementation. Mississippi built host system capacity, not just ghost system policies.
But Mississippi also measured outcomes throughout. They knew literacy coaches were working because kids were reading better. This research measured organizational capacity without tracking whether education improved.
The alignment suggests the framework might explain part of Mississippi’s success. But it’s one piece of a larger puzzle that included measuring results and staying committed through the entire decade of implementation.
Where This Leaves Us
Implementation capacity clearly matters. The graveyard of failed reforms exists because we focus on what to implement instead of how to implement. If someone has figured out how to build implementation capacity systematically, that would be significant progress.
This paper might be part of that solution. It’s developmental research (building and testing an approach) not proof the approach produces educational improvement. That’s legitimate work worth paying attention to.
It’s also incomplete in ways that matter for assessing whether this actually helps students learn more.
The specific lessons worth taking:
If you’re trying to build implementation capacity, the structural prerequisites from this paper seem well-supported. Embed specialists in the system with actual authority, not as external consultants. Secure executive commitment with regular meeting time before starting. Build regional partnerships with dedicated resources. Plan for sustained timelines measured in years. Start in transformation zones to learn before scaling. Measure progress and adjust based on data.
These align with what successful states like Mississippi and Alabama actually did.
What we still need to understand:
Does implementation capacity as measured here lead to better implementation of educational practices? Does it improve student learning (it really does depend on the curriculum!)? Can it be sustained without intensive consulting support? Can it scale beyond states with favorable conditions and substantial resources?
The paper addresses the first question in implementation science: can you purposefully build organizational capacity? The evidence suggests yes, with intensive support and the right conditions.
The second question (does that capacity improve education?) remains unanswered, and there are plenty of reasons why, starting with the obvious one: what’s being implemented differs from state to state!
Why I’m being a bit “verbose” about this:
Because implementation is where most reforms fail, and understanding how to build implementation capacity systematically would matter. This paper represents serious effort on that problem with evidence of learning from failures.
It’s also unfinished work. Organizational capacity tells us something about organizational development, but without a common set of curricula or practices being implemented, whether it tells us anything about educational improvement is a question this paper can’t answer. It gives you clues to sniff out, not conclusions.
Both things are true. The work has value. The evidence is incomplete. It deserves attention and healthy skepticism.
People in the past were able to implement a lot of things without the level of support or cost this paper suggests, but that was *decades* ago. I don’t think we need such fancy frameworks to achieve that kind of capacity in principle, but gaining back anything like it now is going to cost a lot and will require support, especially while those who hawk “accountability” while ignoring Juran and Deming are circling like vultures: 90s-era GE executives who later moved on to Boeing or some other storied, now-declining company, or worse, Clinton administration alumni like Emanuel or Cuomo who were later revealed to be bad at governing.
That’s where the evidence leaves us.
For more on implementation and quality management principles, Deming’s “Out of the Crisis” remains essential reading. He taught countries and companies alike how to build quality into systems through continuous improvement.

