Diosh Lequiron
AI & Digital Transformation | 15 min read

AI in Education: What It Can and Cannot Replace

AI in education can support content delivery, practice, and teacher administration. It cannot replace expert modeling, mentorship, or the assessment of judgment. The distinction matters.

The Question That Precedes the Technology

Before evaluating what AI can do in education, it is worth being precise about what education is trying to do. The answer is not obvious, and different answers lead to different conclusions about where AI belongs.

If education is primarily information delivery — transferring a body of knowledge from a source that has it to learners who do not — then AI is a compelling delivery vehicle. It is infinitely patient, available at any time, can adjust to individual pace, and produces personalized explanations on demand. On this definition, AI could replace a significant fraction of what teachers do.

If education is primarily the development of judgment — the capacity to reason well about complex problems, to evaluate evidence, to make decisions under uncertainty, to apply knowledge in novel contexts — then the picture is different. Judgment develops through practice under conditions of authentic difficulty, with feedback from someone who can see where the thinking went wrong and model what better thinking looks like. AI can support some of this, but it cannot model expert judgment in the way that an experienced practitioner can, and the question of whether AI can reliably evaluate judgment is genuinely open.

The honest answer is that education is both — and that the relative weight of each function varies by level, subject, and goal. What responsible AI integration in education requires is clarity about which functions are being supported and which are at risk of being substituted without adequate consideration of what is lost.

This article attempts that clarity, drawing on experience in graduate education at PCU Graduate School, where the integration of AI tools into graduate-level instruction has required ongoing navigation of these questions.

What AI Can Support Reliably

There are specific educational functions where AI creates genuine value — where the evidence of effectiveness is reasonable, the risk of harm is manageable, and the benefit to learners or teachers is clear.

Content delivery and knowledge transfer. Explanatory content (definitions, conceptual overviews, worked examples, background reading) can be delivered effectively by AI systems. Adaptive learning platforms that adjust the difficulty and pacing of content based on learner performance have reasonable evidence of effectiveness for this function. AI tutoring systems that explain concepts and answer factual questions on demand give learners support that was previously limited by teacher availability.

The appropriate scope here is important. "Content delivery" means delivering information that has a correct answer, a defined explanation, or a knowable fact. It does not extend to the interpretation of complex ideas, the evaluation of whether a learner's understanding is genuine rather than superficial, or the development of the learner's own analytical framework. Those functions require different capacities.

Practice and repetition with structured feedback. AI systems are well-suited to providing structured practice environments — vocabulary review, mathematical problem sets, grammar exercises, coding practice — with immediate feedback on correctness. The evidence for AI-assisted practice in domains with clear right-and-wrong answers (mathematics, language vocabulary, procedural coding skills) is reasonably positive. Learners who get immediate, specific feedback on errors can correct them faster than learners who wait for a teacher to review their work.

The limitation is that this function addresses the acquisition of defined procedures and facts, not the development of judgment about when to apply them or how to handle cases that fall outside the defined problem set.

Administrative burden reduction for teachers. The time teachers spend on administrative tasks — grading objective assessments, managing scheduling and communication, tracking attendance and progress, generating standardized reports — is time not spent on instruction and mentorship. AI tools that reduce this administrative burden by handling routine documentation, flagging students who may need attention based on performance patterns, and automating communication on standard topics create capacity for teachers to do more of the work that requires human judgment.

This is one of the highest-value AI applications in education precisely because it does not substitute for teaching — it removes friction from teaching and increases the proportion of teacher time available for the functions that require human presence.

Language access and translation. For multilingual educational contexts, AI translation tools reduce language barriers to participation. A student who can access course materials, submit work, and receive feedback in their preferred language is better positioned to demonstrate the understanding they have than a student whose performance is confounded by language barriers. This applies at the graduate level as well as earlier stages — in contexts where graduate programs serve students from diverse linguistic backgrounds, AI translation tools can be equalizers.

What AI Cannot Replace Without Degrading the Outcome

There are specific educational functions where AI substitution degrades the outcome — where the function requires something that AI cannot provide, and where substituting AI for the function produces a worse educational experience even if it is a cheaper one.

The modeling of thinking by an expert practitioner. One of the central mechanisms of education — particularly at advanced levels — is the opportunity to observe how an expert thinks. This is not the observation of expert outputs (the right answer, the polished analysis) but of expert process: how an expert approaches a problem they do not immediately know how to solve, how they recognize relevant considerations, how they manage uncertainty, how they evaluate their own reasoning and revise it.

AI systems produce outputs. They do not model process in the same way. A language model asked to explain its reasoning produces a post-hoc rationalization of an output that was generated through pattern matching, not a live demonstration of how an expert navigates genuine difficulty. Graduate students who observe an experienced researcher work through a methodological problem in real time — including the dead ends and revisions — are seeing something that cannot be replicated by an AI explanation of the same topic.

This is not a gap that better AI will necessarily close. It is a structural feature of the difference between a system that produces outputs and a practitioner who models judgment.

The mentorship relationship. Motivation, persistence, and intellectual confidence develop partly through sustained relationships with people who know the learner's specific strengths and challenges, believe in the learner's capacity to improve, and have a stake in the learner's development over time. This is the mentorship relationship. It is not primarily an information-delivery relationship — a mentor often does not tell learners things they could not find elsewhere. The value is relational: someone who sees you clearly, holds you accountable, and sustains investment in your development.

AI systems can simulate relational interaction. They cannot form the actual relationship. A learner who interacts extensively with an AI tutor is not forming a mentorship relationship; they are using a tool. The distinction matters because the motivational and developmental functions of mentorship are not present in the tool interaction. Learners who lack access to genuine mentorship relationships do not get those functions from AI substitutes; they go without them.

At the graduate level, this distinction is particularly consequential. Graduate students are not primarily acquiring information — they are being initiated into a scholarly or professional community, developing an identity as researchers or practitioners, and building relationships with mentors whose sponsorship and guidance will shape their trajectories for years. AI cannot provide any of this.

The assessment of judgment and synthesis. Evaluating whether a learner has developed genuine judgment — the capacity to reason well about complex, ambiguous problems — requires a human evaluator who has the judgment to recognize it. This is circular in a meaningful way: only someone who possesses good judgment can reliably evaluate whether someone else is developing it. The evaluation of a graduate thesis, a professional case analysis, or a complex design project requires an evaluator who can see where the thinking is sophisticated and where it is superficial, where the synthesis is genuine and where it is imitative.

AI-based assessment of judgment is an active area of research, and AI tools can provide useful scaffolding for feedback on structured elements of written work. But the claim that AI can reliably evaluate the quality of graduate-level reasoning is not supported by current evidence. Institutions that substitute AI assessment for human evaluation of high-stakes graduate work are not discovering a cheaper way to achieve the same outcome — they are producing a different, lower-quality outcome.

A Worked Example: The Same Essay, Two Different Functions

The line between supporting and replacing is easiest to see when the same artifact is examined through both functions. Take a graduate student's literature-review chapter — the part of a thesis that surveys prior work and positions the student's own contribution against it. The chapter has two layers that look identical on the page and are entirely different underneath.

The first layer is production. The student needs to find relevant sources, track citations, summarize what each paper argued, and assemble the result into prose that reads cleanly and conforms to a citation style. AI handles all of this well, and using it here is a legitimate productivity gain — the equivalent of using a reference manager instead of index cards. A student who uses AI to draft these summaries and then verifies each one against the original paper has lost nothing of educational value.

The second layer is synthesis: the judgment about which prior work actually matters, where the field's existing arguments are weak, and why the student's question is worth asking given everything already published. This is the layer the degree exists to develop. A student who prompts an AI to "identify the gap in the literature" has not performed that judgment; they have outsourced the one cognitive act the chapter was assigned to build. The danger is that both routes produce the same artifact, a fluent chapter with a stated gap, so an evaluator reading only the output cannot tell which student did the thinking and which student relayed an AI's. The function the chapter was supposed to serve has been quietly replaced, and nothing on the page reveals it. Distinguishing the two requires the evaluator to probe the reasoning directly (to ask the student why a particular paper was excluded, or what would change if its central claim were false), which is precisely the kind of live, judgment-dependent assessment AI cannot stand in for.

Responsible AI Integration at the Graduate Level

The PCU Graduate School context is one where AI integration has been approached deliberately — not with blanket prohibition and not with undiscriminating adoption.

The practical approach that has emerged from this context:

Treat AI tools as research and production aids, not as substitutes for the research or thinking itself. Graduate students using AI tools to search literature, identify sources, organize notes, and produce first drafts of non-analytical content (literature review summaries, citation management, formatting) are using AI as a productivity aid. Students using AI to generate the analytical content of their work — the argument, the synthesis, the evaluation — are not developing the capabilities that graduate education is supposed to produce. The boundary between these uses requires explicit teaching, not just a policy statement.

Use AI-assisted feedback for structured elements and preserve human feedback for judgment elements. On written work that has both structured elements (grammar, formatting, citation compliance, organization) and judgment elements (argument quality, evidence evaluation, conceptual depth), AI tools can provide useful feedback on the former while human feedback is preserved for the latter. This division of labor is more appropriate than either pure AI assessment or pure human assessment of everything.

Require disclosure of AI assistance and define the requirement specifically. Blanket AI prohibition is not enforceable and pushes use underground, where it is ungoverned. Blanket AI permission does not develop the judgment that graduate education should produce. Specific disclosure requirements (students must identify which AI tools they used and how), combined with explicit discussion of the pedagogical rationale, create a governed space for AI use that supports rather than substitutes for learning.

Prioritize the mentorship and modeling functions explicitly. If AI is handling more of the administrative and content-delivery functions, the time freed should flow toward the functions AI cannot handle — mentorship, expert modeling, the evaluation of judgment. This requires intentional redistribution, not just efficiency capture. The risk is that efficiency gains from AI-assisted administration are captured in reduced staffing rather than reallocated to higher-value teacher functions.
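
The division of labor in the second practice above, AI-assisted feedback for structured elements and human feedback for judgment elements, can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions, not a description of any system in use at PCU Graduate School. The element names come from the text; `FeedbackSource`, `route_feedback`, `plan_review`, and the choice to send unrecognized elements to a human reviewer are hypothetical and exist only for the example.

```python
from enum import Enum


class FeedbackSource(Enum):
    AI_ASSISTED = "ai_assisted"
    HUMAN = "human"


# Structured elements as listed in the text. Judgment elements (argument
# quality, evidence evaluation, conceptual depth) are deliberately absent.
STRUCTURED_ELEMENTS = {"grammar", "formatting", "citation compliance", "organization"}


def route_feedback(element: str) -> FeedbackSource:
    """Return who should give feedback on one element of written work."""
    name = element.strip().lower()
    if name in STRUCTURED_ELEMENTS:
        return FeedbackSource.AI_ASSISTED
    # Judgment elements, and anything unrecognized, stay with a human.
    # Defaulting to a human is an assumption, chosen as the cautious option.
    return FeedbackSource.HUMAN


def plan_review(elements: list[str]) -> dict[str, list[str]]:
    """Group the elements of a rubric by feedback source."""
    plan: dict[str, list[str]] = {source.value: [] for source in FeedbackSource}
    for element in elements:
        plan[route_feedback(element).value].append(element)
    return plan


if __name__ == "__main__":
    rubric = ["grammar", "argument quality", "citation compliance", "conceptual depth"]
    print(plan_review(rubric))
```

The design choice worth noting is the default: an element the rubric author forgot to classify falls to the human reviewer, so the AI-assisted path only ever covers what was explicitly placed there.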

The Specific Failure Modes When AI Replaces Rather Than Supports

There are characteristic failure modes that emerge when AI is used to replace teacher functions rather than support them. These failure modes are observable in educational contexts that have moved faster toward AI substitution than the evidence supports.

Fluency without understanding. AI writing assistance that produces grammatically correct, well-organized prose on demand can mask the absence of genuine understanding. A student who uses AI to draft their analysis has not demonstrated the capacity to produce that analysis; they have demonstrated the capacity to prompt and lightly edit an AI. Assessment systems that evaluate the output rather than the process of production cannot distinguish these. Institutions that rely heavily on written work as assessment without accounting for AI assistance are assessing AI capability, not student capability.

Artificial confidence in weak knowledge. AI tutoring systems that are too helpful, providing answers before learners have struggled adequately with the problem, can produce learners who feel confident in their understanding without having developed the retrieval strength and transfer capacity that come from effortful practice. The learner who has been told the answer feels like they know it; the learner who has retrieved it from genuine understanding actually does. AI systems calibrated for learner satisfaction rather than learning outcomes often err toward excessive helpfulness.

Relationship atrophy. As AI systems handle more learner interaction, teachers who are not intentional about preserving relational contact can find that their actual relationship with individual learners has attenuated. Course completion rates, motivation, and persistence in the face of difficulty are all supported by the sense that a specific human is paying attention to the learner's progress. Institutions that reduce teacher-to-learner contact in proportion to AI system adoption may be trading a determinant of educational outcome for an operational efficiency.

Assessment drift toward what AI can evaluate. When AI assessment tools become the primary feedback mechanism, there is a systemic pressure toward designing assessments that AI can reliably evaluate — structured, objective, with clear correct answers. This pressure, if unresisted, gradually shifts the curriculum away from the development of judgment and synthesis toward the acquisition of retrievable facts and procedures. The result is an education that is easier to assess and less valuable to complete.

The Design Principle That Prevents These Failures

The design principle that prevents these failures is straightforward to state and requires discipline to maintain: AI handles what AI can handle better than humans, humans handle what humans can handle better than AI, and the boundary is drawn by what the educational function actually requires — not by what is operationally convenient or financially efficient.

This principle requires that the people making integration decisions — educators, administrators, technology teams — have a clear account of what the educational functions are, what AI can reliably do, and where the lines are. It requires resisting the pressure to let AI expand into educational functions it cannot perform adequately because doing so is cheaper or faster. And it requires the ongoing work of evaluating whether the integration is producing the outcomes education is supposed to produce, not just the outputs that are easiest to measure.

Where the Boundary Is Genuinely Hard, and What to Do This Week

The principle is clean; applying it is not, and pretending otherwise would be its own failure mode. The boundary between supporting and replacing is not always crisp, and three honest difficulties deserve acknowledgment. First, the boundary moves with the learner: an AI explanation that does a novice's thinking for them may, for an advanced student who already has the judgment, simply accelerate work they could do unaided. The same tool supports one student and replaces a function for another. Second, the functions AI cannot replace — mentorship, expert modeling, judgment assessment — are also the most expensive and least scalable, which means the pressure to substitute is strongest exactly where substitution does the most harm. Third, this account itself rests on the current state of the technology; the claim that AI cannot evaluate graduate-level reasoning is an empirical one, and an institution should hold it as a position to revisit rather than a permanent law. A program that treats the boundary as fixed forever will be as wrong as one that ignores it.

What an educator can do this week, without resolving any of that, is run one audit on a single assignment: for each thing the assignment is supposed to develop, ask whether AI could produce a passing submission without the student exercising the capability the assignment exists to build. Where the answer is yes — where a fluent output is indistinguishable from genuine work — the assignment is now measuring prompting skill, and the response is not to ban the tool but to move the assessment toward the process: an oral defense, a live problem worked in front of an evaluator, a draft history, a question that requires the student to reason from their own choices. The goal is not to outrun AI. It is to make sure that what is being assessed is still the thing the education was meant to produce.
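
The audit described above can be written down as a small checklist. The sketch below is one hypothetical way to record it: `Capability`, `audit_assignment`, and the sample assignment are invented for illustration. The yes/no answer to whether AI could produce a passing submission is not computed; it remains the educator's own judgment, entered by hand. The process-based alternatives are the ones named in the text.

```python
from dataclasses import dataclass

# Process-based assessment options named in the text.
PROCESS_ASSESSMENTS = [
    "oral defense",
    "live problem worked in front of an evaluator",
    "draft history",
    "question requiring the student to reason from their own choices",
]


@dataclass
class Capability:
    """One thing the assignment is supposed to develop."""
    name: str
    # The educator's judgment: could AI produce a passing submission
    # without the student exercising this capability?
    ai_can_pass_without_it: bool


def audit_assignment(capabilities: list[Capability]) -> dict[str, list[str]]:
    """Map each at-risk capability to the process-based options to consider."""
    return {
        cap.name: list(PROCESS_ASSESSMENTS)
        for cap in capabilities
        if cap.ai_can_pass_without_it
    }


if __name__ == "__main__":
    # Hypothetical audit of a literature-review chapter.
    literature_review = [
        Capability("identifying the gap in the literature", ai_can_pass_without_it=True),
        Capability("explaining live why a paper was excluded", ai_can_pass_without_it=False),
    ]
    for name, options in audit_assignment(literature_review).items():
        print(f"{name}: consider {', '.join(options)}")
```

Capabilities that AI cannot fake in the current format drop out of the result, which keeps the output focused on the parts of the assignment that need redesign.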

Education is not immune to the pressures that produce poor AI adoption in other domains — the overestimation of AI capability, the conflation of operational efficiency with mission delivery, the tendency to measure what can be measured and optimize for it regardless of whether it captures what matters. In education, the cost of these errors is paid by learners who receive a less valuable education than they were entitled to expect. That cost is worth taking seriously before the integration decisions are made.

Continue in this series

This piece is part of AI Integration for Organizations: A Complete Implementation Guide, my systematic guide to applied AI and digital transformation.
