
The AI Quality Ceiling.
Why ChatGPT Will Never Preach Like a Trained Pastor.

By AI Fluency Ministry · April 2026

There is a widespread assumption that AI will keep getting better until it matches or exceeds human expertise in every domain. In preaching, in theology, in pastoral insight. The assumption is wrong. And the reason it is wrong is mathematical, not speculative.

Large language models have a structural ceiling. Three compounding forces create it. And understanding those forces is the difference between a pastor who uses AI wisely and one who trusts it with things it structurally cannot do.

Force One: Next-Token Prediction Trends Toward Average

Every word an LLM generates is a statistical prediction. Given the words that came before, the model calculates which word is most likely to come next. Not which word is most true. Not which word is most profound. Which word is most probable.

Ben Affleck, reflecting on AI-generated screenwriting, put it plainly: “It goes to the mean, to the average.”

This is not a metaphor. It is a description of how the mathematics work. The model was trained on the internet — billions of pages of human writing. Its predictions reflect the statistical center of that corpus. The output gravitates toward what is common, not what is exceptional.
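The pull toward the center can be sketched with a toy next-token distribution. Everything here is invented for illustration (the vocabulary, the probabilities); the point is only that a decoder which favors the most probable word reproduces the common word and never the vivid one in the tail.

```python
import random

# Toy distribution over the next word after "The sermon was":
# probabilities stand in for frequencies in a (fictional) training corpus.
next_token_probs = {
    "good":         0.40,   # the statistical center of the corpus
    "encouraging":  0.30,
    "long":         0.20,
    "incandescent": 0.10,   # rare and vivid: the tail
}

def greedy_pick(probs):
    """Greedy decoding: always take the single most probable token."""
    return max(probs, key=probs.get)

def sample_pick(probs):
    """Even sampling favors the center: 'good' wins 40% of the time."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

print(greedy_pick(next_token_probs))  # always "good", never "incandescent"
```

Greedy decoding never emits the tail word at all, and sampling emits it only as often as the corpus did. Either way, the output gravitates toward what is common.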

In preaching, the “average” is a sermon that uses the right vocabulary, follows a standard structure, and makes points that are familiar to anyone who has heard a few sermons on the same passage. It will sound competent. It will not be revelatory. It will not carry the weight of a pastor who has been broken by the text, reshaped by the Spirit, and compelled to say something that only he could say to the specific people God entrusted to him.

Force Two: RLHF Trains for Familiarity, Not Quality

Reinforcement Learning from Human Feedback (RLHF) is the process that makes AI responses sound “helpful.” Human annotators rank the model's outputs, and the model learns to produce responses that score well.

The problem: analyses of RLHF reward models suggest that typicality is weighted heavily in the reward function (one estimate: α = 0.57). That is a technical way of saying the model is explicitly trained to produce outputs that feel familiar to human raters. Not outputs that are most accurate. Not outputs that are most theologically rigorous. Outputs that match what the raters expect to hear.

This creates a compounding bias. The model learns that familiar-sounding theology gets rewarded. Novel insights, challenging reinterpretations, prophetic applications that cut against cultural assumptions — these score lower because they feel less “typical.” The model optimizes away from exactly the kind of preaching the church needs most.
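The bias can be made concrete with a toy reward function. This is purely illustrative, not how any production reward model is actually built: the α = 0.57 weighting is the article's figure, and every score below is invented.

```python
# Illustrative only: a toy reward that weights typicality at the
# article's alpha = 0.57. All scores below are invented for the example.
ALPHA = 0.57

def reward(typicality, accuracy, alpha=ALPHA):
    """A rater-trained model optimizes a blend, not accuracy alone."""
    return alpha * typicality + (1 - alpha) * accuracy

# A familiar-sounding but shallow answer vs. a novel, rigorous one.
familiar = reward(typicality=0.95, accuracy=0.70)  # 0.8425
novel    = reward(typicality=0.40, accuracy=0.98)  # 0.6494

print(familiar > novel)  # True: familiarity outscores rigor
```

Under this weighting, the familiar answer wins even though it is less accurate, which is the compounding bias in miniature.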

Force Three: Model Collapse Narrows the Range

As AI models increasingly train on AI-generated content — and they do, because AI content now saturates the internet — a phenomenon called “model collapse” occurs. The distributional tails shrink. The range of possible outputs narrows in a self-reinforcing loop.

Each generation of AI trained on the previous generation's output loses the outliers. The unusual. The surprising. The prophetic voice that says something nobody expected.

The result: AI-generated theology converges toward a narrower and narrower band of “acceptable” output. The already-average gets more average with each iteration. The ceiling does not rise. It lowers.
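The tail-shrinking loop can be simulated in a few lines. This is a stylized sketch, not a model of any real training pipeline: each "generation" refits a normal distribution to its corpus but first discards anything more than 1.5 standard deviations from the mean, mimicking a model that prefers typical output.

```python
import random
import statistics

random.seed(0)

def next_generation(samples, keep_sigma=1.5, n=2000):
    """Refit a normal to the current 'corpus', but first drop anything
    beyond keep_sigma standard deviations (the model keeps only the
    typical), then resample the next generation from the narrowed fit."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    kept = [x for x in samples if abs(x - mu) <= keep_sigma * sigma]
    mu_k = statistics.fmean(kept)
    sigma_k = statistics.pstdev(kept)
    return [random.gauss(mu_k, sigma_k) for _ in range(n)]

corpus = [random.gauss(0.0, 1.0) for _ in range(2000)]
start_spread = statistics.pstdev(corpus)

for _ in range(10):  # ten generations of AI trained on AI
    corpus = next_generation(corpus)

end_spread = statistics.pstdev(corpus)
print(round(start_spread, 2), round(end_spread, 2))  # spread collapses
```

After ten generations the spread has fallen to a few percent of its starting value: the outliers are gone, and each pass makes the already-average more average.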

The 53-Point Gap

These three forces produce a measurable ceiling. On “Humanity's Last Exam” — a benchmark of expert-crafted questions designed to test the upper limits of AI knowledge — the top AI scored 37.5%. Human domain experts averaged roughly 90%.

A 53-point gap.

On expert-level questions, AI scored 37.5%. Human experts scored ~90%.

And the scaling problem makes the ceiling even harder to breach. Research shows that halving AI error rates requires 500 times more computational resources. Each marginal improvement costs exponentially more. The gap between AI and human expertise is not closing on a predictable timeline. It is hitting a wall.
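The arithmetic of that wall is worth making explicit. Taking the article's 500x-per-halving figure at face value (a stated assumption, not a law), the cost compounds:

```python
import math

# Assumption from the article: each halving of error costs 500x compute.
COST_PER_HALVING = 500

def compute_multiplier(start_error, target_error):
    """How much more compute it takes to shrink error from start to
    target, under the stated 500x-per-halving scaling assumption."""
    halvings = math.log2(start_error / target_error)
    return COST_PER_HALVING ** halvings

# Cutting a 10% error rate to 2.5% is two halvings:
print(compute_multiplier(0.10, 0.025))  # 500**2 = 250,000x more compute
```

Two halvings already demand a quarter-million-fold increase in compute. Whatever the exact constant, exponential cost per linear gain is what "hitting a wall" means.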

“Neural networks are fantastic interpolators but terrible extrapolators — blind to the mechanisms that generate the data in the first place.”

— Ali Yahya, a16z

Interpolation means finding patterns within existing data. Extrapolation means generating genuine insight beyond the data. Theology, at its best, is extrapolation — the pastor sees connections, implications, and applications that emerge from the intersection of text, Spirit, experience, and congregation. AI interpolates. Pastors extrapolate.
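The distinction shows up even in the simplest statistical model. A toy example, not a claim about any particular LLM: fit a straight line to curved data, and it looks competent inside the training range while missing badly outside it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line fit: pure pattern-matching machinery."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Train only on the visible range: y = x^2 for x in [0, 1].
xs = [i / 100 for i in range(101)]
ys = [x ** 2 for x in xs]
slope, intercept = fit_line(xs, ys)

def predict(x):
    return slope * x + intercept

# Inside the data (interpolation), the fit looks competent...
err_inside = abs(predict(0.5) - 0.5 ** 2)
# ...outside the data (extrapolation), it misses badly.
err_outside = abs(predict(3.0) - 3.0 ** 2)
print(round(err_inside, 3), round(err_outside, 3))
```

The line never learned the mechanism (squaring) that generated the data, so it cannot go beyond the data. That is the interpolator's blindness the quote above describes.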

The 95th Percentile Trap

Here is the most dangerous part of the quality ceiling. AI does not produce obviously bad output. It produces output that looks excellent to anyone who is not an expert.

Dubach describes it precisely: “AI can get to the 95th or 98th percentile of creating something that looks perfect — but then it isn't, and if you have deep knowledge you can spot it immediately.”

A seminary-trained pastor reads an AI-generated exegetical note on Romans 9 and immediately catches the subtle conflation of divine sovereignty and fatalism. A volunteer small group leader reads the same note and teaches it as written. The AI sounded authoritative. The error was at the 95th percentile — invisible to anyone without deep training.

MIT research confirmed the pattern: AI models use 34% more confident language when hallucinating than when stating accurate facts. The model sounds most sure when it is most wrong. On theological topics where Gloo's benchmark showed models scoring only 61 out of 100 on Christian worldview accuracy, this confidence is not just misleading. It is pastorally dangerous.

What This Means for the Pastor

The quality ceiling means your theological training is not obsolete. It is more valuable than ever.

AI will produce work that impresses your congregation. It will generate sermon outlines that sound polished, discussion questions that feel thoughtful, and exegetical notes that cite real scholars. And somewhere in that output, at a rate of 5 to 19 percent depending on the domain, there will be claims that are fabricated, distorted, or subtly wrong.

The only person who can catch those errors is someone who has independently studied the text. Who has built the theological muscles through years of deliberate practice. Who has the "deep knowledge" that spots the flaw the moment it appears, the flaw invisible to everyone else in the room.

That person is you. And no model update is going to replace you. Because the ceiling is not a bug. It is the mathematics.

The Theological Claim

The Master's Seminary put it starkly: AI “cannot think for itself” and is “incapable of making a new point — only recombining existing data.” Recombination is interpolation. Ministry requires extrapolation — the new thing the Spirit speaks through a prepared vessel.

The quality ceiling is not just a technical limitation. It is a theological boundary. AI can organize what has been said. It cannot say what has never been said. And the church was built on the new thing God speaks through human voices yielded to Him.

Your training is your edge. The ceiling ensures it stays that way. Do not surrender the advantage God gave you to a tool that trends toward average.

OpenLumin does not generate sermons or theology.
It retrieves verified evidence from 15+ scholarly sources.
You provide the insight AI structurally cannot.


About: AI Fluency Ministry is a project helping the church understand and use AI wisely. OpenLumin is the practical application of that research — a free Bible research companion that retrieves evidence from 15+ scholarly sources so pastors can study with depth and teach with confidence.
