Cytopia 2026

Modern AI cannot solve Sudoku: analysis of statistical induction limits and the neuro-symbolic future

Hugues Wattez

Tuesday 12 May 2026

Introduction

LLMs are decision-support tools, but they should not replace critical thinking
Sudoku serves here as a simple, concrete example for showing their limitations
The core idea: when faced with a new problem, a statistical model can produce a confident answer without any guarantee of validity
The presentation follows four stages:
1. the history of AI
2. an introduction to logic
3. Sudoku experiments
4. a neuro-symbolic perspective

*Le penseur* skeptical in the face of all these LLMs – *Original photo Wikipedia*

Brief History

From 1950 to 1980, the symbolic paradigm dominated: intelligence was seen as the manipulation of symbols and rules

Strengths: few examples needed, declarative representation, readable reasoning.
Limits: fragility to noise, ambiguity, dependence on expert-defined rules.

At the end of the 1980s, connectionism gained momentum: it drew inspiration from the neural networks of the human brain

Strength: better generalization on perceptual or language tasks
Limits: massive data requirements, low explainability, difficulty producing stable logical reasoning.

Neuro-symbolic AI then emerged as an attempt to combine statistical learning and symbolic reasoning.

Logic and Inference: Introduction

Deduction goes from the general → particular: if the premises are true, the conclusion is necessarily true.

flowchart LR
  A["All men are mortal"] --> P{"Deduction"}
  B["Socrates is a man"] --> P
  P --> C["Socrates is mortal"]

Induction goes from the particular → general: it proposes a rule, without absolute certainty.

flowchart LR
  A["Socrates is mortal"] --> P{"Induction"}
  B["Plato is mortal"] --> P
  C["..."] --> P
  D["Andreas is mortal"] --> P
  P --> E["All men are mortal"]

Logic and Inference: Application

Applying these concepts to our problem:

Sudoku requires above all a rigorous deductive chain, not just recognition of regularities;
current LLMs still behave mostly in an inductive way: they complete probable sequences rather than applying formal rules.
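
To make the deductive side concrete, here is a minimal sketch of a single deduction step (a "naked single") on the simple instance used in the experiments of this talk. The function name `candidates` and the 0-for-empty encoding are assumptions chosen for illustration, not something taken from the talk.

```python
# Sketch: one deductive step in Sudoku (a "naked single").
# Assumption: 0 marks an empty cell; rows and columns are 0-indexed.
GRID = [
    [0, 1, 8, 0, 0, 0, 0, 7, 0],
    [0, 7, 0, 0, 0, 1, 9, 0, 0],
    [0, 6, 0, 8, 5, 0, 1, 2, 0],
    [6, 0, 0, 7, 0, 0, 3, 0, 0],
    [7, 0, 0, 5, 1, 0, 0, 8, 0],
    [8, 0, 4, 0, 0, 9, 7, 0, 5],
    [0, 4, 7, 0, 9, 8, 0, 5, 0],
    [0, 0, 2, 6, 0, 5, 0, 3, 0],
    [0, 0, 6, 0, 0, 0, 2, 4, 0],
]


def candidates(grid, row, col):
    """Digits not yet used in the cell's row, column, or 3x3 block."""
    used = set(grid[row])
    used |= {grid[r][col] for r in range(9)}
    top, left = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(top, top + 3) for c in range(left, left + 3)}
    return set(range(1, 10)) - used


# Row 7, column 7 (indices 6, 6): the rules leave exactly one digit.
print(candidates(GRID, 6, 6))  # {6}
```

The premises (digits already present in the row, column, and block) force the conclusion: no frequency or likelihood is involved, which is the difference with an inductive completion.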

Sudoku

Classic Sudoku consists of filling a \(9×9\) grid so that each row (Figure 1), column (Figure 2), and \(3×3\) block (Figure 3) contains each digit from \(1\) to \(9\) exactly once.

2 1 8 | 3 9 4 | 6 7 5
. . . | . . . | . . .
. . . | . . . | . . .
------+-------+------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .
------+-------+------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .

Figure 1

2 . . | . . . | . . .
9 . . | . . . | . . .
4 . . | . . . | . . .
------+-------+------
6 . . | . . . | . . .
7 . . | . . . | . . .
8 . . | . . . | . . .
------+-------+------
1 . . | . . . | . . .
3 . . | . . . | . . .
5 . . | . . . | . . .

Figure 2

2 1 8 | . . . | . . .
9 7 3 | . . . | . . .
4 6 5 | . . . | . . .
------+-------+------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .
------+-------+------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .

Figure 3
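
The three constraints illustrated by Figures 1 to 3 can be stated formally. The notation is an assumption introduced here: \(g_{r,c}\) denotes the digit in row \(r\) and column \(c\) of a completed grid, with \(r, c \in \{1,\dots,9\}\).

\[
\begin{aligned}
\forall r \in \{1,\dots,9\}:&\quad \{\, g_{r,c} : c \in \{1,\dots,9\} \,\} = \{1,\dots,9\} && \text{(rows)}\\
\forall c \in \{1,\dots,9\}:&\quad \{\, g_{r,c} : r \in \{1,\dots,9\} \,\} = \{1,\dots,9\} && \text{(columns)}\\
\forall a, b \in \{0,1,2\}:&\quad \{\, g_{3a+i,\,3b+j} : i, j \in \{1,2,3\} \,\} = \{1,\dots,9\} && \text{(blocks)}
\end{aligned}
\]

Each of these 27 units has nine cells, so the set equality forces every digit to appear exactly once.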

It is important to distinguish the problem in the general sense from a particular instance, that is, a given grid to solve.
Some instances are simple, while others require advanced techniques such as X-Wing (see Wikipedia).

Solving one given grid does not mean being able to solve all instances of the problem in general.

Simple Instance Experiment (Cytopia)

. 1 8 | . . . | . 7 .
. 7 . | . . 1 | 9 . .
. 6 . | 8 5 . | 1 2 .
------+-------+------
6 . . | 7 . . | 3 . .
7 . . | 5 1 . | . 8 .
8 . 4 | . . 9 | 7 . 5
------+-------+------
. 4 7 | . 9 8 | . 5 .
. . 2 | 6 . 5 | . 3 .
. . 6 | . . . | 2 4 .

Instance given to the LLMs.

5 1 8 | 9 2 4 | 6 7 3
2 7 3 | 4 6 1 | 9 0 8 🛑
. . . | . . . | . . .
------+-------+------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .
------+-------+------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .

ChatGPT produces an incorrect answer (a 0 appears in the second row) and ultimately stops before completing the grid.

2 1 8 | 9 6 3 | 5 7 4
5 7 3 | 2 4 1 | 9 6 8
4 6 9 | 8 5 7 | 1 2 3
------+-------+------
6 5 1 | 7 8 4 | 3 9 2
7 9 3 | 5 1 2 | 4 8 6
8 2 4 | 3 6 9 | 7 1 5
------+-------+------
3 4 7 | 1 9 8 | 6 5 2
1 8 2 | 6 7 5 | 4 3 9
9 3 6 | 4 2 1 | 2 4 7 ‼️

Gemini 3 Fast returns an incorrect grid, with an inconsistency in the last row.

4 1 8 | 9 2 6 | 5 7 3
2 7 5 | 4 3 1 | 9 6 8
9 6 3 | 8 5 7 | 1 2 4
------+-------+------
6 5 1 | 7 8 4 | 3 9 2
7 3 9 | 5 1 2 | 4 8 6
8 2 4 | 3 6 9 | 7 1 5
------+-------+------
3 4 7 | 2 9 8 | 6 5 1
1 9 2 | 6 4 5 | 8 3 7
5 8 6 | 1 7 3 | 2 4 9 ✅

Gemini 3 Pro solves the simple instance, thanks to extended chain-of-thought reasoning.

This test shows that a more powerful model can succeed on a simple case without having a robust logical solver.
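
The two Gemini answers above can be checked mechanically. The following is a minimal sketch, assuming the grids are transcribed exactly as shown on this slide; `violated_units` is an illustrative name, not a tool used in the experiment.

```python
def violated_units(grid):
    """Names of the rows, columns, and blocks that do not hold 1..9 exactly once."""
    digits = list(range(1, 10))
    bad = []
    for i in range(9):
        if sorted(grid[i]) != digits:
            bad.append(f"row {i + 1}")
        if sorted(grid[r][i] for r in range(9)) != digits:
            bad.append(f"column {i + 1}")
        # Blocks are numbered 1..9 from left to right, top to bottom.
        br, bc = 3 * (i // 3), 3 * (i % 3)
        block = [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        if sorted(block) != digits:
            bad.append(f"block {i + 1}")
    return bad


GEMINI_FAST = [
    [2, 1, 8, 9, 6, 3, 5, 7, 4],
    [5, 7, 3, 2, 4, 1, 9, 6, 8],
    [4, 6, 9, 8, 5, 7, 1, 2, 3],
    [6, 5, 1, 7, 8, 4, 3, 9, 2],
    [7, 9, 3, 5, 1, 2, 4, 8, 6],
    [8, 2, 4, 3, 6, 9, 7, 1, 5],
    [3, 4, 7, 1, 9, 8, 6, 5, 2],
    [1, 8, 2, 6, 7, 5, 4, 3, 9],
    [9, 3, 6, 4, 2, 1, 2, 4, 7],
]
GEMINI_PRO = [
    [4, 1, 8, 9, 2, 6, 5, 7, 3],
    [2, 7, 5, 4, 3, 1, 9, 6, 8],
    [9, 6, 3, 8, 5, 7, 1, 2, 4],
    [6, 5, 1, 7, 8, 4, 3, 9, 2],
    [7, 3, 9, 5, 1, 2, 4, 8, 6],
    [8, 2, 4, 3, 6, 9, 7, 1, 5],
    [3, 4, 7, 2, 9, 8, 6, 5, 1],
    [1, 9, 2, 6, 4, 5, 8, 3, 7],
    [5, 8, 6, 1, 7, 3, 2, 4, 9],
]

print(violated_units(GEMINI_FAST))  # includes "row 9"
print(violated_units(GEMINI_PRO))   # []
```

For the Gemini 3 Fast grid, the last row is the only row reported, but several columns and blocks are flagged as well: the rows look plausible one by one while the global constraints are broken.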

Hard Instance Experiment (Cytopia)

Gemini 3 Pro (reasoning) attempts a difficult Sudoku:

the instance requires a Y-Wing to solve;
reasoning time doubled to obtain this result (~3 minutes).

. . . | . 3 . | . . 6
. 1 . | . 5 . | . . .
. . . | . . 9 | 4 . 3
------+-------+------
8 . . | . . . | 5 . .
. . 4 | . . 2 | 1 . 7
6 . 5 | . . . | . . 4
------+-------+------
1 . 9 | 3 . . | . . .
. . . | . 4 . | . 3 .
4 . 2 | . 8 1 | . . .

Instance given to the LLM.

7 4 8 | 2 3 1 | 9 5 6
2 1 3 | 8 5 6 | 7 4 9
5 9 6 | 7 2 9 | 4 1 3
------+-------+------
8 2 1 | 4 6 7 | 5 9 3
3 9 4 | 5 1 2 | 6* 8 7
6 7 5 | 9 8 3 | 1 2 4
------+-------+------
1 5 9 | 3 7 8 | 2 6 4
9 8 7 | 6 4 5 | 3 1 2
4 3 2 | 1 9 4 | 8 7 5

* Given value changed by the model: the instance has 1 in this cell.

Initiating Grid Analysis

I've begun analyzing the Sudoku grid, identifying the user's objective and input format in French.

Defining Solution Strategy

I've determined text output is the best approach, and I'm starting to build a solving algorithm.

Analyzing Number Placement

I've just made significant deductions on number placement, specifically for '4' and '1'.

I've determined that R7C8 must be '4', and further analyzed Box 9 after.

Analyzing Number Placement (Again)

I've successfully placed '1' at R4C3, and initiated further deductions about other numbers.

Deducing Candidate Values

I've determined some missing numbers and narrowed potential candidates within specific rows and boxes.

Analyzing Candidate Constraints

I've determined possible values and exclusions for Row 7, and reexamined column and box 6 for restrictions.

Analyzing Number Placement (Again)

I've determined some new possible values, applying deductions across rows, columns, and boxes.

Deducing Possible Values

I've determined possible locations for '7', '8', and missing values in Row 5, and I'm examining Box 5.

Analyzing Number Placement (Again)

I've determined R5C4 must be '5' using column constraints and deduced candidates for row and box 5.

Analyzing Number Placement (Again)

I've determined some new possible locations for numbers, revisiting existing areas of interest, like Boxes 2, 4 and 7.

Confirming Number Placement

I've successfully placed '4' at R1C2 using box and column constraints, and I'm further examining 4's grid placement.

Refining Value Placement

I've determined possible candidate locations for '1' in Boxes 3 and 9, based on exclusions.

Confirming New Placement

I've determined R8C9 is '1', which I verified and used to begin Box 3 deductions.

Deducing New Values

I've determined R6C7 is '3', and I'm updating Row 6, finding the placement for missing numbers.

Analyzing Pair Deduction

I've determined a strong candidate pair in Box 4, using row and column analysis.

I've deduced that R5C1 and R5C2 are {3, 9} and R4C2 and R6C2 are {2, 7} within Box 4, due to missing values.

Confirming Box Values

I've determined that R5C5 is '6' and R5C8 is '8', and I'm updating deductions.

Refining Grid Analysis

I've determined possible locations for 5 and 9 within Box 9, using existing constraints.

Refining Candidate Pairs

I've determined a hidden pair of {1, 5} exists at R1C8 and R3C8 within Box 3, by exclusions and constraints.

Refining Row Analysis

I'm now focusing on Box 3, specifically on constraints within Rows 1, 2, and 3, to place '8'.

Confirming Col Deduction

I've determined R9C2 is '3', R5C2 is '9', and R5C1 is '3', significantly progressing column and box analysis.

Refining Placement Analysis

I've determined possible locations for 9 within column 7, and reexamined column and box constraints.

Reassessing Row Analysis

I've narrowed down candidate values for Col 1 and reviewed locations of missing numbers in Box 7.

Deducing New Placements

I've determined possible locations for missing values within Columns 3 and 5, based on current exclusions.

Confirming Number Placement

I've determined that R4C6 is '3', R4C4 is '4', and reexamined candidate placements within Box 5.

Gemini chain of thought

Thus, even a more powerful model fails to solve a more complicated Sudoku instance.
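
One symptom of this failure, the modified initial value, is easy to detect automatically. The sketch below compares an answer with the instance it was supposed to solve; `changed_givens` is a hypothetical helper, and the grids are assumed to be transcribed exactly as shown above (0 marks an empty cell).

```python
def changed_givens(puzzle, answer):
    """List (row, column, given, answered) for each given digit the answer overwrites.

    Rows and columns are reported 1-indexed, as in the chain of thought (R5C7).
    """
    return [
        (r + 1, c + 1, puzzle[r][c], answer[r][c])
        for r in range(9)
        for c in range(9)
        if puzzle[r][c] != 0 and puzzle[r][c] != answer[r][c]
    ]


HARD = [
    [0, 0, 0, 0, 3, 0, 0, 0, 6],
    [0, 1, 0, 0, 5, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 9, 4, 0, 3],
    [8, 0, 0, 0, 0, 0, 5, 0, 0],
    [0, 0, 4, 0, 0, 2, 1, 0, 7],
    [6, 0, 5, 0, 0, 0, 0, 0, 4],
    [1, 0, 9, 3, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 4, 0, 0, 3, 0],
    [4, 0, 2, 0, 8, 1, 0, 0, 0],
]
ANSWER = [
    [7, 4, 8, 2, 3, 1, 9, 5, 6],
    [2, 1, 3, 8, 5, 6, 7, 4, 9],
    [5, 9, 6, 7, 2, 9, 4, 1, 3],
    [8, 2, 1, 4, 6, 7, 5, 9, 3],
    [3, 9, 4, 5, 1, 2, 6, 8, 7],
    [6, 7, 5, 9, 8, 3, 1, 2, 4],
    [1, 5, 9, 3, 7, 8, 2, 6, 4],
    [9, 8, 7, 6, 4, 5, 3, 1, 2],
    [4, 3, 2, 1, 9, 4, 8, 7, 5],
]

for row, col, given, answered in changed_givens(HARD, ANSWER):
    print(f"R{row}C{col}: given {given}, answered {answered}")
```

Run on the grids as transcribed here, the result includes the starred cell: row 5, column 7, where the given 1 became 6.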

Extended Experiments

The study by Seely et al. (2025) tests 100 Sudokus of sizes \(4×4\), \(6×6\), and \(9×9\) with several models:

Model (LLM)	\(4×4\)	\(6×6\)	\(9×9\)
o3‑mini‑high	\(73.3 \%\)	\(6.7 \%\)	\(2.9 \%\)
Gemini 2.5 Pro	\(60.0 \%\)	\(13.3 \%\)	\(0.0 \%\)
GPT‑4.1	\(13.3 \%\)	\(0.0 \%\)	\(0.0 \%\)
Qwen-*	\(40-53 \%\)	\(0.0 \%\)	\(0.0 \%\)

The best performances remain far from general reliability, even on 4×4 grids.

Why These Results?

LLMs are trained through statistical induction on large text corpora: they predict the most probable continuation.
Sudoku, by contrast, requires a systematic and deductive exploration of constraints across the whole grid.
Without an explicit logical inference mechanism, a model can lose global coherence and hallucinate a solution.
Reasoning techniques such as chain-of-thought sometimes improve performance, but they remain far from symbolic solvers.
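
For contrast, a symbolic solver fits in a few lines. The sketch below is a plain backtracking search that always fills the most constrained empty cell first. It is an illustration under assumptions, not a solver referenced in the talk: the names `candidates` and `solve` and the 0-for-empty encoding are chosen here.

```python
def candidates(grid, row, col):
    """Digits not yet used in the cell's row, column, or 3x3 block."""
    used = set(grid[row])
    used |= {grid[r][col] for r in range(9)}
    top, left = 3 * (row // 3), 3 * (col // 3)
    used |= {grid[r][c] for r in range(top, top + 3) for c in range(left, left + 3)}
    return set(range(1, 10)) - used


def solve(grid):
    """Fill `grid` in place by backtracking; return True if a solution exists."""
    best = None  # (row, col, options) of the empty cell with the fewest options
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                options = candidates(grid, r, c)
                if best is None or len(options) < len(best[2]):
                    best = (r, c, options)
    if best is None:
        return True  # no empty cell left: every placement respected the rules
    r, c, options = best
    for digit in sorted(options):
        grid[r][c] = digit
        if solve(grid):
            return True
    grid[r][c] = 0  # undo and let the caller try another digit
    return False


# The simple instance from the experiments (0 marks an empty cell).
SIMPLE = [
    [0, 1, 8, 0, 0, 0, 0, 7, 0],
    [0, 7, 0, 0, 0, 1, 9, 0, 0],
    [0, 6, 0, 8, 5, 0, 1, 2, 0],
    [6, 0, 0, 7, 0, 0, 3, 0, 0],
    [7, 0, 0, 5, 1, 0, 0, 8, 0],
    [8, 0, 4, 0, 0, 9, 7, 0, 5],
    [0, 4, 7, 0, 9, 8, 0, 5, 0],
    [0, 0, 2, 6, 0, 5, 0, 3, 0],
    [0, 0, 6, 0, 0, 0, 2, 4, 0],
]

grid = [row[:] for row in SIMPLE]
solved = solve(grid)
for row in grid:
    print(*row)
```

Every digit is placed only if the constraints allow it, and a dead end triggers an explicit undo. When the search returns True, the grid is correct by construction, which is the guarantee a purely statistical completion lacks.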

Conclusion

Sudoku illustrates a structural limitation of modern AIs when they rely only on statistical induction.
Good use of these tools therefore requires maintaining a critical mindset toward their answers.
Symbolic solvers remain much more reliable for guaranteeing a correct solution.
This observation naturally points toward hybrid systems better equipped for reasoning.

*Le penseur* keeping a critical mind in the face of all these LLMs – *Original photo Wikipedia*

Neuro-Symbolic Perspective

Current research explores architectures that combine neural networks and logical modules.
The goal is to inject formal constraints, make decisions more explainable, and improve robustness.
This promising direction seeks to combine the best of both worlds: statistical learning and deductive reasoning.
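
One simple way to picture the injection of formal constraints is to let a symbolic filter veto the proposals of a statistical model. The sketch below is only an illustration of that idea, not an architecture from the literature: `constrained_choice` is a hypothetical helper and the scores are invented.

```python
def constrained_choice(scores, legal):
    """Pick the highest-scoring digit among those the constraints allow."""
    allowed = {digit: s for digit, s in scores.items() if digit in legal}
    if not allowed:
        return None  # the symbolic side rejects every proposal
    return max(allowed, key=allowed.get)


# Hypothetical scores from a statistical model for one cell (made-up numbers).
scores = {2: 0.55, 6: 0.25, 8: 0.20}
# Digits allowed by the row, column, and block of that cell; in the simple
# instance of this talk, row 7, column 7 admits only the digit 6.
legal = {6}

print(max(scores, key=scores.get))        # 2: the unconstrained, most probable digit
print(constrained_choice(scores, legal))  # 6: the most probable digit that is also legal
```

The statistical part still ranks the options, but the final decision can be traced to an explicit rule, which is where the gain in explainability and robustness would come from.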

*Le penseur* seems relieved to learn about these perspectives. However, this thinker was itself thought up and drawn by an AI. – Gemini/Nano Banana

Thanks

Afra, for her deductions and inductions about these terms
Sofia, for her neuroscientific perspective
Hugo, for a future discussion between LLMs and neuro-symbolic AI

An article is on the way!

Thanks to the audience, mortals and immortals alike, and thank you for your questions 🤗

References

Seely, Jeffrey, Yuki Imajuku, Tianyu Zhao, Edoardo Cetin, and Llion Jones. 2025. “Sudoku-Bench: Evaluating Creative Reasoning with Sudoku Variants.” arXiv:2505.16135. https://doi.org/10.48550/arxiv.2505.16135.