Compositional reasoning

Benchmarking LLM generalization with tunable difficulty axes

Can LLMs generalize out of distribution? In this work, I define a formalism, and build a framework around it, for generating text-based compositionality benchmarks with tunable difficulty axes, which can be used to assess an LLM’s generalization capacity.
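To make the idea concrete, here is a minimal sketch of what such a generator could look like. Everything below is illustrative and hypothetical: the project's actual formalism is not specified in this summary. In this toy version, tasks are compositions of simple string primitives, and the composition depth serves as one tunable difficulty axis.

```python
import random

# Hypothetical toy primitives; the real benchmark's task space is not
# described here.
PRIMITIVES = {
    "reverse": lambda s: s[::-1],
    "upper": lambda s: s.upper(),
    "double": lambda s: s + s,
}

def compose(names, arg):
    """Apply the named primitives to `arg`, left to right."""
    for name in names:
        arg = PRIMITIVES[name](arg)
    return arg

def make_task(depth, rng):
    """Sample a compositional task; larger `depth` means a harder task."""
    names = [rng.choice(sorted(PRIMITIVES)) for _ in range(depth)]
    arg = "".join(rng.choice("abcde") for _ in range(4))
    prompt = f"Apply {' then '.join(names)} to '{arg}'."
    return prompt, compose(names, arg)

prompt, answer = make_task(depth=3, rng=random.Random(0))
```

A benchmark built this way can hold the training distribution to shallow depths and evaluate at greater depths, directly probing out-of-distribution generalization along that axis.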

The hope is that such a benchmark better aligns the inductive priors of humans and LLMs, allowing for a more faithful comparison between the two.

Status: In Progress

Skills: Mathematics, Python