publications

Publications by Ilija Lichkovski: research papers on AI safety, mechanistic interpretability, reinforcement learning, and LLM agents.

2025

  1. RegML
    EU-Agent-Bench: Measuring Illegal Behavior of LLM Agents Under EU Law
    Ilija Lichkovski, Alexander Müller, Mariam Ibrahim, and 1 more author
    In NeurIPS 2025 Workshop on Regulatable ML, 2025
  2. MechInterp
    The Anatomy of Alignment: Decomposing Preference Optimization by Steering Sparse Features
    Jeremias Ferrao, Matthijs Lende, Ilija Lichkovski, and 1 more author
    In NeurIPS 2025 Mechanistic Interpretability Workshop, 2025