Macedonian multimodal AI infrastructure

Building open-source LLM infrastructure for the Macedonian language

Large language models have given us a glimpse of AGI. Current capabilities are impressive, but unevenly distributed across languages. Together with my colleagues at Manifold, I want to encourage the Macedonian open-source ecosystem of tinkerers and builders to experiment more with LLMs in our language.

We’ve done this by releasing (multimodal) datasets with millions of QA pairs and synthetic instruction-tuning datasets, and by training open-source VLMs on them.

Status: In Progress

Skills: Data engineering, Python