Macedonian multimodal AI infrastructure
Building open-source LLM infrastructure for the Macedonian language
Large language models have given us a glimpse of AGI. Their current capabilities are impressive but unevenly distributed across languages. Together with my colleagues at Manifold, I aim to encourage the Macedonian open-source ecosystem of tinkerers and builders to experiment more with LLMs in our language.
We’ve done this by releasing datasets, including multimodal ones, with millions of question-answer (QA) pairs and synthetic instruction-tuning datasets, and by training open-source vision-language models (VLMs) on them.
Status: In Progress
Skills: Data engineering, Python