Kotlin-bench
Kotlin-bench: Can Language Models Resolve Real-world Kotlin & Android Issues?
News
- [Apr. 3, 2025]: The Firebender team introduces Kotlin-bench in a blog post.
👋 Overview
Kotlin-bench is a spinoff of SWE-bench and is the first benchmark that evaluates Large Language Models (LLMs) and AI agents on 100 real-world Kotlin and Android software engineering tasks.
Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.
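To make the task format concrete, here is a minimal Python sketch of turning one task into a prompt. The `TaskInstance` fields (`instance_id`, `repo`, `base_commit`, `problem_statement`) are an assumption borrowed from SWE-bench's schema, and `build_prompt` is a hypothetical helper, not part of this repository; Kotlin-bench's actual fields and prompt format may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    # Hypothetical fields modeled on SWE-bench; Kotlin-bench's schema may differ.
    instance_id: str
    repo: str
    base_commit: str
    problem_statement: str


def build_prompt(task: TaskInstance, context_files: dict[str, str]) -> str:
    """Hypothetical helper: combine the issue text and selected files into one prompt."""
    parts = [
        f"Repository: {task.repo} @ {task.base_commit}",
        "Issue:",
        task.problem_statement.strip(),
    ]
    # Sort paths so the prompt is deterministic for a given set of files.
    for path, content in sorted(context_files.items()):
        parts.append(f"File: {path}")
        parts.append(content.rstrip())
    parts.append("Respond with a unified diff that resolves the issue.")
    return "\n".join(parts)


# Illustrative values only; not a real Kotlin-bench task.
task = TaskInstance(
    instance_id="example__app-1",
    repo="example/app",
    base_commit="abc123",
    problem_statement="App crashes when the screen rotates.",
)
print(build_prompt(task, {"app/src/main/kotlin/MainActivity.kt": "class MainActivity\n"}))
```

The model's reply (a patch) is what the evaluation step then applies and tests.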
🚀 Set Up
To build Kotlin-bench from source, follow these steps:
- Clone this repository locally.
- `cd` into the repository.
- Run `pipenv shell` to create a Python environment.
- Install dependencies: `pip install -r requirements.txt`
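Before installing dependencies, it can help to confirm that the interpreter you are about to use is the one `pipenv shell` activated. The check below is a generic Python sketch, not a script shipped with Kotlin-bench: `venv`-style environments set `sys.prefix` to the environment directory while `sys.base_prefix` still points at the base interpreter, and older `virtualenv` releases set `sys.real_prefix` instead.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv/virtualenv, such as one created by pipenv."""
    # Legacy virtualenv sets sys.real_prefix; venv and modern virtualenv make
    # sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"OK: using environment at {sys.prefix}")
    else:
        print("Warning: not inside a virtual environment; run `pipenv shell` first.")
```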
💽 Usage
To use Kotlin-bench, you can:
- Run Kotlin-bench's data collection procedure on your own repositories to create new Kotlin-bench tasks.
- Evaluate models against Kotlin-bench: given a Kotlin-bench task and a model-proposed patch, check whether the patch correctly resolves the issue.
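SWE-bench builds its tasks from merged pull requests that resolve a GitHub issue and also change tests. Assuming Kotlin-bench's data collection applies a similar filter (this README does not describe the exact rules), a first pass over candidate pull requests might look like the sketch below. `PullRequest`, `is_kotlin_test_file`, and `is_candidate` are hypothetical names, and the `src/test` / `src/androidTest` layout is the usual Gradle/Android convention rather than something every repository follows.

```python
import re
from dataclasses import dataclass, field

# GitHub closing keywords followed by an issue number, e.g. "Fixes #123".
ISSUE_REF = re.compile(r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#\d+", re.IGNORECASE)


@dataclass
class PullRequest:
    # Hypothetical record; a real collector would fill this from the GitHub API.
    number: int
    merged: bool
    body: str
    changed_files: list[str] = field(default_factory=list)


def is_kotlin_test_file(path: str) -> bool:
    # Gradle/Android convention: unit tests live under src/test,
    # instrumented tests under src/androidTest.
    normalized = "/" + path.lstrip("/")
    return normalized.endswith(".kt") and (
        "/src/test/" in normalized or "/src/androidTest/" in normalized
    )


def is_candidate(pr: PullRequest) -> bool:
    """Keep merged PRs that reference an issue and change both Kotlin code and Kotlin tests."""
    if not pr.merged or not ISSUE_REF.search(pr.body):
        return False
    kotlin_files = [p for p in pr.changed_files if p.endswith(".kt")]
    touches_tests = any(is_kotlin_test_file(p) for p in kotlin_files)
    touches_code = any(not is_kotlin_test_file(p) for p in kotlin_files)
    return touches_tests and touches_code


pr = PullRequest(
    101,
    True,
    "Fixes #42: crash on rotation",
    ["app/src/main/java/com/example/Main.kt", "app/src/test/java/com/example/MainTest.kt"],
)
print(is_candidate(pr))  # True
```

Requiring a test change matters because the changed tests are what later decide whether a model's patch resolves the issue.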
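SWE-bench counts a task as resolved when, after the model's patch is applied, the tests that the reference fix makes pass (`FAIL_TO_PASS`) now pass and the tests that already passed (`PASS_TO_PASS`) still pass. Assuming Kotlin-bench's evaluation follows the same rule (this README does not spell it out), the scoring step reduces to the sketch below. Running the Gradle tests and collecting per-test results is out of scope here, and `is_resolved` / `resolve_rate` are hypothetical names.

```python
from typing import Mapping, Sequence


def is_resolved(
    results: Mapping[str, bool],
    fail_to_pass: Sequence[str],
    pass_to_pass: Sequence[str],
) -> bool:
    """SWE-bench-style criterion: every FAIL_TO_PASS test must now pass and every
    PASS_TO_PASS test must still pass. Tests missing from `results` count as failures."""
    return all(results.get(t, False) for t in fail_to_pass) and all(
        results.get(t, False) for t in pass_to_pass
    )


def resolve_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of tasks resolved; an empty run scores 0.0."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Treating a missing test result as a failure keeps a patch that breaks compilation (and therefore produces no results) from being counted as a success.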
⬇️ Downloads
Datasets |
---|
🤗 Kotlin-bench |
🤗 Kotlin-bench w/ Full file rewrite + "Oracle" Retrieval Context |
🤗 Kotlin-bench w/ Patch diff + "Oracle" Retrieval Context |
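The two "Oracle" dataset variants differ in what the model is asked to output: a full file rewrite or a patch diff. If you want to score full-file rewrites with patch-based tooling, one option (our assumption, not necessarily what the dataset scripts do) is to diff the rewrite against the original file with Python's standard `difflib` module. `rewrite_to_patch` is a hypothetical helper name.

```python
import difflib


def rewrite_to_patch(path: str, original: str, rewritten: str) -> str:
    """Turn a full-file rewrite into a unified diff against the original file.

    The a/ and b/ prefixes match git's convention, so the result is in the form
    `patch -p1` expects. Files without a trailing newline may need extra care.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        rewritten.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


before = 'fun greet(): String {\n    return "hi"\n}\n'
after = 'fun greet(): String {\n    return "hello"\n}\n'
print(rewrite_to_patch("app/src/main/kotlin/Greet.kt", before, after))
```

An unchanged file produces an empty string, which makes "the model changed nothing" easy to detect.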
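In SWE-bench, "oracle" retrieval means giving the model the files that the reference (gold) patch edits. Assuming Kotlin-bench uses the same definition, those files can be read off the `diff --git` headers of the gold patch. `oracle_files` is a hypothetical helper, and it assumes git-format patches whose paths contain no spaces.

```python
import re

# One "diff --git a/<path> b/<path>" header per file in a git-format patch.
# Assumes paths contain no spaces.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/", re.MULTILINE)


def oracle_files(gold_patch: str) -> list[str]:
    """Return the files touched by the reference patch, deduplicated, in first-seen order."""
    return list(dict.fromkeys(m.group(1) for m in DIFF_HEADER.finditer(gold_patch)))
```

Reading the `diff --git` header rather than the `+++ b/` line also catches deleted files, whose `+++` line is `/dev/null`.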
💫 Contributions
We would love to hear from members of the Kotlin & Android community who are interested in contributing!
Join our Discord community for fast responses.
Feel free to email me directly at aman@firebender.com.
Citations
@inproceedings{
jimenez2024swebench,
title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?},
author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=VTF8yNQM66}
}
🪪 License
MIT. See LICENSE.md.