Kotlin-bench

Kotlin-bench: Can Language Models Resolve Real-world Kotlin & Android Issues?
  • [Apr. 3, 2025]: The Firebender team introduces Kotlin-bench in a blog post.

👋 Overview

Kotlin-bench is a spinoff of SWE-bench and is the first benchmark that evaluates Large Language Models (LLMs) and AI agents on 100 real-world Kotlin and Android software engineering tasks.

Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.
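
As a rough illustration of that setup, the sketch below models a single task and assembles a prompt from it. The `TaskInstance` class, its field names (borrowed from SWE-bench conventions), and the `build_prompt` helper are all hypothetical and are not taken from the Kotlin-bench code itself.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    """Hypothetical task record; field names are assumptions modeled on SWE-bench."""
    instance_id: str
    repo: str
    base_commit: str
    problem_statement: str


def build_prompt(task: TaskInstance, code_context: str) -> str:
    """Assemble a plain-text prompt that asks the model for a patch."""
    return (
        f"Repository: {task.repo} (commit {task.base_commit})\n\n"
        f"Issue:\n{task.problem_statement}\n\n"
        f"Relevant code:\n{code_context}\n\n"
        "Produce a unified diff that resolves the issue."
    )
```

The model's reply to such a prompt is the candidate patch that later gets evaluated against the repository's tests.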

🚀 Set Up

To build Kotlin-bench from source, follow these steps:

  1. Clone this repository locally.
  2. cd into the repository.
  3. Run pipenv shell to create a Python environment.
  4. Install the dependencies with pip install -r requirements.txt.

💽 Usage

To use Kotlin-bench, you can:

  • Run Kotlin-bench's data collection procedure on your own repositories to create new Kotlin-bench tasks.
  • Evaluate models against Kotlin-bench: take a Kotlin-bench task and a model-proposed solution, then check whether the solution is correct.
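
One way to picture the correctness check is the SWE-bench convention: a task counts as resolved when the tests that the reference fix is meant to repair now pass and the previously passing tests still pass. The sketch below assumes Kotlin-bench follows the same idea; `is_resolved`, the `"PASSED"` status string, and the test-name format are illustrative assumptions, not the project's actual harness.

```python
from typing import Dict, Iterable


def is_resolved(
    test_status: Dict[str, str],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True if every required test is reported as PASSED.

    test_status maps a test name to its status after applying the model's patch.
    fail_to_pass lists tests the fix should repair (assumed convention).
    pass_to_pass lists tests that must keep passing (assumed convention).
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    # A test missing from the results is treated as not passing.
    return all(test_status.get(name) == "PASSED" for name in required)
```

For example, a patch that fixes the target test but breaks an unrelated one would not count as resolved under this rule.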

⬇️ Downloads

Datasets
🤗 Kotlin-bench
🤗 Kotlin-bench w/ Full file rewrite + "Oracle" Retrieval Context
🤗 Kotlin-bench w/ Patch diff + "Oracle" Retrieval Context

💫 Contributions

We would love to hear from anyone in the Kotlin & Android community who is interested in contributing!

Join our Discord community for fast responses.

Feel free to email me directly at aman@firebender.com

Citations

@inproceedings{
    jimenez2024swebench,
    title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?},
    author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=VTF8yNQM66}
}

🪪 License

MIT. Check LICENSE.md.
