Kotlin-bench

Introduction: Kotlin-bench Can Language Models Resolve Real-world Kotlin & Android Issues?
More: Author   ReportBugs   OfficialWebsite   
Tags:

📰 Changelog

  • [Apr. 3, 2025]: Firebender team introduces Kotlin-bench in this blog post.

👋 Overview

Kotlin-bench is a spinoff of SWE-bench and is the first benchmark that evaluates Large Language Models (LLMs) and AI agents on 100 real-world Kotlin and Android software engineering tasks. It was created and maintained by the engineering team behind Firebender.

Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.

🚀 Set Up

To build Kotlin-bench from source, follow these steps:

  1. Clone this repository locally
  2. cd into the repository.
  3. Run pipenv shell to created a Python environment
  4. Install dependencies pip install -r requirements.txt

💽 Usage

To use Kotlin-bench, you can:

  • Run Kotlin-bench's data collection procedure on your own repositories, to make new Kotlin-bench tasks.
  • Evaluate models against Kotlin-bench. This is where you take a Kotlin-bench task and a model-proposed solution and evaluate its correctness.

⬇️ Downloads

Datasets
🤗 Kotlin-bench
🤗 Kotlin-bench w/ Full file rewrite + "Oracle" Retrieval Context
🤗 Kotlin-bench w/ Patch diff + "Oracle" Retrieval Context

💫 Contributions

We would love to hear from the Kotlin & Android community interested in contributing!

Join our Discord community for fast responses.

Feel free to email me directly at aman@firebender.com

Citations

@inproceedings{
    jimenez2024swebench,
    title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?},
    author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=VTF8yNQM66}
}

🪪 License

MIT. Check LICENSE.md.

What is Firebender?

Firebender, is Cursor but for Android Studio. It's the best coding agent in Android Studio and Jetbrains IDEs, with integrations into the preview window, kotlin LSP, and more. Here's a 30-second walkthrough!

Apps
About Me
GitHub: Trinea
Facebook: Dev Tools