
Kotlin-bench

Can Language Models Resolve Real-world Kotlin & Android Issues?


[Apr. 3, 2025]: The Firebender team introduced Kotlin-bench in a blog post.

👋 Overview

Kotlin-bench is a spinoff of SWE-bench and the first benchmark to evaluate large language models (LLMs) and AI agents on real-world Kotlin and Android software engineering tasks. It contains 100 such tasks.

Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.
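The task contract can be sketched in Python, the language of the repository's tooling. The field names below (`instance_id`, `repo`, `base_commit`, `problem_statement`) follow SWE-bench's conventions and are an assumption here; check the dataset card for Kotlin-bench's actual schema. `Task` and `build_prompt` are hypothetical helpers for illustration, not part of the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical schema modeled on SWE-bench; Kotlin-bench's real fields may differ.
    instance_id: str
    repo: str
    base_commit: str
    problem_statement: str
    # Assumed: file path -> file contents handed to the model as context.
    context_files: dict[str, str] = field(default_factory=dict)


def build_prompt(task: Task) -> str:
    """Assemble an issue-plus-code prompt that asks the model for a unified diff."""
    parts = [
        f"Repository: {task.repo} @ {task.base_commit}",
        "Issue:",
        task.problem_statement.strip(),
    ]
    # Sort paths so the prompt is deterministic across runs.
    for path, source in sorted(task.context_files.items()):
        parts.append(f"--- {path} ---")
        parts.append(source.rstrip())
    parts.append("Respond with a unified diff that resolves the issue.")
    return "\n\n".join(parts)
```

For example, `build_prompt(Task("demo-1", "example/app", "abc123", "Crash when list is empty", {"app/src/main/kotlin/Main.kt": "fun main() {}\n"}))` yields a prompt with the issue text followed by the Kotlin file; the model's reply is the candidate patch.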

🚀 Set Up

To build Kotlin-bench from source, follow these steps:

Clone this repository locally.
cd into the repository.
Run pipenv shell to create and activate a Python environment.
Install dependencies: pip install -r requirements.txt

💽 Usage

To use Kotlin-bench, you can:

Run Kotlin-bench's data collection procedure on your own repositories to create new Kotlin-bench tasks.
Evaluate models against Kotlin-bench: given a Kotlin-bench task and a model-proposed solution, check whether the solution is correct.
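SWE-bench counts a task as resolved only if the tests that failed before the fix now pass (FAIL_TO_PASS) and the tests that already passed still pass (PASS_TO_PASS). The sketch below assumes Kotlin-bench keeps that rule. The test-name lists and the `results` mapping (test name to pass/fail, for instance parsed from Gradle's JUnit XML test reports) are hypothetical inputs, and `is_resolved` / `resolve_rate` are illustrative helpers, not the repository's API.

```python
from collections.abc import Iterable, Mapping


def is_resolved(
    results: Mapping[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """True when every target test passes and no previously passing test regresses.

    A test missing from `results` (e.g. it did not compile or was not run)
    is treated as a failure.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(results.get(name, False) for name in required)


def resolve_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of tasks resolved; returns 0.0 for an empty run."""
    values = list(outcomes)
    return sum(values) / len(values) if values else 0.0
```

Treating missing tests as failures is a deliberately strict choice: a patch that breaks compilation of the test module should not count as a fix.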

⬇️ Downloads

Datasets
🤗 Kotlin-bench
🤗 Kotlin-bench w/ Full file rewrite + "Oracle" Retrieval Context
🤗 Kotlin-bench w/ Patch diff + "Oracle" Retrieval Context
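The two "Oracle" variants appear to differ in the form of the model's answer: a full rewrite of each edited file, or a patch diff. (In SWE-bench, "oracle" retrieval supplies the files that the reference patch edits; the same is presumably true here.) A full-file rewrite can be turned into a git-style unified diff with Python's standard `difflib`, as sketched below. Whether Kotlin-bench converts rewrites this way is an assumption, and `rewrite_to_patch` is a hypothetical helper.

```python
import difflib


def rewrite_to_patch(path: str, original: str, rewritten: str) -> str:
    """Convert a full-file rewrite of `path` into a git-style unified diff.

    difflib does not emit git's "\\ No newline at end of file" marker,
    so both inputs should end with a newline.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        rewritten.splitlines(keepends=True),
        fromfile=f"a/{path}",  # a/ and b/ prefixes mimic `git diff` output
        tofile=f"b/{path}",
    )
    return "".join(diff)
```

An unchanged file produces an empty string, which makes it easy to drop files the model rewrote without modifying.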

💫 Contributions

We would love to hear from members of the Kotlin & Android community who are interested in contributing!

Join our Discord community for fast responses.

Feel free to email me directly at aman@firebender.com.

Citations

@inproceedings{
    jimenez2024swebench,
    title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?},
    author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=VTF8yNQM66}
}

🪪 License

MIT. Check LICENSE.md.
