klibs-io
Context
This project was started outside of JetBrains as a PoC and was never meant to be a serious project.
Initially, the website pages were served by the backend (SSR with Thymeleaf), but before the project
was transferred to JetBrains, it was re-implemented as an independent backend with a REST API.
Last commit with SSR: f4e46939.
While most of the project was refactored to look consistent and more like a decent codebase, you may still see traces of the early PoC stage with cut corners, hacks and TODOs.
Modules
The structure of the project tries to follow the "module by feature" approach, encapsulating distinct parts of the app in separate modules.
- app - the main server module. Contains high-level configurations for the whole app. Serves as glue for all other modules. Runnable.
Core modules represent the essential parts of the app:
- core/package - Maven packages (artifacts). For example, kotlinx-coroutines-core
- core/project - The most high-level aggregating entity. Maps to an scm-repository, consists of a number of packages.
- core/scm-owner - Owners of scm-repository entities, be it organizations or individual authors. For example, github.com/Kotlin
- core/scm-repository - (Git) repositories of project entities. For example, github.com/Kotlin/kotlinx.coroutines.
- core/search - Search functionality across all data (projects, packages, owners, repositories)
Third party integrations reside separately:
- integrations/ai - Integration with OpenAI. For example, for generating descriptions of libraries.
- integrations/github - Integration with GitHub to collect info for scm-repository and scm-owner
- integrations/maven - Integration with Maven Central to scan for new package entities.
Profiles & configuration properties
Spring profiles are used to run the app in different environments. Profile-specific configuration properties can be found in app/src/main/resources.
The prod profile is used to run the app in production; it hides or restricts some debug utilities. A separate local
profile can be used for testing the app locally.
Note: application-prod.yml is just a template. It lists the configuration properties that must be set for the app to work. Adapt it and/or use externalized configuration if necessary.
Scanning of Maven Central and indexing of packages can be enabled/disabled by setting the klibs.indexing property.
Files on disk
This app needs to store files on disk:
- Cache of requests to GitHub's API, managed by OkHttp. Helps with avoiding rate limits. Configuration property:
klibs.integration.github.cache.request-cache-path.
- README files from GitHub, both in Markdown and in HTML. These are stored in S3. Configuration properties:
klibs.readme.mode, klibs.readme.cache-dir, klibs.readme.s3.bucket-name, klibs.readme.s3.prefix.
Build & Run
Boot Jar
Run
./gradlew bootJar
Output: app/build/libs/app.jar
Run locally
Note: we use Docker Compose to run the app locally, so you need to have Docker installed.
You can run the main function from Application; it will load the Spring Boot application with the local profile.
Troubleshooting
In case of problems, check troubleshooting.md.
Endpoints
Swagger
Swagger UI is available at /api-docs/swagger-ui.html
Actuator
Spring Actuator is used.
- /actuator/health - has custom health indicators
- /actuator/info - has custom info contributors
Workflow
You can find information about the development workflow in workflow.md.
Implementation details
Indexing logic
This is by far the most confusing part of the whole backend.
The general flow:
- Check for new artifacts (published since the last check) using Maven Central's API.
- If new artifacts are available, add them to the processing queue (table package_index_request).
- Process the package indexing queue in a separate thread, one by one. If indexing of a package fails, increment
its failed_attempts. Try to process each package up to N times.
Projects and SCM owner/info are created in the process of indexing packages.
AI descriptions are generated by a separate scheduled task because OpenAI's rate limits are much lower than those of GitHub and Maven Central, so generation is significantly slower.
Information taken from GitHub (repository/owner) is also updated by a separate scheduled task,
based on github_repo.updated_at.
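The retry behaviour of the indexing queue can be illustrated with a minimal, single-threaded Kotlin sketch. It is not the project's actual code: PackageIndexRequest, processQueue and the in-memory queue are hypothetical stand-ins for the package_index_request table and the real indexer, and re-queuing a failed request immediately is an assumption made only for the example.

```kotlin
// Hypothetical in-memory stand-in for a row of the package_index_request table.
data class PackageIndexRequest(
    val coordinates: String,
    var failedAttempts: Int = 0,
)

// Processes the queue one request at a time. A request that fails is re-queued
// until it has failed maxAttempts times (the "N" in the text above).
// Returns the coordinates of the packages that were indexed successfully.
fun processQueue(
    requests: List<PackageIndexRequest>,
    maxAttempts: Int,
    index: (PackageIndexRequest) -> Unit,
): List<String> {
    val queue = ArrayDeque(requests)
    val indexed = mutableListOf<String>()
    while (queue.isNotEmpty()) {
        val request = queue.removeFirst()
        try {
            index(request)
            indexed += request.coordinates
        } catch (e: Exception) {
            request.failedAttempts++
            // Assumption: retry right away; the real app may wait for the next pass.
            if (request.failedAttempts < maxAttempts) queue.addLast(request)
        }
    }
    return indexed
}

fun main() {
    val flaky = PackageIndexRequest("org.example:flaky:1.0")
    val broken = PackageIndexRequest("org.example:broken:1.0")
    val indexed = processQueue(listOf(flaky, broken), maxAttempts = 3) { request ->
        // Simulated indexer: "flaky" succeeds on its second attempt, "broken" never does.
        if (request === broken || request.failedAttempts == 0) error("indexing failed")
    }
    println(indexed)
    println(broken.failedAttempts)
}
```

In this sketch a request that keeps failing simply drops out of the queue once failed_attempts reaches the limit, while its counter stays available for inspection.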
Full Text Search
At the moment, PostgreSQL's Full Text Search is used for FTS.
All relevant data is aggregated in a single materialized view project_index, which is updated periodically and is
used for search queries. While it gets the job done, it leaves a lot to be desired.
At some point, FTS might need to be re-implemented to use Solr / Elasticsearch or something similar. Code-wise,
it shouldn't be too difficult because all search-related logic is contained in the search module, so hopefully
it's just a matter of re-implementing SearchRepository.
This is probably the biggest technical task (the rest of the tech debt is less scary).
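To show why keeping search behind SearchRepository makes a backend swap manageable, here is a small Kotlin sketch. The method signature, the ProjectHit type, SearchService and the in-memory implementation are all hypothetical, and the descriptions are sample data. The real interface in the search module may look different.

```kotlin
// Hypothetical shape of the search abstraction; the real SearchRepository
// in the search module may expose different methods and types.
data class ProjectHit(val name: String, val description: String)

interface SearchRepository {
    fun search(query: String, limit: Int): List<ProjectHit>
}

// Toy backend standing in for a real engine. A PostgreSQL-, Solr- or
// Elasticsearch-backed class would implement the same interface.
class InMemorySearchRepository(private val projects: List<ProjectHit>) : SearchRepository {
    override fun search(query: String, limit: Int): List<ProjectHit> {
        val terms = query.lowercase().split(" ").filter { it.isNotBlank() }
        return projects
            .filter { project ->
                val text = "${project.name} ${project.description}".lowercase()
                terms.all { it in text } // every query term must occur somewhere
            }
            .take(limit)
    }
}

// Callers depend only on the interface, so the backend can be replaced
// without touching them.
class SearchService(private val repository: SearchRepository) {
    fun findProjects(query: String): List<String> =
        repository.search(query, limit = 10).map { it.name }
}

fun main() {
    val repository = InMemorySearchRepository(
        listOf(
            ProjectHit("kotlinx.coroutines", "Library support for Kotlin coroutines"),
            ProjectHit("ktor", "Framework for building asynchronous servers and clients"),
        )
    )
    println(SearchService(repository).findProjects("kotlin coroutines"))
}
```

Under this design, moving to another engine would mean writing one new SearchRepository implementation and wiring it in, provided the rest of the code really does go through the interface.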
How to update JVM version
Three places need to be updated:
- Build logic module toolchain version: build.gradle.kts
- Toolchain version in the base JVM convention plugin: klibs.kotlin-jvm.gradle.kts
- Gradle daemon JVM version. Update the JVM version in the updateDaemonJvm task in build.gradle.kts, then run the
updateDaemonJvm task:
./gradlew updateDaemonJvm
Gradle Build Scans
Gradle Build Scans can provide insights into a klibs.io backend build. JetBrains runs a Gradle Develocity server that can be used to automatically upload reports.
To opt in automatically, add the following to $GRADLE_USER_HOME/gradle.properties:
io.klibs.build.scan.enabled=true
# optionally provide a username that will be attached to each report
io.klibs.build.scan.username=John Wick
Also, you need to create an access key:
./gradlew provisionDevelocityAccessKey
A Build Scan may contain identifiable information. See the Terms of Use: https://gradle.com/legal/terms-of-use/
