Kotlin Incremental Build Caching Design Doc

Date: Nov 8, 2020
Tags: gradle

Objective

Make Kotlin's incremental build artifacts cacheable in order to increase the ratio of incremental builds to full rebuilds and to speed up builds in general. A general solution to the problem is a non-goal; we are aiming for a ByteDance-specific solution.

Background

In the official Kotlin Gradle Plugin, the outputs of Kotlin's incremental build caches (not to be confused with Gradle's build cache) are annotated with @get:LocalState, which means that they are removed when the task's outputs are fetched from the build cache and are not used for up-to-date checks.
As a result, every time Kotlin build artifacts are fetched from the build cache, the Incremental Build Cache is wiped and the next build will be a full rebuild rather than an incremental build, increasing build times.
Previous work looked into making Kotlin's Incremental Build Cache cacheable in Gradle's build cache, but it was left incomplete due to time constraints.

Overview

We will make the build outputs under the caches-jvm directory cacheable. The only currently known reason the data cannot be relocated (moved to another directory) is that its mappings contain absolute paths to source files. The simple solution is to translate these absolute paths into paths relative to the project root. This is likely not a correct general solution, but it should work for the hierarchical project structures that are common at ByteDance.
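To see why absolute paths block relocation, consider a single path-keyed mapping. The sketch below is a hypothetical Java model (the class PathKeyedCache and the example paths are illustrative, not Kotlin's actual on-disk cache format): an entry written from one checkout root is looked up from a checkout at another root. Keyed by absolute path the lookup misses; keyed by the path relative to the project root it hits.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one incremental-cache mapping (source file -> value).
// Kotlin's real caches are binary on-disk maps; this only illustrates how keys are formed.
class PathKeyedCache {
    private final Map<String, String> entries = new HashMap<>();
    private final boolean relativeKeys;

    PathKeyedCache(boolean relativeKeys) {
        this.relativeKeys = relativeKeys;
    }

    // The compiler sees absolute source files; the stored key is either that absolute
    // path (current behavior) or the path relative to the project root (proposed).
    private String key(Path projectRoot, Path sourceFile) {
        return relativeKeys ? projectRoot.relativize(sourceFile).toString() : sourceFile.toString();
    }

    void put(Path projectRoot, Path sourceFile, String value) {
        entries.put(key(projectRoot, sourceFile), value);
    }

    String get(Path projectRoot, Path sourceFile) {
        return entries.get(key(projectRoot, sourceFile));
    }
}
```

Writing an entry with root /home/alice/repo and reading it back with root /ci/workspace/repo for the same relative source file returns null in absolute mode and the stored value in relative mode.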

Data Abstraction

Kotlin's compilation flow under Gradle is split between two components: the Gradle Plugin, which sets up the build, and the Compiler, which runs as a daemon process. The Compiler writes and reads the Kotlin Incremental Build Cache to determine which files need to be compiled, iterating until all "dirty" symbols are updated. The Gradle Plugin also reads a small subset of the Incremental Build Cache to determine the dirty symbols and files that the Compiler should compile first, saving some iterations of the compilation loop.
The Compiler is meant to be build-tool agnostic so that it works with a variety of build systems. It does not always achieve this, since some internal code references specific build tools, but tool agnosticism is the prevailing design strategy. Because of this, the Compiler knows little about Gradle-specific settings such as project locations, and the Gradle Plugin will need to pass that information in. This is also a red flag for upstreaming, as it injects additional tool-specific information into the Compiler.
We will need to modify the Java RMI API between the Plugin and the Compiler daemon so that it also passes the Root Project Path that the Plugin is using; the Compiler needs this path to create its relative paths.
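A minimal sketch of the RMI change, assuming the root path travels inside a serializable options object. IncrementalCompilationOptionsSketch, CompileServiceSketch and RmiMarshallingCheck are hypothetical names loosely modeled on the daemon's API, not the real Kotlin daemon signatures. Because RMI passes non-remote arguments by value using Java serialization, a serialization round trip is a cheap check that the new field actually reaches the daemon.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical options object sent from the Gradle Plugin to the Compiler daemon.
// Only the fields relevant here are shown; rootProjectDir is the new one.
class IncrementalCompilationOptionsSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    final File rootProjectDir; // new: lets the Compiler build project-relative paths
    final File cachesDir;      // where caches-jvm lives for this task

    IncrementalCompilationOptionsSketch(File rootProjectDir, File cachesDir) {
        this.rootProjectDir = rootProjectDir;
        this.cachesDir = cachesDir;
    }
}

// Hypothetical remote interface; every RMI method must declare RemoteException.
interface CompileServiceSketch extends Remote {
    int compile(String[] compilerArgs, IncrementalCompilationOptionsSketch options) throws RemoteException;
}

class RmiMarshallingCheck {
    // Serialize and deserialize the options, as RMI would when marshalling the call.
    static IncrementalCompilationOptionsSketch roundTrip(IncrementalCompilationOptionsSketch options) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(options);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IncrementalCompilationOptionsSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("options are not serializable", e);
        }
    }
}
```

Carrying the root in an options object, rather than adding a bare parameter to each remote method, keeps the interface change to one place if more Gradle-side context is needed later.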

Mechanism to Translate Absolute and Relative Paths

The Plugin and the Compiler live in the same Kotlin Git repository, so they share a lot of code. This includes a FileToPathConverter interface that many of the caches use to translate File instances into storable String paths. We will use and expand this mechanism with a converter that outputs paths relative to a given root project path. Kotlin already has such an implementation, RelativeFileToPathConverter, so rather than writing our own class we will reuse it and expand its usage to cover all incremental caches that store absolute paths.
This should ensure that all paths written to the cache files are relative, and therefore relocatable. However, it relies on us correctly translating every relevant cache, which is error-prone and a maintenance burden.
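The relative-path conversion can be sketched as a pair of functions mapping a File to a stored String and back. This is a hypothetical Java mirror of the shape of FileToPathConverter, not the actual RelativeFileToPathConverter; in particular, the $PROJECT_DIR$ marker and the fallback to absolute paths for files outside the root are assumptions of this sketch.

```java
import java.io.File;
import java.nio.file.Path;

// Hypothetical Java mirror of the shape of Kotlin's FileToPathConverter:
// turn a File into a String for storage, and a stored String back into a File.
interface FileToPathConverterSketch {
    String toPath(File file);

    File toFile(String path);
}

// Hypothetical relative converter. Files under the root project are stored as
// "$PROJECT_DIR$/<relative path>"; anything else keeps its absolute path.
class RelativeConverterSketch implements FileToPathConverterSketch {
    static final String PROJECT_DIR = "$PROJECT_DIR$"; // marker assumed for this sketch

    private final Path root;

    RelativeConverterSketch(File rootProjectDir) {
        this.root = rootProjectDir.toPath().toAbsolutePath().normalize();
    }

    @Override
    public String toPath(File file) {
        Path p = file.toPath().toAbsolutePath().normalize();
        if (!p.startsWith(root)) {
            return p.toString(); // outside the project: stays absolute, so not relocatable
        }
        // Use '/' so the stored entry is the same on every OS.
        String relative = root.relativize(p).toString().replace(File.separatorChar, '/');
        return PROJECT_DIR + "/" + relative;
    }

    @Override
    public File toFile(String path) {
        String prefix = PROJECT_DIR + "/";
        if (!path.startsWith(prefix)) {
            return new File(path); // an absolute entry written for a file outside the project
        }
        return root.resolve(path.substring(prefix.length())).toFile();
    }
}
```

A path written by a converter rooted at one checkout and read by a converter rooted at another resolves to the same source file in the new location. The marker lets readers tell relative entries from absolute ones, so files outside the project still round-trip, although those entries remain non-relocatable.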

Make Cacheable in Gradle

Currently, the caches-jvm directory of the Compiler's Incremental Build Cache is marked as LocalState, which means that its data is deleted when the Task's outputs are fetched from the Gradle Build Cache.
To make it cacheable in the Gradle Build Cache, we only need to change taskBuildDirectory's annotation from LocalState to OutputDirectory. This tells Gradle to store caches-jvm in the task's Gradle Build Cache entry, and standard Gradle caching behavior handles the rest.
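A sketch of the annotation change on a Gradle task, assuming a simplified task class written in Java against the Gradle API with abstract managed properties (supported in recent Gradle versions). SketchKotlinCompile and its sources property are hypothetical; the real KotlinCompile task is written in Kotlin and declares many more inputs. The fragment needs the Gradle API on the classpath and is not runnable on its own.

```java
import org.gradle.api.DefaultTask;
import org.gradle.api.file.ConfigurableFileCollection;
import org.gradle.api.file.DirectoryProperty;
import org.gradle.api.tasks.CacheableTask;
import org.gradle.api.tasks.InputFiles;
import org.gradle.api.tasks.OutputDirectory;
import org.gradle.api.tasks.PathSensitive;
import org.gradle.api.tasks.PathSensitivity;
import org.gradle.api.tasks.TaskAction;

@CacheableTask
public abstract class SketchKotlinCompile extends DefaultTask {

    // Only paths relative to the project go into the cache key (assumption of this sketch).
    @InputFiles
    @PathSensitive(PathSensitivity.RELATIVE)
    public abstract ConfigurableFileCollection getSources();

    // Before: @LocalState. Gradle leaves it out of the cache entry and deletes it
    // when the task's outputs are restored from the build cache.
    // After: @OutputDirectory. Gradle packs it into the cache entry and restores it on a hit.
    @OutputDirectory
    public abstract DirectoryProperty getTaskBuildDirectory();

    @TaskAction
    public void compile() {
        // Invoke the compiler here; it reads and writes caches-jvm under getTaskBuildDirectory().
    }
}
```

Declaring sources with PathSensitivity.RELATIVE keeps absolute checkout locations out of the cache key, which is what lets a copy of the project in a different directory hit the same cache entry; the relative-path conversion inside caches-jvm is what makes the restored contents usable there.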

Brief Description of the New Cache Behavior

For code that has no entry in the Gradle Build Cache yet, behavior is the same as before: the task performs a clean or incremental build depending on the current state of the file system. The new behavior is that the Incremental Build Cache is also uploaded to the Gradle Build Cache, so subsequent fetches for this exact set of code also restore the Incremental Build Cache state.
For code whose Task outputs have already been uploaded to the Build Cache, both the outputs and the Incremental Build Cache are fetched and made available for future builds, so the next build after the fetch will always be incremental. Previously, the next build would only be incremental if a clean had not been performed.

Testing

We will run Kotlin's tests and make sure that every test that passes on the base version (1.3.100-douyin) also passes on the new version. This is easier said than done: 1.3.100-douyin already has many failing tests, there does not appear to be an easy way to run all tests (so we have to enumerate Kotlin's various test tasks individually), and a full test run takes over an hour.
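Because the base already has failures, the pass criterion is "no test that passed on the base fails on the new version" rather than "all tests pass". A minimal sketch of that comparison, assuming the names of passing tests have already been collected from each run (gathering them, for example from test reports, is out of scope here); TestRegressionCheck is a hypothetical helper.

```java
import java.util.Set;
import java.util.TreeSet;

// Compares passing-test sets from the base fork and the modified build.
class TestRegressionCheck {
    // Tests that passed on the base but did not pass on the candidate
    // (they failed, errored, or were not run at all). Sorted for readable reports.
    static Set<String> regressions(Set<String> passedOnBase, Set<String> passedOnCandidate) {
        Set<String> result = new TreeSet<>(passedOnBase);
        result.removeAll(passedOnCandidate);
        return result;
    }
}
```

An empty result means the change introduced no regressions; tests that already failed on the base are ignored, and tests that newly pass do not mask a regression elsewhere.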
In addition, we will perform a series of local tests with copies of the code in different locations, all using the same local disk cache. We will verify that the same code in a different location fetches the existing build cache entry, and that this entry then allows a correct incremental build.
  1. Populate the Gradle build cache with artifacts from a build.
  2. Make copies of the code and point them all at the same Gradle build cache.
  3. Have each copy perform a build and verify that the build artifacts are fetched from the Gradle build cache instead of being rebuilt.
  4. Make an incremental change in each copy, perform a build, and verify that an incremental build was performed and that the resulting cache contents have grown and are identical across copies.
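Step 3 can be checked mechanically. With --console=plain, Gradle prints one "> Task" line per task, followed by an outcome label such as FROM-CACHE or UP-TO-DATE, or no label when the task actually executed. The hypothetical helper below extracts that label for a given task path; it does not cover step 4, which would also need to confirm from the compiler's logs that the follow-up build was incremental.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reads Gradle plain-console output and reports the outcome printed for one task.
class TaskOutcomeCheck {
    // Returns the label (e.g. "FROM-CACHE"), "EXECUTED" if the task ran with no label,
    // or null if the task does not appear. If it appears several times, the last one wins.
    static String outcomeOf(String consoleOutput, String taskPath) {
        Pattern line = Pattern.compile(
                "^> Task " + Pattern.quote(taskPath) + "(?: ([A-Z-]+))?\\s*$",
                Pattern.MULTILINE);
        Matcher m = line.matcher(consoleOutput);
        String outcome = null;
        while (m.find()) {
            outcome = m.group(1) == null ? "EXECUTED" : m.group(1);
        }
        return outcome;
    }
}
```

For each copy made in step 2, the check passes when the Kotlin compile task's outcome is FROM-CACHE.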

Upstream to Kotlin Mainline

We want to upstream this change to Kotlin rather than keep it in a ByteDance-specific fork. However, doing so will probably require more work to enable the feature only in situations where it is known to work, and/or to disable it in situations where it won't. We do not currently know in which situations it breaks down, so this is a future avenue of research and development.
For now, several code paths are not supported, including but not limited to:
  • experimental
  • jps
  • js
  • kapt
Support for these is not required for the ./gradlew install task to run properly; that task creates the Maven entry for Kotlin that ByteDance uses. Upstreaming, however, would require proper support for them.
 

© Guang Feng 2022 - 2024