chapi

Project Url: phodal/chapi
Introduction: Chapi is a common language meta-information converter that converts source code in different languages into the same metadata model.

CHAPI (Common Hierarchical Abstract Parser and Information Converter) streamlines code analysis by converting source code from different languages into a unified abstract model, making cross-language analysis and tooling easier.

Chapi => Cha Pi => Tea Pi => Tea π => 茶 π. Reference: Tea if by sea, cha if by land.

Chapi (pronounced /tʃɑpi/) can also be read as “XP” in Chinese if you pronounce “X” as “叉”.

Status & language coverage

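Language stages

Languages covered: Java, Python, Go, Kotlin, TS/JS, C, C#, Scala, C++, Rust, Swift.

Features tracked per language:

  • HTTP API declaration (🆕 for three languages)
  • Syntax parsing
  • Function calls
  • Arch/package (🆕 for one language)
  • Real-world

IDL stages

IDLs covered: Protobuf, Thrift.

Features tracked per IDL:

  • Syntax parsing
  • HTTP API declaration
  • Arch/package
  • Real-world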

Projects using Chapi

  • ArchGuard — An architecture workbench for architecture governance. It can analyze architecture at container/component/code levels, create architecture fitness functions, and inspect system dependencies.
  • UnitGen — A fine-tuning data framework that generates datasets from your existing codebase.
  • ChocoBuilder — An LLM toolkit for building custom AI assistants.

PS: PRs are welcome — feel free to add your project here.

Language information

Tested language versions:

  • Java: 8, 11, 17
  • TypeScript/JavaScript
  • Kotlin
  • Rust: v1.60.0
  • Python: 2, 3
  • Swift: 5, 6 (with typed throws, async/await, actors, ownership modifiers)

Gradle modules (by tier):

// tier 1 languages
":chapi-ast-java",
":chapi-ast-typescript",

// tier 1 model language
":chapi-ast-protobuf",
":chapi-ast-thrift",

// tier 2 languages
":chapi-ast-kotlin",
":chapi-ast-go",
":chapi-ast-python",
":chapi-ast-scala",

// tier 3 languages
":chapi-ast-rust",
":chapi-ast-csharp",
":chapi-ast-c",
":chapi-ast-cpp",
":chapi-ast-swift",

// others
":chapi-parser-toml",
":chapi-parser-cmake",

Language families (refs):

Category   | Languages                                     | Planned support
C family   | C#, Java, Go, C, C++, Objective-C, Rust, ...  | C++, C, Java, C#, Rust?
Functional | Scheme, Lisp, Clojure, Scala, ...             | Scala
Scripting  | Lua, PHP, JavaScript, Python, Perl, Ruby, ... | Python, JavaScript
Other      | Fortran, Swift, Matlab, ...                   | Swift?, Fortran?

Parsing / analysis rules

Chapi scans twice to improve cross-file resolution.

  • It helps find data structures in the same package/module.
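
As a rough illustration of why a second pass helps, the sketch below first indexes every declared type by package and only then resolves bare type references against that index. It is not Chapi's implementation: `ParsedFile`, `buildIndex`, and `resolveSamePackage` are hypothetical names invented for this example.

```kotlin
// Hypothetical, simplified model of one parsed file (not a Chapi type).
data class ParsedFile(
    val packageName: String,
    val declaredTypes: List<String>,   // type names declared in this file
    val referencedTypes: List<String>  // bare type names used in this file
)

// Pass 1: index every declared type name by package.
fun buildIndex(files: List<ParsedFile>): Map<String, Set<String>> =
    files.groupBy { it.packageName }
        .mapValues { (_, group) -> group.flatMap { it.declaredTypes }.toSet() }

// Pass 2: qualify references that match a type declared in the same package.
// References that are not found in the index stay unqualified.
fun resolveSamePackage(file: ParsedFile, index: Map<String, Set<String>>): List<String> {
    val samePackage = index[file.packageName].orEmpty()
    return file.referencedTypes.map { name ->
        if (name in samePackage) "${file.packageName}.$name" else name
    }
}
```

A single pass could not qualify a reference to a type that is declared in a file visited later, which is the situation the index built in the first pass avoids.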

TypeScript

  1. PackageName uses the resolved path. For example, src/grammar/blbla.ts becomes @.grammar.
  2. Top-level functions in a file use default as DataStructure.Name.
  3. export default Object uses default as FunctionName and belongs to the default data structure.
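
The path-to-package mapping in rule 1 can be sketched as follows. This is only a guess at the rule based on the single example above: the helper name `toPackageName` is hypothetical, and the assumption is that a leading `src` directory becomes `@` while the remaining directories are joined with dots.

```kotlin
// Hypothetical helper: maps a source path to a package-style name.
// Assumption: the leading "src" directory is replaced by "@", the file name
// is dropped, and the remaining directories are joined with dots.
fun toPackageName(path: String): String {
    val segments = path.replace('\\', '/').split('/').filter { it.isNotEmpty() }
    val directories = segments.dropLast(1) // drop the file name
    val mapped = if (directories.firstOrNull() == "src") {
        listOf("@") + directories.drop(1)
    } else {
        directories
    }
    return mapped.joinToString(".")
}
```

With this sketch, `toPackageName("src/grammar/blbla.ts")` returns `@.grammar`, matching the example in rule 1.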

C# notes

C

We use https://github.com/shevek/jcpp to preprocess C code.

Kotlin

  • warpTargetFullType is required to resolve classes in the same package.

Usage

Add dependencies:

dependencies {
    implementation "com.phodal.chapi:chapi-ast-java:2.5.2"
    implementation "com.phodal.chapi:chapi-domain:2.5.2"
}

Example (Kotlin):

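import chapi.domain.core.CodeDataStruct
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import org.archguard.scanner.core.sourcecode.LanguageSourceCodeAnalyser
import org.archguard.scanner.core.sourcecode.SourceCodeContext
import java.io.File

// This example comes from an ArchGuard scanner, so it also needs the chapi-ast-csharp
// module and ArchGuard's scanner core on the classpath.
// Assumption: the supertype is ArchGuard's LanguageSourceCodeAnalyser, which is expected
// to provide getFilesByPath and File.readContent.
class CSharpAnalyser(override val context: SourceCodeContext) : LanguageSourceCodeAnalyser {
    private val client = context.client
    private val impl = chapi.ast.csharpast.CSharpAnalyser()

    // Parse every .cs file under context.path concurrently, then save the merged result.
    override fun analyse(): List<CodeDataStruct> = runBlocking {
        getFilesByPath(context.path) {
            it.absolutePath.endsWith(".cs")
        }
            .map { async { analysisByFile(it) } }.awaitAll()
            .flatten()
            .also { client.saveDataStructure(it) }
    }

    // Parse one file and copy the file-level imports and path onto each data structure.
    fun analysisByFile(file: File): List<CodeDataStruct> {
        val codeContainer = impl.analysis(file.readContent(), file.name)
        return codeContainer.Containers.flatMap { container ->
            container.DataStructures.map {
                it.apply {
                    it.Imports = codeContainer.Imports
                    it.FilePath = file.absolutePath
                }
            }
        }
    }
}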

Example input and output

Java source:

package adapters.outbound.persistence.blog;

public class BlogPO implements PersistenceObject<Blog> {
    @Override
    public Blog toDomainModel() {

    }
}

Output:

{
    "Imports": [],
    "Implements": [
        "PersistenceObject<Blog>"
    ],
    "NodeName": "BlogPO",
    "Extend": "",
    "Type": "CLASS",
    "FilePath": "",
    "InOutProperties": [],
    "Functions": [
        {
            "IsConstructor": false,
            "InnerFunctions": [],
            "Position": {
                "StartLine": 6,
                "StartLinePosition": 133,
                "StopLine": 8,
                "StopLinePosition": 145
            },
            "Package": "",
            "Name": "toDomainModel",
            "MultipleReturns": [],
            "Annotations": [
                {
                    "Name": "Override",
                    "KeyValues": []
                }
            ],
            "Extension": {},
            "Override": false,
            "extensionMap": {},
            "Parameters": [],
            "InnerStructures": [],
            "ReturnType": "Blog",
            "Modifiers": [],
            "FunctionCalls": []
        }
    ],
    "Annotations": [],
    "Extension": {},
    "Parameters": [],
    "Fields": [],
    "MultipleExtend": [],
    "InnerStructures": [],
    "Package": "adapters.outbound.persistence.blog",
    "FunctionCalls": []
}
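
To show how the fields in this output relate to each other, the sketch below rebuilds a readable signature from `Package`, `NodeName`, `Name`, and `ReturnType`. The two data classes are trimmed-down stand-ins written for this example, not the real classes from chapi-domain, and `signatures` is a hypothetical helper.

```kotlin
// Trimmed-down stand-ins for the JSON above; property names follow the output.
data class FunctionInfo(val Name: String, val ReturnType: String)

data class StructInfo(
    val Package: String,
    val NodeName: String,
    val Implements: List<String>,
    val Functions: List<FunctionInfo>
)

// Render each function as "package.Class#function(): ReturnType".
// Parameters are left out because the example function has none.
fun signatures(struct: StructInfo): List<String> =
    struct.Functions.map { fn ->
        "${struct.Package}.${struct.NodeName}#${fn.Name}(): ${fn.ReturnType}"
    }
```

For the `BlogPO` output above, this yields the single entry `adapters.outbound.persistence.blog.BlogPO#toDomainModel(): Blog`.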

Development

Syntax parsing identification rules:

  1. Package name
  2. Import name
  3. Class / data structure
    1. Structure name
    2. Structure parameters
    3. Function names
    4. Return types
    5. Function parameters
  4. Function
    1. Function name
    2. Return types
    3. Function parameters
  5. Method call
    1. New instance call
    2. Parameter call
    3. Field call

Build Antlr grammar

  1. Install Antlr: brew install antlr
  2. Compile grammars: ./scripts/compile-antlr.sh

Data structures

classDiagram
direction TB

%% project/module/package
CodeProject "1" o-- "*" CodeModule : Modules
CodeModule "1" o-- "*" CodePackage : Packages
CodeModule "1" o-- "1" CodePackageInfo : packageInfo
CodePackageInfo "1" o-- "*" CodeDependency : Dependencies

%% package/container
CodePackage "1" o-- "*" CodeContainer : codeContainers
CodePackage "1" o-- "*" CodePackage : Packages
CodeContainer "1" o-- "*" CodeImport : Imports
CodeContainer "1" o-- "*" CodeMember : Members
CodeContainer "1" o-- "*" CodeDataStruct : DataStructures
CodeContainer "1" o-- "*" CodeField : Fields
CodeContainer "1" o-- "*" CodeContainer : Containers
CodeContainer "0..1" o-- "1" TopLevelScope : TopLevel

%% core data structures
CodeDataStruct "1" o-- "*" CodeField : Fields
CodeDataStruct "1" o-- "*" CodeFunction : Functions
CodeDataStruct "1" o-- "*" CodeDataStruct : InnerStructures
CodeDataStruct "1" o-- "*" CodeAnnotation : Annotations
CodeDataStruct "1" o-- "*" CodeCall : FunctionCalls
CodeDataStruct "1" o-- "*" CodeImport : Imports
CodeDataStruct "1" o-- "1" CodePosition : Position

CodeFunction "1" o-- "*" CodeProperty : Parameters
CodeFunction "1" o-- "*" CodeProperty : MultipleReturns
CodeFunction "1" o-- "*" CodeCall : FunctionCalls
CodeFunction "1" o-- "*" CodeAnnotation : Annotations
CodeFunction "1" o-- "*" CodeDataStruct : InnerStructures
CodeFunction "1" o-- "*" CodeFunction : InnerFunctions
CodeFunction "1" o-- "1" CodePosition : Position

CodeField "1" o-- "*" CodeAnnotation : Annotations
CodeField "1" o-- "*" CodeCall : Calls
CodeField "1" o-- "*" CodeField : ArrayValue

CodeCall "1" o-- "*" CodeProperty : Parameters
CodeCall "1" o-- "1" CodePosition : Position

CodeMember "1" o-- "*" CodeDataStruct : StructureNodes
CodeMember "1" o-- "*" CodeFunction : FunctionNodes
CodeMember "1" o-- "1" CodePosition : Position
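
Because the diagram nests structures inside structures and functions inside functions, code that consumes the model usually has to walk it recursively. The sketch below collects every function name reachable from a data structure. `Struct` and `Func` are hypothetical minimal stand-ins that keep only the `Functions`, `InnerStructures`, and `InnerFunctions` relations from the diagram. They are not the chapi-domain classes.

```kotlin
// Minimal stand-ins for CodeFunction and CodeDataStruct (illustration only).
data class Func(
    val Name: String,
    val InnerFunctions: List<Func> = emptyList(),
    val InnerStructures: List<Struct> = emptyList()
)

data class Struct(
    val NodeName: String,
    val Functions: List<Func> = emptyList(),
    val InnerStructures: List<Struct> = emptyList()
)

// Collect the names of all functions in a structure, depth-first,
// including functions of nested structures.
fun allFunctionNames(struct: Struct): List<String> =
    struct.Functions.flatMap { allFunctionNames(it) } +
        struct.InnerStructures.flatMap { allFunctionNames(it) }

// Collect a function's own name, then its inner functions and the
// functions of any structures declared inside it.
fun allFunctionNames(func: Func): List<String> =
    listOf(func.Name) +
        func.InnerFunctions.flatMap { allFunctionNames(it) } +
        func.InnerStructures.flatMap { allFunctionNames(it) }
```

The same recursive shape applies to the other self-referencing relations in the diagram, such as `CodeContainer.Containers` and `CodePackage.Packages`.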

License

@2020 A Phodal Huang's Idea. This code is distributed under the MPL license. See LICENSE in this directory.
