retailscm-biz-suite

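Introduction: A supply-chain middle-platform system that integrates retail management, e-commerce, supply chain management, financial management, fleet management, warehouse management, personnel management, product management, order management, membership management, chain store management, and franchise management. The frontend uses React/Ant Design. The backend uses Java Spring plus a self-developed open-source framework. It fully supports MySQL and PostgreSQL, and also fully supports GBase 8s, a domestic Chinese database from 南大通用. All calls go through REST interfaces, and the frontend and backend are completely separated.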

TEAQL Agent Kit is an evaluation environment for coding agents and language models on auditable business software tasks.

It is designed to measure not only whether generated code works, but also whether an agent can preserve business semantics, follow framework boundaries, maintain auditability, recover from errors, and use tokens efficiently.


What This Repository Is

This repository is focused on evaluation.

It provides tasks, prompts, guides, and reports for observing how coding agents behave when working with TEAQL-based business software.

The current goal is not ungated production automation.

The current goal is to answer a more basic question:

How do coding agents actually behave when business rules, generated APIs, audit traces, and framework boundaries matter?


Main Branch: Controlled Evaluation

The main branch is the primary entry point.

It is used for controlled and reproducible evaluation, with:

  • Clear task definitions
  • Explicit TEAQL API rules
  • Agent-readable guides
  • Optional human checkpoints
  • Comparable evaluation reports

This branch asks:

What can a coding agent do when the rules are clear and the evaluation is controlled?


Autonomous Branch: No-Gate Evaluation

The autonomous branch is for experimental no-gate evaluation.

It is used to observe how far coding agents can go without human intervention checkpoints.

This branch focuses on:

  • Fully automatic task attempts
  • Self-repair behavior
  • Unsafe shortcuts
  • Framework boundary violations
  • Token usage
  • Guardrails that may be needed before production use

The autonomous branch is for benchmarking and stress-testing. It is not a recommendation for ungated production deployment.

This branch asks:

What does a coding agent actually do when no human gate is present?


Evaluation Focus

TEAQL Agent Kit evaluates agents across dimensions such as:

  • Functional completion
  • API adherence
  • Hallucinated API count
  • Audit coverage
  • Framework discipline
  • Error recoverability
  • Human intervention count
  • Token efficiency

For long-lived business software, these dimensions matter as much as whether the code compiles.
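"API adherence" and "Hallucinated API count" are the easiest of these dimensions to make concrete. The sketch below is a minimal Python example that rests on two assumptions. First, a run's generated code can be reduced to the list of API names it calls. Second, the task defines the set of valid generated TEAQL APIs. The `RunRecord` schema, the function names, and the API names are all hypothetical illustrations. They are not TEAQL Agent Kit's actual report format or scoring code.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One agent run, scored on a few of the dimensions above (hypothetical schema)."""
    agent: str
    functional_completion: bool
    api_calls: list[str]      # API names the agent's code invoked, one entry per call site
    human_interventions: int
    tokens_used: int


def hallucinated_apis(calls: list[str], known_apis: set[str]) -> list[str]:
    """Return distinct called names that are not in the known API surface, in first-seen order."""
    found: list[str] = []
    for name in calls:
        if name not in known_apis and name not in found:
            found.append(name)
    return found


def api_adherence(calls: list[str], known_apis: set[str]) -> float:
    """Fraction of call sites that target a known API; 1.0 when no calls were made."""
    if not calls:
        return 1.0
    valid = sum(1 for name in calls if name in known_apis)
    return valid / len(calls)


if __name__ == "__main__":
    # Hypothetical API surface for an order-management task.
    known = {"orders.query", "order.update_status", "order.save"}
    run = RunRecord(
        agent="agent-a",
        functional_completion=True,
        api_calls=["orders.query", "order.update_status", "order.force_delete", "order.save"],
        human_interventions=0,
        tokens_used=12000,
    )
    print(hallucinated_apis(run.api_calls, known))  # ['order.force_delete']
    print(api_adherence(run.api_calls, known))      # 0.75
```

The two metrics are kept separate on purpose. The hallucination count uses distinct names, so one invented API that is called many times is reported as a single hallucination. Adherence, by contrast, drops for every bad call site. A real scorer would also have to extract calls from source or audit traces, and this sketch leaves that step out.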


Reports

Evaluation reports will be published in this repository.

Reports may include controlled and autonomous runs across different coding agents, language models, and TEAQL stacks.


Evaluation Across Stacks

TEAQL Agent Kit may evaluate equivalent business software tasks across different TEAQL implementations, including:

  • TEAQL Java stack
  • TEAQL Rust stack

The purpose is not to rank programming languages.

The purpose is to understand how coding agents preserve semantics, auditability, and framework boundaries across different implementation stacks.


Long-Term Direction

Today, TEAQL Agent Kit evaluates coding agents.

Long term, the same evidence may help define which AI coding tasks can be safely automated, which require human gates, and which should never bypass review.

The goal is measured automation, not blind automation.
