Certus

oss

A standard for AI-generated code to carry machine-checkable certificates of correctness, plus a Python reference implementation and a finetuned model that generates them.

status
shipped
started
2024
role
creator
stack
PythonQwen 2.5 CoderQLoRAHypothesisunsloth

When an LLM writes a function, it can also emit a certificate: a structured declaration of what the function guarantees about its behavior. Certus defines the format for these certificates and provides tooling to verify them automatically.

A certificate looks like this:

@certus(
    preconditions=["len(arr) > 0", "1 <= k <= len(arr)"],
    postconditions=[
        {"when": "always",
         "guarantees": ["result in arr",
                        "sum(1 for x in arr if x < result) < k",
                        "sum(1 for x in arr if x <= result) >= k"]}
    ],
)
def kth_smallest(arr: list, k: int) -> int:
    return sorted(arr)[k - 1]

The checker then verifies these claims through structural validation (AST analysis of expressions), dynamic testing (Hypothesis-based property checking), and strength measurement (filtering tautological certificates that claim nothing useful).

Code generation is getting very good, but there is no standard way to verify that generated code does what it claims. The key insight is that certificates don’t need to be proofs to be valuable. Certus automates both sides.

Numbers

  • Training data: 874 MBPP samples → 710 valid certificates (81.2% structural pass) → 1420 training examples
  • Held-out eval: 83.3% end-to-end pass rate at 14B, 75.6% at 7B
  • 9 / 10 production-style utility functions verified correctly

License: Apache-2.0.