Language-Agnostic Code Generation: The Driver Plugin Model

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175

    #1


    TestSmith generates test scaffolds for five languages: Go, Python, TypeScript, Java, and C#. Each language has its own project structure conventions, test frameworks, import styles, and code patterns. The naive implementation would scatter switch statements on language throughout the codebase. We chose a plugin model instead.


    The Problem with Hardcoded Branches

    When a codebase switches on language in multiple places, every new language requires touching every branch point. Miss one and you get a silent bug: the new language falls through to a default that doesn't apply to it. This is a classic violation of the Open-Closed Principle: you have to modify existing code to extend it.
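    To make the failure mode concrete, here is a minimal sketch of one such branch point. The helper testFileName and its naming rules are hypothetical and only illustrate the fall-through; they are not TestSmith code.

```go
package main

import "fmt"

// testFileName is a hypothetical helper of the kind that accumulates in a
// codebase that branches on language in many places.
func testFileName(lang, base string) string {
	switch lang {
	case "go":
		return base + "_test.go"
	case "python":
		return "test_" + base + ".py"
	default:
		// A language that was added elsewhere but forgotten here lands in
		// this branch and silently gets a Java-style name.
		return base + "Test.java"
	}
}

func main() {
	fmt.Println(testFileName("go", "parser"))         // parser_test.go
	fmt.Println(testFileName("typescript", "parser")) // parserTest.java, with no error raised
}
```

    Nothing fails at compile time or at run time here; the wrong file name only shows up later, which is the kind of bug the plugin model is meant to rule out.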


    The LanguageDriver Interface

    Every language in TestSmith implements a single interface:






    type LanguageDriver interface {
        // Detection
        DetectProject(dir string) (*ProjectContext, error)
        FileExtensions() []string

        // Analysis
        AnalyzeFile(path string, ctx *ProjectContext) (*SourceAnalysis, error)
        ClassifyDependency(dep ImportInfo, ctx *ProjectContext) DependencyCategory
        DeriveTestPath(sourcePath string, ctx *ProjectContext) (string, error)
        DeriveModulePath(sourcePath string, ctx *ProjectContext) (string, error)

        // Generation
        GenerateTestFile(analysis *SourceAnalysis, opts GenerateOpts) (*GeneratedFile, error)
        GenerateFixture(dep string, analysis *SourceAnalysis, opts GenerateOpts) (*GeneratedFile, error)
        GenerateBootstrap(plan *GenerationPlan, ctx *ProjectContext) (*GeneratedFile, error)

        // Framework config
        GetTestFrameworkConfig() TestFrameworkConfig
        SelectAdapter(ctx *ProjectContext) TestAdapter

        // LLM integration
        LLMContext(ctx *ProjectContext) map[string]string
        LLMVocabulary() map[string]string

        // Migration and validation
        ListMigrators() []Migrator
        ValidateFile(path string, ctx *ProjectContext) ([]ValidationIssue, error)
    }







    The generation pipeline, the CLI commands, and the watch mode all work against this interface. They never import a specific driver package.


    How Detection Works

    When you run testsmith generate, the first step is figuring out what language you're in. The registry tries each registered driver in turn:






    func Detect(dir string) (domain.LanguageDriver, error) {
        for _, d := range drivers {
            ctx, err := d.DetectProject(dir)
            if err == nil && ctx != nil {
                return d, nil
            }
        }
        return nil, domain.ErrProjectNotFound
    }







    Each driver's DetectProject walks upward from the starting directory looking for its own project markers: go.mod for Go, pyproject.toml or setup.py for Python, package.json for TypeScript, pom.xml or build.gradle for Java, and a .csproj or .sln file for C#.
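    The first-match behaviour of the registry can be sketched in isolation. Everything below is an assumption made for illustration: the Registry type, the two-method Driver interface, and fakeDriver (which claims a fixed set of directories instead of walking the filesystem) are stand-ins, not the real domain or registry packages.

```go
package main

import (
	"errors"
	"fmt"
)

// ProjectContext and Driver are cut-down stand-ins for the domain types.
type ProjectContext struct{ Root string }

type Driver interface {
	Name() string
	DetectProject(dir string) (*ProjectContext, error)
}

var ErrProjectNotFound = errors.New("project not found")

// Registry holds drivers in registration order (hypothetical type).
type Registry struct{ drivers []Driver }

func (r *Registry) Register(d Driver) { r.drivers = append(r.drivers, d) }

// Detect returns the first registered driver whose DetectProject succeeds.
func (r *Registry) Detect(dir string) (Driver, error) {
	for _, d := range r.drivers {
		if ctx, err := d.DetectProject(dir); err == nil && ctx != nil {
			return d, nil
		}
	}
	return nil, ErrProjectNotFound
}

// fakeDriver claims a fixed set of directories, which keeps the example
// deterministic and free of filesystem access.
type fakeDriver struct {
	name  string
	roots map[string]bool
}

func (f fakeDriver) Name() string { return f.name }

func (f fakeDriver) DetectProject(dir string) (*ProjectContext, error) {
	if f.roots[dir] {
		return &ProjectContext{Root: dir}, nil
	}
	return nil, ErrProjectNotFound
}

// demoRegistry registers a Go driver first, then a Python driver.
func demoRegistry() *Registry {
	r := &Registry{}
	r.Register(fakeDriver{"go", map[string]bool{"/repo": true}})
	r.Register(fakeDriver{"python", map[string]bool{"/repo": true, "/repo/examples/py": true}})
	return r
}

func main() {
	r := demoRegistry()
	for _, dir := range []string{"/repo", "/repo/examples/py", "/elsewhere"} {
		d, err := r.Detect(dir)
		if err != nil {
			fmt.Printf("%s: %v\n", dir, err)
			continue
		}
		fmt.Printf("%s: %s\n", dir, d.Name())
	}
}
```

    Both fake drivers claim /repo, and the Go driver wins because it was registered first. With a loop like this, registration order acts as the tie-breaker whenever two drivers recognise the same directory.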


    One subtle requirement: a driver must not claim an ancestor project that belongs to a different language. If you run TestSmith from inside an example project that lives inside a Go repo, the Python driver shouldn't walk up past the Go project's .git boundary and claim the repo root. We solve this by checking VCS stop markers (.git, .hg, .svn) at ancestor directories only — not at the starting directory itself, since a legitimate project root can have both a project marker and a .git directory.






    func findRoot(startDir string) (string, error) {
        dir := startDir
        for {
            // At ancestor dirs, stop at VCS boundaries first.
            if dir != startDir {
                for _, stop := range stopMarkers {
                    if _, err := os.Stat(filepath.Join(dir, stop)); err == nil {
                        return "", domain.ErrProjectNotFound
                    }
                }
            }
            // Then check for project markers.
            for _, marker := range rootMarkers {
                if _, err := os.Stat(filepath.Join(dir, marker)); err == nil {
                    return dir, nil
                }
            }
            parent := filepath.Dir(dir)
            if parent == dir {
                break
            }
            dir = parent
        }
        return "", domain.ErrProjectNotFound
    }
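    The walk can be exercised against a throwaway directory tree. The sketch below copies the logic of findRoot but takes the marker lists as parameters so it runs on its own; the exists helper, the layout type, demoLayout, and the directory names are all assumptions made for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

var errProjectNotFound = errors.New("project not found")

// exists reports whether dir contains an entry called name.
func exists(dir, name string) bool {
	_, err := os.Stat(filepath.Join(dir, name))
	return err == nil
}

// findRoot follows the same steps as the excerpt above, with the marker
// lists passed in instead of read from package-level variables.
func findRoot(startDir string, rootMarkers, stopMarkers []string) (string, error) {
	dir := startDir
	for {
		if dir != startDir { // VCS boundaries only count at ancestors
			for _, stop := range stopMarkers {
				if exists(dir, stop) {
					return "", errProjectNotFound
				}
			}
		}
		for _, marker := range rootMarkers {
			if exists(dir, marker) {
				return dir, nil
			}
		}
		parent := filepath.Dir(dir)
		if parent == dir {
			return "", errProjectNotFound
		}
		dir = parent
	}
}

// layout names the directories of the tree built by demoLayout.
type layout struct{ repo, py, scratch string }

// demoLayout creates repo/{.git,go.mod}, repo/examples/py/pyproject.toml and
// an empty repo/examples/scratch under a temp dir, plus a cleanup func.
func demoLayout() (layout, func(), error) {
	tmp, err := os.MkdirTemp("", "findroot")
	if err != nil {
		return layout{}, func() {}, err
	}
	cleanup := func() { os.RemoveAll(tmp) }
	l := layout{repo: filepath.Join(tmp, "repo")}
	l.py = filepath.Join(l.repo, "examples", "py")
	l.scratch = filepath.Join(l.repo, "examples", "scratch")
	for _, d := range []string{filepath.Join(l.repo, ".git"), l.py, l.scratch} {
		if err := os.MkdirAll(d, 0o755); err != nil {
			return l, cleanup, err
		}
	}
	for _, f := range []string{filepath.Join(l.repo, "go.mod"), filepath.Join(l.py, "pyproject.toml")} {
		if err := os.WriteFile(f, nil, 0o644); err != nil {
			return l, cleanup, err
		}
	}
	return l, cleanup, nil
}

func main() {
	l, cleanup, err := demoLayout()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	vcs := []string{".git", ".hg", ".svn"}
	python := []string{"pyproject.toml", "setup.py"}
	_, err = findRoot(l.repo, []string{"go.mod"}, vcs)
	fmt.Println("go, from repo root:", err == nil) // true
	_, err = findRoot(l.py, python, vcs)
	fmt.Println("python, from examples/py:", err == nil) // true
	_, err = findRoot(l.scratch, python, vcs)
	fmt.Println("python, from examples/scratch:", err == nil) // false
}
```

    The third call is the case described above: the Python lookup walks up from examples/scratch, reaches the .git directory at the repository root, and gives up instead of claiming anything there. With the ordering shown, a Go lookup started in examples/scratch would also report not found, because the .git check at the repository root runs before the go.mod check, so the order of the two checks at ancestor directories is worth choosing deliberately.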







    The Adapter Layer

    Within a language, there can be multiple test frameworks. TypeScript has Jest, Vitest, and Mocha. Java has JUnit 4, JUnit 5, TestNG, and Spring Boot Test. Each framework has its own import style, mock library, assertion syntax, and file naming conventions.


    We model this with a TestAdapter interface:






    type TestAdapter interface {
    Name() string
    FileNamingConvention() FileNaming
    ImportStyle() ImportStyle
    MockLibrary() string
    AssertionStyle() string
    LLMVocabulary() map[string]string
    }







    Each driver has a registry of adapters and a SelectAdapter method that reads the project config (or sniffs package.json devDependencies, pom.xml dependencies, etc.) to pick the right one. The LLM prompt gets the vocabulary from the selected adapter — so the model knows to generate expect(x).toBe(y) for Jest but assert.Equal(t, x, y) for Go's testify.
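    As a rough sketch of the sniffing step for TypeScript, the code below picks an adapter from package.json devDependencies. The Adapter struct, the tsAdapters table, its priority order, the vocabulary keys, and the selectAdapter function are all hypothetical; the real SelectAdapter takes a *ProjectContext and returns a TestAdapter, and may weigh other signals.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Adapter is a cut-down stand-in for TestAdapter.
type Adapter struct {
	Name       string
	Vocabulary map[string]string // hints passed to the LLM prompt
}

// tsAdapters is a hypothetical adapter table. The order is the assumed
// priority when several frameworks are installed.
var tsAdapters = []Adapter{
	{Name: "vitest", Vocabulary: map[string]string{"assert_equal": "expect(x).toBe(y)"}},
	{Name: "jest", Vocabulary: map[string]string{"assert_equal": "expect(x).toBe(y)"}},
	{Name: "mocha", Vocabulary: map[string]string{"assert_equal": "assert.strictEqual(x, y)"}},
}

var errNoAdapter = errors.New("no known test framework in devDependencies")

// selectAdapter parses the raw contents of a package.json file and returns
// the first adapter whose name appears as a devDependency.
func selectAdapter(packageJSON []byte) (Adapter, error) {
	var pkg struct {
		DevDependencies map[string]string `json:"devDependencies"`
	}
	if err := json.Unmarshal(packageJSON, &pkg); err != nil {
		return Adapter{}, fmt.Errorf("parse package.json: %w", err)
	}
	for _, a := range tsAdapters {
		if _, ok := pkg.DevDependencies[a.Name]; ok {
			return a, nil
		}
	}
	return Adapter{}, errNoAdapter
}

func main() {
	raw := []byte(`{"devDependencies": {"typescript": "^5.0.0", "vitest": "^1.0.0"}}`)
	a, err := selectAdapter(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(a.Name, a.Vocabulary["assert_equal"]) // vitest expect(x).toBe(y)
}
```

    Keeping the vocabulary on the adapter rather than on the driver means the prompt changes with the framework even when the language stays the same.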


    Adding a New Language

    Because everything flows through the interface, adding a new language driver is isolated:

    1. Create a new package under internal/drivers//
    2. Implement domain.LanguageDriver — the compiler tells you exactly what's missing
    3. Register it in internal/registry/registry.go
    4. Optionally add a Verifier in internal/generation/verify.go for post-write compile checking


    No other files change. The existing drivers are untouched. The pipeline, CLI, and watch mode pick it up automatically.
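    Step 2 leans on the compiler, and a compile-time assertion makes that check explicit in the driver package itself. The sketch below uses a two-method excerpt of the interface and a hypothetical Kotlin driver; the kotlinDriver type and its path convention are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// ProjectContext is a stand-in for the domain type.
type ProjectContext struct{ Root string }

// LanguageDriver here is a two-method excerpt of the full interface.
type LanguageDriver interface {
	FileExtensions() []string
	DeriveTestPath(sourcePath string, ctx *ProjectContext) (string, error)
}

// kotlinDriver is a hypothetical sixth driver.
type kotlinDriver struct{}

// Compile-time check: if either method below is removed, this line fails
// to compile with an error that names the missing method.
var _ LanguageDriver = kotlinDriver{}

func (kotlinDriver) FileExtensions() []string { return []string{".kt"} }

// DeriveTestPath maps src/main/.../Foo.kt to src/test/.../FooTest.kt.
// The convention is an assumption made for this example.
func (kotlinDriver) DeriveTestPath(sourcePath string, _ *ProjectContext) (string, error) {
	if !strings.HasSuffix(sourcePath, ".kt") {
		return "", fmt.Errorf("not a Kotlin source file: %s", sourcePath)
	}
	testPath := strings.TrimSuffix(sourcePath, ".kt") + "Test.kt"
	return strings.Replace(testPath, "src/main/", "src/test/", 1), nil
}

func main() {
	var d LanguageDriver = kotlinDriver{}
	p, err := d.DeriveTestPath("src/main/kotlin/app/Parser.kt", nil)
	fmt.Println(p, err) // src/test/kotlin/app/ParserTest.kt <nil>
}
```

    Without the assertion, a missing method only surfaces at the first place the type is used as a LanguageDriver, typically the registration call in step 3.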


    The Dependency Direction

    The plugin model enforces a strict dependency direction:






    cmd → generation → domain ← drivers
                              ← llm







    domain defines the interfaces. drivers implement them. generation uses them via the interface. Neither generation nor drivers imports the other. This is the Dependency Inversion Principle applied at the package level — and it's enforced by Go's import cycle detector.


    When you add a new driver, it's impossible to accidentally reach into the generation pipeline or the LLM layer — Go won't compile it. The architecture is self-enforcing.


    Next in this series: making LLM calls reliable when you're hitting them for every public member of every source file.



