Ingestor CLI Specification
Overview
The Ingestor CLI processes Kubernetes resources (particularly Crossplane XRDs) and generates Backstage catalog entities. It provides a command-line interface for testing, validating, and processing resources outside of the Backstage runtime environment.
Design Principles
- Unified Core: Same ingestion logic as the Backstage runtime plugin
- Simple Interface: Single source argument with flag-based modes
- Unix Philosophy: Do one thing well - ingest resources into entities
- Predictable Behavior: Consistent output and error handling
Command Structure
ingestor <source> [options]
Arguments
source: Required. Can be:
- File path: xrd.yaml, ./configs/my-xrd.yaml
- Directory path: ./xrds/, /path/to/templates/
- Cluster keyword: cluster (uses current kubectl context)
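The three source forms above could be told apart along the following lines. This is a minimal sketch, not the CLI's actual implementation: `SourceKind` and `classifySource` are hypothetical names, and the directory check is injected as a callback so the example stays free of filesystem calls.

```typescript
// Hypothetical names: SourceKind and classifySource are not defined by the spec.
type SourceKind = "file" | "directory" | "cluster";

// `isDirectory` is injected for illustration; a real CLI would likely back it
// with a filesystem stat lookup.
function classifySource(
  source: string,
  isDirectory: (path: string) => boolean,
): SourceKind {
  // The literal keyword "cluster" selects the current kubectl context.
  if (source === "cluster") return "cluster";
  return isDirectory(source) ? "directory" : "file";
}

// Stub used only for this example: paths ending in "/" count as directories.
const looksLikeDirectory = (p: string) => /\/$/.test(p);

console.log(classifySource("cluster", looksLikeDirectory));
console.log(classifySource("./xrds/", looksLikeDirectory));
console.log(classifySource("xrd.yaml", looksLikeDirectory));
```

Under this sketch, a local file that is literally named cluster would have to be passed as ./cluster to avoid being read as the keyword.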
Mode Flags (Mutually Exclusive)
- -p, --preview: Preview what would be generated without writing files
- -v, --validate: Validate resources only, exit with status code
- -l, --list: List discovered resources without processing
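Because the mode flags are mutually exclusive, the CLI needs to reject combinations such as --preview --list. One way this check could look is sketched below; `IngestorMode`, `ModeFlags`, and `resolveMode` are hypothetical names, and the error wording is an assumption.

```typescript
// Hypothetical types; "process" stands for the default mode with no flag set.
type IngestorMode = "process" | "preview" | "validate" | "list";

interface ModeFlags {
  preview?: boolean;
  validate?: boolean;
  list?: boolean;
}

function resolveMode(flags: ModeFlags): IngestorMode {
  // Collect every mode flag that was set on the command line.
  const selected: IngestorMode[] = [];
  if (flags.preview) selected.push("preview");
  if (flags.validate) selected.push("validate");
  if (flags.list) selected.push("list");

  if (selected.length > 1) {
    const names = selected.map((m) => "--" + m).join(", ");
    throw new Error("Mode flags are mutually exclusive, got: " + names);
  }
  // No flag means the default processing mode.
  return selected.length === 1 ? selected[0] : "process";
}
```

In a real CLI the thrown error would presumably map to one of the exit codes defined under Error Handling; which one is left open here.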
Options
Input Options:
- -c, --config <file>: Configuration file path
- -t, --type <type>: Resource type filter (xrd, k8s, all)
- -n, --namespace <ns>: Namespace filter for cluster source
Output Options:
- -o, --output <dir>: Output directory (default: ./output)
- -f, --format <fmt>: Output format, yaml or json (default: yaml)
- --organize: Organize output by entity type
Entity Metadata:
- --owner <owner>: Set owner for generated entities
- --tags <tags>: Add tags to entities (comma-separated)
Display Options:
- --quiet: Suppress non-error output
- --verbose: Show detailed processing information
- -h, --help: Show help message
- --version: Show version information
Functional Requirements
Resource Discovery
The CLI must discover resources from three source types:
- File Source
  - Single YAML/JSON file
  - Multi-document YAML support
  - Error on non-existent files
- Directory Source
  - Recursive directory scanning
  - Filter by file extensions (.yaml, .yml, .json)
  - Skip hidden files and directories
  - Handle symlinks safely
- Cluster Source
  - Use current kubectl context
  - Support namespace filtering
  - Handle RBAC permissions gracefully
  - Timeout after 30 seconds
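The directory-source rules (extension filter, hidden-path skipping) can be expressed as a single predicate over paths relative to the scanned root. The sketch below assumes "/"-separated relative paths and case-insensitive extension matching; both are assumptions, and `shouldLoad` is a hypothetical name.

```typescript
const DEFAULT_EXTENSIONS = [".yaml", ".yml", ".json"];

// Returns true when a path (relative to the scanned root) should be loaded.
function shouldLoad(
  relativePath: string,
  extensions: string[] = DEFAULT_EXTENSIONS,
): boolean {
  // Drop empty segments and "." so "./xrds/a.yaml" and "xrds/a.yaml" behave alike.
  const segments = relativePath.split("/").filter((s) => s !== "" && s !== ".");
  if (segments.length === 0) return false;

  // Skip hidden files and anything below a hidden directory.
  // This also rejects ".." segments, since they start with a dot.
  if (segments.some((s) => s.charAt(0) === ".")) return false;

  // Compare the file name's suffix against each allowed extension.
  const fileName = segments[segments.length - 1].toLowerCase();
  return extensions.some(
    (ext) => fileName.length > ext.length && fileName.slice(-ext.length) === ext,
  );
}

console.log(shouldLoad("./xrds/database.yaml"));
console.log(shouldLoad(".git/config.yaml"));
```

Symlink handling and the recursive walk itself are out of scope for this sketch; it only covers the per-path decision.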
Processing Pipeline
Discovery → Validation → Conversion → Output
- Discovery Phase
  - Load resources from source
  - Parse YAML/JSON
  - Extract resource metadata
- Validation Phase
  - Check resource structure
  - Validate required fields
  - Verify API versions
  - Report validation errors
- Conversion Phase
  - Use shared IngestionEngine
  - Build Backstage entities
  - Apply metadata overrides
- Output Phase
  - Format entities (YAML/JSON)
  - Write to filesystem
  - Organize by type if requested
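The four phases above can be wired together as a short function with injected stages. This is a simplified, synchronous sketch: the spec's IngestionEngine is asynchronous, all type and function names here are hypothetical, and stopping before conversion when any validation error is reported is an assumption rather than stated behavior.

```typescript
// Hypothetical minimal shapes for illustration only.
interface K8sResource { apiVersion?: string; kind?: string; metadata?: { name?: string } }
interface CatalogEntityRef { kind: string; entityName: string }

interface PipelineStages {
  discover: () => K8sResource[];
  validate: (resource: K8sResource) => string[]; // returns error messages
  convert: (resources: K8sResource[]) => CatalogEntityRef[];
  output: (entities: CatalogEntityRef[]) => void;
}

interface PipelineResult { entities: CatalogEntityRef[]; errors: string[] }

function runPipeline(stages: PipelineStages): PipelineResult {
  // Discovery
  const resources = stages.discover();

  // Validation: collect every error instead of stopping at the first one.
  const errors: string[] = [];
  for (const resource of resources) {
    for (const message of stages.validate(resource)) errors.push(message);
  }
  // Assumption: any validation error prevents conversion and output.
  if (errors.length > 0) return { entities: [], errors };

  // Conversion and Output
  const entities = stages.convert(resources);
  stages.output(entities);
  return { entities, errors };
}
```

Injecting the stages keeps the sketch testable without a filesystem or cluster, which also lines up with the unit-test scope listed under Testing Requirements.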
Mode Behaviors
Default Mode (Process)
- Discovers resources
- Validates resources
- Converts to entities
- Writes to output directory
- Shows summary
Preview Mode (--preview)
- Discovers resources
- Validates resources
- Shows what would be generated
- No file writes
- Displays entity counts and types
Validate Mode (--validate)
- Discovers resources
- Validates resources
- Reports validation results
- Exit code 0 for success, 2 for validation failure
- No conversion or output
List Mode (--list)
- Discovers resources
- Lists resource names and types
- Shows basic metadata
- No validation or conversion
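The mode descriptions above amount to a table of which pipeline phases each mode runs. A sketch of that table follows; the type and constant names are hypothetical, and treating preview as running conversion (so that entity counts and types can be shown) is an inference from the list above, not something the spec states outright.

```typescript
type IngestorMode = "process" | "preview" | "validate" | "list";
type PipelineStage = "discovery" | "validation" | "conversion" | "output";

// Which phases each mode executes, in order.
const STAGES_BY_MODE: Record<IngestorMode, PipelineStage[]> = {
  process: ["discovery", "validation", "conversion", "output"],
  // Assumption: preview converts in memory but never writes files.
  preview: ["discovery", "validation", "conversion"],
  validate: ["discovery", "validation"],
  list: ["discovery"],
};

function runsStage(mode: IngestorMode, stage: PipelineStage): boolean {
  return STAGES_BY_MODE[mode].indexOf(stage) !== -1;
}

console.log(runsStage("preview", "output"));
console.log(runsStage("validate", "validation"));
```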
Technical Architecture
Shared Core Engine
interface IngestionEngine {
  ingest(resources: Resource[]): Promise<Entity[]>
}
The CLI uses the same IngestionEngine as the Backstage runtime plugin, ensuring consistent behavior.
CLI Adapter
class CLIAdapter {
  constructor(private engine: IngestionEngine) {}

  // Load resources from a file, convert them with the shared engine,
  // and write the resulting entities according to the given options.
  async processFile(path: string, options: Options): Promise<void> {
    const resources = await this.loadFile(path);
    const entities = await this.engine.ingest(resources);
    await this.writeEntities(entities, options);
  }
}
Configuration Loading
Priority order:
- Command-line flags (highest)
- Config file (--config)
- Environment variables
- Backstage app-config.yaml (if found)
- Default values (lowest)
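The priority order above can be implemented by merging the layers from lowest to highest priority, so that later layers overwrite earlier ones. The sketch below assumes flat string settings; `Settings`, `SettingLayers`, and `resolveSettings` are hypothetical names.

```typescript
type Settings = { [key: string]: string | undefined };

interface SettingLayers {
  flags: Settings;      // command-line flags (highest priority)
  configFile: Settings; // file passed via --config
  env: Settings;        // environment variables
  appConfig: Settings;  // Backstage app-config.yaml, if found
  defaults: Settings;   // built-in defaults (lowest priority)
}

function resolveSettings(layers: SettingLayers): Settings {
  // Ordered lowest to highest priority; later entries win.
  const ordered = [
    layers.defaults,
    layers.appConfig,
    layers.env,
    layers.configFile,
    layers.flags,
  ];
  const resolved: Settings = {};
  for (const layer of ordered) {
    for (const key of Object.keys(layer)) {
      const value = layer[key];
      // Undefined means "not set in this layer", so it must not mask lower layers.
      if (value !== undefined) resolved[key] = value;
    }
  }
  return resolved;
}
```

A real configuration would be nested (see the Configuration Schema section), so a production version would need a deep merge rather than this shallow one.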
Output Specifications
Directory Structure
Default output structure:
output/
├── templates/
│   ├── xrd1-template.yaml
│   └── xrd2-template.yaml
├── apis/
│   ├── xrd1-api.yaml
│   └── xrd2-api.yaml
└── components/
    └── service1-component.yaml
With the --organize flag, entities are grouped into one subdirectory per entity type.
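The file names in the tree above suggest a naming pattern of `<kind>s/<name>-<kind>.<format>`. A sketch of a path builder following that pattern is shown below; the function name is hypothetical, and pluralizing by appending "s" is a naive assumption that happens to fit the three kinds shown.

```typescript
interface EntityRef { kind: string; entityName: string }

function entityOutputPath(
  outputDir: string,
  entity: EntityRef,
  format: "yaml" | "json" = "yaml",
): string {
  const kind = entity.kind.toLowerCase();
  // Strip trailing slashes so "./output/" and "./output" give the same result.
  const dir = outputDir.replace(/\/+$/, "");
  return `${dir}/${kind}s/${entity.entityName}-${kind}.${format}`;
}

console.log(entityOutputPath("./output", { kind: "Template", entityName: "xrd1" }));
```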
Entity Format
Generated entities follow Backstage catalog format:
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: generated-name
  title: Human Readable Title
  description: Description from source
  tags:
    - ingestor
    - <additional-tags>
spec:
  owner: platform-team
  type: service
  # ... rest of spec
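The --owner and --tags flags feed into the entity shape shown above. One way the overrides could be applied is sketched here; the interfaces cover only the fields from the example, the names are hypothetical, and de-duplicating tags is an assumption.

```typescript
interface GeneratedEntity {
  apiVersion: string;
  kind: string;
  metadata: { name: string; title?: string; description?: string; tags?: string[] };
  spec: { owner?: string; type?: string };
}

// `tags` is the raw comma-separated value of --tags.
interface MetadataOverrides { owner?: string; tags?: string }

function applyOverrides(
  entity: GeneratedEntity,
  overrides: MetadataOverrides,
): GeneratedEntity {
  // Split the comma-separated list, trimming whitespace and dropping empty items.
  const extraTags = (overrides.tags || "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);

  // Copy existing tags so the input entity is not mutated, then append new ones once.
  const tags = (entity.metadata.tags || []).slice();
  for (const tag of extraTags) {
    if (tags.indexOf(tag) === -1) tags.push(tag);
  }

  return {
    apiVersion: entity.apiVersion,
    kind: entity.kind,
    metadata: { ...entity.metadata, tags },
    spec: { ...entity.spec, owner: overrides.owner || entity.spec.owner },
  };
}
```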
Error Handling
Exit Codes
- 0: Success
- 1: General error
- 2: Validation failed
- 3: No resources found
- 4: Configuration error
- 5: Cluster connection error
Error Messages
Format: [ERROR] <context>: <message>
Example: [ERROR] Validation: XRD 'my-xrd' missing required field 'spec.group'
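The exit codes and message format above map directly onto a small constant table and a formatter. The identifiers below are hypothetical; the numeric values and the message layout are taken from this section.

```typescript
// Exit codes as listed in this section.
const ExitCode = {
  Success: 0,
  GeneralError: 1,
  ValidationFailed: 2,
  NoResourcesFound: 3,
  ConfigurationError: 4,
  ClusterConnectionError: 5,
} as const;

// Builds a message in the "[ERROR] <context>: <message>" format.
function formatError(context: string, message: string): string {
  return `[ERROR] ${context}: ${message}`;
}

console.log(
  formatError("Validation", "XRD 'my-xrd' missing required field 'spec.group'"),
);
```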
Performance Requirements
- Single file: < 1 second
- Directory (100 files): < 10 seconds
- Cluster discovery: times out after 30 seconds
- Memory usage: < 256MB typical, < 512MB maximum
Examples
# Process single XRD
ingestor xrd.yaml
# Process with custom output
ingestor xrd.yaml --output ./backstage/templates --owner platform-team
# Preview directory processing
ingestor ./xrds/ --preview
# Validate before processing
ingestor ./xrds/ --validate && ingestor ./xrds/
# Process from cluster
ingestor cluster --namespace production
# List XRDs in cluster
ingestor cluster --list --type xrd
# Process with configuration
ingestor xrd.yaml --config ./ingestor.yaml
# Quiet mode for scripts
ingestor ./xrds/ --quiet --output ./generated
Configuration Schema
ingestor:
  defaults:
    owner: platform-team
    namespace: default
    tags:
      - ingestor
      - auto-generated
  discovery:
    includeNamespaces:
      - default
      - production
    excludeNamespaces:
      - kube-system
      - kube-public
    fileExtensions:
      - .yaml
      - .yml
      - .json
  processing:
    xrd:
      templateType: crossplane-resource
      includeVersions: all # all, served, latest
      publishPhase:
        enabled: false
        repository: github.com?owner=org&repo=catalog
        branch: main
  output:
    format: yaml
    organize: true
    cleanOutput: false # Remove output dir before writing
  validation:
    strict: false
    requiredFields:
      - apiVersion
      - kind
      - metadata.name
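The validation.requiredFields entries use dotted paths such as metadata.name. A sketch of how such paths could be checked against a parsed resource follows; the function names are hypothetical, and treating empty strings as missing is an assumption.

```typescript
// Walks a dotted path ("metadata.name") through nested objects.
function getPath(obj: unknown, path: string): unknown {
  let current: unknown = obj;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as { [k: string]: unknown })[key];
  }
  return current;
}

// Returns the required fields that are absent (or empty) on the resource.
function missingRequiredFields(resource: unknown, requiredFields: string[]): string[] {
  return requiredFields.filter((field) => {
    const value = getPath(resource, field);
    return value === undefined || value === null || value === "";
  });
}

const sampleXrd = {
  apiVersion: "apiextensions.crossplane.io/v1",
  kind: "CompositeResourceDefinition",
  metadata: {},
};
console.log(missingRequiredFields(sampleXrd, ["apiVersion", "kind", "metadata.name"]));
```

Each returned path could then be reported in the error format from the Error Handling section, with exit code 2.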
Security Considerations
- File System Access
  - Validate file paths to prevent directory traversal
  - Respect file permissions
  - Handle symlinks safely
- Cluster Access
  - Use existing kubectl configuration
  - Respect RBAC permissions
  - Never store credentials
- Output Sanitization
  - Sanitize generated file names
  - Validate output paths
  - Prevent overwriting system files
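For the output sanitization items above, a file-name sanitizer might look like the sketch below. The allowed character set, the lowercasing, and the "unnamed" fallback are all assumptions, and `sanitizeFileName` is a hypothetical name; path validation against the output directory would still be needed separately.

```typescript
function sanitizeFileName(rawName: string): string {
  const cleaned = rawName
    .toLowerCase()
    // Replace path separators and any other unexpected characters with "-".
    .replace(/[^a-z0-9._-]+/g, "-")
    // Drop leading dots and dashes so the result is never hidden or "..".
    .replace(/^[.-]+/, "")
    // Drop trailing dashes left over from replaced characters.
    .replace(/-+$/, "");
  return cleaned.length > 0 ? cleaned : "unnamed";
}

console.log(sanitizeFileName("../../etc/passwd"));
console.log(sanitizeFileName("My XRD/Template"));
```

Because every "/" is replaced, the result can never introduce a new directory level below the output directory.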
Testing Requirements
Unit Tests
- Resource discovery for each source type
- Validation logic
- Entity conversion
- Configuration loading
Integration Tests
- End-to-end processing
- Mode flag behaviors
- Error handling
- Output generation
E2E Tests
- Real XRD processing
- Cluster interaction
- Large directory processing
- Configuration precedence
Future Enhancements
- Watch mode for continuous processing
- Incremental processing (only changed files)
- Parallel processing for large directories
- Plugin system for custom transformers
- Interactive mode with prompts
- Dry-run with diff output