Ingestor v2 Architecture Concept
Vision
Redesign the ingestor as three independent, composable tools that follow the Unix philosophy: do one thing well. Each tool can be used standalone or orchestrated together, with complete control over template generation through Eta templates.
Architecture Decisions
Core Principles
- Three Independent Tools - Extract, Transform, Provide (complete implementations, not frameworks)
- Replaceable Tools - Each tool can be replaced by external alternatives (not plugins, but complete replacements)
- Template-Driven - Full control via Eta templates for transformation
- File-Based Interface - Tools communicate via files/JSON, enabling easy replacement
- Dual Execution - Run embedded in Backstage or as external processes
Tool Separation & Replaceability
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   EXTRACT   │────▶│  TRANSFORM  │────▶│   PROVIDE   │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       ▼                   ▼                   ▼
 K8s API/Files         Templates           Backstage
                     (Eta Engine)        (Plugin Only)
       OR                  OR                  OR
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   kubectl   │────▶│   Custom    │────▶│   GitHub    │
│   get xrd   │     │   Script    │     │   Actions   │
└─────────────┘     └─────────────┘     └─────────────┘
Key Design Decision: Each tool is a complete, standalone implementation. We don't build plugin systems within the tools. Instead, the tools communicate via standard file formats (JSON/YAML), allowing users to replace any tool entirely with their own implementation or existing tools.
Tool Interface Contracts
Each tool has a clear input/output contract, enabling replacement:
- Extract Output → JSON file with structure:
  {
    "source": "kubernetes|file|git",
    "timestamp": "ISO-8601",
    "xrd": { /* XRD object */ },
    "metadata": { /* extraction context */ }
  }
- Transform Input → Extract's JSON output
  Transform Output → Backstage YAML/JSON entities
- Provide Input → Directory of Backstage entities
  Provide Output → Entities in Backstage catalog
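A replacement extractor only has to produce the envelope described above, so it can help to have the contract written down as a type with a runtime check. The following is a minimal sketch, assuming the four fields listed in the contract are all required; the names `ExtractOutput` and `isExtractOutput` are hypothetical and not part of any published package.

```typescript
// Hypothetical TypeScript view of the Extract output contract.
type ExtractSource = 'kubernetes' | 'file' | 'git';

interface ExtractOutput {
  source: ExtractSource;
  timestamp: string; // ISO-8601
  xrd: Record<string, unknown>;
  metadata: Record<string, unknown>;
}

const KNOWN_SOURCES: readonly string[] = ['kubernetes', 'file', 'git'];

// True for non-null, non-array objects.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Runtime guard a Transform replacement could run on its input.
// The timestamp is only loosely checked: anything Date.parse accepts passes.
function isExtractOutput(value: unknown): value is ExtractOutput {
  if (!isPlainObject(value)) return false;
  const { source, timestamp, xrd, metadata } = value;
  return (
    typeof source === 'string' &&
    KNOWN_SOURCES.indexOf(source) !== -1 &&
    typeof timestamp === 'string' &&
    !Number.isNaN(Date.parse(timestamp)) &&
    isPlainObject(xrd) &&
    isPlainObject(metadata)
  );
}
```

A custom tool in another language can apply the same four checks; the contract is the file format, not this code.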
Detailed Architecture
1. Extract Tool (@openportal/xrd-extract)
Purpose: Extract XRDs from various sources
CLI Usage:
xrd-extract --source kubernetes --cluster production --output xrds/
xrd-extract --source file --path ./my-xrd.yaml --output xrds/
xrd-extract --source git --repo https://github.com/org/templates --output xrds/
Library Usage:
import { extract } from '@openportal/xrd-extract';
const xrds = await extract({
  source: 'kubernetes',
  cluster: 'production',
  filters: { labels: { 'backstage.io/enabled': 'true' } }
});
Output Format: JSON with XRD + metadata
{
  "source": "kubernetes",
  "timestamp": "2024-01-01T00:00:00Z",
  "xrd": { /* Original XRD */ },
  "metadata": {
    "cluster": "production",
    "namespace": "crossplane-system"
  }
}
Can Be Replaced With (external tools, not plugins):
# Using kubectl instead of xrd-extract
# (a single named XRD; without a name, kubectl returns a List with an items array)
kubectl get xrd databases.platform.io -o json | jq '{
  source: "kubernetes",
  timestamp: (now | todate),
  xrd: .,
  metadata: {cluster: "production"}
}' > xrd.json
# Using a simple shell script (pass the XRD file as the first argument)
#!/bin/bash
yq eval -o=json '.' "$1" | jq --arg path "$1" '{
  source: "file",
  timestamp: (now | todate),
  xrd: .,
  metadata: {path: $path}
}' > xrd.json
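When `kubectl get xrd -o json` is run without a resource name, it returns a `List` object whose `items` array holds the XRDs, so a replacement extractor needs to wrap each item separately. A minimal sketch of that step, assuming one envelope per XRD; `wrapKubectlList` and the `KubectlList` type are hypothetical names used only for illustration.

```typescript
// Envelope shape from the Tool Interface Contracts section.
interface ExtractOutput {
  source: 'kubernetes' | 'file' | 'git';
  timestamp: string;
  xrd: Record<string, unknown>;
  metadata: Record<string, unknown>;
}

// Trimmed view of what `kubectl get <resource> -o json` prints for several objects.
interface KubectlList {
  kind: string;
  items: Record<string, unknown>[];
}

// Wrap every item of a kubectl List in its own Extract envelope.
// `cluster` is supplied by the caller because kubectl output does not carry it.
function wrapKubectlList(
  list: KubectlList,
  cluster: string,
  now: Date = new Date(),
): ExtractOutput[] {
  if (list.kind !== 'List') {
    throw new Error(`expected kind "List", got "${list.kind}"`);
  }
  const timestamp = now.toISOString();
  return list.items.map(xrd => ({
    source: 'kubernetes' as const,
    timestamp,
    xrd,
    metadata: { cluster },
  }));
}
```

Whether the envelopes are then written as one file per XRD or as a single batch is the open Streaming vs Batch question; this sketch does not decide it.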
2. Transform Tool (@openportal/xrd-transform)
Purpose: Transform XRDs into Backstage templates using Eta templates
CLI Usage:
xrd-transform --input xrds/ --templates ./templates --output catalog/
xrd-transform --input xrd.json --templates ./my-templates --output catalog/
Library Usage:
import { transform } from '@openportal/xrd-transform';
const backstageEntities = await transform({
  xrd: xrdData,
  templateDir: './templates',
  context: { /* additional context */ }
});
Template Structure:
templates/
├── metadata.yaml              # Template configuration
├── backstage/
│   ├── default.eta            # Default Backstage template
│   ├── simple.eta             # Simple form template
│   └── advanced.eta           # Advanced template with all features
├── wizard/
│   ├── default.eta            # Default wizard configuration
│   ├── gitops.eta             # GitOps-focused wizard
│   └── multi-cluster.eta      # Multi-cluster wizard
└── steps/
    ├── default.eta            # Default steps
    ├── github-pr.eta          # GitHub PR creation
    ├── gitlab-mr.eta          # GitLab MR creation
    └── direct-apply.eta       # Direct kubectl apply
Template Selection via XRD Annotation:
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: databases.platform.io
  annotations:
    # Select which templates to use
    backstage.io/template: advanced  # Use advanced.eta
    backstage.io/wizard: gitops      # Use gitops.eta wizard
    backstage.io/steps: github-pr    # Use github-pr.eta steps
    # Or use default if not specified
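The annotation-to-file mapping above can be sketched as a small pure function. This is an illustration under stated assumptions rather than the tool's actual logic: it assumes the three annotation keys shown above map onto the `backstage/`, `wizard/`, and `steps/` directories, and that a missing or empty annotation falls back to `default`. The name `selectTemplates` is hypothetical.

```typescript
// Only the part of an XRD that template selection needs.
interface XrdLike {
  metadata?: { annotations?: Record<string, string> };
}

interface TemplateSelection {
  backstage: string;
  wizard: string;
  steps: string;
}

// Resolve the three template paths (relative to the template directory)
// from XRD annotations, falling back to "default" when one is absent.
function selectTemplates(xrd: XrdLike): TemplateSelection {
  const annotations = xrd.metadata?.annotations ?? {};
  const pick = (key: string): string => {
    const value = annotations[key];
    return value !== undefined && value.trim() !== '' ? value.trim() : 'default';
  };
  return {
    backstage: `backstage/${pick('backstage.io/template')}.eta`,
    wizard: `wizard/${pick('backstage.io/wizard')}.eta`,
    steps: `steps/${pick('backstage.io/steps')}.eta`,
  };
}
```

A real implementation would probably also reject annotation values containing path separators, so an XRD cannot point the transformer outside the template directory.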
Eta Template Example (backstage/advanced.eta):
<%#
Available variables:
- xrd: The full XRD object
- metadata: Extraction metadata
- helpers: Utility functions
%>
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: <%= helpers.slugify(xrd.metadata.name) %>
  title: <%= helpers.extractTitle(xrd) %>
  description: <%= xrd.metadata.annotations?.['backstage.io/description'] || 'Manage ' + xrd.spec.names.kind %>
  tags:
    - crossplane
    - <%= xrd.spec.group %>
<% if (xrd.metadata.labels) { %>
<% Object.entries(xrd.metadata.labels).forEach(([key, value]) => { %>
    - <%= value %>
<% }) %>
<% } %>
spec:
  owner: <%= xrd.metadata.annotations?.['backstage.io/owner'] || 'platform-team' %>
  type: crossplane-resource
<%# Include the wizard (raw output, so the partial's YAML is not HTML-escaped) %>
<%~ include('wizard/' + (xrd.metadata.annotations?.['backstage.io/wizard'] || 'default')) %>
<%# Include the steps %>
<%~ include('steps/' + (xrd.metadata.annotations?.['backstage.io/steps'] || 'default')) %>
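The template above calls `helpers.slugify` and `helpers.extractTitle`, whose behavior is not specified yet. One possible shape, as a sketch only: `slugify` is assumed to target DNS-label-style names (lowercase alphanumerics and hyphens, at most 63 characters), and `extractTitle` is assumed to derive a human-readable title from `spec.names.kind`. Both behaviors are assumptions for illustration.

```typescript
// Only the part of an XRD that extractTitle reads.
interface XrdLike {
  spec: { names: { kind: string } };
}

// Assumed behavior: lowercase, collapse anything that is not a-z or 0-9 into "-",
// trim hyphens from both ends, and cap the result at 63 characters.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
    .slice(0, 63)
    .replace(/-+$/, ''); // truncation may leave a trailing hyphen
}

// Assumed behavior: insert a space at each lowercase-to-uppercase boundary of the kind.
function extractTitle(xrd: XrdLike): string {
  return xrd.spec.names.kind.replace(/([a-z0-9])([A-Z])/g, '$1 $2');
}
```

With these definitions, `databases.platform.io` becomes `databases-platform-io` and a kind of `DatabaseInstance` becomes `Database Instance`.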
Can Be Replaced With (external tools):
# Using jq to generate simple template
cat xrd.json | jq '
  .xrd | {
    apiVersion: "scaffolder.backstage.io/v1beta3",
    kind: "Template",
    metadata: {
      name: .metadata.name,
      title: .spec.names.kind
    },
    spec: {
      owner: "platform-team",
      type: "crossplane-resource",
      parameters: [],
      steps: []
    }
  }
' > template.yaml
# Using Python script with Jinja2
python generate-template.py --xrd xrd.json --template my-template.j2
# Using Go template CLI
gomplate -f template.gotmpl -d xrd=xrd.json -o template.yaml
3. Provide Tool (@openportal/backstage-provider)
Purpose: Backstage-specific plugin to provide templates to the catalog
Note: This is primarily a Backstage plugin, not a CLI tool
Plugin Configuration:
# app-config.yaml
catalog:
  providers:
    xrdTemplates:
      enabled: true
      sourceDirectory: /app/generated-templates  # Where transform outputs
      watchForChanges: true
      schedule:
        frequency: { minutes: 5 }
Library Usage (within Backstage):
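import { createBackendModule, coreServices } from '@backstage/backend-plugin-api';
// The import path of the extension point may differ between Backstage versions
import { catalogProcessingExtensionPoint } from '@backstage/plugin-catalog-node/alpha';
import { XrdTemplateProvider } from '@openportal/backstage-provider';
// Entity providers are registered through a module of the catalog plugin.
// The moduleId below is illustrative.
export default createBackendModule({
  pluginId: 'catalog',
  moduleId: 'xrd-template-provider',
  register(env) {
    env.registerInit({
      deps: {
        catalog: catalogProcessingExtensionPoint,
        config: coreServices.rootConfig,
        scheduler: coreServices.scheduler,
      },
      async init({ catalog, config, scheduler }) {
        const provider = new XrdTemplateProvider({
          sourceDirectory: config.getString('catalog.providers.xrdTemplates.sourceDirectory'),
          schedule: scheduler.createScheduledTaskRunner(/* ... */),
        });
        catalog.addEntityProvider(provider);
      },
    });
  },
});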
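For orientation, a provider along these lines could implement the catalog's entity provider contract roughly as follows. This is a minimal sketch, not the package's actual implementation: the interfaces are local stand-ins trimmed from what `@backstage/plugin-catalog-node` exposes, `loadEntities` is a hypothetical injected loader (kept synchronous for brevity), and the schedule wiring is left out.

```typescript
// Local stand-ins for Backstage's Entity, EntityProvider and
// EntityProviderConnection types, reduced to what this sketch uses.
interface Entity {
  apiVersion: string;
  kind: string;
  metadata: { name: string };
}
interface DeferredEntity {
  entity: Entity;
  locationKey?: string;
}
interface FullMutation {
  type: 'full';
  entities: DeferredEntity[];
}
interface EntityProviderConnection {
  applyMutation(mutation: FullMutation): Promise<void>;
}
interface EntityProvider {
  getProviderName(): string;
  connect(connection: EntityProviderConnection): Promise<void>;
}

interface XrdTemplateProviderOptions {
  sourceDirectory: string;
  // Hypothetical: reads and parses the generated entity files in a directory.
  loadEntities: (directory: string) => Entity[];
}

class XrdTemplateProvider implements EntityProvider {
  private readonly sourceDirectory: string;
  private readonly loadEntities: (directory: string) => Entity[];
  private connection: EntityProviderConnection | undefined;

  constructor(options: XrdTemplateProviderOptions) {
    this.sourceDirectory = options.sourceDirectory;
    this.loadEntities = options.loadEntities;
  }

  getProviderName(): string {
    return `xrd-templates:${this.sourceDirectory}`;
  }

  // The catalog calls connect() once and hands over the connection.
  async connect(connection: EntityProviderConnection): Promise<void> {
    this.connection = connection;
  }

  // Replace everything this provider previously emitted with the current
  // directory contents (a "full" mutation).
  async refresh(): Promise<void> {
    if (!this.connection) throw new Error('provider is not connected');
    const locationKey = this.getProviderName();
    const entities = this.loadEntities(this.sourceDirectory).map(entity => ({
      entity,
      locationKey,
    }));
    await this.connection.applyMutation({ type: 'full', entities });
  }
}
```

In a real deployment `refresh()` would be driven by the scheduled task runner, and the emitted entities would also need the location annotations that the catalog expects on provider-managed entities.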
Can Be Replaced With (for providing to Backstage):
# Using GitHub Actions to commit templates
git add generated-templates/
git commit -m "Update templates"
git push
# Using curl to register with Backstage API
curl -X POST http://backstage:7007/api/catalog/locations \
-H "Content-Type: application/json" \
-d '{"type":"url","target":"https://github.com/org/templates"}'
# Using static catalog configuration
# In app-config.yaml:
catalog:
  locations:
    - type: file
      target: /app/generated-templates/**/*.yaml
Implementation Plan
Phase 1: Core Libraries (Weeks 1-2)
- Create three npm packages
- Define interfaces between tools
- Implement basic Extract (K8s + files)
- Implement Transform with Eta
- Implement Provide as Backstage plugin
Phase 2: Template System (Weeks 3-4)
- Design template directory structure
- Create default templates
- Implement template selection logic
- Add helper functions for Eta
- Create template documentation
Phase 3: CLI Tools (Week 5)
- Create CLI for Extract
- Create CLI for Transform
- Add piping support (Unix philosophy)
- Add validation commands
- Create examples
Phase 4: Integration (Week 6)
- Update existing ingestor to use new libraries
- Migration guide
- Performance testing
- Documentation
- Examples repository
Tool Composition & Unix Philosophy
Piping Support
The tools support Unix-style piping for composition:
# Full pipeline using our tools
xrd-extract --source kubernetes | xrd-transform --templates ./templates | backstage-provide
# Mix our tools with standard Unix tools
xrd-extract --source file --path "*.yaml" | \
jq '.xrd' | \
xrd-transform --templates ./templates | \
tee generated.yaml | \
wc -l
# Replace middle step with custom script
xrd-extract --source kubernetes | \
python my-transformer.py | \
backstage-provide
Stdin/Stdout Support
Each tool can read from stdin and write to stdout:
# Extract reads file, outputs to stdout
xrd-extract --source file --path xrd.yaml
# Transform reads from stdin, outputs to stdout
cat xrd.json | xrd-transform --templates ./templates
# Provide reads from stdin (when not used as plugin)
cat template.yaml | backstage-provide --api-url http://backstage:7007
Usage Scenarios
Scenario 1: Development Time
# Extract XRD from file
xrd-extract --source file --path my-xrd.yaml --output temp/
# Transform with custom templates
xrd-transform --input temp/my-xrd.json \
--templates ./my-templates \
--output ./generated/
# Review generated template
cat ./generated/my-xrd-template.yaml
Scenario 2: CI/CD Pipeline
# .github/workflows/generate-templates.yml
- name: Extract XRDs from cluster
  run: xrd-extract --source kubernetes --output xrds/
- name: Transform to Backstage templates
  run: xrd-transform --input xrds/ --templates ./templates --output catalog/
- name: Commit templates
  run: |
    git add catalog/
    git commit -m "Update Backstage templates"
    git push
Scenario 3: Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: xrd-template-generator
spec:
  schedule: "*/30 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # Job pods must set restartPolicy to OnFailure or Never
          restartPolicy: OnFailure
          containers:
            - name: generator
              image: openportal/xrd-pipeline:latest
              command:
                - sh
                - -c
                - |
                  xrd-extract --source kubernetes --output /tmp/xrds/
                  xrd-transform --input /tmp/xrds/ --output /output/
                  # Output mounted as volume shared with Backstage
Scenario 4: Backstage Embedded
# Standard Backstage deployment
# Provider watches directory populated by external process
catalog:
  providers:
    xrdTemplates:
      sourceDirectory: /shared/templates
Template Examples
Simple Backstage Template (backstage/simple.eta):
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: <%= helpers.slugify(xrd.metadata.name) %>
  title: <%= xrd.spec.names.kind %>
spec:
  owner: platform-team
  type: crossplane-resource
  parameters:
    - title: Resource Configuration
      properties:
        name:
          title: Name
          type: string
<% helpers.extractProperties(xrd).forEach(prop => { %>
        <%= prop.name %>:
          title: <%= prop.title %>
          type: <%= prop.type %>
<% }) %>
  steps:
    - id: create-resource
      name: Create <%= xrd.spec.names.kind %>
      action: kubernetes:apply
      input:
        manifest: |
          apiVersion: <%= xrd.spec.group %>/<%= xrd.spec.versions[0].name %>
          kind: <%= xrd.spec.names.kind %>
          metadata:
            name: ${{ parameters.name }}
          spec: ${{ parameters }}
GitOps Wizard (wizard/gitops.eta):
parameters:
  - title: Resource Metadata
    required:
      - name
      - owner
    properties:
      name:
        title: Name
        type: string
      owner:
        title: Owner
        type: string
        ui:field: OwnerPicker
  - title: GitOps Configuration
    properties:
      repository:
        title: Target Repository
        type: string
        default: catalog-orders
      branch:
        title: Target Branch
        type: string
        default: main
      createPR:
        title: Create Pull Request
        type: boolean
        default: true
  - title: Resource Specification
    properties:
<% helpers.extractProperties(xrd).forEach(prop => { %>
      <%= prop.name %>:
        title: <%= prop.title %>
        type: <%= prop.type %>
<% if (prop.description) { %>
        description: <%= prop.description %>
<% } %>
<% }) %>
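Both templates above iterate over `helpers.extractProperties(xrd)` and read `name`, `title`, `type`, and optionally `description` from each entry. A minimal sketch of such a helper, assuming it reads the first version's `schema.openAPIV3Schema` and returns only the top-level properties under `spec`. The fallbacks (property name as title, `string` as type) are assumptions made for this illustration.

```typescript
// Subset of an OpenAPI v3 schema node used by this sketch.
interface JsonSchema {
  type?: string;
  title?: string;
  description?: string;
  properties?: Record<string, JsonSchema>;
}

// Only the part of an XRD that extractProperties reads.
interface XrdLike {
  spec: {
    versions: { name: string; schema?: { openAPIV3Schema?: JsonSchema } }[];
  };
}

interface ExtractedProperty {
  name: string;
  title: string;
  type: string;
  description?: string;
}

// Flatten the top-level `spec` properties of the first XRD version into
// the shape the Eta templates iterate over. Nested objects are not expanded.
function extractProperties(xrd: XrdLike): ExtractedProperty[] {
  const rootSchema = xrd.spec.versions[0]?.schema?.openAPIV3Schema;
  const specSchema = rootSchema?.properties?.['spec'];
  const specProperties = specSchema?.properties ?? {};
  return Object.keys(specProperties).map(name => {
    const schema = specProperties[name] ?? {};
    const property: ExtractedProperty = {
      name,
      title: schema.title ?? name, // assumed fallback
      type: schema.type ?? 'string', // assumed fallback
    };
    if (schema.description) property.description = schema.description;
    return property;
  });
}
```

An XRD without a schema yields an empty list, so the templates render only the fixed fields in that case.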
Benefits of This Architecture
- Modularity: Each tool does one thing well
- Flexibility: Tools can be replaced/combined differently
- Testability: Each tool can be tested independently
- Customization: Complete control via templates
- Deployment Options: Run anywhere (CI, K8s, Backstage)
- Developer Experience: Simple CLI tools, clear interfaces
- Maintenance: Smaller, focused codebases
Next Discussion Points
Immediate Decisions Needed
- Package Naming:
  - Option A: @openportal/xrd-extract, @openportal/xrd-transform, @openportal/backstage-provider
  - Option B: More generic like @openportal/k8s-resource-extract, @openportal/template-engine, etc.
- Template Directory Structure:
  - Should we have a standard directory layout that users must follow?
  - Or allow complete flexibility with configuration?
- Error Handling Strategy:
  - What happens when a template has an error?
  - Should we generate a "safe" default or fail completely?
- Library vs CLI First:
  - Should we build the library first and wrap with CLI?
  - Or build CLI first and extract library?
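The two options in the Error Handling Strategy question do not have to be mutually exclusive; they could be a per-run policy. A minimal sketch of that idea, with the caveat that nothing here is decided: `renderWithPolicy`, the `ErrorMode` values, and the result shape are all hypothetical.

```typescript
type ErrorMode = 'fail' | 'fallback';

interface RenderResult {
  output: string;
  usedFallback: boolean;
  error?: string;
}

// Run the selected template; on error either rethrow ("fail") or render a
// known-safe default and report what went wrong ("fallback").
function renderWithPolicy(
  render: () => string,
  renderDefault: () => string,
  mode: ErrorMode,
): RenderResult {
  try {
    return { output: render(), usedFallback: false };
  } catch (err) {
    if (mode === 'fail') throw err;
    const message = err instanceof Error ? err.message : String(err);
    return { output: renderDefault(), usedFallback: true, error: message };
  }
}
```

In a CI pipeline `fail` is probably the safer choice because a broken template stops the commit, while an embedded run inside Backstage might prefer `fallback` so one bad template does not block the others.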
Technical Questions
- Template Helpers: What helper functions do we need in Eta?
  - slugify() - Convert names to valid K8s names
  - extractProperties() - Extract properties from XRD schema
  - generateValidation() - Create validation rules
  - What else?
- Streaming vs Batch:
  - Should Extract output one JSON per XRD?
  - Or batch all XRDs in one JSON array?
  - How does this affect piping?
- Configuration:
  - How much should be configurable vs convention?
  - Environment variables vs config files vs CLI flags?
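On the Streaming vs Batch question, one way to keep piping flexible is for the consuming tool to accept both shapes. The sketch below is an assumption about how a reader could behave, not a decision: it takes the text received on stdin and accepts a single JSON document (one object or an array) or newline-delimited JSON with one record per line, as produced for example by `jq -c`. The name `parseExtractStream` is hypothetical.

```typescript
// Parse Extract output that is either one JSON document (an object or an
// array of objects) or newline-delimited JSON with one record per line.
function parseExtractStream(text: string): unknown[] {
  const trimmed = text.trim();
  if (trimmed === '') return [];

  // First try the whole input as a single JSON document (batch form).
  let whole: unknown;
  let parsedWhole = true;
  try {
    whole = JSON.parse(trimmed);
  } catch {
    parsedWhole = false;
  }
  if (parsedWhole) return Array.isArray(whole) ? whole : [whole];

  // Otherwise treat each non-empty line as its own record (streaming form).
  return trimmed
    .split('\n')
    .filter(line => line.trim() !== '')
    .map((line, index) => {
      try {
        return JSON.parse(line) as unknown;
      } catch {
        throw new Error(`record ${index + 1} is not valid JSON`);
      }
    });
}
```

The streaming form lets line-oriented tools such as `head` or `grep` sit between pipeline stages, while the batch form is easier to validate as a whole; accepting both on input defers the choice to the producer.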
Future Considerations
- Template Marketplace:
  - Should we plan for a template registry/marketplace?
  - How would templates be shared/discovered?
- Validation & Testing:
  - How do we validate Eta templates before use?
  - Should we provide a test framework for templates?
- Migration Path:
  - How do we migrate from current ingestor?
  - Can we support both architectures temporarily?
Summary
This architecture provides:
- Clear separation of concerns with three independent tools
- Complete flexibility through tool replacement (not plugins)
- Full template control via Eta templates with XRD annotations
- Multiple deployment options (embedded, external, CI/CD)
- Unix philosophy with piping and file-based interfaces
The key insight is that by making tools completely independent and communicating via files, we enable maximum flexibility without the complexity of a plugin system. Users can mix and match our tools with their own scripts, existing tools, or completely custom solutions.