Migrator
Migrates existing SQL queries to semantic layer by analyzing queries and generating model definitions.
Key features:
- Generate model definitions from existing queries
- Rewrite queries to use semantic layer syntax
- Measure what’s missing from your semantic layer
Quick Start
Generate models from your existing queries:
sidemantic migrator analyze queries/ --output-dir models/This analyzes SQL files in queries/ and generates model YAML files in models/.
Example output models/orders.yml:
model:
name: orders
table: orders
dimensions:
- name: status
sql: status
- name: region
sql: region
metrics:
- name: sum_amount
agg: sum
sql: amount
- name: count
agg: count
sql: "*"Add --rewrite to also generate rewritten queries:
sidemantic migrator analyze queries/ --output-dir models/ --rewriteOriginal query:
SELECT status, region, SUM(amount), COUNT(*)
FROM orders
WHERE status = 'completed'
GROUP BY status, regionRewritten query:
SELECT orders.status, orders.region, orders.sum_amount, orders.count
FROM orders
WHERE status = 'completed'Supported Patterns
Basic Aggregations
Extracts dimensions from GROUP BY and metrics from aggregations:
SELECT
status,
SUM(amount) as revenue,
COUNT(DISTINCT customer_id) as customers
FROM orders
GROUP BY statusComplex Expressions
Handles CASE, COALESCE, CAST, math, string functions:
SELECT
UPPER(status) as status,
SUM(CASE WHEN priority = 'high' THEN amount ELSE 0 END) as high_priority_revenue,
SUM(quantity * price) as total_revenue
FROM orders
GROUP BY UPPER(status)Extracts underlying columns (status, amount, quantity, price).
Time Dimensions
SELECT
DATE_TRUNC('month', order_date) as month,
EXTRACT(YEAR FROM order_date) as year,
SUM(amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date), EXTRACT(YEAR FROM order_date)Extracts order_date as time dimension. Rewrites to orders.order_date__month, orders.order_date__year.
Derived Metrics
SELECT
status,
SUM(revenue) / COUNT(*) as avg_revenue_per_order,
SUM(amount - discount) as net_amount
FROM orders
GROUP BY statusExtracts base metrics and derived metrics separately:
metrics:
- name: sum_revenue
agg: sum
sql: revenue
- name: count
agg: count
sql: "*"
- name: avg_revenue_per_order
type: derived
sql: "SUM(revenue) / COUNT(*)"Joins and Relationships
SELECT
c.region,
COUNT(o.order_id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.regionExtracts relationship: orders.customer_id → customers.id (many_to_one).
# orders.yml
relationships:
- name: customers
type: many_to_one
foreign_key: customer_idSubqueries
Resolves subquery aliases to underlying tables:
SELECT
sub.status,
COUNT(*) as order_count
FROM (
SELECT status, amount
FROM orders
WHERE amount > 100
) sub
GROUP BY sub.statusExtracts orders table and resolves sub.status → orders.status.
Window Functions and Cumulative Metrics
Detects window functions and generates cumulative metric definitions.
Running totals:
SELECT
order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM ordersmetrics:
- name: running_total
type: cumulative
sql: "orders.sum_amount"Rolling windows:
SELECT
order_date,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7day
FROM ordersmetrics:
- name: rolling_7day
type: cumulative
sql: "orders.sum_amount"
window: "6 days"Period-to-date:
SELECT
order_date,
SUM(amount) OVER (
PARTITION BY DATE_TRUNC('month', order_date)
ORDER BY order_date
) as mtd_revenue
FROM ordersmetrics:
- name: mtd_revenue
type: cumulative
sql: "orders.sum_amount"
grain_to_date: "month"What’s extracted:
- Aggregation window functions:
SUM/AVG/COUNT/MIN/MAXwithOVER() - Rolling windows with
ROWS BETWEEN - Period-to-date with
PARTITION BY DATE_TRUNC()
What’s ignored:
- Window-only functions:
ROW_NUMBER(),RANK(),LAG(),LEAD() - Complex window calculations like
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()
Base aggregations inside complex calculations are still extracted as regular metrics.
Other Patterns
CTEs: Extracts from underlying tables, preserves structure in rewrites
GROUP BY ordinals: GROUP BY 1, 2 resolved to columns
Implicit joins: FROM a, b WHERE a.id = b.fk extracts relationships
Self-joins: Generates single model
CLI Usage
Generate models from queries:
sidemantic migrator analyze queries/ --output-dir models/Generate models and rewritten queries:
sidemantic migrator analyze queries/ --output-dir models/ --rewriteThis analyzes all SQL files in queries/ and generates: - Model YAML files in models/ - Rewritten queries in queries_rewritten/ (if --rewrite flag used)
Other commands:
# Analyze specific query
sidemantic migrator analyze-query \
"SELECT status, COUNT(*) FROM orders GROUP BY status"
# Generate coverage report
sidemantic migrator report queries/ > coverage-report.txtOptions:
--pattern "*.sql"- File pattern to match--output-dir- Where to write generated models--rewrite- Also generate rewritten queries--verbose- Show detailed analysis
Migration Workflow
1. Analyze queries and generate models
sidemantic migrator analyze queries/ --output-dir models/generated/ --verboseReview generated models in models/generated/ directory.
2. Generate rewritten queries
sidemantic migrator analyze queries/ --output-dir models/generated/ --rewriteThis creates: - Model definitions in models/generated/ - Rewritten queries in queries_rewritten/
3. Test rewritten queries
Compare results between original and rewritten queries to ensure correctness.
4. Deploy models
cp models/generated/*.yml models/Load into your semantic layer and start using.
Troubleshooting
Low Coverage
Cause: Missing models, dimensions, or metrics.
Fix: Generate models from your queries first:
sidemantic migrator analyze queries/ --output-dir models/Then load those models into your semantic layer before re-analyzing.
Dimensions Not Extracted
Cause: No GROUP BY clause.
-- ❌ No dimensions (no GROUP BY)
SELECT COUNT(*) FROM orders
-- ✅ Extracts status
SELECT status, COUNT(*) FROM orders GROUP BY statusRelationships Not Detected
Causes:
- Non-equi joins
- Complex conditions
- Multiple join keys
Fix: Manually add relationships to generated models or use simple equality joins in queries.
Metric Names Don’t Match
Analyzer generates names from agg_column:
SUM(amount)→sum_amountCOUNT(*)→count
Use aliases to control names:
SELECT SUM(amount) as total_revenue FROM orders -- Generates "total_revenue"Python API
Get coverage metrics from report:
report = analyzer.analyze_queries(queries)
print(f"Coverage: {report.coverage_percentage:.1f}%")
print(f"Missing models: {report.missing_models}")
print(f"Missing dimensions: {report.missing_dimensions}")Provide database connection for better inference:
import duckdb
conn = duckdb.connect(":memory:")
analyzer = Migrator(layer, connection=conn)Benefits: Resolves ambiguous columns, infers relationships from foreign keys, detects primary keys.