Common Gotchas and Solutions
This guide covers common pitfalls and their solutions when working with Sidemantic.
Parameters and Filters
Don’t add quotes around parameter placeholders
# Wrong - creates double quotes
filters = [f"orders.order_date >= '{{{{ start_date }}}}'"]
# Result: orders.order_date >= ''2024-01-01'' (invalid SQL)
# Correct
filters = ["orders.order_date >= {{ start_date }}"]
# Result: orders.order_date >= '2024-01-01'Why: Parameters are automatically formatted with quotes based on their type. Adding extra quotes creates ''value'' which SQLGlot can’t parse.
Parameter formatting: - string: Adds quotes → 'value' - date: Adds quotes → '2024-01-01' - number: No quotes → 100 - unquoted: No quotes → table_name
Filters automatically trigger joins
Sidemantic automatically joins tables when you reference them in filters:
# This works - customers table is auto-joined
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date"],
filters=["customers.region = 'US'"] # Auto-joins customers!
)Why: The SQL rewriter detects that customers.region is referenced and automatically adds the customers dimension to trigger the join.
If you encounter “Table not found” errors: 1. Check that relationships are properly defined 2. Verify the table/dimension name is correct 3. Ensure there’s a join path between the tables
Filter Parsing
Complex SQL in filters
Filters support standard SQL comparison operators and simple expressions:
# ✅ These work
filters = [
"orders.order_date >= '2024-01-01'",
"orders.amount > 100",
"orders.status IN ('completed', 'shipped')",
"customers.region = 'US' AND customers.tier = 'premium'"
]Not supported: - Subqueries in filters - Window functions in filters - Complex CASE expressions in filters
Workaround for complex logic: Use metric-level filters or derived metrics:
# Instead of complex filter in query
# Create a filtered metric
Metric(
name="completed_revenue",
agg="sum",
sql="amount",
filters=["{model}.status = 'completed'"] # Applied automatically
)
# Then query the filtered metric
layer.compile(metrics=["orders.completed_revenue"])Symmetric Aggregates
Understanding fan-out handling
Sidemantic automatically uses symmetric aggregates when needed:
# Single one-to-many relationship - regular aggregation
sql = layer.compile(
metrics=["orders.revenue", "order_items.quantity"]
)
# Uses: SUM(orders_cte.revenue_raw) - regular sum
# Multiple one-to-many relationships - symmetric aggregates
sql = layer.compile(
metrics=["orders.revenue", "order_items.quantity", "shipments.count"]
)
# Uses: SUM(DISTINCT HASH(...) + revenue) - symmetric aggregatesWhy: Symmetric aggregates prevent double-counting when you have ≥2 one-to-many joins creating fan-out. With a single join, regular aggregation is correct and faster.
To verify:
sql = layer.compile(...)
print(sql)
# Look for: HASH(primary_key) in the SQLRelationships
Understanding foreign_key direction
Relationship types determine where the foreign key lives:
# many_to_one: foreign_key is in THIS table
orders = Model(
name="orders",
table="orders",
primary_key="order_id",
relationships=[
# foreign_key is in orders table (orders.customer_id)
Relationship(name="customers", type="many_to_one", foreign_key="customer_id")
]
)
# one_to_many: foreign_key is in the OTHER table
customers = Model(
name="customers",
table="customers",
primary_key="customer_id",
relationships=[
# foreign_key is in orders table (orders.customer_id)
Relationship(name="orders", type="one_to_many", foreign_key="customer_id")
]
)Rule of thumb: - many_to_one: foreign_key is in this table (most common for fact tables) - one_to_many: foreign_key is in the other table (most common for dimension tables) - one_to_one: foreign_key can be in either table, specify which one
SQL Generation
Don’t forget to set primary_key
# Wrong - no primary key
orders = Model(
name="orders",
# primary_key missing!
)
# Result: Symmetric aggregates won't work
# Result: Defaults to "id" which might not existWhy: Symmetric aggregates require primary_key to hash for deduplication.
# Correct
orders = Model(
name="orders",
primary_key="id", # or whatever your PK is
)ORDER BY
Use column aliases in order_by
# Wrong
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date"],
order_by=["orders.order_date"] # Full reference won't work
)
# Correct
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date"],
order_by=["order_date"] # Just column name
)Why: ORDER BY uses the column alias from the final SELECT, not the table reference.
Also works with metrics:
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date"],
order_by=["revenue DESC", "order_date"] # Sort by metric and dimension
)Common Error Messages
“Table X not found”
Cause: Table isn’t joined in the query, or relationship is missing. Fix: 1. Check that relationships are properly defined between models 2. Verify the table/dimension reference is correct 3. Ensure there’s a join path between the tables
“Column X_raw not found”
Cause: Trying to reference a metric’s raw column directly in custom SQL. Fix: Metrics are stored as {name}_raw in CTEs. Reference the metric by name in your query, not the raw column.
“Failed to parse … into Condition”
Cause: Parameter produced invalid SQL (usually a quoting issue). Fix: Don’t add quotes around { param } placeholders - they’re added automatically based on type.
Performance
Slow queries with multiple one-to-many joins?
Multiple one-to-many relationships trigger symmetric aggregates which are slower:
# 3 one-to-many joins = symmetric aggregates
sql = layer.compile(
metrics=["orders.revenue", "items.qty", "shipments.count", "notes.count"]
)
# This works but may be slow on large datasetsOptimization strategies:
Query subsets separately: Break into multiple queries and combine results
Use pre-aggregations: Enable
use_preaggregations=Truewith materialized rollupsDenormalize data: Consider flattening your schema for frequently-joined tables
Filter early: Add restrictive filters to reduce row counts before joins
Debugging
Inspect generated SQL
Always check the generated SQL to understand what’s happening:
sql = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date"],
filters=["orders.status = 'completed'"]
)
print(sql) # See the actual SQL that will runCheck for symmetric aggregates
if "HASH(" in sql:
print("Using symmetric aggregates (multiple one-to-many joins)")
else:
print("Using regular aggregation")Verify joins
if "LEFT JOIN customers_cte" in sql:
print("Customers table is joined")Build complexity incrementally
# Start simple
sql1 = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date"]
)
# Add joins
sql2 = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date", "customers.region"]
)
# Add filters
sql3 = layer.compile(
metrics=["orders.revenue"],
dimensions=["orders.order_date", "customers.region"],
filters=["customers.region = 'US'"]
)Best Practices
- Always inspect the generated SQL - Use
print(sql)to see what’s generated - Start simple, add complexity incrementally - Test each addition
- Use parameters correctly - Don’t add quotes around
{ param }placeholders - Define relationships properly - Ensure join paths exist between models
- Set primary keys - Required for symmetric aggregates and some metric types
- Test with real data - Edge cases reveal issues
- Use pre-aggregations for performance - Materialize frequently-queried rollups
- Check query plans - Use
EXPLAIN ANALYZEfor slow queries
Getting Help
If you encounter issues:
- Inspect the SQL:
print(layer.compile(...))to see generated SQL - Test directly: Run the SQL in DuckDB to isolate Sidemantic vs database issues
- Check relationships: Verify join paths exist between models
- Simplify: Remove complexity until it works, then add back incrementally
- Review examples: Check
examples/directory for working patterns - Check tests: The test suite has examples of nearly every feature