# Database Schema Designer

```sh
npx skills add softaworks/agent-toolkit --skill database-schema-designer
```

Design production-ready database schemas with best practices built-in.

## Quick Start

Just describe your data model:

```
design a schema for an e-commerce platform with users, products, orders
```

You'll get a complete SQL schema like:

```sql
CREATE TABLE users (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT NOT NULL,
  total DECIMAL(10,2) NOT NULL,
  -- MySQL parses but ignores inline REFERENCES, so declare the FK explicitly
  FOREIGN KEY (user_id) REFERENCES users(id),
  INDEX idx_orders_user (user_id)
);
```

What to include in your request:

- Entities (users, products, orders)
- Key relationships (users have orders, orders have items)
- Scale hints (high-traffic, millions of records)
- Database preference (SQL/NoSQL) - defaults to SQL if not specified

## Triggers

| Trigger | Example |
|---|---|
| design schema | "design a schema for user authentication" |
| database design | "database design for multi-tenant SaaS" |
| create tables | "create tables for a blog system" |
| schema for | "schema for inventory management" |
| model data | "model data for real-time analytics" |
| I need a database | "I need a database for tracking orders" |
| design NoSQL | "design NoSQL schema for product catalog" |

## Key Terms

| Term | Definition |
|---|---|
| Normalization | Organizing data to reduce redundancy (1NF → 2NF → 3NF) |
| 3NF | Third Normal Form - no transitive dependencies between columns |
| OLTP | Online Transaction Processing - write-heavy, needs normalization |
| OLAP | Online Analytical Processing - read-heavy, benefits from denormalization |
| Foreign Key (FK) | Column that references another table's primary key |
| Index | Data structure that speeds up queries (at the cost of slower writes) |
| Access Pattern | How your app reads/writes data (queries, joins, filters) |
| Denormalization | Intentionally duplicating data to speed up reads |

## Quick Reference

| Task | Approach | Key Consideration |
|---|---|---|
| New schema | Normalize to 3NF first | Domain modeling over UI |
| SQL vs NoSQL | Access patterns decide | Read/write ratio matters |
| Primary keys | INT or UUID | UUID for distributed systems |
| Foreign keys | Always constrain | ON DELETE strategy critical |
| Indexes | FKs + WHERE columns | Column order matters |
| Migrations | Always reversible | Backward compatible first |

## Process Overview

```
Your Data Requirements
    |
    v
+-----------------------------------------------------+
| Phase 1: ANALYSIS                                   |
| * Identify entities and relationships               |
| * Determine access patterns (read vs write heavy)   |
| * Choose SQL or NoSQL based on requirements         |
+-----------------------------------------------------+
    |
    v
+-----------------------------------------------------+
| Phase 2: DESIGN                                     |
| * Normalize to 3NF (SQL) or embed/reference (NoSQL) |
| * Define primary keys and foreign keys              |
| * Choose appropriate data types                     |
| * Add constraints (UNIQUE, CHECK, NOT NULL)         |
+-----------------------------------------------------+
    |
    v
+-----------------------------------------------------+
| Phase 3: OPTIMIZE                                   |
| * Plan indexing strategy                            |
| * Consider denormalization for read-heavy queries   |
| * Add timestamps (created_at, updated_at)           |
+-----------------------------------------------------+
    |
    v
+-----------------------------------------------------+
| Phase 4: MIGRATE                                    |
| * Generate migration scripts (up + down)            |
| * Ensure backward compatibility                     |
| * Plan zero-downtime deployment                     |
+-----------------------------------------------------+
    |
    v
Production-Ready Schema
```

## Commands

| Command | When to Use | Action |
|---|---|---|
| `design schema for {domain}` | Starting fresh | Full schema generation |
| `normalize {table}` | Fixing an existing table | Apply normalization rules |
| `add indexes for {table}` | Performance issues | Generate index strategy |
| `migration for {change}` | Schema evolution | Create reversible migration |
| `review schema` | Code review | Audit existing schema |

Workflow: start with `design schema` → iterate with `normalize` → optimize with `add indexes` → evolve with `migration`.

## Core Principles

| Principle | Why | Implementation |
|---|---|---|
| Model the Domain | UI changes, the domain doesn't | Entity names reflect business concepts |
| Data Integrity First | Corruption is costly to fix | Constraints at the database level |
| Optimize for Access Pattern | Can't optimize for both reads and writes | OLTP: normalized; OLAP: denormalized |
| Plan for Scale | Retrofitting is painful | Index strategy + partitioning plan |
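
To ground "Data Integrity First", here is a minimal sketch of pushing a rule into the database itself rather than application code (the `accounts` table and its columns are hypothetical):

```sql
-- The database, not application code, rejects bad data:
-- no code path can insert a negative balance or an orphaned owner.
CREATE TABLE accounts (
  id BIGINT PRIMARY KEY,
  owner_id BIGINT NOT NULL REFERENCES users(id),
  balance DECIMAL(12,2) NOT NULL CHECK (balance >= 0)
);
```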

Anti-Patterns

Avoid
Why
Instead
VARCHAR(255) everywhere
Wastes storage, hides intent
Size appropriately per field
FLOAT for money
Rounding errors
DECIMAL(10,2)
Missing FK constraints
Orphaned data
Always define foreign keys
No indexes on FKs
Slow JOINs
Index every foreign key
Storing dates as strings
Can't compare/sort
DATE, TIMESTAMP types
SELECT * in queries
Fetches unnecessary data
Explicit column lists
Non-reversible migrations
Can't rollback
Always write DOWN migration
Adding NOT NULL without default
Breaks existing rows
Add nullable, backfill, then constrain
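
To make several of these rows concrete, a before/after sketch (the `payments` table and its columns are hypothetical):

```sql
-- BAD: several anti-patterns from the table above in one place
CREATE TABLE payments_bad (
  id INT PRIMARY KEY,
  amount FLOAT,              -- rounding errors
  paid_on VARCHAR(255),      -- date stored as a string
  order_id INT               -- no FK constraint, no index
);

-- GOOD
CREATE TABLE payments (
  id INT PRIMARY KEY,
  amount DECIMAL(10,2) NOT NULL,
  paid_on DATE NOT NULL,
  order_id INT NOT NULL REFERENCES orders(id)
);
CREATE INDEX idx_payments_order ON payments(order_id);
```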

## Verification Checklist

After designing a schema:

- Every table has a primary key
- All relationships have foreign key constraints
- ON DELETE strategy defined for each FK
- Indexes exist on all foreign keys
- Indexes exist on frequently queried columns
- Appropriate data types (DECIMAL for money, etc.)
- NOT NULL on required fields
- UNIQUE constraints where needed
- CHECK constraints for validation
- created_at and updated_at timestamps
- Migration scripts are reversible
- Tested on staging with production data

## Normal Forms

| Form | Rule | Violation Example |
|---|---|---|
| 1NF | Atomic values, no repeating groups | `product_ids = '1,2,3'` |
| 2NF | 1NF + no partial dependencies | `customer_name` in `order_items` |
| 3NF | 2NF + no transitive dependencies | `country` derived from `postal_code` |

### 1st Normal Form (1NF)

```sql
-- BAD: Multiple values in one column
CREATE TABLE orders (
  id INT PRIMARY KEY,
  product_ids VARCHAR(255)  -- '101,102,103'
);

-- GOOD: Separate table for items
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT
);

CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT
);
```

### 2nd Normal Form (2NF)

```sql
-- BAD: customer_name depends on order_id alone, not the full composite key
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  customer_name VARCHAR(100),  -- Partial dependency!
  PRIMARY KEY (order_id, product_id)
);

-- GOOD: Customer data in a separate table;
-- order_items keeps only keys and item data
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);
```

### 3rd Normal Form (3NF)

```sql
-- BAD: country depends on postal_code, not on the key
CREATE TABLE customers (
  id INT PRIMARY KEY,
  postal_code VARCHAR(10),
  country VARCHAR(50)  -- Transitive dependency!
);

-- GOOD: Separate postal_codes table; customers keeps only postal_code
CREATE TABLE postal_codes (
  code VARCHAR(10) PRIMARY KEY,
  country VARCHAR(50)
);
```

## When to Denormalize

| Scenario | Denormalization Strategy |
|---|---|
| Read-heavy reporting | Pre-calculated aggregates |
| Expensive JOINs | Cached derived columns |
| Analytics dashboards | Materialized views |

```sql
-- Denormalized for performance
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT,
  total_amount DECIMAL(10,2),  -- Calculated
  item_count INT               -- Calculated
);
```
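
For the "Materialized views" row, a sketch of what that can look like (PostgreSQL syntax; assumes `orders` also has a `created_at` column, and `daily_revenue` is a hypothetical view):

```sql
-- Precompute the aggregate once; dashboards read the stored result
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT created_at::date AS day,
       SUM(total_amount) AS revenue,
       COUNT(*)          AS order_count
FROM orders
GROUP BY created_at::date;

-- Refresh on a schedule to pick up new orders
REFRESH MATERIALIZED VIEW daily_revenue;
```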

## Data Types

### String Types

| Type | Use Case | Example |
|---|---|---|
| CHAR(n) | Fixed length | State codes, country codes |
| VARCHAR(n) | Variable length | Names, emails |
| TEXT | Long content | Articles, descriptions |

```sql
-- Good sizing
email VARCHAR(255)
phone VARCHAR(20)
country_code CHAR(2)
```

### Numeric Types

| Type | Range | Use Case |
|---|---|---|
| TINYINT | -128 to 127 | Age, status codes |
| SMALLINT | -32,768 to 32,767 | Quantities |
| INT | -2.1B to 2.1B | IDs, counts |
| BIGINT | ±9.2 quintillion (64-bit) | Large IDs, timestamps |
| DECIMAL(p,s) | Exact precision | Money |
| FLOAT/DOUBLE | Approximate | Scientific data |

```sql
-- ALWAYS use DECIMAL for money
price DECIMAL(10, 2)  -- fits up to 99,999,999.99

-- NEVER use FLOAT for money
price FLOAT  -- Rounding errors!
```

### Date/Time Types

```sql
DATE        -- 2025-10-31
TIME        -- 14:30:00
DATETIME    -- 2025-10-31 14:30:00
TIMESTAMP   -- Stored as UTC; converted to the session time zone (MySQL)

-- Always store in UTC
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
```
### Boolean

```sql
-- PostgreSQL
is_active BOOLEAN DEFAULT TRUE

-- MySQL
is_active TINYINT(1) DEFAULT 1
```

## When to Create Indexes

| Always Index | Reason |
|---|---|
| Foreign keys | Speed up JOINs |
| WHERE clause columns | Speed up filtering |
| ORDER BY columns | Speed up sorting |
| Unique constraints | Enforce uniqueness |

```sql
-- Foreign key index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Query pattern index
CREATE INDEX idx_orders_status_date ON orders(status, created_at);
```

## Index Types

| Type | Best For | Example |
|---|---|---|
| B-Tree | Ranges, equality | `price > 100` |
| Hash | Exact matches only | `email = 'x@y.com'` |
| Full-text | Text search | `MATCH ... AGAINST` |
| Partial | Subset of rows | `WHERE is_active = true` |
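
Two of the less common rows, sketched (index and table names are hypothetical; the partial index is PostgreSQL syntax, the full-text index MySQL):

```sql
-- Partial index (PostgreSQL): index only the rows you actually query
CREATE INDEX idx_users_active_email ON users(email) WHERE is_active = true;

-- Full-text index (MySQL) and the MATCH ... AGAINST query that uses it
CREATE FULLTEXT INDEX idx_articles_text ON articles(title, content);
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('schema design');
```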

## Composite Index Order

```sql
CREATE INDEX idx_customer_status ON orders(customer_id, status);

-- Uses the index (customer_id is the leading column)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';

-- Does NOT use the index (status alone skips the leading column)
SELECT * FROM orders WHERE status = 'pending';
```

Rule: put the most selective column first, or the column most often queried alone.
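
If `status` alone is also a hot filter, give it its own index rather than relying on the composite one (a one-line sketch, reusing the table above):

```sql
-- Serves the WHERE status = '...' queries the composite index cannot
CREATE INDEX idx_orders_status ON orders(status);
```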

## Index Pitfalls

| Pitfall | Problem | Solution |
|---|---|---|
| Over-indexing | Slow writes | Only index what's queried |
| Wrong column order | Unused index | Match query patterns |
| Missing FK indexes | Slow JOINs | Always index FKs |

## Constraints

### Primary Keys

```sql
-- Auto-increment (simple)
id INT AUTO_INCREMENT PRIMARY KEY

-- UUID (distributed systems)
id CHAR(36) PRIMARY KEY DEFAULT (UUID())

-- Composite (junction tables)
PRIMARY KEY (student_id, course_id)
```

### Foreign Keys

```sql
-- Pick ONE ON DELETE action per foreign key; shown together for comparison:
FOREIGN KEY (customer_id) REFERENCES customers(id)
  ON DELETE CASCADE     -- Delete children with the parent
  ON DELETE RESTRICT    -- Prevent deletion while referenced
  ON DELETE SET NULL    -- Set to NULL when the parent is deleted
  ON UPDATE CASCADE     -- Update children when the parent key changes
```

| Strategy | Use When |
|---|---|
| CASCADE | Dependent data (order_items) |
| RESTRICT | Important references (prevent accidents) |
| SET NULL | Optional relationships |

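A quick sketch of RESTRICT doing its job (the id is hypothetical and the error text varies by engine):

```sql
-- With ON DELETE RESTRICT, deleting a still-referenced parent fails
DELETE FROM customers WHERE id = 42;
-- ERROR (paraphrased): row is still referenced by orders.customer_id
```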

### Other Constraints

```sql
-- Unique
email VARCHAR(255) UNIQUE NOT NULL

-- Composite unique
UNIQUE (student_id, course_id)

-- Check
price DECIMAL(10,2) CHECK (price >= 0)
discount INT CHECK (discount BETWEEN 0 AND 100)

-- Not null
name VARCHAR(100) NOT NULL
```

## Relationships

### One-to-Many

```sql
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT NOT NULL REFERENCES customers(id)
);

CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  product_id INT NOT NULL,
  quantity INT NOT NULL
);
```

### Many-to-Many

```sql
-- Junction table
CREATE TABLE enrollments (
  student_id INT REFERENCES students(id) ON DELETE CASCADE,
  course_id INT REFERENCES courses(id) ON DELETE CASCADE,
  enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (student_id, course_id)
);
```

### Self-Referencing

```sql
CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  manager_id INT REFERENCES employees(id)
);
```
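
Traversing such a hierarchy typically uses a recursive CTE; a minimal sketch in standard SQL (the root id is hypothetical):

```sql
-- Walk the reporting chain downward from employee 1
WITH RECURSIVE reports AS (
  SELECT id, name, manager_id
  FROM employees
  WHERE id = 1                      -- root of the subtree
  UNION ALL
  SELECT e.id, e.name, e.manager_id
  FROM employees e
  JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports;
```
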
### Polymorphic

```sql
-- Approach 1: Separate FKs (stronger integrity)
CREATE TABLE comments (
  id INT PRIMARY KEY,
  content TEXT NOT NULL,
  post_id INT REFERENCES posts(id),
  photo_id INT REFERENCES photos(id),
  CHECK (
    (post_id IS NOT NULL AND photo_id IS NULL) OR
    (post_id IS NULL AND photo_id IS NOT NULL)
  )
);

-- Approach 2: Type + ID (flexible, weaker integrity)
CREATE TABLE comments (
  id INT PRIMARY KEY,
  content TEXT NOT NULL,
  commentable_type VARCHAR(50) NOT NULL,
  commentable_id INT NOT NULL
);
```

## Embedding vs Referencing

| Factor | Embed | Reference |
|---|---|---|
| Access pattern | Read together | Read separately |
| Relationship | 1:few | 1:many |
| Document size | Small | Approaching the 16 MB document limit |
| Update frequency | Rarely | Frequently |

### Embedded Document

```json
{
  "_id": "order_123",
  "customer": {
    "id": "cust_456",
    "name": "Jane Smith",
    "email": "jane@example.com"
  },
  "items": [
    { "product_id": "prod_789", "quantity": 2, "price": 29.99 }
  ],
  "total": 109.97
}
```

### Referenced Document

```json
{
  "_id": "order_123",
  "customer_id": "cust_456",
  "item_ids": ["item_1", "item_2"],
  "total": 109.97
}
```

### MongoDB Indexes

```javascript
// Single field
db.users.createIndex({ email: 1 }, { unique: true });

// Composite
db.orders.createIndex({ customer_id: 1, created_at: -1 });

// Text search
db.articles.createIndex({ title: "text", content: "text" });

// Geospatial
db.stores.createIndex({ location: "2dsphere" });
```

## Migration Best Practices

| Practice | Why |
|---|---|
| Always reversible | Need to roll back |
| Backward compatible | Zero-downtime deploys |
| Schema before data | Separate concerns |
| Test on staging | Catch issues early |

### Adding a Column (Zero-Downtime)

```sql
-- Step 1: Add the column as nullable
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Step 2: Deploy code that writes to the new column

-- Step 3: Backfill existing rows
UPDATE users SET phone = '' WHERE phone IS NULL;

-- Step 4: Make it required (if needed)
ALTER TABLE users MODIFY phone VARCHAR(20) NOT NULL;  -- MySQL
-- PostgreSQL: ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
```

### Renaming a Column (Zero-Downtime)

```sql
-- Step 1: Add the new column
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);

-- Step 2: Deploy code that writes to BOTH columns
--         (otherwise rows written after the copy go stale)

-- Step 3: Copy existing data
UPDATE users SET email_address = email WHERE email_address IS NULL;

-- Step 4: Deploy code reading from the new column, then stop writing the old one

-- Step 5: Drop the old column
ALTER TABLE users DROP COLUMN email;
```

### Migration Template

```sql
-- Migration: YYYYMMDDHHMMSS_description.sql
-- (Transactional DDL as shown works in PostgreSQL; MySQL auto-commits DDL)

-- UP
BEGIN;
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
COMMIT;

-- DOWN
BEGIN;
DROP INDEX idx_users_phone;  -- PostgreSQL (MySQL: DROP INDEX idx_users_phone ON users)
ALTER TABLE users DROP COLUMN phone;
COMMIT;
```

## Query Optimization

### Query Analysis

```sql
EXPLAIN SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
```

| Look For | Meaning |
|---|---|
| type: ALL | Full table scan (bad) |
| type: ref | Index used (good) |
| key: NULL | No index used |
| rows: high | Many rows scanned |
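
If EXPLAIN reports `type: ALL` here, the usual fix is a composite index matching the WHERE clause (a sketch; the index name is made up):

```sql
-- Covers both equality filters in the query above
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);

-- Re-run EXPLAIN and expect type: ref with key: idx_orders_customer_status
EXPLAIN SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
```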

### N+1 Query Problem

```python
# BAD: N+1 queries (one per order; the f-string is also an injection risk)
orders = db.query("SELECT * FROM orders")
for order in orders:
    customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")

# GOOD: Single JOIN
results = db.query("""
    SELECT orders.*, customers.name
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
""")
```

### Optimization Techniques

| Technique | When to Use |
|---|---|
| Add indexes | Slow WHERE/ORDER BY |
| Denormalize | Expensive JOINs |
| Pagination | Large result sets |
| Caching | Repeated queries |
| Read replicas | Read-heavy load |
| Partitioning | Very large tables |
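
For the "Pagination" row, keyset (seek) pagination usually beats large OFFSETs; a sketch with a placeholder parameter `:last_seen_id`:

```sql
-- OFFSET scans and discards every skipped row; keyset seeks directly
SELECT id, customer_id, total
FROM orders
WHERE id > :last_seen_id   -- last id from the previous page
ORDER BY id
LIMIT 50;
```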

## Extension Points

1. Database-Specific Patterns: Add MySQL vs PostgreSQL vs SQLite variations
2. Advanced Patterns: Time-series, event sourcing, CQRS, multi-tenancy
3. ORM Integration: TypeORM, Prisma, SQLAlchemy patterns
4. Monitoring: Query performance tracking, slow query alerts