Database & SQL Quick Reference
Databases power every modern application, from banking systems to social networks. Understanding database fundamentals is essential for building scalable, reliable systems that protect data integrity and deliver exceptional performance. This comprehensive reference covers the core concepts every engineer and technical leader needs to make informed architectural decisions.
ACID Properties
ACID defines the gold standard for database transactions. These four guarantees (Atomicity, Consistency, Isolation, and Durability) ensure your data remains reliable even when systems crash, networks fail, or multiple users compete for the same records. Think of ACID as the safety net that prevents partial updates, corrupted data, and lost changes.
Atomicity
All or Nothing
Like an online checkout: you either get a confirmed order (payment + inventory reserved), or you get nothing. No “half-order.”
A transaction is treated as a single unit: all steps succeed and commit together, or any failure rolls everything back.
- All steps succeed or everything rolls back
- Prevents “charged but no order” scenarios
- Failures leave the database unchanged
- Makes multi-step operations safe
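A minimal T-SQL sketch (the Payments and Inventory tables are hypothetical): both statements commit together, or the CATCH block rolls both back.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Step 1: record the payment (hypothetical table)
    INSERT INTO Payments (OrderId, Amount) VALUES (42, 99.00);

    -- Step 2: reserve inventory; a failure here undoes the payment too
    UPDATE Inventory SET Reserved = Reserved + 1 WHERE ProductId = 7;

    COMMIT TRANSACTION;  -- both steps become permanent together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- any failure leaves the database unchanged
    THROW;  -- rethrow so the caller sees the error
END CATCH;
```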
Consistency
Rules Always Apply
Like ticketing: you can’t sell the same seat twice, and you can’t sell a seat that doesn’t exist. The rules prevent “impossible” outcomes.
A transaction must move the database from one valid state to another valid state, obeying constraints like keys, checks, and relationships.
- Constraints (PK/FK/UNIQUE/CHECK) are always enforced
- Rejects invalid data instead of saving it
- Invariants hold before and after the transaction
- Prevents broken relationships (no orphan rows)
Isolation
Transactions Don't Interfere
Like two people trying to buy the last concert ticket at the same time: each checkout session should behave predictably, not see the other person’s in-progress changes.
Concurrent transactions behave as if they ran separately (depending on isolation level). You don’t read someone else’s uncommitted work.
- Uncommitted changes aren’t visible to others
- Prevents anomalies (dirty / non-repeatable / phantom reads)
- Isolation levels trade consistency vs concurrency
- Higher isolation can mean more blocking
Durability
Changes are Permanent
Like getting “Order confirmed” on your phone: even if the app crashes or the server restarts, the order is still there when you come back.
Once a transaction commits, the database won’t lose it. Changes are persisted (typically via transaction logs) and can be recovered after failures.
- A committed transaction won’t be “forgotten”
- Transaction logs enable crash recovery
- Replicas/backups improve durability further
- Power loss shouldn’t corrupt committed data
Normalization
Normalization is the art of organizing database tables to eliminate redundancy and prevent data anomalies. By systematically breaking down tables into smaller, focused structures, you ensure each piece of information lives in exactly one place. This reduces storage costs, simplifies updates, and prevents inconsistencies, though sometimes you'll intentionally denormalize for performance gains.
First Normal Form (1NF)
Atomic Values Only
Like an order form: each blank expects one answer. You don’t write “red, blue, green” in the “Color” box; you add separate rows/items.
Store one value per column (no lists/arrays in a cell), avoid repeating column groups, and ensure rows are uniquely identifiable.
- No comma-separated lists in a column
- No repeating columns like Phone1/Phone2/Phone3
- Each row has a primary key (unique identifier)
- One row = one “thing” being described
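A quick sketch of the fix (table and column names are hypothetical): replace repeating Phone1/Phone2/Phone3 columns with one row per phone number.

```sql
-- Violates 1NF: repeating column group
-- Customers (CustomerId, Phone1, Phone2, Phone3)

-- 1NF: one atomic value per column, one row per phone number
CREATE TABLE CustomerPhones (
    CustomerId  INT         NOT NULL,
    PhoneNumber VARCHAR(20) NOT NULL,
    PRIMARY KEY (CustomerId, PhoneNumber)  -- each row uniquely identifiable
);
```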
Second Normal Form (2NF)
No Partial Dependencies
Like a receipt line item: (OrderId, ProductId) identifies the line. ProductName shouldn’t be stored there; it depends only on ProductId, not the whole pair.
If a table uses a composite key, every non-key column must depend on the full key. If a column depends on only part of the key, move it to a table where it fits.
- Requires 1NF
- Relevant mainly when you have a composite primary key
- Eliminates columns that depend on only part of the key
- Typically splits “lookup” data into its own table
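A sketch of that split, using the receipt example above (names are hypothetical):

```sql
-- Violates 2NF: ProductName depends on ProductId alone,
-- not the full (OrderId, ProductId) key
-- OrderLines (OrderId, ProductId, ProductName, Qty)

-- 2NF: the partial dependency moves to its own table
CREATE TABLE Products (
    ProductId   INT PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL
);

CREATE TABLE OrderLines (
    OrderId   INT NOT NULL,
    ProductId INT NOT NULL REFERENCES Products(ProductId),
    Qty       INT NOT NULL,
    PRIMARY KEY (OrderId, ProductId)  -- remaining columns depend on the full key
);
```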
Third Normal Form (3NF)
No Transitive Dependencies
Like shipping addresses: if you store ZipCode and also store City/State in the same row, City/State really depend on ZipCode. Put Zip → City/State in its own table.
Non-key columns should depend only on the primary key, not on other non-key columns. Move “derived/lookup” facts to separate tables.
- Requires 2NF
- Stops non-key → non-key dependencies
- Avoids update anomalies (change it once, not everywhere)
- Reduces duplicated descriptive data
Boyce-Codd Normal Form (BCNF)
Every Determinant Is a Candidate Key
Like a keycard system: any attribute that can identify others must itself be a keycard.
Stronger than 3NF. For every functional dependency X → Y, X must be a candidate key. Prevents edge-case anomalies.
- Requires 3NF first
- Eliminates dependencies on non-key determinants
- Useful for overlapping candidate keys
- May require extra tables
Normalization Goals
Why Normalize?
Like keeping one contact card per person: you don’t want the same phone number copied into 20 places where it can get out of sync.
Normalize to reduce duplication and prevent “anomaly” bugs: insert problems, update problems, and delete problems caused by storing the same fact in multiple places.
- Reduce duplicated facts (single source of truth)
- Prevent update anomalies (change once)
- Prevent insert/delete anomalies (no forced fake data)
- Improve data integrity and maintainability
Denormalization
Intentional Redundancy
Like caching a summary page so you don’t re-calculate everything for every request.
Intentionally duplicate data to speed reads or simplify queries. Great for reporting and analytics when writes are controlled.
- Improves read performance
- Increases write complexity
- Requires sync strategy (triggers, jobs, ETL)
- Best for read-heavy workloads
Indexes & Performance
Indexes are the secret weapon for database performance. Like a book's index helping you find topics instantly, database indexes accelerate queries by orders of magnitude. Understanding clustered vs. non-clustered indexes, covering indexes, and index strategies turns slow table scans into lightning-fast seeks, but they come with storage overhead and write penalties you need to balance.
Clustered Index
Physical Row Order
Like a book where pages are physically bound in a single order. You can’t have the same book “physically sorted” two different ways at once.
The table’s rows are stored on disk in the clustered index order. Great for range lookups, but you only get one because data can only be physically ordered one way.
- Defines the table’s physical row order
- Only one clustered index per table
- Excellent for range queries (date ranges, ID ranges)
- Often (but not always) the primary key
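For example (hypothetical Orders table, assuming it doesn’t already have a clustered primary key):

```sql
-- Cluster on OrderDate so date ranges are physically adjacent on disk
CREATE CLUSTERED INDEX IX_Orders_OrderDate
    ON Orders (OrderDate);

-- Range query that benefits: March's rows sit next to each other
SELECT OrderId, CustomerId, OrderDate
FROM Orders
WHERE OrderDate >= '2024-03-01' AND OrderDate < '2024-04-01';
```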
Non-Clustered Index
Separate Index Structure
Like the index at the back of a textbook: for each keyword, it lists page numbers where the topic appears. You can have many indexes (by topic, by name, by acronym).
A separate structure that maps key values to where the rows live. You can build many of these for different query patterns.
- Multiple per table are common
- Stores index keys + a pointer to the row (or clustered key)
- Speeds up selective lookups and joins
- Costs extra storage and write overhead
Covered Index
All Columns in Index
Like a book index entry that includes a short definition/summary right there. You can answer the question without turning to the referenced page.
When the index contains every column needed by the query (keys + included columns), SQL Server can answer it from the index alone.
- Avoids Key Lookup / Bookmark Lookup back to the base table
- Big win when you’d otherwise do many lookups (N lookups = lots of random reads)
- Built using key columns + INCLUDE columns (SQL Server)
- Tradeoff: bigger index, more memory, and slower writes/updates
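A sketch in SQL Server syntax (table and columns are hypothetical):

```sql
-- Key column drives the seek; INCLUDE columns ride along at the leaf level
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
    ON Orders (CustomerId)
    INCLUDE (OrderDate, TotalAmount);

-- Fully covered: every column the query touches lives in the index,
-- so no Key Lookup back to the base table is needed
SELECT OrderDate, TotalAmount
FROM Orders
WHERE CustomerId = 123;
```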
Filtered Index
Partial Index with WHERE
Like an index that only lists entries for one chapter (e.g., “Chapter 7: Concurrency”) instead of indexing the entire book. Smaller and faster for that chapter’s questions.
Indexes only the rows that match a filter (WHERE clause). This makes the index smaller and faster for queries that use the same filter.
- Indexes only rows that meet a condition (e.g., IsActive = 1)
- Smaller index = fewer pages to read
- Great for highly skewed/sparse data
- Only helps when queries match the filter predicate
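For example (hypothetical columns):

```sql
-- Index only the active rows, often a small fraction of the table
CREATE NONCLUSTERED INDEX IX_Orders_Active_CustomerId
    ON Orders (CustomerId)
    WHERE IsActive = 1;

-- Usable: the query repeats the filter predicate
SELECT OrderId FROM Orders WHERE IsActive = 1 AND CustomerId = 123;
```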
Composite Index
Multi-Column Index
Like a phone book sorted by Last Name, then First Name; order matters.
Index built on multiple columns. Supports queries that filter or sort using the left-most columns first.
- Column order is critical
- Supports left-most prefix searches
- Helps multi-column filters and ORDER BY
- Not helpful if leading column isn’t used
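The left-most prefix rule in action (hypothetical People table):

```sql
-- Sorted by LastName first, then FirstName within each last name
CREATE NONCLUSTERED INDEX IX_People_LastName_FirstName
    ON People (LastName, FirstName);

-- Can seek: both queries use the leading column
SELECT * FROM People WHERE LastName = 'Smith';
SELECT * FROM People WHERE LastName = 'Smith' AND FirstName = 'Ana';

-- Cannot seek on this index: the leading column is missing
SELECT * FROM People WHERE FirstName = 'Ana';
```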
Index Scan
Reading All Index Pages
Like flipping through the book page-by-page until you find what you need. It’s fine if you plan to read most of the book anyway.
The engine reads through many (or all) index entries. This can be the right choice when a query returns a large portion of the table.
- Reads lots of pages (sometimes all of them)
- Often chosen when results are large or filters aren’t selective
- Plan operator you’ll see: Index Scan (or Table Scan if no useful index)
- Not always “bad”; can be efficient for big reads
- Cost grows roughly with table/index size
Index Seek
Targeted Index Lookup
Like using the book’s index to jump directly to the right page numbers instead of scanning every page.
The engine navigates the index tree to jump to the matching key range, then reads just the matching entries.
- Best for selective filters (small result sets)
- Uses the index tree to jump to the right place
- Plan operator you’ll see: Index Seek
- Usually much less I/O than a scan
- If you need extra columns, this can pair with Key Lookup (lots of lookups can hurt)
- Cost grows with the number of matching rows
Key Lookup
Fetch Missing Columns
Like finding a book in the index, then walking to the shelf to read the full page.
Happens when a nonclustered index finds rows but the query needs extra columns not in the index. The engine fetches the base row.
- One lookup per matching row
- Can cause many random reads
- Fix with a covered index (INCLUDE columns)
- Sometimes acceptable for small result sets
Isolation Levels
Isolation levels control how transactions see each other's changes. This is the critical tradeoff between data consistency and system concurrency. Higher isolation prevents anomalies like dirty reads and phantom rows but increases blocking. Lower isolation boosts throughput but risks reading uncommitted or inconsistent data. Choosing the right level depends on your application's tolerance for stale data vs. need for speed.
READ UNCOMMITTED
Dirty Reads Allowed
Like reading someone's draft email before they finish writing it – it might change.
Lowest isolation level. Statements can read rows modified by other transactions but not yet committed. Allows dirty reads.
- No locks on reads
- Allows dirty reads
- Highest concurrency
- Least data accuracy
READ COMMITTED
Default SQL Server Level
Like reading only published articles, not drafts – you see only finalized content.
Statements cannot read data modified but not committed by others. Prevents dirty reads but allows non-repeatable reads.
- Default isolation level
- No dirty reads
- Shared locks released quickly
- Non-repeatable reads possible
REPEATABLE READ
Locked Until Transaction Ends
Like bookmarking articles – the ones you saw stay unchanged, but new ones can appear.
Data read cannot be modified by other transactions until current transaction completes. Prevents non-repeatable reads, allows phantom reads.
- Locks held until commit
- Same data on re-read
- Phantom reads possible
- More blocking than READ COMMITTED
SNAPSHOT
Point-in-Time View
Like viewing a photo taken at the start – the world changes but your photo stays the same.
Reads data as it existed at the start of the transaction using row versioning in tempdb. No shared locks needed.
- Uses row versioning
- No read locks needed
- Consistent point-in-time view
- Tempdb overhead
SERIALIZABLE
Highest Isolation
Like locking a filing cabinet – no one can add, remove, or change files while you're reading.
Highest isolation level. Locks ranges of data to prevent phantom reads. Transactions execute as if they ran sequentially.
- Range locks applied
- No phantom reads
- Maximum consistency
- Lowest concurrency
READ COMMITTED SNAPSHOT (RCSI)
Row Versioning for Reads
Like reading a published edition of a book while writers update the next edition in the background.
SQL Server uses row versions for READ COMMITTED, removing reader/writer blocking while still avoiding dirty reads.
- Statement-level snapshot view
- Readers don’t block writers
- Uses tempdb row versions
- Non-repeatable reads still possible
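How these levels are selected in practice (MyShop and Accounts are hypothetical; the ALTER DATABASE statements are one-time, database-wide settings):

```sql
-- Per-session level for subsequent transactions
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
    SELECT Balance FROM Accounts WHERE AccountId = 1;  -- re-reads see the same value
COMMIT;

-- RCSI: READ COMMITTED readers use row versions instead of blocking on writers
ALTER DATABASE MyShop SET READ_COMMITTED_SNAPSHOT ON;

-- SNAPSHOT must be enabled once, then requested per session
ALTER DATABASE MyShop SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```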
Locks & Concurrency
Locks are the traffic lights of database concurrency. They coordinate access when multiple users modify the same data simultaneously, preventing conflicts and corruption. Shared locks allow multiple readers, exclusive locks grant sole write access, and update locks prevent deadlocks. Understanding lock behavior helps you diagnose blocking issues and design applications that scale under concurrent load.
Shared Locks
Multiple Readers Allowed
Like a library book: multiple people can read the same book, but no one can edit it while it’s being read.
Used for read operations. Multiple transactions can hold shared locks on the same resource simultaneously.
- Multiple readers allowed
- Blocks exclusive locks
- Used by SELECT statements
- Released after read
Exclusive Locks
Single Writer Only
Like editing a document: only one person can edit, and no one else can read or write.
Used for data-modification operations (INSERT, UPDATE, DELETE). Ensures only one transaction modifies data at a time.
- Blocks all other locks
- Prevents reads and writes
- Used by DML statements
- Held until transaction ends
Update Locks
Prevents Deadlocks
Like reserving a seat before buying a ticket: signals intent to modify without blocking readers yet.
Used when reading data with intent to update. Prevents common deadlock scenarios between two concurrent updates.
- Signals update intent
- Converts to exclusive lock
- Prevents deadlocks
- Compatible with shared locks
Deadlock
Circular Wait
Like two cars at an intersection, each waiting for the other to move first – neither can proceed.
Occurs when two transactions are waiting for each other to release locks. SQL Server detects and resolves by killing one transaction.
- Circular lock dependency
- One transaction rolled back
- Deadlock victim chosen
- Minimize with proper indexing
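A common mitigation is to catch the deadlock-victim error (1205 in SQL Server) and retry; a sketch with hypothetical tables:

```sql
DECLARE @retry INT = 3;
WHILE @retry > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE Accounts SET Balance = Balance - 10 WHERE AccountId = 1;
        UPDATE Accounts SET Balance = Balance + 10 WHERE AccountId = 2;
        COMMIT TRANSACTION;
        BREAK;  -- success, stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        IF ERROR_NUMBER() = 1205 AND @retry > 1
            SET @retry -= 1;  -- we were the deadlock victim: try again
        ELSE
            THROW;            -- other errors (or retries exhausted) surface
    END CATCH;
END;
```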
NOLOCK Hint
Read Without Locking
Like peeking at someone's screen while they're typing – you might see incomplete or changing data.
Query hint that allows dirty reads. No locks are taken, preventing blocking but allowing inconsistent data.
- No shared locks taken
- Allows dirty reads
- Improves concurrency
- Risk of inconsistent data
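Usage looks like this (hypothetical table):

```sql
-- Equivalent to READ UNCOMMITTED for this table: no blocking, but the count
-- may include uncommitted rows or miss rows moved by concurrent updates
SELECT COUNT(*) AS RoughOrderCount
FROM Orders WITH (NOLOCK);
```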
Intent Locks
Locking Hierarchy Signals
Like putting a “reserved” sign on a table before placing individual plates.
Indicate a transaction intends to take locks at a lower level (row/page). Prevents incompatible higher-level locks.
- IS, IX, SIX lock types
- Supports lock hierarchy (table → page → row)
- Helps avoid conflicts
- Automatically managed by the engine
Lock Escalation
Row/Page → Table
Like replacing many small locks with one big padlock to simplify management.
SQL Server can convert many row/page locks into a table lock to reduce overhead, which can increase blocking.
- Reduces lock memory overhead
- Can increase contention
- Triggered by lock thresholds
- Observable through DMVs and Extended Events
Data Integrity
Data integrity constraints are your database's immune system: they reject bad data before it can infect your system. Primary keys ensure every row is uniquely identifiable, foreign keys maintain referential integrity between tables, check constraints validate business rules, and unique constraints prevent duplicates. These guardrails catch errors at the database level, where they're easiest and cheapest to fix.
Entity Integrity
Primary Key Constraint
Like every person having a unique social security number – no duplicates, no blanks.
Every table must have a primary key, and the primary key columns must be unique and not null.
- Primary key required
- Must be unique
- Cannot be null
- Identifies each row uniquely
Referential Integrity
Foreign Key Constraint
Like ensuring every order references a real customer – no orphaned orders pointing to non-existent people.
Foreign key values must refer to existing primary key values in the referenced table, or be null.
- Foreign keys must match
- References existing primary key
- Can be null if allowed
- Prevents orphaned records
Domain Integrity
Valid Values Only
Like a form accepting only valid dates, not February 31st or years beyond 2100.
Ensures data values follow defined rules for valid values, ranges, and formats using constraints and rules.
- Data type enforcement
- Check constraints
- Default values
- Valid range and format
Unique Constraint
No Duplicate Values
Like usernames: two people can’t pick the same one.
Ensures all values in a column (or column set) are distinct. NULL handling varies by database; SQL Server allows only a single NULL.
- Prevents duplicates
- Creates a supporting index
- Can cover multiple columns
- Unlike a primary key, permits NULL, and a table can have many
Check Constraint
Custom Validation Rules
Like a bouncer checking age before entry – only valid rows are allowed in.
Validates column values with a Boolean expression (e.g., Amount > 0, Status IN ('New','Closed')).
- Rejects invalid data
- Enforces business rules
- Can reference multiple columns
- Checked on INSERT and UPDATE
Default Constraint
Automatic Column Values
Like a form that auto-fills today’s date if you leave it blank.
Provides a default value when none is supplied for a column, ensuring consistent and predictable data.
- Used on INSERT when value is omitted
- Common for timestamps and flags
- Can be constants or functions
- Improves data consistency
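One hypothetical table can exercise all of these constraint types at once:

```sql
CREATE TABLE Orders (
    OrderId     INT IDENTITY PRIMARY KEY,              -- entity integrity
    CustomerId  INT NOT NULL
        REFERENCES Customers(CustomerId),              -- referential integrity
    OrderNumber VARCHAR(20) NOT NULL UNIQUE,           -- no duplicate order numbers
    Amount      DECIMAL(10, 2) NOT NULL
        CHECK (Amount > 0),                            -- business rule
    Status      VARCHAR(10) NOT NULL
        CHECK (Status IN ('New', 'Paid', 'Shipped')),  -- domain integrity
    CreatedAt   DATETIME2 NOT NULL
        DEFAULT SYSUTCDATETIME()                       -- auto-filled when omitted
);
```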
Query Optimization
Query optimization turns slow queries into fast ones through intelligent execution planning. The database engine analyzes statistics, evaluates multiple strategies, and chooses the best path to retrieve your data. Execution plans reveal what's happening under the hood, while concepts like SARGability and join algorithms help you write queries the optimizer can accelerate. Small query changes can yield massive performance improvements.
Execution Plan
Query Roadmap
Like GPS showing the route taken: which indexes were used, how tables were joined, and where time was spent.
Visual representation of steps SQL Server takes to execute a query. Essential for identifying performance bottlenecks.
- Shows query strategy
- Identifies bottlenecks
- Index usage visible
- Cost percentages shown
Statistics
Data Distribution Info
Like a census helping city planners: statistics help SQL Server plan the best query strategy.
Metadata about data distribution in columns. SQL Server uses statistics to estimate cardinality and choose optimal execution plans.
- Tracks data distribution
- Helps query optimizer
- Auto-created on indexes
- Should be kept updated
Cardinality
Uniqueness of Values
Like measuring diversity in a group: high cardinality is many unique values, low is mostly duplicates.
Refers to uniqueness of data values in a column. High cardinality (many unique values) makes better indexes.
- Measures value uniqueness
- High = good for indexes
- Low = poor selectivity
- Affects index effectiveness
Query Hints
Override Optimizer
Like telling GPS to avoid highways – you override its decision with your preference.
Instructions that override SQL Server's query optimizer decisions. Use sparingly when optimizer doesn't choose best plan.
- NOLOCK, FORCESEEK, etc.
- Forces specific behavior
- Use with caution
- Can improve or hurt performance
SARGable Queries
Search ARGument ABLE
Like searching for "Smith" vs "names containing 'mit'" – the first can use an index efficiently.
Queries written so SQL Server can use indexes effectively. Avoid functions on indexed columns in WHERE clauses.
- Allows index usage
- No functions on columns
- Simple predicates preferred
- Much faster queries
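The classic example (hypothetical Orders table with an index on OrderDate):

```sql
-- Non-SARGable: the function on OrderDate hides the column from the index
SELECT OrderId FROM Orders WHERE YEAR(OrderDate) = 2024;

-- SARGable rewrite: a bare column compared to a range can use an index seek
SELECT OrderId
FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
```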
Join Algorithms
Nested Loop / Hash / Merge
Like choosing between hand-matching lists, hashing names into buckets, or merging two sorted lists.
SQL Server picks a join strategy based on row counts, indexes, and sort order to minimize cost.
- Nested Loop: small input + index
- Hash Join: large, unsorted inputs
- Merge Join: sorted inputs
- Driven by statistics and cardinality
Parameter Sniffing
Plan Based on First Input
Like cooking for the first guest size, then reusing that plan for every party size.
SQL Server compiles a plan using initial parameter values. Later executions with different values can perform poorly.
- Plan cached from first execution
- Skewed data makes this worse
- Solutions: recompile, optimize for, hints
- Not always a problem
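Two common mitigations, sketched with a hypothetical parameter:

```sql
DECLARE @CustomerId INT = 123;

-- Recompile each execution so the plan fits the value actually passed in
SELECT OrderId, Amount
FROM Orders
WHERE CustomerId = @CustomerId
OPTION (RECOMPILE);

-- Alternative: build the plan for a representative value instead of
-- whatever value happened to be sniffed first
-- ... OPTION (OPTIMIZE FOR (@CustomerId = 123));
```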
Database Objects & Features
Database objects extend SQL beyond simple queries. Stored procedures encapsulate complex logic, views provide abstraction layers, triggers automate responses to data changes, and window functions enable sophisticated analytics. These tools help you build robust data layers that handle complex business logic efficiently while maintaining security and consistency across your application.
Stored Procedures
Precompiled SQL
Like a recipe saved in a cookbook: reusable, optimized, and everyone executes it the same way.
Precompiled collection of SQL statements stored on the server. Improves performance, security, and code reusability.
- Precompiled execution plan
- Reduces network traffic
- Enforces security
- Reusable code
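A minimal sketch (procedure and table names are hypothetical):

```sql
CREATE PROCEDURE dbo.GetCustomerOrders
    @CustomerId INT
AS
BEGIN
    SET NOCOUNT ON;  -- suppress row-count chatter over the wire
    SELECT OrderId, OrderDate, Amount
    FROM Orders
    WHERE CustomerId = @CustomerId;
END;
GO

EXEC dbo.GetCustomerOrders @CustomerId = 123;
```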
Views
Virtual Tables
Like a window showing a specific view of your yard: it's not a separate yard, just a different perspective.
Virtual table based on a SELECT query. Simplifies complex queries, provides security by limiting column access.
- Stored query definition
- No data duplication
- Simplifies complex joins
- Provides security layer
Triggers
Automatic Actions
Like a smoke alarm: automatically triggers an action when a specific event occurs.
Stored procedures that automatically execute when INSERT, UPDATE, or DELETE events occur on a table.
- Event-driven execution
- Enforces business rules
- Audit trail creation
- Can affect performance
CTE (Common Table Expression)
Temporary Named Result
Like writing notes on a whiteboard before solving a math problem – temporary workspace.
Temporary named result set that exists within a single query. Makes complex queries more readable.
- Improves query readability
- Can be recursive
- Scoped to single query
- Non-recursive CTEs are typically inlined, so performance matches an equivalent subquery
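A recursive CTE walking a hypothetical org chart (EmployeeId, ManagerId):

```sql
WITH OrgChart AS (
    -- Anchor: start from the top (no manager)
    SELECT EmployeeId, ManagerId, 0 AS Depth
    FROM Employees
    WHERE ManagerId IS NULL

    UNION ALL

    -- Recursive step: attach each employee under their manager
    SELECT e.EmployeeId, e.ManagerId, o.Depth + 1
    FROM Employees e
    JOIN OrgChart o ON e.ManagerId = o.EmployeeId
)
SELECT * FROM OrgChart;
```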
Window Functions
ROW_NUMBER, RANK, LAG
Like looking at your position in a race while also seeing everyone else's – contextual calculations.
Functions that perform calculations across a set of rows related to the current row. Includes ranking, aggregate, and analytic functions.
- OVER clause defines window
- Avoids self-joins
- Running totals, rankings
- Very powerful for analytics
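For example, ranking within a group plus a running total (hypothetical Employees table):

```sql
SELECT
    DepartmentId,
    EmployeeId,
    Salary,
    ROW_NUMBER() OVER (PARTITION BY DepartmentId
                       ORDER BY Salary DESC) AS RankInDept,
    SUM(Salary)  OVER (PARTITION BY DepartmentId
                       ORDER BY Salary DESC
                       ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM Employees;
```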
Table Partitioning
Divide Large Tables
Like organizing files into folders by year: easier to manage and find what you need.
Divides large tables into smaller pieces based on column values. Improves query performance and manageability.
- Horizontal table splitting
- Based on partition function
- Improves query performance
- Easier maintenance
Transactions
BEGIN / COMMIT / ROLLBACK
Like bundling steps into a single purchase: either everything succeeds or nothing is charged.
Group multiple statements into a single unit of work. Commit saves changes; rollback undoes them.
- Ensures atomic changes
- Supports error recovery
- Explicit or implicit transactions
- Critical for multi-step updates
User-Defined Functions (UDFs)
Reusable SQL Logic
Like a helper function you call from multiple places in code.
Encapsulates logic in scalar or table-valued functions. Good for reuse, but can impact performance if misused.
- Scalar or table-valued
- Encourages reuse and consistency
- Inline TVFs often perform best
- Beware row-by-row execution
SQL Query Processing
Understanding how SQL processes queries demystifies unexpected results and performance issues. Queries execute in a specific logical order (FROM before WHERE, WHERE before SELECT), which explains why you can't reference column aliases in WHERE clauses. JOINs combine tables, subqueries nest logic, and set operations merge results. Mastering these fundamentals helps you write correct, efficient queries on the first try.
Logical Query Processing Order
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
Like cooking: gather ingredients (FROM), pick what you need (WHERE), group them (GROUP BY), filter groups (HAVING), plate (SELECT), arrange (ORDER BY).
The order SQL Server logically processes query clauses, regardless of how you write them. Understanding this is crucial for query writing.
- FROM clause processes first
- WHERE filters before grouping
- HAVING filters after grouping
- SELECT comes near the end
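This order is exactly why a SELECT alias can't be used in WHERE (hypothetical table):

```sql
-- Fails: WHERE is processed before SELECT, so the alias doesn't exist yet
-- SELECT Price * Qty AS LineTotal FROM OrderLines WHERE LineTotal > 100;

-- Works: repeat the expression (or wrap it in a CTE / derived table)
SELECT Price * Qty AS LineTotal
FROM OrderLines
WHERE Price * Qty > 100;
```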
Subqueries
Query Within Query
Like solving a math problem that requires solving a smaller problem first to get an answer you need.
Query nested inside another query. Can return scalar values, single column, or full result sets. Alternative to joins in some cases.
- Scalar, row, or table subqueries
- IN, EXISTS, ANY, ALL operators
- Can be correlated or uncorrelated
- Sometimes less efficient than joins
Set Operations
UNION / INTERSECT / EXCEPT
Like combining guest lists, finding overlap, or removing one list from another.
Combine or compare result sets. Columns must align by count and type, and ordering applies to the final result.
- UNION removes duplicates (UNION ALL keeps them)
- INTERSECT returns overlap
- EXCEPT returns rows in left not in right
- Ordering only at the end
CASE Expression
Conditional Logic in SQL
Like an if/else ladder in code that returns a value based on conditions.
Returns different values based on conditions. Helpful for bucketing, labeling, and conditional aggregates.
- Simple CASE or searched CASE
- Great for categorizing rows
- Works inside SELECT, ORDER BY, GROUP BY
- Deterministic and readable
JOINs
JOINs are the foundation of relational databases, allowing you to combine data from multiple tables into meaningful results. Whether you need matching records, all records from one side, or every possible combination, understanding join types and their behavior with NULLs is essential. Choosing the right join type and writing efficient join conditions directly impacts both query correctness and performance.
INNER JOIN
Matching Rows Only
Like inviting only people who RSVP'd yes from both the bride's and groom's lists. If someone didn't RSVP from either side, they're not invited.
Returns only rows where the join condition matches in both tables. Most common join type. Rows without matches are excluded.
- Returns only matching rows from both sides
- Most restrictive join type
- Default JOIN behavior (JOIN = INNER JOIN)
- Non-matching rows disappear from result
LEFT (OUTER) JOIN
All Left Rows + Matches
Like keeping all customers even if they haven't placed orders yet. Customers without orders show up with empty order details.
Returns all rows from the left table, plus matching rows from the right. When no match exists, right-side columns are NULL.
- Preserves all rows from left table
- Right-side columns are NULL for non-matches
- Great for "all X, even those without Y" queries
- Use WHERE right.id IS NULL to find non-matches
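The non-match pattern from the last bullet, sketched with hypothetical tables:

```sql
-- Customers with no orders: unmatched left rows carry NULL right-side keys
SELECT c.CustomerId, c.Name
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.OrderId IS NULL;
```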
RIGHT (OUTER) JOIN
All Right Rows + Matches
Like listing all products including those never ordered. Products without orders show NULL customer info.
Returns all rows from the right table, plus matching rows from the left. When no match exists, left-side columns are NULL.
- Preserves all rows from right table
- Left-side columns are NULL for non-matches
- Less common (swap tables and use LEFT instead)
- Mirror of LEFT JOIN with reversed order
FULL (OUTER) JOIN
All Rows from Both Tables
Like merging two contact lists: everyone appears once, with info from both sides when available and blanks when not.
Returns all rows from both tables. Matching rows are combined; non-matching rows appear with NULLs for the missing side.
- Preserves all rows from both tables
- Combines LEFT and RIGHT join behavior
- Useful for finding mismatches on both sides
- Less common than INNER or LEFT joins
CROSS JOIN
Cartesian Product
Like pairing every shirt with every pair of pants to see all outfit combinations. 5 shirts × 3 pants = 15 outfits.
Returns every possible combination of rows from both tables. Result has rows = left table rows × right table rows. No join condition needed.
- Every row paired with every other row
- No ON condition (it's a full combination)
- Result size = Table1.rows × Table2.rows
- Useful for generating test data or calendar tables
SELF JOIN
Table Joins to Itself
Like an employee table where each employee has a manager who's also in the same table. You match employees to their managers.
Joins a table to itself using table aliases. Common for hierarchical data like org charts, or comparing rows within the same table.
- Same table referenced twice with different aliases
- Essential for hierarchical relationships
- Compare rows within the same table
- Can use any join type (INNER, LEFT, etc.)
Join Conditions & Performance
Writing Efficient Joins
Like using indexed sections in a filing system instead of comparing every document to every other document.
Join conditions define how tables relate. Indexed join columns dramatically improve performance. Keep join logic in ON and filters in WHERE; for OUTER joins, moving a filter into ON changes the result.
- Always index foreign key columns
- ON clause for join logic, WHERE for filtering
- Multiple join conditions with AND are common
- Avoid functions on join columns (breaks SARGability)
NULL Behavior in Joins
NULLs Don't Match
Like two blank forms: you wouldn’t call them “matching” forms. NULL = NULL evaluates to unknown, not true.
NULL values never match other NULLs in join conditions (NULL = NULL evaluates to UNKNOWN, not TRUE). This surprises many developers. Use IS NULL explicitly if needed.
- NULL = NULL evaluates to UNKNOWN (not true) in joins
- Rows with NULL keys won't match each other
- LEFT/RIGHT/FULL joins preserve NULLs from source
- Check for IS NULL separately if matching NULLs needed
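A sketch with a hypothetical People table:

```sql
-- Rows where MiddleName is NULL on both sides still don't pair up:
-- NULL = NULL evaluates to UNKNOWN, so the ON predicate isn't satisfied
SELECT a.PersonId, b.PersonId
FROM People a
JOIN People b ON a.MiddleName = b.MiddleName;

-- To treat NULLs as matching, test for them explicitly
SELECT a.PersonId, b.PersonId
FROM People a
JOIN People b
  ON a.MiddleName = b.MiddleName
  OR (a.MiddleName IS NULL AND b.MiddleName IS NULL);
```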