Database & SQL Quick Reference

Databases power every modern application, from banking systems to social networks. Understanding database fundamentals is essential for building scalable, reliable systems that protect data integrity and deliver exceptional performance. This comprehensive reference covers the core concepts every engineer and technical leader needs to make informed architectural decisions.

ACID Properties

ACID defines the gold standard for database transactions. These four guarantees (Atomicity, Consistency, Isolation, and Durability) ensure your data remains reliable even when systems crash, networks fail, or multiple users compete for the same records. Think of ACID as the safety net that prevents partial updates, corrupted data, and lost changes.

Atomicity

All or Nothing

Like an online checkout: you either get a confirmed order (payment + inventory reserved), or you get nothing. No “half-order.”

A transaction is treated as a single unit: all steps succeed and commit together, or any failure rolls everything back.

  • All steps succeed or everything rolls back
  • Prevents “charged but no order” scenarios
  • Failures leave the database unchanged
  • Makes multi-step operations safe
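
The checkout analogy above can be sketched in T-SQL (table and column names are illustrative, not from the text): both steps commit together or, on any error, roll back together.

```sql
-- Hypothetical checkout: charge payment and reserve inventory atomically.
BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO Payments (OrderId, Amount) VALUES (42, 19.99);
    UPDATE Inventory SET QuantityOnHand = QuantityOnHand - 1 WHERE ProductId = 7;

    COMMIT TRANSACTION;  -- both changes become visible together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;  -- any failure undoes both steps
    THROW;  -- re-raise the error to the caller
END CATCH;
```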

Consistency

Rules Always Apply

Like ticketing: you can’t sell the same seat twice, and you can’t sell a seat that doesn’t exist. The rules prevent “impossible” outcomes.

A transaction must move the database from one valid state to another valid state, obeying constraints like keys, checks, and relationships.

  • Constraints (PK/FK/UNIQUE/CHECK) are always enforced
  • Rejects invalid data instead of saving it
  • Invariants hold before and after the transaction
  • Prevents broken relationships (no orphan rows)

Isolation

Transactions Don't Interfere

Like two people trying to buy the last concert ticket at the same time: each checkout session should behave predictably, not see the other person’s in-progress changes.

Concurrent transactions behave as if they ran separately (depending on isolation level). You don’t read someone else’s uncommitted work.

  • Uncommitted changes aren’t visible to others
  • Prevents anomalies (dirty / non-repeatable / phantom reads)
  • Isolation levels trade consistency vs concurrency
  • Higher isolation can mean more blocking
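
A minimal sketch of choosing an isolation level per session in T-SQL (the Tickets table is hypothetical):

```sql
-- Session-level isolation choice (SQL Server syntax).
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    -- Rows read here can't be changed by other transactions until we commit,
    -- so repeating the read returns the same values.
    SELECT Quantity FROM Tickets WHERE EventId = 1;
    SELECT Quantity FROM Tickets WHERE EventId = 1;  -- same result as above
COMMIT TRANSACTION;
```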

Durability

Changes are Permanent

Like getting “Order confirmed” on your phone: even if the app crashes or the server restarts, the order is still there when you come back.

Once a transaction commits, the database won’t lose it. Changes are persisted (typically via transaction logs) and can be recovered after failures.

  • A committed transaction won’t be “forgotten”
  • Transaction logs enable crash recovery
  • Replicas/backups improve durability further
  • Power loss shouldn’t corrupt committed data

Normalization

Normalization is the art of organizing database tables to eliminate redundancy and prevent data anomalies. By systematically breaking down tables into smaller, focused structures, you ensure each piece of information lives in exactly one place. This reduces storage costs, simplifies updates, and prevents inconsistencies, though sometimes you'll intentionally denormalize for performance gains.

First Normal Form (1NF)

Atomic Values Only

Like an order form: each blank expects one answer. You don’t write “red, blue, green” in the “Color” box; you add separate rows/items.

Store one value per column (no lists/arrays in a cell), avoid repeating column groups, and ensure rows are uniquely identifiable.

  • No comma-separated lists in a column
  • No repeating columns like Phone1/Phone2/Phone3
  • Each row has a primary key (unique identifier)
  • One row = one “thing” being described
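
A quick 1NF sketch (hypothetical schema): the multi-valued column becomes one row per value with its own key.

```sql
-- Violates 1NF: one cell holds a list, e.g. Colors = 'red, blue, green'.

-- 1NF version: one value per column, one row per (product, color) pair.
CREATE TABLE ProductColors (
    ProductId INT         NOT NULL,
    Color     VARCHAR(30) NOT NULL,
    PRIMARY KEY (ProductId, Color)  -- each pair appears exactly once
);
```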

Second Normal Form (2NF)

No Partial Dependencies

Like a receipt line item: (OrderId, ProductId) identifies the line. ProductName shouldn’t be stored there; it depends only on ProductId, not the whole pair.

If a table uses a composite key, every non-key column must depend on the full key. If a column depends on only part of the key, move it to a table where it fits.

  • Requires 1NF
  • Relevant mainly when you have a composite primary key
  • Eliminates columns that depend on only part of the key
  • Typically splits “lookup” data into its own table
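
The receipt example above as a hedged T-SQL sketch (names are illustrative): ProductName moves to Products because it depends on only part of the composite key.

```sql
-- 2NF: ProductName depends only on ProductId, so it lives in Products.
CREATE TABLE Products (
    ProductId   INT PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL
);

CREATE TABLE OrderItems (
    OrderId   INT NOT NULL,
    ProductId INT NOT NULL REFERENCES Products(ProductId),
    Quantity  INT NOT NULL,
    PRIMARY KEY (OrderId, ProductId)  -- every remaining column depends on the full key
);
```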

Third Normal Form (3NF)

No Transitive Dependencies

Like shipping addresses: if you store ZipCode and also store City/State in the same row, City/State really depend on ZipCode. Put Zip → City/State in its own table.

Non-key columns should depend only on the primary key, not on other non-key columns. Move “derived/lookup” facts to separate tables.

  • Requires 2NF
  • Stops non-key → non-key dependencies
  • Avoids update anomalies (change it once, not everywhere)
  • Reduces duplicated descriptive data
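
The Zip → City/State example above, sketched in T-SQL (hypothetical tables): the transitive fact gets its own table.

```sql
-- 3NF: City/State depend on ZipCode, not on the customer, so they move out.
CREATE TABLE ZipCodes (
    ZipCode VARCHAR(10) PRIMARY KEY,
    City    VARCHAR(60) NOT NULL,
    State   CHAR(2)     NOT NULL
);

CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    Name       VARCHAR(100) NOT NULL,
    ZipCode    VARCHAR(10) REFERENCES ZipCodes(ZipCode)  -- city/state stored once
);
```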

Boyce-Codd Normal Form (BCNF)

Every Determinant Is a Candidate Key

Like a keycard system: any attribute that can identify others must itself be a keycard.

Stronger than 3NF. For every functional dependency X → Y, X must be a candidate key. Prevents edge-case anomalies.

  • Requires 3NF first
  • Eliminates dependencies on non-key determinants
  • Useful for overlapping candidate keys
  • May require extra tables

Normalization Goals

Why Normalize?

Like keeping one contact card per person: you don’t want the same phone number copied into 20 places where it can get out of sync.

Normalize to reduce duplication and prevent “anomaly” bugs: insert problems, update problems, and delete problems caused by storing the same fact in multiple places.

  • Reduce duplicated facts (single source of truth)
  • Prevent update anomalies (change once)
  • Prevent insert/delete anomalies (no forced fake data)
  • Improve data integrity and maintainability

Denormalization

Intentional Redundancy

Like caching a summary page so you don’t re-calculate everything for every request.

Intentionally duplicate data to speed reads or simplify queries. Great for reporting and analytics when writes are controlled.

  • Improves read performance
  • Increases write complexity
  • Requires sync strategy (triggers, jobs, ETL)
  • Best for read-heavy workloads

Indexes & Performance

Indexes are the secret weapon for database performance. Like a book's index helping you find topics instantly, database indexes accelerate queries by orders of magnitude. Understanding clustered vs. non-clustered indexes, covering indexes, and index strategies turns slow table scans into lightning-fast seeks, but they come with storage overhead and write penalties you need to balance.

Clustered Index

Physical Row Order

Like a book where pages are physically bound in a single order. You can’t have the same book “physically sorted” two different ways at once.

The table’s rows are stored on disk in the clustered index order. Great for range lookups, but you only get one because data can only be physically ordered one way.

  • Defines the table’s physical row order
  • Only one clustered index per table
  • Excellent for range queries (date ranges, ID ranges)
  • Often (but not always) the primary key
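
A minimal sketch (Orders table is hypothetical): clustering on the date column lets range queries read a contiguous slice of pages.

```sql
-- One clustered index per table; rows are physically ordered by OrderDate.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
    ON Orders (OrderDate);

-- A date range now maps to a contiguous range of index pages.
SELECT OrderId, Total
FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2024-02-01';
```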

Non-Clustered Index

Separate Index Structure

Like the index at the back of a textbook: for each keyword, it lists page numbers where the topic appears. You can have many indexes (by topic, by name, by acronym).

A separate structure that maps key values to where the rows live. You can build many of these for different query patterns.

  • Multiple per table are common
  • Stores index keys + a pointer to the row (or clustered key)
  • Speeds up selective lookups and joins
  • Costs extra storage and write overhead

Covered Index

All Columns in Index

Like a book index entry that includes a short definition/summary right there. You can answer the question without turning to the referenced page.

When the index contains every column needed by the query (keys + included columns), SQL Server can answer it from the index alone.

  • Avoids Key Lookup / Bookmark Lookup back to the base table
  • Big win when you’d otherwise do many lookups (N lookups = lots of random reads)
  • Built using key columns + INCLUDE columns (SQL Server)
  • Tradeoff: bigger index, more memory, and slower writes/updates
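
A hedged T-SQL sketch of the key + INCLUDE pattern (table and index names are illustrative):

```sql
-- Key column drives the seek; INCLUDE columns let the index answer the
-- whole query, avoiding a Key Lookup back to the base table.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
    ON Orders (CustomerId)
    INCLUDE (OrderDate, Total);

SELECT OrderDate, Total
FROM Orders
WHERE CustomerId = 42;  -- fully covered: seek, no lookup
```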

Filtered Index

Partial Index with WHERE

Like an index that only lists entries for one chapter (e.g., “Chapter 7: Concurrency”) instead of indexing the entire book. Smaller and faster for that chapter’s questions.

Indexes only the rows that match a filter (WHERE clause). This makes the index smaller and faster for queries that use the same filter.

  • Indexes only rows that meet a condition (e.g., IsActive = 1)
  • Smaller index = fewer pages to read
  • Great for highly skewed/sparse data
  • Only helps when queries match the filter predicate
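
A small sketch of a filtered index (hypothetical Users table); note the query must repeat the filter predicate to qualify:

```sql
-- Index only the active rows: smaller, cheaper to maintain.
CREATE NONCLUSTERED INDEX IX_Users_Email_Active
    ON Users (Email)
    WHERE IsActive = 1;

-- Matches the filter, so the smaller index can be used.
SELECT Email FROM Users WHERE IsActive = 1 AND Email = 'a@example.com';
```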

Composite Index

Multi-Column Index

Like a phone book sorted by Last Name, then First Name; order matters.

Index built on multiple columns. Supports queries that filter or sort using the left-most columns first.

  • Column order is critical
  • Supports left-most prefix searches
  • Helps multi-column filters and ORDER BY
  • Not helpful if leading column isn’t used
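
The left-most-prefix rule in a short sketch (People table is hypothetical):

```sql
-- Sorted by LastName, then FirstName within each LastName.
CREATE NONCLUSTERED INDEX IX_People_LastFirst
    ON People (LastName, FirstName);

SELECT * FROM People WHERE LastName = 'Smith';                      -- seek (prefix used)
SELECT * FROM People WHERE LastName = 'Smith' AND FirstName = 'Al'; -- seek (full key)
SELECT * FROM People WHERE FirstName = 'Al';                        -- scan (leading column skipped)
```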

Index Scan

Reading All Index Pages

Like flipping through the book page-by-page until you find what you need. It’s fine if you plan to read most of the book anyway.

The engine reads through many (or all) index entries. This can be the right choice when a query returns a large portion of the table.

  • Reads lots of pages (sometimes all of them)
  • Often chosen when results are large or filters aren’t selective
  • Plan operator you’ll see: Index Scan (or Table Scan if no useful index)
  • Not always “bad”; can be efficient for big reads
  • Cost grows roughly with table/index size

Index Seek

Targeted Index Lookup

Like using the book’s index to jump directly to the right page numbers instead of scanning every page.

The engine navigates the index tree to jump to the matching key range, then reads just the matching entries.

  • Best for selective filters (small result sets)
  • Uses the index tree to jump to the right place
  • Plan operator you’ll see: Index Seek
  • Usually much less I/O than a scan
  • If you need extra columns, this can pair with Key Lookup (lots of lookups can hurt)
  • Cost grows with the number of matching rows

Key Lookup

Fetch Missing Columns

Like finding a book in the index, then walking to the shelf to read the full page.

Happens when a nonclustered index finds rows but the query needs extra columns not in the index. The engine fetches the base row.

  • One lookup per matching row
  • Can cause many random reads
  • Fix with a covered index (INCLUDE columns)
  • Sometimes acceptable for small result sets

Isolation Levels

Isolation levels control how transactions see each other's changes. This is the critical tradeoff between data consistency and system concurrency. Higher isolation prevents anomalies like dirty reads and phantom rows but increases blocking. Lower isolation boosts throughput but risks reading uncommitted or inconsistent data. Choosing the right level depends on your application's tolerance for stale data vs. need for speed.

READ UNCOMMITTED

Dirty Reads Allowed

Like reading someone's draft email before they finish writing it – it might change.

Lowest isolation level. Statements can read rows modified by other transactions but not yet committed. Allows dirty reads.

  • No locks on reads
  • Allows dirty reads
  • Highest concurrency
  • Least data accuracy

READ COMMITTED

Default SQL Server Level

Like reading only published articles, not drafts – you see only finalized content.

Statements cannot read data modified but not committed by others. Prevents dirty reads but allows non-repeatable reads.

  • Default isolation level
  • No dirty reads
  • Shared locks released quickly
  • Non-repeatable reads possible

REPEATABLE READ

Locked Until Transaction Ends

Like bookmarking articles – the ones you saw stay unchanged, but new ones can appear.

Data read cannot be modified by other transactions until current transaction completes. Prevents non-repeatable reads, allows phantom reads.

  • Locks held until commit
  • Same data on re-read
  • Phantom reads possible
  • More blocking than READ COMMITTED

SNAPSHOT

Point-in-Time View

Like viewing a photo taken at the start – the world changes but your photo stays the same.

Reads data as it existed at the start of the transaction using row versioning in tempdb. No shared locks needed.

  • Uses row versioning
  • No read locks needed
  • Consistent point-in-time view
  • Tempdb overhead

SERIALIZABLE

Highest Isolation

Like locking a filing cabinet – no one can add, remove, or change files while you're reading.

Highest isolation level. Locks ranges of data to prevent phantom reads. Transactions execute as if they ran sequentially.

  • Range locks applied
  • No phantom reads
  • Maximum consistency
  • Lowest concurrency

READ COMMITTED SNAPSHOT (RCSI)

Row Versioning for Reads

Like reading a published edition of a book while writers update the next edition in the background.

SQL Server uses row versions for READ COMMITTED, removing reader/writer blocking while still avoiding dirty reads.

  • Statement-level snapshot view
  • Readers don’t block writers
  • Uses tempdb row versions
  • Non-repeatable reads still possible

Locks & Concurrency

Locks are the traffic lights of database concurrency. They coordinate access when multiple users modify the same data simultaneously, preventing conflicts and corruption. Shared locks allow multiple readers, exclusive locks grant sole write access, and update locks prevent deadlocks. Understanding lock behavior helps you diagnose blocking issues and design applications that scale under concurrent load.

Shared Locks

Multiple Readers Allowed

Like a library book: multiple people can read the same book, but no one can edit it while being read.

Used for read operations. Multiple transactions can hold shared locks on the same resource simultaneously.

  • Multiple readers allowed
  • Blocks exclusive locks
  • Used by SELECT statements
  • Released after read

Exclusive Locks

Single Writer Only

Like editing a document: only one person can edit, and no one else can read or write.

Used for data-modification operations (INSERT, UPDATE, DELETE). Ensures only one transaction modifies data at a time.

  • Blocks all other locks
  • Prevents reads and writes
  • Used by DML statements
  • Held until transaction ends

Update Locks

Prevents Deadlocks

Like reserving a seat before buying a ticket: signals intent to modify without blocking readers yet.

Used when reading data with intent to update. Prevents common deadlock scenarios between two concurrent updates.

  • Signals update intent
  • Converts to exclusive lock
  • Prevents deadlocks
  • Compatible with shared locks

Deadlock

Circular Wait

Like two cars at an intersection, each waiting for the other to move first – neither can proceed.

Occurs when two transactions are waiting for each other to release locks. SQL Server detects and resolves by killing one transaction.

  • Circular lock dependency
  • One transaction rolled back
  • Deadlock victim chosen
  • Minimize with proper indexing

NOLOCK Hint

Read Without Locking

Like peeking at someone's screen while they're typing – you might see incomplete or changing data.

Query hint that allows dirty reads. No locks are taken, preventing blocking but allowing inconsistent data.

  • No shared locks taken
  • Allows dirty reads
  • Improves concurrency
  • Risk of inconsistent data
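
A one-liner showing the hint in context (table name is illustrative); fine for rough dashboards, wrong for answers that must be exact:

```sql
-- Dirty read: may count rows from uncommitted or soon-to-be-rolled-back transactions.
SELECT COUNT(*) AS ApproxOrders
FROM Orders WITH (NOLOCK);
```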

Intent Locks

Locking Hierarchy Signals

Like putting a “reserved” sign on a table before placing individual plates.

Indicate a transaction intends to take locks at a lower level (row/page). Prevents incompatible higher-level locks.

  • IS, IX, SIX lock types
  • Supports lock hierarchy (table → page → row)
  • Helps avoid conflicts
  • Automatically managed by the engine

Lock Escalation

Row/Page → Table

Like replacing many small locks with one big padlock to simplify management.

SQL Server can convert many row/page locks into a table lock to reduce overhead, which can increase blocking.

  • Reduces lock memory overhead
  • Can increase contention
  • Triggered by lock thresholds
  • Visible in execution plans and DMVs

Data Integrity

Data integrity constraints are your database's immune system: they reject bad data before it can infect your system. Primary keys ensure every row is uniquely identifiable, foreign keys maintain referential integrity between tables, check constraints validate business rules, and unique constraints prevent duplicates. These guardrails catch errors at the database level, where they're easiest and cheapest to fix.

Entity Integrity

Primary Key Constraint

Like every person having a unique social security number – no duplicates, no blanks.

Every table must have a primary key, and the primary key columns must be unique and not null.

  • Primary key required
  • Must be unique
  • Cannot be null
  • Identifies each row uniquely

Referential Integrity

Foreign Key Constraint

Like ensuring every order references a real customer – no orphaned orders pointing to non-existent people.

Foreign key values must refer to existing primary key values in the referenced table, or be null.

  • Foreign keys must match
  • References existing primary key
  • Can be null if allowed
  • Prevents orphaned records

Domain Integrity

Valid Values Only

Like a form accepting only valid dates, not February 31st or years beyond 2100.

Ensures data values follow defined rules for valid values, ranges, and formats using constraints and rules.

  • Data type enforcement
  • Check constraints
  • Default values
  • Valid range and format

Unique Constraint

No Duplicate Values

Like usernames: two people can’t pick the same one.

Ensures all values in a column (or column set) are distinct. NULL handling varies: SQL Server allows a single NULL, while many other databases allow multiple.

  • Prevents duplicates
  • Creates a supporting index
  • Can cover multiple columns
  • Different from primary key (NULLs)

Check Constraint

Custom Validation Rules

Like a bouncer checking age before entry – only valid rows are allowed in.

Validates column values with a Boolean expression (e.g., Amount > 0, Status IN ('New','Closed')).

  • Rejects invalid data
  • Enforces business rules
  • Can reference multiple columns
  • Checked on INSERT and UPDATE

Default Constraint

Automatic Column Values

Like a form that auto-fills today’s date if you leave it blank.

Provides a default value when none is supplied for a column, ensuring consistent and predictable data.

  • Used on INSERT when value is omitted
  • Common for timestamps and flags
  • Can be constants or functions
  • Improves data consistency
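
The constraint types above can be combined in one table definition. A hedged T-SQL sketch (table and column names are hypothetical; assumes a Customers table exists):

```sql
CREATE TABLE Orders (
    OrderId    INT           PRIMARY KEY,                       -- entity integrity
    CustomerId INT NOT NULL  REFERENCES Customers(CustomerId),  -- referential integrity
    OrderNo    VARCHAR(20)   NOT NULL UNIQUE,                   -- no duplicate order numbers
    Amount     DECIMAL(10,2) NOT NULL CHECK (Amount > 0),       -- business rule (domain integrity)
    CreatedAt  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()  -- auto-filled when omitted
);
```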

Query Optimization

Query optimization turns slow queries into fast ones through intelligent execution planning. The database engine analyzes statistics, evaluates multiple strategies, and chooses the best path to retrieve your data. Execution plans reveal what's happening under the hood, while concepts like SARGability and join algorithms help you write queries the optimizer can accelerate. Small query changes can yield massive performance improvements.

Execution Plan

Query Roadmap

Like GPS showing the route taken: which indexes used, how tables joined, where time was spent.

Visual representation of steps SQL Server takes to execute a query. Essential for identifying performance bottlenecks.

  • Shows query strategy
  • Identifies bottlenecks
  • Index usage visible
  • Cost percentages shown

Statistics

Data Distribution Info

Like a census helping city planners: statistics help SQL Server plan the best query strategy.

Metadata about data distribution in columns. SQL Server uses statistics to estimate cardinality and choose optimal execution plans.

  • Tracks data distribution
  • Helps query optimizer
  • Auto-created on indexes
  • Should be kept updated

Cardinality

Uniqueness of Values

Like measuring diversity in a group: high cardinality is many unique values, low is mostly duplicates.

Refers to uniqueness of data values in a column. High cardinality (many unique values) makes better indexes.

  • Measures value uniqueness
  • High = good for indexes
  • Low = poor selectivity
  • Affects index effectiveness

Query Hints

Override Optimizer

Like telling GPS to avoid highways – you override its decision with your preference.

Instructions that override SQL Server's query optimizer decisions. Use sparingly when optimizer doesn't choose best plan.

  • NOLOCK, FORCESEEK, etc.
  • Forces specific behavior
  • Use with caution
  • Can improve or hurt performance

SARGable Queries

Search ARGument ABLE

Like searching for "Smith" vs "names containing 'mit'" – the first can use an index efficiently.

Queries written so SQL Server can use indexes effectively. Avoid functions on indexed columns in WHERE clauses.

  • Allows index usage
  • No functions on columns
  • Simple predicates preferred
  • Much faster queries
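
The classic rewrite, sketched on a hypothetical Orders table: wrapping the column in a function defeats the index, while a plain range predicate allows a seek.

```sql
-- Non-SARGable: YEAR() hides OrderDate from any index on it.
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;

-- SARGable rewrite: a bare column with a range comparison can seek.
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
```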

Join Algorithms

Nested Loop / Hash / Merge

Like choosing between hand-matching lists, hashing names into buckets, or merging two sorted lists.

SQL Server picks a join strategy based on row counts, indexes, and sort order to minimize cost.

  • Nested Loop: small input + index
  • Hash Join: large, unsorted inputs
  • Merge Join: sorted inputs
  • Driven by statistics and cardinality

Parameter Sniffing

Plan Based on First Input

Like cooking for the first guest size, then reusing that plan for every party size.

SQL Server compiles a plan using initial parameter values. Later executions with different values can perform poorly.

  • Plan cached from first execution
  • Skewed data makes this worse
  • Solutions: recompile, optimize for, hints
  • Not always a problem

Database Objects & Features

Database objects extend SQL beyond simple queries. Stored procedures encapsulate complex logic, views provide abstraction layers, triggers automate responses to data changes, and window functions enable sophisticated analytics. These tools help you build robust data layers that handle complex business logic efficiently while maintaining security and consistency across your application.

Stored Procedures

Precompiled SQL

Like a recipe saved in a cookbook: reusable, optimized, and everyone executes it the same way.

Precompiled collection of SQL statements stored on the server. Improves performance, security, and code reusability.

  • Precompiled execution plan
  • Reduces network traffic
  • Enforces security
  • Reusable code

Views

Virtual Tables

Like a window showing a specific view of your yard: it's not a separate yard, just a different perspective.

Virtual table based on a SELECT query. Simplifies complex queries, provides security by limiting column access.

  • Stored query definition
  • No data duplication
  • Simplifies complex joins
  • Provides security layer

Triggers

Automatic Actions

Like a smoke alarm: automatically triggers an action when a specific event occurs.

Stored procedures that automatically execute when INSERT, UPDATE, or DELETE events occur on a table.

  • Event-driven execution
  • Enforces business rules
  • Audit trail creation
  • Can affect performance

CTE (Common Table Expression)

Temporary Named Result

Like writing notes on a whiteboard before solving a math problem – temporary workspace.

Temporary named result set that exists within a single query. Makes complex queries more readable.

  • Improves query readability
  • Can be recursive
  • Scoped to single query
  • Usually performs like an equivalent subquery (SQL Server inlines non-recursive CTEs)
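
A recursive CTE sketch over a hypothetical self-referencing Employees table (ManagerId points back at EmployeeId):

```sql
WITH OrgChart AS (
    -- Anchor member: the top of the hierarchy.
    SELECT EmployeeId, Name, ManagerId, 0 AS Depth
    FROM Employees
    WHERE ManagerId IS NULL
    UNION ALL
    -- Recursive member: each pass adds the next level of reports.
    SELECT e.EmployeeId, e.Name, e.ManagerId, o.Depth + 1
    FROM Employees e
    JOIN OrgChart o ON e.ManagerId = o.EmployeeId
)
SELECT * FROM OrgChart;
```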

Window Functions

ROW_NUMBER, RANK, LAG

Like looking at your position in a race while also seeing everyone else's – contextual calculations.

Functions that perform calculations across a set of rows related to the current row. Includes ranking, aggregate, and analytic functions.

  • OVER clause defines window
  • Avoids self-joins
  • Running totals, rankings
  • Very powerful for analytics
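
A short sketch of the OVER clause (Orders table is illustrative): per-customer ranking and a running total, with no self-join.

```sql
SELECT
    CustomerId,
    OrderDate,
    Total,
    ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS OrderSeq,
    SUM(Total)   OVER (PARTITION BY CustomerId ORDER BY OrderDate) AS RunningTotal
FROM Orders;
```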

Table Partitioning

Divide Large Tables

Like organizing files into folders by year: easier to manage and find what you need.

Divides large tables into smaller pieces based on column values. Improves query performance and manageability.

  • Horizontal table splitting
  • Based on partition function
  • Improves query performance
  • Easier maintenance

Transactions

BEGIN / COMMIT / ROLLBACK

Like bundling steps into a single purchase: either everything succeeds or nothing is charged.

Group multiple statements into a single unit of work. Commit saves changes; rollback undoes them.

  • Ensures atomic changes
  • Supports error recovery
  • Explicit or implicit transactions
  • Critical for multi-step updates

User-Defined Functions (UDFs)

Reusable SQL Logic

Like a helper function you call from multiple places in code.

Encapsulates logic in scalar or table-valued functions. Good for reuse, but can impact performance if misused.

  • Scalar or table-valued
  • Encourages reuse and consistency
  • Inline TVFs often perform best
  • Beware row-by-row execution

SQL Query Processing

Understanding how SQL processes queries demystifies unexpected results and performance issues. Queries execute in a specific logical order (FROM before WHERE, WHERE before SELECT), which explains why you can't reference column aliases in WHERE clauses. JOINs combine tables, subqueries nest logic, and set operations merge results. Mastering these fundamentals helps you write correct, efficient queries on the first try.

Logical Query Processing Order

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

Like cooking: gather ingredients (FROM), pick what you need (WHERE), group them (GROUP BY), filter groups (HAVING), plate (SELECT), arrange (ORDER BY).

The order SQL Server logically processes query clauses, regardless of how you write them. Understanding this is crucial for query writing.

  • FROM clause processes first
  • WHERE filters before grouping
  • HAVING filters after grouping
  • SELECT comes near the end
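
This ordering explains the alias rule in practice (Orders table is hypothetical): WHERE runs before SELECT, so the alias doesn't exist yet when the filter is evaluated.

```sql
-- Fails: WHERE is processed before SELECT assigns the alias.
SELECT Total * 1.1 AS TotalWithTax
FROM Orders
WHERE TotalWithTax > 100;      -- error: invalid column name 'TotalWithTax'

-- Works: repeat the expression (or wrap it in a CTE/derived table).
SELECT Total * 1.1 AS TotalWithTax
FROM Orders
WHERE Total * 1.1 > 100;
```

ORDER BY, by contrast, runs after SELECT, so it can reference the alias.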

Subqueries

Query Within Query

Like solving a math problem that requires solving a smaller problem first to get an answer you need.

Query nested inside another query. Can return scalar values, single column, or full result sets. Alternative to joins in some cases.

  • Scalar, row, or table subqueries
  • IN, EXISTS, ANY, ALL operators
  • Can be correlated or uncorrelated
  • Sometimes less efficient than joins

Set Operations

UNION / INTERSECT / EXCEPT

Like combining guest lists, finding overlap, or removing one list from another.

Combine or compare result sets. Columns must align by count and type, and ordering applies to the final result.

  • UNION removes duplicates (UNION ALL keeps them)
  • INTERSECT returns overlap
  • EXCEPT returns rows in left not in right
  • Ordering only at the end

CASE Expression

Conditional Logic in SQL

Like an if/else ladder in code that returns a value based on conditions.

Returns different values based on conditions. Helpful for bucketing, labeling, and conditional aggregates.

  • Simple CASE or searched CASE
  • Great for categorizing rows
  • Works inside SELECT, ORDER BY, GROUP BY
  • Deterministic and readable

JOINs

JOINs are the foundation of relational databases, allowing you to combine data from multiple tables into meaningful results. Whether you need matching records, all records from one side, or every possible combination, understanding join types and their behavior with NULLs is essential. Choosing the right join type and writing efficient join conditions directly impacts both query correctness and performance.

INNER JOIN

Matching Rows Only

Like inviting only people who RSVP'd yes from both the bride's and groom's lists. If someone didn't RSVP from either side, they're not invited.

Returns only rows where the join condition matches in both tables. Most common join type. Rows without matches are excluded.

  • Returns only matching rows from both sides
  • Most restrictive join type
  • Default JOIN behavior (JOIN = INNER JOIN)
  • Non-matching rows disappear from result

LEFT (OUTER) JOIN

All Left Rows + Matches

Like keeping all customers even if they haven't placed orders yet. Customers without orders show up with empty order details.

Returns all rows from the left table, plus matching rows from the right. When no match exists, right-side columns are NULL.

  • Preserves all rows from left table
  • Right-side columns are NULL for non-matches
  • Great for "all X, even those without Y" queries
  • Use WHERE right.id IS NULL to find non-matches
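
The customers-without-orders pattern from the bullets, sketched with hypothetical tables:

```sql
-- All customers, with order info where it exists (NULLs where it doesn't).
SELECT c.CustomerId, c.Name, o.OrderId
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId;

-- Anti-join: only customers who have never placed an order.
SELECT c.CustomerId, c.Name
FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.CustomerId
WHERE o.OrderId IS NULL;
```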

RIGHT (OUTER) JOIN

All Right Rows + Matches

Like listing all products including those never ordered. Products without orders show NULL customer info.

Returns all rows from the right table, plus matching rows from the left. When no match exists, left-side columns are NULL.

  • Preserves all rows from right table
  • Left-side columns are NULL for non-matches
  • Less common (swap tables and use LEFT instead)
  • Mirror of LEFT JOIN with reversed order

FULL (OUTER) JOIN

All Rows from Both Tables

Like merging two contact lists: everyone appears once, with info from both sides when available and blanks when not.

Returns all rows from both tables. Matching rows are combined; non-matching rows appear with NULLs for the missing side.

  • Preserves all rows from both tables
  • Combines LEFT and RIGHT join behavior
  • Useful for finding mismatches on both sides
  • Less common than INNER or LEFT joins

CROSS JOIN

Cartesian Product

Like pairing every shirt with every pair of pants to see all outfit combinations. 5 shirts × 3 pants = 15 outfits.

Returns every possible combination of rows from both tables. Result has rows = left table rows × right table rows. No join condition needed.

  • Every row paired with every other row
  • No ON condition (it's a full combination)
  • Result size = Table1.rows × Table2.rows
  • Useful for generating test data or calendar tables

SELF JOIN

Table Joins to Itself

Like an employee table where each employee has a manager who's also in the same table. You match employees to their managers.

Joins a table to itself using table aliases. Common for hierarchical data like org charts, or comparing rows within the same table.

  • Same table referenced twice with different aliases
  • Essential for hierarchical relationships
  • Compare rows within the same table
  • Can use any join type (INNER, LEFT, etc.)

Join Conditions & Performance

Writing Efficient Joins

Like using indexed sections in a filing system instead of comparing every document to every other document.

Join conditions define how tables relate. Indexed join columns dramatically improve performance. Filter predicates belong in WHERE, not ON (for INNER joins).

  • Always index foreign key columns
  • ON clause for join logic, WHERE for filtering
  • Multiple join conditions with AND are common
  • Avoid functions on join columns (breaks SARGability)

NULL Behavior in Joins

NULLs Don't Match

Like blank forms: two blank forms aren't "matching" forms. NULL = NULL evaluates to unknown, not true.

NULL values never match other NULLs in join conditions (NULL = NULL evaluates to unknown, not true). This surprises many developers. Use IS NULL explicitly if needed.

  • NULL = NULL evaluates to unknown (not true) in joins
  • Rows with NULL keys won't match each other
  • LEFT/RIGHT/FULL joins preserve NULLs from source
  • Check for IS NULL separately if matching NULLs needed
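
A small sketch of both behaviors (TableA/TableB and the Code column are hypothetical):

```sql
-- Rows whose join keys are NULL never pair up: NULL = NULL is unknown.
SELECT a.Id AS AId, b.Id AS BId
FROM TableA a
JOIN TableB b ON a.Code = b.Code;   -- silently drops rows where Code is NULL

-- To treat NULLs as matching, test them explicitly.
SELECT a.Id AS AId, b.Id AS BId
FROM TableA a
JOIN TableB b
  ON a.Code = b.Code
  OR (a.Code IS NULL AND b.Code IS NULL);
```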