Chapter 1

Big O Notation & Complexity Analysis

Master the mathematical foundation for analyzing algorithm efficiency. Learn to evaluate code performance, calculate time and space complexity, and make data structure decisions based on rigorous analysis rather than intuition.

1.1

What is Algorithm Analysis?

Algorithm analysis is the process of evaluating an algorithm's resource consumption—primarily time and memory—as a function of input size. This mathematical framework allows us to predict performance, compare solutions objectively, and design scalable systems.

Why We Analyze Algorithms

Before computers became ubiquitous, algorithm analysis was an academic exercise. Today, it's essential. Consider a sorting algorithm that works perfectly for 1,000 items but takes hours for 100,000 items. A web service that handles 100 concurrent users gracefully might crash at 10,000. Without analysis, we build systems that fail catastrophically under real-world load.

Rigorous algorithm analysis helps us:

1. Predict scaling behavior: know how performance degrades as input grows from thousands to millions of items
2. Compare solutions objectively: choose between algorithms based on mathematical proof, not gut feeling or single benchmarks
3. Identify bottlenecks: spot the real performance limits before they become production issues
4. Design efficient systems: make architectural decisions that remain valid as your system grows

The RAM Model of Computation

Algorithm analysis uses the Random Access Machine (RAM) model—an abstract computational model that approximates real hardware behavior. In this model:

RAM Model Assumptions
  • Each primitive operation (arithmetic, assignment, comparison) takes exactly 1 unit of time
  • Loops and functions are not primitive—we count their internal operations
  • Memory access takes 1 unit of time (ignoring cache effects)
  • Infinite memory is available
Why These Simplifications?
Real hardware is complex: CPUs have caches, branch prediction, parallelization. Analyzing actual hardware costs is nearly impossible. The RAM model ignores low-level details to focus on algorithmic scalability—how the cost grows with input size. This captures the essence of algorithmic efficiency.

The critical insight: we count operations, not wall-clock time. An algorithm that performs 5n² comparisons actually does less work than one that performs 100n comparisons while n is below 20, but the 100n algorithm wins for every larger n, and the gap widens forever. This is the power of asymptotic analysis.

ℹ️
Asymptotic analysis focuses on growth rates, not absolute times. We care about how the cost function behaves as n approaches infinity, not whether code runs in 5ms or 50ms on your laptop.
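The 5n² versus 100n comparison above can be checked directly. A minimal sketch (the class and method names here are illustrative, not from the text): tabulate the operation counts of the two hypothetical algorithms and find where the linear one starts winning.

C#
```csharp
using System;

public static class GrowthDemo
{
    // Hypothetical operation counts: a linear algorithm with a large
    // constant vs a quadratic algorithm with a small constant.
    public static long LinearOps(long n)    => 100 * n;
    public static long QuadraticOps(long n) => 5 * n * n;

    // First input size at which the quadratic count exceeds the linear count.
    public static long Crossover()
    {
        for (long n = 1; ; n++)
            if (QuadraticOps(n) > LinearOps(n))
                return n;
    }
}
```

Up to n = 20 the quadratic algorithm does no more work (5·20² = 100·20 = 2,000), but from n = 21 onward the linear algorithm wins for every input, which is exactly what asymptotic analysis predicts.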
1.2

Big O Notation (O)

Big O notation describes the upper bound of an algorithm's growth rate. It tells us the worst-case scenario: what's the maximum number of operations we might need as input size increases?

Formal Definition

In mathematical terms, f(n) = O(g(n)) if there exist positive constants c and n₀ such that:

f(n) ≤ c·g(n) for all n ≥ n₀

What does this mean in plain English? If an algorithm performs 3n² + 5n + 12 operations, we say it's O(n²) because the n² term dominates for large n: choosing c = 4 and n₀ = 7, we have 3n² + 5n + 12 ≤ 4n² for all n ≥ 7, satisfying the definition.

Upper Bound Interpretation

Think of Big O as a pessimistic guarantee. When we say an algorithm is O(n), we're promising it will never need more than cn operations for any input of size n (for some constant c and sufficiently large n). This is safe to rely on—we know the algorithm won't surprise us with exponential behavior.

⚠️
Common misconception: O(n) means "takes exactly n operations." Wrong. It means "takes at most cn operations for some constant c." An algorithm performing n/2 operations is still O(n). An algorithm performing 1000n operations is also O(n). We ignore constants.

Example: Linear Search

C#
public static int LinearSearch(int[] arr, int target)
{
    for (int i = 0; i < arr.Length; i++)
    {
        if (arr[i] == target)
            return i;
    }
    return -1;
}

In the worst case, the loop executes n times (checking every element). Even with the comparison and potential early return, the dominant operation count is n. We say this algorithm is O(n).

1.3

Big Omega (Ω) & Big Theta (Θ)

While Big O describes upper bounds, Big Omega describes lower bounds, and Big Theta describes tight bounds. Together, they provide complete information about algorithm behavior.

Big Omega (Ω) — Lower Bound

f(n) = Ω(g(n)) if there exist positive constants c and n₀ such that f(n) ≥ c·g(n) for all n ≥ n₀.

Big Omega tells us the minimum work an algorithm must do. For example, any algorithm that must examine every element of an array has an Ω(n) lower bound. Linear search illustrates the distinction: in its best case it finds the target immediately, but in the worst case it must check every element, so its worst-case running time is Ω(n), and in fact Θ(n).

Big Theta (Θ) — Tight Bound

f(n) = Θ(g(n)) if f(n) = O(g(n)) AND f(n) = Ω(g(n)). In other words, the function grows at exactly the same rate as g(n), ignoring constants.

Big O (upper bound): f(n) ≤ c·g(n). "We promise it won't exceed this."
Big Omega (lower bound): f(n) ≥ c·g(n). "It never does less work than this."
Big Theta (tight bound): c₁·g(n) ≤ f(n) ≤ c₂·g(n). "The actual growth rate."

When someone says "this algorithm is O(n log n)," they often mean Θ(n log n)—the algorithm's growth rate is precisely n log n (ignoring constants), not some unspecified upper bound. In interviews and technical communication, we usually use Big O loosely, but understanding the distinction matters for precision.

For most practical purposes, use Big O. It's the standard in industry. But if you're analyzing an algorithm and you know both the upper and lower bounds are the same, mentioning Big Theta shows deep understanding.
1.4

Common Complexity Classes

A handful of complexity classes cover 99% of real algorithms. Understanding each one—how to recognize them, what they mean, and which algorithms fall into each class—is fundamental.

Notation | Name | Rank | Growth | Typical Use Case
O(1) | Constant | Excellent | No growth: always the same cost | Hash table lookup, array access
O(log n) | Logarithmic | Excellent | Grows very slowly (≈20 ops for 1M items) | Binary search, balanced tree operations
O(n) | Linear | Good | Proportional to input | Linear search, iteration
O(n log n) | Linearithmic | Good | Only slightly worse than linear | Efficient sorting (merge, quick)
O(n²) | Quadratic | Fair | Quadruples when input doubles | Nested loops, simple sorting
O(2ⁿ) | Exponential | Poor | Doubles when input grows by 1 | Brute-force recursion, subsets
O(n!) | Factorial | Terrible | Explodes for even modest n | Permutation generation

O(1) — Constant Time

An O(1) algorithm performs a fixed amount of work regardless of input size. This is the holy grail of performance.

C#
// Dictionary/hashtable lookup: O(1) average case
var value = myDictionary[key];

// Array access by index: always O(1)
int third = arr[2];

// Arithmetic operations: O(1)
int sum = a + b;
double average = sum / 2.0;  // divide by 2.0 to avoid integer division
ℹ️
Hash table lookup is O(1) on average. Worst case (all hash collisions) is O(n), but this is rare with good hash functions. We discuss this with best/worst/average cases later.

O(log n) — Logarithmic Time

The operation count grows logarithmically with input size. Classic example: binary search eliminates half the remaining items each iteration.

C#
// Binary search on sorted array
public static int BinarySearch(int[] arr, int target)
{
    int left = 0, right = arr.Length - 1;
    while (left <= right)
    {
        int mid = left + (right - left) / 2;
        if (arr[mid] == target) return mid;
        if (arr[mid] < target) left = mid + 1;
        else right = mid - 1;
    }
    return -1;
}

Why O(log n)? The search space halves each iteration. For 1 million items, binary search needs at most log₂(1,000,000) ≈ 20 comparisons. For 1 billion items, still only about 30 comparisons. This is incredibly efficient.
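The halving argument is easy to verify by counting. A small sketch (the class and method names are illustrative): count how many times n can be halved before the search space collapses to a single element.

C#
```csharp
using System;

public static class HalvingDemo
{
    // Number of times n can be halved before reaching 1,
    // i.e. floor(log2 n): this bounds binary search's iteration count.
    public static int Halvings(long n)
    {
        int steps = 0;
        while (n > 1) { n /= 2; steps++; }
        return steps;
    }
}
```

Halvings(1_000_000) returns 19 and Halvings(1_000_000_000) returns 29, matching the roughly 20 and 30 comparisons quoted above.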

O(n) — Linear Time

Work is proportional to input size. We touch every element roughly once.

C#
// Find maximum: O(n)
public static int FindMax(int[] arr)
{
    int max = arr[0];
    for (int i = 1; i < arr.Length; i++)
    {
        if (arr[i] > max)
            max = arr[i];
    }
    return max;
}

Linear is excellent for most applications. You can process millions of items in seconds. O(n) algorithms scale directly with data: double the input, double the time. Predictable and manageable.

O(n log n) — Linearithmic Time

Slightly worse than linear, but vastly better than quadratic. This is the complexity of efficient sorting algorithms like merge sort, quicksort, and heap sort.

C#
// Merge sort: O(n log n) time
public static void MergeSort(int[] arr)
{
    if (arr.Length <= 1) return;

    // Divide: recursion depth is O(log n)
    int mid = arr.Length / 2;
    int[] left = arr[..mid];
    int[] right = arr[mid..];
    MergeSort(left);
    MergeSort(right);

    // Conquer: O(n) merge work per level
    int i = 0, j = 0, k = 0;
    while (i < left.Length && j < right.Length)
        arr[k++] = left[i] <= right[j] ? left[i++] : right[j++];
    while (i < left.Length) arr[k++] = left[i++];
    while (j < right.Length) arr[k++] = right[j++];
}

For 1 million items, O(n log n) requires about 20 million operations. O(n²) would require a trillion operations—vastly slower. This is why choosing the right sorting algorithm matters.

O(n²) — Quadratic Time

Typical of algorithms with nested loops over the input.

C#
// Bubble sort: O(n²)
public static void BubbleSort(int[] arr)
{
    for (int i = 0; i < arr.Length - 1; i++)
    {
        for (int j = 0; j < arr.Length - i - 1; j++)
        {
            if (arr[j] > arr[j + 1])
                Swap(ref arr[j], ref arr[j + 1]);
        }
    }
}

Quadratic is acceptable for small inputs (< 10,000 items) but becomes painful at scale. For 10,000 items, 100 million operations. For 100,000 items, 10 billion operations. Avoid for large datasets.

O(2ⁿ) — Exponential Time

The work doubles for each unit increase in input size. Exponential algorithms are impractical for n > 30 or so.

C#
// Naive Fibonacci: O(2ⁿ)
public static long Fibonacci(int n)
{
    if (n <= 1) return n;
    return Fibonacci(n - 1) + Fibonacci(n - 2);
    // For each recursive call, we make 2 more calls!
}

Computing Fibonacci(40) requires hundreds of millions of function calls. Computing Fibonacci(50) requires tens of billions. This is why we use dynamic programming—to reduce the complexity from O(2ⁿ) to O(n).
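The explosion is measurable on small inputs. A sketch that instruments the naive recursion with a call counter (the counter field and class name are illustrative):

C#
```csharp
using System;

public static class FibCallCounter
{
    public static long Calls;   // incremented on every invocation

    public static long Fib(int n)
    {
        Calls++;
        if (n <= 1) return n;
        return Fib(n - 1) + Fib(n - 2);
    }
}
```

Fib(20) already triggers 21,891 calls, and the count multiplies by roughly the golden ratio (≈1.618) for each increment of n, which is how small inputs balloon into astronomical call counts.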

O(n!) — Factorial Time

The worst of the worst. Factorial grows explosively: 10! = 3.6 million, 15! = 1.3 trillion.

C#
// Generating all permutations: O(n!)
public static void GeneratePermutations(int[] arr, int start)
{
    if (start == arr.Length - 1)
    {
        // Process permutation
        return;
    }

    for (int i = start; i < arr.Length; i++)
    {
        Swap(ref arr[start], ref arr[i]);
        GeneratePermutations(arr, start + 1);
        Swap(ref arr[start], ref arr[i]);
    }
}

Only use these algorithms for tiny inputs (n < 10). For anything larger, you need a different approach—usually dynamic programming or accepting an approximate solution.

1.5

Space Complexity

Time isn't the only resource that matters. Space (memory) is equally important. An algorithm using gigabytes of RAM is impractical, regardless of speed. We analyze space complexity using the same Big O framework.

Auxiliary Space vs Total Space

There are two ways to think about space:

Auxiliary Space (Often Reported)
  • Memory used by the algorithm, excluding input and output
  • What we typically report as "space complexity"
  • More relevant for comparing algorithm variants

Total Space
  • Memory for input + algorithm + output
  • Important for absolute resource planning
  • Less useful for algorithm comparison

Examples: Time vs Space Tradeoff

Linear Search — O(n) time, O(1) space

C#
public static int LinearSearch(int[] arr, int target)
{
    // Only uses: index variable i
    // Space complexity: O(1)
    for (int i = 0; i < arr.Length; i++)
        if (arr[i] == target)
            return i;
    return -1;
}

Hash Table Lookup — O(1) time, O(n) space

C#
// Preprocess: build a dictionary
var lookup = new Dictionary<int, bool>();
foreach (var num in arr)
    lookup[num] = true;

// Space complexity: O(n) for the dictionary
// Time complexity for lookup: O(1)
bool found = lookup.ContainsKey(target);

This is a classic tradeoff: spend more space upfront to gain speed later. Hash tables, indexes, and caches all follow this pattern.

Merge Sort Space Complexity

Merge sort is O(n log n) time but O(n) space—it needs temporary arrays for merging. Quick sort, by contrast, averages O(n log n) time with only O(log n) auxiliary space (just the recursion stack), making it preferable when memory is tight.

Always consider space complexity. In-place algorithms (O(1) auxiliary space) are valuable when memory is tight. When memory is plentiful, trading space for time—for example, storing precomputed results—often makes sense.
1.6

Rules for Calculating Big O

Analyzing complexity from code requires understanding several rules. These rules let you determine Big O from first principles rather than memorizing tables.

Rule 1: Drop Constants

Constants don't affect asymptotic growth. O(2n) = O(n), O(5n²) = O(n²), O(100) = O(1).

C#
// Two sequential loops: about 2n operations
for (int i = 0; i < n; i++) ProcessA(i);
for (int i = 0; i < n; i++) ProcessB(i);

// But 2n is still O(n), not O(2n)
// The constant is dropped

Rule 2: Drop Lower-Order Terms

When you have multiple terms, only the largest one matters asymptotically. O(n² + n) = O(n²), O(n + log n) = O(n).

C#
// O(n²) nested loop dominates
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        Process(i, j);  // O(n²)

// This O(n) loop is negligible
for (int i = 0; i < n; i++)
    Cleanup(i);  // O(n)

// Total: O(n² + n) = O(n²)

Rule 3: Different Variables for Different Inputs

If a function operates on two independent inputs, don't combine them into a single variable.

C#
// Process two arrays independently
public void ProcessArrays(int[] a, int[] b)
{
    // This is O(a.Length), not O(n)
    foreach (int x in a)
        Process(x);

    // This is O(b.Length), not O(n)
    foreach (int y in b)
        Process(y);

    // Total: O(a.Length + b.Length), or O(m + n)
}

This is crucial. A function that processes m elements from one array and n from another is O(m + n), not O(n). Merging two sorted arrays is O(m + n), not O(n²).
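As a concrete O(m + n) example, here is a sketch of merging two sorted arrays (the class and method names are illustrative): every element of each input is visited exactly once, so the work is proportional to the combined length.

C#
```csharp
using System;

public static class MergeDemo
{
    // Merge two sorted arrays into one sorted array: O(m + n) time and space.
    public static int[] Merge(int[] a, int[] b)
    {
        var result = new int[a.Length + b.Length];
        int i = 0, j = 0, k = 0;
        while (i < a.Length && j < b.Length)            // at most m + n iterations in total
            result[k++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.Length) result[k++] = a[i++];      // drain leftovers from a
        while (j < b.Length) result[k++] = b[j++];      // drain leftovers from b
        return result;
    }
}
```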

Rule 4: Multiplication vs Addition

When loops are nested, multiply. When they're sequential, add.

Nested Loops → Multiply
for i in 1..n:
    for j in 1..n: ...
Total iterations: n × n = n²
Complexity: O(n²)

Sequential Loops → Add
for i in 1..n: ...
for j in 1..n: ...
Total iterations: n + n = 2n
Complexity: O(n) (constant dropped)
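Both rules can be checked by counting iterations directly (the class and method names are illustrative):

C#
```csharp
using System;

public static class LoopCounts
{
    // Nested: the inner body runs n times for each of the n outer iterations.
    public static int Nested(int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                count++;
        return count;   // n * n
    }

    // Sequential: two independent passes over the same input.
    public static int Sequential(int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++) count++;
        for (int j = 0; j < n; j++) count++;
        return count;   // 2 * n
    }
}
```

For n = 100 the nested version runs its body 10,000 times (O(n²)) while the sequential version runs 200 times (O(n) once the constant is dropped).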

Rule 5: Recursion and Branching

When a function calls itself multiple times, the branches multiply. Binary tree traversal visits 2^d nodes at depth d.

C#
// Naive recursive Fibonacci calls itself twice per invocation
long Fib(int n)
{
    if (n <= 1) return n;
    return Fib(n-1) + Fib(n-2);
    // Two branches per call = O(2ⁿ)
}
1.7

Amortized Analysis

Some operations are expensive individually but cheap on average across many invocations. Amortized analysis reveals the true cost.

Dynamic Array Example

A dynamic array (C# List<T>) doubles its capacity when it runs out of space. Appending usually takes O(1), but occasionally requires copying all elements, taking O(n). How do we characterize this?

Amortized complexity: O(1) per operation.

Why? If we perform n append operations on an initially empty list:

  • First append: 1 element (cost 1)
  • When capacity 1 is full, double to 2 (copy 1 element, cost 1)
  • When capacity 2 is full, double to 4 (copy 2 elements, cost 2)
  • When capacity 4 is full, double to 8 (copy 4 elements, cost 4)
  • ...and so on
  • Total copying cost: 1 + 2 + 4 + 8 + ... + n/2 < 2n
  • Total for n operations: n (appends) + 2n (copying) = 3n = O(n)
  • Average per operation: 3n/n = O(1)
ℹ️
Amortized ≠ Average case. Average case assumes random inputs. Amortized assumes a sequence of operations and distributes the expensive operations across many cheap ones. Amortized analysis is often tighter and more useful.
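The doubling argument can be simulated. A sketch that mimics the growth policy and counts element copies (the class is illustrative; the real List&lt;T&gt; starts with a small nonzero capacity, but we assume capacity 1 here for simplicity):

C#
```csharp
using System;

public static class AmortizedDemo
{
    // Simulate n appends with capacity doubling; return total elements copied.
    public static long TotalCopies(int n)
    {
        int capacity = 1, size = 0;
        long copies = 0;
        for (int i = 0; i < n; i++)
        {
            if (size == capacity)   // full: allocate double and copy everything over
            {
                copies += size;
                capacity *= 2;
            }
            size++;
        }
        return copies;
    }
}
```

For 1,000 appends the simulation copies 1 + 2 + 4 + ... + 512 = 1,023 elements in total, comfortably below the 2n bound, so the amortized cost per append stays O(1).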
1.8

Best, Worst, Average Case

An algorithm's performance depends on input data. The same algorithm might be fast on one input and slow on another. We characterize performance in three scenarios.

Example: Linear Search

Best Case
Target is at index 0. We find it immediately.

Operations: 1
Complexity: O(1)
Note: Rarely useful to report. Usually too optimistic.
Worst Case
Target is at the last index (or missing). We scan the entire array.

Operations: n
Complexity: O(n)
Note: Most important. This is our guarantee.
Average Case
On average, the target is at position n/2. We examine half the array.

Operations: n/2
Complexity: O(n)
Note: Often same Big O as worst case, just with different constants.

Example: Quicksort

Quicksort partitions the array around a pivot. If the pivot is always at the median (balanced partition), it's O(n log n). If the pivot is always at an extreme (worst partition), it's O(n²).

Case | Pivot Choice | Complexity | Typical Scenario
Best | Always median (balanced partition) | O(n log n) | Random pivot on random data
Average | Random pivot on average data | O(n log n) | Real-world usage
Worst | Always smallest/largest element | O(n²) | Already sorted data + naive pivot selection

Why We Usually Report Worst Case

Worst case is the default because:

  • It's a guarantee: no input can make the algorithm slower than this bound
  • Production systems are judged by their worst moments (tail latency, adversarial inputs)
  • Best case is usually trivial and tells you little about real behavior

⚠️
In interviews, specify which case you're analyzing. "This algorithm is O(n log n) on average, O(n²) worst case" is more precise than just saying "O(n log n)".
1.9

Analyzing Recursive Algorithms

Recursion requires special analysis techniques. We use recurrence relations and the Master Theorem to determine complexity.

Recurrence Relations

A recurrence relation expresses the total work in terms of smaller subproblems. For example, merge sort divides the problem into two halves, solves each recursively, then merges in O(n) time:

T(n) = 2·T(n/2) + n

Breaking this down: the 2 is the number of subproblems, n/2 is the size of each subproblem, and the trailing n is the cost of merging the two sorted halves at that level.

The Master Theorem

For recurrences of the form T(n) = a·T(n/b) + f(n), the Master Theorem gives us the solution:

Case | Condition | Solution
1 | f(n) = O(n^(log_b(a) − ε)) for some ε > 0 | T(n) = Θ(n^(log_b(a)))
2 | f(n) = Θ(n^(log_b(a)) · log^k(n)) | T(n) = Θ(n^(log_b(a)) · log^(k+1)(n))
3 | f(n) = Ω(n^(log_b(a) + ε)) and a·f(n/b) ≤ c·f(n) for some c < 1 | T(n) = Θ(f(n))

Example: Merge Sort

T(n) = 2·T(n/2) + n

Using the Master Theorem: a = 2, b = 2, f(n) = n.

log_b(a) = log_2(2) = 1, so n^(log_b(a)) = n^1 = n.

f(n) = n = Θ(n^1), which matches Case 2 with k = 0.

Solution: T(n) = Θ(n · log(n)) = O(n log n)

Example: Binary Search

T(n) = 1·T(n/2) + 1

a = 1, b = 2, f(n) = 1.

log_b(a) = log_2(1) = 0, so n^(log_b(a)) = n^0 = 1.

f(n) = 1 = Θ(1), Case 2 with k = 0.

Solution: T(n) = Θ(log n)

Memorize the Master Theorem for interviews. Most recursive algorithms follow this pattern. If you can identify a, b, and f(n), you can solve the complexity immediately.
1.10

Common Patterns to Recognize

In practice, you don't solve every recurrence with the Master Theorem. You learn to recognize patterns instantly.

Single Loop: O(n)

C#
for (int i = 0; i < n; i++)
    DoSomething();

Pattern: Loop runs n times, O(1) work per iteration.

Complexity: O(n)

Nested Loops: O(n²)

C#
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        DoSomething();

Pattern: Two nested loops, each running n times.

Complexity: O(n²)

Halving the Problem: O(log n)

C#
while (n > 1)
{
    DoSomething();
    n = n / 2;
}

Pattern: Halving the input each iteration (binary search, halving log₂(n) times).

Complexity: O(log n)

Halving with Linear Work: O(n log n)

C#
void SortHelper(int[] arr, int start, int end)
{
    if (start >= end) return;

    int mid = start + (end - start) / 2;
    SortHelper(arr, start, mid);      // T(n/2)
    SortHelper(arr, mid + 1, end);  // T(n/2)
    Merge(arr, start, mid, end);     // O(n)
}

Pattern: Recursion depth is log n, O(n) work at each level.

Complexity: O(n log n)

Independent Collections: O(m + n)

C#
void Process(int[] a, int[] b)
{
    foreach (int x in a) ProcessA(x);  // O(m)
    foreach (int y in b) ProcessB(y);  // O(n)
}

Pattern: Two separate loops on different inputs.

Complexity: O(m + n)

Exponential Branching: O(2ⁿ)

C#
int Fib(int n)
{
    if (n <= 1) return n;
    return Fib(n-1) + Fib(n-2);
}

Pattern: Each call makes multiple recursive calls (branching factor ≥ 2).

Complexity: O(2ⁿ) without memoization

1.11

Practice Problems

Apply complexity analysis to real code. Work through these problems to internalize the patterns.

Problem 1: Simple Array Processing

Find Sum
Calculate the sum of all elements in an array.
C#
public static int FindSum(int[] arr)
{
    int sum = 0;
    for (int i = 0; i < arr.Length; i++)
        sum += arr[i];
    return sum;
}
Time Complexity: O(n)
Space Complexity: O(1)

Explanation: Single loop through all n elements, constant space for sum variable.

Problem 2: Nested Loop with Condition

Print Pairs
Print all pairs (i,j) where i < j.
C#
public static void PrintPairs(int[] arr)
{
    for (int i = 0; i < arr.Length; i++)
    {
        for (int j = i + 1; j < arr.Length; j++)
        {
            Console.WriteLine($"{i}, {j}");
        }
    }
}
Time Complexity: O(n²)
Space Complexity: O(1)

Explanation: Nested loops generate all pairs. Even though the inner loop doesn't always run n times, the total iterations are (n−1) + (n−2) + ... + 1 + 0 = n(n−1)/2 = O(n²).
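You can confirm the closed form by counting the iterations directly (the class and method names are illustrative):

C#
```csharp
using System;

public static class PairCountDemo
{
    // Count iterations of the PrintPairs double loop for input size n.
    public static int CountPairs(int n)
    {
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                count++;
        return count;   // equals n * (n - 1) / 2
    }
}
```

CountPairs(10) returns 45 = 10·9/2, and doubling n roughly quadruples the count, the signature of O(n²).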

Problem 3: Dictionary Lookup

Check Duplicates
Determine if array contains duplicate values.
C#
public static bool ContainsDuplicate(int[] arr)
{
    var seen = new HashSet<int>();
    foreach (int num in arr)
    {
        if (seen.Contains(num))
            return true;
        seen.Add(num);
    }
    return false;
}
Time Complexity: O(n)
Space Complexity: O(n)

Explanation: Single pass through array, each lookup/insert in HashSet is O(1) average. Space used for HashSet is O(n) in worst case.

Problem 4: Recursive Doubling

Power Function
Calculate 2^n recursively.
C#
public static long Power(int n)
{
    if (n == 0) return 1;
    return 2 * Power(n - 1);
}
Time Complexity: O(n)
Space Complexity: O(n)

Explanation: Recursion depth is n, each call does O(1) work. Total O(n). Stack depth also O(n).

Problem 5: Binary Search Variant

Count in Range
Count elements in a sorted array within a value range using binary search.
C#
public static int CountInRange(int[] arr, int low, int high)
{
    // Two binary searches: O(log n) each
    int leftIdx = FindFirst(arr, low);
    int rightIdx = FindLast(arr, high);
    return rightIdx - leftIdx + 1;
}
Time Complexity: O(log n)
Space Complexity: O(1)

Explanation: Two sequential binary searches, each O(log n). Total O(log n + log n) = O(log n).
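FindFirst and FindLast are assumed helpers in the snippet above; one possible sketch uses lower-bound and upper-bound style binary searches, each O(log n):

C#
```csharp
using System;

public static class RangeSearch
{
    // Index of the first element >= low (arr.Length if none).
    public static int FindFirst(int[] arr, int low)
    {
        int left = 0, right = arr.Length;
        while (left < right)
        {
            int mid = left + (right - left) / 2;
            if (arr[mid] < low) left = mid + 1;
            else right = mid;
        }
        return left;
    }

    // Index of the last element <= high (-1 if none).
    public static int FindLast(int[] arr, int high)
    {
        int left = 0, right = arr.Length;
        while (left < right)
        {
            int mid = left + (right - left) / 2;
            if (arr[mid] <= high) left = mid + 1;
            else right = mid;
        }
        return left - 1;
    }
}
```

On {1, 2, 2, 3, 5, 7} with the range [2, 5], FindFirst returns 1 and FindLast returns 4, giving 4 − 1 + 1 = 4 elements. Note the subtraction can go negative when no element falls in the range, so a production implementation should clamp the result at 0.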

Problem 6: Nested with Different Sizes

Cross Product
Process elements from two different-sized arrays.
C#
public static void ProcessCrossProduct(int[] a, int[] b)
{
    for (int i = 0; i < a.Length; i++)
    {
        for (int j = 0; j < b.Length; j++)
        {
            Process(a[i], b[j]);
        }
    }
}
Time Complexity: O(m·n)
Space Complexity: O(1)

Explanation: Nested loops on different inputs. Outer runs m times, inner n times, total m×n operations. Don't use 'n' for both!

Problem 7: Logarithmic Reduction

Sum Powers of 2
Sum all powers of 2 that do not exceed n.
C#
public static long SumPowersOfTwo(int n)
{
    long sum = 0, power = 1;
    while (power <= n)
    {
        sum += power;
        power *= 2;  // Doubles each iteration
    }
    return sum;
}
Time Complexity: O(log n)
Space Complexity: O(1)

Explanation: Power grows exponentially (1, 2, 4, 8, ...), so loop runs log₂(n) times.

Problem 8: Complex Recursion

Fibonacci with Memoization
Calculate Fibonacci number using memoization.
C#
private static Dictionary<int, long> memo = new();

public static long Fib(int n)
{
    if (memo.ContainsKey(n))
        return memo[n];

    if (n <= 1) return n;

    long result = Fib(n-1) + Fib(n-2);
    memo[n] = result;
    return result;
}
Time Complexity: O(n)
Space Complexity: O(n)

Explanation: Each unique subproblem computed once. We compute Fib(0) through Fib(n), so n + 1 unique problems. Each takes O(1) with memoization. Total O(n) time and space.

1.12

Toptal Interview Tips

Algorithm interviews at Toptal and similar companies always involve complexity analysis. Here's how to ace the complexity discussion.

Tip 1: Always State Both Time and Space

Never say just "O(n)." Say "The time complexity is O(n) and space complexity is O(1)" or "This solution uses O(n) extra space." Volunteering this information demonstrates completeness.

Example answer format: "My solution iterates through the array once—that's O(n) time. It only uses a couple of variables for tracking, so O(1) auxiliary space. The space complexity is better than a hash table approach, which would be O(n)."

Tip 2: Justify Your Complexity Analysis

Don't just state the answer. Explain your reasoning so the interviewer can follow your logic (and correct you if you're wrong).

Bad
"This is O(n log n)."
Good
"We have a loop that runs n times, and inside we do a binary
search which is log n. So that's n times (log n) operations
each, giving us O(n log n)."

Tip 3: Compare Approaches

Strong candidates compare multiple solutions. Show you understand the tradeoffs.

Approach | Time | Space | When to Use
Linear scan | O(n) | O(1) | Simplest, good when n is small
Hash table | O(n) | O(n) | Fast lookups, space isn't constrained
Sorted + binary search | O(n log n) | O(1) | Space is critical, or data is already sorted

Tip 4: Specify Best/Worst/Average When Relevant

For algorithms where this matters, mention all three. It shows sophistication.

Example: "Quicksort has O(n log n) average case, but O(n²) worst case if we always pick a bad pivot. To avoid this, we can randomize the pivot selection, which gives us O(n log n) expected time with high probability."

Tip 5: Discuss Optimization Possibilities

Show you're thinking beyond the first solution. Can you reduce space? Trade time for space?

ℹ️
"This first pass is O(n²) because of nested loops. But I could sort first in O(n log n) and then use a two-pointer approach to get O(n log n) overall instead." This demonstrates optimization thinking.

Tip 6: Know When Complexity Doesn't Matter

Sometimes, code clarity trumps micro-optimizations. Show judgment.

"Both approaches are O(n), so I'd choose the clearer implementation rather than the one that saves a constant factor. Readability matters."

Tip 7: Recognize Patterns Instantly

Practice pattern recognition until you don't have to think. Seeing nested loops → O(n²) should be automatic.

  • Single Loop: one loop through n items → O(n)
  • Nested Loops: loop in loop over the same input → O(n²)
  • Halving: divide in half each iteration → O(log n)
  • Hash Lookup: Dictionary/HashSet operation → O(1) average
  • Sort + Process: sort then iterate → O(n log n)
  • Multiple Recursions: f(n) calls itself multiple times → exponential

Tip 8: Practice Writing Detailed Explanations

In Toptal interviews, you'll explain your code via video or text. Practice writing clear complexity explanations.

⚠️
Common mistakes to avoid:
  • Confusing "worst case" with "Big O" (Big O is usually worst case, but always clarify)
  • Forgetting to account for hidden operations (string concatenation, array resizing)
  • Using 'n' for multiple different inputs
  • Assuming operations take O(1) when they don't (copying arrays, string operations)

Tip 9: Code Along and Analyze Simultaneously

As you code, narrate your complexity analysis. Don't save it for the end.

"I'll iterate once through the array here—that's O(n). I'm using a hash set for lookups, which is O(1) on average, so no additional complexity. The total time is O(n)."

Tip 10: Master the Master Theorem

For recursive problems, quickly apply the Master Theorem. This impresses interviewers and speeds up your analysis.

"This has T(n) = 2·T(n/2) + n. Using the Master Theorem with a=2, b=2, f(n)=n, we get T(n) = Θ(n log n)." This is cleaner than working through the recursion manually.
The Master Theorem — Complete Proof
A recurrence like T(n) = 2·T(n/2) + n defines itself in terms of itself — it has no direct answer yet. The Master Theorem is called a closed-form method because it closes that loop: it converts a self-referential recurrence into a direct, final expression like Θ(n log n) that you can evaluate immediately with no expansion, no iteration, and no guessing.
❌ Without it (open form)
Expand step by step: T(n) → 2T(n/2)+n → 4T(n/4)+2n → 8T(n/8)+3n → ... log n steps later → Θ(n log n). You must unroll, spot a pattern, and sum a series manually.
✓ With it (closed form)
Read off a=2, b=2, f(n)=n. Compare f(n) to n^(log₂ 2) = n. They match → Case 2 → Θ(n log n). One step. No expansion needed.
The same idea applies elsewhere in math: summing 1+2+…+n by hand is open-form; the formula n(n+1)/2 is closed-form. The Master Theorem is the closed-form equivalent for divide-and-conquer recurrences.

The Three Inputs You Provide

The Master Theorem needs exactly three things from you. You read them directly off your recurrence — nothing is computed yet.

  • a ≥ 1: the number of recursive calls (how many subproblems)
  • b > 1: the factor the input shrinks by (how much smaller each subproblem is)
  • f(n): the work done outside the recursion (cost to divide and combine)

T(n)  =  a · T(n/b)  +  f(n)

For merge sort: you make 2 recursive calls, each on an array of size n/2, and merging costs n work. So a=2, b=2, f(n)=n.

Where does n^(log_b a) come from? — Deriving it from scratch

log_b a is not a fourth input. It is a quantity that emerges naturally when you draw the recursion tree and count the leaf nodes. Here is the full derivation:

A
How deep is the tree?
At each level the problem size is divided by b. Starting from n, after k levels the size is n/b^k. The tree bottoms out when the problem is size 1:
n / b^k = 1  ⟹  b^k = n  ⟹  k = log_b n
The tree has exactly log_b n levels. This comes purely from b; the branching factor a plays no role in depth.
Definition of a Logarithm
The step b^k = n  ⟹  k = log_b n is simply the definition of a logarithm:
log_b n = k    means exactly    b^k = n
In plain English: "log_b n is the power you raise b to in order to get n."

Proof that k = log_b n:
  1. Start with  b^k = n
  2. Take log_b of both sides:  log_b(b^k) = log_b n
  3. Left side simplifies, since log and exponent cancel:  k · log_b b = log_b n
  4. Since log_b b = 1 by definition:  k = log_b n  ✓
Concrete example: If b = 2 and n = 8, then 2^k = 8  ⟹  k = 3, and log₂ 8 = 3. ✓   A binary tree over 8 elements is 3 levels deep.
B
How many leaves are there?
At each level, every node spawns a children. So at level k there are a^k nodes. At the leaf level k = log_b n, the number of leaves is:
a^(log_b n)
This is the number of base cases the algorithm reaches. Each base case costs Θ(1), so the total leaf cost is Θ(a^(log_b n)).
C
Simplify using the logarithm identity
The expression a^(log_b n) looks messy, but a standard log identity lets us swap the base and exponent:
a^(log_b n)  =  n^(log_b a)
Proof of identity: let x = a^(log_b n). Take log_b of both sides: log_b x = log_b n · log_b a. Exponentiate: x = b^(log_b n · log_b a) = (b^(log_b n))^(log_b a) = n^(log_b a). ✓

This is where n^(log_b a) comes from: it is simply a cleaner way to write the total number of leaf nodes. The exponent log_b a is derived; you never look it up.

Building the Recursion Tree — Merge Sort Step by Step

The best way to understand the proof is to trace through a real example level by level. We use merge sort: T(n) = 2·T(n/2) + n with n = 8, so a = 2, b = 2, f(n) = n.

At every level we track three things: how many subproblems exist, what size they are, and what the combine cost is per node and in total.

Level | Number of nodes | Subproblem size | Divide & combine cost per node | Total cost at this level
0 (root) | 1 | n = 8 | f(8) = 8 | 1 × 8 = 8 = n
1 | 2 = a¹ | n/2 = 4 | f(4) = 4 | 2 × 4 = 8 = n
2 | 4 = a² | n/4 = 2 | f(2) = 2 | 4 × 2 = 8 = n
3 (leaves) | 8 = a³ = n | n/8 = 1 | Θ(1) base case | 8 × 1 = 8 = n
ℹ️
Every level costs exactly n = 8 total work. There are log₂ 8 + 1 = 3 + 1 = 4 levels (depth 0 through 3). Grand total = n × (log₂ n + 1) = Θ(n log n). The "+1" is absorbed into the Θ notation.
RECURSION TREE — MERGE SORT (n=8) levels × combine cost = total
Level 0  nodes=1   size=8
              [8]                               cost: 8
           ┌────┴────┐
Level 1  nodes=2   size=4
          [4]       [4]                       cost: 4+4 = 8
        ┌──┴──┐   ┌──┴──┐
Level 2  nodes=4   size=2
       [2]  [2]  [2]  [2]               cost: 2×4 = 8
      ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐
Level 3  nodes=8   size=1  (leaves)
     [1][1][1][1][1][1][1][1]        cost: 1×8 = 8

─────────────────────────────────────────────────────
Total = 8 + 8 + 8 + 8 = 32 = n × (log₂ n + 1)  →  Θ(n log n)
        └── log₂ 8 = 3 levels + 1 leaf level ──┘
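The level-by-level costs in the tree can be reproduced by instrumenting the recursion itself. The Python sketch below (a hypothetical helper) records the combine cost at each depth for T(n) = 2·T(n/2) + n; the Θ(1) leaf base cases are not tallied:

```python
def merge_sort_cost(n, depth=0, level_cost=None):
    """Record the combine cost incurred at each recursion level of
    T(n) = 2*T(n/2) + n. Leaf base cases (n <= 1) contribute nothing here."""
    if level_cost is None:
        level_cost = {}
    if n <= 1:
        return level_cost
    level_cost[depth] = level_cost.get(depth, 0) + n   # merging n items costs n
    merge_sort_cost(n // 2, depth + 1, level_cost)
    merge_sort_cost(n // 2, depth + 1, level_cost)
    return level_cost

print(merge_sort_cost(8))    # {0: 8, 1: 8, 2: 8} -- every level costs n = 8
```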

The General Sum Formula

For any recurrence T(n) = a·T(n/b) + f(n), the pattern from the table generalises to:

T(n)  =  Σ (k = 0 to log_b(n) − 1)  a^k · f(n/b^k)  +  Θ(n^(log_b a))
Left sum — internal nodes
At level k there are a^k nodes, each processing a subproblem of size n/b^k, costing f(n/b^k) each. Summing across all log_b(n) levels gives the total internal work.
Right term — leaf nodes
There are n^(log_b a) leaves (derived above), each costing Θ(1). This is the total cost of all base cases. It represents the recursive work stripped of the combine steps.
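The sum formula can be checked against a direct unrolling of the recurrence. This Python sketch (hypothetical helpers, assuming n is a power of b and T(1) = 1) computes both and compares:

```python
def T_direct(n, a, b, f):
    """Unroll T(n) = a*T(n/b) + f(n) directly, with T(1) = 1."""
    if n <= 1:
        return 1
    return a * T_direct(n // b, a, b, f) + f(n)

def T_formula(n, a, b, f):
    """Internal-level sum plus leaf term, for n a power of b."""
    levels, m = 0, n
    while m > 1:                 # tree depth = log_b(n)
        m //= b
        levels += 1
    internal = sum(a**k * f(n // b**k) for k in range(levels))
    leaves = a ** levels         # = n^(log_b a) when n is a power of b
    return internal + leaves

f = lambda n: n                  # merge sort: a = 2, b = 2, f(n) = n
print(T_direct(64, 2, 2, f), T_formula(64, 2, 2, f))   # 448 448
```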

The Three Cases — Which Term Wins?

The theorem asks: does the internal work (the sum) grow faster than, slower than, or at the same rate as the leaf work (n^(log_b a))? The answer determines which term dominates the total.

CASE 1 Leaves dominate — recursion wins
If   f(n) = O(n^(log_b a − ε))   for some ε > 0
Then   T(n) = Θ(n^(log_b a))
Why: The combine cost f(n) is polynomially smaller than the leaf cost n^(log_b a). The per-node combine cost shrinks as we descend, but the node count a^k grows faster, so the total work per level increases geometrically down the tree. The sum is dominated by its largest, bottom-most term — the vast majority of work happens at the leaves.

Example — Binary tree traversal: T(n) = 2T(n/2) + 1  →  a=2, b=2, f(n)=1.
n^(log₂ 2) = n. Is f(n) = 1 = O(n^(1−ε))? Yes (ε = 1). So T(n) = Θ(n). The tree visits n nodes, each doing O(1) work.
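A quick numeric check of this example: unrolling T(n) = 2T(n/2) + 1 with T(1) = 1 gives exactly 2n − 1 for powers of two, i.e. Θ(n) (Python sketch):

```python
def T(n):
    """T(n) = 2*T(n/2) + 1, T(1) = 1 -- binary tree traversal."""
    if n <= 1:
        return 1
    return 2 * T(n // 2) + 1

for n in [8, 64, 1024]:
    print(n, T(n))    # 15, 127, 2047 -- always 2n - 1 = Theta(n)
```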
CASE 2 All levels tied — multiply by log n
If   f(n) = Θ(n^(log_b a))
Then   T(n) = Θ(n^(log_b a) · log n)
Why: The combine cost per level exactly matches the leaf cost. As seen in the merge sort table, every level contributes the same total work — n. With log_b(n) levels each contributing the same amount, the total is simply that amount multiplied by the number of levels: Θ(n^(log_b a) × log n). The log n factor is the fingerprint of this balanced case.

Example — Merge sort: T(n) = 2T(n/2) + n  →  a=2, b=2, f(n)=n.
n^(log₂ 2) = n. Is f(n) = Θ(n)? Yes. So T(n) = Θ(n log n). Each of log n levels costs n to merge.
CASE 3 Root dominates — combine wins
If   f(n) = Ω(n^(log_b a + ε))   for some ε > 0   AND   a · f(n/b) ≤ c · f(n)   for some c < 1  (regularity)
Then   T(n) = Θ(f(n))
Why: The combine cost f(n) is polynomially larger than the leaf cost. As we go deeper the work decreases geometrically (the regularity condition guarantees this), so the root level — where subproblem size is largest — contributes the most. The geometric series of internal costs sums to a constant multiple of f(n).

Regularity condition: We need a · f(n/b) ≤ c · f(n) to confirm that work truly decreases at each level. Without it, f(n) could grow fast but oscillate in a way that prevents the root from dominating.

Example: T(n) = T(n/2) + n²  →  a=1, b=2, f(n)=n².
n^(log₂ 1) = n⁰ = 1. Is f(n) = n² = Ω(n^(0+ε))? Yes (take ε = 2). Check regularity: 1 · f(n/2) = n²/4 ≤ (3/4) · n². Holds. So T(n) = Θ(n²).
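Numerically, the converging geometric series is visible: unrolling T(n) = T(n/2) + n² gives n²(1 + 1/4 + 1/16 + …), so T(n)/n² approaches 4/3 (Python sketch):

```python
def T(n):
    """T(n) = T(n/2) + n^2, T(1) = 1 -- root-dominated Case 3."""
    if n <= 1:
        return 1
    return T(n // 2) + n * n

for n in [16, 256, 4096]:
    print(n, T(n) / (n * n))    # ratio tends to 4/3: T(n) = Theta(n^2)
```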

Quick Reference

Case     Condition on f(n)                        Result T(n)               Who dominates         Classic example
Case 1   f(n) = O(n^(log_b a − ε))                Θ(n^(log_b a))            Leaves (base cases)   Binary tree traversal: T(n) = 2T(n/2) + 1 → Θ(n)
Case 2   f(n) = Θ(n^(log_b a))                    Θ(n^(log_b a) · log n)    All levels equally    Merge sort: T(n) = 2T(n/2) + n → Θ(n log n)
Case 3   f(n) = Ω(n^(log_b a + ε)) + regularity   Θ(f(n))                   Root (combine step)   T(n) = T(n/2) + n² → Θ(n²)

Complete Worked Example — Merge Sort

Here is the full process from recurrence to answer, tracing every step explicitly.

1
Write the recurrence: T(n) = 2·T(n/2) + n
Merge sort splits the array into two halves (2 recursive calls on inputs of size n/2), and merging takes linear time (n).
2
Read off a, b, f(n):   a = 2,   b = 2,   f(n) = n
These are the only inputs to the theorem. No computation yet.
3
Find the tree depth: log_b(n) = log₂(n) levels
We divide by b = 2 each time; the tree is log₂(n) deep.
4
Count the leaves: a^(log_b n) = 2^(log₂ n) = n leaves
Using the identity: n^(log_b a) = n^(log₂ 2) = n^1 = n
This is where n^(log_b a) appears — it is simply the total leaf count, rewritten.
5
Compare f(n) to n^(log_b a):   f(n) = n   vs   n^1 = n
They are equal: f(n) = Θ(n^(log₂ 2)) = Θ(n). Case 2 applies.
6
Apply Case 2: T(n) = Θ(n^(log₂ 2) · log n) = Θ(n · log n) = Θ(n log n)
Intuition check: log2 n levels, each costs n to merge → n × log n total. ✓
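For recurrences whose driving function is a plain polynomial, f(n) = n^d, the three cases reduce to comparing d with the critical exponent log_b(a). The Python sketch below (a hypothetical helper; it uses exact float comparison, so it is only reliable when log_b(a) is representable exactly) automates the classification. For polynomial f, the Case 3 regularity condition holds automatically, since a · (n/b)^d = (a/b^d) · n^d with a/b^d < 1:

```python
import math

def master_theorem(a, b, d):
    """Classify T(n) = a*T(n/b) + Theta(n^d), with a >= 1 and b > 1.
    Compares d against the critical exponent c = log_b(a)."""
    c = math.log(a, b)
    if d < c:
        return f"Case 1: Theta(n^{c:g})"
    if d == c:
        return f"Case 2: Theta(n^{d:g} * log n)"   # n^0 * log n is just log n
    return f"Case 3: Theta(n^{d:g})"               # regularity holds for polynomial f

print(master_theorem(2, 2, 1))   # merge sort          -> Case 2: Theta(n^1 * log n)
print(master_theorem(2, 2, 0))   # tree traversal      -> Case 1: Theta(n^1)
print(master_theorem(1, 2, 2))   # T(n) = T(n/2) + n^2 -> Case 3: Theta(n^2)
```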
⚠️
When the Master Theorem does NOT apply:
  • Unequal subproblem sizes — e.g. T(n) = T(n/3) + T(2n/3) + n. All subproblems must be the same size n/b. Use Akra-Bazzi for unequal sizes.
  • Gap between cases — e.g. f(n) = n/log n sits between Case 1 and 2 with no valid ε. The theorem only covers polynomial gaps.
  • Regularity condition fails in Case 3 — even if f(n) is large enough, you must verify a·f(n/b) ≤ c·f(n) for some c < 1.
  • Generalized Case 2 (an extension, not a failure) — if f(n) = Θ(n^(log_b a) · log^k n) for k ≥ 0, the result is Θ(n^(log_b a) · log^(k+1) n).
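Even when the theorem does not apply, a recurrence can still be explored numerically. The sketch below unrolls the unequal-split recurrence T(n) = T(n/3) + T(2n/3) + n (memoized, with T(1) = 1); the ratio T(n)/(n · log₂ n) stays roughly flat, consistent with the Θ(n log n) answer that Akra-Bazzi (or a recursion-tree argument) gives:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = T(n/3) + T(2n/3) + n -- unequal splits, outside the Master Theorem."""
    if n <= 1:
        return 1
    return T(n // 3) + T(2 * n // 3) + n

for n in [1_000, 10_000, 100_000]:
    print(n, round(T(n) / (n * math.log2(n)), 3))   # ratio stays roughly constant
```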