bloooooooooooooooooooooooooooooooooooooooooooooooooog

Advent of Code 2025

Advent of Code is an annual event where people solve Santa-themed programming problems in December.

Here, I’ll write about my solutions for the 2025 event. It’s shorter than past ones, at just 12 days. The twist in my version is that I’ll try to squeeze the pointless microseconds off my solutions’ running times.

In Julia, to stay high-level, yet have compiled speeds.

In the grand tradition of these writeups, I will not include either the question, or my input. You are invited to try solving on your own first.

Day 1

I want to measure my solution’s speed, not the StringInt64 performance, so let’s start by importing the measurument tools, and parsing the input. I’ll use positive and negative ints for R and L rotations respectively.

using BenchmarkTools

scan_val(s) = (s[1] == 'L' ? -1 : 1) * parse(Int64, s[2:end])
lines = readlines(stdin) .|> scan_val

My first solution to part 1 is what you’d expect:

using IterTools
positions(xs) = accumulate(+, chain([ 50 ], xs))
fits(x) = x % 100 == 0
part1_v1(r) = r |> positions .|> fits |> sum

To measure the run time, I use BenchmarkTools.jl:

display(@benchmark part1_v1($lines))
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
Range (min … max):  3.974 μs … 293.885 μs  ┊ GC (min … max):  0.00% … 95.64%
Time  (median):     4.490 μs               ┊ GC (median):     0.00%
Time  (mean ± σ):   5.316 μs ±   5.461 μs  ┊ GC (mean ± σ):  10.55% ± 11.21%

Memory estimate: 33.45 KiB, allocs estimate: 6.

Note the allocations, and the 4 µs time. We are creating intermediate arrays. This could be converted to iterators, or we can use a plain loop.

function part1_v2(rs)
    cur, p1 = 50, 0
    for r  rs
        cur += r
        p1 += fits(cur)
    end
    return p1
end

Did it improve things?

BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
Range (min … max):  2.862 μs …  4.351 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     2.878 μs              ┊ GC (median):    0.00%
Time  (mean ± σ):   2.890 μs ± 62.957 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Memory estimate: 0 bytes, allocs estimate: 0.

Indeed.

I don’t think much else can be done for part 1: we do need to look at each prefix sum, and do a check for each point. If there was a guarantee that each rotation is less than a full circle, we could convers the modulo division to a comparison and subtraction, but there is no such guarantee, and it’s not true for my input.

For part 2, let’s start with something simple:

moves(ps) = zip(ps[1:end-1], ps[2:end]) .|> sort
zeros_between_v1((x, y)) = x:y .|> fits |> sum
part2_v1(r) = (r |> positions |> moves .|> zeros_between_v1 |> sum) - part1_v1(r) + fits(sum(r))

zeros_between_v1 checks every number in a closed interval [x, y]. So every “stop” is counted twice; to counter that, we subtract the result of part 1. But the very last point was only counted once in both, so we need to add it back in, if it fits. Off-by-one is cursed.

Benchmark!

BenchmarkTools.Trial: 9013 samples with 1 evaluation per sample.
Range (min … max):  481.182 μs …   3.512 ms  ┊ GC (min … max): 0.00% … 77.50%
Time  (median):     495.684 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   553.220 μs ± 221.447 μs  ┊ GC (mean ± σ):  8.20% ± 13.44%

Memory estimate: 618.52 KiB, allocs estimate: 4242.

I love a high baseline.

Checking every number is silly; we can do integer division to decide how many circles we’ve done.

zeros_between((x, y)) = fld(y, 100) - fld(x, 100) + fits(x)
part2_v2(r) = (r |> positions |> moves .|> zeros_between |> sum) - part1_v2(r) + fits(sum(r))

Note how we need to check the left edge separately; consider [200, 250]: fld would return 2 for both edges.

Did that help?

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  32.967 μs …  1.131 ms  ┊ GC (min … max):  0.00% … 92.84%
Time  (median):     36.134 μs              ┊ GC (median):     0.00%
Time  (mean ± σ):   44.375 μs ± 49.280 μs  ┊ GC (mean ± σ):  14.41% ± 11.87%

Memory estimate: 321.41 KiB, allocs estimate: 34.

That’s better. How far can we go?

Let’s again try and control allocations by going iterative.

function part2_v3(rs)
    p2 = p1 = 0
    step, last, next = 0, 50, 50
    for step  rs
        next = last + step
        p2 += step>0 ? zeros_between((last, next)) : zeros_between((next, last))
        p1 += fits(last)
        last = next
    end
    p1 += fits(last)
    return p2 - p1
end

Our zeros_between expects a sorted pair. Note that I’ve inlined the part 1 calculation.

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  14.754 μs …  65.051 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     14.835 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   14.905 μs ± 679.352 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Memory estimate: 0 bytes, allocs estimate: 0.

More than 2x speedup.

Can we go further?

I’m so glad you asked. Euclidean division is expensive; the integer division computers do (“closest to zero”) is faster, but will only work for us if all positions are positive. Let’s make them so. Also, cache results of integer division as much as possible: each edge is effectively processed twice, as we run zeros_between twice for each end. Let’s inline it.

function part2_v4(rs)
    p2 = 0
    step, last, next = 0, 10000050, 10000050 # "ensure" no negative numbers so div == fld
    divlast = divnext = 10000050 ÷ 100
    fitslast = fitsnext = 0
    for step  rs
        next = last + step
        divnext = next ÷ 100
        fitsnext = Int64(next == 100 * divnext)
        # let's group both counts to avoid extra assignments
        if signbit(step)
            p2 += divlast - divnext + fitsnext - fitslast
        else
            p2 += divnext - divlast
        end
        last, divlast, fitslast = next, divnext, fitsnext
    end
    return p2
end

I’ve also inlined “part 1” calculation further.

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  8.180 μs …  83.509 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     8.239 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   8.288 μs ± 825.242 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Memory estimate: 0 bytes, allocs estimate: 0.

For the last step, let’s chunk together same-direction rotations.

function part2_v5(rs)
    p2 = p1 = 0
    step, last = rs[1], 1000050
    divlast = 10000
    fitslast = 0
    for r  Iterators.rest(rs, 2) # rs[2:end] would allocate!
        if signbit(r) == signbit(step) # step and r never zero, so this is equivalent to sign()
            step += r
            continue
        end
        # don’t want to keep extra vars, will update in place instead. extra assignment to p2, but -3 assignments to *next.
        if signbit(step)
            p2 += divlast - fitslast
            last += step
            divlast = last ÷ 100
            fitslast = Int64(last == 100 * divlast)
            p2 += -divlast + fitslast
        else
            p2 -= divlast
            last += step
            divlast = last ÷ 100
            fitslast = Int64(last == 100 * divlast)
            p2 += divlast
        end
        step = r
    end
    if signbit(step)
        p2 += divlast - fitslast
        last += step
        divlast = last ÷ 100
        fitslast = Int64(last == 100 * divlast)
        p2 += -divlast + fitslast
    else
        p2 -= divlast
        last += step
        divlast = last ÷ 100
        fitslast = Int64(last == 100 * divlast)
        p2 += divlast
    end
return p2
end

Giving

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  5.198 μs …  22.137 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     5.238 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   5.278 μs ± 378.826 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Memory estimate: 0 bytes, allocs estimate: 0.

Not bad.