bloooooooooooooooooooooooooooooooooooooooooooooooooog

Advent of Code 2025

Advent of Code is an annual event where people solve Santa-themed programming problems in December.

Here, I’ll write about my solutions for the 2025 event. It’s shorter than past ones, at just 12 days. The twist in my version is that I’ll try to squeeze the pointless microseconds off my solutions’ running times.

In Julia, to stay high-level, yet have compiled speeds.

In the grand tradition of these writeups, I will not include either the question, or my input. You are invited to try solving on your own first.

Day 1

I want to measure my solution’s speed, not the String → Int64 parsing performance, so let’s start by importing the measurement tools and parsing the input. I’ll use positive and negative ints for R and L rotations respectively.

using BenchmarkTools

scan_val(s) = (s[1] == 'L' ? -1 : 1) * parse(Int64, s[2:end])
lines = readlines(stdin) .|> scan_val

My first solution to part 1 is what you’d expect:

using IterTools
positions(xs) = accumulate(+, chain([ 50 ], xs))
fits(x) = x % 100 == 0
part1_v1(r) = r |> positions .|> fits |> sum

To measure the run time, I use BenchmarkTools.jl:

display(@benchmark part1_v1($lines))
BenchmarkTools.Trial: 10000 samples with 7 evaluations per sample.
Range (min … max):  3.974 μs … 293.885 μs  ┊ GC (min … max):  0.00% … 95.64%
Time  (median):     4.490 μs               ┊ GC (median):     0.00%
Time  (mean ± σ):   5.316 μs ±   5.461 μs  ┊ GC (mean ± σ):  10.55% ± 11.21%

██▆▃▂▁         ▁                                            ▂
████████▇▆▆▆▅▅██▆▆▅▄▅▃▁▃▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅██▇▆▆▅▅ █
3.97 μs      Histogram: log(frequency) by time      30.2 μs <

Memory estimate: 33.45 KiB, allocs estimate: 6.

Note the allocations, and the 4 µs time: we are creating intermediate arrays. We could convert this to lazy iterators, or just use a plain loop.

function part1_v2(rs)
    cur, p1 = 50, 0
    for r ∈ rs
        cur += r
        p1 += fits(cur)
    end
    return p1
end

Did it improve things?

BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
Range (min … max):  2.862 μs …  4.351 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     2.878 μs              ┊ GC (median):    0.00%
Time  (mean ± σ):   2.890 μs ± 62.957 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▁▆█▇▄                     ▁                             ▁  ▂
█████▇▃▁▁▁▁▃▁▁▁▁▁▁▁▁▃▁▁▃▇███▇▇▇▄▄▅▃▄▄▄▄▄▃▁▄▃▁▃▄▃▁▄▁▁▃▁▄▆██ █
2.86 μs      Histogram: log(frequency) by time     3.19 μs <

Memory estimate: 0 bytes, allocs estimate: 0.

Indeed.

I don’t think much else can be done for part 1: we do need to look at each prefix sum, and do a check for each point. If there were a guarantee that each rotation is less than a full circle, we could convert the modulo division to a comparison and a subtraction, but there is no such guarantee, and it’s not true for my input.

For part 2, let’s start with something simple:

moves(ps) = zip(ps[1:end-1], ps[2:end]) .|> sort
zeros_between_v1((x, y)) = x:y .|> fits |> sum
part2_v1(r) = (r |> positions |> moves .|> zeros_between_v1 |> sum) - part1_v1(r) + fits(sum(r))

zeros_between_v1 checks every number in a closed interval [x, y]. So every intermediate “stop” is counted twice, once as an edge of each adjacent interval; to counter that, we subtract the result of part 1. But the very last position is counted only once in the interval sum, yet also subtracted once via part 1, so we need to add it back in, if it fits. Off-by-one is cursed.
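
To convince myself the bookkeeping is right, here’s a toy run with made-up rotations (not from my input), counted by hand: starting at 50, R150 then L50 passes a multiple of 100 twice.

```julia
# Toy check of the double-counting argument (hypothetical 2-rotation input).
fits(x) = x % 100 == 0
positions = [50, 200, 150]           # start 50, then R150, then L50
intervals = [(50, 200), (150, 200)]  # each move as a sorted closed interval
interval_hits = sum(count(fits, x:y) for (x, y) in intervals)  # 200 counted twice
stops = count(fits, positions[2:end])                          # the part 1 answer
println(interval_hits - stops + fits(positions[end]))          # 2
```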

Benchmark!

BenchmarkTools.Trial: 9013 samples with 1 evaluation per sample.
Range (min … max):  481.182 μs …   3.512 ms  ┊ GC (min … max): 0.00% … 77.50%
Time  (median):     495.684 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   553.220 μs ± 221.447 μs  ┊ GC (mean ± σ):  8.20% ± 13.44%

█▆▅▄▃▂                                                        ▁
████████▇▇▇▆▅▅▁▄▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅▆▆▆▇▇▇▇██▇▆▇▇▆▇▆▆▆ █
481 μs        Histogram: log(frequency) by time       1.54 ms <

Memory estimate: 618.52 KiB, allocs estimate: 4242.

I love a high baseline.

Checking every number is silly; we can do integer division to decide how many circles we’ve done.

zeros_between((x, y)) = fld(y, 100) - fld(x, 100) + fits(x)
part2_v2(r) = (r |> positions |> moves .|> zeros_between |> sum) - part1_v2(r) + fits(sum(r))

Note how we need to check the left edge separately; consider [200, 250]: fld would return 2 for both edges.
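
Here is a standalone sanity check of that edge behavior (the helpers re-defined so it runs on its own):

```julia
# The fld-based count, with the left edge checked separately.
fits(x) = x % 100 == 0
zeros_between((x, y)) = fld(y, 100) - fld(x, 100) + fits(x)
println(zeros_between((200, 250)))  # 1 — only 200 itself
println(zeros_between((201, 250)))  # 0 — fld gives 2 for both edges
println(zeros_between((100, 310)))  # 3 — 100, 200 and 300
```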

Did that help?

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  32.967 μs …  1.131 ms  ┊ GC (min … max):  0.00% … 92.84%
Time  (median):     36.134 μs              ┊ GC (median):     0.00%
Time  (mean ± σ):   44.375 μs ± 49.280 μs  ┊ GC (mean ± σ):  14.41% ± 11.87%

█▅▂▁                                                        ▁
█████▆▆▁▄▄▇█▆▄▃▃▄▃▁▁▃▁▁▁▁▄▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▄▅▆██ █
33 μs        Histogram: log(frequency) by time       351 μs <

Memory estimate: 321.41 KiB, allocs estimate: 34.

That’s better. How far can we go?

Let’s again try and control allocations by going iterative.

function part2_v3(rs)
    p2 = p1 = 0
    step, last, next = 0, 50, 50
    for step ∈ rs
        next = last + step
        p2 += step>0 ? zeros_between((last, next)) : zeros_between((next, last))
        p1 += fits(last)
        last = next
    end
    p1 += fits(last)
    return p2 - p1
end

Our zeros_between expects a sorted pair. Note that I’ve inlined the part 1 calculation.

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  14.754 μs …  65.051 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     14.835 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   14.905 μs ± 679.352 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▂▆█▇▄                                                        ▂
█████▅▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅▆▇▇▆▆▆▅▆▅▅▅▇█▇▅▄▄ █
14.8 μs       Histogram: log(frequency) by time      16.5 μs <

Memory estimate: 0 bytes, allocs estimate: 0.

More than 2x speedup.

Can we go further?

I’m so glad you asked. Euclidean division (fld) is expensive; the truncated integer division computers do natively (rounding toward zero) is faster, but it only agrees with fld when all positions are positive. Let’s make them so with a large offset. Also, let’s cache the results of the integer division as much as possible: each position is effectively processed twice, since it is an endpoint of two consecutive intervals, and zeros_between runs once for each. Let’s inline it.
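
The two divisions only disagree below zero, which is exactly what the offset buys us; a quick illustration:

```julia
# fld rounds toward -infinity; ÷ (div) truncates toward zero.
println(fld(-150, 100))  # -2
println(-150 ÷ 100)      # -1
# With a large positive offset the two agree, so the cheaper ÷ is safe:
println(fld(-150 + 10_000_000, 100) == (-150 + 10_000_000) ÷ 100)  # true
```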

function part2_v4(rs)
    p2 = 0
    step, last, next = 0, 10000050, 10000050 # "ensure" no negative numbers so div == fld
    divlast = divnext = 10000050 ÷ 100
    fitslast = fitsnext = 0
    for step ∈ rs
        next = last + step
        divnext = next ÷ 100
        fitsnext = Int64(next == 100 * divnext)
        # let's group both counts to avoid extra assignments
        if signbit(step)
            p2 += divlast - divnext + fitsnext - fitslast
        else
            p2 += divnext - divlast
        end
        last, divlast, fitslast = next, divnext, fitsnext
    end
return p2
end

I’ve also inlined the “part 1” calculation further.

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  8.180 μs …  83.509 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     8.239 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   8.288 μs ± 825.242 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▄▇█▆▁                                                       ▂
█████▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅██▅▁▁▁▁▁▁▁▁▁▁▁▄▅▅▆▆▇▅▅▃▅ █
8.18 μs      Histogram: log(frequency) by time      9.69 μs <

Memory estimate: 0 bytes, allocs estimate: 0.

For the last step, let’s chunk together same-direction rotations.

function part2_v5(rs)
    p2 = 0
    step, last = rs[1], 1000050
    divlast = 10000
    fitslast = 0
    for r ∈ Iterators.rest(rs, 2) # rs[2:end] would allocate!
        if signbit(r) == signbit(step) # step and r never zero, so this is equivalent to sign()
            step += r
            continue
        end
        # don’t want to keep extra vars, will update in place instead. extra assignment to p2, but -3 assignments to *next.
        if signbit(step)
            p2 += divlast - fitslast
            last += step
            divlast = last ÷ 100
            fitslast = Int64(last == 100 * divlast)
            p2 += -divlast + fitslast
        else
            p2 -= divlast
            last += step
            divlast = last ÷ 100
            fitslast = Int64(last == 100 * divlast)
            p2 += divlast
        end
        step = r
    end
    if signbit(step)
        p2 += divlast - fitslast
        last += step
        divlast = last ÷ 100
        fitslast = Int64(last == 100 * divlast)
        p2 += -divlast + fitslast
    else
        p2 -= divlast
        last += step
        divlast = last ÷ 100
        fitslast = Int64(last == 100 * divlast)
        p2 += divlast
    end
return p2
end

Giving

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max):  5.198 μs …  22.137 μs  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     5.238 μs               ┊ GC (median):    0.00%
Time  (mean ± σ):   5.278 μs ± 378.826 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▃▇█▇▆▄▄▄▂                                                   ▂
██████████▇▇█▆▆▇▅▄▅▅▅▅▄▄▄▄▁▃▅▃▃▄▃▃▁▁▁▄▁▁▁▁▁▁▁▁▃▁▁▁▃▁▁▄▇███▆ █
5.2 μs       Histogram: log(frequency) by time      5.99 μs <

Memory estimate: 0 bytes, allocs estimate: 0.

Day 2

Let’s start with data preparation:

inp = "<puzzle input>"

stringed_intervals = split(inp, ",") .|> (x->split(x, "-")) .|> x->(x[1], x[2])
numeric_intervals = stringed_intervals .|> x->(x .|> x->parse(Int, x))

The problem sits at an intersection: stringy representations are useful (we’re dealing with properties of the decimal representation), but numeric ones can be too (the repetition condition is also a divisibility property).

Let’s start with a “regex” approach:

re1 = r"^(.+)\1$"
re2 = r"^(.+)\1+$"
match_in_interval(re, (x, y)) = x:y .|> string .|> (s->occursin(re, s) ? parse(Int, s) : 0) |> sum
match_in_interval(re) = (xs...) -> match_in_interval(re, xs...)
p1 = numeric_intervals .|> match_in_interval(re1) |> sum
p2 = numeric_intervals .|> match_in_interval(re2) |> sum
display((p1, p2))

Note the quotes around “regex”: this is a form PCRE supports, but backreferences make these not regular expressions, and they cannot be compiled to FSMs for efficient execution. We are also checking every number in the sequence.
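
A couple of sanity checks (my own strings, not puzzle data) on what the two patterns accept:

```julia
re1 = r"^(.+)\1$"   # exactly two copies of some substring
re2 = r"^(.+)\1+$"  # two or more copies
@assert occursin(re1, "1212")     # "12" twice
@assert !occursin(re1, "121212")  # not a doubling of anything
@assert occursin(re2, "121212")   # "12" three times
@assert !occursin(re2, "1213")    # no repetition at all
```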

Benchmarks:

BenchmarkTools.Trial: 24 samples with 1 evaluation per sample.
 Range (min … max):  204.602 ms … 222.144 ms  ┊ GC (min … max): 5.23% … 3.80%
 Time  (median):     210.692 ms               ┊ GC (median):    5.25%
 Time  (mean ± σ):   210.891 ms ±   4.028 ms  ┊ GC (mean ± σ):  5.71% ± 1.33%
 Memory estimate: 133.89 MiB, allocs estimate: 3899771.

BenchmarkTools.Trial: 24 samples with 1 evaluation per sample.
 Range (min … max):  207.480 ms … 237.776 ms  ┊ GC (min … max): 4.29% … 6.68%
 Time  (median):     215.786 ms               ┊ GC (median):    6.20%
 Time  (mean ± σ):   215.849 ms ±   5.896 ms  ┊ GC (mean ± σ):  5.61% ± 1.35%
 Memory estimate: 133.89 MiB, allocs estimate: 3899771.

About as bad as you’d expect.

Let’s do some more preprocessing on our source data:

using DataStructures

function preprocessed_numeric_intervals(n_i)
    out = Stack{Tuple{Int64, Int64}}()
    for (x, y) ∈ n_i
        digits_x, digits_y = trunc(Int64, log10(x)), trunc(Int64, log10(y))
        while digits_x < digits_y
            push!(out, (x, 10^(digits_x+1)-1))
            x = 10^(digits_x+1)
            digits_x += 1
        end
        push!(out, (x, y))
    end
    return out
end

prepr_int = numeric_intervals |> preprocessed_numeric_intervals

This gives us a stack of intervals where the start and end have the same number of digits. This makes things easier. The timing here is sub-µs.
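
As an illustration of what the splitting does, here’s a Vector-based sketch of the same logic (split_interval is a hypothetical name, not the Stack version above):

```julia
# Split an interval at digit-count boundaries, e.g. (95, 312) → 2-digit + 3-digit.
function split_interval(x, y)
    out = Tuple{Int64,Int64}[]
    dx, dy = trunc(Int64, log10(x)), trunc(Int64, log10(y))
    while dx < dy
        push!(out, (x, 10^(dx + 1) - 1))  # finish out the current digit count
        x = 10^(dx + 1)
        dx += 1
    end
    push!(out, (x, y))
    return out
end

println(split_interval(95, 312))  # [(95, 99), (100, 312)]
```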

Now for part 1. If we have an interval of, say, 8-digit numbers, all those that “fit” will be divisible by 10 001. So we can just look at an arithmetic progression, and at the part of it within our interval.
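
For instance (my own toy numbers, not the puzzle’s): an 8-digit “fit” abcdabcd is its 4-digit half times 10 001, so the fits in an interval are a slice of an arithmetic progression.

```julia
# An 8-digit doubled number equals its half times 10_001.
println(1234 * 10_001)  # 12341234
# The fits inside [12_000_000, 12_999_999] are k * 10_001 for k in:
k_min, k_max = cld(12_000_000, 10_001), fld(12_999_999, 10_001)
println((k_min, k_max))  # (1200, 1299)
```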

function part1_v1(intervals)
    tot = 0
    for (x, y) ∈ intervals
        digits = trunc(Int64, log10(x)) + 1
        if digits % 2 == 1
            continue
        end
        divisor = 10^(digits ÷ 2) + 1
        min_fits = x ÷ divisor + 1
        max_fits = y ÷ divisor
        if max_fits >= min_fits 
            tot += divisor * trunc(Int64, (min_fits + max_fits) / 2 * (max_fits - min_fits + 1))
        end
    end
    return tot
end

Benchmark:

BenchmarkTools.Trial: 10000 samples with 88 evaluations per sample.
 Range (min … max):  810.341 ns … 1.328 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     812.273 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   814.401 ns ± 9.249 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Yeah.

For part 2, we’ll need to generalize this to take the length of the repeated substring:

function failed_for_length(substr_length, (x, y))
    if substr_length == 0 return 0 end
    digits = trunc(Int64, log10(x)) + 1
    if digits % substr_length != 0 return 0 end
    divisor = sum(10^i for i ∈ 0:substr_length:(digits-substr_length))
    min_fits, max_fits = x ÷ divisor+1, y ÷ divisor
    if max_fits < min_fits return 0 end
    return divisor * (min_fits + max_fits) * (max_fits - min_fits + 1) ÷ 2 
end

Reimplementing part 1 with this

function part1_v2(intervals)
    tot = 0
    for (x, y) ∈ intervals
        digits = trunc(Int64, log10(x)) + 1
        tot += failed_for_length(digits ÷ 2, (x, y))
    end
    return tot
end

causes a slowdown (largely due to extra integer division), but we’ve only done this to check.

So for part 2 we want to check all possible substring lengths, right? Except: if we are looking at 8-digit numbers, the number 12121212 will be counted both by failed_for_length(2, ...) and by failed_for_length(4, ...). So we have to be smart. The full proof is left as an exercise to the reader, but:

If dᵢ and dⱼ are divisors of the total length D, dᵢ divides dⱼ, and C(x) is the set of all IDs formed by repeating a substring of length x, then C(dᵢ) ⊆ C(dⱼ).

So we need to:

  1. Find all dᵢ < D such that dᵢ | D.
  2. For each dᵢ, calculate Sᵢ, the sum of all x ∈ C(dᵢ).
  3. For each dᵢ, find an inclusion–exclusion coefficient cᵢ, so that every number in the union of all the C(dᵢ) is counted exactly once: cᵢ = 1 when C(dᵢ) is not contained in any other C(dⱼ), and nested or overlapping sets get correcting terms.
  4. The result is ∑ᵢ cᵢSᵢ.
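
The D = 6 row of coefficients, (1, -1), (2, 1), (3, 1), can be checked against brute force; this is my own throwaway check (with assumed helper names), not part of the solution:

```julia
# Brute force: sum of all 6-digit numbers repeating a 1-, 2- or 3-digit block.
has_period(x, p) = (s = string(x); all(s[i] == s[i-p] for i in (p+1):length(s)))
brute = sum(x for x in 100_000:999_999 if any(has_period(x, p) for p in (1, 2, 3)))

# Closed form: S(d) sums the 6-digit multiples of the repetition divisor.
function S(d)
    divisor = sum(10^i for i in 0:d:(6 - d))  # 111111, 10101, 1001
    return sum(divisor * k for k in cld(100_000, divisor):fld(999_999, divisor))
end

println(-S(1) + S(2) + S(3) == brute)  # true
```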

I feel that we can just pre-compute all the dᵢ and their coefficients for the lengths we have in our input. We have lengths up to 10.

function part2_v2(intervals)
    # pre-computed divisors and coefficients.
    divs = [
        [ ],
        [ (1, 1) ],
        [ (1, 1) ],
        [ (2, 1) ],
        [ (1, 1) ],
        [ (1, -1), (2, 1), (3, 1), ],
        [ (1, 1) ],
        [ (4, 1) ],
        [ (3, 1) ],
        [ (1, -1), (2, 1), (5, 1), ],
    ]
    tot = 0
    for (x, y) ∈ intervals
        D = trunc(Int64, log10(x)) + 1
        tot += sum(
            (n * failed_for_length(d, (x, y))
                for (d, n) ∈ divs[D]
                if d != 0);
            init=0)
    end
    return tot
end

Giving a benchmark of

BenchmarkTools.Trial: 10000 samples with 6 evaluations per sample.
 Range (min … max):  5.359 μs …  1.844 ms  ┊ GC (min … max): 0.00% … 99.19%
 Time  (median):     6.152 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.841 μs ± 21.484 μs  ┊ GC (mean ± σ):  4.29% ±  1.40%
 Memory estimate: 6.23 KiB, allocs estimate: 230.

We can get some more speedup out of this. Note, for example, that we are computing digits both in part2_v2 and in failed_for_length. We can just pass in what was calculated, and cut out a log10. Exponentiation may also be problematic, so let’s pre-compute powers of 10. Finally, let’s make the divs array static, so we don’t allocate.

using StaticArrays

function failed_for_length_v2(substr_length, digits, (x, y))
    powers_10 = SA[ 1, 10, 100, 1000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000, 1_000_000_000, 10_000_000_000 ]
    divisor = sum(view(powers_10, 1:substr_length:(1+digits-substr_length)))
    min_fits, max_fits = x ÷ divisor+1, y ÷ divisor
    if max_fits < min_fits return 0 end
    return divisor * (min_fits + max_fits) * (max_fits - min_fits + 1) ÷ 2 
end

function part2_v4(intervals)
    # pre-computed divisors and coefficients.
    divs = SA[
        SA[ (0,0), (0,0), (0,0), ],
        SA[ (1, 1), (0,0), (0,0), ],
        SA[ (1, 1), (0,0), (0,0), ],
        SA[ (2, 1), (0,0), (0,0), ],
        SA[ (1, 1), (0,0), (0,0), ],
        SA[ (1, -1), (2, 1), (3, 1), ],
        SA[ (1, 1), (0,0), (0,0), ],
        SA[ (4, 1), (0,0), (0,0), ],
        SA[ (3, 1), (0,0), (0,0), ],
        SA[ (1, -1), (2, 1), (5, 1), ],
    ]
    tot = 0
    for (x, y) ∈ intervals
        D = trunc(Int64, log10(x)) + 1
        tot += sum(
            (n * failed_for_length_v2(d, D, (x, y))
                for (d, n) ∈ divs[D]
                if d != 0);
            init=0)
    end
    return tot    
end

Benchmark:

BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
 Range (min … max):  1.788 μs …   4.093 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.035 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.046 μs ± 108.202 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 0 bytes, allocs estimate: 0.

And that’s all that comes to mind.