Puzzle #9

 21st July 2023 at 7:57am

First part

Some two days have passed without barely any possibility of proceeding with the programming. We have one less room in the house, as it is undergoing some renovation works, and we have been having some electricity problems as well: there is only one outlet. With all of this, I feel like my progress has been a little setback and I'd like to keep going steady and get back on track; thus, I think I'll be using Phind a little more aggressively.

As for the problem at hand, it seems simple enough: something like

def produce_decompressed_message(
        raw: str,
        decom: str,
        ) -> str:
    if raw == '':
        return decom
    else:
        # get the starting and ending indices of marker;
        # as well as the market object (of the form
        # `(AxB)`, where A and B are arbitrary length ints)
        start, end, marker = get_marker_info(raw)
        expanded_marker = decompress_raw(marker)
        return produce_decompressed_message(
                raw[end + 1:],
                decom + raw[:start] + expanded_marker
                )

feels like it could work (at least, without any pen and paper or further reasoning, I don't immediately see a failing point).


of course, there were some issues when translating this into proper code. It's quite late, and I won't go into deep writings now, and hopefully Alex of the future will make some sense of this just from the comments alone. I think there is only one issue, properly signaled with the TODO tag:

import re

pattern = r'\((\d+)x(\d+)\)'


def get_marker_info(
        raw: str
        ):
    match = re.search(pattern, raw)
    if not match:
        return None, None, None
    else:
        return tuple([
                match.start(),
                match.end(),
                tuple(map(int, match.groups()))])


def decompress_raw(
        raw: str,
        index: int,
        repeats: int
        ) -> str:
    return raw[:index] * repeats


def produce_decompressed_message(
        raw: str,
        decom: str,
        ) -> str:
    if raw == '':
        return decom
    else:
        # get the starting and ending indices of marker;
        # as well as the market object (of the form
        # `(AxB)`, where A and B are arbitrary length ints)
        # start and end are needed;
        # consider the case where the decompressed is
        # smaller than the marker?
        start, end, marker = get_marker_info(raw)
        if marker is None:
            return decom + raw
        # start decompressing from
        # the end of the marker onward
        expanded_marker = decompress_raw(raw[end:],
                                         *marker)
        return produce_decompressed_message(
                # TODO read only after the expanded data
                raw[end + 1:],
                decom + raw[:start] + expanded_marker
                )

I've finished the first part.

import re

pattern = r'\((\d+)x(\d+)\)'


def get_marker_info(
        raw: str
        ):
    match = re.search(pattern, raw)
    if not match:
        return None, None, None
    else:
        return tuple([
                match.start(),
                match.end(),
                tuple(map(int, match.groups()))])


def decompress_raw(
        raw: str,
        index: int,
        repeats: int
        ) -> str:
    return raw[:index] * repeats


def produce_decompressed_message(
        raw: str,
        decom: str,
        ) -> str:
    if raw == '':
        return decom
    else:
        # get the starting and ending indices of marker;
        # as well as the market object (of the form
        # `(AxB)`, where A and B are arbitrary length ints,
        # A being the number of chars to repeat,
        # B being the number of repetitions)
        # start and end are needed;
        start, end, marker = get_marker_info(raw)
        if marker is None:
            return decom + raw
        # start decompressing from
        # the end of the marker onward
        expanded_marker = decompress_raw(raw[end:],
                                         *marker)
        return produce_decompressed_message(
                raw[end + marker[0]:],
                decom + raw[:start] + expanded_marker
                )


with open('input') as file:
    INPUT = file.read().strip()

print(len(produce_decompressed_message(INPUT, '')))

This week was not a particularly good one for regularity in programming, and I could definitely benefit from having a stricter way of iterating on my own work: it's hard to try tackling a problem, make some meaningful progress, only to come back a couple days later and having to reassess the situation all over again.


Second part

The second part is usually easily done by doing some minor refactors of the previous code, but this one is making me a little uneasy. It should not be so difficult, as essentially we're only introducing recursion, but the problem also changed slightly: there's no need to produce the message anymore, only its length, which, although simplifying the issue, is actually demanding some further changes in the code.

In essence, I'm afraid of having to substantially refactor the whole code, and this also shows that I'm struggling a bit with achieving recursion here. I could also benefit from some spare time to study other people's solutions, but that should be getting better soon.


I managed to advance the code during the work breaks, and it seems like there was a breakthrough. The example output is already correct; all it took was to transform the return expression into two parts; one is further expanding the first marker found, and the other is taking care of everything outside of the first marker found boundaries. In fact, this seems like a saner approach even for the first part of the problem.

def get_marker_info(
        raw: str
        ):
    match = re.search(pattern, raw)
    if not match:
        # NOTE could this be improved?
        return None, None, None
    else:
        return tuple([
                match.start(),
                match.end(),
                tuple(map(int, match.groups()))])


def decompress_raw(
        raw: str,
        index: int,
        rep: int
        ) -> str:
    return raw[:index] * rep


def produce_decompressed_message(
        raw: str,
        decom: str,
        ) -> str:
    if raw == '':
        return decom
    else:
        # get the starting and ending indices of marker;
        # as well as the market object (of the form
        # `(AxB)`, where A and B are arbitrary length ints,
        # A being the number of chars to repeat,
        # B being the number of repetitions)
        start, end, marker = get_marker_info(raw)
        if marker is None:
            return decom + raw
        # start decompressing from
        # the end of the marker onward
        expanded_marker = decompress_raw(raw[end:],
                                         *marker)
        # get the index of next token
        return (
                # explore the found marker
                produce_decompressed_message(
                    raw[:start] + expanded_marker,
                    decom
                    ) +
                # explore the rest
                produce_decompressed_message(
                    raw[end + marker[0]:],
                    ''
                    ))

As the exercise already hints at,

Unfortunately, the computer you brought probably doesn't have enough memory to actually decompress the file; you'll have to come up with another way to get its decompressed length.

and it seems like so; it just takes too long. So the second part does really entail two major changes in the code: first, to handle recursive calls; second, to just handle the final product length instead of taking the whole string.

When printing the result of each recursive function call input (the raw variable),

X(8x2)(3x3)ABCY ­– the X(8x2)(3x3)ABC string will be expanded in a separate call;

X(3x3)ABC(3x3)ABC – the (8x2) expanded (and the single X character came along...not necessary?); there are two subsequent recursive calls (for each occurrence of (3x3).

XABCABCABC – the (3x3) expanded; there are also no further markers.

(3x3)ABC – further expansion.

ABCABCABC – nothing to be done here.


2023-07-21-Friday

To start with, I fixed carrying over the everything that is to the left of the marker. When starting out with X(8x2)(3x3)ABCY, only (8x2)(3x3)ABC will be expanded in a separate call (everything from the Y onward is already on a separate call, too). This is a small optimisation, but one nevertheless.

From X(8x2)(3x3)ABCY to X(3x3)ABC(3x3)ABC, further work can be done. The string concatenation feels unnecessary – there is no need to keep its contents (everything that is not a marker) as we only care about the final string length.

X(8x2)(3x3)ABC is, in fact, 2 * 3 * len(ABC) – the quicker I can get to this transformation, the better. The answer lies in this piece of code:

        return (
                # explore the found marker
                produce_decompressed_message(
                    expanded_marker,
                    length + len(raw[:start])
                    ) +
                # explore the rest
                produce_decompressed_message(
                    raw[end + marker[0]:],
                    0
                    ))

Instead of passing expanded_marker – (3x3)ABC(3x3)ABC – I'll multiply the produce_decompressed_message call by 2 and send it the substring of length 8 (thus making use of the (8x2) marker). Thus, the first part of the sum is now

marker[1] * produce_decompressed_message(
                    raw[end:end + marker[0]],
                    length + len(raw[:start])
                    )

which is still not correct. In fact, the whole call should now be

length + len(raw[:start]) + marker[1] * produce_decompressed_message(
                raw[end:end + marker[0]],
                0)

which is very ugly! This further simplification makes evident that all of the produce_decompressed_message calls have now 0 as the passed length parameter, so it is not really necessary anymore (and, in the meantime, decompress_raw became obsolete). The code is now much better (and passing the exercise).

pattern = r'\((\d+)x(\d+)\)'


def get_marker_info(
        raw: str
        ):
    match = re.search(pattern, raw)
    if not match:
        # NOTE could this be improved?
        return None, None, None
    else:
        return tuple([
                match.start(),
                match.end(),
                tuple(map(int, match.groups()))])


def expand_raw(
        raw: str,
        ) -> str:
    if raw == '':
        return 0
    else:
        # get the starting and ending indices of marker;
        # as well as the market object (of the form
        # `(AxB)`, where A and B are arbitrary length ints,
        # A being the number of chars to repeat,
        # B being the number of repetitions)
        start, end, marker = get_marker_info(raw)
        if marker is None:
            return len(raw)
        return (len(raw[:start]) +
                marker[1] * expand_raw(raw[end:end + marker[0]])
                + expand_raw(raw[end + marker[0]:]))


with open('input') as file:
    INPUT = file.read().strip()

print(expand_raw(INPUT))

With a final refactor of the get_marker_info function,

import re
from collections import namedtuple

pattern = r'\((\d+)x(\d+)\)'

MarkerInfo = namedtuple('MarkerInfo',
                        [
                            'start',
                            'end',
                            'length',
                            'repetitions']
                        )


def get_marker_info(raw: str) -> MarkerInfo:
    match = re.search(pattern, raw)
    if match:
        length, repetitions = map(int, match.groups())
        return MarkerInfo(match.start(), match.end(),
                          length, repetitions)
    return None


def expand_raw(
        raw: str,
        ) -> str:
    if raw == '':
        return 0
    else:
        marker = get_marker_info(raw)
        if marker is None:
            return len(raw)
        else:
            start, end, length, reps = marker
        return (len(raw[:start]) +
                reps * expand_raw(raw[end:end + length])
                + expand_raw(raw[end + length:]))


with open('input') as file:
    INPUT = file.read().strip()

print(expand_raw(INPUT))