First part
Some two days have passed without barely any possibility of proceeding with the programming. We have one less room in the house, as it is undergoing some renovation works, and we have been having some electricity problems as well: there is only one outlet. With all of this, I feel like my progress has been a little setback and I'd like to keep going steady and get back on track; thus, I think I'll be using Phind a little more aggressively.
As for the problem at hand, it seems simple enough: something like
def produce_decompressed_message(
raw: str,
decom: str,
) -> str:
if raw == '':
return decom
else:
# get the starting and ending indices of marker;
# as well as the market object (of the form
# `(AxB)`, where A and B are arbitrary length ints)
start, end, marker = get_marker_info(raw)
expanded_marker = decompress_raw(marker)
return produce_decompressed_message(
raw[end + 1:],
decom + raw[:start] + expanded_marker
)
feels like it could work (at least, without any pen and paper or further reasoning, I don't immediately see a failing point).
of course, there were some issues when translating this into proper code. It's quite late, and I won't go into deep writings now, and hopefully Alex of the future will make some sense of this just from the comments alone. I think there is only one issue, properly signaled with the TODO
tag:
import re
pattern = r'\((\d+)x(\d+)\)'
def get_marker_info(
raw: str
):
match = re.search(pattern, raw)
if not match:
return None, None, None
else:
return tuple([
match.start(),
match.end(),
tuple(map(int, match.groups()))])
def decompress_raw(
raw: str,
index: int,
repeats: int
) -> str:
return raw[:index] * repeats
def produce_decompressed_message(
raw: str,
decom: str,
) -> str:
if raw == '':
return decom
else:
# get the starting and ending indices of marker;
# as well as the market object (of the form
# `(AxB)`, where A and B are arbitrary length ints)
# start and end are needed;
# consider the case where the decompressed is
# smaller than the marker?
start, end, marker = get_marker_info(raw)
if marker is None:
return decom + raw
# start decompressing from
# the end of the marker onward
expanded_marker = decompress_raw(raw[end:],
*marker)
return produce_decompressed_message(
# TODO read only after the expanded data
raw[end + 1:],
decom + raw[:start] + expanded_marker
)
I've finished the first part.
import re
pattern = r'\((\d+)x(\d+)\)'
def get_marker_info(
raw: str
):
match = re.search(pattern, raw)
if not match:
return None, None, None
else:
return tuple([
match.start(),
match.end(),
tuple(map(int, match.groups()))])
def decompress_raw(
raw: str,
index: int,
repeats: int
) -> str:
return raw[:index] * repeats
def produce_decompressed_message(
raw: str,
decom: str,
) -> str:
if raw == '':
return decom
else:
# get the starting and ending indices of marker;
# as well as the market object (of the form
# `(AxB)`, where A and B are arbitrary length ints,
# A being the number of chars to repeat,
# B being the number of repetitions)
# start and end are needed;
start, end, marker = get_marker_info(raw)
if marker is None:
return decom + raw
# start decompressing from
# the end of the marker onward
expanded_marker = decompress_raw(raw[end:],
*marker)
return produce_decompressed_message(
raw[end + marker[0]:],
decom + raw[:start] + expanded_marker
)
with open('input') as file:
INPUT = file.read().strip()
print(len(produce_decompressed_message(INPUT, '')))
This week was not a particularly good one for regularity in programming, and I could definitely benefit from having a stricter way of iterating on my own work: it's hard to try tackling a problem, make some meaningful progress, only to come back a couple days later and having to reassess the situation all over again.
Second part
The second part is usually easily done by doing some minor refactors of the previous code, but this one is making me a little uneasy. It should not be so difficult, as essentially we're only introducing recursion, but the problem also changed slightly: there's no need to produce the message anymore, only its length, which, although simplifying the issue, is actually demanding some further changes in the code.
In essence, I'm afraid of having to substantially refactor the whole code, and this also shows that I'm struggling a bit with achieving recursion here. I could also benefit from some spare time to study other people's solutions, but that should be getting better soon.
I managed to advance the code during the work breaks, and it seems like there was a breakthrough. The example output is already correct; all it took was to transform the return
expression into two parts; one is further expanding the first marker found, and the other is taking care of everything outside of the first marker found boundaries. In fact, this seems like a saner approach even for the first part of the problem.
def get_marker_info(
raw: str
):
match = re.search(pattern, raw)
if not match:
# NOTE could this be improved?
return None, None, None
else:
return tuple([
match.start(),
match.end(),
tuple(map(int, match.groups()))])
def decompress_raw(
raw: str,
index: int,
rep: int
) -> str:
return raw[:index] * rep
def produce_decompressed_message(
raw: str,
decom: str,
) -> str:
if raw == '':
return decom
else:
# get the starting and ending indices of marker;
# as well as the market object (of the form
# `(AxB)`, where A and B are arbitrary length ints,
# A being the number of chars to repeat,
# B being the number of repetitions)
start, end, marker = get_marker_info(raw)
if marker is None:
return decom + raw
# start decompressing from
# the end of the marker onward
expanded_marker = decompress_raw(raw[end:],
*marker)
# get the index of next token
return (
# explore the found marker
produce_decompressed_message(
raw[:start] + expanded_marker,
decom
) +
# explore the rest
produce_decompressed_message(
raw[end + marker[0]:],
''
))
As the exercise already hints at,
Unfortunately, the computer you brought probably doesn't have enough memory to actually decompress the file; you'll have to come up with another way to get its decompressed length.
and it seems like so; it just takes too long. So the second part does really entail two major changes in the code: first, to handle recursive calls; second, to just handle the final product length instead of taking the whole string.
When printing the result of each recursive function call input (the raw
variable),
X(8x2)(3x3)ABCY
– the X(8x2)(3x3)ABC
string will be expanded in a separate call;
X(3x3)ABC(3x3)ABC
– the (8x2)
expanded (and the single X
character came along...not necessary?); there are two subsequent recursive calls (for each occurrence of (3x3)
.
XABCABCABC
– the (3x3)
expanded; there are also no further markers.
(3x3)ABC
– further expansion.
ABCABCABC
– nothing to be done here.
To start with, I fixed carrying over the everything that is to the left of the marker. When starting out with X(8x2)(3x3)ABCY
, only (8x2)(3x3)ABC
will be expanded in a separate call (everything from the Y
onward is already on a separate call, too). This is a small optimisation, but one nevertheless.
From X(8x2)(3x3)ABCY
to X(3x3)ABC(3x3)ABC
, further work can be done. The string concatenation feels unnecessary – there is no need to keep its contents (everything that is not a marker) as we only care about the final string length.
X(8x2)(3x3)ABC
is, in fact, 2 * 3 * len(ABC)
– the quicker I can get to this transformation, the better. The answer lies in this piece of code:
return (
# explore the found marker
produce_decompressed_message(
expanded_marker,
length + len(raw[:start])
) +
# explore the rest
produce_decompressed_message(
raw[end + marker[0]:],
0
))
Instead of passing expanded_marker
– (3x3)ABC(3x3)ABC
– I'll multiply the produce_decompressed_message
call by 2
and send it the substring of length 8
(thus making use of the (8x2)
marker). Thus, the first part of the sum is now
marker[1] * produce_decompressed_message(
raw[end:end + marker[0]],
length + len(raw[:start])
)
which is still not correct. In fact, the whole call should now be
length + len(raw[:start]) + marker[1] * produce_decompressed_message(
raw[end:end + marker[0]],
0)
which is very ugly! This further simplification makes evident that all of the produce_decompressed_message
calls have now 0
as the passed length
parameter, so it is not really necessary anymore (and, in the meantime, decompress_raw
became obsolete). The code is now much better (and passing the exercise).
pattern = r'\((\d+)x(\d+)\)'
def get_marker_info(
raw: str
):
match = re.search(pattern, raw)
if not match:
# NOTE could this be improved?
return None, None, None
else:
return tuple([
match.start(),
match.end(),
tuple(map(int, match.groups()))])
def expand_raw(
raw: str,
) -> str:
if raw == '':
return 0
else:
# get the starting and ending indices of marker;
# as well as the market object (of the form
# `(AxB)`, where A and B are arbitrary length ints,
# A being the number of chars to repeat,
# B being the number of repetitions)
start, end, marker = get_marker_info(raw)
if marker is None:
return len(raw)
return (len(raw[:start]) +
marker[1] * expand_raw(raw[end:end + marker[0]])
+ expand_raw(raw[end + marker[0]:]))
with open('input') as file:
INPUT = file.read().strip()
print(expand_raw(INPUT))
With a final refactor of the get_marker_info
function,
import re
from collections import namedtuple
pattern = r'\((\d+)x(\d+)\)'
MarkerInfo = namedtuple('MarkerInfo',
[
'start',
'end',
'length',
'repetitions']
)
def get_marker_info(raw: str) -> MarkerInfo:
match = re.search(pattern, raw)
if match:
length, repetitions = map(int, match.groups())
return MarkerInfo(match.start(), match.end(),
length, repetitions)
return None
def expand_raw(
raw: str,
) -> str:
if raw == '':
return 0
else:
marker = get_marker_info(raw)
if marker is None:
return len(raw)
else:
start, end, length, reps = marker
return (len(raw[:start]) +
reps * expand_raw(raw[end:end + length])
+ expand_raw(raw[end + length:]))
with open('input') as file:
INPUT = file.read().strip()
print(expand_raw(INPUT))