total number of increasing strings of length n

 20th March 2024 at 5:37pm

Suppose wanting to count the amount of strings of length n that are non-decreasing.

eg: abc, aaa, abb qualify; cba doesn't.

Well, well, well. This is a problem very much akin to what Project Euler usually sets up, in the sense that there is a possible programming approach (one that I tried, naively at first, until realising this is much better accomplished with some dynamic programming (with the help of Phind):

def count_non_decreasing_strings(n):
    # Initialize a 2D list to store the counts of non-decreasing strings ending with each character
    dp = [[0 for _ in range(n+1)] for _ in range(26)] # Assuming lowercase English letters
    
    # Base case: a single character string is always non-decreasing
    for i in range(26):
        dp[i][1] = 1
    
    # Fill the table in a bottom-up manner
    for length in range(2, n+1):
        for last_char in range(26):
            # Sum of all strings of length length-1 ending with a character <= last_char
            for prev_char in range(last_char+1):
                dp[last_char][length] += dp[prev_char][length-1]
    
    # The total count of non-decreasing strings of length n is the sum of all counts for the last character
    total_count = sum(dp[i][n] for i in range(26))
    return total_count

...but, of course, there is a much simpler, quicker solution, relying on the mathematics of combinatorics.


Mathematics establishes the concept of a set, which is an unordered assortment of elements that do not repeat (eg. {1,3,2}\{1, 3, 2\} is a set — the same set as {1,2,3}\{1, 2, 3\} — but {1,2,2}\{1, 2, 2\} is not a set); but there's also the multiset, which allows for repetition (where {1,2,2}\{1, 2, 2\} is now a valid multiset).

With this in mind, this problem can be modelled by considering combinations with repetitions. Let SS be a set of 2626 elements — think of abcd...xyz — from which one can take nn elements in a sequence, putting back (or possibly taking multiple) of each element. Thus, abab is valid for n=4n = 4.

The hardest part of modelling in this way, I suppose, is wrangling the utility of a multiset (unordered by nature) in representing our ordered strings. See it this way: retrieve all multisets of 33 letters: {a,b,c}\{a, b, c\} is there, and so is {b,c,c}\{b, c, c\} — all good so far — but {z,d,c}\{z, d, c\} is also there. But a multiset is unordered, and there is only one non-decreasing string with {z,d,c}\{z, d, c\} as elements: cdz. Same thought process for {b,c,c,a,b}\{b, c, c, a, b\}: abbcc. So each multiset maps to exactly one non-decreasing string, transforming the counting of strings into the counting of possible multisets...which is given by nn multichoose kk, in the following binomial (26+n126)\binom{26 + n - 1}{26}