43
$\begingroup$

Suppose that at the beginning there is a blank document, and a letter "a" is written in it. In the following steps, only the three functions of "select all", "copy" and "paste" can be used.

Find the minimum number of steps to reach at least $100,000$ a's (each of the three operations of "select all", "copy" and "paste" is counted as one step). If the target number is not specified, and I want to get the exact amount of a, is there a general formula?

This is a fascinating question. My friend and I discussed it for a long time that day but to no avail.

My first thought was to count "select all", "copy" and "paste" together as one combined step. Each combined step multiplies the number by $2$, so the counts form a geometric progression with common ratio $2$.

Let $a_{n}≥100000 \: (n\in \mathbb{N})$, where $a_{1}=1$.

According to the general formula of geometric progression: $a_{n}=a_{1}\times q^{n-1}$

We get $n=18$, that is, $17$ doublings (since $2^{16}=65536<100000\le 2^{17}=131072$).

So if the three operations of "select all", "copy" and "paste" are each counted as one step, there are a total of $17×3=51$ steps
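The count of doublings above can be checked with a few lines of C++. This is only a sketch of the naive "always double" strategy, and the function name `doublingRounds` is made up for the illustration.

```cpp
#include <cstdint>

// Number of "select all, copy, paste" rounds needed to grow one character
// into at least `target` characters when every round doubles the count.
int doublingRounds(std::uint64_t target)
{
    int rounds = 0;
    std::uint64_t chars = 1; // the document starts with a single "a"
    while (chars < target)
    {
        chars *= 2;
        ++rounds;
    }
    return rounds;
}
```

For a target of 100000 this gives 17 rounds, hence $17\times3=51$ single operations.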

But this ignores a problem: can we paste all the time?

So this seems to be an interesting optimization problem, and we need to find a strategy to minimize the number of steps from one "a" to one hundred thousand "a".

  1. Select all + copy + paste: These three operations double the number of "a". If there are currently $n$ "a", then there will be $2n$ "a" after the operation.

  2. Paste: This operation will add the number of "a" equal to the clipboard content. If there are $k$ "a"s in the clipboard, then there will be $(n+k)$ "a"s after the operation.

We define a function $f(n)$ that represents the minimum number of steps required to reach $n$ "a". Initially, we have one "a", so $f(1) = 0$.

If we choose the doubling operation, $f(2n) = f(n) + 3$.

If we choose the paste operation, then $f(n+k) = f(n) + 1$, where $k$ is the number of "a"s in the clipboard.

Then I started to get confused because I realized that it seemed that every step was facing optimization, and it seemed to be complicated. I started to use function comparison to get $n=14$ as the minimum value, but I realized that it was only optimized once.

Thank you for your help.

$\endgroup$
7
  • 1
    $\begingroup$ If the problem was changed to $~10~$ a's, instead of $~100,000~$ a's, and if I have understood the question correctly, you would need $~3~$ operations to go to $~2^3 = 8~$ a's, and then one further operation to append $~(10 - 8) = 2~$ a's. $\endgroup$ Commented Jun 29 at 4:55
  • 2
    $\begingroup$ Related: math.stackexchange.com/q/3852616/42969 $\endgroup$
    – Martin R
    Commented Jun 29 at 5:04
  • 10
    $\begingroup$ You should clarify the rules. For example, I would expect that select/copy/paste/paste is needed to double the input as the first paste would replace the still selected text with itself (or rather its copy). $\endgroup$
    – Carsten S
    Commented 2 days ago
  • 2
    $\begingroup$ I'm sure there must be a dynamic programming solution based on the number of a's and what's in the buffer. $\endgroup$
    – qwr
    Commented yesterday
  • 2
    $\begingroup$ related post on codegolf.SE $\endgroup$
    – emanresu A
    Commented yesterday

3 Answers 3

35
$\begingroup$

S=select all
C=copy
P=paste

SS, SP, CS, PC or CC don't make sense, of course. So after an S there must be a C, after a C there must be a P, and after a P there must be a P or an S.

SC$k$P is SC and $k$ pastes, for example SCPPPP=SC$4$P.

SC$k$P is $k+2$ steps and multiplies the number of characters by $k+1$.

So, after $a_1$ SC$1$P, $a_2$ SC$2$P, etc, the number of characters is $$\prod_{k=1}^\infty (k+1)^{a_k}$$ and the number of steps is $$\sum_{k=1}^\infty (k+2)a_k$$

Note that only a finite number of $a_k$ are positive.
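To make the two formulas concrete, here is a minimal C++ sketch that evaluates a given list of blocks. The names `Result` and `evaluateBlocks` are assumptions introduced only for this illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper type, not part of the original answer.
struct Result
{
    std::uint64_t characters; // product of (k + 1) over all blocks
    int steps;                // sum of (k + 2) over all blocks
};

// ks[i] is the number of pastes k in the i-th block SC k P (k >= 1).
Result evaluateBlocks(const std::vector<int> &ks)
{
    Result r{1, 0}; // start from a single "a" and zero steps
    for (int k : ks)
    {
        r.characters *= static_cast<std::uint64_t>(k + 1);
        r.steps += k + 2;
    }
    return r;
}
```

For example, three blocks with $k=2$ followed by six blocks with $k=3$ give $3^3\cdot4^6=110592$ characters in $3\cdot4+6\cdot5=42$ steps.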

Roughly, SC$k$P multiplies the number of characters by $\sqrt[k+2]{k+1}$ for each step. The maximum of $\sqrt[k+2]{k+1}$ for integer $k$ is at $k=3$, but $k=2$ is very close to it. (For real $k$ it's not $e$, just in case you are guessing). So we'd rather choose SC$2$P or SC$3$P whenever it is possible.
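The comparison of the per-step factors can be reproduced numerically. The sketch below uses floating-point `std::pow`, so it is a numerical hint rather than a proof, and the function names are made up for the illustration.

```cpp
#include <cmath>

// Growth factor per step of the block SC k P: (k + 1)^(1 / (k + 2)).
double growthPerStep(int k)
{
    return std::pow(k + 1.0, 1.0 / (k + 2.0));
}

// Returns the k in [1, maxK] with the largest growth factor per step.
int bestBlock(int maxK)
{
    int best = 1;
    for (int k = 2; k <= maxK; ++k)
        if (growthPerStep(k) > growthPerStep(best))
            best = k;
    return best;
}
```

The values for $k=2,3,4$ are roughly $1.316$, $1.320$ and $1.308$, so $k=3$ wins by a small margin over $k=2$.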

By trial and error I have found that $$3^3\cdot 4^6=110592>10^5$$ which yields $3\cdot 4+6\cdot 5=42$ steps.

On the other hand, to get exactly 100,000 characters, since $10^5=2\cdot 4^2\cdot 5^5$, the number of steps is $$3\cdot1+5\cdot 2+6\cdot 5=43$$

I don't know if $41$ or fewer steps are possible, but I don't think so.

EDIT
To show an example to @badjohn, if you want to get exactly 100,001 characters, you should factorize $100001=11\cdot 9091$, so the best way to get it is one SC$10$P and one SC$9090$P, that is, $12+9092=9104$ steps.

For 99,999 it is $99999=3^2\cdot 41\cdot 271$. Since $3^2$ is relatively small, we try

  • 2SC$2$P+1SC$40$P+1SC$270$P$=8+42+272=322$ steps.
  • 1SC$8$P+1SC$40$P+1SC$270$P$=10+42+272=324$ steps.

EDIT2

The formula of the minimal steps to get exactly $n$ characters is as follows:

Factorize $n$ this way: $$n=2^{\alpha_2}4^{\alpha_4}\prod_{p\text{ odd prime}}p^{\alpha_p},$$ where $\alpha_2$ must be $0$ or $1$ and $\alpha_k\ge 0$ for $k\ge 3$.

Then the minimal number of steps is $$\sum_{p=4\text{ or $p$ is prime}}\alpha_p(p+1)$$
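The formula translates directly into code. The following is a minimal sketch by trial division, assuming $n\ge1$ and $n$ small enough for trial division to be practical; the name `exactSteps` is made up for the illustration.

```cpp
#include <cstdint>

// Evaluates the factorization formula stated above for n >= 1:
// pairs of 2's become factors 4, and every factor f costs f + 1 steps.
std::uint64_t exactSteps(std::uint64_t n)
{
    std::uint64_t steps = 0;

    // Count the factors of 2, then pair them up into factors of 4.
    int twos = 0;
    while (n % 2 == 0)
    {
        n /= 2;
        ++twos;
    }
    steps += 5 * (twos / 2); // each factor 4 is one SC3P block: 5 steps
    steps += 3 * (twos % 2); // a leftover factor 2 is one SCP block: 3 steps

    // Every odd prime factor p is one SC(p-1)P block: p + 1 steps.
    for (std::uint64_t p = 3; p * p <= n; p += 2)
    {
        while (n % p == 0)
        {
            n /= p;
            steps += p + 1;
        }
    }
    if (n > 1)
        steps += n + 1; // what remains is a single prime factor

    return steps;
}
```

With this sketch, `exactSteps(100000)` gives 43 and `exactSteps(100001)` gives 9104, in agreement with the counts worked out by hand.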


Some notes:

1. Finding the maximum of $\sqrt[k+2]{k+1}$ with calculus involves a non-elementary equation that I solved with software. Fortunately, it is not essential; it's just a hint.

2. The essential point of my reasoning is that the minimal number of steps $S(n)$ to get exactly $n$ characters depends only on some factorization of $n$ that involves $4$, odd prime numbers, and $2$ only if it's needed.

The number of steps for some factorization (not necessarily with prime factors) $$\prod_{n}n^{\alpha_n}$$ is $$\sum_{n}(n+1)\alpha_n$$

To get the best factorization we can begin from the prime factorization and then see whether 'combining' some factors gives a better one.

Combining a factor $p^{\alpha}$ with a factor $q^{\alpha}$ ($p,q\ge 2$, of course), we get the factor $(pq)^{\alpha}$. The number of steps for these factors goes from $$\alpha(p+q+2)$$ to $$\alpha(pq+1)$$ so the question is whether $p+q+2$ is greater or less than $pq+1$. Combining gives fewer steps iff $p+q+2>pq+1$.

So assume for example that $p+q+2>pq+1$. The following inequalities are equivalent to this one: $$p+q+1>pq$$ $$p(q-1)<q+1$$ Since $q-1>0$, $$p<\frac{q+1}{q-1}=1+\frac2{q-1}$$ This is true only for $p=q=2$, so the 'combining' gets better results only to make a $4$ from two $2$'s.
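The inequality can also be checked by brute force over a finite range. This is only a sanity check under an arbitrary bound, not a replacement for the argument above, and the function names are made up for the illustration.

```cpp
// Steps saved by merging the factors p and q into the single factor p * q:
// (p + 1) + (q + 1) steps before merging, p * q + 1 steps after.
int mergeGain(int p, int q)
{
    return (p + q + 2) - (p * q + 1);
}

// Counts the pairs 2 <= p <= q <= limit for which merging strictly helps.
int countStrictGains(int limit)
{
    int count = 0;
    for (int p = 2; p <= limit; ++p)
        for (int q = p; q <= limit; ++q)
            if (mergeGain(p, q) > 0)
                ++count;
    return count;
}
```

Within any such range the only pair with a positive gain is $p=q=2$, while $\{2,3\}$ gives a gain of exactly $0$.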

That is, SC3P is better than 2SCP. But repeating P instead of making a new SC is worse in every other situation, or equal for SCPPSCP and SC5P. (That is, to multiply the number of characters by 6 you must make 7 steps, no matter how.)

3. The problem is much, much harder if you want to get at least $n$ characters, because it involves the factorization of every number $\ge n$.

$\endgroup$
3
  • 4
    $\begingroup$ My guess would be that a greedy algorithm of choosing SCPP repeatedly until you reach some low factor that's left, and then choosing remnants appropriately at that point, should work for the problem of getting at least $n$ characters. For example: $SCP^k$ can never appear in an optimal solution for any $k > 7$, or you can replace it with $SCP^{k-4}SCP$ to replace $\cdot (k+1)$ with $\cdot (k-3) \cdot 2$, where $2(k-3) > k+1$ and the replacement is shorter. Likewise, no optimal solution can have two SCP blocks, or you could replace it with SCPPP for the same effect in shorter length, etc. $\endgroup$ Commented Jun 29 at 20:10
  • 2
    $\begingroup$ The point being that with enough analysis of that sort, you should be able to prove that for $n$ greater than some relatively small value, there exists an optimal solution with an SCPP block; and then you can recursively solve the subproblem for $n/3$ in place of $n$. $\endgroup$ Commented Jun 29 at 20:25
  • 1
    $\begingroup$ Yes, you are correct. According to what you mentioned, 110592 is the number of a's over 100000 that can result in the smallest total number of steps. "at least" will keep the answer the same in a certain range, which makes it complicated to find a general formula. Thus, finding the exact number of "a" can more conveniently express the general formula. Thank you very much:)You're excellent! $\endgroup$
    – Frank
    Commented 2 days ago
23
$\begingroup$

Elaborating on the answer by @ajotatxe we can actually show what is the minimum number of steps needed.

As he mentioned, the only sensible moves are $M_k := \textrm{SC}k\textrm{P}$, that is "select all, copy and then paste $k$ times" where $k \ge 1$.

Each time we apply $M_p$, we multiply the number by $(p+1)$ and use $(p+2)$ steps. Let us call $C(M_p) = p+2$ the cost of the move and $U(M_p) = p+1$ the utility of the move. For a sequence of moves $\bar{M} = (M_{p_1}, \ldots, M_{p_k})$, the cost and utility functions are respectively $$ C(\bar{M} ) = \sum_{j=1}^k (p_j +2) , \ \ \ U(\bar{M}) = \prod_{j=1}^k (p_j+1) $$ Let us say that a sequence of moves $\bar{M}$ is worse than another sequence $\bar{N}$, denoted $\bar{M} \prec \bar{N}$, if $C(\bar{M}) \ge C(\bar{N})$ and $U(\bar{M}) \le U(\bar{N})$. In practice, we will almost always compare sequences with the same cost. A sequence of moves $\bar{M}$ is called optimal if it is maximal with respect to $\prec$. As an aside, note that $\prec$ is only a preorder; that is, we can have different sequences with equal utility and cost.

I claim that the only sensible moves are $M_1, M_2, M_3, M_4$. I will show this fact by the following Lemma. If a sequence of moves $\bar{M}$ contains $M_k$ with $k \ge 5$, there exists a sequence $\bar{N}$ with $\bar{M} \prec \bar{N}$ that contains no $M_k$ with $k \ge 5$.

Proof. Since composing moves is multiplicative in the utility and additive in the cost, it is enough to show that $M_k$ for $k\ge 5$ is worse than a sequence of moves only made of $M_1, M_2, M_3, M_4$. For $k =5$ we have $M_5 \prec (M_2, M_1)$, since both cost $7$ and have utility $6$, while for $k \ge 6$ I claim that $M_k \prec (M_{k-5}, M_3)$. Indeed, both cost $k+2$, the utilities are $k+1$ and $4(k-4)=4k-16$, and $k+1 \le 4k - 16 $ whenever $k \ge 17/3 \approx 5.67$. If $k-5 \ge 5$, repeat the argument on $M_{k-5}$.

The second observation is that $M_3$ has the best utility-per-cost; this can be seen by comparing the factor $\sqrt[k+2]{k+1}$ for $k=1,2,3,4$: $$ \alpha_1 = \sqrt[3]{2} \approx 1.26, \ \ \ \alpha_2 = \sqrt[4]{3} \approx 1.316, \ \ \ \alpha_3 = \sqrt[5]{4} \approx 1.319, \ \ \ \alpha_4 =\sqrt[6]{5} \approx 1.308$$ As a result, we have the following:

Lemma. An optimal sequence contains $M_1$ and $M_4$ at most once and $M_2$ at most four times. Furthermore, $M_2$ and $M_4$ cannot appear together. In particular, all but 5 moves in an optimal sequence are $M_3$ moves.

Proof. Suppose by contradiction that $M_2$ appears five times in an optimal sequence. Note that $$ C(M_2, M_2, M_2, M_2, M_2) = 20 = C(M_3, M_3, M_3, M_3) $$ The utility of $M_2$ five times is $U(M_2)^5 = \alpha_2^{20} = 243$, while the utility of $M_3$ repeated $4$ times is $\alpha_3^{20} = 256$, concluding the argument.

To prove that $M_4$ and $M_1$ can appear at most once, note that $(M_4, M_4) \prec (M_2, M_2, M_2)$ and $(M_1, M_1) \prec M_4$. The last claim follows from $(M_2, M_4) \prec (M_3, M_3)$.

Lastly, note that $(M_1, M_2, M_2) \prec (M_4, M_3)$ allows for only one $M_2$ when $M_1$ appears. As a result, the only possible optimal, non-$M_3$ moves are the following $7$ sequences: $$M_1, \ \ \ M_1 M_2, \ \ \ M_2, \ \ \ M_2 M_2, \ \ \ M_2 M_2M_2, \ \ \ M_2 M_2 M_2 M_2, \ \ \ M_4 $$ It is possible to show, with the help of a simple program, that these are indeed optimal sequences of moves.
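One possible form of such a program is a small dynamic-programming table over the step budget. This is a sketch under the block model above, not necessarily the program the answer had in mind, and the name `maxCharsTable` is made up for the illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// best[c] = largest number of characters reachable in at most c steps,
// using only blocks SC k P (k pastes, k + 2 steps, multiply by k + 1).
std::vector<std::uint64_t> maxCharsTable(int maxSteps)
{
    std::vector<std::uint64_t> best(maxSteps + 1, 1);
    for (int c = 1; c <= maxSteps; ++c)
    {
        best[c] = best[c - 1]; // leaving a step unused never hurts
        for (int k = 1; k + 2 <= c; ++k)
            best[c] = std::max(best[c], best[c - (k + 2)] * (k + 1));
    }
    return best;
}
```

The entries at budgets $3, 7, 4, 8, 12, 16, 6$ come out as $2, 6, 3, 9, 27, 81, 5$, which are the utilities of the seven sequences listed, and the first budget whose entry reaches $100000$ is $42$.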

We are ready to show the final theorem:

Theorem. The minimal number of steps $S(n)$ to copy&paste a text $n$ times with select-all, copy and paste moves is given by $$ S(n) = \min_{i=1, \ldots, 8} 5 \lceil \log_4(n/u_i) \rceil+c_i $$ where $(c_i, u_i)$ varies in the set $$ I = \{ (0,1), (3,2), (7,6), (4,3), (8,9), (12,27), (16,81), (6,5) \} $$

Proof. The proof is a direct consequence of the above argument. The set $I$ is the set of cost-utility coordinates for the 7 above move sequences, plus an eighth element representing the empty move (corresponding to only using $M_3$). If we use a starting sequence with cost-utility $(c,u)$ and then apply $k$ times $M_3$, we get a total cost-utility of $(c+5k, u \cdot 4^k)$. Imposing $u \cdot 4^k \ge n$ we get $k \ge \log_4(n/u)$, so that the smallest integer solution is $k_* = \lceil \log_4(n/u) \rceil$. Substituting into the cost we obtain the claimed formula.
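The formula is easy to evaluate in code. The sketch below makes three assumptions of its own: the empty starting sequence is taken as cost $0$ and utility $1$, the number of $M_3$ moves is restricted to $k\ge0$, and the ceiling of the logarithm is replaced by repeated multiplication by $4$ to avoid floating-point rounding. The name `stepsAtLeast` is made up for the illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Minimal number of steps to reach at least n characters (n >= 1, and n
// well below 2^62 so that multiplying by 4 cannot overflow).
int stepsAtLeast(std::uint64_t n)
{
    // (cost, utility) of the eight possible starting sequences.
    static const int cost[8] = {0, 3, 7, 4, 8, 12, 16, 6};
    static const std::uint64_t utility[8] = {1, 2, 6, 3, 9, 27, 81, 5};

    int best = std::numeric_limits<int>::max();
    for (int i = 0; i < 8; ++i)
    {
        // Smallest k >= 0 with utility * 4^k >= n.
        int k = 0;
        std::uint64_t chars = utility[i];
        while (chars < n)
        {
            chars *= 4;
            ++k;
        }
        best = std::min(best, cost[i] + 5 * k);
    }
    return best;
}
```

For $n=100000$ the minimum is attained at $(c,u)=(12,27)$ with $k=6$, since $27\cdot4^6=110592$, giving $12+30=42$ steps.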

As an application, for $n= 100,000$ the best value is obtained with the starting sequence $M_2 M_2 M_2$, and the associated number of steps is... 42, the Answer to the Ultimate Question of Life, The Universe, and Everything!! My compliments to @ajotatxe for having guessed the answer by trial and error.

To the OP: in hindsight, it's no surprise you and your friend had a hard time finding the right answer. Thanks for the very nice problem!

$\endgroup$
2
  • 1
    $\begingroup$ Great answer! We could shorten the proof somewhat, at the cost of making the resulting formula harder to compute, by only finding for each $M_k$ the threshold of how many times we maximally can use it before we'd be better off using $M_3$ instead. I especially like how the answer format follows the standard greedy proof schema "Exchange argument" (or if one is familiar with it, the schema of a kernel reduction). $\endgroup$
    – ConnFus
    Commented 2 days ago
  • 1
    $\begingroup$ By the "$\alpha$ argument", every $M_k$ can be used at most four times. This is a consequence of the fact that $k=3$ realizes the maximum of the utility-per-cost function ($\sqrt[k+2]{k+1}$). But I would also exclude $M_k$ for $k\ge 5$, that's pretty easy and 'general'. Note that the same schema gives an answer to the variations where SCkP has a different (linear) cost, e.g. the first time you paste you overwrite the text. $\endgroup$ Commented 2 days ago
12
$\begingroup$

The other answers are nice, but I thought I'd comment on how to verify the answer computationally: one can use breadth-first search on a graph where each node represents a document state. The operations SELECT, COPY and PASTE allow us to move from one node to another in this graph. The C++ program below implements this algorithm.

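#include <iostream>
#include <queue>
#include <set>
#include <tuple>

// The operation that must (SELECT, COPY) or may (PASTE) come next.
enum Mode
{
    SELECT,
    COPY,
    PASTE
};

struct Node
{
    int noOfAs;       // number of a's in the document
    int steps;        // operations performed so far
    int noOfAsCopied; // number of a's in the clipboard
    Mode mode;
};

int main()
{
    std::queue<Node> q;
    std::set<std::tuple<int, int, Mode>> seen; // states already queued

    // Queue a state unless an equal one was queued earlier. BFS reaches
    // each state first along a shortest path, so duplicates can be dropped.
    auto push = [&](const Node &n)
    {
        if (seen.insert(std::make_tuple(n.noOfAs, n.noOfAsCopied, n.mode)).second)
            q.push(n);
    };

    push({1, 0, 0, SELECT});

    while (!q.empty())
    {
        Node n = q.front();
        q.pop();

        if (n.noOfAs >= 100000)
        {
            std::cout << n.steps << std::endl;
            break;
        }

        switch (n.mode)
        {
        case SELECT:
            push({n.noOfAs, n.steps + 1, n.noOfAsCopied, COPY});
            break;
        case COPY:
            push({n.noOfAs, n.steps + 1, n.noOfAs, PASTE});
            break;
        case PASTE:
            // Select all again; this costs one step, like every other edge,
            // so the queue stays ordered by step count.
            push({n.noOfAs, n.steps + 1, n.noOfAsCopied, COPY});
            // Or paste the clipboard once more.
            push({n.noOfAs + n.noOfAsCopied, n.steps + 1, n.noOfAsCopied, PASTE});
            break;
        }
    }

    return 0;
}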

Running this program will display the output 42, which is the minimum number of operations for the document to be filled with at least 100000 a's.

$\endgroup$
4
  • $\begingroup$ I can 2nd this using Mathematica. Since no one has said it, the solution is $(SCP^3)^6(SCP^4)^2$. $\endgroup$ Commented 2 days ago
  • $\begingroup$ 42 is the answer yet again... $\endgroup$
    – IronEagle
    Commented yesterday
  • $\begingroup$ @FarSeenNomic Or $(SCP^3)^6(SCP^2)^3$ which also takes 42 steps and adds a few more a's. $\endgroup$
    – causative
    Commented yesterday
