SSP-MMC Algorithm

SSP: stochastic shortest path

MMC: minimize memorization cost

In traditional spaced repetition algorithms, reviews are endless. In this paper, we remove items from the review schedule once their stability reaches a certain threshold. This introduces a new problem to solve: how to design a review scheduling algorithm that minimizes the expected cost (such as time) of reviewing items.

Symbols：
- $S_0$ - initial stability
- $S_{N}$ - target stability
- $m_k \sim Bernoulli(2^{-\Delta t_k / h_k})$ - result of review（recall = 1，forget = 0）
- $f(S_k,\Delta t_k,m_k) = S_k \cdot SInc[S_k,\Delta t_k] \cdot m_k + S_0 \cdot (1 - m_k)$ - transition equation of stability
  - $SInc[]$ - Calculate the next stability based on the current stability and the next review interval.
- $g(S_k,\Delta t_k,m_k) = a \cdot m_k + b \cdot (1-m_k)$ - cost function of review
  - $a$ - cost of recall
  - $b$ - cost of forget
- $J_\pi(S_0)$ - total cost of review

Given the random nature of stability transitions and the existence of a target stability threshold, this problem can be reformulated as a stochastic shortest path problem.

Bellman's equation:

$J(S_k) = \min\limits_{\Delta t_k \in \Delta T(S_k)} E[g(S_k,\Delta t_k,m_k) + J(f(S_k,\Delta t_k,m_k))]$

Iteration equation:

$J(S_k) = \min\limits_{\Delta t_k \in \Delta T(S_k)} [e^{\ln 0.9 \times \Delta t_k / S_k} \cdot (a + J(h_k \cdot SInc[S_k,\Delta t_k])) + (1-e^{\ln 0.9 \times \Delta t_k / S_k}) \cdot (b + J(S_0))]$

Using this equation, we can obtain an iterative solution through dynamic programming. However, since $S$ is a continuous value, it's not ideal for recording states. To address this, we can discretize it.