🤖 AI Summary
This work studies finite-block Pareto optimality of injective source codes on the four-symbol constrained Markov source introduced by Dalai and Leonardi. Using the information cost \(K(u) = -\log_2 \mathbb{P}(X_1^m = u)\), the authors order admissible strings by increasing information cost, then by length and lexicographic order, and construct a canonical injective code \(C\) that assigns binary codewords in shortlex order. For block length \(n = 1\) the expected codeword length equals \(3/2\); for every \(n \geq 2\) it satisfies \(\mathbb{E}[|C(X_1^n)|] < 3n/2\); and for every fixed \(0 < c < \sqrt{2}/(18\sqrt{\pi})\) it satisfies \(\mathbb{E}[|C(X_1^n)|] \leq 3n/2 - c/\sqrt{n}\) for all sufficiently large \(n\). Consequently, for this source, the reversible Dalai–Leonardi code is not Pareto optimal with respect to finite-block average length.
📝 Abstract
We study a Pareto-type notion of finite-block optimality for injective source codes, where two codes are compared through the full sequence of expected block lengths. As a concrete and fully analyzable test case, we revisit the four-symbol constrained Markov source introduced by Dalai and Leonardi in their "meaningful example" on constrained-source decodability. For each admissible nonempty string $u=x_1^m \in \mathscr{A} \subset \mathscr{X}^+$, let $$ K(u):=-\log_2 \mathbb{P}(X_1^m=u) $$ denote its information cost. We construct a canonical injective binary mapping $C:\mathscr{A} \to \{0,1\}^+$ by ordering admissible strings by increasing $K(u)$, then by length and lexicographic order, and assigning binary strings in shortlex order. For the length-$n$ block $X_1^n$ we prove $$ \mathbb{E}[|C(X_1)|]=\tfrac32, \qquad \mathbb{E}[|C(X_1^n)|]<\tfrac32\,n\quad (n\ge 2). $$ Moreover, for every fixed $$ 0<c<\frac{\sqrt{2}}{18\sqrt{\pi}} $$ we have $$ \mathbb{E}[|C(X_1^n)|]\le \tfrac32\,n-\frac{c}{\sqrt n} $$ for all sufficiently large $n$. Thus, for this source, the reversible Dalai–Leonardi code is not Pareto-optimal with respect to finite-block average length. The proof is based on an exact enumeration of admissible strings by information cost and on a shortlex gap identity implying that each cost class splits evenly between codeword lengths $K(u)-1$ and $K(u)$. The example is simple, but it already exhibits the kind of finite-block Pareto comparison that seems natural for injective source coding under source constraints.
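The ordering-and-assignment step described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `shortlex_binary` and `canonical_code` are hypothetical, and the toy probability table below is an assumed uniform two-symbol example, not the Dalai–Leonardi source (whose transition structure is not given in this abstract). The ranks are exact only when the finite table contains every string that sorts before the ones of interest.

```python
from math import log2


def shortlex_binary(rank: int) -> str:
    """Return the rank-th (0-indexed) nonempty binary string in shortlex
    order: 0, 1, 00, 01, 10, 11, 000, ..."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    # rank + 2 written in binary is '1' followed by the desired string;
    # bin() adds the prefix '0b', so drop three characters in total.
    return bin(rank + 2)[3:]


def canonical_code(prob: dict[str, float]) -> dict[str, str]:
    """Sort strings by information cost K(u) = -log2 P(u), then by length,
    then lexicographically, and assign binary strings in shortlex order."""
    order = sorted(prob, key=lambda u: (-log2(prob[u]), len(u), u))
    return {u: shortlex_binary(r) for r, u in enumerate(order)}


# Hypothetical toy table (assumption): i.i.d. uniform symbols 'a' and 'b',
# truncated to strings of length 1 and 2.
toy_prob = {"a": 0.5, "b": 0.5, "aa": 0.25, "ab": 0.25, "ba": 0.25, "bb": 0.25}
code = canonical_code(toy_prob)
print(code)
# {'a': '0', 'b': '1', 'aa': '00', 'ab': '01', 'ba': '10', 'bb': '11'}
```

Because distinct ranks map to distinct binary strings, the resulting mapping is injective by construction; the rank-$r$ codeword has length $\lfloor \log_2(r+2) \rfloor$.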