
Channel Cutoff Rate

The Design of Coded Modulation

The design of coded modulation integrates coding and modulation techniques to achieve efficient and reliable transmission of information over a communication channel.

Two fundamental approaches guide this design: the algebraic approach and the probabilistic approach.

These two approaches complement each other: the algebraic approach provides concrete, implementable solutions, while the probabilistic approach offers insight into theoretical best-case performance, informing the trade-offs between complexity, rate, and reliability in coded modulation design.

The Error Probability of a Memoryless Channel

Consider a memoryless channel with an input alphabet $\mathcal{X}$ (e.g., discrete symbols like $\{0,1\}$ or continuous values like $\mathbb{R}$) and an output alphabet $\mathcal{Y}$ (which can also be discrete or continuous).

The channel is characterized by a conditional probability density function (PDF) or probability mass function (PMF) $p(y \mid x)$, specifying the likelihood of receiving output $y \in \mathcal{Y}$ given input $x \in \mathcal{X}$.

The memoryless property implies that the channel's output at time $i$ depends only on the input at time $i$, with no dependence on prior inputs or outputs.

For sequences of length $n$, $\vec{x} = (x_1, \ldots, x_n)$ and $\vec{y} = (y_1, \ldots, y_n)$, the joint conditional probability factors as

$$\boxed{ p(\vec{y} \mid \vec{x}) = \prod_{i=1}^n p(y_i \mid x_i). }$$

In a coded system, not all of the $\lvert \mathcal{X}\rvert^n$ possible input sequences of length $n$ are used.

Instead, a subset of $M = 2^k$ sequences is selected, called codewords, where $k = n R$ and $R$ is the rate in bits per symbol (assuming a binary information source).

Denote these codewords by $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_M$, each a unique $n$-tuple from $\mathcal{X}^n$.

The choice of $M$ balances data rate and error protection: a smaller $M$ increases the distance between codewords (reducing errors) but decreases the rate, while a larger $M$ does the opposite.

Error Probability With Maximum-Likelihood Detection

When codeword $\vec{x}_m$ is transmitted over the memoryless channel, the receiver observes $\vec{y}$ and performs maximum-likelihood (ML) detection to decide which codeword was sent.

Under ML detection, the receiver chooses the $\vec{x}_{m'}$ that maximizes $p(\vec{y}\mid \vec{x}_{m'})$, i.e.,

$$\hat{m} = \arg \max_{m'} p(\vec{y}\mid \vec{x}_{m'}).$$

An error occurs if $\hat{m} \neq m$.

The error probability given that $\vec{x}_m$ is sent, denoted $P_{e\mid m}$, is the probability that $\vec{y}$ falls into the decision region of any other codeword:

$$\boxed{ P_{e\mid m} = P\bigl[\hat{m} \neq m \big\vert \vec{x}_m \text{ sent}\bigr] = \sum_{\substack{m' = 1 \\ m' \neq m}}^M P\bigl[\vec{y} \in D_{m'} \big\vert \vec{x}_m \text{ sent}\bigr], }$$

where $D_{m'}$ is the decision region for $\vec{x}_{m'}$, defined by

$$D_{m'} = \{\vec{y} : p(\vec{y}\mid \vec{x}_{m'}) > p(\vec{y}\mid \vec{x}_{m''}), \; \forall m'' \neq m'\}.$$

Union Bound

To bound $P_{e \mid m}$, consider the pairwise error event between $\vec{x}_m$ and $\vec{x}_{m'}$.

Define the pairwise decision region $D_{mm'}$ where $\vec{x}_{m'}$ is more likely than $\vec{x}_m$:

$$D_{mm'} = \{\vec{y} : p(\vec{y}\mid \vec{x}_{m'}) > p(\vec{y}\mid \vec{x}_m)\}.$$

An upper bound on $P_{e\mid m}$ is then

$$\boxed{ P_{e\mid m} \le \sum_{\substack{m' = 1 \\ m' \neq m}}^M P\bigl[\vec{y}\in D_{mm'} \big\vert \vec{x}_m \text{ sent}\bigr], }$$

which is the union bound, typically an overestimate because $D_{mm'}$ includes points where $p(\vec{y}\mid \vec{x}_{m'})$ exceeds $p(\vec{y}\mid \vec{x}_m)$, even if a third codeword $\vec{x}_{m''}$ might have an even higher likelihood.

Log-Likelihood Ratio

Define the log-likelihood ratio:

$$Z_{mm'} = \ln \Bigl[\frac{p(\vec{y}\mid \vec{x}_{m'})}{p(\vec{y}\mid \vec{x}_m)}\Bigr].$$

Hence,

$$D_{mm'} = \{\vec{y} : Z_{mm'} > 0\}.$$

Since the channel is memoryless:

$$p(\vec{y}\mid \vec{x}_m) = \prod_{i=1}^n p\bigl(y_i \mid x_{m,i}\bigr), \quad p(\vec{y}\mid \vec{x}_{m'}) = \prod_{i=1}^n p\bigl(y_i \mid x_{m',i}\bigr).$$

Then,

$$Z_{mm'} = \sum_{i=1}^n \ln \Bigl[\frac{p(y_i \mid x_{m',i})}{p(y_i \mid x_{m,i})}\Bigr].$$

The pairwise error probability can be expressed as

$$\boxed{ P\bigl[\vec{y}\in D_{mm'} \big\vert \vec{x}_m \text{ sent}\bigr] = P\bigl[Z_{mm'} > 0 \big\vert \vec{x}_m \text{ sent}\bigr], }$$

which depends on the distribution of $Z_{mm'}$, which is in turn determined by the channel statistics.

For an AWGN channel, for instance, these distributions have a Gaussian-like form.

Summing over all $m' \neq m$ in the union bound provides a starting framework for analyzing the probability of error in coded modulation systems.
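
As a sanity check on this formulation, the short sketch below (an illustrative addition, not part of Proakis's development) estimates $P[Z_{mm'} > 0 \mid \vec{x}_m \text{ sent}]$ by Monte Carlo for two BPSK codewords over AWGN that differ in $d$ positions, and compares it with the standard closed form $Q\bigl(\sqrt{2 d \mathcal{E}_c/N_0}\bigr)$; the values of $d$, $\mathcal{E}_c$, and $N_0$ are arbitrary choices.

```python
# Monte Carlo estimate of the pairwise error probability P[Z_{mm'} > 0 | x_m sent]
# for BPSK over AWGN, compared with the analytic value Q(sqrt(2*d*Ec/N0)).
import math
import random

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_pep(d=4, Ec=1.0, N0=1.0, trials=200_000, seed=1):
    random.seed(seed)
    sigma = math.sqrt(N0 / 2.0)          # noise standard deviation per dimension
    errors = 0
    for _ in range(trials):
        z = 0.0
        for _ in range(d):
            # x_m sends +sqrt(Ec) in every differing position, x_m' sends -sqrt(Ec);
            # positions where the codewords agree contribute nothing to Z_{mm'}.
            y = math.sqrt(Ec) + random.gauss(0.0, sigma)
            # per-symbol log-likelihood ratio ln[ p(y | -sqrt(Ec)) / p(y | +sqrt(Ec)) ]
            z += (-(y + math.sqrt(Ec)) ** 2 + (y - math.sqrt(Ec)) ** 2) / N0
        errors += (z > 0)
    return errors / trials

d, Ec, N0 = 4, 1.0, 1.0
print("Monte Carlo estimate of PEP   :", simulate_pep(d, Ec, N0))
print("Exact value Q(sqrt(2*d*Ec/N0)):", q_func(math.sqrt(2 * d * Ec / N0)))
```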

Overview of Chernov Bound

The Chernov bound provides a method to bound the probability of events for an arbitrary random variable $X$.

Using the parameter $\lambda$, the bound is expressed as follows.

Consider a random variable $X$ and a real number $\delta$.

Define a transformed variable $Y = e^{\lambda X}$, where $\lambda$ is a real number, and a constant $\alpha = e^{\lambda \delta}$.

Since $Y$ is nonnegative, the Markov inequality can be applied to yield Proakis (2007, Eq. (2.4-3)):

$$P[e^{\lambda X} \geq e^{\lambda \delta}] \leq \frac{\mathbb{E}[e^{\lambda X}]}{e^{\lambda \delta}} = \mathbb{E}[e^{\lambda (X - \delta)}].$$

The event $\{e^{\lambda X} \geq e^{\lambda \delta}\}$ simplifies to $\{X \geq \delta\}$ when $\lambda > 0$, or to $\{X \leq \delta\}$ when $\lambda < 0$.

Thus, the Chernov bound using $\lambda$ is Proakis (2007, Eq. (2.4-4)):

$$P[X \geq \delta] \leq \mathbb{E}[e^{\lambda (X - \delta)}], \quad \text{for all } \lambda > 0,$$
$$P[X \leq \delta] \leq \mathbb{E}[e^{\lambda (X - \delta)}], \quad \text{for all } \lambda < 0.$$
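
As a small numeric illustration (an assumed example, not from the text), the snippet below applies the Chernov bound to a standard Gaussian random variable, for which $\mathbb{E}[e^{\lambda(X-\delta)}] = e^{\lambda^2/2 - \lambda\delta}$, and compares several values of $\lambda$ with the optimized choice $\lambda = \delta$ and with the exact tail probability.

```python
# Chernov bound P[X >= delta] <= E[e^{lambda (X - delta)}] for X ~ N(0,1):
# the bound equals exp(lambda^2/2 - lambda*delta), minimized at lambda = delta.
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

delta = 3.0
for lam in (0.5, 1.0, 2.0, delta, 4.0):
    bound = math.exp(lam ** 2 / 2.0 - lam * delta)   # E[e^{lam (X - delta)}]
    print(f"lambda = {lam:4.1f}  Chernov bound = {bound:.3e}")

print(f"optimized bound exp(-delta^2/2) = {math.exp(-delta ** 2 / 2):.3e}")
print(f"exact tail Q(delta)             = {q_func(delta):.3e}")
```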

Chernov Bound and Bhattacharyya Bound of PEP

In the context of a memoryless channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, the pairwise error probability (PEP) $P_{m \to m'}$ represents the probability of mistaking codeword $\vec{x}_m$ for $\vec{x}_{m'}$ when $\vec{x}_m$ is transmitted.

Under a maximum-likelihood (ML) detector,

$$\boxed{ P_{m \to m'} = P[\vec{y} \in D_{mm'} \mid \vec{x}_m \text{ sent}] = P[Z_{mm'} > 0 \mid \vec{x}_m], }$$

where $D_{mm'} = \{\vec{y} : p(\vec{y} \mid \vec{x}_{m'}) > p(\vec{y} \mid \vec{x}_m)\}$ is the decision region favoring $\vec{x}_{m'}$, and

$$Z_{mm'} = \ln \biggl[\frac{p(\vec{y} \mid \vec{x}_{m'})}{p(\vec{y} \mid \vec{x}_m)}\biggr]$$

is the log-likelihood ratio.

Chernov Bound

The Chernov bound provides an upper bound on $P_{m \to m'}$ via the moment-generating function of $Z_{mm'}$.

For any $\lambda > 0$,

$$\boxed{ P_{m \to m'} = P[Z_{mm'} > 0 \mid \vec{x}_m] \le \mathbb{E}\bigl[e^{\lambda Z_{mm'}} \mid \vec{x}_m\bigr]. }$$

This follows from Chernoff's inequality, which bounds the probability of a random variable exceeding a threshold in terms of its exponential moments.

Expanding the expectation,

$$\mathbb{E}\bigl[e^{\lambda Z_{mm'}} \mid \vec{x}_m\bigr] = \sum_{\vec{y} \in \mathcal{Y}^n} e^{\lambda \ln\bigl[\frac{p(\vec{y} \mid \vec{x}_{m'})}{p(\vec{y} \mid \vec{x}_m)}\bigr]}\, p(\vec{y} \mid \vec{x}_m).$$

Since $e^{\lambda \ln(a)} = a^\lambda$,

$$e^{\lambda \ln\bigl[\frac{p(\vec{y} \mid \vec{x}_{m'})}{p(\vec{y} \mid \vec{x}_m)}\bigr]} = \bigl(p(\vec{y} \mid \vec{x}_{m'})\bigr)^\lambda \bigl(p(\vec{y} \mid \vec{x}_m)\bigr)^{-\lambda},$$

so Proakis (2007, Eq. (6.8-6))

$$\boxed{ P_{m \to m'} \le \sum_{\vec{y} \in \mathcal{Y}^n} \bigl[p(\vec{y} \mid \vec{x}_{m'})\bigr]^\lambda \bigl[p(\vec{y} \mid \vec{x}_m)\bigr]^{1 - \lambda}, \quad \lambda > 0. }$$

This bound is general and applies to any channel.

One often optimizes over λ to achieve the tightest possible Chernov bound.

Bhattacharyya Bound

The Bhattacharyya bound is a special case of the Chernov bound obtained by setting $\lambda = \frac{1}{2}$.

Then,

$$\boxed{ P_{m \to m'} \le \sum_{\vec{y} \in \mathcal{Y}^n} \bigl[p(\vec{y} \mid \vec{x}_{m'})\bigr]^{1/2} \bigl[p(\vec{y} \mid \vec{x}_m)\bigr]^{1/2} = \sum_{\vec{y} \in \mathcal{Y}^n} \sqrt{p(\vec{y} \mid \vec{x}_m)\, p(\vec{y} \mid \vec{x}_{m'})}. }$$

This form, known as the Bhattacharyya coefficient, is symmetric and simpler to compute, though generally less tight than the optimized Chernov bound.

Role of λ

The parameter $\lambda$ is a positive real number ($\lambda > 0$) utilized in the Chernov bound to derive an upper bound on the pairwise error probability $P_{m \to m'}$ in a memoryless communication channel.

Specifically, it appears in the context of bounding the probability $P[Z_{mm'} > 0 \mid \vec{x}_m]$, where $Z_{mm'} = \ln \left[ \frac{p(\vec{y} \mid \vec{x}_{m'})}{p(\vec{y} \mid \vec{x}_m)} \right]$ is the log-likelihood ratio.

The Chernov bound is expressed as:

$$P_{m \to m'} \leq \mathbb{E}\left[e^{\lambda Z_{mm'}} \mid \vec{x}_m\right] = \sum_{\vec{y} \in \mathcal{Y}^n} \left[p(\vec{y} \mid \vec{x}_{m'})\right]^\lambda \left[p(\vec{y} \mid \vec{x}_m)\right]^{1 - \lambda}.$$

Nomenclature: The parameter $\lambda$ lacks a standardized name in the information theory literature; it is typically referred to simply as the parameter of the Chernov bound.

Usage: The parameter $\lambda$ is used within the Chernov bound, a mathematical tool employed to analyze error probabilities in coded modulation systems over memoryless channels.

It governs the exponential moments of the log-likelihood ratio $Z_{mm'}$, enabling the derivation of an upper bound on the probability of mistaking one codeword for another under maximum-likelihood detection.

Significance: The significance of $\lambda$ lies in its flexibility to tighten the Chernov bound.

By adjusting $\lambda$, one can optimize the bound to be as close as possible to the true error probability, depending on the channel's statistical properties.

A specific choice of $\lambda = \frac{1}{2}$ yields the Bhattacharyya bound, a simpler but less tight special case.

Thus, $\lambda$ plays a critical role in balancing computational simplicity and the accuracy of error probability estimates in communication systems.

Bounds for Memoryless Channels

When the channel is memoryless, we have

$$p(\vec{y} \mid \vec{x}) = \prod_{i=1}^n p\bigl(y_i \mid x_i\bigr).$$

Chernov Bound (Memoryless Case)

Substituting the memoryless property into the Chernov bound gives

$$P_{m \to m'} \le \sum_{\vec{y} \in \mathcal{Y}^n} \biggl( \prod_{i=1}^n p\bigl(y_i \mid x_{m',i}\bigr)\biggr)^{\lambda} \biggl( \prod_{i=1}^n p\bigl(y_i \mid x_{m,i}\bigr)\biggr)^{1 - \lambda}.$$

Rewriting,

$$= \sum_{\vec{y} \in \mathcal{Y}^n} \prod_{i=1}^n p^\lambda\bigl(y_i \mid x_{m',i}\bigr)\, p^{1 - \lambda}\bigl(y_i \mid x_{m,i}\bigr).$$

Because the sum over $\vec{y}$ is a sum over all $n$-tuples, and the terms factorize across $i$,

$$= \prod_{i=1}^n \Biggl[ \sum_{y_i \in \mathcal{Y}} p^\lambda\bigl(y_i \mid x_{m',i}\bigr)\, p^{1 - \lambda}\bigl(y_i \mid x_{m,i}\bigr) \Biggr].$$

Hence,

$$\boxed{ P_{m \to m'} \le \prod_{i=1}^n \Bigl[ \sum_{y_i \in \mathcal{Y}} p^\lambda\bigl(y_i \mid x_{m',i}\bigr)\, p^{1 - \lambda}\bigl(y_i \mid x_{m,i}\bigr) \Bigr], \quad \lambda > 0. }$$

Bhattacharyya Bound (Memoryless Case)

For $\lambda = \frac{1}{2}$,

$$P_{m \to m'} \le \prod_{i=1}^n \Bigl[ \sum_{y_i \in \mathcal{Y}} \sqrt{p\bigl(y_i \mid x_{m',i}\bigr)\, p\bigl(y_i \mid x_{m,i}\bigr)} \Bigr].$$

Chernov and Bhattacharyya Parameters

Define the Chernov parameter and the Bhattacharyya parameter for a single-symbol pair $(x_1, x_2)$ by Proakis (2007, Eq. (6.8-10))

$$\boxed{ \begin{split} \Delta^{(\lambda)}_{x_1 \to x_2} &= \sum_{y \in \mathcal{Y}} \bigl[p(y \mid x_2)\bigr]^\lambda \bigl[p(y \mid x_1)\bigr]^{1 - \lambda}, \\ \Delta_{x_1, x_2} &= \sum_{y \in \mathcal{Y}} \sqrt{p(y \mid x_1)\, p(y \mid x_2)}. \end{split} }$$
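
A minimal sketch of these definitions (assuming a BSC with an arbitrarily chosen crossover probability $p$): it evaluates $\Delta^{(\lambda)}_{0 \to 1}$ on a grid of $\lambda \in (0,1)$ and confirms that $\lambda = \tfrac{1}{2}$ gives the smallest value, i.e., the Bhattacharyya parameter $2\sqrt{p(1-p)}$.

```python
# Single-symbol Chernov parameter Delta^{(lambda)}_{0->1} for a BSC, swept over lambda.
import math

def chernov_param_bsc(p, lam):
    # Delta^{(lambda)}_{0->1} = sum_y p(y|1)^lambda * p(y|0)^(1-lambda)
    return (p ** lam) * ((1 - p) ** (1 - lam)) + ((1 - p) ** lam) * (p ** (1 - lam))

p = 0.05
grid = [k / 20 for k in range(1, 20)]             # lambda in (0, 1)
best = min(grid, key=lambda lam: chernov_param_bsc(p, lam))
print("minimizing lambda on grid :", best)
print("Delta^(1/2)               :", chernov_param_bsc(p, 0.5))
print("2*sqrt(p(1-p))            :", 2 * math.sqrt(p * (1 - p)))
```
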
Properties of the Parameters

Bounds for Memoryless Channels Using the Parameters

For a memoryless channel, the Chernov bound reduces to

$$\boxed{ P_{m \to m'} \le \prod_{i=1}^n \Delta^{(\lambda)}_{x_{m,i} \to x_{m',i}}, \quad \lambda > 0, }$$

and the Bhattacharyya bound reduces to

$$\boxed{ P_{m \to m'} \le \prod_{i=1}^n \Delta_{x_{m,i}, x_{m',i}}. }$$

These product forms highlight how the PEP depends on the product of single-symbol parameters.

The error probability typically decreases exponentially with the Hamming distance between $\vec{x}_m$ and $\vec{x}_{m'}$.

For example, in a binary symmetric channel (BSC) with crossover probability $p$,

$$\Delta_{0,1} = \sum_{y \in \{0,1\}} \sqrt{p(y \mid 0)\, p(y \mid 1)} = \sqrt{p(1-p)} + \sqrt{p(1-p)} = 2 \sqrt{p(1-p)},$$

and the bound becomes exponentially smaller as the Hamming distance increases.

Example: Bounds of PEP for BSC

This example is based on Proakis (2007, Example 6.8-1).

Consider a binary symmetric channel (BSC) with crossover probability $p$ ($0 \le p \le 1$), input alphabet $\mathcal{X} = \{0,1\}$, and output alphabet $\mathcal{Y} = \{0,1\}$.

Two binary sequences (codewords) $\vec{x}_m$ and $\vec{x}_{m'}$ of length $n$ differ in $d$ components, where $d$ is the Hamming distance (the number of positions $i$ for which $x_{mi} \neq x_{m'i}$).

The BSC transition probabilities are

$$P[y = 0 \mid x = 0] = 1 - p, \quad P[y = 1 \mid x = 0] = p,$$
$$P[y = 1 \mid x = 1] = 1 - p, \quad P[y = 0 \mid x = 1] = p.$$

The Bhattacharyya bound on the pairwise error probability (PEP) $P_{m \to m'}$ for a memoryless channel is

$$P_{m \to m'} \le \prod_{i=1}^{n} \Delta_{x_{mi}, x_{m'i}},$$

where

$$\Delta_{x_{mi}, x_{m'i}} = \sum_{y_i \in \mathcal{Y}} \sqrt{p\bigl(y_i \mid x_{mi}\bigr)\, p\bigl(y_i \mid x_{m'i}\bigr)}.$$

Case 1: $x_{mi} = x_{m'i}$ (same symbol, occurs in $n-d$ positions)

If $x_{mi} = x_{m'i} = 0$:

$$\begin{split} \Delta_{0,0} &= \sqrt{p(0 \mid 0)\, p(0 \mid 0)} + \sqrt{p(1 \mid 0)\, p(1 \mid 0)} \\ &= \sqrt{(1-p)^2} + \sqrt{p^2} = (1 - p) + p = 1. \end{split}$$

If $x_{mi} = x_{m'i} = 1$:

$$\begin{split} \Delta_{1,1} &= \sqrt{p(1 \mid 1)\, p(1 \mid 1)} + \sqrt{p(0 \mid 1)\, p(0 \mid 1)} \\ &= \sqrt{(1-p)^2} + \sqrt{p^2} = 1. \end{split}$$

Hence, whenever the symbols match, $\Delta_{x_{mi}, x_{m'i}} = 1$, contributing a factor of 1 to the overall product.

Case 2: $x_{mi} \neq x_{m'i}$ (different symbols, occurs in $d$ positions)

If $x_{mi} = 0$ and $x_{m'i} = 1$:

$$\begin{split} \Delta_{0,1} &= \sqrt{p(0 \mid 0)\, p(0 \mid 1)} + \sqrt{p(1 \mid 0)\, p(1 \mid 1)} \\ &= \sqrt{(1-p) p} + \sqrt{p (1-p)} = 2 \sqrt{p (1-p)}. \end{split}$$

If $x_{mi} = 1$ and $x_{m'i} = 0$, the result is identical by symmetry.

Thus, the product simplifies to

$$P_{m \to m'} \le (1)^{n-d} \times \bigl[2 \sqrt{p (1-p)}\bigr]^d = \Bigl(2 \sqrt{p (1-p)}\Bigr)^d.$$

Using $2\sqrt{p(1-p)} = \sqrt{4 p (1-p)}$, this can be rewritten as

$$\boxed{ P_{m \to m'} \le \bigl(\sqrt{4 p (1-p)}\bigr)^d. }$$

Because $4 p (1-p) \le 1$ (with a maximum of 1 at $p = \frac{1}{2}$), we have $\Delta = \sqrt{4 p (1-p)} \le 1$.

For $p \neq \frac{1}{2}$, $\Delta < 1$, and as $d$ increases, $\Delta^d \to 0$ exponentially, reflecting how greater Hamming distance translates to stronger error-correction potential.
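
To see how loose or tight this is in practice, the hedged check below (example values assumed) compares the bound $\bigl(\sqrt{4p(1-p)}\bigr)^d$ with the exact pairwise error probability of the BSC, where an ML error requires more than $d/2$ of the $d$ differing positions to be flipped (ties for even $d$ counted as half, i.e., broken at random).

```python
# Exact BSC pairwise error probability vs. the Bhattacharyya bound (4p(1-p))^(d/2).
import math

def exact_pep_bsc(p, d):
    pep = 0.0
    for k in range(d + 1):
        prob = math.comb(d, k) * (p ** k) * ((1 - p) ** (d - k))
        if 2 * k > d:
            pep += prob          # more than half the differing positions flipped
        elif 2 * k == d:
            pep += 0.5 * prob    # tie: broken at random
    return pep

def bhattacharyya_bound(p, d):
    return (4 * p * (1 - p)) ** (d / 2)

p = 0.05
for d in (1, 2, 4, 8, 16):
    print(f"d={d:2d}  exact={exact_pep_bsc(p, d):.3e}  bound={bhattacharyya_bound(p, d):.3e}")
```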

Example: Bounds of PEP for BPSK over AWGN

Consider the same setting as the BSC example above, but now with binary phase-shift keying (BPSK) transmitted over an AWGN channel.

The binary symbols $0$ and $1$ in $\vec{x}_m$ and $\vec{x}_{m'}$ are mapped to signal amplitudes:

$$0 \to -\sqrt{\mathcal{E}_c}, \quad 1 \to +\sqrt{\mathcal{E}_c},$$

where $\mathcal{E}_c$ is the energy per component (per symbol).

The received signal for the $i$-th symbol is

$$y_i = x_{mi} + n_i,$$

where $n_i \sim \mathcal{N}(0, \sigma^2)$ and $\sigma^2 = \frac{N_0}{2}$ (noise variance, with $N_0/2$ as the power spectral density).

The conditional PDFs are

$$\begin{split} p\bigl(y_i \mid x_{mi} = +\sqrt{\mathcal{E}_c}\bigr) &= \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Bigl(- \frac{(y_i - \sqrt{\mathcal{E}_c})^2}{2 \sigma^2}\Bigr) \\ &= \frac{1}{\sqrt{\pi N_0}} \exp\Bigl(- \frac{(y_i - \sqrt{\mathcal{E}_c})^2}{N_0}\Bigr), \end{split}$$
$$p\bigl(y_i \mid x_{mi} = -\sqrt{\mathcal{E}_c}\bigr) = \frac{1}{\sqrt{\pi N_0}} \exp\Bigl(- \frac{(y_i + \sqrt{\mathcal{E}_c})^2}{N_0}\Bigr).$$

Again, the Bhattacharyya bound is

$$P_{m \to m'} \le \prod_{i=1}^n \Delta_{x_{mi}, x_{m'i}}.$$

Case 1: $x_{mi} = x_{m'i}$.

If $x_{mi} = x_{m'i} = +\sqrt{\mathcal{E}_c}$:

$$\begin{split} \Delta_{+\sqrt{\mathcal{E}_c}, +\sqrt{\mathcal{E}_c}} &= \int_{-\infty}^\infty \sqrt{p\bigl(y \mid +\sqrt{\mathcal{E}_c}\bigr)\, p\bigl(y \mid +\sqrt{\mathcal{E}_c}\bigr)}\, dy \\ &= \int_{-\infty}^\infty p\bigl(y \mid +\sqrt{\mathcal{E}_c}\bigr)\, dy = 1. \end{split}$$

Similarly, $\Delta_{-\sqrt{\mathcal{E}_c}, -\sqrt{\mathcal{E}_c}} = 1$.

These contribute a factor of 1 for the $n - d$ positions where the symbols match.

Case 2: $x_{mi} \neq x_{m'i}$ (differ in $d$ positions).

If $x_{mi} = +\sqrt{\mathcal{E}_c}$ and $x_{m'i} = -\sqrt{\mathcal{E}_c}$ (or vice versa, by symmetry):

$$\Delta_{+\sqrt{\mathcal{E}_c}, -\sqrt{\mathcal{E}_c}} = \int_{-\infty}^{\infty} \sqrt{p\bigl(y \mid +\sqrt{\mathcal{E}_c}\bigr)\, p\bigl(y \mid -\sqrt{\mathcal{E}_c}\bigr)}\, dy$$
$$= \int_{-\infty}^{\infty} \sqrt{\Bigl(\frac{1}{\sqrt{\pi N_0}} e^{-\frac{(y-\sqrt{\mathcal{E}_c})^2}{N_0}}\Bigr) \Bigl(\frac{1}{\sqrt{\pi N_0}} e^{-\frac{(y+\sqrt{\mathcal{E}_c})^2}{N_0}}\Bigr)}\, dy$$
$$= \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \exp\Bigl(- \frac{(y-\sqrt{\mathcal{E}_c})^2 + (y+\sqrt{\mathcal{E}_c})^2}{2 N_0}\Bigr)\, dy.$$

Expand the exponent:

$$(y - \sqrt{\mathcal{E}_c})^2 + (y + \sqrt{\mathcal{E}_c})^2 = (y^2 - 2y\sqrt{\mathcal{E}_c} + \mathcal{E}_c) + (y^2 + 2y\sqrt{\mathcal{E}_c} + \mathcal{E}_c) = 2 y^2 + 2 \mathcal{E}_c.$$

Hence,

$$\Delta_{+\sqrt{\mathcal{E}_c}, -\sqrt{\mathcal{E}_c}} = \int_{-\infty}^\infty \frac{1}{\sqrt{\pi N_0}} \exp\Bigl(- \frac{2 y^2 + 2 \mathcal{E}_c}{2 N_0}\Bigr)\, dy = e^{-\frac{\mathcal{E}_c}{N_0}} \int_{-\infty}^\infty \frac{1}{\sqrt{\pi N_0}} e^{-\frac{y^2}{N_0}}\, dy.$$

Recognizing the integral as that of a Gaussian density with variance $\frac{N_0}{2}$:

$$\int_{-\infty}^\infty \frac{1}{\sqrt{\pi N_0}} \exp\Bigl(- \frac{y^2}{N_0}\Bigr)\, dy = \int_{-\infty}^\infty \frac{1}{\sqrt{2\pi \frac{N_0}{2}}} \exp\Bigl(- \frac{y^2}{2 (N_0/2)}\Bigr)\, dy = 1.$$

Thus,

$$\Delta_{+\sqrt{\mathcal{E}_c}, -\sqrt{\mathcal{E}_c}} = e^{-\frac{\mathcal{E}_c}{N_0}}.$$

The overall product becomes Proakis (2007, Eq. (6.8-14))

$$\boxed{ P_{m \to m'} \le (1)^{n-d} \times \bigl(e^{-\frac{\mathcal{E}_c}{N_0}}\bigr)^d = \Bigl(e^{-\frac{\mathcal{E}_c}{N_0}}\Bigr)^d. }$$

(Note: some presentations show an extra factor of $\frac{1}{2}$ in the exponent; the correct expression is $e^{-\mathcal{E}_c/N_0}$.)
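
For comparison, the short sketch below (illustrative $\mathcal{E}_c/N_0$ and $d$ values assumed) evaluates this Bhattacharyya bound against the exact pairwise error probability $Q\bigl(\sqrt{2 d \mathcal{E}_c/N_0}\bigr)$ for antipodal signaling.

```python
# Exact BPSK/AWGN pairwise error probability vs. the Bhattacharyya bound exp(-d*Ec/N0).
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

Ec_over_N0 = 1.0      # per-symbol SNR Ec/N0 (arbitrary illustrative value)
for d in (1, 2, 4, 8):
    exact = q_func(math.sqrt(2 * d * Ec_over_N0))
    bound = math.exp(-d * Ec_over_N0)
    print(f"d={d}  exact={exact:.3e}  Bhattacharyya bound={bound:.3e}")
```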

Comparison Between the Two Considered Channels

Both bounds have the form $\Delta^d$: for the BSC, $\Delta = 2\sqrt{p(1-p)}$, while for BPSK over AWGN, $\Delta = e^{-\mathcal{E}_c/N_0}$.

In both cases, $\Delta < 1$ implies that $P_{m \to m'} \to 0$ exponentially fast as $d$ grows.

This highlights the crucial role of Hamming distance in the error-correcting capability of the code.

Random Coding

In the analysis of coded modulation, rather than evaluating the pairwise error probability (PEP) $P_{m \to m'}$ for specific, deterministic codewords $\vec{x}_m$ and $\vec{x}_{m'}$, we use a random coding approach.

In this method, all $M$ codewords are generated probabilistically to determine the average performance over an ensemble of codes.

This technique, pioneered by Shannon, is instrumental in showing that good codes (those that achieve rates close to channel capacity) exist, without requiring an explicit construction.

Codebook Generation

Assume the input alphabet $\mathcal{X}$ (for example, $\{0, 1\}$ for binary codes or $\mathbb{R}$ for continuous signals) has a probability density function (PDF) or probability mass function (PMF) $p(x)$.

We generate each of the $M$ codewords $\vec{x}_m = (x_{m1}, x_{m2}, \ldots, x_{mn})$ (where $m = 1, 2, \ldots, M$) independently according to this distribution.

Specifically, each component $x_{mi}$ is drawn independently from $p(x)$, so that $P(\vec{x}_m) = \prod_{i=1}^{n} p(x_{mi})$, and the $M$ codewords are generated independently of one another.

Average Pairwise Error Probability for Random Code

We denote by $\overline{P_{m \to m'}}$ the average PEP, which is the expected value of $P_{m \to m'}$ over all possible pairs of randomly generated codewords $\vec{x}_m$ and $\vec{x}_{m'}$ ($m \neq m'$).

This average reflects the typical performance of a randomly chosen code, a central idea in proving the existence of codes that can approach channel capacity.

Derivation Using the Chernov Parameter

To compute $\overline{P_{m \to m'}}$, we start from the Chernov bound on the PEP for a specific pair of codewords in a memoryless channel:

$$P_{m \to m'} \leq \prod_{i=1}^{n} \Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}, \quad \lambda > 0,$$

where the Chernov parameter is defined as

$$\Delta^{(\lambda)}_{x_{mi} \to x_{m'i}} = \sum_{y_i \in \mathcal{Y}} p^{\lambda}\bigl(y_i \mid x_{m'i}\bigr)\, p^{1-\lambda}\bigl(y_i \mid x_{mi}\bigr).$$

This bound applies to a given pair $(\vec{x}_m, \vec{x}_{m'})$.

To find the average PEP over all such pairs, we take the expectation:

$$\overline{P_{m \to m'}} = \mathbb{E}\bigl[P_{m \to m'}\bigr] = \sum_{\vec{x}_m \in \mathcal{X}^n} \sum_{\substack{\vec{x}_{m'} \in \mathcal{X}^n \\ m' \neq m}} P(\vec{x}_m)\, P(\vec{x}_{m'})\, P_{m \to m'}.$$

Often, to simplify, we initially compute the average without explicitly excluding the case $m' = m$, since that event can be accounted for later when applying the union bound.

Incorporating the Chernov bound, we get:

$$\overline{P_{m \to m'}} \leq \sum_{\vec{x}_m \in \mathcal{X}^n} \sum_{\vec{x}_{m'} \in \mathcal{X}^n} P(\vec{x}_m)\, P(\vec{x}_{m'}) \prod_{i=1}^{n} \Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}.$$

Substituting Independent Generation Probabilities

Recall that

$$P(\vec{x}_m) = \prod_{i=1}^{n} p(x_{mi}), \quad P(\vec{x}_{m'}) = \prod_{i=1}^{n} p(x_{m'i}),$$

so

$$\overline{P_{m \to m'}} \leq \sum_{\vec{x}_m \in \mathcal{X}^n} \sum_{\vec{x}_{m'} \in \mathcal{X}^n} \prod_{i=1}^{n} \Bigl[p(x_{mi})\, p(x_{m'i})\, \Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}\Bigr].$$

Because each component index $i$ is independent and the double sum spans all possible $n$-tuples $\vec{x}_m$ and $\vec{x}_{m'}$, we can factorize the expression as

$$\overline{P_{m \to m'}} \leq \prod_{i=1}^{n} \Bigl( \sum_{x_{mi} \in \mathcal{X}} \sum_{x_{m'i} \in \mathcal{X}} p(x_{mi})\, p(x_{m'i})\, \Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}\Bigr).$$

Since each term in the product is identical for all $i$, let us define

$$\sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2} \equiv \sum_{x_{mi} \in \mathcal{X}} \sum_{x_{m'i} \in \mathcal{X}} p(x_{mi})\, p(x_{m'i})\, \Delta^{(\lambda)}_{x_{mi} \to x_{m'i}}.$$

Hence, we attain the upper bound Proakis (2007, Eq. (6.8-15))

$$\boxed{ \overline{P_{m \to m'}} \leq \Bigl( \sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2} \Bigr)^n, \quad \lambda \geq 0. }$$

Note: Allowing $\lambda \geq 0$ admits the trivial case $\lambda = 0$, where $\Delta^{(0)}_{x_1 \to x_2} = \sum_{y} p(y \mid x_1) = 1$, which yields only the trivial bound $\leq 1^n = 1$.

In practice, to get useful bounds, we require $\lambda > 0$.
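
A minimal numerical sketch of this averaging step (assuming a BSC, uniform inputs, and the Bhattacharyya choice $\lambda = \tfrac{1}{2}$): because two i.i.d. uniform codewords differ in each position with probability $\tfrac{1}{2}$, averaging $\Delta^d$ over the binomial distribution of the Hamming distance reproduces the single-letter form $\bigl(\tfrac{1+\Delta}{2}\bigr)^n$.

```python
# Ensemble-average Bhattacharyya bound for a BSC with uniform random codewords:
# averaging Delta^d over the binomial distance distribution equals ((1+Delta)/2)^n.
import math

p = 0.05
n = 16
Delta = 2 * math.sqrt(p * (1 - p))            # Bhattacharyya parameter of the BSC

avg_over_distances = sum(math.comb(n, d) * (0.5 ** n) * (Delta ** d) for d in range(n + 1))
single_letter_form = ((1 + Delta) / 2) ** n   # (sum_{x1,x2} p(x1) p(x2) Delta_{x1,x2})^n

print("averaged over distances :", avg_over_distances)
print("single-letter form      :", single_letter_form)
```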

Continuous Case and Total Error Probability

Since $\overline{P_{m \to m'}}$ is the average pairwise error probability for one pair of codewords, the total codeword error probability $P_e$ is typically bounded by applying a union bound over the $(M-1)$ such pairs:

$$P_e \leq (M - 1)\, \overline{P_{m \to m'}}.$$

We can observe that, for continuous alphabets, the sums over $\mathcal{X}^n$ in the derivation above are replaced by the corresponding integrals, and the bound retains the same form.

The $R_0(p,\lambda)$ Function

Recall that we consider a memoryless channel with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$.

In a random coding framework, we generate $M$ codewords randomly according to a PDF (or PMF) $p(x)$ defined over $\mathcal{X}$.

The average pairwise error probability (PEP), $\overline{P_{m \to m'}}$, for this ensemble of codes can be bounded using the Chernov parameter.

Specifically,

$$\overline{P_{m \to m'}} \leq \biggl( \sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2} \biggr)^n, \quad \lambda > 0,$$

where

$$\Delta^{(\lambda)}_{x_1 \to x_2} = \sum_{y \in \mathcal{Y}} p^{\lambda}\bigl(y \mid x_2\bigr)\, p^{1-\lambda}\bigl(y \mid x_1\bigr).$$

Defining $R_0(p,\lambda)$

We introduce a function $R_0(p,\lambda)$, defined as the negative binary logarithm of the expected Chernov parameter, Proakis (2007, Eq. (6.8-16)):

$$\boxed{ R_0\bigl(p,\lambda\bigr) \triangleq -\log_{2}\Bigl[ \sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2} \Bigr], \quad \lambda > 0. }$$

We can observe that the term inside the brackets, $\sum_{x_1}\sum_{x_2} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2}$, is $\mathbb{E}\bigl[\Delta^{(\lambda)}_{X_1 \to X_2}\bigr]$, the expected Chernov parameter when $X_1$ and $X_2$ are independent and each distributed according to $p(x)$.

Consequently,

$$\boxed{ R_0\bigl(p,\lambda\bigr) = -\log_{2}\bigl[\mathbb{E}\bigl(\Delta^{(\lambda)}_{X_1 \to X_2}\bigr)\bigr]. }$$

Because the logarithm is base 2, $R_0(p,\lambda)$ has units of bits per symbol, aligning it with information-theoretic rates.

Re-Expressing the Average PEP Bound Using $R_0(p,\lambda)$

From the definition above, we have

$$\overline{P_{m \to m'}} \leq \Bigl( \mathbb{E}\bigl[\Delta^{(\lambda)}_{X_1 \to X_2}\bigr]\Bigr)^n = 2^{n \log_{2}\bigl(\mathbb{E}[\Delta^{(\lambda)}]\bigr)}.$$

Since $\log_{2}\bigl(\mathbb{E}[\Delta^{(\lambda)}]\bigr) = -R_0(p,\lambda)$, we can re-express the upper bound on $\overline{P_{m \to m'}}$ in terms of $R_0(p,\lambda)$ as Proakis (2007, Eq. (6.8-17))

$$\boxed{ \overline{P_{m \to m'}} \leq 2^{-n R_0(p,\lambda)}, \quad \lambda > 0. }$$

Thus, $\overline{P_{m \to m'}}$ decays exponentially in $n$ with exponent $R_0(p,\lambda)$.

Overall Average Error Probability

Now consider the average error probability conditioned on sending codeword $\vec{x}_m$, denoted $\overline{P_{e\mid m}}$.

Applying the union bound over the $(M-1)$ incorrect codewords,

$$\overline{P_{e\mid m}} = \mathbb{E}\Bigl[ \sum_{\substack{m' = 1 \\ m' \neq m}}^{M} P\bigl[\vec{y} \in D_{m'} \bigm| \vec{x}_m\bigr] \Bigr] \leq \sum_{\substack{m' = 1 \\ m' \neq m}}^{M} \overline{P_{m \to m'}}.$$

Because random coding is symmetric (all codewords are drawn from the same distribution $p(x)$), $\overline{P_{m \to m'}}$ does not depend on $m$:

$$\overline{P_{e\mid m}} \leq (M-1)\, \overline{P_{m \to m'}} \leq (M-1)\, 2^{-n R_0(p,\lambda)}.$$

Recall that $R_c = \frac{k}{n}$ denotes the code rate.

If $M = 2^k = 2^{n R_c}$ (with $R_c$ in bits per symbol), then for large $M$, $(M-1)\approx M$. Hence,

$$\overline{P_{e\mid m}} \leq M\, 2^{-n R_0(p,\lambda)} = 2^{n R_c}\, 2^{-n R_0(p,\lambda)} = 2^{-n \bigl(R_0(p,\lambda)-R_c\bigr)}.$$

Because this holds for any $m$, the overall average error probability $\bar{P}_e$ (averaged over all codewords in the ensemble) satisfies Proakis (2007, Eq. (6.8-19))

$$\boxed{ \bar{P}_e = \frac{1}{M} \sum_{m=1}^{M}\overline{P_{e\mid m}} \leq 2^{-n \bigl(R_0(p,\lambda)-R_c\bigr)}, \quad \lambda > 0. }$$

Implications for Reliability and Code Existence

Thus, the function $R_0(p,\lambda)$ plays a pivotal role in bounding the random-coding error probability and in demonstrating that, for rates $R_c$ below capacity, there exist codes whose error probability goes to zero exponentially in $n$.

The Channel Cutoff Rate $R_0$

The maximum value of $R_0(p, \lambda)$, taken over all probability density functions $p(x)$ and all $\lambda > 0$, yields the quantity $R_0$, which is referred to as the channel cutoff rate.

The channel cutoff rate $R_0$ represents the maximum rate at which the average error probability of a randomly coded system can decay exponentially with the code length $n$.

Its derivation arises from the function $R_0(p,\lambda)$, which provides an exponent bound on the average pairwise error probability (PEP). Specifically,

$$\overline{P_{m \to m'}} \le 2^{-n R_0(p,\lambda)}.$$

We define the channel cutoff rate $R_0$ as Proakis (2007, Eq. (6.8-20)):

$$\boxed{ R_0 \triangleq \max_{p(x)} \sup_{\lambda > 0} R_0\bigl(p,\lambda\bigr), }$$

where

$$R_0\bigl(p,\lambda\bigr) = -\log_{2}\Bigl[\sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2}\Bigr]$$

and

$$\Delta^{(\lambda)}_{x_1 \to x_2} = \sum_{y \in \mathcal{Y}} p^{\lambda}\bigl(y \mid x_2\bigr)\, p^{1-\lambda}\bigl(y \mid x_1\bigr).$$

In the case of continuous alphabets (e.g., $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}$, as in AWGN channels), the sums become integrals:

$$R_0\bigl(p,\lambda\bigr) = -\log_{2} \Biggl[ \int_{\mathcal{X}}\int_{\mathcal{X}} p(x_1)\, p(x_2) \Bigl(\int_{\mathcal{Y}} p^{\lambda}\bigl(y \mid x_2\bigr)\, p^{1-\lambda}\bigl(y \mid x_1\bigr)\, dy \Bigr)\, dx_1\, dx_2 \Biggr].$$

Since $R_0$ is the maximum of $R_0(p,\lambda)$ over all $p(x)$ and $\lambda>0$, it represents the largest exponent that can be achieved by any random coding ensemble.

This provides a practical guideline for reliable communication rates below the channel capacity $C$.
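
A rough numeric sketch (not a definitive implementation) of this definition: it evaluates $R_0(p,\lambda)$ for a discrete memoryless channel given by its transition matrix and takes a grid supremum over $\lambda$, restricted here to $(0,1)$, for a fixed (uniform) input distribution; the BSC used as the test channel and the grid resolution are illustrative assumptions.

```python
# Numeric evaluation of R0(p, lambda) for a discrete memoryless channel, with a
# grid supremum over lambda for a fixed input distribution.
import math

def r0_of(p_x, channel, lam):
    """R0(p, lambda) = -log2 sum_{x1,x2} p(x1) p(x2) Delta^{(lambda)}_{x1->x2}."""
    total = 0.0
    for x1, px1 in enumerate(p_x):
        for x2, px2 in enumerate(p_x):
            delta = sum((channel[x2][y] ** lam) * (channel[x1][y] ** (1 - lam))
                        for y in range(len(channel[0])))
            total += px1 * px2 * delta
    return -math.log2(total)

# Illustrative test channel: BSC with crossover probability 0.05, uniform inputs.
p = 0.05
channel = [[1 - p, p], [p, 1 - p]]
p_x = [0.5, 0.5]

best_lam, best_r0 = max(((lam / 100, r0_of(p_x, channel, lam / 100))
                         for lam in range(1, 100)), key=lambda t: t[1])
print(f"grid supremum: lambda ~ {best_lam:.2f}, R0 ~ {best_r0:.4f} bits/symbol")
print("closed form 1 - log2(1 + 2*sqrt(p(1-p))):",
      1 - math.log2(1 + 2 * math.sqrt(p * (1 - p))))
```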

Discussion on the Supremum in the Definition of $R_0$

Recall that, loosely speaking, the supremum is the least upper bound of a set of real numbers.

Specifically, when the channel cutoff rate is defined as

$$R_0 = \max_{p(x)} \sup_{\lambda > 0} R_0(p, \lambda),$$

the $\sup$ over $\lambda > 0$ means that, for a fixed input probability density function $p(x)$, we consider the set of all possible values of $R_0(p, \lambda)$ as $\lambda$ varies over all positive real numbers.

The supremum $\sup_{\lambda > 0} R_0(p, \lambda)$ is the smallest real number that is greater than or equal to every value in this set.

Unlike the maximum, the supremum may not be achieved by any specific $\lambda$ if $R_0(p, \lambda)$ does not attain its least upper bound within the set of allowable $\lambda$.

In this context, the use of the supremum indicates that we are seeking the greatest possible value of $R_0(p, \lambda)$ for a given $p(x)$, even if that value is only approached asymptotically as $\lambda$ varies.

The overall expression for $R_0$ then involves maximizing this supremum over all possible $p(x)$, ensuring the highest achievable rate for exponential error decay in the communication system.

$R_0$ for Symmetric Channels

A symmetric channel is one where each row and column of the channel transition probability matrix $p(y\mid x)$ can be viewed as a permutation of any other row or column.

Under such symmetry, the optimal $\lambda$ that maximizes $R_0(p,\lambda)$ is $\lambda = \frac{1}{2}$.

In that case, the Chernov parameter $\Delta^{(1/2)}_{x_1\to x_2}$ becomes the Bhattacharyya parameter:

$$\Delta^{(1/2)}_{x_1 \to x_2} = \sum_{y \in \mathcal{Y}} \sqrt{p\bigl(y \mid x_1\bigr)\, p\bigl(y \mid x_2\bigr)}.$$

Hence, we have Proakis (2007, Eq. (6.8-21))

$$\boxed{ R_0 = \max_{p(x)} \Bigl\{ - \log_{2} \Bigl[ \mathbb{E}\bigl(\Delta^{(1/2)}_{X_1,X_2}\bigr) \Bigr] \Bigr\}. }$$

Expansion of the Expectation

Since $X_1$ and $X_2$ are i.i.d. with distribution $p(x)$,

$$\mathbb{E}\bigl[\Delta^{(1/2)}_{X_1,X_2}\bigr] = \sum_{x_1\in \mathcal{X}}\sum_{x_2\in \mathcal{X}} p(x_1)\, p(x_2) \sum_{y \in \mathcal{Y}} \sqrt{p\bigl(y \mid x_1\bigr)\, p\bigl(y \mid x_2\bigr)}.$$

Changing the order of summation:

$$= \sum_{y \in \mathcal{Y}} \Bigl( \sum_{x_1\in \mathcal{X}} p(x_1) \sqrt{p\bigl(y \mid x_1\bigr)}\Bigr) \Bigl( \sum_{x_2\in \mathcal{X}} p(x_2) \sqrt{p\bigl(y \mid x_2\bigr)}\Bigr).$$

Define

$$s(y) = \sum_{x \in \mathcal{X}} p(x) \sqrt{p(y \mid x)},$$

so

$$\mathbb{E}\bigl[\Delta^{(1/2)}_{X_1,X_2}\bigr] = \sum_{y \in \mathcal{Y}} \bigl[s(y)\bigr]^2.$$

Therefore,

$$\boxed{ R_0 = \max_{p(x)} \Bigl\{ - \log_{2} \Bigl[ \sum_{y \in \mathcal{Y}} \Bigl(\sum_{x \in \mathcal{X}} p(x) \sqrt{p(y \mid x)} \Bigr)^2 \Bigr] \Bigr\}. }$$

For a symmetric channel, the uniform distribution $p(x)=1/Q$ (where $Q=\lvert \mathcal{X}\rvert$) typically maximizes $R_0$.

Substituting $p(x)=1/Q$:

$$s(y) = \sum_{x \in \mathcal{X}} \frac{1}{Q} \sqrt{p(y \mid x)},$$

and

$$R_0 = -\log_{2} \Bigl[ \sum_{y \in \mathcal{Y}} \Bigl( \sum_{x \in \mathcal{X}} \frac{1}{Q} \sqrt{p(y \mid x)} \Bigr)^2 \Bigr].$$

We can factor out $1/Q$:

$$\begin{split} &= - \log_{2} \Bigl[ \frac{1}{Q^2} \sum_{y \in \mathcal{Y}} \Bigl( \sum_{x \in \mathcal{X}} \sqrt{p(y \mid x)}\Bigr)^2 \Bigr] \\ &= 2 \log_{2}Q - \log_{2} \Bigl[ \sum_{y \in \mathcal{Y}} \Bigl( \sum_{x \in \mathcal{X}} \sqrt{p(y \mid x)}\Bigr)^2 \Bigr]. \end{split}$$

Since

$$\sum_{y\in \mathcal{Y}} \Bigl( \sum_{x\in \mathcal{X}} \sqrt{p(y\mid x)}\Bigr)^2 \ge \sum_{y\in \mathcal{Y}} \sum_{x\in \mathcal{X}} p(y\mid x) = Q$$

(because the square of a sum of nonnegative terms is at least the sum of their squares, and $\sum_y p(y \mid x) = 1$ for each $x$), it follows that Proakis (2007, Eq. (6.8-25))

$$\boxed{ R_0 \le \log_{2}Q, }$$

which is the maximal possible rate for an alphabet of size $Q$.

$R_0$ for Symmetric Binary-Input Channels

For a symmetric binary-input channel ($\mathcal{X}=\{0,1\}$, $Q=2$), such as the binary symmetric channel (BSC) or the binary-input AWGN channel, the Bhattacharyya parameter $\Delta^{(1/2)}_{x_1,x_2}$ simplifies due to symmetry:

$$\Delta_{x_1, x_2} = \begin{cases} \Delta, & \text{if } x_1 \neq x_2,\\ 1, & \text{if } x_1 = x_2, \end{cases}$$

with specific examples, derived earlier, including $\Delta = 2\sqrt{p(1-p)}$ for the BSC and $\Delta = e^{-\mathcal{E}_c/N_0}$ for BPSK over the unquantized AWGN channel.

When $p(x)=\frac{1}{2}$, the average of $\Delta_{X_1,X_2}$ is:

$$\begin{split} \mathbb{E}\bigl[\Delta^{(1/2)}_{X_1,X_2}\bigr] &= \sum_{x_1,x_2 \in \{0,1\}} \frac{1}{2} \times \frac{1}{2}\, \Delta_{x_1,x_2} \\ &= \frac{1}{4}(1 + \Delta + \Delta + 1) = \frac{2 + 2 \Delta}{4} = \frac{1 + \Delta}{2}. \end{split}$$

Hence, as in Proakis (2007, Eq. (6.8-27)),

$$\boxed{ R_0 = -\log_{2} \Bigl(\frac{1 + \Delta}{2}\Bigr) = 1 - \log_{2}(1 + \Delta). }$$

Because $0 \le \Delta \le 1$ in these channels (e.g., $\Delta=1$ for $p=\frac{1}{2}$ in a very noisy BSC, and $\Delta \to 0$ as $\mathcal{E}_c/N_0 \to \infty$ in AWGN), $R_0$ ranges from 0 (when the channel is very noisy) to 1 (when the channel is error-free).
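
A compact illustration (parameter values assumed) of the formula $R_0 = 1 - \log_2(1+\Delta)$ for the two binary-input examples discussed above:

```python
# R0 = 1 - log2(1 + Delta) for two symmetric binary-input channels.
import math

def r0_binary(delta):
    return 1 - math.log2(1 + delta)

p = 0.05                                   # BSC crossover probability (illustrative)
ec_over_n0 = 1.0                           # BPSK per-symbol SNR Ec/N0 (illustrative)
print("BSC       R0 =", r0_binary(2 * math.sqrt(p * (1 - p))))
print("BPSK/AWGN R0 =", r0_binary(math.exp(-ec_over_n0)))
```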

Significance of $R_0$ in the Considered Channel Model

Thus, the channel cutoff rate $R_0$ quantifies a fundamental benchmark: a rate below which exponential decay of the random-coding error probability is guaranteed, and hence a lower bound on the rates achievable with reliable communication.

General Significance of $R_0$

Exponential Error Decay

Exponential error decay in a communication system refers to the behavior of the average pairwise error probability (PEP), denoted $\overline{P_{m \to m'}}$, which decreases exponentially with the code length $n$.

Specifically, as mentioned above, we have:

$$\overline{P_{m \to m'}} \leq 2^{-n R_0(p, \lambda)},$$

where $R_0(p, \lambda)$ is the function related to the channel cutoff rate, and $n$ is the length of the code.

The term $2^{-n R_0(p, \lambda)}$ indicates that as $n$ increases, the error probability decreases exponentially at a rate determined by $R_0(p, \lambda)$.

Recall that the channel cutoff rate $R_0$, defined as

$$R_0 = \max_{p(x)} \sup_{\lambda > 0} R_0(p, \lambda),$$

represents the maximum rate at which this exponential decay can occur.

Here, $R_0(p, \lambda)$ is given by:

$$R_0(p, \lambda) = -\log_{2}\left[\sum_{x_1 \in \mathcal{X}} \sum_{x_2 \in \mathcal{X}} p(x_1)\, p(x_2)\, \Delta^{(\lambda)}_{x_1 \to x_2}\right],$$

with

$$\Delta^{(\lambda)}_{x_1 \to x_2} = \sum_{y \in \mathcal{Y}} p^{\lambda}(y \mid x_2)\, p^{1-\lambda}(y \mid x_1).$$

This exponential decay implies that the likelihood of mistaking one codeword for another diminishes rapidly as the code length increases, provided the communication rate is at or below $R_0$.

Impacts of Exponential Error Decay

Exponential error decay is crucial in communication systems for the following reasons:

Highest Achievable Rate and Error Decay

The channel cutoff rate $R_0$ represents the highest rate (in bits per symbol) at which the average pairwise error probability can be guaranteed to decay exponentially with code length $n$.

When the communication rate $R$ (where the number of codewords is $M = 2^{nR}$) is less than or equal to $R_0$, the error probability $\overline{P_{m \to m'}}$ exhibits exponential decay of the form $2^{-n R_0(p, \lambda)}$, where $R_0(p, \lambda) \leq R_0$.

At the rate $R = R_0$, the system achieves the maximum rate for which this exponential decay is guaranteed.

If the rate exceeds $R_0$, the error probability may still decrease with $n$, but the decay is no longer guaranteed to be exponential by this argument, leading to a slower reduction in errors and potentially unacceptable error rates for reliable communication.

Thus, operating at $R \leq R_0$ ensures that the system can achieve both a high data rate and a rapidly decreasing error probability, balancing efficiency and reliability in the communication system.

Example: $R_0$ of the BSC

This example is based on Proakis (2007, Example 6.8-2).

For a binary symmetric channel (BSC) with crossover probability $p$, the input and output alphabets are $\mathcal{X} = \mathcal{Y} = \{0, 1\}$.

The transition probabilities are:

$$P[y = 1 \mid x = 0] = p, \quad P[y = 0 \mid x = 1] = p, \quad \text{and} \quad P[y = x] = 1 - p.$$

Due to symmetry, the optimal $\lambda$ in the Chernov (Gallager) bound is $\lambda = \frac{1}{2}$ (the Bhattacharyya bound), and the optimal input distribution is uniform, $p(x)=\frac{1}{2}$.

We use the general form for symmetric channels:

$$R_0 = 2 \log_{2}(Q) - \log_{2} \Bigl[ \sum_{y \in \mathcal{Y}} \Bigl( \sum_{x \in \mathcal{X}} \sqrt{p(y \mid x)} \Bigr)^2 \Bigr], \quad Q = \lvert \mathcal{X} \rvert = 2.$$

Term for $y=0$:

$$\sum_{x=0,1} \sqrt{p(y=0 \mid x)} = \sqrt{p(0 \mid 0)} + \sqrt{p(0 \mid 1)} = \sqrt{1 - p} + \sqrt{p}.$$

Its square is

$$\Bigl(\sqrt{1 - p} + \sqrt{p}\Bigr)^2.$$

Term for $y=1$:

$$\sum_{x=0,1} \sqrt{p(y=1 \mid x)} = \sqrt{p(1 \mid 0)} + \sqrt{p(1 \mid 1)} = \sqrt{p} + \sqrt{1 - p}.$$

Its square is

$$\Bigl(\sqrt{p} + \sqrt{1 - p}\Bigr)^2.$$

Summing over $y \in \{0,1\}$:

$$\sum_{y=0,1} \Bigl(\sum_{x=0,1} \sqrt{p(y \mid x)}\Bigr)^2 = \Bigl(\sqrt{1 - p} + \sqrt{p}\Bigr)^2 + \Bigl(\sqrt{p} + \sqrt{1 - p}\Bigr)^2.$$

Notice that both expressions are identical, so

$$= 2 \Bigl(\sqrt{1 - p} + \sqrt{p}\Bigr)^2 = 2 \Bigl(1 - p + 2 \sqrt{p(1 - p)} + p\Bigr) = 2 \Bigl(1 + 2 \sqrt{p (1 - p)}\Bigr).$$

Substitute into the formula for $R_0$:

$$R_0 = 2 \log_{2}(2) - \log_{2} \Bigl[ 2 \bigl(1 + 2 \sqrt{p (1 - p)}\bigr) \Bigr] = 2 - \bigl[ 1 + \log_{2}\bigl(1 + 2 \sqrt{p (1 - p)}\bigr) \bigr].$$

Thus, after simplifying, we have Proakis (2007, Eq. (6.8-29))

$$\boxed{ R_0 = 1 - \log_{2}\bigl(1 + 2 \sqrt{p (1 - p)}\bigr). }$$

Recognizing the Bhattacharyya parameter $\Delta = 2 \sqrt{p (1 - p)} = \sqrt{4 p (1 - p)}$, we can write:

$$R_0 = 1 - \log_{2}\bigl(1 + \Delta\bigr).$$

An alternative form is

$$R_0 = \log_{2} \Bigl[ \frac{2}{1 + \sqrt{4 p (1 - p)}} \Bigr] = \log_{2} \Bigl[ \frac{1}{1 + \Delta} \Bigr] + 1,$$

and both expressions are completely equivalent.
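
A quick numeric comparison (crossover values chosen arbitrarily) of this cutoff rate against the BSC capacity $C = 1 - H_b(p)$ discussed next, illustrating $R_0 \le C$:

```python
# BSC cutoff rate R0 = 1 - log2(1 + 2*sqrt(p(1-p))) vs. capacity C = 1 - Hb(p).
import math

def hb(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def r0_bsc(p):
    return 1 - math.log2(1 + 2 * math.sqrt(p * (1 - p)))

for p in (0.0, 0.01, 0.05, 0.1, 0.2, 0.5):
    print(f"p={p:4.2f}  R0={r0_bsc(p):.4f}  C={1 - hb(p):.4f}")
```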

Comparing $R_0$ to the Capacity $C$

For the BSC, the capacity is

$$C = 1 - H_{b}(p) \quad \text{where} \quad H_{b}(p) = - p \log_{2}(p) - (1 - p) \log_{2}(1 - p).$$

It can be shown that $R_0 \leq C$ for all $0 \le p \le 1$, with equality only at the extremes $p=0$ and $p=1$ (where both equal 1, the channel being error-free or deterministically inverted) and at $p=\frac{1}{2}$ (where both equal 0, an entirely useless channel).

$R_0$ and $C$ as Functions of $\gamma_b$

Consider a practical scenario where the BSC arises from binary phase-shift keying (BPSK) transmission over an AWGN channel, followed by a hard-decision (binary) quantizer in the receiver.

Define the bit-energy-to-noise-density ratio $\gamma_b = \frac{\mathcal{E}_b}{N_0}$, and recall that $\mathcal{E}_c = R_c \mathcal{E}_b$ when each symbol carries $R_c$ information bits.

The crossover probability of the equivalent BSC, and hence both $R_0$ and $C$, can then be expressed as functions of $\gamma_b$, as sketched below.
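
A hedged sketch of this hard-decision scenario, under the standard assumption that BPSK with hard decisions yields a BSC with crossover probability $p = Q\bigl(\sqrt{2\mathcal{E}_c/N_0}\bigr)$; the code rate $R_c = \tfrac{1}{2}$ and the $\gamma_b$ grid below are illustrative choices, not values from the text.

```python
# R0 and C of the hard-decision BSC obtained from BPSK over AWGN, as functions of gamma_b.
import math

def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def hb(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

Rc = 0.5                                            # assumed code rate (bits/symbol)
for gamma_b_dB in (0, 2, 4, 6, 8, 10):
    gamma_b = 10 ** (gamma_b_dB / 10)
    p = q_func(math.sqrt(2 * Rc * gamma_b))         # hard-decision crossover probability
    r0 = 1 - math.log2(1 + 2 * math.sqrt(p * (1 - p)))
    cap = 1 - hb(p)
    print(f"gamma_b = {gamma_b_dB:2d} dB  p = {p:.4f}  R0 = {r0:.4f}  C = {cap:.4f}")
```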

Example: $R_0$ for BPSK over AWGN (Without Quantization)

This example is based on Proakis (2007, Example 6.8-3).

In a continuous (unquantized) AWGN channel, the receiver processes $y$ directly, using the conditional PDFs $p(y \mid \pm\sqrt{\mathcal{E}_c})$ given earlier and a uniform input distribution $p(x) = \frac{1}{2}$.

Derivation of $R_0$

We define

$$g(y) = \sum_{x \in \{-\sqrt{\mathcal{E}_c},\, +\sqrt{\mathcal{E}_c}\}} \sqrt{p(y \mid x)} = \Bigl(\frac{1}{\pi N_0}\Bigr)^{1/4} \Bigl[ e^{-\frac{(y + \sqrt{\mathcal{E}_c})^2}{2 N_0}} + e^{-\frac{(y - \sqrt{\mathcal{E}_c})^2}{2 N_0}} \Bigr],$$

since $\sqrt{p(y \mid x)} = (\pi N_0)^{-1/4}\, e^{-(y - x)^2/(2 N_0)}$.

Squaring,

$$g(y)^2 = \frac{1}{\sqrt{\pi N_0}} \Bigl[ e^{-\frac{(y + \sqrt{\mathcal{E}_c})^2}{N_0}} + 2 e^{-\frac{y^2 + \mathcal{E}_c}{N_0}} + e^{-\frac{(y - \sqrt{\mathcal{E}_c})^2}{N_0}} \Bigr],$$

where the cross term uses $(y+\sqrt{\mathcal{E}_c})^2 + (y-\sqrt{\mathcal{E}_c})^2 = 2(y^2 + \mathcal{E}_c)$.

Each term is a standard Gaussian integral: the two outer terms each integrate to $\sqrt{\pi N_0}$, and the middle term integrates to $2 e^{-\mathcal{E}_c/N_0} \sqrt{\pi N_0}$. Hence

$$\int_{-\infty}^{\infty} g(y)^2\, dy = \frac{1}{\sqrt{\pi N_0}} \Bigl[ \sqrt{\pi N_0} + 2 e^{-\frac{\mathcal{E}_c}{N_0}} \sqrt{\pi N_0} + \sqrt{\pi N_0} \Bigr] = 2 \bigl(1 + e^{-\frac{\mathcal{E}_c}{N_0}}\bigr).$$

Substituting into the symmetric-channel formula (with sums replaced by integrals),

$$R_0 = 2 \log_{2}(2) - \log_{2}\Bigl[ \int_{-\infty}^{\infty} g(y)^2\, dy \Bigr] = 2 - \log_{2}\Bigl[ 2 \bigl(1 + e^{-\frac{\mathcal{E}_c}{N_0}}\bigr) \Bigr],$$

we find:

$$R_0 = 2 - \bigl[ 1 + \log_{2}\bigl(1 + e^{-\frac{\mathcal{E}_c}{N_0}}\bigr) \bigr] = \log_{2} \Bigl[ \frac{4}{2 \bigl(1 + e^{-\frac{\mathcal{E}_c}{N_0}}\bigr)} \Bigr] = \log_{2} \Bigl[ \frac{2}{1 + e^{-\frac{\mathcal{E}_c}{N_0}}} \Bigr].$$

Equivalently, we have Proakis (2007, Eq. (6.8-35))

$$\boxed{ R_0 = 1 - \log_{2}\bigl[1 + e^{-\frac{\mathcal{E}_c}{N_0}}\bigr]. }$$

Recognizing $\Delta = e^{-\frac{\mathcal{E}_c}{N_0}}$, we again see the pattern:

$$R_0 = 1 - \log_{2}\bigl(1 + \Delta\bigr).$$

Relating to the Bit Energy $\gamma_b = \frac{\mathcal{E}_b}{N_0}$

When each symbol has energy $\mathcal{E}_c$ and carries $R_c$ information bits, then $\mathcal{E}_c = R_c \mathcal{E}_b$.

In the limit where the code rate $R_c$ approaches $R_0$, this gives the implicit relation

$$R_0 = 1 - \log_{2} \Bigl[ 1 + \exp\bigl(-R_0\, \gamma_b\bigr) \Bigr],$$

with $\gamma_b = \mathcal{E}_b / N_0$.
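
A minimal fixed-point sketch (with illustrative $\gamma_b$ values) for solving this implicit relation numerically; note that a strictly positive solution exists only when $\gamma_b > 2\ln 2$ (about 1.42 dB), below which $R_0 = 0$ is the only solution.

```python
# Fixed-point solution of R0 = 1 - log2(1 + exp(-R0 * gamma_b)).
import math

def solve_r0(gamma_b, iters=200):
    r0 = 0.5                                   # initial guess
    for _ in range(iters):
        r0 = 1 - math.log2(1 + math.exp(-r0 * gamma_b))
    return r0

# gamma_b values above the ~1.42 dB threshold, so a positive solution exists.
for gamma_b_dB in (2, 4, 6, 8):
    gamma_b = 10 ** (gamma_b_dB / 10)
    print(f"gamma_b = {gamma_b_dB} dB  ->  R0 = {solve_r0(gamma_b):.4f} bits/symbol")
```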

Comparison with Capacity

The capacity $C$ for unquantized BPSK over AWGN involves more sophisticated integrals (e.g., involving $\int p(y\mid x)\log_2\frac{p(y\mid x)}{p(y)}\, dy$).

Numerically, it is always found that

$$R_0 < C,$$

confirming that while $R_0$ gives the largest rate for which an exponential decay of the random-coding error probability can be straightforwardly shown, the true channel capacity $C$ is strictly higher.

References
  1. Proakis, J. (2007). Digital Communications (5th ed.). McGraw-Hill Professional.