1 Introduction
Function estimation using shrinkage estimators in the wavelet domain was initiated in the work of David Donoho and Iain Johnstone in the early 1990s. Since then, various shrinkage methods have been proposed. In this article, we propose and investigate a novel approach to estimation in the wavelet domain. The approach brings together the classic principle of Γ-minimaxity, Huber's famous ϵ-contamination class [23], the mathematics of decision theory, and a detailed experimental and graphical analysis of the performance of our proposal. In this way, the article offers a comprehensive treatment of the problem of estimation in the wavelet domain.
For the rest of this introductory section, we review the fundamentals of Γ-minimax estimation, wavelet shrinkage, and Bayesian approaches to wavelet shrinkage. This section is mostly expository and provides the necessary background and motivation.
1.1 Γ-Minimax Theory
The Γ-minimax paradigm, originally proposed by Robbins [29], deals with the problem of selecting decision rules in statistical inference. The Γ-minimax approach falls between the Bayes paradigm, which selects procedures that work well “on average a posteriori” for a specific prior, and the minimax paradigm, which guards against least favorable outcomes, however unlikely. The approach has evolved from seminal papers in the fifties [29, 21] and early sixties, through extensive research on foundations and parametric families in the seventies, to a branch of Bayesian robustness theory in the eighties and nineties. A comprehensive discussion of the Γ-minimax approach can be found in [5, 6]; see also the fundamental contribution [8].
The Γ-minimax paradigm incorporates prior information about the statistical model through a family of plausible priors, denoted by Γ, rather than through a single prior. Elicitation of “prior families” is often encountered in practice. Given the family of priors, the decision maker selects a rule that is optimal with respect to the least favorable prior in the family.
We note that least favorable priors typically do not exist on unbounded parameter spaces. For example, there is no least favorable prior for estimating an unconstrained normal or Poisson mean. A least favorable prior does exist for estimating a binomial parameter, whose parameter space is compact, but no amount of intuition can predict its form. Moreover, as soon as the loss is changed, say to normalized squared error, the least favorable prior changes. Intuition about least favorable priors is not portable across problems; the matter is delicate and problem specific. For example, in a quite general setup, Clarke and Barron [13] show that Jeffreys' prior is asymptotically least favorable when the loss is the Kullback–Leibler distance, thus providing theoretical justification for reference priors.
Inference of this kind is often interpreted in terms of game theory. Formally, if $\mathcal{D}$ is the set of all decision rules and Γ is a family of prior distributions over the parameter space Θ, then a rule ${\delta ^{\ast }}\in \mathcal{D}$ is Γ-minimax if
(1.1)
\[ \underset{\delta \in \mathcal{D}}{\inf }\underset{\pi \in \Gamma }{\sup }r(\pi ,\delta )=\underset{\pi \in \Gamma }{\sup }r(\pi ,{\delta ^{\ast }})=r({\pi ^{\ast }},{\delta ^{\ast }}),\]
where $r(\pi ,\delta )={E^{\theta }}\left[{E^{X\mid \theta }}\mathcal{L}(\theta ,\delta )\right]={E^{\theta }}R(\theta ,\delta )$ is the Bayes risk under the loss $\mathcal{L}(\theta ,\delta )$. Here $R(\theta ,\delta )={E^{X\mid \theta }}\mathcal{L}(\theta ,\delta )$ denotes the frequentist risk of δ, ${\pi ^{\ast }}$ is the least favorable prior, and $\mathcal{L}(\theta ,\delta )$ is the loss function, often the squared error loss ${(\theta -\delta )^{2}}$. Note that when Γ is the set of all priors, the Γ-minimax rule coincides with the minimax rule, if a minimax rule exists; when Γ contains a single prior, the Γ-minimax rule coincides with the Bayes rule with respect to that prior. When the decision problem, viewed as a statistical game, has a value, that is, when ${\inf _{\delta \in \mathcal{D}}}{\sup _{\pi \in \Gamma }}r(\pi ,\delta )={\sup _{\pi \in \Gamma }}{\inf _{\delta \in \mathcal{D}}}r(\pi ,\delta )$, the Γ-minimax solution coincides with the Bayes rule with respect to the least favorable prior. For the interplay between the Γ-minimax and Bayesian paradigms, see [6]. A review of Γ-minimax estimation can be found in [32].

In a nutshell, Γ-minimax is a philosophical compromise between subjective Bayes analysis with a single prior and the conservative approach of not having a prior at all.
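To make the inf-sup problem in (1.1) concrete, here is a toy numerical illustration of our own construction (not from the paper): observe $X\sim N(\theta ,1)$, restrict attention to linear rules ${\delta _{a}}(x)=ax$, and let Γ be the symmetric two-point priors at $\pm t$, $t\in [0,1]$. A direct computation gives $r({\pi _{t}},{\delta _{a}})={a^{2}}+{(a-1)^{2}}{t^{2}}$, so the game can be solved by grid search.

```python
# Toy illustration of the inf-sup in (1.1), under assumptions NOT from the
# paper: X ~ N(theta, 1), linear rules delta_a(x) = a*x, and Gamma equal to
# the symmetric two-point priors at +-t with t in [0, 1].  Then
# r(pi_t, delta_a) = a^2 + (a - 1)^2 * t^2.
import numpy as np

a_grid = np.linspace(0, 1, 1001)   # candidate rules delta_a
t_grid = np.linspace(0, 1, 1001)   # candidate priors pi_t in Gamma

# Bayes risk matrix: variance a^2 plus squared bias (a - 1)^2 * t^2.
r = a_grid[:, None] ** 2 + (a_grid[:, None] - 1) ** 2 * t_grid[None, :] ** 2

worst = r.max(axis=1)              # sup over priors, for each rule
a_star = a_grid[worst.argmin()]    # Gamma-minimax rule
print(round(a_star, 3), round(worst.min(), 3))  # -> 0.5 0.5
```

The grid search recovers the saddle point ${a^{\ast }}=1/2$ with least favorable ${t^{\ast }}=1$ and game value $1/2$, matching the closed-form calculation.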
1.2 Wavelet Shrinkage
We apply the Γ-minimax approach to the classical nonparametric regression problem
(1.2)
\[ {y_{i}}=f({t_{i}})+\sigma {\varepsilon _{i}},\hspace{1em}i=1,\dots ,n,\]
where ${t_{i}}$, $i=1,\dots ,n$, are deterministic equispaced design points on $[0,1]$, the random errors ${\varepsilon _{i}}$ are i.i.d. standard normal random variables, and the noise level ${\sigma ^{2}}$ may or may not be known. The interest is in recovering the function f from the observations ${y_{i}}$. Additionally, we assume that the unknown signal f has bounded ${L^{2}}$-energy ${\textstyle\int _{[0,1]}}\hspace{0.1667em}{f^{2}}(t)\,dt$, and hence its wavelet coefficients take values in bounded intervals. After applying a linear orthogonal wavelet transform, the model in (1.2) becomes (see, e.g., [16])
(1.3)
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle {c_{{J_{0}},k}}& \displaystyle =& \displaystyle {\theta _{{J_{0}},k}}+\sigma {\epsilon _{{J_{0}},k}},\hspace{2.5pt}\hspace{2.5pt}k=0,\dots ,{2^{{J_{0}}}}-1,\\ {} \displaystyle {d_{j,k}}& \displaystyle =& \displaystyle {\theta _{j,k}}+\sigma {\epsilon _{j,k}},\hspace{2.5pt}\hspace{2.5pt}j={J_{0}},\dots ,J-1,\\ {} & & \displaystyle k=0,\dots ,{2^{j}}-1,\end{array}\]
where ${d_{j,k}}$ (${c_{j,k}}$), ${\theta _{j,k}}$ and ${\epsilon _{j,k}}$ are the wavelet (scaling) coefficients at resolution level j and location k corresponding to y, f and ε, respectively; ${J_{0}}$ and $J-1$ are the coarsest and finest levels of detail in the wavelet decomposition. If the ϵ's are i.i.d. standard normal, an arbitrary wavelet coefficient from (1.3) can be modeled as
(1.4)
\[ d=\theta +\sigma \epsilon ,\]
where, due to the (approximate) independence of the coefficients, we omit the indices j, k. The prior information on the energy bound of the signal implies that a wavelet coefficient θ corresponding to the signal part in (1.2) takes its values in a bounded interval, say $\Theta =[-m(j),m(j)]$, which depends on the level j.

Wavelet shrinkage rules have been studied extensively in the literature, but mostly when no additional information on the parameter space Θ is available. For the implementation of wavelet methods in nonparametric regression problems, we also refer to [4], where several methods are described and numerically compared.
1.3 Bayesian Model in the Wavelet Domain
Bayesian shrinkage methods in the wavelet domain have received considerable attention in recent years. Depending on the prior, Bayes rules are shrinkage rules. The shrinkage process is defined as follows: a shrinkage rule is applied in the wavelet domain, and the observed wavelet coefficients d are replaced with their shrunken versions $\hat{\theta }=\delta (d)$. In the subsequent step, the coefficients are transformed back to the domain of the original data by the inverse wavelet transform, resulting in a smoothing of the data. The shape of the particular rule $\delta (\cdot )$ influences the denoising performance.
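The pipeline just described (transform, shrink the detail coefficients, invert) can be sketched in a few lines. The following is a minimal NumPy implementation using an orthonormal Haar transform; the shrinkage rule $\delta (\cdot )$ is plain soft thresholding, used only as a placeholder, not the Γ-minimax rule developed later in the paper.

```python
# Minimal sketch of the wavelet-shrinkage pipeline: forward transform,
# shrink detail coefficients, inverse transform.  Orthonormal Haar DWT in
# plain NumPy; soft thresholding stands in for the shrinkage rule delta(.).
import numpy as np

def haar_analysis(x, levels):
    """Forward Haar DWT: coarse scaling coefficients plus detail arrays."""
    details = []
    c = np.asarray(x, dtype=float)
    for _ in range(levels):
        a, b = c[0::2], c[1::2]
        details.append((a - b) / np.sqrt(2.0))  # detail (wavelet) coefficients
        c = (a + b) / np.sqrt(2.0)              # scaling coefficients
    return c, details

def haar_synthesis(c, details):
    """Inverse Haar DWT."""
    for d in reversed(details):
        out = np.empty(2 * c.size)
        out[0::2] = (c + d) / np.sqrt(2.0)
        out[1::2] = (c - d) / np.sqrt(2.0)
        c = out
    return c

rng = np.random.default_rng(0)
n = 1024
f = np.where(np.arange(n) < n // 2, 1.0, -1.0)   # step test signal
y = f + 0.5 * rng.standard_normal(n)             # noisy observations

c, details = haar_analysis(y, levels=6)
lam = 0.5 * np.sqrt(2 * np.log(n))               # illustrative threshold
details = [np.sign(d) * np.maximum(np.abs(d) - lam, 0.0) for d in details]
f_hat = haar_synthesis(c, details)

print(np.mean((f_hat - f) ** 2) < np.mean((y - f) ** 2))  # denoising helps
```

On this step signal, whose Haar detail coefficients are sparse, thresholding removes most of the noise while preserving the jump.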
Bayesian models on the wavelet coefficients have proved capable of incorporating some prior information about the unknown signal, such as smoothness, periodicity, sparseness, self-similarity and, for some particular bases (e.g., Haar), monotonicity.
The shrinkage is usually achieved by eliciting a single prior distribution π on the space of parameters Θ, and then choosing an estimator $\hat{\theta }=\delta (d)$ that minimizes the Bayes risk with respect to the adopted prior.
It is well known that most noiseless signals encountered in practical applications have, at each resolution level, empirical distributions of wavelet coefficients that are centered at and peaked at zero. A realistic Bayesian model that takes this prior knowledge into account should employ a prior distribution whose prior predictive distribution is in reasonable agreement with the observations. A realistic prior distribution on the wavelet coefficient θ is given by
(1.5)
\[ \pi (\theta )=\epsilon {\delta _{0}}+(1-\epsilon )\xi (\theta ),\]
where ${\delta _{0}}$ is a point mass at zero, ξ is a symmetric distribution on the parameter space Θ, and ϵ is a fixed number in $[0,1]$, usually level dependent, that regulates the amount of shrinkage for values of d close to 0. Priors for wavelet coefficients as in (1.5) have been considered by Vidakovic and Ruggeri [33]. Prior literature on wavelet shrinkage using Bayesian ideas includes important work by [12, 14, 30, 19], among others. The review by Reményi and Vidakovic [28] surveys a range of Bayesian wavelet shrinkage methods and provides numerous references on Bayesian wavelet shrinkage strategies.
Our approach differs from all of this prior literature. The motivation is as follows.

It is clear that specifying a single prior distribution π on the parameter space Θ can never be done exactly. Indeed, prior knowledge of real phenomena always contains uncertainty, and a multitude of prior distributions can match the prior belief; on the basis of partial knowledge about the signal, it is possible to elicit only a family of plausible priors, Γ. From a robust Bayesian point of view, the choice of a particular rule δ should not be influenced by the choice of a particular prior, as long as that prior is in agreement with our prior belief. Several approaches have been considered for ensuring the robustness of a specific rule, Γ-minimax being one compromise.
In this paper, we incorporate prior information on the boundedness of the energy of the signal (the ${L_{2}}$-norm of the regression function). Prior information on the energy bound often exists in real-life problems, and it can be modelled by the assumption that the parameter space Θ is bounded. Estimation of a bounded normal mean has been considered in numerous papers, e.g., [20, 7, 11, 15, 9, 17], and in the classic text of Ibragimov and Hasminskii [24].
It is well known that estimating a bounded normal mean is a challenging problem. In our context, even if the structure of the prior (1.5) is supported by the analysis of the empirical distribution of the wavelet coefficients, a precise elicitation of the distribution ξ cannot be done without some kind of approximation. Any symmetric distribution supported on the bounded set, say $[-m,m]$, is a possible candidate for ξ.
Let Γ denote the family
(2.1)
\[ \Gamma =\left\{\pi :\pi =\epsilon {\delta _{0}}+(1-\epsilon )\xi ,\hspace{2.5pt}\xi \in {\Gamma _{S[-m,m]}}\right\},\]
where ${\Gamma _{S[-m,m]}}$ is the class of all symmetric distributions supported on $[-m,m]$, ${\delta _{0}}$ is a point mass at zero, and ϵ is a fixed constant between 0 and 1. We also require that the distribution ξ have no atom at 0.
We consider two models; both assume that a wavelet coefficient follows a normal distribution (a statement about the distribution of the noise), $d\sim \mathcal{N}(\theta ,{\sigma ^{2}})$. In Model I, the variance of the noise is assumed to be known, while in Model II the variance is unknown and is given a prior distribution. We keep the treatment of Model II brief, since a detailed treatment would require substantial additional space.
The rest of the paper is organized as follows. Section 2 contains mathematical aspects and results concerning the Γ-minimax rules. An exact risk analysis of the rule is discussed in Section 3. Section 4 proposes methods for eliciting the hyperparameters defining the model. One could, for instance, “estimate” ϵ, and even m itself, from simulated data, but such estimates would vary from run to run and none of the theoretical results of this paper would then stand; this could be the topic of an entirely different paper. The performance of the shrinkage rule in the wavelet domain and an application to a data set are given in Section 5. Section 6 provides the conclusions. Proofs are mostly deferred to Appendices A and B.
2 Characterizing Γ-Minimax Rules
In this section, we discuss the existence and characterization of Γ-minimax shrinkage rules that are Bayes with respect to least favorable priors on the interval $[-m,m]$. It will turn out that there exists a critical ${m^{\ast }}$ such that for $m\le {m^{\ast }}$ the least favorable prior is a three-point prior; that is, it assigns mass ϵ to 0 and mass $(1-\epsilon )/2$ to each of the two boundary points $\pm m$. It is in this way that three-point priors enter our wavelet shrinkage analysis. We look at two scenarios: when the variance of the noise is known (Model I) and when it is not known (Model II). For reasons of space, we limit our treatment of Model II to a short exposition.
We emphasize that below we treat Γ-minimax estimation on very general compact convex sets in finite-dimensional Euclidean spaces. By giving a complete and self-contained derivation of the existence of a least favorable prior and of the Γ-minimax estimator in this generality, when the parameter space is sufficiently small, we provide a useful unification of the Γ-minimax problem.
Figure 1
Γ-Minimax Rule for Model I. Left: Rules for $m=3$ and $\epsilon =0.5,0.7$, and 0.9. Right: Rules for $\epsilon =0.9$ and $m=3,4$, and 5.
2.1 Model I
Theorem 1.

(a) Let ${\mathcal{S}_{0}}$ be a compact convex set in ${R^{p}}$ containing the origin $\mathbf{0}$ in its interior. Let $\mathbf{a}\in {R^{p}}$ and $b\gt 0$, define ${\mathcal{S}_{\mathbf{a},b}}=\mathbf{a}+b\hspace{0.1667em}{\mathcal{S}_{0}}$, and let $\partial \hspace{0.1667em}{\mathcal{S}_{\mathbf{a},b}}$ denote the boundary of ${\mathcal{S}_{\mathbf{a},b}}$. Let $\mathbf{X}\sim \mathbf{f}(\mathbf{x}\mid \boldsymbol{\theta })$, a density in the p-parameter exponential family, and consider estimation of $\boldsymbol{\theta }$ under squared error loss. Then there exists a unique least favorable prior on ${\mathcal{S}_{\mathbf{a},b}}$, as well as on $\partial \hspace{0.1667em}{\mathcal{S}_{\mathbf{a},b}}$.

(b) There exists ${b^{\ast }}\gt 0$ such that for $b\le {b^{\ast }}$, the least favorable prior on $\partial \hspace{0.1667em}{\mathcal{S}_{\mathbf{a},b}}$ is also the least favorable prior on ${\mathcal{S}_{\mathbf{a},b}}$.

(c) If, in particular, $\mathbf{X}\sim {\mathbf{N}_{\mathbf{p}}}(\boldsymbol{\theta },\mathbf{I})$, and θ belongs to a p-dimensional ball centered at $\mathbf{0}$ with radius b, then for $b\le {b^{\ast }}$ the least favorable prior on the ball is simply the uniform distribution on the boundary of the ball.

(d) In the one-dimensional case, for the entire one-parameter exponential family, if θ belongs to a compact interval $[u,v]$, the least favorable prior is necessarily finitely supported.

(e) In the one-dimensional case with $X\sim N(\theta ,1)$, let Γ be the family in (2.1), where ϵ is fixed in $[0,1]$ and ξ is any distribution on $[-m,m]$ without an atom at 0. Then there exists ${m^{\ast }}\gt 0$ such that for $0\lt m\le {m^{\ast }}$ the least favorable prior is
\[ {\pi ^{\ast }}(\theta )=\epsilon {\delta _{0}}+\frac{1-\epsilon }{2}({\delta _{-m}}+{\delta _{m}}),\]
and the corresponding Γ-minimax rule is the Bayes rule with respect to ${\pi ^{\ast }}$,
(2.2)
\[ {\delta ^{\ast }}(d)=\frac{m\left({e^{md}}-{e^{-md}}\right)}{{e^{md}}+\frac{2\epsilon }{1-\epsilon }{e^{{m^{2}}/2}}+{e^{-md}}}.\]
Remark.
Figure 1 shows the shrinkage rule in (2.2) for selected values of the parameters ϵ and m. Note that the rules heavily shrink small coefficients but, unlike traditional shrinkage rules, remain bounded between $-m$ and $m$. The values of ${m^{\ast }}$ are given in Table 1.
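As an illustration, the posterior-mean (Bayes) rule under the three-point prior $\pi =\epsilon {\delta _{0}}+\frac{1-\epsilon }{2}({\delta _{-m}}+{\delta _{m}})$ with $d\mid \theta \sim N(\theta ,1)$ can be evaluated directly; the closed form used below follows from the normal likelihood ratios (a sketch, with σ = 1 assumed).

```python
# Sketch: Bayes (posterior-mean) rule under the three-point prior
# pi = eps*delta_0 + ((1-eps)/2)*(delta_{-m} + delta_{m}), d | theta ~ N(theta, 1).
# Dividing posterior weights by phi(d) leaves the tilts exp(+-m*d - m^2/2).
import numpy as np

def three_point_rule(d, m, eps):
    d = np.asarray(d, dtype=float)
    num = m * (np.exp(m * d) - np.exp(-m * d))
    den = np.exp(m * d) + (2 * eps / (1 - eps)) * np.exp(m ** 2 / 2) + np.exp(-m * d)
    return num / den

d = np.linspace(-6, 6, 13)
delta = three_point_rule(d, m=3.0, eps=0.9)
print(np.all(np.abs(delta) < 3.0))       # rule stays inside (-m, m)
print(np.allclose(delta, -delta[::-1]))  # antisymmetric in d
```

The rule shrinks small d heavily toward 0 (the more so the larger ϵ) and saturates near $\pm m$ for large $|d|$, matching the shapes shown in Figure 1.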
Table 1
Values of ${m^{\ast }}$ for both models for different values of ϵ.
ϵ  ${m^{\ast }}$ (Model I)  ${m^{\ast }}$ (Model II) 
0.0  1.05674  0.81758 
0.1  1.15020  0.91678 
0.2  1.27739  1.05298 
0.3  1.46988  1.25773 
0.4  1.84922  1.52579 
0.5  2.28384  1.74714 
0.6  2.41918  1.91515 
0.7  2.50918  2.05511 
0.8  2.58807  2.19721 
0.9  2.69942  2.40872 
0.95  2.81605  2.63323 
0.99  3.10039  3.24539 
2.2 Model II
In Model II, the variance ${\sigma ^{2}}$ is not known and is given an exponential prior. It is well known that the exponential distribution is the entropy maximizer in the class of all distributions supported on ${R^{+}}$ with a fixed first moment. The choice is thus noninformative, in the form of a maximum-entropy prior.
The model is:
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}& & \displaystyle [d\mid \theta ,{\sigma ^{2}}]\sim \mathcal{N}(\theta ,{\sigma ^{2}}),\\ {} & & \displaystyle [{\sigma ^{2}}]\sim \mathcal{E}(\mu ),\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\hspace{2.5pt}\left[f({\sigma ^{2}})=\frac{1}{\mu }\exp \left\{-\frac{{\sigma ^{2}}}{\mu }\right\}\right]\end{array}\]
The marginal likelihood is double exponential, as an exponential scale mixture of normals,
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle [d\mid \theta ]& \displaystyle \sim & \displaystyle \mathcal{DE}\left(\theta ,\sqrt{\frac{\mu }{2}}\right)\\ {} & & \displaystyle \left[g(d\mid \theta ,\mu )=\sqrt{\frac{1}{2\mu }}\exp \left\{-\sqrt{\frac{2}{\mu }}|d-\theta |\right\}\right].\end{array}\]
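The scale-mixture representation can be checked numerically. The following Monte Carlo sketch (with arbitrarily chosen values of μ, θ, and d) averages the normal density over draws of ${\sigma ^{2}}$ from the exponential prior and compares the result with the stated double-exponential density.

```python
# Numerical check, by Monte Carlo, that mixing N(theta, sigma^2) over
# sigma^2 ~ Exp(mean mu) reproduces the double-exponential marginal above.
import numpy as np

rng = np.random.default_rng(0)
mu, theta, d = 1.3, 0.4, 1.1   # arbitrary illustrative values

s2 = rng.exponential(mu, size=2_000_000)   # sigma^2 draws from the prior
normal_pdf = np.exp(-(d - theta) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
mixture = normal_pdf.mean()                # marginal density of d at the point d

de = np.sqrt(1 / (2 * mu)) * np.exp(-np.sqrt(2 / mu) * abs(d - theta))
print(abs(mixture - de) < 0.01)            # agrees with DE(theta, sqrt(mu/2))
```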
Theorem 2.
If in Model II the family of priors on the location parameter is (2.1), the resulting Γ-minimax rule is
(2.3)
\[ {\delta _{B}}(d)=\frac{m\left({e^{-\sqrt{2/\mu }\hspace{0.1667em}|d-m|}}-{e^{-\sqrt{2/\mu }\hspace{0.1667em}|d+m|}}\right)}{{e^{-\sqrt{2/\mu }\hspace{0.1667em}|d-m|}}+\frac{2\epsilon }{1-\epsilon }{e^{-\sqrt{2/\mu }\hspace{0.1667em}|d|}}+{e^{-\sqrt{2/\mu }\hspace{0.1667em}|d+m|}}},\]
which is Bayes with respect to the least favorable prior
\[ [\theta ]\sim \pi (\theta )=\epsilon {\delta _{0}}+\frac{1-\epsilon }{2}({\delta _{-m}}+{\delta _{m}}),\]
whenever $m\le {m^{\ast }}$.

Figure 2 shows the rule in (2.3) for selected values of the parameters ϵ and m. The values of ${m^{\ast }}$ depend on ϵ and are given in Table 1 for both models. We remark that the model in Theorem 2 is similar in spirit to those in BAMS wavelet shrinkage [33] and the Bayesian Lasso [27]: the double exponential distributions in the marginal likelihood (for BAMS) and in the prior (for the Bayesian Lasso) are obtained as exponential scale mixtures of normals.
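A sketch evaluating the Model II Γ-minimax rule of Theorem 2, i.e., the posterior mean under the three-point prior with the double-exponential marginal likelihood:

```python
# Sketch: the Model II Gamma-minimax rule, a posterior mean in which the
# double-exponential marginal likelihood replaces the normal one.
import numpy as np

def model2_rule(d, m, eps, mu):
    d = np.asarray(d, dtype=float)
    s = np.sqrt(2.0 / mu)                  # inverse scale of the DE marginal
    g_plus = np.exp(-s * np.abs(d - m))    # likelihood at theta = +m (up to a constant)
    g_minus = np.exp(-s * np.abs(d + m))   # likelihood at theta = -m
    g_zero = np.exp(-s * np.abs(d))        # likelihood at theta = 0
    return m * (g_plus - g_minus) / (g_plus + (2 * eps / (1 - eps)) * g_zero + g_minus)

d = np.linspace(-6, 6, 13)
delta = model2_rule(d, m=2.197, eps=0.8, mu=1.0)
print(np.all(np.abs(delta) < 2.197))     # bounded between -m and m
print(np.allclose(delta, -delta[::-1]))  # antisymmetric
```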
3 Risk, Bias, and Variance of Γ-Minimax Rules
The frequentist risk of a rule δ, as a function of θ, decomposes into the sum of two functions, the variance and the squared bias,
\[\begin{array}{r@{\hskip10.0pt}c@{\hskip10.0pt}l}\displaystyle R(\delta ,\theta )& \displaystyle =& \displaystyle {E^{d\mid \theta }}{(\delta (d)-\theta )^{2}}\\ {} & \displaystyle =& \displaystyle {E^{d\mid \theta }}{(\delta (d)-{E^{d\mid \theta }}(\delta (d)))^{2}}\\ {} & & \displaystyle +{(\theta -{E^{d\mid \theta }}(\delta (d)))^{2}}.\end{array}\]
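The decomposition above can be estimated by Monte Carlo for any rule. The sketch below uses a simple bounded soft-thresholding rule as a stand-in for the Γ-minimax rule (our own placeholder, with Model I noise and σ = 1 assumed).

```python
# Sketch: Monte Carlo estimate of risk = variance + bias^2 at a fixed theta.
# The rule here is a bounded soft-thresholding stand-in, not the paper's rule.
import numpy as np

def risk_components(delta, theta, n_rep=200_000, seed=0):
    rng = np.random.default_rng(seed)
    d = theta + rng.standard_normal(n_rep)   # d | theta ~ N(theta, 1)
    est = delta(d)
    variance = est.var()
    bias_sq = (est.mean() - theta) ** 2
    return variance + bias_sq, variance, bias_sq

m = 2.197
delta = lambda d: np.clip(np.sign(d) * np.maximum(np.abs(d) - 1.0, 0.0), -m, m)

risk0, var0, b0 = risk_components(delta, theta=0.0)
risk_m, var_m, b_m = risk_components(delta, theta=m)
print(risk0 < risk_m)   # heavy shrinkage: small risk near 0, larger near m
```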
To explore the behavior of the two risk components in the context of Models I and II, we computed the risk of the Γ-minimax rules for $\epsilon =0.8$ and $m=2.197$ (Fig. 3). This particular value of m ensures that the rules are Γ-minimax, with $m={m^{\ast }}$ for Model II. Note that δ in Model II has a smaller risk for values of θ in the neighborhood of m, while for θ close to 0 the risk of the rule from Model I is smaller.
Figure 4
Risk components of the Γ-minimax rules for $\epsilon =0.8$ and $m=2.197$. Dashed plots for Model I and solid for Model II. Left: Bias-squared; Right: Variance.
The most interesting finding is the behavior of the risk function of the rule. The risk function has a wavy shape, with maxima at locations depending on m; typically the risk is maximized at $\theta =\pm m$. This is important because, once the risk of a Bayes rule is maximized on the support of its prior, general minimax theory takes over and establishes that the proposed rule is minimax, and hence also Γ-minimax. Similar, but less pronounced, behavior is present in the bias-squared function (Fig. 4, left panel). Compared to Model I, the variance of δ in Model II is significantly smaller for values of θ in the neighborhood of $\pm m$, and larger for θ in the neighborhood of zero. The preference for Model I or Model II depends on the size of the signal part of greater interest. If there is more uncertainty about the signal bound m, the rule from Model II is preferable. However, Model I has a lower risk, and lower risk components, for θ in the neighborhood of 0, which translates to a possibly more precise shrinkage of small wavelet coefficients.
4 Elicitation of Parameters
The proposed Bayesian shrinkage procedures with three-point priors depend on three parameters, m, ϵ and μ, that need to be specified. The criteria used for selecting these parameters are critical for effective signal denoising. We propose that these hyperparameters be elicited in an empirical Bayes fashion, that is, from the observed wavelet coefficients.

(1) Elicitation of m: The bound m in the domain of signal acquisition translates to level-dependent bounds on the parameter θ in the wavelet domain. Given a data signal $y=({y_{1}},\dots ,{y_{n}})$, the hyperparameter m at the jth multiresolution level is estimated as in (4.1), where $J={\log _{2}}n$ is the resolution level of the transformation and $\hat{\sigma }$ is an estimator of the noise size. The noise size is often estimated by a robust estimator of the standard deviation based on the wavelet coefficients at the finest multiresolution level,
(4.2)
\[ \hat{\sigma }=\frac{\text{median}(|{d_{J-1,\bullet }}-\text{median}({d_{J-1,\bullet }})|)}{0.6745}.\]
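A sketch of the robust noise estimate (4.2): the median absolute deviation of the finest-level wavelet coefficients, scaled by 1/0.6745 so that the estimator is consistent for the standard deviation under Gaussian noise.

```python
# Sketch of (4.2): MAD-based robust estimate of the noise standard deviation
# from the finest-level wavelet coefficients.
import numpy as np

def mad_sigma(d_finest):
    d = np.asarray(d_finest, dtype=float)
    return np.median(np.abs(d - np.median(d))) / 0.6745

rng = np.random.default_rng(0)
noise = 1.5 * rng.standard_normal(4096)   # pure noise coefficients, sigma = 1.5
print(abs(mad_sigma(noise) - 1.5) < 0.1)  # close to the true sigma
```

Because it is median-based, the estimate is insensitive to the few large coefficients contributed by the signal at the finest level.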
(2) Elicitation of ϵ: This parameter controls the amount of shrinkage in the neighborhood of zero and the overall shape of the shrinkage rule. For the levels of fine detail, this parameter should be close to 1. Based on the proposals for ϵ given by Angelini and Vidakovic [3] and Sousa et al. [31], we suggest a level-dependent ${\epsilon _{j}}$, for ${J_{0}}\le j\le J-1$, depending on two positive constants k and l. As we indicated, ϵ should be close to one at the multiresolution levels of fine detail, and then decrease gradually for levels approaching the coarsest level [3]. When k and l are large, ϵ remains close to one over all levels; this results in an almost noise-free reconstruction, but can lead to oversmoothing. On the other hand, $l\gt 1$ guarantees a certain level of shrinkage even at the coarsest level. Thus, the hyperparameters l and k should be selected with care in order to achieve good performance for a wide range of signals. Numerical simulations led us to suggest $l\ge 6$ and $k=2$ as reasonable choices. However, these parameters should depend on the smoothness of the data signals and their size. We further discuss the selection of l and k for specific signals in Section 5.

(3) Elicitation of μ: This parameter is needed only for Model II. Since the prior on the noise level ${\sigma ^{2}}$ is exponential with prior mean μ, simple moment matching selects μ as ${\hat{\sigma }^{2}}$. A possible choice for $\hat{\sigma }$ is the robust estimator in (4.2).
5 Simulation Study
In the simulation study, we assessed the performance of the proposed shrinkage procedures on a battery of standard test signals. We used nine different test signals (step, wave, blip, blocks, bumps, heavisine, doppler, angles, and parabolas), which are constructed to mimic a variety of signals encountered in applications (Fig. 5). As is standard in the literature, the Haar and Daubechies six-tap (Daubechies 6) filters were used for blocks and bumps, and the Symmlet 8-tap filter was used for the remaining test signals. The shrinkage procedures are compared using the average mean squared error (AMSE), as in (5.1). All simulations were performed in MATLAB using the toolbox GaussianWaveDen [4], which can be found at http://www.mas.ucy.ac.cy/~fanis/links/software.html.
We generated noisy data samples of the nine test signals by adding normal noise with zero mean and variance ${\sigma ^{2}}=1$. The signals were rescaled so that ${\sigma ^{2}}=1$ leads to a prescribed SNR. Each sample consisted of $n=1024$ data points equally spaced on the interval $[0,1]$. Figure 6 shows a noisy version of the nine test signals with SNR = 1/4. Each noisy signal was transformed into the wavelet domain. After the shrinkage was applied to the transformed signal, the inverse wavelet transform was performed on the processed coefficients to produce a smoothed version of a signal in the original domain. The AMSE was computed as
where f denotes the original test signal and ${\hat{f}_{j}}$ its estimate in the jth iteration. To calculate the average mean squared error, this process was repeated $N=100$ times.
(5.1)
\[ AMSE(f(t))=\frac{1}{nN}{\sum \limits_{j=1}^{N}}{\sum \limits_{i=1}^{n}}{\left(f({t_{i}})-{\hat{f}_{j}}({t_{i}})\right)^{2}},\]

Figure 6
Noisy versions of the nine signals from Fig. 5.
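The AMSE computation in (5.1) can be sketched as follows. The `denoise` callable and the identity baseline are hypothetical names of ours, not part of the GaussianWaveDen toolbox.

```python
# Sketch of the AMSE in (5.1): average squared error over N noisy replicates
# of a test signal, each processed by some estimator `denoise` (hypothetical).
import numpy as np

def amse(f, denoise, sigma=1.0, n_rep=100, seed=0):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_rep):
        y = f + sigma * rng.standard_normal(f.size)   # noisy replicate
        total += np.sum((f - denoise(y)) ** 2)
    return total / (f.size * n_rep)

n = 1024
t = np.arange(n) / n
f = np.sin(2 * np.pi * t)                   # stand-in for a test signal
identity = lambda y: y                      # "no denoising" baseline
print(abs(amse(f, identity) - 1.0) < 0.05)  # AMSE of raw data is about sigma^2
```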
Figure 7
Change in the average MSE (AMSE) as a function of hyperparameters k (left) and l (right), exemplified on the heavisine test signal, $SNR=1/5$, and $n=1024$.
The shrinkage procedure was applied to each test signal and the AMSE was computed for a range of values of l and k. For example, Fig. 7 shows the average MSE obtained on the heavisine test signal when SNR = 1/5 and l and k vary in the ranges $l\in [2,15]$ and $k\in [1.0,3.5]$. As is evident from Fig. 7, the estimator achieves its best performance for values $k\approx 2.4$ and $l\approx 5.8$. With these selected values of l and k, Fig. 8 shows that the estimate is sufficiently close to the original test signal, even though the SNR is quite small.
Based on our simulations, the optimal values of l and k varied with the nature (e.g., smoothness) of the test signal. For larger values of k and l, the estimator performs better on smooth signals, because the corresponding wavelet coefficients decay rapidly as the resolution increases. However, larger values of l and k may fail to detect localized features in signals (e.g., cusps, discontinuities, sharp peaks), resulting in oversmoothing. For low values of SNR, the three-point prior estimator is more sensitive to the values of l and k. When the SNR increases, the estimation method performs better for most of the test signals with relatively small values of l and k. Moreover, higher values of l and k are preferred when the sample size is large.
In general, we suggest $k=2.5$ and $l\ge 6$ as the most universal choice. The suggested values could, however, be adjusted depending on available information about the nature of signals.
Figure 8
Estimation of the heavisine test signal, obtained by three-point priors with $n=1024$, $k=2.4$, $l=5.8$ and $SNR=1/5$.
5.1 Performance Comparison with Some Existing Methods
We compared the performance of the proposed three-point prior estimator with eight existing estimation techniques: the Bayesian adaptive multiresolution shrinker (BAMS) [33], Decompsh [22], block-median and block-mean [1], a hybrid version of the block-median procedure [1], BlockJS [10], VisuShrink [16], and generalized cross-validation [2]. The first five techniques rely on a Bayesian procedure and are based on level-dependent shrinkage. The BlockJS method uses level-dependent thresholding, while VisuShrink and generalized cross-validation use global thresholding. Readers can find more details about these techniques in [4].
In the simulation study, we computed the AMSE using the parameter values $l=6$ and $k=2.5$ and compared it with the AMSE of the selected estimation techniques. As can be seen in Fig. 9, the proposed estimator shows comparable, and for some signals better, performance relative to the selected estimation methods. In particular, the three-point prior estimator performs better for smooth signals (e.g., wave, heavisine) than for nonsmooth signals, such as blip. Moreover, among the level-dependent estimation methods, the BAMS method shows competitive (or better) performance in most of the cases. We also investigated the influence of the SNR level and the sample size on the performance of the proposed estimators relative to the methods considered. For higher SNR (Fig. 10), the three-point priors-based shrinkage procedure does not provide better performance, except for the wave, angles, and time-shifted sine signals. In general, the Γ-minimax shrinkage shows comparable or better performance than the other methods considered when the SNR is low. Also, for larger sample sizes, the three-point prior shrinkage shows improved performance.
Figure 9
The box plots of the MSE for the ten estimation methods: (1) Rule I, (2) Rule II, (3) Bayesian adaptive multiresolution shrinker (BAMS), (4) Decompsh, (5) Block-median, (6) Block-mean, (7) Hybrid version of the block-median procedure, (8) BlockJS, (9) VisuShrink, and (10) Generalized cross-validation. The MSE was computed using $SNR=1/5$, $k=2.5$, $l=6$, and $n=1024$ data points.
Figure 10
The same plot as in Fig. 9, with SNR = 3.
6 Conclusions
We proposed a method for wavelet denoising of signals when prior information about the size of the signal is available. Simple, level-dependent shrinkage rules that are Bayes with respect to a prior supported on three points are at the same time Γ-minimax for the set of all priors with a bounded support symmetric about zero and a fixed point mass at zero. This statement holds when the signal is bounded on $[-m,m]$ for small values of m, which in denoising applications translates to estimation of a signal with a low signal-to-noise ratio. For larger values of m the least favorable prior is still discrete, with support on a finite number of points symmetric about 0, but finding the exact locations and prior weights of these points presents a challenging optimization problem. Thus, using the three-point priors for noisy signals with a high signal-to-noise ratio is suboptimal from the standpoint of minimaxity, and numerical experiments show that in this case the three-point prior shrinkage rules underperform other standard shrinkage methods. We considered two estimation scenarios: when the variance of the noise is known and when it is not. In the first case a plug-in robust estimator of the noise variance is used, while in the second the variance is given an exponential prior and subsequently integrated out. The main theoretical result is given in Theorem 1, which describes the form of Γ-minimax rules when the parameter belongs to a very general convex set in a finite-dimensional Euclidean space. The two estimation scenarios proposed in this paper are special cases of Theorem 1.
As demonstrated by simulations on a battery of standard test functions, the performance of the proposed shrinkage rules in terms of AMSE is comparable, and for some signals superior, to state-of-the-art methods when the SNR is low and the signals are smooth.