
# Regularized linear discriminant analysis via a new difference-of-convex algorithm with extrapolation

*Journal of Inequalities and Applications*
**volume 2023**, Article number: 90 (2023)

## Abstract

In this paper, we transform the classical linear discriminant analysis (LDA) into a smooth difference-of-convex optimization problem. Then, a new difference-of-convex algorithm with extrapolation is introduced and the convergence of the algorithm is established. Finally, for a face recognition problem, the proposed algorithm achieves better classification performance compared with several current algorithms in the literature.

## 1 Introduction

Dimensionality reduction (DR) plays an important role in pattern recognition and has been studied extensively. Widely used DR methods include principal component analysis (PCA) [6] and linear discriminant analysis (LDA) [1]. Among them, LDA is a powerful tool for feature extraction, with extensions and applications that include multimodal DR [12], audiovisual speech recognition [8], and tensor extensions for image representation [3, 11, 13]. LDA reduces the dimensionality from a *d*-dimensional space to an *h*-dimensional space (where \(h< d\)). It seeks the optimal projection directions by maximizing the between-class variance while simultaneously minimizing the within-class variance in the projected space.

Traditional LDA raises two major concerns. First, the within-class scatter matrix may be singular, in which case it cannot be inverted. Although one can use the generalized inverse instead, the estimate is then very unstable owing to the lack of observations. Second, high dimensionality makes direct matrix operations formidable, which hinders the applicability of the method. To resolve the singularity problem, the authors of [2] proposed regularized linear discriminant analysis (RLDA), which adds a multiple of the identity matrix, *γI*, to the within-class scatter matrix \(\mathbf{S}_{w}\).

Introducing this slightly biased covariance estimate both resolves the singularity problem and stabilizes the sample covariance estimate. However, the difficulties caused by direct operations on high-dimensional matrices remain.

In this paper, motivated by [2] and [5], we transform LDA into a smooth difference-of-convex optimization problem and introduce a new difference-of-convex algorithm with extrapolation to solve it. The resulting new RLDA resolves the singularity problem. More importantly, the algorithm requires less computation time and fewer iterations than the methods compared in Sect. 4. We also prove the convergence of the algorithm and show that the new RLDA achieves better classification performance than several other algorithms for face recognition.

The article is organized as follows. In Sect. 2, we recall some useful notations and definitions. In Sect. 3, the new RLDA is given, and a new difference-of-convex algorithm with extrapolation is introduced to solve the new RLDA. Then the convergence of the generated subsequence is given. Numerical results are given in Sect. 4. Finally, Sect. 5 concludes this paper.

## 2 Notation and preliminaries

We now define the notation used in this paper. All vectors are column vectors. Given a training data set \(T= \{ (\mathbf{x}_{1}, l_{1} ), \ldots , ( \mathbf{x}_{m}, l_{m} ) \}\), where \(\mathbf{x}_{t} \in \mathbb{R}^{n}\) is the input and \(l_{t} \in \{1, \ldots , c\}\) is the corresponding label, \(t=1, \ldots , m\), we organize the *m* inputs into a matrix \(\mathbf{X}= (\mathbf{x}_{1}, \ldots , \mathbf{x}_{m} ) \in \mathbb{R}^{n \times m}\). Assume that the *i*th class contains \(m_{i}\) samples, so that \(\sum_{i=1}^{c} m_{i}=m\). Denote by \(\overline{\mathbf{x}}_{i}\) the mean of the samples in the *i*th class and by \(\overline{\mathbf{x}}\) the center of the whole set of samples, that is, \(\overline{\mathbf{x}}_{i}= (1 / m_{i} ) \sum_{j=1}^{m_{i}} \mathbf{x}_{i j}\) and \(\overline{\mathbf{x}}=(1 / m) \sum_{l=1}^{m} \mathbf{x}_{l}\), where \(\mathbf{x}_{ij}\) is the *j*th sample in the *i*th class. Based on this, the between-class scatter matrix \(\mathbf{S}_{b}\) and the within-class scatter matrix \(\mathbf{S}_{w}\) are used in the subsequent analysis.
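For reference, the between-class scatter matrix \(\mathbf{S}_{b}\) and the within-class scatter matrix \(\mathbf{S}_{w}\) are commonly defined as below. This is the standard textbook form, stated here as an assumption; in particular, the \(1/m\) normalization is a convention and may differ from the one used for (1) and (2).

```latex
\begin{aligned}
\mathbf{S}_{b} &= \frac{1}{m}\sum_{i=1}^{c} m_{i}\,
  (\overline{\mathbf{x}}_{i}-\overline{\mathbf{x}})
  (\overline{\mathbf{x}}_{i}-\overline{\mathbf{x}})^{\top},\\
\mathbf{S}_{w} &= \frac{1}{m}\sum_{i=1}^{c}\sum_{j=1}^{m_{i}}
  (\mathbf{x}_{ij}-\overline{\mathbf{x}}_{i})
  (\mathbf{x}_{ij}-\overline{\mathbf{x}}_{i})^{\top}.
\end{aligned}
```

With either normalization, \(\mathbf{S}_{b}\) is a sum of \(c\) rank-one terms whose weighted deviations \(m_{i}(\overline{\mathbf{x}}_{i}-\overline{\mathbf{x}})\) sum to zero, which is why its rank is at most \(c-1\).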

For \(\mathbf{w} \in \mathbb{R}^{n}\) and a sample \(\mathbf{x} \in \mathbb{R}^{n}\), \(\mathbf{w}^{\top} \mathbf{x}\) maps **x** to a scalar, i.e., a one-dimensional representation. More generally, if \(\mathbf{W}= (\mathbf{w}_{1}, \ldots , \mathbf{w}_{d} ) \in \mathbb{R}^{n \times d}\) with \(d\leq n\), then \(\mathbf{W}^{\top} \mathbf{x}\) maps each \(\mathbf{x} \in \mathbb{R}^{n}\) into a *d*-dimensional space.
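The scatter matrices and the projection \(\mathbf{W}^{\top}\mathbf{x}\) can be sketched in NumPy as follows. This is a minimal illustration, assuming the common \(1/m\) normalization; the function name `scatter_matrices` and the toy data are hypothetical and only serve the example.

```python
import numpy as np

def scatter_matrices(X, labels):
    """Return (S_b, S_w) for data X of shape (n, m), one sample per column.

    Assumes the common 1/m normalization; other conventions differ by a constant.
    """
    n, m = X.shape
    overall_mean = X.mean(axis=1, keepdims=True)
    S_b = np.zeros((n, n))
    S_w = np.zeros((n, n))
    for cls in np.unique(labels):
        X_i = X[:, labels == cls]                 # samples of this class
        m_i = X_i.shape[1]
        mean_i = X_i.mean(axis=1, keepdims=True)  # class mean
        diff = mean_i - overall_mean
        S_b += m_i * (diff @ diff.T)              # between-class contribution
        centered = X_i - mean_i
        S_w += centered @ centered.T              # within-class contribution
    return S_b / m, S_w / m

# Hypothetical toy data: n = 3 features, m = 6 samples, c = 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 6))
labels = np.array([1, 1, 1, 2, 2, 2])
S_b, S_w = scatter_matrices(X, labels)

W = rng.normal(size=(3, 2))   # arbitrary projection matrix with d = 2
Z = W.T @ X                   # each column is a projected sample in R^d
```

With this normalization, \(\mathbf{S}_{b}+\mathbf{S}_{w}\) equals the total scatter of the centered data, which gives a quick sanity check for an implementation.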

As a supervised dimensionality reduction method, LDA aims at finding the optimal transformation vectors \(\mathbf{w}_{1}, \ldots , \mathbf{w}_{d}\), \(d \leq c-1\), that maximize the Rayleigh coefficient

such that \(\mathbf{w}_{h}^{\top} \mathbf{S}_{w} \mathbf{w}_{l}=0\), \(1 \leq l< h \leq d\). It can be shown that (3) reduces to the following eigenvalue decomposition problem:

where \(\mathbf{S}_{w}\) is nonsingular, and \(\lambda \neq 0\). Since the rank of \(\mathbf{S}_{b}\) is at most \(c-1\), the number of extracted features is less than or equal to \(c-1\).
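In standard LDA notation, the Rayleigh coefficient (3) and the eigenvalue problem (4) take the following form. This is the textbook formulation, given as a reference sketch; the symbol \(J\) is introduced here only for illustration.

```latex
\max_{\mathbf{w}}\; J(\mathbf{w})
  = \frac{\mathbf{w}^{\top}\mathbf{S}_{b}\mathbf{w}}
         {\mathbf{w}^{\top}\mathbf{S}_{w}\mathbf{w}}
\qquad\Longrightarrow\qquad
\mathbf{S}_{b}\mathbf{w} = \lambda\,\mathbf{S}_{w}\mathbf{w}.
```

Setting the gradient of \(J\) to zero gives \(\mathbf{S}_{b}\mathbf{w}=J(\mathbf{w})\,\mathbf{S}_{w}\mathbf{w}\), so each stationary direction is a generalized eigenvector and the associated eigenvalue \(\lambda\) equals the attained value of the coefficient.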

## 3 New DC algorithm for the new form of RLDA

As mentioned before, classical LDA requires \({\mathbf{S}}_{w}\) to be nonsingular. In addition, high dimensionality makes direct matrix operations formidable, which hinders the applicability of the method proposed in [2].

To address the singularity, RLDA adds a multiple of the identity matrix, *γI*, to the within-class scatter matrix \(\mathbf{S}_{w}\), where the regularization parameter *γ* is positive. The corresponding objective function and eigenvalue decomposition problem become

and

For (5), we transform the fractional problem into a difference-of-convex problem and propose a new difference-of-convex algorithm with extrapolation to solve it efficiently. To this end, we construct a new form of RLDA in which the ratio is replaced by a difference. In this paper, we consider only the binary classification case, i.e., \(d=1\). The formulation is given as follows:

where *λ* is a positive tuning parameter and *γ* is a positive regularization parameter.

The geometric interpretation of problem (7) is clear. Optimizing the first term of (7) maximizes the between-class scatter, which forces data points from different classes to be as far apart as possible, whereas minimizing the second term of (7) makes the within-class scatter as small as possible. The third term is the regularization term, and it can be omitted if \({\mathbf{S}}_{w}\) is nonsingular.
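One reading of (5) to (7) that is consistent with the surrounding description is sketched below. The RLDA forms (5) and (6) follow directly from replacing \(\mathbf{S}_{w}\) by \(\mathbf{S}_{w}+\gamma\mathbf{I}\) in the classical problem. The form of (7) is an assumption inferred from the three terms described above, and the exact scaling of each term may differ.

```latex
\begin{aligned}
&\text{(5)}\quad \max_{\mathbf{w}}\;
  \frac{\mathbf{w}^{\top}\mathbf{S}_{b}\mathbf{w}}
       {\mathbf{w}^{\top}(\mathbf{S}_{w}+\gamma\mathbf{I})\mathbf{w}},
\qquad
\text{(6)}\quad \mathbf{S}_{b}\mathbf{w}
  =\lambda\,(\mathbf{S}_{w}+\gamma\mathbf{I})\mathbf{w},\\
&\text{(7)}\quad \min_{\mathbf{w}\in\mathbb{R}^{n}}\; f(\mathbf{w})
  = -\,\mathbf{w}^{\top}\mathbf{S}_{b}\mathbf{w}
    +\lambda\,\mathbf{w}^{\top}\mathbf{S}_{w}\mathbf{w}
    +\gamma\,\Vert\mathbf{w}\Vert^{2}.
\end{aligned}
```

Note that \(\lambda\) plays two different roles here: in (6) it is an eigenvalue, whereas in (7) it is the tuning parameter that weights the within-class term.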

Clearly, (7) is a smooth difference-of-convex optimization problem. For this problem, we propose a new DC algorithm with extrapolation.

To move on, let

Then \(f({\mathbf{w}})=g({\mathbf{w}})-h({\mathbf{w}})\).

Clearly, \(g(\mathbf{w})\) and \(h({\mathbf{w}})\) are smooth convex functions. Motivated by [4, 7], we introduce a new DC algorithm with extrapolation to find stationary points of \(f({\mathbf{w}})\).
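Convexity of the two pieces can be checked numerically. The sketch below assumes, purely for illustration, the quadratic split \(g(\mathbf{w})=\lambda\mathbf{w}^{\top}\mathbf{S}_{w}\mathbf{w}+\gamma\Vert\mathbf{w}\Vert^{2}\) and \(h(\mathbf{w})=\mathbf{w}^{\top}\mathbf{S}_{b}\mathbf{w}\); this split and the toy matrices are assumptions. It inspects the Hessian eigenvalues and computes a Lipschitz constant \(L\) for \(\nabla h\).

```python
import numpy as np

# Hypothetical toy scatter matrices (symmetric positive semidefinite).
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2))
S_w = A @ A.T                      # rank 2, hence singular as a 4 x 4 matrix
d = rng.normal(size=(4, 1))
S_b = d @ d.T                      # rank 1, as in the binary case

lam, gamma = 1.0, 0.1              # tuning and regularization parameters

# Assumed split f = g - h with quadratic pieces (illustrative only):
#   g(w) = lam * w^T S_w w + gamma * ||w||^2,   h(w) = w^T S_b w
hess_g = 2.0 * (lam * S_w + gamma * np.eye(4))
hess_h = 2.0 * S_b

min_eig_g = np.linalg.eigvalsh(hess_g).min()   # strong convexity modulus of g
min_eig_h = np.linalg.eigvalsh(hess_h).min()   # >= 0 (up to rounding): h is convex
L = np.linalg.eigvalsh(hess_h).max()           # Lipschitz constant of grad h

print(min_eig_g, min_eig_h, L)
```

Under this split, the regularization term makes \(g\) strongly convex with modulus at least \(2\gamma\) even when \(\mathbf{S}_{w}\) is singular, which is what makes each convex subproblem uniquely solvable.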

We note from

that \(\mathbf{w}^{t+1}\) is the global minimizer of a strongly convex function.

In this algorithm, motivated by [7], we set

In what follows, we prove a global subsequential convergence result of Algorithm 1, which is applied to solving (7).
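To make the idea of a DC iteration with extrapolation concrete, the sketch below runs one simple variant on a two-class toy problem. It is not a reproduction of Algorithm 1: the quadratic split of \(f\), the constant extrapolation weight `beta`, the exact subproblem, and the function name `dc_extrapolation_direction` are all assumptions made for illustration. Each step extrapolates, linearizes \(h\) at the extrapolated point, and minimizes the resulting strongly convex quadratic exactly.

```python
import numpy as np

def dc_extrapolation_direction(S_b, S_w, lam, gamma, beta=0.4, num_iters=5, seed=0):
    """Sketch of a DC iteration with extrapolation for f = g - h, with the
    assumed split g(w) = lam * w^T S_w w + gamma * ||w||^2, h(w) = w^T S_b w.

    Returns the unit-norm direction of the final iterate.
    """
    n = S_b.shape[0]
    A = lam * S_w + gamma * np.eye(n)        # Hessian of g is 2A, positive definite
    w_prev = np.random.default_rng(seed).normal(size=n)
    w = w_prev.copy()                        # start with w^0 = w^{-1}
    for _ in range(num_iters):
        w_bar = w + beta * (w - w_prev)      # extrapolation, beta in [0, 1/2)
        grad_h = 2.0 * S_b @ w_bar           # gradient of h at the extrapolated point
        # argmin_w g(w) - <grad_h, w> is the solution of 2 A w = grad_h
        w_next = np.linalg.solve(2.0 * A, grad_h)
        w_prev, w = w, w_next
    return w / np.linalg.norm(w)             # only the direction is used for projection

# Hypothetical two-class toy data, samples as columns.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=0.0, size=(5, 20))
X2 = rng.normal(loc=1.0, size=(5, 20))
mu1, mu2 = X1.mean(axis=1), X2.mean(axis=1)
mu = np.hstack([X1, X2]).mean(axis=1)
m = 40
S_b = (20 * np.outer(mu1 - mu, mu1 - mu) + 20 * np.outer(mu2 - mu, mu2 - mu)) / m
C1, C2 = X1 - mu1[:, None], X2 - mu2[:, None]
S_w = (C1 @ C1.T + C2 @ C2.T) / m

w_dc = dc_extrapolation_direction(S_b, S_w, lam=1.0, gamma=0.1)

# Closed-form direction for comparison: (lam * S_w + gamma * I)^{-1} (mu1 - mu2).
w_closed = np.linalg.solve(1.0 * S_w + 0.1 * np.eye(5), mu1 - mu2)
w_closed /= np.linalg.norm(w_closed)
```

In this quadratic, rank-one setting every iterate is a multiple of \((\lambda\mathbf{S}_{w}+\gamma\mathbf{I})^{-1}(\overline{\mathbf{x}}_{1}-\overline{\mathbf{x}}_{2})\), so the sketch can be checked against that closed form up to sign. Each step needs only one linear solve and no eigenvalue decomposition.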

### Theorem 1

(*Global subsequential convergence*) *Let* \(\{{\mathbf{w}}^{t}\}\) *be a sequence generated by the DC algorithm with extrapolation for solving* (7). *Then the following statements hold*.

(i) *The sequence* \(\{{\mathbf{w}}^{t}\}\) *is bounded*.

(ii) \(\lim_{t\rightarrow \infty}\Vert {\mathbf{w}}^{t+1}-{\mathbf{w}}^{t}\Vert =0\).

(iii) *Any accumulation point of* \(\{{\mathbf{w}}^{t}\}\) *is a stationary point of* (7).

### Proof

First we prove (i). We note from (8) that \({\mathbf{w}}^{t+1}\) is the global minimizer of a strongly convex function. Using this and comparing the objective values of this strongly convex function at \({\mathbf{w}}^{t+1}\) and \({\mathbf{w}}^{t}\), we see immediately that

Then we have

where the first inequality follows from the convexity of \(h({\mathbf{w}})\), the second inequality follows from (9), the third inequality follows from the fact that ∇*h* is Lipschitz continuous with a modulus of \(L>0\). Now, invoking the definition of \(\overline{\mathbf{w}}\), we obtain further from (10) that

Consequently, we have upon rearranging terms that

Since \(\{\beta _{t} \} \subset [0,\frac{1}{2})\), we deduce from (12) that the sequence

is nonincreasing. This together with the fact that \({\mathbf{w}}^{0}={\mathbf{w}}^{-1}\) gives

for all \(t \geq 0\), which shows that \(\{{\mathbf{w}}^{t} \}\) is bounded. This proves (i).

Next we prove (ii). Summing both sides of (12) from \(t=0\) to ∞, we obtain that

Since \(\sup_{t}\beta _{t}<\frac{1}{2}\), we deduce immediately from the above relation that

This proves (ii).

Finally, let \({\mathbf{w}}^{*}\) be an accumulation point of \(\{{\mathbf{w}}^{t} \}\) and let \(\{{\mathbf{w}}^{t_{i}} \}\) be a subsequence such that \(\lim_{i\rightarrow \infty}{\mathbf{w}}^{t_{i}}={\mathbf{w}}^{*}\). Then, from the first-order optimality condition of subproblem (8), we have

Using this together with the fact that \(\overline{\mathbf{w}}^{t_{i}}={\mathbf{w}}^{t_{i}}+\beta _{t_{i}} ({ \mathbf{w}}^{t_{i}}-{\mathbf{w}}^{t_{i}-1} )\), we obtain further that

In addition, since \(\Vert {\mathbf{w}}^{t_{i}+1}-{\mathbf{w}}^{t_{i}} \Vert \rightarrow 0\) by (ii) and both ∇*g* and ∇*h* are continuous, passing to the limit in (13) yields

This completes the proof. □
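Because \(f=g-h\) with both \(g\) and \(h\) smooth, stationarity in Theorem 1(iii) reduces to a vanishing gradient. The condition can be written as follows, where \(\mathbf{w}^{*}\) is the accumulation point from the proof.

```latex
\nabla f(\mathbf{w}^{*})
  = \nabla g(\mathbf{w}^{*}) - \nabla h(\mathbf{w}^{*})
  = \mathbf{0}.
```

This is a first-order condition only: it is necessary for a local minimizer of \(f\) but, since \(f\) is in general nonconvex, not sufficient.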

## 4 Numerical examples

In this section, experimental results are given to evaluate the performance of the proposed method. Several related DR methods, including RLDA, RSLDA [5], and PDCA [7], are used for comparison. For RLDA and RSLDA [5], the parameters *ρ* and *λ* are each selected from \(\{0.1, 0.5, 1, 5, 10\}\). For RSLDA, *δ* is chosen from \(\{0.01, 0.05, 0.1, 0.5, 1, 5\}\). For our method, the parameters *λ* and *γ* are selected from \(\{0.1, 0.5, 1, 5, 10, 50, 100\}\) and \(\{0.01, 0.05, 0.1, 0.5,0.6,0.7,0.8,0.9\}\), respectively. For every method, the reported results use the best parameters from its own candidate set. The numerical experiments were run in Matlab R2018b on a laptop with an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz and 16 GB of memory, running Microsoft Windows 10.
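The parameter selection for the proposed method amounts to a grid search over the two candidate sets. The sketch below enumerates the grid in Python (the experiments themselves were run in Matlab). The scoring callback `evaluate` and the placeholder `toy_score` are hypothetical, since the criterion used to rank parameter pairs is not specified.

```python
from itertools import product

# Candidate values for the proposed method, as listed in the text.
lambdas = [0.1, 0.5, 1, 5, 10, 50, 100]
gammas = [0.01, 0.05, 0.1, 0.5, 0.6, 0.7, 0.8, 0.9]

def grid_search(evaluate):
    """Return (best_score, best_lambda, best_gamma) over the full grid.

    `evaluate(lam, gamma)` is a hypothetical callback that returns an
    accuracy-like score for one parameter pair.
    """
    best = None
    for lam, gamma in product(lambdas, gammas):
        score = evaluate(lam, gamma)
        if best is None or score > best[0]:
            best = (score, lam, gamma)
    return best

# Placeholder scoring function, used only to exercise the loop.
def toy_score(lam, gamma):
    return -abs(lam - 5) - abs(gamma - 0.5)

best_score, best_lam, best_gamma = grid_search(toy_score)
```

The grid contains 7 × 8 = 56 parameter pairs, so the cost of selection is 56 training runs per data split.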

To show the effectiveness of the proposed method, we test the algorithm on human face images from the FERET and ORL datasets. The FERET dataset contains images of 200 persons, with 7 different images per person. Each image is \(80\times 80\) with 256 grayscale levels per pixel. The ORL dataset contains face images of 40 individuals, with 10 different images per individual. Each image is \(112\times 92\) with 256 grayscale levels per pixel. Figures 1 and 2 show sample faces from the FERET and ORL databases. A random subset with *p* images per subject (\(p=2, 3,\ldots , 10\)) is taken to form the training set, while the rest of the data comprise the test set. For each given *p*, the average result over ten random splits is reported.
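The splitting protocol can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_per_subject`, the `(subject, image_index)` identifiers, and the seeds are hypothetical.

```python
import random

def split_per_subject(num_subjects, images_per_subject, p, seed):
    """Pick p training images per subject at random; the rest form the test set.

    Images are identified by hypothetical (subject, image_index) pairs.
    """
    rng = random.Random(seed)
    train, test = [], []
    for subject in range(num_subjects):
        chosen = set(rng.sample(range(images_per_subject), p))
        for img in range(images_per_subject):
            (train if img in chosen else test).append((subject, img))
    return train, test

# ORL-sized example: 40 subjects, 10 images each, p = 5, ten random splits.
splits = [split_per_subject(40, 10, p=5, seed=s) for s in range(10)]
train, test = splits[0]
```

Averaging the accuracy over the ten entries of `splits` then gives one reported number for each value of *p*.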

The classification accuracy is used as an indicator to test the performance of the methods. The corresponding numerical results of the algorithms are listed in Tables 1 and 2 respectively. Here, “iter” denotes the number of iterations, time is measured in seconds, and “tnr” denotes the classification accuracy.
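For a single projection vector, classification accuracy can be computed as in the sketch below. The nearest-class-mean rule in the projected space is an assumption made for illustration and is not necessarily the classifier used in the experiments; the function name and toy data are likewise hypothetical.

```python
import numpy as np

def nearest_mean_accuracy(w, X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-class-mean rule after projecting onto w.

    Columns of X_train and X_test are samples. The nearest-mean rule is an
    assumption of this sketch.
    """
    z_train = w @ X_train                     # 1-D projections w^T x
    z_test = w @ X_test
    classes = np.unique(y_train)
    means = np.array([z_train[y_train == c].mean() for c in classes])
    # Assign each test point to the class with the closest projected mean.
    nearest = np.abs(z_test[:, None] - means[None, :]).argmin(axis=1)
    predictions = classes[nearest]
    return float((predictions == y_test).mean())

# Hypothetical, well-separated toy data in R^2.
X_train = np.array([[0.0, 0.2, 3.0, 3.2],
                    [0.0, 0.1, 3.0, 2.9]])
y_train = np.array([1, 1, 2, 2])
X_test = np.array([[0.1, 3.1],
                   [0.1, 3.1]])
y_test = np.array([1, 2])
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
accuracy = nearest_mean_accuracy(w, X_train, y_train, X_test, y_test)
```

Because the rule compares distances along a single direction, rescaling `w` by any positive constant leaves the predictions unchanged.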

Tables 1 and 2 show that the proposed Algorithm 1 requires fewer iterations and less computing time, and achieves higher accuracy, than both RSLDA and RLDA. Furthermore, Fig. 3 shows the relationship between the reduced dimension and the classification accuracy. For all methods, the accuracy generally increases with the reduced dimension, and Algorithm 1 outperforms the other methods.

## 5 Conclusions

In this paper, a new RLDA is proposed. A new DC algorithm with extrapolation is introduced for the resulting smooth DC problem, and the convergence of this algorithm is established. Numerical results show that the proposed algorithm achieves better classification performance than the compared algorithms for face recognition. In the future, we may consider further practical applications of RLDA, for example in optimal control [9, 10].

## Availability of data and materials

The data used to support the findings of this study are available from the corresponding author upon request.

## References

1. Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press, New York (1991)
2. Guo, Y., Hastie, T., Tibshirani, R.: Regularized discriminant analysis and its application in microarrays. Pattern Recognit. **8**(1), 86–100 (2003)
3. Huang, R., Liu, C., Zhou, J.: Discriminant analysis via jointly \(L_{2,1}\)-norm sparse tensor preserving embedding for image classification. J. Vis. Commun. Image Represent. **47**, 10–22 (2017)
4. Le Thi, H.A., Pham Dinh, T., Muu, L.D.: Numerical solution for optimization over the efficient set by D.C. optimization algorithms. Oper. Res. Lett. **19**(3), 117–128 (1996)
5. Li, C.N., Shao, Y.H., Yin, W., Liu, M.Z.: Robust and sparse linear discriminant analysis via an alternating direction method of multipliers. IEEE Trans. Neural Netw. Learn. Syst. **31**(3), 915–926 (2019)
6. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cogn. Neurosci. **3**(1), 71–86 (1991)
7. Wen, B., Chen, X., Pong, T.K.: A proximal difference-of-convex algorithm with extrapolation. Comput. Optim. Appl. **69**(2), 297–324 (2018)
8. Zeiler, S., Nicheli, R., Ma, N., Brown, G.J., Kolossa, D.: Robust audiovisual speech recognition using noise-adaptive linear discriminant analysis. In: Proc. IEEE Int. Conf. Acoust., Speech Signal Process, pp. 2797–2801 (2016)
9. Zhang, X., Wang, T.: Elastic and reliable bandwidth reservation based on distributed traffic monitoring and control. IEEE Trans. Parallel Distrib. Syst. **33**(12), 4563–4580 (2022). https://doi.org/10.1109/TPDS.2022.3196840
10. Zhang, X., Wang, Y., Geng, G., Yu, J.: Delay-optimized multicast tree packing in software-defined networks. IEEE Trans. Serv. Comput. **16**(1), 261–275 (2021). https://doi.org/10.1109/TSC.2021.3106264
11. Zhang, Z., Chow, W.S.: Tensor locally linear discriminative analysis. IEEE Signal Process. Lett. **18**(11), 643–646 (2011)
12. Zhang, Z., Zhao, M., Chow, T.W.S.: Constrained large margin local projection algorithms and extensions for multimodal dimensionality reduction. Pattern Recognit. **45**(12), 4466–4493 (2012)
13. Zhao, J., Shi, L., Zhu, J.: Two-stage regularized linear discriminant analysis for 2-D data. IEEE Trans. Neural Netw. Learn. Syst. **26**(8), 1669–1681 (2015)

## Acknowledgements

The authors are very grateful to the reviewers for several valuable and helpful comments, suggestions, and questions, which helped to improve the paper into its present form.

## Funding

This work was supported by the Natural Science Foundation of China (12071249, 12071250) and Shandong Provincial Natural Science Foundation of Distinguished Young Scholars (ZR2021JQ01).

## Author information

### Contributions

All authors contributed equally to this work. All authors read and approved the final manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Wang, C., Wang, W. & Li, M. Regularized linear discriminant analysis via a new difference-of-convex algorithm with extrapolation.
*J Inequal Appl* **2023**, 90 (2023). https://doi.org/10.1186/s13660-023-03001-4

DOI: https://doi.org/10.1186/s13660-023-03001-4

### Mathematics Subject Classification

- 15A18
- 15A69

### Keywords

- Regularized linear discriminant analysis
- Difference-of-convex
- Face recognition