# A smooth approximation approach for optimization with probabilistic constraints based on sigmoid function

## Abstract

Many practical problems arising in computer science, communication networks, product design, system control, statistics, finance, and other fields can be formulated as probabilistic constrained optimization problems (PCOPs). These problems are challenging to solve because they are usually nonconvex and nonsmooth. Effective methods for PCOPs mostly rely on approximation techniques, such as convex approximation and D.C. (difference of two convex functions) approximation. This paper studies a smooth approximation approach. A smooth approximation of the probabilistic constraint function based on a sigmoid function is analyzed, and the equivalence of PCOP and the corresponding approximation problem is shown under appropriate assumptions. A sequential convex approximation (SCA) algorithm is implemented to solve the smooth approximation problem. Numerical results suggest that the proposed smooth approximation approach is effective for optimization problems with probabilistic constraints.

## Introduction

Probabilistic constrained optimization has been applied extensively to practical problems in computer science, communication networks, product design, system control, statistics, finance, and other fields. For instance, a manufacturer deciding on production and inventory seeks to maximize its profit while satisfying customer demand, and this demand is often uncertain. To handle uncertainty in the constraints, a natural approach is to require that all constraints be satisfied with a given high probability, e.g., 90%. The resulting optimization problem is called a probabilistic constrained optimization problem (PCOP).

Consider the following probabilistic constrained optimization problem:

$$\begin{aligned} &\min_{x\in {X}} \quad h(x) \\ & {\mathrm{s.t.}}\quad \Pr \bigl\{ c_{1}(x,\xi )\leq 0,c_{2}(x,\xi )\leq 0, \ldots ,c_{m}(x,\xi )\leq 0\bigr\} \geq 1-\alpha , \end{aligned}$$
(PCOP)

where x is a d-dimensional vector of decision variables, ξ is a k-dimensional vector of uncertain parameters whose support, denoted by Ξ, is a closed subset of $$\Re ^{k}$$, X is a subset of $$\Re ^{d}$$, $$\alpha \in (0,1)$$ is a given risk level, and $$h:\Re ^{d}\rightarrow \Re$$ and $$c_{i}:\Re ^{d} \times \Xi \rightarrow \Re$$, $$i=1,\ldots ,m$$, are real-valued functions such that h and $$c_{i}(\cdot ,\xi )$$, $$i=1,\ldots ,m$$, are convex and continuously differentiable in x for every $$\xi \in \Xi$$.

Let $$c(x,\xi )=\max \{c_{1}(x,\xi ),\ldots ,c_{m}(x,\xi )\}$$. Note that $$c(x,\xi )$$ is a convex function of x since $$c_{i}(x,\xi )$$, $$i=1,\ldots ,m$$, are all convex in x. Let

$$p(x)=1-\operatorname{Pr} \bigl\{ c_{1}(x,\xi )\leq 0,c_{2}(x,\xi )\leq 0,\ldots ,c_{m}(x, \xi )\leq 0 \bigr\} =\operatorname{Pr} \bigl\{ c(x,\xi )>0 \bigr\} .$$

Then $$p(x)$$ is the probability that at least one constraint is violated, and problem (PCOP) can be reformulated as follows:

$$\begin{aligned} & {\min_{x\in {X}}}\quad h(x) \\ & {\mathrm{s.t.}} \quad p(x)\leq \alpha . \end{aligned}$$
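Because $$p(x)=\operatorname{Pr}\{c(x,\xi )>0\}$$, where c is the pointwise maximum of the $$c_{i}$$, a crude Monte Carlo estimate of p(x) is the fraction of sampled ξ for which at least one constraint is violated. The Python sketch below illustrates this on a hypothetical two-constraint toy problem that is not part of the paper: $$c_{i}(x,\xi )=\xi _{i}-x_{i}$$, i = 1, 2, with $$\xi _{1},\xi _{2}$$ independent and uniform on (0,1). For this toy problem, $$p(x)=1-x_{1}x_{2}$$ when $$x\in [0,1]^{2}$$. The function names are our own.

```python
import numpy as np


def constraint_values(x, xi):
    """Hypothetical toy constraints c_i(x, xi) = xi_i - x_i, one column per i."""
    return xi - x                      # shape (n_samples, m)


def violation_probability(x, xi):
    """Sample estimate of p(x) = Pr{c(x, xi) > 0}, where c = max_i c_i."""
    c = constraint_values(x, xi).max(axis=1)
    return np.mean(c > 0.0)


rng = np.random.default_rng(0)
xi = rng.uniform(size=(200_000, 2))    # i.i.d. samples of xi = (xi_1, xi_2)
x = np.array([0.5, 0.5])
p_hat = violation_probability(x, xi)
p_exact = 1.0 - x[0] * x[1]            # exact value for this toy example
print(p_hat, p_exact)
```

The standard error of such an estimate is $$\sqrt{p(1-p)/N}$$, which is about 0.001 here. This slow $$1/\sqrt{N}$$ decay is one reason why the sample size matters in the experiments of Sect. 3.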

The literature on PCOPs dates back to Charnes et al. [1], who first considered a single probabilistic constrained optimization problem (SPCOP), and to Miller and Wagner [2], who first considered a joint probabilistic constrained optimization problem (JPCOP). Since then, probabilistic constrained optimization has been studied extensively. PCOPs are generally challenging to solve, for three major reasons. First, the probabilistic constraint does not necessarily preserve convexity: the set defined by the probabilistic constraint may be nonconvex even if all $$c_{i}$$ are convex. Second, the probabilistic constraint function is difficult to evaluate, since in general it has no closed-form expression. Third, the probabilistic constraint does not necessarily preserve smoothness.

Numerical methods for PCOPs have attracted great attention in the field of optimization (see [3–7]).

Convex conservative approximations have been proposed, e.g., the quadratic approximation [8], the Bernstein approximation [9], and the conditional value-at-risk (CVaR) approximation [10]. These methods seek a convex subset of the (possibly nonconvex) feasible set and find the optimal solution within that subset. To solve JPCOPs, probabilistic inequalities (e.g., Bonferroni's inequality) have to be used to break a joint probabilistic constraint into multiple single probabilistic constraints, which often makes the approximations even more conservative.

Monte Carlo simulation is often used to evaluate $$p(x)$$ when no closed form is available. Luedtke and Ahmed [11] studied the sample average approximation of PCOPs. Moreover, the scenario approach, introduced by Calafiore and Campi [12, 13] and De Farias and Van Roy [14], solves the following problem:

$$\begin{aligned} &{\min_{x\in {X}}} \quad h(x) \\ & {\mathrm{s.t.}}\quad c_{i}(x,\xi _{l})\leq 0,\quad i=1,\ldots ,m, l=1,\ldots ,n, \end{aligned}$$

where $$\xi _{1},\xi _{2},\ldots ,\xi _{n}$$ are independent observations of ξ, often generated by Monte Carlo simulation. The critical issue is how to choose the sample size n so that the probability requirement is met. The scenario problem is easier to solve because each $$c_{i}$$ is convex, smooth, and easy to evaluate. Yang and Sutanto [15] presented a scenario-based method for chance-constrained optimization of nonconvex programs.

Hong et al. [16] proposed an ε-approximation approach, which reformulates a PCOP as a D.C. (difference of convex functions) program and solves it by an ε-approximation together with a sequential convex approximation (SCA) algorithm. Under some technical conditions, the solutions of the sequence of approximations converge to the set of Karush–Kuhn–Tucker (KKT) points of the PCOP. Building on the ε-approximation approach, Hu et al. [17] proposed a smooth Monte Carlo (SMC) approach. Shan et al. [18] proposed a class of smoothing functions to approximate the joint probabilistic constraint function, and constructed smooth optimization problems based on them to approximate the PCOP. They showed that the solutions of a sequence of smoothing approximations converge to a KKT point of the PCOP under a certain asymptotic regime.

Because the probabilistic constraint is nonsmooth, we establish a smooth approximation technique based on the sigmoid function, which has several desirable properties. The SCA algorithm is then implemented to solve the smooth approximation problem. In each iteration of the SCA algorithm, a gradient-based Monte Carlo method is applied to solve a convex stochastic program. We prove that, under some assumptions, a KKT point of the PCOP can be obtained by solving the sequence of convex programs.

The rest of this paper is organized as follows. In Sect. 2, we present the smoothing technique based on the sigmoid function to approximate problem (PCOP), establish its convergence properties, and discuss the SCA algorithm for solving the smoothing approximation problem. Numerical results are reported in Sect. 3, and conclusions are presented in Sect. 4.

## Materials and methods

### Sigmoid approximation to PCOP

In this section, we use a sigmoid function to approximate the probabilistic constraint function and obtain the corresponding approximation problem.

#### Sigmoid function

Problem (PCOP) can be written equivalently as

$$\begin{aligned} &{\min_{x\in {X}}} \quad h(x) \\ & {\mathrm{s.t.}} \quad p(x)\leq \alpha , \end{aligned}$$
(P)

where $$p(x)=\operatorname{Pr}\{c(x,\xi )>0\}={\mathrm{E}}[\textbf{1}_{(0,+\infty )}(c(x, \xi ))]$$ and

$$\textbf{1}_{(0,+\infty )}(z):= \textstyle\begin{cases} 1& \text{if } z\in (0,+\infty ), \\ 0& \text{if } z\notin (0,+\infty ). \end{cases}$$

Consider the sigmoid function

$$\psi (z,\mu )=\frac{1}{1+e^{-\mu ^{-1}z}},$$

where $$\mu >0$$ is a parameter, so that $$\psi :\Re \times (0,+\infty )\rightarrow \Re$$ is a real-valued function. Clearly, $$\psi (z,\mu )\in (0,1)$$ for any $$z\in \Re$$.
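Evaluating ψ directly as $$1/(1+e^{-z/\mu })$$ can overflow in floating point when $$-z/\mu$$ is large, and this happens often once μ is small. The minimal NumPy sketch below assumes nothing beyond the definition above. It uses the identity $$\frac{1}{1+e^{-t}}=\frac{1}{2}(1+\tanh (t/2))$$. The function name `psi` is our own.

```python
import numpy as np


def psi(z, mu):
    """Sigmoid psi(z, mu) = 1 / (1 + exp(-z / mu)) for mu > 0.

    Uses 1 / (1 + e^{-t}) = (1 + tanh(t / 2)) / 2, which cannot overflow.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return 0.5 * (1.0 + np.tanh(np.asarray(z, dtype=float) / (2.0 * mu)))


z = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])
for mu in (1.0, 0.1, 0.01):
    # As mu shrinks, the values approach the indicator 1_(0, inf)(z), except at z = 0
    print(mu, psi(z, mu))
```

In floating point, the open-interval property ψ ∈ (0,1) can be lost. For very large |z|/μ, tanh rounds to ±1, and `psi` then returns exactly 0.0 or 1.0.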

### Proposition 2.1

For $$\mu >0$$, the sigmoid function has the following properties.

(i) For any $$z\in \Re \backslash \{0\}$$, $$\lim_{\mu \searrow 0} \psi (z,\mu )=\textbf{1}_{(0,+\infty )}(z)$$.

(ii) $$\psi (z,\mu )$$ is strictly increasing in z.

(iii) $$\psi (z,\mu )$$ is monotonically increasing in μ when $$z<0$$, and monotonically decreasing in μ when $$z>0$$.

(iv) $$\psi (z,\mu )$$ is smooth with respect to z.

### Proof

(i) When $$z>0$$, we have

$$\lim_{\mu \searrow 0} \psi (z,\mu )= \lim_{\mu \searrow 0} \frac{1}{1+e^{-\mu ^{-1} z}}=1.$$

When $$z<0$$, we have

$$\lim_{\mu \searrow 0} \psi (z,\mu )= \lim_{\mu \searrow 0} \frac{1}{1+e^{-\mu ^{-1} z}}=0.$$

(ii) Differentiating the function $$\psi (z,\mu )$$ with respect to z yields

$$\frac{\partial }{\partial z}\psi (z,\mu )= \frac{e^{-\mu ^{-1}z}}{\mu (1+e^{-\mu ^{-1}z})^{2}}>0.$$

(iii) Differentiating the function $$\psi (z,\mu )$$ with respect to μ, we obtain

$$\frac{\partial }{\partial \mu }\psi (z,\mu )= \frac{-ze^{-\mu ^{-1}z}}{(\mu +\mu e^{-\mu ^{-1}z})^{2}}.$$

Obviously, $$\frac{\partial }{\partial \mu }\psi (z,\mu )<0$$, when $$z>0$$; $$\frac{\partial }{\partial \mu }\psi (z,\mu )>0$$, when $$z<0$$.

(iv) The result is obvious. □
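The derivatives in (ii) and (iii) can also be written as $$\frac{\partial \psi }{\partial z}=\frac{\psi (1-\psi )}{\mu }$$ and $$\frac{\partial \psi }{\partial \mu }=-\frac{z\,\psi (1-\psi )}{\mu ^{2}}$$. In this form, the signs used in the proof are immediate. The short check below illustrates the result but is not part of the proof. It compares the closed forms from the proof with central finite differences at a few arbitrarily chosen points.

```python
import math


def psi(z, mu):
    return 1.0 / (1.0 + math.exp(-z / mu))


def dpsi_dz(z, mu):
    """Closed form from part (ii): e^{-z/mu} / (mu (1 + e^{-z/mu})^2)."""
    e = math.exp(-z / mu)
    return e / (mu * (1.0 + e) ** 2)


def dpsi_dmu(z, mu):
    """Closed form from part (iii): -z e^{-z/mu} / (mu + mu e^{-z/mu})^2."""
    e = math.exp(-z / mu)
    return -z * e / (mu + mu * e) ** 2


def central_diff(f, t, h=1e-6):
    return (f(t + h) - f(t - h)) / (2.0 * h)


checks = []   # rows: (z, mu, closed d/dz, numeric d/dz, closed d/dmu, numeric d/dmu)
for z in (-0.3, -0.05, 0.05, 0.3):
    for mu in (0.1, 0.5):
        num_z = central_diff(lambda s: psi(s, mu), z)
        num_mu = central_diff(lambda m: psi(z, m), mu)
        checks.append((z, mu, dpsi_dz(z, mu), num_z, dpsi_dmu(z, mu), num_mu))

for row in checks:
    print(row)
```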

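As Fig. 1 illustrates, the sigmoid function $$\psi (z, \mu )$$ approximates the characteristic (indicator) function $$\textbf{1}_{(0, +\infty )}(z)$$ when the parameter $$\mu >0$$ is small enough. At $$z=0$$, however, $$\psi (0,\mu )=\frac{1}{2}$$ for every $$\mu >0$$, which is why $$z=0$$ is excluded in Proposition 2.1(i).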

#### Sigmoid approximation

Let

$$\widetilde{p}(x,\mu ):={\mathrm{E}} \bigl[ \psi \bigl(c(x,\xi ), \mu \bigr) \bigr],$$

and define $$\widetilde{p}(x):=\lim_{\mu \searrow 0} \widetilde{p}(x,\mu )$$. The approximation problem is built as follows:

$$\begin{aligned} &\min_{x\in {X}} \quad h(x) \\ & {\mathrm{s.t.}} \quad \widetilde{p}(x)\leq \alpha . \end{aligned}$$
$$(\widetilde{\textrm{P}})$$

Let Ω̃ and Ω denote the feasible sets of problem $$(\widetilde{\textrm{P}})$$ and problem (P), respectively.

Now we make the following assumptions.

### Assumption 1

The set X is a compact and convex subset of $$\Re ^{d}$$, and the support Ξ of ξ is a closed subset of $$\Re ^{k}$$. For any $$\xi \in \Xi$$, h and $$c_{i}(\cdot ,\xi )$$, $$i=1,\ldots ,m$$, are continuously differentiable and convex on $$\mathcal{O}$$, where $$\mathcal{O}$$ is a bounded open set such that $$X\subset \mathcal{O}$$.

### Assumption 2

The function $$c(x,\cdot )$$ is measurable for every $$x\in \Re ^{d}$$, and $$c(\cdot ,\xi )$$ is continuous for a.e. $$\xi \in \Xi$$.

### Assumption 3

For every $$\bar{x}\in X$$, the set $$\{\xi \in \Xi :c(\bar{x},\xi )=0\}$$ has P-measure zero, i.e., $$c(\bar{x},\xi )\neq 0$$ with probability 1 (w.p.1).

### Assumption 4

For any $$x\in \mathcal{O}$$ and $$i\neq j$$, $$\operatorname{Pr} \{c_{i}(x,\xi )=c_{j}(x,\xi )\}=0$$. Hence $$c(x,\xi )$$ is differentiable with respect to x w.p.1.

The following theorem shows that problem $$(\widetilde{\textrm{P}})$$ and problem (P) are equivalent.

### Theorem 2.2

Assume that Assumptions 1 to 4 hold. Then, for any $$x\in X$$,

$$\lim_{\mu \searrow 0}\widetilde{p}(x,\mu )= p(x).$$

### Proof

We have $$0<\psi (z,\mu )<1$$ for all $$z\in \Re$$ and $$\mu >0$$. Moreover, by Assumption 3 and Proposition 2.1(i), $$\psi (c(x,\xi ),\mu )\rightarrow \textbf{1}_{(0,+\infty )}(c(x,\xi ))$$ w.p.1 as $$\mu \searrow 0$$. The Lebesgue dominated convergence theorem therefore yields

$$\begin{aligned} \widetilde{p}(x)&=\lim_{\mu \searrow 0} \widetilde{p}(x,\mu )= \lim_{\mu \searrow 0} {\mathrm{E}}\bigl[\psi \bigl(c(x,\xi ),\mu \bigr)\bigr] \\ &=\lim_{\mu \searrow 0} \int _{\Xi }\psi \bigl(c(x,\xi ),\mu \bigr)\,dP( \xi ) \\ &= \int _{\Xi }\lim_{\mu \searrow 0} \psi \bigl(c(x,\xi ),\mu \bigr)\,dP( \xi ) \\ &= \int _{\Xi }\textbf{1}_{(0,+\infty )}\bigl(c(x,\xi )\bigr)\,dP( \xi ) \\ &={\mathrm{E}}\bigl[\textbf{1}_{(0,+\infty )}\bigl(c(x,\xi )\bigr)\bigr] \\ &=p(x). \end{aligned}$$

□
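Theorem 2.2 can be observed numerically by evaluating the sample average of $$\psi (c(x,\xi ),\mu )$$ for decreasing μ on a fixed sample. The sketch uses a hypothetical toy problem that is not from the paper. It sets $$c(x,\xi )=\max \{\xi _{1}-x_{1},\xi _{2}-x_{2}\}$$ with $$\xi _{1},\xi _{2}$$ i.i.d. uniform on (0,1) and $$x=(0.5,0.5)$$. Then $$p(x)=0.75$$, and Assumption 3 holds because c(x,ξ) has a continuous distribution.

```python
import numpy as np


def psi(z, mu):
    """Sigmoid 1 / (1 + exp(-z / mu)), computed via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(z / (2.0 * mu)))


rng = np.random.default_rng(1)
xi = rng.uniform(size=(200_000, 2))    # i.i.d. samples of xi = (xi_1, xi_2)
x = np.array([0.5, 0.5])
c = (xi - x).max(axis=1)               # c(x, xi) = max_i (xi_i - x_i)

p_hat = np.mean(c > 0.0)               # sample estimate of p(x); exact p(x) = 0.75
p_tilde = {mu: np.mean(psi(c, mu)) for mu in (0.1, 0.01, 0.001)}
for mu, value in p_tilde.items():
    print(f"mu = {mu:g}: p_tilde = {value:.4f}, |p_tilde - p_hat| = {abs(value - p_hat):.4f}")
```

For μ = 0.1 the smoothing bias is clearly visible. For μ = 0.001 the smoothed estimate is essentially indistinguishable from the plain sample fraction.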

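An important question for the sigmoid approximation is how to choose the parameter μ. Consider the following parameterized approximation of problem $$(\widetilde{\textrm{P}})$$:

$$\begin{aligned} &\min_{x\in {X}} \quad h(x) \\ & {\mathrm{s.t.}} \quad \widetilde{p}(x,\mu )\leq \alpha . \end{aligned}$$
$$({\mathrm{\widetilde{P}_{\mu}}})$$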

Denote by $$\Omega _{\mu }:=\{x\in X:\widetilde{p}(x,\mu )\leq \alpha \}$$ the feasible set of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$, and let $$S_{\mu }$$ and $$\nu _{\mu }$$ be the set of optimal solutions and the optimal value of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$, respectively. The following theorems describe the properties of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$.

### Assumption 5

Let $$\Omega =\{x\in X: p(x)\leq \alpha \}$$ and $$\Omega ^{I}=\{x\in X:p(x)<\alpha \}$$. We assume that $$\Omega =\mathrm{cl} \Omega ^{I}$$.

### Theorem 2.3

Suppose that Assumptions 1 to 5 are satisfied. Then $$\lim_{\mu \searrow 0} \Omega _{\mu }=\Omega$$.

### Proof

It suffices to prove that the inclusions

$$\limsup_{\mu \searrow 0}\Omega _{\mu }\subset \Omega \subset \liminf_{\mu \searrow 0} \Omega _{\mu }$$

are valid.

Given any $$\bar{x}\in \limsup_{\mu \searrow 0}\Omega _{\mu }$$, there exist $$\mu _{k}\rightarrow 0$$ and $$x^{k}\in \Omega _{\mu _{k}}$$ such that $$x^{k}\rightarrow \bar{x}$$, which means that the inequality

$${\widetilde{p}} \bigl(x^{k}, \mu _{k} \bigr)\leq \alpha$$

holds. Taking $$k\rightarrow +\infty$$, we have

$$\lim_{\mu _{k}\rightarrow 0, x^{k}\rightarrow \bar{x}} \widetilde{p} \bigl(x^{k}, \mu _{k} \bigr)\leq \alpha .$$

Owing to the continuity of the function $$\widetilde{p}(x, \mu )$$ in x, it follows that $$p(\bar{x})\leq \alpha$$, which means $$\bar{x}\in \Omega$$.

Consequently, the inclusion $$\limsup_{\mu \searrow 0}\Omega _{\mu }\subset \Omega$$ holds.

For any $$\bar{x} \in \Omega ^{I}$$, we have $$p(\bar{x})<\alpha$$. It follows from Theorem 2.2 that

$$\lim_{\mu \searrow 0}\widetilde{p}(\bar{x}, \mu )< \alpha .$$

Let $$\epsilon =\frac{1}{2}(\alpha -p(\bar{x}))>0$$. Given any sequence $$\mu _{k}\searrow 0$$, there exists $$N\in \mathcal{N}_{\infty }$$ such that, for all $$k\in N$$,

$$\widetilde{p}(\bar{x}, \mu _{k})\leq p(\bar{x})+\epsilon = \frac{1}{2} \bigl( \alpha +p(\bar{x}) \bigr)< \alpha$$

i.e., $$\bar{x}\in \operatorname{int} (\Omega _{\mu _{k}})$$ for all $$k\in N$$. Hence there exist $$N_{1}\in {\mathcal{N}}_{\infty }$$ with $$N_{1}\subset N$$ and $$x^{k}\in \Omega _{\mu _{k}}$$, $$k\in N_{1}$$, such that $$x^{k}\underset{N_{1}}{\longrightarrow }{\bar{x}}$$. This implies the inclusion $$\Omega ^{I}\subset \liminf_{\mu \searrow 0}\Omega _{\mu }$$. Since the set $$\liminf_{\mu \searrow 0}\Omega _{\mu }$$ is closed, it follows from Assumption 5 that

$$\Omega ={\mathrm{cl}}\Omega ^{I}\subset \liminf_{\mu \searrow 0} \Omega _{\mu }.$$

As a result, $$\lim_{\mu \searrow 0} \Omega _{\mu }=\Omega$$ holds. □

For sets $$A,B\subset \Re ^{d}$$, let $$d(x,A)= \inf_{x'\in A}\|x-x'\|$$ denote the distance from $$x\in \Re ^{d}$$ to A, and let $${\mathrm{D}}(A,B)=\sup_{x\in A} d(x,B)$$ denote the deviation of the set A from the set B (see [19]). The deviation measures how far A is from B, but it is not symmetric: $${\mathrm{D}}(A,B)=0$$ whenever A is contained in the closure of B, even if B is much larger than A. Let S and ν denote the set of optimal solutions and the optimal value of problem (P), respectively. The following theorem is then easy to prove.
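For finite point sets, the deviation can be computed directly from pairwise distances. The helper below assumes the Euclidean norm, and its name is our own. It also shows the asymmetry: a single point lying in B has zero deviation from B, while B can still deviate from that point.

```python
import numpy as np


def deviation(A, B):
    """D(A, B) = sup_{a in A} d(a, B) for finite point sets A, B (rows are points)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # Pairwise Euclidean distances, shape (len(A), len(B))
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return dists.min(axis=1).max()     # distance to B for each a, then the worst a


A = [[0.0, 0.0]]
B = [[0.0, 0.0], [3.0, 4.0]]
print(deviation(A, B), deviation(B, A))
```

In Theorem 2.4, $${\mathrm{D}}(S_{\mu },S)\rightarrow 0$$ therefore means that every optimal solution of $$({\mathrm{\widetilde{P}_{\mu}}})$$ is eventually close to some optimal solution of (P). It does not imply the converse.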

### Theorem 2.4

Suppose that Assumptions 1 to 5 are satisfied. Then $$\lim_{\mu \searrow 0} \nu _{\mu }=\nu$$ and $$\lim_{\mu \searrow 0} {\mathrm{D}}(S_{\mu },S)=0$$.

### Proof

Define

$$\bar{h}(x)=h(x)+I_{\Omega }(x),\qquad \bar{h}_{\mu }(x)=h(x)+I_{\Omega _{\mu }}(x),$$

where

$$I_{A}(x)=\textstyle\begin{cases} 0 & {\mathrm{if }}~x\in A, \\ +\infty & {\mathrm{if}}~ x\notin A. \end{cases}$$

In view of Theorem 2.3 and [20, Proposition 7.4(f)], $$I_{\Omega _{\mu }}(\cdot )$$ epi-converges to $$I_{\Omega }(\cdot )$$ as $$\mu \searrow 0$$. Since $$h(\cdot )$$ is continuous, $$\bar{h}_{\mu }(\cdot )$$ epi-converges to $$\bar{h}(\cdot )$$ as $$\mu \searrow 0$$. Since $$\Omega _{\mu }$$ and Ω are compact, $$\bar{h}_{\mu }(\cdot )$$ and $$\bar{h}(\cdot )$$ are lower semicontinuous and proper. Then, by [20, Theorem 7.33], we have $$\nu _{\mu }\rightarrow \nu$$ and

$$\limsup_{\mu \searrow 0} S_{\mu }\subset S \bigl({ \mathrm{as}}\ c(x,\xi )< 0 \bigr) \quad \text{or}\quad \liminf _{\mu \searrow 0} S_{\mu }\supset S \quad \bigl(\text{as } c(x,\xi )>0 \bigr).$$

Since $$S_{\mu }$$ and S are subsets of the compact set X, they are uniformly compact. According to [20, Example 4.13], we have that $$\lim_{\mu \searrow 0} {\mathrm{D}}(S_{\mu },S)=0$$. This concludes the proof of the theorem. □

Theorem 2.4 shows that the optimal solutions of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ approximate those of problem (P) as $$\mu \searrow 0$$. However, problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ is generally nonconvex, so finding an optimal solution may be difficult, and in practice we can often only find KKT points of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$. In the rest of this subsection, we analyze the convergence of the KKT points of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ to those of problem (P) as $$\mu \searrow 0$$.

Let Λ and $$\Lambda _{\mu }$$ denote the sets of KKT pairs of problems (P) and $$({\mathrm{\widetilde{P}_{\mu}}})$$, respectively:

$$\begin{aligned}& \Lambda = \begin{Bmatrix} (x,\lambda )\in \Omega \times \Re _{+}: 0 \in \nabla h(x)+\lambda \partial p(x)+N_{X}(x) \\ \lambda \bigl[p(x)-\alpha \bigr]=0 \end{Bmatrix},\\& \Lambda _{\mu }= \begin{Bmatrix} (x,\lambda )\in \Omega _{\mu }\times \Re _{+}:~ 0\in \nabla h(x)+ \lambda \nabla _{x}\widetilde{p}(x,\mu )+N_{X}(x) \\ \lambda \bigl[\widetilde{p}(x,\mu )-\alpha \bigr]=0 \end{Bmatrix}, \end{aligned}$$

where $$\partial p(x)$$ denotes the subdifferential of the function p at x and $$N_{X}(x)$$ denotes the normal cone to X at x.

### Theorem 2.5

Suppose that Assumptions 1 to 5 are satisfied. Then $$\limsup_{\mu \searrow 0} \Lambda _{\mu }\subset \Lambda$$.

### Proof

For any $$(x,\lambda )\in \limsup_{\mu \searrow 0} \Lambda _{\mu }$$, there exist $$\mu _{k}\searrow 0$$ and $$(x^{k},\lambda ^{k})\in \Lambda _{\mu _{k}}$$ such that $$(x^{k},\lambda ^{k})\rightarrow (x,\lambda )$$. The inclusion $$(x^{k},\lambda ^{k})\in \Lambda _{\mu _{k}}$$ means

$$\textstyle\begin{cases} 0\in \nabla h(x^{k})+\lambda ^{k}\nabla _{x}\widetilde{p}(x^{k},\mu _{k})+N_{X}(x^{k}), \\ \lambda ^{k} [\widetilde{p}(x^{k},\mu _{k})-\alpha ]=0, \end{cases}$$

We now pass to the limit in these relations in four parts.

Part I. By Assumption 1, h is continuously differentiable, so $$\nabla h$$ is continuous and $$\nabla h(x^{k})\rightarrow \nabla h(x)$$ as $$x^{k}\rightarrow x$$.

Part II. By Theorem 2.2, $$\lim_{\mu \searrow 0} \widetilde{p}(x,\mu )=p(x)$$. Then $$\widetilde{p}(x^{k},\mu _{k})\rightarrow p(x)$$ as $$x^{k}\rightarrow x$$, $$\mu _{k}\searrow 0$$.

Part III. By Assumption 4, a direct calculation gives, for $$\mu >0$$,

$$\nabla _{x}\widetilde{p}(x,\mu )=\nabla _{x}{ \mathrm{E}} \bigl[\psi \bigl(c(x,\xi ), \mu \bigr) \bigr] ={\mathrm{E}} \bigl[ \nabla _{x}\psi \bigl(c(x,\xi ),\mu \bigr) \bigr]$$

and

$$\lim_{\mu \searrow 0} \nabla _{x}\psi \bigl(c(x,\xi ), \mu \bigr) = \lim_{\mu \searrow 0} \frac{\mu ^{-1}e^{-\mu ^{-1}c(x,\xi )}}{(1+e^{-\mu ^{-1}c(x,\xi )})^{2}} \nabla _{x}c(x,\xi ) =\textstyle\begin{cases} \textbf{0} & c(x, \xi )\neq 0, \\ +\infty & c(x, \xi )= 0. \end{cases}$$

Thus, it follows from Assumption 3 that

$$\lim_{\mu \searrow 0}\nabla _{x}\widetilde{p}(x,\mu )= \textbf{0},$$

Consequently, $$\lim_{\mu \searrow 0}\nabla _{x}\widetilde{p}(x,\mu )\in \partial p(x)$$. Since $$\nabla _{x}\widetilde{p}(x,\mu )$$ is continuous,

$$\lim_{x^{k}\rightarrow x, ~\mu _{k}\searrow 0 }\nabla _{x} \widetilde{p} \bigl(x^{k},\mu _{k} \bigr)\in \partial p(x).$$

Part IV. Since $$x^{k}\in \Omega _{\mu _{k}}$$, $$x^{k}\rightarrow x$$, and $$\lim_{\mu \searrow 0} \Omega _{\mu }=\Omega$$ by Theorem 2.3, we have $$x\in \Omega$$. By [20, Proposition 6.6], we have

$$\limsup_{x^{k}\rightarrow x} N_{X} \bigl(x^{k} \bigr)=N_{X}(x),$$

where $$x,x^{k}\in X$$. In conclusion, $$(x,\lambda )\in \Lambda$$, so $$\limsup_{\mu \searrow 0} \Lambda _{\mu }\subset \Lambda$$ holds. This concludes the proof. □
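Part III differentiates under the expectation: $$\nabla _{x}\widetilde{p}(x,\mu )={\mathrm{E}}[\frac{\partial \psi }{\partial z}(c(x,\xi ),\mu )\nabla _{x}c(x,\xi )]$$. By Assumption 4, $$\nabla _{x}c(x,\xi )$$ is the gradient of the active $$c_{i}$$, which is unique w.p.1. The same formula gives a sample-average gradient estimate, which is the kind of quantity a gradient-based Monte Carlo method needs. The sketch below uses the hypothetical toy constraints $$c_{i}(x,\xi )=\xi _{i}-x_{i}$$ with uniform ξ. At a fixed μ, it checks this estimator against central finite differences of the sample average $$\widetilde{p}$$ computed on the same sample. It does not examine the limit μ ↘ 0.

```python
import numpy as np


def psi(z, mu):
    """Sigmoid 1 / (1 + exp(-z / mu)), computed via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(z / (2.0 * mu)))


def p_tilde_and_grad(x, xi, mu):
    """Sample averages of psi(c(x, xi), mu) and of its x-gradient for the toy
    constraints c_i(x, xi) = xi_i - x_i (hypothetical example)."""
    vals = xi - x                                  # c_i(x, xi_n), shape (N, m)
    rows = np.arange(len(xi))
    active = vals.argmax(axis=1)                   # active index (unique w.p.1)
    c = vals[rows, active]                         # c(x, xi_n) = max_i c_i
    s = psi(c, mu)
    dpsi_dz = s * (1.0 - s) / mu                   # d psi / dz at z = c
    grad_c = np.zeros_like(xi)                     # grad_x c = -e_active here
    grad_c[rows, active] = -1.0
    return s.mean(), (dpsi_dz[:, None] * grad_c).mean(axis=0)


rng = np.random.default_rng(2)
xi = rng.uniform(size=(50_000, 2))
x, mu, h = np.array([0.6, 0.4]), 0.05, 1e-6
value, grad = p_tilde_and_grad(x, xi, mu)

# Central finite differences of the same sample average, one coordinate at a time
fd = np.array([
    (p_tilde_and_grad(x + h * e, xi, mu)[0] - p_tilde_and_grad(x - h * e, xi, mu)[0]) / (2 * h)
    for e in np.eye(2)
])
print(value, grad, fd)
```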

Theorems 2.4 and 2.5 indicate that problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ approximates problem (P) well as $$\mu \searrow 0$$. Therefore, we can solve problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ instead of problem (P), provided that the smoothing parameter μ is sufficiently close to 0.

Problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ is thus a smooth approximation of the original PCOP. However, since problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ may be nonconvex, optimization procedures such as the SCA algorithm introduced in Sect. 2.2 are not guaranteed to find its optimal solutions. Problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ approaches the original PCOP directly as $$\mu \searrow 0$$. However, the probability function $$\operatorname{Pr}\{c(x,\xi )>0\}$$ in the PCOP is in general nonsmooth and nonconvex, and it may not even be locally Lipschitz continuous. Consequently, neither its gradient, its subdifferential in the convex sense, nor Clarke's generalized gradient [21] is available. The conventional KKT conditions for smooth optimization, the subdifferential conditions for nonsmooth convex optimization, and the generalized gradient conditions for locally Lipschitz continuous optimization therefore do not apply to the PCOP. In this paper, we describe possible optimality conditions for the PCOP and show the convergence of the stationary points of problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ in this new context. In the rest of the paper, we focus on the computational and implementation issues of the proposed smoothing approach.

### Sequential convex approximation method

In this section, we apply the sequential convex approximation (SCA) method constructed in [16] to solve the approximation problem $$({\mathrm{\widetilde{P}_{\mu}}})$$.

Let

$$H_{1}(x,\xi ,\mu ):=\frac{1}{1+e^{-\mu ^{-1}c(x,\xi )}}+\frac{1}{8} \mu ^{-2}c_{+}(x,\xi )^{2}$$

and

$$H_{2}(x,\xi ,\mu ):=\frac{1}{8}\mu ^{-2}c_{+}(x, \xi )^{2},$$

where $$c_{+}(x,\xi )=\max \{0, c(x, \xi )\}$$. Then, for any $$\mu >0$$, both $$H_{1}(x,\xi ,\mu )$$ and $$H_{2}(x,\xi ,\mu )$$ are convex in x.

Indeed, letting $$z=c(x,\xi )$$, we obtain

$$\frac{\partial }{\partial z}H_{2}(z,\mu )=\frac{1}{4}\mu ^{-2}z_{+} \geq 0.$$

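Thus $$H_{2}(z,\mu )$$ is nondecreasing and convex in z, and it follows from [22, Proposition 2.1.7] that $$H_{2}(x,\xi ,\mu )$$ is convex in x. The sigmoid term alone is not convex in z, because $$\frac{\partial ^{2}}{\partial z^{2}}\psi (z,\mu )=\mu ^{-2}\psi (z,\mu )(1-\psi (z,\mu ))(1-2\psi (z,\mu ))<0$$ for $$z>0$$. However, this second derivative is bounded below by $$-\mu ^{-2}/(6\sqrt{3})$$, whereas the second derivative of $$\frac{1}{8}\mu ^{-2}z_{+}^{2}$$ equals $$\frac{1}{4}\mu ^{-2}$$ for $$z>0$$. In addition, ψ is convex for $$z\leq 0$$. Hence $$z\mapsto \psi (z,\mu )+\frac{1}{8}\mu ^{-2}z_{+}^{2}$$ is convex and nondecreasing, and $$H_{1}(x,\xi ,\mu )$$ is convex in x by the same composition result. Moreover, for any $$\mu >0$$, both $$H_{1}(x,\xi ,\mu )$$ and $$H_{2}(x,\xi ,\mu )$$ are continuously differentiable in x for every $$\xi \in \Xi$$. Denote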
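The role of the quadratic term $$\frac{1}{8}\mu ^{-2}c_{+}^{2}$$ can be checked numerically in the scalar variable z. On a grid, the discrete second differences of $$\psi (\cdot ,\mu )$$ become negative for z > 0. Those of $$H_{1}(z,\mu )=\psi (z,\mu )+\frac{1}{8}\mu ^{-2}z_{+}^{2}$$ remain nonnegative, and $$H_{1}-H_{2}$$ recovers ψ. The grid and the value μ = 0.1 are arbitrary choices for illustration.

```python
import numpy as np


def psi(z, mu):
    """Sigmoid 1 / (1 + exp(-z / mu)), computed via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(z / (2.0 * mu)))


def H2(z, mu):
    """(1/8) mu^{-2} z_+^2, the convex term added and subtracted in the DC split."""
    return np.maximum(z, 0.0) ** 2 / (8.0 * mu ** 2)


def H1(z, mu):
    """psi(z, mu) + (1/8) mu^{-2} z_+^2."""
    return psi(z, mu) + H2(z, mu)


def second_diff(values, step):
    """Discrete second derivative on a uniform grid (interior points only)."""
    return (values[2:] - 2.0 * values[1:-1] + values[:-2]) / step ** 2


mu = 0.1
z = np.linspace(-1.0, 1.0, 2001)
step = z[1] - z[0]
print("min second difference of psi:", second_diff(psi(z, mu), step).min())
print("min second difference of H1 :", second_diff(H1(z, mu), step).min())
print("H1 - H2 equals psi:", np.allclose(H1(z, mu) - H2(z, mu), psi(z, mu)))
```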

$$g_{1}(x,\mu )=\Psi _{1}(x,\mu )-\alpha ,\qquad g_{2}(x,\mu )=\Psi _{2}(x, \mu ),$$

where $$\Psi _{1}(x,\mu )={\mathrm{E}}[H_{1}(x,\xi ,\mu )]$$, $$\Psi _{2}(x,\mu )={ \mathrm{E}}[H_{2}(x,\xi ,\mu )]$$.

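Since $$g_{1}(x,\mu )-g_{2}(x,\mu )=\widetilde{p}(x,\mu )-\alpha$$, problem $$({\mathrm{\widetilde{P}_{\mu}}})$$ can be reformulated as the following DC program:

$$\begin{aligned} &\min_{x\in {X}} \quad h(x) \\ & {\mathrm{s.t.}} \quad g_{1}(x,\mu )-g_{2}(x,\mu )\leq 0 . \end{aligned}$$
$$({\mathrm {{P}_{\mu }}})$$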

Let $$Z(\mu )$$ be the feasible set of problem ($${\mathrm {{P}_{\mu }}}$$) and

$$Z(\mu )_{y}= \bigl\{ x\in X:g_{1}(x,\mu )- \bigl[g_{2}(y,\mu )+\nabla _{x}g_{2}(y, \mu )^{T}(x-y) \bigr]\leq 0 \bigr\}$$

for any $$y\in Z(\mu )$$. Note that

$$g_{2}(y,\mu )+\nabla _{x}g_{2}(y,\mu )^{T}(x-y)$$

is the first-order Taylor expansion of $$g_{2}(\cdot ,\mu )$$ at the point $$y\in Z(\mu )$$. Since $$g_{2}(x,\mu )$$ is convex in x, we have

$$g_{2}(x,\mu )\geq g_{2}(y,\mu )+\nabla _{x}g_{2}(y,\mu )^{T}(x-y),\quad x \in X,$$

which implies that

$$g_{1}(x,\mu )-g_{2}(x,\mu )\leq g_{1}(x, \mu )- \bigl[g_{2}(y,\mu )+\nabla _{x}g_{2}(y, \mu )^{T}(x-y) \bigr].$$

Then $$Z(\mu )\supset Z(\mu )_{y}$$ for any $$y\in Z(\mu )$$. Moreover, since

$$g_{1}(x,\mu )- \bigl[g_{2}(y,\mu )+\nabla _{x}g_{2}(y,\mu )^{T}(x-y) \bigr]$$

is a smooth convex function of x, $$Z(\mu )_{y}$$ is a convex subset of $$Z(\mu )$$.

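For any $$y\in Z(\mu )$$, define the following problem:

$$\begin{aligned} &\min_{x\in {X}} \quad h(x) \\ & {\mathrm{s.t.}} \quad g_{1}(x,\mu )- \bigl[g_{2}(y,\mu )+\nabla _{x}g_{2}(y,\mu )^{T}(x-y) \bigr]\leq 0 . \end{aligned}$$
$$({\mathrm{CP}}(\mu ,y))$$

Then ($${\mathrm{CP}}(\mu ,y)$$) is a smooth convex approximation of problem ($${\mathrm {{P}_{\mu }}}$$), and its feasible set $$Z(\mu )_{y}$$ is contained in $$Z(\mu )$$. This leads to the following algorithm for solving problem ($${\mathrm {{P}_{\mu }}}$$).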

### Algorithm SCA

Step 0: Given an initial point $$x^{0}\in Z(\mu )$$, set $$k=0$$.

Step 1: Stop if $$x^{k}$$ satisfies the KKT conditions of problem ($${\mathrm {{P}_{\mu }}}$$).

Step 2: Solve $${\mathrm{CP}}(\mu ,x^{k})$$ to obtain its optimal solution $$x^{k+1}$$.

Step 3: Set $$k=k+1$$ and return to Step 1.

Algorithm SCA is easy to implement, since only the convex optimization problem $${\mathrm{CP}}(\mu ,x^{k})$$ needs to be solved in each iteration. It also has some desirable properties, which we summarize in the following theorems without proofs. We say that Slater's condition holds at $$y\in Z(\mu )$$ if $${\mathrm{int}} Z(\mu )_{y}\neq \emptyset$$, where int A denotes the interior of a set A.

### Theorem 2.6

For problem ($${\mathrm {{P}_{\mu }}}$$), let $$\{y_{k}\}\subset X$$ be a sequence converging to $$\bar{y}\in Z(\mu )$$ at which Slater’s condition holds. Then $$\lim_{k\rightarrow +\infty } Z(\mu )_{y_{k}}=Z(\mu )_{\bar{y}}$$.

### Theorem 2.7

Suppose that Slater's condition holds and $$\{x^{k}\}$$ is a sequence of solutions generated by Algorithm SCA for problem ($${\mathrm {{P}_{\mu }}}$$) starting from $$x^{0}\in Z(\mu )$$. Then,

(i) $$\{x^{k}\}\subset Z(\mu )$$ and $$\{h(x^{k})\}$$ is a convergent nonincreasing sequence;

(ii) if $$x^{k+1}=x^{k}$$, then $$x^{k}$$ is a KKT point of problem ($${\mathrm {{P}_{\mu }}}$$);

(iii) if $$\bar{x}$$ is a cluster point of $$\{x^{k}\}$$, then $$\bar{x}$$ is a KKT point of problem ($${\mathrm {{P}_{\mu }}}$$).

Theorem 2.7 shows that each iteration makes an improvement and that the sequence of objective values converges. If the algorithm terminates after finitely many iterations, it has reached a KKT point. The theorem also ensures that all limit points of the generated sequence of solutions are KKT points, so the algorithm has the desired convergence property. If problem ($${\mathrm {{P}_{\mu }}}$$) has only one KKT point that is better than the initial solution, or only a single KKT point, then Algorithm SCA is guaranteed to converge to a global optimal solution.
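The Python sketch below shows the structure of Algorithm SCA on a deliberately simplified instance. It is not the authors' MATLAB/fmincon implementation. It uses the problem of Example 3.1 in Sect. 3, restricted to the line $$x=a(1,\ldots ,1)^{T}$$, which contains the optimal solution given by Eq. (2) there. Along this line, $$c(x,\xi )=a^{2}M-100$$ with $$M=\max_{i}\sum_{j}\xi _{ij}^{2}$$, and the objective is −10a. Each subproblem CP(μ, y) is therefore one-dimensional, so bisection replaces a general convex solver. The KKT test in Step 1 is replaced by a small-step stopping rule. The following are our own assumptions, not the paper's settings: the bound a ≤ 5 (which makes X compact), the value μ = 1, the sample size, the tolerance, and all function names. Expectations are replaced by sample averages over one fixed sample, so the code solves the sample-average version of the problem.

```python
import numpy as np


def psi(z, mu):
    """Sigmoid 1 / (1 + exp(-z / mu)), computed via tanh to avoid overflow."""
    return 0.5 * (1.0 + np.tanh(z / (2.0 * mu)))


def sca_diagonal(alpha=0.1, mu=1.0, n_samples=10_000, a0=1.0, upper=5.0,
                 tol=1e-8, max_iter=200, seed=0):
    """Algorithm SCA for Example 3.1 restricted to x = a * (1, ..., 1), 0 <= a <= upper.

    Returns the final a, the sample-average p_tilde(a, mu), and the list of
    objective values -10 a over the iterations.
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n_samples, 10, 10))    # xi[n, i, j]
    M = (xi ** 2).sum(axis=2).max(axis=1)            # max_i sum_j xi_ij^2, per sample

    def c(a):                                        # c(x, xi) on the line x = a * 1
        return a * a * M - 100.0

    def p_tilde(a):                                  # sample average of psi(c, mu)
        return np.mean(psi(c(a), mu))

    def g2(a):                                       # sample average of H2
        return np.mean(np.maximum(c(a), 0.0) ** 2) / (8.0 * mu ** 2)

    def dg2(a):                                      # derivative of g2, dc/da = 2 a M
        return np.mean(np.maximum(c(a), 0.0) * 2.0 * a * M) / (4.0 * mu ** 2)

    def g1(a):                                       # sample average of H1, minus alpha
        return p_tilde(a) + g2(a) - alpha

    a = a0
    if p_tilde(a) > alpha:
        raise ValueError("the initial point must lie in Z(mu)")
    objective = [-10.0 * a]
    for _ in range(max_iter):
        y, g2_y, dg2_y = a, g2(a), dg2(a)

        def phi(t):                                  # constraint function of CP(mu, y)
            return g1(t) - (g2_y + dg2_y * (t - y))

        # phi is convex and phi(y) = p_tilde(y) - alpha <= 0, so {t : phi(t) <= 0}
        # is an interval containing y; minimizing -10 t picks its right end point.
        if phi(upper) <= 0.0:
            a_new = upper
        else:
            lo, hi = y, upper
            for _ in range(60):                      # bisection on [y, upper]
                mid = 0.5 * (lo + hi)
                if phi(mid) <= 0.0:
                    lo = mid
                else:
                    hi = mid
            a_new = lo
        step = abs(a_new - a)
        a = a_new
        objective.append(-10.0 * a)
        if step <= tol:                              # stop when the iterates stall
            break
    return a, p_tilde(a), objective


a_star, p_star, obj = sca_diagonal()
print(f"a = {a_star:.4f}, p_tilde = {p_star:.4f}, iterations = {len(obj) - 1}")
```

Each iterate is taken from $$Z(\mu )_{x^{k}}\subset Z(\mu )$$. Every iterate therefore stays feasible, and the recorded objective values are nonincreasing, in line with Theorem 2.7(i). The returned a should be close to the value 2.082 reported for α = 0.1 in Sect. 3.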

## Results and discussion

In this section, we implement Algorithm SCA in MATLAB and use the built-in function fmincon to solve the convex subproblem in each iteration. All programs are run on a laptop with an Intel(R) Core(TM) i5-8265U CPU @ 1.60 GHz 1.80 GHz and 8.00 GB of RAM.

### Example 3.1

Consider the following problem:

$$\begin{aligned} &\min_{x\geq 0}\quad - \sum^{10}_{j=1}x_{j} \\ & {\mathrm{s.t.}}\quad \operatorname{Pr}\Biggl\{ \sum_{j=1}^{10} \xi _{ij}^{2}x_{j}^{2} \leq 100, i=1,\ldots ,10\Biggr\} \geq 1-\alpha , \end{aligned}$$
(1)

where $$x=(x_{1},\ldots ,x_{10})^{T}\in \Re ^{10}$$, and $$\xi _{i}=(\xi _{i1},\ldots ,\xi _{i10})^{T}\in \Re ^{10}$$, $$i=1,\ldots ,10$$, are random vectors. The random variables $$\xi _{ij}$$, $$i,j=1,\ldots ,10$$, are assumed to be independent and identically distributed (i.i.d.) standard normal random variables.

It is easy to verify that the optimal solution $$x^{*}$$ of problem (1) is

$$x_{1}^{*}=x_{2}^{*}= \cdots =x_{10}^{*}= \biggl[ \frac{100}{F^{-1}_{\chi ^{2}_{10}}((1-\alpha )^{\frac{1}{10}})} \biggr]^{\frac{1}{2}},$$
(2)

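where $$F^{-1}_{\chi ^{2}_{10}}$$ is the inverse of the chi-squared distribution function $$F_{\chi ^{2}_{10}}$$ with 10 degrees of freedom. Indeed, when $$x_{1}=\cdots =x_{10}=t$$, each constraint reads $$t^{2}S_{i}\leq 100$$ with $$S_{i}=\sum_{j=1}^{10}\xi _{ij}^{2}\sim \chi ^{2}_{10}$$. The $$S_{i}$$ are independent, so the probabilistic constraint becomes $$[F_{\chi ^{2}_{10}}(100/t^{2})]^{10}\geq 1-\alpha$$, and the largest such t is given by Eq. (2).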

Evaluating Eq. (2), we obtain the optimal solution $$(2.082,2.082,\ldots , 2.082)^{T}$$ and the optimal value −20.82 of problem (1) for $$\alpha =0.1$$, and $$(1.995,1.995,\ldots , 1.995)^{T}$$ and −19.95 for $$\alpha =0.05$$.
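These numbers can be reproduced from Eq. (2) without a statistics library. The chi-squared distribution with an even number 2k of degrees of freedom has the closed-form survival function $$\operatorname{Pr}\{\chi ^{2}_{2k}>t\}=e^{-t/2}\sum_{j=0}^{k-1}\frac{(t/2)^{j}}{j!}$$. The sketch below inverts this function by bisection. The function names are our own.

```python
import math


def chi2_even_sf(t, dof):
    """Pr{chi^2_dof > t} for an even number of degrees of freedom dof = 2k."""
    if dof % 2:
        raise ValueError("dof must be even for this closed form")
    k = dof // 2
    half = t / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


def chi2_even_ppf(q, dof, lo=0.0, hi=200.0, iters=100):
    """Inverse distribution function F^{-1}(q) by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if chi2_even_sf(mid, dof) > 1.0 - q:   # F(mid) < q: the quantile lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def optimal_component(alpha, m=10, dof=10):
    """Common value of x_j^* from Eq. (2)."""
    q = (1.0 - alpha) ** (1.0 / m)
    return math.sqrt(100.0 / chi2_even_ppf(q, dof))


for alpha in (0.1, 0.05):
    xj = optimal_component(alpha)
    print(f"alpha = {alpha}: x_j* = {xj:.3f}, optimal value = {-10 * xj:.2f}")
```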

Denote $$c_{i}(x,\xi )=\sum_{j=1}^{10}\xi _{ij}^{2}x_{j}^{2}-100$$, $$i=1, \ldots ,10$$. Then problem (1) is a JPCOP of the form (PCOP). For each α, we run Algorithm SCA for many replications with sample sizes $$N=1000$$ and $$N=10\text{,}000$$. The algorithm is stopped when $$\| x^{k+1}-x^{k}\| ^{2}\leq 10^{-6}$$. Numerical results are rounded to two decimal places.
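As a sanity check on these sample sizes, the violation probability at the reported solution $$x^{*}=(2.082,\ldots ,2.082)^{T}$$ for α = 0.1 can be estimated by plain Monte Carlo. The sketch below draws a 10 × 10 standard normal matrix ξ for each sample and evaluates all ten constraints. The function name and seeds are our own. The estimate should be near 0.1, with a standard error of roughly $$\sqrt{0.09/N}$$: about 0.009 for N = 1000 and 0.003 for N = 10,000.

```python
import numpy as np


def p_hat_example31(x, n_samples, rng):
    """Monte Carlo estimate of p(x) = Pr{max_i c_i(x, xi) > 0} for Example 3.1,
    where c_i(x, xi) = sum_j xi_ij^2 x_j^2 - 100 and xi_ij are i.i.d. N(0, 1)."""
    xi = rng.standard_normal((n_samples, 10, 10))                # xi[n, i, j]
    c = (xi ** 2 * np.asarray(x, dtype=float) ** 2).sum(axis=2) - 100.0
    return np.mean(c.max(axis=1) > 0.0)                          # joint violation


rng = np.random.default_rng(3)
x_star = np.full(10, 2.082)          # reported optimal solution for alpha = 0.1
for n in (1000, 10_000):
    print(n, p_hat_example31(x_star, n, rng))
```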

Table 1 reports typical performance for $$\mu =0.1, 0.05, 0.025$$ with $$\alpha =0.1$$ and $$\alpha =0.05$$. The performance of the algorithm is very stable for sufficiently small μ, and it always converges to similar objective values. Moreover, under the same conditions, the smaller the risk level α, the faster the convergence.

Figures 2 and 3 show the objective values over all iterations for different values of μ with $$\alpha =0.1$$ and $$\alpha =0.05$$, respectively. The objective value drops rapidly in the first iterations and then decreases more slowly.

By comparison, when the sample size is sufficiently large, the approximation problem exhibits desirable convergence. Furthermore, the numerical results suggest that the smooth approximation approach based on a sigmoid function is implementable for optimization problems with probabilistic constraints.

## Conclusions

We used a sigmoid function to approximate the characteristic function and built the corresponding smooth approximation problem. A sequential convex approximation (SCA) algorithm was introduced to solve the smooth approximation problem. Numerical results showed that the sigmoid-based smooth approximation approach for optimization with probabilistic constraints is implementable, and the approach has the desired convergence properties under certain conditions. In the future, we will explore a broader class of smooth approximation methods for probabilistic constrained optimization problems and compare different approximation algorithms numerically.


## References

1. Charnes, A., Cooper, W.W., Symonds, H.: Cost horizons and certainty equivalents: an approach to stochastic programming of heating oil. Manag. Sci. 4, 235–263 (1958)

2. Miller, L.B., Wagner, H.M.: Chance constrained programming with joint constraints. Oper. Res. 13, 930–945 (1965)

3. Som, A., Some, K., Compaore, A., Some, B.: Exponential penalty function with MOMA-plus for the multiobjective optimization problems. Appl. Anal. Optim. 5, 323–334 (2021)

4. Zhao, J., Bin, M., Liu, Z.: A class of nonlinear differential optimization problems in finite dimensional spaces. Appl. Anal. Optim. 5, 145–156 (2021)

5. Zhao, X., Kobis, M.A., Yao, Y., Yao, J.C.: A projected subgradient method for nondifferentiable quasiconvex multiobjective optimization problems. J. Optim. Theory Appl. 190, 82–107 (2021)

6. Zhang, J., Xu, H.F., Zhang, L.W.: Quantitative stability analysis of stochastic quasi-variational inequality problems and applications. Math. Program. 165, 433–470 (2017)

7. Zhang, J., Xu, H.F., Zhang, L.W.: Quantitative stability analysis for distributionally robust optimization with moment constraints. SIAM J. Optim. 26, 1855–1882 (2016)

8. Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems contaminated with uncertain data. Math. Program. 88, 411–424 (2000)

9. Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17, 347–375 (2006)

10. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–41 (2000)

11. Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19, 674–699 (2008)

12. Calafiore, G., Campi, M.C.: Uncertain convex programs: randomized solutions and confidence levels. Math. Program. 102, 25–46 (2005)

13. Calafiore, G., Campi, M.C.: The scenario approach to robust control design. IEEE Trans. Autom. Control 51, 742–753 (2006)

14. De Farias, D.P., Van Roy, B.: On constraint sampling in the linear programming approach to approximate dynamic programming. Math. Oper. Res. 29, 462–478 (2004)

15. Yang, Y., Sutanto, C.: Chance-constrained optimization for nonconvex programs using scenario-based methods. ISA Trans. 90, 157–168 (2019)

16. Hong, L.J., Yang, Y., Zhang, L.W.: Sequential convex approximations to joint chance constrained programs: a Monte Carlo approach. Oper. Res. 59, 617–630 (2011)

17. Hu, Z., Hong, L.J., Zhang, L.W.: A smooth Monte Carlo approach to joint chance-constrained programs. IIE Trans. 45, 716–735 (2013)

18. Shan, F., Zhang, L.W., Xiao, X.T.: A smoothing function approach to joint chance-constrained programs. J. Optim. Theory Appl. 163, 181–199 (2014)

19. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)

20. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)

21. Clarke, F.H.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)

22. Hiriart-Urruty, J.B., Lemaréchal, C.: Fundamentals of Convex Analysis. Springer, Berlin (2001)

## Acknowledgements

We would like to thank the reviewers for their constructive feedback.

## Funding

Our work is supported by the National Natural Science Foundation of China under Grants 12171219 and 11801054, and by the Liaoning Province Department of Education Scientific Research General Project (LJ2019005).

## Author information


### Contributions

YR performed the analysis with constructive discussions and wrote the manuscript. YX and YY performed the experiments. JG contributed significantly to data analysis. All authors have read and approved the manuscript.

### Corresponding authors

Correspondence to Yong H. Ren or Jian Gu.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
