ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 20 March 2023
Sec. Statistics and Probability

On composite length-biased exponential-Pareto distribution: Properties, simulation, and application in actuarial science

Moulouk Halima Benchettah1, Halim Zeghdoudi1* and Vinoth Raman2
  • 1LaPS Laboratory, Badji-Mokhtar University, Annaba, Algeria
  • 2Department of Quality Measurement and Evaluation, Deanship of Quality and Academic Accreditation, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

The composite length-biased exponential-Pareto (CLBEP) distribution is a new composite distribution introduced in this article. The model's probability density function, moments, and quantiles, among other statistical characteristics, are derived mathematically. Maximum-likelihood estimation of the parameter and stochastic ordering are discussed. A comparison study with other new composite and conventional distributions is also included. Specifically, using two real fire insurance data sets (Algerian and Danish fire insurance losses), the goodness of fit of the new model is compared with that of the composite exponential-Pareto, composite lognormal-Pareto, and composite Rayleigh-Pareto distributions.

2010 AMS subject classifications: 62E10; 60E05.

1. Introduction

Currently, digital methods are being used in biology, economics, the physical sciences, the statistical sciences, and other fields. The statistical sciences are essential both in applications in other fields and in daily life. Probability distributions are the foundation of statistical science, yet many problems in these fields do not follow one of the fundamental probability distributions. Actuarial science and finance generally use common distributions to describe data on payments, the size and number of claims, and premium computation. Examples of these distributions are the exponential, Poisson, length-biased exponential, and Pareto distributions.

The length-biased exponential distribution, on the other hand, offers a wide range of practical applications in several fields (reliability, actuarial science, survival analysis, and mathematical finance). It is frequently used to model the lifetime of a phenomenon with no memory, no aging, and no wear and tear, the profits of an insurance company, or various models of surpluses and financial assets.

Modeling unimodal insurance loss data with a long tail is of interest to actuaries. Distributions that can replicate the heavy tail of insurance loss data, including the gamma, Pareto, length-biased exponential, Rayleigh, lognormal, and Weibull distributions, are needed to provide a sufficiently precise estimate of the associated business risk.

For example, insurance companies may face both small and large losses. When modeling very large losses, practitioners seem to favor the Pareto distribution as the size distribution. Length-biased exponential, lognormal, Rayleigh, or Weibull models are preferred when the losses consist of small values with high frequencies and large values with low frequencies [1]. Nevertheless, no conventional size model can simultaneously account for both small and large losses. Pareto models fit the tail well, whereas length-biased exponential, lognormal, Rayleigh, or Weibull models have a good overall fit but fit the tail poorly.

Composite distributions appear appropriate for modeling heavy-tailed data. For instance, the one-parameter exponential-Pareto (exp-Pareto) model and the one-parameter inverse gamma-Pareto (IG-Pareto) model have both been proposed for modeling insurance data. However, when fitted to well-known insurance data sets, such as the Danish fire insurance data set, they still do not perform satisfactorily, so the models need to be improved. By exponentiating the random variable associated with the probability density function (pdf) of an inverse gamma-Pareto distribution, Liu and Ananda [2] proposed an improved version of the one-parameter IG-Pareto model. Their model outperformed the original model significantly across several data sets. Other composite models include the composite lognormal-Pareto (cLP) model (see Scollnik [3]) and the composite Rayleigh-Pareto (cRP) model (see Benatmane et al. [4]). For more details, see [5–12].

In this study, we therefore propose a novel composite distribution that combines the length-biased exponential and Pareto distributions. The CLBEP distribution has a single parameter, and its mathematical properties can be determined in explicit form. Owing to its composition (two distributions that can be simulated for survival analysis and actuarial purposes), the new distribution offers advantages. Many real-life data sets can be analyzed using the CLBEP model, which provides suitable fits to these data sets.

The current article is structured as follows: The composite length-biased exponential-Pareto distribution and some of its statistical characteristics are discussed in Section 2. The estimation of parameters is addressed in Section 3. A numerical example with a comparison of various classical and composite models using two real data sets is provided in Section 4.

2. Formulation of the CLBEP distribution

For many theoretical issues, the length-biased exponential and Pareto distributions alone might not be adequate. To obtain a more flexible model, we construct the composite length-biased exponential-Pareto (CLBEP) distribution based on the composite transformation. Let T be a random variable with density function

$$f(t;\theta)=\begin{cases} c\,f_1(t), & 0<t\le\theta\\ c\,f_2(t), & \theta\le t<\infty \end{cases}$$

where f1 is a length-biased exponential density, f2 is a two-parameter Pareto density, and c is the normalizing constant. Hence,

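$$f_1(t)=\frac{t}{\lambda^{2}}\exp\left(-\frac{t}{\lambda}\right),\quad 0<t,\qquad f_2(t)=\frac{\alpha\,\theta^{\alpha}}{t^{\alpha+1}},\quad t>\theta,$$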

where λ, α, and θ are unknown non-negative parameters. To obtain a composite smooth density function, we use the continuity and differentiability conditions at the threshold point θ, i.e.,

$$\begin{cases} f_1(\theta)=f_2(\theta)\\[2pt] \dfrac{d}{dt}f_1(\theta)=\dfrac{d}{dt}f_2(\theta). \end{cases}$$

These two restrictions give

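$$\begin{cases} \dfrac{\theta}{\lambda^{2}}\exp\left(-\dfrac{\theta}{\lambda}\right)=\dfrac{\alpha}{\theta}\\[6pt] \dfrac{(\lambda-\theta)}{\lambda^{3}}\exp\left(-\dfrac{\theta}{\lambda}\right)=-\dfrac{\alpha(\alpha+1)}{\theta^{2}}. \end{cases}$$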

After some calculation, we get

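$$\begin{cases} 2-\beta+\left(\beta^{2}\exp(-\beta)\right)=0\\ \alpha=\beta^{2}\exp(-\beta) \end{cases}\quad\text{with } \beta=\frac{\theta}{\lambda}.$$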

Using the numerical methods, we find

$$\begin{cases} \beta=2.5118\\ \alpha=0.51181. \end{cases}$$
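As a rough numerical check of the two constants above, the system can be solved by bisection. The sketch below is only illustrative: the function names `g` and `solve_beta` and the bracketing interval [2, 3] are assumptions made here, not part of the original derivation.

```python
import math

def g(beta):
    # Left-hand side of the first equation: 2 - beta + beta^2 * exp(-beta)
    return 2.0 - beta + beta**2 * math.exp(-beta)

def solve_beta(lo=2.0, hi=3.0, tol=1e-12):
    # Plain bisection; assumes g changes sign on [lo, hi] (an assumed bracket)
    if g(lo) * g(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta = solve_beta()
alpha = beta**2 * math.exp(-beta)   # second equation of the system
print(round(beta, 4), round(alpha, 4))
```

Because the first equation can be rewritten as α = β − 2, the two printed values should differ by exactly 2 up to rounding.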

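To find the normalizing constant, we use the density condition $\left(\int_{0}^{\infty} f(t,\theta)\,dt=1\right)$, which gives

$$c=\frac{1}{-e^{-\theta/\lambda}\left(\frac{\theta}{\lambda}+1\right)+2}=\frac{1}{-e^{-\beta}(\beta+1)+2}=0.43766.$$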

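Hence, f(t; θ) can be expressed as

$$f(t;\theta)=\begin{cases} \dfrac{2.7613\,t}{\theta^{2}}\exp\left(-\dfrac{2.5118\,t}{\theta}\right), & 0<t\le\theta\\[6pt] 0.224\,t^{-1.51181}\,\theta^{0.51181}, & \theta\le t<\infty. \end{cases}\qquad (1)$$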
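To make the piecewise form of Equation (1) concrete, the following sketch evaluates both branches with the rounded constants printed there and compares them at the threshold t = θ. The function name `clbep_pdf`, the constant names, and the test value θ = 5 are illustrative choices, not taken from the article.

```python
import math

BETA = 2.5118      # ratio theta/lambda quoted in the text
ALPHA = 0.51181    # Pareto index quoted in the text
C_BODY = 2.7613    # constant of the first branch of Equation (1)
C_TAIL = 0.224     # constant of the second branch of Equation (1)

def clbep_pdf(t, theta):
    """Piecewise expression of Equation (1), with the rounded constants printed there."""
    if t <= 0 or theta <= 0:
        raise ValueError("t and theta must be positive")
    if t <= theta:
        return C_BODY * t / theta**2 * math.exp(-BETA * t / theta)
    return C_TAIL * theta**ALPHA / t**(ALPHA + 1.0)

theta = 5.0
left = clbep_pdf(theta, theta)                  # first branch at the threshold
right = clbep_pdf(theta * (1 + 1e-12), theta)   # second branch just above it
print(left, right)
```

The two printed values should agree up to the rounding of the constants, which is the continuity condition imposed at θ.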

2.1. Statistical properties of the CLBEP distribution

In this subsection, many statistical properties are presented, such as the behavior of PDF and quantile function, as well as the moments and stochastic ordering.

$$\lim_{t\to 0} f(t;\theta)=0 \quad\text{and}\quad \lim_{t\to\infty} f(t;\theta)=0.$$

The following proposition states that the PDF of the CLBEP distribution has a single shape. Plots of the PDF for several values of the parameter are presented in Figure 1.

Figure 1. The plots of PDF for some parameter value of θ.

Proposition 1. The PDF f(t; θ) in Equation (1) of the CLBEP distribution is unimodal for θ > 0.

Proof. The first derivative of f(t; θ) is

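$$\frac{df(t;\theta)}{dt}=\begin{cases} -\dfrac{0.0002}{\theta^{3}}\exp\left(-\dfrac{2.5118\,t}{\theta}\right)\left(34679\,t-13807\,\theta\right), & 0<t\le\theta\\[6pt] -0.33865\,t^{-2.5118}\,\theta^{0.51181}, & \theta\le t<\infty. \end{cases}$$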

The derivative is positive for $t<0.39814\,\theta$, vanishes at $\hat{t}=0.39814\,\theta$, and is negative for every larger t. Hence, the CLBEP distribution is unimodal, with unique mode $t_{mod}=0.39814\,\theta$.

2.2. Cumulative distribution function and moments of the CLBEP distribution

The cumulative distribution function (c.d.f.) of this composite model is

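$$F(t;\theta)=\begin{cases} 0.43768\left(1-\left(1+\dfrac{2.5118\,t}{\theta}\right)\exp\left(-\dfrac{2.5118\,t}{\theta}\right)\right), & 0<t\le\theta\\[6pt] 1-0.43768\left(\dfrac{\theta}{t}\right)^{0.51181}, & \theta\le t<\infty. \end{cases}\qquad (2)$$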

The kth moment about the origin of the CLBEP distribution can be obtained as:

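$$E(T^{k})=-\frac{0.43768\,\theta^{k}}{(2.5118)^{k}}\bigl(\Gamma(k+2,\,2.5118)-\Gamma(k+2)\bigr)+0.224\,\theta^{0.51181}\left.\left(\frac{x^{k-1.5118}}{k-1.5118}\right)\right|_{\theta}^{\infty},$$

with $E(T^{k})=\infty$ for $k\ge 2$.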

The mean of the CLBEP distribution is given by

$$E(T)=0.16003\,\theta+0.43767.$$

2.3. The quantile function of the CLBEP distribution

The quantile function of the CLBEP distribution is given in the following theorem.

Theorem 1. The quantile function of the CLBEP distribution is

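$$F_T^{-1}(u)=\begin{cases} -\dfrac{\theta}{2.5118}-\dfrac{\theta}{2.5118}\,W_{-1}\left(\dfrac{u}{0.43768\,e}-e^{-1}\right), & \text{if } 0<u<u_0\\[8pt] \theta\left(\dfrac{0.43768}{1-u}\right)^{1.9539}, & \text{if } u_0<u<1 \end{cases}$$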

where u0 = 0.43768.

Proof. For u ∈ ]0;u0[, we have to solve the equation F(t) = u with respect to t, t > 0

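$$0.43768\left(1-\left(1+\frac{2.5118\,t}{\theta}\right)\exp\left(-\frac{2.5118\,t}{\theta}\right)\right)=u$$

$$-\left(1+\frac{2.5118\,t}{\theta}\right)\exp\left(-\frac{2.5118\,t}{\theta}\right)=\frac{u}{0.43768}-1$$

Multiplying both sides by $e^{-1}$, we find

$$-\left(1+\frac{2.5118\,t}{\theta}\right)\exp\left(-\frac{2.5118\,t}{\theta}\right)e^{-1}=\left(\frac{u}{0.43768}-1\right)e^{-1},\qquad W(z)\exp\bigl(W(z)\bigr)=z.$$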

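We see that $-\left(1+\frac{2.5118\,t}{\theta}\right)$ is the Lambert W function of the real argument $\left(\frac{u}{0.43768}-1\right)e^{-1}$. Then, we have

$$W\left(\frac{u}{0.43768\,e}-e^{-1}\right)=-1-\frac{2.5118\,t}{\theta}.\qquad (3)$$

Moreover, for any θ, t > 0, it is immediate that $-\left(1+\frac{2.5118\,t}{\theta}\right)<0$, and it can also be checked that

$$\left(\frac{u}{0.43768\,e}-e^{-1}\right)\in\left]-\frac{1}{e};\,0\right[\quad\text{since } u\in\,]0;u_0[.$$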

Therefore, by taking into account the properties of the negative branch of the Lambert W function, Equation (3) becomes

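$$W_{-1}\left(\frac{u}{0.43768\,e}-e^{-1}\right)=-\left(1+\frac{2.5118\,t}{\theta}\right).$$

Finally, ∀θ > 0, $t=F_T^{-1}(u)$,

$$F_T^{-1}(u)=-\frac{\theta}{2.5118}-\frac{\theta}{2.5118}\,W_{-1}\left(\frac{u}{0.43768\,e}-e^{-1}\right),\quad\text{where } 0<u<u_0.$$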

Now, for u ∈ ]u0; 1[, we have to solve the equation F(t) = u with respect to t, t > 0

$$1-0.43768\left(\frac{\theta}{t}\right)^{0.51181}=u,$$

from which it is easy to find

$$F_T^{-1}(u)=\theta\left(\frac{0.43768}{1-u}\right)^{1.9539}\quad\text{where } u_0<u<1.$$
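The two inversions in the proof can be sketched in a few lines and checked by a round trip through the corresponding branch of Equation (2). This is a minimal sketch under stated assumptions: all function names are hypothetical, the branch $W_{-1}$ is computed by a simple bisection stand-in (a library routine such as `scipy.special.lambertw(z, k=-1)` could be used instead), and the caller decides which branch applies, so no threshold is hard-coded.

```python
import math

C = 0.43768       # constant appearing in Equation (2)
BETA = 2.5118
ALPHA = 0.51181

def lambert_w_minus1(z, tol=1e-14):
    """Branch W_{-1} on (-1/e, 0): the root w <= -1 of w*exp(w) = z, by bisection."""
    if not (-1.0 / math.e < z < 0.0):
        raise ValueError("z must lie in (-1/e, 0)")
    hi = -1.0                      # w*exp(w) = -1/e < z at this end
    lo = -2.0
    while lo * math.exp(lo) < z:   # move left until w*exp(w) >= z
        lo *= 2.0
    while hi - lo > tol * abs(lo):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > z:
            lo = mid               # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def quantile_body(u, theta):
    """Invert the first branch of Equation (2) via W_{-1}."""
    z = (u / C - 1.0) / math.e
    return -(theta / BETA) * (1.0 + lambert_w_minus1(z))

def quantile_tail(u, theta):
    """Invert the second branch of Equation (2): theta * (C / (1 - u))**(1/ALPHA)."""
    return theta * (C / (1.0 - u)) ** (1.0 / ALPHA)

def cdf_body(t, theta):
    x = BETA * t / theta
    return C * (1.0 - (1.0 + x) * math.exp(-x))

def cdf_tail(t, theta):
    return 1.0 - C * (theta / t) ** ALPHA

theta = 5.0
print(quantile_body(cdf_body(2.0, theta), theta))    # round trip for a point below theta
print(quantile_tail(cdf_tail(12.0, theta), theta))   # round trip for a point above theta
```

Each round trip should return the starting point up to floating-point error, since 1/0.51181 ≈ 1.9539 is the exponent used in the second branch.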

2.4. Stochastic ordering

Consider two random variables Z1 and Z2. Then, Z1 is said to be smaller than Z2 in the following cases:

1. Stochastic order ($Z_1 <_S Z_2$), if $F_{Z_1}(t) \ge F_{Z_2}(t)$, ∀t.

2. Convex order ($Z_1 \le_{cx} Z_2$), if for all convex functions Ψ and provided the expectations exist, $E[\Psi(Z_1)] \le E[\Psi(Z_2)]$.

3. Hazard rate order ($Z_1 \le_{hr} Z_2$), if $h_{Z_1}(t) \ge h_{Z_2}(t)$, ∀t.

4. Likelihood ratio order ($Z_1 <_{lr} Z_2$), if $\frac{f_{Z_1}(t)}{f_{Z_2}(t)}$ is decreasing in t.

Remark 1. Likelihood ratio order ⇒ hazard rate order ⇒ stochastic order. If $E[Z_1] = E[Z_2]$, then convex order ⇔ stochastic order.

Theorem 2. Let $Z_i \sim$ CLBEP($\theta_i$), i = 1, 2, be two random variables. If $\theta_1 \le \theta_2$, then $Z_1 <_{lr} Z_2$, $Z_1 <_{hr} Z_2$, $Z_1 <_S Z_2$, and $Z_1 \le_{cx} Z_2$.

Proof.

Case I: 0 < t ≤ θ

We have

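$$\frac{f_{Z_1}(t;\theta_1)}{f_{Z_2}(t;\theta_2)}=\frac{\theta_2^{2}}{\theta_1^{2}}\exp\left(-\frac{2.5118\,t}{\theta_1}+\frac{2.5118\,t}{\theta_2}\right).$$

Taking $\ln\left(\frac{f_{Z_1}(t;\theta_1)}{f_{Z_2}(t;\theta_2)}\right)$ for simplification, we find

$$\frac{d}{dt}\ln\left(\frac{f_{Z_1}(t;\theta_1)}{f_{Z_2}(t;\theta_2)}\right)=2.5118\left(\frac{\theta_1-\theta_2}{\theta_1\theta_2}\right).$$

Consequently, if $\theta_1 \le \theta_2$, we have $\frac{d}{dt}\ln\left(\frac{f_{Z_1}(t;\theta_1)}{f_{Z_2}(t;\theta_2)}\right)\le 0$. This means that $Z_1 <_{lr} Z_2$.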

Case II: θ ≤ t < ∞

We have

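$$f_{Z_1}(t;\theta_1)-f_{Z_2}(t;\theta_2)=0.26148\,t^{-1.51181}\left(\theta_1^{0.51181}-\theta_2^{0.51181}\right).$$

We can see that, if $\theta_1 \le \theta_2$, then $f_{Z_1}(t;\theta_1) \le f_{Z_2}(t;\theta_2)$. Furthermore, according to Remark 1, the theorem is proved.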

3. Generating random values from the CLBEP distribution

3.1. Parameter estimation

In this section, we will introduce two methods of estimating the unknown parameter θ.

3.1.1. An ad hoc procedure based on percentiles

The following ad hoc procedure provides a closed form for the parameter θ, estimated using percentiles. Let $t_1, t_2, \ldots, t_n$ be a random sample from the CLBEP model. Assume that $t_1 \le t_2 \le \ldots \le t_n$ and $t_m \le \theta \le t_{m+1}$. Based on percentiles, the parameter θ can be estimated as the pth percentile, where p = F(θ):

$$p=0.43768\left(1-\left(1+\frac{2.5118\,\theta}{\theta}\right)\exp\left(-\frac{2.5118\,\theta}{\theta}\right)\right)=0.31299.$$

According to Klugman et al. [1], we have a smooth empirical estimate of the pth percentile given by

$$\hat{\theta}=(1-h)\,t_m+h\,t_{m+1}$$

with

$$\begin{cases} m=[(n+1)p]\\ h=(n+1)p-m. \end{cases}\qquad (4)$$

If $\hat{\theta}$ is close to $t_1$, the Pareto distribution alone is likely to be a better model than the composite length-biased exponential-Pareto distribution; if $\hat{\theta}$ is close to $t_n$, the length-biased exponential distribution alone is likely to be better.
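The smoothed percentile estimate in Equation (4) can be sketched as follows. The function name `percentile_theta` and the toy sample are illustrative assumptions; the percentile level p is passed in by the caller, here set to the value quoted above.

```python
import math

def percentile_theta(sample, p):
    """Smoothed empirical p-th percentile: (1 - h) * t_m + h * t_{m+1}, as in Equation (4)."""
    t = sorted(sample)
    n = len(t)
    pos = (n + 1) * p
    m = math.floor(pos)          # integer part of (n + 1) p
    h = pos - m                  # fractional part
    if m < 1 or m >= n:
        raise ValueError("(n + 1) * p must lie in [1, n) for interpolation")
    # t is 0-indexed, so t_m is t[m - 1] and t_{m+1} is t[m]
    return (1.0 - h) * t[m - 1] + h * t[m]

# Illustrative data only (not from the article)
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(percentile_theta(data, 0.31299))
```

With n = 9 the position (n + 1)p falls between the third and fourth order statistics, so the estimate is a weighted average of those two values.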

3.1.2. Maximum-likelihood estimation

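Assume again that $t_1 \le t_2 \le \ldots \le t_n$ and $t_m \le \theta \le t_{m+1}$. Then, the likelihood function is

$$\begin{aligned}
L(t_1,\ldots,t_n;\theta) &= \prod_{i=1}^{n} f(t_i)=\prod_{i=1}^{m} f_1(t_i)\prod_{i=m+1}^{n} f_2(t_i)\\
&= \prod_{i=1}^{m}\frac{2.7613\,t_i}{\theta^{2}}\exp\left(-\frac{2.51181\,t_i}{\theta}\right)\prod_{i=m+1}^{n}(0.224)\,t_i^{-1.51181}\,\theta^{0.51181}\\
&= (2.7613)^{m}(0.224)^{n-m}\,\theta^{0.51181(n-m)-2m}\,e^{-\frac{2.51181}{\theta}\sum_{i=1}^{m}t_i}\prod_{i=1}^{m}t_i\prod_{i=m+1}^{n}t_i^{-1.51181}\\
&= k\,\theta^{0.51181n-2.51181m}\,e^{-\frac{2.51181}{\theta}\sum_{i=1}^{m}t_i},
\end{aligned}$$

with

$$k=(2.7613)^{m}(0.224)^{n-m}\prod_{i=1}^{m}t_i\prod_{i=m+1}^{n}t_i^{-1.51181}.$$

Taking logarithms gives $\ln L=\ln k+(0.51181n-2.51181m)\ln\theta-\frac{2.51181}{\theta}\sum_{i=1}^{m}t_i$.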

Differentiating ln L with respect to θ gives

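$$\frac{d\ln L}{d\theta}=\frac{0.51181n-2.51181m}{\theta}+\frac{2.51181\sum_{i=1}^{m}t_i}{\theta^{2}}.$$

Hence, the solution of the likelihood equation $\frac{d\ln L}{d\theta}=0$ is

$$\hat{\theta}=\frac{2.51181\,m\,\bar{t}_m}{2.51181m-0.51181n},\quad\text{if } \frac{n}{m}<4.9 \text{ and } \bar{t}_m=\frac{\sum_{i=1}^{m}t_i}{m}.\qquad (5)$$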
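For a given m, Equation (5) is a one-line computation. The sketch below evaluates it and, as one plausible way of handling the unknown m, scans all m and keeps those whose estimate falls between $t_m$ and $t_{m+1}$. The function names and the scanning rule are assumptions made here for illustration and are not necessarily the algorithm of [13].

```python
def mle_theta(sample, m):
    """Closed-form solution (5) for a given m (the m smallest values lie below theta)."""
    t = sorted(sample)
    n = len(t)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    denom = 2.51181 * m - 0.51181 * n
    if denom <= 0:
        raise ValueError("requires 2.51181*m > 0.51181*n (n/m below about 4.9)")
    return 2.51181 * sum(t[:m]) / denom

def consistent_candidates(sample):
    """Hypothetical scan: keep every m with t_m <= theta_hat <= t_{m+1}."""
    t = sorted(sample)
    n = len(t)
    found = []
    for m in range(1, n):
        if 2.51181 * m - 0.51181 * n <= 0:
            continue                      # Equation (5) is not defined for this m
        theta_hat = mle_theta(t, m)
        if t[m - 1] <= theta_hat <= t[m]:
            found.append((m, theta_hat))
    return found

# Illustrative data only (not from the article)
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(consistent_candidates(data))
```

The constants 2.7613 and 0.224 only enter the likelihood through k, so they do not affect the estimate.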

Since this estimator requires the value of m, we recommend the following algorithm (see Teodorescu and Vernic [13]):

4. Numerical and application examples

In this section, the estimation procedure described in Section 3 is illustrated using data generated from the CLBEP model. The generating algorithm is based on the inversion of the c.d.f. (Equation 2).

4.1. Example

The data set given in this subsection, consisting of 108 values, was sampled from a length-biased exponential-Pareto population with parameter θ = 5 (see Table 1 in the Appendix).

The estimated values of the parameter are:

- By Algorithm 1, m = 39: $\hat{\theta}_1=5.0536$.

- By Algorithm 2, MLE Step 1: $\hat{\theta}_2=4.9812$.

- By Algorithm 2, MLE Step 2: $\hat{\theta}_3=4.9810$.

Algorithm 1. Estimate θ using MLE.

Algorithm 2. Estimate θ using percentiles.

We notice that Algorithm 2 in Step 1 gives a more accurate value. We also applied the χ² test to check the distribution fitting; the results for θ = 5 and for the three estimates are given in Tables 1–4.

Table 1. Test for θ = 5.

Table 2. Test for $\hat{\theta}_1=5.0536$.

Table 3. Test for $\hat{\theta}_2=4.9812$.

Table 4. Test for $\hat{\theta}_3=4.9810$.

The χ² distances calculated for all the estimated values of the parameter are

$$\chi^{2}\text{ distance}(\hat{\theta}_1)=2.4591,\qquad \chi^{2}\text{ distance}(\hat{\theta}_2)=2.4819,\qquad \chi^{2}\text{ distance}(\hat{\theta}_3)=2.4818.$$

As expected, the χ² test accepts the length-biased exponential-Pareto model for all values of the parameter, and the distance is smallest for $\hat{\theta}_1$.

4.2. Goodness of fit

In this subsection, we apply the composite length-biased exponential-Pareto model to two real insurance data sets.

Data set I consists of 100 Algerian (SAA company) fire insurance losses (see Appendix).

We provide in Table 5 the estimated value of fitted models and the values of the −LL, AIC, AICc, and BIC evaluated at the maximum-likelihood estimators.

Table 5. Estimated values of fitted models and −LL, AIC, AICc, and BIC data set I.

Data set II consists of 2,156 Danish fire insurance losses.

Applying the same analysis, we obtain the results in Table 6.

Tables 5 and 6 indicate that the CLBEP model outperforms the classical distributions and the composite Rayleigh-Pareto, composite exponential-Pareto, and composite lognormal-Pareto models in terms of −LL, AIC, AICc, and BIC for data sets I and II. In addition, for data set II (n = 2,156), the Pareto model outperforms the other conventional models, since this data set covers larger losses.

Table 6. Estimated values of fitted models and −LL, AIC, AICc, and BIC data set II.

5. Conclusion

A new distribution, the composite length-biased exponential-Pareto (CLBEP) distribution, is proposed for applications. The mathematical features derived for this distribution include the quantile function, stochastic ordering, moments, and maximum-likelihood estimation. The goodness of fit of the new model is compared with that of conventional and recent composite models, such as the composite exponential-Pareto, composite lognormal-Pareto, and composite Rayleigh-Pareto distributions, using two real fire insurance data sets (Algerian and Danish fire insurance losses). Compared to the standard models, the composite models provided a far better fit to the data, and the composite exponential-Pareto, composite lognormal-Pareto, and composite Rayleigh-Pareto distributions do not fit as well as the CLBEP model. Overall, the proposed distribution gives very satisfactory results. We expect that researchers interested in the statistical sciences and their applications, such as reliability and actuarial science, will be drawn to the CLBEP model. Future research may examine Bayesian estimation of the CLBEP parameter and introduce a truncated version of the CLBEP distribution. In addition, it would be interesting to use similar composite distributions to model epidemic problems.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Acknowledgments

The authors acknowledge the Editor, FM and the referees of this journal for their constant encouragement to finalize the paper.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2023.1137036/full#supplementary-material

References

1. Klugman SA, Panjer HH, Willmot G. Loss Models: From Data to Decisions. New York, NY: Wiley (2008).

2. Liu B, Ananda MMA. Analyzing insurance data with an exponentiated composite Inverse-Gamma Pareto Model. Commun Stat Theory Methods. (2022) 2022:399. doi: 10.1080/03610926.2022.2050399

3. Scollnik DPM. On composite Lognormal-Pareto models. Scand Actuarial J. (2007) 2007:20–33. doi: 10.1080/03461230601110447

4. Benatmane C, Zeghdoudi H, Shanker R, Lazri N. Composite Rayleigh-Pareto distribution: application to real fire insurance losses data set. J Stat Manag Syst. (2021) 24:545–57. doi: 10.1080/09720510.2020.1759253

5. Elbatal I, Aryal G. A new generalization of the exponential Pareto distribution. J Inf Optim Sci. (2017) 38:675–97. doi: 10.1080/02522667.2016.1220079

6. EL-Sagheer RM, Mahmoud MAW, Abdallah SHM. Statistical inferences for new Weibull-Pareto distribution under an adaptive type-ii progressive censored data. J Stat Manag Syst. (2018) 21:1021–57. doi: 10.1080/09720510.2018.1467628

7. Aminzadeh MS, Deng M. Bayesian predictive modeling for Inverse Gamma-Pareto composite distribution. Commun Stat Theory Methods. (2019) 48:1938–54. doi: 10.1080/03610926.2018.1440595

8. Cebrian A, Denuit M, Lambert PH. Generalized Pareto fit to the society of Actuaries' large claims database. North Am Actuarial J. (2003) 7:18–36. doi: 10.1080/10920277.2003.10596098

9. Preda V, Ciumara R. On composite models: Weibull-Pareto lognormal-Pareto. A comparative study. Rom J Econ Forecast. (2006) 3:32–46.

10. Cooray K, Ananda MA. Modeling actuarial data with a composite Lognormal-Pareto model. Scand Actuarial J. (2005) 5:321–34. doi: 10.1080/03461230510009763

11. Teodorescu S, Vernic R. Some composite exponential-Pareto models for actuarial prediction. Rom J Econ Forecast. (2009) 12:82–100.

12. Teodorescu S, Vernic R. On composite Pareto models. Math Rep. (2013) 15:11–29.

13. Teodorescu S, Vernic R. A composite exponential–pareto distribution. The Annals of the “Ovidius” University of Constanta. Math Series. (2006) XIV:99–108.

Keywords: composite distribution, Pareto distribution, length-biased exponential, maximum-likelihood estimation, quantile function

Citation: Benchettah MH, Zeghdoudi H and Raman V (2023) On composite length-biased exponential-Pareto distribution: Properties, simulation, and application in actuarial science. Front. Appl. Math. Stat. 9:1137036. doi: 10.3389/fams.2023.1137036

Received: 03 January 2023; Accepted: 23 February 2023;
Published: 20 March 2023.

Edited by:

Fabrizio Maturo, Universitas Mercatorum, Italy

Reviewed by:

Fuxia Cheng, Illinois State University, United States
Abdelfateh Beghriche, Université Frères Mentouri Constantine 1, Algeria
Seghier Fatma Zohra, University of Skikda, Algeria

Copyright © 2023 Benchettah, Zeghdoudi and Raman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Halim Zeghdoudi
