- 1Department of Mathematics, Al-Aqsa University, Gaza, Palestine
- 2Department of Applied Statistics and Econometrics, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza, Egypt
- 3Department of Quantitative Analysis, College of Business Administration, King Saud University, Riyadh, Saudi Arabia
- 4Electrical Engineering Department, Faculty of Engineering & Technology, Future University in Egypt, New Cairo, Egypt
In the censored regression model, the Tobit maximum likelihood estimator is unstable and inefficient when multicollinearity is present. To reduce the effects of this problem, the Tobit ridge and the Tobit Liu estimators have been proposed. This study proposes a new Tobit estimator called the Tobit new ridge-type (TNRT) estimator. The TNRT estimator is compared theoretically with the Tobit maximum likelihood, the Tobit ridge, and the Tobit Liu estimators via the mean squared error criterion. We also performed a Monte Carlo simulation to study the performance of the TNRT estimator relative to these estimators, and we used the Mroz dataset to confirm the theoretical and simulation results.
Introduction
Limited dependent variables (LDVs) in regression models are censored, discrete, or truncated outcomes. Tobin [1] introduced the Tobit model for a censored dependent variable, which is one kind of LDV, and Goldberger [2] gave it its current name. Censored data arise when information is lost on the dependent variable, while truncated data arise when information is lost on both the dependent and the independent variables. In this study, we used the standard Tobit regression model, which is the Type 1 model among the Tobit models (Types 1–5) categorized by Amemiya [3], to deal with censored data and their estimation. The censored normal regression model, called the Tobit model, is used because the least squares estimator (LSE) gives biased and inconsistent results for censored data. Therefore, the Tobit maximum likelihood estimator (TMLE) is used to estimate the parameters and to carry out statistical inference. When the explanatory (independent) variables are correlated with one another, the problem is called multicollinearity, and it is often ignored in censored regression models. Multicollinearity makes the Tobit maximum likelihood estimates of the regression coefficients inaccurate, unreliable, and unstable because the mean squared error (MSE) values of these estimates are inflated. For this case, Khalaf et al. [4] examined the effects of multicollinearity on the TMLE and introduced the Tobit ridge estimator (TRE). Then, Alhusseini and Odah [5] introduced a Tobit principal component estimator, and Toker et al. [6] introduced a Tobit Liu estimator (TLE).
In the linear regression model (LRM), several alternatives to the LSE have been proposed for estimating the regression coefficients under multicollinearity because, in this case, the LSE has large variances, can give wrong signs, and becomes unstable. The most popular are the ridge estimator of Hoerl and Kennard [7] and the Liu estimator of Liu [8]. Recently, Kibria and Lukman [9] proposed a new ridge-type estimator (NRTE). The NRTE has been extended to different regression models in several studies, such as Lukman et al. [10], Lukman et al. [11], Akram et al. [12], Dawoud and Abonazel [13], Awwad et al. [14], and Abonazel et al. [15]. Multicollinearity is as serious a problem in the Tobit model as it is in the LRM. Biased estimators for handling multicollinearity have long been studied in the LRM, but they have received little attention in the Tobit model. Therefore, studies of biased estimators that replace the TMLE to reduce the effects of multicollinearity on the regression coefficients in the Tobit model are needed. In this context, the TRE introduced by Khalaf et al. [4] and the TLE introduced by Toker et al. [6] were the starting points of biased estimation in the Tobit model. In this study, we define the Tobit NRTE (TNRTE). We also study the theoretical properties of the TNRTE under the MSE criterion and compare it with the TMLE, the TRE, and the TLE.
The rest of this study is organized as follows: the Methodology section defines the Tobit regression model and presents the TNRTE and its theoretical properties. The Monte Carlo Simulation section reports the simulation study. The Real-Life Data section analyzes the Mroz dataset. The Conclusions section gives the concluding remarks.
Methodology
Tobit Regression Model
The Tobit regression model is

$$y_i^{*} = x_i \beta + u_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $y_i^{*}$ is the latent dependent variable, $x_i$ is the i-th row of the known n × (p + 1) matrix X, p is the number of explanatory variables, β is the unknown (p + 1) × 1 coefficient vector (the model contains the intercept β0), and $u_i$ is an independent error term that follows a normal distribution with mean 0 and variance σ2. We considered left censoring at zero, where the observed $y_i$ is defined as follows:

$$y_i = \begin{cases} y_i^{*}, & y_i^{*} > 0, \\ 0, & y_i^{*} \le 0. \end{cases} \tag{2}$$
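The task is to estimate β and σ2 from n observations on yi and xi. For the model defined in Equation (1), let na be the number of observations with yi = 0, so that the remaining n − na observations have yi > 0, and order the data so that the non-zero yi occur first. Then the log-likelihood function of the censored data is given as

$$\ell(\beta, \sigma^2) = \sum_{y_i = 0} \log\left(1 - \Phi_i\right) - \frac{n - n_a}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{y_i > 0} \left(y_i - x_i\beta\right)^2, \tag{3}$$

where $\Phi_i = \Phi(x_i\beta/\sigma)$ and Φ is the standard normal distribution function.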
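As a rough illustration, the censored-data log-likelihood described above can be evaluated numerically. The sketch below assumes left censoring at zero and the standard Type 1 Tobit likelihood; the function names `norm_cdf` and `tobit_loglik` and the toy data are hypothetical and are not taken from the study.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model that is left-censored at zero.

    X is a list of rows (each row includes the intercept term) and y holds
    the observed responses; rows with y == 0 are treated as censored.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * x for b, x in zip(beta, x_i))
        if y_i > 0:
            # Uncensored case: log of the normal density of y_i
            total += (-0.5 * math.log(2.0 * math.pi * sigma**2)
                      - (y_i - xb) ** 2 / (2.0 * sigma**2))
        else:
            # Censored case: log P(y* <= 0) = log(1 - Phi(xb / sigma))
            total += math.log(1.0 - norm_cdf(xb / sigma))
    return total

# Toy data (assumed for illustration): intercept plus one regressor
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0.8, 0.0, 2.1]
ll = tobit_loglik([0.2, 1.0], 1.0, X, y)
```

Maximizing this function over β and σ would give a maximum likelihood fit; the iterative scheme used in the study is Fisher's scoring method, which this sketch does not implement.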
The TMLE of β is obtained by setting the derivative of Equation (3) with respect to β equal to zero. The resulting equation is nonlinear in β, so it is solved iteratively by Fisher's scoring method, which uses the expected second derivative. The Fisher scoring iteration is given as
where is the matrix of the Fisher information which is given at where is β estimate at iteration(r), is β estimate at iteration (r − 1), , D is called as the diagonal matrix and So, the TMLE is written as:
Then, is given as:
Since the TMLE becomes inefficient and unstable when the multicollinearity problem occurs, Khalaf et al. [4] proposed the TRE and Toker et al. [6] proposed the TLE to eliminate the effects of this problem.
The TRE is given iteratively as
and the first step of the TRE is
such that is the first estimate of β, , is given at β(0), the TMLE first step values are as same as that of the TRE, and is the first step of the TMLE. When k = 0, .
The TLE is given iteratively as
and the first step of the TLE is
where the first-step values of the TLE are the same as those of the TMLE if d = 1 (see Amemiya [16], Fair [17], and Toker et al. [6] for more details).
New Ridge-Type Estimator
The NRTE has performed well relative to the one-parameter ridge and Liu estimators in many regression models, and these one-parameter estimators have already been extended to the Tobit regression model. This encouraged us to derive the NRTE for the Tobit model as follows:
By extending Equation (3), which is the censored data log-likelihood function with the term of penalization, as
where is called a Lagrangian multiplier and c is a constant, and by differentiating J due to β, we got
where .
By finding the J second derivative due to β and then taking the expectation, we got the following form for the matrix:
Then, we employed Fisher's scoring method to obtain the TNRTE as:
By using Equation (4), we have the TNRTE in its final form as:
The TNRTE of Equation (15) was obtained iteratively. The first step of the TNRTE is given as follows:
and the first step of the TNRTE is
where the first step values of the TNRTE are same as that of the Tobit LE and is evaluated at β(0) if k = 0, .
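To make the relation between the three biased estimators concrete, the sketch below applies ridge-type, Liu-type, and new ridge-type shrinkage to a given first-step estimate. It assumes that the first-step Tobit versions have the same algebraic form as their linear-model counterparts, with X′X replaced by an information-type matrix F; this form, the function names, and the numbers are assumptions for illustration and are not the study's exact formulas.

```python
import numpy as np

def ridge_type(F, b, k):
    """Ridge-type shrinkage: (F + kI)^{-1} F b."""
    I = np.eye(F.shape[0])
    return np.linalg.solve(F + k * I, F @ b)

def liu_type(F, b, d):
    """Liu-type shrinkage: (F + I)^{-1} (F + dI) b."""
    I = np.eye(F.shape[0])
    return np.linalg.solve(F + I, (F + d * I) @ b)

def new_ridge_type(F, b, k):
    """Kibria-Lukman-type shrinkage: (F + kI)^{-1} (F - kI) b."""
    I = np.eye(F.shape[0])
    return np.linalg.solve(F + k * I, (F - k * I) @ b)

# Hypothetical ill-conditioned information matrix and first-step estimate
F = np.array([[10.0, 9.5], [9.5, 10.0]])
b = np.array([1.5, -0.5])

b_ridge = ridge_type(F, b, k=0.3)
b_liu = liu_type(F, b, d=0.3)
b_new = new_ridge_type(F, b, k=0.3)
```

Under these assumed forms, setting k = 0 in the ridge-type or new ridge-type function, or d = 1 in the Liu-type function, returns the unshrunk estimate b, which matches the special cases noted in the text.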
Asymptotic MSE Comparisons
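To assess the properties of the estimators, the MSE criterion was used. When $\tilde{\beta}$ is an estimator of β, the matrix form of the MSE criterion is given as

$$\mathrm{MSEM}(\tilde{\beta}) = \mathrm{Cov}(\tilde{\beta}) + \mathrm{Bias}(\tilde{\beta})\,\mathrm{Bias}(\tilde{\beta})',$$

where $\mathrm{Cov}(\tilde{\beta})$ is the variance-covariance matrix and $\mathrm{Bias}(\tilde{\beta}) = E(\tilde{\beta}) - \beta$ is the bias vector of the estimator. Then, the scalar MSE is given by

$$\mathrm{MSE}(\tilde{\beta}) = \mathrm{tr}\left[\mathrm{MSEM}(\tilde{\beta})\right].$$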
Since the first-step TMLE is asymptotically unbiased, its asymptotic MSE matrix equals its asymptotic variance-covariance matrix:
The asymptotic MSE matrix form of is given as
The asymptotic MSE matrix form of is given as
The first step TNRTE asymptotic bias and its asymptotic variance-covariance forms are given as follows:
and
Then, the asymptotic MSE matrix form of is given as
Model (1) is written in the canonical form using the orthogonal transformation and the spectral decomposition such that the Fisher matrix form of the first step is given as , where C = [C0, C1, ..., Cp] is called a (p + 1) × (p + 1) orthogonal matrix form and refers to the eigenvectors columns, is called a (p + 1) × (p + 1) diagonal matrix form with the eigenvalues on the diagonal, such that M = XC. The canonical form formula of the asymptotic matrix form and the scalar MSE for , , and are written as follows:
where α = C′β, , , and .
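The canonical-form comparison can be explored numerically. The sketch below assumes that each estimator shrinks the j-th canonical coefficient by a factor a_j and that the unshrunk first-step estimator has asymptotic variance 1/λ_j in that coordinate, so that the scalar MSE is the sum of a_j²/λ_j plus the sum of (a_j − 1)²α_j². The shrinkage factors are the linear-model forms, and the eigenvalues and coefficients are hypothetical.

```python
import numpy as np

def scalar_mse(shrink, lam, alpha):
    """Scalar MSE of a componentwise-shrunk estimator in canonical form."""
    variance = np.sum(shrink**2 / lam)                  # trace of the covariance
    bias_sq = np.sum((shrink - 1.0) ** 2 * alpha**2)    # squared bias
    return float(variance + bias_sq)

lam = np.array([20.0, 2.0, 0.5])    # hypothetical eigenvalues
alpha = np.array([0.6, 0.6, 0.5])   # hypothetical canonical coefficients
k, d = 0.3, 0.3

mse_mle = scalar_mse(np.ones_like(lam), lam, alpha)       # no shrinkage
mse_ridge = scalar_mse(lam / (lam + k), lam, alpha)       # ridge-type factor
mse_liu = scalar_mse((lam + d) / (lam + 1.0), lam, alpha) # Liu-type factor
mse_new = scalar_mse((lam - k) / (lam + k), lam, alpha)   # new ridge-type factor
```

With these hypothetical values the new ridge-type factor gives the smallest scalar MSE and the unshrunk estimator the largest, but the ordering depends on k, d, the eigenvalues, and α.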
The lemmas below are useful to be used in the theoretical comparisons among the above estimators.
Lemma 1: Let F and I be n × n matrices with F > 0 and I > 0 (or I ≥ 0). Then F > I iff $\lambda_{\max}(IF^{-1}) < 1$, where $\lambda_{\max}(IF^{-1})$ is the maximum eigenvalue of the matrix IF−1 [18].
Lemma 2: Let F be an n × n positive definite matrix, i.e., F > 0, and let α be a vector. Then F − αα′ > 0 iff α′F−1 α < 1 [19].
Lemma 3: Suppose αi = Kim, i = 1, 2 are two α linear estimators and suppose , where refers to covariance matrix and , i = 1, 2 [20], then consequently,
iff , where .
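Lemma 2 is easy to check numerically on a small case. The sketch below is only an illustration with a hypothetical 2 × 2 matrix and two hypothetical vectors; it tests positive definiteness of F − αα′ through the eigenvalues and compares the result with the condition α′F−1α < 1.

```python
import numpy as np

def is_positive_definite(A):
    """True if the symmetric matrix A has only positive eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

def lemma2_quantity(F, a):
    """Evaluate a' F^{-1} a, the quantity compared with 1 in Lemma 2."""
    return float(a @ np.linalg.solve(F, a))

F = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite by construction
a_small = np.array([0.5, 0.5])           # a' F^{-1} a is below 1
a_large = np.array([1.5, 1.0])           # a' F^{-1} a is above 1

for a in (a_small, a_large):
    lhs = is_positive_definite(F - np.outer(a, a))
    rhs = lemma2_quantity(F, a) < 1.0
    print(lhs, rhs)   # the two booleans agree in each case
```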
Comparisons Among the Estimators
Theorem 1: is superior to iff
Proof : The dispersion difference is:
We observed that is positive definite since for k > 0. By Lemma 3, the proof is completed.
Theorem 2: When , is superior to iff
where
Proof:
where and
It is clear that, for k > 0 and 0 < d < 1, F > 0 and I > 0. Moreover, F − I > 0 if and only if $\lambda_{\max}(IF^{-1}) < 1$, where $\lambda_{\max}(IF^{-1})$ is the maximum eigenvalue of the matrix IF−1. By Lemma 1, the proof is completed.
Theorem 3: is superior to if and only if
where
Proof: The dispersion difference is
We observed that is applicable if and only if . For k > 0, it was observed that . By Lemma 3, the proof is completed.
The Selection of k Parameter of the TNRTE
Using the Kibria and Lukman [9] method, the optimal biasing parameter k of the TNRTE is given as:
and using the unbiased estimates of σ2 and α2, the optimal estimated k of the TNRTE is given as:
A Monte Carlo Simulation
To examine the performance of the proposed TNRTE compared with the other estimators, we conducted simulation experiments at different factor levels. The design follows the techniques of Kibria [21], Yenilmez et al. [22], Khalaf et al. [4], Yenilmez and Kantar [23], Toker et al. [6], and Yenilmez et al. [24]. The degree of correlation (τ) among the explanatory variables is one of the essential factors in the simulation. To vary the degree of correlation, the explanatory variables were generated using the following model:

$$x_{ij} = \left(1 - \tau^2\right)^{1/2} z_{ij} + \tau z_{i,p+1}, \quad i = 1, \ldots, n, \; j = 1, \ldots, p,$$

where the zij are independent standard normal pseudo-random numbers. The dependent variable is generated using the following equation:

$$y_i^{*} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + u_i, \quad i = 1, \ldots, n,$$

where the ui are independent and identically distributed N(0, σ2) pseudo-random numbers, and the parameter vector is chosen such that β′β = 1, as in the studies of Dawoud and Abonazel [25], Awwad et al. [26], Awwad et al. [14], Abonazel and Dawoud [27], Algamal and Abonazel [28], Abonazel et al. [15], and Abonazel et al. [29]. The dependent variable is then censored using Equation (2). All factors used in this simulation are stated in Table 1.
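The data-generating steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the explanatory variables follow the scheme x_ij = (1 − τ²)^{1/2} z_ij + τ z_{i,p+1} used by Kibria [21], the coefficient vector has equal entries scaled so that β′β = 1 (one of several possible choices), and censoring is at zero. The function name and factor levels are hypothetical, and the sketch does not control the censoring level as the study's design does.

```python
import numpy as np

def simulate_tobit(n, p, tau, sigma, rng):
    """Generate one left-censored sample with correlated regressors."""
    z = rng.standard_normal((n, p + 1))
    # Shared component z[:, p] induces correlation among the p regressors
    x = np.sqrt(1.0 - tau**2) * z[:, :p] + tau * z[:, [p]]
    X = np.column_stack([np.ones(n), x])          # add the intercept column
    beta = np.ones(p + 1) / np.sqrt(p + 1)        # assumed choice with beta'beta = 1
    y_star = X @ beta + sigma * rng.standard_normal(n)
    y = np.where(y_star > 0, y_star, 0.0)         # left censoring at zero
    return X, y, beta

rng = np.random.default_rng(0)
X, y, beta = simulate_tobit(n=200, p=4, tau=0.95, sigma=1.0, rng=rng)
censoring_level = float(np.mean(y == 0))          # share of censored responses
```

Repeating this draw many times and refitting each estimator on every sample gives the replicated estimates needed for a simulated MSE.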
The TRE, the TLE, and the proposed TNRTE estimated biasing parameters used in this simulation study are given as follows:
1. The estimated parameter of k for the TRE is considered according to Hoerl and Kennard [7], as
2. The estimated parameter d for the TLE is considered, according to Liu [8] as follows
when has negative value, Ozkale and Kaciranlar [30] considered the alternative parameter of d as:
3. Following the study of Kibria and Lukman [9], the estimated biasing parameter minimum value and the harmonic-mean of k for the proposed TNRTE are considered as follows:
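To examine the performance of the TMLE, the TRE, the TLE, and the proposed TNRTE, we computed the estimated MSE (EMSE) as:

$$\mathrm{EMSE}(\hat{\alpha}) = \frac{1}{R} \sum_{r=1}^{R} \left(\hat{\alpha}_r - \alpha\right)'\left(\hat{\alpha}_r - \alpha\right),$$

where $\hat{\alpha}_r$ is the estimate obtained in the r-th of R replications and α is the true parameter vector. The simulation results (EMSE values) are stated in Tables 2–7, where the smallest value of the EMSE is highlighted in bold.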
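For illustration, the EMSE can be computed from a stack of replicated estimates. The sketch below assumes the EMSE is the average over replications of the squared Euclidean distance between each estimate and the true parameter vector; the function name and the numbers are hypothetical.

```python
import numpy as np

def emse(estimates, truth):
    """Average squared distance between replicated estimates and the truth.

    estimates: array of shape (R, q), one row per replication.
    truth: array of shape (q,), the true parameter vector.
    """
    diff = np.asarray(estimates) - np.asarray(truth)
    return float(np.mean(np.sum(diff**2, axis=1)))

truth = np.array([0.5, 0.5])
estimates = np.array([[0.6, 0.4],
                      [0.5, 0.7],
                      [0.3, 0.5]])   # three hypothetical replications
value = emse(estimates, truth)
```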
Based on the simulation results, we conclude the following:
1. The EMSE increases as n decreases.
2. The EMSE increases as p increases.
3. The EMSE increases as τ increases.
4. The EMSE increases as σ increases.
5. The EMSE increases as the CL increases.
6. The TMLE performed worst at all levels of multicollinearity and censoring.
7. The TNRTE and the TLE outperform the TRE in all cases.
8. For large values of σ and p, a few EMSE values of the proposed TNRTE are close to those of the TLE.
9. The proposed TNRTE with the biasing parameters performs the best of all other mentioned estimators in terms of the EMSE, followed by the proposed TNRTE with the biasing parameters in most cases.
10. The performance of the proposed TNRTE and of the other estimators depends largely on how their biasing parameters are estimated.
11. Finally, the proposed TNRTE performs the best of all other mentioned estimators in terms of the EMSE in most cases.
A Real-Life Data
In this section, we use the Mroz dataset, originally adopted by Mroz [31], to illustrate the performance of the proposed TNRTE and the other estimators. The Mroz data contain 753 cases of married women with 21 variables, and the ages of these women range from 30 to 60 years. For 325 of the 753 women, the average hourly wage is zero. As in Barros et al. [32], the average hourly wage of the women is the dependent variable (y), while the independent variables are as follows: age of the women (x1), education of the women (x2), number of children <6 years (x3), number of children between the ages 6 and 18 (x4), and previous labor market experience of the women (x5). Following Toker et al. [6], we checked for multicollinearity using the eigenvalues of the matrix, which are 69,601.81, 1,723.52, 334.22, 54.43, 6.22, and 0.36; the condition number is 441.09. These results indicate high multicollinearity. The estimated parameters and MSE values are presented in Table 8.
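The multicollinearity diagnostic can be reproduced approximately from the reported eigenvalues. The sketch below assumes the condition number is defined as the square root of the ratio of the largest to the smallest eigenvalue. Because the eigenvalues are rounded to two decimals, the result is close to, but not exactly, the reported 441.09; the gap is consistent with rounding of the smallest eigenvalue.

```python
import math

# Eigenvalues as reported in the text (rounded to two decimals)
eigenvalues = [69601.81, 1723.52, 334.22, 54.43, 6.22, 0.36]

# Assumed definition: sqrt(largest eigenvalue / smallest eigenvalue)
condition_number = math.sqrt(max(eigenvalues) / min(eigenvalues))
print(round(condition_number, 2))
```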
Table 8 shows that the TMLE performs worse as expected. Also, the TRE has a near MSE value with the biasing parameter estimator to that of the proposed TNRTE with biasing parameter estimator . Moreover, the proposed TNRTE has the lowest MSE value among the mentioned estimators (TRE and TLE), followed by TLE and then the TRE, when k = d = 0.3; this means that the proposed TNRTE is the best in this case.
Figure 1 shows that the proposed TNRTE with the biasing parameter k between 0.18 and 0.58 performs better than the other estimators. When k equals 0.36, the proposed TNRTE has the smallest MSE, which means it is the best of all the given estimators, while the TMLE performs the worst, as expected.
Conclusions
In this study, we proposed the Tobit new ridge-type estimator (TNRTE) to overcome the multicollinearity problem in the censored regression model. We compared the proposed TNRTE theoretically with the Tobit maximum likelihood estimator (TMLE), the Tobit ridge estimator (TRE), and the Tobit Liu estimator (TLE), and gave estimators of the biasing parameter of the proposed TNRTE. Then, a simulation study was performed to compare the performance of the TMLE, the TRE, and the TLE with that of the proposed TNRTE. The simulation results indicate that the proposed TNRTE is better than the other existing estimators in most cases. Moreover, the real-life Mroz data were used to illustrate the study results.
Data Availability Statement
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.
Author Contributions
ID, MA, and FA contributed to conception and structural design of the manuscript. MA performed the simulation and application. All authors contributed to manuscript revision, read, and approved the submitted version.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
The authors would like to thank the Deanship of Scientific Research at King Saud University represented by the Research Center at CBA for supporting this research financially.
References
1. Tobin J. Estimation of relationships for limited dependent variables. Econometrica. (1958) 26:24–36. doi: 10.2307/1907382
4. Khalaf G, Mansson K, Sjolander P. A Tobit ridge regression estimator. Commun. Stat. Theory Methods. (2014) 43:131–40. doi: 10.1080/03610926.2012.655881
5. Alhusseini FHH, Odah MH. Principal component regression for Tobit model and purchases of gold. In: Proceedings of the 10th International Management Conference, Bucharest, Romania. (2016) 10:491–500.
6. Toker S, Özbay N, Siray GÜ, Yenilmez I. Tobit Liu estimation of censored regression model: an application to Mroz data and a Monte Carlo simulation study. J Stat Comput Simul. (2021) 91:1061–91. doi: 10.1080/00949655.2020.1828416
7. Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. (1970) 12:55–67. doi: 10.1080/00401706.1970.10488634
8. Liu K. A new class of biased estimate in linear regression. Commun Stat Theory Methods. (1993) 22:393–402. doi: 10.1080/03610929308831027
9. Kibria BMG, Lukman AF. A new ridge-type estimator for the linear regression model: simulations and applications. Scientifica. (2020) 2020:9758378. doi: 10.1155/2020/9758378
10. Lukman AF, Dawoud I, Kibria BM, Algamal ZY. A new ridge-type estimator for the gamma regression model. Scientifica. (2021) 2021:5545356. doi: 10.1155/2021/5545356
11. Lukman AF, Algamal ZY, Kibria BG. The KL estimator for the inverse Gaussian regression model. Concurr Comput Pract Exp. (2021) 33:e6222. doi: 10.1002/cpe.6222
12. Akram MN, Kibria BG, Abonazel MR. On the performance of some biased estimators in the gamma regression model: simulation and applications. J Stat Comput Simul. (2022). doi: 10.1080/00949655.2022.2032059. [Epub ahead of print].
13. Dawoud I, Abonazel MR. Generalized Kibria-Lukman estimator: method, simulation, and application. Front Appl Math Stat. (2022) 8:880086. doi: 10.3389/fams.2022.880086
14. Awwad FA, Odeniyi KA, Dawoud I, Algamal ZY, Abonazel MRBM, Tag Eldin E. New two-parameter estimators for the logistic regression model with multicollinearity. WSEAS Trans Math. (2022) 21:403–14. doi: 10.37394/23206.2022.21.48
15. Abonazel MR, Dawoud I, Awwad FA. Dawoud–Kibria estimator for beta regression model: simulation and application. Front Appl Math Stat. (2022) 8:775068. doi: 10.3389/fams.2022.775068
16. Amemiya T. Regression analysis when the dependent variable is truncated normal. Econometrica. (1973) 41:997–1016. doi: 10.2307/1914031
17. Fair RC. A note on computation of the Tobit estimator. Econometrica. (1977) 45:1723–7. doi: 10.2307/1913962
18. Wang SG, Wu MX, Jia ZZ. Matrix Inequalities. 2nd ed. Beijing: Chinese Science Press (2006), p. 1–116.
19. Farebrother RW. Further results on the mean square error of ridge regression. J R Stat Soc B. (1976) 38:248–50. doi: 10.1111/j.2517-6161.1976.tb01588.x
20. Trenkler G, Toutenburg H. Mean squared error matrix comparisons between biased estimators-an overview of recent results. Stat Pap. (1990) 31:165–79. doi: 10.1007/BF02924687
21. Kibria BMG. Performance of some new ridge regression estimators. Commun Stat Simul Comput. (2003) 32:419–35. doi: 10.1081/SAC-120017499
22. Yenilmez I, Mert Kantar Y, Acitaş S. Estimation of censored regression model in the case of non-normal error. Sigma J Eng Nat Sci. (2018) 36:513–21.
23. Yenilmez I, Mert Kantar Y. An alternative estimation method based on alpha skew logistic distribution for parameters of censored regression model. Data Sci Appl. (2019) 2:16–20.
24. Yenilmez I, Ilhan U, Mert Kantar Y. Quasi-maximum likelihood estimator based on moyal distribution for censored data. In: 5th International Researchers, Statisticians and Young Statisticians Congress Aydin, Turkey. (2019), p. 419–27.
25. Dawoud I, Abonazel MR. Robust Dawoud–Kibria estimator for handling multicollinearity and outliers in the linear regression model. J Stat Comput Simul. (2021) 91:3678–92. doi: 10.1080/00949655.2021.1945063
26. Awwad FA, Dawoud I, Abonazel MR. Development of robust Özkale–Kaçiranlar and Yang–Chang estimators for regression models in the presence of multicollinearity and outliers. Concurr Comput Pract Exp. (2022) 34:e6779. doi: 10.1002/cpe.6779
27. Abonazel MR, Dawoud I. Developing robust ridge estimators for Poisson regression model. Concurr Comput Pract Exp. (2022) 34:e6979. doi: 10.1002/cpe.6979
28. Algamal ZY, Abonazel MR. Developing a Liu-type estimator in beta regression model. Concurr Comput Pract Exp. (2022) 34:e6685. doi: 10.1002/cpe.6685
29. Abonazel MR, Algamal ZY, Awwad FA, Taha IM. A New Two-parameter estimator for beta regression model: method, simulation, and application. Front Appl Math Stat. (2022) 7:780322. doi: 10.3389/fams.2021.780322
30. Ozkale MR, Kaçiranlar S. The restricted and unrestricted two-parameter estimators. Commun Stat Theory Methods. (2007) 36:2707–25. doi: 10.1080/03610920701386877
31. Mroz TA. The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions. Econometrica. (1987) 55:765–99. doi: 10.2307/1911029
Keywords: censored regression model, multicollinearity, Tobit Liu estimator, Tobit ridge estimator, Tobit new ridge-type estimator
Citation: Dawoud I, Abonazel MR, Awwad FA and Tag Eldin E (2022) A New Tobit Ridge-Type Estimator of the Censored Regression Model With Multicollinearity Problem. Front. Appl. Math. Stat. 8:952142. doi: 10.3389/fams.2022.952142
Received: 24 May 2022; Accepted: 21 June 2022;
Published: 15 July 2022.
Edited by:
Han-Ying Liang, Tongji University, China
Reviewed by:
Guoliang Fan, Shanghai Maritime University, China
Fuxia Cheng, Illinois State University, United States
Copyright © 2022 Dawoud, Abonazel, Awwad and Tag Eldin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Mohamed R. Abonazel, mabonazel@cu.edu.eg