- ¹Academic Area of Basic Sciences and Modeling, Faculty of Natural Sciences and Engineering, University of Bogotá Jorge Tadeo Lozano, Bogotá, Colombia
- ²Territorial Cundinamarca, Superior School of Public Administration, Fusagasugá, Colombia
Introduction: Student dropout, as a dynamic and complex system, requires a broad conceptualization. This article analyzes the concept of student dropout in higher education in order to address it effectively at both the institutional and the societal level.
Methods: Using a mixed-methods approach, dropout patterns were traced, and a model was designed and validated using anonymized data from 17,328 students at a Colombian higher education institution offering face-to-face programs.
Results: Results from decision trees and survival analysis highlight the weight of economic and academic factors in raising the risk of dropout and in lowering graduation rates. The analysis confirms that the first two years of enrollment at the institution are critical for the likelihood of dropout, and that an extended stay at the institution also increases that risk.
Discussion: The study highlights the dynamic complexity of student dropout and emphasizes the importance of continuously updating models by integrating diverse analysis techniques. Socioeconomic status and academic performance emerged as key factors, with a focus on students at intermediate levels.
1 Introduction
Student dropout is a complex phenomenon. It affects not only students whose aspirations are cut short but also families who invest their resources in their children's education without any guarantee of return (Guzmán et al., 2021). It also affects higher education institutions (HEIs) by reducing enrollment income (Roslan et al., 2024; Guzmán et al., 2021), and it affects society: social capital does not grow as required, productivity suffers (Bronfenbrenner, 1996; Schmitt and Santos, 2013), and social relationships and the exercise of citizenship skills are weakened (Swail et al., 2003).
Donoso and Schiefelbein (2007) analyzed different models that explain student dropout or, conversely, student permanence in education. Examples include the models of Fishbein and Ajzen (1975), Spady (1970), Tinto (1987), Ethington (1990), Bean (1985), Pascarella and Terenzini (1985), and Weidman (1989). The classical approaches outlined in Table 1 provide foundational perspectives that are essential for understanding the multifactorial nature of student dropout. Examining Table 1 makes it possible to build a comprehensive framework that integrates historical insights with contemporary data.
To support their students and improve retention, HEIs base their permanence and timely-graduation plans, as well as their early warning systems, on governmental, academic, or theoretical models, on their academic information systems and official platforms, and on their own conception of dropout and student retention. This approach shows how a culture of institutional information, used as a tool to promote permanence, depends on the availability of reliable, high-quality information for planning, formulating, evaluating, and monitoring policies to mitigate student dropout (Palomino and Ortega, 2023; Tete et al., 2022). Student desertion threatens the sustainability and stability of HEIs because it drains the resources they need for their core activities of teaching, research, and social outreach (Cosenz, 2014, 2022).
This context, namely the perceived evolution of the concept of student dropout, its complexity, and its monitoring and treatment in higher education, gave rise to the research question that guides this document.
Research question: How can the concept of student dropout in higher education be analyzed to effectively mitigate its effects at an institutional level?
Consequently, the objective of this article is to analyze the concept of student dropout in higher education, with the aim of effectively addressing it at various levels, including both institutional and societal. Although the primary focus is within the confines of HEIs, it is acknowledged that dropout not only impacts these institutions financially and reputationally but also affects society at large by influencing social capital and economic productivity. Therefore, the study proposed herein aims to foster a comprehensive approach that extends beyond the borders of individual HEIs. This is crucial given the dynamic complexity of student dropout, which necessitates ongoing analysis to adapt to evolving explanatory variables. Continuous updates and adjustments to these studies are imperative for keeping abreast of changes and ensuring the efficacy of interventions (Barragán and Lozano, 2021). Aligned with the culture of institutional information, this article identifies the most relevant factors contributing to student dropout. It also offers strategic insights for decision-makers, facilitating the management of this challenge in a way that optimally uses human, economic, and technological resources within student support programs across HEIs, thus benefiting broader educational and social systems (Ministerio de Educación Nacional, 2015).
To achieve this objective, the introduction is followed by the theoretical references on which this work is based. Next, the research methodology and the results of its implementation are presented; the implementation used a database belonging to a private HEI. Finally, the conclusions of the research are detailed.
2 Theoretical references
Vincent Tinto's university dropout model, published in 1975, marked a milestone in the study and modeling of the subject. Figure 1 illustrates the connections between articles and Tinto's publication. Node size represents the number of citations each article has received; Tinto (1975) has 7,603 citations (as of April 2024), which demonstrates the great academic influence of his work. Color intensity indicates how recent each connected article is: the lighter the color, the closer its publication date is to 1970; the darker, the closer it is to 2020.
Figure 1. Graph of articles related to Tinto (1975), "Dropout from higher education: a theoretical synthesis of recent research." Prepared by the authors with the free tool www.connectedpapers.com. The interactive graph is available at https://bit.ly/3Q2VrVA.
As already mentioned, Vincent Tinto was one of the forerunners of conceptual studies and modeling of student dropout (Tinto, 1975, 1993). From a sociological perspective, Tinto's interaction model proposes that student desertion results from the level of academic and social integration that students achieve during their time at the HEI. His model is explanatory, positing that misalignment between student goals and institutional objectives greatly increases the likelihood of dropout. Although it is a classic model, it is not infallible; McCubbin points out at least two aspects that require reconsideration. First, it does not consider economic explanatory variables. Second, it focuses on "typical" students, whereas today this category is understood to encompass a broad diversity of characteristics (McCubbin, 2003). Donoso and Schiefelbein, in turn, pointed out that the model does not weigh variables such as the type of institution to which students are affiliated, especially when the institution does not fit the traditional framework (Donoso and Schiefelbein, 2007). More recently, Hadjar, Haas, and Gewinner redefined the models of Spady and Tinto through a conceptual approach that emphasizes students' individual backgrounds and their satisfaction with the support structures of HEIs (Hadjar et al., 2022).
In 2012, the ALFA GUIA DCI-ALA/2010/94 Project, funded by the European Union, carried out work aimed at defining a prediction model of student dropout that would add to the factors considered in previous models, for example, factors previously overlooked or only recently associated with the event (Grupo Análisis. Proyecto ALFA GUIA DCI-ALA/2010/94, 2012, p. 3). This conceptual work has had diverse impacts on Latin American research: it has provided theoretical foundations, including the classical authors described in Table 1, and it has produced its own approximations, such as that of the Project itself, which proposed a shift even in the terminology used. Building on this theoretical groundwork, the Project synthesized a conceptual framework, compiled a matrix of models and theories, and diagrammed, conceptualized, and identified the typology of student dropout (Red GUÍA. Gestión Universitaria Integral del Abandono, 2020). Regarding terminology, the Project favored the term abandono [abandonment] over deserción [desertion], given the negative connotation of the latter. The term "desertion" excludes the alternative decisions a student might make to follow other paths, offered by life options more in line with their individual interests; such a decision might be considered professional reorientation. The Project reaffirms the use of the term abandonment and understands it as a relational, interactive, and dynamic event. The term denotes an individual, institutional, and social act, caused by an assessment of education based on intrinsic and extrinsic expectations, offers, and demands, that modifies the interactions between the different educational agents. It is a contextualized and complex event that must be approached from an interdisciplinary perspective, based on multiple and complementary strategies (Proyecto ALFA GUIA DCI-ALA/2010/94, 2013, p.4).
Hadjar et al. (2022) refined the traditional Spady-Tinto approach to understanding higher education dropout intentions. The traditional models developed by Spady and Tinto emphasize the role of academic and social integration in student retention and dropout rates. The refined model keeps these core components but extends them with individual background characteristics, such as gender, social origin, and immigration background, as well as satisfaction with institutional support. This updated model aims to provide a more nuanced understanding of dropout intentions by considering a broader range of factors that influence a student's educational journey.
The scope of the definition of student dropout (or abandonment) is therefore closely related to the purpose, depth, and level of resolution of the models or studies to be conducted (e.g., Martins et al., 2023; Mostert et al., 2023; Gallegos et al., 2018; Lema et al., 2023; Montoya-Restrepo et al., 2020). This is why operational definitions for statistical tracking purposes have emerged. One example is the definition of deserter provided by the Ministry of National Education of Colombia (MEN in Spanish), under which a student becomes a deserter upon abandoning his/her education process at an HEI, voluntarily or forcibly, for two or more consecutive academic periods (Ministerio de Educación Nacional, 2015). In addition, Ministerio de Educación Nacional (2009) and Seminara and Aparicio (2018) classified student dropout by when it occurs and by where the student goes, for example, whether the student leaves one major for another at the same HEI or leaves the university altogether, as well as by its cause (see Figure 2).
Figure 2. Approaches to student dropout by occurrence in time and space. Elaborated by the authors based on Ministerio de Educación Nacional (2009) and Seminara and Aparicio (2018) using PresentationGO.com.
Defining dropout within an academic context, which favors preventive approaches (retention) over reactive ones, allows modeling at all levels of analysis, making it possible to identify success and risk factors and opportunities to support students with particular needs or groups with special characteristics (Lema et al., 2023; Gómez-León, 2022; Guzmán et al., 2021; Martinez-Daza et al., 2021; Pineda-Báez, 2021; Casanova et al., 2021). Student retention is the ongoing effort of HEIs to develop strategies that strengthen institutional capacity and contribute to reducing dropout rates. It is also established as a significant element in the development of the institutional educational plan (Ministerio de Educación Nacional, 2015).
For this, it is essential to characterize the student population periodically and in detail to identify the determining variables (Figure 3) that explain student dropout. After this characterization, student dropout and permanence must be diagnosed, monitored, and addressed at multiple institutional levels (Ministerio de Educación Nacional, 2009) in line with institutional objectives, plans, and programs. This process must account for indicators of access, permanence, dropout, and graduation framed in the Educational Quality Assurance System (Escobar, 2013; Ramírez et al., 2013; Ministerio de Educación Nacional, 2018; Ministerio de Educación Nacional, 2019; Consejo Nacional de Educación Superior, 2014).
Figure 3. Determining factors of student dropout. Elaborated by the authors based on Ministerio de Educación Nacional (2015) using PresentationGO.com.
3 Methodology
As a preliminary step in establishing the research methodology, this study examined traditional models, public policies, and academic works by researchers who have addressed student dropout using various methods. These elements are reflected in the reference framework of the current study. To meet the objective of this work we developed three consecutive phases. The last one, the validation and operationalization of the model, had three stages:
Phase 1 (qualitative): A non-exhaustive theoretical exploration of the concept of student dropout was conducted, because the meaning used depends on the type of study the researchers design at the time (intervention, survey of factors, intervention evaluation, characterization, exploration, or modeling) (Barragán and Lozano, 2021) and on the purpose of their research; the approach is therefore general in nature (Barragán et al., 2022; Guzmán et al., 2021; Barragán et al., 2015). The results of this phase are described in part of the introduction and in the theoretical references.
Phase 2 (qualitative): An integrating visual model was designed to define student dropout in higher education so that it can be used in the analysis of any HEI. This phase includes the construction of a definition.
Phase 3 (quantitative): The Phase 2 model was validated and operationalized by combining the definition of the ALFA GUIA DCI-ALA/2010/94 Project with modeling through data mining and survival analysis. In this phase, the information available in the institutional databases of Universidad de Bogotá Jorge Tadeo Lozano (Utadeo) was used. Utadeo is a private HEI in Colombia, renowned for its focus on the arts, design, and creative sciences. It offers a broad range of academic programs, from visual arts to engineering and marine sciences.
To do this, this phase had the following stages:
Stage 1: Configuration of the databases. In this stage, the information collected by the academic and administrative units in charge of capturing information at Utadeo was organized and consolidated in digital form. The consolidated information was then filtered, processed, and structured.
Stage 2: Characterization of the institution's population for the academic periods between the first semester of 2017 and the first semester of 2021 (in 2022, Utadeo began a self-assessment process aimed at securing high-quality accreditation, which restricted the analysis to the period from 2017 to 2021). The population was profiled using the demographic and contextual variables available in the University databases (Figure 3). In this stage, the models appropriate for undergraduate student dropout at Utadeo were also selected according to the database obtained in Stage 1. Notably, this stage was guided by what the institutional databases revealed: although the chosen techniques are widely used, the decision on how to process the data could only be made once each database was available. For this validation, and following the academic definition of the ALFA-GUIA Project, two statistical modeling techniques that address dropout and student retention with multiple and complementary strategies were combined. To combine them, a subsection called "Sample" was incorporated, which describes the participants, the instruments, and the procedure used to process the databases. The decision tree technique was used to organize and rank hierarchically the variables that affect student permanence (Roslan et al., 2024), and the survival analysis technique was used to estimate the survival and dropout risk functions and to identify some of the predictor variables available in the University databases (Barragán et al., 2022; Rodríguez and Zamora, 2014; Castaño et al., 2004).
3.1 Description of the method for Phase 3
3.1.1 Instruments
No special instrument was designed; instead, the information was obtained from the institutional departments to which it was requested. This may be an advantage for replication, because the records needed to apply this combination of models reside in the HEI's academic information system.
3.1.2 Sample
A single database was created. It included 17,328 students, each with a unique identification code (UIC), that is, all students taking undergraduate courses from the first period of 2017 to the first period of 2021 at Utadeo's Bogotá campus. For each period, the number of subjects taken by each UIC was counted. Two further variables were then defined:
Closure: variable understood as the semester in which the student had his/her last courses. In other words, it is the last semester in which the student was identified as active. This variable is a date.
Permanence: the student's number of periods of activity at the University, defined as the earlier of the closure date and the graduation date, minus the admission date. In total, the database contained records for 86 variables plus the UIC identifier. Some of the variables included were: UIC, date of birth, gender, age, department of residence, municipality of residence, campus, access method, dual program, financial status, academic status, administrative status, disciplinary status, graduation date, dropout, graduated, closure, admission period, faculty, academic program, academic period, academic level, civil status, governmental standardized exam score, biology score, math score, philosophy score, physics score, history score, English score, chemistry score, language score, geography score, social studies score, language test score, verbal score, social and civic science score, quantitative and abstract reasoning score, civic competence and intrinsic score, natural science score, Spanish and literature score, basic credits obtained, mandatory credits obtained, optional credits obtained, elective credits obtained, project credits obtained, comprehensive credits obtained, blocked, typology, health promoting entity, Sisbén (social identification system of potential beneficiaries), and ethnicity.
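The permanence variable can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes period codes written as elsewhere in this article (`2017-1S`, `2017-2S`), counts semesters inclusively from the admission period through the earlier of the closure and graduation periods, and the function names `period_index` and `permanence` are hypothetical.

```python
from typing import Optional


def period_index(code: str) -> int:
    """Map a period code such as '2017-1S' or '2017-2S' to a consecutive integer.

    Assumption: codes follow the 'YYYY-1S' / 'YYYY-2S' pattern used in the text.
    """
    year, half = code.split("-")
    semester = int(half.rstrip("S"))  # '1S' -> 1, '2S' -> 2
    if semester not in (1, 2):
        raise ValueError(f"unexpected period code: {code}")
    return int(year) * 2 + (semester - 1)


def permanence(admission: str, closure: str, graduation: Optional[str] = None) -> int:
    """Semesters of activity: admission through min(closure, graduation), counted inclusively.

    Hypothetical helper; the inclusive counting convention is an assumption.
    """
    end = period_index(closure)
    if graduation is not None:
        end = min(end, period_index(graduation))
    return end - period_index(admission) + 1


# A student admitted in 2017-1S whose last active semester was 2020-2S:
print(permanence("2017-1S", "2020-2S"))  # 8
```

With this convention, a student admitted in 2017-1S who finishes in 2020-2S has a permanence of eight semesters, which matches the standard duration discussed in the results.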
3.1.3 Processing
After pre-processing the database, a combination of decision trees and duration (survival) models was chosen.
3.1.3.1 Decision trees
Statistical modeling through decision trees offers a predictive system that classifies observations by means of decision rules (Roslan et al., 2024). Because the rules are learned from the data, the classification changes as the dynamics of student dropout change (Roslan et al., 2024; Tan et al., 2006). The technique ranks the variables that affect student permanence or dropout hierarchically and disregards explanatory variables that contribute little or nothing to the response variable (Hernández et al., 2004).
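The hierarchization described above can be illustrated with the Gini impurity criterion used by CART-style trees (for instance, the default `criterion="gini"` of scikit-learn's `DecisionTreeClassifier`). The sketch below is a toy under assumptions, not the authors' model: the software and settings used in the study are not stated, and the variable names and records are hypothetical. It scores each candidate variable by the impurity decrease it produces when it splits a node; the tree places the variable with the largest decrease at the top of the hierarchy.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Impurity decrease from splitting a node on every value of a categorical variable."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini(labels) - weighted


def rank_features(features: Dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable]) -> List[str]:
    """Order candidate variables by impurity decrease, largest first."""
    return sorted(features, key=lambda name: split_gain(features[name], labels), reverse=True)


# Hypothetical toy records: outcome plus two candidate predictors.
outcome = ["grad", "grad", "grad", "drop", "drop", "grad"]
candidates = {
    "gender": ["F", "M", "F", "M", "F", "M"],
    "sisben_reported": [0, 0, 0, 1, 1, 1],
}
print(rank_features(candidates, outcome))  # ['sisben_reported', 'gender']
```

In this toy example, splitting on the hypothetical `sisben_reported` flag separates outcomes better than `gender`, which contributes no impurity decrease and would be disregarded at this node.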
3.1.3.2 Duration or survival models
A survival model involves three elements: the occurrence of the dropout event, the time elapsed until it occurs, and the variables with the greatest influence on survival (Singer and Willett, 1993). The survival function is essential because it measures the probability that a student will remain at Utadeo beyond a given period, and the hazard (risk) function measures the probability that a student drops out in a given period, conditional on having remained until then (Rebasa, 2005). Together, these functions show how dropout evolves (when it is most probable, or when the risk is greatest) and which variables are most influential. Survival and hazard functions are estimated with the Kaplan–Meier estimator (Lee and Wang, 2013). The naive estimate, the proportion of individuals surviving beyond time t, is not feasible when there are censored times (Breslow, 1970), that is, survival times of students who have not experienced dropout, either because they graduated or because it is not yet known whether the event will occur (Rebasa, 2005); the Kaplan–Meier product-limit estimator accommodates these censored observations. Comparing survival distributions across the levels of a factor is useful for determining whether the differences are significant. This comparison can be made with the log-rank test, in which all observations are weighted equally (Cox, 1972).
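Both estimators can be written out in a few lines of Python. The sketch below is a minimal, self-contained illustration rather than the software used in the study: survival times are measured in semesters, an event indicator of 1 marks dropout and 0 marks a censored record (a graduate or a still-active student), and all data are hypothetical. `kaplan_meier` returns, at each dropout time, the discrete hazard d/n and the product-limit survival estimate; `log_rank` computes the two-group log-rank chi-square statistic (one degree of freedom) and its p-value.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[int], events: Sequence[int]) -> List[Tuple[int, float, float]]:
    """Product-limit estimator.

    times: semesters observed per student; events: 1 = dropout, 0 = censored.
    Returns (t, hazard, survival) at each distinct dropout time t.
    """
    result = []
    survival = 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)  # still enrolled just before t
        dropouts = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        hazard = dropouts / at_risk  # conditional probability of dropping out at t
        survival *= 1.0 - hazard     # S(t) = product of (1 - d_i / n_i) for t_i <= t
        result.append((t, hazard, survival))
    return result


def log_rank(times_a: Sequence[int], events_a: Sequence[int],
             times_b: Sequence[int], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test (equal weight at every event time); returns (chi2, p)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)
        n_a = sum(1 for u, g in zip(times, in_a) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_a = sum(1 for u, e, g in zip(times, events, in_a) if u == t and e == 1 and g)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance; the term is 0 when only one student is at risk
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p_value


# Hypothetical semesters to dropout (event 1) or censoring (event 0).
for t, h, s in kaplan_meier([1, 2, 2, 3, 4, 4, 5], [1, 1, 0, 1, 0, 1, 0]):
    print(t, round(h, 3), round(s, 3))
```

The censored record at semester 2 still counts in the risk set at semester 2 but not afterwards, which is how the product-limit estimator uses partial information that the naive proportion would discard.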
Integrating modern data techniques, such as decision trees and survival analysis, with traditional models of student dropout offers a nuanced approach to understanding the dynamics of student retention and dropout in higher education. This integration not only underscores the continued relevance of longstanding theoretical frameworks but also uncovers subtle patterns and interactions that previous studies may have overlooked. By blending these methodologies, the research presents a comprehensive tool for analyzing complex educational data, thereby facilitating a deeper understanding of the factors influencing student retention and dropout.
The described methodology and the processing of the database yielded the results that are presented below.
4 Results
The model from phase 2 is presented in Figure 4. It reconciles governmental, institutional, and public policies and HEI conceptions to address and mitigate student abandonment in higher education.
Figure 4. Integrating model of both governmental and institutional concepts and policies. Source: Elaborated by the authors using PresentationGO.com.
The results below follow Figure 4 and were obtained by applying the proposed methodology. The findings for 2017–2021 are intended to contribute to the modeling of undergraduate student dropout at Universidad de Bogotá Jorge Tadeo Lozano (Utadeo) and thus to improve the understanding of dropout and student retention. It is important to note that Utadeo adopted the government's definition of desertion outlined by the MEN, for comparative purposes and to report to official platforms. However, it also works with the academic definition of the ALFA-GUIA Project together with its own definition of deserter, which is based on the characterization of the population and on statistical modeling (in this case) to obtain historical references, grounded explanations, and approaches to prediction.
4.1 Characterization of the population in relation to each base variable, grouped into the determining factors of student desertion
The presentation of results commences with an analysis of the population based on the determinants of student dropout as in Figure 3. This approach aligns with the traditional model proposed by Tinto (1975). Therefore, it was deemed appropriate to discuss these elements within the results section rather than in the methodology.
4.1.1 Individual determining factor
Of the 17,328 students, 53.3% were women and 46.7% were men. Across the four faculties into which Utadeo is organized (Faculty of Economic and Administrative Sciences, FCEA; Faculty of Natural Sciences and Engineering, FCNI; Faculty of Arts and Design, FAD; and Faculty of Social Sciences, FCS), the average age at admission was 21.4 years, with a standard deviation of 5.25 years. The median age was 21, which implies that half of the students were 21 or older at admission.
Regarding civil status, 78.84% of the students report being single and 0.76% married.
Regarding disability, 38.91% of the students indicate that the condition does not apply in their case. Sensory disability (low vision) was recorded in 44 students and physical disability in 16.
Regarding ethnicity, 99.1% of the students do not recognize themselves as part of an ethnic group. Although only a small share (0.9%) do, the implications of this variable for permanence make its monitoring essential.
The variables also include the municipality of residence, reported for 17,276 students; it is not available for the remaining 52. Most students (72.13%) live in Bogotá, followed by Soacha (1.53%) and Chía (1%), two Colombian municipalities conurbated with Bogotá.
4.1.2 Institutional determining factor
Regular admission is the most frequent type of access (54.29%), followed by new admission (24.47%) and, far behind, external transfers (4.5%). The majors (curricula) in which most students were enrolled were also identified: Chemical Engineering 2012-1S (5.07%; 1S denotes the first period of the year), Graphic Design 2012-1S (5.02%), Advertising-1S (4.44%), Industrial Design 2011-1S (4.42%), and Proyecto Enlace (3.52%), a project that creates a bridge between high school and university.
Admission periods in the database range from 1988-2S to 2021-1S: 8.86% of the students entered in 2016-1S, 8.72% in 2017-1S, 8.31% in 2015-1S, and 8.24% in 2017-2S.
4.1.3 Socioeconomic determining factor
In Colombia, the System for the Identification of Potential Beneficiaries of Social Programs (Sisbén in Spanish) classifies the population into levels (1–7) according to their income and living conditions; lower levels represent less favorable conditions (Sistema de Identificación de Potenciales Beneficiarios de Programas Sociales, 2020). Of the total student base, 10,670 students (61.57%) were identified at some Sisbén level: 46.88% of the students reported level 1 (the lowest), 6.63% level 2, and 5.55% level 3. At Sisbén levels above 3, policies do not provide special support, as these levels are associated with more favorable socioeconomic conditions. The survival analysis and decision trees show that this variable has a significant impact on graduation and dropout rates.
4.1.4 Academic determining factor
Of the 17,328 students, 5,799 have already graduated and 11,529 have not; 3,728 dropped out of their studies at Utadeo during the study period.
The database includes 6,837 students (39.45% of the total) from the Faculty of Arts and Design (FAD), 3,638 (20.99%) from the Faculty of Natural Sciences and Engineering (FCNI), 3,275 (18.90%) from the Faculty of Social Sciences (FCS), and 2,928 (16.89%) from the Faculty of Economic and Administrative Sciences (FCEA), listed in descending order of size.
Each faculty's share of all dropouts and of all graduates was also identified. The FAD accounted for 36.6% of all dropouts and 44.1% of all graduates; the FCNI, for 22.5% of the dropouts and 18.3% of the graduates; the FCS, for 19.5% of the dropouts and 18.1% of the graduates; and the FCEA, for 19.5% of the dropouts and 18.1% of the graduates.
Subsequently, the modeling process began. Its first stage consisted of data mining, particularly with the decision tree technique.
4.1.5 Decision tree for permanence in Utadeo
As in the characterization section, the database configured in Stage 1 of Phase 3 was used here, since it contains the academic information and some demographic variables of the 17,327 distinct students who enrolled in at least one undergraduate course in Bogotá during that period (17,327 of the 17,328 had complete academic information).
Using the database of Bogotá campus graduates from 2011 to 2020, the graduation event and its date were flagged. The dropout event was also flagged for undergraduate students who stopped enrolling in courses for at least two academic periods. Finally, permanence was calculated as follows:
1. For non-graduates: the number of semesters between the admission date and the end of the last semester in which at least one subject was registered at the university.
2. For graduates: the number of semesters between the admission date and the end of the semester in which the degree was obtained.
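The labeling of graduation and dropout that feeds this calculation can be sketched as follows. This is an illustration under assumptions, not the institution's procedure: it reuses the `YYYY-1S`/`YYYY-2S` period format, takes 2021-1S as the end of the observation window, and labels a non-graduate as a dropout when at least two periods without registration separate the last enrolled period from the window end (a trailing run of consecutive absences, in line with the MEN definition). Students who stopped out and later returned within the window are therefore not labeled dropouts by this sketch. The function names are hypothetical.

```python
def period_index(code: str) -> int:
    """Map 'YYYY-1S' / 'YYYY-2S' to consecutive integers (assumed code format)."""
    year, half = code.split("-")
    return int(year) * 2 + (int(half.rstrip("S")) - 1)


def label_status(last_enrolled: str, graduated: bool,
                 window_end: str = "2021-1S", min_gap: int = 2) -> str:
    """Hypothetical rule: 'graduated', 'dropout', or 'active' at the end of the window."""
    if graduated:
        return "graduated"
    # Periods after the last enrolled one, up to and including window_end, without registration.
    missed = period_index(window_end) - period_index(last_enrolled)
    return "dropout" if missed >= min_gap else "active"


print(label_status("2020-1S", graduated=False))  # dropout: no registration in 2020-2S or 2021-1S
print(label_status("2020-2S", graduated=False))  # active: only 2021-1S missed
```

The `min_gap` parameter makes the two-period threshold explicit, so a stricter or looser operational definition can be tested without changing the rest of the pipeline.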
Figure 5 shows the distribution of permanence for graduates who registered at least one subject between 2017-1S and 2021-1S. A very small share of graduates had a permanence longer than 21 semesters; they are not included in the figure. Such extended stays (more than 10 academic periods) are due to academic leave, health issues, or re-enrollment. The 270 students who graduated in 2 years or less are highlighted. Half of the graduates took 9 semesters or less to complete their undergraduate degree, and the average time to graduation was almost 10 semesters (9.66), whereas a fully on-track path takes eight.
For the academic information, each student's average grade in the subjects taken from 2017 to 2021 was calculated. This variable reflects well the academic performance of students who studied throughout 2017–2021, but it is less indicative for those who started or finished their studies near the boundaries of this period. Figure 6 shows the distribution of this average for 17,120 students: the mean course grade is 3.83, with a standard deviation of 0.53 (on a scale of 0.0–5.0).
Additionally, a binary low-academic-performance variable was defined for students whose average is equal to or below the first quartile of the grade distribution, that is, 3.46049 or lower on a scale of 0 to 5. This cut-off is the value that best discriminates students who drop out under the Gini criterion used to grow the decision tree. Students at or below the cut-off are considered to have a low academic level; the rest, a normal academic level.
Since the standard duration of an undergraduate degree at Utadeo is 4 years, the database was filtered to analyze students, graduated or not, who had completed at least eight semesters by the first semester of 2021.
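The cut-off search can be sketched as an exhaustive scan over candidate thresholds, as a CART-style tree does for a numeric variable. This is a minimal sketch on hypothetical grades, not the study's computation: it evaluates the rule "average <= c" at each distinct observed average and keeps the value that minimizes the weighted Gini impurity of the two resulting groups (dropout vs. no dropout). Libraries such as scikit-learn place the threshold at the midpoint between adjacent values instead; the observed value is used here to match the "lower than or equal to" form of the rule.

```python
from typing import Optional, Sequence, Tuple


def gini_binary(positives: int, n: int) -> float:
    """Two-class Gini impurity 2p(1 - p), identical to 1 - p^2 - (1 - p)^2."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2 * p * (1 - p)


def best_threshold(grades: Sequence[float],
                   dropped: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (c, impurity) for the rule 'grade <= c' minimizing weighted Gini impurity.

    dropped: 1 = dropout, 0 = otherwise. Returns (None, inf) if all grades are equal.
    """
    pairs = sorted(zip(grades, dropped))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best_c: Optional[float] = None
    best_impurity = float("inf")
    left_pos = 0
    for i in range(n - 1):
        grade, y = pairs[i]
        left_pos += y
        if grade == pairs[i + 1][0]:
            continue  # a cut must fall between two distinct averages
        n_left = i + 1
        impurity = (n_left * gini_binary(left_pos, n_left)
                    + (n - n_left) * gini_binary(total_pos - left_pos, n - n_left)) / n
        if impurity < best_impurity:
            best_c, best_impurity = grade, impurity
    return best_c, best_impurity


# Hypothetical averages (0-5 scale) and dropout flags.
print(best_threshold([2.8, 3.1, 3.3, 3.5, 3.9, 4.2, 4.5], [1, 1, 1, 0, 0, 0, 0]))  # (3.3, 0.0)
```

On real data the minimum impurity is rarely zero; the scan simply returns the average at which the two groups are most homogeneous with respect to dropout.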
Of the 17,328 students who enrolled in at least one subject between the first period of 2017 and the first period of 2021, 8,350 had a permanence of eight semesters or more. As already mentioned, 4 years is the time needed to graduate for students who carry an adequate academic load and pass most subjects (a grade equal to or greater than 3 on a scale of 0 to 5). Among these students (4 years or more of permanence), the graduation rate is 55% (node 0 in Figure 7). Of them, 3,104 reported being classified in Sisbén levels 1 to 7, and their graduation rate is barely 22.1% (node 3), excluding 186 of them who dropped out of the university. By contrast, those who did not report any of these Sisbén levels have a graduation rate of 85.1% (node 5), excluding the 604 who dropped out of the university. This reinforces the importance of the economic factor in achieving academic goals and reveals the need to monitor the academic performance and social conditions of students who initially report Sisbén levels 1–7, since this classification is a strong indicator of graduation.
The Utadeo data also suggest that, among Sisbén-classified students, the most economically disadvantaged (those at the lowest Sisbén levels) are more likely to graduate owing to targeted support systems. These supports include scholarships, financial aid, and academic counseling. They help these students overcome barriers to education and raise their graduation rates despite socioeconomic challenges, which could indicate that such support is effective in aiding student retention and success.
Table 2 presents the correct classification rates of the decision tree for the complete sample of students. The model correctly classifies 65.5% of the non-graduates and 86.0% of the graduates. These rates could be increased by refining the time window, modifying the definition of academic performance, including the failure variable, or keeping a more precise record of student class attendance.
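Each percentage in Table 2 is the share of one observed group (non-graduates or graduates) that the tree assigns to that same group. In other words, it is a row-wise rate of the confusion matrix. The following is a minimal sketch that assumes the observed and predicted labels are available as two sequences of equal length. The function name is hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def correct_classification_rates(y_true: Sequence[Hashable],
                                 y_pred: Sequence[Hashable]) -> dict[Hashable, float]:
    """For each observed class, the share of its members the model assigns to it.

    These are the row-wise rates of a confusion matrix (recall per class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in totals.items()}


if __name__ == "__main__":
    observed = ["grad", "grad", "grad", "grad", "non", "non"]
    predicted = ["grad", "grad", "grad", "non", "non", "grad"]
    print(correct_classification_rates(observed, predicted))  # {'grad': 0.75, 'non': 0.5}
```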
Figure 8 shows that the dropout rate in this segment of the population over the period 2017–2021 is 9.3%. Among students with a low academic level (average of up to 3.46049), however, it reaches 25.1%, more than triple the dropout rate of students with a normal academic level (average above 3.46049).
Of the 17,328 students mentioned above, 5,799 have already graduated and 11,529 have not. Of the latter, 7,801 are still active and 3,728 dropped out of the university, that is, they went at least two academic periods without registering for courses (Table 3).
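The three-way split in Table 3 (graduated, active, dropout) follows from the dropout definition. The following is a minimal sketch that assumes each student's registration history is available as one flag per academic period. The text does not say whether the periods without registration must be consecutive, so the sketch reads "at least two academic periods without registering" as two or more consecutive periods at the end of the window. The function name and the `min_gap` parameter are hypothetical.

```python
from collections import Counter


def student_status(registered: list[bool], graduated: bool, min_gap: int = 2) -> str:
    """Label a student as 'graduated', 'dropout' or 'active'.

    registered[k] is True if the student registered courses in academic period k
    of the observation window (oldest first). Assumption: a non-graduate counts as
    a dropout when the last `min_gap` or more periods have no registration.
    """
    if graduated:
        return "graduated"
    trailing_gap = 0
    for flag in reversed(registered):  # count unregistered periods at the end
        if flag:
            break
        trailing_gap += 1
    return "dropout" if trailing_gap >= min_gap else "active"


if __name__ == "__main__":
    histories = [([True, True, True, True], True),
                 ([True, True, False, False], False),
                 ([True, False, True, False], False)]
    print(Counter(student_status(r, g) for r, g in histories))
```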
Figure 9 shows the general map of the tree. The tree has 14 nodes, with dropout as the response variable and graduation, gender, permanence, and academic level (low or normal) as independent variables.
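A decision tree of this kind is grown greedily. At each node it picks the variable whose split most reduces Gini impurity, then repeats the process in each child until a stopping rule applies. The sketch below illustrates that procedure on categorical predictors. It assumes that each student is a dict with hypothetical keys such as `male` or `low_level` and a binary `dropout` target. The stopping rules (`max_depth`, `min_size`) are illustrative and are not the settings used for Figure 9.

```python
def gini(labels: list[int]) -> float:
    """Gini impurity of binary labels (1 = dropout)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def split_gain(rows: list[dict], feature: str, target: str = "dropout") -> float:
    """Gini impurity reduction from splitting rows on every value of `feature`."""
    groups: dict = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    weighted = sum(len(g) / n * gini(g) for g in groups.values())
    return gini([row[target] for row in rows]) - weighted


def grow(rows: list[dict], features: list[str], depth: int = 0,
         max_depth: int = 3, min_size: int = 30) -> dict:
    """Recursively split on the feature with the largest Gini gain."""
    labels = [row["dropout"] for row in rows]
    node = {"n": len(rows), "dropout_rate": sum(labels) / len(labels)}
    if depth >= max_depth or len(rows) < min_size or gini(labels) == 0.0 or not features:
        return node  # leaf: depth limit, too few students, pure node or no features left
    feature = max(features, key=lambda f: split_gain(rows, f))
    if split_gain(rows, feature) <= 0.0:
        return node  # no split improves purity
    remaining = [f for f in features if f != feature]
    node["split"] = feature
    node["children"] = {
        value: grow([r for r in rows if r[feature] == value], remaining,
                    depth + 1, max_depth, min_size)
        for value in sorted({r[feature] for r in rows})
    }
    return node
```

A leaf whose impurity stays above zero contains both students who persisted and students who dropped out. Leaves of this kind limit the accuracy the tree can reach.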
Figure 10 shows the 17,328 students considered, of whom 5,799 graduated. Among the 11,529 non-graduates, the 3,728 students who abandoned their studies stand out (node 1). Dropout is defined here as non-graduates who stopped enrolling in courses for at least two academic periods.
Figure 11 follows the branch of non-graduate students with normal academic performance, that is, those whose average in the enrolled subjects is above 3.46049. Students who have been at the university for a considerable time (eight semesters or more, node 5) stand out. Of the 3,052 students in this node, 537 dropped out of the university (17.6%), and this rate rises slightly to 20.5% among men.
Figure 12 continues along the branch of non-graduate students with a low academic level (node 4), in which 1,134 students abandoned their studies. The key group here is students who have been at the university for eight semesters or more. Of these 705 students, 253 dropped out of the university (35.9%, node 7).
As a technical specification, Table 4 reports the correct classification rates of the decision tree on the complete sample of students. The model correctly classified 96.20% of the students who did not drop out of the university, but only 23.60% of those who did. These rates are limited by the uncertainty in nodes 9–14, none of which is pure: each contains both students who persisted and students who dropped out. For instance, node 9 groups women with eight or more semesters of permanence. This node shows a high probability of persistence (85.1%, the highest among nodes 9–14), yet 235 students in this segment still dropped out. This initial model serves to explore how dropout probabilities vary across population segments defined by gender, duration of study, or academic level. The preliminary analysis provides a foundation for ongoing improvements aimed at developing a more sophisticated model that could be integrated into the institutional early warning system in the future.
4.2 Survival analysis for the Utadeo population based on information from 2017 to 2021
4.2.1 Kaplan–Meier estimator for permanence at Utadeo
Knowing each student's length of permanence at Utadeo and whether the dropout event occurred makes it possible to apply the survival analysis described in the theoretical review. This analysis estimates the survival, density, and hazard (risk) functions of the dropout event. Three points were taken into account. (1) The Kaplan–Meier product-limit estimation method was used. (2) Censored observations correspond to graduated students, for whom the dropout event does not occur. (3) Survival distributions across the levels of a factor were compared with the log-rank test, which weights all observations equally and makes no assumption of normality about the distribution of permanence times. To highlight differences in the survival function over time, the graphs are shown on a logarithmic scale. Figure 13 shows the logarithm of the Kaplan–Meier estimate of the survival function from the first period of 2017 to the first period of 2021. It shows that the first 2 years of permanence at the university are critical for the occurrence of dropout and that permanence and dropout rates stabilize from the fifth to the tenth semester. In addition, students who remain for more than 10 semesters without obtaining their degree show a gradually increasing propensity to drop out. Management efforts on permanence and timely graduation should focus on these last two findings, and resources should be allocated according to these milestones.
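The Kaplan–Meier (product-limit) estimator multiplies together the conditional probabilities of persisting, taken over every semester t_i up to t in which dropouts occur: S(t) = ∏_{t_i ≤ t} (1 − d_i / n_i). Here d_i is the number of dropouts in semester t_i and n_i is the number of students still enrolled (at risk) at its start. A minimal from-scratch sketch follows. It assumes that permanence is measured in whole semesters and that graduates enter as censored observations, as described above. The function name is hypothetical. Taking the logarithm of each value gives a log-scale curve like the one in Figure 13.

```python
import math
from collections import Counter


def kaplan_meier(times: list[int], events: list[bool]) -> list[tuple[int, float]]:
    """Product-limit estimate of S(t) at each semester in which a dropout occurs.

    times[i]:  semesters of permanence of student i.
    events[i]: True if the student dropped out, False if censored (e.g. graduated).
    """
    dropouts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # students whose follow-up ends at each semester
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = dropouts.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove everyone whose follow-up ends at t; censored ones were still at risk at t.
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    semesters = [1, 2, 2, 3, 4, 4, 5]
    dropped = [True, True, False, True, False, True, False]
    for t, s in kaplan_meier(semesters, dropped):
        log_s = math.log(s) if s > 0 else float("-inf")  # log scale, as in Figure 13
        print(t, round(s, 4), round(log_s, 4))
```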
Among the factors that influence dropout, the academic component deserves particular attention. As shown in Figure 14, the probability of dropping out is much higher among students with lower academic averages. The difference is most pronounced in the first 2 years at university and among students who have stayed for 10 semesters or more.
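The per-semester risk discussed above can be read from the discrete-time hazard. This is the share of students still enrolled at the start of a semester who drop out during it, as in the discrete-time approach of Singer and Willett (1993). The following is a minimal sketch that assumes the same semester and event coding as a Kaplan–Meier analysis. Computing it separately for the low and normal academic groups allows the first four semesters to be compared with the tenth semester onward. The function name is hypothetical.

```python
def semester_hazard(times: list[int], events: list[bool],
                    max_semester: int) -> dict[int, float]:
    """h(t) = dropouts in semester t / students still enrolled at the start of t."""
    hazard = {}
    for t in range(1, max_semester + 1):
        at_risk = sum(1 for x in times if x >= t)
        dropouts = sum(1 for x, e in zip(times, events) if x == t and e)
        hazard[t] = dropouts / at_risk if at_risk else float("nan")
    return hazard


if __name__ == "__main__":
    low_level = semester_hazard([1, 2, 2, 3, 4, 4, 5],
                                [True, True, False, True, False, True, False],
                                max_semester=5)
    print(low_level)
```

With this coding, the Kaplan–Meier survival curve is the running product of (1 − h(t)) over the semesters up to t.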
The survival analysis revealed significant differences in dropout by academic performance, with lower-performing students showing much higher dropout rates. The Kaplan–Meier comparison of survival distributions showed that students with lower academic averages were significantly more likely to drop out between entry and the fourth semester. These findings suggest that academic performance is a crucial predictor of student retention and point to the need for early academic support and intervention strategies.
Further analysis with the log-rank test confirmed the strong influence of socioeconomic status on student dropout. Students classified in lower Sisbén levels (who do not receive support) showed a significantly higher risk of dropout, which underscores economic factors as substantial determinants of student retention.
This difference in survival distributions reinforces the idea that students with a Sisbén classification are more vulnerable, as shown in Figure 15. This agrees with the low graduation rates noted above for these students and with their higher probability of dropping out in every academic period of their stay at the university.
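The log-rank comparisons above, by academic level and by Sisbén classification, work semester by semester. At each semester with dropouts, the test compares the observed number of dropouts in one group with the number expected if both groups shared the same survival curve. The following is a minimal two-group sketch that assumes the same semester and event coding as above. The p-value comes from the chi-square distribution with one degree of freedom, computed through `math.erfc`. A factor with more than two levels would need the k-group version of the statistic. The function name is hypothetical.

```python
import math


def log_rank(times_a: list[int], events_a: list[bool],
             times_b: list[int], events_b: list[bool]) -> tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected dropouts in group A
    variance = 0.0
    for t in event_times:
        n = sum(1 for x, _, _ in data if x >= t)                     # at risk, both groups
        n_a = sum(1 for x, _, g in data if x >= t and g == 0)        # at risk, group A
        d = sum(1 for x, e, _ in data if x == t and e)               # dropouts, both groups
        d_a = sum(1 for x, e, g in data if x == t and e and g == 0)  # dropouts, group A
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)


if __name__ == "__main__":
    early = ([1, 1, 2, 2, 3], [True] * 5)  # group that drops out early
    late = ([5, 6, 7, 8, 9], [True] * 5)   # group that drops out late
    print(log_rank(*early, *late))
```

Because every observation carries the same weight, the test is most sensitive to differences that persist across the whole follow-up.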
5 Discussion and conclusion
Student dropout is a dynamically complex phenomenon whose governmental and institutional conceptualization, among other aspects, requires continuous updating. This updating is achieved by examining theories from different perspectives, including studies and analyses conducted with robust computational and data processing tools. We conclude that such continuous updating will strengthen the studies that universities regularly conduct to monitor explanatory variables and their changes. Consequently, HEIs will be able to identify at-risk students and offer them alternatives that improve their chances of obtaining their degrees in a timely manner.
It is crucial for HEIs to maintain an updated model of student dropout that links several approaches. These include the approach arising from public policies, the one based on the characterization of students, and the one based on statistical analyses of historical and recent cohort data. The last approach also draws on the mathematical and computational tools used to analyze student dropout and deconstruct it analytically.
The answer to the research question of how to analyze and effectively mitigate student dropout at the institutional level is that combining various techniques and models facilitates a deeper understanding of retention, dropout, and timely graduation. This approach enables the optimization of resources, provides support to the students who need it most, and ensures the sustainability and stability of HEIs in fulfilling their mission-driven activities.
The analyzed sample confirmed the importance of economic and academic factors for graduation rates. The first 2 years at the institution were critical, as they increased the risk of dropout and contributed to low graduation rates. Interestingly, extended periods at the institution (more than 10 semesters) also increased the likelihood of dropout. The study's findings underscore the significance of socioeconomic status and academic performance as determinants of student dropout. This is in line with the established theories of Tinto (1975) and Bean and Metzner (1985), which emphasize the role of integration and educational experiences. Despite the use of advanced methodologies such as decision trees and survival analysis, the results do not diverge significantly from these classical models. This alignment suggests that the classical models remain robust in explaining the dropout phenomenon, even when applied to modern, large-scale datasets.
The proposed research methodology achieved the objective of analyzing student dropout in higher education through multiple instruments and resources. In doing so, it provides a basis for mitigating dropout effectively at the institutional level.
In addition to the primary socioeconomic indicators, the decision tree model identified several other relevant variables that significantly influenced student retention and dropout rates at Utadeo. These included academic performance, age at entry, disability status, and ethnicity. Academic performance emerged as a critical factor: the model showed that students with lower academic scores were more likely to drop out. This is in line with Tinto's model, which posits that academic integration and performance are crucial for student retention.
The analysis reveals a nuanced pattern in student permanence relative to Sisbén levels. Students at the lowest Sisbén levels show better retention rates, largely because of the financial assistance they receive, which alleviates the economic pressures of continuing their education. At the other end of the scale, students at the highest Sisbén levels also show strong retention, likely because their own financial stability buffers them against the economic challenges that often precipitate dropout. Interestingly, the risk of dropping out is concentrated among students at intermediate Sisbén levels. These students neither qualify for sufficient financial assistance nor have adequate financial resources of their own. This pattern suggests a potential gap in the current public policy framework regarding financial support. To address this disparity and reduce dropout rates effectively, policymakers should reconsider and possibly expand the eligibility criteria for financial support. Such adjustments would ensure that students across a broader range of the socioeconomic scale receive the support they need to continue their education, thereby enhancing overall student retention and success.
Students with disabilities and students from ethnic minority groups are few in the sample. Because these populations are so small, modelling techniques such as decision trees struggle to explain their dropout patterns adequately: including the related variables does not change the gain in the Gini index. Nevertheless, monitoring the dropout patterns of these vulnerable populations is critically important. This remark resonates with the inclusive education framework, which advocates tailored educational strategies that accommodate diverse learning needs and backgrounds and ensure equitable opportunities for success. The impact of these variables underscores the multifaceted nature of student retention. It echoes the theoretical perspective that successful educational outcomes often result from the interplay between individual characteristics, institutional conditions, and broader socioeconomic contexts. This comprehensive approach highlights the complexity of predicting educational outcomes and the need to incorporate a wide range of factors into retention models to effectively support all students.
This study faces five primary limitations. First, its reliance on quantitative data may overlook nuanced personal experiences and institutional conditions that also affect dropout rates, potentially skewing the analysis. Second, the decision tree model, though effective for handling large datasets, might oversimplify the complex interactions between variables by imposing a hierarchical structure. This structure could align too closely with traditional models and obscure emerging trends or deeper insights into student behavior. Third, the scope of the database, which spans 2017 to 2021 and is defined by institutional activities, represents a temporal limitation. Fourth, critical variables such as employment status, parenthood, and other special characteristics of students were absent from the institution's databases, further constraining the comprehensiveness of our analysis. Fifth, the particular characteristics of the HEI may limit the generalizability of the results.
Continuing to combine diverse approaches to the phenomenon, future research should explore mixed-methods designs that incorporate qualitative data to capture a fuller spectrum of dropout influences. For example, interviews or focus groups could reveal more about the subjective experiences of students at risk of dropping out. Applying these advanced methodologies in other educational contexts, such as virtual learning environments or non-traditional student populations, may uncover new variables or interaction effects that are not evident in traditional settings. Such studies could help refine existing models or develop new theoretical frameworks for understanding student dropout.
Data availability statement
The data analyzed in this study are subject to specific licenses and restrictions. The datasets are owned by the university. All requests for access to these datasets should be submitted to secretaria.general@utadeo.edu.co.
Author contributions
SB: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. LG: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.
Acknowledgments
Artificial Intelligence tools were used exclusively for proofreading, grammar, and style improvements, as well as for checking the accuracy and consistency of the bibliographic references.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Barragán, S., Calderón, G., González, L., Rodríguez, R., and Ruiz, J. (2015). “Referentes Conceptuales para la Retención Estudiantil en la Universidad de Bogotá Jorge Tadeo Lozano” in La Universidad de Bogotá Jorge Tadeo Lozano en el Camino de la Retención Estudiantil. ed. S. Barragán (Bogotá: Universidad de Bogotá Jorge Tadeo Lozano), 21–39.
Barragán, S., González, L., and Calderón, G. (2022). Modelling student dropout risk using survival analysis and analytic hierarchy process for an undergraduate accounting program. Interchange 53, 407–427. doi: 10.1007/s10780-022-09463-7
Barragán, S., and Lozano, Ó. (2021). Explanatory variables of dropout in Colombian public education: evolution limited to coronavirus disease. Eur. J. Educ. Res. 11, 287–304. doi: 10.12973/eu-jer.11.1.287
Bean, J., and Eaton, S. (2001). The psychology underlying successful retention practices. J. Coll. Stud. Retent. 3, 73–89. doi: 10.2190/6R55-4B30-28XG-L8U0
Bean, J. P., and Metzner, B. S. (1985). A conceptual model of nontraditional undergraduate student attrition. Rev. Educ. Res. 55, 485–540. doi: 10.2307/1170245
Breslow, N. (1970). A generalized Kruskal-Wallis test for comparing K samples subject to unequal patterns of censorship. Biometrika 57, 579–594. doi: 10.1093/biomet/57.3.579
Bronfenbrenner, U. (1996). A abordagem sistêmica de Bronfenbrenner: modelo bioecológico. In Ecologia do Desenvolvimento Humano. Available at: https://www.passeidireto.com/arquivo/53404439/ecologia-do-desenvolvimento-humano (Accessed October 15, 2024).
Casanova, J., Gomes, C., Bernardo, A., Núñez, J., and Almeida, L. (2021). Dimensionality and reliability of a screening instrument for students at-risk of dropping out from higher education. Stud. Educ. Eval. 68:100957. doi: 10.1016/j.stueduc.2020.100957
Castaño, E., Gallón, S., Gómez, K., and Vásquez, J. (2004). Deserción Estudiantil Universitaria: Una Aplicación de Modelos de Duración. Lect. Econ. 60, 39–65. doi: 10.17533/udea.le.n60a2707
Consejo Nacional de Educación Superior (2014). Acuerdo por lo Superior 2034. Propuesta de Política Pública para la Excelencia de la Educación Superior en Colombia en el Escenario de la Paz. Ministerio de Educación Nacional. Available at: https://www.mineducacion.gov.co/1621/w3-article-344500.html (Accessed October 15, 2024).
Cosenz, F. (2014). A dynamic viewpoint to design performance management systems in academic institutions: theory and practice. Int. J. Public Adm. 37, 955–969. doi: 10.1080/01900692.2014.952824
Cosenz, F. (2022). Managing sustainable performance and governance in higher education institutions: a dynamic performance management approach. Switzerland: Springer.
Cox, D. R. (1972). Regression models and life tables. J. R. Stat. Soc. Ser. B 34, 187–202. doi: 10.1111/j.2517-6161.1972.tb00899.x
Donoso, S., and Schiefelbein, E. (2007). Análisis de los Modelos Explicativos de Retención de Estudiantes en la Universidad: Una Visión desde la Desigualdad Social. Estud. Pedag. 33, 7–27. doi: 10.4067/S0718-07052007000100001
Escobar, M. (2013). Lineamientos para Solicitud, Otorgamiento y Renovación de Registro Calificado. Programas de Pregrado y Posgrado. Colombia: SECAB- Publicaciones.
Gallegos, J., Campos, N., Canales, K., and González, E. (2018). Factores Determinantes en la Deserción Universitaria. Caso Facultad de Ciencias Económicas y Administrativas de la Universidad Católica de la Santísima Concepción (Chile). Form. Univ. 11, 11–18. doi: 10.4067/S0718-50062018000300011
Gómez-León, M. (2022). Giftedness from the perspective of neuroimaging and differential pedagogy. Are we talking about the same thing? Rev. Españ. Pedag. 80, 451–474.
Grupo Análisis. Proyecto ALFA GUIA DCI-ALA/2010/94 (2012). Hacia la Construcción Colectiva de un Marco Conceptual para Analizar, Predecir, Evaluar y Atender el Abandono Estudiantil en la Educación Superior Síntesis. Medellín: Alfa Guía. Available at: https://redguia.net/images/documentacion/marco-conceptual/S%C3%ADntesis_del_Marco_Conceptual.pdf (Accessed October 15, 2024).
Guzmán, A., Barragán, S., and Cala, F. (2021). Dropout in rural higher education: a systematic review. Front. Educ. 6:727833. doi: 10.3389/feduc.2021.727833
Hadjar, A., Haas, C., and Gewinner, I. (2022). Refining the Spady–Tinto approach: the roles of individual characteristics and institutional support in students’ higher education dropout intentions in Luxembourg. Eur. J. Higher Educ. 13, 409–428. doi: 10.1080/21568235.2022.2056494
Hernández, H., Osorio, J., and Gálvez, E. (2020). “La Deserción Escolar, un Abordaje desde el Enfoque de la Ecología del Desarrollo Humano de Bronfenbrenner” in Tendencias en la Investigación Universitaria. Una Visión desde Latinoamérica. eds. Y. Chirinos, A. Ramírez, R. Godinez, N. Barbera, and D. Rojas (Venezuela: Fondo Editorial Universitario Servando Garcés de la Universidad Politécnica), 629–645.
Hernández, J., Ramírez, J. M., and Ferri, C. (2004). Introducción a la Minería de Datos. Spain: Pearson Educación S.A.
Lema, M., Vooren, M., Cannistrà, M., Klaveren, C., Agasisti, T., and Cornelisz, I. (2023). Predicting dropout in higher education across borders. Stud. High. Educ. 49, 141–156. doi: 10.1080/03075079.2023.2224818
Martinez-Daza, M. A., Guzmán Rincón, A., Castaño Rico, J. A., Segovia-García, N., and Montilla Buitrago, H. Y. (2021). Multivariate analysis of attitudes, knowledge and use of ICT in students involved in virtual research seedbeds. Eur. J. Invest. Health Psychol. Educ. 11, 33–49. doi: 10.3390/ejihpe11010004
Martins, M. V., Baptista, L., Machado, J., and Realinho, V. (2023). Multi-class phased prediction of academic performance and dropout in higher education. Appl. Sci. 13:4702. doi: 10.3390/app13084702
McCubbin, I. (2003). An examination of criticisms made of Tinto's 1975 student integration model of attrition. Available at: https://www.psy.gla.ac.uk/~Esteve/localed/icubb.pdf (Accessed October 15, 2024).
Ministerio de Educación Nacional (2009). Deserción Estudiantil en la Educación Superior Colombiana. Metodología de Seguimiento, Diagnóstico y Elementos para su Prevención. Colombia: Imprenta Nacional de Colombia.
Ministerio de Educación Nacional (2015). Guía para la Implementación del Modelo de Gestión de Permanencia y Graduación Estudiantil en Instituciones de Educación Superior. Available at: http://www.mineducacion.gov.co/1759/articles-356272_recurso.pdf (Accessed October 15, 2024).
Ministerio de Educación Nacional (2018). Referentes de Calidad: Una Propuesta para la Evolución del Sistema de Aseguramiento de la Calidad. Colombia: Ministerio de Educación Nacional.
Ministerio de Educación Nacional (2019). Decreto 1330: ‘Por el Cual se Sustituye el Capítulo 2 y se Suprime el Capítulo 7 del Título 3 de la Parte’. Available at: https://www.mineducacion.gov.co/1759/articles-387348_archivo_pdf.pdf (Accessed October 15, 2024).
Montoya-Restrepo, I. A., Sánchez-Torres, J. A., Rojas-Berrio, S. P., and Montoya-Restrepo, A. (2020). Lovemark effect: analysis of the differences between students and graduates in a love brand study at a public university. Innovar 30, 43–56. doi: 10.15446/innovar.v30n75.83256
Mostert, K., van Rensburg, C., and Machaba, R. (2023). Intention to dropout and study satisfaction: testing item bias and structural invariance of measures for South African first-year university students. J. Appl. Res. High. Educ. 16, 677–692. doi: 10.1108/JARHE-04-2022-0126
Palomino, J., and Ortega, A. (2023). Dropout intentions in higher education: systematic literature review. J. Effic. Respons. Educ. Sci. 16, 149–158. doi: 10.7160/eriesj.2023.160206
Pineda-Báez, C. (2021). Conceptualizations of teacher-leadership in Colombia: evidence from policies. Res. Educ. Admin. Leadersh. 6, 92–125. doi: 10.30828/real/2021.1.4
Proyecto ALFA GUIA DCI-ALA/2010/94 (2013). Marco Conceptual sobre el Abandono. Síntesis del Marco Conceptual. Available at: https://redguia.net/images/documentacion/marco-conceptual/S%C3%ADntesis_del_Marco_Conceptual.pdf (Accessed October 15, 2024).
Ramírez, D. M., Gartner, M. L., Bernal, J. E., Zapata, Á., Vallejo, F. A., Prieto, P. A., et al. (2013). Lineamientos para la Acreditación de Programas de Pregrado. Consejo Nacional de Acreditación. Available at: https://www.mineducacion.gov.co/1621/articles-342684_recurso_1.pdf (Accessed October 15, 2024).
Rebasa, P. (2005). Conceptos Básicos del Análisis de Supervivencia. Cir. Esp. 78, 222–230. doi: 10.1016/S0009-739X(05)70923-4
Red GUÍA (2020). Gestión Universitaria Integral del Abandono. Documentación ALFA-GUIA. Available at: https://redguia.net/index.php/es/archivo/documentacion (Accessed October 15, 2024).
Rodríguez, M., and Zamora, J. (2014). Análisis de la Deserción en la Universidad Nacional desde una Perspectiva Longitudinal. Costa Rica: Universidad Nacional de Costa Rica.
Roslan, N. N., Jamil, N. J. M., Shaharanee, N. I. N. M., and Alawi, N. S. J. S. (2024). Prediction of student dropout in Malaysian’s private higher education institute using data mining application. J. Adv. Res. Appl. Sci. Eng. Technol. 45, 168–176. doi: 10.37934/araset.45.2.168176
Schmitt, R., and Santos, B. (2013). Modelo Ecológico del Abandono Estudiantil en la Educación Superior: Una Propuesta Metodológica Orientada a la Construcción de una Tesis. Congresos CLABES. Available at: https://revistas.utp.ac.pa/index.php/clabes/article/view/890 (Accessed October 15, 2024).
Seminara, M., and Aparicio, M. (2018). Deserción Universitaria ¿Un Concepto Equívoco? Revisión de Estudios Latinoamericanos sobre Conceptos Alternativos. Rev. Orient. Educ. 32, 44–72.
Singer, J., and Willett, J. (1993). Using a discrete time survival analysis to study duration and the timing of events. J. Educ. Stat. 18, 155–195. doi: 10.3102/10769986018002155
Sistema de Identificación de Potenciales Beneficiarios de Programas Sociales (2020). Available at: https://www.sisben.gov.co/paginas/que-es-sisben.html (Accessed October 15, 2024).
Swail, W., Reed, K., and Perna, L. (2003). Retaining minority students in higher education. USA: Ashe Eric.
Tan, P.-N., Steinbach, M., and Kumar, V. (2006). Introduction to data mining. USA: Pearson Education, Inc.
Tete, M., Sousa, M., Santana, T., and Fellipe, S. (2022). Predictive models for higher education dropout: a systematic literature review. Educ. Policy Anal. Arch. 30, 1–23. doi: 10.14507/epaa.30.6845
Tinto, V. (1975). Dropouts from higher education: a theoretical synthesis of recent research. Rev. Educ. Res. 45, 89–125. doi: 10.3102/00346543045001089
Keywords: student dropout, higher education, statistical models, educational policy, conceptualization
Citation: Barragán Moreno SP and González Támara L (2024) Complexities of student dropout in higher education: a multidimensional analysis. Front. Educ. 9:1461650. doi: 10.3389/feduc.2024.1461650
Edited by:
George Giannopoulos, University College London, United Kingdom
Reviewed by:
Filipa Seabra, Universidade Aberta, Portugal
Leandro S. Almeida, University of Minho, Portugal
Copyright © 2024 Barragán Moreno and González Támara. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Sandra Patricia Barragán Moreno, Sandra.barragan@utadeo.edu.co
†ORCID: Sandra Patricia Barragán Moreno, orcid.org/0000-0001-6503-4445
Leandro González Támara, orcid.org/0000-0002-9870-2312
†These authors have contributed equally to this work and share first authorship