A methodological framework proposal for managing risk in small-scale farming through the integration of knowledge and data analytics

Casanova Olaya, Juan Fernando; Corrales, Juan Carlos

doi:10.3389/fsufs.2024.1363744

ORIGINAL RESEARCH article

Front. Sustain. Food Syst., 22 July 2024

Sec. Agricultural and Food Economics

Volume 8 - 2024 | https://doi.org/10.3389/fsufs.2024.1363744

This article is part of the Research Topic: Transforming Food Systems in Latin America and the Caribbean: Increasing Sustainability, Resilience and Adaptation to Climate Change

Juan Fernando Casanova Olaya¹,²,³*

Juan Carlos Corrales²

¹Ecotecma S.A.S, Popayán, Colombia
²Universidad del Cauca, Facultad de Ingeniería Electrónica y Telecomunicaciones, Popayán, Colombia
³Centro de Desarrollo Tecnológico Cluster Creatic, Popayán, Colombia

Introduction: Climate change and weather variability pose significant challenges to small-scale crop production systems, increasing the frequency and intensity of extreme weather events. In this context, data modeling becomes a crucial tool for risk management and promotes producer resilience during losses caused by adverse weather events, particularly within agricultural insurance. However, data modeling requires access to available data representing production system conditions and external risk factors. One of the main problems in the agricultural sector, especially in small-scale farming, is data scarcity, which acts as a barrier to effectively addressing these issues. Data scarcity limits understanding the local-level impacts of climate change and the design of adaptation or mitigation strategies to manage adverse events, directly impacting production system productivity. Integrating knowledge into data modeling is a proposed strategy to address the issue of data scarcity. However, despite different mechanisms for knowledge representation, a methodological framework to integrate knowledge into data modeling is lacking.

Methods: This paper proposes a methodological framework (MF) to guide the characterization, extraction, representation, and integration of knowledge into data modeling, supporting the application of data solutions for small farmers. The development of the MF encompasses three phases. The first phase involves identifying the information underlying the MF. To achieve this, elements such as the type of knowledge managed in agriculture, data structure types, knowledge extraction methods, and knowledge representation methods were identified using the systematic review framework proposed by Kitchenham, considering their limitations and the tools employed. In the second phase, the gathered information was used to design the process model of the MF using the Business Process Model and Notation (BPMN). Finally, in the third phase, the MF was evaluated using the expert weighting method.

Results: As a result, it was possible to theoretically verify that the proposed MF facilitates the integration of knowledge into data models. The MF serves as a foundation for establishing adaptation and mitigation strategies against adverse events stemming from climate variability and change in small-scale production systems, especially under conditions of data scarcity.

Discussion: The developed MF provides a structured approach to managing data scarcity in small-scale farming by effectively integrating knowledge into data modeling processes. This integration enhances the capacity to design and implement robust adaptation and mitigation strategies, thereby improving the resilience and productivity of small-scale crop production systems in the face of climate variability and change. Future research could focus on the practical application of this MF and its impact on small-scale farming practices, further validating its effectiveness and scalability.

1 Introduction

The development of agricultural insurance requires access to comprehensive data that accurately represents the conditions within productive systems and accounts for external risk factors. Currently, insurers employ techniques based on statistical and actuarial concepts to assess the conditions of the granted insurance and fulfill their acquired commitments. In this process, deficiencies in the mechanisms for establishing insurance determinants are evident, stemming from a lack of understanding of the risk factors associated with agricultural activity and the vulnerability conditions of producers (Carter et al., 2017). Additionally, the non-stationary spatiotemporal structure of the data used for risk assessment introduces high complexity when the relationship between events and crop yield is non-linear. Therefore, traditional statistical methods or other models may not be appropriate (Ghahari et al., 2019). Consequently, it is important to propose alternatives that support the design of agricultural insurance, considering data accessibility and availability in the agricultural domain.

At the farm level, crop yield data are either scarce or unavailable, which impedes the estimation of individual losses: the high scarcity and low credibility of local-scale data lead to poor representativeness and selection bias. Data scarcity can arise from the phenology of the assessed crops, as some, mainly perennial crops, have an extended development period, making it challenging to obtain a historical data series. Additionally, in some productive systems, crop intercropping or rotation occurs, resulting in inconsistencies in data recording (Porth et al., 2019). Meanwhile, low credibility can be attributed to the fact that past data may not be representative of the current state of the productive system, owing to changes in management practices such as the use of technologies, application of agricultural inputs, and production arrangement, among others (Porth et al., 2014, 2019). As a result, insurance is designed from regional or municipal data rather than local or farm-scale data, which introduces an aggregation bias. This bias may increase idiosyncratic risk by underestimating or overestimating the anticipated risk compared to the actual individual risk (Finger, 2012; Lyubchich et al., 2019), a situation known as basis risk, one of the primary challenges associated with the design of agricultural insurance.

Basis risk reduces producers’ willingness to pay for agricultural insurance, owing to a lack of confidence in how policy payments are determined. In this regard, studies (Berg et al., 2009; Ramasubramanian, 2012; Thompson, 2017) evaluated the payment capability of producers, finding that they encounter issues with the insurance design, given that payment is made based on an index constructed with data at the municipal or regional scale. Additionally, producers have difficulty comprehending the mechanisms for determining insurance policy payments. Therefore, it is pertinent to evaluate analytical methods that strengthen the relationship between the indices determining policy payments and individual losses, and that increase transparency and trust in the methods used to construct those indices, in order to improve insurance uptake among producers.

Techniques based on machine learning, statistics, mechanistic or empirical models, or the integration of expert knowledge have been proposed to address the issue of basis risk. Applied independently, each of these techniques presents drawbacks. Mechanistic or empirical models have a high capacity to represent the complex processes of the agricultural system; however, their conception requires a high degree of knowledge of the system’s processes, and their application necessitates specific input data for validation within new scenarios (Tartarini et al., 2021). Statistical techniques have limitations when analyzing highly heterogeneous data with different structures, frequencies, and scales (Ghahari et al., 2019). Machine learning techniques, in turn, are constrained or yield inadequate results when insufficient data is available for training and validating the developed models, or when their development or outcome lacks a rational explanation within the framework of natural laws or human regulation (Von Rueden et al., 2019; Roscher et al., 2020). Based on the preceding, there is a need for mechanisms that mitigate the disadvantages of applying each technique individually while leveraging the advantages each offers. Accordingly, this paper proposes a methodological framework (MF) to facilitate the characterization, extraction, representation, and integration of knowledge into data modeling. This framework serves as a tool to support agricultural insurance design, particularly under data scarcity scenarios. The paper is structured around three phases. The first phase entails identifying the foundational information of the MF, employing the systematic review framework proposed by Kitchenham (Kitchenham et al., 2009). The second phase involves using the gathered information to design the process model of the MF using the Business Process Model and Notation (BPMN; Chinosi and Trombetta, 2012). The third phase evaluates the MF using the expert weighting method.

2 Materials and methods

A Methodological Framework (MF) provides the structure, elements, rules, and methods required to implement a particular process or a series of processes (McMeekin et al., 2020). Constructing an MF necessitates identifying the data and information that underpin its development. In this regard, McMeekin et al. (2020) consolidate three phases from a literature review on MF development. The first phase corresponds to identifying evidence to inform the MF. In the first instance, existing MFs are identified, which serve as the foundation for constructing the new framework. In the second instance, unused data and information that aid in contextualizing the MF are identified. The second phase involves the development of the MF; in this phase, elements, processes, and techniques found in the recognized frameworks are adapted, combined, or complemented to structure the new framework.

Additionally, critical data identified in the second instance of phase one are extracted. The extracted information must be analyzed, synthesized, grouped, or merged into categories that will support the new MF, following an iterative approach until consensus is reached with experts, which will serve as a basis for refining the proposed framework. Finally, the third phase corresponds to the process of evaluating the MF.

In this regard, a macro-process is proposed for constructing an MF to support the implementation of agricultural insurance under a data scarcity scenario within the informed data analytics framework (Figure 1). The MF will consider the integration of different methodologies, which will be adapted within the guidelines proposed by McMeekin et al. (2020). The schematization of diagrams follows the procedures offered by the American National Standards Institute—ANSI (Zabinski, 2021).

Figure 1. Phases and macro processes for the development of the methodological framework.

In McMeekin et al. (2020), three phases are established. The first corresponds to the identification of evidence to inform the MF (MFI), the second corresponds to the development of the MF (MFD), and the third corresponds to the evaluation and refinement of the MF (MFV). In MFI, one macro-process is considered: identifying new information that supports the development of the new MF (P1). In MFD, one macro-process is established, focused on the iterative development of the MF (P2). Finally, in MFV, one macro-process is oriented toward evaluating and refining the MF (P3).

2.1 Phase 1. Identification of new data to support the MF

To develop the macro-process (Figure 2), we consider the six steps for conducting a systematic review established in the methodology proposed by Kitchenham (2004). The steps are planning (SR-0), research identification (SR-1), primary study selection (SR-2), study quality assessment (SR-3), data extraction from the primary studies (SR-4), and synthesis of the results found in the primary studies (SR-5). In SR-0, research questions and protocol design are established. In SR-1, the search strategy for the systematic review is generated, publication bias is identified, the bibliography management process is determined, and the search documentation mechanism is established. In SR-2, inclusion and exclusion criteria are set for study selection. In SR-3, quality thresholds are defined, and instruments for their assessment are designed. In SR-4, relevant information is extracted from the primary studies, using the formats established in the review planning. Finally, in SR-5, a synthesis of the results found in the primary studies is carried out for a case study. The extracted information is tabulated in a way that consistently answers the research questions posed in the previous stages.

Figure 2. Defined processes for the macro-process.

In line with the review objectives, the review plan is presented below.

2.1.1 PICOC

This study employs the PICOC framework (García-Peñalvo, 2022), with the population defined as the agriculture and knowledge domain. The review is specifically directed toward identifying the elements utilized in knowledge management within the agricultural sector. Furthermore, the primary emphasis lies in identifying techniques, methods, and tools employed for extracting and representing knowledge. The “Comparison” component has not been considered, as there is no requirement for a specific comparison of the results obtained by applying identified methods or techniques.

• Population: Knowledge, agriculture

• Intervention: Methods or techniques for knowledge management

• Outcome: Describe methods or procedures for knowledge management in the field of agriculture

• Context: Systematic Review of methods or techniques for knowledge management in the field of agriculture

2.1.2 Research questions

Four research questions have been formulated, which are related to identifying the type of knowledge and data structure managed in knowledge management processes and identifying methods or techniques for knowledge extraction and representation in agriculture. Additionally, the identification of the most used tools for knowledge representation and the limitations of each recognized knowledge representation method have been addressed.

R1. What methods or techniques have been used to extract the different types of knowledge in agriculture?

R2. What methods or techniques have been used to represent knowledge in agriculture?

R3. What are the most commonly used techniques for knowledge extraction and representation?

R4. What are the main limitations posed by knowledge representation methods?

2.1.3 Keywords and synonyms

Following the procedural steps, keywords were chosen for the proposed research questions. These keywords will be instrumental in formulating search equations within bibliographic sources. The selected keywords encompass all types of activities undertaken in a knowledge management process.

2.1.4 Search string

An exploratory search equation was formulated, combining the critical term “agriculture” with the terms associated with knowledge management processes. The equation was devised to address the review’s research questions. The search scope did not concentrate on the agricultural insurance domain, as a preliminary review indicated insufficient data retrieval to inform the Methodological Framework (MF). The resulting search string, restricted to publications from 2013 onward, was:

(agricultur*) AND (“knowledge elicitation” OR “knowledge harvesting” OR “expertise extraction” OR “expertise elicitation” OR “knowledge discovery” OR “knowledge extraction” OR “knowledge acquisition” OR “knowledge gathering” OR “knowledge revelation” OR “knowledge representation” OR “knowledge integration”)
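As an illustration, the search equation above can be assembled programmatically from its keyword list, which helps keep the same string consistent across the bibliographic sources. The following is a minimal sketch; the function name `build_search_string` and the constants are hypothetical, and the publication-year restriction is assumed to be applied through each database's own filter rather than inside the string.

```python
# Minimal sketch: assemble the boolean search string from its keywords.
# Names are illustrative; the year filter (2013 onward) is assumed to be
# set separately in each database's interface.
DOMAIN_TERM = "agricultur*"
KNOWLEDGE_TERMS = [
    "knowledge elicitation", "knowledge harvesting", "expertise extraction",
    "expertise elicitation", "knowledge discovery", "knowledge extraction",
    "knowledge acquisition", "knowledge gathering", "knowledge revelation",
    "knowledge representation", "knowledge integration",
]


def build_search_string(domain_term, terms):
    """Combine the domain term with the OR-joined, quoted knowledge terms."""
    quoted = " OR ".join(f'"{term}"' for term in terms)
    return f"({domain_term}) AND ({quoted})"


query = build_search_string(DOMAIN_TERM, KNOWLEDGE_TERMS)
print(query)
```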

2.1.5 Sources

The bibliographic sources IEEE, Scopus, and Web of Science were selected for their reputation and extensive coverage of scientific articles. The IEEE source is pivotal as it focuses explicitly on papers related to the engineering and data analytics component, providing a solid foundation for research in this field. Scopus and Web of Science, in turn, span all knowledge areas, ensuring a comprehensive and multidisciplinary view of research. This breadth is crucial for contextualizing and enriching the work, enabling the identification of interdisciplinary connections and emerging trends that may significantly contribute to the study at hand.

• IEEE¹

• Scopus²

• WoS³

2.1.6 Selection criteria

Regarding the selection criteria, consideration is given to studies that introduce new methods or replicate existing methods for knowledge extraction and representation. Additionally, systematic reviews of the proposed topics are included, as they can provide comparative analyses or facilitate the identification of studies not captured by the formulated search equation. As for exclusion criteria, articles inaccessible through the available databases are excluded, as some databases may have partial accessibility. Studies lacking descriptions of knowledge extraction or representation methods, those outside the domain of agriculture, and those lacking a clearly defined methodological and formal process are also excluded, as they lack a scientific foundation conducive to replication.

Inclusion Criteria:

• We select articles presenting novel methods or techniques for knowledge extraction or replicating existing ones.

• We choose research with new methods or techniques for knowledge representation or replicating existing ones.

• We pick papers incorporating a review as part of the research or where the review is the main objective.

• Finally, we retain only the most current version of an article when it is duplicated across multiple sources.

We exclude papers:

• That are not accessible in the available databases.

• That fall outside the field of agriculture.

• That do not describe the required methods or techniques.

• That belong to the informal literature and do not have a clearly defined research process.

2.1.7 Quality assessment checklist

In the quality evaluation process, criteria are considered to ensure that articles contain the necessary elements for the data extraction process. In this regard, articles that describe the methods or techniques for knowledge management (characterization, extraction, representation, and integration) are selected. These methods should not be solely based on expert opinions but should also offer sufficient information about the methodological process for obtaining the proposed results. The selected articles should also demonstrate that the methods or techniques used have been replicated in other studies or subjected to a rigorous evaluation. Furthermore, studies should acknowledge the limitations of the evaluated methods or approaches.

The established criteria are evaluated on a categorical scale, indicating whether an article fully, partially, or does not meet each criterion. Articles scoring equal to or above 4.0 are then chosen and proceed to the data extraction stage.

Questions:

• Is there a description of the methods or techniques for knowledge management?

• Are the results based on research rather than expert opinions?

• Do the articles provide sufficient information about the methodology and data used to develop or adapt the methods?

• Are the knowledge management methods presented in a practical case?

• Do the articles clearly state the limitations of the evaluated methods?

Answers:

• Yes

• Partially

• No
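The checklist above can be turned into a numeric score to apply the 4.0 threshold. The text does not state how the categorical answers map to numbers, so the sketch below assumes the common convention Yes = 1, Partially = 0.5, No = 0, which gives a maximum of 5.0 over the five questions. The mapping, the function names, and the example articles are all illustrative assumptions.

```python
# Minimal sketch of the quality assessment scoring.
# ASSUMPTION: Yes = 1, Partially = 0.5, No = 0 (not stated in the text).
ANSWER_SCORES = {"Yes": 1.0, "Partially": 0.5, "No": 0.0}
QUALITY_THRESHOLD = 4.0  # from the text: scores >= 4.0 proceed to extraction
NUM_QUESTIONS = 5        # the checklist has five questions


def quality_score(answers):
    """Sum the numeric value of the answers given to the five questions."""
    if len(answers) != NUM_QUESTIONS:
        raise ValueError("expected one answer per checklist question")
    return sum(ANSWER_SCORES[answer] for answer in answers)


def passes_quality(answers):
    """True when the article reaches the threshold and moves to extraction."""
    return quality_score(answers) >= QUALITY_THRESHOLD


# Hypothetical articles, for illustration only.
article_a = ["Yes", "Yes", "Partially", "Yes", "Partially"]
article_b = ["Yes", "Partially", "Partially", "No", "Yes"]

print(quality_score(article_a), passes_quality(article_a))
print(quality_score(article_b), passes_quality(article_b))
```

Under this assumed mapping, an article can miss at most one full criterion, or partially meet two, and still proceed to data extraction.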

2.1.8 Data extraction form

Finally, to address the guiding questions of the review, the extraction of general information from the articles is considered to characterize the studies, such as the publication year and the specific application area within agriculture. Regarding the detailed required data, the type of data used in the analysis is considered to identify the handling of structured, unstructured, or semi-structured data. The kind of knowledge managed (explicit or implicit), the methods or techniques for knowledge extraction and representation, the tools (languages, software) used to apply methods, and the limitations identified in their application are also considered.

• Year

• Specific area of application

• Type of data used

• Type of knowledge

• Extraction method or technique

• Representation method or technique

• Tools, languages, software

• Limitations

2.2 Phase 2. Development of the MF

Considering the information extracted, the MF is constructed following the Business Process Model and Notation (BPMN). The Bizagi software (Bizagi, 2020) is employed to achieve this.

2.3 Phase 3. Assessment and refinement of the MF

For the evaluation of the MF, the expert weighting method was employed, following the steps described by Ishizaka and Nemery (2013):

• Expert Identification: assembling a group of experts in knowledge application and its integration into data analytics processes, especially in agriculture.

• Definition of Evaluation Criteria: in this case, the following evaluation criteria were proposed, taking into consideration aspects of clarity and comprehensibility, relevance and pertinence, adaptability and flexibility, and feasibility of implementation:

o C1: Is the Methodological Framework (MF) formulated and easily understandable for users and experts in data modeling?

o C2: Does the MF adequately address challenges related to integrating knowledge in data modeling?

o C3: Can the framework be adapted and applied in various data modeling contexts and situations?

o C4: Is implementing and effectively putting the MF into practice in real-world settings feasible?

o C5: Does the MF demonstrate activities related to characterization, extraction, and representation of knowledge?

o C6: Are the potential advantages and benefits of applying the MF in the data modeling context identified?

o C7: Does the MF address potential challenges that may arise during the knowledge management process in data modeling?

o C8: Is it possible to consider adaptations or updates to the MF without compromising the overall proposed structure?

• Definition of the Evaluation Scale: a scale from 1 to 5 was used, where 1 indicates low acceptance, and 5 indicates high acceptance.

• Calculation of the Average: based on the evaluations provided by the experts, a total weighted score was determined for each criterion.

• Verification of Consensus: significant discrepancies between the weights assigned by the experts were reviewed. If substantial differences are found, a consensus must be reached with the experts. To evaluate the consistency between experts, the Intraclass Correlation Coefficient (ICCa) and Spearman’s coefficient were used. For the ICCa, the ranges established by Hills and Fleiss (1987) were considered (low if ICC < 0.40; good if 0.40 ≤ ICC ≤ 0.75; very good if ICC > 0.75). Spearman’s coefficient ranges from −1 to 1, with values close to 1 indicating stronger agreement between experts.

• Utilization of the Evaluation for Decision-Making: based on the conducted evaluation, a decision was made on whether the MF requires changes or remains as initially established. This ensures an iterative process in the development of the MF.
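The averaging and consensus steps above can be sketched with the standard library alone. The ratings below are hypothetical and are not the experts' actual scores. The text does not specify which ICC form "ICCa" denotes, so the sketch assumes the two-way, absolute-agreement, single-rater form; the function names are illustrative.

```python
from itertools import combinations
from statistics import mean

# Hypothetical ratings: rows are criteria C1..C8, columns are four experts.
ratings = [
    [3, 4, 3, 4], [3, 3, 4, 4], [5, 4, 5, 5], [4, 5, 4, 5],
    [4, 4, 5, 5], [3, 3, 3, 4], [4, 4, 3, 4], [5, 4, 4, 5],
]


def criterion_means(matrix):
    """Average score per criterion across experts."""
    return [mean(row) for row in matrix]


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(matrix):
    """Average Spearman coefficient over all pairs of experts."""
    experts = list(zip(*matrix))
    return mean(spearman(a, b) for a, b in combinations(experts, 2))


def icc_absolute_agreement(matrix):
    """ASSUMED form: two-way, absolute agreement, single rater."""
    n, k = len(matrix), len(matrix[0])
    grand = mean(v for row in matrix for v in row)
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in matrix)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in zip(*matrix))
    ss_total = sum((v - grand) ** 2 for row in matrix for v in row)
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


print(criterion_means(ratings))
print(mean_pairwise_spearman(ratings), icc_absolute_agreement(ratings))
```

Reporting both indicators is useful because they measure different things: Spearman's coefficient captures whether experts rank the criteria in the same order, while an absolute-agreement ICC also penalizes experts who score systematically higher or lower than the others.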

3 Results

3.1 Identification of new data to support the MF

Applying the protocol outlined in Figure 2 and the elements established in the systematic review planning, articles about knowledge management in agriculture published between 2013 and 2023 were assessed. A total of 481 articles were initially identified; after removing duplicates, applying the defined inclusion and exclusion criteria, and conducting a quality assessment of the studies, 37 articles remained (Figure 3).

Figure 3. Systematic review process for identification of new data to support the MF.

Following the data extraction process, various types of knowledge, extraction methods, representation methods, their limitations, central areas of application, and the tools employed were identified. Regarding knowledge representation methods, it was observed that 40.4% of the studies utilized knowledge graphs, followed by ontologies at 34.6% and production rules at 25% (Figure 4).

Figure 4. Knowledge representation distribution in the agriculture domain.

Table 1 identifies the techniques employed in the knowledge extraction process, noting the use of manual procedures, such as interviews or surveys with experts, alongside Natural Language Processing (NLP) techniques oriented toward entity recognition and relation extraction in unstructured data. Some of the tools employed for knowledge extraction and representation were also identified. There was a notable prevalence of the Web Ontology Language (OWL), used for knowledge representation in the Semantic Web, and of the rule-oriented programming language CLIPS or one of its adaptations, such as Jess, for knowledge representation through rules. Furthermore, in knowledge graphs, the Resource Description Framework (RDF) was identified as the primary means of representation. Additionally, the query language SPARQL was highlighted as essential for accessing and extracting information from RDF datasets.

Table 1. Extraction techniques and tools used.
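To make two of the identified representation styles concrete, the sketch below stores knowledge as subject-predicate-object triples, in the spirit of RDF, and applies one IF-THEN production rule over them. It is a plain-Python stand-in rather than an RDF, SPARQL, or CLIPS implementation, and every fact, predicate name, and identifier in it is a hypothetical placeholder rather than agronomic knowledge taken from the reviewed studies.

```python
# Knowledge graph as a set of (subject, predicate, object) triples.
# All facts are hypothetical placeholders for illustration.
triples = {
    ("coffee", "affected_by", "coffee_leaf_rust"),
    ("coffee_leaf_rust", "favored_by", "high_humidity"),
    ("plot_12", "grows", "coffee"),
    ("plot_12", "has_condition", "high_humidity"),
}


def match(pattern, store):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


def infer_risks(store):
    """Production rule:
    IF   a plot grows a crop
    AND  the crop is affected by a disease
    AND  the disease is favored by a condition
    AND  the plot has that condition
    THEN the plot is at risk of the disease.
    """
    inferred = set()
    for plot, _, crop in match((None, "grows", None), store):
        for _, _, disease in match((crop, "affected_by", None), store):
            for _, _, condition in match((disease, "favored_by", None), store):
                if (plot, "has_condition", condition) in store:
                    inferred.add((plot, "at_risk_of", disease))
    return inferred


print(infer_risks(triples))
```

The pattern matcher plays the role a query language plays over a triple store, while the rule function plays the role of a rule engine. Inferred triples can be added back to the store, which is how rule-based knowledge can enrich a sparse dataset.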

Additionally, the main areas of intervention within the field of agriculture were identified, with pest and disease management accounting for 53.8%, comprehensive crop management at 19.2%, and nutritional management at 11.5% (Figure 5).

Figure 5. Study area application.

Regarding the data structure, 75% of the articles consider unstructured data, encompassing text, images, audio, and video; 39% consider semi-structured data, and 12% consider structured data. Furthermore, the two types of knowledge considered in the knowledge management process were identified, with explicit knowledge addressed in 92% of the studies and tacit knowledge in 36% (Figure 6). Because a study may consider more than one category, these percentages do not sum to 100%.

Figure 6. Data structure and knowledge classification.

Finally, Table 2 presents some of the limitations of knowledge representation methods. At a general level, limitations were identified, such as the size of the knowledge base, the impact of the quality of input data on the reliability of the represented knowledge, the specificity of knowledge, which constrains its scalability, and the high requirement of experts for the creation and updating of the knowledge base. In ontologies, resistance may arise from formalizing specific agricultural domain knowledge, highlighting the challenge of representing knowledge with spatiotemporal characteristics.

Table 2. Limitations of knowledge representation methods.

3.2 Development of the MF (P2)

Based on the information gathered during the systematic review process, the Methodological Framework (MF) for knowledge management was proposed for subsequent integration into data analytics. The MF was initially drafted using the flow diagram standard and subsequently refined using the Business Process Model and Notation (BPMN).

Knowledge Characterization and Knowledge Extraction (KC and KE): in the initial phase of the proposed MF, the characterization process of the data scarcity issue was considered, along with an assessment of the required knowledge type and the identification of available knowledge sources. These sources may contain implicit knowledge, explicit knowledge, or both. Therefore, a selection process was defined through an inclusive gateway, meaning that both types may be found within the same knowledge source.

In cases where the source contains tacit knowledge, an elicitation process was outlined to extract unstructured data, which is subsequently stored in a data repository. Next, an activity was defined to extract implicit knowledge from the unstructured data, using the identified extraction methods. These methods correspond to the Natural Language Processing techniques described in Table 1 and any others that may be identified. This process yields semi-structured or structured data. In the case of semi-structured data, a normalization and transformation process was established to convert it into structured data. Conversely, if the extraction yields structured data, it is directly stored in a data repository.

Finally, in cases where the assessed knowledge is explicit, the type of data structure to be processed was determined through an exclusive gateway. Depending on the structure, the same processes defined earlier are followed.
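The routing logic of this phase can be sketched as plain control flow, where the inclusive gateway allows both branches to run for the same source and the exclusive gateway selects exactly one path per data structure. The function names, step labels, and the `extraction_output` parameter (the structure produced by the extraction activity) are illustrative assumptions, not elements of the BPMN model itself.

```python
STRUCTURES = ("unstructured", "semi-structured", "structured")


def steps_for_structure(structure, extraction_output="structured"):
    """Exclusive gateway: exactly one path is taken per data structure."""
    if structure not in STRUCTURES:
        raise ValueError(f"unknown data structure: {structure}")
    if extraction_output not in STRUCTURES[1:]:
        raise ValueError("extraction yields semi-structured or structured data")
    steps = []
    if structure == "unstructured":
        steps.append("extract knowledge (e.g. NLP techniques)")
        structure = extraction_output
    if structure == "semi-structured":
        steps.append("normalize and transform")
    steps.append("store structured data in repository")
    return steps


def route_source(has_tacit, has_explicit, explicit_structure=None,
                 extraction_output="structured"):
    """Inclusive gateway: one or both branches may be active for a source."""
    routes = {}
    if has_tacit:
        routes["tacit"] = [
            "elicit knowledge",
            "store unstructured data in repository",
        ] + steps_for_structure("unstructured", extraction_output)
    if has_explicit:
        routes["explicit"] = steps_for_structure(
            explicit_structure, extraction_output
        )
    if not routes:
        raise ValueError("a knowledge source must contain some knowledge")
    return routes


# Hypothetical source holding both tacit and semi-structured explicit knowledge.
for branch, steps in route_source(True, True, "semi-structured").items():
    print(branch, "->", steps)
```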

Knowledge Representation (KR): in the second phase, corresponding to the knowledge representation process, the selection of the knowledge representation method was defined. This decision was informed by the data repository containing the various representation methods and their respective limitations (Table 2). Subsequently, the representation method was implemented, considering the data object containing the available tools (Table 1). Following this, an exclusive gateway was set up to evaluate whether the knowledge representation method responds to the defined data scarcity issue. If the implemented representation method appropriately addresses the problem, the represented knowledge is stored in a data repository.

Knowledge Integration (KI): in the third phase of the MF, corresponding to the process of knowledge integration in data modeling, a task was established to integrate the represented knowledge into one or more phases of data modeling. The selection of phases depends on the model optimization objectives. This task uses the data warehouse containing the represented knowledge. The knowledge integration process is governed by an inclusive gateway, allowing knowledge to be integrated into one or more phases simultaneously. In this sense, the represented knowledge can facilitate business or data understanding, support the data preparation process, optimize the modeling process, or support the evaluation process of the generated models. Following the model evaluation, an exclusive gateway checks compliance with the requirements established for the model, either proceeding to deployment or iterating over the integration phase(s).
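The integration loop described in this phase can be sketched as follows. The phase names are taken from the text; the `evaluate` callback, the iteration cap, and the returned status labels are hypothetical, since the MF does not prescribe how compliance with model requirements is checked.

```python
MODELING_PHASES = (
    "business understanding", "data understanding",
    "data preparation", "modeling", "evaluation",
)


def integrate_and_evaluate(selected_phases, evaluate, max_iterations=3):
    """Integrate knowledge into the selected phases, then deploy or iterate.

    selected_phases: phases chosen at the inclusive gateway (one or more).
    evaluate: hypothetical callback (phases, iteration) -> bool stating
              whether the model meets its established requirements.
    """
    invalid = set(selected_phases) - set(MODELING_PHASES)
    if not selected_phases or invalid:
        raise ValueError("select one or more valid modeling phases")
    # Keep the phases in their natural modeling order.
    applied = [p for p in MODELING_PHASES if p in selected_phases]
    for iteration in range(1, max_iterations + 1):
        # Exclusive gateway: requirements met -> deploy, otherwise iterate.
        if evaluate(applied, iteration):
            return {"status": "deploy", "iterations": iteration,
                    "phases": applied}
    return {"status": "revise integration", "iterations": max_iterations,
            "phases": applied}


# Hypothetical run: requirements are met on the second iteration.
result = integrate_and_evaluate(
    {"data preparation", "modeling"},
    evaluate=lambda phases, iteration: iteration >= 2,
)
print(result)
```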

Finally, in Figure 7, the Methodological Framework is presented, articulating the three phases that support the knowledge management process and its integration into data analytics models.

Figure 7. Methodological framework to support the integration of knowledge into data modeling under data scarcity scenarios.

3.3 Assessment and refinement of the MF

In this phase of the MF development process, the framework underwent evaluation and refinement conducted by four experts in the field of data analytics. Eight evaluation criteria were employed, encompassing aspects of clarity and comprehensibility, relevance and pertinence, adaptability and flexibility, and feasibility of implementation.

• C1: Is the Methodological Framework (MF) formulated and easily understandable for users and experts in data modeling?

• C2: Does the MF adequately address challenges related to integrating knowledge in data modeling?

• C3: Can the framework be adapted and applied in various data modeling contexts and situations?

• C4: Is implementing and effectively putting the MF into practice in real-world settings feasible?

• C5: Does the MF demonstrate activities related to characterization, extraction, and representation of knowledge?

• C6: Are the potential advantages and benefits of applying the MF in the data modeling context identified?

• C7: Does the MF address potential challenges that may arise during the knowledge management process in data modeling?

• C8: Is it possible to consider adaptations or updates to the MF without compromising the overall proposed structure?

Following the evaluation conducted by experts (Figure 8), the highest weighted scores were assigned to criteria 3, 4 (4.4), 5 (4.4), and 8 (4.4), reflecting aspects of adaptability, flexibility, relevance, and reliability in implementation. On the other hand, criteria 1 (3.4), 2 (3.4), 6 (3.2), and 7 (3.8) yielded lower averages, although not falling below the mean evaluation level. These criteria are associated with the clarity, comprehensibility, and relevance of the Methodological Framework (MF). The lowest weighted score was attributed to criterion 6, which pertains to identifying the advantages and benefits of applying the MF in data analytics. Furthermore, considering the indicators used to evaluate the consistency among experts, the ICCa was satisfactory, with a value of 0.41. Additionally, the average Spearman coefficient among all experts was 0.85, indicating a high level of concordance.
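As an illustration of how per-criterion averages and an average pairwise Spearman coefficient of this kind can be computed, the following is a minimal sketch. The ratings are hypothetical placeholders and are not the study's data; the function names are assumptions made for this example, and the ICC is not computed here.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(ratings):
    """Average the Spearman coefficient over every pair of experts."""
    return mean(spearman(a, b) for a, b in combinations(ratings.values(), 2))


# HYPOTHETICAL ratings (scale 1-5) for criteria C1..C8; NOT the study's data.
ratings = {
    "expert_a": [3, 3, 5, 4, 4, 3, 4, 5],
    "expert_b": [4, 3, 5, 5, 4, 3, 4, 4],
    "expert_c": [3, 4, 4, 4, 5, 3, 3, 5],
    "expert_d": [3, 3, 5, 5, 5, 4, 4, 4],
}
criterion_means = [mean(scores) for scores in zip(*ratings.values())]
concordance = mean_pairwise_spearman(ratings)
```

Ranking with averaged ties matters here because Likert-type ratings contain many equal values, for which the shortcut formula based on squared rank differences is not exact.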

Figure 8. Methodological framework evaluation through the expert weighting method.

In addition to the rating assigned to each established criterion, the experts provided recommendations for addressing the weaknesses identified in the MF (Table 3).

Table 3. Expert recommendations to improve the MF.

Based on the consolidated information, modifications were made to the MF (Figure 9):

• Specific conditions were established at each output of the inclusive gateway for knowledge integration in data modeling, defining the objectives sought through the implementation of knowledge in data modeling.

• A “data object” was added to describe various available methods for knowledge integration in data modeling.

• Stages of the knowledge management process were delimited and named using lanes.

• An activity was included to support the verification process of extracted tacit knowledge.

• The order for activities of knowledge characterization was reorganized.

Figure 9. Methodological framework for knowledge characterization, extraction, representation and integration into data modeling.

4 Discussion

4.1 Knowledge characterization and extraction

The integration of knowledge into data modeling allows for a reduction in data dependence, an improvement in the precision and robustness of models, and, in some cases, confers physical meaning to the obtained results (Willard et al., 2020). Also, knowledge management strategies are critical for making decisions on climate change mitigation and adaptation to ensure better practices in small farming (Chisita and Fombad, 2020). In this context, some authors propose general frameworks for knowledge integration in data modeling, such as Von Rueden et al. (2019), where the information flow in a process called informed machine learning is defined. This process generally involves problem identification and the search for a joint solution where data and prior knowledge are integrated, presenting some mechanisms for representing knowledge and its integration into data modeling. Similarly, Roscher et al. (2020) propose an approach in which the integration of domain knowledge is considered to improve the explainability of data models. Additionally, Karpatne et al. (2017), despite not presenting a guide for knowledge management or its integration into data modeling, refer to the paradigm of theory-guided data science, where the use of explicit and tacit knowledge is considered for refining the results of data models to be consistent with the understanding of physical phenomena.

Similarly, the proposed Methodological Framework (MF) is based on the general approach of integrating knowledge into data modeling. However, it delves into the processes by presenting specific activities to support the characterization, extraction, and representation of knowledge and its subsequent integration into data modeling. It considers the type of knowledge required, the type of data structure, and methods of knowledge extraction and representation, allowing for the support of the optimization of data models in their different development phases.

Regarding the characterization and extraction of knowledge, knowledge is classified according to its origin, following the types usually defined in the knowledge management area, as explicit or tacit (Hajric, 2018). Explicit knowledge, formalized and encoded, is called “Know-What.” This type of knowledge is found in the content of indexed journals, databases, public documents, reports, videos, and images, among others. Explicit knowledge is contained in files with different structural formats, known as structured, semi-structured, or unstructured data, and is treated by different methods during the extraction process. For the extraction of explicit knowledge, the MF first identifies the type of data structure in which the knowledge is contained. For structured formats, data extraction and direct storage are proposed, given that they already possess a formal structure (Hajric, 2018). For semi-structured data, a normalization and transformation process from semi-structured to structured data is proposed. This follows authors who extract knowledge from HTML formats and refine it with the help of experts to identify concepts (Ahsan et al., 2014; Bonacin et al., 2016), use web crawlers to extract information directly from pages in HTML or XML formats (Baumgartner et al., 2005; Lin et al., 2018; Zhai et al., 2021), or apply the manual error correction, normalization, and standardization of semi-structured data into structured data suggested by Chenglin et al. (2018). This procedure is necessary to obtain data in a formal structure that knowledge representation methods can work with.
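The transformation from semi-structured to structured data can be sketched as follows. This is only an illustration under simple assumptions: the records are hypothetical nested dictionaries (as might result from parsing JSON or XML), and the helper names `flatten` and `to_table` are introduced for this example rather than taken from the cited works.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level using dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def to_table(records):
    """Return (columns, rows): one row per record, None where a field is missing."""
    flat_records = [flatten(record) for record in records]
    columns = sorted({key for record in flat_records for key in record})
    rows = [[record.get(column) for column in columns] for record in flat_records]
    return columns, rows


# HYPOTHETICAL semi-structured records with differing fields.
records = [
    {"crop": "coffee", "plot": {"altitude_m": 1750, "soil": "andisol"}},
    {"crop": "maize", "plot": {"altitude_m": 1200}, "notes": "late sowing"},
]
columns, rows = to_table(records)
```

Missing fields are kept as explicit `None` values so that later steps, such as manual error correction or expert review, can see where information is absent instead of silently losing rows.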

Similarly, tacit knowledge, known as “Know-How,” corresponds to that found in the minds of individuals and has not been quantified or represented in any accessible format. It is manifested through practices and experiences in the application domain (Rhem, 2005; Becerra-Fernandez and Sabherwal, 2014) and possesses defining characteristics such as difficulty in communication, practicality, experiential nature, unconsciousness, and personalization (Pérez-Fuillerat et al., 2019). In the MF, when the process of extracting tacit knowledge is carried out, knowledge elicitation methods are used, which involve storing extracted knowledge from experts in non-structured formats (Jakus et al., 2013). The use of elicitation methods depends on the characteristics of the users with whom the process will be developed. In the case of agriculture, some studies suggest the application of techniques such as knowledge harvesting (Frappaolo, 2008), storytelling (Whyte and Classen, 2012; Prasarnphanich et al., 2016; Zammit et al., 2018), interviews (Ferrari et al., 2016), and video sharing (Zammit et al., 2018).

Subsequently, when knowledge is contained in a non-structured data format, either through elicitation or explicit knowledge in this structure, the MF proposes a process of extracting knowledge considered implicit knowledge. It refers to patterns or relationships between data that are not evident to humans (Frappaolo, 2008). For this purpose, a tacit knowledge extraction task is established and supported by a data object containing extraction methods identified in the agriculture domain. Among the recognized methods, some studies report the use of manual tasks to carry out the extraction and categorization of data (Devraj and Deep, 2015; Bonacin et al., 2016; Goldstein et al., 2019; Admass, 2022). Similarly, some authors mention Natural Language Processing (NLP) techniques, such as the “Stanford Dependency Trees” structure used for extracting entities from the agricultural knowledge domain (Devi and Dua, 2017), Named Entity Recognition (NER) used for identifying and classifying entities from text into predefined categories (Chenglin et al., 2018), Predicate-Argument Structure (PAS) used to represent relationships between the predicate and its arguments (nouns, prepositional phrases, etc.) in a sentence (Chatterjee et al., 2019), Conditional Random Field (CRF), which corresponds to a probabilistic graphical model used for sequence labeling, and Syntactic Tree-based Relation Extraction, which uses syntactic trees to extract relationships between named entities in text (Xiaoxue et al., 2019). There are also tasks proposed by Gharibi et al. (2020), such as POS tagging, chunking, and Stanford Parser, which allow the identification of relevant words, their grouping into meaningful phrases, and the provision of a syntactic structure for understanding relationships between words. 
Finally, other authors mention the use of neural networks such as Lattice Long Short-Term Memory (LSTM), Structured Perceptron, or Bidirectional Gated Recurrent Unit (bi-GRU; Kung et al., 2021; Zhu et al., 2021), used to process sequences of data in texts. These methods are necessary to identify patterns that are not explicit to humans and are present in the unstructured data used in the knowledge characterization process.

4.2 Knowledge representation

On the other hand, following Bergman (2018), knowledge representation is the description of an object through different elements. Knowledge representation comprises three main aspects: concepts as basic units of knowledge, associations or relationships between concepts, and a dynamic structure built by the concepts and their associations (Gutiérrez, 2012). Knowledge representation methods are applied to logical language resources, that is, formal and explicit language. Therefore, the Methodological Framework (MF) establishes a series of activities to extract and transform knowledge from unstructured and semi-structured data into a set of structured data that possess the required characteristics to implement representation methods (Staab and Studer, 2009). In this context, the MF delineates activities for selecting the knowledge representation method and its subsequent implementation. It is supported by data objects containing representation methods, their limitations, and the tools available to carry out the process.

Among the methods identified are production rules, generally used for procedural knowledge representation (Yingying et al., 2017), i.e., knowledge about methods or processes for performing a task (Gutiérrez, 2012). In agriculture, they have been widely used, especially for supporting pest and disease management (Balleda et al., 2014; Devraj and Deep, 2015; Kalita et al., 2017; Yingying et al., 2017; Admass, 2022). In some cases, production rules are used with other knowledge representation methods, as in Yingying et al. (2017), where rules are combined with knowledge graphs to design an expression and reasoning model for diagnosing diseases in tomato cultivation. In Afzal and Kasi (2019), a knowledge model based on ontology was developed to support rice production, using rules to support the reasoning process of the knowledge base created through the ontology. Additionally, in Sottocornola et al. (2023), rules are employed to support the explanation process of diagnosis in treating diseases in apple cultivation.
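To make the idea of production rules concrete, the sketch below shows a very small forward-chaining engine in which each rule is a set of conditions and a conclusion. The rules and fact names are hypothetical and only illustrative; they are not taken from the cited systems and are not agronomic guidance.

```python
def forward_chain(facts, rules):
    """Fire rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# HYPOTHETICAL rules: (conditions, conclusion). Illustration only.
RULES = [
    (("leaf_spots", "high_humidity"), "suspect_fungal_disease"),
    (("suspect_fungal_disease", "concentric_rings"), "suspect_early_blight"),
    (("suspect_early_blight",), "recommend_expert_inspection"),
]

observed = {"leaf_spots", "high_humidity", "concentric_rings"}
derived = forward_chain(observed, RULES)
```

The sketch also hints at the maintenance burden of this representation: every new situation requires an expert to add or revise a rule by hand.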

However, the use of production rules in agriculture has limitations, such as the relatively small size of the constructed knowledge bases (Balleda et al., 2014); moreover, when working with semi-structured or unstructured data, metadata may be derived from unreliable sources due to incomplete or incorrect information (Gomez-Perez et al., 2017). Rule-based systems are limited to the dataset used for constructing the knowledge base, which may not represent all the dynamics of the addressed problem (Godara and Toshniwal, 2020). Similarly, rule-based models are limited by the quality of the rules and require extensive expert intervention in the domain for rule maintenance and updating. They also face challenges when attempting to scale to other problems, either within the same domain or outside of it (Nismi Mol and Santosh Kumar, 2023).

On the other hand, ontologies can be defined as a formal and explicit specification of a set of related concepts (Jakus et al., 2013). Some studies in agriculture have proposed the use of ontologies to improve semantic interoperability between developed systems and data sources (Bonacin et al., 2016; Stucky et al., 2018), design and build a knowledge base to support query systems (Devi and Dua, 2017; Aminu et al., 2019; Jearanaiwongkul et al., 2019), support the development of knowledge graphs serving as a design layer (Chenglin et al., 2018; Xiaoxue et al., 2019), provide lexical modeling and conceptualization to extracted knowledge (Yanchinda, 2019) or propose a semantic representation of IoT device data to reduce the need for human intervention (Afzal and Kasi, 2019). However, ontologies have limitations in their application, such as linguistic disambiguation. Expert keyword selection and query formulation may affect the quality of results, requiring a high availability of experts for any system scaling process. Many resources are needed for knowledge base maintenance.

Additionally, there is a low standardization of concepts in the agriculture domain, which affects ontology understanding and consistency, along with barriers related to the language in which the concepts used for ontology construction are expressed (Ahsan et al., 2014; Bonacin et al., 2016; Goldstein et al., 2019; Fahad et al., 2021; Kung et al., 2021; Malik et al., 2021). These limitations have led to the development of knowledge graphs as a novel mechanism for knowledge representation. Knowledge graphs involve the extraction of entities, attributes, and their relationships, integrating knowledge through entity alignment and association with ontologies. Moreover, they facilitate the completion of the knowledge update and retrieval processes (Xiaoxue et al., 2019).

Like ontologies, knowledge graphs serve as a structured semantic knowledge base that describes concepts and their relationships in symbols (Xiaoxue et al., 2019). In this sense, graphs can be represented with varying levels of formalization, depending on whether one desires a lighter and more flexible representation or aims for knowledge representation with semantic consistency, integrating with an ontology that serves as a design layer for the knowledge graph (Chenglin et al., 2018). Under this, some authors have proposed the use of knowledge graphs with semantic support through ontologies to assess the impacts of agriculture and climate change on water resources (Bonacin et al., 2016), represent knowledge at a general level in the agricultural field (Ahsan et al., 2014; Devi and Dua, 2017; Chatterjee et al., 2019; Malik et al., 2021), automatically generate agrometeorological reports (Chenglin et al., 2018), address fertilization and soil management in corn cultivation (Aminu et al., 2019), support decision-making in pest and disease management (Goldstein et al., 2019; Jearanaiwongkul et al., 2021), and precision agriculture (Fahad et al., 2021).
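The difference between a lightweight graph and a graph backed by an ontology acting as a design layer can be sketched as follows. The class, the schema format, and the relation names are assumptions made for this illustration; real systems typically rely on dedicated ontology languages and graph stores.

```python
class KnowledgeGraph:
    """Set of (subject, relation, object) triples with an optional schema.

    The schema stands in for the ontology design layer: it maps each
    relation to the (subject_type, object_type) pair it accepts.
    With schema=None the graph is schema-free and accepts any triple.
    """

    def __init__(self, schema=None):
        self.schema = schema
        self.types = {}
        self.triples = set()

    def add_entity(self, name, entity_type):
        self.types[name] = entity_type

    def add(self, subject, relation, obj):
        if self.schema is not None:
            if relation not in self.schema:
                raise ValueError(f"unknown relation: {relation}")
            expected = self.schema[relation]
            actual = (self.types.get(subject), self.types.get(obj))
            if actual != expected:
                raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.triples.add((subject, relation, obj))

    def objects(self, subject, relation):
        """Return every object linked to the subject by the relation."""
        return {o for s, r, o in self.triples if s == subject and r == relation}


# HYPOTHETICAL schema and entities, for illustration only.
schema = {"affects": ("Pest", "Crop"), "grown_in": ("Crop", "SoilType")}
kg = KnowledgeGraph(schema)
kg.add_entity("coffee_berry_borer", "Pest")
kg.add_entity("coffee", "Crop")
kg.add_entity("andisol", "SoilType")
kg.add("coffee_berry_borer", "affects", "coffee")
kg.add("coffee", "grown_in", "andisol")
```

With the schema in place, a triple such as `kg.add("andisol", "affects", "coffee")` is rejected, which is the semantic consistency gained at the cost of flexibility.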

Similarly, studies have been proposed to consider using knowledge graphs to support the wine sector, employing a lighter and more flexible representation, i.e., without being supported by an ontology (Abbal et al., 2016; Groumpos and Groumpos, 2016). Finally, like ontologies and production rules, knowledge graphs present similar limitations, such as low scalability to other knowledge domains and even to different areas within the same knowledge domain. The quality of the represented knowledge depends on the input data to the system (Chenglin et al., 2018), the need for labeled data for the application of machine learning models for entity and relationship extraction, and the necessity of domain knowledge experts for identifying or verifying meaningful semantic relationships among extracted concepts, which can consume significant resources (Chatterjee et al., 2019); moreover, graphs must undergo constant maintenance and updates, requiring a substantial allocation of resources due to the need for a high level of expertise in the knowledge domain (Xiaoxue et al., 2019). When selecting a knowledge representation method, the limitations of its application must be considered to address the problem appropriately.

4.3 Knowledge integration

Integrating knowledge into data modeling is of great interest, particularly in scenarios where data might be inaccessible, unavailable, or of low quality (Porth et al., 2019). Knowledge integration can occur at any phase of modeling (Von Rueden et al., 2023). Therefore, within the MF (Methodological Framework), a flow is established through an inclusive gate that allows the inclusion of knowledge represented in any data modeling phase. The conditions set in the inclusive gate include data generation (data understanding), model evaluation (model assessment), parameter adjustment (model development), scientific consistency (business understanding and model evaluation), and attribute selection (data preparation). In this regard, studies have been proposed related to knowledge integration in the data acquisition phase (Hain et al., 2011; Wang et al., 2017; Read et al., 2019; Yu et al., 2019; Clemens and Viechtbauer-Gruber, 2020; Downton et al., 2020; Zhao et al., 2020; Sepe et al., 2021; Yu et al., 2021; Raymond et al., 2022; Schröder et al., 2022), data preparation phase (Froehlich, 2020; Mudunuru and Karra, 2021; Bajracharya and Jain, 2022; Fuhg and Bouklas, 2022; Kohtz et al., 2022), optimization process of machine learning algorithms (Anoop Krishnan et al., 2018; Azari et al., 2020; Chadalawada et al., 2020; Huang et al., 2020; Qian et al., 2020; Sun et al., 2020; Tartakovsky et al., 2020; Jurj et al., 2021; Lu et al., 2021; Soriano et al., 2021; Kim et al., 2022), and as support for explaining data model results (MacInnes et al., 2010; Read et al., 2019).
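The behavior of the inclusive gate can be sketched as a mapping from conditions to modeling phases, where one or more branches may be active at once. The condition and phase labels follow the wording of the paragraph above; the identifiers and the function name are assumptions made for this example and do not come from the MF diagrams.

```python
# Conditions and the phases they activate, labeled as in the text.
GATE_CONDITIONS = {
    "data_generation": {"data_understanding"},
    "model_evaluation": {"model_assessment"},
    "parameter_adjustment": {"model_development"},
    "scientific_consistency": {"business_understanding", "model_evaluation"},
    "attribute_selection": {"data_preparation"},
}


def inclusive_gate(objectives):
    """Return every phase activated by at least one selected objective."""
    selected = set(objectives)
    if not selected:
        raise ValueError("an inclusive gate needs at least one active branch")
    unknown = selected - set(GATE_CONDITIONS)
    if unknown:
        raise ValueError(f"unknown objectives: {sorted(unknown)}")
    phases = set()
    for objective in selected:
        phases |= GATE_CONDITIONS[objective]
    return phases


# Example: knowledge used both to select attributes and to tune parameters.
active_phases = inclusive_gate({"attribute_selection", "parameter_adjustment"})
```

Unlike an exclusive gate, which would return exactly one branch, the union over all selected objectives lets the same represented knowledge feed several phases in a single iteration.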

Ontologies and knowledge graphs can support interoperability among knowledge domain datasets, verify the quality of extracted data, classify data, extract attributes or relationships, or facilitate working with heterogeneous data (Robinson and Haendel, 2020; Sahoo et al., 2022; Mummigatti et al., 2023). Furthermore, axioms established in an ontology can support constructing new ontologies by inducing the reuse of existing knowledge or verifying the consistency of the new ontology (Smith et al., 2007; Mungall et al., 2011). They can also expand or enrich the characteristics used in a machine learning model without finding relationships from the data, ensuring consistency or coherence through context rules (Kulmanov et al., 2021; Shrivastava and Deepak, 2023). Similarly, ontologies and graphs can be used for task prediction (Mazandu et al., 2017; Chen et al., 2021), text clustering (Wei et al., 2015; Ruas and Grosky, 2018; Mehta et al., 2021), or to support attribute reduction or selection (Garla and Brandt, 2012). These integrations are typically achieved through entity similarity or embedded entity methods (Deepa and Vigneshwari, 2019; Sun et al., 2020; Mežnar et al., 2022). Therefore, ontologies and knowledge graphs are highly useful in supporting the development of data models, especially in contexts such as small-scale agriculture, where historical data series are mostly unavailable, or the available data is of low quality.

On the other hand, in some cases, knowledge can be explicitly represented, allowing its integration into data modeling phases without any characterization or extraction process. In this regard, hybrid models that integrate results from mechanistic or empirical models have been developed, either for generating training data or for model evaluation data (Ji and Lu, 2018; Feng et al., 2019; Maya Gopal and Bhargavi, 2019; Saha et al., 2020; Sansana et al., 2021). Additionally, the integration of algebraic or differential equations has been proposed, which can be used to condition policy in learning, modify the error function, function parameterization, or as restrictive functions (Mangasarian and Wild, 2008; Karpatne et al., 2017; Lu et al., 2017; Muralidhar et al., 2019; Ramamurthy et al., 2019; Asvatourian et al., 2020; Gupta and Das, 2020; Meng et al., 2022). Similarly, invariance properties have been proposed to enhance the performance of machine learning models (Ling et al., 2016; Wu et al., 2018). Lastly, expert knowledge has been incorporated to ensure that results generated by machine learning models have scientific consistency (Brown et al., 2012; Choo et al., 2013; Spinner et al., 2020). Thus, knowledge integration depends on the improvement objectives sought concerning data models.
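One simple way to picture the modification of the error function is to add a penalty term that grows when predictions violate a known constraint. The sketch below assumes the prior knowledge is a feasible interval for the predicted variable; the bounds, the weight, and the function names are hypothetical, and the cited works use a variety of other formulations.

```python
def mse(predictions, targets):
    """Mean squared error between predictions and observed targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def bound_penalty(predictions, lower, upper):
    """Mean squared violation of a known feasible interval [lower, upper]."""
    total = 0.0
    for p in predictions:
        violation = max(lower - p, 0.0, p - upper)
        total += violation ** 2
    return total / len(predictions)


def knowledge_guided_loss(predictions, targets, lower, upper, weight=1.0):
    """Data error plus a weighted penalty for implausible predictions."""
    return mse(predictions, targets) + weight * bound_penalty(predictions, lower, upper)


# HYPOTHETICAL values: the second prediction exceeds the assumed upper bound.
loss = knowledge_guided_loss(
    predictions=[2.0, 5.0], targets=[2.0, 4.0], lower=0.0, upper=4.5, weight=2.0
)
```

Because the penalty does not depend on the targets, it can also be evaluated on unlabeled inputs, which is one reason this kind of term is attractive when observations are scarce.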

4.4 The methodological framework as a tool for risk management in small-scale farming

Increasing climate variability and climate change, together with diseases and pests, among other problems, negatively impact agriculture, particularly affecting small-scale producers, who are highly vulnerable and have low resilience. Additionally, food security relies on the adaptive capacity of small-scale producers to address such events (Hatfield et al., 2020). A significant amount of research has proposed data-driven methods to contribute to solving these problems (Xie, 2011; Dalhaus et al., 2018; Wang et al., 2018; Ghahari et al., 2019; Mangani and Kousalya, 2019; Roznik et al., 2019; Shirsath et al., 2019; Boyd et al., 2020; Zhang et al., 2020). However, the information used has different temporal and spatial resolutions, which affects its correct application at the local level. At the local level, farmers possess knowledge about practices and techniques; however, this local knowledge can vary from one agricultural region to another. In this context, knowledge extraction and representation can be useful for storing knowledge from heterogeneous sources and sharing it with farmers (Jearanaiwongkul et al., 2019; Haider et al., 2021). Furthermore, the amount of data about farm management and system conditions is growing exponentially, necessitating proper methods to represent and share this data to support farmers’ activities (Aminu et al., 2019; Goldstein et al., 2019; Bhuyan et al., 2022). For this reason, in the context of small-scale farming, it is necessary to complement data analysis with knowledge that can support model development, considering data scarcity and heterogeneity.

Another problem where the Framework can be useful is the lack of financial data to support risk management in the context of financial inclusion. In this sense, knowledge about system conditions or agronomic management may be necessary to develop new and improved instruments. The Methodological Framework (MF) can facilitate the extraction and representation of knowledge from various sources to build new tools, such as credit scoring, while considering the heterogeneity of diverse agricultural systems (Simumba et al., 2018; Bunnell et al., 2021).

In the context of agricultural insurance, the management and integration of knowledge in data modeling will enable the proposition of agricultural insurance design solutions, facilitating the reduction of aggregation bias by considering specific characteristics of the production system. These include crop phenology, access conditions or availability of primary resources, and the implementation of techniques or practices that enhance or diminish producers’ adaptive capacity. Additionally, it may facilitate the integration of area-related knowledge, such as agroecological classifications or soil types. This adjustment would fine-tune the utilization of the proposed parametric index, consequently mitigating idiosyncratic risks. It also aims to minimize gaps in insurance acquisition stemming from poor design comprehension or a weak correlation between premium payments and individual-level losses (Berg et al., 2009; Ramasubramanian, 2012; Thompson, 2017; Fonta et al., 2018; Madaki et al., 2023).

The optimization of data models through knowledge can provide producers with more adaptive tools to enhance their resilience against variability and climate change events, diseases, pest control, and all agronomic management factors contributing to food security and economic growth in small-scale agriculture.

5 Conclusion and recommendations

The Methodological Framework (MF) is a tool designed to guide researchers in knowledge management. It defines techniques and methods for knowledge characterization, extraction, representation, and integration into data modeling to support data model development, particularly in risk management in small-scale agriculture. One of the main challenges in knowledge representation is that knowledge can be specific to one domain and might not apply to others. Therefore, it is essential to increase research on methods for data interoperability and knowledge sharing and evaluate reasoning characteristics.

Additionally, it is crucial to continue research on techniques for knowledge extraction, considering the significant amount of heterogeneous data and information sources (such as images, text, audio, and video) that can support development in the agricultural sector. Particular attention should be given to methods or techniques used for knowledge extraction from unstructured data.

It should be noted that the Methodological Framework (MF) was evaluated through an expert consensus. For this reason, it is considered a proposal, and it is crucial to apply the framework to address problems in small-scale farming, especially when there is a significant lack of consistent and high-quality data available. An example of such application is the design of agricultural insurance in small-scale farming, with an emphasis on the processes of index selection, data preparation, and determination of optimal triggers, exit thresholds, and premium calculation.
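The roles of the trigger, the exit threshold, and the premium in an index insurance design can be illustrated with a commonly used linear payout structure. This is a minimal sketch under stated assumptions: a deficit-type index (lower values mean worse conditions, as with cumulative rainfall), a linear payout between trigger and exit, and a pure premium estimated as the average historical payout. All numeric values and function names are hypothetical.

```python
def payout(index_value, trigger, exit_threshold, sum_insured):
    """Linear payout for a deficit-type index (lower index = larger loss)."""
    if exit_threshold >= trigger:
        raise ValueError("exit threshold must lie below the trigger")
    if index_value >= trigger:
        return 0.0  # conditions at or above the trigger: no payout
    if index_value <= exit_threshold:
        return float(sum_insured)  # at or below the exit: full payout
    fraction = (trigger - index_value) / (trigger - exit_threshold)
    return fraction * sum_insured


def pure_premium(index_history, trigger, exit_threshold, sum_insured):
    """Average payout over a historical index series (no loading applied)."""
    payouts = [payout(v, trigger, exit_threshold, sum_insured) for v in index_history]
    return sum(payouts) / len(payouts)


# HYPOTHETICAL seasonal index values (e.g., mm of rainfall) and contract terms.
history = [120, 70, 30]
premium = pure_premium(history, trigger=100, exit_threshold=40, sum_insured=1000)
```

In this setting, integrated knowledge such as crop phenology or soil type would inform which index is used and where the trigger and exit thresholds are placed for a given production system.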

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

JCA: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. JCO: Conceptualization, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Department of Cauca, the Sistema General de Regalías de Colombia (SGR), and the Cluster CreaTIC project. Additionally, this work is part of the research project ‘Incremento de la Oferta de Prototipos Tecnológicos en Estado Pre-Comercial Derivados de Resultados de I + D Para el Fortalecimiento del Sector Agropecuario en el Departamento del Cauca’ (grant no: BPIN 2020000100098), funded by the Sistema General de Regalías (SGR) of Colombia.

Acknowledgments

The authors are grateful to the Telematics Engineering Group (GIT) of the University of Cauca, Cluster Creatic, and the Sistema General de Regalías de Colombia (SGR).

Conflict of interest

JCA was employed by the company Ecotecma S.A.S.

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. ^https://ieeexplore-ieee-org.ezproxy.unal.edu.co/

2. ^https://www-scopus-com.ezproxy.unal.edu.co/

3. ^https://www-webofscience-com.ezproxy.unal.edu.co/

References

Abbal, P., Sablayrolles, J. M., Matzner-Lober, É., Boursiquot, J. M., Baudrit, C., and Carbonneau, A. (2016). A decision support system for vine growers based on a Bayesian network. J. Agric. Biol. Environ. Stat. 21, 131–151. doi: 10.1007/s13253-015-0233-2