TECHNOLOGY AND CODE article

Front. Plant Sci., 13 March 2025

Sec. Plant Bioinformatics

Volume 16 - 2025 | https://doi.org/10.3389/fpls.2025.1475057

This article is part of the Research Topic Recent Advances in Big Data, Machine, and Deep Learning for Precision Agriculture, Volume II

StatFaRmer: cultivating insights with an advanced R shiny dashboard for digital phenotyping data analysis

  • 1All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
  • 2School of Biological and Medical Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia

Digital phenotyping is a fast-growing area of hardware and software research and development. Phenotypic studies usually require determining whether there is a difference in some trait between plants with different genotypes or under different conditions. We developed StatFaRmer, a user-friendly tool tailored for analyzing time series of plant phenotypic parameters, ensuring seamless integration with common tasks in phenotypic studies. For maximum versatility across phenotypic methods and platforms, it uses data in the form of a set of spreadsheets (XLSX and CSV files). StatFaRmer is designed to handle measurements that have variation in timestamps between plants and the presence of outliers, which is common in digital phenotyping. Data preparation is automated and well-documented, leading to customizable ANOVA tests that include diagnostics and significance estimation for effects between user-defined groups. Users can download the results from each stage and reproduce their analysis. It was tested and shown to work reliably for large datasets across various experimental designs with a wide range of plants, including bread wheat (Triticum aestivum), durum wheat (Triticum durum), and triticale (× Triticosecale); sugar beet (Beta vulgaris), cocklebur (Xanthium strumarium) and lettuce (Lactuca sativa), corn (Zea mays) and sunflower (Helianthus annuus), and soybean (Glycine max). StatFaRmer is created as an open-source Shiny dashboard, and simple instructions on installation and operation on Windows and Linux are provided.

1 Introduction

Digital phenotyping is crucial for tackling challenges like climate change, population growth, and environmental stress (Pieruschka and Schurr, 2019). Until recently, traditional methods of phenotyping did not align with the capabilities of high-throughput genome sequencing and genotyping techniques. These limitations have prompted scientists from diverse fields, including agriculture and engineering, to explore new technologies for phenotyping (Al-Tamimi et al., 2022).

The rapid advancement of high-throughput plant phenotyping (HTPP) tools has resulted in platforms generating enormous amounts of data (Li et al., 2021). High-throughput experiments are conducted both in controlled and field settings through the extensive use of frequent, non-destructive automatic sampling and/or monitoring of several hundreds to thousands of plants within a short period (Al-Tamimi et al., 2022). Comprehensive phenome-wide data facilitate comparisons across populations, enabling phenomics to characterize diverse traits, including structural, physiological, and performance metrics under different environmental conditions (Rahaman et al., 2015; Demidchik et al., 2020). When dealing with large volumes of data, the statistical power of analysis increases. This is particularly true in cases where time series data are involved. Also, in HTPP input data can be highly heterogeneous, such as during studies of different plant varieties and genotypes and different treatments and sites of the studies. Analysis and interpretation of data by appropriate techniques and tools are required. To maximize the potential of HTPP, it is essential for researchers to be able to manage large datasets. This necessitates the efficient collection and management of data, which is most effectively achieved through automated processes (Araus et al., 2022).

A number of companies and research institutes have developed high-capacity phenotyping platforms, both indoor and outdoor, such as Traitmill (CropDesign, Belgium) (Lobet, 2017), HyperAIxpert (LemnaTec, Germany) (HyperAIxpert Family - LemnaTec), and The Plant Accelerator (Australian Plant Phenomics Network, Australia) (Australian Plant Phenomics Network, n.d.). Such platforms are used to assess crop features related to productivity and tolerance to stressors like salinity (Lazarević et al., 2021; Li et al., 2022), drought (Hein et al., 2021; Joshi et al., 2021; Kim et al., 2021; Javornik et al., 2023), and low temperature (Islam et al., 2021). They employ advanced technologies, including imaging stations, automated systems, and proprietary software, to conduct efficient analysis of plant characteristics. In addition, various software solutions have been developed to automatically extract standard features from images, such as plant height and width, utilizing open-source platforms such as HTPheno (Hartmann et al., 2011) and the Integrated Analysis Platform (IAP) (Yang et al., 2020).

Regardless of the method used to collect data on plant phenotype, the next essential step is statistical analysis of the results. For the average user, it is critical to be able to apply analytical tools independently, without having to develop a new solution for each experiment. Regrettably, the embedded analytical tools in mainstream digital phenotyping platforms frequently fall short in managing extensive datasets with diverse attributes.

Contemporary methodologies of time series analyses of phenotypic data prevalent in recent publications routinely entail procedures such as outlier identification, percentage transformations, ANOVA, and post-hoc Tukey’s test (Kjaer and Ottosen, 2015; Minervini et al., 2017; Parmley et al., 2019; Kim et al., 2020; Leiva et al., 2021; Nyonje et al., 2021; Schmidt et al., 2023; Tripodi et al., 2024). With the rapid increase in data volume, it has become clear how crucial it is to be able to seamlessly visualize and validate digital phenotyping data in an automated manner. An excellent example of a solution to this problem was given by Schmidt et al. (2023), where detecting and rectifying any potential phenotyping artifacts at an early stage was essential for conducting Genome-Wide Association Studies (GWAS) on a scale of nearly a thousand lines.

In the realm of plant phenotypic data preprocessing, a noteworthy tool is AllInOne Pre-processing (Yoosefzadeh Najafabadi et al., 2023), an open-source R-Shiny package that offers efficient solutions for data management. This package includes advanced features such as handling missing data, visualizing datasets, detecting outliers, estimating correlations, normalizing data, and conducting spatial analyses, all optimized for speed and user experience.

Building on this philosophy, our approach prioritizes extended longitudinal studies and integrates specialized methods for controlled environment time series analysis. StatFaRmer (Statistical Analysis for Farmers using R) is an open-source web tool that can be installed locally, requiring no prior knowledge of R.

Standard software bundled with phenotyping tools often falls short in providing user-centric interfaces for specific tasks, highlighting the need for a more intuitive design that meets diverse hardware and software requirements. The development goal for StatFaRmer was enhanced data processing and insight generation for time-series data. The key requirements for the tool included:

1. Data Processing and Outlier Filtering: Implementing robust data processing techniques to filter out outliers and ensure the integrity and accuracy of the time-series data. The tool defaults to easily interpretable Z-score outlier detection, which can be switched to IQR outlier detection for less normal data by adjusting the options at the beginning of the main script. Outlier removal can be entirely bypassed by commenting out the “remove outlier groups” section of the main script. Additionally, the skewness and kurtosis of each selected group within the data can be tracked using the “Descriptive” tab in the StatFaRmer application.

2. ANOVA: Incorporating ANOVA analysis with post-hoc Tukey’s test to enable users to perform statistical comparisons and identify significant differences among various groups within the time-series data. The Shapiro-Wilk test and diagnostic plots are offered alongside ANOVA to evaluate normality and, thus, the reliability of ANOVA results. The specific ANOVA model incorporates user-defined terms and is prominently displayed above the plot. While users can adjust the number of terms, only two-way interactions between them are considered to improve the clarity of the results.

3. Data Subsetting: Developing capabilities for data subsetting, allowing users to focus on specific subsets of the time-series data for more targeted and detailed analysis.

4. Factor Selection and Faceting: Enabling selection of phenotyping traits to facilitate in-depth analysis by grouping and examining data based on specific variables or factors.

5. Download Table with Statistics: Enabling users to download tables with comprehensive statistics, including selected grouping parameters such as gene, treatments, cultivar, and time clusters. This feature empowers users to access and utilize data insights offline for further analysis and reporting purposes.

The functionality and performance of the StatFaRmer tool were tested on plant data obtained using the TraitFinder (Phenospex, Netherlands) (TraitFinder digital phenotyping workstation on wheels for lab- and greenhouse phenotyping automation - PHENOSPEX) high-performance phenotyping platform, complemented by our custom annotations of accessions.

2 Methods

2.1 Development of StatFaRmer

StatFaRmer was developed fully with R (R Core Team, 2024) and consists of an initial processing script main.R, which can be modified to better work with a new experiment, and a shiny app.R, which allows users to visualize the data and quickly test a number of hypotheses, as shown in Figure 1.

Figure 1. The overall block-scheme of the StatFaRmer platform. The block types are categorized: bold blocks represent required hardware and optional data sources, regular blocks indicate input and output files, and dashed blocks depict processing units and concepts.

The script main.R is equipped with functions that can handle various tasks using four groups of R packages:

● The first group consists of libraries that are primarily used for project stability and data validation: checkmate (Lang, 2017) for ensuring the validity of input data, logger (Daróczi, 2024) for testing the tool’s interactivity, and renv (Ushey, 2023) for managing project dependencies.

● The second group includes tidyverse (Wickham et al., 2019) packages that focus on data manipulation and transformation: magrittr (Bache and Wickham, 2022) and rlang (Henry and Wickham, 2024) for simplifying code structure, purrr (Wickham and Henry, 2023) for functional programming, dplyr (Wickham et al., 2023a) for data manipulation, forcats (Wickham, 2023a) for handling categorical data, stringi (Gagolewski, 2022), stringr (Wickham, 2023b) and glue (Hester and Bryan, 2024) for string manipulation; readr (Wickham et al., 2024a), readxl (Wickham and Bryan, 2023) for reading tabular data, tibble (Müller and Wickham, 2023) for data structure, tidyr (Wickham et al., 2024b) for data tidying; with addition of janitor (Firke, 2023) for data cleaning, and lubridate (Grolemund and Wickham, 2011) for working with dates and times.

● The third group introduces statistical and clustering libraries: stats (R Core Team, 2024) for basic statistical functions and operations, and dbscan (Hahsler et al., 2019; Hahsler and Piekenbrock, 2024) for density-based clustering.

● Lastly, shiny (Chang et al., 2024) is a library for starting interactive web applications directly from R, providing a user-friendly interface for data visualization and analysis.

The script app.R utilizes various libraries to enhance its functionality: for dashboard creation, it employs shiny (Chang et al., 2024) with shinyWidgets (Perrier et al., 2024); for data wrangling, it incorporates tidyverse (Wickham et al., 2019) and vecsets (Witthoft, 2023); and for plotting, it features ggplot2 (Wickham, 2016). Additionally, libraries such as broom, flextable, moments, bslib, viridis, DT, and multcompView (Komsta and Novomestky, 2022; Garnier et al., 2024; Gohel and Skintzos, 2024; Graves and Piepho, 2024; Robinson et al., 2024; Sievert et al., 2024; Xie et al., 2024) are used for general beautification and presentation of tabular results, while writexl (Ooms, 2024) and svglite (Wickham et al., 2023b) are employed for data export. The script also includes publishing options through rsconnect (Atkins et al., 2024), facilitating tasks like data visualization, statistical testing, and data preparation.

2.2 Automated data handling

StatFaRmer imports user data from a specified project directory in the form of a TraitFinder-compatible experiment (.zip containing .csv files). The data includes plant coordinates (in the “unit” column) at each time point (in the “timestamp” column, ISO 8601 standard (ISO, 2017)), along with their respective numerical phenotypic parameters such as height and Normalized Difference Vegetation Index (NDVI).

Users are also required to provide two .csv tables. The first, named *_handmade.csv, includes the required plant IDs (column “V.T.R,” indicating variety, treatment, and repetition number), as well as treatment and cultivar names (columns “Treatment” and “Cultivar”). This information will be displayed in the reports, overwriting any corresponding columns from the .zip file if provided. The second, a *_translation.csv table, is necessary for establishing a one-to-one correspondence between plant IDs and unit coordinates; it contains the columns “V.T.R” and “T:X:Y” (the latter should match the TraitFinder “unit” convention, indicating the table number and the spatial coordinates on it).
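The role of the two tables can be illustrated with a minimal Python sketch. StatFaRmer itself is written in R, so this is not its implementation; the function names and the error handling are hypothetical, and only the column names “V.T.R”, “Treatment”, “Cultivar” and “T:X:Y” come from the description above.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def annotate_units(handmade_rows, translation_rows):
    """Attach treatment and cultivar names to each unit coordinate.

    Raises ValueError if the plant-ID-to-unit mapping is not one-to-one
    or if a plant ID has no annotation (hypothetical checks).
    """
    plants = {row["V.T.R"]: row for row in handmade_rows}
    annotated = {}
    seen_plants = set()
    for row in translation_rows:
        unit, plant_id = row["T:X:Y"], row["V.T.R"]
        if unit in annotated or plant_id in seen_plants:
            raise ValueError(f"mapping is not one-to-one at {plant_id} / {unit}")
        if plant_id not in plants:
            raise ValueError(f"{plant_id} is missing from the handmade table")
        seen_plants.add(plant_id)
        annotated[unit] = {
            "V.T.R": plant_id,
            "Treatment": plants[plant_id]["Treatment"],
            "Cultivar": plants[plant_id]["Cultivar"],
        }
    return annotated
```

Each measurement row can then be labeled by looking up its “unit” value in the returned dictionary.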

An optional groups.xlsx table can be included, which must have a cultivar column and any additional columns that the user wishes to include as factors. Factor levels should consist of Roman letters, digits, and underscores for compatibility with the multcompView package.

On first access to the data, StatFaRmer determines time clusters. These clusters are introduced because measuring each plant takes some time, so the timestamps of different plants differ within each experimental time point. For example, if a measurement takes 1 second, the timestamps for the 2 PM measurements will be 2:00:00 for the first plant, 2:00:01 for the second plant, etc. At this step, the script also identifies repeated measurements of the same plant at one time point that have different timestamps as replicates, and separates them from measurements made at other time points. For this clustering, the script main.R of StatFaRmer uses the DBSCAN clustering algorithm with an epsilon parameter. This parameter, defined in the main.R script in hours and currently set to 1, efficiently processes experimental data irrespective of the plant measurement frequency. It is optimized to handle datasets where consecutive measurements are separated by an hour or longer. Measurements within narrower time windows are considered technical repeats. From this point on, the original timestamps are replaced by “dbscan_cluster” times, which are close to the time points defined in the experimental design but do not coincide with them exactly. For example, the 2 PM time point in the experimental design will result in original timestamps in the 2:00:00 to 2:05:30 interval, which are replaced by the median of all measurements in the cluster, say, a dbscan_cluster value of 2:02:43.

Thus, “dbscan_cluster” times serve as a substitute for timestamp data, enabling faceting and factor selection in the analysis. It allows grouping timestamps with a given precision, which is required in an experiment with multiple consecutive scans. This step is necessary for further data processing and statistical analysis.
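The clustering idea can be sketched in a few lines of Python. This is an illustration only, not the R code of main.R. It assumes that every measurement counts as a core point (minPts = 1), in which case one-dimensional DBSCAN reduces to splitting the sorted timestamps wherever the gap between neighbours exceeds epsilon. The function name and the minPts setting are assumptions.

```python
from datetime import datetime, timedelta
from statistics import median


def cluster_timestamps(timestamps, eps=timedelta(hours=1)):
    """Map every timestamp to the median time of its cluster.

    Assumption: minPts = 1, so sorted timestamps are split wherever the
    gap between consecutive values is larger than eps.
    """
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= eps:
            clusters[-1].append(ts)  # close enough: same time cluster
        else:
            clusters.append([ts])  # gap larger than eps: new cluster

    mapping = {}
    for members in clusters:
        start = members[0]
        offsets = [(ts - start).total_seconds() for ts in members]
        cluster_time = start + timedelta(seconds=median(offsets))
        for ts in members:
            mapping[ts] = cluster_time
    return mapping
```

With the default epsilon of one hour, scans taken a few seconds or minutes apart receive the same cluster time, while scans taken two hours later form a new cluster.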

Then, all replicated measurements for plants (the measurements with the same dbscan_cluster time value) are filtered from outliers for each measured parameter (trait) within clusters based on a 3-sigma threshold. If the parameter is expressed as a percentage (for example, bins of specific ranges of NDVI values in TraitFinder datasets), then it is converted using the logit function, and placed in data tables as logit value for further analysis. Percentage values exactly equal to zero or one are replaced in advance with the nearest extreme finite values to avoid introducing infinities into analysis. This transformation was chosen as a more robust alternative to the arcsin, while still striving for interpretability (Lin and Xu, 2020). Additionally, the data table undergoes further modifications, such as column reordering, type conversions, the elimination of columns with only one factor level, and data reorganization for better readability. These manipulations are optional and can be controlled by the user as needed. However, this functionality is currently implemented by commenting out the relevant sections of the main script. At this stage, additional criteria for grouping are incorporated based on the groups.xlsx file located in the current project directory. Users also have the flexibility to group plants based on additional criteria in accordance with the experimental design (e.g., control, factor treatment), as demonstrated in section 2.4. All of these steps are implemented to prevent collisions during the subsequent ANOVA. Then StatFaRmer computes medians for technical repetitions of the same plants within time clusters, and saves the processed table as an RDS file for the Shiny app.
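The final aggregation step, taking the median of technical repetitions of a plant within a time cluster, can be sketched as follows. This is a hypothetical Python illustration rather than the R implementation, and the record layout (plant ID, cluster time, value) is an assumption.

```python
from collections import defaultdict
from statistics import median


def collapse_technical_repeats(records):
    """Reduce repeated scans to one median value per plant and time cluster.

    `records` is an iterable of (plant_id, cluster_time, value) tuples.
    """
    grouped = defaultdict(list)
    for plant_id, cluster_time, value in records:
        grouped[(plant_id, cluster_time)].append(value)
    return {key: median(values) for key, values in grouped.items()}
```

The median is less sensitive than the mean to a single aberrant repeat that survives outlier filtering.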

StatFaRmer performs ANOVA on user-selected grouping factors and their interactions for a specified trait, followed by Tukey’s test. It includes Shapiro-Wilk tests to assess normality, displaying results and diagnostics in the ANOVA tab to aid informed decisions on further analysis, such as adjusting outlier removal strategies and model parameters.
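To make the quantity behind the ANOVA table concrete, the sketch below computes the F statistic for the simplest case of a single grouping factor. It is a simplified Python illustration: StatFaRmer fits multi-factor models with interactions in R and also reports p-values, which are omitted here. The function name is hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (v - fmean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F value indicates that the differences between group means are large relative to the scatter within groups.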

2.3 User interface

StatFaRmer has a friendly interface made with the Shiny app framework, which allows users to analyze data interactively and share the dashboards. Users can choose grouping factors, factor levels, treatments, and more to customize their analysis. The main panel displays faceted violin plots with automatically assigned characters from multiple comparisons, based on ANOVA/Tukey’s tests of user defined groups. There are also tabs for looking at raw data, stats, ANOVA and Tukey’s test results, and group comparisons, giving users more ways to analyze their data.

The tool’s server logic handles data smoothly using reactive expressions and debouncing techniques. It allows users to select and change variables by specifying them in a drop-down list and removing them using the Backspace or Delete keys for dynamic data visualization. By presenting mean, median, deviation from normal distribution, ANOVA and Tukey’s test results, StatFaRmer provides a comprehensive view on the data. Additionally, StatFaRmer makes it easy for users to see and download plots as SVG files, formatted tables with formulas used, raw data and their stats, ANOVA results, Tukey’s test outcomes, and group comparison characters. This feature makes it simple for users to explore, interpret and record the results of their statistical analysis.

2.4 Utilization of the StatFaRmer tool for statistical analysis of phenotypic data

The tool runs in a browser (tested on major web browsers) at https://stathmin.shinyapps.io/StatFaRmer. A sample dataset is loaded in the tool as an example and is available on GitHub. It covers different plant species (bread wheat (Triticum aestivum), durum wheat (Triticum durum), and triticale (× Triticosecale)), cultivars (35 variants), and plant genotypes (allelic state of 3 genes) under different treatments (3 variants), together with the time series of morphological and spectral parameters of these plants.

3 Results

3.1 Initial data processing

3.1.1 The application of the DBSCAN method for the soybean dataset

Clustering similar timestamps with DBSCAN, with epsilon guided by the experimental design (e.g., event duration or data collection frequency, typically 1 hour), allows for identification of patterns in time-series data. This approach minimizes artifacts from cruder timestamp aggregation, clarifying meaningful relationships. The effectiveness of DBSCAN is contingent on the selected epsilon, which requires careful tuning for optimal clustering.

This feature is exemplified by an experiment on soybean (Glycine max) phenotyping, where 50 varieties were grown under two photoperiods with two repetitions, totaling 200 plants. Seed preparation involved treating dry seeds with a fungicide; the treated seeds were then planted 2 cm deep in 500 ml pots filled with 230 g of moistened peat, with four seeds per pot. After germination, three plants were retained per pot. Plants were grown in a climate chamber under a photoperiod of 22 hours light and 2 hours dark. Initial lighting was continuous (24/0) for the first three days to prevent seedling stretch. Temperatures were maintained at +26°C daytime and +25°C nighttime, with a light intensity of 400 μmol/m²/s. Pots were arranged randomly and repositioned weekly. During the first 7-10 days, plants were watered with room-temperature water, then with 50 ml as needed. Once true leaves appeared, a mineral fertilizer was applied daily at 30 ml per pot. For scanning, plants were moved to a phenotyping table, organized by variety and replication. Soybeans were grouped into sets of 12 pots, recorded in separate blocks, totaling nine blocks. Figure 2A displays the raw data in HortControl (PHENOSPEX, n.d.), the default application of Phenospex, while Figure 2B presents the same data after time clustering using StatFaRmer.

Figure 2. (A) The first two measurements of soybean phenotyping. The X-axis represents the date and time of each scan, while the Y-axis shows the corresponding digital biomass. Each line on the graph corresponds to an individual plant, resulting in a total of 200 graphs. The data is unprocessed, with each timestamp represented as a separate plotted point, totaling 34 points per measurement. This creates visual clutter, evident as “ladders” at the edges, and complicates further analysis. (B) The same data, presented in StatFaRmer. The repetitions are averaged and displayed as one graph per variety, reducing the total to 50 graphs in one image. Similar timestamps are clustered and represented as single time points, enhancing the graph’s visual accessibility and making the data easier to use for statistical analysis. The points in this graph represent unprocessed measurements, while the lines represent data processed in StatFaRmer.

Biologically, this function is crucial for making the data more accessible to humans. In the original unprocessed figure, the volume of data is too large to discern any trends (200 graphs compared to 50). Additionally, the spikes and “ladders” caused by the absence of timestamp clustering make it difficult to follow the individual graphs and the figure as a whole.

3.1.2 Example of the outlier removal for bread wheat, durum wheat, and triticale datasets

Filtering outliers in timestamp clusters using a 3-sigma threshold preserves data integrity in time-series analyses. This method discards measurements beyond three standard deviations from the mean, mitigating measurement errors and highlighting true trends. Because it assumes normality, alternative methods such as the interquartile range (IQR) may be needed for data that deviate from a normal distribution. In this case, the variable use_IQR switches the filtration method.
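The two filtering rules can be compared with a short Python sketch. It is not the R code of StatFaRmer: the function name is hypothetical, the use_iqr argument only mirrors the use_IQR switch mentioned above, and the 1.5 × IQR fence is the conventional choice, assumed here because the text does not state the multiplier.

```python
from statistics import mean, quantiles, stdev


def filter_outliers(values, use_iqr=False, z_threshold=3.0, iqr_factor=1.5):
    """Keep only the values inside the chosen outlier bounds.

    use_iqr=False: mean +/- z_threshold * standard deviation (3-sigma rule).
    use_iqr=True:  [Q1 - iqr_factor * IQR, Q3 + iqr_factor * IQR]
                   (iqr_factor = 1.5 is an assumed, conventional default).
    """
    if use_iqr:
        q1, _, q3 = quantiles(values, n=4)
        spread = q3 - q1
        low, high = q1 - iqr_factor * spread, q3 + iqr_factor * spread
    else:
        center, sd = mean(values), stdev(values)
        low, high = center - z_threshold * sd, center + z_threshold * sd
    return [v for v in values if low <= v <= high]
```

Note that with the sample standard deviation, a single value cannot lie more than three standard deviations from the mean in groups of ten or fewer measurements, so the 3-sigma rule only takes effect in larger groups.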

This method is illustrated by our sample dataset of bread wheat, durum wheat, and triticale plants featuring 35 varieties and 3 treatments, measured across 2 repetitions, totaling 210 plants. A more detailed description of this experiment can be found in section 3.5. The original dataset includes outliers, as shown in Figure 3A, where specific outlier measurements are indicated by arrows. StatFaRmer automatically filters these outliers, resulting in the adjusted dataset shown in Figure 3B. After applying either 3-sigma threshold or IQR outlier removal, 99% and 97% of measurements are retained, respectively. The outliers are primarily caused by uncontrollable external factors, such as improper positioning of glossy or reflective plant parts or interference from nearby foliage in the scanning area. The tool allows analyzing the affected plants in subsequent timestamps to assess any persistent issues or trends.

Figure 3. Plots of the same data collected during bread wheat, durum wheat, and triticale phenotyping from HortControl and StatFaRmer. (A) Unprocessed data collected during phenotyping by HortControl, the primary software for the TraitFinder platform. The X axis is the date and time for each scan, the Y axis is the corresponding plant leaf area. Each line on the graph represents an individual plant. (B) Comparative visualization from StatFaRmer with individual lines displaying the changes over time of DBSCAN cluster medians. The outliers are removed, the spike at Apr-13 is made more evident, presuming technical issues to be studied in more detail. The thin lines represent the “height map” or 2D density map and are used when the number of plotted observations exceeds 2000 individual points.

The biological significance of this lies in the automatic removal of obviously unrealistic measurements that result from errors, such as interference from a person or an inanimate object during measurements. Since we conduct numerous large-scale experiments in limited spaces, we have encountered instances of human error, such as rearranging pots while TraitFinder is still running. While we document these incidents thoroughly, having a function that removes the most extreme outliers is essential.

3.1.3 Example of logit transformation for sugar beet dataset

Logit transformation corrects deviation from normality in percentage data, particularly near 0% and 100%, by converting bounded proportions to an unbounded scale, stabilizing variance for statistical analyses like linear regression and ANOVA while still being easily interpretable (Lin and Xu, 2020). Before the transformation, we replace 0% or 100% values with the closest observed values to maintain the integrity of the analysis.
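A minimal Python sketch of this transformation is given below. It is illustrative only and does not reproduce the R implementation; the function name is hypothetical, and "closest observed values" is interpreted as the smallest and largest observed proportions strictly between 0 and 1.

```python
import math


def logit_percentages(percentages):
    """Convert percentages (0-100) to logit values without infinities.

    Values of exactly 0% or 100% are first replaced by the nearest
    observed value strictly inside the (0, 100) interval.
    """
    proportions = [p / 100.0 for p in percentages]
    interior = [p for p in proportions if 0.0 < p < 1.0]
    if not interior:
        raise ValueError("no values strictly between 0% and 100%")
    low, high = min(interior), max(interior)
    clamped = [min(max(p, low), high) for p in proportions]
    return [math.log(p / (1.0 - p)) for p in clamped]
```

A value of 50% maps to 0 on the logit scale, and values approaching 0% or 100% map to increasingly large negative or positive numbers, which is why exact zeros and ones must be handled before the transformation.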

As an example of such a percentage data, which normally occurs in phenotyping analysis, we have studied the share of specific Plant Senescence Reflectance Index (PSRI) bins. PSRI is calculated as follows:

$$\mathrm{PSRI} = \frac{\mathrm{RED} - \mathrm{GREEN}}{\mathrm{NIR}}$$

where the red wavelength is 620-645 nm, green is 530-540 nm, and near-infrared (NIR) is 720-750 nm. Spectral indices can also be represented using bins. A bin counts how many points of a given 3D scan fall within its defined boundaries, each having a lower and upper limit. The number of points within the bin is expressed as a percentage of the total area. In this case, the PSRI index consists of six bins, with bin 0 [-4:-0.8] being the lowest. A lower PSRI value indicates healthier plants (Merzlyak et al., 1999).
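The binning described above can be sketched in Python as follows. The sketch reads the index as (RED − GREEN) / NIR, which is an assumption based on the three bands named in the text. The function names are hypothetical, and apart from the lowest bin [-4:-0.8], bin limits are not given in the text and must be supplied by the user.

```python
def psri(red, green, nir):
    """PSRI of one scanned point, read here as (RED - GREEN) / NIR."""
    return (red - green) / nir


def bin_shares(values, edges):
    """Percentage of values in each half-open bin [edges[i], edges[i + 1]).

    Values outside all bins still count toward the total.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += 1
                break
    return [100.0 * count / len(values) for count in counts]
```

The resulting percentages are the kind of bounded values to which the logit transformation from section 3.1.3 applies.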

To illustrate this feature we picked an experiment with sugar beet (Beta vulgaris) plants of 2 varieties which were exposed to a prolonged period of cold (vernalization) at 5 and 10°C (24 plants in total) and were grown at lighting conditions differing by spectrum. After pre-sowing preparation, the seeds for germination were placed on moist filter paper in a plastic container, which was covered with film and kept in the dark at room temperature for three days. The growing containers used were 500 ml plastic seedling pots filled with sterilized peat that had been treated for 15 minutes at 121°C, mixed with perlite in a 5:1 ratio. Germinated seeds were planted at a rate of three seeds per pot. Vernalization was carried out at temperatures of +5°C and +10°C. Temperature was monitored through temperature and humidity sensors the entire duration of the experiment. Plants were grown under LED lamps with photoperiods of 22/2 and 10/14 hours, using white light at an intensity of 60, blue + red light at an intensity of 500, and blue + red light at an intensity of 466.

In Figure 4 we compare the PSRI [-4:-0.8] bin of plants grown under various lighting conditions at a temperature of 10°C. PSRI is a key parameter in plant science for assessing leaf senescence. Proposed by Merzlyak et al. (1999), it helps estimate the onset, stage, relative rates, and kinetics of senescence and ripening processes.

Figure 4. Results of the Plant Senescence Reflectance Index (PSRI) for sugar beet plants after a vernalization period, under different lighting conditions in the [-4:-0.8] bin. This bin is the lowest, with lower PSRI values corresponding to healthier plants. The graph comprises six panels for easy comparison among the different conditions. The left three panels display measurements from the initial time point, while the right three were measured two months later to evaluate how the spectrum affects the plants. On the right, there is a bar indicating the spectrum options. The colors represent different clusters. Clusters that do not share common characters (a and b) are significantly different. The equation above the panels represents the analyzed factors and interaction in the ANOVA.

As shown in Figure 4, the lowest and healthiest bin of PSRI was significantly reduced after two months of growing the plants under white light, while there was little change under the other conditions. Based on the conducted research, it can be stated that a short day, particularly in combination with low temperature, significantly slows down plant development. However, the blue-red spectrum at an intensity of 400-500 μmol/m²/s mitigates this effect.

3.1.4 Example of support for supplementary data for bread wheat, durum wheat, and triticale dataset

Integrating Phenospex (TraitFinder) data with user-generated tables enriches datasets by adding variables like environmental and genetic factors, enhancing analysis robustness and validity, and improving insights into phenotypic traits and research reproducibility. Grouping and subsetting data by genes, treatments, cultivars, and time clusters enables targeted hypothesis testing for selected traits.

This step is illustrated with a subset of our experiment with bread wheat. All the cultivars of bread wheat were screened for their allelic states of the Ppd-1 gene, which is known to regulate inflorescence architecture and paired spikelet development in bread wheat (Boden et al., 2015). A more detailed description of this experiment can be found in section 3.5. After the experiment was completed, the data were uploaded in csv format and supplemented with information about Ppd-1 alleles in each cultivar. Using StatFaRmer, we assessed the effect of the Ppd-1 alleles Ppd-D1a and Ppd-D1b on the digital biomass of bread wheat (Figure 5). Photoperiod-insensitive alleles of Ppd-1 are commonly utilized in breeding to reduce the requirement for long day lengths and promote earlier flowering in the season. The graph indicates that the digital biomass was nearly equal for plants with Ppd-D1a and Ppd-D1b alleles at the beginning of the experiment (first column). However, by the end of the experiment (third column), plants with the Ppd-D1b allele exhibited significantly higher digital biomass, suggesting that some Ppd-1 alleles may enhance biomass gain.

Figure 5. This graph consists of six panels: the top three illustrate how the digital biomass of plants with the Ppd-D1a allele changed over time, while the bottom three show the same process for plants with the Ppd-D1b allele. The three time points are indicated at the top, and colors represent the respective clusters. Clusters sharing a common character (c and cd; cd and de; de and d) may overlap. They are arranged in order of digital biomass value, with plants in cluster “a” having the highest biomass and being the most distinct from those in cluster “e.” The equation above the panels represents the correlation of the analyzed factors in the ANOVA test.

3.2 Examples of phenotypic data visualization with dynamic faceting for sugar beet, corn and sunflower datasets

The tool’s visualization supports interpretation of temporal variation among groups. Groups are labeled with characters derived from Tukey’s test p-values, and groups that share a character are not significantly different. This aids hypothesis formulation and decision-making. Alternatively, users can observe and compare time trends between groups of interest. A color-blind-friendly palette improves accessibility and clarity for all users.
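The rule for reading these letters can be written down in a few lines. The sketch below is in Python and only illustrates how the labels are interpreted; it does not reproduce how StatFaRmer assigns them, and the function and group names are hypothetical.

```python
from itertools import combinations


def not_significantly_different(label_a, label_b):
    """Two groups are read as not significantly different when their
    letter labels share at least one character (e.g. 'cd' and 'de')."""
    return bool(set(label_a) & set(label_b))


def distinct_pairs(labels):
    """List the pairs of groups whose labels share no character."""
    return [
        (a, b)
        for a, b in combinations(sorted(labels), 2)
        if not not_significantly_different(labels[a], labels[b])
    ]


# Hypothetical group names with labels of the kind shown in the figures.
example = {"g1": "a", "g2": "cd", "g3": "de"}
print(distinct_pairs(example))
```

Here `g2` and `g3` share the character "d" and are therefore not reported as distinct, while `g1` differs from both.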

This function is illustrated (Figure 6) with a dataset consisting of plants of 3 varieties of corn (Zea mays) and 3 varieties of sunflower (Helianthus annuus) in 6 repetitions grown under two different conditions (72 plants in total). The sunflower varieties used were Zhemchuzhina, Korona, and CC-4, while the corn varieties included Marmeladka, 147MB, and 975-5. The plants were grown under photoperiods of 10/14 and 22/2.

Figure 6. The comparison of biomass growth of sunflower (A) and corn (B) under two distinct sets of conditions. The graph illustrates that while corn exhibited variability in response to the different growing conditions, sunflowers showed consistent growth across both environments.

As shown in Figure 6, digital biomass is similar across conditions for both sunflower and corn at the initial measurement. At the second time point, sunflowers still show similar digital biomass across all treatments, whereas corn shows a preference for the 10/14 photoperiod. This suggests that photoperiod affects corn more strongly than sunflower. Alternatively, the effects of photoperiod on sunflower may manifest later due to differences in growth rates.

The facet syntax aligns with R formula principles, allowing grouping variables to be placed around the tilde (~) for distinct vertical or horizontal subplots. This flexibility enhances data clarity and interpretability, while using “~.” simplifies visualizations by removing faceting when necessary.
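To make the formula convention concrete, the following Python sketch splits a facet formula into row and column variables. It assumes the ggplot2 `facet_grid` reading, where the left side of the tilde gives rows and the right side gives columns, and it treats "." and "1" as empty placeholders; whether StatFaRmer parses formulas exactly this way is an assumption, and the variable names are hypothetical.

```python
def parse_facet_formula(formula):
    """Split 'rows ~ cols' into (row_variables, column_variables).

    '.' and '1' are treated as placeholders meaning "no variable",
    so '~.' yields two empty lists, i.e. a single unfaceted panel.
    """
    if formula.count("~") != 1:
        raise ValueError("expected exactly one '~' in the facet formula")
    left, right = formula.split("~")

    def variables(side):
        names = [name.strip() for name in side.split("+")]
        return [name for name in names if name not in ("", ".", "1")]

    return variables(left), variables(right)


print(parse_facet_formula("allele ~ timepoint"))
print(parse_facet_formula("species + variety ~ photoperiod"))
print(parse_facet_formula("~."))
```

Several grouping variables on one side are joined with "+", which is how a grid with combined row or column labels would be requested under this reading.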

To illustrate this feature, we used the previously mentioned sugar beet experiment and compared leaf area growth at different vernalization temperatures. As shown in Figure 7, the leaf area of plants vernalized at 5°C remained consistently lower throughout the experiment. This likely indicates that these plants experienced more stress, which suppressed leaf growth. Alternatively, the plants may have been conserving energy to invest in flowering.

Figure 7. (A) This study compares the leaf area growth of sugar beets vernalized at 5°C and 10°C. The results indicate that plants vernalized at 10°C exhibited a larger leaf area compared to those vernalized at 5°C. However, the observed distributions are not normal, raising questions about the underlying factors affecting growth. Additionally, the same data was visualized as a timeline (B) by eliminating the faceting. The deviations from normality observed in (B) can thus be attributed to some plants not initiating growth throughout the experiment. The data is plotted in a single subplot by setting the faceting formula to “~ 1” and using genotype as the grouping factor.

3.3 Statistical evaluation of the differences between groups for significance

3.3.1 Example of descriptive statistics for cocklebur and lettuce datasets

StatFaRmer reports raw tables and tables with descriptive statistics. They provide essential insights on sample size (n), central tendency (median, mean), variability (cv_perc), range (min, max), and distribution shape (skewness, kurtosis), guiding further analysis.
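As a rough guide to what these columns mean, the Python sketch below computes them for one group of measurements. The exact formulas are assumptions: `cv_perc` is taken as the sample standard deviation divided by the mean, times 100, and skewness and kurtosis are taken as moment ratios with divisor n (kurtosis not reduced by 3). StatFaRmer's own definitions may differ.

```python
from math import nan
from statistics import mean, median, stdev


def describe(values):
    """Descriptive statistics for one group of numeric measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    m = mean(values)
    # Central moments with divisor n (assumed convention).
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": m,
        "median": median(values),
        # Coefficient of variation in percent; undefined for a zero mean.
        "cv_perc": 100 * stdev(values) / m if m != 0 else nan,
        "min": min(values),
        "max": max(values),
        # Shape measures are undefined when all values are identical.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else nan,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else nan,
    }


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(name, value)
```

A skewness near zero and a moderate coefficient of variation suggest a roughly symmetric group, which is useful to check before relying on ANOVA.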

This feature is illustrated with an experiment in which 22 plants of cocklebur (Xanthium strumarium) and 22 plants of lettuce (Lactuca sativa) were treated with 4 different herbicides. Plants were grown in pots measuring 16.5 cm × 9.5 cm × 8.5 cm, with two plants per pot, using a universal soil that contains all the necessary macro- and microelements. Soil moisture was maintained at 50% through watering three times a week, and plants were kept indoors at 22°C and 60% humidity with 16 hours of light. Temperature and humidity were regulated using air conditioning units and water containers and were continuously monitored with sensors throughout the experiment.

On day 0, plants were sprayed with water or clopyralid formulations (20.6 mL/m²) in five replicated pots. Clopyralid formulations, Hacker WG and Hacker 300 SL, were obtained from JSC August Inc. A gemini surfactant, 16-6-16, was synthesized from hexadecyl bromide and N,N′-tetramethylhexamethylenediamine. Readers can find a more detailed description of the results in our article dedicated to this experiment (Mirgorodskaya et al., 2023).

Figure 8 demonstrates that lettuce plants were significantly more susceptible to herbicide treatment compared to cocklebur, as indicated by the higher PSRI in lettuce. The descriptive statistics in Table 1 confirm this observation.

Figure 8. Timeline representation of the herbicide experiment with lettuce and cocklebur. Due to over 2,000 observations, the data is presented as density maps. Notably, May 17 exhibits a decline in average PSRI for one cocklebur sample, lasting for several days.

Table 1. Descriptive statistics from the herbicide experiment at three time points.

The increased sensitivity of lettuce indicates a deficiency in protective mechanisms against herbicides, making it more susceptible to chemical stress. In this experiment, lettuce was used as a control plant due to its low resistance to chemical stress, and its higher PSRI indicates that the herbicide is effective.

3.3.2 Example of ANOVA and Tukey’s test for cocklebur

ANOVA’s user-selected factors are automatically supplemented with their two-way interactions, which allows for thorough assessment of variable influences on responses. This aids in identifying complex relationships, but researchers must be cautious of potential overfitting due to increased model complexity. StatFaRmer reports post-ANOVA tables (Table 2) to succinctly present key results.
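The way the model grows with the number of selected factors can be illustrated with a short Python sketch that expands factors into main effects plus all two-way interactions and writes them in R-style formula notation. The function names and the factor names are hypothetical; this shows the expansion only, not StatFaRmer's internal code.

```python
from itertools import combinations


def anova_terms(factors):
    """Main effects followed by every two-way interaction, in R notation."""
    main = list(factors)
    interactions = [f"{a}:{b}" for a, b in combinations(main, 2)]
    return main + interactions


def anova_formula(response, factors):
    """Build an R-style model formula string for the given response."""
    return f"{response} ~ " + " + ".join(anova_terms(factors))


print(anova_formula("ndvi", ["treatment", "day"]))
print(len(anova_terms(["a", "b", "c", "d"])))
```

With k factors the model has k main effects and k(k-1)/2 interactions, so four factors already produce ten terms; this quadratic growth is the source of the overfitting risk noted above.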

Table 2. ANOVA table illustrating the significance of observed effects in cocklebur plants treated with four herbicide compositions across days 1, 2, and 10 of the experiment.

This feature is illustrated by a subset of the same experiment. The Normalized Difference Vegetation Index (NDVI) is a widely used vegetation index for assessing plant health. Figure 9 displays sets of three time points for plants treated with four different herbicide compositions. Cluster “a” represents higher NDVI values, indicating healthier plants before herbicide treatment. Conversely, cluster “e,” which has the lowest NDVI, appears only at the end of the experiment with Treatment 1. This suggests that this treatment is particularly effective at destroying this specific weed.

Figure 9. Comparison of NDVI in cocklebur plants treated with various herbicides. Treatment 1 notably decreased NDVI, showing significant effects the day after application and lasting for a week. The colors represent clusters; the absence of common characters in cluster names indicates a statistically significant difference between clusters. These lowercase letters are assigned automatically from multiple comparisons based on ANOVA/Tukey’s tests of user-defined groups, and they are used consistently across figures.

The “Tukey” feature identifies contrasts among parameter combinations across factors and time points. StatFaRmer reports these tables after the Tukey’s test to present key results, including contrasts, estimates, confidence intervals (conf.low, conf.high), adjusted p-values, and significance (sig).

3.4 Results export

In the StatFaRmer Shiny app, all tables and plots can be downloaded after filters and subsets are applied (via the “Download Full Results” and “Save Plot as SVG” buttons).

3.5 The dataset of high-throughput plant phenotyping of Triticeae

To evaluate the features and performance of StatFaRmer, we studied the growth of various cereal plants under nitrogen starvation and under low and high nitrate concentrations (Figure 10). Grains of different cultivars of bread wheat (Triticum aestivum), durum wheat (Triticum durum), and triticale (× Triticosecale) were placed on Petri dishes containing moist filter paper and incubated at 25°C. After germination, the seedlings were transferred to pots filled with sand and watered regularly with modified Hoagland solutions containing 0, 1 mM, or 10 mM potassium nitrate. The first two solutions were supplemented with potassium chloride to maintain the same molar potassium content as the third solution. The pots were arranged randomly. The modified Hoagland solutions were added 3 times per week in quantities that replaced the weight lost by the pots. Temperature and humidity were monitored throughout the experiment.

Figure 10. Schematic representation of the study examining nitrate’s effect on bread wheat, durum wheat, and triticale growth, which produced the sample dataset for StatFaRmer. The asterisk (*) represents a wildcard character.

Phenotypic observations of all plants were conducted three times daily in two to three replicated measurements using the TraitFinder phenotyping platform (Phenospex, Netherlands). This platform is based on PlantEye, a laser scanner coupled with a multispectral imager. PlantEye generates a 3D point cloud with reflectance values for each point at four wavelengths. The TraitFinder we used was equipped with two PlantEye scanners installed at a distance and at angles chosen to minimize occluded plant areas. The point clouds from the two scanners were combined into one point cloud with better coverage of the plants. From these 3D point clouds, the HortControl software calculated various morphological and spectral parameters of the plants.

The morphological parameters include plant height, leaf area, and digital biomass. Digital biomass is the product of the two previous parameters and, for plants with the same architecture, correlates well with real plant biomass (Vadez et al., 2015; Maphosa et al., 2017; Quijia Pillajo et al., 2024). The spectral parameters include NDVI and its “bins”, that is, the proportion of plant leaf area with NDVI values in a certain range. The set of parameters obtained during the experiment for all plants and time points was exported in archived CSV format, combined with annotation tables, and imported into StatFaRmer as a standard sample dataset.
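The derived parameters named above can be illustrated with a small Python sketch. Digital biomass follows the definition in the text (height times leaf area), and NDVI uses the standard (NIR - Red) / (NIR + Red) formula. The binning function assumes that every point represents the same leaf area, so a fraction of points stands in for a fraction of area; that assumption, the bin edges, and all input values are hypothetical and do not describe how HortControl computes its outputs.

```python
def digital_biomass(height, leaf_area):
    """Digital biomass as the product of plant height and leaf area."""
    return height * leaf_area


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from two reflectances."""
    if nir + red == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / (nir + red)


def ndvi_bins(ndvi_values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1]).

    The last bin also includes its upper edge. Values outside the
    edges are counted in no bin.
    """
    if not ndvi_values:
        raise ValueError("no NDVI values given")
    counts = [0] * (len(edges) - 1)
    for value in ndvi_values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    return [count / len(ndvi_values) for count in counts]


print(digital_biomass(100.0, 250.0))
print(ndvi(0.8, 0.2))
print(ndvi_bins([0.1, 0.3, 0.5, 0.7], [0.0, 0.4, 0.8]))
```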

3.6 StatFaRmer performance evaluation

The data processing speed of StatFaRmer was tested on a laptop with an AMD Ryzen 5 5500U with Radeon Graphics (2.10 GHz) and 8 GB of RAM. With the bread wheat, durum wheat, and triticale study dataset (an 18 MB .csv file containing 58,380 rows and 49 columns), the initial launch of the program takes 30 seconds. Subsequent launches take only 4 seconds, since the .rds objects used by the Shiny app are created when the application is first launched.

ANOVA and violin plots of a phenotypic parameter at selected time points are produced almost instantly when the numbers of treatment and timestamp levels are below ten. Plotting time series trends for large datasets, such as the bread wheat, durum wheat, and triticale study dataset (almost 20,000 measurements to display after averaging within DBSCAN clusters), was sped up by drawing density maps instead of point geometric objects whenever more than 2,000 measurements are plotted.
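The switch from points to a density map can be sketched as follows. This Python example keeps the 2,000-measurement threshold from the text but otherwise rests on assumptions: the grid size, the function name, and the simple rectangular binning are hypothetical and stand in for whatever density geometry the plotting library actually draws.

```python
from collections import Counter

POINT_LIMIT = 2000  # threshold taken from the text


def plot_layer(points, n_bins=50):
    """Return ('points', points) for small inputs; otherwise return
    ('density', counts), where counts maps grid cells to point counts."""
    if len(points) <= POINT_LIMIT:
        return "points", list(points)

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def bin_index(value, lo, hi):
        # Map a coordinate to a cell index in [0, n_bins - 1].
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)

    counts = Counter(
        (bin_index(x, x_lo, x_hi), bin_index(y, y_lo, y_hi)) for x, y in points
    )
    return "density", counts


kind, layer = plot_layer([(i % 100, i // 100) for i in range(5000)])
print(kind, len(layer))
```

The number of drawn objects is then bounded by the grid size (at most 2,500 cells here) rather than by the number of measurements, which is why rendering time stops growing with dataset size.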

In this series of studies, StatFaRmer has become essential to evaluate the outcomes obtained from digital time-series phenotyping due to its flexibility and a wide range of customizable parameters for analysis. During our work with this tool, we were able to explore a diverse range of plant cultivars and identify the factors that influence the condition of specific plants. For instance, in our latest experiment, we grew plants of various varieties, and because of this tool, it was possible to not only compare the growth patterns between different varieties but also assess the impact of various alleles by grouping the varieties based on these allelic variants.

The resulting tool can be accessed on GitHub (https://github.com/Stathmin/StatFaRmer), together with installation instructions and the sample dataset.

4 Discussion

The evaluation of StatFaRmer underscores its efficiency in processing data for digital time-series phenotyping, with an initial launch time of 30 seconds, followed by just 4 seconds for subsequent access. The tool demonstrates strong capabilities in conducting ANOVA and generating violin plots for both simple and complex datasets, making it valuable for a variety of applications. It also works reliably with datasets from vastly different crops, including bread wheat, durum wheat, and triticale; sugar beet; cocklebur and lettuce; corn and sunflower; and soybean. However, to maximize its potential, future research should prioritize enhancing its usability and exploring integration with other tools, thereby strengthening its role in plant phenotyping workflows.

Three-dimensional visualization techniques can be classified into active and passive categories. Active techniques utilize a controlled source of structured energy emission, such as a scanning laser or projected light pattern, in conjunction with a detector, such as a camera, to generate an image. Passive techniques rely on ambient lighting to form an image (Harandi et al., 2023). Point sets can contain noise originating from various sources, whether the point cloud was actively or passively generated. Generated point clouds often suffer from limited sensor accuracy and measurement errors caused by environmental factors. Therefore, it is crucial to promptly identify and eliminate these outliers to prevent their impact on the accuracy of the results. In addition to environmental factors, technical errors caused by human interference can also lead to inaccuracies in generated point clouds. Common errors may also include improperly calibrated equipment, misaligned sensors, or incorrect parameter settings during the data acquisition process. Furthermore, other plausible reasons for errors in point clouds could be occlusions, reflections, and varying surface properties of objects being scanned.

The results demonstrate that our framework is robust and suitable for diagnostics throughout the experiment, requiring no formal statistical knowledge or advanced tool expertise.

We are also exploring ways to make our tool more general and less reliant on TraitFinder. Aside from the naming conventions in the experiment.csv file, we have largely achieved this goal. An alternative to our current approach with data acquisition would be to utilize measurements from deep learning models. Recently, there has been a significant increase in the use of deep learning models for analyzing phenotypic data. This trend has become increasingly important for the advancement of plant phenotyping research, as the available phenotyping platforms can be broadly categorized into two types: previously discussed commercially available solutions that offer, among other features, data processing tools, which are often proprietary; and more affordable options based on RGB or multispectral cameras and LiDARs, typically implemented on self-built platforms for indoor or outdoor use, or on unmanned aerial vehicles (UAVs) (Gano et al., 2021; Gano et al., 2024). In this latter case, the processing tools must be developed independently, and deep learning models represent the most flexible option for this purpose. We plan to use StatFaRmer in tandem with open-source deep learning point cloud processing tools to reduce reliance on proprietary instruments and extract new biological features from existing point clouds. This process could benefit from a “sanity check” through parallel processing with previously explored features.

As previously mentioned, deep learning models represent a versatile option for the open-source analysis of phenotypic data. For example, a recent study developed a high-throughput phenotyping method that uses RGB and infrared time-series data obtained from unmanned aerial vehicles (UAVs) and a multimodal image segmentation model to monitor and quantitatively assess the growth of soybean canopy (Yu et al., 2024). The study found that RIFSeg-Net, a novel multimodal image segmentation model, outperformed traditional deep learning-based image segmentation networks in accurately extracting canopy cover from UAV images. The study demonstrates the potential of high-throughput phenotyping to rapidly identify crop germplasm with favorable traits such as high yield, disease resistance, and improved quality. This method can assist breeders in developing novel varieties with increased productivity and resilience, thereby enhancing crop quality and yield simultaneously.

Another study proposes a method for automatically acquiring detailed traits of rice panicles from time-series images, using the YOLO v5 and ResNet50 models together with the DeepSORT algorithm, to analyze the effect of nitrogen on panicle development during the heading and flowering stages (Zhou et al., 2023). The proposed approach achieved high accuracy in counting panicles (R² = 0.96, root mean square error (RMSE) = 1.73) and in estimating the heading date (absolute error of 0.25 days). The study revealed that higher nitrogen application leads to earlier initiation and longer duration of flowering, and to a longer total duration from the beginning of vigorous flowering to the end of the process. This technique offers agricultural experts a novel approach to analysis, and knowledge of how nitrogen affects rice heading and blooming may help growers avoid extreme weather conditions and achieve sustainable and stable food production.

Another research paper explores the use of Terrestrial Laser Scanning (TLS) to study seasonal and circadian rhythms in plants and leaves under standard and cold stress conditions (Jin et al., 2021). The methods used in the research paper involved the collection of LiDAR data along with environmental data such as photosynthetically active radiation (PAR), temperature, and relative humidity throughout the growing season. Seasonal rhythms in structural traits like azimuth and Plant Leaf Area Index (PLA) were consistent between plant and leaf levels, while leaf-level rhythms were more diverse, such as changes in leaf inclination angle. Circadian rhythms of certain traits were found to be opposite under cold stress and standard conditions, with environmental factors showing stronger correlations with leaf trait rhythms under cold stress, especially air temperature. The study highlights the potential of using time-series TLS to study crop chronobiology in outdoor environments, aiding in understanding plant rhythms and survival strategies in response to environmental changes.

Many advanced deep learning data analysis methods under development could greatly benefit from a reliable and transparent validation tool, enabling comprehensive evaluation of outcomes through straightforward, interpretable metrics. Conversely, StatFaRmer would enhance its effectiveness by collaborating with emerging platforms designed for measuring traits critical to breeding. This integration would help alleviate phenotypic measurements as a bottleneck in Genome-Wide Association Studies (GWAS), streamlining the research process and improving the accuracy of trait assessments.

In time series analysis, Han et al. (2019) explored dynamic patterns in data through fuzzy clustering analysis. If implemented in future versions of StatFaRmer, this approach would allow detection of new phenotypic traits hidden in temporal profiles.

Moreover, the limitations of traditional ANOVA in time series analysis have motivated researchers to explore more sophisticated approaches. Spyroglou et al. (2021) introduced a methodology that integrates generalized linear mixed models with classical time series models, modernizing the analysis of longitudinal datasets in plant sciences.

Furthermore, crop-specific point cloud segmentation tools have advanced significantly, as exemplified by the work of Li et al. (2023). With these state-of-the-art tools, researchers can extract valuable traits with high accuracy, which underscores the need to modularize StatFaRmer so that it is accessible beyond one specific platform.

These articles inspire us by clearly indicating pathways for the future improvement of StatFaRmer.

5 Conclusion

StatFaRmer is an open-source tool created as a Shiny dashboard that is useful for the analysis of time series datasets in CSV format with capabilities of outlier filtration, grouping based on multiple parameters simultaneously, and more advanced statistical methods for assessing the significance of effects. It can be easily copied and used through a web interface by any number of users.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Author contributions

DU: Data curation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. AU: Data curation, Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing. DL: Investigation, Resources, Supervision, Validation, Writing – review & editing. AAK: Data curation, Investigation, Resources, Writing – review & editing. NS: Data curation, Formal analysis, Investigation, Writing – review & editing. MD: Conceptualization, Funding acquisition, Project administration, Supervision, Writing – review & editing. GK: Project administration, Supervision, Writing – review & editing. AYK: Data curation, Methodology, Resources, Writing – review & editing. VV: Data curation, Software, Writing – review & editing. AV: Data curation, Software, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research received funding from two key sources. The State Assignment FGUM-2025-0010 played a vital role in the development of the StatFaRmer tool and automated digital phenotyping methods tailored for a range of plant species, including soybean, lettuce, and maize. This work specifically highlighted genetic and treatment differences. Simultaneously, the Russian Science Foundation Grant No. 24-16-00274 facilitated extensive cereal experiments, detailed in the article. This funding supported rigorous statistical analysis, improved data processing, and effective visualization techniques.

Acknowledgments

We acknowledge the use of ChatGPT-3.5 to enhance the grammar and style of this article. The authors initially produced a draft, which underwent a four-step prompting session involving specific queries to refine the text and followed by fact-checking. The queries were the following: 1. Rewrite the given text in the context of a scientific article, ensuring clarity and conciseness. *Initial draft* 2. Suggest improvements to enhance precision and clarity in the text for scientific readership. 3. Critique the revised statement as a reviewer, addressing any weaknesses or areas for further improvement. 4. Revise the text again, incorporating the reviewer’s feedback and making it even more concise.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Al-Tamimi, N., Langan, P., Bernád, V., Walsh, J., Mangina, E., Negrão, S. (2022). Capturing crop adaptation to abiotic stress using image-based technologies. Open Biol. 12. doi: 10.1098/rsob.210353

Araus, J. L., Kefauver, S. C., Vergara-Díaz, O., Gracia-Romero, A., Rezzouk, F. Z., Segarra, J., et al. (2022). Crop phenotyping in a context of global change: What to measure and how to do it. J. Integr. Plant Biol. 64, 592–618. doi: 10.1111/jipb.13191

Atkins, A., Allen, T., Wickham, H., McPherson, J., Allaire, J. J. (2024). rsconnect: Deploy Docs, Apps, and APIs to “Posit Connect”, “shinyapps.io”, and “RPubs”. Available online at: https://CRAN.R-project.org/package=rsconnect (Accessed October 16, 2025).

Australian Plant Phenomics Network Our Infrastructure. (2024). Available online at: https://www.plantphenomics.org.au/plant-phenomics/our-infrastructure (Accessed October 16, 2024).

HyperAIxpert Family - LemnaTec. (2024). Available online at: https://www.lemnatec.com/hyperaixpert/ (Accessed October 16, 2024).

TraitFinder digital phenotyping workstation on wheels for lab- and greenhouse phenotyping automation - PHENOSPEX. (2024). Available online at: https://www.phenospex.com/products/plant-phenotyping/traitfinder-for-lab-and-greenhouse-phenotyping-automation/ (Accessed October 16, 2024).

Bache, S. M., Wickham, H. (2022). magrittr: A Forward-Pipe Operator for R. Available online at: https://CRAN.R-project.org/package=magrittr (Accessed October 16, 2025).

Boden, S. A., Cavanagh, C., Cullis, B. R., Ramm, K., Greenwood, J., Jean Finnegan, E., et al. (2015). Ppd-1 is a key regulator of inflorescence architecture and paired spikelet development in wheat. Nat. Plants. 1, 14016. doi: 10.1038/nplants.2014.16

Chang, W., Cheng, J., Allaire, J. J., Sievert, C., Schloerke, B., Xie, Y., et al. (2024). shiny: Web Application Framework for R. Available online at: https://CRAN.R-project.org/package=shiny (Accessed October 16, 2025).

Daróczi, G. (2024). logger: A Lightweight, Modern and Flexible Logging Utility. Available online at: https://CRAN.R-project.org/package=logger (Accessed October 16, 2025).

Demidchik, V. V., Shashko, A. Y., Bandarenka, U. Y., Smolikova, G. N., Przhevalskaya, D. A., Charnysh, M. A., et al. (2020). Plant phenomics: fundamental bases, software and hardware platforms, and machine learning. Russian J. Plant Physiol. 67, 397–412. doi: 10.1134/S1021443720030061

Firke, S. (2023). janitor: Simple Tools for Examining and Cleaning Dirty Data. Available online at: https://CRAN.R-project.org/package=janitor (Accessed October 16, 2025).

Gagolewski, M. (2022). stringi: Fast and portable character string processing in R. J. Stat. Software 103, 1–59. doi: 10.18637/jss.v103.i02

Gano, B., Bhadra, S., Vilbig, J. M., Ahmed, N., Sagan, V., Shakoor, N. (2024). Drone-based imaging sensors, techniques, and applications in plant phenotyping for crop breeding: A comprehensive review. Plant Phenome J. 7, e20100. doi: 10.1002/ppj2.20100

Gano, B., Dembele, J. S. B., Ndour, A., Luquet, D., Beurier, G., Diouf, D., et al. (2021). Using UAV borne, multi-spectral imaging for the field phenotyping of shoot biomass, leaf area index and height of west african sorghum varieties under two contrasted water conditions. Agronomy 11. doi: 10.3390/agronomy11050850

Garnier, S., Ross, N., Rudis, R., et al. (2024). viridis(Lite) - Colorblind-Friendly Color Maps for R. Available online at: https://sjmgarnier.github.io/viridis/ (Accessed October 16, 2025).

Gohel, D., Skintzos, P. (2024). flextable: Functions for Tabular Reporting. Available online at: https://CRAN.R-project.org/package=flextable (Accessed October 16, 2025).

Graves, S., Piepho, H. P., Selzer, L., with help from Dorai-Raj, S. (2024). multcompView: Visualizations of Paired Comparisons. Available online at: https://CRAN.R-project.org/package=multcompView (Accessed October 16, 2025).

Grolemund, G., Wickham, H. (2011). Dates and times made easy with lubridate. J. Stat. Software 40, 1–25. doi: 10.18637/jss.v040.i03

Hahsler, M., Piekenbrock, M. (2024). dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms. Available online at: https://CRAN.R-project.org/package=dbscan (Accessed October 16, 2025).

Hahsler, M., Piekenbrock, M., Doran, D. (2019). dbscan: fast density-based clustering with R. J. Stat. Software 91, 1–30. doi: 10.18637/jss.v091.i01

Han, L., Yang, G., Dai, H., Yang, H., Xu, B., Feng, H., et al. (2019). Fuzzy clustering of maize plant-height patterns using time series of UAV remote-sensing images and variety traits. Front. Plant Sci. 10. doi: 10.3389/fpls.2019.00926

Harandi, N., Vandenberghe, B., Vankerschaver, J., Depuydt, S., Van Messem, A. (2023). How to make sense of 3D representations for plant phenotyping: a compendium of processing and analysis techniques. Plant Methods 19, 60. doi: 10.1186/s13007-023-01031-z

Hartmann, A., Czauderna, T., Hoffmann, R., Stein, N., Schreiber, F. (2011). HTPheno: An image analysis pipeline for high-throughput plant phenotyping. BMC Bioinf. 12, 148. doi: 10.1186/1471-2105-12-148

Hein, N. T., Ciampitti, I. A., Jagadish, S. V. K. (2021). Bottlenecks and opportunities in field-based high-throughput phenotyping for heat and drought stress. J. Exp. Botany. 72, 5102–5116. doi: 10.1093/jxb/erab021

Henry, L., Wickham, H. (2024). rlang: Functions for Base Types and Core R and “Tidyverse” Features. Available online at: https://CRAN.R-project.org/package=rlang (Accessed October 16, 2025).

Hester, J., Bryan, J. (2024). glue: Interpreted String Literals. Available online at: https://CRAN.R-project.org/package=glue (Accessed October 16, 2025).

Islam, N. U., Wani, S. H., Ali, G., Dar, Z. A., Wani, A., Lone, A. (2021). High-throughput phenotyping for abiotic stress resilience in cereals. J. Cereal Res. 13. doi: 10.25174/2582-2675/2021/111256

ISO (2017). ISO 8601: Date and time format. Available online at: https://www.iso.org/iso-8601-date-and-time-format.html (Accessed October 16, 2025).

Javornik, T., Carović-Stanko, K., Gunjača, J., Vidak, M., Lazarević, B. (2023). Monitoring drought stress in common bean using chlorophyll fluorescence and multispectral imaging. Plants (Basel). 12, 1386. doi: 10.3390/plants12061386

Jin, S., Su, Y., Zhang, Y., Song, S., Li, Q., Liu, Z., et al. (2021). Exploring seasonal and circadian rhythms in structural traits of field maize from LiDAR time series. Plant Phenomics. 2021. doi: 10.34133/2021/9895241

Joshi, S., Thoday-Kennedy, E., Daetwyler, H. D., Hayden, M., Spangenberg, G., Kant, S. (2021). High-throughput phenotyping to dissect genotypic differences in safflower for drought tolerance. PloS One 16, e0254908. doi: 10.1371/journal.pone.0254908

Kim, S. L., Kim, N., Lee, H., Lee, E., Cheon, K. S., Kim, M., et al. (2020). High-throughput phenotyping platform for analyzing drought tolerance in rice. Planta 252, 38. doi: 10.1007/s00425-020-03436-9

Kim, M., Lee, C., Hong, S., Kim, S. L., Baek, J. H., Kim, K. H. (2021). High-throughput phenotyping methods for breeding drought-tolerant crops. Int. J. Mol. Sci. 22, 8266. doi: 10.3390/ijms22158266

Kjaer, K. H., Ottosen, C. O. (2015). 3D laser triangulation for plant phenotyping in challenging environments. Sensors 15, 13533–13547. doi: 10.3390/s150613533

Komsta, L., Novomestky, F. (2022). moments: Moments, Cumulants, Skewness, Kurtosis and Related Tests. Available online at: https://CRAN.R-project.org/package=moments (Accessed October 16, 2025).

Lang, M. (2017). checkmate: Fast argument checks for defensive R programming. R J. 9, 437–445. doi: 10.32614/RJ-2017-028

Lazarević, B., Šatović, Z., Nimac, A., Vidak, M., Gunjača, J., Politeo, O., et al. (2021). Application of phenotyping methods in detection of drought and salinity stress in basil (Ocimum basilicum L.). Front. Plant Sci. 12. doi: 10.3389/fpls.2021.629441

Leiva, F., Vallenback, P., Ekblad, T., Johansson, E., Chawade, A. (2021). Phenocave: an automated, standalone, and affordable phenotyping system for controlled growth conditions. Plants 10. doi: 10.3390/plants10091817

Li, C., Li, Y., Chu, P., Hao-hao, Z., Wei, Z., Cheng, Y., et al. (2022). Effects of salt stress on sucrose metabolism and growth in Chinese rose (Rosa chinensis). Biotechnol. Biotechnol. Equipment. 36, 706–716. doi: 10.1080/13102818.2022.2116356

Li, D., Quan, C., Song, Z., Li, X., Yu, G., Li, C., et al. (2021). High-throughput plant phenotyping platform (HT3P) as a novel tool for estimating agronomic traits from the lab to the field. Front. Bioengineering Biotechnol. 8. doi: 10.3389/fbioe.2020.623705

Li, Y., Wen, W., Fan, J., Gou, W., Gu, S., Lu, X., et al. (2023). Multi-source data fusion improves time-series phenotype accuracy in maize under a field high-throughput phenotyping platform. Plant Phenomics. 5, 0043. doi: 10.34133/plantphenomics.0043

Lin, L., Xu, C. (2020). Arcsine-based transformations for meta-analysis of proportions: Pros, cons, and alternatives. Health Sci. Rep. 3, e178. doi: 10.1002/hsr2.178

Lobet, G. (2017). Image analysis in plant sciences: publish then perish. Trends Plant Science. 22, 559–566. doi: 10.1016/j.tplants.2017.05.002

Maphosa, L., Thoday-Kennedy, E., Vakani, J., Phelan, A., Badenhorst, P., Slater, A., et al. (2017). Phenotyping wheat under salt stress conditions using a 3D laser scanner. Israel J. Plant Sci. 64, 55–62. doi: 10.1080/07929978.2016.1243405

Merzlyak, M. N., Gitelson, A. A., Chivkunova, O. B., Rakitin, V. Yu. (1999). Non-destructive optical detection of pigment changes during leaf senescence and fruit ripening. Physiologia Plantarum. 106, 135–141. doi: 10.1034/j.1399-3054.1999.106119.x

Minervini, M., Giuffrida, M. V., Perata, P., Tsaftaris, S. A. (2017). Phenotiki: an open software and hardware platform for affordable and easy image-based phenotyping of rosette-shaped plants. Plant J. 90, 204–216. doi: 10.1111/tpj.13472

Mirgorodskaya, A. B., Kushnazarova, R. A., Zakharova, L. Y., Ulyanova, A. A., Litvinov, D. Y., Blinkov, A. O., et al. (2023). Enhanced herbicidal action of clopyralid in the form of a supramolecular complex with a Gemini surfactant. Agron. (Basel). 13, 973. doi: 10.3390/agronomy13040973

Müller, K., Wickham, H. (2023). tibble: Simple Data Frames. Available online at: https://CRAN.R-project.org/package=tibble (Accessed October 16, 2025).

Nyonje, W. A., Schafleitner, R., Abukutsa-Onyango, M., Yang, R. Y., Makokha, A., Owino, W. (2021). Precision phenotyping and association between morphological traits and nutritional content in Vegetable Amaranth (Amaranthus spp.). J. Agric. Food Res. 5, 100165. doi: 10.1016/j.jafr.2021.100165

Ooms, J. (2024). writexl: Export Data Frames to Excel “xlsx” Format. Available online at: https://CRAN.R-project.org/package=writexl (Accessed October 16, 2025).

Parmley, K., Nagasubramanian, K., Sarkar, S., Ganapathysubramanian, B., Singh, A. K. (2019). Development of optimized phenomic predictors for efficient plant breeding decisions using phenomic-assisted selection in soybean. Plant Phenomics. 2019. doi: 10.34133/2019/5809404

Perrier, V., Meyer, F., Granjon, D. (2024). shinyWidgets: Custom Inputs Widgets for Shiny. Available online at: https://CRAN.R-project.org/package=shinyWidgets (Accessed October 16, 2025).

PHENOSPEX HortControl – Plant Data Management Software. (2024). Available online at: https://phenospex.com/products/plant-phenotyping/science-hortcontrol-data-management-software/ (Accessed October 16, 2024).

Pieruschka, R., Schurr, U. (2019). Plant phenotyping: past, present, and future. Plant Phenomics. 2019, 7507131. doi: 10.34133/2019/7507131

Quijia Pillajo, J., Chapin, L. J., Martins, E. M., Jones, M. L. (2024). A biostimulant containing humic and fulvic acids promotes growth and health of tomato ‘Bush beefsteak’ Plants. Horticulturae 10. doi: 10.3390/horticulturae10070671

Rahaman, M., Chen, D., Gillani, Z., Klukas, C., Chen, M. (2015). Advanced phenotyping and phenotype data analysis for the study of plant growth and development. Front. Plant Sci. 6. doi: 10.3389/fpls.2015.00619

R Core Team (2024). R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing). Available at: https://www.R-project.org/.

Robinson, D., Hayes, A., Couch, S. (2024). broom: Convert Statistical Objects into Tidy Tibbles. Available online at: https://CRAN.R-project.org/package=broom (Accessed October 16, 2025).

Schmidt, L., Jacobs, J., Schmutzer, T., Alqudah, A. M., Sannemann, W., Pillen, K., et al. (2023). Identifying genomic regions determining shoot and root traits related to nitrogen uptake efficiency in a multiparent advanced generation intercross (MAGIC) winter wheat population in a high-throughput phenotyping facility. Plant Science. 330, 111656. doi: 10.1016/j.plantsci.2023.111656

Sievert, C., Cheng, J., Aden-Buie, G. (2024). bslib: Custom “Bootstrap” “Sass” Themes for “shiny” and “rmarkdown”. Available online at: https://CRAN.R-project.org/package=bslib (Accessed October 16, 2025).

Spyroglou, I., Skalák, J., Balakhonova, V., Benedikty, Z., Rigas, A. G., Hejátko, J. (2021). Mixed models as a tool for comparing groups of time series in plant sciences. Plants 10. doi: 10.3390/plants10020362

Tripodi, P., Vincenzo, C., Venezia, A., Cocozza, A., Pane, C. (2024). Precision Phenotyping of Wild Rocket (Diplotaxis tenuifolia) to Determine Morpho-Physiological Responses under Increasing Drought Stress Levels Using the PlantEye Multispectral 3D System. Horticulturae 10. doi: 10.3390/horticulturae10050496

Ushey, K. (2023). renv: Project Environments. Available online at: https://CRAN.R-project.org/package=renv (Accessed October 16, 2025).

Vadez, V., Kholová, J., Hummel, G., Zhokhavets, U., Gupta, S. K., Hash, C. T. (2015). LeasyScan: a novel concept combining 3D imaging and lysimetry for high-throughput phenotyping of traits controlling plant water budget. J. Exp. Botany. 66, 5581–5593. doi: 10.1093/jxb/erv251

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (New York: Springer-Verlag). Available at: https://ggplot2.tidyverse.org.

Wickham, H. (2023a). forcats: Tools for Working with Categorical Variables (Factors). Available online at: https://CRAN.R-project.org/package=forcats (Accessed October 16, 2025).

Wickham, H. (2023b). stringr: Simple, Consistent Wrappers for Common String Operations. Available online at: https://CRAN.R-project.org/package=stringr (Accessed October 16, 2025).

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., et al. (2019). Welcome to the tidyverse. J. Open Source Software 4, 1686. doi: 10.21105/joss.01686

Wickham, H., Bryan, J. (2023). readxl: Read Excel Files. Available online at: https://CRAN.R-project.org/package=readxl (Accessed October 16, 2025).

Wickham, H., François, R., Henry, L., Müller, K., Vaughan, D. (2023a). dplyr: A Grammar of Data Manipulation. Available online at: https://CRAN.R-project.org/package=dplyr (Accessed October 16, 2025).

Wickham, H., Henry, L. (2023). purrr: Functional Programming Tools. Available online at: https://CRAN.R-project.org/package=purrr (Accessed October 16, 2025).

Wickham, H., Henry, L., Pedersen, T. L., Luciani, T. J., Decorde, M., Lise, V. (2023b). svglite: An “SVG” Graphics Device. Available online at: https://CRAN.R-project.org/package=svglite (Accessed October 16, 2025).

Wickham, H., Hester, J., Bryan, J. (2024a). readr: Read Rectangular Text Data. Available online at: https://CRAN.R-project.org/package=readr (Accessed October 16, 2025).

Wickham, H., Vaughan, D., Girlich, M. (2024b). tidyr: Tidy Messy Data. Available online at: https://CRAN.R-project.org/package=tidyr (Accessed October 16, 2025).

Witthoft, C. (2023). vecsets: Like Set Tools in “Base” Package but Keeps Duplicate Elements. Available online at: https://CRAN.R-project.org/package=vecsets (Accessed October 16, 2025).

Xie, Y., Cheng, J., Tan, X. (2024). DT: A Wrapper of the JavaScript Library “DataTables”. Available online at: https://CRAN.R-project.org/package=DT (Accessed October 16, 2025).

Yang, W., Feng, H., Zhang, X., Zhang, J., Doonan, J., Batchelor, W., et al. (2020). Crop phenomics and high-throughput phenotyping: Past decades, current challenges and future perspectives. Mol. Plant 13 (2), 187–214. doi: 10.1016/j.molp.2020.01.008

Yoosefzadeh Najafabadi, M., Heidari, A., Rajcan, I. (2023). AllInOne Pre-processing: A comprehensive preprocessing framework in plant field phenotyping. SoftwareX 23, 101464. doi: 10.1016/j.softx.2023.101464

Yu, H., Weng, L., Wu, S., He, J., Yuan, Y., Wang, J., et al. (2024). Time-series field phenotyping of soybean growth analysis by combining multimodal deep learning and dynamic modeling. Plant Phenomics. 6, 0158. doi: 10.34133/plantphenomics.0158

Zhou, Q., Guo, W., Chen, N., Wang, Z., Li, G., Ding, Y., et al. (2023). Analyzing nitrogen effects on rice panicle development by panicle detection and time-series tracking. Plant Phenomics. 5, 0048. doi: 10.34133/plantphenomics.0048

Keywords: high-throughput plant phenotyping, phenotypic data visualization, time series analysis, digital phenotyping platforms, genotype-phenotype analysis, statistical analysis of phenotypic data, open-source software, automated data analysis

Citation: Ulyanov DS, Ulyanova AA, Litvinov DY, Kocheshkova AA, Kroupina AY, Syedina NM, Voronezhskaya VS, Vasilyev AV, Karlov GI and Divashuk MG (2025) StatFaRmer: cultivating insights with an advanced R shiny dashboard for digital phenotyping data analysis. Front. Plant Sci. 16:1475057. doi: 10.3389/fpls.2025.1475057

Received: 02 August 2024; Accepted: 30 January 2025;
Published: 13 March 2025.

Edited by:

Muhammad Fazal Ijaz, Melbourne Institute of Technology, Australia

Reviewed by:

Dmitry Afonnikov, Russian Academy of Sciences (RAS), Russia
Boubacar Gano, Donald Danforth Plant Science Center, United States

Copyright © 2025 Ulyanov, Ulyanova, Litvinov, Kocheshkova, Kroupina, Syedina, Voronezhskaya, Vasilyev, Karlov and Divashuk. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Daniil S. Ulyanov

These authors have contributed equally to this work and share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
