High Throughput Screening Technologies in Biomass Characterization

Decker, Stephen R.; Harman-Ware, Anne E.; Happs, Renee M.; Wolfrum, Edward J.; Tuskan, Gerald A.; Kainer, David; Oguntimein, Gbekeloluwa B.; Rodriguez, Miguel; Weighill, Deborah; Jones, Piet; Jacobson, Daniel

doi:10.3389/fenrg.2018.00120

REVIEW article

Front. Energy Res., 27 November 2018

Sec. Bioenergy and Biofuels

Volume 6 - 2018 | https://doi.org/10.3389/fenrg.2018.00120

This article is part of the Research Topic Advancements in Biomass Feedstock Preprocessing: Conversion Ready Feedstocks

Stephen R. Decker¹*

Gerald A. Tuskan²

Gbekeloluwa B. Oguntimein³

Miguel Rodriguez²

Deborah Weighill²,⁴

Piet Jones²,⁴

Daniel Jacobson²,⁴

¹National Renewable Energy Laboratory, Biosciences Center, Golden, CO, United States
²Oak Ridge National Laboratory, Biosciences Division, Oak Ridge, TN, United States
³Department of Civil Engineering, Morgan State University, Baltimore, MD, United States
⁴The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, TN, United States

Biomass analysis is a slow and tedious process, and not solely because of the long generation time of most plant species. Screening large numbers of plant variants for various geno-, pheno-, and chemo-types, whether naturally occurring or engineered in the lab, poses multiple challenges. Plant cell walls are complex, heterogeneous networks that are difficult to deconstruct and analyze. Macroheterogeneity from tissue types, age, and environmental factors makes representative sampling a challenge, and natural variability generates a significant range in data. Using high throughput (HTP) methodologies allows large sample sets and replicates to be examined, narrowing in on more precise data for various analyses. This review provides a comprehensive survey of high throughput screening as applied to biomass characterization, from compositional analysis of cell walls by NIR, NMR, mass spectrometry, and wet chemistry to functional screening of changes in recalcitrance via HTP thermochemical pretreatment coupled to enzyme hydrolysis and microscale fermentation. Most high throughput methods have been developed and advanced through the use of state-of-the-art equipment and robotics, rapid detection methods, and reductions in sample size and preparation procedures. The computational analysis of the large amount of data generated by high throughput analytical techniques has recently become more sophisticated, faster, and economically viable, enabling a more comprehensive understanding of biomass genomics, structure, composition, and properties. Therefore, methodology for analyzing large datasets generated by the various analytical techniques is also covered.

Introduction

Crop breeding for improved traits has been in constant development since mankind first intentionally put seeds in the ground. Selection of better crops has been, until recently, a long-term strategy, with simple selection criteria of higher yield or increased tolerance to various stresses. With the advent of molecular genetics, plant breeding has accelerated significantly, with specific genes and entire metabolic pathways being added or deleted in a few days; however, the results are not always selectable with simple phenotypic criteria. Plants still require weeks, months, or years to fully mature and exhibit stable traits, and phenotypes such as reduced recalcitrance, altered polysaccharide and lignin content, and other changes in cell wall composition are not readily discernible. Technologies to rapidly evaluate engineered plants for traits that are not phenotypically evident have not kept pace with the generation of these mutations, nor has the analytical methodology needed to catalog, interpret, compare, and model the vast amount of genetic information being uncovered gene by gene.

As plant engineering methodologies have evolved to generate more variants in shorter time, screening these expanded sample sets has ranged from cumbersome to nearly impossible, depending on the analyses and turnaround required. Simple changes to phenotype such as biomass yield can be readily evaluated by increasing the area planted to accommodate more variants and replicates and measuring height, width, mass, etc. by hand. Given enough researchers (most probably students), several thousand plants can be evaluated in this manner. However, understanding the more intricate specifics of compositional analysis, conversion potential, and chemical changes to components such as lignin requires much more detailed analyses.

Over the past decade or so, multiple research groups have developed methods to address some of these complex measurements. In this review, we cover advances in measuring changes in plant cell wall chemistry and recalcitrance, developments in rapid sugar and fatty acid analysis, advances in spectroscopy applications and instrumentation to cell wall composition determination, and use of data analysis and systems biology modeling to develop understandings from the acquired data.

Table 1 is a summary of HTP methods used in HTP analysis of biomass. Details for each method can be found in sections High Throughput Plant Cell Wall Compositional Analysis, HTP Recalcitrance Screening, NMR for Biomass Sugar and Fatty Acid Composition and in the associated references. Data analysis and systems biology modeling is handled separately in section Data Analysis and Systems Biology Modeling for High Throughput Biomass Analysis.

Table 1. Summary of HTP methods in biomass conversion.

High Throughput Plant Cell Wall Compositional Analysis

Compositional analysis of biomass is a complex but necessary and important analytical technique. The ratio, composition, and content of the three major cell wall components (cellulose, hemicellulose, and lignin) have a direct impact on the technoeconomics of biomass conversion to products, and knowing the potential yields is the best way to standardize the cost analysis across different feedstocks. Several methods have been developed to deconvolute the complexity of the plant cell wall down to the subunit level. Such methods are laborious, slow, and employ a variety of harsh reagents requiring some degree of remediation (Elliston et al., 2015). Destructive methods, such as 2-stage acid hydrolysis and pyrolysis-Molecular Beam Mass Spectrometry (py-MBMS), rely on breaking apart the various polymers and measuring the content of the resultant subunits, which can then be mathematically reassembled to estimate the content of each polymer. The use of simpler and more rapid spectroscopic methods has proved invaluable in analyzing biomass (Elliston et al., 2015). Non-destructive spectroscopic methods such as Near InfraRed (NIR) spectroscopy rely on specific bond detection and quantitation, which is then fed to a multivariate model in order to predict the content of the various polymers, using samples previously characterized by destructive methods to build the model. While this latter method is fast and non-destructive, it is not a primary measurement and requires significant upfront effort to build good models. Any samples that fall outside the predictive range of the model cannot be characterized with any precision. While the destructive methods provide direct data, they can be very slow (2-stage acid hydrolysis), require detailed analytical methods to measure products, and can use a lot of sample material. The NIR and py-MBMS methods have been reviewed by Xiao et al. (2014) for readers interested in more details.

Micro-Scale 2-Stage Acid Hydrolysis

Decades ago, the National Renewable Energy Laboratory established several standard Laboratory Analytical Procedures (LAPs) designed to standardize analytical methods in an emerging and very chemically complicated landscape of biomass conversion. Based largely on tried-and-true methods from pulp and paper and forage analysis, these protocols became a standard yardstick by which disparate labs could meaningfully measure complex parameters of biomass conversion. The procedures established were highly regimented and geared toward the bench scale, i.e., using 1–10 g of material and requiring several days for completion. A reasonably well-trained technician could be expected to handle 40–50 samples per week for compositional analysis and enzyme hydrolysis and/or Simultaneous Saccharification and Fermentation (SSF). A large fraction of that time was taken up by the manual manipulations of weighing the samples and taking representative samples, as well as by the relatively long analysis times (30–60 min/sample) via HPLC.

The bench-scale compositional analysis LAP requires 0.3–3 g of biomass to be hydrolyzed first by 72% (w/w) H₂SO₄ at 30°C and then by 4% (w/w) H₂SO₄ in an autoclave, sequentially hydrolyzing cellulose and hemicellulose to oligomers and then to monomers, but requiring the use of pressure-rated glass vessels (Sluiter et al., 2008). The severe conditions result in degradation of sugars to hydroxymethyl furfural, furfural, levulinic acid, and other products as they are released from the polymer matrix, necessitating the use of sugar recovery standards to account for those losses. The sugars are typically measured via HPLC with Pb-, Ca-, or H-based ion exchange resins used to resolve the various sugars. The best separation and quantitation is typically found using the Pb-based chemistry; however, these columns require neutralization of samples (CaCO₃), long run times (~45–60 min), and de-ashing guard columns to protect the analytical column Pb-groups from complexing with the CO₃²⁻ ions introduced by neutralization. A sample set of 50, run in triplicate and with various sugar, sugar recovery, and instrument validation standards, could easily tie up an HPLC for a week or longer. The soluble lignin is estimated by absorbance, and the insoluble lignin, whether natural or precipitated by the severe conditions, is measured gravimetrically, adjusting for inorganic content as determined gravimetrically after ashing the sample.
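The instrument-time estimate above can be checked with simple arithmetic. The sketch below assumes only the figures given in the text (50 samples, triplicate injections, 45–60 min per run); the function name and the `n_standards` parameter are illustrative, and the number of standards is left at zero because the text does not state one.

```python
def hplc_hours(n_samples, replicates, run_min, n_standards=0):
    """Total HPLC instrument time in hours for one batch of injections."""
    injections = n_samples * replicates + n_standards
    return injections * run_min / 60.0


# 50 samples in triplicate = 150 injections, before any standards are added.
low = hplc_hours(50, 3, 45)   # shortest run time quoted in the text
high = hplc_hours(50, 3, 60)  # longest run time quoted in the text
print(f"{low} to {high} hours of continuous running")
```

Even without standards, 150 injections occupy the instrument for 112.5 to 150 hours, which is roughly four and a half to six and a quarter days of uninterrupted operation. Adding sugar, sugar recovery, and validation standards pushes the total past a week, consistent with the estimate in the text.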

In 2011, Selig et al. published the first HTP method for measuring glucan and xylan content in plant cell walls (Selig et al., 2011). The protocol was primarily a scaled-down version of the NREL LAP, with enzyme-linked spectrophotometric measurements replacing HPLC as the primary sugar analysis. This limited sugar detection to glucose and xylose only, but allowed for parallel processing of multiple samples and automated quantitation by microtiter plate (MTP) spectrophotometers while only using 50 mg of sample for each analysis. While liquid handling and absorbance reading steps were all automated using various robots, transfer between the robots was carried out by hand. Solids dispensing was automated into special dispensing plates that were used to manually add samples to the custom Hastelloy deep 96-well reactor plates. As manual mixing during the acid hydrolysis was not practical, sonication was used to maintain sample dispersity and special clamping mechanisms were used to seal the reactor during the digestion. Other adaptations included the inclusion of strong buffering of diluted aliquots instead of whole sample neutralization to minimize differences in pH which affected the enzyme-linked assays, and the use of centrifugation to minimize solids interference with pipetting by the robots. While the scaled-down version of the compositional analysis method was fast and reproducible, it did not track exactly 1:1 with standard bench-scale analysis (due mainly to the change in sugar detection and the simple error propagation of using small masses and volumes), and should therefore only be used in a comparative manner within a given sample set and not as a precise analytical tool. It also suffered from the requirement of highly specialized, expensive, and very heavy custom reactor plates. 
Subsequent refinements to the method substituted 96-well format glass tubes for the Hastelloy reactor plate, which eliminated the expensive custom reactor and the need for problematic sealing films and allowed the use of an autoclave instead of a Parr reactor for the 2nd stage of hydrolysis. Extraction and enzymatic de-starching of the starting material, especially for herbaceous feedstocks, were also worked into the method after the initial publication (Decker et al., 2012).

DeMartini et al. (2010) developed a similar method around the same time, utilizing HPLC vials instead of Hastelloy reactors or 96-well format glass tubes (DeMartini et al., 2010). The use of readily available HPLC vials allowed for automated solid and liquid dispensing, as well as heating on the Symyx Core Module robot, however subsequent manipulations, such as centrifugation, transfer of supernatant to polypropylene tubes, neutralization with CaCO₃, and analysis by HPLC required individual manual operations. The other advantage of the HPLC vials, however, was the ability to estimate insoluble lignin content gravimetrically after washing and drying the residual solids and subtracting previously determined ash content, even if the washing steps were carried out by hand.

Foster et al. (2010) developed a small-scale protocol for plant cell wall compositional analysis as well; however, the protocol utilized numerous solvent addition and removal steps as well as sample extraction and de-starching. The starting material was biomass that had been mechanically size-reduced in what was essentially an automated small-scale ball mill. These complex steps precluded ready automation, but the protocol provided much of the same information as larger-scale standard analyses while using considerably less material (Foster et al., 2010).

Though several labs have developed HTP/micro-scale compositional analysis methods, it remains a major tenet of this work that these methods are not as precise and accurate as larger-scale, lower throughput methodologies. Much of this is tied up in the heterogeneity of the samples, especially when a single particle of bark or rind can comprise 10% or more of a single sample or when automated dispensing results in size-fractionation of the bulk material during repeated sample aliquoting. Primarily, these methods are useful in ranking large sample sets in terms of cellulose content or percent theoretical conversion in subsequent enzymatic or microbiological conversion. They also result in significant savings in reagent costs, sample prep, and technician time; however, the costs of the robots, especially those designed for accurate and reproducible solids dispensing in the 1–10 mg range, are prohibitive for many research groups.

Pyrolysis-MBMS For Analysis of Cell Wall Composition and Other Components Present in Lignocellulosic Biomass (LCB)

The analysis of biomass by analytical pyrolysis techniques has been practiced for decades as it can provide a significant amount of information regarding the structure and composition of biomass as well as inform pyrolysis processes, conditions and upgrading strategies that are used to generate bio-oil. Fast pyrolysis, the rapid thermal decomposition of material in the absence of oxygen, produces analytes that originate from different components in the feedstock that can be analyzed using detectors such as mass spectrometers. Pyrolysis-molecular beam mass spectrometry (py-MBMS) is an analytical technique that uses a pyrolyzer coupled to a molecular beam mass spectrometer to analyze all ions generated without chromatographic separation. Fast pyrolysis of samples is typically performed at temperatures between 300 and 700°C in the timescale of < 1 min/sample. Electron ionization is used to generate ions with voltages ranging from < 20 eV up to 70 eV and ions are detected after passing through a quadrupole which typically scans for the analysis of m/z 30–450. Py-MBMS has been used as a high throughput technique to analyze LCB for estimation of lignin content, syringyl/guaiacyl (S/G) ratios, sugar composition as well as diterpenoid resin acid content in biomass extract. The benefits of using py-MBMS for biomass analysis include rapid throughput (250 samples/day), minimal sample preparation and low sample amount requirements (10 mg). The high-throughput nature of py-MBMS has enabled studies across large populations and sample set sizes allowing for the incorporation of appropriate statistical and biological variations in data. 
Py-MBMS data can be analyzed using various statistical tools such as principal component analysis (PCA), partial least squares regression (PLS), clustering methods, and other predictive analytics that provide both quality control measures and a means to understand the underlying spectral patterns as well as their sources and associations with other biomass properties and genomics.

Lignin content in biomass samples can be estimated by py-MBMS relative to a standard of known Klason lignin content run in the same set of samples. The estimate is made by summing the intensities of ions that originate from phenolic species generated during pyrolysis, which may include m/z 120, 124, 137, 138, 150, 152, 154, 164, 167, 168, 178, 180, 181, 182, 194, 208, and 210 (Sykes et al., 2008). Relative syringyl (S) monomer values are typically calculated by summation of ion intensities of m/z 154, 167, 168, 182, 194, 208, and 210, whereas relative guaiacyl (G) monomer values can be determined by summation of ion intensities of m/z 124, 137, 138, 150, 164, and 178. S/G ratios are then determined by dividing the sum of S-based ions by the sum of G-based ions. Table 2 summarizes the sources of various ions seen in py-MBMS spectra of biomass, and example spectra of maize stems are shown in Figure 1 (Penning et al., 2014a). Alternatively, PLS models can be used to determine lignin content in a set of samples relative to wet chemistry methods using a variety of standards. The analysis of lignin content and S/G ratios in lignocellulosic biomass by py-MBMS has been reported extensively in the literature and has been used in studies focused on the analysis of biomass recalcitrance (Studer et al., 2011; Biswal et al., 2015, 2018; Decker et al., 2015; Sykes et al., 2015b), for genetic studies such as QTL and GWAS (Wegrzyn et al., 2010; Penning et al., 2014a; Muchero et al., 2015), to determine within-plant variability and the effects of environmental conditions on cell wall structure (Sykes et al., 2008; Mann et al., 2009), and for the analysis of biomass either engineered or selected for different lignin content and monomer compositions (Penning et al., 2014b; Sykes et al., 2015b; Edmunds et al., 2017).
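The ion summations described above can be sketched in a few lines of Python. The m/z lists are those quoted in the text; everything else is an assumption made for illustration: the function names, the representation of a spectrum as a dictionary of m/z to intensity, the normalization to total ion intensity, and the toy spectra, which are invented numbers rather than measured data.

```python
# m/z lists as quoted in the text (Sykes et al., 2008).
S_IONS = (154, 167, 168, 182, 194, 208, 210)
G_IONS = (124, 137, 138, 150, 164, 178)
LIGNIN_IONS = (120, 124, 137, 138, 150, 152, 154, 164, 167, 168,
               178, 180, 181, 182, 194, 208, 210)


def normalize(spectrum):
    """Scale intensities so they sum to 1 (assumed preprocessing step)."""
    total = sum(spectrum.values())
    return {mz: intensity / total for mz, intensity in spectrum.items()}


def ion_sum(spectrum, ions):
    """Sum intensities at the listed m/z values; absent ions count as zero."""
    return sum(spectrum.get(mz, 0.0) for mz in ions)


def s_g_ratio(spectrum):
    """Sum of S-derived ion intensities divided by sum of G-derived ones."""
    g_total = ion_sum(spectrum, G_IONS)
    if g_total == 0:
        raise ValueError("no guaiacyl-derived ion intensity in spectrum")
    return ion_sum(spectrum, S_IONS) / g_total


def lignin_estimate(sample, standard, standard_klason_pct):
    """Scale the sample's lignin ion sum by a standard of known Klason lignin."""
    ratio = ion_sum(sample, LIGNIN_IONS) / ion_sum(standard, LIGNIN_IONS)
    return standard_klason_pct * ratio


# Invented toy spectra; m/z 60 stands in for carbohydrate-derived signal.
sample = normalize({124: 2.0, 137: 1.0, 154: 3.0, 167: 1.5, 60: 2.5})
standard = normalize({124: 3.0, 154: 3.0, 60: 4.0})

print(s_g_ratio(sample))
print(lignin_estimate(sample, standard, standard_klason_pct=24.0))
```

Because several analytes can contribute to the same ion, values computed this way are best read as relative measures within a run that includes the standard, not as absolute compositions.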

Table 2. Summary of ion assignments in py-MBMS spectra of lignocellulosic biomass or biomass extracts.

Figure 1. Py-MBMS spectra of maize stems can be used to compare relative abundances of lignin and carbohydrate components. (A) Upper and lower spectra of stems from different lines can be subtracted to show differences in certain ions, as shown in the subtraction spectrum (B). Reprinted by permission from Penning et al. (2014b), Bioenergy Research: Springer Nature, copyright 2014.

Structural carbohydrate or sugar composition has also been estimated in a variety of LCB types by py-MBMS. Ions that can originate from C5 sugars (xylose, etc.) include m/z 57, 73, 85, 96, 114 and ions attributed to C6 sugars (glucose, etc.) include m/z 57, 60, 73, 98, 126, 144. The pyrolysates from which the carbohydrate-derived ions originate include anhydrosugars, furans, low molecular weight aldehydes and ketones and other compounds as shown in Table 2. In a study by Sykes et al. (2015a) structural carbohydrates were determined using a standard method involving low-throughput two-stage acid hydrolysis of biomass followed by HPLC analysis of the hydrolysates. The HPLC method was used to build a PLS model with py-MBMS spectra for the prediction of sugar content in various types of biomass. The best PLS models incorporated different biomass types (hardwood, softwood, etc.) which extended the range in composition of each of the major sugars predicted. While models used to predict sugar composition within a biomass type were not ideal, py-MBMS analysis for sugar composition of biomass has a significant increase in throughput over traditional methods using two-stage acid hydrolysis followed by HPLC analysis (Sykes et al., 2015a).

Diterpenoid resin acids from pine and pine extracts have also been analyzed using py-MBMS. Typically, diterpenoid resin acids are extracted from coniferous biomass and derivatized prior to analysis by GC. Derivatizing resin acids is cumbersome and time-consuming, and it adds a step that requires a derivatizing reagent. Harman-Ware et al. (2017) developed a high-throughput method for the analysis of diterpenoid resin acids from the organic extract obtained from pine sapling cross sections using py-MBMS. While the py-MBMS analysis of pine biomass correlated spectral patterns indicative of variable levels of diterpenoid resin acids with GC data, it was not possible to use or spike standards for the quantification of these compounds in the biomass samples. Additionally, the low content (< 5 wt%) of the resin acids made reliable quantification by analysis of the whole biomass difficult, and the variability within trees complicated sampling methodology. Instead, total diterpenoid content was determined by py-MBMS analysis of the organic extract left after evaporation by means of an external calibration standard consisting of a mixture of components closely resembling the composition as determined by GC (Harman-Ware et al., 2017).

The analysis of biomass by py-MBMS is limited to non-volatile samples and analytes (for quantification) and small sample size. Therefore, harvesting, sample preparation, and variability of biomass are all experimental considerations that must be addressed in advance and understood when interpreting spectral data. Other considerations when interpreting spectra include the presence of inorganic salts and other compounds (proteins, lipids, etc.), the pyrolysis temperature, the fragmentation energy of the ionization source, and other parameters that can influence the pyrolysis and subsequent spectra of a given sample. Also, it is not recommended to compare samples analyzed before and after instrument maintenance or tuning adjustments, as the spectra could change slightly. Typically, standards are run within a set, and samples that need to be compared are run within a set without maintenance downtime of the instrument between sample analyses. As no chromatographic separation occurs, the sources of the ions present in the spectra must also be interpreted with caution, as many analytes may produce the same ions.

Py-MBMS data has been used to analyze large data sets with a focus on a small number of ions present in the spectra. The mining of MBMS spectral data for clustering, mapping and making spectral associations with genotypes and phenotypes will extend the usefulness and capabilities of py-MBMS analyses as more genetic and phenotypic information about biomass is elucidated. Improvements in mass resolution and sensitivity would also extend capabilities for py-MBMS to analyze components that are difficult to identify and quantify and/or make up a smaller fraction of biomass such as metabolites, lipids, etc. While py-MBMS has proven to be instrumental in the elucidation of biomass characteristics relating particularly to lignin, there may still be untapped information present in the spectra that could potentially inform other useful properties of biomass.

NIR Spectroscopy for Structural Components

As an alternative to the wet chemistry method, Near Infrared (NIR) spectroscopy has been used for decades for the rapid analysis of biomass, starting with the prediction of forage quality (Norris et al., 1976; Shenk et al., 1979; Abrams et al., 1987). Its use for the prediction of biomass composition in a biorefinery context originated later (Sanderson et al., 1996; Hames et al., 2003). An overview of the technique and a comprehensive review of its use in biomass conversion processes was recently published (Skvaril et al., 2017).

Rapid analysis using NIR is considered a secondary analytical method, because NIR spectra are correlated with primary biomass compositional analysis data using multivariate calibration algorithms to produce a calibration equation. This equation can then be used to predict the composition of samples in lieu of primary biomass compositional analysis. Robust calibration methods, including methods for estimating calibration and prediction uncertainty, have been developed (Martens and Næs, 1992; Martens and Martens, 2001; Olivieri Alejandro et al., 2006; Zhang and Garcia-Munoz, 2009).
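To make the idea of a calibration equation concrete, the following is a minimal sketch of a one-latent-variable PLS1 calibration written in plain Python. It is not the procedure used in any of the cited studies: real calibrations typically use several latent variables, spectral preprocessing, and cross-validation. The function names and the three-wavelength synthetic spectra are illustrative assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def fit_pls1_one_component(spectra, reference):
    """Fit a single-latent-variable PLS1 calibration.

    spectra:   list of equal-length absorbance lists, one per sample
    reference: primary (wet chemistry) value for each sample
    Returns (x_mean, y_mean, coef), i.e. the calibration equation.
    """
    n_wl = len(spectra[0])
    x_mean = [mean([s[j] for s in spectra]) for j in range(n_wl)]
    y_mean = mean(reference)
    xc = [[s[j] - x_mean[j] for j in range(n_wl)] for s in spectra]
    yc = [y - y_mean for y in reference]

    # Weight vector: spectral direction with maximum covariance with y.
    w = [sum(row[j] * y for row, y in zip(xc, yc)) for j in range(n_wl)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]

    # Scores of each sample on that direction, then regress y on the scores.
    t = [sum(row[j] * w[j] for j in range(n_wl)) for row in xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)

    # With one component the regression vector is simply q * w.
    coef = [q * wj for wj in w]
    return x_mean, y_mean, coef


def predict(model, spectrum):
    """Apply the calibration equation to a new spectrum."""
    x_mean, y_mean, coef = model
    return y_mean + sum(c * (x - m) for c, x, m in zip(coef, spectrum, x_mean))


# Synthetic calibration set: spectrum = baseline + (lignin % / 100) * signature.
calibration_spectra = [
    [0.30, 0.50, 0.45],  # 10% lignin
    [0.40, 0.70, 0.50],  # 20% lignin
    [0.50, 0.90, 0.55],  # 30% lignin
]
calibration_lignin = [10.0, 20.0, 30.0]

model = fit_pls1_one_component(calibration_spectra, calibration_lignin)
print(predict(model, [0.45, 0.80, 0.525]))  # spectrum built for 25% lignin
```

The synthetic data are exactly linear in one spectral direction, so a single component recovers the relationship. Measured spectra carry noise, scatter, and overlapping bands, which is why the number of components and the preprocessing must be chosen and validated against primary data.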

Because the method relies only on the collection of a NIR spectrum for a given sample, the technique can provide compositional analysis data much more quickly than traditional chemical analysis methods. Many calibration models for biomass feedstocks and process intermediates have been published, including contributions from these authors (Wolfrum and Sluiter, 2009; Godin et al., 2011; Liu et al., 2013; Sluiter and Wolfrum, 2013; Payne and Wolfrum, 2015). These models have been used for the prediction of large numbers of samples, saving substantial amounts of time (and cost) compared to conventional laboratory analysis (Pordesimo et al., 2005; Templeton et al., 2009).

Care must be taken when using NIR spectroscopy for rapid biomass analysis. Because it is a secondary method, a prediction of biomass composition will be accurate only if the underlying sample set used to develop the calibration equation contains sufficient compositional variability compared to that expected in the sample population. Unknown samples that are not part of the calibration population are inevitably poorly predicted. If careful attention is not paid to this issue, unreliable and even misleading results will be obtained. Poorly predicted samples (flagged either as outliers or with large prediction uncertainties) can be excellent candidates for improving a calibration model. If these samples undergo compositional analysis using traditional analytical methods, they can be added to the existing calibration model to extend its predictive range; by definition, these new samples are within the calibration population.
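One simple way to flag unknowns that may lie outside the calibration population is to compare each new spectrum with the spread of the calibration spectra. The sketch below uses a root-mean-square z-score per wavelength, which is a deliberately simplified stand-in for the score-space distance measures used in practice. The function names, the cutoff of 3.0, and the two-wavelength example data are assumptions for illustration only.

```python
def column_stats(spectra):
    """Per-wavelength mean and sample standard deviation of a spectrum set."""
    n = len(spectra)
    n_wl = len(spectra[0])
    means = [sum(s[j] for s in spectra) / n for j in range(n_wl)]
    sds = [
        (sum((s[j] - means[j]) ** 2 for s in spectra) / (n - 1)) ** 0.5
        for j in range(n_wl)
    ]
    return means, sds


def scaled_distance(spectrum, means, sds):
    """Root-mean-square z-score of a spectrum relative to the calibration set."""
    z_squared = [
        ((x - m) / sd) ** 2
        for x, m, sd in zip(spectrum, means, sds)
        if sd > 0  # skip wavelengths with no variation in the calibration set
    ]
    return (sum(z_squared) / len(z_squared)) ** 0.5


def flag_outside_calibration(calibration, unknowns, limit=3.0):
    """Return True for each unknown whose distance exceeds the assumed limit."""
    means, sds = column_stats(calibration)
    return [scaled_distance(u, means, sds) > limit for u in unknowns]


calibration = [[0.30, 0.50], [0.32, 0.54], [0.34, 0.58], [0.36, 0.62]]
unknowns = [
    [0.33, 0.57],  # resembles the calibration spectra
    [0.60, 0.20],  # very different spectral shape
]
print(flag_outside_calibration(calibration, unknowns))
```

Samples flagged this way are the candidates the text describes: once characterized by traditional analytical methods, they can be added to the calibration set to extend its predictive range.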

As discussed above, high-throughput analysis using NIR spectroscopy involves (1) collecting a NIR spectrum of a biomass sample, and then (2) predicting one or more chemical properties of the biomass sample by comparing the collected spectrum to a collection or library of samples for which spectral and chemical properties are known. There are opportunities to make improvements in both areas.

Spectrometer manufacturers are constantly improving the performance of NIR spectrometers, such as increasing the signal-to-noise ratio, the spectral resolution, or the spectral range of instruments, and it is likely that these improvements in performance will continue in the future. While these improvements require additional research and development on conventional spectrometers and are important advances, they represent incremental improvements to existing technology. There are two different and complementary approaches that offer opportunities for substantial improvement in high-throughput analysis using NIR spectroscopy: NIR hyperspectral imaging and low-cost, ultra-portable NIR spectrometers.

NIR hyperspectral image cameras are conceptually similar to conventional cameras except the pixels of the two-dimensional images produced consist of NIR spectra (Boldrini et al., 2012). Recent contributions have presented comprehensive reviews and discussions of applications of hyperspectral imaging for biomass analysis (Fahlgren et al., 2015; Eylenbosch et al., 2017). This is an active area of research, and the Department of Energy ARPA-E Transportation Energy from Renewable Agriculture (TERRA, https://arpa-e.energy.gov/?q=arpa-e-programs/terra) is currently supporting work in high-throughput biomass phenotyping in both laboratory and field environments using a variety of spectroscopy approaches, including hyperspectral imaging. Work sponsored by the ARPA-E TERRA will dramatically accelerate the application of hyperspectral imaging for biomass analysis in the coming years.

Several ultra-portable NIR spectrometers have been developed in recent years that have the potential to provide performance comparable to conventional NIR spectrometers in a much smaller form factor and at a much lower cost than traditional NIR spectrometers. Very-low cost, ultra-portable, and ubiquitous NIR spectrometers could represent a compelling alternative to traditional NIR rapid analysis approaches for biomass analysis. The composition of biomass materials could be tracked essentially continuously across the value chain from harvest and collection, transport, storage, through conversion to fuels and chemicals. Table 3 highlights four different instruments (listed alphabetically) that represent unique approaches to ultra-portable NIR spectroscopy.

Table 3. Ultra-portable NIR spectrometers.

While this is not meant to be an exhaustive list of all ultra-portable NIR spectrometers, the list does demonstrate the breadth of technical innovation in this area. For example, each of these instruments employs a different active optical element or light processing modality. The microNIR instrument uses a fixed Linear Variable Filter as the dispersive element in the optical path. The NeoSpectra is a Fourier Transform spectrometer, using a miniature Michelson Interferometer based on micro-electro-mechanical system (MEMS) fabrication technology. The NIRONE uses a MEMS-based Fabry-Perot Interferometer (FPI) as a tunable optical filter. The NIRVASCAN instrument uses a fixed grating in combination with a digital micromirror device (DMD) consisting of several hundred thousand miniature mirrors acting as a wavelength filter. Each of these instruments has a different spectral range, signal-to-noise ratio, sample presentation geometry, and data collection and processing environment. It is beyond the scope of this work to provide a comprehensive comparison of the performance of each of these instruments; the suitability of a given instrument depends in large part on the potential application.

Some challenges remain for further development of these novel spectrometers, including the demonstration of adequate data collection and processing environments, long-term performance stability in real-world applications, the ability to develop useful and robust calibration models for use on these new platforms, and the ability to perform accurate calibration transfer among spectrometers (Workman and Mark, 2013) so that calibration equations developed on a primary instrument can be used on multiple secondary instruments. Nonetheless, it is clear that there has been substantial development in ultra-portable NIR spectroscopy in recent years, and this development will likely continue.

As mentioned above, the way NIR spectroscopy is used for rapid analysis (calibration model development and subsequent sample prediction) has evolved substantially over the last several decades, for example, with more robust PLS modeling algorithms and improved outlier detection and measures of prediction uncertainty. However, as NIR spectroscopy for rapid biomass analysis becomes more widely used (in part due to developments in hyperspectral imaging instruments and lower-cost spectrometer technology), it will be possible to take fundamentally new approaches to deriving useful information from larger collections of spectral data. While the size of these data sets may never approach those of laboratory analytical techniques such as hyphenated chromatography (e.g., GC-GC-MS, LC-MS), they will likely be large enough and have enough variability to permit machine learning or neural net modeling approaches for classification applications, and real-time updating of classification and prediction modeling using cloud computing resources. Both the development of novel approaches for processing the data and the curation and management of the data itself will represent key technical challenges (and opportunities) in the future. In summary, rapid analysis using NIR spectroscopy has proven to be a robust, reliable technique for high-throughput biomass characterization when used properly and with care. In the future, new opportunities for the technique will develop because of improvements in the two complementary technologies that have made the technique useful in the past: NIR spectroscopy instrumentation and spectral data processing techniques, particularly machine learning approaches.

HTP Recalcitrance Screening

Measuring biomass recalcitrance across large numbers of natural and transgenic plant variants has long held the promise of identifying superior lignocellulosic biofuel feedstocks. The best screen should include both compositional analysis of the starting feedstock and sugar release after processing, in order to measure conversion efficiency as well as titer and yield (Sykes et al., 2015a). For conversion, either pretreatment, enzyme hydrolysis, or a combined approach may be used. More recently, microbial assays based on consolidated bioprocessing (CBP), in which cellulolytic microbes hydrolyze lignocellulosic feedstocks and conversion is measured by product formation, have been employed. Both approaches can be used to screen a range of substrates for differences in recalcitrance by holding the catalyst(s) constant or to evaluate different catalysts (pretreatment conditions, enzyme systems, microbes) on a single defined substrate. In both systems, product detection is critical to evaluating the differences induced by the variables introduced. Many of the methods used to set up and analyze the experiments are the same or similar.

HTP screening methods are being increasingly applied to process development in biotechnology (Long et al., 2014; Back et al., 2016; Yang et al., 2017; Zutz et al., 2017). As new methodology is developed, HTP screening is increasingly employed to collect biological data that historically required extensive time and effort (Scheel and Lutke-Eversloh, 2013; Suzuki et al., 2015) and is widely used today in the development of fermentation process assays (Decker et al., 2003, 2009; Selig et al., 2010, 2011; Studer et al., 2010; Suzuki et al., 2015; Yang et al., 2017). There have been rapid developments of HTP techniques in recent years in micro-scale culturing, online analysis and monitoring, and real-time control, which have enabled increased systems automation (Yang et al., 2017). A key technology in applying HTP to microbiological screening has been the miniaturization of bioreactors, making large cultivation experiments economical and practical (Back et al., 2016; Velez-Suberbie et al., 2017). Both micro- and mini-bioreactors are critical to biotechnology process development. Recent HTP developments in biological research have been used to develop more effective large-scale operations, greatly decreasing the time and expense compared to development at scale (Lattermann and Buchs, 2015), and it is likely that there is room for continued improvement (Long et al., 2014). As an example, microtiter plates (MTP) are simple, easy to shake, and inexpensive (Bharadwaj et al., 2011; Yang et al., 2017) and have been demonstrated to effectively replace shake flasks (Oguntimein et al., 2018).
MTPs have been used to screen specific activities of enzyme variants using various techniques, such as protein quantitation by immunoturbidimetric (ITA) assays (Yang et al., 2017), direct fluorescence resonance energy transfer for protease activity (Suzuki et al., 2015), cell-free protein production (Casteleijn et al., 2013), a fungal biosensor assay to detect estrogen activity (Zutz et al., 2017), protein purification and characterization for crystallographic studies (Kim et al., 2011), and enzyme screening of ionic liquid pretreated lignocellulose (Bharadwaj et al., 2011). Despite these recent examples of MTP-based screening, few details are known regarding actual culture conditions inside the MTP, because the technology to measure these details in real time in such numbers and small volumes remains lacking (Long et al., 2014).

Lignocellulosic biomass (LCB) has been the focus of research as a renewable source for second generation bioethanol production but selection and development of these substrates with high bioethanol yield requires the availability of reliable methods for compositional and structural characterization (Elliston et al., 2015). As discussed in Section High Throughput Plant Cell Wall Compositional Analysis, quick HTP analysis of the potential of LCB feedstocks is an important step in the development of second generation bioethanol. HTP screening allows the rapid investigation of a large set of samples at minimum cost. An assay used to determine bioethanol production from large numbers of LCBs must be robust, rapid, easy to perform, and must use modest amounts of the samples (Elliston et al., 2015). This section of the review focuses on the use of high throughput (HTP) pretreatment and enzyme hydrolysis as well as consolidated bioprocessing for the conversion of LCB into bioethanol.

The production of second generation biofuel involves a number of consecutive process operations, each comprising multiple steps. These operations can be delineated into pretreatment, hydrolysis, fermentation, and distillation and/or separation. The overall process design could follow one of several general approaches, including separate hydrolysis and fermentation (SHF) or simultaneous saccharification and fermentation (SSF) for any given LCB substrate, but the most rapid and cost-effective method to produce bioethanol by any approach requires the optimization of various process parameters. Bearing this in mind, Decker et al. (2009) and Gomez et al. (2010) developed methods for the rapid screening of biomass for the hydrolysis stage; however, further research is required to investigate downstream process impacts due to yeast or other micro-organisms. The potential effect of fermentation inhibitors released during biomass processing on the final alcohol yields is also critical and may be process- or substrate-specific (Pienkos and Zhang, 2009). Despite the importance of both SHF and SSF methodologies, SSF on a solid substrate has not been widely investigated at the much smaller HTP scale. The SSF approach introduces a complication: CO₂ produced during yeast fermentation must be vented to reduce pressure in the reaction vessel while evaporation is controlled, potentially with vigorous stirring. The current paradigm, however, is focused more toward SHF (Waldron, 2014) and consolidated bioprocessing (CBP) (Oguntimein et al., 2018). This simplifies engineering requirements, decreasing capital and operational expenditures, and reduces the potential for microbial contamination prior to the addition of yeast.

Substrate Preparation

The preparation of solid substrates in the case of second generation biofuels presents its own unique set of problems to HTP screening, primarily the requirement to accurately, repeatedly, and rapidly dispense solid, heterogeneous sample material by weight. Manual weighing is too labor intensive and time consuming to be practical. Recent solutions have included automatic weighing robots (Santoro et al., 2010), production of handbills (dry biomass sheets that can be subdivided to repeatable mass) (Berlin et al., 2006), and biomass slurry pipetting (Chundawat et al., 2008). Automatic weighing and dispensing robots for dry biomass are expensive and require dedicated operation. Grinding of the sample is required, with suitable size ranges pre-determined for each biomass type. Typically, the grinding is accomplished off-deck before loading the samples onto the robot; however, on-deck grinding through ball-milling has been used in several labs (Foster et al., 2010; Santoro et al., 2010). Attention should be paid to avoid over-milling, which reduces structural recalcitrance factors in the material. Electrostatic forces generated during dispensing (for example, by a rotating anti-bridging wire on plastic dispenser walls or by steel balls grinding biomass in a plastic tube) must be addressed, or else the sample will be errantly dispensed to adjacent wells, inter-well spaces, or the bottom of the balance or robot. Sample heterogeneity is also a major issue, especially in small sample masses (Santoro et al., 2010).

The use of handbills requires additional equipment and expertise, so the use of filter paper as a universal substrate for the measurement of cellulase activity has been adopted by many groups (Ghose, 1987). Elliston et al. (2015) investigated slurry pipetting techniques using a Tecan Freedom Evo™ liquid handling robot equipped with a multi-channel arm (Tecan Group Ltd, Männedorf, Switzerland). Liquid transfer was applied in the study for HTP SSF analyses using a 96-well plate format. A major obstacle is evaporation of samples, with the small scales utilized in HTP (typically ≤ 1 mL) driving the requirement for effective sealing, especially when incubated for days at elevated temperatures. In order to measure evaporative loss, a 1.0 mL matrix storage tube plate dried to a constant weight at 50°C was used, with each well filled with 1.0 mL yeast nitrogen base (YNB) medium. The tubes were sealed with screw caps and incubated at 50°C (enzyme optimum temperature) over 72 h. Evaporation rates were low and linear over the 24, 48, and 72 h time points, with evaporative losses of 0.28, 0.60, and 0.91%, respectively (Elliston et al., 2015).
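The linearity of the reported evaporative losses can be checked with a simple least-squares fit. The sketch below uses only the three values quoted above from Elliston et al. (2015); the function names are illustrative and not part of any published protocol.

```python
# Evaporative losses reported by Elliston et al. (2015) for sealed
# 1.0 mL matrix tubes incubated at 50 °C.
hours = [24.0, 48.0, 72.0]
loss_percent = [0.28, 0.60, 0.91]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for paired observations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(hours, loss_percent)


def predicted_loss(t_hours):
    """Evaporative loss (%) predicted by the fitted line at time t_hours."""
    return slope * t_hours + intercept


print(f"fitted rate: {slope:.4f} % per hour")
```

The fitted rate is roughly 0.013% per hour. Using the line beyond 72 h would be an extrapolation that the source data do not support.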

For pretreated substrates, several methods have been employed. In one of the earliest studies on HTP pretreatment and enzyme hydrolysis, Chundawat et al. (2008) utilized ammonia fiber expansion in a batch pretreatment of corn stover, dispensing the pretreated material into the wells of a 96-well microtiter plate and evaluating the effects of solids loading and particle size on digestibility (Chundawat et al., 2008). While this approach gives the advantage of potentially screening numerous enzyme combinations, the process is limited to one of a few types of biomass. A more versatile approach, pioneered by Studer et al. (2010), Decker et al. (2009), and Selig et al. (2010) was to carry out pretreatment in a HTP reactor plate containing parallel individual reaction chambers. Multiple biomass types can be screened in a massively parallel fashion, though conversion conditions are limited to a single pretreatment and digestion temperature/time/pressure combination for each plate. For reactor plates in a standard 96-well format, the added advantage of automated or semi-automated simultaneous liquid transfers for all wells greatly increases throughput while retaining the option to pipet individual enzyme or acid catalysts by well. Use of acid-resistant metallurgy and appropriate sealing systems allows for high temperature and acid-catalyzed thermochemical pretreatments (Decker et al., 2009; Selig et al., 2010; Studer et al., 2010).

Effect of Solid Substrate Mass on Simultaneous Saccharification and Fermentation (SSF)

In one of the earliest biomass-related HTP methods, Decker et al. (2003) used Whatman number one filter paper and powdered celluloses dispensed as a slurry to automate the filter paper assay used to quantify cellulase activity as a precursor to SSF. In 2012 and 2014, Yee et al. developed a reduced-bench-scale system for both SSF and consolidated bioprocessing (CBP, see below) of biomass using bottles, measuring products by HPLC and substrate utilization by mass loss (Yee et al., 2012, 2014). Reducing SSF to an HTP-friendly microtiter plate format, Elliston et al. (2015), using discs (6 mm diameter) of Whatman number one filter paper (FP) and office copier paper (OCP) as substrates, found that the masses were highly repeatable: 2.36 mg ± 3.1% (w/w) and 2.19 mg ± 2.5% (w/w), respectively, with six sample points. The consistent thickness and density of these substrates, which enabled the high mass repeatability, allowed for rapid allocation of substrate to small matrix tubes. In a comparative experiment between shake flasks (200 mL in 500 mL Erlenmeyer flasks), tubes (10 mL in 30 mL screw-capped culture bottles), and MTP format (1 mL in 1 mL screw-cap matrix storage tubes) using yeast and mold media (YM) plus glucose [0.9% (w/v)], OCP [2.5% (w/v)], or FP [2.5% (w/v)], ethanol yield was similar at each scale, validating the use of the HTP small-scale method for screening yeasts by SSF (Elliston et al., 2015).

Effect of Microorganisms on Bioethanol Production From LCB

Oguntimein et al. (2018) demonstrated a HTP 96-well microplate assay to evaluate MTP-based consolidated bioprocessing as a method to measure biomass conversion potential. Twenty milligrams of pre-milled switchgrass or Avicel was dispensed into deep-well 96-well MTPs (2.2 mL/well volume), shown in Figure 2, using a Powdernium® powder dispensing system (Symyx, Geneva, Switzerland). After addition of liquid medium, C. thermocellum Δhpt was inoculated into rows A-C while C. thermocellum LL1210 was inoculated into rows E-G. Sterile water was added to rows D and H. Plates were placed at a 45-degree angle on an orbital shaker (Cole Palmer Model 51300) set at 125 rpm in a 60°C incubator in a Coy anaerobic chamber (5% H₂, 10% CO₂, and 85% N₂, Coy Laboratories Products Inc., Grass Lake, MI).
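The row-wise inoculation scheme described above can be written down as a plate map, which is convenient when results are later joined to well identifiers. This is a minimal sketch: the row assignments follow the text, while the dictionary structure and names are assumptions, and the source does not state which wells held switchgrass versus Avicel, so substrate is deliberately left out.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A-H of a 96-well plate
COLUMNS = range(1, 13)              # columns 1-12

# Row assignments as described for the assay of Oguntimein et al. (2018).
ROW_TREATMENT = {
    "A": "C. thermocellum Δhpt",
    "B": "C. thermocellum Δhpt",
    "C": "C. thermocellum Δhpt",
    "D": "sterile water",
    "E": "C. thermocellum LL1210",
    "F": "C. thermocellum LL1210",
    "G": "C. thermocellum LL1210",
    "H": "sterile water",
}


def build_plate_map():
    """Return a dict mapping well IDs such as 'A1' to their treatment."""
    return {
        f"{row}{col}": ROW_TREATMENT[row]
        for row in ROWS
        for col in COLUMNS
    }


plate_map = build_plate_map()
```

Keeping the uninoculated rows D and H in the same map makes it straightforward to subtract background signal from the inoculated wells during analysis.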

FIGURE 2

Figure 2. Biomass loaded into deepwell microtiter plate for high throughput evaluation of Clostridia for consolidated biomass processing.

Both C. thermocellum Δhpt and LL1210 strains metabolized Avicel, generating cellobiose, glucose, lactic acid, formic acid, acetic acid, and ethanol in titers and ratios similar to those obtained in bench-scale fermentations, demonstrating the applicability of a HTP method for CBP using C. thermocellum. Strain LL1210 generated higher ethanol titers than strain Δhpt, which is consistent with earlier reports from larger-scale experiments (Dumitrache et al., 2016). The absolute titers for ethanol were lower than those produced in pH-controlled bioreactors, and switchgrass generated lower ethanol concentrations than Avicel (Oguntimein et al., 2018).

Effect of Lignocellulosic Biomass

Lindedam et al. (2014) compared three HTP pretreatment and enzymatic hydrolysis systems (HTPH-systems) for screening lignocellulosic biomass by enzymatic saccharification to confirm that quantitative differences in substrate can be detected at a small scale. Twenty winter wheat cultivars grown at two sites in Denmark were hydrothermally pretreated and enzymatically digested in three separately engineered HTPH-systems at (1) University of California, Riverside, (2) National Renewable Energy Laboratory (NREL), Colorado, and (3) University of Copenhagen (CPH). All three systems delineated differential sugar release among the cultivars, though average extent of cellulose conversion varied at 57, 64, and 71% for Riverside, NREL and CPH, respectively. Riverside and NREL systems had the highest pair-wise correlation with glucose, while xylose yields correlated best between Riverside and CPH. All three systems agreed on Flair as the cultivar with the highest yield and Dinosor, Glasgow, and Robigus with the lowest. Despite the varied conditions between the three HTPH-systems which resulted in different absolute values, the correlation and rank ordering agreement between them clearly indicates that microscale combined thermochemical and enzymatic conversion can be used to identify recalcitrant phenotypes between varied feedstocks (Lindedam et al., 2014).

Elliston et al. used ethanol production under SSF conditions to assay the conversion of milled wheat straw pretreated under two different conditions (195°C for 10 min and 210°C for 10 min). Analysis of twelve replicates demonstrated the expected increase in ethanol yield for wheat straw pretreated at 210°C for 10 min (80% of theoretical yield) compared to pretreatment at 195°C for 10 min (64% of theoretical yield) (Elliston et al., 2015).
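For readers unfamiliar with how a percentage of theoretical ethanol yield is typically derived, the sketch below works through the standard stoichiometry: glucan is hydrated to glucose, and each glucose yields at most two ethanol. It assumes that only glucan-derived glucose is fermented; the exact basis used by Elliston et al. (2015) is not given in this review, and the biomass mass and glucan fraction in the example are hypothetical.

```python
# Molar masses in g/mol.
ANHYDROGLUCOSE = 162.14   # glucan repeat unit (C6H10O5)
GLUCOSE = 180.16          # C6H12O6
ETHANOL = 46.07           # C2H5OH

HYDRATION_FACTOR = GLUCOSE / ANHYDROGLUCOSE    # ~1.111 g glucose per g glucan
FERMENTATION_FACTOR = 2 * ETHANOL / GLUCOSE    # ~0.511 g ethanol per g glucose


def theoretical_ethanol_g(biomass_g, glucan_fraction):
    """Maximum ethanol (g) obtainable from the glucan in a biomass sample."""
    glucose_g = biomass_g * glucan_fraction * HYDRATION_FACTOR
    return glucose_g * FERMENTATION_FACTOR


def percent_of_theoretical(ethanol_g, biomass_g, glucan_fraction):
    """Measured ethanol expressed as a percentage of the theoretical maximum."""
    return 100.0 * ethanol_g / theoretical_ethanol_g(biomass_g, glucan_fraction)


# Hypothetical example: 1 g of biomass containing 40% glucan.
max_ethanol = theoretical_ethanol_g(1.0, 0.40)
```

With these illustrative inputs the ceiling is about 0.23 g ethanol per gram of biomass, so a sample producing 80% of that value would correspond to the higher of the two yields quoted above.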

Zhang et al. also used HTP pretreatment and co-hydrolysis (HTPH) to rapidly identify promising Miscanthus genotypes, including hybrids of Miscanthus sacchariflorus/M. sinensis as well as M. lutarioriparius, highlighting the commercially promising hybrids. The results also indicated that, at least in Miscanthus, glucan plus xylan content influences both mass and theoretical yields, while lignin and ash contents had no measurable impact (Zhang et al., 2012).

Applying consolidated bioprocessing (CBP) in HTP assay format, Oguntimein et al. demonstrated the fermentation of switchgrass and Avicel by a parent strain of Clostridium thermocellum (Δhpt) (Oguntimein et al., 2018). The HTP, MTP-based CBP assay produced ethanol levels similar to bench-scale Avicel and switchgrass fermentations (Dumitrache et al., 2016; Tian et al., 2016). According to the authors, additional studies are needed comparing the effect of biomass concentration on bioethanol production, to correlate the HTP CBP screen with other analytical methods such as quantitative saccharification of residual biomass, and evaluation of well pooling necessary to determine the extent of hydrolysis and fermentation. Screening on other biomass feedstocks such as corn stover and poplar is also needed to validate a broader application of the method. Lastly, evaluating multiple microbial strains will provide a more comparative picture of the impact of microbial factors on the assay specificity (Oguntimein et al., 2018).

This review reports on the factors influencing HTP assay methods for screening otherwise recalcitrant lignocellulosic substrates for bioethanol production so that the assays can be performed efficiently and reproducibly in a laboratory setting. Current methods are influenced by the preparation of LCB, the type of LCB, the weight of biomass, and the type of fermenting organism. Further studies are required to evaluate and optimize the interactions of these factors so that the methods have practical use in the biorefining of biomass substrates for second generation biofuels. Most of the HTP methods have been developed to mimic large-scale operating conditions (Lattermann and Buchs, 2015). This trend will likely continue toward even smaller reactors, potentially even single-cell microfluidic chips (Oguntimein et al., 2018).

NMR for Biomass Sugar and Fatty Acid Composition

Detailed characterization is required for the continued development of biomass feedstocks possessing traits desirable for biofuels and bio-derived chemicals. Quick and precise identification of cell wall chemistry traits and composition due to both gene transformation and natural variation can be accomplished through high-throughput (HTP) characterization using nuclear magnetic resonance (NMR) spectroscopy. Traditionally, lignin and carbohydrate chemistries have been obtained through time and labor-intensive bench-scale HPLC and gravimetric determination (Sluiter et al., 2008, 2010). While carbohydrate characterization methods can incorporate automated sample preparation in 96-well plates, a time-consuming HPLC analysis step for minor sugars composition is often the bottleneck in throughput (Selig et al., 2011).

Simple ¹H NMR methods provide a wealth of information about liquid samples, such as biomass hydrolysates, including both carbohydrate and hydrolysate by-product compositions. Mixture analysis of NMR spectra is well developed, with many software applications making analysis straightforward and reliable (da Silva Neto et al., 2009; Powers, 2009; Spraul et al., 2009; Da Silva et al., 2013). Traditionally, sugar analysis in biomass hydrolysates has been performed using integration of the anomeric proton region between 4.4 and 5.4 ppm against a reference standard (Kiemle et al., 2003; Mittal et al., 2009), but peak overlap occurs for several sugars and the large water peak at 4.8 ppm often makes integration of these peaks impossible. Shifting the water peak out of the anomeric proton region requires either highly acidic conditions, which cause NMR instrumentation issues (Kelly et al., 2002), or temperatures below the freezing point of water, rendering both options undesirable for high-throughput analysis. Gjersing et al. (2013) reported a method developed for high-throughput screening of biomass hydrolysates generated from two-stage acid hydrolysis using ¹H NMR spectra. HPLC-measured concentration data were used to construct a Partial Least Squares (PLS) regression model for sugar composition using NMR spectra of aqueous hydrolysates. A model for each monomeric sugar can be used to determine concentrations within the hydrolysate mixture. In this method, 8 biomass feedstocks were used to construct a PLS model with HPLC data and NMR spectra. The fully cross-validated model was used to predict sugar concentrations for 15 samples, including a feedstock that was not in the original model, and the predictions were compared to HPLC-measured concentrations. The NMR-based PLS predictions and HPLC-measured concentrations agreed within error, demonstrating the applicability of the model (Gjersing et al., 2013).
Use of a cryoprobe allowed the NMR experiment time to be further reduced, and preliminary work indicates 4 min per sample, a dramatic increase in throughput compared to traditional HPLC methods. Combining this NMR approach for analysis of carbohydrates with the high-throughput micro-scale hydrolysis preparation discussed previously (Selig et al., 2011) could provide a truly high-throughput screening method for biomass sugar composition.
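The idea of calibrating NMR spectra against HPLC-measured concentrations can be illustrated with a deliberately small, single-component PLS1 regression written in plain Python. This is a sketch of the general technique only, not the model of Gjersing et al. (2013): the five-point "spectra", the peak profile, and the concentrations are synthetic, and a practical calibration would use full spectra, several latent components, and cross-validation.

```python
def fit_pls1_one_component(spectra, concentrations):
    """Fit a one-component PLS1 model (single response variable)."""
    n, p = len(spectra), len(spectra[0])
    x_mean = [sum(row[j] for row in spectra) / n for j in range(p)]
    y_mean = sum(concentrations) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in spectra]
    yc = [y - y_mean for y in concentrations]
    # Weight vector: direction of maximum covariance between X and y.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores: projection of each centered spectrum onto the weight vector.
    t = [sum(xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Inner regression of the response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "weights": w, "q": q}


def predict(model, spectrum):
    """Predict a concentration from one spectrum."""
    score = sum(
        (x - m) * w
        for x, m, w in zip(spectrum, model["x_mean"], model["weights"])
    )
    return model["y_mean"] + model["q"] * score


# Synthetic calibration set: a hypothetical peak profile scaled by
# concentration, plus a constant baseline.
PROFILE = [0.0, 0.2, 1.0, 0.3, 0.0]
BASELINE = 0.05


def synthetic_spectrum(concentration):
    return [concentration * p + BASELINE for p in PROFILE]


calibration_conc = [1.0, 2.0, 4.0, 8.0]   # stand-in for HPLC values (g/L)
model = fit_pls1_one_component(
    [synthetic_spectrum(c) for c in calibration_conc], calibration_conc
)
estimate = predict(model, synthetic_spectrum(5.0))
```

Because the synthetic spectra are noise-free and vary along a single direction, one component recovers the held-out concentration essentially exactly; real hydrolysate spectra with overlapping sugar peaks would need more components and an honest estimate of prediction error.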

Increasingly, metabolite profiles are being used to screen biological materials, including biomass populations, to look for varying plant responses to stress (Ruan and Teixeira da Silva, 2011; Sun et al., 2016). Recent reports demonstrate that the laborious extraction procedures for metabolites have been improved, making metabolic analysis by NMR more practical (Fumagalli et al., 2009; Martineau et al., 2011; Rivas-Ubach et al., 2013). Additionally, high-resolution magic angle spinning (HR MAS) NMR provides the ability to use whole-cell plant material to detect changes in abundant metabolites, which eliminates several preparation steps (Silva et al., 2012; Blondel et al., 2016).

Fatty acids from non-lignocellulosic biomass such as microalgae have become more routinely recognized as a feedstock for biofuel production (Fukuda et al., 2001; Chisti, 2007; Wijffels and Barbosa, 2010). Algal lipid composition varies greatly among species and additional variation is added when culture growth conditions are modified (Scott et al., 2010). Thus, it is imperative that analytical tools be developed for rapid screening of large numbers of samples necessary for comparative studies. Traditionally, protocols for analysis of lipids from microalgae involve time-consuming and labor-intensive extraction followed by chromatography (Bligh and Dyer, 1959; Jones et al., 2012). A fluorescent model was developed that suffered from many technical drawbacks to large-scale screening, including daily calibration of the fluorescent probe and specificity of cell response (Cooksey et al., 1987; Elsey et al., 2007; Chen et al., 2009). However, a simple ¹H NMR screening method was developed that allowed for assessment of major lipid classes from rough microalgae extracts (Nuzzo et al., 2013). The collection of a single NMR spectrum took only a few minutes, without purification of rough extracts, and employed the use of a reference electronic signal as an external standard, known as ERETIC (Akoka et al., 1999). This allowed for the quantification of several major lipid classes important for biodiesel synthesis, including total fatty acids, free fatty acids, triacylglycerols, unsaturated fatty acids, and saturated fatty acids (Nuzzo et al., 2013).

Overall, there have been dramatic improvements in sample handling over the last two decades that have allowed traditional bench-scale methodologies to become high-throughput. As robotics and sample changers become more commonplace, methodologies can be upgraded to accommodate the demand for the large data sets involved in bioinformatics.

Data Analysis and Systems Biology Modeling for High Throughput Biomass Analysis

There is a need for integrated biological models to capture the higher order complexity in the interactions that occur among cellular components. A full model of all of the higher order interactions of cellular and organismal components is one of the ultimate grand challenges of systems biology (Sweetlove et al., 2017). The ability to build such comprehensive models will usher in a new era in biology. Success in the construction and application of computational algorithms will enable new insights into the molecular mechanisms responsible for complex biological systems and related emergent properties, using technologies not previously available and at a scale not previously feasible. Such a model would lead to breakthroughs with profound effects on the field (Sweetlove et al., 2017).

The cost of generating biological data is dropping exponentially, resulting in increased data that has far outstripped the predictive growth in computational power from Moore's Law. This flood of data has opened a new era of systems biology in which there are unprecedented opportunities to gain insights into complex biological systems. The dominant paradigm of high-throughput systems biology is the use of new technologies to generate massive amounts of data that can then be analyzed computationally for new insights and hypothesis generation. Solving such complex combinatorial problems will give us extraordinary levels of understanding of biological systems. Paradoxically, understanding higher order sets of relationships among biological objects leads to a combinatorial explosion in the search space of biological data. These exponentially increasing volumes of data, combined with the desire to model more and more sophisticated sets of relationships within a cell and across an organism (or in some cases even ecosystems), have led to a need for computational resources and sophisticated algorithms that can make use of such datasets. Thus, the bottleneck in biological science is often no longer data generation but rather the computational analysis.

Biological organisms, including plants, microbes, and humans, are derived from complex genetic systems that are composed of functional networks of interacting molecules, macromolecules, and even species (Foster et al., 2017). The subsequent phenotypes are the result of orchestrated, hierarchical, varied collections of expressed genomic variants regulated by and related to biotic and abiotic signals. However, at the individual organism level, the measured effects of these genomic variants can be viewed as the result of historic selective pressure and current environmental as well as epigenetic interactions. Thus, the co-occurrence of genome variants and the resulting complex phenotypes can be viewed in the context of genome-wide associations in several different ways. This phenomenon allows us to use vectors of genome variant-to-trait associations to detect the higher order interactions occurring in an organism across hierarchical phenotypes. A full model of all of the higher order interactions of cellular and organismal components is one of the ultimate grand challenges of systems biology.

We are attempting to do this for the bioenergy feedstock Populus trichocarpa (black cottonwood) and are currently using 10 million genome variants derived from the resequenced genomes of more than a thousand different genotypes and 160,000 phenotypes that have been measured across this population (including transcriptomics, metabolomics, microbiomics, and phenomics data).

Networks

Networks are useful tools for modeling and analyzing complex biological systems by representing biological objects as nodes, (e.g., genes, proteins or metabolites) and representing the relationships/interactions/similarities between them as edges (Barabasi and Oltvai, 2004). For example, networks can model co-expression relationships between genes, sequence similarity between genes, physical interactions between proteins and/or correlations between metabolites. Networks allow for biological datasets to be visualized in an intuitive manner and network visualization packages such as Cytoscape provide an interactive environment for network visualization. However, networks are not simply useful as a visualization tool. Networks provide a data structure that serves as a mathematical representation of a complex system, allowing further analysis to be performed on a dataset represented as a network. Datasets represented as networks are also very easily merged with other networks, thus constructing a useful tool for combining information from different data sources to create a combined and holistic environment for data interpretation (Shannon et al., 2003).
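The point that networks are a data structure, and that networks from different sources merge easily, can be made concrete with a few lines of code. In this minimal sketch each layer is a list of undirected edges and the merged network records which layers support each edge; the gene names and layer contents are hypothetical.

```python
from collections import defaultdict


def merge_layers(layers):
    """Merge named edge lists into one network.

    `layers` maps a layer name to an iterable of (node_a, node_b) pairs.
    The result maps each undirected edge (a frozenset of its two nodes)
    to the set of layer names in which that edge appears.
    """
    merged = defaultdict(set)
    for layer_name, edges in layers.items():
        for node_a, node_b in edges:
            merged[frozenset((node_a, node_b))].add(layer_name)
    return dict(merged)


def neighbors(merged, node):
    """Return all nodes connected to `node` in the merged network."""
    result = set()
    for edge in merged:
        if node in edge:
            result |= edge - {node}
    return result


# Hypothetical layers built from two different data sources.
layers = {
    "co-expression": [("geneA", "geneB"), ("geneB", "geneC")],
    "co-methylation": [("geneB", "geneA"), ("geneA", "geneD")],
}
network = merge_layers(layers)
```

Here the edge between geneA and geneB is supported by both layers, regardless of the order in which its endpoints were listed, which is the kind of cross-source agreement a merged network makes easy to query.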

GWAS Network Construction

Phenotypes are often complex traits, in that they are influenced by the environment and potentially a large number of genes (Solovieff et al., 2013). GWAS attempts to associate the presence of SNPs with these complex traits (Visscher et al., 2012; Solovieff et al., 2013). This involves genotyping a large number of individuals in a population, measuring phenotypes across all of these individuals, and statistically determining the association between the presence/absence of the genotyped markers or SNPs and each phenotype (Korte and Farlow, 2013). A general concern when conducting GWAS studies is that individuals within a population can be genetically related and share both causal alleles, which cause the phenotype, and non-causal alleles that are artifactually connected to the phenotype (Visscher et al., 2012; Korte and Farlow, 2013). These causal and non-causal alleles can be located near each other on a chromosome and could thus be in linkage disequilibrium (i.e., alleles which are correlated across a population and co-inherited). This linkage disequilibrium (LD) between causal and non-causal alleles across related individuals results in non-causal alleles being correlated with a phenotype when they have no actual effect on the phenotype. A common approach to correcting for this phenomenon is to take population structure into account to avoid artificially inflated significance. Population structure is often estimated from a kinship matrix and incorporated into the model (Flint-Garcia et al., 2003).

Spurious phenotype-to-genotype associations can also result from outlier phenotype values; this is especially evident when using linear models to calculate such associations. We therefore often apply a cutoff based on the median absolute deviation (MAD) from the median in order to determine whether a given phenotype measurement is an outlier compared to the measurements taken across the population (Leys et al., 2013).
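A MAD-based outlier screen of the kind described above can be sketched as follows. The scaling constant 1.4826 makes the MAD comparable to a standard deviation under normality; the cutoff of 3 scaled MADs and the phenotype values are assumptions chosen for illustration, not the settings used by the authors.

```python
from statistics import median


def mad_outlier_flags(values, threshold=3.0):
    """Flag values lying more than `threshold` scaled MADs from the median.

    Returns a list of booleans aligned with `values`.
    """
    values = list(values)
    center = median(values)
    # 1.4826 scales the MAD to estimate the standard deviation of
    # normally distributed data.
    mad = 1.4826 * median([abs(v - center) for v in values])
    if mad == 0:
        # More than half the values are identical; no spread to judge by.
        return [False] * len(values)
    return [abs(v - center) / mad > threshold for v in values]


# Hypothetical phenotype measurements for six genotypes.
phenotype = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0]
flags = mad_outlier_flags(phenotype)
```

Only the last measurement is flagged here. Because the median and MAD are themselves insensitive to extreme values, the cutoff is not distorted by the outlier it is trying to detect, which is the main advantage over a mean and standard deviation rule.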

Associations between genome variants (SNPs) and phenotypes are typically made with the use of a linear mixed model as found in EMMAX (Kang et al., 2010). This results in multiple individual tests being performed, thereby introducing a multiple hypothesis testing bias, i.e., inflated type 1 error. This bias is often mitigated with one of several false discovery rate methods, including the Benjamini–Hochberg method (Benjamini and Hochberg, 1995).
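The Benjamini–Hochberg step-up procedure mentioned above is short enough to write out in full. The sketch below takes a list of p-values and a target false discovery rate and returns which tests are declared significant; the function name and the example p-values are illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans aligned with `p_values`; True marks a
    test declared significant at false discovery rate `alpha`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for index in order[:max_rank]:
        rejected[index] = True
    return rejected


# Hypothetical p-values from four SNP-phenotype tests.
significant = benjamini_hochberg([0.039, 0.001, 0.216, 0.008])
```

Note the step-up behavior: a p-value that fails its own rank threshold is still rejected if a larger p-value further down the sorted list passes its threshold.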

Networks can then be created in which the respective SNPs and phenotypes are nodes and an edge denotes a significant GWAS association between them. This enables the subsequent determination of whether or not phenotype-associated SNPs reside within genes and the creation of a subnetwork of gene-phenotype associations.
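Collapsing a SNP-phenotype network into a gene-phenotype subnetwork amounts to an interval lookup against a gene annotation. The sketch below assumes significant associations and gene coordinates are already available as simple tuples; all identifiers, coordinates, and phenotype names are hypothetical, and SNPs that fall outside annotated genes are simply dropped.

```python
def genes_containing(chromosome, position, genes):
    """Return IDs of genes whose interval contains the given position."""
    return [
        gene_id
        for gene_id, chrom, start, end in genes
        if chrom == chromosome and start <= position <= end
    ]


def gene_phenotype_edges(associations, genes):
    """Map significant SNP-phenotype associations to gene-phenotype edges.

    `associations` holds (snp_id, chromosome, position, phenotype) tuples
    that already passed the significance threshold.
    """
    edges = set()
    for _snp_id, chromosome, position, phenotype in associations:
        for gene_id in genes_containing(chromosome, position, genes):
            edges.add((gene_id, phenotype))
    return edges


# Hypothetical annotation: (gene_id, chromosome, start, end).
genes = [
    ("gene1", "Chr01", 1000, 5000),
    ("gene2", "Chr01", 8000, 9500),
]
# Hypothetical significant associations.
associations = [
    ("snp_a", "Chr01", 1200, "lignin content"),
    ("snp_b", "Chr01", 4800, "lignin content"),
    ("snp_c", "Chr01", 6000, "glucose release"),   # intergenic
    ("snp_d", "Chr01", 9000, "glucose release"),
]
edges = gene_phenotype_edges(associations, genes)
```

Two SNPs in the same gene associated with the same phenotype collapse to a single edge. A practical pipeline might also assign intergenic SNPs to nearby genes within some flanking window, which is a design choice this sketch leaves out.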

Layered Networks, LOE Scores, and New Potential Targets

Alternate lines of evidence about the relationships between genes, and between genes and phenotypes can be created using several sets of networks. We recently developed a Lines of Evidence scoring system (LOE scores) in order to quantify the number of lines of evidence connecting genes to phenotypes (Figure 3). The GWAS network layers provide functional information at various scales (from molecular to organismal to environmental), which reflect signaling cascades, biosynthetic pathway information, and various regulatory circuits. For example, the co-expression and co-methylation networks provide information from multiple regulatory layers within the cell and the SNP correlation network models putative co-evolution relationships between genes (Climer et al., 2014; Joubert et al., 2018; Weighill et al., 2018).

Figure 3. Schematic of the Lines of Evidence scoring system (LOE scores) used to quantify the number of lines of evidence connecting genes to phenotypes.

LOE (Lines of Evidence) scores for each gene can be calculated by starting with functions or topics of interest, revealing the strength of the evidence linking each gene to the function or topic of interest. The LOE breadth score quantifies the number of types of LOE (number of layers) that connect a gene to a topic or function, and the LOE depth score quantifies the total number of functions/topics with which a gene is associated. Individual layer LOE scores (e.g., for co-expression or GWAS) indicate the number of function/topic associations a gene has within that particular layer (Weighill et al., 2018).
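
A simplified reading of these definitions can be sketched in Python. Here each network layer is represented as a set of (gene, topic) edges, and the depth score is taken as the sum of the individual layer scores; both choices, along with the function name `loe_scores`, are assumptions for illustration and may differ from the published implementation (Weighill et al., 2018).

```python
def loe_scores(layers, gene):
    """Compute simplified LOE scores for one gene.

    layers: dict mapping layer name -> set of (gene, topic) edges
        (hypothetical layout).
    """
    # Layer score: number of distinct topics linked to the gene in that layer
    layer_scores = {
        name: len({topic for g, topic in edges if g == gene})
        for name, edges in layers.items()
    }
    # Breadth: how many layers provide at least one line of evidence
    breadth = sum(1 for score in layer_scores.values() if score > 0)
    # Depth (assumption): total associations summed across all layers
    depth = sum(layer_scores.values())
    return {"breadth": breadth, "depth": depth, "layers": layer_scores}


# Hypothetical layers connecting genes to topics of interest
layers = {
    "gwas": {("g1", "lignin"), ("g1", "S/G ratio"), ("g2", "lignin")},
    "coexpression": {("g1", "lignin")},
    "comethylation": {("g2", "cellulose")},
}
for gene in ("g1", "g2"):
    print(gene, loe_scores(layers, gene))
```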

This LOE approach provides a new way of exploring the vast data collections now being generated in biology. Any known genes, phenotypes, or annotation topics of interest can be provided as input. From the LOE scores, a rank-ordered list can be created of new candidate genes that have multiple lines of evidence supporting their involvement in the area of interest. This approach thus generates a prioritized list of targets for genetic modification via transformation, genome editing, selective breeding, etc., which can be used to validate and/or manipulate a phenotype or set of phenotypes.

Deeper Discoveries in Systems Biology: The Balance Between Type 1 and Type 2 Error

In a GWAS analysis that is done in isolation, there is often great concern about false positives, and stringent FDR thresholds are frequently applied. However, this overcompensation for type 1 error (false positives) will likely result in a large type 2 error (i.e., many false negatives). If one's goal is to create a systems biology model that captures as many biological interactions (e.g., protein-protein interactions, epistatic and pleiotropic interactions, biosynthetic regulators, etc.) as possible, this is a heavy price to pay. We are now using relaxed FDR thresholds in combination with LOE scoring of the resulting associations in order to strike an improved balance between type 1 and type 2 error, allowing for more comprehensive models of the entire biological system. As such, our ability to reconstruct the entirety of a complex biological system improves as more population-scale endo-, meso-, and exo-phenotypes are measured and combined with deep layers of experimental data collected on individual genotypes.
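
One way to picture this balance is a two-step filter: accept associations at a relaxed FDR cutoff, then retain only the genes supported by several lines of evidence. The sketch below is hypothetical throughout. The cutoff of 0.1, the minimum breadth of 2, the record layout, and the function name `prioritize` are assumptions, not the thresholds or procedure used by the authors.

```python
def prioritize(candidates, fdr_cutoff=0.1, min_breadth=2):
    """Rank candidate genes that pass a relaxed FDR cutoff and an LOE filter.

    candidates: list of dicts with keys 'gene', 'q_value' (FDR-adjusted
        p-value), and 'breadth' (LOE breadth score). Hypothetical layout.
    """
    kept = [
        c for c in candidates
        if c["q_value"] <= fdr_cutoff and c["breadth"] >= min_breadth
    ]
    # Most lines of evidence first; ties broken by the smaller q-value
    return sorted(kept, key=lambda c: (-c["breadth"], c["q_value"]))


# Hypothetical candidates
candidates = [
    {"gene": "g1", "q_value": 0.08, "breadth": 3},
    {"gene": "g2", "q_value": 0.001, "breadth": 1},
    {"gene": "g3", "q_value": 0.02, "breadth": 2},
    {"gene": "g4", "q_value": 0.30, "breadth": 4},
]
print([c["gene"] for c in prioritize(candidates)])
```

In this toy example, g1 would be lost at a stringent cutoff such as 0.05 but is retained because independent layers support it, whereas g2 passes even a strict cutoff yet is dropped for lack of corroborating evidence.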

Future Prospects for High Throughput Biomass and Data Analysis

This review has covered recent developments in the high throughput analysis of biomass composition and other properties that have been made possible through the use of robotics and miniaturized equipment, sophisticated computational tools, and rapid detection instrumentation, and by reductions in the sample size, preparation, manpower, and materials needed for analyses. There is still room for improvement in these processes, particularly in sample preparation throughput and the accuracy of results. Improvements to instrumentation will continue in an effort to enhance sensitivity, resolution, dynamic range, and robustness. Additionally, smaller, portable instruments that can be brought to the field would help reduce sample collection resources and errors. Processing parameters, scalability, and the effects of other variables, particularly those associated with deconstruction and conversion analyses, still require further investigation to establish accuracy and applicability to large-scale conditions. Comprehensive analysis of data using sophisticated computational tools could extend the capabilities of the associated analytical methods and instrumentation and provide a better understanding of biological systems as a whole.

Author Contributions

SD coordinated manuscript assembly and editing and provided content for the sections on 2-stage acid hydrolysis and HTP recalcitrance screening. RH provided content on NMR analysis of sugars and fatty acids. AH-W provided content on py-MBMS. EW provided content on NIR. DJ, DW, PJ, GT, and DK provided content on data management. MR and GO provided content on HTP recalcitrance screening.

Funding

Funding was provided by The Center for Bioenergy Innovation (CBI), a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

Portions of the manuscript have been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

This work was authored in part by Alliance for Sustainable Energy, LLC, the manager and operator of the National Renewable Energy Laboratory for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Bioenergy Technologies Office. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes. This research was also supported by the Department of Energy Laboratory Directed Research and Development funding (7758), at the Oak Ridge National Laboratory. Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the US DOE under contract DE-AC05-00OR22725.

The research described herein was supported by an award of computer time provided by the INCITE program and used resources of the Oak Ridge Leadership Computing Facility (OLCF) and the Compute and Data Environment for Science (CADES) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Support for the Poplar GWAS dataset was provided by The BioEnergy Science Center (BESC) and The Center for Bioenergy Innovation (CBI), U.S. Department of Energy Bioenergy Research Centers supported by the Office of Biological and Environmental Research in the DOE Office of Science. The Poplar GWAS Project used resources of the Oak Ridge Leadership Computing Facility and the Compute and Data Environment for Science at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

The JGI Plant Gene Atlas project conducted by the U.S. Department of Energy Joint Genome Institute was supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Full Gene Atlas data sets are available at: http://phytozome.jgi.doe.gov.

References

Abrams, S. M., Shenk, J. S., Westerhaus, M. O., and Barton, F. E. II (1987). Determination of forage quality by near infrared reflectance spectroscopy: efficacy of broad-based calibration equations. J. Dairy Sci. 70, 806–813. doi: 10.3168/jds.S0022-0302(87)80077-2