Comparison of Rapid Biodiversity Assessment of Meiobenthos Using MALDI-TOF MS and Metabarcoding

Rossel, Sven; Khodami, Sahar; Martínez Arbizu, Pedro

doi:10.3389/fmars.2019.00659

ORIGINAL RESEARCH article

Front. Mar. Sci., 01 November 2019

Sec. Marine Molecular Biology and Ecology

Volume 6 - 2019 | https://doi.org/10.3389/fmars.2019.00659

Sven Rossel¹*

Sahar Khodami¹

Pedro Martínez Arbizu¹,²

¹German Centre for Marine Biodiversity Research, Senckenberg am Meer, Senckenberg Gesellschaft für Naturforschung, Wilhelmshaven, Germany
²Marine Biodiversity Research, Institute for Biology and Environmental Sciences, Carl von Ossietzky University Oldenburg, Oldenburg, Germany

Most biodiversity assessments involving meiofauna are still carried out using very time-consuming, specimen-wise morphological identification, which demands comprehensive taxonomic knowledge. Animals have to be examined for minor differences in setal composition, mouthpart morphology, or the number of segments of various extremities. DNA-based methods such as metabarcoding, as well as recently emerged rapid analyses that use MALDI-TOF mass spectrometry to identify specimens based on a proteome fingerprint, could vastly accelerate specimen identification in biodiversity assessments. However, these techniques depend on reference libraries to connect collected data to morphologically described species. In this study, the success rate of both approaches was tested using reference libraries constructed from part of the samples from a new study area to identify unknown samples. Using MALDI-TOF MS, we found that species missing from an incomplete mass spectra reference library have only a minor impact on the results when a post hoc test for Random Forest classifications is employed. This test reveals specimens that demand morphological re-examination for the final species assignment. Metabarcoding, however, strongly depends on a rich reference library to provide MOTU assignments congruent with morphological determination. Nevertheless, with a complete library and a suitable data transformation [herein log(x + 1)], the number of reads per MOTU reflects relative species abundances in metabarcoding inference. The results of this study support specimen identification using MALDI-TOF MS, which is incomparably cheap for specimen-by-specimen identification; for sample-wise analyses, however, metabarcoding outperforms other techniques by far.

Introduction

Assessing species’ diversity, distribution, and community structure is crucial to understand the relationship of species to the surrounding environment. Moreover, monitoring of communities is necessary to detect the influence of environmental changes on species compositions. Therefore, accurate species identification is necessary for biodiversity research.

Morphological identification, in particular for the tiny meiofauna organisms, is very challenging and time-consuming (e.g., Brannock et al., 2014; Morad et al., 2017; Rzeznik-Orignac et al., 2017). An exact determination often demands dissection of the smallest appendages (Huys et al., 1996) and comprehensive taxonomic knowledge. Unfortunately, taxonomic identification using morphology alone has been shown to underestimate the true diversity compared to DNA-based methods (Tang et al., 2012), mainly because of cryptic diversity in many species (Knowlton, 1993).

In recent years, several methods have been introduced to improve and accelerate species identification. The most commonly applied method is COI mtDNA barcoding, based on amplification of a fragment of the mitochondrial cytochrome c oxidase subunit I gene (Hebert et al., 2003a, b). Large data sets for many groups of animals have been published (e.g., Knebelsberger et al., 2014; Barco et al., 2016) and have been applied for rapid and reliable species identification. However, this technique often demands taxon-specific optimization and several processing steps. Therefore, meiofauna barcoding surveys, which often focus on identification of voucher specimens only (Vogt et al., 2014; Avó et al., 2017), may underestimate the true diversity (Tang et al., 2012) and consequently the importance of meiofauna.

Next generation sequencing approaches such as metabarcoding allow community analysis using batch samples (e.g., Taberlet et al., 2012; Leray and Knowlton, 2015; Fonseca V. et al., 2017). This not only reduces expenses but also cuts the effort of extraction, fragment amplification, purification, and sequencing from several hundred specimens to the processing of whole samples in DNA-based biodiversity assessments. However, to reliably identify species from bulk samples, well-curated reference libraries are essential (e.g., Fonseca G et al., 2017) to connect the obtained OTUs to morphospecies.

Another promising method for species identification in biodiversity assessments originates from the field of microbiology. Matrix-Assisted Laser Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) is commonly used to identify bacteria, viruses or fungi (Moura et al., 2008; La Scola et al., 2010; De Bruyne et al., 2011) based on a proteomic fingerprint. Also, pilot studies were carried out for several animal groups like fish (Mazzeo et al., 2008), dipterans (Feltens et al., 2010; Kaufmann et al., 2012) or copepods (Laakmann et al., 2013; Rossel and Martínez Arbizu, 2018a, 2019).

The method provides a fast and reliable workflow for specimen-by-specimen identification at low cost. Studies on copepods and dipterans showed the ability to differentiate between cryptic species and the congruence of COI barcode and MALDI-TOF MS-based species delimitation (Müller et al., 2013; Bode et al., 2017). In the case of whole specimen extraction from copepods, exuviae were retained and could be examined afterward for morphological identification (Laakmann et al., 2013; Rossel and Martínez Arbizu, 2018a, b). In contrast to the morphospecies assignments often used in ecological studies, MALDI-TOF MS provides mass spectra that are comparable between studies.

Faunistic assessments are very important for understanding species relationships to the environment, and DNA-based delimitation methods are widely used. Although the accuracy of metabarcoding and MALDI-TOF MS strongly depends on reference libraries, DNA databases are remarkably poor for meiofauna organisms, even from well-studied areas like the North Sea (e.g., Vogt et al., 2014). Therefore, in this study, we aim to compare the efficiency of metabarcoding and MALDI-TOF MS for community studies of previously unsampled study sites by sampling a tidal flat, using some samples to build DNA and mass spectra libraries and others to assess biodiversity and community structure. The techniques are analyzed in terms of identification success compared to morphological examination and in terms of expenses.

Materials and Methods

Sampling and Storage

Samples were taken by hand with a syringe (Ø 3.1 cm, 5 cm depth) during low tide at a tidal flat (53°38′40.2′′ N 8°04′57.6′′ E) in front of the village Hooksiel in the littoral zone of the German North Sea coast on 19 April 2017. Twelve sandy sediment samples were fixed in absolute ethanol and stored overnight at −25°C.

Samples were sieved through a 40 μm sieve and density-gravity centrifuged according to McIntyre and Warwick (1984) employing Kaolin and Levasil® (Kurt Obermeier GmbH & Co. KG, Bad Berleburg, Germany). Until further processing, samples were stored at −25°C in absolute ethanol.

Adult specimens were sorted on ice from centrifuged samples and morphologically identified using different identification keys (Lang, 1948; Huys et al., 1996; Wells, 2007).

Five samples were used for metabarcoding, five for MALDI-TOF MS measurements, and two for the construction of 18S rRNA and mass spectra libraries (Figure 1).

FIGURE 1

Figure 1. Workflow of community identifications using metabarcoding and MALDI-TOF MS. DNA and MS reference libraries were constructed from specimens of two samples. The DNA library was used to assign species to Illumina MiSeq sequence reads from pooled samples. With an incomplete library, however, this results in an unresolved community containing unidentified MOTUs. Based on the MS reference library, a Random Forest model is constructed to classify all specimens from the samples. The post hoc test is then used to validate or reject classifications. Rejected classifications are re-examined morphologically and subsequently assigned to the correct species. The resulting community consists of specimens identified by MS and specimens identified morphologically.

Specimen Processing and Measurements for MALDI-TOF MS

Individual specimens were separated into 1.5 ml microcentrifuge tubes with 0.5 μl absolute ethanol. After complete evaporation of ethanol, 2 μl of a matrix solution containing α-Cyano-4-hydroxycinnamic acid (HCCA) as a saturated solution in 50% acetonitrile, 47.5% molecular grade water and 2.5% trifluoroacetic acid were added. After 5 min of incubation, the solution was applied to one spot for crystallization on a target plate.

Samples were measured using a Microflex LT/SH System (Bruker Daltonics), employing the flexControl 3.4 (Bruker Daltonics) software. Masses were measured from 2 to 20 kDa. For peak evaluation, the mass peak range from 2 to 10 kDa was analyzed using a centroid peak detection algorithm, a signal-to-noise threshold of 2, and a minimum intensity threshold of 600, with a peak resolution higher than 400. The Proteins/Oligonucleotide method was employed for fuzzy control with a maximal resolution 10 times above the threshold. For a sum spectrum, 240 satisfactory shots were summed up. One mass spectrum was measured for each specimen.

If retained after extraction of proteomic data, exuviae were stored in 70% ethanol at Senckenberg German Centre for Marine Biodiversity Research (DZMB, Wilhelmshaven, Germany).

DNA Isolation, PCR Amplification, and Sequencing for Reference DNA Libraries

A fragment of the 18S rRNA gene (V1 and V2 hypervariable regions, ∼380 bp) was amplified using the SSU-F04 forward primer (5′-GCTTGTCTCAAAGATTAAGCC-3′) and the SSU-R22 reverse primer (5′-GCCTGCTGCCTTCCTTGGA-3′) (Blaxter et al., 1998); the resulting sequences were used as the reference library for metabarcoding.

DNA was extracted from individual specimens using 20 μl InstaGene matrix (Bio-Rad Laboratories, Munich, Germany) in a vapo.protect Mastercycler pro S Cycler (Eppendorf, Hamburg, Germany) for 50 min at 56°C and 10 min at 96°C. To use a single specimen for both proteomics and DNA sequencing, the prosome was used for MALDI-TOF MS and the urosome for DNA extraction. If retained after DNA extraction, exuviae were stored in 70% ethanol at Senckenberg German Centre for Marine Biodiversity Research.

Amplification of DNA fragments was done in a total reaction volume of 20 μl, containing 10 μl AccuStart II PCR ToughMix (QuantaBio, Beverly, MA, United States), 0.2 μl primers (20 pmol/μl), 2 to 5 μl DNA extract and the respective amount of molecular grade water. Amplifications were carried out in a vapo.protect Mastercycler pro S (Eppendorf, Hamburg, Germany) for both gene fragments.

Cycler settings for amplification of 18S fragment were: an initial denaturation step at 94°C for 3 minutes (min), denaturation at 94°C for 30 seconds (s), annealing at 57°C for 50 s and elongation at 72°C for 60 s. In total, 35 amplification cycles were carried out ending in a final elongation step for 2 min at 72°C.

Negative control samples were included in all amplification runs. From each PCR product, 2 μl were verified for size conformity by electrophoresis in a 1% agarose gel stained with GelRed™ using commercial DNA size standards.

PCR products were purified and sequenced at a contract sequencing facility (Macrogen Europe, Amsterdam, Netherlands). Trace files were assembled with SeqTrace (Stucky, 2012) and aligned in SeaView (Gouy et al., 2010). Sequences were checked for the amplification of the correct gene fragment by Blast search (Zhang et al., 2000; Morgulis et al., 2008).

18S rRNA gene fragments were amplified from seven harpacticoid species present in the NGS library: Tachidius discipes Giesbrecht, 1881; Harpacticus flexus Brady & Robertson, 1873; Asellopsis intermedia (Scott, 1895); Laophonte sp.; Platychelipus littoralis Brady, 1880; Delavalia palustris Brady, 1868; and Ectinosomatidae sp. DNA amplification failed for the single specimen of D. palustris; hence, the 18S fragment was amplified from specimens of a former project (Table 1). Two of the 18S sequences from the selected specimens showed lower than optimal quality (Q30) at both the 5′ and 3′ ends of the sequence, immediately after and before the primer binding regions. To produce a reference library covering the full fragment size of the NGS libraries, the representative sequence of each species (with a 100% similarity match) was retrieved from the Illumina reads.

TABLE 1

Table 1. Specimens for which an 18S gene fragment was amplified with the according GenBank accession numbers.

Processing of MALDI-TOF Mass Spectra

Mass spectrometry data from all samples were processed together in R (version 3.2.3; R Core Team, 2018) using the packages ‘MALDIquant’ (Gibb and Strimmer, 2012) and ‘MALDIquantForeign’ (Gibb, 2015). Protein mass spectra were trimmed to an identical range from 2,000 to 20,000 m/z and smoothed with the Savitzky-Golay method (Savitzky and Golay, 1964). The baseline was removed using the SNIP baseline estimation method (Ryan et al., 1988), and spectra were normalized using the TIC method implemented in MALDIquant. Noise estimation was carried out with a signal-to-noise ratio (SNR) of 7. Peaks were repeatedly binned using the command ‘binPeaks’ with a tolerance of 0.002 in a strict approach, which reduced the number of peaks for the whole data set from 9344 to 652. The resulting intensity matrix was Hellinger transformed (Legendre and Gallagher, 2001) for further use in Random Forest (RF) (Breiman, 2001) analysis.

Random Forest Analysis

Based on the library (n = 115) an RF model was calculated using ‘randomForest’ R package (Liaw and Wiener, 2002) to classify specimens from the test samples (Figure 1). The model was generated using 2,000 trees with 35 analyzed characters at each tree split. To prevent overestimation of frequent species, ‘sampsize’ was set to 1.

RF classifications were tested using the post hoc test sensu Rossel and Martínez Arbizu (2018a) with a 1% quantile as the threshold for false positive classification. The function rf.post.hoc is available from the R package Rftools (Martínez Arbizu and Rossel, 2018). RF classifications rejected by the post hoc test were morphologically re-examined (Figure 1). For species with only one specimen in the reference library, a post hoc test could not be carried out. Hence, all specimens classified as this species were automatically regarded as false positive classifications and subsequently re-examined morphologically (Figure 1).
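The decision rule described above can be illustrated with a simplified sketch. It is not the Rftools implementation of rf.post.hoc, whose exact statistic is not given in this section. The sketch assumes that each classified specimen comes with the share of RF votes for its predicted species, and that the reference library provides the vote shares its own specimens received for their true species; the function names, the input layout, and the use of an empirical quantile are all assumptions made for illustration.

```python
def empirical_quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def post_hoc_flag(predicted_species, vote_share, library_votes, q=0.01):
    """Return True if a classification should be re-examined morphologically.

    library_votes (hypothetical input) maps each species to the vote shares
    that library specimens of that species received for their own class.
    q is the quantile used as threshold (1% in the text).
    """
    reference = library_votes.get(predicted_species, [])
    if len(reference) < 2:
        # One reference specimen gives no distribution to test against, so
        # the classification is treated as a false positive by default.
        return True
    return vote_share < empirical_quantile(reference, q)


# Illustrative vote shares, not data from the study.
library_votes = {
    "H. flexus": [0.91, 0.88, 0.95, 0.93, 0.90],
    "Laophonte sp.": [0.80],
}
print(post_hoc_flag("H. flexus", 0.92, library_votes))      # typical vote share
print(post_hoc_flag("H. flexus", 0.35, library_votes))      # unusually low share
print(post_hoc_flag("Laophonte sp.", 0.99, library_votes))  # single reference
```

A flagged specimen is not necessarily misclassified; the flag only marks it for morphological re-examination, which then confirms or corrects the RF assignment.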

NGS, Library Preparation

Five samples were prepared for metabarcoding containing selected specimens which were identified to the species level using a dissecting microscope. Specimens from one sample were pooled together and DNA was extracted from the pooled sample (Fontaneto et al., 2015) using E.Z.N.A. Mollusc DNA Kit (Omega Bio-tek).

A short fragment (∼380 bp) including the hypervariable regions V1 and V2 of the 18S gene (see Hadziavdic et al., 2014) was amplified using the primers SSUF04 and SSUR22 (Blaxter et al., 1998), to which the Nextera-compatible Illumina adapter overhang sequences were added at the 5′ end of the locus-specific primers following the Illumina 16S Metagenomic Sequencing Library Preparation guide (15044223 Rev. B), resulting in the following composite primers:

5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-SSUF04,

5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-SSUR22.
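As a small illustration of how the composite primers are assembled, the sketch below joins each adapter overhang to the 5′ end of the corresponding locus-specific primer, using the sequences given in the text. The helper name composite_primer and the base check are additions for illustration only.

```python
# Illumina adapter overhangs as listed in the text (5' to 3').
OVERHANG_FWD = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
OVERHANG_REV = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Locus-specific 18S primers as listed in the text (5' to 3').
SSU_F04 = "GCTTGTCTCAAAGATTAAGCC"
SSU_R22 = "GCCTGCTGCCTTCCTTGGA"


def composite_primer(overhang, locus_primer):
    """Attach the adapter overhang to the 5' end of a locus-specific primer."""
    sequence = overhang + locus_primer
    invalid = set(sequence) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected characters in primer: {sorted(invalid)}")
    return sequence


forward = composite_primer(OVERHANG_FWD, SSU_F04)
reverse = composite_primer(OVERHANG_REV, SSU_R22)
print(forward, len(forward))
print(reverse, len(reverse))
```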

The first PCR was performed using the composite primers and Phusion High-Fidelity PCR Master Mix (Thermo Fisher) with 2 μl of template DNA and 0.5 μl of each primer (10 pmol) in a final reaction volume of 20 μl, with the following protocol: initial denaturation for 2 min and 10 cycles of denaturation at 98°C for 15 s, annealing at 62°C for 30 s, and extension at 72°C for 30 s, followed by a final extension of 4 min.

Five μl of the first PCR products were purified using 2 μl of ExoSAP-IT PCR Product Cleanup Reagent (Thermo Fisher), with incubation for 5 min at 37°C and 1 min at 80°C.

A second PCR using Nextera XT V2 Indexed primers (dual indexing approach) was performed using 7 μl of the purified first PCR amplicons and 0.5 μl of each primer (10 pmol) in a final volume of 20 μl, with 10 amplification cycles using the PCR protocol above. Incorporation of the full Illumina adapter into the amplicon was checked by comparing the length of the fragments of the first and second PCR in a 1% agarose gel stained with 1% Gel Red. The amplicons from the second PCR were purified from the 1% agarose gel using the Monarch DNA Gel Extraction Kit (NEB).

NGS, Sequencing, and Bioinformatics

The DNA concentration of each amplicon from the second PCR was measured after extraction from the agarose gel with a Qubit Fluorometer using the Qubit dsDNA High Sensitivity (HS) Assay (Thermo Fisher). For normalization, libraries were diluted to a concentration of 4 nM and pooled together with libraries from another project at equal molarity. Five μl of the 4 nM pooled library was denatured into single-stranded DNA using 5 μl of 0.2 M NaOH and diluted to a concentration of 20 pM using 990 μl of Hyb1 buffer (Illumina). To increase the diversity in the first sequencing runs, 10% volume of 20 pM PhiX control was spiked into the library. The spiked library was further diluted to 17 pM prior to sequencing on a MiSeq using V3 chemistry and a 301 paired-end read frame.
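The dilution steps can be checked with simple C1·V1 = C2·V2 arithmetic. The sketch below reads the stated loading concentrations as picomolar (pM) values, which is an assumption about the intended units, and the 600 μl final volume used for the last step is a hypothetical value chosen only to make the example concrete.

```python
def diluted_concentration_pM(stock_nM, stock_ul, total_ul):
    """Concentration in pM after diluting stock_ul of stock into total_ul."""
    return stock_nM * 1000.0 * stock_ul / total_ul


def stock_volume_ul(stock_pM, target_pM, final_ul):
    """Volume of stock needed to reach target_pM in a final volume of final_ul."""
    if target_pM > stock_pM:
        raise ValueError("target concentration exceeds stock concentration")
    return target_pM * final_ul / stock_pM


# 5 ul of 4 nM library + 5 ul of 0.2 M NaOH + 990 ul of Hyb1 buffer.
after_denaturation = diluted_concentration_pM(4.0, 5.0, 5.0 + 5.0 + 990.0)
print(after_denaturation)  # 20.0 (pM)

# Further dilution from 20 pM to 17 pM; final_ul is a hypothetical volume.
final_ul = 600.0
library_ul = stock_volume_ul(after_denaturation, 17.0, final_ul)
buffer_ul = final_ul - library_ul
print(library_ul, buffer_ul)
```

Spiking PhiX at the same concentration as the library leaves the total concentration unchanged, so only the final dilution step alters the loading concentration.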

The Illumina reads were processed using VSEARCH (Rognes et al., 2016) following the pipeline found at https://github.com/torognes/vsearch/wiki/VSEARCH-pipeline. Initially, paired-end reads were merged into contigs for each sample with the following settings: expected mean size 400 bp, allowed variation ± 50 bp, minimum contig overlap 50 bp, and maximum allowed differences in the contig 15 bp. Contigs were quality filtered with a maximum of 0.5 expected errors after contig creation and no ambiguities allowed in the contig. Contigs were then de-replicated, requiring a minimum count of two to retain a unique sequence. The initial OTU library was created by pooling all de-noised and de-replicated reads from all samples and performing an initial clustering at 98% similarity. Chimeras were then detected and excluded. This final OTU library was used to create an OTU table by assigning the reads of the individual samples to OTUs at a 97% similarity threshold. The generated OTUs were matched to the reference library at a minimum identity of 97% and 100% length coverage.

The OTU table was analyzed with the R package vegan (Oksanen et al., 2013). We tested four different transformations of the data in order to check which of them produces the highest similarity between the community structure present in the sample (number of counts per specimen) and the results of metabarcoding. The transformations applied were Chord (divide by total n per sample), Hellinger (square root of Chord), log(x + 1), and presence/absence. For each transformation, the Spearman rank correlation between the similarity of samples based on morphological identification and the similarity based on metabarcoding results was tested with the Mantel test (function mantel in vegan). Community similarity was depicted using bar plots and non-metric multidimensional scaling (nMDS) (function metaMDS in vegan).
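The four transformations can be written out in a few lines. The sketch below follows the definitions given in the text (division by the sample total, its square root, log(x + 1), and presence/absence) and is not a reproduction of the vegan code used in the study; the function names and the example counts are illustrative assumptions.

```python
import math


def total_scaling(counts):
    """Divide each count by the sample total (called 'Chord' in the text)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def hellinger(counts):
    """Square root of the total-scaled counts."""
    return [math.sqrt(p) for p in total_scaling(counts)]


def log_x_plus_1(counts):
    """Natural logarithm of (count + 1); zero counts stay zero."""
    return [math.log(c + 1) for c in counts]


def presence_absence(counts):
    """1 where a species was detected, 0 otherwise."""
    return [1 if c > 0 else 0 for c in counts]


# Hypothetical sample: reads per species and specimens per species.
reads = [120000, 45000, 1986, 34, 0]
specimens = [12, 7, 1, 1, 0]

for name, transform in [("total", total_scaling), ("hellinger", hellinger),
                        ("log(x+1)", log_x_plus_1), ("p/a", presence_absence)]:
    print(name, transform(reads), transform(specimens))
```

Because read counts span several orders of magnitude while specimen counts do not, the logarithm compresses the reads onto a scale closer to that of the specimen counts, which is one plausible reason for the better agreement reported for log(x + 1).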

Comparison of Expenses

Expenses of the different methods were calculated including kits, chemicals, and prices for sequencing, but excluding costs for pipette tips, microcentrifuge tubes, and instruments. For DNA barcoding, cost-effective chelex DNA extraction, plate purification, and sequencing services were considered in the calculations. For MALDI-TOF MS, expenses were analyzed for approaches using different volumes of matrix to adjust for specimen size and for matrix prices from different providers. For metabarcoding, different index kits for the simultaneous analysis of 24 or 96 samples were compared. Detailed information on the calculation of prices can be found in the Supplementary Material (Supplementary Tables S1–S3).

Results

MALDI-TOF MS

The RF analysis was carried out based on a library containing 115 specimens of eight species (Table 2), including specimens for which DNA was extracted and a mass spectrum was measured simultaneously. However, test samples contained two further species, which were not part of the library (Enhydrosoma gariene Gurney, 1930, P. littoralis). Hence, the library was considered incomplete. Abundances in the test samples analyzed ranged from one specimen for Laophonte sp. and Ectinosomatidae sp. 38 to 19 for H. flexus (Table 2).

TABLE 2

Table 2. Number of specimens identified for all samples morphologically (M) and classified by Random Forest (RF).

The incomplete library was used to calculate an RF model (n = 115, training error: 0.87%). This model was employed to classify specimens from the test samples (Table 2: RF). In total, five of 133 specimens (3.76%) were misclassified by the RF model because the correct species was not part of the initial model. In the classification approach, one E. gariene and three P. littoralis specimens were misclassified as Laophonte sp., and a fourth P. littoralis specimen was classified as Microarthridion fallax Perkins, 1956. However, all of these classifications were rejected by the post hoc test (Tables 2, 3: FP) and could be morphologically re-examined to verify that they belonged to species absent from the reference library (Table 2: FP – reviewed). Apart from these, all specimens were classified correctly by Random Forest, including those of species for which only one specimen was contained in the reference library (Laophonte sp., Ectinosomatidae sp. 38, and D. palustris). As described above, these were automatically regarded as false positives by the post hoc test and hence had to be re-examined morphologically. This confirmed the initial RF classification.

TABLE 3

Table 3. Number of specimens present in the samples used for metabarcoding and the resulting number of reads per sample.

Metabarcoding

The final reference library generated by Sanger sequencing produced reference sequences ranging from 344 bp in Laophonte sp. to 369 bp in D. palustris. The pure pairwise genetic distances between the seven species of the library range from a minimum of 5% to a maximum of 14.1%. This validated our OTU clustering threshold of 97% similarity.

The MiSeq run had a cluster density of 1,024 K/mm², generating 25.8 million paired-end reads (92% of clusters passing filter). Of those, 69% (23.8 million) passed the read filter and were de-multiplexed. The Hooksiel samples were multiplexed with a plankton metabarcoding project in the same run. A total of 5.6 million reads (23%) (between 3 and 5%, or 0.71–1.2 million reads, per sample) were assigned to this study. Samples used for metabarcoding are called AS, AT, AU, AV, and AW. Table 3 shows these samples as columns with the suffix ‘_r’ for reads from metabarcoding or ‘_s’ for specimens present in the sample. As DNA amplification from the only specimen of D. palustris failed, no reference for this species was available. Hence, metabarcoding based on the incomplete reference library failed to identify all species from the samples. However, the sequences of this species were assigned to a discrete MOTU, which could be further used for database searches or in general biodiversity assessments and comparisons. Adding genetic information for D. palustris from a previous project allowed further analyses of the data. With this, all seven species present in the samples were detected by metabarcoding. However, H. flexus and Laophonte sp., which were not included in sample AU, were recovered in the metabarcoding library of that sample with 14 reads each. Likewise, P. littoralis was recovered with 3 and 1 reads from samples AT and AW, respectively, although it had no representative in these libraries. Figure 2 shows the proportion of every species in the samples, displaying the results of morphological identification (‘_s’) and the number of metabarcoding reads (‘_r’) side by side to facilitate comparison. While Chord and Hellinger transformations show a certain similarity between the relative abundances of reads and specimens, the absolute values of relative abundances in a species-by-species comparison are still highly divergent. Remarkably, the relative abundance of A. intermedia (blue color in Figure 2) is consistently underestimated by the number of metabarcoding reads, while T. discipes (yellow) is overrepresented by the number of reads. The best agreement between morphological community structures and metabarcoding results was derived from a log(x + 1) transformation of the data. The agreement in the relative proportions of number of specimens and number of reads is high across samples and species. The presence/absence transformation shows exact agreement for the samples AS and AV, and slight disagreement in samples AT, AU, and AW, as explained above.

FIGURE 2

Figure 2. Comparison of the number of reads (‘_r’) to the number of morphologically identified specimens/species (‘_s’) for all samples with four different data transformations. Of the four transformations, log(x + 1) transformation and working with presence/absence data reflect the relative abundance and the presence of species best.

The Mantel test shows a significant Spearman correlation between the sample similarities based on morphological assignment and those based on metabarcoding (values shown in Figure 3) for the Hellinger (r = 0.85, p = 0.042) and log(x + 1) (r = 0.9, p = 0.008) transformations, with log(x + 1) outperforming the competing Hellinger transformation. However, Chord and presence/absence show no significant correlation in the Mantel test.
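For readers unfamiliar with the Mantel test, the sketch below shows a permutation-based version with Spearman rank correlation in plain Python. It is a didactic approximation of what vegan's mantel function computes, not the code used in the study; the matrices in the example are hypothetical, and the inputs are assumed to be symmetric distance matrices for the same samples in the same order.

```python
import random


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant distances")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (vx * vy) ** 0.5


def lower_triangle(matrix, order):
    """Lower-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def mantel_spearman(d1, d2, permutations=999, seed=1):
    """Return (Spearman r, one-sided permutation p-value) for two matrices."""
    identity = list(range(len(d1)))
    y = ranks(lower_triangle(d2, identity))
    observed = pearson(ranks(lower_triangle(d1, identity)), y)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute the samples of the first matrix
        if pearson(ranks(lower_triangle(d1, perm)), y) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical distances between four samples under two treatments.
morph = [[0, 1, 2, 4], [1, 0, 3, 5], [2, 3, 0, 6], [4, 5, 6, 0]]
reads = [[0, 2, 3, 9], [2, 0, 5, 11], [3, 5, 0, 14], [9, 11, 14, 0]]
print(mantel_spearman(morph, reads))
```

With only five samples there are just 120 possible permutations, so the smallest attainable p-value is limited by the sample count rather than by the number of random permutations requested.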

FIGURE 3

Figure 3. nMDS plots depicting reads per sample (‘_r’) and morphological species (‘_s’) for the four different data transformations. Again, presence/absence and log(x + 1) transformed data are most congruent when comparing specimen and species abundances/presence.

The nMDS plots (Figure 3) show similarities of the samples based on community composition, comparing morphological assessment and metabarcoding on the same ordination. The nMDS of the Chord transformed data appears visually as the worst of the three quantitative transformations (evidenced also by the non-significant Mantel test). There is a rough agreement in the relative position of AU (green), AV (purple), and AS (blue), but AW (orange) and AT (red) show disagreement in their relative position between treatments. Interestingly, the Hellinger transformation separates the treatments in ordination space, with the samples from metabarcoding (circles) showing less multivariate dispersion and being located in the top-right quarter of the plot. The relative position of the samples to each other agrees well between treatments. The log(x + 1) nMDS plot shows the best agreement in the position of the samples in ordination space between the treatments. The sample pair AV and AU appears close together, and AT and AS show a similar pattern. The nMDS with presence/absence data shows complete agreement in samples AV and AS and lacks agreement in all others.

Discussion

MALDI-TOF MS

Based on MALDI-TOF mass spectra using RF and the post hoc test, all specimens were correctly identified. In contrast to time-demanding morphological identification, the detection of false positive RF classifications by the post hoc test greatly simplified and enhanced the assessment of community composition and diversity of the samples. Retaining the exuviae during the processing of specimens for MALDI-TOF MS ensured the correct taxonomic assignment. An incomplete reference library does lead to misclassification of specimens from the test samples as one of the species available in the model. However, all of these classifications were recognized by the post hoc test, and the subsequent morphological re-examination identified these specimens as species absent from the library. It was therefore shown that classification by RF combined with the post hoc test is also able to reveal species that are new to the dataset. The results demonstrate the power of MALDI-TOF MS in combination with RF and the post hoc test for species identification based on an incomplete library, as may be encountered when a new research area is accessed. According to the results, an incomplete library does not necessarily lead to undetected misclassification of new species, which likely happens in unsupervised approaches like clustering algorithms (Collins and Cruickshank, 2013).

Therefore, even in new study areas, MALDI-TOF MS was found to be applicable by setting up a library from parts of the samples, accelerating biodiversity assessments through fast and reliable species identification compared to morphological assessments. Although one specimen per species was sufficient to create an RF model that successfully classified all specimens of that species, using more specimens will strengthen the RF model and make the post hoc test robust to variability in the measured data.

Our results further align with low identification errors from other field studies of Bode et al. (2017) and Kaiser et al. (2018) on calanoid copepods or by Kaufmann et al. (2012) on biting midges (Diptera). However, most studies about metazoans using MALDI-TOF MS for species identification were pilot studies which did not provide biodiversity assessments (Kaufmann et al., 2012; Volta et al., 2012; Laakmann et al., 2013; Müller et al., 2013; Steinmann et al., 2013; Yssouf et al., 2013, 2014a,b; Dieme et al., 2014; Mathis et al., 2015; Hynek et al., 2018), making it difficult to compare results to existing literature. Nevertheless, all studies carried out so far provide high identification success making MALDI-TOF MS a promising tool for biodiversity assessments.

Metabarcoding

Our study showed the ability of metabarcoding to detect all species present in the samples, including low-abundance species, as can be seen in Table 3. However, only a complete reference library can assure correct taxonomic assignment of the detected MOTUs to species. Because the public DNA repositories contain only partial records, especially for meiofauna communities, enriching the reference libraries is crucial to ensure that the diversity obtained by a metabarcoding approach reflects the actual diversity. In this study, the reads of D. palustris, the species missing from the reference library during the initial analyses, were clustered into a discrete MOTU, which further confirms the ability of metabarcoding using the V1 and V2 hypervariable regions of 18S at 97% pairwise similarity to discriminate the sequences into species. In congruence with previous metabarcoding studies (Guardiola et al., 2015; De Faria et al., 2018; Günther et al., 2018), we further suggest the use of the 18S gene, especially for metabarcoding of meiofauna communities, in which amplification of the COI barcode region has been shown to be less successful for some organisms such as nematodes (Haenel et al., 2017). Laophonte sp. and D. palustris were represented by a single specimen in the libraries AS, AT, AW and AT, AW, respectively, and were recovered with numbers of reads ranging from 34 to 1986. This suggests that, apart from biomass, which can strongly alter the number of reads (evident for A. intermedia, H. flexus, and T. discipes), other factors can also influence the absolute number of reads per MOTU.

As can be seen from Table 3, three species that were not present in the respective samples were recovered by metabarcoding with low numbers of reads. Sample AU contained no H. flexus and no Laophonte sp., yet both species were recorded with 14 reads each. P. littoralis, which had no specimen in samples AT and AW, was recorded with 3 and 1 reads, respectively. There are a variety of possible reasons for the presence of these reads in the samples, the most probable in our view being misassignment during de-multiplexing due to sequencing errors in the index regions (Kircher et al., 2011) or tag jumping (Schnell et al., 2015). The wrong assignment of species to samples could be overcome by establishing a minimum number of reads required for a positive species assignment. The read numbers of the wrong assignments are the lowest in the dataset (Table 3), ranging from 1 to 14 reads, which is less than the 34–1986 reads obtained when a single specimen was present (see above). However, we do not have an objective method for establishing a fixed threshold value for eliminating these reads. The threshold may change from study to study and even between sequencing runs of the same library. The wrong assignment of these species is the reason for the low performance of the presence/absence transformation, which would otherwise be very promising.
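The idea of a minimum read threshold can be illustrated with a minimal Python sketch. The read counts are those quoted above; the function name `filter_by_min_reads` and the cut-off of 20 reads are hypothetical choices made only for illustration. Any value above 14 and not exceeding 34 would separate the two groups in this dataset, and, as stated above, no objective method for fixing such a value is available.

```python
# Read counts quoted in the text for species that were absent from the sample
spurious_reads = [14, 14, 3, 1]
# Lowest read count quoted for a species present with a single specimen
single_specimen_min = 34

# Hypothetical cut-off, chosen only because it lies between 14 and 34
MIN_READS = 20


def filter_by_min_reads(read_counts, min_reads):
    """Keep only the MOTUs whose read count reaches min_reads."""
    return {motu: n for motu, n in read_counts.items() if n >= min_reads}


# Sample AU: both species were absent morphologically but had 14 reads each
sample_au = {"H. flexus": 14, "Laophonte sp.": 14}
kept_in_au = filter_by_min_reads(sample_au, MIN_READS)

# All spurious counts fall below the cut-off, true single specimens do not
all_spurious_removed = all(n < MIN_READS for n in spurious_reads)
single_specimen_kept = single_specimen_min >= MIN_READS
```

With this cut-off both spurious detections in sample AU are dropped, while a species at the lower end of the single-specimen range (34 reads) would be retained. A different sequencing run could require a different value.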

Of the three quantitative transformations tested here, log(x + 1) shows the best correlation between the similarity of samples based on morphological assignment and on metabarcoding. Both the bar plot (Figure 2) and the multivariate nMDS plot (Figure 3) show the greatest agreement with this transformation. Remarkably, the comparison of community structure between samples leads to the same ecological conclusion with both treatments: samples AS and AW are placed close together, while samples AU and AV are more divergent. Although there seems to be agreement that metabarcoding cannot estimate the absolute abundance of species in the samples (Elbrecht and Leese, 2015), our results show that the log(x + 1) transformation estimates at least the relative abundance of the species within samples accurately. The number of reads is most probably correlated with biomass rather than abundance. The relatively small difference in biomass between the harpacticoid species analyzed here could be the reason for the high quantitative agreement between abundances and numbers of reads after log(x + 1) transformation.
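The effect of the log(x + 1) transformation on between-sample similarity can be sketched in a few lines of Python. The read counts below are hypothetical and are not taken from Table 3, and Bray-Curtis similarity is used here only as an assumed, commonly used community similarity measure; the function names are illustrative.

```python
import math


def log1p_transform(counts):
    """Apply log(x + 1) to every read count; zero counts stay zero."""
    return [math.log(x + 1) for x in counts]


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (1 - dissimilarity) of two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return 1.0 - numerator / denominator if denominator else 1.0


# Hypothetical read counts per MOTU in two samples (same MOTU order)
sample_1 = [12000, 300, 0, 45]
sample_2 = [900, 250, 14, 60]

raw_similarity = bray_curtis_similarity(sample_1, sample_2)
log_similarity = bray_curtis_similarity(
    log1p_transform(sample_1), log1p_transform(sample_2)
)
```

In this made-up example one MOTU dominates the raw counts and drives the raw similarity down, whereas the transformed values give more weight to the shared, less abundant MOTUs. This only illustrates the mechanics of the transformation and says nothing about the results reported here.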

MALDI-TOF MS vs. Barcoding vs. Metabarcoding

In this study, both MALDI-TOF MS and metabarcoding yielded reliable species identifications for biodiversity assessment in an unexplored study area. Our results show that metabarcoding finds all of the species recorded morphologically if they are contained in a previously generated reference library. However, in studies lacking a rich reference library, metabarcoding can fail to connect the obtained sequences to morphospecies. Nevertheless, metabarcoding of bulk samples has been found to reveal overlooked diversity (Leray and Knowlton, 2015; Fonseca V. et al., 2017); for example, Platyhelminthes were found abundantly in metabarcoding studies but less frequently in morphological analyses (Blaxter, 2016).

Regarding DNA barcoding, the delimitation thresholds defined by the user may under- or overestimate diversity (Carugati et al., 2015) because of varying inter- and intraspecific genetic distances (Bucklin et al., 2010). In addition, the effort required for DNA barcoding, which involves sorting out every individual, DNA extraction and PCR, is considerably higher than for either metabarcoding or MALDI-TOF MS.

MALDI-TOF MS and COI barcoding, however, can only analyze the submitted specimens; any diversity beyond this is disregarded. An assignment that was overlooked because very similar morphospecies went undetected can be re-checked afterward for minor morphological differences. The assessed data therefore always relate to an actual specimen and not only to a substitute DNA sequence, fostering a better understanding.

Connectivity, population structure and phylogenetic relationships can only be analyzed using DNA-based methods and additional genetic markers amplified from available DNA extracts (Selkoe et al., 2016). Therefore, at the beginning of a study, it should be ensured that the chosen method for species identification is suitable, because MALDI-TOF MS can only be used to recognize species and different developmental stages (Laakmann et al., 2013; Bode et al., 2017), while metabarcoding assesses general biodiversity only.

The simultaneous use of individuals for barcoding and MALDI-TOF MS analysis has been demonstrated several times (e.g., Laakmann et al., 2013; Bode et al., 2017; Rossel and Martínez Arbizu, 2018b; Rossel and Martínez Arbizu, 2019), even for small animals such as copepods, implying that MALDI-TOF MS and DNA barcoding can be combined for biodiversity assessments and monitoring. Voucher specimens can be analyzed by MALDI-TOF MS and sequenced simultaneously, so that identifications based on mass spectra are supported by DNA without incurring the high costs of barcoding all assessed specimens. Because MALDI-TOF MS provides a discrete species-specific signal, the pitfall of underestimating species diversity is avoided.

Unfortunately, a method for obtaining DNA and mass spectra data from a single microscopic animal while retaining an intact voucher specimen for microscopic examination has not been developed yet. Therefore, a subsequent comparison between morphological, DNA-based and MS-based methods is not yet possible for single specimens after measurements have been carried out.

Expenses

Based on our workflow and expenses, we calculated the costs of single-gene barcoding (e.g., COI barcoding), metabarcoding and MALDI-TOF MS (Figure 4) for hypothetical samples containing 96 specimens each. Although a cost comparison strongly depends on the country where the study and analyses are carried out, it can give a good impression of the differences in cost between the applied methods. However, one has to be aware that the displayed costs will probably not reflect the actual costs in every country.

Figure 4. Comparison of expenses of the different species identification methods for hypothetical samples containing 96 specimens each. Set-ups for 24 and 96 samples were compared for metabarcoding. Besides the factory-delivered matrix for MALDI-TOF MS, cheaper competing products are available, allowing identification of 9,216 specimens for less than 250€. In actual studies, metabarcoding would analyze hundreds to thousands of specimens within a single sample and would outperform the other techniques in terms of costs. Here, hypothetical samples were used to allow comparisons.

Metabarcoding is barely comparable to the other techniques because of its batch sample workflow. Using unsorted bulk samples, metabarcoding will outperform the other techniques in terms of costs and effort, especially in meiofauna biodiversity assessments, where single samples can contain thousands of specimens of different major taxa and species. However, the research questions and the desired output have to be considered to justify the potentially high costs and effort caused by the method applied.

When comparing the two specimen-by-specimen methods, it is obvious that even the most expensive MALDI-TOF MS approach is still much cheaper than COI barcoding (Figure 4). Working plate-wise with 4 μl of factory-delivered matrix solution per specimen, one complete plate containing 95 specimens and one spot with bacterial standard (used for calibration) costs only 40.15€ (0.42€ per specimen). In comparison, processing 96 specimens plate-wise for COI barcoding, including bidirectional sequencing, costs 5.65€ per specimen, which is more than 10-fold the price of MALDI-TOF MS. Moreover, cheaper alternatives for MALDI-TOF MS are available. Matrix from competing suppliers can be purchased, reducing the costs per plate to 12.18€ while retaining the same purity of the solution. Less pure matrix can reduce the costs even further, to only 2.53€ per plate (0.03€ per specimen), and first tests showed no loss of signal quality compared to the factory-delivered matrix. This allows the identification of more than 9,000 specimens for only around 250€. Furthermore, the volume of matrix should be adjusted to specimen size; it often has to be reduced to 3 or 2 μl, cutting the costs to around 187€ or 125€, respectively. Considering that barcoding of voucher specimens only was shown to underestimate true species diversity, MALDI-TOF MS provides a valuable alternative when assessing species diversity specimen-by-specimen, because all specimens can be measured without incurring high costs.
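The arithmetic behind these figures can be retraced with a short Python sketch. All prices are those quoted above; dividing the plate price by 96 spots (rather than by the 95 specimen spots) and taking 96 plates to reach the 9,216 specimens mentioned in Figure 4 are assumptions made here for illustration, and the variable names are hypothetical.

```python
SPOTS_PER_PLATE = 96  # assumption: 95 specimens plus 1 calibration spot

# Cost per complete plate in EUR, as quoted in the text
plate_cost_eur = {
    "factory matrix": 40.15,
    "competitive matrix": 12.18,
    "less pure matrix": 2.53,
}
COI_COST_PER_SPECIMEN_EUR = 5.65


def cost_per_spot(plate_cost):
    """Cost of one measured spot when a full plate is processed."""
    return plate_cost / SPOTS_PER_PLATE


for name, cost in plate_cost_eur.items():
    print(f"{name}: {cost_per_spot(cost):.2f} EUR per spot")

# How many times more expensive COI barcoding is than the factory matrix
fold_difference = COI_COST_PER_SPECIMEN_EUR / cost_per_spot(
    plate_cost_eur["factory matrix"]
)

# Assumed set-up: 96 plates of 96 spots measured with the least pure matrix
n_plates = 96
total_spots = n_plates * SPOTS_PER_PLATE
total_cost_eur = n_plates * plate_cost_eur["less pure matrix"]
```

Under these assumptions the factory matrix comes to about 0.42€ per spot, COI barcoding is roughly 13 times more expensive, and 96 plates with the least pure matrix cost about 243€. Scaling the rounded 250€ figure by 3/4 and 2/4 for 3 and 2 μl of matrix gives 187.50€ and 125€, in line with the values quoted above.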

Conclusion

Metabarcoding is a promising tool to assess general diversity without emphasizing a certain animal group. Here we demonstrated that metabarcoding can detect species occurring at low biomass and abundance. However, we strongly recommend taking several replicates from a sampling site so that a rich and complete reference library can be built in parallel and used for MOTU assignment in metabarcoding.

Our study shows that log(x + 1) transformation of the data produces multivariate community similarities that correlate highly with the morphological assignment. However, if a precise quantitative analysis of species and specimens is desired, MALDI-TOF MS is superior to metabarcoding because it yields discrete, specimen-level quantitative results. MALDI-TOF MS can also outperform COI barcoding, as it is considerably cheaper and, with a good-quality library, provides equally reliable species identifications within a considerably shorter time.

Data Availability Statement

18S sequences are accessible via GenBank. The Hellinger-transformed data matrix containing all submitted MALDI-TOF MS spectra, together with the respective metadata, is available as a data set on Dryad (doi: 10.5061/dryad.rxwdbrv46). Raw data can also be accessed via Dryad (doi: 10.5061/dryad.rxwdbrv46).

Author Contributions

SR, PM, and SK designed the study. SR and PM carried out the sampling. SR did the morphological identifications and setup of the reference libraries, and measured and analyzed the MALDI-TOF MS data. PM and SK carried out the metabarcoding analyses. SK and PM analyzed the metabarcoding data. All authors contributed to the writing of the manuscript.

Funding

SR was supported by a grant of Graduate School IBR from the Ministry for Science and Culture of Lower Saxony (IBR B7).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

This is publication no. 8 of Senckenberg am Meer Proteome Laboratory and publication no. 65 of Senckenberg am Meer Metabarcoding and Molecular Laboratory.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmars.2019.00659/full#supplementary-material

References

Avó, A. P., Daniell, T. J., Neilson, R., Oliveira, S., Branco, J., and Adão, H. (2017). DNA barcoding and morphological identification of benthic nematodes assemblages of estuarine intertidal sediments: advances in molecular tools for biodiversity assessment. Front. Mar. Sci. 4:66.