Coming up short: Identifying substrate and geographic biases in fungal sequence databases

Insufficient reference database coverage is a widely recognized limitation of molecular ecology approaches that rely on database matches to assign function or identity. Here, we use data from 65 amplicon high-throughput sequencing (HTS) datasets targeting the internal transcribed spacer (ITS) region of fungal rDNA to identify substrates and geographic areas whose underrepresentation in the available reference databases could have a meaningful impact on our ability to draw ecological conclusions. A total of 14 different substrates were investigated. Database representation was particularly poor for the fungal communities found in aquatic (freshwater and marine) and soil ecosystems. Aquatic ecosystems are identified as priority targets for the recovery of novel fungal lineages. A subset of the data, representing soil samples with global distribution, was used to identify geographic locations and terrestrial biomes with poor database representation. Database coverage was especially poor in tropical, subtropical, and Antarctic latitudes, and the Amazon, Southeast Asia, Australasia, and the Indian subcontinent are identified as priority areas for improving database coverage in fungi. © 2018 Elsevier Ltd and British Mycological Society.