Early View e07249
Research article
Open Access

Predicting time-at-depth weighted biodiversity patterns for sharks of the North Pacific

Zachary A. Siders

Corresponding Author

Fisheries and Aquatic Sciences, University of Florida, Gainesville, FL, USA

Contribution: Conceptualization (equal), Data curation (equal), Formal analysis (lead), Funding acquisition (lead), Methodology (lead), Visualization (lead), Writing - original draft (lead), Writing - review & editing (equal)

Lauren B. Trotta

Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA

Contribution: Conceptualization (equal), Funding acquisition (supporting), Methodology (supporting), Visualization (supporting), Writing - original draft (supporting), Writing - review & editing (equal)

William Patrone

School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, USA

Contribution: Data curation (equal), Formal analysis (supporting), Methodology (supporting), Writing - review & editing (supporting)

Fabio P. Caltabellotta

Coastal Marine Fish Science Unit, Washington Department of Fish and Wildlife, Olympia, WA, USA

Contribution: Conceptualization (supporting), Data curation (equal), Funding acquisition (supporting), Writing - review & editing (supporting)

Katherine B. Loesser

NOAA RESTORE Science Program, Stennis Space Center, MS, USA

Contribution: Data curation (supporting), Formal analysis (supporting), Methodology (supporting), Writing - original draft (supporting), Writing - review & editing (supporting)

Benjamin Baiser

Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA

Contribution: Conceptualization (supporting), Funding acquisition (supporting), Writing - review & editing (equal)

First published: 08 April 2024

Abstract

Depth is a fundamental and universal driver of ocean biogeography but it is unclear how the biodiversity patterns of larger, more mobile organisms change as a function of depth. Here, we developed a predictive biogeography model to explore how information on mobile species' depth preferences influences biodiversity patterns. We employed a literature review to collate shark biotelemetry studies and used open-access tools to extract 283 total records from 119 studies of 1133 sharks from 35 species. We then matched field guide reported depth ranges and IUCN habitat associations for each shark species to use as covariates in a hurdle variant of ensemble random forests. We successfully fit this model (R² = 0.63) to the noisy time-at-depth observations and used it to predict the time budgets of the northeast Pacific shark regional pool (n = 52). We then assessed how occurrence diversity patterns, informed by minimum and maximum depth of occurrence, compared to time-at-depth weighted diversity patterns. Time-at-depth weighted richness was highest between 0 and 25 m and at the upper part of the mesopelagic zone, 250–300 m, resulting in little similarity to common depth or elevational biodiversity patterns, whereas the occurrence-weighted richness pattern was similar to the ‘low-plateau’ pattern. In the phylogenetic and functional dimensions of biodiversity and over three different distance metrics, we found strong but haphazard differences between the occurrence- and time-at-depth weighted biodiversity patterns. The strong influence of time budgets on biodiversity led us to conclude that occurrence data alone are likely insufficient or even misleading in terms of the depth-driven biogeographic patterns in the open ocean. Utilizing the increasing amount of time-at-depth information from biotelemetry studies in predictive biogeographic models may be critical for capturing the preferences of pelagic, mobile species occupying the largest biome on the planet.

Introduction

Determinations of community membership and assessments of biodiversity patterns are frequently dependent on defining the ‘local' community (Lawton 1999, Ricklefs 2008) with modern expressions relying on georeferenced species abundance and distribution information (Hortal et al. 2015). In easily accessible ecosystems, capturing diversity patterns may be difficult, but is generally possible given sufficient sampling effort (Lomolino 2001). The open ocean, however, is difficult to access due to its vast scale and the lack of sufficient sampling to capture granular community data results in a Wallacean shortfall (Lomolino 2004). Further, many oceanic species move over some or all of the water column (depending on water depth) on the scale of minutes to hours – divorcing the local community from a particular spatial area at short time scales (Hays 2003, Bandara et al. 2021). This phenomenon is well documented in zooplankton – the primary consumer in ocean food webs – that undertake a diel vertical migration that varies in strength and intensity across the globe (Bandara et al. 2021). This oscillation of a primary prey item cascades upwards (Bollens et al. 2011) resulting in higher-order consumers developing a variety of physiological and behavioral strategies (Hays 2003, Widder 2010, Sutton 2013). Some of these behavioral strategies, such as reversing the daily migration pattern (Andrzejaczek et al. 2019) or ambushing prey along their vertical migration (Bakun 2023), expand the boundaries of the ‘local' community over much of the water column as predators and prey interact (Urmy and Benoit-Bird 2021). Community membership also changes temporally, as many higher-order consumers undertake vast migrations chasing ephemeral hotspots of productivity (Costa et al. 2012). These seasonal (or even multiyear) migrations across latitude and longitude have complicated defining communities in other ecosystems (Lean 2018). 
Together, these facets cause the distributions of many ocean organisms to vary considerably over spatial and temporal scales, which impedes defining explicit communities in the open ocean (Hortal et al. 2015). Transitory species also impose scale-dependency on oceanic biodiversity patterns (Tittensor et al. 2010, Grady et al. 2019) as we are limited in our understanding of the interactions that define oceanic communities.

Distinct from terrestrial systems, the z-axis of the ocean, depth, is a principal and omnipresent driver of biogeographic patterns (Sutton 2013). Increases in depth beget increases in pressure, rapid decreases in temperature and, with near complete loss of light in the top 200 m, rapid declines in productivity (Robinson et al. 2010). These strongly correlated processes likely drive the relationship of decreasing species richness as a function of increasing depth in the vertical biogeography of plankton (Rutherford et al. 1999), marine mollusks (Rex et al. 2005), and benthic-oriented fishes and cephalopods (Macpherson and Duarte 1994, Macpherson 2002, Rosa et al. 2008). An inverted but analogous pattern – where species richness decreases as a function of increasing elevation in terrestrial systems – has been observed in 10 assemblages of bats (out of 20) (McCain 2007), in 23 assemblages of birds (out of 78) (McCain 2009), and in 13 assemblages of reptiles (out of 24) (McCain 2010). Pelagic fishes have shown the same depth–richness relationship (Smith and Brown 2002) as benthic fishes and cephalopods while, in contrast, pelagic cephalopods have roughly the same richness from 0 to 1000 m before richness declines with increasing depth (Rosa et al. 2008). In terrestrial systems, this latter pattern is called the ‘low-plateau’ pattern and has also been observed to a lesser degree in bats, birds, and reptiles (McCain 2007, 2009, 2010). Rosa et al. (2008) attribute the discrepancies between the depth–richness relationship in benthic cephalopods and the low-plateau pattern in pelagic cephalopods to the vertical migrations of pelagic cephalopods between the epi- and mesopelagic zones and the associated differences in morphology, locomotion, and behavior.

Many sharks are similar to pelagic cephalopods in their ability to make movements of 1000 m or more in short periods of time (minutes to hours) (Andrzejaczek et al. 2019, Munroe et al. 2022). Despite this capability, different shark species have varied and highly concentrated time budgets within a few hundred meters of the water column (Andrzejaczek et al. 2022) or prefer benthic habitats and therefore exhibit limited vertical movements (Munroe et al. 2022). These time budgets partition the residency of sharks across the water column and can provide insight into the ‘local’ shark community. By relaxing the assumption of spatial homogeneity across a species' depth range, it is likely that the depth–richness relationship will strongly differ from those patterns based on the more frequently used depth range and mean depth of occurrence (Smith and Brown 2002).

Collecting time budgets for all species in a regional pool to define the time-weighted local community in the water column is a challenge. Typically, time–depth recorders are physically mounted to individual animals to capture biotelemetry ranging from a few hours to months or years (Whitford and Klimley 2019, Watanabe and Papastamatiou 2023). Easily sampled species, generally nearshore, in shallow waters, or those interacted with by a fishery, are targets of biotelemetry while much of the regional pool remains unsampled (Renshaw et al. 2023). The northeast Pacific (NEP) shark regional pool is a useful testing ground for developing a new approach to predicting the vertical biogeography of highly mobile marine megafauna. Many of the species are circumglobal and interact with fisheries, increasing the odds of having biotelemetry records available for them. Additionally, there is a wide diversity of shark functional groups in the region to compare model performance (Siders et al. 2022a). Here, we present a machine-learning model for predicting the time budgets of undersampled species by integrating existing biotelemetry records, habitat association, and depth ranges to train our model on a wide variety of shark species. We then use habitat associations and depth ranges of shark species in the North Pacific to predict the time budget of the regional pool. Finally, we compare the resulting taxonomic, phylogenetic, and functional biodiversity relationships across depth to understand the differences between occurrence-weighted richness patterns using traditional depth ranges and time-budget weighted richness patterns.

Material and methods

Data collection

We used the regional species (n = 52) list for the NEP, 180–255°E and 0–50°N, collated from published species lists as well as the community phylogeny built for the regional pool from Siders et al. (2022a) (Supporting information). We initially collected biotelemetry studies for the regional species pool and congeners that summarized time-at-depth information using a Boolean phrase heuristic in Google Scholar (see Supporting information for details). This set of species was supplemented opportunistically as additional species appeared in the search results. We then extracted each summarization of time-at-depth using the ‘WebPlotDigitizer' (https://apps.automeris.io/wpd/). For each depth bin, we extracted the reported lower and upper bound of the depth bin and calculated the bin midpoint and width as well as recorded the number of individuals per summarization. For species in our search set without biotelemetry, we opportunistically extracted summarizations of abundance or density at depth (see Supporting information for details).

Habitat associations

To further inform our analyses, we collected information on the broad habitat associations of species in our regional pool from the International Union for the Conservation of Nature (IUCN) API (IUCN 2021). In addition, we collected the same data for any supplementary species captured in the time-at-depth extraction phase. From the IUCN habitat classifications, we created new habitat classes of neritic, pelagic, epipelagic, mesopelagic, bathypelagic, benthic–continental slope, benthic–seamount, neritic–coral reefs, neritic–seagrass, neritic–kelp forest, neritic–subtidal, neritic–intertidal, and neritic–estuaries. For each habitat class, we coded the species association as a binary variable. We supplemented this initial habitat association by searching for studies that referenced taxa belonging to one of the habitat classifications and assumed that these references to specific habitat use superseded the IUCN habitat classifications.

Functional traits

We used the set of traits compiled by Siders et al. (2022a) for the regional pool. This set of traits included traits on habitat preference (minimum depth, maximum depth, δ13C stable isotope signature), reproduction (reproductive mode, size at birth, number of offspring, age at maturity for males and females), somatic growth (maximum length, Brody growth coefficient for males and females), diet (δ15N stable isotope signature and standardized diet indices), dentition (tooth counts and crown shape), and lateral anatomical profile (Siders et al. 2022a). We supplemented this set of traits with traits on bioluminescence and homeothermy. For bioluminescence, we collated whether a species possessed bioluminescence and the average photophore diameter (μm) and ventral photophore density (units mm−2) (Claes et al. 2014, Duchatelet et al. 2021, Mallefet et al. 2021). For homeothermy, we collated whether a species exhibited counter-current retia and the number of arteries across lateral cutaneous rete (Carey et al. 1985).

Vertical distribution analysis

Standardization

For each record for each species, we estimated an empirical cumulative distribution function (ECDF) to calculate the accumulation of proportions as a function of depth. We then used the ECDFs to summarize the accumulated proportion of time-at-depth in standardized depth bins and compared across studies and across species. For the set of species with more than five biotelemetry records, we performed a bootstrapping procedure to determine the variability in the total dissimilarity in the standardized time-at-depth across records within a given species. This bootstrapping procedure drew a random set of summarization records, ranging in size from two to the total number of records for a given species, from across studies and calculated the weighted total dissimilarity (Eq. 1):

[Equation 1 not preserved in this copy] (1)

defined as the sum of Euclidean distances between the proportions of time in a given depth bin ($S_{x,i}$, $S_{y,i}$), weighted by the number of individuals per record ($N_x$, $N_y$) and standardized by the total number of individuals across the two records. We then calculated the median and 95% confidence interval of the standardized distance from 100 draws across the range of summarizations for a given species. We repeated the above procedure using only the records of a single individual to capture the dissimilarity across individuals.
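The ECDF standardization described at the start of this subsection can be illustrated with a minimal sketch. It assumes that time is spread uniformly within each reported depth bin (linear interpolation of the ECDF), which the text does not state; the function names `ecdf_at` and `rebin` are hypothetical and are not taken from the authors' code.

```python
def ecdf_at(depth, edges, props):
    """Cumulative proportion of time spent shallower than `depth`.

    edges: increasing depth-bin boundaries in metres (length k + 1)
    props: proportion of time in each reported bin (length k, sums to 1)
    Assumption: time is uniformly distributed within each reported bin.
    """
    if depth <= edges[0]:
        return 0.0
    total = 0.0
    for lo, hi, p in zip(edges[:-1], edges[1:], props):
        if depth >= hi:
            total += p                              # whole bin lies above `depth`
        else:
            if depth > lo:
                total += p * (depth - lo) / (hi - lo)  # partial bin
            break
    return total


def rebin(edges, props, std_edges):
    """Re-express one record on standardized depth bins via ECDF differences."""
    cum = [ecdf_at(d, edges, props) for d in std_edges]
    return [b - a for a, b in zip(cum[:-1], cum[1:])]


# Hypothetical record: half the time in 0-10 m, half in 10-50 m,
# re-expressed on standardized bins 0-5, 5-25 and 25-50 m.
standardized = rebin([0, 10, 50], [0.5, 0.5], [0, 5, 25, 50])
```

Because each record ends up on the same set of bins, the proportions can then be compared directly across studies and species, which is what the dissimilarity bootstrap requires.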

Projection of time-at-depth distributions

We developed a hurdle ensemble random forests model (hERF) (Siders et al. 2020) to fit to the observed time-at-depth using as covariates the lower and upper bounds of the depth bin, the minimum and maximum depth of the species from Siders et al. (2022a), and the habitat association matrix. This hERF had two components: a binary component describing whether any time-at-depth occurred in one of the depth bins using a random forests classification and a positive component describing the amount of time in an occupied depth bin using a random forests regression. To generate absences for the binary component, we checked the observed set of depth bins for each record against the standardized depth bins and augmented the observed time-at-depth data for standardized depth bins without observations with zeroes (Supporting information). We used the ‘Ensemble Random Forests' package (https://zsiders.github.io/EnsembleRandomForests/) to generate 100 individual classification random forests (1000 trees per forest, 5 covariates per node) using the presence–absence data for whether a time-at-depth observation occurred in a standardized depth bin (Siders et al. 2020). This algorithm uses downsampling to draw an equal number of presences and absences for each random forests training and test sets (90 and 10% split, respectively; drawn randomly from the whole dataset) in the ensemble (Siders et al. 2020). For the regression random forests, we replicated the ERF procedure without the downsampling (as there is no minority or majority class) using only the time-at-depth observations above zero and applying a logistic transformation. For both components of the hERF, we used the number of individuals per record (standardized to a maximum of 1) as weights in the random forests and for the abundance or density datasets we assumed an n of one as no biotelemetry was conducted and only one summarization of the study event was recorded.
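The balanced downsampling and 90/10 split used for the binary component can be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in the 'Ensemble Random Forests' package: the function names are hypothetical, and it assumes each forest receives a fresh random draw of the majority class down to the size of the minority class.

```python
import random


def balanced_split(labels, train_frac=0.9, rng=None):
    """Return (train_idx, test_idx) with equal presences and absences in each.

    labels: sequence of 0/1 values (absence/presence per depth-bin observation)
    Assumption: both classes are downsampled to the minority-class size.
    """
    rng = rng or random.Random()
    presences = [i for i, y in enumerate(labels) if y == 1]
    absences = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(presences), len(absences))
    # rng.sample returns the draw in random order, so slicing gives a random split
    presences = rng.sample(presences, n)
    absences = rng.sample(absences, n)
    n_train = int(round(train_frac * n))
    train = presences[:n_train] + absences[:n_train]
    test = presences[n_train:] + absences[n_train:]
    return train, test


def ensemble_splits(labels, n_forests=100, seed=0):
    """One independent balanced train/test split per forest in the ensemble."""
    rng = random.Random(seed)
    return [balanced_split(labels, rng=rng) for _ in range(n_forests)]
```

Each pair of index lists would then be used to train and evaluate one classification forest; the regression component would skip the class balancing, as the text notes.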

To generate the hERF predictions, we multiplied the ensemble predictions for the binary component and the ensemble predictions for the positive component. We then predicted each hERF to the whole dataset, the individual training sets, and the individual test sets and calculated the respective average R² value. The average test set R² is the most indicative of the model performance (Lawson et al. 2014, Siders et al. 2020). We then compared the average hERF logistic-transformed predictions to observations to assess the model fit and residuals. For the predictions of the observations, we assumed the augmented absences were correct. Lastly, we predicted the hERF to the regional species pool using the associated species' covariate values and the standardized depth bins used in the ECDF procedure. We repeated the ECDF procedure on these predictions to compare the species-specific predictions for species with time-at-depth observations. As the hERF does not directly handle the simplex nature of the time-at-depth distribution, we back-transformed all logit-transformed predictions of the regional pool and ensured the resulting predictions were a simplex by dividing by the sum. We used the simplex predictions for all subsequent biodiversity analyses (Supporting information).
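A small sketch may help make the combination and renormalization steps concrete. It assumes that the positive component is predicted on the logit scale and back-transformed before being multiplied by the binary component; the text does not specify the exact order of these operations, and the function names are hypothetical.

```python
import math


def inv_logit(x):
    """Back-transform a logit-scale value to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hurdle_to_simplex(p_occupied, logit_time):
    """Combine hurdle components per depth bin and renormalize to sum to one.

    p_occupied: ensemble probability that each depth bin is used at all
    logit_time: ensemble prediction of time in each bin, on the logit scale
    """
    raw = [p * inv_logit(z) for p, z in zip(p_occupied, logit_time)]
    total = sum(raw)
    if total == 0:
        raise ValueError("all depth bins were predicted to be unused")
    return [r / total for r in raw]


# Hypothetical species with three standardized depth bins
budget = hurdle_to_simplex([0.95, 0.80, 0.05], [0.4, -1.2, -3.0])
```

Dividing by the sum guarantees that each species' predicted time budget is a valid set of proportions, which the random forests alone do not enforce.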

Dimensions of biodiversity analyses

Using the hERF predictions of the regional pool, we assessed how dimensions of α diversity changed across the depth gradient. We generated two depth bin by species matrices: 1) occurrence-weighted using the minimum and maximum depths observed for the regional pool species to assign a binary variable for species presence in a depth bin; and 2) time-at-depth weighted using the hERF simplex predictions of the regional pool. To calculate species richness (taxonomic α), we summed across species in a depth bin using the two depth-bin by species matrices. As time-at-depth weighted richness sums proportions, the maximum richness is lower than the occurrence-weighted richness. We then calculated metrics of phylogenetic/functional diversity (PD/FD) (Faith 1992), mean pairwise distance (MPD), and mean nearest taxon distance (MNTD). In combination, these three metrics provide us insight into the variance in total amount of evolutionary or functional diversity (PD/FD), as well as the diversity between species that are either distantly related/functionally dissimilar (MPD) or species that are closely related/functionally similar (MNTD) present in each depth bin. We calculated phylogenetic diversity metrics using the phylogram developed in Siders et al. (2022a). To calculate functional diversity metrics, we created a weighted Gower dissimilarity matrix with the ‘gawdis' package (Bello et al. 2021) using all traits except for the minimum and maximum depth preference, as these traits were used in the hERF for prediction and are confounding with the depth bins. Weights for each trait were calculated using a genetic algorithm to balance the contribution of each trait in the Gower dissimilarity matrix (Bello et al. 2021).
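The two richness calculations can be contrasted with a minimal sketch. The species labels, depth ranges, and time budgets below are hypothetical, and the rule that a species is present in a bin whenever its depth range overlaps the bin is an assumption about how the binary matrix was assigned.

```python
def occurrence_richness(depth_ranges, bins):
    """Count species whose (min, max) depth range overlaps each depth bin."""
    return [
        sum(1 for dmin, dmax in depth_ranges.values() if dmin < hi and dmax > lo)
        for lo, hi in bins
    ]


def weighted_richness(time_budgets):
    """Sum predicted time-at-depth proportions across species per depth bin."""
    n_bins = len(next(iter(time_budgets.values())))
    return [sum(tb[b] for tb in time_budgets.values()) for b in range(n_bins)]


bins = [(0, 10), (10, 50), (50, 200)]                     # depth bins in metres
ranges = {"sp_A": (0, 200), "sp_B": (0, 40)}              # hypothetical species
budgets = {"sp_A": [0.1, 0.2, 0.7], "sp_B": [0.8, 0.2, 0.0]}

occ = occurrence_richness(ranges, bins)
tad = weighted_richness(budgets)
```

In this toy case both species count fully toward the two shallowest bins under occurrence weighting, whereas time-at-depth weighting lets each species contribute only the fraction of time it is predicted to spend there.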

For phylogenetic diversity (PD), we used the phylogram and a weighted Faith's diversity function (https://github.com/NGSwenson/lefse_0.5). For functional diversity, we used the ‘mFD' package (Magneville et al. 2022) to conduct a principal component analysis on the Gower distances, then used the number of principal components that minimized the mean absolute deviation between the two distances (Maire et al. 2015) to calculate functional dispersion (FDis following Laliberté and Legendre 2010). For the phylogenetic and functional MPD and MNTD, we calculated these metrics using both depth bin by species matrices using the ‘picante' package (Kembel et al. 2010), used the abundance weights in ‘picante' to account for the time-at-depth weights, and used the cophenetic distance from the phylogram or the Gower dissimilarities for the corresponding metrics. For phylogenetic and functional dimensions of biodiversity, we calculated the standardized effect size (Z-score) using a taxa shuffling null model, which indicates whether an assemblage of interest is overdispersed (Z-score of > 2), underdispersed (Z-score < −2) or randomly dispersed (Z-score between −2 and 2).
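The standardized effect size under a taxa-shuffling null can be sketched for unweighted mean pairwise distance. This is a simplified illustration only: it ignores the abundance (time-at-depth) weights used in the analysis, does not reproduce the 'picante' implementation, and uses hypothetical function names and a nested-dictionary distance matrix.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def mpd(present, dist):
    """Mean pairwise distance among the taxa listed in `present`."""
    return mean(dist[a][b] for a, b in combinations(sorted(present), 2))


def ses_mpd(present, dist, n_null=999, seed=1):
    """Z-score of observed MPD against a taxa-label shuffling null.

    dist: symmetric nested dict, dist[a][b] = distance between taxa a and b
    Negative values suggest underdispersion, positive values overdispersion.
    """
    rng = random.Random(seed)
    taxa = sorted(dist)
    observed = mpd(present, dist)
    null = []
    for _ in range(n_null):
        shuffled = taxa[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(taxa, shuffled))   # shuffle labels on the distance matrix
        null.append(mpd([relabel[t] for t in present], dist))
    return (observed - mean(null)) / stdev(null)


# Hypothetical pool: six taxa placed on a line, forming two tight clusters
pos = {"a": 0, "b": 1, "c": 2, "d": 10, "e": 11, "f": 12}
dist = {s: {t: abs(pos[s] - pos[t]) for t in pos} for s in pos}
z = ses_mpd(["a", "b", "c"], dist)
```

An assemblage drawn entirely from one cluster has a smaller MPD than most random draws from the pool, so its Z-score is negative; whether it passes the −2 threshold depends on the spread of the null distribution.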

Results

Data collection

Vertical distributions

We acquired 283 biotelemetry records from 119 studies with time-at-depth information from 1133 sharks from 35 species, 24 of which were in our regional pool (n = 52). Of these records, 72 summarized across multiple sharks while 211 records were of an individual shark with a median of two individual records per species and a maximum of 47 for Cetorhinus maximus (basking shark), 42 of which came from Siders et al. (2022b). The median number of records per species was five (min. = 1, max. = 51), the median number of studies per species was 2 (min. = 1, max. = 11), and the median number of individuals per species was 14 (min. = 1, max. = 170) (Supporting information). We also acquired 15 records from five studies of 15 additional species with one record each, bringing the total species with time-at-depth information to 50. Three of these additional 15 species were in our regional pool with a median of 485 sharks captured in the abundance or density at depth estimation (min. = 111, max. = 15 440) (Supporting information).

Functional traits

Of the additional traits we added characterizing bioluminescence and homeothermy, only eight species had bioluminescence while four species had retia. The bioluminescent species were Centroscyllium nigrum (combtooth dogfish), Dalatias licha (kitefin shark), Etmopterus bigelowi (blurred lanternshark), Etmopterus lucifer (blackbelly lanternshark), Etmopterus pusillus (smooth lanternshark), Euprotomicrus bispinatus (pygmy shark), Isistius brasiliensis (cookiecutter shark), and Zameus squamulosus (velvet dogfish) with a mean photophore diameter of 83.8 μm (48.76–122.40 μm) and a mean ventral photophore density of 22.73 units mm−2 (2.88–57.38 units mm−2) (Claes et al. 2014, Duchatelet et al. 2021, Mallefet et al. 2021). The species with retia were Carcharodon carcharias (great white shark), Isurus oxyrinchus (shortfin mako shark), Isurus paucus (longfin mako shark), and Lamna ditropis (salmon shark) with a mean of 28.7 arteries across the lateral rete (5.3–64.4 arteries) (Carey et al. 1985).

Vertical distribution analysis

Twenty of the 35 species had more than five records and were used to calculate the change in total dissimilarity in the time-at-depth distribution as a function of the number of records. For species with more than 10 records, the standardized total dissimilarity decayed exponentially with many species reaching asymptotic behavior around 8 to 10 records regardless of whether the records were aggregates or individual records (Supporting information). The binary component of the hERF had a test area-under-the-curve of 0.99, a test root-mean-squared error of 0.16, and a true skill statistic of 0.93 while the positive component had a mean test R² of 0.71. Across species, model residuals were evenly distributed (Supporting information) and the overall R², combining the binary and positive components, was 0.63. Covariates associated with the standardized depth bins and maximum depth were the most important variables to both the binary and positive components of the hERF (Supporting information). Association with coral reefs, the continental slope, and estuaries were ranked as the top three habitat association covariates for the binary component while association with the continental slope, seamounts, and the bathypelagic were ranked as the top three for the positive component of the hERF (Supporting information).

In general, the empirical cumulative distributions of each species were fit well by the hERF (Fig. 1). Species that had discrepancies between the observations and the model predictions either were overpredicted to more time in shallower waters than observed like Alopias pelagicus (pelagic thresher shark, Fig. 1A), Centrophorus squamosus (leafscale gulper shark, Fig. 1Q), D. licha (kitefin shark, Fig. 1U), I. paucus (longfin mako shark, Fig. 1AH), and Squalus suckleyi (Pacific spiny dogfish, Fig. 1AX) or overpredicted to spend more time in deeper water than observed like Galeus melastomus (blackmouth catshark, Fig. 1AD), Scymnodon plunketi (plunket shark, Fig. 1AO), Somniosus microcephalus (Greenland shark, Fig. 1AP), and Squalus griffini (northern spiny dogfish, Fig. 1AW). A few species suffered from under- and overprediction like Squalus acanthias (spiny dogfish, Fig. 1AT) and Squalus blainville (longnose spurdog, Fig. 1AU) that stemmed from having habitat associations in subtidal areas or estuaries (the former) or a single habitat association (the latter).

Figure 1. The observed accumulated time-at-depth proportions for each species' records (dots) along with the predictions from the hurdle ensemble random forests model (hERF; line). Observed accumulated proportions are shaded with cooler colors for values closer to zero and with warmer colors for values closer to one. Orange lines indicate species belonging to the northeast Pacific regional pool while gray lines indicate species solely used to inform the hERF.

In the projection to the regional pool, most species time-at-depth was within the depth range but clustered in the epi- and mesopelagic zone relative to the possible depth range (Fig. 2). A few species such as Odontaspis noronhai (bigeye sand tiger shark), Pseudotriakis microdon (false catshark), and E. bigelowi had the bulk of the hERF predictions of time-at-depth outside of the minimum and maximum depth range (Fig. 2). An additional set of species had partial overlap between the bulk of the hERF predictions of time-at-depth and the depth range such as Apristurus brunneus (brown catshark) and C. nigrum. There was only one species, Trigonognathus kabeyai (viper dogfish), where the bulk of the hERF predictions of time-at-depth encompassed the documented depth range entirely (Fig. 2).

Figure 2. The occurrence depth range informed by field guides (white lines) and predicted time-at-depth (colored, filled regions) from the hurdle ensemble random forests model for each shark species in our regional pool from the northeast Pacific. The phylogram is provided across the top and species are colored by their respective family. The epi-, meso-, and bathypelagic regions are labeled on the gradient-shaded background with darker colors indicating deeper depths and the photic, dysphotic, and aphotic zones are indicated with labeled brackets. The depicted proportion of time-at-depth is indicated by the width of the shaded region (with a maximum width indicating 30%).

Dimensions of biodiversity analyses

Taxonomic diversity strongly differed between the occurrence-weighted assemblage and the time-at-depth weighted assemblage (Fig. 3). Occurrence-weighted richness increased slightly from the surface (n = 39) to 30 m (n = 45), plateaued between 30 m and 250 m (n = 44–46), before declining proportionally with increasing depth (Fig. 3). The time-at-depth weighted richness was highest at the surface and between 10 and 20 m, declined sharply to a local minimum between 125 and 150 m (6% of maximum), increased to a local maximum between 250 and 300 m (57% of maximum), then declined sharply again with little richness below 800 m (1% of maximum) (Fig. 3).

Figure 3. Taxonomic diversity as a function of depth for occurrence-weighted (blue) and time-at-depth weighted (orange) shark assemblages. Y-axis indicates depth (m) and grey shading of increasing saturation indicates epi-, meso-, and bathypelagic zones.

For the occurrence assemblage, phylogenetic PD and MNTD were similar, with the highest overdispersion in 800–1500 m and variable but broadly similar dispersion above 400 m (Fig. 4A, C). Phylogenetic MPD for the occurrence assemblage shifted from random at the surface to its maximum underdispersion in 80–100 m, within the upper 200 m, then had variable overdispersion in 200–1400 m before shifting back to random below 1400 m (Fig. 4B). For the time-at-depth weighted assemblage, the phylogenetic diversity pattern as a function of depth was very similar across PD, MPD, and MNTD: similar dispersion at the surface and below 900 m, lower dispersion in 10–20 m, and the highest dispersion in 150–200 m (MPD) or 200–250 m (PD and MNTD). Additionally, MPD was shifted toward greater underdispersion relative to PD and MNTD (Fig. 4D, F). For MPD, this underdispersion was significant above 75 m as well as below 750 m (Fig. 4E).

Figure 4. Biodiversity patterns measured in Z-scores (standardized effect sizes) as a function of depth using occurrence-weighted (A–C and G–I) and time-at-depth weighted (D–F and J–L) phylogenetic diversity (A and D), phylogenetic mean pairwise distance (MPD) (B and E), phylogenetic mean nearest taxon distance (MNTD) (C and F), functional diversity (G and J), functional MPD (H and K), and functional MNTD (I and L). The standardized effect size (Z-score) of each depth bin is indicated by the point color with warmer colors tending toward overdispersion and cooler colors tending toward underdispersion. Random dispersion is indicated by a dashed vertical line while significant over- or underdispersion are indicated by dotted vertical lines. Y-axis indicates depth (m) and grey shading of increasing saturation indicates epi-, meso-, and bathypelagic zones.

Functional diversity metrics did not follow the same patterns as phylogenetic diversity and the magnitude of the shark assemblage's dispersion showed greater variation. For the occurrence and the time-at-depth weighted assemblages, the relationships between depth and FD, MPD, and MNTD differed (Fig. 4G–L). Occurrence assemblage FD shifted from random at the surface to its maximum underdispersion in 75–100 m then increasing but variable overdispersion as depth increased with significant overdispersion in 500–700 and 1000–1300 m (Fig. 4G). Functional MPD was variable but increasingly overdispersed as a function of depth with significant overdispersion in 1500–1750 m (Fig. 4H) while MNTD had significant overdispersion above 10 m with increasing randomness as a function of depth (Fig. 4I). The time-at-depth weighted FD, MPD, and MNTD were highly variable above 200 m but were all overdispersed in 0–5 m and generally underdispersed or random in 5–200 m (Fig. 4J–L). The exception was high overdispersion in 150–200 m for FD (Fig. 4J) and in 50–75 m for MNTD (Fig. 4L). Below 200 m, all three time-at-depth weighted functional diversity metrics showed increasing underdispersion to roughly 700–900 m then less underdispersion in 1000–1300 m before shifting slowly back to more underdispersion with increasing depth (Fig. 4J–L). The only difference between the metrics was the magnitude with the most underdispersion in MPD (Fig. 4K), then MNTD (Fig. 4L), then FD (Fig. 4J).

Discussion

Quantifying the dimensions of biodiversity of highly mobile members of ecological communities can be challenging – even more so in the open ocean, where species can move easily over extreme environmental gradients in just a few hundred meters of water depth (Andrzejaczek et al. 2019, 2022). Here, we compared traditional occurrence-weighted assemblages against time-at-depth weighted assemblages in sharks of the North Pacific. To generate the time budgets of the regional pool, we built a predictive hurdle ensemble random forests (hERF) model using time budgets across a suite of shark species. This decision-tree approach allowed the use of relatively simple covariates of habitat association and depth range to train the model, which performed well on the training data and in matching the depth ranges in the regional pool. Predictive models, such as the hERF developed here, are a relatively easy way to leverage and combine existing biotelemetry, habitat affinity, and natural history datasets to inform time budgets and advance the quantification of biogeographic patterns.
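The two-stage logic implied by a hurdle model can be sketched in a few lines of Python. This is an assumed, simplified reading rather than the hERF itself: the stage-one probabilities and stage-two proportions are hard-coded hypothetical values standing in for random forest outputs, and the 0.5 threshold and the renormalization step are illustrative choices.

```python
def combine_hurdle(p_use, prop_if_used, threshold=0.5):
    """Zero out depth bins that fail the hurdle, then renormalize to sum to 1.

    p_use        -- assumed stage-one probability that any time is spent in each bin
    prop_if_used -- assumed stage-two proportion of time, given the bin is used
    """
    raw = [q if p >= threshold else 0.0 for p, q in zip(p_use, prop_if_used)]
    total = sum(raw)
    if total == 0:
        raise ValueError("no depth bin passed the hurdle")
    return [value / total for value in raw]


# Hypothetical predictions for three depth bins (shallow, mid, deep).
p_use = [0.9, 0.8, 0.2]
prop_if_used = [0.6, 0.3, 0.3]

time_budget = combine_hurdle(p_use, prop_if_used)
print([round(share, 3) for share in time_budget])
```

Here the deep bin fails the hurdle and receives no time, and the remaining two bins are rescaled so that the predicted time budget sums to one.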

We show that simple occurrence data, such as depth ranges, flatten the variance that we see when considering time at depth. Uniform weighting across depth results in shark assemblages appearing more homogeneous over larger depth ranges than when we account for species' depth preferences, as with time-at-depth weighting. Most sharks appear, from the time-at-depth distributions, to be specialized in the epipelagic or mesopelagic zones (Fig. 2). There is probably another, smaller bathypelagic community that was missed because these species have not been tagged, adequately sampled, or even described. This underrepresentation in our biotelemetry data, and subsequently in the hERF, means that predictions for extreme cases, like deepwater sharks, cannot be made reliably. Sampling biases such as these are especially important to consider before more broadly adopting the approach to predicting biogeography presented herein.
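The flattening effect of uniform weighting can be seen in a small Python sketch. The three species, their depth ranges and time budgets, the three depth bins, and the choice to sum per-species proportions of time as the "time-at-depth weighted" richness are all hypothetical and assumed for illustration; they are not the data or the exact weighting used in this study.

```python
def occurrence_weights(depth_range, bins):
    """Weight of 1 in every bin that overlaps the species' depth range, else 0."""
    shallowest, deepest = depth_range
    return [
        1.0 if (bin_top < deepest and bin_bottom > shallowest) else 0.0
        for bin_top, bin_bottom in bins
    ]


def weighted_richness(weights_by_species):
    """Sum species weights within each depth bin."""
    return [sum(column) for column in zip(*weights_by_species.values())]


# Assumed depth bins (m): epipelagic, mesopelagic, upper bathypelagic.
bins = [(0, 200), (200, 1000), (1000, 2000)]

# Hypothetical species: depth range (m) and proportion of time spent in each bin.
depth_ranges = {"sp_a": (0, 1000), "sp_b": (0, 1000), "sp_c": (0, 2000)}
time_budgets = {
    "sp_a": [0.90, 0.10, 0.00],
    "sp_b": [0.80, 0.20, 0.00],
    "sp_c": [0.05, 0.85, 0.10],
}

occurrence = {sp: occurrence_weights(rng, bins) for sp, rng in depth_ranges.items()}

occ_richness = weighted_richness(occurrence)
tad_richness = weighted_richness(time_budgets)

print("occurrence-weighted:   ", occ_richness)
print("time-at-depth weighted:", [round(value, 2) for value in tad_richness])
```

Under occurrence weighting the first two bins look identical, whereas the time budgets separate them and show how little the deepest bin is actually used.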

Our predictions of the shark regional pool exhibited the same pattern that Andrzejaczek et al. (2022) observed empirically across some elasmobranch taxa – most species' time budgets are heavily constrained within their depth range. As a result, the occurrence-weighted and time-at-depth weighted biodiversity patterns differed markedly as a function of depth. The occurrence-weighted richness pattern as a function of depth is most similar to those observed in pelagic and benthic fishes as well as benthic cephalopods, with a consistent loss of richness with increasing depth. However, there is a very limited low plateau between 30 and 250 m with the highest richness across all depths (Fig. 3). In contrast, the time-at-depth weighted richness pattern as a function of depth is dissimilar to any of the consistently observed patterns in marine systems or in terrestrial systems as a function of elevation (Fig. 3). Two peaks are observed in the richness pattern, one in the shallowest portion of the epipelagic and another at the edge of the epi- and mesopelagic zones. These appear to be largely driven by the differentiation in depth preferences between Carcharhiniformes, preferring shallow waters, and Squaliformes, preferring the twilight zone (200–1000 m). This differs strongly from the picture given by depth ranges alone, under which these two clades appear to prefer a much wider range of depths.

The differences between the occurrence-weighted and time-at-depth weighted biodiversity patterns depended on both the dimension of biodiversity – phylogenetic or functional – and the distance metric – phylogenetic or functional distance, mean pairwise distance, and mean nearest taxon distance (Fig. 4). Generally, using occurrence assemblages increases the overdispersion of phylogenetic and functional dimensions of biodiversity at greater depths relative to the time-at-depth weighted assemblage. We also observed greater phylogenetic underdispersion in the epipelagic for the time-at-depth weighted assemblage than in the occurrence assemblage. Across functional diversity metrics, we only observed significant overdispersion for the time-at-depth weighted assemblages in the shallowest depth bin (0–5 m, MPD and MNTD) and in 50–75 m (MNTD). Below the epipelagic zone, there is a marked difference in FD, MPD, and MNTD between the occurrence-weighted assemblage, which tended toward overdispersion, and the time-at-depth weighted assemblage, which tended toward underdispersion. For the NEP shark community, these differences across dimensions and metrics are, perhaps, not surprising given the marked dissimilarities between the phylogram and functional dendrograms previously noted for this regional pool (Siders et al. 2022a). Because the differences between occurrence-weighted and time-at-depth weighted biodiversity patterns were not systematic, we cannot translate the patterns generated from common occurrence data into those more structured by species preferences.

The hERF model performed well in predicting the vertical biogeography of sharks in the NEP despite the multitude of physiological, behavioral, and environmental processes influencing the time budgets (Andrzejaczek et al. 2019). While we generally used literature-derived values, we preferentially superseded previously reported depth ranges with recent natural history references and further triaged and amended the raw information of the IUCN habitat association data. For the latter, we relied on experience gained in developing the trait database (Siders et al. 2022a) to identify suspect habitat associations, for which we searched the literature for corroborating evidence. We discuss these efforts to emphasize that data scraping alone is insufficient and that a combination of data scraping, targeted searching, and expert opinion is needed to develop these datasets. This is exemplified even within our own application, where the model failed to fully capture the time budget for certain taxa, such as the Squalus species. Each species within the genus had relatively similar time budgets, with the exception of S. suckleyi, but had different habitat associations. As a result, the model overpredicted time above 200 m for all Squalus and underpredicted time below 500 m for S. blainville (caused by the sole association with benthic-continental slope habitats) and S. griffini (caused by the subtidal and benthic habitat associations). Thus, the model-predicted time budgets for deepwater, data-poor species (e.g. A. brunneus) should be taken with a grain of salt despite the decent hERF model performance overall.

The past decade has seen a rapid expansion of biotelemetry studies on ocean organisms (Renshaw et al. 2023) that provide invaluable insight into how species allocate time (Andrzejaczek et al. 2022) in the pelagic ocean, which is largely devoid of physical structure. The correlated, drastic changes in the biogeochemical environment as water depth increases (Robinson et al. 2010) put extreme physiological and behavioral constraints on species looking to exploit meso- and bathypelagic habitats (Widder 2010, Sutton 2013). Considering these vertical preferences is critically important in determining the boundaries of communities and the macroecological patterns of the largest biome on the planet. Predictive models that can integrate these biotelemetry observations along with habitat and trait information are crucial for extending the study of ocean community ecology and biodiversity to the multitude of oceanic species that remain undersampled.

Acknowledgements

This work was possible with support from University of Florida's Biodiversity Institute, College of Agricultural and Life Sciences, and College of Liberal Arts and Sciences. Special thanks to Itsumi Nakamura for providing raw data for use in this study.

Author contributions

Zachary A. Siders: Conceptualization (equal); Data curation (equal); Formal analysis (lead); Funding acquisition (lead); Methodology (lead); Visualization (lead); Writing – original draft (lead); Writing – review and editing (equal). Lauren B. Trotta: Conceptualization (equal); Funding acquisition (supporting); Methodology (supporting); Visualization (supporting); Writing – original draft (supporting); Writing – review and editing (equal). William Patrone: Data curation (equal); Formal analysis (supporting); Methodology (supporting); Writing – review and editing (supporting). Fabio P. Caltabellotta: Conceptualization (supporting); Data curation (equal); Funding acquisition (supporting); Writing – review and editing (supporting). Katherine B. Loesser: Data curation (supporting); Formal analysis (supporting); Methodology (supporting); Writing – original draft (supporting); Writing – review and editing (supporting). Benjamin Baiser: Conceptualization (supporting); Funding acquisition (supporting); Writing – review and editing (equal).

Transparent peer review

The peer review history for this article is available at https://publons.com/publon/10.1111/ecog.07249.

Data availability statement

Data are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.6hdr7sr7g (Siders et al. 2024).