Introduction

The maximum degree of competition that forest populations of a species can sustain is described by the principle of self-thinning (Yoda et al. 1963; Westoby 1984). This rule, a power function linearised by a logarithmic transformation, states that environmental resources can satisfy only a limited number of trees within a stand, and this number grows progressively smaller as tree age and size increase. The proximity of a population to this asymptotic boundary indicates the intensity of intraspecific competition in a monospecific population. After the boundary is exceeded, competition between individuals begins and, in the absence of disturbance, growth trends of dominated individuals progressively slow in favour of more competitive species. This process leads the population to a maximum number of plants of a given size that can coexist within a given unit area of land (Liira et al. 2011; Vospernik and Sterba 2015).

Maximum biomass stocking is a fundamental indicator to balance forest management. Growth trends, timber quality and provision of ecosystem services from forests are highly influenced by tree density and competition for resources (Solomon and Zhang 2002; Corona 2015; Mason and Connolly 2016; Marchi et al. 2018). Competition indices ofplant interactions (Pommerening and Särkkä 2013; Cabon et al. 2018) or allometric relationships (Anfodillo et al. 2013; Marchi et al. 2017a, b) are valuable tools that support decision-makers. Forest management options that promote sustainable use (Fabbio et al. 2018) can reduce competition between trees and aid selection of preferred trajectories for forest stands in view of predicted climate change effects (Wang et al. 1998; Ray et al. 2017). The stand density index (SDI), widely accepted as a powerful tool for evaluating growth sustainability of even-aged forest systems, was proposed by Reineke (1933) for tree species in the United States and later was adopted in many other countries (Pretzsch and Biber 2005; Shaw 2006; Castagneri et al. 2008; Poschenrieder et al. 2018). SDI is calculated as:

$${\text{SDI}} = {\text{N}} \times \left( {\frac{{{\text{QMD}} }}{a}} \right)^{k} ,$$
(1)

where N is tree density, QMD is the quadratic mean diameter, a is a constant whose value can be 10 or 25 according to the use of imperial or metric measuring system, respectively, and k is a coefficient. The biological meaning of this index is the maximum number of trees per unit area a forest population (stand) can sustain for a pre-determined QMD. For instance, an SDI of 400 means that 400 trees per hectare are expected when the QMD of the stand is 25 cm. Although originally implemented for monospecific, even-aged stands, generalized use in uneven-aged stands and in multi-species stands has often been proposed (Rivoire and Le Moguedec 2012). The core of the function (1) is represented by the coefficient k. The value of this parameter is the relationship between the total number of trees per unit area (N) and the mean tree volume or mass of the stand. This rule is parameterized for forests as the stand density model where N is related to the QMD of the stand. Even when approximated with a negative exponential function (Fonseca and Duarte 2017), this relationship follows a power law (PWL), which represents the “pure” and mathematically-corrected shape of an N versus QMD relationship that is expressed as:

$${\text{N}} = \beta \times {\text{QMD}} ^{k} ,$$
(2)

when QMD is either large or small, being asymptotic to zero and +∞, respectively. The use of PWL is generally dropped in favour of logarithmic transformation of the data fitted using an ordinary least squares (OLS) function (Reineke 1933; Pretzsch and Biber 2005; Shaw 2006; Ge et al. 2017). In this case, Eq. (2) becomes:

$$\log {\text{N}} = { \log } \beta + k \times \log {\text{QMD}} .$$
(3)

This basic equation, first used by Reineke (1933), enabled the analysis of the size–density relationship for many forest species in both pure and mixed stands. Using Eq. 3, Reineke (1933) derived a slope (k coefficient) equal to − 1.605 that was consistent across 15 datasets representing 14 species (13 coniferous) and was identical for 12 species. Although the slope was taken as constant, the intercept term β was considered to be species-specific. This rule was supported by Yoda et al. (1963) but researchers in Europe have often questioned it (Pretzsch and Biber 2005; Vacchiano 2005; Castagneri et al. 2008). Recent studies demonstrated the need to evaluate the slope of plotted curves according to site quality (Ge et al. 2017), and biases could result from the log-transformation of the data (Smith 1993), and questions have been raised on its biological validity (Packard 2014). Despite this, the use of log-transformed data versus nonlinear regression for analysing biological power laws remains a common approach. Other methodologies have been tested for estimating coefficients, including stochastic frontier functions (Zhang et al. 2005; Weiskittel et al. 2009), reduced major axis regression (Solomon and Zhang 2002; Zhang et al. 2005), quantile regression (Vospernik and Sterba 2015; Ducey et al. 2017), bisector regression (Newton 2006), hierarchical Bayesian models (Zhang et al. 2015) and mixed modelling (Zhang et al. 2005).

The aims of this study were to evaluate the performance of OLS and PWL methods for estimating k in SDI calculations and to analyse the discrepancies between them. Comparison between these two methods for stand density model fitting is based on simulated data with a known degree of correlation between N and QMD. The subsequent stand density management trajectories that were generated were evaluated according to the results of these comparisons.

Materials and methods

To analyse OLS and PWL robustness in coefficient estimation, I generated an artificial dataset composed of 100,000 replicates of 200 records each. The aim was to simulate empirical records derived from a hypothetical distribution of monitored stands with a known degree of correlation between N and QMD. A five-step procedure was followed for this simulation (Fig. 1).

Fig. 1
figure 1

Flowchart of the simulation process

First, a “perfect” dataset with nonparametric correlation between N and QMD of − 1.0 was built with the normally distributed dependent variable N by extracting the 200 records representing the total number of trees per hectare (N) from a normal distribution with mean = 2700 treesha−1 and a standard deviation = 685 treesha−1. Those starting values were judged as adequate to obtain a broad range of simulated stands of density from 300 (mature stand) to 5000 (young high forest stand before self-thinning starts) and a theoretical SDI around 680. Then, the corresponding quadratic mean diameter (QMD) for each of the 200 normally distributed N values was calculated with the following equation, derived from Eq. (2) and assuming k = − 1.605 and β = 1.2·105:

$${\text{QMD}} = \sqrt[k]{{\frac{{\text{N}}}{\beta }}}.$$
(4)

This first simulated dataset represented the hypothetical and perfect situation of a forest stand in self-thinning mode. Afterward, this original vector of QMD values (200 records) was iteratively modified to generate 99,999 alternative artificial stands. A series made by 200 random multipliers was then generated 99,999 times, to increase or reduce QMD with an artificial degree of noise. Such multipliers were derived from a second normal distribution with mean = 1 and with a variable and randomly generated standard deviation between 0.01 and 0.3. As a result, the Spearman correlation coefficient (Spearman 1987) between the original vector N and the 100,000 artificial vectors of QMD values ranged from − 1.0 to − 0.2 (Fig. 2). The main idea behind this operation was to control the degree of correlation between N and QMD to determine whether this information might be used as ancillary data to evaluate the coefficients estimated by the two methods. Summary statistics of 100,000 generated plots are reported in Table 1.

Fig. 2
figure 2

Histogram of Spearman correlation coefficients between the generated number of trees per hectare (N), and the 100,000 quadratic mean diameter (QMD) simulations

Table 1 Summary statistics of the 100,000 randomly generated forest monitoring plots

With the generated and log-transformed data, the stand density model of the simulated 100,000 records was fitted using the classical linear ordinary least squares (OLS) fit and a power law was fitted using the Gauss–Newton algorithm (PWL). The degree of equivalence between the estimated β and k coefficients was evaluated using a simple linear model analysis (expected results were slope ≠ 1 and intercept ≠ 0). The difference between coefficients, i.e. k or β estimated by PWL minus k or β estimated by OLS, was also evaluated in light of the correlation between N and QMD in each of the simulated plots. A parametric ANOVA based on correlation coefficient classes was finally performed to assess differences within the estimated k values using the Duncan test post hoc to identify possible groups. The degree of correlation between N and QMD was the factor targeted for evaluation, and eight classes were drawn dividing the dataset into eight groups. Every class was defined including all the plots with a ρ value between x and x + 0.1 (e.g., class 7 included all the records with − 0.4 ≤ ρ < − 0.3, while class 6 was − 0.5 ≤ ρ < − 0.4).

Results

Given the nature of generated datasets, comparison between fitted models was possible in the absence of reference values. Indeed, the original k and β values were assumed to be starting points that were only calculable with the initial perfect dataset that contained no random noise. Based on this assumption, the introduced changes to the QMD vector brought the two models to calculate a wide range of k and β values. The k coefficient for OLS ranged from − 1.61 to − 0.11 while β ranged from 8·103 to 1.3·105. Similarly, with PWL, k ranged from − 1.62 to − 0.08, while β ranged from 8.5·103 to 1.4·105. With the linear model analysis, a common trend resulted between the estimated k and β values. Models appeared to be similar but with very high (− 1.0 ≤ ρ < − 0.8) and very low correlations (− 0.3 < ρ) as shown in Fig. 3. The OLS model generally estimated higher k-values (Fig. 3, left side) and lower β values (Fig. 3, right side) than did PWL, especially when correlation coefficients ranged from − 0.75 to − 0.4.

Fig. 3
figure 3

Relationship between k (left) and β (right) coefficients calculated by OLS and PWL models in each of the 100,000 simulated plots. Each dot represents a single record and is coloured according to the correlation class between N and QMD. N = number of trees/ha; OLS = ordinary least squares; PWL = power law; QMD = quadratic mean diameter of the stand β

The relationship between the difference ink estimates from OLS versus PWL and the correlation between N and QMS in each artificial plot is shown in Fig. 4. Here, a simple local polynomial regression fitting was also added as a trend line and to characterize the behaviour of the phenomenon. Concerning k (on the left), the core of the stand density index calculation, the difference increased quickly, when the correlation coefficient was between − 1.0 and − 0.75, becoming almost flat at − 0.6. Then the difference decreased, becoming smaller until correlation values exceeded − 0.3. The difference in calculated maximum SDI values was also evaluated and is reported in the right panel of Fig. 4.

Fig. 4
figure 4

Relationship between the difference in k (left) and maximum SDI (right) estimated with ordinary least squares and power law models in each of the 100,000 simulated plots (y-axis) and the correlation between N and quadratic mean diameter of the stand (QMD) in each (x-axes). Each dot represents a single record and is coloured according to the correlation class between N and QMD. The black line is a local polynomial regression model and describes the average trend

Seven statistically different correlation groups were detected, including border crossing classes (e.g., “ab”, “bc” and “abc”) (Table 2). The smallest differences supported by statistical evidence in k estimation were detected for the class 8 (− 0.3 ≤ ρ), class 7 (− 0.4 ≤ ρ < − 0.3) and class 6 (− 0.5 ≤ ρ < − 0.4), all of which differed from each other and from the other groups. Then class 1 (− 1.0 ≤ ρ < − 0.9) was intermediate between classes 6 and 2 (− 0.9 ≤ ρ < –0.8), 3 (− 0.8 ≤ ρ < − 0.7) and 4 (− 0.7 ≤ ρ < − 0.6), and was ranked with an average difference in k between 0.2769 and 0.2968. An “admixture zone” was comprised of class 2 (“abc” group) and classes 3 and 4 (“ab” group) where differences were less pronounced. The highest values were recorded for class 5 (− 0.6 ≤ ρ < − 0.5) with an average difference around 0.3. Class 5, which might also represents the most common case in empirical data, was used to generate Fig. 5, where a simulation of an elementary stocking chart was drawn with two SDI values (1000 and 500) and two possible k values differing from ± 0.3. As a result, a discrepancy between the two lines was evident (Fig. 5).

Table 2 Results of Duncan test where the difference between Power Law and Ordinary Least Squares in k estimation was statistically evaluated
Fig. 5
figure 5

SDI curves calculated with different k values for the same N and QMD dataset; k values here were adopted according to detected difference for class 5 (± 0.03). N = number of trees/ha; QMD = quadratic mean diameter of the stand; SDI = stand density index

Discussion

The degree of equivalence between OLS and PWL coefficients were highly dependent on the “quality” of the data, i.e. on the degree of random noise during QMD and N calculation for each plot. Even if logarithmic transformation is a primary tool for OLS linear regression of power law relationships (Anfodillo et al. 2013), the results showed that this mathematical adjustment should be carefully evaluated when the primary goal is to study the model’s coefficients rather than its predicted values. Transformation might introduce biases into the fitting procedure, resulting in quite different values for k and β. This bias could then directly be transmitted to the stand density index calculation where k is the only coefficient.

Field data collection often is the most expensive component of research projects but is necessary to obtain high quality data. Similarly, the contributions of robust models, analytical procedures, and statistics must not be underestimated (Fassnacht et al. 2014; Ferrara et al. 2017; Marchi et al. 2017a, b). Important adjustments are sometimes used to handle biases introduced to datasets, especially when using inverse functions (Smith 1993). For SDI calculation, the use of a log-transformed version of the canonical SDI formula, Eq. (1), cannot be a possible solution. This would be true even if Eq. (1) could be written as:

$$\log {\text{SDI}} = \log {\text{N}} + k \times { \log }\left( {\frac{{{\text{QMD}} }}{a}} \right).$$
(5)

Using Eq. (5) with a value for k estimated from Eq. (3) is worthy of note. An additional test highlighted that the difference in SDI curves shown in Fig. 5 remains. It is well known that the logarithmic transformation of an arithmetic dataset often results in slightly biased estimates when values are predicted and transformed back to arithmetic units from OLS (Baskerville 1972; Newman 1993; Packard 2013). This issue is mainly due to an estimation based on a geometric mean of the dependent variable rather than on the arithmetic mean at that value of the independent variable (Smith 1993).

Accurate determination of the self-thinning trajectory for any population is a difficult task, whatever the data source and for many biological issues. Despite taking the appropriate analytical steps, some problems are inherent in the data where the actual area of a sampled stand that is in self-thinning mode is often poorly defined. Indeed, despite the intensity of sampling efforts, only some stands are actually in self-thinning mode (Weiskittel et al. 2009; Vospernik and Sterba 2015). Pests, insects, disease, and wind storms can reduce tree densities more quickly than natural mortality (Ray et al. 2017). Forests are already adapting to a changing climate with a plastic reaction across different ecological regions (O’Neill et al. 2008). Many forest tree species are still growing under different ecological regimes than those observed in the past (Pecchi et al. 2019), shaping their spatial distribution and influencing their ecological dynamics. This could generate unexpected stresses which might alter the canonical self-thinning rule. In all the cases, it is necessary to monitor forest resources carefully as well as to adjust statistical analyses and mathematical models that are insensitive to observations in under-stocked conditions. Although some efforts have been made to address these problems (Bi et al. 2000), limitations still persist. Stochastic natural impacts to stands might be represented by introduction of random noise to the artificial dataset. Most recommended analytical approaches would suggest cleaning the dataset prior to modelling by removing the non-asymptotic size-density values from the data sets. However, this issue is irrelevant to this study where the equivalence between methods was tested rather than the ability of a method to yield a real value.

Concerning the use of classic modelling techniques (i.e., OLS or PWL) on longitudinal studies such as SDI calculation from long-term monitoring plots data, a possible solution might call for mixed models, linear and nonlinear. Such techniques, which uses the restricted maximum likelihood algorithm, are known to be much more reliable than OLS and the Gauss–Newton fitting procedure when autocorrelated data and large datasets are analysed, such as those derived from long-term forest time series. Some authors have tested this possibility (Solomon and Zhang 2002; Pourmajidian et al. 2010). In the current case, an interesting solution could include the correlation class as a random term with the QMD as a fixed effect. In this case, the mixed models “gain” could refer to the reduction of mean squared error from estimating coefficients rather than from the use of separate models. However, possible biases in k estimation should also be evaluated as stressed by our results.

As regards possible solutions for the detected problem, even if a simple mathematical correction could be thought to be applied, no general rules could be provided. The bias correction for log transformation is generally simple when data are available. Anyway, this correction is aimed to adjust predicted values and not for coefficients. In addition, there is a chance, probably quite high, that the findings will not hold, meaning they are as they are just because of the bias. In this view, the use of PWL is preferred, with a logarithmic transformation just in case of violation of parametric assumptions (non-normality of data) to deal with this.

Conclusions

Understanding long-term dynamics is fundamental for sustainable forest management. The stand density model is a well-known way to evaluate stand loading and to derive the maximum stand density index in even-aged forests. Even if the 100,000 simulated plots could include a wide spectrum of study cases, empirical results from long-term monitoring networks are necessary to support the evidences obtained via simulation. In conclusion, given the power law as the “natural” structure of the N versus QMD relationship, the nonlinear method is preferred, thus avoiding the logarithmic transformation.