Predicting the Distribution of Tree Species and Their Biomass in Yangambi Biosphere Using Spatial Interpolation

Background and Objective: Knowledge of the spatial distribution of trees and stands is essential to forest management strategies. This study investigated whether spatial interpolation methods could predict the spatial distribution of tree species and their biomass in a mixed forest of the Yangambi Biosphere Reserve, Democratic Republic of Congo. Materials and Methods: A 90×90 m grid was installed in a mixed forest, the coordinates of each selected tree were recorded with a GPS and the Diameter at Breast Height (DBH) was measured. Tree biomass was estimated with an allometric equation. Data were transferred to ArcGIS 10.3, where maps predicting the spatial distribution of tree species and biomass were produced using the ArcGIS Geostatistical Analyst extension. Seven spatial interpolation methods were tested: Inverse Distance Weighting (IDW), Simple, Ordinary and Universal Kriging (SK, OK and UK), Local Polynomial Interpolation (LPI), Global Polynomial Interpolation (GPI) and Kernel Interpolation (KI). Results: Scorodophloeus zenkeri was the most dominant plant species (31%), followed by Strombosia pustulata (12%) and Microdesmis yafungana (8%). Tree biomass ranged from 100.32 to 8777.30 kg with a mean value of 2866.70 kg. The coefficient of variation was 72.98% with a standard deviation of 2092.30 kg, suggesting that forest biomass was highly variable. The LPI and GPI methods on the one hand, and OK and UK on the other, gave similar predictions of tree species. The verification of the species spatial distribution was nearly consistent with the IDW predictions. Conclusion: The study area should be expanded and further investigations conducted to refine the predictions.


INTRODUCTION
The forests of the Democratic Republic of Congo (DRC) cover an estimated 1,280,042.16 km² and include an immense tropical forest, the second largest on the planet 1 . This forest needs to be studied, preserved and monitored for the benefit of humanity 2 . With regard to tree species, forest modelers have developed many static or dynamic models to understand and predict the evolution of trees and stands. Many of these models also incorporate mortality caused by competition between trees 3 . Other types of models relate to various complementary elements: regeneration, branching or the development of wood quality 4 . Thus, to simulate the evolution of a forest stand, it is often necessary to involve a whole chain of models 3,6 . The distribution of tree species is intimately associated with ecological, biotic and abiotic factors; changes in these factors can therefore alter the geographic distribution of plant species and the composition of forests 5 . Describing the geographic distribution of plant species relies on understanding the relationships between species and their environment 6 . National inventories represent an important source of information on the geographical distribution of species 7 . However, this information often remains incomplete in view of the immense territory to be covered 8 . Inventory data may therefore present spatial, temporal, taxonomic and environmental biases 9 . Recent studies have shown that different prediction methods can lead to very divergent results 10 , without implying that one method is more correct than another. Rather, the divergence derives from the fact that the different prediction methods are sensitive to the available data and the mathematical functions used 11 .
In this context, geostatistical methods can represent an effective instrument for filling in the gaps in inventories 12 . Methods for predicting the geographic distribution of species do not seek to describe a realistic process but rather give an approximate description of the ecological niche of a species in space 13,14 . Several authors have emphasized the importance of biomass estimates in the monitoring and management of forest carbon storage 1,5 . Although allometric equations have been the best tools to estimate forest biomass, it has been suggested that in some instances the estimates are highly uncertain 6,7 . Allometric equations calculate biomass using tree parameters such as Diameter at Breast Height (DBH), height and/or wood density. Although tools for measuring DBH, tree height and/or wood density are readily available, obtaining data on these parameters for each individual tree can be a time-consuming process that may require a great deal of manpower. Therefore, randomly taking a few measurements of tree DBH to calculate the biomass and predicting the rest of these parameters can help to quickly identify forest areas of interest in terms of tree biomass management. The objective of this study was therefore to determine the most appropriate and efficient spatial interpolation methods for predicting the geographical distribution of tree species and their biomass in the Yangambi Biosphere Reserve.

MATERIALS AND METHODS
Study site:
This study was conducted from June to July, 2018 in the Yangambi Biosphere Reserve, Yangambi, Democratic Republic of Congo. The history, climate and floristic composition of this reserve have been widely discussed by Jean de 15 . The study site is shown in Fig. 1.

Vegetation:
The vegetation of the Yangambi Biosphere Reserve is part of the Guineo-Congolese Regional Center of Endemism. Its assessment has shown a diversity of plant formations that can be explained both by the physical environment (in particular the presence of several rivers) and by the influence of humans, who have altered the habitats at different times. The vegetation of the Yangambi Biosphere Reserve is composed of undisturbed forests, secondary forests, mosaic forests, swamp forests and forest plantations 15 .

Methodological approach of the study:
A 90×90 m grid was installed in the Lusambila mixed forest (Yangambi). The coordinates of each selected tree were recorded with a Geo-Explorer 7.0 Global Positioning System (GPS) and its Diameter at Breast Height (DBH) was measured. The name and family of each tree were also recorded. Additionally, soil samples were collected at each tree location with a Kopecky cylinder at 10 cm depth and CO 2 emissions were measured using a Vernier CO 2 gas sensor (Vernier, Beaverton, Oregon, United States of America) inserted in a chamber 20 cm high and 15 cm in diameter (data not included). The sensor was connected to a LabQuest 2 (version 2.8.4), a standalone data logger with built-in graphing and analysis software. The LabQuest 2 data logger was in turn connected to a laptop running Logger Pro 3 software. Soil temperature and moisture were measured using sensors connected to the data logger and the laptop (data not included). Tree biomass was estimated with an allometric equation, which relates the biomass of individual trees to easily obtainable non-destructive measurements such as Diameter at Breast Height (DBH). The following allometric equation for moist forest was used: where Y is biomass in kg/tree, D is DBH in cm and H is height in m.
Data on tree species and biomass were recorded in an Excel file and transferred into ArcGIS 10.3, where interpolated maps were produced using the ArcGIS Geostatistical Analyst extension. The distribution of tree species was predicted using seven interpolation methods: Inverse Distance Weighting, Ordinary, Simple and Universal Kriging, Global and Local Polynomial Interpolation and Kernel Interpolation.
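As an illustration of the simplest of these methods, the following is a minimal sketch of Inverse Distance Weighting written with NumPy. It is not the ArcGIS Geostatistical Analyst implementation used in the study: the function name `idw_predict`, the power parameter of 2 and the sample coordinates and biomass values are all assumptions made for the example.

```python
import numpy as np

def idw_predict(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted prediction at one or more query points."""
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))
    # Pairwise distances: one row per query point, one column per measured point.
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    preds = np.empty(len(xy_query))
    for i, d in enumerate(dist):
        hit = d == 0.0
        if hit.any():
            # The query coincides with a measured point: return its value.
            preds[i] = values[hit].mean()
        else:
            weights = 1.0 / d ** power
            preds[i] = np.sum(weights * values) / np.sum(weights)
    return preds

# Hypothetical tree coordinates (m) and biomass values (kg), for illustration only.
trees_xy = [(0.0, 0.0), (90.0, 0.0), (0.0, 90.0), (90.0, 90.0)]
biomass_kg = [500.0, 1500.0, 2500.0, 3500.0]
centre_estimate = idw_predict(trees_xy, biomass_kg, [(45.0, 45.0)])
```

Because each prediction is a weighted average of the measured values, an IDW surface never exceeds the observed minimum or maximum and is pulled towards the nearest measured trees.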
A brief overview of spatial interpolation methods:
Spatial interpolation is a technique capable of producing prediction surfaces and of providing measures of the accuracy of these predictions. The geostatistical methods among them (kriging) are based on statistical models that include autocorrelation (statistical relationships among the measured points). Interpolation methods are used for data analysis in areas such as geography, atmospheric sciences, petroleum and mining exploration, environmental analysis, precision agriculture, fish and wildlife studies and many more. The ArcGIS software has an extension, ArcGIS Geostatistical Analyst, that contains these models. In this study, we tested seven interpolation methods, namely: Local Polynomial Interpolation (LPI), Global Polynomial Interpolation (GPI), Inverse Distance Weighting (IDW), Simple (SK), Ordinary (OK) and Universal (UK) Kriging and Kernel Interpolation (KI). Local polynomial interpolation is a deterministic method that fits many polynomials, each within a specified overlapping neighborhood. Global polynomial interpolation, by contrast, fits a single polynomial to the whole data set: a first-order global polynomial fits a single plane through the data, a second-order global polynomial fits a surface with one bend, allowing it to represent valleys, a third-order global polynomial allows two bends and so on. Local polynomial interpolation and the other methods used in this study were extensively discussed by Kumar 17 .
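The statement that a first-order global polynomial fits a single plane through the data can be made concrete with a short least-squares sketch. The helper names `fit_plane` and `predict_plane` and the sample points are hypothetical, and the code only illustrates the idea; it does not reproduce the ArcGIS procedure.

```python
import numpy as np

def fit_plane(xy, z):
    """Least-squares fit of z = a + b*x + c*y (first-order global polynomial)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    # Design matrix with an intercept column followed by the x and y coordinates.
    design = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coeffs  # array [a, b, c]

def predict_plane(coeffs, xy):
    """Evaluate the fitted plane at one or more (x, y) locations."""
    xy = np.atleast_2d(np.asarray(xy, dtype=float))
    return coeffs[0] + coeffs[1] * xy[:, 0] + coeffs[2] * xy[:, 1]

# Hypothetical sample lying exactly on the plane z = 1 + 2x + 3y.
sample_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
sample_z = [1.0, 3.0, 4.0, 6.0, 8.0]
plane = fit_plane(sample_xy, sample_z)
```

A single plane cannot follow local peaks and troughs; higher-order or local polynomials trade that smoothness for flexibility.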

Floristic composition and specific dominance:
The floristic composition of plant species is given in Fig. 2. It shows that a total of 60 individual trees, divided into 25 species and 35 families were selected (Table 1). The figure also shows that the species Scorodophloeus zenkeri was the most dominant with 31%, followed by Strombosia pustulata (12%) and Microdesmis yafungana (8%). Cynometra hankei, Petersianthus macrocarpus and Trichilia lanata each had a specific dominance of 5%.

Predictive maps of the potential distribution of plant species:
The predictive maps showing the spatial distribution of tree species are given in Fig. 3a-f for the different interpolation methods. Such species distribution maps have several applications, the most common of which is estimating the ecological niche of species. An ecological niche can be described as an area within which a species can survive and reproduce depending on environmental factors, which can be direct or indirect. These factors can influence the spatial distribution of species at three levels: (i) limiting factors, which control the ecophysiology of the species, (ii) disturbances (natural or artificial, for example anthropogenic pressures) and (iii) resources (food, water).

Inverse distance weighted method:
The spatial distribution of tree species as predicted by the Inverse Distance Weighted (IDW) method is shown in Fig. 3a. The inverse distance weighted method predicted a high probability of the presence of Scorodophloeus zenkeri, Strombosia pustulata and Prioria oxyphylla in the Northwest. In the North-East and South-West, it shows an obvious under-representation of these three species. Furthermore, the map clearly shows a wider potential distribution of Microdesmis yafungana, Guarea thompsonii and Irvingia grandifolia in the Northeast, Petersianthus macrocarpus and Pancovia harmisiana in the Southeast Region.

Global polynomial interpolation method:
The spatial distribution of tree species as predicted by the Global Polynomial Interpolation (GPI) method is shown in Fig. 3b. Overall, Fig. 4

Local polynomial interpolation method:
The spatial distribution of tree species as predicted by the Local Polynomial Interpolation (LPI) method is shown in Fig. 3c. Consistent with the observation made for Fig. 3b, the species prediction maps produced by the Global Polynomial Interpolation and the Local Polynomial Interpolation methods were almost identical (Fig. 3b-c). These two methods predicted a clustered, columnar distribution of species with an oblique orientation and tended to minimize the probability of the presence of dominant species such as Scorodophloeus zenkeri and Strombosiopsis tetrandra (Fig. 2). They also tended to predict these species only towards the North-East. It has been reported that when some species are sampled more or less than others, the prediction model may be influenced by the most common or the least represented species 19 .
The comparison of potential species distribution maps showed that the prediction maps produced by the Ordinary Kriging and the Universal Kriging methods were similar. This similarity reflects the fact that both methods are based on the same statistical processes, including autocorrelation, i.e., the statistical relationships between the measured points. The maps of favorable zones (of potential distribution) estimated by these models tended to be the most optimistic 18 . Otherwise, these methods seem to predict the zones of presence more or less correctly in the North-West of the area and somewhat less well in the center and the East. The verification map (not included) of the spatial distribution of species seems consistent with the species distribution map predicted by the Simple Kriging method, particularly for Scorodophloeus zenkeri in almost all geographical directions. This trend is consistent with the biogeographical history and environmental preferences of this species in these areas.
In agreement with the other methods, the species distribution maps predicted by the Kernel Interpolation and Inverse Distance Weighting methods also agreed with the verification map, which shows a predominance of species such as Scorodophloeus zenkeri and Strombosiopsis tetrandra towards the North-West and the South and of Cynometra hankei towards the East. In addition, among all of the methods tested, the Inverse Distance Weighting method gave predictions centered around the measured trees (Fig. 3a), while the other methods, in particular Simple, Universal and Ordinary Kriging and Local and Global Polynomial Interpolation, tended to predict larger areas of potential presence for the species (Fig. 3b-f). These differences can be attributed on the one hand to the profile of the presence data and on the other hand to the distinct ecological characteristics of the species 20 . The number and size of the grid cells can also affect the methods and their predictive capacity 12 . An intermediate number of presence cells, distributed homogeneously and fully included in a restricted distribution area, leads to a good prediction of the geographical distribution of species. By contrast, for an intermediate or large sample with places of higher prevalence and a wide geographical distribution, the models tend to make conservative or erroneous predictions, whereas a small sample with an intermediate distribution area allows a good prediction of the potential distribution area. Furthermore, as noted earlier, several studies have shown that different prediction methods can lead to different results 21 because these methods are sensitive to the available data and the mathematical functions used 14 . Forest biomass showed an East-West gradient, with the trees of highest biomass in the eastern portion of the forest.
This may be due to environmental factors influencing forest growth. It is recommended that the study be repeated over a larger area (at least 150×150 m). The number of trees measured should be increased, for example doubled, but the spatial interpolation methods can be limited to three: Inverse Distance Weighting, Ordinary Kriging and Universal Kriging, provided that variograms can be fitted.
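Whether a variogram can be fitted is usually judged from the empirical semivariogram of the measurements. The sketch below computes the classical estimator, in which the semivariance for a distance bin is half the mean squared difference between all pairs of values whose separation falls in that bin. The function name, the bin edges and the sample data are hypothetical and serve only to illustrate the calculation; no values from the study are used.

```python
import numpy as np

def empirical_semivariogram(xy, z, bin_edges):
    """Return the semivariance and the number of point pairs for each distance bin."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    # All unordered pairs of distinct points.
    i, j = np.triu_indices(len(z), k=1)
    lag = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)  # NaN marks bins that contain no pairs
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (lag >= bin_edges[k]) & (lag < bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = half_sq_diff[in_bin].mean()
    return gamma, counts

# Hypothetical transect: three points 1 m apart with values 0, 1 and 2.
demo_gamma, demo_counts = empirical_semivariogram(
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [0.0, 1.0, 2.0], [0.5, 1.5, 2.5]
)
```

Semivariance that rises with distance and then levels off suggests spatial autocorrelation that a variogram model can capture. A flat or erratic curve, which is more likely when few pairs fall in each bin, suggests that kriging would add little over the deterministic methods.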

CONCLUSION
The present study aimed to determine the most appropriate and efficient methods for predicting the geographical distribution of tree species and their biomass in the Yangambi Biosphere Reserve. To achieve this, a systematic inventory of the selected species was carried out using a Global Positioning System (GPS) and measurements of tree parameters. From these data, prediction maps were produced with seven spatial interpolation methods. Although these models are promising, the study should be repeated over time and in several places to determine the best models to use in our region and country for a better understanding of the spatial distribution of forest tree species and their biomass.

SIGNIFICANCE STATEMENT
Knowledge of tree species distribution and biomass is critical for monitoring forest carbon storage and for strategic forest management. Unfortunately, measuring tree parameters to obtain such information can be a time-consuming process that may require a great deal of manpower. Randomly taking a few measurements of tree parameters to calculate the biomass and predicting the rest of these parameters can therefore help to quickly identify forest areas of interest in terms of tree biomass management. This study thus provides an opportunity to quickly obtain the information needed to improve forest management operations.