Estimating and Interpreting Fine-Scale Gridded Population Using Random Forest Regression and Multisource Data

Yun Zhou; Mingguo Ma; Kaifang Shi; Zhenyu Peng

doi:10.3390/ijgi9060369

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9060369 ◽

2020 ◽

Vol 9 (6) ◽

pp. 369

Author(s):

Yun Zhou ◽

Mingguo Ma ◽

Kaifang Shi ◽

Zhenyu Peng

Keyword(s):

Random Forest ◽

Population Distribution ◽

Population Data ◽

Estimation Accuracy ◽

Chinese Government ◽

Random Forest Regression ◽

Regression Techniques ◽

Nighttime Light ◽

Fine Resolution ◽

Impervious Areas

Gridded population results at a fine resolution are important for optimizing the allocation of resources and researching population migration. For example, the data are crucial for epidemic control and natural disaster relief. In this study, the random forest model was applied to multisource data to estimate the population distribution in impervious areas at a 30 m spatial resolution in Chongqing, Southwest China. The community population data from the Chinese government were used to validate the estimation accuracy. Compared with the other regression techniques, the random forest regression method produced more accurate results (R2 = 0.7469, RMSE = 2785.04 and p < 0.01). The points of interest (POIs) data played a more important role in the population estimation than the nighttime light images and natural topographical data, particularly in urban settings. Our results support the wide application of our method in mapping densely populated cities in China and other countries with similar characteristics.
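The two accuracy figures quoted above (R2 and RMSE) can be computed from paired observed and estimated populations. The following is a minimal sketch, assuming R2 means the coefficient of determination (the abstract does not say which definition was used); the sample values are invented for illustration and are not from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical community populations (census) and model estimates.
observed = [1200.0, 3400.0, 560.0, 8000.0]
predicted = [1000.0, 3600.0, 700.0, 7500.0]

print("R2:", round(r_squared(observed, predicted), 4))
print("RMSE:", round(rmse(observed, predicted), 2))
```

Because RMSE is expressed in persons per validation unit, it is only comparable across studies that validate at a similar unit size.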

Early Classification Method for US Corn and Soybean by Incorporating MODIS-Estimated Phenological Data and Historical Classification Maps in Random-Forest Regression Algorithm

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.21-00003r2 ◽

2021 ◽

Vol 87 (10) ◽

pp. 747-758

Author(s):

Toshihiro Sakamoto

Keyword(s):

Random Forest ◽

Classification Accuracy ◽

Classification Method ◽

Estimation Accuracy ◽

Random Forest Regression ◽

Crop Phenology ◽

Phenological Data ◽

Mixed Pixel ◽

Crop Classification ◽

Emergence Date

An early crop classification method is functionally required in a near-real-time crop-yield prediction system, especially for upland crops. This study proposes methods to estimate the mixed-pixel ratio of corn, soybean, and other classes within a low-resolution MODIS pixel by coupling MODIS-derived crop phenology information and the past Cropland Data Layer in a random-forest regression algorithm. Verification of the classification accuracy was conducted for the Midwestern United States. The following conclusions are drawn: The use of the random-forest algorithm is effective in estimating the mixed-pixel ratio, which leads to stable classification accuracy; the fusion of historical data and MODIS-derived crop phenology information provides much better crop classification accuracy than when these are used individually; and the input of a longer MODIS data period can improve classification accuracy, especially after day of year 279, because of improved estimation accuracy for the soybean emergence date.

Systematic Framework to Predict Early-Stage Liver Carcinoma Using Hybrid of Feature Selection Techniques and Regression Techniques

Complexity ◽

10.1155/2022/7816200 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Marium Mehmood ◽

Nasser Alshammari ◽

Saad Awadh Alanazi ◽

Fahad Ahmad

Keyword(s):

Feature Selection ◽

Random Forest ◽

Liver Diseases ◽

Early Stage ◽

Support Vector ◽

Liver Carcinoma ◽

Random Forest Regression ◽

Soft Computing Techniques ◽

Regression Algorithms ◽

Regression Techniques

The liver is a vital organ of the human body, but detecting liver disease at an early stage is very difficult because the symptoms are often hidden. Liver diseases may cause loss of energy or weakness once irregularities in liver function become apparent. Cancer is one of the most common diseases of the liver and also the most fatal; it arises from the uncontrolled growth of harmful cells inside the liver and, if diagnosed late, may cause death. Treating liver disease at an early stage is therefore important, as is designing a model to diagnose the disease early. First, the features that play the most significant part in detecting liver cancer at an early stage should be identified, which means extracting a few essential features from thousands of irrelevant ones. These features are mined using data mining and soft computing techniques, which give optimized results that help in early diagnosis. We use feature selection methods (Filter, Wrapper, and Embedded) to reduce the number of features in the dataset. Different regression algorithms are then applied to the output of each method individually to evaluate the result: Linear Regression, Ridge Regression, LASSO Regression, Support Vector Regression, Decision Tree Regression, Multilayer Perceptron Regression, and Random Forest Regression. We evaluated our results based on the accuracy and error rates of these algorithms. The results show that, of all the deployed regression techniques, Random Forest Regression with the Wrapper method performs best, giving the highest R2 score of 0.8923 and the lowest MSE of 0.0618.
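As an illustration of the Filter family of feature selection mentioned above, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top k. This is only one common filter criterion; the abstract does not state which criterion the authors used, and the feature names and values here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0
    return cov / (sd_x * sd_y)


def filter_select(features, target, k):
    """Keep the k features most correlated (in absolute value) with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical feature columns and target values.
features = {
    "marker_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "marker_b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]

selected = filter_select(features, target, k=2)
print(selected)
```

A filter method scores each feature independently of any model, which makes it cheap; Wrapper methods instead evaluate feature subsets by repeatedly training the regressor itself.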

The Random Forest-Based Method of Fine-Resolution Population Spatialization by Using the International Space Station Nighttime Photography and Social Sensing Data

Remote Sensing ◽

10.3390/rs10101650 ◽

2018 ◽

Vol 10 (10) ◽

pp. 1650 ◽

Cited By ~ 16

Author(s):

Kangning Li ◽

Yunhao Chen ◽

Ying Li

Keyword(s):

Random Forest ◽

International Space Station ◽

Population Distribution ◽

Space Station ◽

Small Scale ◽

International Space ◽

Point Of Interest ◽

Scale Population ◽

Fine Resolution ◽

Population Mapping

Despite the importance of high-resolution population distribution in urban planning, disaster prevention and response, regional economic development, and improvement of the urban living environment, traditional urban investigations have mainly focused on large-scale population spatialization using coarse-resolution nighttime light (NTL) data, while few efforts have been made toward fine-resolution population mapping. To address the problem of generating small-scale population distributions, this paper proposes a method based on the Random Forest Regression model to spatialize population at 25 m resolution from International Space Station (ISS) photography and urban functional zones generated from social sensing data, namely points of interest (POIs). The method has three main steps: HSL (hue, saturation, lightness) transformation and saturation calibration of the ISS imagery, generation of functional-zone maps from POIs, and population spatialization with the Random Forest model. An accuracy assessment against WorldPop validated the proposed method as suitable for generating fine-resolution population maps. The discussion suggests that, without auxiliary data, NTL cannot be directly employed as a population indicator at small scale. The Variable Importance Measure of the RF model confirmed the correlation between the features and population and further demonstrated that urban functions performed better than LULC (Land Use and Land Cover) in small-scale population mapping. Urban height was also shown to improve the performance of population disaggregation because it compensates for building volume. In sum, the proposed method shows great potential for disaggregating fine-resolution population and other urban socio-economic attributes.
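The first step named above, the HSL transformation, is a standard colour-space conversion. The sketch below shows only that conversion for a single 8-bit RGB pixel, using Python's standard `colorsys` module; it does not reproduce the paper's saturation calibration, which the abstract does not describe. The helper name `rgb_to_hsl` is introduced here for illustration.

```python
import colorsys


def rgb_to_hsl(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0-1, lightness 0-1)."""
    # colorsys works on floats in [0, 1] and returns values in H, L, S order.
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, l


# Hypothetical pixels: a saturated red light source and an unlit grey pixel.
print(rgb_to_hsl(255, 0, 0))
print(rgb_to_hsl(128, 128, 128))
```

Separating lightness from hue and saturation is what allows brightness in a colour photograph to be treated as a light-intensity signal independently of lamp colour.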

Compression Strength Prediction Using Machine Learning Techniques

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/431012021 ◽

2021 ◽

Vol 10 (1) ◽

pp. 301-307

Keyword(s):

Random Forest ◽

Compression Strength ◽

Machine Learning Techniques ◽

Support Vector ◽

Mathematical Relationship ◽

Random Forest Regression ◽

Learning Techniques ◽

Regression Techniques ◽

Compressive Strength Of Concrete ◽

Advanced Computing

Advanced computing techniques and their application to other engineering disciplines have accelerated many aspects and phases of the engineering process. Many computer-aided methods are now widely used in civil engineering. The mathematical relationship between the ratios of different concrete components, other influencing factors, and compressive strength needs to be analyzed for different engineering needs. This paper aims to develop such a relationship and to predict the compressive strength of concrete by applying regression techniques such as linear regression, support vector regression, decision tree regression, and random forest regression to an assumed data set. Of the regression techniques applied, random forest regression was found to give considerable accuracy.

An Improved Index for Urban Population Distribution Mapping Based on Nighttime Lights (DMSP-OLS) Data: An Experiment in Riyadh Province, Saudi Arabia

Remote Sensing ◽

10.3390/rs13061171 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1171

Author(s):

Mohammed Alahmadi ◽

Shawky Mansour ◽

David Martin ◽

Peter Atkinson

Keyword(s):

Spatial Information ◽

Population Distribution ◽

Population Data ◽

Mean Relative Error ◽

Nighttime Lights ◽

Bare Land ◽

Distribution Mapping ◽

Nighttime Light ◽

Saturation Effects ◽

Population Mapping

Knowledge of the spatial pattern of the population is important. Census population data provide insufficient spatial information because they are released only for large geographic areas. Nighttime light (NTL) data have been utilized widely as an effective proxy for population mapping. However, the well-reported challenges of pixel overglow and saturation limit the applicability of the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) for accurate population mapping. This paper integrates three remotely sensed information sources, DMSP-OLS, vegetation, and bare land areas, to develop a novel index called the Vegetation-Bare Adjusted NTL Index (VBANTLI) to overcome the uncertainties in the DMSP-OLS data. The VBANTLI was applied to Riyadh province to downscale governorate-level census population for 2004 and 2010 to a gridded surface of 1 km resolution. The experimental results confirmed that the VBANTLI significantly reduced the overglow and saturation effects compared to widely applied indices such as the Human Settlement Index (HSI), Vegetation Adjusted Normalized Urban Index (VANUI), and radiance-calibrated NTL (RCNTL). The correlation coefficients between the census population and the RCNTL (R = 0.99) and VBANTLI (R = 0.98) products were larger than those for the HSI (R = 0.14) and VANUI (R = 0.81) products. In addition, Model 5 (VBANTLI) was the most accurate model, with R2 and mean relative error (MRE) values of 0.95 and 37%, respectively.
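The mean relative error (MRE) reported above is commonly defined as the average of the absolute differences between estimated and observed values, each divided by the observed value. The sketch below assumes that definition, which the abstract does not spell out; the population figures are invented.

```python
def mean_relative_error(observed, estimated):
    """Mean of |estimated - observed| / observed, expressed in percent."""
    ratios = [abs(e - o) / o for o, e in zip(observed, estimated)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical census counts and gridded estimates summed per unit.
observed = [100.0, 200.0]
estimated = [110.0, 150.0]

print(mean_relative_error(observed, estimated))
```

Because each error is scaled by its own observed value, sparsely populated units can dominate the MRE even when the absolute errors there are small.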

Estimation of the erodibility of treated unsaturated lateritic soil using support vector machine-polynomial and -radial basis function and random forest regression techniques

Cleaner Materials ◽

10.1016/j.clema.2021.100039 ◽

2022 ◽

Vol 3 ◽

pp. 100039

Author(s):

Kennedy C. Onyelowe ◽

Tammineni Gnananandarao ◽

Ahmed M. Ebid

Keyword(s):

Support Vector Machine ◽

Random Forest ◽

Radial Basis Function ◽

Basis Function ◽

Lateritic Soil ◽

Support Vector ◽

Random Forest Regression ◽

Radial Basis ◽

Regression Techniques

Estimation of aboveground biomass in North China using Sentinel-1 and 2 datasets

10.5194/egusphere-egu2020-6548 ◽

2020 ◽

Author(s):

Yueting Wang ◽

Xiaoli Zhang

Keyword(s):

Time Series ◽

Random Forest ◽

Aboveground Biomass ◽

North China ◽

Vegetation Indices ◽

Estimation Accuracy ◽

Biophysical Parameters ◽

Random Forest Regression ◽

Multispectral Data ◽

Variables Selection

Forest aboveground biomass (AGB) plays an important role in measuring forest carbon reserves. Accurate mapping of AGB is important for monitoring carbon stocks and will contribute to achieving the goal of sustainable development. In this study, we explored the potential of mapping AGB in north China using a three-year monthly time series of Sentinel-1 (S1) and Sentinel-2 (S2) data. The backscattering and indices from S1 SAR data, combined with spectral reflectance, vegetation indices, and biophysical parameters from multispectral S2 imagery, were evaluated for AGB prediction in a Random Forest regression. Three scenarios were run with different datasets to determine: (1) the potential of using S1 and S2 to estimate AGB, (2) the optimal variable selection for AGB mapping, and (3) the contribution of time series datasets to improving the accuracy of AGB mapping. Random forest regression was used to develop three types of forest AGB estimation model: using only S1, only S2, and a combination of S1 and S2. Compared to S1 (RMSE = 65.7 Mg/ha), S2 achieved better prediction accuracy (RMSE = 58.4 Mg/ha), while the combination of S1 and S2 time series datasets gave the best AGB results (RMSE = 42.3 Mg/ha). The research implies that combining SAR and multispectral data considerably improves AGB mapping performance compared with using SAR or multispectral data alone. The proposed approach provides new insight into improving the estimation accuracy of forest AGB in north China.

Modeling Spatiotemporal Population Changes by Integrating DMSP-OLS and NPP-VIIRS Nighttime Light Data in Chongqing, China

Remote Sensing ◽

10.3390/rs13020284 ◽

2021 ◽

Vol 13 (2) ◽

pp. 284

Author(s):

Dan Lu ◽

Yahui Wang ◽

Qingyuan Yang ◽

Kangchuan Su ◽

Haozhe Zhang ◽

...

Keyword(s):

Spatial Distribution ◽

Relative Error ◽

Urban Areas ◽

Large Scale ◽

Population Distribution ◽

Spatial Optimization ◽

Distribution Data ◽

Mountainous Areas ◽

Mean Relative Error ◽

Nighttime Light

The sustained growth of non-farm wages has led to large-scale migration of the rural population to cities in China, especially in mountainous areas. Studying the spatial and temporal pattern of this migration is of great significance for guiding population spatial optimization and the effective supply of public services in mountainous areas. Here, we determined the spatiotemporal evolution of population in the Chongqing municipality of China from 2000–2018 by employing multi-period spatial distribution data, including nighttime light (NTL) data from the Defense Meteorological Satellite Program’s Operational Linescan System (DMSP-OLS) and the Suomi National Polar-orbiting Partnership Visible Infrared Imaging Radiometer Suite (NPP-VIIRS). There was a power function relationship between the two datasets at the pixel scale, with a mean relative error of NTL integration of 8.19%, 4.78% less than that achieved by a previous study at the provincial scale. The spatial simulations of population distribution achieved a mean relative error of 26.98%, improved the simulation accuracy for mountainous populations by nearly 20%, and confirmed the feasibility of this method in Chongqing. During the study period, Chongqing’s population increased in the west and decreased in the east, and increased in low-altitude areas while decreasing in medium-high altitude areas. Population agglomeration was common in all districts and counties; the population density of the central urban areas and their surroundings increased significantly, while that of non-urban areas such as northeast Chongqing decreased significantly.
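A power function relationship of the form y = a * x^b, like the one reported between the two NTL datasets, can be estimated by ordinary least squares on log-transformed values. The sketch below assumes that fitting approach and uses synthetic pixel values; the abstract does not state how the authors fitted the relationship or which dataset played the role of x.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y); inputs must be > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    slope_num = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope_den = sum((u - mean_x) ** 2 for u in log_x)
    b = slope_num / slope_den
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic pixel pairs generated from y = 2 * x**1.5 (not real NTL data).
xs = [1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]

a, b = fit_power_law(xs, ys)
print(round(a, 6), round(b, 6))
```

Fitting in log space minimizes relative rather than absolute error, and zero-valued pixels must be excluded or offset before taking logarithms.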

Population cluster data to assess the urban-rural split and electrification in Sub-Saharan Africa

Scientific Data ◽

10.1038/s41597-021-00897-9 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Babak Khavari ◽

Alexandros Korkovelos ◽

Andreas Sahlberg ◽

Mark Howells ◽

Francesco Fuso Nerini

Keyword(s):

World Bank ◽

Population Distribution ◽

Population Data ◽

Sub Saharan Africa ◽

Demographic And Health Surveys ◽

Middle Income ◽

Disease Response ◽

The World Bank ◽

The World ◽

Urban Rural

Human settlements are usually nucleated around manmade central points or distinctive natural features, forming clusters that vary in shape and size. However, population distribution in geo-sciences is often represented in the form of pixelated rasters. Rasters indicate population density at predefined spatial resolutions, but are unable to capture the actual shape or size of settlements. Here we suggest a methodology that translates high-resolution raster population data into vector-based population clusters. We use open-source data and develop an open-access algorithm tailored for low and middle-income countries with data scarcity issues. Each cluster includes unique characteristics indicating population, electrification rate and urban-rural categorization. Results are validated against national electrification rates provided by the World Bank and data from selected Demographic and Health Surveys (DHS). We find that our modeled national electrification rates are consistent with the rates reported by the World Bank, while the modeled urban/rural classification has 88% accuracy. By delineating settlements, this dataset can complement existing raster population data in studies such as energy planning, urban planning and disease response.
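One simple way to turn a population raster into discrete clusters is connected-component labeling: adjacent populated cells are grouped and their populations summed. The sketch below illustrates that idea only; it is not the authors' algorithm, the 4-neighbour connectivity and the threshold are assumptions, and a real workflow would go on to convert each group of cells into a vector polygon.

```python
from collections import deque


def population_clusters(grid, threshold=0.0):
    """Group 4-connected cells with population > threshold; sum each group."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] <= threshold:
                continue
            # Breadth-first search over the populated neighbours of (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                cr, cc = queue.popleft()
                cells.append((cr, cc))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not seen[nr][nc] and grid[nr][nc] > threshold:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            total = sum(grid[i][j] for i, j in cells)
            clusters.append({"cells": cells, "population": total})
    return clusters


# Hypothetical 3 x 4 raster of population counts per cell.
grid = [
    [5, 3, 0, 0],
    [0, 2, 0, 7],
    [0, 0, 0, 1],
]

clusters = population_clusters(grid)
for cluster in clusters:
    print(len(cluster["cells"]), "cells, population", cluster["population"])
```

Per-cluster attributes such as an electrification rate or an urban/rural label could then be attached to each dictionary alongside the population total.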

A Random Forest Regression Model Predicting the Winners of Summer Olympic Events

Proceedings of the 2020 2nd International Conference on Big Data Engineering ◽

10.1145/3404512.3404513 ◽

2020 ◽

Author(s):

Mengjie Jia ◽

Yue Zhao ◽

Furong Chang ◽

Bofeng Zhang ◽

Kenji Yoshigoe

Keyword(s):

Random Forest ◽

Regression Model ◽

Random Forest Regression
