01_Exploratory analyses

title: "Exploratory Analysis for Random Forest Modeling"
author: "Dra. Zaira Rosario Pérez-Vázquez"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    number_sections: true
    toc_depth: 2
    fig_caption: true
    fig_path: "docs/"
    theme: flatly

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

1. Introduction

This report documents the exploratory data analysis (EDA) that precedes a Random Forest model of simulated forest floor carbon data. The data include measurements from two forest floor layers (L and FH) taken in three years: 2013, 2018, and 2023.

2. Load Libraries

library(tidyverse)
library(psych)
library(here)
library(DescTools)
library(ggpubr)
library(car)
library(Hmisc)
library(moments)
library(cowplot)
library(readr)

3. Load Data

data_path <- here("data", "RF_data_Carbon_ForestFloor.csv")
datasetFF <- read_csv(data_path)
str(datasetFF)
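
Beyond `str()`, it can help to confirm that the file contains the columns the later sections rely on and to count missing values. The following is a minimal sketch in base R, assuming a small hypothetical data frame `toy` in place of the real CSV; the values and the `CONDITION` labels are invented, and only the column names come from this report.

```r
# Columns referenced by the code in sections 4 and 5 of this report.
required_cols <- c("YEAR", "LAYER", "CONDITION", "LAYER_NUM", "CON_NUM", "C_STOCKS")

# Hypothetical stand-in for datasetFF; all values are invented for illustration.
toy <- data.frame(
  YEAR = c(2013, 2018, 2023),
  LAYER = c("L", "FH", "L"),
  CONDITION = c("A", "B", "A"),
  LAYER_NUM = c(1, 2, 1),
  CON_NUM = c(1, 2, 1),
  C_STOCKS = c(4.2, NA, 5.1)
)

# Required names that are absent; character(0) means nothing is missing.
missing_cols <- setdiff(required_cols, names(toy))

# Number of NA values in each column.
na_counts <- colSums(is.na(toy))

print(missing_cols)
print(na_counts)
```

Running the same two checks on `datasetFF` would show early whether any predictor needs imputation or exclusion before modeling.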

4. Data Preparation

datasetFF <- datasetFF %>%
  mutate(
    YEAR = as.factor(YEAR),
    LAYER = as.factor(LAYER),
    CONDITION = as.factor(CONDITION),
    LAYER_NUM = as.factor(LAYER_NUM),
    CON_NUM = as.factor(CON_NUM)
  )
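
The same conversion can be written more compactly with `across()`. This is a sketch on a hypothetical data frame `toy` with invented values, not the report's data. It also shows why later filters such as `YEAR == "2013"` work: once a numeric year becomes a factor, its levels are character labels.

```r
library(dplyr)

# Hypothetical stand-in for datasetFF (invented values).
toy <- data.frame(
  YEAR = c(2013, 2018, 2023, 2013),
  LAYER = c("L", "FH", "L", "FH"),
  C_STOCKS = c(4.2, 11.8, 5.1, 10.3)
)

# Convert several grouping columns to factors in one step.
toy <- toy %>%
  mutate(across(c(YEAR, LAYER), as.factor))

# Factor levels are character labels, sorted in increasing order.
print(levels(toy$YEAR))

# Cross-tabulate to confirm which YEAR x LAYER combinations are present.
print(table(toy$YEAR, toy$LAYER))
```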

5. Summary Statistics

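# psych and Hmisc both export describe(); Hmisc is attached later and masks
# the psych version, so the namespace is stated explicitly.
psych::describe(datasetFF$C_STOCKS)
psych::describeBy(datasetFF$C_STOCKS, group = list(datasetFF$YEAR, datasetFF$LAYER))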
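
A tidy table of the same group statistics is often easier to reuse in later steps. The sketch below uses a hypothetical data frame `toy` with invented values. It calls `dplyr::summarise()` with an explicit namespace because Hmisc also exports a `summarize()` function that masks the dplyr one when Hmisc is attached after the tidyverse.

```r
library(dplyr)

# Hypothetical stand-in for datasetFF: two years, two layers, two plots each.
toy <- data.frame(
  YEAR = factor(c("2013", "2013", "2013", "2013", "2018", "2018", "2018", "2018")),
  LAYER = factor(c("L", "L", "FH", "FH", "L", "L", "FH", "FH")),
  C_STOCKS = c(4, 6, 10, 14, 5, 7, 11, 13)
)

# One row per YEAR x LAYER combination with basic descriptive statistics.
stats_by_group <- toy %>%
  group_by(YEAR, LAYER) %>%
  dplyr::summarise(
    n = dplyr::n(),
    mean = mean(C_STOCKS, na.rm = TRUE),
    sd = sd(C_STOCKS, na.rm = TRUE),
    median = median(C_STOCKS, na.rm = TRUE),
    .groups = "drop"
  )

print(stats_by_group)
```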

6. Histogram by Layer and Year

datasetFF %>%
  ggplot(aes(x = C_STOCKS)) +
  geom_histogram(bins = 30, fill = "steelblue", alpha = 0.7) +
  facet_grid(YEAR ~ LAYER) +
  theme_minimal() +
  labs(title = "Distribution of C Stocks", x = "C Stocks (Mg ha^-1)", y = "Count")

7. Normality Assessment (L layer)

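# One histogram per year for the L layer, overlaid with a kernel density
# estimate (blue) and a normal curve with the same mean and SD (red).
plots <- list()
for (yr in c("2013", "2018", "2023")) {
  data_tmp <- datasetFF %>% filter(YEAR == yr, LAYER == "L")
  p <- ggplot(data_tmp, aes(x = C_STOCKS)) +
    # after_stat(density) replaces the deprecated ..density.. notation
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.5, fill = "gray", color = "black") +
    geom_density(color = "blue") +
    stat_function(
      fun = dnorm,
      args = list(mean = mean(data_tmp$C_STOCKS, na.rm = TRUE), sd = sd(data_tmp$C_STOCKS, na.rm = TRUE)),
      color = "red"
    ) +
    labs(title = paste("Year", yr), x = "C Stocks (Mg ha^-1)", y = "Density") +
    theme_minimal()
  plots[[yr]] <- p
}
# The panels map no aesthetics to a legend, so no shared legend is requested.
ggarrange(plotlist = plots, ncol = 3)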
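
The visual check can be complemented with a numeric one. The sketch below is an addition rather than part of the original analysis: it runs the Shapiro-Wilk test (`shapiro.test()` from base R) and computes a moment-based skewness for each year. The data frame `toy` is simulated, and the helper `skew()` is a hypothetical name defined here.

```r
set.seed(1)

# Simulated stand-in for the L-layer data: 2018 is drawn from a log-normal
# distribution on purpose, so it should look right-skewed.
toy <- data.frame(
  YEAR = rep(c("2013", "2018", "2023"), each = 40),
  C_STOCKS = c(rnorm(40, 5, 1), rlnorm(40, 1.5, 0.6), rnorm(40, 6, 1.2))
)

# Moment-based skewness: third central moment divided by variance^(3/2).
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# One row per year: sample size, Shapiro-Wilk W and p-value, and skewness.
normality <- do.call(rbind, lapply(split(toy$C_STOCKS, toy$YEAR), function(x) {
  sw <- shapiro.test(x)
  data.frame(
    n = length(x),
    W = unname(sw$statistic),
    p_value = sw$p.value,
    skewness = skew(x)
  )
}))

print(round(normality, 3))
```

Random Forest does not assume normally distributed inputs, so results like these mainly guide the choice of descriptive statistics and correlation method (for example, Spearman rather than Pearson).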

8. Spearman Correlations

data2013 <- datasetFF %>% filter(YEAR == "2013")
corr_vars <- data2013 %>%
  select(
    C_STOCKS, UTMX, UTMY, STAND_AGE, BASAL_AREA, DOM_HEIGHT,
    SPECIES_RICHNESS, SHANNON_INDEX, GAP_FRACTION, CANOPY_COVER,
    ELEVATION, SLOPE, ASPECT
  )
rcorr(as.matrix(corr_vars), type = "spearman")
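
A full correlation matrix over 13 variables is hard to scan, so listing only the strongly correlated pairs can help. The sketch below uses base R `cor()` on simulated data in which `CANOPY_COVER` is built as a near mirror image of `GAP_FRACTION`. The 0.7 cutoff is an assumed threshold, not a value taken from this report.

```r
set.seed(42)
n <- 60

# Simulated predictors; CANOPY_COVER is tied to GAP_FRACTION by construction.
GAP_FRACTION <- runif(n, 5, 40)
toy <- data.frame(
  GAP_FRACTION = GAP_FRACTION,
  CANOPY_COVER = 100 - GAP_FRACTION + rnorm(n, sd = 1),
  SLOPE = runif(n, 0, 30)
)

# Spearman rank correlation matrix.
rho <- cor(toy, method = "spearman", use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) and list those above the cutoff.
cutoff <- 0.7  # assumed threshold; adjust as needed
idx <- which(abs(rho) > cutoff & upper.tri(rho), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(rho)[idx[, "row"]],
  var2 = colnames(rho)[idx[, "col"]],
  rho = rho[idx]
)

print(strong_pairs)
```

Pairs flagged this way are candidates for dropping one member before fitting the linear model used for VIF.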

9. Variance Inflation Factor (VIF)

modelC_2013VIF <- lm(
  C_STOCKS ~ UTMX + UTMY + STAND_AGE + BASAL_AREA + DOM_HEIGHT +
    SPECIES_RICHNESS + SHANNON_INDEX + GAP_FRACTION + ELEVATION + SLOPE + ASPECT,
  data = data2013
)
vif(modelC_2013VIF)
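
For numeric predictors, the VIF of predictor $j$ is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors. The response variable plays no part in the calculation. The sketch below computes this by hand in base R on simulated data, where `DOM_HEIGHT` is tied to `STAND_AGE` by construction. It illustrates the formula and is not a replacement for `car::vif()`.

```r
set.seed(7)
n <- 80

# Simulated predictors; DOM_HEIGHT depends on STAND_AGE by construction.
STAND_AGE <- runif(n, 10, 120)
toy <- data.frame(
  STAND_AGE = STAND_AGE,
  DOM_HEIGHT = 5 + 0.2 * STAND_AGE + rnorm(n, sd = 2),
  SLOPE = runif(n, 0, 30)
)

# For each predictor: regress it on the remaining ones, then apply 1 / (1 - R^2).
manual_vif <- sapply(names(toy), function(v) {
  others <- setdiff(names(toy), v)
  r2 <- summary(lm(reformulate(others, response = v), data = toy))$r.squared
  1 / (1 - r2)
})

print(round(manual_vif, 2))
```

In this simulation the two linked predictors receive high values while `SLOPE` stays close to 1, which is the pattern to look for in the output of `vif(modelC_2013VIF)`.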

10. Train/Test Split

set.seed(123)
n_train <- floor(0.85 * nrow(data2013))  # 85% of rows, rounded down explicitly
train_idx <- sample(seq_len(nrow(data2013)), size = n_train)
training_2013 <- data2013[train_idx, ]
testing_2013 <- data2013[-train_idx, ]

write_csv(training_2013, here("data", "training_2013_85.csv"))
write_csv(testing_2013, here("data", "testing_2013_15.csv"))
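
After a random split it is worth checking that the two subsets are disjoint, that together they cover every row, and that both layers are represented in similar proportions. This is a minimal base R sketch on a hypothetical data frame `toy` with an invented `id` column; the real data may use a different plot identifier.

```r
set.seed(123)

# Hypothetical stand-in for data2013: 47 rows alternating between the two layers.
toy <- data.frame(
  id = 1:47,
  LAYER = rep(c("L", "FH"), length.out = 47)
)

# 85/15 split, with the training size rounded down to a whole number of rows.
n_train <- floor(0.85 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)
training <- toy[train_idx, ]
testing <- toy[-train_idx, ]

# The subsets must partition the data: no overlap and no lost rows.
stopifnot(
  nrow(training) + nrow(testing) == nrow(toy),
  length(intersect(training$id, testing$id)) == 0
)

# Share of each layer in the two subsets; large differences suggest stratifying.
print(prop.table(table(training$LAYER)))
print(prop.table(table(testing$LAYER)))
```

If the layer shares differ markedly, one option is to sample 85% within each layer separately instead of across all rows at once.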

11. Conclusion

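This exploratory analysis examined the distribution of C stocks by layer and year, the Spearman correlation structure among candidate predictors, and their collinearity (VIF) to inform the later Random Forest modeling stages. The 2013 training (85%) and testing (15%) subsets are written to the data folder. All outputs are saved in the /outputs folder, and this report is rendered to /docs for easy viewing.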
