Wildfire Severity and Structural Damage Prediction in Wildland-Urban Interface (WUI) Areas

Understanding and predicting wildfire impacts on residential structures in Wildland-Urban Interface (WUI) areas, where homes meet wildland vegetation.

30 Fires
40K+ Residential Structures
2 Prediction Models

Overview

Why This Matters

Wildfires in California are becoming more frequent and more destructive, especially in Wildland-Urban Interface (WUI) areas, where homes meet wildland vegetation.

Yet after major wildfire events, residents, planners, and insurers are often left asking the same question:

Why did some homes burn while others survived?

Despite growing wildfire risk, publicly accessible tools that assess damage at the individual structure level remain limited.


North Complex Fire structure damage map
Observed structure outcomes in the North Complex Fire (red = destroyed, blue = undamaged).

What We're Doing

In this project, we develop models to predict whether a residential structure will be destroyed or experience no damage during a wildfire event in California.

Using publicly available structural, environmental, and spatial data, we train both Random Forest and Neural Network models to identify patterns associated with destruction.

Our goal is to provide insights that can support:

  • risk assessment
  • community preparedness
  • safer planning in WUI areas

Structure-Level Damage Map

This map shows observed structure outcomes from the CAL FIRE Damage Inspection (DINS) dataset for two fires, filtered to residential structures in the two damage classes of interest: Destroyed (>50%) and No Damage.

Use the controls to explore spatial patterns among surveyed residential structures within each fire perimeter.


Our Approach

To understand why some homes are destroyed while others survive, we combine multiple public datasets that describe wildfire severity, environmental conditions, and structural characteristics. We transform these sources into structure-level features and train machine learning models to predict whether a residential structure is destroyed or experiences no damage.

Wildfire Structure Damage Prediction Pipeline Flowchart

Flowchart showing a high-level overview of how multiple data sources are merged and engineered to generate structure-level wildfire damage predictions.


Data

Prediction Target

Our prediction target is a binary structure damage outcome derived from CAL FIRE Damage Inspection (DINS) reports. We retain only residential structures from our selected wildfires that were classified as Destroyed (>50%) or No Damage to focus on clearly defined damage states.

Damage & Structural Attributes

To capture structural characteristics that may influence damage outcomes, we integrate:

Environmental & Fire Severity Features

We constructed a merged pixel-level fire severity dataset integrating:

These datasets were integrated into a unified pixel-level fire severity dataset during our Q1 project.

Spatial Context

To identify structures located within wildfire-prone Wildland-Urban Interface (WUI) areas, we incorporate:


Data Processing & Feature Engineering

To construct a dataset ready for modeling, we performed three major processing steps: cleaning and standardization, spatial integration, and final feature engineering.

Cleaning & Standardization

We filtered the DINS dataset to include only residential structures classified as Destroyed (>50%) or No Damage. Column names were standardized, invalid entries were converted to missing values, and columns with more than 70% missingness were removed.

All datasets were reprojected into a consistent coordinate system to ensure accurate spatial joins.
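The cleaning rules above can be sketched in plain Python. This is a minimal illustration, not our production code: the column names, the list of tokens treated as invalid, and the sample records are assumptions made for the example, and the analogous residential-structure filter is omitted for brevity.

```python
import re

# Assumed tokens that count as "invalid entries"; the real list may differ.
MISSING_TOKENS = {"", "n/a", "na", "none", "null"}
KEEP_DAMAGE = {"Destroyed (>50%)", "No Damage"}


def standardize_name(name):
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_records(records, damage_col="damage", max_missing=0.70):
    """Keep the two damage classes, normalize names and missing values,
    then drop columns whose missing fraction exceeds max_missing."""
    cleaned = []
    for row in records:
        row = {standardize_name(k): v for k, v in row.items()}
        if row.get(damage_col) not in KEEP_DAMAGE:
            continue
        for key, value in row.items():
            if value is None or str(value).strip().lower() in MISSING_TOKENS:
                row[key] = None
        cleaned.append(row)
    if not cleaned:
        return []
    columns = sorted({k for row in cleaned for k in row})
    keep = [
        c for c in columns
        if sum(row.get(c) is None for row in cleaned) / len(cleaned) <= max_missing
    ]
    return [{c: row.get(c) for c in keep} for row in cleaned]


# Hypothetical records for illustration only.
records = [
    {"Damage": "Destroyed (>50%)", "Roof Construction": "Asphalt", "Deck/Porch": "N/A"},
    {"Damage": "No Damage", "Roof Construction": "", "Deck/Porch": "n/a"},
    {"Damage": "Minor (10-25%)", "Roof Construction": "Tile", "Deck/Porch": "Wood"},
    {"Damage": "No Damage", "Roof Construction": "Tile", "Deck/Porch": None},
    {"Damage": "Destroyed (>50%)", "Roof Construction": "Metal", "Deck/Porch": "Wood"},
]
cleaned = clean_records(records)
```

In this toy input, the deck/porch column is missing in three of the four retained rows (75%), so it is dropped, while the roof column is kept.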

Spatial Integration

We restricted our analysis to structures located within Wildland-Urban Interface (WUI) areas by clipping SILVIS WUI data to selected wildfire perimeters.

Structures were spatially joined to fire severity data using a 30-meter buffer to capture localized burn conditions. Additional building characteristics were added using nearest-neighbor spatial matching with the NSI dataset.
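As a rough sketch of the two spatial operations above, the functions below use brute-force distance checks on coordinates assumed to be in a projected system measured in meters. Averaging the severity values inside the buffer is an assumption made for illustration; the function names and sample values are hypothetical, and a real pipeline would typically rely on a geospatial library with a spatial index.

```python
import math


def mean_severity_within(structure_xy, pixels, radius_m=30.0):
    """Average the severity of pixels whose centers lie within radius_m of a structure.

    pixels is an iterable of (x, y, severity) tuples.
    Returns None when no pixel center falls inside the buffer.
    """
    sx, sy = structure_xy
    values = [
        sev for (px, py, sev) in pixels
        if math.hypot(px - sx, py - sy) <= radius_m
    ]
    return sum(values) / len(values) if values else None


def nearest_record(structure_xy, candidates):
    """Return the (x, y, attributes) candidate closest to the structure."""
    sx, sy = structure_xy
    return min(candidates, key=lambda c: math.hypot(c[0] - sx, c[1] - sy))


# Hypothetical pixel centers and building records.
pixels = [(0.0, 0.0, 2.0), (30.0, 0.0, 4.0), (60.0, 0.0, 6.0)]
buildings = [(100.0, 100.0, {"stories": 1}), (12.0, 3.0, {"stories": 2})]

severity = mean_severity_within((10.0, 0.0), pixels)
match = nearest_record((10.0, 0.0), buildings)
```

Here the first two pixels fall inside the 30-meter buffer and the third does not, so only their severities are averaged.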

Final Feature Engineering

Missing categorical variables were imputed using an “Unknown” category, and all categorical features were one-hot encoded.

Structural and environmental variables were combined into a unified structure-level dataset, and the damage outcome was encoded as a binary target variable (1 for Destroyed, 0 for No Damage).

The resulting dataset contains structure-level environmental and structural features, with a binary target variable indicating whether each structure was destroyed or experienced no damage during a wildfire.
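The imputation, one-hot encoding, and target encoding steps can be illustrated with a small stdlib-only sketch. The column names ("roof", "damage") and sample rows are hypothetical; only the "Unknown" fill value and the 1/0 target coding come from the description above.

```python
def one_hot_encode(rows, categorical_cols, fill="Unknown"):
    """Impute missing categories with `fill`, then expand each column into 0/1 indicators."""
    # Collect the category levels per column after imputation.
    levels = {
        col: sorted({row.get(col) or fill for row in rows})
        for col in categorical_cols
    }
    encoded = []
    for row in rows:
        out = {k: v for k, v in row.items() if k not in categorical_cols}
        for col in categorical_cols:
            value = row.get(col) or fill
            for level in levels[col]:
                out[f"{col}_{level}"] = int(value == level)
        encoded.append(out)
    return encoded


def encode_target(rows, damage_col="damage"):
    """Map Destroyed (>50%) to 1 and No Damage to 0."""
    return [1 if row[damage_col] == "Destroyed (>50%)" else 0 for row in rows]


# Hypothetical rows for illustration only.
rows = [
    {"roof": "Tile", "damage": "Destroyed (>50%)"},
    {"roof": None, "damage": "No Damage"},
    {"roof": "Asphalt", "damage": "Destroyed (>50%)"},
]
encoded = one_hot_encode(rows, categorical_cols=["roof"])
y = encode_target(rows)
```

Treating "Unknown" as its own level lets a model learn from the fact that an attribute was not recorded, rather than discarding the row.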


Structural Damage Model

We trained two models, Random Forest and Neural Network, to predict whether a residential structure would be Destroyed (>50%) or experience No Damage during a wildfire. Using two different modeling approaches allows us to compare model performance and identify key predictive features across methods.

To test whether wildfire characteristics affect model performance, we trained both Random Forest and Neural Network models on six different subsets of the data, using the same general training, tuning, and threshold-optimization pipeline for each subset.

To reduce overfitting and improve interpretability, we removed multicollinear features and then ran feature selection for each model training set, keeping the features that account for the top 95% of cumulative importance in a LightGBM model.
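The 95% cumulative-importance rule can be sketched as follows, assuming importance scores are already available as a name-to-score mapping. The feature names and scores below are invented for illustration, and the code does not fit a LightGBM model itself.

```python
def select_by_cumulative_importance(importances, threshold=0.95):
    """Keep the highest-importance features until their cumulative share
    of total importance first reaches `threshold`."""
    total = sum(importances.values())
    if total <= 0:
        return list(importances)
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        selected.append(name)
        cumulative += score
        if cumulative >= threshold * total:
            break
    return selected


# Hypothetical importance scores (not results from this project).
importances = {
    "burn_severity": 50,
    "roof_Unknown": 30,
    "wind_speed": 16,
    "year_built": 3,
    "stories": 1,
}
selected = select_by_cumulative_importance(importances)
```

With these made-up scores, the first three features cover 96% of total importance, so the remaining two are dropped.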

To reduce data leakage, we split the data at the wildfire level, meaning all structures from a single wildfire are kept within a single training, validation, or test set.
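A wildfire-level split can be sketched by shuffling fire names and assigning whole fires to each set. The 60/20/20 proportions, the `fire_name` key, and the random seed are assumptions for this example, not the values used in our pipeline.

```python
import random


def split_by_fire(records, fire_key="fire_name", fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole fires to train/validation/test so no fire spans two sets."""
    fires = sorted({r[fire_key] for r in records})
    random.Random(seed).shuffle(fires)
    n_train = round(fractions[0] * len(fires))
    n_val = round(fractions[1] * len(fires))
    groups = {
        "train": set(fires[:n_train]),
        "val": set(fires[n_train:n_train + n_val]),
        "test": set(fires[n_train + n_val:]),
    }
    return {
        name: [r for r in records if r[fire_key] in members]
        for name, members in groups.items()
    }


# Hypothetical data: 10 fires with 3 structures each.
records = [
    {"fire_name": f"fire_{i}", "structure_id": j}
    for i in range(10)
    for j in range(3)
]
splits = split_by_fire(records)
```

Because structures from the same fire share weather, terrain, and severity conditions, a random row-level split would let the model see near-duplicates of test structures during training.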

Random Forest Model

The Random Forest model serves as a strong tree-based model capable of capturing nonlinear relationships between structural and environmental variables.

Hyperparameters were tuned using randomized search with grouped cross-validation, including parameters such as maximum tree depth and minimum samples per leaf.

After selecting the best performing model, we performed threshold tuning on the validation set to find the threshold that would maximize the validation F1 score.
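Threshold tuning of this kind can be sketched as a simple sweep over candidate cutoffs. The example assumes F1 is computed for the Destroyed class (label 1) and that candidates run from 0.05 to 0.95 in steps of 0.01; both choices, along with the sample labels and probabilities, are illustrative assumptions.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score for the positive class (1 = Destroyed) at a probability cutoff."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, y_prob, candidates=None):
    """Return the (threshold, F1) pair that maximizes F1 over candidate cutoffs."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96)]
    return max(
        ((t, f1_at_threshold(y_true, y_prob, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


# Hypothetical validation labels and predicted probabilities.
y_val = [1, 1, 1, 0, 1, 0, 0]
p_val = [0.9, 0.8, 0.7, 0.65, 0.4, 0.3, 0.1]
threshold, f1 = best_f1_threshold(y_val, p_val)
```

The threshold is chosen on validation data only, so the test set remains untouched until final evaluation.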

Neural Network Model

A feedforward neural network was trained to capture complex interactions between structural and environmental variables.

Hyperparameters were selected using a grid search over a chosen number of neurons per layer, dropout rate, learning rate, and regularization rate, with cross-validation and early stopping based on validation metrics.

A model with the best hyperparameters was then trained with early stopping and evaluated on the test set, yielding the final neural network used to report performance metrics.

Evaluation Metrics

Due to class imbalance, we optimized Precision-Recall AUC (PR-AUC) during cross-validation and used validation-based probability threshold tuning.
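One common way to summarize the precision-recall curve is average precision, sketched below. This is an illustration of the idea and may differ from the exact PR-AUC estimator used in our pipeline; ties between scores are not handled specially, and the sample labels and scores are hypothetical.

```python
def average_precision(y_true, y_score):
    """Average precision: mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical labels (1 = Destroyed) and model scores.
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(y_true, y_score)
```

Unlike accuracy, this metric depends on how well the positive class is ranked, which makes it less sensitive to the majority class dominating the score.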

Final model performance was evaluated using Accuracy, Precision, Recall, and F1-score.

Using the final trained models, we generated:

Results

Overall Model Performance

Model            Class          Precision  Recall  F1     Support
Random Forest    No Damage      0.642      0.888   0.745  303
                 Destroyed      0.974      0.893   0.932  1402
                 Macro avg      0.808      0.890   0.838  1705
                 Weighted avg   0.915      0.892   0.898  1705
                 Accuracy       89.2%
Neural Network   No Damage      0.824      0.508   0.629  303
                 Destroyed      0.902      0.976   0.938  1402
                 Macro avg      0.863      0.742   0.783  1705
                 Weighted avg   0.888      0.893   0.883  1705
                 Accuracy       89.3%
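For readers who want to see how the summary rows relate to the per-class rows, the sketch below recomputes the macro and support-weighted averages from the rounded per-class values in the table. Because the inputs are already rounded to three decimals, the recomputed averages agree with the reported ones only to within about 0.001. The function and variable names are ours, introduced for this example.

```python
# Per-class metrics copied from the results table: (precision, recall, f1, support).
REPORTED = {
    "Random Forest": {
        "No Damage": (0.642, 0.888, 0.745, 303),
        "Destroyed": (0.974, 0.893, 0.932, 1402),
    },
    "Neural Network": {
        "No Damage": (0.824, 0.508, 0.629, 303),
        "Destroyed": (0.902, 0.976, 0.938, 1402),
    },
}


def macro_and_weighted(per_class):
    """Return (macro, weighted) averages of precision, recall, and F1."""
    rows = list(per_class.values())
    total_support = sum(r[3] for r in rows)
    # Macro: unweighted mean over classes.
    macro = tuple(sum(r[i] for r in rows) / len(rows) for i in range(3))
    # Weighted: mean over classes weighted by class support.
    weighted = tuple(sum(r[i] * r[3] for r in rows) / total_support for i in range(3))
    return macro, weighted


for model, per_class in REPORTED.items():
    macro, weighted = macro_and_weighted(per_class)
    print(model, [round(v, 3) for v in macro], [round(v, 3) for v in weighted])
```

The support-weighted recall is the same quantity as overall accuracy, which is why it matches the accuracy reported for each model. The macro averages weight the smaller No Damage class equally with the Destroyed class, which is why the Neural Network's macro recall falls well below its accuracy.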

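Both models reach about 89% accuracy on the wildfire test set, but they differ in which class they identify well. The Random Forest recalls 88.8% of No Damage structures versus 50.8% for the Neural Network, while the Neural Network recalls 97.6% of Destroyed structures versus 89.3% for the Random Forest.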

In practice, this means the Random Forest tends to identify surviving structures more reliably, while the Neural Network is stronger at detecting high-risk structures likely to be destroyed.

Depending on the use case, this trade-off may matter: a homeowner focused on identifying high-risk homes might prefer the Neural Network, while planners focused on identifying surviving structures may benefit more from the Random Forest.


Fire x Model Demo

While the table above summarizes overall test-set performance, the demo below shows how predictions vary across individual wildfire events and allows comparison between models.

Select a fire and model to view the corresponding damage maps (actual, predicted, probability).


Feature Importance

Feature importance analysis using SHAP values reveals which structural and environmental factors most strongly influence predicted wildfire damage risk.

The SHAP summary plots below show how feature importance changes across different subsets of fires.

Select a fire subset and model to explore how these importance patterns vary across fire subsets and between the two models.


Discussion

Key Insights


Interpretation of Results

Both models identified similar predictors of wildfire damage: nine of the top ten most influential features were shared between the Random Forest and Neural Network models. This strong agreement suggests that these features represent genuine drivers of structural vulnerability rather than model-specific artifacts.

Model performance also varied across fire conditions. Both models performed best on lower-severity fires and struggled more in high-severity scenarios. In these extreme scenarios, structural features may matter less because many structures in the fire's path are destroyed regardless of construction materials or design.


Limitations

While our models demonstrate meaningful predictive performance, several limitations should be considered when interpreting these results.


Future Work

Future work could address these limitations and build on our findings in several ways:

By addressing these areas, future research could improve both our understanding of structural vulnerability to wildfires and the practical use of wildfire damage prediction models.

Conclusion

In this project, we investigated whether residential structural damage in Wildland-Urban Interface areas could be predicted using publicly available geospatial, environmental, and structural data. To do this, we trained and evaluated Random Forest and Neural Network models using burn severity, weather, spatial, and structural inspection data from 30 California wildfires.

Our results show that structure-level wildfire damage can be meaningfully predicted using accessible public data. Both models identified consistent patterns linking structural characteristics and environmental conditions to wildfire damage outcomes, and their strong agreement on influential predictors increases confidence in these findings.

Overall, this project demonstrates that publicly available data can provide a practical foundation for structure-level wildfire risk assessment. As wildfire frequency and intensity continue to increase, accessible and interpretable tools for understanding structural vulnerability may help support preparedness, mitigation, and future planning in fire-prone communities beyond California.