Wildfire Severity and Structural Damage Prediction in Wildland-Urban Interface (WUI) Areas

Understanding and predicting wildfire impacts on residential structures in Wildland-Urban Interface (WUI) areas, where homes meet wildland vegetation.

30 Fires

40K+ Residential Structures

2 Prediction Models


Overview

Why This Matters

Wildfires in California are becoming more frequent and more destructive, especially in Wildland-Urban Interface (WUI) areas, where homes meet wildland vegetation.

Yet after major wildfire events, residents, planners, and insurers are often left asking the same question:

Why did some homes burn while others survived?

Despite growing wildfire risk, publicly accessible tools that assess damage at the individual structure level remain limited.

North Complex Fire structure damage map: observed structure outcomes (red = destroyed, blue = undamaged).

What We're Doing

In this project, we develop models to predict whether a residential structure will be destroyed or experience no damage during a wildfire event in California.

Using publicly available structural, environmental, and spatial data, we train both Random Forest and Neural Network models to identify patterns associated with destruction.

Our goal is to provide insights that can support:

risk assessment
community preparedness
safer planning in WUI areas

Structure-Level Damage Map

This map shows observed structure outcomes from the CAL FIRE Damage Inspection (DINS) dataset for two fires, filtered to residential structures in the two damage classes of interest: Destroyed (>50%) and No Damage.

Use the controls to explore spatial patterns among surveyed residential structures within each fire perimeter.


Our Approach

To understand why some homes are destroyed while others survive, we combine multiple public datasets that describe wildfire severity, environmental conditions, and structural characteristics. We transform these sources into structure-level features and train machine learning models to predict whether a residential structure is destroyed or experiences no damage.

Wildfire Structure Damage Prediction Pipeline Flowchart

Flowchart showing a high-level overview of how multiple data sources are merged and engineered to generate structure-level wildfire damage predictions.

Data

Prediction Target

Our prediction target is a binary structure damage outcome derived from CAL FIRE Damage Inspection (DINS) reports. We retain only residential structures from our selected wildfires that were classified as Destroyed (>50%) or No Damage to focus on clearly defined damage states.

Damage & Structural Attributes

To capture structural characteristics that may influence damage outcomes, we integrate:

CAL FIRE Damage Inspection (DINS) — structure damage classification and structure attributes such as construction materials.
National Structure Inventory (NSI) — building characteristics such as square footage, number of stories, foundation type, and structural value

Environmental & Fire Severity Features

We constructed a merged pixel-level fire severity dataset integrating:

Monitoring Trends in Burn Severity (MTBS) — burn severity classifications
LANDFIRE — fuels and vegetation characteristics
National Land Cover Database (NLCD) — land cover classification
PRISM — temperature and precipitation data

These datasets were integrated into a unified pixel-level fire severity dataset during our Q1 project.

View Q1 project deliverables

Q1 Report, Q1 GitHub Repository

Spatial Context

To identify structures located within wildfire-prone Wildland-Urban Interface (WUI) areas, we incorporate:

SILVIS Labs WUI dataset — Wildland-Urban Interface classifications and perimeters
MTBS Fire Perimeters — wildfire boundary perimeters

Data Processing & Feature Engineering

To construct a dataset ready for modeling, we performed three major processing steps: cleaning and standardization, spatial integration, and final feature engineering.

Cleaning & Standardization

We filtered the DINS dataset to include only residential structures classified as Destroyed (>50%) or No Damage. Column names were standardized, invalid entries were converted to missing values, and columns with more than 70% missingness were removed.

All datasets were reprojected into a consistent coordinate system to ensure accurate spatial joins.
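The cleaning step described above can be sketched in plain Python. This is a minimal illustration, not the project's code: the column names, the sample records, and the set of tokens treated as invalid are all assumptions made for the example. Only the 70% missingness cutoff comes from the text.

```python
import re

# Tokens treated as invalid are an illustrative assumption, not the project's actual list.
INVALID_TOKENS = {"", "n/a", "na", "none", "-"}


def standardize_name(name):
    """Lower-case a column name and replace runs of non-alphanumerics with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def to_missing(value):
    """Convert invalid entries to None so they count as missing."""
    if isinstance(value, str) and value.strip().lower() in INVALID_TOKENS:
        return None
    return value


def clean_table(rows, max_missing=0.70):
    """Standardize names, null out invalid entries, drop columns above max_missing."""
    cleaned = [{standardize_name(k): to_missing(v) for k, v in row.items()} for row in rows]
    if not cleaned:
        return []
    columns = list(cleaned[0])
    keep = [
        col for col in columns
        if sum(row.get(col) is None for row in cleaned) / len(cleaned) <= max_missing
    ]
    return [{col: row.get(col) for col in keep} for row in cleaned]


# Hypothetical inspection records; column names and values are placeholders.
records = [
    {"Roof Construction": "Asphalt", "Deck Material": "", "Damage": "No Damage"},
    {"Roof Construction": "Metal", "Deck Material": "N/A", "Damage": "Destroyed (>50%)"},
    {"Roof Construction": "", "Deck Material": "N/A", "Damage": "Destroyed (>50%)"},
    {"Roof Construction": "Tile", "Deck Material": "Wood", "Damage": "No Damage"},
]
cleaned_records = clean_table(records)
```

In this toy table the deck material column is 75% missing and is dropped, while the roof column (25% missing) is kept with its invalid entry converted to a missing value.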

Spatial Integration

We restricted our analysis to structures located within Wildland-Urban Interface (WUI) areas by clipping SILVIS WUI data to selected wildfire perimeters.

Structures were spatially joined to fire severity data using a 30-meter buffer to capture localized burn conditions. Additional building characteristics were added using nearest-neighbor spatial matching with the NSI dataset.
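The two spatial operations in this step can be illustrated without a GIS library. This sketch assumes that all coordinates are already in a projected system measured in meters, and that severity pixels and NSI buildings are represented by single points; both are simplifications for illustration. The function names and sample coordinates are hypothetical. The 30 m radius and the 1,500 m matching cutoff are the values stated in this document.

```python
import math


def values_within_buffer(structure_xy, pixels, radius_m=30.0):
    """Collect values of pixels whose center lies within radius_m of a structure."""
    sx, sy = structure_xy
    return [value for (px, py), value in pixels if math.hypot(px - sx, py - sy) <= radius_m]


def nearest_within(structure_xy, candidates, max_distance_m=1500.0):
    """Return (index, distance) of the closest candidate within the cutoff, else None."""
    sx, sy = structure_xy
    best = None
    for idx, (cx, cy) in enumerate(candidates):
        distance = math.hypot(cx - sx, cy - sy)
        if distance <= max_distance_m and (best is None or distance < best[1]):
            best = (idx, distance)
    return best


# Hypothetical coordinates in meters.
pixels = [((10.0, 0.0), 3), ((25.0, 0.0), 4), ((40.0, 0.0), 2)]
structures = [(0.0, 0.0), (5000.0, 5000.0)]
nsi_points = [(300.0, 400.0), (100.0, 0.0), (9000.0, 9000.0)]

nearby_severity = values_within_buffer(structures[0], pixels)
matches = [nearest_within(s, nsi_points) for s in structures]
```

The second structure has no NSI point within 1,500 m, so it receives no match rather than a distant and likely wrong one.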

Final Feature Engineering

Missing categorical variables were imputed using an “Unknown” category, and all categorical features were one-hot encoded.

Structural and environmental variables were combined into a unified structure-level dataset, and the damage outcome was encoded as a binary target variable (1 for Destroyed, 0 for No Damage).
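As an illustration of this encoding step, the sketch below fills missing categorical values with "Unknown", one-hot encodes them, and maps the damage label to a binary target. The column names and records are hypothetical, and the helper is a simplified stand-in for whatever tooling the project used. The two label strings follow the DINS classes named in this document.

```python
TARGET_MAP = {"Destroyed (>50%)": 1, "No Damage": 0}


def encode_dataset(rows, categorical_cols, target_col, fill_value="Unknown"):
    """Return (features, target) with categoricals one-hot encoded."""
    # Collect the observed levels of each categorical column, after imputation.
    levels = {
        col: sorted({row.get(col) or fill_value for row in rows})
        for col in categorical_cols
    }
    features, target = [], []
    for row in rows:
        encoded = {
            k: v for k, v in row.items()
            if k not in categorical_cols and k != target_col
        }
        for col in categorical_cols:
            value = row.get(col) or fill_value
            for level in levels[col]:
                encoded[f"{col}_{level}"] = int(value == level)
        features.append(encoded)
        target.append(TARGET_MAP[row[target_col]])
    return features, target


# Hypothetical structure records.
rows = [
    {"roof": "Metal", "damage": "No Damage"},
    {"roof": None, "damage": "Destroyed (>50%)"},
    {"roof": "Tile", "damage": "Destroyed (>50%)"},
]
X, y = encode_dataset(rows, categorical_cols=["roof"], target_col="damage")
```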

View additional technical preprocessing details

Key preprocessing steps included:

Filtering to 30 California wildfires.
Reprojecting coordinates to EPSG:5070 for spatial consistency.
Applying spatial joins with 30-meter buffers.
Using nearest-neighbor matching (1,500 m threshold) to integrate NSI data, then removing duplicate matches.
Removing columns with all false values after one-hot encoding.
Imputing missing categorical values as "Unknown".
Binary target encoding (1 = Destroyed, 0 = No Damage).

The resulting dataset contains structure-level environmental and structural features, with a binary target variable indicating whether each structure was destroyed or experienced no damage during a wildfire.

Structural Damage Model

We trained two models, Random Forest and Neural Network, to predict whether a residential structure would be Destroyed (>50%) or experience No Damage during a wildfire. Using two different modeling approaches allows us to compare model performance and identify key predictive features across methods.

To test whether wildfire characteristics affect model performance, we trained both Random Forest and Neural Network models on six different subsets of the data, using the same general training, tuning, and threshold-optimization pipeline for each subset.

All fires
Small fires — fewer structure records
Wind-driven vs plume-driven fires
Low severity vs high severity fires — based on MTBS burn severity classification
- Low severity: fewer than 33% of pixels fall into MTBS severity classes 3-4
- High severity: more than 33% of pixels fall into MTBS severity classes 3-4
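The severity grouping rule above amounts to a simple share calculation per fire. The sketch below is a hypothetical helper, not the project's code. The text does not say which group a fire at exactly 33% belongs to, so assigning it to the low-severity group here is an assumption.

```python
def high_severity_share(pixel_classes):
    """Fraction of a fire's pixels that fall in MTBS severity classes 3-4."""
    if not pixel_classes:
        raise ValueError("a fire needs at least one severity pixel")
    return sum(1 for c in pixel_classes if c in (3, 4)) / len(pixel_classes)


def severity_group(pixel_classes, cutoff=0.33):
    """Label a fire 'high' if its class 3-4 share exceeds the cutoff, else 'low'.

    Assumption: a share exactly equal to the cutoff is treated as 'low'.
    """
    return "high" if high_severity_share(pixel_classes) > cutoff else "low"


# Hypothetical per-pixel severity classes for two fires.
fire_a = [1, 2, 2, 3]   # 25% of pixels in classes 3-4
fire_b = [3, 4, 4, 1]   # 75% of pixels in classes 3-4
groups = (severity_group(fire_a), severity_group(fire_b))
```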

To limit overfitting and make the models easier to interpret, we removed multicollinear features and then, for each training subset, kept only the features that account for the top 95% of cumulative importance in a LightGBM model.
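Selection by cumulative importance can be sketched as follows, given importance scores that have already been computed (here they are made-up numbers, and the feature names are placeholders). Whether the feature that crosses the 95% mark is itself kept is not stated in the text; this sketch keeps it, which is an assumption.

```python
def select_by_cumulative_importance(importances, cutoff=0.95):
    """Keep the highest-ranked features until their importance share reaches cutoff."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)  # assumption: the feature crossing the cutoff is kept
        running += value
        if running >= cutoff * total:
            break
    return selected


# Hypothetical importance scores, e.g. from a gradient-boosted tree model.
importances = {
    "roof_type": 50,
    "burn_severity": 30,
    "vapor_pressure_deficit": 16,
    "stories": 3,
    "foundation": 1,
}
kept = select_by_cumulative_importance(importances)
```

In this toy example the first three features account for 96% of total importance, so the remaining two are dropped.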

To reduce data leakage, we split the data at the wildfire level, meaning all structures from a single wildfire are kept within a single training, validation, or test set.
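A fire-level split can be sketched as below: the unit being shuffled is the fire, not the structure, so every structure inherits the split of its fire. The split fractions, the seed, and the fire identifiers are assumptions for illustration; the text does not state the proportions used.

```python
import random


def split_by_fire(fire_ids, val_frac=0.15, test_frac=0.15, seed=0):
    """Assign each record to train/val/test so that no fire spans two splits."""
    fires = sorted(set(fire_ids))
    if len(fires) < 3:
        raise ValueError("need at least three fires to fill three splits")
    random.Random(seed).shuffle(fires)
    n_test = max(1, round(len(fires) * test_frac))
    n_val = max(1, round(len(fires) * val_frac))
    test_fires = set(fires[:n_test])
    val_fires = set(fires[n_test:n_test + n_val])
    return [
        "test" if fire in test_fires else "val" if fire in val_fires else "train"
        for fire in fire_ids
    ]


# Hypothetical data: 10 fires with 3 structures each.
fire_ids = [f"fire_{i}" for i in range(10) for _ in range(3)]
splits = split_by_fire(fire_ids)
```

Splitting individual structures at random would instead place neighbors from the same fire on both sides of the split, which lets a model score well by memorizing fire-specific conditions.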

Random Forest Model

The Random Forest model serves as a strong tree-based model capable of capturing nonlinear relationships between structural and environmental variables.

Hyperparameters were tuned using randomized search with grouped cross-validation, including parameters such as maximum tree depth and minimum samples per leaf.
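To make the idea of randomized search concrete, the sketch below draws 20 distinct configurations from the search space listed in this document's Random Forest details. It only illustrates the sampling step in plain Python, with an arbitrary seed; it does not train any model and is not the project's tuning code.

```python
import itertools
import random

# Search space as listed in the Random Forest technical details.
SEARCH_SPACE = {
    "n_estimators": [300, 500, 800],
    "max_depth": [20, 40, 60, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", 0.3, 0.5],
}


def sample_configurations(space, n_iter=20, seed=0):
    """Draw n_iter distinct configurations from the full grid, without replacement."""
    keys = list(space)
    grid = [dict(zip(keys, combo)) for combo in itertools.product(*(space[k] for k in keys))]
    return random.Random(seed).sample(grid, min(n_iter, len(grid)))


grid_size = 3 * 4 * 3 * 3 * 3  # number of configurations in the full grid
configs = sample_configurations(SEARCH_SPACE)
```

The full grid holds 324 configurations, so sampling 20 of them evaluates about 6% of the grid per cross-validation run.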

After selecting the best performing model, we performed threshold tuning on the validation set to find the threshold that would maximize the validation F1 score.

View additional Random Forest technical details

Cross-validation: 3-fold GroupKFold (grouped by wildfire)
Search method: RandomizedSearchCV with 20 sampled configurations
Class weight: balanced
n_estimators: {300, 500, 800}
max_depth: {20, 40, 60, None}
min_samples_split: {2, 5, 10}
min_samples_leaf: {1, 2, 5}
max_features: {"sqrt", 0.3, 0.5}
Threshold tuning range: 0.2 - 0.8
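Threshold tuning over the stated 0.2 to 0.8 range can be sketched as a simple sweep that keeps the threshold with the highest validation F1. The step size of 0.01, the tie-breaking rule (the lowest threshold wins), and the toy labels and probabilities are assumptions; the text gives only the range and the objective.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 for the positive class (1 = Destroyed) at a given probability threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def tune_threshold(y_true, y_prob, low=0.2, high=0.8, step=0.01):
    """Return (threshold, f1) maximizing F1; ties go to the lowest threshold."""
    n_steps = int(round((high - low) / step))
    best_threshold, best_f1 = low, -1.0
    for i in range(n_steps + 1):
        threshold = round(low + i * step, 10)  # avoid floating-point drift
        score = f1_at_threshold(y_true, y_prob, threshold)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Hypothetical validation labels and predicted probabilities.
y_val = [0, 0, 1, 1, 1, 0]
p_val = [0.30, 0.55, 0.60, 0.70, 0.90, 0.45]
best_threshold, best_f1 = tune_threshold(y_val, p_val)
```

Because the threshold is chosen on the validation set, the test set stays untouched until the final evaluation.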

Neural Network Model

A feedforward neural network was trained to capture complex interactions between structural and environmental variables.

Hyperparameters were selected using a grid search over the number of neurons per layer, dropout rate, learning rate, and regularization strength, with cross-validation and early stopping based on validation metrics.

A final model was then trained with the best hyperparameters and early stopping, and evaluated on the test set to produce the reported performance metrics.

View additional Neural Network technical details

Cross-validation: 5-fold cross-validation on combined training + validation sets
Search method: Grid search over architecture and regularization parameters
Architecture (units per layer): [128, 64, 32, 16, 8]
Activation: ReLU
Dropout rate: 0.15
Batch normalization: Enabled
Learning rate: 0.001
L2 regularization: 0.01
Early stopping: 15-epoch patience (best validation epoch selected)
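The early-stopping rule listed above can be illustrated independently of any deep learning framework. This sketch assumes the monitored quantity is a validation loss to be minimized (the text says only "validation metrics") and operates on a precomputed list of per-epoch values; the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=15):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    Training stops once `patience` epochs pass without a new best validation
    loss; the weights from best_epoch would be the ones restored.
    """
    if not val_losses:
        raise ValueError("need at least one epoch of validation loss")
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses, with a short patience to keep the example small.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=3)
```

With a patience of 3, training stops at epoch 5 and keeps epoch 2, so the later improvement at epoch 6 is never seen. A longer patience such as the 15 epochs listed above makes this kind of premature stop less likely, at the cost of extra training time.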

Evaluation Metrics

Due to class imbalance, we optimized Precision-Recall AUC (PR-AUC) during cross-validation and used validation-based probability threshold tuning.

Final model performance was evaluated using Accuracy, Precision, Recall, and F1-score.
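PR-AUC is often estimated as average precision: the mean of the precision values at each rank where a positive example is retrieved. The sketch below implements that estimator in plain Python, assuming Destroyed is the positive class and that no two structures share the same score; whether the project used this exact estimator is not stated. The labels and scores are made up.

```python
def average_precision(y_true, y_score):
    """Average precision for binary labels, assuming no tied scores."""
    n_positive = sum(y_true)
    if n_positive == 0:
        raise ValueError("average precision is undefined without positive examples")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at this rank; each positive adds a recall step of 1 / n_positive.
            precision_sum += true_positives / rank
    return precision_sum / n_positive


# Hypothetical labels (1 = Destroyed) and predicted probabilities.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(labels, scores)
```

Unlike accuracy, this score depends only on how well the positive class is ranked, which is why it is a common choice when one class dominates.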

Using the final trained models, we generated:

Model performance metrics and confusion matrices
Feature importance analysis using SHAP
Fire-level structural loss reports
Fire-level damage map visualizations (actual outcome, predicted outcome, and predicted probability)

Results

Overall Model Performance

Model	Class	Precision	Recall	F1	Support
Random Forest	No Damage	0.642	0.888	0.745	303
	Destroyed	0.974	0.893	0.932	1402
	Macro avg	0.808	0.890	0.838	1705
	Weighted avg	0.915	0.892	0.898	1705
	Accuracy	89.2%

Neural Network	No Damage	0.824	0.508	0.629	303
	Destroyed	0.902	0.976	0.938	1402
	Macro avg	0.863	0.742	0.783	1705
	Weighted avg	0.888	0.893	0.883	1705
	Accuracy	89.3%

Both models achieve similar overall performance on the wildfire test set, with accuracy near 89%. However, the models differ in how they identify damage.

The Random Forest has higher recall for No Damage structures (0.888 vs. 0.508), meaning it is better at recognizing homes that survive a fire.
The Neural Network has higher recall for Destroyed structures (0.976 vs. 0.893), making it more effective at identifying homes likely to be lost during a wildfire.

In practice, this means the Random Forest tends to identify surviving structures more reliably, while the Neural Network is stronger at detecting high-risk structures likely to be destroyed.

Depending on the use case, this trade-off may matter: a homeowner focused on identifying high-risk homes might prefer the Neural Network, while planners focused on identifying surviving structures may benefit more from the Random Forest.
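The summary rows of the performance table can be cross-checked from the per-class precision, recall, and support values alone. The sketch below recomputes F1, the macro and weighted F1 averages, and accuracy (which equals support-weighted recall when each structure has exactly one true class). The inputs are the rounded figures from the table, so small differences in the third decimal are expected.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def summarize(per_class):
    """per_class maps class name -> (precision, recall, support)."""
    total = sum(support for _, _, support in per_class.values())
    f1s = {name: f1_score(p, r) for name, (p, r, _) in per_class.items()}
    return {
        "f1": f1s,
        "macro_f1": sum(f1s.values()) / len(f1s),
        "weighted_f1": sum(f1s[name] * per_class[name][2] for name in per_class) / total,
        # Accuracy equals support-weighted recall for single-label classification.
        "accuracy": sum(r * support for _, r, support in per_class.values()) / total,
    }


# Per-class values copied from the performance table.
random_forest = summarize({"No Damage": (0.642, 0.888, 303), "Destroyed": (0.974, 0.893, 1402)})
neural_network = summarize({"No Damage": (0.824, 0.508, 303), "Destroyed": (0.902, 0.976, 1402)})
```

The recomputed values agree with the table to within rounding: accuracy comes out near 0.892 and 0.893, and macro F1 near 0.838 and 0.783, for the Random Forest and Neural Network respectively.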

Fire x Model Demo

While the table above summarizes overall test-set performance, the demo below shows how predictions vary across individual wildfire events and allows comparison between models.

Select a fire and model to view the corresponding damage maps (actual, predicted, probability)


Feature Importance

Feature importance analysis using SHAP values reveals which structural and environmental factors most strongly influence predicted wildfire damage risk.

How to read a SHAP summary plot

Each dot represents one structure.
The x-axis shows the SHAP value, indicating how much that feature pushes the prediction toward structural damage or no structural damage.
Points to the right (positive SHAP values) increase predicted structural damage risk, while points to the left (negative SHAP values) decrease it.
Color indicates the feature value (red = high, blue = low). For categorical variables, red indicates the presence of that feature while blue indicates its absence.
Features are ordered from most to least influential, top to bottom.
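The top-to-bottom ordering in a summary plot is commonly based on the mean absolute SHAP value of each feature across all structures; the sketch below assumes that ordering. The SHAP values and feature names are made-up placeholders, and computing real SHAP values would require a fitted model and an explainer, which are outside this example.

```python
def rank_by_mean_abs(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, most influential first."""
    n_rows = len(shap_rows)
    if n_rows == 0:
        raise ValueError("need SHAP values for at least one structure")
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: one row per structure, one column per feature.
feature_names = ["roof_type", "burn_severity", "precipitation"]
shap_rows = [
    [0.30, -0.10, 0.02],
    [-0.20, 0.25, -0.01],
    [0.40, -0.15, 0.03],
]
ranking = rank_by_mean_abs(shap_rows, feature_names)
ordered_names = [name for name, _ in ranking]
```

Taking the absolute value matters: a feature that pushes some predictions strongly toward damage and others strongly away from it is still influential, even if its signed contributions average out near zero.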

The SHAP summary plots below show how feature importance changes across different subsets of fires.

Select a fire subset and model to explore how these importance patterns vary across fire subsets and between the two models.


Discussion

Key Insights

Structural characteristics strongly influence wildfire damage. Features such as exterior wall materials, roof type, and eave design were among the most important predictors of whether a structure was destroyed.
Environmental conditions also play an important role. Burn severity, precipitation, temperature, and vapor pressure deficit contribute to wildfire damage risk in addition to structural characteristics.
Model behavior varies depending on fire conditions. The models performed best on lower-severity fires and struggled more in high-severity and plume-driven fires, suggesting that extreme fire conditions can reduce the predictive value of structural characteristics.
Structure-level modeling provides meaningful insights into wildfire damage. By combining structural inspection data with environmental and geospatial variables, the models capture both building-level vulnerability and local wildfire conditions that influence damage outcomes.
Publicly available data can support practical wildfire risk assessment. Even with important limitations, publicly available structural, environmental, and geospatial data can help identify vulnerable structures and support preparedness planning and risk modeling in Wildland-Urban Interface areas.

Interpretation of Results

Both models identified similar predictors of wildfire damage: nine of the top ten most influential features were shared between the Random Forest and Neural Network models. This strong agreement suggests that these features represent genuine drivers of structural vulnerability rather than model-specific artifacts.

Structural features such as outer wall materials, roof type, and eave design were among the most influential predictors.
Environmental conditions including burn severity, precipitation, and vapor pressure deficit also strongly influenced predictions.

Model performance also varied across fire conditions. Both models performed best on lower-severity fires and struggled more in high-severity scenarios. In these extreme scenarios, structural features may matter less because many structures in the fire's path are destroyed regardless of construction materials or design.

Limitations

While our models demonstrate meaningful predictive performance, several limitations should be considered when interpreting these results.

Binary damage classification. The models only distinguish between destroyed and undamaged structures, which excludes partial damage levels.
In real wildfire scenarios, structures often experience partial damage or varying degrees of structural loss, and collapsing these outcomes into a binary label may obscure meaningful differences in vulnerability between structures. This limitation reduces the model's ability to represent the full spectrum of wildfire damage outcomes.
Class imbalance. The dataset contains far more destroyed structures than undamaged ones.
Although balanced class weights were used during training, this imbalance still affects model behavior and contributes to differences in precision and recall across classes. As a result, the models may be more sensitive to patterns associated with destroyed structures and may be less reliable when identifying structures that ultimately remain undamaged.
Regional data scope. The dataset is limited to California wildfires.
Although cross-validation was designed to reduce data leakage between fires, the ability of these models to generalize to new geographic regions remains uncertain. Differences in vegetation types, building materials, fire behavior, and climate conditions across regions may limit the direct transferability of our models outside the California wildfires used in training.
Post-fire data dependency. Our framework depends partly on post-fire burn severity data from the MTBS dataset.
Because this data is typically released one to two years after a wildfire occurs, the current framework is better suited to retrospective analysis and long-term planning than to real-time prediction during active wildfire events.

Future Work

Future work could address these limitations and build on our findings in several ways:

Extend the target variable to multi-class damage levels instead of a binary destroyed/undamaged outcome.
Incorporate additional weather variables such as wind speed and wind direction, which are especially important in wind-driven fires.
Expand the dataset to include more fires and geographic regions to improve generalizability.
Integrate real-time weather feeds and updated structural inventories to move toward real-time wildfire risk assessment.

By addressing these areas, future research could improve both our understanding of structural vulnerability to wildfires and the practical use of wildfire damage prediction models.

Conclusion

In this project, we investigated whether residential structural damage in Wildland-Urban Interface areas could be predicted using publicly available geospatial, environmental, and structural data. To do this, we trained and evaluated Random Forest and Neural Network models using burn severity, weather, spatial, and structural inspection data from 30 California wildfires.

Our results show that structure-level wildfire damage can be meaningfully predicted using accessible public data. Both models identified consistent patterns linking structural characteristics and environmental conditions to wildfire damage outcomes, and their strong agreement on influential predictors increases confidence in these findings.

Overall, this project demonstrates that publicly available data can provide a practical foundation for structure-level wildfire risk assessment. As wildfire frequency and intensity continue to increase, accessible and interpretable tools for understanding structural vulnerability may help support preparedness, mitigation, and future planning in fire-prone communities in California and, with further validation, beyond.

Links

For more details on our project, check out our report, poster, and code repositories:

Project Report, Project Poster, Project Repository, Website Repository