Our prediction target is a binary structure damage outcome derived from CAL FIRE Damage Inspection (DINS) reports. We retain only residential structures from our selected wildfires that were classified as Destroyed (>50%) or No Damage to focus on clearly defined damage states.
Wildfire Severity and Structural Damage Prediction in Wildland-Urban Interface (WUI) Areas
Understanding and predicting wildfire impacts on residential structures in Wildland-Urban Interface (WUI) areas, where homes meet wildland vegetation.
Overview
Why This Matters
Wildfires in California are becoming more frequent and more destructive, especially in Wildland-Urban Interface (WUI) areas, where homes meet wildland vegetation.
Yet after major wildfire events, residents, planners, and insurers are often left asking the same question:
Why did some homes burn while others survived?
Despite growing wildfire risk, publicly accessible tools that assess damage at the individual structure level remain limited.
Figure: map of surveyed structures (red = destroyed, blue = undamaged).
What We're Doing
In this project, we develop models to predict whether a residential structure will be destroyed or experience no damage during a wildfire event in California.
Using publicly available structural, environmental, and spatial data, we train both Random Forest and Neural Network models to identify patterns associated with destruction.
Our goal is to provide insights that can support:
- risk assessment
- community preparedness
- safer planning in WUI areas
Structure-Level Damage Map
This map shows observed structure outcomes from the CAL FIRE Damage Inspection (DINS) dataset for two fires, filtered to residential structures in the damage classes of interest (Destroyed >50% or No Damage).
Use the controls to explore spatial patterns among surveyed residential structures within each fire perimeter.
Our Approach
To understand why some homes are destroyed while others survive, we combine multiple public datasets that describe wildfire severity, environmental conditions, and structural characteristics. We transform these sources into structure-level features and train machine learning models to predict whether a residential structure is destroyed or experiences no damage.
Flowchart showing a high-level overview of how multiple data sources are merged and engineered to generate structure-level wildfire damage predictions.
Data
Prediction Target
Damage & Structural Attributes
To capture structural characteristics that may influence damage outcomes, we integrate:
- CAL FIRE Damage Inspection (DINS) — structure damage classification and structure attributes such as construction materials.
- National Structure Inventory (NSI) — building characteristics such as square footage, number of stories, foundation type, and structural value
Environmental & Fire Severity Features
We constructed a merged pixel-level fire severity dataset integrating:
- Monitoring Trends in Burn Severity (MTBS) — burn severity classifications
- LANDFIRE — fuels and vegetation characteristics
- National Land Cover Database (NLCD) — land cover classification
- PRISM — temperature and precipitation data
These datasets were integrated into a unified pixel-level fire severity dataset during our Q1 project.
Spatial Context
To identify structures located within wildfire-prone Wildland-Urban Interface (WUI) areas, we incorporate:
- SILVIS Labs WUI dataset — Wildland-Urban Interface classifications and perimeters
- MTBS Fire Perimeters — wildfire boundary perimeters
Data Processing & Feature Engineering
To construct a dataset ready for modeling, we performed three major processing steps: cleaning and standardization, spatial integration, and final feature engineering.
Cleaning & Standardization
We filtered the DINS dataset to include only residential structures classified as Destroyed (>50%) or No Damage. Column names were standardized, invalid entries were converted to missing values, and columns with more than 70% missingness were removed.
All datasets were reprojected into a consistent coordinate system to ensure accurate spatial joins.
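The tabular cleaning steps above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names, the residential category labels, and the list of invalid tokens are hypothetical, and only the 70% missingness cutoff and the two damage classes come from the text.

```python
import numpy as np
import pandas as pd

INVALID_TOKENS = ["", "N/A", "-999"]  # assumed placeholders for invalid entries
KEEP_DAMAGE = ["Destroyed (>50%)", "No Damage"]
RESIDENTIAL = ["Single Residence", "Multiple Residence"]  # assumed category labels
MAX_MISSING = 0.70  # columns with a larger missing share are dropped


def clean_dins(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    # Standardize column names to snake_case.
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^a-z0-9]+", "_", regex=True)
        .str.strip("_")
    )
    # Keep residential structures in the two damage classes of interest.
    mask = df["structure_category"].isin(RESIDENTIAL) & df["damage"].isin(KEEP_DAMAGE)
    df = df.loc[mask].reset_index(drop=True)
    # Convert invalid entries to missing values.
    df = df.replace(INVALID_TOKENS, np.nan)
    # Drop columns whose share of missing values exceeds the cutoff.
    return df.loc[:, df.isna().mean() <= MAX_MISSING]


# Toy stand-in for a DINS extract (all values are made up).
raw = pd.DataFrame({
    "Structure Category": ["Single Residence", "Single Residence",
                           "Nonresidential Commercial", "Multiple Residence"],
    "Damage": ["Destroyed (>50%)", "No Damage", "Destroyed (>50%)", "Minor (10-25%)"],
    "Roof Construction": ["Asphalt", "N/A", "Metal", "Tile"],
    "Mostly Empty": [None, None, None, "x"],
})
cleaned = clean_dins(raw)
```

In this toy input, two rows survive the filter and the `mostly_empty` column is dropped because it is entirely missing after filtering.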
Spatial Integration
We restricted our analysis to structures located within Wildland-Urban Interface (WUI) areas by clipping SILVIS WUI data to selected wildfire perimeters.
Structures were spatially joined to fire severity data using a 30-meter buffer to capture localized burn conditions. Additional building characteristics were added using nearest-neighbor spatial matching with the NSI dataset.
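As a rough sketch of these two joins, the GeoPandas example below buffers each structure by 30 meters, averages the severity of the pixels that fall inside the buffer, and then attaches the nearest NSI record. The toy coordinates, the column names, the use of a mean as the aggregate, and the choice of EPSG:3310 (a meter-based California projection) are assumptions made for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

CRS = "EPSG:3310"  # assumed projected CRS so that distances are in meters

# Toy layers; all coordinates and attribute values are made up.
structures = gpd.GeoDataFrame(
    {"structure_id": [1, 2]},
    geometry=[Point(0, 0), Point(500, 500)],
    crs=CRS,
)
pixels = gpd.GeoDataFrame(
    {"severity": [4, 2, 1]},
    geometry=[Point(10, 10), Point(20, -5), Point(505, 498)],
    crs=CRS,
)
nsi = gpd.GeoDataFrame(
    {"num_story": [1, 2]},
    geometry=[Point(3, 4), Point(490, 510)],
    crs=CRS,
)

# 1) 30 m buffer around each structure, joined to the severity pixels.
buffers = structures.copy()
buffers["geometry"] = structures.geometry.buffer(30)
hits = gpd.sjoin(buffers, pixels, how="left", predicate="intersects")
mean_sev = (
    hits.groupby("structure_id")["severity"].mean()
    .rename("mean_severity")
    .reset_index()
)
enriched = structures.merge(mean_sev, on="structure_id", how="left")

# 2) Nearest-neighbor match to NSI records, keeping the match distance.
matched = gpd.sjoin_nearest(enriched, nsi, how="left", distance_col="nsi_dist_m")
```

Keeping the match distance makes it possible to discard implausibly distant NSI matches later, for example with the `max_distance` argument of `sjoin_nearest`.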
Final Feature Engineering
Missing categorical variables were imputed using an “Unknown” category, and all categorical features were one-hot encoded.
Structural and environmental variables were combined into a unified structure-level dataset, and the damage outcome was encoded as a binary target variable (1 for Destroyed, 0 for No Damage).
The resulting dataset contains structure-level environmental and structural features, with a binary target variable indicating whether each structure was destroyed or experienced no damage during a wildfire.
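A minimal pandas sketch of the imputation, one-hot encoding, and target encoding described above follows. The feature names and values are hypothetical; only the "Unknown" category and the 1/0 target coding come from the text.

```python
import pandas as pd

# Toy structure-level table (feature names and values are made up).
df = pd.DataFrame({
    "damage": ["Destroyed (>50%)", "No Damage", "Destroyed (>50%)"],
    "roof_type": ["Tile", None, "Asphalt"],
    "mean_severity": [3.5, 1.0, 4.0],
})
cat_cols = ["roof_type"]

# Impute missing categorical values with an explicit "Unknown" category.
df[cat_cols] = df[cat_cols].fillna("Unknown")

# Binary target: 1 for Destroyed, 0 for No Damage.
y = (df["damage"] == "Destroyed (>50%)").astype(int)

# One-hot encode the categorical features; numeric columns pass through.
X = pd.get_dummies(df.drop(columns="damage"), columns=cat_cols, dtype=int)
```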
Structural Damage Model
We trained two models, Random Forest and Neural Network, to predict whether a residential structure would be Destroyed (>50%) or experience No Damage during a wildfire. Using two different modeling approaches allows us to compare model performance and identify key predictive features across methods.
To test whether wildfire characteristics affect model performance, we trained both Random Forest and Neural Network models on six subsets of the data, applying the same training, tuning, and threshold-optimization pipeline to each:
- All fires
- Small fires — fewer structure records
- Wind-driven vs plume-driven fires
- Low severity vs high severity fires — based on MTBS burn severity classification
  - Low severity: fewer than 33% of pixels fall into MTBS severity classes 3-4
  - High severity: more than 33% of pixels fall into MTBS severity classes 3-4
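The severity grouping can be expressed as a small function over a fire's MTBS pixel classes. This is a sketch under one stated assumption: the text does not say how a share of exactly 33% is handled, so it is assigned to the low group here.

```python
from collections import Counter

HIGH_CLASSES = {3, 4}  # MTBS severity classes counted as high severity
THRESHOLD = 0.33       # share of pixels that separates the two groups


def severity_group(pixel_classes):
    """Label a fire 'low' or 'high' from its MTBS pixel classes."""
    counts = Counter(pixel_classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("fire has no severity pixels")
    share = sum(counts[c] for c in HIGH_CLASSES) / total
    # Assumption: a share of exactly 33% falls in the low group.
    return "high" if share > THRESHOLD else "low"
```

For example, a fire with pixel classes `[1, 2, 3, 4, 4]` has 60% of its pixels in classes 3-4 and is labeled high severity.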
To limit overfitting and improve interpretability, we removed multicollinear features and then ran feature selection on each model's training set: we fit a LightGBM model and kept the features that account for the top 95% of cumulative importance.
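The cumulative-importance cutoff can be sketched independently of the model that supplies the importances. The function below is illustrative: it takes a plain mapping of feature name to importance (in practice these could be read from a fitted LightGBM model), and the feature names and values in the example are made up.

```python
def select_by_cumulative_importance(importances, threshold=0.95):
    """Keep the highest-ranked features until their summed importance
    reaches `threshold` of the total importance."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        if running >= threshold * total:
            break
        selected.append(name)
        running += value
    return selected


# Hypothetical importances, for illustration only.
example = {"roof_type": 50, "burn_severity": 30, "vpd": 16, "stories": 3, "value": 1}
kept = select_by_cumulative_importance(example)
```

Here the first three features already cover 96% of the total importance, so the remaining two are dropped.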
To reduce data leakage, we split the data at the wildfire level, meaning all structures from a single wildfire are kept within a single training, validation, or test set.
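One way to implement a fire-level split is scikit-learn's `GroupShuffleSplit`, using a fire identifier as the group. The sketch below uses synthetic data, and the split proportions are placeholders rather than the project's actual values.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 200
fire_id = rng.integers(0, 10, size=n)  # hypothetical fire label per structure
X = rng.normal(size=(n, 4))
y = rng.integers(0, 2, size=n)

# Hold out whole fires for the test set.
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
trainval_idx, test_idx = next(outer.split(X, y, groups=fire_id))

# Split the remaining fires into training and validation sets.
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr_rel, val_rel = next(
    inner.split(X[trainval_idx], y[trainval_idx], groups=fire_id[trainval_idx])
)
train_idx, val_idx = trainval_idx[tr_rel], trainval_idx[val_rel]
```

Because the splitter works on groups, no fire contributes structures to more than one of the three sets.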
Random Forest Model
The Random Forest model serves as a strong tree-based model capable of capturing nonlinear relationships between structural and environmental variables.
Hyperparameters were tuned using randomized search with grouped cross-validation, including parameters such as maximum tree depth and minimum samples per leaf.
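A randomized search with grouped cross-validation might look like the scikit-learn sketch below. The search space, number of iterations, and number of folds are placeholders, the data is synthetic, and `average_precision` is used here as a common stand-in for PR-AUC.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, RandomizedSearchCV

rng = np.random.default_rng(0)
n = 300
groups = rng.integers(0, 6, size=n)  # hypothetical fire identifiers
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Placeholder search space; the actual ranges are not given in the text.
param_distributions = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
    "n_estimators": [50, 100],
}

search = RandomizedSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="average_precision",
    cv=GroupKFold(n_splits=3),  # folds never split a fire
    random_state=0,
)
search.fit(X, y, groups=groups)
best_rf = search.best_estimator_
```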
After selecting the best performing model, we performed threshold tuning on the validation set to find the threshold that would maximize the validation F1 score.
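Threshold tuning of this kind can be sketched as a scan over candidate thresholds that keeps the one with the highest validation F1. The grid and the validation arrays below are made up for illustration.

```python
import numpy as np

GRID = np.linspace(0.05, 0.95, 19)  # candidate thresholds (assumed grid)


def f1_at(y_true, y_prob, thr):
    """F1 score of the positive class when predicting 1 for y_prob >= thr."""
    pred = y_prob >= thr
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def best_threshold(y_true, y_prob, grid=GRID):
    """Return the first threshold on the grid that maximizes F1, and that F1."""
    scores = [f1_at(y_true, y_prob, t) for t in grid]
    i = int(np.argmax(scores))
    return float(grid[i]), scores[i]


# Toy validation labels and predicted probabilities.
y_val = np.array([0, 0, 1, 1, 1, 0, 1, 1])
p_val = np.array([0.22, 0.62, 0.58, 0.90, 0.71, 0.41, 0.66, 0.33])
thr, f1 = best_threshold(y_val, p_val)
```

The chosen threshold is then held fixed when the model is scored on the test set, so the test data plays no part in selecting it.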
Neural Network Model
A feedforward neural network was trained to capture complex interactions between structural and environmental variables.
Hyperparameters were selected with a grid search over the number of neurons per layer, dropout rate, learning rate, and regularization rate, using cross-validation and early stopping based on validation metrics.
The final neural network was then trained with the best hyperparameters and early stopping, and it was evaluated on the test set to produce the reported performance metrics.
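The search procedure can be illustrated with scikit-learn's `MLPClassifier` as a stand-in for the project's network, whose framework is not named here. This stand-in differs in important ways: it has no dropout, its `alpha` parameter is an L2 penalty, and its built-in early stopping uses a random internal validation split rather than a fire-level one. The grid values and data are placeholders.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
groups = rng.integers(0, 6, size=n)  # hypothetical fire identifiers
X = rng.normal(size=(n, 5))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        early_stopping=True,      # stop when the internal validation score stalls
        validation_fraction=0.2,
        n_iter_no_change=5,
        max_iter=200,
        random_state=0,
    ),
)

# Placeholder grid: layer sizes, learning rate, and L2 regularization strength.
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(16,), (32, 16)],
    "mlpclassifier__learning_rate_init": [1e-3, 1e-2],
    "mlpclassifier__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(
    pipe, param_grid, scoring="average_precision", cv=GroupKFold(n_splits=3)
)
search.fit(X, y, groups=groups)
```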
Evaluation Metrics
Due to class imbalance, we optimized Precision-Recall AUC (PR-AUC) during cross-validation and used validation-based probability threshold tuning.
Final model performance was evaluated using Accuracy, Precision, Recall, and F1-score.
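These metrics can be computed with scikit-learn as in the sketch below. The labels, probabilities, and the 0.5 threshold are toy values, and `average_precision_score` is used as one common estimate of PR-AUC.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    precision_recall_fscore_support,
)

# Toy test labels (1 = Destroyed, 0 = No Damage) and predicted probabilities.
y_true = np.array([1, 1, 1, 0, 1, 0, 1, 1])
y_prob = np.array([0.90, 0.80, 0.40, 0.30, 0.70, 0.60, 0.95, 0.55])

threshold = 0.5  # placeholder; in practice the tuned validation threshold
y_pred = (y_prob >= threshold).astype(int)

pr_auc = average_precision_score(y_true, y_prob)  # threshold-free ranking metric
accuracy = accuracy_score(y_true, y_pred)
# Per-class precision, recall, F1, and support, ordered as labels [0, 1].
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0
)
```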
Using the final trained models, we generated:
- Model performance metrics and confusion matrices
- Feature importance analysis using SHAP
- Fire-level structural loss reports
- Fire-level damage map visualizations (actual outcome, predicted outcome, and predicted probability)
Results
Overall Model Performance
| Model | Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|---|
| Random Forest | No Damage | 0.642 | 0.888 | 0.745 | 303 |
| Random Forest | Destroyed | 0.974 | 0.893 | 0.932 | 1402 |
| Random Forest | Macro avg | 0.808 | 0.890 | 0.838 | 1705 |
| Random Forest | Weighted avg | 0.915 | 0.892 | 0.898 | 1705 |
| Random Forest | Accuracy | 89.2% | | | |
| Neural Network | No Damage | 0.824 | 0.508 | 0.629 | 303 |
| Neural Network | Destroyed | 0.902 | 0.976 | 0.938 | 1402 |
| Neural Network | Macro avg | 0.863 | 0.742 | 0.783 | 1705 |
| Neural Network | Weighted avg | 0.888 | 0.893 | 0.883 | 1705 |
| Neural Network | Accuracy | 89.3% | | | |
Both models achieve similar overall performance on the wildfire test set, with accuracy near 89%. However, the models differ in how they identify damage.
- The Random Forest has higher recall for No Damage structures (0.888 vs. 0.508), meaning it is better at recognizing homes that survive a fire.
- The Neural Network has higher recall for Destroyed structures (0.976 vs. 0.893), making it more effective at identifying homes likely to be lost during a wildfire.
In practice, this means the Random Forest tends to identify surviving structures more reliably, while the Neural Network is stronger at detecting high-risk structures likely to be destroyed.
Depending on the use case, this trade-off may matter: a homeowner focused on identifying high-risk homes might prefer the Neural Network, while planners focused on identifying surviving structures may benefit more from the Random Forest.
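The averaged rows in the results table follow directly from the per-class rows: the macro average is the unweighted mean over the two classes, the weighted average weights each class by its support, and overall accuracy equals the support-weighted recall. The short check below recomputes them from the rounded per-class values, so small differences in the third decimal place are expected.

```python
# Per-class (precision, recall, F1, support) copied from the results table.
rows = {
    "Random Forest": {
        "No Damage": (0.642, 0.888, 0.745, 303),
        "Destroyed": (0.974, 0.893, 0.932, 1402),
    },
    "Neural Network": {
        "No Damage": (0.824, 0.508, 0.629, 303),
        "Destroyed": (0.902, 0.976, 0.938, 1402),
    },
}


def averages(per_class):
    vals = list(per_class.values())
    n = sum(v[3] for v in vals)
    # Unweighted mean of each metric over the classes.
    macro = tuple(sum(v[i] for v in vals) / len(vals) for i in range(3))
    # Mean of each metric weighted by class support.
    weighted = tuple(sum(v[i] * v[3] for v in vals) / n for i in range(3))
    # Support-weighted recall is the overall accuracy.
    accuracy = weighted[1]
    return macro, weighted, accuracy


summary = {model: averages(per_class) for model, per_class in rows.items()}
```

Both recomputed accuracies come out at roughly 0.892 and 0.893, matching the table, which shows that the similar headline accuracy hides very different per-class recall.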
Fire x Model Demo
While the table above summarizes overall test-set performance, the demo below shows how predictions vary across individual wildfire events and allows comparison between models.
Select a fire and model to view the corresponding damage maps (actual, predicted, probability).
Feature Importance
Feature importance analysis using SHAP values reveals which structural and environmental factors most strongly influence predicted wildfire damage risk.
The SHAP summary plots below show how feature importance changes across different subsets of fires.
Select a fire subset and model to explore how these importance patterns vary across fire subsets and between the two models.
Discussion
Key Insights
- Structural characteristics strongly influence wildfire damage. Features such as exterior wall materials, roof type, and eave design were among the most important predictors of whether a structure was destroyed.
- Environmental conditions also play an important role. Burn severity, precipitation, temperature, and vapor pressure deficit contribute to wildfire damage risk in addition to structural characteristics.
- Model behavior varies depending on fire conditions. The models performed best on lower-severity fires and struggled more in high-severity and plume-driven fires, suggesting that extreme fire conditions can reduce the predictive value of structural characteristics.
- Structure-level modeling provides meaningful insights into wildfire damage. By combining structural inspection data with environmental and geospatial variables, the models capture both building-level vulnerability and local wildfire conditions that influence damage outcomes.
- Publicly available data can support practical wildfire risk assessment. Even with important limitations, publicly available structural, environmental, and geospatial data can help identify vulnerable structures and support preparedness planning and risk modeling in Wildland-Urban Interface areas.
Interpretation of Results
Both models identified similar predictors of wildfire damage: nine of the top ten most influential features were shared between the Random Forest and Neural Network models. This strong agreement suggests that these features represent genuine drivers of structural vulnerability rather than model-specific artifacts.
- Structural features such as outer wall materials, roof type, and eave design were among the most influential predictors.
- Environmental conditions including burn severity, precipitation, and vapor pressure deficit also strongly influenced predictions.
Model performance also varied across fire conditions. Both models performed best on lower-severity fires and struggled more in high-severity scenarios. In these extreme scenarios, structural features may matter less because many structures in the fire's path are destroyed regardless of construction materials or design.
Limitations
While our models demonstrate meaningful predictive performance, several limitations should be considered when interpreting these results.
- Binary damage classification. The models only distinguish between destroyed and undamaged structures, which excludes partial damage levels. In real wildfire scenarios, structures often experience partial damage or varying degrees of structural loss, and collapsing these outcomes into a binary label may obscure meaningful differences in vulnerability between structures. This limitation reduces the model's ability to represent the full spectrum of wildfire damage outcomes.
- Class imbalance. The dataset contains far more destroyed structures than undamaged ones. Although balanced class weights were used during training, this imbalance still affects model behavior and contributes to differences in precision and recall across classes. As a result, the models may be more sensitive to patterns associated with destroyed structures and may be less reliable when identifying structures that ultimately remain undamaged.
- Regional data scope. The dataset is limited to California wildfires. Although cross-validation was designed to reduce data leakage between fires, the ability of these models to generalize to new geographic regions remains uncertain. Differences in vegetation types, building materials, fire behavior, and climate conditions across regions may limit the direct transferability of our models outside the California wildfires used in training.
- Post-fire data dependency. Our framework depends partly on post-fire burn severity data from the MTBS dataset. Because this data is typically released one to two years after a wildfire occurs, the approach cannot yet support real-time wildfire damage prediction. Consequently, the current modeling framework is better suited for retrospective analysis and long-term planning rather than real-time prediction during active wildfire events.
Future Work
Future work could address these limitations and build on our findings in several ways:
- Extend the target variable to multi-class damage levels instead of a binary destroyed/undamaged outcome.
- Incorporate additional weather variables such as wind speed and wind direction, which are especially important in wind-driven fires.
- Expand the dataset to include more fires and geographic regions to improve generalizability.
- Integrate real-time weather feeds and updated structural inventories to move toward real-time wildfire risk assessment.
By addressing these areas, future research could improve both our understanding of structural vulnerability to wildfires and the practical use of wildfire damage prediction models.
Conclusion
In this project, we investigated whether residential structural damage in Wildland-Urban Interface areas could be predicted using publicly available geospatial, environmental, and structural data. To do this, we trained and evaluated Random Forest and Neural Network models using burn severity, weather, spatial, and structural inspection data from 30 California wildfires.
Our results show that structure-level wildfire damage can be meaningfully predicted using accessible public data. Both models identified consistent patterns linking structural characteristics and environmental conditions to wildfire damage outcomes, and their strong agreement on influential predictors increases confidence in these findings.
Overall, this project demonstrates that publicly available data can provide a practical foundation for structure-level wildfire risk assessment. As wildfire frequency and intensity continue to increase, accessible and interpretable tools for understanding structural vulnerability may help support preparedness, mitigation, and future planning in fire-prone communities beyond California.
Links
For more details on our project, check out our report, poster, and code repositories: