Finding datasets to round out your project or story is often the most challenging and time-consuming part of the process. Here are ten go-to datasets.
Think about when you completed your last significant data project. How much time did you spend collecting, curating, and engineering datasets? I’ve found that finding the perfect dataset to complete your story or analysis can often be the most difficult part of the process.

I recently spent a considerable amount of time researching specific US wildfire and forestry data to support a new analysis and visualization series. I was unsuccessful – until my colleague sent me to the California Forest Observatory and I found exactly what I needed. Over the years, many of the datasets that I’ve used in my own projects were shared with me by colleagues. I decided to compile a list of searchable repositories, individual datasets of note, and emerging data platforms to help make these sources more easily accessible to others.

https://forestobservatory.com/

The California Forest Observatory is a data-driven forest monitoring system that maps wildfire hazard drivers across California, including forest structure, weather, topography, and infrastructure. You can download canopy cover, canopy height, canopy base height, canopy bulk density, canopy layer count, ladder fuel density, and surface fuels geodata for the state by county, community, or watershed.

Additional Resources: Modeling & Monitoring Powerline Tree Strike Risk at Scale

https://www.openstreetmap.org

OpenStreetMap provides a broad range of map data maintained by a worldwide community of geographers and cartographers. You can access roads, trails, points of interest, railways, and much more worldwide. Geofabrik’s OpenStreetMap Data Extracts are one of the easiest ways to download information for your area of interest quickly.
Additional Resources: Todd Mostak’s complete OpenStreetMap extraction and load into OmniSci; Analyze OpenStreetMap Data with OSMnx and OmniSci

https://registry.opendata.aws/

The Registry of Open Data on AWS has empowered laboratories, research institutions, and various other organizations to deliver open datasets to developers, startups, and enterprises worldwide since its launch in 2018. Anyone can easily access the registry through a web interface and search for datasets with keywords or tags like flood risk, remote sensing, imagery, or human genome.
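
The registry is searched through its web interface, but the same keyword-and-tag idea can be sketched locally if you keep your own notes on datasets you have found. The Python below is a minimal illustration under stated assumptions: `DatasetEntry`, `search`, and the `catalog` records are hypothetical names and placeholder entries invented for this example. They are not real registry listings and not an AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical, simplified stand-in for one dataset listing."""
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


def search(entries, keyword=None, tag=None):
    """Return entries that match a keyword (in name or description) and/or a tag."""
    results = []
    for entry in entries:
        text = f"{entry.name} {entry.description}".lower()
        # Skip entries that do not contain the keyword, if one was given.
        if keyword and keyword.lower() not in text:
            continue
        # Skip entries that do not carry the tag, if one was given.
        if tag and tag.lower() not in [t.lower() for t in entry.tags]:
            continue
        results.append(entry)
    return results


# Placeholder records for illustration only; not real registry listings.
catalog = [
    DatasetEntry("Example flood model", "Placeholder flood risk grids",
                 ["flood risk", "remote sensing"]),
    DatasetEntry("Example genome archive", "Placeholder sequencing data",
                 ["human genome"]),
    DatasetEntry("Example satellite scenes", "Placeholder imagery tiles",
                 ["imagery", "remote sensing"]),
]

matches = search(catalog, tag="remote sensing")
print([m.name for m in matches])
```

Matching is case-insensitive, and supplying both a keyword and a tag narrows the results to entries that satisfy both conditions.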