Overview

Cloud Datasets is a collection of curated datasets built on Google Cloud and Google BigQuery. The datasets are ready for analytics and AI/ML initiatives without the need for additional data engineering.

COVID Dataset

The Dataflix COVID dataset is a centralized repository of up-to-date, curated data focused on key tracking metrics and U.S. census data. The dataset is publicly readable on Google BigQuery and is ready for analysis, analytics and machine learning initiatives.

The dataset is built on data from trusted sources such as CSSE at Johns Hopkins University and government agencies. It covers a wide range of metrics, including confirmed cases, new cases, % population, mortality rate and deaths, aggregated at various geographic levels including city, county, state and country. New data is published daily. Our objective is to make structured COVID data available to organizations and individuals to help in the fight against COVID-19.

Data Catalog

| Name | Description | Source | Frequency |
|---|---|---|---|
| bi_usa_daily_trends | Daily U.S. trends aggregated by county & state, covering key COVID tracking metrics and population | CSSE at Johns Hopkins University | Daily |
| bi_usa_snapshot | Snapshot of U.S. COVID tracking data by county & state, covering key COVID tracking metrics and population. View built on world_covid and usa_pop tables. | View | Real-time |
| india_covid | India COVID data | Govt. of India | Daily |
| india_pop | India population data | Govt. of India | On-demand |
| usa_pop | U.S. population data | United States Census Bureau | On-demand |
| world_covid | World COVID tracking data | CSSE at Johns Hopkins University | Daily |
| world_pop | World population data | World Bank | On-demand |

Sample Queries

Total confirmed cases and new cases by state
SELECT state_name AS State, SUM(confirmed) AS TotalCases, SUM(confirmed_new) AS NewCases
FROM `covid-assistant.covid.bi_usa_snapshot`
GROUP BY state_name

New cases trend in the U.S.
SELECT date, SUM(confirmed_new) AS NewCases
FROM `covid-assistant.covid.bi_usa_daily_trends`
GROUP BY date
ORDER BY date
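
Daily counts tend to be noisy, so a smoothed trend is often easier to read. The following is a minimal sketch of a 7-day moving average built on the daily trend query above. It uses only the date and confirmed_new columns already shown; the names new_cases and new_cases_7day_avg are illustrative. It assumes the aggregated result has one row per calendar date with no gaps; if dates are missing, the window would span more than seven days.

```sql
-- Step 1: collapse county/state rows into one row per date.
WITH daily AS (
  SELECT
    date,
    SUM(confirmed_new) AS new_cases
  FROM `covid-assistant.covid.bi_usa_daily_trends`
  GROUP BY date
)
-- Step 2: average each date with the six rows before it.
-- Assumption: one row per calendar date, so 7 rows = 7 days.
SELECT
  date,
  new_cases,
  AVG(new_cases) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS new_cases_7day_avg
FROM daily
ORDER BY date
```

For the first six dates the window holds fewer than seven rows, so the average is taken over the rows available.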

Traffic & Safety Dataset

The Traffic & Safety dataset is a curated, high-demand automotive dataset that makes it easy to explore vehicle safety, driver behavior and competitor insights. The dataset contains historical data from authoritative sources such as the National Highway Traffic Safety Administration (NHTSA), the National Center for Statistics and Analysis (NCSA), and the Bureau of Economic Analysis (BEA).

The Traffic & Safety dataset supports a wide range of analyses, including design and liability risk, geography and demographics, driver behavior, crash analysis and competitor analysis.

Fatality Analysis Reporting System (FARS) data is made available to the public by the National Highway Traffic Safety Administration (NHTSA). Over the years, changes have been made to the type of data collected and the way the data is presented in the data files. Some data files have been discontinued and new ones have been created. For the current data collection year there are 20 data files (tables).

  • The Fatality Analysis Reporting System (FARS) contains data derived from a census of fatal traffic crashes within the 50 States, the District of Columbia, and Puerto Rico.
  • To be included in FARS, a crash must involve a motor vehicle traveling on a trafficway customarily open to the public and must result in the death of at least one person (occupant of a vehicle or a non-motorist) within 30 days of the crash.
  • As part of our Dataflix-data360 initiative, we collected all the data from FARS, processed it, and present it as the Safety and Behavior Analytics Dataset.
  • The dataset contains information about each accident, including the number of persons involved, the cause of the accident, driver alcohol use, road conditions, pedestrians, and damage to vehicles and public or private property.

Key Metrics Covered

Geo & Demographics

  • Avg. accidents per year by state (map)
  • Road type analysis – interstate, intersection, junction
  • Urban vs. Rural analysis
  • Crashes by age group
  • Crashes by gender

Driver Behavior

  • Ratio of normal driver vs. drowsy driver (pie chart)
  • Ratio of normal driver vs. distracted driver (pie chart)
  • Percentage of accidents by Motorcycle, Pedestrian, Pedal-cyclist, Police Pursuit (bubble graph)
  • Percentage of hit-and-run accidents
  • Percentage of drivers with alcohol use, split by state
  • Helmet usage ratio

Crash Analysis

  • Accidents by year for the past 20 years
  • Accidents by crash type
  • Accidents by crash
  • Accidents by time of the day
  • Accidents by vehicle type

Tables

ACCIDENT: This table contains information about crash characteristics and environmental conditions at the time of the crash. There is one record per crash. Data is present from 1975 to 2018.

VEHICLE: This table contains information describing the in-transport motor vehicles and the drivers of in-transport motor vehicles who are involved in the crash. There is one record per in-transport motor vehicle. Data is present from 1975 to 2018.

PERSON: This table contains information describing all persons involved in the crash including motorists (i.e., drivers and passengers of in-transport motor vehicles) and non-motorists (e.g., pedestrians and pedal cyclists). It provides information such as age, sex, vehicle occupant restraint use, and injury severity. There is one record per person. Data is present from 1975 to 2018.

PARKWORK: This is a new table. Data is present from 2014. This table contains information about parked and working vehicles that were involved in crashes. There is one record per parked/working vehicle.

PBTYPE: Data is present from 2014. This table contains information about crashes between motor vehicles and pedestrians, people on personal conveyances and bicyclists. There is one record for each pedestrian, bicyclist or person on a personal conveyance.

CEVENT: Data is present from 2010. This table contains information for all of the qualifying events (i.e., both harmful and non-harmful, involving in-transport motor vehicles) which occurred in the crash. It details the chronological sequence of events resulting from an unstabilized situation that constitutes a motor vehicle traffic crash. There is one record per event. Each record includes a description of the event or object contacted, the vehicles involved, and the vehicles’ areas of impact.

VEVENT: Data is present from 2010. This table contains the sequence of events for each in-transport motor vehicle involved in the crash. In addition, this table has a data element that records the sequential event number for each vehicle (VEVENTNUM). There is one record for each event for each in-transport motor vehicle.

VSOE: Data is present from 2010. This table contains the sequence of events for each in-transport motor vehicle involved in the crash. There is one record for each event for each in-transport motor vehicle.

DAMAGE: Data is present from 2012. This table contains information about all of the areas on a vehicle that were damaged in the crash. There is one record per damaged area.

DISTRACT: Data is present from 2010. This table contains information about driver distractions. There is at least one record per in-transport motor vehicle. Each distraction is a separate record.

DRIMPAIR: Data is present from 2010. This table contains information about physical impairments of drivers of motor vehicles. There is one record per impairment and there is at least one record for each driver of an in-transport motor vehicle.

FACTOR: Data is present from 2010. This table contains information about vehicle circumstances which may have contributed to the crash. There is at least one record per in-transport motor vehicle. Each factor is a separate record.

MANEUVER: Data is present from 2010. This table contains information about actions taken by the driver to avoid something or someone in the road. There is at least one record per in-transport motor vehicle. Each maneuver is a separate record.

NMIMPAIR: Data is present from 2010. This table contains information about physical impairments of people who are not occupants of motor vehicles. There is one record per impairment and there is at least one record for each person who is not an occupant of a motor vehicle.

NMPRIOR: Data is present from 2010. This table contains information about the actions of people who are not occupants of motor vehicles (e.g., pedestrians and bicyclists) at the time of their involvement in the crash. There is one record per action and there is at least one record for each person who is not an occupant of a motor vehicle.

NMCRASH: Data is present from 2010. This table contains information about any contributing circumstances or improper actions of people who are not occupants of motor vehicles (e.g., pedestrians and bicyclists) noted on the police report. There is one record per action and there is at least one record for each person who is not an occupant of a motor vehicle.

SAFETYEQ: Data is present from 2010. This table contains information about safety equipment used by people who are not occupants of motor vehicles. There is one record for each person who is not an occupant of a motor vehicle.

VIOLATN: Data is present from 2010. This table contains information about violations which were charged to drivers. There is at least one record per in-transport motor vehicle. Each violation is a separate record.

VINDECODE: Data is present from 2013. This table contains vehicle descriptors for all vehicles, mainly passenger vehicles, trucks and motorcycles, based on the vehicle’s VIN which is decoded using the VINtelligence program. There is one record per vehicle.

VISION: Data is present from 2010. This table contains information about circumstances which may have obscured the driver’s vision. There is at least one record per in-transport motor vehicle. Each obstruction is a separate record.

DRUGS: Data is present from 2018. This table contains the specimens tested and the drug results from toxicology reports of all persons involved in the crash. There is one record per specimen tested and its corresponding drug result.
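
Because each table records a different unit (crash, vehicle, person, event), most analyses need to join tables. The following is a hedged sketch that counts the persons involved in each 2018 crash. Several names here are assumptions not confirmed on this page: that the PERSON table is published as `person` in the same project and dataset as `accident`, and that both tables keep the FARS case identifier as a column named ST_CASE. In FARS, case numbers are reused across years, so the join also matches on L_YEAR. Check the table schemas before running.

```sql
SELECT
  a.ST_CASE,                       -- assumed column: FARS case identifier
  COUNT(*) AS persons_involved     -- PERSON holds one record per person
FROM `dataflix-public-datasets.traffic_safety.accident` AS a
JOIN `dataflix-public-datasets.traffic_safety.person` AS p  -- assumed table name
  ON p.ST_CASE = a.ST_CASE
  AND p.L_YEAR = a.L_YEAR          -- case numbers repeat across years
WHERE a.L_YEAR = 2018
  AND p.L_YEAR = 2018              -- restrict both tables to a single year
GROUP BY a.ST_CASE
ORDER BY persons_involved DESC
LIMIT 10
```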

Performance

All tables in the “Traffic and Safety” dataset are partitioned on the “L_YEAR” column. Filtering on L_YEAR improves query performance and reduces query cost, because only the matching partitions are scanned. For example, the ACCIDENT table holds around 900 MB of data for the years 1975 to 2018; when a query filters on a single year, the data scanned drops to about 20 to 25 MB.

Sample Queries

SELECT * FROM `dataflix-public-datasets.traffic_safety.accident`
WHERE L_YEAR = 2018

SELECT * FROM `dataflix-public-datasets.traffic_safety.accident`
WHERE L_YEAR BETWEEN 2010 AND 2018
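
The queries above use SELECT *, which reads every column of the matching partitions. BigQuery stores data by column and on-demand queries are billed by the columns they reference, so naming only the columns you need usually reduces the bytes scanned further. As a minimal sketch, the query below counts crashes per year for 2010 to 2018 using only the L_YEAR column; the alias crashes is illustrative.

```sql
SELECT
  L_YEAR,
  COUNT(*) AS crashes              -- ACCIDENT holds one record per crash
FROM `dataflix-public-datasets.traffic_safety.accident`
WHERE L_YEAR BETWEEN 2010 AND 2018 -- filter on the partitioning column
GROUP BY L_YEAR
ORDER BY L_YEAR
```

This corresponds to the "Accidents by year" metric listed under Crash Analysis, restricted to a nine-year range.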