This public data includes pitch-by-pitch data for Major League Baseball (MLB) games in 2016.
This dataset contains the following tables:
| Table Name | Description |
|---|---|
| games_wide | Every pitch, steal, or lineup event for each at bat in the 2016 regular season.* |
| games_post_wide | Every pitch, steal, or lineup event for each at-bat in the 2016 post season.* |
| schedules | The schedule for every team in the regular season. |
*The schemas for the games_wide and games_post_wide tables are identical.
With this data you can effectively replay a game and rebuild basic statistics for players and teams.
You can start exploring this data in the BigQuery console:
Go to the Baseball public dataset
Sample Queries
Here are some examples of SQL queries you can run on this data in BigQuery.
These samples use BigQuery’s support for standard SQL. Use the #standardSQL
tag to let BigQuery know you want to use standard SQL. For more information
about the #standardSQL prefix, see
Setting a query prefix.
What types of pitches were thrown, how many, top speeds?
#standardSQL
SELECT
stats.pitchTypeDescription as ptDescription,
COUNT(stats.pitchType) as thrown,
MAX(stats.pitchSpeed) as speed_max,
SUM(is_hit) as hits,
SUM(out_on_strikes) as strike_outs,
SUM(is_ground_out) as ground_outs,
ROUND(SUM(is_ground_out)/ COUNT(stats.pitchType),4) as gdPerPitch,
SUM(is_walk) as walks
FROM (
SELECT
inningNumber,
inningHalf,
inningHalfEventSequenceNumber as isn,
inningEventType,
atBatEventSequenceNumber as absn,
description,
atBatEventType,
pitchType,
pitchTypeDescription,
pitchSpeed,
pitchZone,
balls,
strikes,
outs,
status,
outcomeId,
outcomeDescription,
is_hit,
is_double_play,
(if (strikes = 3, 1, 0)) as out_on_strikes,
(if (balls = 4, 1, 0)) as is_walk,
(if (pitchZone = 1 OR pitchZone = 2 or pitchZone = 3 or pitchZone = 10, 1, 0)) as high_zone,
(if (pitchZone = 7 OR pitchZone = 8 or pitchZone = 9 or pitchZone = 12, 1, 0)) as low_zone,
(if (outcomeId = "oGO", 1, 0)) as is_ground_out
FROM
`bigquery-public-data.baseball.games_wide`
WHERE
atBatEventType = "PITCH"
ORDER BY
inningNumber ASC,
inningHalf DESC,
inningHalfEventSequenceNumber ASC,
atBatEventSequenceNumber ASC) as stats
GROUP BY
ptDescription
ORDER BY
thrown DESC
Only the top ten results are shown here, along with the first several columns:

What was the at bat and pitching sequence for the fastest pitch(es) in the 2016 season?
#standardSQL
SELECT
pitcherLastName,
inningNumber,
inningHalf as half,
inningHalfEventSequenceNumber as seq,
hitterPitchCount,
pitcherPitchCount,
pitchSpeed,
pitchTypeDescription,
pitchZone,
hitterLastName,
outcomeDescription
FROM
`bigquery-public-data.baseball.games_wide`
WHERE
gameId = "a178f845-e16f-409d-9d12-3022d1af1dea"
AND inningNumber = 9
AND inningHalf = "TOP"
AND inningEventType = "AT_BAT"
AND atBatEventType = "PITCH"
ORDER BY
inningHalfEventSequenceNumber ASC,
atBatEventSequenceNumber ASC
Only the top ten results are shown here, along with the first several columns:

About the dataset
Dataset Source: Sportradar LLC
Category: Sports
Use: Copyright Sportradar LLC. Access to data is intended solely for internal research and testing purposes, and is not to be used for any business or commercial purpose. Data are not to be exploited in any manner without express approval from Sportradar. Display of data must include the phrase, “Data provided by Sportradar LLC,” and be hyperlinked to www.sportradar.com.
Update Frequency: Monthly (as needed)
View in BigQuery: Go to baseball data