This dataset is collected by the NYC Taxi and Limousine Commission (TLC) and includes trip records from all trips completed in Yellow and Green taxis in NYC from 2009 to present, and all trips in for-hire vehicles (FHV) from 2015 to present. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. For detailed information about this dataset, go to TOC Trip Record Data
You can start exploring this data for taxis and FHVs in the BigQuery console:
Sample queries
Here are some examples of SQL queries you can run on this data in BigQuery.
These samples use BigQuery’s support for standard SQL. Use the
#standardSQL tag to let BigQuery know you want to use standard SQL. For more
information about the #standardSQL prefix, see
Setting a query prefix.
How many trips did Yellow taxis take each month in 2015?
This query returns monthly trip totals for all Yellow taxis in 2015.
Web UI
Open the following query in the Web UI
#standardSQL
SELECT
TIMESTAMP_TRUNC(pickup_datetime,
MONTH) month,
COUNT(*) trips
FROM
`bigquery-public-data.new_york.tlc_yellow_trips_2015`
GROUP BY
1
ORDER BY
1
Command-line
bq query --use_legacy_sql=false '
SELECT
TIMESTAMP_TRUNC(pickup_datetime,
MONTH) month,
COUNT(*) trips
FROM
`bigquery-public-data.new_york.tlc_yellow_trips_2015`
GROUP BY
1
ORDER BY
1'
The results are shown here:
+---------------------+----------+
| month | trips |
+---------------------+----------+
| 2015-01-01 00:00:00 | 12748986 |
| 2015-02-01 00:00:00 | 12450521 |
| 2015-03-01 00:00:00 | 13351609 |
| 2015-04-01 00:00:00 | 13071789 |
| 2015-05-01 00:00:00 | 13158262 |
| 2015-06-01 00:00:00 | 12324935 |
| 2015-07-01 00:00:00 | 11562783 |
| 2015-08-01 00:00:00 | 11130304 |
| 2015-09-01 00:00:00 | 11225063 |
| 2015-10-01 00:00:00 | 12315488 |
| 2015-11-01 00:00:00 | 11312676 |
| 2015-12-01 00:00:00 | 11460573 |
+---------------------+----------+
What was the average speed of Yellow taxi trips in 2015?
This query returns the average speed during each hour of the day for Yellow taxi trips in 2015.
Web UI
Open the following query in the Web UI
#standardSQL
SELECT
EXTRACT(HOUR
FROM
pickup_datetime) hour,
ROUND(AVG(trip_distance / TIMESTAMP_DIFF(dropoff_datetime,
pickup_datetime,
SECOND))*3600, 1) speed
FROM
bigquery-public-data.new_york.tlc_yellow_trips_2015
WHERE
trip_distance > 0
AND fare_amount/trip_distance BETWEEN 2
AND 10
AND dropoff_datetime > pickup_datetime
GROUP BY
1
ORDER BY
1
Command-line
bq query --use_legacy_sql=false '
SELECT
EXTRACT(HOUR
FROM
pickup_datetime) hour,
ROUND(AVG(trip_distance / TIMESTAMP_DIFF(dropoff_datetime,
pickup_datetime,
SECOND))*3600, 1) speed
FROM
`bigquery-public-data.new_york.tlc_yellow_trips_2015`
WHERE
trip_distance > 0
AND fare_amount/trip_distance BETWEEN 2
AND 10
AND dropoff_datetime > pickup_datetime
GROUP BY
1
ORDER BY
1'
During the day the average speed is around 11-12 MPH, but at 5 AM the average speed almost doubles to 21 MPH.
The results are shown here:
+------+-------+
| hour | speed |
+------+-------+
| 0 | 15.8 |
| 1 | 16.3 |
| 2 | 16.8 |
| 3 | 17.5 |
| 4 | 20.0 |
| 5 | 21.6 |
| 6 | 17.6 |
| 7 | 13.7 |
| 8 | 11.6 |
| 9 | 11.4 |
| 10 | 11.5 |
| 11 | 11.3 |
| 12 | 11.2 |
| 13 | 11.3 |
| 14 | 11.2 |
| 15 | 11.0 |
| 16 | 11.5 |
| 17 | 11.2 |
| 18 | 11.1 |
| 19 | 11.8 |
| 20 | 12.9 |
| 21 | 13.6 |
| 22 | 14.1 |
| 23 | 14.9 |
+------+-------+
What was the average speed of Yellow taxi trips in 2015 by day of week?
This query shows the average speed by the day of the week, for all Yellow taxi trips in 2015.
Web UI
Open the following query in the Web UI
#standardSQL
SELECT
EXTRACT(DAYOFWEEK
FROM
pickup_datetime) DAYOFWEEK,
ROUND(AVG(trip_distance / TIMESTAMP_DIFF(dropoff_datetime,
pickup_datetime,
SECOND))*3600, 1) speed
FROM
bigquery-public-data.new_york.tlc_yellow_trips_2015
WHERE
trip_distance > 0
AND fare_amount/trip_distance BETWEEN 2
AND 10
AND dropoff_datetime > pickup_datetime
GROUP BY
1
ORDER BY
1
Command-line
bq query --use_legacy_sql=false '
SELECT
EXTRACT(DAYOFWEEK
FROM
pickup_datetime) DAYOFWEEK,
ROUND(AVG(trip_distance / TIMESTAMP_DIFF(dropoff_datetime,
pickup_datetime,
SECOND))*3600, 1) speed
FROM
`bigquery-public-data.new_york.tlc_yellow_trips_2015`
WHERE
trip_distance > 0
AND fare_amount/trip_distance BETWEEN 2
AND 10
AND dropoff_datetime > pickup_datetime
GROUP BY
1
ORDER BY
1'
Sundays are the fastest days of the week, followed by Saturday and Monday. Why are Mondays faster than all the other week days?
The results are shown here:
+-----------+-------+
| DAYOFWEEK | speed |
+-----------+-------+
| 1 | 14.7 |
| 2 | 13.4 |
| 3 | 12.3 |
| 4 | 12.2 |
| 5 | 12.2 |
| 6 | 12.3 |
| 7 | 13.1 |
+-----------+-------+
About the data
Dataset Source: NYC OpenData
Category: Geographic, Transportation
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://data.cityofnewyork.us/ — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Update Frequency: Weekly
View in BigQuery: Go to New York City dataset