Question 1

Create a materialized view to compute the average, min and max trip time between each taxi zone.From this MV, find the pair of taxi zones with the highest average trip time. You may need to use the dynamic filter pattern for this.

Bonus (no marks): Create an MV which can identify anomalies in the data. For example, if the average trip time between two zones is 1 minute, but the max trip time is 10 minutes and 20 minutes respectively.

Options:

  1. Yorkville East, Steinway
  2. Murray Hill, Midwood
  3. East Flatbush/Farragut, East Harlem North
  4. Midtown Center, University Heights/Morris Heights

p.s. The trip time between taxi zones does not take symmetricity into account, i.e. A -> B and B -> A are considered different trips. This applies to subsequent questions as well.

Answer:

CREATE MATERIALIZED VIEW trip_stats AS 
WITH s AS (
	SELECT pulocationid, dolocationid , 
	avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average 
	FROM trip_data
	GROUP BY pulocationid, dolocationid
),
trip_calc AS (
	SELECT 
		zone_pickup.zone AS zone_pickup ,
		zone_dropoff.zone AS zone_dropoff ,
		avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average,
		min( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS min,
		max( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS max
	FROM trip_data 
	JOIN taxi_zone AS zone_pickup ON trip_data.pulocationid = zone_pickup.location_id 
	JOIN taxi_zone AS zone_dropoff ON trip_data.dolocationid = zone_dropoff.location_id 
	GROUP BY 
		zone_pickup.zone,
		zone_dropoff.zone
		)
SELECT * FROM trip_calc WHERE trip_calc.average = (SELECT max(average) FROM s)

Untitled

Question 2

Recreate the MV(s) in question 1, to also find the number of trips for the pair of taxi zones with the highest average trip time. Options:

  1. 5
  2. 3
  3. 10
  4. 1
CREATE MATERIALIZED VIEW trip_stats AS 
WITH s AS (
	SELECT pulocationid, dolocationid , 
	avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average 
	FROM trip_data
	GROUP BY pulocationid, dolocationid
),
trip_calc AS (
	SELECT 
		zone_pickup.zone AS zone_pickup ,
		zone_dropoff.zone AS zone_dropoff ,
		avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average,
		min( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS min,
		max( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS max,
		count(*)
	FROM trip_data 
	JOIN taxi_zone AS zone_pickup ON trip_data.pulocationid = zone_pickup.location_id 
	JOIN taxi_zone AS zone_dropoff ON trip_data.dolocationid = zone_dropoff.location_id 
	GROUP BY 
		zone_pickup.zone,
		zone_dropoff.zone
		)
SELECT * FROM trip_calc WHERE trip_calc.average = (SELECT max(average) FROM s)

Untitled