Create a materialized view to compute the average, min and max trip time between each taxi zone.From this MV, find the pair of taxi zones with the highest average trip time. You may need to use the dynamic filter pattern for this.
Bonus (no marks): Create an MV which can identify anomalies in the data. For example, if the average trip time between two zones is 1 minute, but the max trip time is 10 minutes and 20 minutes respectively.
Options:
p.s. The trip time between taxi zones does not take symmetricity into account, i.e. A -> B
and B -> A
are considered different trips. This applies to subsequent questions as well.
CREATE MATERIALIZED VIEW trip_stats AS
WITH s AS (
SELECT pulocationid, dolocationid ,
avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average
FROM trip_data
GROUP BY pulocationid, dolocationid
),
trip_calc AS (
SELECT
zone_pickup.zone AS zone_pickup ,
zone_dropoff.zone AS zone_dropoff ,
avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average,
min( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS min,
max( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS max
FROM trip_data
JOIN taxi_zone AS zone_pickup ON trip_data.pulocationid = zone_pickup.location_id
JOIN taxi_zone AS zone_dropoff ON trip_data.dolocationid = zone_dropoff.location_id
GROUP BY
zone_pickup.zone,
zone_dropoff.zone
)
SELECT * FROM trip_calc WHERE trip_calc.average = (SELECT max(average) FROM s)
Recreate the MV(s) in question 1, to also find the number of trips for the pair of taxi zones with the highest average trip time. Options:
CREATE MATERIALIZED VIEW trip_stats AS
WITH s AS (
SELECT pulocationid, dolocationid ,
avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average
FROM trip_data
GROUP BY pulocationid, dolocationid
),
trip_calc AS (
SELECT
zone_pickup.zone AS zone_pickup ,
zone_dropoff.zone AS zone_dropoff ,
avg( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) ))AS average,
min( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS min,
max( EXTRACT (EPOCH FROM (tpep_dropoff_datetime-tpep_pickup_datetime) )) AS max,
count(*)
FROM trip_data
JOIN taxi_zone AS zone_pickup ON trip_data.pulocationid = zone_pickup.location_id
JOIN taxi_zone AS zone_dropoff ON trip_data.dolocationid = zone_dropoff.location_id
GROUP BY
zone_pickup.zone,
zone_dropoff.zone
)
SELECT * FROM trip_calc WHERE trip_calc.average = (SELECT max(average) FROM s)