Answer

  1. Which tag has the following text? - Automatically remove the container when it exits

    docker run —help

    Untitled

    d —rm

  2. Run docker with the python:3.9 image in an interactive mode and the entrypoint of bash. Now check the python modules that are installed ( use pip list ).What is version of the package wheel ?

    create python docker container then list package

    Untitled

    docker run -it --rm python:3.9 bash -c "pip list”

    —it ⇒ for running in interactive mode

    —rm ⇒ for auto remove when container complete running

    —bash ⇒ entrypoint to terminal

  3. Remember that lpep_pickup_datetime and lpep_dropoff_datetime columns are in the format timestamp (date and hour+min+sec) and not in date.

    prepare the data first like in the question. here my script to create postgres via docker and ingest data from csv to that db. Available in my github repo too.

  4. Which was the pick up day with the largest trip distance Use the pick up time for your calculations.

    select
    lpep_pickup_datetime, sum(trip_distance)
    from
    taxi_data
    group by lpep_pickup_datetime
    order by  sum(trip_distance) desc;
    
  5. Consider lpep_pickup_datetime in '2019-09-18' and ignoring Borough has Unknown. Which were the 3 pick up Boroughs that had a sum of total_amount superior to 50000?

    select
    z.borough, sum(t.total_amount)
    from taxi_data t
    	left join zone z 
    	on z.locationid  = t."PULocationID"
    where t.lpep_pickup_datetime::DATE ='2019-09-18'
    group by z.borough
    having sum(t.total_amount)>50000
    order by sum(t.total_amount) desc;
    
  6. For the passengers picked up in September 2019 in the zone name Astoria which was the drop off zone that had the largest tip? We want the name of the zone, not the id.

    SELECT 
    t."index", 
    z.zone AS "PULocation", 
    d."DOLocation",
    sum(t.tip_amount) 
    FROM taxi_data t
    LEFT JOIN ZONE z 
    	ON t."PULocationID" = z.locationid 
    LEFT JOIN (
    		SELECT 
    		taxi_data."index","zone" AS "DOLocation"
    		FROM taxi_data 
    		LEFT JOIN zone  
    			ON taxi_data."DOLocationID" = zone.locationid 
    		WHERE taxi_data .lpep_pickup_datetime BETWEEN '2019-09-01' AND '2019-09-30'
    )d
    		ON t."index" = d."index"
    WHERE t.lpep_pickup_datetime BETWEEN '2019-09-01' AND '2019-09-30'
    	AND z.zone = 'Astoria'
    GROUP BY t."index", z.zone , d."DOLocation"
    ORDER BY sum(t.tip_amount) desc
    LIMIT 1
    
  7. We have to make some preparation:

    here the main terraform command

    terraform init
    
    terraform plan
    
    terrafoem apply
    
    terraform destroy