In this homework, we're going to extend the Module 5 homework and learn about streaming with PySpark.
Instead of Kafka, we will use Redpanda, which is a drop-in replacement for Kafka.
Ensure you have the following set up (you will already have it if you completed the previous homework and the module):
For this homework we will be using the files from the Module 5 homework:
Let's start Redpanda in a Docker container.
There's a docker-compose.yml file in the homework folder (taken from here).
Copy this file to your homework directory and run:
docker-compose up
(Add -d if you want to run in detached mode.)
Now let's find out the version of Redpanda.
For that, check the output of the command rpk help inside the container. The name of the container is redpanda-1.
Find out what you need to execute based on the help output.
What's the version, based on the output of the command you executed? (copy the entire version)
v22.3.5 (rev 28b2443)
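One way to run rpk inside the container is through docker exec. The sketch below wraps that in a small shell function. The function name rpk_in_container and the REDPANDA_CONTAINER variable are hypothetical, introduced here only for illustration; the container name redpanda-1 comes from the text above. Calling it with help lists the available subcommands, and the version subcommand prints a version string in the format of the answer above.

```shell
# Hypothetical helper: run an rpk subcommand inside the Redpanda container.
# The container name defaults to redpanda-1 (from the homework text);
# REDPANDA_CONTAINER is an illustrative override, not part of the homework.
rpk_in_container() {
  docker exec "${REDPANDA_CONTAINER:-redpanda-1}" rpk "$@"
}

# Usage (requires the container started by docker-compose to be running):
#   rpk_in_container help      # lists the available rpk subcommands
#   rpk_in_container version   # prints the Redpanda version string
```

The function omits the -it flags so that it also works in non-interactive scripts. For a one-off check, typing docker exec -it redpanda-1 rpk help directly in a terminal works just as well.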