MP4-Distributed-Systems

RainStorm - A Stream Processing framework

RainStorm operates using a controller-worker architecture. In this setup, one machine acts as the controller, where the user submits jobs to the framework. The controller processes the job input and creates a job information map, which outlines the operations to be performed and the tasks required for each operation. Additionally, the map assigns machines to specific operations. Once the map is complete, the controller distributes it to all machines participating in the job.

Upon receiving the job information map, each worker identifies its responsibilities by scanning the map to determine the operations it is assigned. The operations can be categorized as source, sink, or intermediate:

  • Source machines fetch input files, read parts of these files, and forward each line to the next stage.
  • Intermediate machines receive input from the previous stage, process it according to the job information map, store the output in HyDFS, and forward the result to the next stage.
  • Sink machines receive processed input, perform the final operation as specified in the job map, and stream the output to the controller.

Each stage ensures seamless data flow:

  • Outputs are hashed to determine the relevant task in the next operation.
  • A unique key is generated for each output before passing it to the subsequent operation.
  • The machine locally stores the output until it receives an acknowledgment (ACK) from the machine handling the next stage.

When processing input, a machine stores the corresponding key in its local memory and sends an ACK to the previous stage upon successful processing. Retaining these keys ensures that every key is processed exactly once, even in the event of failures. All information is also logged in HyDFS for durability and recovery purposes.

Project Setup for all machines

  1. Ensure that you have Go installed on your machine.
  2. Clone this repo on each machine.
  3. cd into the mp4-distributed-systems folder on all machines.
cd mp4-distributed-systems
  4. Start the introducer on machine 1 -
./start_introducer.sh
  5. Ensure that your log files are present in the same directory on all machines. Example - /root/

How to run the RainStorm Stream Processing System

Now that the introducer is started, you can execute go run main.go and join from any of the machines, but first you have to follow these steps -

  1. Create a .env file in the mp4-distributed-systems folder and add the following on all machines -
INTRODUCER_ADDR=127.0.0.1

Enter the introducer IP / hostname. For example -

INTRODUCER_ADDR=fa24-cs425-9201

This has to be performed once on each machine where the distributed disseminator is to be run.

  2. Run ./worker_run.sh on the machine -
./worker_run.sh
  3. Various commands will be displayed for performing actions related to RainStorm and HyDFS.

  4. To run RainStorm, enter command 17, then enter Operation One's file name & its type, Operation Two's file name & its type, the HyDFS source file name, the HyDFS destination file name, and the number of tasks.

  5. After some time, the leader machine will start streaming keys produced by operation 1 and operation 2.

Group G92

Rishi Mundada ([email protected]) Daksh Pokar ([email protected])
