# spark-streaming

Here are 266 public repositories matching this topic.

databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

python spark faker pyspark spark-streaming data-generation databricks synthetic-data datagen datagenerator deltalake datageneration delta-live-tables

Updated Jun 8, 2024
Python

LearningJournal / Spark-Streaming-In-Python

Apache Spark 3 - Structured Streaming Course Material

python big-data apache-spark bigdata pyspark data-lake spark-streaming spark-sql

Updated Aug 19, 2023
Python

martandsingh / ApacheSpark

This repository will help you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We will be using PySpark and Spark SQL for development. At the end of the course, we also cover a few case studies.

sql database spark hive hadoop etl pyspark data-engineering spark-streaming data-analysis databricks datalake spark-sql timetravel apachespark etl-pipeline deltalake

Updated Dec 28, 2023
Python

jmcmt87 / spark_app_twitter

A data engineering project (Twitter monitor app)

python kafka mongodb s3 pandas pyspark spark-streaming altair

Updated Jun 27, 2022
Python

garystafford / kafka-connect-msk-demo

For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR

kubernetes aws kafka spark spark-streaming kafka-connect

Updated Jan 2, 2022
Python

yugokato / Spark-and-Kafka_IoT-Data-Processing-and-Analytics

Final project for the IoT: Big Data Processing and Analytics class. Analyzes U.S. nationwide temperature data from IoT sensors in real time.

python kafka bigdata pyspark spark-streaming iot-sensors

Updated Nov 21, 2016
Python

dogukannulu / streaming_data_processing

Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO.

python docker elasticsearch airflow kibana kafka spark hadoop zookeeper pyspark minio spark-streaming hdfs

Updated Jul 21, 2023
Python

kaantas / kafka-twitter-spark-streaming

Counting Tweets Per User in Real-Time

python twitter spark twitter-api pyspark spark-streaming apache-kafka tweepy

Updated Jul 28, 2017
Python

pathwaycom / pathway-benchmarks

Benchmarks for data processing systems: Pathway, Spark, Flink, Kafka Streams

streaming latency pagerank spark-streaming wordcount benchmark-framework flink kafka-streams streaming-data pathway

Updated Jun 25, 2024
Python

prophecy-io / spark-ai

Toolbox for building Generative AI applications on top of Apache Spark.

machine-learning spark pipeline data-engineering spark-streaming generative-ai

Updated Jan 10, 2024
Python

mikeroyal / Apache-Spark-Guide

Apache Spark Guide

data-science machine-learning awesome big-data spark apache-spark pyspark data-engineering spark-streaming awesome-list data-engineering-pipeline awesome-automations

Updated Feb 1, 2022
Python

brennerh1 / databricks-demos

Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.

pyspark spark-streaming databricks pyspark-notebook databricks-notebooks databricks-demos

Updated May 27, 2021
Python

KentHsu / Udacity-Data-Streaming-Nanodegree

Udacity Data Streaming Nanodegree Program

spark-streaming kafka-connect apache-kafka kafka-rest-proxy data-streaming ksql faust-application

Updated Feb 20, 2021
Python

juan-csv / Architecture-for-real-time-video-streaming-analytics

Video processing (webcam) in real time using Kafka and Spark.

real-time kafka spark video-processing spark-streaming kafka-consumer kafka-producer video-streaming kafka-streams emotion-recognition

Updated Sep 14, 2020
Python

nicshub / tap

Slides, code, images, and memes related to the course Technologies for Advanced Programming.

docker data kafka spark jupyter-notebook spark-streaming flume

Updated Jul 2, 2020
Python

harshkavdikar1 / Tweet-Analysis-With-Kafka-and-Spark

A real-time analytics dashboard that analyzes trending hashtags and @mentions at any location using Kafka and Spark Streaming.

python kafka spark highcharts spark-streaming node-js analytics-dashboard spark-sql

Updated Nov 11, 2022
Python

xzZero / DataEng_IBM

Solution for IBM Data Engineer Professional Certificate

sql database nosql data-warehouse data-engineering spark-streaming etl-pipeline

Updated Nov 27, 2022
Python

zekeriyyaa / PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra

Structured Streaming was applied to robot data from the ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafka, analyzed by Apache Spark, and stored in Cassandra.

python apache-spark cqlsh python3 ros pyspark spark-streaming kafka-consumer data-analysis apache-kafka kafka-producer apache-cassandra structured-streaming spark-sql spark-kafka-integration spark-cassandra-connector spark-cassandra ros-noetic spark-kafka-connector

Updated Feb 6, 2022
Python

samerelhousseini / Geospatial-Analysis-With-Spark

This is a data processing pipeline that implements an End-to-End Real-Time Geospatial Analytics and Visualization multi-component full-stack solution, using Apache Spark Structured Streaming, Apache Kafka, MongoDB Change Streams, Node.js, React, Uber's Deck.gl and React-Vis, and using the Massachusetts Bay Transportation Authority's (MBTA) APIs …

react nodejs kafka spark mongodb spark-streaming spark-sql deckgl react-vis

Updated Dec 11, 2022
Python

HashLoad / freeza-offset

Commits Spark stream consumption offsets to a Kafka consumer group.

kafka spark spark-streaming databricks kafka-offset-commits kafka-commit

Updated Jul 10, 2020
Python
