dharaneeshvrd/spark-examples

Spark-Examples

SparkSQL using medical datasets provided by openpayments

Using SparkSQL, I tried the following:

  • Converted raw CSV data to compacted Parquet files for smaller storage and more efficient queries, using two layouts:
    • Default partitioning
    • Year-wise partitioning
  • Ran various analytical queries
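
The CSV-to-Parquet conversion above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the function name `csv_to_parquet`, its parameters, and the example column name `Program_Year` are assumptions made for the example. The function takes an existing `SparkSession` and writes either with the default layout or partitioned by one column.

```python
def csv_to_parquet(spark, csv_path, parquet_path, partition_col=None):
    """Read a CSV file with a header row and write it back as Parquet.

    spark         -- an existing pyspark.sql.SparkSession
    csv_path      -- path of the raw CSV input (hypothetical)
    parquet_path  -- output directory for the Parquet files (hypothetical)
    partition_col -- optional column to partition by, e.g. a year column
    """
    df = (
        spark.read
        .option("header", True)       # first line holds the column names
        .option("inferSchema", True)  # let Spark guess the column types
        .csv(csv_path)
    )

    writer = df.write.mode("overwrite")
    if partition_col is not None:
        # Year-wise layout: one sub-directory per distinct value,
        # named like <partition_col>=<value>
        writer = writer.partitionBy(partition_col)
    writer.parquet(parquet_path)
    return df
```

With a real session (for example one obtained from `SparkSession.builder.getOrCreate()`), calling the function without `partition_col` gives the default layout, while passing a year column (assumed here to be `Program_Year`) gives the year-wise layout, which lets queries that filter on the year skip unrelated directories.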

Datasets

SparkStreaming

  • Explored DStreams and Structured Streaming. Tried an example that converts an unstructured data stream to structured HDFS output using both DStreams and Structured Streaming
  • Used Yahoo's Weather API and Structured Streaming to calculate each hour's average temperature using window functions
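
To make the hourly average concrete, here is a plain-Python sketch of what a one-hour tumbling window computes. It is not the Structured Streaming job itself and does not use Spark; the function name `hourly_average` and the `(timestamp, temperature)` input shape are assumptions made for the illustration. In Structured Streaming the same grouping is typically expressed by grouping on `window(<timestamp column>, "1 hour")` and taking the average.

```python
from collections import defaultdict
from datetime import datetime


def hourly_average(readings):
    """Average temperature per one-hour tumbling window.

    readings -- iterable of (datetime, temperature) pairs, in any order.
    Returns a dict mapping each window start (the reading's timestamp
    truncated to the hour) to the mean temperature in that window.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for ts, temp in readings:
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        bucket = totals[window_start]
        bucket[0] += temp
        bucket[1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    sample = [
        (datetime(2018, 1, 1, 10, 5), 20.0),
        (datetime(2018, 1, 1, 10, 50), 22.0),
        (datetime(2018, 1, 1, 11, 10), 30.0),
    ]
    for start, avg_temp in hourly_average(sample).items():
        print(start.isoformat(), avg_temp)
```

Each reading belongs to exactly one window because the windows do not overlap; a sliding window would instead assign a reading to several windows.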