SMBUD project held at Politecnico di Milano (a.y. 2022/2023)

gabrieleginestroni/SMBUDproject22-23

 
 


Systems and Methods for Big and Unstructured Data project

Project for the Systems and Methods for Big and Unstructured Data (SMBUD) course held at Politecnico di Milano in the academic year 2022/2023. The aim of the project is to design and implement NoSQL databases for different scenarios.

Teacher: Marco Brambilla

Application Overview

This project builds databases, each with a different technology, that store the scientific articles listed in the DBLP bibliography. The focus is on a data design that allows efficient retrieval of information about the articles.

First assignment - Neo4j


Design, store and query graph data structures in a NoSQL DB for the DBLP bibliography.

Tasks to perform:

  • Design conceptual model
  • Store a sample dataset in Neo4j
  • Write basic data creation/update commands (minimum 5)
  • Write basic queries (minimum 10)
  • Check complexity and performance time
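
As an illustration of what a creation/update command and a basic query might look like for this assignment, here is a minimal Python sketch that builds parameterized Cypher statements. The labels `Publication` and `Author`, the relationship type `AUTHORED`, the property names, and the helper names are all assumptions made for this example; the project's actual graph model may differ. The `run_query` helper uses the official `neo4j` Python driver and is only a sketch of how the statements could be executed against a running instance.

```python
from typing import Any, Dict, List, Tuple


def build_upsert_publication(title: str, year: int, authors: List[str]) -> Tuple[str, Dict[str, Any]]:
    """Creation/update command: upsert a publication and link its authors.

    Labels, relationship type and property names are hypothetical.
    """
    query = (
        "MERGE (p:Publication {title: $title}) "
        "SET p.year = $year "
        "WITH p "
        "UNWIND $authors AS author_name "
        "MERGE (a:Author {name: author_name}) "
        "MERGE (a)-[:AUTHORED]->(p)"
    )
    return query, {"title": title, "year": year, "authors": authors}


def build_coauthor_query(author: str) -> Tuple[str, Dict[str, Any]]:
    """Basic query: distinct co-authors of a given author (hypothetical schema)."""
    query = (
        "MATCH (a:Author {name: $name})-[:AUTHORED]->(:Publication)"
        "<-[:AUTHORED]-(co:Author) "
        "RETURN DISTINCT co.name AS coauthor ORDER BY coauthor"
    )
    return query, {"name": author}


def run_query(uri: str, user: str, password: str, query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute a statement with the neo4j driver (requires a running Neo4j server)."""
    # Imported lazily so the builders above work without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]


upsert_query, upsert_params = build_upsert_publication(
    "A Hypothetical Paper", 2022, ["Author One", "Author Two"]
)
coauthor_query, coauthor_params = build_coauthor_query("Author One")
```

Passing values as parameters (`$title`, `$name`) rather than concatenating them into the query text keeps the statements reusable and avoids quoting problems with titles that contain apostrophes.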

Second assignment - MongoDB


Design, store and query document data structures in a NoSQL DB for the DBLP bibliography.

Tasks to perform:

  • Design conceptual model
  • Store a sample dataset in MongoDB
  • Write basic data creation/update commands (minimum 5)
  • Write basic queries (minimum 10)
  • Check complexity and performance time
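
To make the document model more concrete, the following Python sketch shows one possible shape for an article document, together with a query filter and an update expressed in MongoDB's operator syntax. The field names, the sample values, and the database and collection names (`smbud`, `articles`) are assumptions chosen for illustration, not the project's actual schema. The `run_against_mongodb` helper relies on the `pymongo` package and a reachable MongoDB server.

```python
import json

# Hypothetical article document: authors and venue are embedded so that
# a single read returns everything needed to display the article.
sample_article = {
    "title": "A Hypothetical Paper on Graph Databases",
    "year": 2021,
    "venue": {"name": "Hypothetical Conference", "type": "conference"},
    "authors": [{"name": "Author One"}, {"name": "Author Two"}],
    "keywords": ["graph", "nosql"],
}

# Query: articles from 2020 onwards with a given author
# (dot notation reaches into the embedded authors array).
recent_by_author_filter = {"year": {"$gte": 2020}, "authors.name": "Author One"}

# Update: add a keyword only if it is not already present.
add_keyword_update = {"$addToSet": {"keywords": "dblp"}}


def run_against_mongodb(uri, db_name="smbud", collection_name="articles"):
    """Insert, update and query the sample document (requires pymongo and a server)."""
    # Imported lazily so the definitions above work without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    articles = client[db_name][collection_name]
    # Copy the dict because insert_one adds an _id field to the object it is given.
    articles.insert_one(dict(sample_article))
    articles.update_one({"title": sample_article["title"]}, add_keyword_update)
    return list(articles.find(recent_by_author_filter))


# The document is plain JSON-compatible data.
serialized = json.dumps(sample_article, indent=2)
```

Embedding the authors inside the article favors read-heavy retrieval of single articles; a design that needs per-author statistics might instead reference authors from a separate collection.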

Third assignment - Spark


Design, store and query data structures in a NoSQL DB for the DBLP bibliography using Spark.

Tasks to perform:

  • Design conceptual model
  • Store a sample dataset in Spark
  • Write basic data creation/update commands (minimum 5)
  • Write 10 queries that meet the following requirements (listed by their SQL equivalents for simplicity):
    • WHERE, JOIN
    • WHERE, LIMIT, LIKE
    • WHERE, IN, Nested Query
    • GROUP BY, 1 JOIN, AS
    • WHERE, GROUP BY
    • GROUP BY, HAVING, AS
    • WHERE, GROUP BY, HAVING, AS
    • WHERE, Nested Query (i.e., 2-step Queries), GROUP BY
    • WHERE, GROUP BY, HAVING, 1 JOIN
    • WHERE, GROUP BY, HAVING, 2 JOINs
  • Check complexity and performance time
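
Since the query requirements are stated through their SQL equivalents, the shape of one of them ("WHERE, GROUP BY, HAVING, 1 JOIN") can be sketched as below. This example does not use Spark: Python's standard-library `sqlite3` module stands in so the snippet runs anywhere, and the `author` and `article` tables with their columns and rows are invented for illustration.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, author_id INTEGER);
INSERT INTO author VALUES (1, 'Author One'), (2, 'Author Two');
INSERT INTO article VALUES
  (1, 'Paper A', 2019, 1),
  (2, 'Paper B', 2021, 1),
  (3, 'Paper C', 2022, 1),
  (4, 'Paper D', 2022, 2);
""")

# WHERE, GROUP BY, HAVING, 1 JOIN:
# authors with at least two articles published from 2020 onwards.
rows = conn.execute("""
SELECT au.name AS author, COUNT(*) AS n_articles
FROM article AS ar
JOIN author AS au ON au.id = ar.author_id
WHERE ar.year >= 2020
GROUP BY au.name
HAVING COUNT(*) >= 2
""").fetchall()

print(rows)  # [('Author One', 2)]
conn.close()
```

In PySpark, the same pattern is typically written with the DataFrame operations `filter`, `join`, `groupBy` and `agg`, with a second `filter` on the aggregated column playing the role of HAVING.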

Documents

The final version includes:

Project presentation slides: presentation

Tools

  • LaTeX - IntelliJ
  • Graph DB - Neo4j
  • Document DB - MongoDB
  • Conceptual Models - draw.io

Authors
