While browsing LinkedIn, I came across a demo from Cinemersive Labs showing 3D six-degrees-of-freedom photographs created from single photo shots. As an introduction to generative 3D, this repo attempts to follow a similar idea: creating interactive scenes from one (or more) images.
The project includes Docker and Docker Compose files for easy setup, as well as a .devcontainer configuration folder for VS Code that auto-installs extensions and sets up workspace settings.
I recommend using Visual Studio Code (VS Code) with the Remote Development extension to work on this project. This allows you to open the project in a container and have all dependencies set up automatically.
To use VS Code, follow these steps:
- Open VS Code.
- Install the Remote Development extension if you haven't already.
- Press Ctrl/Cmd + Shift + P and select "Dev Containers: Open Folder in Container..." from the command palette.
- Wait for the container to build, then start developing.
- Find out why the outputs of ZoeDepth from the original repo differ from those of the 🤗 implementation (and fix it). Track the updates in this issue on 🤗 Transformers
- Begin by cloning the "Image to 3D" tab functionality of the ZoeDepth 🤗 demo
- Get a better understanding of intrinsic and extrinsic camera parameters and how to use them in 3D
- Render images from 3D mesh
- Use PyTorch3D whenever possible
- Finish cleaning up the Image_to_3D notebook and complete the transition to scripts for the main functionality
- Fill in missing parts of the image using Stable Diffusion or a similar generative model
- Probably also use a depth ControlNet for the generated images
- Could/should we use a video generation model instead?
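To make the "intrinsic and extrinsic camera parameters" item above concrete, here is a minimal sketch of the standard pinhole projection: the extrinsics (R, t) move a world point into the camera frame, and the intrinsic matrix K maps it to pixel coordinates. All numeric values (focal length, principal point) are made up for illustration, not taken from this project.

```python
import numpy as np

# Hypothetical pinhole-camera values for illustration; real values come
# from the capture device or are estimated (e.g. from the image FOV).
fx = fy = 500.0          # focal lengths in pixels
cx, cy = 320.0, 240.0    # principal point (centre of a 640x480 image)

# Intrinsic matrix K maps camera-space points to homogeneous pixel coords.
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics: rotation R and translation t map world points into the
# camera frame. Identity/zero means the camera sits at the world origin.
R = np.eye(3)
t = np.zeros(3)

def project(point_world):
    """Project a 3D world point to pixel coordinates (u, v)."""
    p_cam = R @ point_world + t          # world -> camera frame
    uvw = K @ p_cam                      # camera frame -> homogeneous pixels
    return uvw[:2] / uvw[2]             # perspective divide

# A point one metre straight ahead projects to the principal point.
print(project(np.array([0.0, 0.0, 1.0])))   # -> [320. 240.]
```

PyTorch3D wraps the same idea in its `PerspectiveCameras` class, so this plain-NumPy version is only meant to build intuition before switching to that API.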
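The "Image to 3D" step boils down to back-projecting a predicted depth map through the camera intrinsics to get a point cloud. Below is a hedged NumPy sketch of that operation; the function name `depth_to_points` and the toy intrinsics are my own illustration, not code from the ZoeDepth demo.

```python
import numpy as np

def depth_to_points(depth, K):
    """Back-project a depth map of shape (H, W) into an (H, W, 3)
    point cloud using intrinsics K; depth is distance along the z axis."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))       # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)     # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                      # pixels -> camera rays
    return rays * depth[..., None]                       # scale rays by depth

# Toy example: a flat 4x4 depth map one metre from an assumed camera
# whose principal point sits at pixel (2, 2).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
points = depth_to_points(np.ones((4, 4)), K)
print(points.shape)              # (4, 4, 3)
print(points[2, 2])              # central pixel -> [0. 0. 1.]
```

A real pipeline would feed a ZoeDepth prediction in as `depth`, then mesh or render the resulting points (e.g. with PyTorch3D's rasterizer).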