Using GPT vision model with RAG approach

This repository now includes an example of integrating a GPT vision model with Azure AI Search. The feature indexes and searches images and graphs, such as those found in financial documents, alongside text-based content, and then sends the retrieved content to the GPT model to generate a response.

Feature Overview

Document Handling: Source documents are split into pages and saved as PNG files in blob storage. Each file's name and page number are embedded for reference.
Data Extraction: Text data is extracted using OCR.
Data Indexing: Text and image embeddings, generated using Azure AI Vision embeddings, are indexed in Azure AI Search along with the raw text.
Search and Response: Searches can use vector queries or hybrid (keyword plus vector) queries. The GPT vision model generates responses based on the retrieved content.
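
The indexing and search steps above can be pictured with a toy, in-memory sketch. This is not the Azure AI Search API and not the repository's code: the records, the 3-dimensional vectors, and the field names sourcepage, content, and embedding are made up for illustration (only imageEmbedding is named in this document). The sketch assumes separate query vectors for the text and image fields, and it merges keyword and vector rankings with reciprocal rank fusion, using 60 as a conventional constant.

```python
import math

# Toy "index": one record per page. All values are invented for illustration;
# field names other than imageEmbedding are assumptions.
PAGES = [
    {"sourcepage": "report.pdf#page=1", "content": "quarterly revenue summary",
     "embedding": [0.9, 0.1, 0.0], "imageEmbedding": [0.2, 0.8, 0.1]},
    {"sourcepage": "report.pdf#page=2", "content": "bar chart of interest rates",
     "embedding": [0.1, 0.9, 0.1], "imageEmbedding": [0.1, 0.9, 0.2]},
    {"sourcepage": "report.pdf#page=3", "content": "legal disclaimers",
     "embedding": [0.0, 0.0, 1.0], "imageEmbedding": [0.7, 0.1, 0.6]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_vector(query_vector, field):
    """Return page names ordered by similarity of `field` to the query vector."""
    ordered = sorted(PAGES, key=lambda p: cosine(query_vector, p[field]), reverse=True)
    return [p["sourcepage"] for p in ordered]


def rank_by_keywords(query_text):
    """Return pages sharing at least one word with the query, best match first."""
    terms = set(query_text.lower().split())
    scored = [(len(terms & set(p["content"].lower().split())), p["sourcepage"]) for p in PAGES]
    return [name for score, name in sorted(scored, key=lambda s: -s[0]) if score > 0]


def fuse(rankings, k=60):
    """Merge several rankings with reciprocal rank fusion."""
    scores = {}
    for ranking in rankings:
        for position, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query_text, text_query_vector, image_query_vector):
    """Combine keyword, text-vector, and image-vector rankings into one list."""
    return fuse([
        rank_by_keywords(query_text),
        rank_by_vector(text_query_vector, "embedding"),
        rank_by_vector(image_query_vector, "imageEmbedding"),
    ])
```

With the query "interest rates chart" and query vectors close to the second record, hybrid_search ranks report.pdf#page=2 first, because that page leads all three rankings. In the real feature, the top-ranked pages (text and images) are what gets passed to the GPT vision model.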

Getting Started

Prerequisites

Create a Computer Vision account in Azure Portal first, so that you can agree to the Responsible AI terms for that resource. You can delete that account after agreeing.
You must be able to deploy a gpt-4o model in one of the supported regions. If you're not sure, try creating a gpt-4o deployment from your Azure OpenAI deployments page.
Ensure that you can deploy the Azure OpenAI resource group in a region where all required components are available:
- Azure OpenAI models
  - gpt-35-turbo
  - text-embedding-ada-002
  - gpt-4o
- Azure AI Vision

Setup and Usage

Update repository: Pull the latest changes.
Enable GPT vision approach:

First, make sure you do not have integrated vectorization enabled, since that is currently incompatible:
```
azd env set USE_FEATURE_INT_VECTORIZATION false
```
Then set the environment variable for enabling vision support:
```
azd env set USE_GPT4V true
```
When set, that flag provisions a Computer Vision resource and a gpt-4o model, uploads image versions of PDFs to Blob storage, stores image embeddings in a new imageEmbedding field, and enables the vision approach in the UI.
Clean old deployments (optional): Run azd down --purge for a fresh setup.
Start the application: Execute azd up to build, provision, deploy, and initiate document preparation.
Web Application Usage:
- Access the developer options in the web app and select "Use GPT vision model".
- Sample questions will be updated for testing.
- Interact with the questions to view responses.
- The 'Thought Process' tab shows the retrieved data and its processing by the GPT vision model.
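
Because the vision approach is incompatible with integrated vectorization, it can help to check the two flags before running azd up. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes that azd env get-values prints one KEY="value" pair per line.

```python
import subprocess


def parse_env_values(text):
    """Parse KEY="value" lines (assumed output format of `azd env get-values`)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def vision_flag_problems(values):
    """Return a list of problems with the flags named in this guide."""
    problems = []
    if values.get("USE_GPT4V", "").lower() != "true":
        problems.append("USE_GPT4V is not set to true")
    if values.get("USE_FEATURE_INT_VECTORIZATION", "").lower() == "true":
        problems.append("USE_FEATURE_INT_VECTORIZATION must be false for the vision approach")
    return problems


def read_azd_env():
    """Read the current azd environment; requires the azd CLI on PATH."""
    result = subprocess.run(
        ["azd", "env", "get-values"], capture_output=True, text=True, check=True
    )
    return parse_env_values(result.stdout)
```

Calling vision_flag_problems(read_azd_env()) from the project directory returns an empty list when both flags match the setup steps above, and a list of messages otherwise.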

Feel free to explore and contribute to enhancing this feature. For questions or feedback, use the repository's issue tracker.
