Skip to content

Latest commit

 

History

History
59 lines (41 loc) · 3.25 KB

gpt4v.md

File metadata and controls

59 lines (41 loc) · 3.25 KB

Using GPT vision model with RAG approach

This repository now includes an example of integrating a GPT Vision model with Azure AI Search. This feature enables indexing and searching images and graphs, such as financial documents, in addition to text-based content, and then sending the retrieved content to the GPT model for response generation.

Feature Overview

  • Document Handling: Source documents are split into pages and saved as PNG files in blob storage. Each file's name and page number are embedded for reference.
  • Data Extraction: Text data is extracted using OCR.
  • Data Indexing: Text and image embeddings, generated using Azure AI Vision (Azure AI Vision Embeddings), are indexed in Azure AI Search along with the raw text.
  • Search and Response: Searches can be conducted using vectors or hybrid methods. Responses are generated by GPT vision model based on the retrieved content.

Getting Started

Prerequisites

Setup and Usage

  1. Update repository: Pull the latest changes.

  2. Enable GPT vision approach:

    First, make sure you do not have integrated vectorization enabled, since that is currently incompatible:

    azd env set USE_FEATURE_INT_VECTORIZATION false

    Then set the environment variable for enabling vision support:

    azd env set USE_GPT4V true

    When set, that flag will provision a Computer Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new imageEmbedding field, and enable the vision approach in the UI.

  3. Clean old deployments (optional): Run azd down --purge for a fresh setup.

  4. Start the application: Execute azd up to build, provision, deploy, and initiate document preparation.

  5. Web Application Usage: GPT4V configuration screenshot

    • Access the developer options in the web app and select "Use GPT vision model".
    • Sample questions will be updated for testing.
    • Interact with the questions to view responses.
    • The 'Thought Process' tab shows the retrieved data and its processing by the GPT vision model.

Feel free to explore and contribute to enhancing this feature. For questions or feedback, use the repository's issue tracker.