Azure APIM GenAI policies

Architecture Components

This repository contains Bicep templates for deploying the following components:

  • Azure API Management (Developer SKU)
  • Azure Application Insights
  • Azure OpenAI Cognitive Services accounts (3 instances)

APIM GenAI capabilities demonstrated in this repository include (a combined policy sketch follows this list):

  • Load-balancing and circuit-breaker policies
  • Token rate-limiting policies
  • Emitting token metrics to Application Insights
  • User-assigned managed identity authentication for the Azure OpenAI services
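
The policy definitions themselves ship with the repository's templates; as a rough orientation, below is a minimal sketch of how these capabilities typically combine in an APIM policy document. The backend pool name aoai-pool and the client-id placeholder are illustrative assumptions, not values from this repository; the genaitest metric namespace matches the one referenced in the Testing section.

    <policies>
        <inbound>
            <base />
            <!-- Load balancing: route to a backend pool; circuit breaking
                 is configured on the pool's member backends (pool name assumed). -->
            <set-backend-service backend-id="aoai-pool" />
            <!-- User-assigned managed identity authentication for Azure OpenAI
                 (the client-id placeholder is illustrative). -->
            <authentication-managed-identity
                resource="https://cognitiveservices.azure.com"
                client-id="{user-assigned-identity-client-id}"
                output-token-variable-name="msi-token" />
            <set-header name="Authorization" exists-action="override">
                <value>@("Bearer " + (string)context.Variables["msi-token"])</value>
            </set-header>
            <!-- Emit token usage metrics to Application Insights under the
                 genaitest namespace used in the Testing section below. -->
            <azure-openai-emit-token-metric namespace="genaitest">
                <dimension name="API ID" />
            </azure-openai-emit-token-metric>
        </inbound>
        <backend>
            <base />
        </backend>
        <outbound>
            <base />
        </outbound>
    </policies>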

Deployment

Deploy the template to your Azure subscription using the following commands:

    cd infra
    az deployment sub create --name apim-genai --template-file main.bicep --parameters parameters.json --location francecentral

Testing

  1. Open the AzureOpenAI API in the APIM instance and navigate to the Test tab.

  2. Select the POST operation named Creates a completion for the chat message.

  3. Fill in the template parameters as follows:

    • deployment-id: chat
    • api-version: 2024-02-01
  4. Overwrite the request body with the following:

    {"temperature":1,"top_p":1,"stream":false,"stop":null,"max_tokens":500,"presence_penalty":0,"frequency_penalty":0,"logit_bias":{},"user":"user-1234","messages": [{"role":"system","content":"You are an AI assistant that helps people find information"},{"role":"user","content":"Negate the following sentence.The price for bubblegum increased on thursday."}],"n":1}
  5. Click Send and observe the response.

    • The x-ms-region header changes to the region of the Azure OpenAI service that served the request.
  6. Metrics can be observed in the Application Insights instance created in the same resource group as the APIM instance.

    • Metrics blade -> select genaitest in the Metric Namespace dropdown.
    • Logs blade -> Tables -> customMetrics contains the custom metrics emitted by the APIM policies. The customDimensions field contains the dimensions that can be used to aggregate the metrics.
  7. Update the azure-openai-token-limit policy for the API to use 100 as the tokens-per-minute; a second request within the same minute should then return a 429 response issued by APIM (a sketch of this policy follows the list).
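
A minimal sketch of that policy, assuming the subscription ID as the counter key (the repository's actual counter key and placement may differ):

    <azure-openai-token-limit
        counter-key="@(context.Subscription.Id)"
        tokens-per-minute="100"
        estimate-prompt-tokens="false"
        remaining-tokens-variable-name="remainingTokens" />

With a budget this small, the roughly 500-token test completion exhausts the per-minute allowance, so the follow-up request is rejected by APIM with a 429 before it ever reaches Azure OpenAI.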

Streaming

For metric collection with streaming, follow the instructions below:

  1. Change the buffering property on the forward-request policy as shown below (its placement in the policy document is sketched after this list):

    <forward-request buffer-response="false" />
  2. Use the curl command below to test streaming:

    curl -N "https://{yourapimname}.azure-api.net/openai/deployments/chat/chat/completions?api-version=2024-02-01" \
      -H "Content-Type: application/json" \
      -H "ocp-apim-subscription-key: {yoursubscriptionkey}" \
      -d "{\"temperature\":1,\"top_p\":1,\"stream\":true,\"stop\":null,\"max_tokens\":2000,\"presence_penalty\":0,\"frequency_penalty\":0,\"logit_bias\":{},\"user\":\"user-1234\",\"messages\": [{\"role\":\"system\",\"content\":\"You are an AI assistant that helps people find information\"},{\"role\":\"user\",\"content\":\"Write a 10 page story on greek mythology.\"}],\"n\":1}"
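
For context, the forward-request element lives in the backend section of the policy document. A minimal sketch, assuming the standard APIM policy layout:

    <backend>
        <!-- buffer-response="false" relays streamed chunks to the client as
             they arrive instead of buffering the full completion in APIM. -->
        <forward-request buffer-response="false" />
    </backend>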
