docs[patch]: Update elasticsearch vector store docs (#6419)

* Update elasticsearch vector store docs
* Fence compatibility markdown
* Sidebar
langchain-ai · Aug 6, 2024 · b79ea85
1 parent c5fb8bb

Showing 4 changed files with 400 additions and 51 deletions.
diff --git a/docs/core_docs/docs/integrations/vectorstores/elasticsearch.ipynb b/docs/core_docs/docs/integrations/vectorstores/elasticsearch.ipynb
@@ -0,0 +1,398 @@
+{
+ "cells": [
+  {
+   "cell_type": "raw",
+   "id": "1957f5cb",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "---\n",
+    "sidebar_label: Elasticsearch\n",
+    "sidebar_class_name: node-only\n",
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef1f0986",
+   "metadata": {},
+   "source": [
+    "# Elasticsearch\n",
+    "\n",
+    "```{=mdx}\n",
+    "\n",
+    ":::tip Compatibility\n",
+    "Only available on Node.js.\n",
+    ":::\n",
+    "\n",
+    "```\n",
+    "\n",
+    "[Elasticsearch](https://github.com/elastic/elasticsearch) is a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads. It also supports vector search using the [k-nearest neighbor](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm) (kNN) algorithm and [custom models for Natural Language Processing](https://www.elastic.co/blog/how-to-deploy-nlp-text-embeddings-and-vector-search) (NLP).\n",
+    "You can read more about vector search support in Elasticsearch in the [kNN search documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html).\n",
+    "\n",
+    "This guide provides a quick overview for getting started with Elasticsearch [vector stores](/docs/concepts/#vectorstores). For detailed documentation of all `ElasticVectorSearch` features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_elasticsearch.ElasticVectorSearch.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c824838d",
+   "metadata": {},
+   "source": [
+    "## Overview\n",
+    "\n",
+    "### Integration details\n",
+    "\n",
+    "| Class | Package | [PY support](https://python.langchain.com/v0.2/docs/integrations/vectorstores/elasticsearch/) |  Package latest |\n",
+    "| :--- | :--- | :---: | :---: |\n",
+    "| [`ElasticVectorSearch`](https://api.js.langchain.com/classes/langchain_community_vectorstores_elasticsearch.ElasticVectorSearch.html) | [`@langchain/community`](https://www.npmjs.com/package/@langchain/community) | ✅ |  ![NPM - Version](https://img.shields.io/npm/v/@langchain/community?style=flat-square&label=%20&) |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36fdc060",
+   "metadata": {},
+   "source": [
+    "## Setup\n",
+    "\n",
+    "To use Elasticsearch vector stores, you'll need to install the `@langchain/community` integration package.\n",
+    "\n",
+    "The vector store takes a client from the official [`@elastic/elasticsearch`](https://github.com/elastic/elasticsearch-js) package, which you'll need to install as a peer dependency.\n",
+    "\n",
+    "This guide will also use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. You can also use [other supported embedding models](/docs/integrations/text_embedding) if you wish.\n",
+    "\n",
+    "```{=mdx}\n",
+    "import IntegrationInstallTooltip from \"@mdx_components/integration_install_tooltip.mdx\";\n",
+    "import Npm2Yarn from \"@theme/Npm2Yarn\";\n",
+    "\n",
+    "<IntegrationInstallTooltip></IntegrationInstallTooltip>\n",
+    "\n",
+    "<Npm2Yarn>\n",
+    "  @langchain/community @elastic/elasticsearch @langchain/openai\n",
+    "</Npm2Yarn>\n",
+    "```\n",
+    "\n",
+    "### Credentials\n",
+    "\n",
+    "To use Elasticsearch vector stores, you'll need to have an Elasticsearch instance running.\n",
+    "\n",
+    "You can use the [official Docker image](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html) to get started, or you can use [Elastic Cloud](https://www.elastic.co/cloud/), Elastic's official cloud service.\n",
+    "\n",
+    "To connect to Elastic Cloud, you'll need an API key. See the [Elastic documentation on API keys](https://www.elastic.co/guide/en/kibana/current/api-keys.html) for how to create one.\n",
+    "\n",
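+    "The instantiation example below reads its connection settings from environment variables. The following is a sketch with placeholder values that you should replace with your own. The variable names are the ones that example checks, and the certificate path is a hypothetical placeholder:\n",
+    "\n",
+    "```typescript\n",
+    "// URL of your Elasticsearch instance (the example defaults to https://127.0.0.1:9200)\n",
+    "process.env.ELASTIC_URL = \"https://127.0.0.1:9200\";\n",
+    "// Authenticate with an API key (e.g. for Elastic Cloud)...\n",
+    "process.env.ELASTIC_API_KEY = \"YOUR_ELASTIC_API_KEY\";\n",
+    "// ...or with a username and password instead\n",
+    "// process.env.ELASTIC_USERNAME = \"elastic\";\n",
+    "// process.env.ELASTIC_PASSWORD = \"YOUR_PASSWORD\";\n",
+    "// Optional: CA certificate for a local Docker deployment (placeholder path)\n",
+    "// process.env.ELASTIC_CERT_PATH = \"/path/to/http_ca.crt\";\n",
+    "// Optional: index name (the example defaults to \"test_vectorstore\")\n",
+    "// process.env.ELASTIC_INDEX = \"test_vectorstore\";\n",
+    "```\n",
+    "\n",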
+    "If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:\n",
+    "\n",
+    "```typescript\n",
+    "process.env.OPENAI_API_KEY = \"YOUR_API_KEY\";\n",
+    "```\n",
+    "\n",
+    "If you want automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:\n",
+    "\n",
+    "```typescript\n",
+    "// process.env.LANGCHAIN_TRACING_V2=\"true\"\n",
+    "// process.env.LANGCHAIN_API_KEY=\"your-api-key\"\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "93df377e",
+   "metadata": {},
+   "source": [
+    "## Instantiation\n",
+    "\n",
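+    "How you instantiate the vector store depends on where your Elasticsearch instance is hosted. The example below reads the connection details from environment variables, so the same code can target either a local Docker deployment or Elastic Cloud."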
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "dc37144c-208d-4ab3-9f3a-0407a69fe052",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import {\n",
+    "  ElasticVectorSearch,\n",
+    "  type ElasticClientArgs,\n",
+    "} from \"@langchain/community/vectorstores/elasticsearch\";\n",
+    "import { OpenAIEmbeddings } from \"@langchain/openai\";\n",
+    "\n",
+    "import { Client, type ClientOptions } from \"@elastic/elasticsearch\";\n",
+    "\n",
+    "import * as fs from \"node:fs\";\n",
+    "\n",
+    "const embeddings = new OpenAIEmbeddings({\n",
+    "  model: \"text-embedding-3-small\",\n",
+    "});\n",
+    "\n",
+    "const config: ClientOptions = {\n",
+    "  node: process.env.ELASTIC_URL ?? \"https://127.0.0.1:9200\",\n",
+    "};\n",
+    "\n",
+    "if (process.env.ELASTIC_API_KEY) {\n",
+    "  config.auth = {\n",
+    "    apiKey: process.env.ELASTIC_API_KEY,\n",
+    "  };\n",
+    "} else if (process.env.ELASTIC_USERNAME && process.env.ELASTIC_PASSWORD) {\n",
+    "  config.auth = {\n",
+    "    username: process.env.ELASTIC_USERNAME,\n",
+    "    password: process.env.ELASTIC_PASSWORD,\n",
+    "  };\n",
+    "}\n",
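+    "// Local Docker deployments enable TLS with an auto-generated CA certificate; pass its path here.\n",
+    "// Note: `rejectUnauthorized: false` disables certificate verification, so use it only for local development.\n",
+    "if (process.env.ELASTIC_CERT_PATH) {\n",
+    "  config.tls = {\n",
+    "    ca: fs.readFileSync(process.env.ELASTIC_CERT_PATH),\n",
+    "    rejectUnauthorized: false,\n",
+    "  };\n",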
+    "}\n",
+    "const clientArgs: ElasticClientArgs = {\n",
+    "  client: new Client(config),\n",
+    "  indexName: process.env.ELASTIC_INDEX ?? \"test_vectorstore\",\n",
+    "};\n",
+    "\n",
+    "const vectorStore = new ElasticVectorSearch(embeddings, clientArgs);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ac6071d4",
+   "metadata": {},
+   "source": [
+    "## Manage vector store\n",
+    "\n",
+    "### Add items to vector store"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "17f5efc0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[ '1', '2', '3', '4' ]\n"
+     ]
+    }
+   ],
+   "source": [
+    "import type { Document } from \"@langchain/core/documents\";\n",
+    "\n",
+    "const document1: Document = {\n",
+    "  pageContent: \"The powerhouse of the cell is the mitochondria\",\n",
+    "  metadata: { source: \"https://example.com\" }\n",
+    "};\n",
+    "\n",
+    "const document2: Document = {\n",
+    "  pageContent: \"Buildings are made out of brick\",\n",
+    "  metadata: { source: \"https://example.com\" }\n",
+    "};\n",
+    "\n",
+    "const document3: Document = {\n",
+    "  pageContent: \"Mitochondria are made out of lipids\",\n",
+    "  metadata: { source: \"https://example.com\" }\n",
+    "};\n",
+    "\n",
+    "const document4: Document = {\n",
+    "  pageContent: \"The 2024 Olympics are in Paris\",\n",
+    "  metadata: { source: \"https://example.com\" }\n",
+    "};\n",
+    "\n",
+    "const documents = [document1, document2, document3, document4];\n",
+    "\n",
+    "await vectorStore.addDocuments(documents, { ids: [\"1\", \"2\", \"3\", \"4\"] });"
+   ]
+  },
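+  {
+   "cell_type": "markdown",
+   "id": "5b8e2d41",
+   "metadata": {},
+   "source": [
+    "Alternatively, the static `fromDocuments` method creates a vector store and indexes a list of documents in one step. This is a sketch that reuses the `documents`, `embeddings`, and `clientArgs` defined above. Because it writes to the same index, use it instead of the `addDocuments` call, not in addition to it:\n",
+    "\n",
+    "```typescript\n",
+    "// Embeds `documents` and writes them to the index named in `clientArgs`\n",
+    "const vectorStoreFromDocs = await ElasticVectorSearch.fromDocuments(\n",
+    "  documents,\n",
+    "  embeddings,\n",
+    "  clientArgs\n",
+    ");\n",
+    "```"
+   ]
+  },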
+  {
+   "cell_type": "markdown",
+   "id": "dcf1b905",
+   "metadata": {},
+   "source": [
+    "### Delete items from vector store\n",
+    "\n",
+    "You can delete documents from the store by passing the IDs you assigned when adding them:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "ef61e188",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "await vectorStore.delete({ ids: [\"4\"] });"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3620501",
+   "metadata": {},
+   "source": [
+    "## Query vector store\n",
+    "\n",
+    "Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while your chain or agent runs.\n",
+    "\n",
+    "### Query directly\n",
+    "\n",
+    "You can perform a simple similarity search, optionally restricted by a metadata filter, as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "aa0a16fa",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n",
+      "* Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "const filter = [{\n",
+    "  operator: \"match\",\n",
+    "  field: \"source\",\n",
+    "  value: \"https://example.com\",\n",
+    "}];\n",
+    "\n",
+    "const similaritySearchResults = await vectorStore.similaritySearch(\"biology\", 2, filter);\n",
+    "\n",
+    "for (const doc of similaritySearchResults) {\n",
+    "  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3ed9d733",
+   "metadata": {},
+   "source": [
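+    "The `filter` argument is a list of conditions, each with an `operator`, a metadata `field`, and a `value`. The vector store supports [Elasticsearch filter syntax](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html) operators such as `match`.\n",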
+    "\n",
+    "To run a similarity search and also receive the corresponding scores, use `similaritySearchWithScore`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "5efd2eaa",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "* [SIM=0.374] The powerhouse of the cell is the mitochondria [{\"source\":\"https://example.com\"}]\n",
+      "* [SIM=0.370] Mitochondria are made out of lipids [{\"source\":\"https://example.com\"}]\n"
+     ]
+    }
+   ],
+   "source": [
+    "const similaritySearchWithScoreResults = await vectorStore.similaritySearchWithScore(\"biology\", 2, filter);\n",
+    "\n",
+    "for (const [doc, score] of similaritySearchWithScoreResults) {\n",
+    "  console.log(`* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(doc.metadata)}]`);\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0c235cdc",
+   "metadata": {},
+   "source": [
+    "### Query by turning into retriever\n",
+    "\n",
+    "You can also transform the vector store into a [retriever](/docs/concepts/#retrievers) for easier use in your chains."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "f3460093",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[\n",
+      "  Document {\n",
+      "    pageContent: 'The powerhouse of the cell is the mitochondria',\n",
+      "    metadata: { source: 'https://example.com' },\n",
+      "    id: undefined\n",
+      "  },\n",
+      "  Document {\n",
+      "    pageContent: 'Mitochondria are made out of lipids',\n",
+      "    metadata: { source: 'https://example.com' },\n",
+      "    id: undefined\n",
+      "  }\n",
+      "]\n"
+     ]
+    }
+   ],
+   "source": [
+    "const retriever = vectorStore.asRetriever({\n",
+    "  // Optional filter\n",
+    "  filter: filter,\n",
+    "  k: 2,\n",
+    "});\n",
+    "await retriever.invoke(\"biology\");"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2e0a211",
+   "metadata": {},
+   "source": [
+    "### Usage for retrieval-augmented generation\n",
+    "\n",
+    "For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:\n",
+    "\n",
+    "- [Tutorials: working with external knowledge](/docs/tutorials/#working-with-external-knowledge)\n",
+    "- [How-to: Question and answer with RAG](/docs/how_to/#qa-with-rag)\n",
+    "- [Retrieval conceptual docs](/docs/concepts#retrieval)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a27244f",
+   "metadata": {},
+   "source": [
+    "## API reference\n",
+    "\n",
+    "For detailed documentation of all `ElasticVectorSearch` features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_vectorstores_elasticsearch.ElasticVectorSearch.html)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "TypeScript",
+   "language": "typescript",
+   "name": "tslab"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "mode": "typescript",
+    "name": "javascript",
+    "typescript": true
+   },
+   "file_extension": ".ts",
+   "mimetype": "text/typescript",
+   "name": "typescript",
+   "version": "3.7.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}