Add bulk guide #155

Merged 6 commits on May 25, 2023
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -1,14 +1,18 @@
# CHANGELOG

Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)

## [Unreleased]

### Added

- Adds Github workflow for changelog verification ([#89](https://github.com/opensearch-project/opensearch-rs/pull/89))
- Adds Github workflow for unit tests ([#112](https://github.com/opensearch-project/opensearch-rs/pull/112))
- Adds support for OpenSearch Serverless ([#96](https://github.com/opensearch-project/opensearch-rs/pull/96))
- Adds Bulk Guide ([#155](https://github.com/opensearch-project/opensearch-rs/pull/155))

### Dependencies

- Bumps `simple_logger` from 2.3.0 to 4.0.0
- Bumps `serde_with` from ~1 to ~2
- Bumps `textwrap` from ^0.15 to ^0.16
@@ -19,13 +23,15 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Bumps `syn` from ~1.0 to ~2.0

### Changed

- Updates users guide with complete examples ([#114](https://github.com/opensearch-project/opensearch-rs/pull/114))

### Deprecated

### Removed

### Fixed

- [BUG] cargo make test fails out of the box ([#117](https://github.com/opensearch-project/opensearch-rs/pull/117))
- Update CI to run cargo make test fails out of the box ([#120](https://github.com/opensearch-project/opensearch-rs/pull/120))
- Add cargo cache to Github actions to speed up builds ([#121](https://github.com/opensearch-project/opensearch-rs/pull/121))
@@ -34,4 +40,4 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)

### Security

[Unreleased]: https://github.com/opensearch-project/opensearch-rs/compare/2.0...HEAD
[unreleased]: https://github.com/opensearch-project/opensearch-rs/compare/2.0...HEAD
200 changes: 200 additions & 0 deletions guides/bulk.md
@@ -0,0 +1,200 @@
# Bulk

In this guide, you'll learn how to use the OpenSearch Rust Client API to perform bulk operations. You'll learn how to index, create, update, and delete multiple documents in a single request.

## Setup

First, create a client instance with the following code:

```rust
use opensearch::{http::transport::{SingleNodeConnectionPool, TransportBuilder}, OpenSearch};
use url::Url;

// The snippets in this guide run inside an async function that returns a `Result`.
let url = Url::parse("http://localhost:9200")?;
let conn_pool = SingleNodeConnectionPool::new(url);
let transport = TransportBuilder::new(conn_pool).disable_proxy().build()?;
let client = OpenSearch::new(transport);
```

Next, create an index named `movies` and another named `books` with the default settings:

```rust
let movies = "movies";
let books = "books";

client.indices().create(IndicesCreateParts::Index(movies)).send().await?;
client.indices().create(IndicesCreateParts::Index(books)).send().await?;
```

## Bulk API

The `bulk` API action allows you to perform multiple document operations in a single request. The body of the request is an array of objects that contains the bulk operations and the target documents to index, create, update, or delete.
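
On the wire, OpenSearch expects the bulk body as newline-delimited JSON: one object per line, with a newline after every line including the last. The following std-only sketch shows that serialization, assuming each input string is already valid JSON. `to_ndjson` is a hypothetical helper for illustration and not part of the `opensearch` crate:

```rust
/// Joins already-serialized JSON objects into a newline-delimited body.
/// Hypothetical helper: it does not validate or escape its input.
fn to_ndjson(lines: &[&str]) -> String {
    let mut body = String::new();
    for line in lines {
        body.push_str(line);
        // Every line, including the last one, ends with a newline.
        body.push('\n');
    }
    body
}
```

Passing an action line and a document line produces a two-line body that ends with a newline.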

### Indexing multiple documents

The following code creates two documents in the `movies` index and one document in the `books` index:

```rust
use opensearch::{http::request::JsonBody, BulkParts};
use serde_json::json;

// operations for the `movies` index: an action object followed by its document
let mut movies_body: Vec<JsonBody<_>> = Vec::with_capacity(4);

// add the first operation and document
movies_body.push(json!({ "index": { "_id": "1" } }).into());
movies_body.push(json!({
    "id": 1,
    "title": "Beauty and the Beast",
    "year": "1991"
}).into());

// add the second operation and document
movies_body.push(json!({ "index": { "_id": "2" } }).into());
movies_body.push(json!({
    "id": 2,
    "title": "Beauty and the Beast - Live Action",
    "year": "2017"
}).into());

client
    .bulk(BulkParts::Index(movies))
    .body(movies_body)
    .send()
    .await?;

// the first body was moved into its request, so `books` gets its own vector
let mut books_body: Vec<JsonBody<_>> = Vec::with_capacity(2);

// add the third operation and document
books_body.push(json!({ "index": { "_id": "1" } }).into());
books_body.push(json!({
    "id": 1,
    "title": "The Lion King",
    "year": "1994"
}).into());

client
    .bulk(BulkParts::Index(books))
    .body(books_body)
    .send()
    .await?;
```

As you can see, each bulk operation consists of two objects. The first object contains the operation type and the target document's `_id`; the target `_index` comes from `BulkParts::Index` here, but it can also be set in this object. The second object contains the document's data. As a result, the two request bodies above contain six objects in total for three index actions.
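
To make the pairing concrete, here is a std-only sketch that builds the two lines of an `index` operation and flattens several operations into one list of lines. Both function names are hypothetical. The sketch assumes that `source_json` is already valid JSON and that `id` needs no JSON escaping:

```rust
/// Builds the two lines of one `index` operation: the action object and
/// the document source. Hypothetical helper for illustration only.
fn index_operation(id: &str, source_json: &str) -> [String; 2] {
    [
        format!(r#"{{"index":{{"_id":"{}"}}}}"#, id),
        source_json.to_string(),
    ]
}

/// Flattens operations into the flat sequence of lines a bulk body holds.
fn flatten(operations: &[[String; 2]]) -> Vec<String> {
    operations
        .iter()
        .flat_map(|pair| pair.iter().cloned())
        .collect()
}
```

Three index operations built this way flatten to six lines, matching the count described above.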

Alternatively, you can pass `BulkParts::None` and set the `_index` of each operation in its action object, which lets a single request target several indices. The following code indexes the same three titles in one request:

```rust
let body: Vec<JsonBody<_>> = vec![
    json!({ "index": { "_index": movies, "_id": "1" } }).into(),
    json!({ "title": "Beauty and the Beast", "year": 1991 }).into(),
    json!({ "index": { "_index": movies, "_id": "2" } }).into(),
    json!({ "title": "Beauty and the Beast - Live Action", "year": 2017 }).into(),
    json!({ "index": { "_index": books, "_id": "1" } }).into(),
    json!({ "title": "The Lion King", "year": 1994 }).into(),
];

client
    .bulk(BulkParts::None)
    .body(body)
    .send()
    .await?;
```

We will build the request body this way for the rest of the examples in this guide.

### Creating multiple documents

Similarly, instead of calling the `create` method for each document, you can use the `bulk` API to create multiple documents in a single request. The following code creates three documents in the `movies` index and one in the `books` index:

```rust
let body: Vec<JsonBody<_>> = vec![
    json!({ "create": {} }).into(),
    json!({ "title": "Beauty and the Beast 2", "year": 2030 }).into(),
    json!({ "create": {} }).into(),
    json!({ "title": "Beauty and the Beast 3", "year": 2031 }).into(),
    json!({ "create": {} }).into(),
    json!({ "title": "Beauty and the Beast 4", "year": 2049 }).into(),
    json!({ "create": { "_index": books } }).into(),
    json!({ "title": "The Lion King 2", "year": 1998 }).into(),
];

client
    .bulk(BulkParts::Index(movies))
    .body(body)
    .send()
    .await?;
```

Note that we specified the `_index` only for the last document in the request body. This is because `BulkParts::Index(movies)` sets the default `_index` for all bulk operations in the request, so an operation needs its own `_index` only to override that default. Moreover, we omit the `_id` for each document and let OpenSearch generate them for us in this example.
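
The precedence between the request-level index and a per-operation `_index` can be sketched as a small function. `effective_index` is a hypothetical name for illustration; the rule it models is that an operation's own `_index` wins over the request default:

```rust
/// Returns the index an operation targets: its own `_index` if present,
/// otherwise the request-level default. Hypothetical helper.
fn effective_index<'a>(
    request_default: Option<&'a str>,
    action_index: Option<&'a str>,
) -> Option<&'a str> {
    action_index.or(request_default)
}
```

With a default of `movies`, an operation without `_index` resolves to `movies`, and one that names `books` resolves to `books`.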

### Updating multiple documents

```rust
let body: Vec<JsonBody<_>> = vec![
    json!({ "update": { "_id": "1" } }).into(),
    json!({ "doc": { "year": 1992 } }).into(),
    json!({ "update": { "_id": "2" } }).into(),
    json!({ "doc": { "year": 2018 } }).into(),
];

client
    .bulk(BulkParts::Index(movies))
    .body(body)
    .send()
    .await?;
```

Note that the updated fields are specified in the `doc` field of the object that follows each `update` action.
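
A partial update changes only the fields it names and leaves the rest of the stored document alone. The following std-only sketch models that behavior for flat documents with string values, which is a simplification: real documents are JSON and nested objects are merged as well. `apply_partial_update` is a hypothetical name:

```rust
use std::collections::BTreeMap;

/// Applies a partial update to a flat document: every field in `doc`
/// overwrites or extends the stored document, and other fields are kept.
/// Hypothetical model for illustration, not the server implementation.
fn apply_partial_update(
    stored: &mut BTreeMap<String, String>,
    doc: &BTreeMap<String, String>,
) {
    for (field, value) in doc {
        stored.insert(field.clone(), value.clone());
    }
}
```

Updating only `year` on a stored movie keeps its `title` unchanged.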

### Deleting multiple documents

```rust
// a delete operation is a single action object with no document after it
let body: Vec<JsonBody<_>> = vec![
    json!({ "delete": { "_id": "1" } }).into(),
    json!({ "delete": { "_id": "2" } }).into(),
];

client
    .bulk(BulkParts::Index(movies))
    .body(body)
    .send()
    .await?;
```

### Mix and match operations

You can mix and match the different operations in a single request. The following code creates two documents, updates one document, and deletes another document:

```rust
let body: Vec<JsonBody<_>> = vec![
    json!({ "create": {} }).into(),
    json!({ "title": "Beauty and the Beast 5", "year": 2050 }).into(),
    json!({ "create": {} }).into(),
    json!({ "title": "Beauty and the Beast 6", "year": 2051 }).into(),
    json!({ "update": { "_id": "3" } }).into(),
    json!({ "doc": { "year": 2052 } }).into(),
    json!({ "delete": { "_id": "4" } }).into(),
];

client
    .bulk(BulkParts::Index(movies))
    .body(body)
    .send()
    .await?;
```
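
When mixing operations, keep in mind that `index`, `create`, and `update` are each followed by one more line in the body, while `delete` stands alone. A quick way to sanity-check a hand-built body is to compare its length with the expected line count. The sketch below uses a hypothetical `BulkAction` enum for illustration; it is not a type from the `opensearch` crate:

```rust
/// The four bulk actions. Hypothetical enum for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BulkAction {
    Index,
    Create,
    Update,
    Delete,
}

impl BulkAction {
    /// Number of body lines the action occupies: `delete` has no
    /// document line, every other action is followed by one.
    fn line_count(self) -> usize {
        match self {
            BulkAction::Delete => 1,
            BulkAction::Index | BulkAction::Create | BulkAction::Update => 2,
        }
    }
}

/// Total number of lines a body with these actions should contain.
fn expected_body_lines(actions: &[BulkAction]) -> usize {
    actions.iter().map(|action| action.line_count()).sum()
}
```

For two creates, one update, and one delete, the expected body length is seven lines.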

### Handling errors

The `bulk` API returns an `items` array with one response per operation in the request body, and a top-level `errors` field that is `true` when at least one operation failed. Each response contains a `status` field that indicates whether the operation was successful. If the operation was successful, the `status` field is set to a `2xx` code. Otherwise, the response contains an error message in the `error` field.

The following code shows how to look for errors in the response:

```rust
use serde_json::Value;

let body: Vec<JsonBody<_>> = vec![
    json!({ "create": { "_id": "1" } }).into(),
    json!({ "title": "Beauty and the Beast", "year": 1991 }).into(),
    json!({ "create": { "_id": "2" } }).into(),
    json!({ "title": "Beauty and the Beast 2", "year": 2030 }).into(),
    // document already exists error
    json!({ "create": { "_id": "1" } }).into(),
    json!({ "title": "Beauty and the Beast 3", "year": 2031 }).into(),
    // document already exists error
    json!({ "create": { "_id": "2" } }).into(),
    json!({ "title": "Beauty and the Beast 4", "year": 2049 }).into(),
];

let response = client
    .bulk(BulkParts::Index(movies))
    .body(body)
    .send()
    .await?;

let response_body = response.json::<Value>().await?;

// print the reason for every operation whose status is not a 2xx code
for item in response_body["items"].as_array().into_iter().flatten() {
    let status = item["create"]["status"].as_u64().unwrap_or(0);
    if !(200..300).contains(&status) {
        println!("{}", item["create"]["error"]["reason"]);
    }
}
```
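
If you want to act on failures rather than print them, it helps to collect each failed operation together with its position in the request. The sketch below shows that filtering step with std only. `ItemResult` is a hypothetical struct that holds just the `status` and error reason you would read from each entry of `items`:

```rust
/// One entry of the `items` array, reduced to what error handling needs.
/// Hypothetical struct: real entries are JSON objects keyed by action.
struct ItemResult {
    status: u16,
    reason: Option<String>,
}

/// A status counts as successful when it is in the 2xx range.
fn is_success(status: u16) -> bool {
    (200..300).contains(&status)
}

/// Returns the position and reason of every failed operation.
fn failure_reasons(items: &[ItemResult]) -> Vec<(usize, String)> {
    items
        .iter()
        .enumerate()
        .filter(|(_, item)| !is_success(item.status))
        .map(|(position, item)| {
            let reason = item
                .reason
                .clone()
                .unwrap_or_else(|| "unknown error".to_string());
            (position, reason)
        })
        .collect()
}
```

Because responses come back in request order, the position identifies which operation in the body to retry or report.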

## Cleanup

To clean up the resources created in this guide, delete the `movies` and `books` indices:

```rust
client
.indices()
.delete(IndicesDeleteParts::Index(&[movies, books]))
.send()
.await?;
```