Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

objstore : implement Aliyun OSS #1234

Closed
wants to merge 48 commits into from
Closed

objstore : implement Aliyun OSS #1234

wants to merge 48 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jun 7, 2019

  • [] CHANGELOG entry if change is relevant to the end user.

My company infra is worked with Aliyun Cloud , the cold-monitoring data have to be shipped to Aliyun OSS.This PR works to implement it. The code I submit can upload prometheus data to aliyun oss , it also works well to go with thanos bucket ls/verify/inspect although I only deploy it in test env , because I need to make thanos working with the whole framework ,like service discovery & load balance .I will work on this to make aliyun oss objstore work well . If you want something such as doc, I will do it later .

In order to make this PR, I pass over the FOREACH test , the print seem to be OK . although along with this test , i jump over the "bucket creating & bucket delete" testing , I will do it next week .

ps. The object-config.file format like
object.config-file format
type: OSS
config:
bucket: "XXXX"
endpoint: "XXXX"
secret_key: "XXXX"
secret_id: "XXXX"

Changes

pull-d67a69d & pull-4abb631 is the first try, they work for implement aliyun oss objstore

pull-f973be4 can be ignored , I was stupid to push my local changed compiling files go.mod & go.sum to repository , so i have to restore them .

pull-453829 & pull-86c5548 is uploading some images , IGNORE THEM ALSO

Verification

aliyun oss
https://github.com/shaulboozhiao/thanos/blob/master/docs/img/thanos.jpg

FOREACH test result
[thanos@XXX objtesting]$ go test acceptance_e2e_test.go foreach.go
ok command-line-arguments 0.160s

and aliyun oss page

foreach test
https://github.com/shaulboozhiao/thanos/blob/master/docs/img/foreachtest.jpg

@ghost ghost mentioned this pull request Jun 8, 2019
@wujinhu
Copy link
Contributor

wujinhu commented Jun 10, 2019

@shaulboozhiao I have opened another PR to implement this, 1238, and we will maintain this.

@wujinhu
Copy link
Contributor

wujinhu commented Jun 10, 2019

sorry,i did not know you have worked on this before i created this pr

Copy link
Member

@jojohappy jojohappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks! Some suggestions.

Please add this into CHANGELOG. It is a major changes.

Don't forget adding THANOS_SKIP_ALIYUN_OSS_TESTS=true to .circleci/config.yml to skip oss storage testing, at this moment your ci has been failed.

pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
@jojohappy
Copy link
Member

jojohappy commented Jun 10, 2019

@shaulboozhiao Please keep a calm, everyone will respect for your work. You submit the PR at first, so we can close another one. And we also can talk with each other offline in slack or wechat.

Anyway, your PR is very nice! Good job.

@domgreen
Copy link
Contributor

@shaulboozhiao can I please remind you that as a project Thanos follows a code of conduct please be respectful to everyone in the community.

If there are any concerns please deal with them in a way you would like to be treated yourself or reach out to any of the maintainers if you have an issue.

@ghost
Copy link
Author

ghost commented Jun 10, 2019

@domgreen Thans bros, It is my fault to be so childish, I will keep in mind, thank you for your advice .

@ghost
Copy link
Author

ghost commented Jun 10, 2019

@wujinhu I am so sorry to you , Cause I am so childish , so mean to you . I must to say , please @wujinhu dedicate all your time to aliyun product development , It is awesome cloud infra. ~~ I am so so sorry for my words ~~

@wujinhu
Copy link
Contributor

wujinhu commented Jun 11, 2019

@shaulboozhiao Never mind. I think we share the same goal. We can make this provider better together. You are doing a great job, I will take a review later, thanks. :)

@ghost
Copy link
Author

ghost commented Jun 11, 2019

@jojohappy @bwplotka @povilasv While runing circleci test , it reports Docs have discrepancies, do 'make docs' and commit changes: I do not know what the problem is , and I do not want to crack Makefile , so I edit .circleci/config.yml and add make docs before make check-docs , is it all right ? or shauld i revert this edits ?

@jojohappy
Copy link
Member

Docs have discrepancies, do 'make docs' and commit changes

@shaulboozhiao You should run make docs locally, as you said, it would generate the new documents, then commit and push changes to your remote branch.

We should ensure that the documents are the newest in our master branch.

part, err := uploadEveryPart(lastslice, chunksnum+1)
if err != nil {
if err := b.bucket.AbortMultipartUpload(init); err != nil {
return errors.Wrap(err, "multi-part upload ends with aborting")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated AbortMultipartUpload here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No,It is not duplicated , both norm chunk and the last part should abort muliti-part uploading because of failure.

Copy link
Member

@jojohappy jojohappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! More nit picks.

pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
pkg/objstore/oss/oss.go Outdated Show resolved Hide resolved
if err := b.bucket.AbortMultipartUpload(init); err != nil {
return errors.Wrap(err, "multi-part upload ends with aborting")
}
return errors.Wrap(err, "multi-part upload last chunk failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrap(err, "multi-part upload last chunk failed")
return errors.Wrap(err, "failed to upload the last chunk")

{
init, err := b.bucket.InitiateMultipartUpload(name)
if err != nil {
return errors.Wrap(err, "initiate multi-part upload failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrap(err, "initiate multi-part upload failed")
return errors.Wrap(err, "failed to initiate multi-part upload")

return errors.New("can not guess the size of io.Reader")
case 0:
if err := b.bucket.PutObject(name, ncloser); err != nil {
return errors.Wrap(err, "upload oss object")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrap(err, "upload oss object")
return errors.Wrap(err, "failed to upload object")

}
_, err = b.bucket.CompleteMultipartUpload(init, parts)
if err != nil {
return errors.Wrap(err, "multi-part upload failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return errors.Wrap(err, "multi-part upload failed")
return errors.Wrap(err, "failed to set multi-part upload completive")

@jojohappy
Copy link
Member

@shaulboozhiao Any update for this?

@ghost
Copy link
Author

ghost commented Jul 23, 2019

@jojohappy Sorry , I will update it later ~~

@ghost
Copy link
Author

ghost commented Jul 27, 2019

@wujinhu @jojohappy Code is updated

Copy link
Member

@jojohappy jojohappy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

But it seems that you changed some import code, could you tell us what format tool you use?

@@ -13,6 +13,9 @@ import (
"github.com/thanos-io/thanos/pkg/objstore/azure"
"github.com/thanos-io/thanos/pkg/objstore/cos"
"github.com/thanos-io/thanos/pkg/objstore/gcs"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blank line.

@@ -13,6 +13,9 @@ import (
"github.com/thanos-io/thanos/pkg/objstore/azure"
"github.com/thanos-io/thanos/pkg/objstore/cos"
"github.com/thanos-io/thanos/pkg/objstore/gcs"

"github.com/thanos-io/thanos/pkg/objstore/oss"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -39,7 +39,9 @@ import (
"github.com/thanos-io/thanos/pkg/discovery/cache"
"github.com/thanos-io/thanos/pkg/discovery/dns"
"github.com/thanos-io/thanos/pkg/extprom"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blank line.

extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -29,7 +29,9 @@ import (
"github.com/thanos-io/thanos/pkg/discovery/cache"
"github.com/thanos-io/thanos/pkg/discovery/dns"
"github.com/thanos-io/thanos/pkg/extprom"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blank line.

extpromhttp "github.com/thanos-io/thanos/pkg/extprom/http"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -31,6 +22,15 @@ import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/common/route"
"github.com/prometheus/tsdb/labels"
"github.com/thanos-io/thanos/pkg/block"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why those import packages are moved here?

if err := b.bucket.AbortMultipartUpload(init); err != nil {
return prt, errors.Wrap(err, "failed to upload multi-part chunk")
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should return err at the end of if blank. Otherwise the error would return nil below.

CHANGELOG.md Outdated
@@ -12,7 +12,7 @@ We use *breaking* word for marking changes that are not backward compatible (rel
## Unreleased.

- [#1338](https://github.com/thanos-io/thanos/pull/1338) Querier still warns on store API duplicate, but allows a single one from duplicated set. This is gracefully warn about the problematic logic and not disrupt immediately.

- `AliYun OSS` object storage, see [documents](docs/storage.md#aliyun-oss-configuration) for further information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the PR id and link at the start of message.

"time"

alioss "github.com/aliyun/aliyun-oss-go-sdk/oss"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the blank line


func calculateChunks(name string, r io.Reader) (int, int64, error) {
switch r.(type) {
case *os.File:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider just changing the interface for upload from io.Reader to something like:

type ReadSizer interface {
        Read(p []byte) (n int, err error)
        Size() int64
}

As this clearly won't work if change our types.

}

ncloser := ioutil.NopCloser(r)
switch chunksnum {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we simplify this code?

Something like:

if chunksnum == 0 {
	if err := b.bucket.PutObject(name, ncloser); err != nil {
		return errors.Wrap(err, "failed to upload oss object")
	}
        return nil
}

Init, err := b.bucket.InitiateMultipartUpload(name)
if err != nil {
	return errors.Wrap(err, "failed to initiate multi-part upload")
}
...

pkg/query/api/v1.go Outdated Show resolved Hide resolved
if fileInfo, err := f.Stat(); err == nil {
s := fileInfo.Size()
return int(math.Floor(float64(s) / alioss.MaxPartSize)), s % alioss.MaxPartSize, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should upload smaller part each time, not 5GB each part, things will fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you , the smaller part uploading will accelerate compacting . But there is a another problem , see UploadPart method in multipart.go (https://github.com/aliyun/aliyun-oss-go-sdk/blob/master/oss/multipart.go) , you can not excess ten thousand times for the multi-part upload task . This means , if I upload every part exactly 100M , I can upload to Aliyun OSS 1TiB object at most .As compacting works on and on , I think it may be not enough .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can define part size like below:
partSize = max(100MB, fileSize / 10001)

@wujinhu
Copy link
Contributor

wujinhu commented Aug 20, 2019

@shaulboozhiao Any updates? @jojohappy and I found a bug here.

func setRange(start, end int64) (alioss.Option, error) {
	var opt alioss.Option
	if 0 <= start && start <= end {
		opt = alioss.Range(start, end)
	} else {
		return nil, errors.Errorf("Invalid range specified: start=%d end=%d", start, end)
	}
	return opt, nil
}

OSS will get the whole object if end > size of this object. So, we will get error message like below if we call getRange method with large length.

bucket_e2e_test.go:355: unexpected error: rpc error: code = Aborted desc = fetch series for block 01DJQA7FZPVH4RM86QDFW6GVM5: preload series: invalid remaining size 427, expected 1430846

You can run bucket_e2e_test.go and view OSS documentation for more details.
https://help.aliyun.com/document_detail/31980.html?spm=a2c4g.11186623.6.781.181266f2Y8E5zs

@wujinhu
Copy link
Contributor

wujinhu commented Aug 21, 2019

@shaulboozhiao Do you still have time to finish this PR? I think I can do this if you do mind.

@jojohappy
Copy link
Member

OSS will get the whole object if end > size of this object. So, we will get error message like below if we call getRange method with large length.

@shaulboozhiao It meant that we could not fetch objects with specific range if it is out of length of object, but thanos need it. Other object storage service, such as gcs, s3, support it as default, but not Aliyun OSS. So we should check the end of range whether or not is out of length of object in this case.

@bwplotka
Copy link
Member

bwplotka commented Aug 23, 2019

Awesome work @shaulboozhiao here and everyone else 👍 ❤️ Looks like you @shaulboozhiao did the majority of work and only last touches/rebase plus some testing are left. (:

Since you are busy I think it makes sense for others to continue your work on top of your commits if that's ok with you! @wujinhu if @shaulboozhiao is busy and you have time, feel free to get his changes on your branch and start a new PR and we will close this one.

As per our Code of Conduct we are here all for joint collaboration and being friendly to each other, so I think that makes sense. (:
We all want your changes @shaulboozhiao to be available for everyone, so let's deliver it together!

In case of any questions we are all available on CNCF slack (:

@wujinhu
Copy link
Contributor

wujinhu commented Sep 2, 2019

Because @shaulboozhiao is not responsive, I will open another PR this week as discussed above.

@bwplotka
Copy link
Member

bwplotka commented Sep 2, 2019

Awesome, closing this then now. @wujinhu make sure to grab @shaulboozhiao changes and feel free to finish on top of this 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants