Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusing different behaviour between Azure, Memory, Disk & Zip modules. #37

Open
ckpearson opened this issue Sep 13, 2023 · 4 comments

Comments

@ckpearson
Copy link

I'm currently reworking a system that was using Azure blob storage to use FluentStorage so we can target different providers / deploy on-prem and make use of disk storage.

Data for the application is split into containers, and as per the docs the first folder in the path is used as the container for Azure, but otherwise should just be treated as a folder otherwise.

This behaviour works just fine for the in-memory module, calling GetBlobAsync returns a blob that is a folder, and I can check the full path to ensure it sits under root so I can classify it as a "container".

For the local-disk module however, creating one of these produces a directory with a .empty file under it, and having a look at the code of the disk implementation, it doesn't appear to have handling for directories - I'd expect it to be happy enough to create empty directories, and just check whether a path is for a directory or a file and return a blob with the relevant properties set.

The ZIP module I can sort of understand the behaviour for because you have to create the ZipEntry objects and it infers hierarchy based on the paths as opposed to supporting folders within the archive as a first-class concept.

I guess what I'm asking is whether there's a consistent way I can support the arbitrary modules and still have everything work fine? I guess I could enforce the creation of a '.empty' file and explicitly check for its existence, but that feels redundant for the Azure module.

My first guess would be to check whether the storage implementation is IHierarchicalBlobStorage but the azure one doesn't appear to be.

Any suggestions would be appreciated.

@dammitjanet
Copy link

As far as I can see, and I've looked into this on #30, you are correct, there are no concrete implementations of IHierarchicalBlobStorage

it was created 4 years ago by the original developer, and not functionally completed. I understand why it's there, just not why it wasn't used

But basically, if you try and create a folder on AWS (and Azure I believe), the folder won't exist unless there is a file inside it. FileSystem Blob for example should support IHierarchicalBlobStorage and so should Zip, and these should explicitly implement their Interface in the correct manner.

@ckpearson
Copy link
Author

Aye, the way I got around it at least for now is that I wrote a facade replicating the existing azure BlobContainerClient and BlobClient classes from the Azure storage SDK.

    public sealed class BlobContainerClient
    {
        private const string _dummyFileName = ".empty";
        private readonly IBlobStorage _storage;

        public BlobContainerClient(IBlobStorage storage, string name)
        {
            _storage = storage;
            Name = name;
        }

        public string Name { get; }

        internal IBlobStorage Storage => _storage;

        public async Task<bool> ExistsAsync()
        {
            if (_storage is IHierarchicalBlobStorage || _storage is global::FluentStorage.Azure.Blobs.IAzureBlobStorage)
            {
                var blob = await Storage.GetBlobAsync(Name);
                if (blob == null)
                {
                    return false;
                }
                return blob.IsFolder && blob.FullPath == $"/{Name}";
            }
            else
            {
                var blob = await Storage.GetBlobAsync($"{Name}/{_dummyFileName}");
                if (blob == null)
                {
                    return false;
                }
                return blob.FullPath == $"/{Name}/{_dummyFileName}";
            }
        }

        public async Task DeleteIfExistsAsync()
        {
            if (await ExistsAsync())
            {
                await Storage.DeleteAsync(Name);
            }
        }

        /*
         * todo: need to be careful as fluentstorage assumes '.empty' by default but doesn't expose that as a constant
         * so it *could* change, and theoretically if code doesn't use the abstractions here, it could theoretically
         * choose to create a new folder and write the .empty file or not write it at all, so we'd be without the ability
         * to check for a folder / container's existence.
         */
        public async Task CreateAsync()
        {
            await Storage.CreateFolderAsync(Name, dummyFileName: _dummyFileName);
        }
    }

    public sealed class BlobClient
    {
        private readonly BlobContainerClient _containerClient;
        private readonly string _path;

        public BlobClient(BlobContainerClient containerClient, string path)
        {
            if (path.StartsWith(containerClient.Name, StringComparison.OrdinalIgnoreCase))
            {
                throw new ArgumentException("Path starts with the container name", nameof(path));
            }

            _containerClient = containerClient;
            _path = path;
        }

        internal BlobContainerClient ContainerClient => _containerClient;

        public async Task<bool> ExistsAsync()
        {
            return await _containerClient.Storage.ExistsAsync($"{_containerClient.Name}/{_path}", CancellationToken.None);
        }

        public async Task DownloadToAsync(Stream stream)
        {
            await _containerClient.Storage.ReadToStreamAsync($"{_containerClient.Name}/{_path}", stream);
        }

        public async Task UploadAsync(BinaryData data, CancellationToken? cancellationToken = null)
        {
            using var stream = data.ToStream();
            await _containerClient.Storage.WriteAsync($"{_containerClient.Name}/{_path}", stream, cancellationToken:cancellationToken ?? CancellationToken.None);
        }

        public async Task DeleteIfExistsAsync()
        {
            if (await _containerClient.Storage.ExistsAsync($"{_containerClient.Name}/{_path}"))
            {
                await _containerClient.Storage.DeleteAsync($"{_containerClient.Name}/{_path}");
            }
        }

        public string Name => _path;
    }

    public static class BlobStorageExtensions
    {
        public static BlobContainerClient GetBlobContainerClient(this IBlobStorage storage, string container)
        {
            return new BlobContainerClient(storage, container);
        }

        public static BlobClient GetBlobClient(this BlobContainerClient containerClient, string blobPath)
        {
            return new BlobClient(containerClient, blobPath);
        }

        public static async Task DeleteAllContainersAsync(this IBlobStorage storage)
        {
            var containers = await storage.ListAsync(new ListOptions { Recurse = false });

            foreach(var container in containers)
            {
                await storage.GetBlobContainerClient(container.Name).DeleteIfExistsAsync();
            }
        }

        public static bool IsStorageForZipArchive(this IBlobStorage storage)
        {
            /*
             * This isn't ideal, but in some circumstances we need to know if the zip storage handler
             * is being used.
             * 
             * Mostly when using the background queue processor or interacting with it via a worker pool
             * or in parallel.
             * 
             * The underlying types are private in the FluentStorage library, so we have to resort
             * to inspecting the type name.
             * 
             * Checking whether it contains 'zip' just to add a bit of resiliency in case the type name changes.
             */
            return storage.GetType().Name.Contains("zip", StringComparison.OrdinalIgnoreCase);
        }

        // todo: may need to add some special handling in here because azure containers in real environments
        // suffer from a timing issue during drop and re-create.
        public static async Task RecreateContainer(this IBlobStorage storage, string container)
        {
            var containerClient = storage.GetBlobContainerClient(container);
            await containerClient.DeleteIfExistsAsync();
            await containerClient.CreateAsync();
        }
    }

Extremely rough attempt at this, but it's what's being used by our system and it does seem to be working with all the Core Storage providers.

Was just a convenient shortcut to avoid a larger scale refactor, and helps with maintaining the container / blob separation the system already uses.

#38 is behind the need for checking whether the storage implementation is the zip archive one, as we have a worker pool responsible for handling queued storage upload tasks, and the zip archive one doesn't seem to be thread safe.

@robinrodricks
Copy link
Owner

@ckpearson is this something we can add to the core library? can you suggest changes or file a PR that would be greatly appreciated. I want to support all use cases in-built and not force users to implement their own wrappers as far as possible.

@ckpearson
Copy link
Author

@robinrodricks I suppose it could be possible, but I'm wary of adding something that then re-replicates the specific structure used by the Azure SDK, plus it's not complete so is likely to be missing functionality people might rely on during their migration from azure blob storage to fluent storage.

It might be an idea to put a "ContainerStorageHelper" type thing into the library that offers a lightweight wrapper for dealing with a container-based abstraction, but again, not sure how aligned with fluent storage that is, as people might be using AWS for example.

I think it might just be better if the underlying issue with how directories & empty files are handled, and maybe get the hierarchical storage interface properly implemented by the adapters that it makes sense for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants