Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileReader filter #3459

Merged
merged 14 commits into from
Nov 9, 2021
Merged

FileReader filter #3459

merged 14 commits into from
Nov 9, 2021

Conversation

ksztenderski
Copy link
Contributor

@ksztenderski ksztenderski commented Oct 28, 2021

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

It adds file_filter argument to FileReader allowing to filter files matching list of glob patterns.

Additional information

  • Affected modules and functionalities:
    FileReader operator
  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@mzient
Copy link
Contributor

mzient commented Oct 28, 2021

Please rebase this PR on latest main.

@ksztenderski
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3291616]: BUILD STARTED

@jantonguirao jantonguirao self-assigned this Oct 29, 2021
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3291616]: BUILD PASSED

@@ -0,0 +1,136 @@
// Copyright (c) 2017-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (c) 2017-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

// open the root
DIR *dir = opendir(file_root.c_str());

DALI_ENFORCE(dir != nullptr, "Directory " + file_root + " could not be opened.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DALI_ENFORCE(dir != nullptr, "Directory " + file_root + " could not be opened.");
DALI_ENFORCE(dir != nullptr, make_string("Directory ", file_root, " could not be opened."));

nitpick

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dali::make_string can be found in dali/core/format.h

@@ -41,7 +42,7 @@ For example, this directory structure::
<file_root>/2/car.jpeg
<file_root>/2/truck.jp2

will yield the following outputs::
without specified ``file_filters`` will yield the following outputs::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
without specified ``file_filters`` will yield the following outputs::
by default will yield the following outputs::

@@ -0,0 +1,124 @@
// Copyright (c) 2017-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Copyright (c) 2017-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Comment on lines 68 to 70
if (HasKnownExtension(std::string(entry->d_name))) {
std::string rel_path = curr_entry + dir_sep + std::string{entry->d_name};
file_label_pairs.emplace_back(rel_path, label);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (HasKnownExtension(std::string(entry->d_name))) {
std::string rel_path = curr_entry + dir_sep + std::string{entry->d_name};
file_label_pairs.emplace_back(rel_path, label);
std::string d_name(entry->d_name);
if (HasKnownExtension(d_name)) {
file_label_pairs.emplace_back(join_path(curr_entry, d_name), label);

nitpick: to avoid creating the same string twice

Comment on lines 26 to 91
class FilesystemTest : public ::testing::Test {
std::vector<std::pair<std::string, int>> readFileLabelFile() {
std::vector<std::pair<std::string, int>> image_label_pairs;
std::string file_list = file_root + "/image_list.txt";
std::ifstream s(file_list);
DALI_ENFORCE(s.is_open(), "Cannot open: " + file_list);

std::vector<char> line_buf(16 << 10);
char *line = line_buf.data();
for (int n = 1; s.getline(line, line_buf.size()); n++) {
int i = strlen(line) - 1;

for (; i >= 0 && isspace(line[i]); i--) {}

int label_end = i + 1;

if (i < 0)
continue;

for (; i >= 0 && isdigit(line[i]); i--) {}

int label_start = i + 1;

for (; i >= 0 && isspace(line[i]); i--) {}

int name_end = i + 1;
DALI_ENFORCE(
name_end > 0 && name_end < label_start && label_start >= 2 && label_end > label_start,
make_string("Incorrect format of the list file \"", file_list, "\":", n,
" expected file name followed by a label; got: ", line));

line[label_end] = 0;
line[name_end] = 0;

image_label_pairs.emplace_back(line, std::atoi(line + label_start));
}
std::sort(image_label_pairs.begin(), image_label_pairs.end());
DALI_ENFORCE(s.eof(), "Wrong format of file_list: " + file_list);

return image_label_pairs;
}

protected:
FilesystemTest()
: file_root(testing::dali_extra_path() + "/db/single/jpeg"),
file_label_pairs(readFileLabelFile()) {}

std::vector<std::string> globMatch(std::vector<std::string> &filters) {
std::vector<std::string> correct_match;
glob_t pglob;
for (auto &filter : filters) {
std::string pattern = file_root + filesystem::dir_sep + '*' + filesystem::dir_sep + filter;
if (glob(pattern.c_str(), GLOB_TILDE, NULL, &pglob) == 0) {
for (unsigned int count = 0; count < pglob.gl_pathc; ++count) {
std::string match(pglob.gl_pathv[count]);
correct_match.push_back(match.substr(file_root.length() + 1, std::string::npos));
}
globfree(&pglob);
}
}
std::sort(correct_match.begin(), correct_match.end());
std::unique(correct_match.begin(), correct_match.end());
return correct_match;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought. A significant part of this file is about re-implementing what the reader does, to build the reference. I think the test would be easier to read if we just hardcoded the input list of file/label and the expected outcome in each testcase.
I get that this makes it more flexible in case of changing the files in the test directory, but on the other hand it also introduces another source of bugs.
Feel free to disregard this idea if you feel is better as is now.

@@ -126,7 +128,35 @@ def test_file_reader_relpath_file_list():
assert contents == ref_contents(fnames[index])


batch_size_alias_test=64
def _test_file_reader_filter(filters):
batch_size = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd test bigger batch size>1 as well

def _test_file_reader_filter(filters):
batch_size = 1

pipe = Pipeline(batch_size, 1, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd test with more than one thread

Comment on lines 142 to 144
for filter in filters:
for file in glob.glob(os.path.join(root, dir, filter)):
fnames.append((label, file.split('/')[-1], file))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total nitpick: why not use a set() from the start?

DIR *dir = opendir(file_root.c_str());

DALI_ENFORCE(dir != nullptr, "Directory " + file_root + " could not be opened.");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: Do we want those benchmarks to stay in DALI repository? Doesn't seem to be benchmarking any DALI component

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to compare the two solutions.

Copy link
Contributor

@klecki klecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpick, I think we can go with making the list of the known extensions be the default - either add it in case of empty filters in traverse_directories or put it as the default there, and have the if in the call site. Probably the second one works better as the documentation.

@@ -0,0 +1,135 @@
// Copyright (c) 2017-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick:

Suggested change
// Copyright (c) 2017-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

DIR *dir = opendir(file_root.c_str());

DALI_ENFORCE(dir != nullptr, "Directory " + file_root + " could not be opened.");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to compare the two solutions.

// open the root
DIR *dir = opendir(file_root.c_str());

DALI_ENFORCE(dir != nullptr, "Directory " + file_root + " could not be opened.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dali::make_string can be found in dali/core/format.h

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
…ngs argument.

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@ksztenderski
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3316146]: BUILD STARTED

@@ -51,6 +53,13 @@ will yield the following outputs::
<contents of 2/car.jpeg> 2
<contents of 2/truck.jp2> 2

and with ``file_filters = [*.jpg, *.jpeg]`` will yield the following outputs::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:

Suggested change
and with ``file_filters = [*.jpg, *.jpeg]`` will yield the following outputs::
and with ``file_filters = ["*.jpg", "*.jpeg"]`` will yield the following outputs::

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not addressed?

Comment on lines +34 to +37
static const std::vector<std::string> kKnownExtensionsGlob = {
"*.jpg", "*.jpeg", "*.png", "*.bmp", "*.tif", "*.tiff", "*.pnm", "*.ppm",
"*.pgm", "*.pbm", "*.jp2", "*.webp", "*.flac", "*.ogg", "*.wav"};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we checked how that displays on the documentation?

@@ -189,7 +189,7 @@ std::string OpSchema::GetArgumentDefaultValueString(const std::string &name) con

auto &val = *value_ptr;
auto str = val.ToString();
if (val.GetTypeId() == DALI_STRING ||
if (val.GetTypeId() == DALI_STRING || val.GetTypeId() == DALI_STRING_VEC ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


def test_file_reader_filters():
for filters in [['*.jpg'], ['*.jpg', '*.png', '*.jpeg'], ['dog*.jpg', 'cat*.png', '*.jpg']]:
for num_threads in [1, 2, 4, 8]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_threads = random.choice([1, 2, 4, 8])
batch_size = random.choice([1, 3, 10])

should be enough, IMO

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3316146]: BUILD FAILED

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@@ -29,7 +30,8 @@ This operator can be used in the following modes:

In this mode, the directory indicated in ``file_root`` argument should contain one or more
subdirectories. The files in these subdirectories are listed and assigned labels based on
lexicographical order of the subdirectory.
lexicographical order of the subdirectory. If you provide ``file_filters`` argument with
list of glob strings, the operator will list files matching at least one of the patterns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
list of glob strings, the operator will list files matching at least one of the patterns.
a list of glob strings, the operator will list files matching at least one of the patterns. Otherwise, a default set of filters is used (see the default value of ``file_filters`` for details).

I'd also mention what happens by default

@@ -51,6 +53,13 @@ will yield the following outputs::
<contents of 2/car.jpeg> 2
<contents of 2/truck.jp2> 2

and with ``file_filters = [*.jpg, *.jpeg]`` will yield the following outputs::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not addressed?

Comment on lines 115 to 116
.AddOptionalArg<bool>("case_sensitive_filter", R"(If set to True, the filter will be applied
to original file paths, otherwise to lower-cased file paths.)", false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but I think it'd be even better if we just make the filter case insensitive as well. That is, case insensitive filter="*.JPEG" would match "dog.jpeg"

Comment on lines +71 to +72
for (auto &filter : filters) {
if (fnmatch(filter.c_str(), entry->d_name, case_sensitive_filter ? 0 : FNM_CASEFOLD) == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (auto &filter : filters) {
if (fnmatch(filter.c_str(), entry->d_name, case_sensitive_filter ? 0 : FNM_CASEFOLD) == 0) {
int flags = case_sensitive_filter ? 0 : FNM_CASEFOLD;
for (auto &filter : filters) {
if (fnmatch(filter.c_str(), entry->d_name, flags) == 0) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation it seems to me that the documentation is not accurate (the part saying that it matches lowercase filters).
I'd double check:

filter="*.JPEG"
case_sensitive = false
file="dog.jpeg"

^^ this should match. Then the documentation should be updated

@@ -189,6 +189,7 @@ std::string OpSchema::GetArgumentDefaultValueString(const std::string &name) con

auto &val = *value_ptr;
auto str = val.ToString();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert? Otherwise this file would should changes from this PR, when there are actually none

Update FileReader documentation.

Fix glob matching in nosetests for file_filters.

Revert op_schema.

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@ksztenderski
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3361929]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3361929]: BUILD PASSED

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
@ksztenderski
Copy link
Contributor Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3362777]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3362799]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3362777]: BUILD PASSED

@ksztenderski ksztenderski merged commit e04574d into NVIDIA:main Nov 9, 2021
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [3362799]: BUILD PASSED

cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Feb 21, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
Add file_filters argument to FileReader allowing to select files by
matching glob patterns.

Filters can be matched case-sensitively or case-insensitively by
specifying the value of case_sensitive_filter argument (by default
matching case-insensitive).

Add benchmark comparing original matching by known extensions
and the new matching with fnmatch. Changed implementation
entirely to match only with fnmatch in (file, label) case.

Add tests for file_filters in FileReader.

Temporary fix for other readers that use FileLabelLoader and don't
support case_sensitive_filter and thus do not assign it the default
value when initializing with GetArgument("case_sensitive_filter").

Signed-off-by: ksztenderski <ksztenderski@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants