# TODO List

  • Write unit tests.
  • Achieve 100% test coverage.
  • Python learning: design a preprocessing function that takes a tf.Tensor. It should take a batch, preprocess it (possibly using many cores), fetch the results, and save them to disk. Useful links: TensorFlow guide to data performance, TensorFlow tutorial to image classification, TensorFlow tutorial to loading images, TensorFlow guide to building input pipelines.
  • bl.utils could be split into many utilities submodules.
  • Use type annotations where applicable.
  • Document code.
  • Allow different batch sizes for different models.
  • Why do more_itertools.filter_except and more_itertools.map_except need to do exceptions = tuple(exceptions)?
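A plausible answer to the question above (an inference, not confirmed against the more_itertools source): Python's `except` clause only accepts an exception class or a *tuple* of classes, and the caller may pass any iterable (including a one-shot generator), so it must be materialized as a tuple up front:

```python
# `safe_call` is a hypothetical helper mirroring the map_except/filter_except
# pattern: it converts `exceptions` to a tuple because `except` requires a
# class or tuple of classes, and a generator could only be consumed once.
def safe_call(func, arg, exceptions):
    exceptions = tuple(exceptions)  # lists/generators must become a tuple
    try:
        return func(arg)
    except exceptions:  # a list here would raise TypeError when triggered
        return None

# A generator of exception types works only because it is converted up front:
result = safe_call(int, "not a number", (exc for exc in (ValueError, TypeError)))
assert result is None
assert safe_call(int, "42", [ValueError]) == 42
```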
  • Finish step detection analysis.
  • Implement a function wrapper that transforms the arguments before forwarding. For instance:

```python
import operator

lower_eq = transform(operator.eq, keyfunc=lambda x: x.lower())
assert "Hi" != "hi"
assert lower_eq("Hi", "hi")
```
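One possible implementation of this `transform` helper (a sketch; in this version `keyfunc` is applied to positional arguments only):

```python
import functools
import operator


def transform(func, keyfunc):
    """Return a wrapper that applies `keyfunc` to every positional
    argument before forwarding them to `func`."""
    @functools.wraps(func)
    def wrapper(*args):
        return func(*(keyfunc(arg) for arg in args))

    return wrapper


lower_eq = transform(operator.eq, keyfunc=str.lower)
assert "Hi" != "hi"
assert lower_eq("Hi", "hi")
```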
  • Am I normalizing images correctly? Make sure I am!
  • Write READMEs for each subpackage.
  • Include licenses in each module.
  • Make cv2 path-like compliant.
  • Take a look at the relationship between bubble or droplet formation rate and camera acquisition speed.
  • [No. Take a look at the sentinel package or PEP 661] Implement a typing helper Sentinel which expects a sentinel value (called, for instance, _sentinel) or another type. Equivalent to typing.Optional, but using any sentinel other than None. See typing.Literal in Python 3.8.
  • Create my own models and test Kramer's. Some steps are:
    • Learn where to put Dropout layers. This paper is awesome.
    • Always make the number of dense units a multiple of 8. There is a TensorFlow reference for this; find it.
    • Check if image sizes should be multiples of 8 as well.
    • Implement droplet/bubble tracking. See what André Provensi texted me.
    • Can the wet/dry areas ratio be of use to the nets?
    • Think of cool names for the nets.
  • Read this. Am I evaluating models correctly?
  • Include strategy as part of a model's description?
  • Implement callbacks for reporting the history and timestamps of a model's training. This would be useful for comparing the training of models, especially execution speed (to allow comparisons between CPUs and GPUs, or between uniform and mixed precision).
  • See Netron for NN.
  • Choose a reasonably performing network and train two versions of it: with and without mixed precision. Measure training time and final validation loss. The training should always be performed under the same conditions (i.e. using GPUs and MirroredStrategy), with the application of mixed precision being the only difference between the two nets.
  • Organize datasets and publish them on Kaggle?
  • Use narrower visualization windows?
  • Take a look at this, on how to use TensorBoard, and at TensorFlow's guide.
  • Include depth? See

Elboushaki, A., Hannane, R., Afdel, K., Koutti, L., 2020. MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences. Expert Systems with Applications. doi:10.1016/j.eswa.2019.112829

They have two inputs: an RGB image plus a depth map, which maps each pixel of an image to its relative distance from the camera. Even in a 2D experiment, it could be important to include a depth map to let the model see a difference between closer bubbles (which should look bigger) and more distant bubbles (which look smaller).

  • Use object detection.
  • Use transfer learning from one case to another.
  • Implement a way to measure the training time.
  • Implement a warm-up: the first epoch of training (after compiling or restoring) should be discarded to avoid including TF warmup in the training time measurement.
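The warm-up idea above can be sketched framework-agnostically: record per-epoch wall-clock durations and discard the first one when reporting. In Keras this logic would live in a Callback's `on_epoch_begin`/`on_epoch_end` hooks (an assumption about the eventual integration, not project code):

```python
import time


class EpochTimer:
    """Records per-epoch wall-clock durations and discards the first
    epoch (warm-up) when reporting the mean."""

    def __init__(self):
        self.durations = []
        self._start = None

    def epoch_begin(self):
        self._start = time.perf_counter()

    def epoch_end(self):
        self.durations.append(time.perf_counter() - self._start)

    def mean_duration(self, discard_warmup=True):
        timings = self.durations[1:] if discard_warmup else self.durations
        if not timings:
            raise ValueError("need at least two epochs to discard warm-up")
        return sum(timings) / len(timings)
```

With durations of, say, `[10.0, 1.0, 2.0]` seconds, `mean_duration()` reports 1.5 s, excluding the slow warm-up epoch instead of letting it dominate the average.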
  • Optimize over the choice of activation functions.
  • For many parameters, and above all for setting key names, how about creating a specialized dataclasses.dataclass? For instance, instead of:

```python
from pathlib import Path
from typing import List, Optional

import pandas as pd


class CSVDataset:
    def __init__(
        self,
        path: Path,
        features_columns: Optional[List[str]] = None,
        target_column: str = "target",
    ) -> None:
        if features_columns is None:
            features_columns = ["image_path"]

        X = pd.read_csv(path, usecols=features_columns + [target_column])
        self.y = X.pop(target_column)
        self.X = X
```

we could write:

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List

import pandas as pd


@dataclass(frozen=True, kw_only=True)  # kw_only requires Python 3.10+
class CSVDatasetColumns:
    features_columns: List[str] = field(default_factory=lambda: ["image_path"])
    target_column: str = "target"


class CSVDataset:
    def __init__(
        self, path: Path, csv_columns: CSVDatasetColumns = CSVDatasetColumns()
    ) -> None:
        X = pd.read_csv(
            path, usecols=csv_columns.features_columns + [csv_columns.target_column]
        )
        self.y = X.pop(csv_columns.target_column)
        self.X = X
```

It may be a little more verbose, but it isolates the parameter logic. It also avoids using string constants directly in the function signature, delegating that responsibility to a helper class.

  • Implement integrated gradients.
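A minimal sketch of integrated gradients, using a Riemann-sum approximation with an analytically supplied gradient function (in TensorFlow the gradient would come from tf.GradientTape instead; `grad_fn` and the linear toy model are illustrative assumptions):

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients: (x - baseline) times the mean
    gradient along the straight path from baseline to x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)


# For a linear model f(x) = w @ x the attribution is exactly w * (x - baseline):
w = np.array([1.0, -2.0, 3.0])
grad_fn = lambda x: w  # gradient of a linear model is constant
x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
attributions = integrated_gradients(grad_fn, x, baseline)
assert np.allclose(attributions, w * (x - baseline))
# Completeness: attributions sum to f(x) - f(baseline).
assert np.isclose(attributions.sum(), w @ x - w @ baseline)
```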

  • Perform the following studies:

    • Influence of batch size
    • Learning curve (metric versus dataset size)
    • Visualization window size
    • Direct versus indirect visualization
    • How do random contrast (and other augmentations) affect image variance, and what does this mean for machine learning?
    • Train on one set, evaluate on another
  • [No. It is not useful enough.] Release Pack as a standalone package, including functional-programming functionality:

```python
def double(arg):
    return 2 * arg


def is_greater_than(threshold):
    return lambda arg: arg > threshold


p = Pack("abc", x=3, y=2)
res = (
    p  # sends p
    | double  # duplicates all values: Pack('aa', 'bb', 'cc', x=6, y=4)
    | (
        str.upper,
        is_greater_than(5),
    )  # applies str.upper to args, is_greater_than(5) to kwargs values
)
print(res)  # prints Pack('AA', 'BB', 'CC', x=True, y=False)
```

and think of other things.
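Even though the item was rejected, a minimal Pack implementing the `|` pipeline can be sketched for reference. Note one assumption: the comments above imply `Pack("abc", ...)` holds the characters `'a', 'b', 'c'`, so this sketch unpacks the string explicitly with `Pack(*"abc", ...)`:

```python
class Pack:
    """Holds positional and keyword values; `|` maps a function, or an
    (args_func, kwargs_func) pair, over them."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def __or__(self, func):
        if isinstance(func, tuple):  # (args_func, kwargs_func) pair
            args_func, kwargs_func = func
        else:
            args_func = kwargs_func = func
        return Pack(
            *(args_func(a) for a in self.args),
            **{k: kwargs_func(v) for k, v in self.kwargs.items()},
        )

    def __repr__(self):
        parts = [repr(a) for a in self.args]
        parts += [f"{k}={v!r}" for k, v in self.kwargs.items()]
        return f"Pack({', '.join(parts)})"


def double(arg):
    return 2 * arg


def is_greater_than(threshold):
    return lambda arg: arg > threshold


res = Pack(*"abc", x=3, y=2) | double | (str.upper, is_greater_than(5))
assert res.args == ("AA", "BB", "CC")
assert res.kwargs == {"x": True, "y": False}
```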