Welcome to my GitHub project! This repository helps prepare geospatial data for deep learning. It efficiently crops large rasters into smaller tiles, aiming for a dataset with minimal overlap between tiles (heavy overlap can make the dataset less representative) and without nodata tiles. A well-distributed set of tiles matters for model performance, and the same approach can be applied to other tasks in the geospatial imagery field. Thank you for checking out the project, and feel free to explore the code and contribute to its development!
It generates patches with a probabilistic approach that tries to increase the covered area by distributing tiles more evenly, while avoiding images whose nodata percentage is above a given threshold. This makes it well suited to building deep learning datasets.
If the dataset is generated randomly, some zones end up with a large number of tiles, while with this approach you obtain a more evenly distributed dataset.
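To make the idea concrete, here is a minimal sketch of one way to place tiles so that poorly covered areas are preferred. It is not the library's actual algorithm: it swaps in a simple "best of several random candidates" heuristic, and the function name `place_tiles` and all of its parameters are hypothetical. With `n_candidates=1` it reduces to plain random sampling.

```python
import numpy as np


def place_tiles(valid_mask, size, n_tiles, max_nodata=0.3, n_candidates=8, seed=0):
    """Place square tiles, preferring the least-covered candidate each time.

    Hypothetical helper, not part of the library.
    valid_mask: 2D bool array, True where the raster has data.
    Returns (corners, coverage): a list of (row, col) top-left corners and
    a 2D int array counting how many tiles cover each pixel.
    """
    rng = np.random.default_rng(seed)
    height, width = valid_mask.shape
    coverage = np.zeros((height, width), dtype=np.int32)
    corners = []
    attempts = 0
    max_attempts = 100 * n_tiles  # guard against masks with no usable window
    while len(corners) < n_tiles and attempts < max_attempts:
        attempts += 1
        best = None
        for _ in range(n_candidates):
            r = int(rng.integers(0, height - size + 1))
            c = int(rng.integers(0, width - size + 1))
            # Skip candidates with too much nodata.
            nodata_fraction = 1.0 - valid_mask[r:r + size, c:c + size].mean()
            if nodata_fraction > max_nodata:
                continue
            # Lower mean coverage means the area is less sampled so far.
            score = coverage[r:r + size, c:c + size].mean()
            if best is None or score < best[0]:
                best = (score, r, c)
        if best is None:
            continue  # every candidate was rejected; draw a new batch
        _, r, c = best
        coverage[r:r + size, c:c + size] += 1
        corners.append((r, c))
    return corners, coverage
```

The returned `coverage` array can be inspected to see how evenly the tiles are spread over the input.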
With an input of shape:
How are the tiles distributed spatially? (Scales are not normalized.)
Iterations | RANDchete (random) | MAXchete (maximize) |
---|---|---|
100 iter | ||
1000 iter | ||
std (1000 iter) | ± 9.1 | ± 3.9 |
Compared to the random sample, the tiles are now more evenly distributed: the standard deviation drops from ± 9.1 to ± 3.9, a reduction of more than a factor of two, which shows the library's effectiveness.
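As a rough illustration of how such a spread figure could be computed, the sketch below counts how many tiles cover each pixel and takes the standard deviation of those counts. This assumes the reported value is a per-pixel coverage statistic, which the table does not state; `coverage_map` and `coverage_std` are hypothetical helpers, not library functions.

```python
import numpy as np


def coverage_map(shape, corners, size):
    """Count how many square tiles of side `size` cover each pixel.

    Hypothetical helper: `corners` is a list of (row, col) top-left corners.
    """
    coverage = np.zeros(shape, dtype=np.int32)
    for r, c in corners:
        coverage[r:r + size, c:c + size] += 1
    return coverage


def coverage_std(coverage):
    """Standard deviation of the per-pixel tile counts (lower is more even)."""
    return float(coverage.std())


# Four tiles laid out as a perfect grid versus four tiles stacked in one spot.
even = coverage_map((4, 4), [(0, 0), (0, 2), (2, 0), (2, 2)], size=2)
stacked = coverage_map((4, 4), [(0, 0)] * 4, size=2)
```

In this toy case the grid layout covers every pixel exactly once, so its standard deviation is zero, while the stacked layout concentrates all coverage in one corner and scores higher.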
There are three ways of creating your dataset:
- randchete -> Random approach
- seqchete -> Sequential approach
- maxchete -> Distribution approach
Just instantiate the class and machete the data!
from mapchete import FARMchete
maxchete = FARMchete(input_file).get("maxchete")
maxchete.plot_bands()
# avg_density: the average number of output images in which a given pixel of the input image appears.
maxchete.get_rasters(avg_density=4, size=512, no_data_percentage=0.3, output_path="raster_clip", clear_output_path=True)
fig, ax = maxchete.get_3Ddistribution()
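The relationship between avg_density and the number of tiles can be estimated with simple arithmetic. If each pixel should appear in avg_density tiles on average, the tiles together must cover avg_density times the input area. The sketch below assumes that reading and ignores nodata rejection and edge effects; `estimate_tile_count` is a hypothetical helper, and the library may compute its tile count differently.

```python
import math


def estimate_tile_count(height, width, size, avg_density):
    """Rough number of size x size tiles needed so that each pixel of a
    height x width image is covered avg_density times on average.

    Hypothetical helper: ignores nodata rejection and edge effects.
    """
    return math.ceil(avg_density * height * width / (size * size))


# A 4096 x 4096 input with the parameters used above (size=512, avg_density=4).
n_tiles = estimate_tile_count(4096, 4096, size=512, avg_density=4)
```

For that example, a single non-overlapping pass needs 64 tiles, so a density of 4 corresponds to roughly 256 tiles.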
There is another useful function, merge_tiffs, which merges the generated images so you can check the distribution of tiles. This is most useful with lower sampling; with higher sampling, the algorithm should be able to cover the complete extent of the original image.
from mapchete import merge_tiffs
merge_tiffs(folder="raster_clip")
At the moment there is no suitable distribution on PyPI, so installation must be done through setup.py:
git clone https://github.com/abetatos/mapchete.git
cd mapchete
python setup.py install
On older versions of Python (<=3.7), rasterio can be troublesome to install, so newer versions (>3.7) or conda are recommended for the installation. The rasterio version is not pinned in the requirements, to give users more flexibility, but the library was tested with version 1.2.10.