Library for cropping geospatial data into smaller pieces maximizing covered area

abetatos/mapchete

Cut your geospatial data into smaller pieces

MAPchete

Welcome to my GitHub project! This repository was created to help prepare geospatial data for deep learning. Specifically, the project focuses on efficiently cropping large datasets into smaller tiles, with the goal of generating a dataset with minimal overlap between tiles (too much overlap could reduce representativeness) and without nodata tiles. This preparation step matters for model performance and can be applied to other applications within the geospatial imagery field. Thank you for checking out my project, and feel free to explore the code and contribute to its development!

What does MAPchete have to offer?

It generates patches using a probabilistic approach that tries to increase the covered area by distributing images more efficiently, while avoiding images whose nodata percentage is above a given threshold. This makes it well suited to deep learning, where a well-distributed dataset helps you get the most out of your model!
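To make the nodata threshold concrete, here is a minimal sketch of such a filter. It assumes the tile has already been read into a NumPy array and that the threshold is a fraction between 0 and 1; `nodata_fraction` and `keep_tile` are hypothetical helper names and are not part of MAPchete.

```python
import numpy as np


def nodata_fraction(tile: np.ndarray, nodata: float) -> float:
    """Return the fraction of pixels in `tile` equal to the nodata value."""
    if np.isnan(nodata):
        # NaN never compares equal to itself, so it needs its own check.
        mask = np.isnan(tile)
    else:
        mask = tile == nodata
    return float(mask.mean())


def keep_tile(tile: np.ndarray, nodata: float, max_fraction: float = 0.3) -> bool:
    """Accept a tile only if its nodata share does not exceed the threshold."""
    return nodata_fraction(tile, nodata) <= max_fraction


tile = np.full((4, 4), 7.0)
tile[0, :] = -9999.0  # one row out of four is nodata
print(nodata_fraction(tile, -9999.0))  # 0.25
print(keep_tile(tile, -9999.0))        # True
```

A tile is rejected as soon as its nodata share rises above `max_fraction`, so lowering the threshold gives cleaner tiles at the cost of discarding more candidates near the data boundary.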

If we generate the dataset randomly, some zones end up with a large number of tiles, whereas this approach yields a more evenly distributed dataset.

With an input of shape:

mapchete_final

How is the spatial distribution of tiles? (Not normalized scales)

| Iterations | RANDchete (random) | MAXchete (maximize) |
|---|---|---|
| 100 iter | | |
| 1000 iter | image | image |
| std (1000 iter) | ± 9.1 | ± 3.9 |

Compared to the random sample, the data is now more evenly distributed: the standard deviation drops from ± 9.1 to ± 3.9, a reduction of more than a factor of two, which shows the library's effectiveness.
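The effect can be illustrated on a toy grid. The sketch below is not MAPchete's algorithm. It swaps in a simple "best of k candidates" heuristic as a stand-in: at each step it draws a few random tile origins and keeps the one whose window is currently least covered. The function name `place_tiles`, the grid size and the candidate count are all assumptions chosen for illustration, and the numbers it prints are not the ones reported in the table.

```python
import numpy as np


def place_tiles(shape, size, n_tiles, candidates, rng):
    """Place square tiles; each step keeps the least-covered of `candidates` random origins."""
    height, width = shape
    coverage = np.zeros(shape, dtype=np.int64)
    for _ in range(n_tiles):
        rows = rng.integers(0, height - size + 1, size=candidates)
        cols = rng.integers(0, width - size + 1, size=candidates)
        # Mean existing coverage under each candidate window.
        scores = [coverage[r:r + size, c:c + size].mean() for r, c in zip(rows, cols)]
        best = int(np.argmin(scores))
        r, c = rows[best], cols[best]
        coverage[r:r + size, c:c + size] += 1
    return coverage


# candidates=1 is plain random placement; candidates=8 favours empty regions.
random_cov = place_tiles((64, 64), 16, 64, candidates=1, rng=np.random.default_rng(0))
spread_cov = place_tiles((64, 64), 16, 64, candidates=8, rng=np.random.default_rng(0))
print("random std:", random_cov.std())
print("spread std:", spread_cov.std())
```

Both runs place the same number of tiles, so the mean coverage per pixel is identical; only the spread around that mean changes, which is what the standard deviation in the table measures.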

How does it work?

There are three ways of creating your dataset:

  • randchete -> Random approach
  • seqchete -> Sequential approach
  • maxchete -> Distribution approach
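One way to picture the sequential approach is a fixed-stride grid walked in row-major order. The sketch below is an assumption about what "sequential" means here, not seqchete's actual implementation; `sequential_origins` and its `stride` parameter are hypothetical names.

```python
def sequential_origins(height, width, size, stride=None):
    """Yield (row, col) top-left corners of size x size windows in row-major order."""
    stride = size if stride is None else stride
    if size > height or size > width:
        return  # the image is smaller than one tile, so nothing fits
    for row in range(0, height - size + 1, stride):
        for col in range(0, width - size + 1, stride):
            yield row, col


origins = list(sequential_origins(1024, 1536, 512))
print(origins)
# [(0, 0), (0, 512), (0, 1024), (512, 0), (512, 512), (512, 1024)]
```

With `stride` equal to `size` the tiles do not overlap, but any remainder on the right or bottom edge is left uncovered when the image size is not a multiple of the tile size.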

Just instantiate the class and machete the data!

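from mapchete import FARMchete

input_file = "path/to/raster.tif"  # placeholder: replace with the path to your own raster

# Get the distribution-based cropper and plot the bands of the input
maxchete = FARMchete(input_file).get("maxchete")
maxchete.plot_bands()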

mapchete_final

Run to get the tiles:

maxchete.get_rasters(avg_density=4, size=512, no_data_percentage=0.3, output_path="raster_clip", clear_output_path=True)
# avg_density: the number of output images in which a given pixel of the input image appears, on average.
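As a rough guide to what avg_density implies, the arithmetic below estimates how many tiles are needed for the total tile area to equal avg_density times the image area. This is a back-of-the-envelope sketch that ignores nodata regions and edge effects; `tiles_for_density` is a hypothetical helper, and the library may compute its tile count differently.

```python
import math


def tiles_for_density(height, width, size, avg_density):
    """Number of size x size tiles whose total area is avg_density times the image area."""
    return math.ceil(avg_density * height * width / (size * size))


# A 2048 x 2048 image holds 16 non-overlapping 512 x 512 tiles,
# so an average density of 4 needs 4 * 16 tiles.
print(tiles_for_density(2048, 2048, 512, 4))  # 64
```

Doubling `size` at a fixed density divides the tile count by four, which is worth keeping in mind when estimating disk usage for the output folder.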

Study the output

fig, ax = maxchete.get_3Ddistribution()

mapchete_final

There is another useful function, merge_tiffs, which merges the generated images so you can check the distribution of tiles. With lower sampling this is a useful tool for spotting gaps; with higher sampling, the algorithm should be able to reproduce the complete extent of the original image.

from mapchete import merge_tiffs
merge_tiffs(folder="raster_clip")

mapchete_final

Installation

At the moment there is no distribution on PyPI, so installation must be done through setup.py:

git clone https://github.com/abetatos/mapchete.git 
cd mapchete
python setup.py install

On older versions of Python (<=3.7), rasterio can be troublesome to install; it is recommended to use a newer version (>3.7) or conda for the installation. The rasterio version is not pinned in the requirements, to give the user more flexibility, but the library was tested with version 1.2.10.
