From c6d02b1f1793beebc351749221979defb9d430c7 Mon Sep 17 00:00:00 2001 From: Jakub Nowosad Date: Wed, 25 Sep 2024 11:25:54 +0200 Subject: [PATCH 1/2] improves ch10 --- 10-gis.Rmd | 120 ++++++++++++++++++++++++++--------------------------- _10-ex.Rmd | 10 +++-- 2 files changed, 66 insertions(+), 64 deletions(-) diff --git a/10-gis.Rmd b/10-gis.Rmd index ec242c9e8..aa40fa434 100644 --- a/10-gis.Rmd +++ b/10-gis.Rmd @@ -41,8 +41,8 @@ A defining feature of [interpreted](https://en.wikipedia.org/wiki/Interpreter_(c rather than relying on pointing and clicking on different parts of a screen, you type commands into the console and execute them with the `Enter` key. A common and effective workflow when using interactive development environments such as RStudio or VS Code is to type code into source files in a source editor and control interactive execution of the code with a shortcut such as `Ctrl+Enter`. -CLIs are not unique to R: most early computing environments relied on a command line 'shell' and it was only after the invention and widespread adoption of the computer mouse in the 1990s that graphical user interfaces (GUIs)\index{graphical user interface} became common. -GRASS GIS the longest-standing continuously-developed open source GIS\index{GIS} software, for example, relied on its command line interface before it gained a GUI [@landa_new_2008]. +Command line interfaces (CLIs) are not unique to R: most early computing environments relied on a command line 'shell' and it was only after the invention and widespread adoption of the computer mouse in the 1990s that graphical user interfaces (GUIs)\index{graphical user interface} became common. +GRASS GIS the longest-standing continuously developed open source GIS\index{GIS} software, for example, relied on its CLI before it gained a GUI [@landa_new_2008]. Most popular GIS software projects are GUI-driven. You *can* interact with QGIS\index{QGIS}, SAGA\index{SAGA}, GRASS GIS\index{GRASS GIS} and gvSIG from system terminals and embedded CLIs, but their design encourages most people to interact with them by 'pointing and clicking'. An unintended consequence of this is that most GIS users miss out on the advantages of CLI-driven and scriptable approaches. @@ -50,11 +50,11 @@ According to the creator of the popular QGIS software [@sherman_desktop_2008]: > With the advent of 'modern' GIS software, most people want to point and click their way through life. That’s good, but there is a tremendous amount of flexibility and power waiting for you with the command line. Many times you can do something on the command line in a fraction of the time you can do it with a GUI. -The 'CLI vs GUI' debate does not have to be adverserial: both ways of working have advantages, depending on a range of factors including the task (with drawing new features being well suited to GUIs), the level of reproducibility desired, and the user's skillset. +The 'CLI vs GUI' debate does not have to be adversarial: both ways of working have advantages, depending on a range of factors including the task (with drawing new features being well suited to GUIs), the level of reproducibility desired, and the user's skillset. GRASS GIS is a good example of GIS software that is primarily based on a CLI but which also has a prominent GUI. Likewise, while R is focused on its CLI, IDEs such as RStudio provide a GUI for improving accessibility. -Software cannot be neatly categorised into CLI-based or GUI-based. -However, interactive command line interfaces have several important advantages in terms of: +Software cannot be neatly categorized into CLI or GUI-based. +However, interactive command-line interfaces have several important advantages in terms of: - Automating repetitive tasks - Enabling transparency and reproducibility @@ -79,12 +79,12 @@ IDEs such as RStudio and VS Code provide code auto-completion and other features ``` R is a natural choice for people wanting to build bridges between reproducible data analysis workflows and GIS because it *originated* as an interface language. -A key feature of R (and its predecessor S) is that it provides access to statistical algorithms in other languages (particularly FORTRAN\index{FORTRAN} and C), but from a powerful high level functional language with an intuitive REPL environment, which C and FORTRAN lacked [@chambers_extending_2016]. +A key feature of R (and its predecessor S) is that it provides access to statistical algorithms in other languages (particularly FORTRAN\index{FORTRAN} and C), but from a powerful high-level functional language with an intuitive REPL environment, which C and FORTRAN lacked [@chambers_extending_2016]. R continues this tradition with interfaces to numerous languages, notably C++\index{C++}. Although R was not designed as a command line GIS, its ability to interface with dedicated GISs gives it astonishing geospatial capabilities. -With GIS bridges, R can replicate more diverse workflows, with the additional reproducibility, scalability and productity benefits of controlling them from a programming environment and a consistent CLI. -Furthermore, R outperforms GISs in some areas of geocomputation\index{geocomputation}, including interactive/animated map making (see Chapter \@ref(adv-map)) and spatial statistical modeling (see Chapter \@ref(spatial-cv)). +With GIS bridges, R can replicate more diverse workflows, with the additional reproducibility, scalability and productivity benefits of controlling them from a programming environment and a consistent CLI. +Furthermore, R outperforms GISs in some areas of geocomputation\index{geocomputation}, including interactive/animated map-making (see Chapter \@ref(adv-map)) and spatial statistical modeling (see Chapter \@ref(spatial-cv)). This chapter focuses on 'bridges' to three mature open source GIS products, summarized in Table \@ref(tab:gis-comp): @@ -98,8 +98,8 @@ There have also been major developments in enabling open source GIS software to library(dplyr) d = tibble( "GIS" = c("QGIS", "SAGA", "GRASS GIS"), - "First release" = c("2002", "2004", "1982"), - "No. functions" = c(">1000", ">600", ">500"), + "First Release" = c("2002", "2004", "1982"), + "No. Functions" = c(">1000", ">600", ">500"), "Support" = c("hybrid", "hybrid", "hybrid") ) knitr::kable( @@ -121,7 +121,7 @@ In addition to the three R-GIS bridges mentioned above, this chapter also provid QGIS\index{QGIS} is the most popular open-source GIS (Table \@ref(tab:gis-comp); @graser_processing_2015). QGIS provides a unified interface to QGIS's native geoalgorithms, GDAL, and --- when they are installed --- from other *providers* such as GRASS GIS\index{GRASS GIS}, and SAGA\index{SAGA} [@graser_processing_2015]. -Since version 3.14 (released in summer 2020), QGIS ships with the `qgis_process` command line utility for accessing a bounty of functionality for geocomputation. +Since version 3.14 (released in summer 2020), QGIS ships with the `qgis_process` command-line utility for accessing a bounty of functionality for geocomputation. `qgis_process` provides access to 300+ geoalgorithms in the standard QGIS installation and 1,000+ via plugins to external providers such as GRASS GIS and SAGA. The **qgisprocess** package\index{qgisprocess (package)} provides access to `qgis_process` from R. @@ -168,7 +168,7 @@ qgis_enable_plugins(c("grassprovider", "processing_saga_nextgen"), quiet = TRUE) ``` -Please note that aside from installing SAGA on your system you also need to install the QGIS Python plugin Processing Saga NextGen. +Please note that aside from installing SAGA on your system, you also need to install the QGIS Python plugin Processing Saga NextGen. You can do so from within QGIS with the [Plugin Manager](https://docs.qgis.org/latest/en/docs/training_manual/qgis_plugins/fetching_plugins.html) or programmatically with the help of the Python package [qgis-plugin-manager](https://github.com/3liz/qgis-plugin-manager) (at least on Linux). `qgis_providers()` lists the name of the software and the corresponding count of available geoalgorithms. @@ -207,7 +207,7 @@ incongr_wgs = st_transform(incongruent, "EPSG:4326") aggzone_wgs = st_transform(aggregating_zones, "EPSG:4326") ``` -```{r uniondata, echo=FALSE, fig.cap="Illustration of two areal units: incongruent (black lines) and aggregating zones (red borders).", fig.width=8} +```{r uniondata, echo=FALSE, fig.cap="Two areal units: incongruent (black lines) and aggregating zones (red borders).", fig.width=8} #| message: FALSE #| results: hide library(tmap) @@ -246,7 +246,7 @@ qgis_search_algorithms("union") One of the algorithms on the above list, `"native:union"`, sounds promising. The next step is to find out what this algorithm does and how we can use it. This is the role of the `qgis_show_help()`, which returns a short summary of what the algorithm does, its arguments, and outputs.^[We can also extract some of information independently with `qgis_get_description()`, `qgis_get_argument_specs()`, and `qgis_get_output_specss()`.] -This makes it output rather long. +This makes its output rather long. The following command returns a data frame with each row representing an argument required by `"native:union"` and columns with the name, description, type, default value, available values, and acceptable values associated with each: ```{r 09-gis-6, eval=FALSE, linewidth=80} @@ -278,8 +278,8 @@ This can be very convenient, but we recommend providing the path to your spatial This can increase algorithm runtimes. The main function of **qgisprocess** is `qgis_run_algorithm()`, which sends inputs to QGIS and returns the outputs. -It accepts the algorithm name and a set of named arguments shown in the help list, and performs expected calculations. -In our case, three arguments seem important - `INPUT`, `OVERLAY`, and `OUTPUT`. +It accepts the algorithm name and a set of named arguments shown in the help list, and it performs expected calculations. +In our case, three arguments seem important: `INPUT`, `OVERLAY`, and `OUTPUT`. The first one, `INPUT`, is our main vector object `incongr_wgs`, while the second one, `OVERLAY`, is `aggzone_wgs`. The last argument, `OUTPUT`, is an output file name, which **qgisprocess** will automatically choose and create in `tempdir()` if none is provided. @@ -300,7 +300,7 @@ union_sf = st_as_sf(union) ``` Note that the QGIS\index{QGIS} union\index{vector!union} operation merges the two input layers into one layer by using the intersection\index{vector!intersection} and the symmetrical difference of the two input layers (which, by the way, is also the default when doing a union operation in GRASS GIS\index{GRASS GIS} and SAGA\index{SAGA}). -This is **not** the same as `st_union(incongr_wgs, aggzone_wgs)` (see Exercises)! +This is **not** the same as `st_union(incongr_wgs, aggzone_wgs)` (see the Exercises)! The result, `union_sf`, is a multipolygon with a larger number of features than two input objects. Notice, however, that many of these polygons are small and do not represent real areas but are rather a result of our two datasets having a different level of detail. @@ -325,15 +325,15 @@ The GRASS GIS provider in QGIS was called `grass7` until QGIS version 3.34. Thus, if you have an older QGIS version, you must prefix the algorithms with `grass7` instead of `grass`. ``` -Similarly to the previous step, we should start by looking at this algorithm's help. +Similar to the previous step, we should start by looking at this algorithm's help. ```{r, eval=FALSE} qgis_show_help("grass:v.clean") ``` -We have omitted the output here, because the help text is quite long and contains a lot of arguments.^[Also note that these arguments, contrary to the QGIS's ones, are in lower case.] -This is because `v.clean` is a multi tool -- it can clean different types of geometries and solve different types of topological problems. -For this example, let's focus on just a few arguments, however, we encourage you to visit [this algorithm's documentation](https://grass.osgeo.org/grass-stable/manuals/v.clean.html) to learn more about `v.clean` capabilities. +We have omitted the output here, because the help text is quite long and contains a lot of arguments.^[Also note that these arguments, contrary to the QGIS's ones, are in lowercase.] +This is because `v.clean` is a multi-tool -- it can clean different types of geometries and solve different types of topological problems. +For this example, let's focus on just a few arguments, however, we encourage you to visit this [algorithm's documentation](https://grass.osgeo.org/grass-stable/manuals/v.clean.html) to learn more about `v.clean` capabilities. ```{r, eval=FALSE} qgis_get_argument_specs("grass:v.clean") |> @@ -366,7 +366,7 @@ clean = qgis_run_algorithm("grass:v.clean", clean_sf = st_as_sf(clean) ``` -The result, the right panel of \@ref(fig:sliver), looks as expected -- sliver polygons are now removed. +The result, the right panel of Figure \@ref(fig:sliver), looks as expected -- sliver polygons are now removed. ```{r sliver, echo=FALSE, fig.cap="Sliver polygons colored in red (left panel). Cleaned polygons (right panel)."} knitr::include_graphics("images/10-sliver.png") @@ -377,10 +377,10 @@ knitr::include_graphics("images/10-sliver.png") Digital elevation models (DEMs)\index{digital elevation model} contain elevation information for each raster cell. They are used for many purposes, including satellite navigation, water flow models, surface analysis, or visualization. Here, we are interested in deriving new information from a DEM raster that could be used as predictors for statistical learning. -Various terrain parameters can be helpful, for example, for the prediction of landslides (see Chapter \@ref(spatial-cv)) +Various terrain parameters can be helpful, for example, for the prediction of landslides (see Chapter \@ref(spatial-cv)). For this section, we will use `dem.tif` -- a digital elevation model of the Mongón study area (downloaded from the Land Process Distributed Active Archive Center, see also `?dem.tif`). -It has a resolution of about 30 by 30 meters and uses a projected CRS. +It has a resolution of about 30 x 30 meters and uses a projected CRS. ```{r, eval=FALSE, message=FALSE} library(qgisprocess) @@ -419,8 +419,8 @@ Therefore, we only have to specify one argument -- the input `DEM`. Of course, when applying this algorithm you should make sure that the parameter values are in correspondence with your study aim.^[The additional arguments of `"sagang:sagawetnessindex"` are well-explained at https://gis.stackexchange.com/a/323454/20955.] Before running the SAGA algorithm from within QGIS, we change the default raster output format from `.tif` to SAGA's native raster format `.sdat`. -Hence, all output rasters we do not specify ourselves will from now on be written to the `.sdat` format. -Depending on the software versions (SAGA, GDAL) you are using, this might not be necessary but often enough this will save you trouble when trying to read in output rasters created with SAGA. +Hence, all output rasters that we do not specify ourselves will from now on be written to the `.sdat` format. +Depending on the software versions (SAGA, GDAL) you are using, this might not be necessary, but often enough this will save you trouble when trying to read-in output rasters created with SAGA. ```{r, eval=FALSE} options(qgisprocess.tmp_raster_ext = ".sdat") @@ -439,10 +439,10 @@ dem_wetness_twi = qgis_as_terra(dem_wetness$TWI) options(qgisprocess.tmp_raster_ext = ".tif") ``` -You can see the TWI map on the left panel of Figure \@ref(fig:qgis-raster-map). +You can see the TWI map in the left panel of Figure \@ref(fig:qgis-raster-map). The topographic wetness index is unitless: its low values represent areas that will not accumulate water, while higher values show areas that will accumulate water at increasing levels. -Information from digital elevation models can also be categorized, for example, to geomorphons\index{geomorphons} -- the geomorphological phenotypes consisting of 10 classes that represent terrain forms, such as slopes, ridges, or valleys [@jasiewicz_geomorphons_2013]. +Information from digital elevation models can also be categorized, for example, to geomorphons\index{geomorphons} -- the geomorphological phenotypes consisting of ten classes that represent terrain forms, such as slopes, ridges, or valleys [@jasiewicz_geomorphons_2013]. These phenotypes are used in many studies, including landslide susceptibility, ecosystem services, human mobility, and digital soil mapping. The original implementation of the geomorphons' algorithm was created in GRASS GIS, and we can find it in the **qgisprocess** list as `"grass:r.geomorphon"`: @@ -454,7 +454,7 @@ qgis_show_help("grass:r.geomorphon") # output not shown ``` -Calculation of geomorphons requires an input DEM (`elevation`), and can be customized with a set of optional arguments. +Calculation of geomorphons requires an input DEM (`elevation`) and can be customized with a set of optional arguments. It includes, `search` -- a length for which the line-of-sight is calculated, and ``-m`` -- a flag specifying that the search value will be provided in meters (and not the number of cells). More information about additional arguments can be found in the original paper and the [GRASS GIS documentation](https://grass.osgeo.org/grass-stable/manuals/r.geomorphon.html). @@ -465,7 +465,7 @@ dem_geomorph = qgis_run_algorithm("grass:r.geomorphon", ) ``` -Our output, `dem_geomorph$forms`, contains a raster file with 10 categories -- each one representing a terrain form. +Our output, `dem_geomorph$forms`, contains a raster file with ten categories -- each representing a terrain form. We can read it into R with `qgis_as_terra()`, and then visualize it (Figure \@ref(fig:qgis-raster-map), right panel) or use it in our subsequent calculations. ```{r, eval=FALSE} @@ -481,11 +481,11 @@ knitr::include_graphics("images/10-qgis-raster-map.png") ## SAGA {#saga} -The System for Automated Geoscientific Analyses (SAGA\index{SAGA}; Table \@ref(tab:gis-comp)) provides the possibility to execute SAGA modules via the command line interface\index{command line interface} (`saga_cmd.exe` under Windows and just `saga_cmd` under Linux) (see the [SAGA wiki on modules](https://sourceforge.net/p/saga-gis/wiki/Executing%20Modules%20with%20SAGA%20CMD/)). +The System for Automated Geoscientific Analyses (SAGA\index{SAGA}; Table \@ref(tab:gis-comp)) provides the possibility to execute SAGA modules via the command-line interface\index{command line interface} (`saga_cmd.exe` under Windows and just `saga_cmd` under Linux) (see the [SAGA wiki on modules](https://sourceforge.net/p/saga-gis/wiki/Executing%20Modules%20with%20SAGA%20CMD/)). In addition, there is a Python interface (SAGA Python API\index{API}). **Rsagacmd**\index{Rsagacmd (package)} uses the former to run SAGA\index{SAGA} from within R. -We will use **Rsagacmd** in this section to delineate areas with similar values of the normalized difference vegetation index (NDVI) of the Mongón study area in Peru from the 22nd of September 2000 (Figure \@ref(fig:sagasegments), left panel) by using a seeded region growing algorithm from SAGA\index{segmentation}.^[Read Section \@ref(local-operations) on details of how to calculate NDVI from a remote sensing image.] +We will use **Rsagacmd** in this section to delineate areas with similar values of the normalized difference vegetation index (NDVI) of the Mongón study area in Peru from the September 22, 2000 (Figure \@ref(fig:sagasegments), left panel) by using a seeded region growing algorithm from SAGA\index{segmentation}.^[Read Section \@ref(local-operations) on details of how to calculate NDVI from a remote-sensing image.] ```{r, eval=FALSE} ndvi = rast(system.file("raster/ndvi.tif", package = "spDataLarge")) @@ -529,7 +529,7 @@ Our output is a list of three objects: `variance` -- a raster map of local varia The second SAGA tool we use is `seeded_region_growing`.^[You can read more about the tool at https://saga-gis.sourceforge.io/saga_tool_doc/8.3.0/imagery_segmentation_3.html.] The `seeded_region_growing` tool requires two inputs: our `seed_grid` calculated in the previous step and the `ndvi` raster object. -Additionally, we can specify several parameters, such as `normalize` to standardize the input features, `neighbour` (4 or 8-neighborhood), and `method`. +Additionally, we can specify several parameters, such as `normalize` to standardize the input features, `neighbour` (4- or 8-neighborhood), and `method`. The last parameter can be set to either `0` or `1` (region growing is based on raster cells' values and their positions or just the values). For a more detailed description of the method, see @bohner_image_2006. @@ -558,16 +558,16 @@ knitr::include_graphics("images/10-saga-segments.png") The resulting polygons (segments) represent areas with similar values. They can also be further aggregated into larger polygons using various techniques, such as clustering (e.g., *k*-means), regionalization (e.g., SKATER) or supervised classification methods. -You can try to do it in Exercises. +You can try to do it in the Exercises. R also has other tools to achieve the goal of creating polygons with similar values (so-called segments). -It includes the **SegOptim** package [@goncalves_segoptim_2019] that allows running several image segmentation algorithms and **supercells** [@nowosad_extended_2022] that implements superpixels\index{segmentation!superpixels} algorithm SLIC to work with geospatial data. +It includes the **SegOptim** package [@goncalves_segoptim_2019] that allows running several image segmentation algorithms and **supercells** package [@nowosad_extended_2022] that implements superpixels\index{segmentation!superpixels} algorithm SLIC to work with geospatial data. ## GRASS GIS {#grass} The U.S. Army - Construction Engineering Research Laboratory (USA-CERL) created the core of the Geographical Resources Analysis Support System (GRASS GIS)\index{GRASS GIS} (Table \@ref(tab:gis-comp); @neteler_open_2008) from 1982 to 1995. Academia continued this work since 1997. -Similar to SAGA\index{SAGA}, GRASS GIS focused on raster processing in the beginning while only later, since GRASS GIS 6.0, adding advanced vector functionality [@bivand_applied_2013]. +Similar to SAGA\index{SAGA}, GRASS GIS focused on raster processing in the beginning while, only later since GRASS GIS 6.0, adding advanced vector functionality [@bivand_applied_2013]. GRASS GIS stores the input data in an internal database. With regard to vector data, GRASS GIS is by default a topological GIS, i.e., it only stores the geometry of adjacent features once. @@ -586,10 +586,10 @@ To quickly use GRASS GIS from within R, we will use the **link2GI** package, how See [GRASS within R](https://grasswiki.osgeo.org/wiki/R_statistics/rgrass#GRASS_within_R) for how to do so. Please note that the code instructions in the following paragraphs might be hard to follow when using GRASS GIS for the first time but by running through the code line-by-line and by examining the intermediate results, the reasoning behind it should become even clearer. -Here, we introduce **rgrass**\index{rgrass (package)} with one of the most interesting problems in GIScience - the traveling salesman problem\index{traveling salesman}. +Here, we introduce **rgrass**\index{rgrass (package)} with one of the most interesting problems in GIScience: the traveling salesman problem\index{traveling salesman}. Suppose a traveling salesman would like to visit 24 customers. -Additionally, the salesman would like to start and finish his journey at home which makes a total of 25 locations while covering the shortest distance possible. -There is a single best solution to this problem; however, to check all of the possible solutions it is (mostly) impossible for modern computers [@longley_geographic_2015]. +Additionally, the salesman would like to start and finish the journey at home which makes a total of 25 locations while covering the shortest distance possible. +There is a single best solution to this problem; however, to check all of the possible solutions, it is (mostly) impossible for modern computers [@longley_geographic_2015]. In our case, the number of possible solutions correspond to `(25 - 1)! / 2`, i.e., the factorial of 24 divided by 2 (since we do not differentiate between forward or backward direction). Even if one iteration can be done in a nanosecond, this still corresponds to `r format(factorial(25 - 1) / (2 * 10^9 * 3600 * 24 * 365))` years. Luckily, there are clever, almost optimal solutions which run in a tiny fraction of this inconceivable amount of time. @@ -602,7 +602,7 @@ points = cycle_hire[1:25, ] ``` Aside from the cycle hire points data, we need a street network for this area. -We can download it with from OpenStreetMap\index{OpenStreetMap} with the help of the **osmdata** \index{osmdata (package)} package (see also Section \@ref(retrieving-data)). +We can download it from OpenStreetMap\index{OpenStreetMap} with the help of the **osmdata** \index{osmdata (package)} package (see also Section \@ref(retrieving-data)). To do this, we constrain the query of the street network (in OSM language called "highway") to the bounding box\index{bounding box} of `points`, and attach the corresponding data as an `sf`-object\index{sf}. `osmdata_sf()` returns a list with several spatial objects (points, lines, polygons, etc.), but here, we only keep the line objects with their related ids.^[As a convenience to the reader, one can attach `london_streets` to the global environment using `data("london_streets", package = "spDataLarge")`.] @@ -641,7 +641,7 @@ write_VECT(terra::vect(points[, 1]), vname = "points") The **rgrass** package expects its inputs and gives its outputs as **terra** objects. Therefore, we need to convert our `sf` spatial vectors to **terra**'s `SpatVector`s using the `vect()` function to be able to use `write_VECT()`.^[You can learn more how to convert between spatial classes in R by reading the (Conversions between different spatial classes in R)[https://geocompx.org/post/2021/spatial-classes-conversion/] blog post and the -(Coercion between object formats)[https://CRAN.R-project.org/package=rgrass/vignettes/coerce.html] vignette] +(Coercion between object formats)[https://CRAN.R-project.org/package=rgrass/vignettes/coerce.html] vignette.] Now, both datasets exist in the GRASS GIS database. To perform our network\index{network} analysis, we need a topologically clean street network\index{topology cleaning}. @@ -656,7 +656,7 @@ execGRASS( ``` ```{block2 06-raster-vector-22, type='rmdnote'} -To learn about the possible arguments and flags of the GRASS GIS modules you can you the `help` flag. +To learn about the possible arguments and flags of the GRASS GIS modules, you can use the `help` flag. For example, try `execGRASS("g.region", flags = "help")`. ``` @@ -722,38 +722,38 @@ To find out which datasets are currently available, run `execGRASS("g.list", typ Prior to importing data into R, you might want to perform some (spatial) subsetting\index{vector!subsetting}. Use `"v.select"` and `"v.extract"` for vector data. `"db.select"` lets you select subsets of the attribute table of a vector layer without returning the corresponding geometry. -- You can also start R from within a running GRASS GIS\index{GRASS GIS} session [for more information please refer to @bivand_applied_2013]. +- You can also start R from within a running GRASS GIS\index{GRASS GIS} session [for more information, please refer to @bivand_applied_2013]. - Refer to the excellent [GRASS GIS online help](https://grass.osgeo.org/grass-stable/manuals/) or `execGRASS("g.manual", flags = "i")` for more information on each available GRASS GIS geoalgorithm\index{geoalgorithm}. ## When to use what? -To recommend a single R-GIS interface is hard since the usage depends on personal preferences, the tasks at hand and your familiarity with different GIS\index{GIS} software packages which in turn probably depends on your domain. -As mentioned previously, SAGA\index{SAGA} is especially good at the fast processing of large (high-resolution) raster\index{raster} datasets, and frequently used by hydrologists, climatologists and soil scientists [@conrad_system_2015]. +To recommend a single R-GIS interface is hard since the usage depends on personal preferences, the tasks at hand, and your familiarity with different GIS\index{GIS} software packages which in turn probably depends on your domain. +As mentioned previously, SAGA\index{SAGA} is especially good at the fast processing of large (high-resolution) raster\index{raster} datasets and frequently used by hydrologists, climatologists and soil scientists [@conrad_system_2015]. GRASS GIS\index{GRASS GIS}, on the other hand, is the only GIS presented here supporting a topologically based spatial database which is especially useful for network analyses but also simulation studies. QGISS\index{QGIS} is much more user-friendly compared to GRASS GIS and SAGA, especially for first-time GIS users, and probably the most popular open-source GIS. Therefore, **qgisprocess**\index{qgisprocess (package)} is an appropriate choice for most use cases. Its main advantages are: - A unified access to several GIS, and therefore the provision of >1000 geoalgorithms (Table \@ref(tab:gis-comp)) including duplicated functionality, e.g., you can perform overlay-operations using QGIS-\index{QGIS}, SAGA-\index{SAGA} or GRASS GIS-geoalgorithms\index{GRASS GIS} -- Automatic data format conversions (SAGA uses `.sdat` grid files and GRASS GIS uses its own database format but QGIS will handle the corresponding conversions) +- Automatic data format conversions (SAGA uses `.sdat` grid files and GRASS GIS uses its own database format, but QGIS will handle the corresponding conversions) - Its automatic passing of geographic R objects to QGIS geoalgorithms\index{geoalgorithm} and back into R - Convenience functions to support named arguments and automatic default value retrieval (as inspired by **rgrass**\index{rgrass (package)}) By all means, there are use cases when you certainly should use one of the other R-GIS bridges. -Though QGIS is the only GIS providing a unified interface to several GIS\index{GIS} software packages, it only provides access to a subset of the corresponding third-party geoalgorithms (for more information please refer to @muenchow_rqgis:_2017). +Though QGIS is the only GIS providing a unified interface to several GIS\index{GIS} software packages, it only provides access to a subset of the corresponding third-party geoalgorithms (for more information, please refer to @muenchow_rqgis:_2017). Therefore, to use the complete set of SAGA and GRASS GIS functions, stick with **Rsagacmd**\index{Rsagacmd (package)} and **rgrass**. In addition, if you would like to run simulations with the help of a geodatabase\index{spatial database} [@krug_clearing_2010], use **rgrass** directly since **qgisprocess** always starts a new GRASS GIS session for each call. Finally, if you need topological correct data and/or spatial database management functionality such as multi-user access, we recommend the usage of GRASS GIS. -Please note that there are a number of further GIS software packages that have a scripting interface but for which there is no dedicated R package that accesses these: gvSig, OpenJump, and the Orfeo Toolbox.^[Please note that **link2GI** provides a partial integration with the Orfeo Toolbox\index{Orfeo Toolbox} and that you can also access the Orfeo Toolbox geoalgorithms via **qgisprocess**. Note also that TauDEM\index{TauDEM} can be accessed from with R with package **traudem**.] +Please note that there are a number of further GIS software packages that have a scripting interface but for which there is no dedicated R package that accesses these: gvSig, OpenJump, and the Orfeo Toolbox.^[Please note that **link2GI** provides a partial integration with the Orfeo Toolbox\index{Orfeo Toolbox} and that you can also access the Orfeo Toolbox geoalgorithms via **qgisprocess**. Note also that TauDEM\index{TauDEM} can be accessed from R with package **traudem**.] ## Bridges to GDAL {#gdal} As discussed in Chapter \@ref(read-write), GDAL\index{GDAL} is a low-level library that supports many geographic data formats. -GDAL is so effective that most GIS programs use GDAL\index{GDAL} in the background for importing and exporting geographic data, rather than re-inventing the wheel and using bespoke read-write code. +GDAL is so effective that most GIS programs use GDAL\index{GDAL} in the background for importing and exporting geographic data, rather than reinventing the wheel and using bespoke read-write code. But GDAL\index{GDAL} offers more than data I/O. It has [geoprocessing tools](https://gdal.org/programs/index.html) for vector and raster data, functionality to create [tiles](https://gdal.org/programs/gdal2tiles.html#gdal2tiles) for serving raster data online, and rapid [rasterization](https://gdal.org/programs/gdal_rasterize.html#gdal-rasterize) of vector data. -Since GDAL is a command line tool, all its commands can be accessed from within R via the `system()` command. +Since GDAL is a command-line tool, all its commands can be accessed from within R via the `system()` command. The code chunk below demonstrates this functionality: `linkGDAL()` searches the computer for a working GDAL\index{GDAL} installation and adds the location of the executable files to the PATH variable, allowing GDAL to be called (usually only needed under Windows). @@ -786,13 +786,13 @@ Other commonly used GDAL tools include: - `gdalinfo`: provides metadata of a raster dataset - `gdal_translate`: converts between different raster file formats - `ogr2ogr`: converts between different vector file formats -- `gdalwarp`: reprojects, transform, and clip raster datasets +- `gdalwarp`: reprojects, transforms, and clips raster datasets - `gdaltransform`: transforms coordinates Visit https://gdal.org/programs/ to see the complete list of GDAL tools and to read their help files. The 'link' to GDAL provided by **link2GI** could be used as a foundation for doing more advanced GDAL work from the R or system CLI. -TauDEM (https://hydrology.usu.edu/taudem/) and the Orfeo Toolbox (https://www.orfeo-toolbox.org/) are other spatial data processing libraries/programs offering a command line interface -- the above example shows how to access these libraries from the system command line via R. +TauDEM (https://hydrology.usu.edu/taudem/) and the Orfeo Toolbox (https://www.orfeo-toolbox.org/) are other spatial data processing libraries/programs offering a command-line interface -- the above example shows how to access these libraries from the system command line via R. This in turn could be the starting point for creating a proper interface to these libraries in the form of new R packages. Before diving into a project to create a new bridge, however, it is important to be aware of the power of existing R packages and that `system()` calls may not be platform-independent (they may fail on some computers). @@ -801,13 +801,13 @@ On the other hand, **sf** and **terra** brings most of the power provided by GDA ## Bridges to spatial databases {#postgis} \index{spatial database} -Spatial database management systems (spatial DBMS) store spatial and non-spatial data in a structured way. +Spatial database management systems (spatial DBMSs) store spatial and non-spatial data in a structured way. They can organize large collections of data into related tables (entities) via unique identifiers (primary and foreign keys) and implicitly via space (think for instance of a spatial join). This is useful because geographic datasets tend to become big and messy quite quickly. Databases enable storing and querying large datasets efficiently based on spatial and non-spatial fields, and provide multi-user access and topology\index{topological relations} support. The most important open source spatial database\index{spatial database} is PostGIS\index{PostGIS} [@obe_postgis_2015].^[ -SQLite/SpatiaLite are certainly also important but implicitly we have already introduced this approach since GRASS GIS\index{GRASS GIS} is using SQLite in the background (see Section \@ref(grass)). +SQLite/SpatiaLite are certainly also important, but implicitly we have already introduced this approach since GRASS GIS\index{GRASS GIS} is using SQLite in the background (see Section \@ref(grass)). ] R bridges to spatial DBMSs such as PostGIS\index{PostGIS} are important, allowing access to huge data stores without loading several gigabytes of geographic data into RAM, and likely crashing the R session. The remainder of this section shows how PostGIS can be called from R, based on "Hello real world" from *PostGIS in Action, Second Edition* [@obe_postgis_2015].^[ @@ -889,7 +889,7 @@ In fact, function names of the **sf** package largely follow the PostGIS\index{P The prefix `st` stands for space/time. ] -The last query will find all Hardee's restaurants (`HDE`) within the 35 km buffer zone (Figure \@ref(fig:postgis)). +The last query will find all Hardee's restaurants (`HDE`) within the 35-km buffer zone (Figure \@ref(fig:postgis)). ```{r 09-gis-42, eval=FALSE, warning=FALSE} query = paste( @@ -948,7 +948,7 @@ PostgreSQL/PostGIS is a formidable choice as an open-source spatial database. But the same is true for the lightweight SQLite/SpatiaLite database engine and GRASS GIS\index{GRASS GIS} which uses SQLite in the background (see Section \@ref(grass)). If your datasets are too big for PostgreSQL/PostGIS and you require massive spatial data management and query performance, it may be worth exploring large-scale geographic querying on distributed computing systems. -Such systems are outside the scope of this book but it worth mentioning that open source software providing this functionality exists. +Such systems are outside the scope of this book, but it is worth mentioning that open source software providing this functionality exists. Prominent projects in this space include [GeoMesa](http://www.geomesa.org/) and [Apache Sedona](https://sedona.apache.org/). The [**apache.sedona**](https://cran.r-project.org/package=apache.sedona) package provides an interface to the latter. @@ -995,7 +995,7 @@ However, keep in mind that the availability of COGs is a big plus while browsing For larger areas of interest, requested images are still relatively difficult to work with: they may use different map projections, may spatially overlap, and their spatial resolution often depends on the spectral band. The **gdalcubes** package\index{gdalcubes (package)} [@appel_gdalcubes_2019] can be used to abstract from individual images and to create and process image collections as four-dimensional data cubes\index{data cube}. -The code below shows a minimal example to create a lower resolution (250m) maximum NDVI composite from the Sentinel-2 images returned by the previous STAC-API search. +The code below shows a minimal example to create a lower resolution (250 m) maximum NDVI composite from the Sentinel-2 images returned by the previous STAC-API search. ```{r 09-gdalcubes-example, eval=FALSE} library(gdalcubes) @@ -1025,7 +1025,7 @@ For more details, please refer to this [tutorial presented at OpenGeoHub summer The combination of STAC\index{STAC}, COGs\index{COG}, and data cubes\index{data cube} forms a cloud-native workflow to analyze (large) collections of satellite imagery in the cloud\index{cloud computing}. These tools already form a backbone, for example, of the **sits** package\index{sits (package)}, which allows land use and land cover classification of big Earth observation data in R. The package builds EO data cubes from image collections available in cloud services and performs land classification of data cubes using various machine and deep learning algorithms. -For more information about **sits** visit https://e-sensing.github.io/sitsbook/ or read the related article [@rs13132428]. +For more information about **sits**, visit https://e-sensing.github.io/sitsbook/ or read the related article [@rs13132428]. ### openEO @@ -1036,8 +1036,8 @@ Implementations are available for eight different backends (see https://hub.open Since the functionality and data availability differs among the backends, the **openeo** R package [@lahn_openeo_2021] dynamically loads available processes and collections from the connected backend. Afterwards, users can load image collections, apply and chain processes, submit jobs, and explore and plot results. -The following code will connect to the [openEO platform backend](https://openeo.cloud/), request available datasets, processes, and output formats, define a process graph to compute a maximum NDVI image from Sentinel-2 data, and finally executes the graph after logging in to the backend. -The openEO\index{OpenEO} platform backend includes a free tier and registration is possible from existing institutional or internet platform accounts. +The following code will connect to the [openEO platform backend](https://openeo.cloud/), request available datasets, processes, and output formats, define a process graph to compute a maximum NDVI image from Sentinel-2 data, and finally execute the graph after logging in to the backend. +The openEO\index{OpenEO} platform backend includes a free tier, and registration is possible from existing institutional or internet platform accounts. ```{r 09-openeo-example, eval=FALSE} library(openeo) diff --git a/_10-ex.Rmd b/_10-ex.Rmd index 628b7aeae..48ddc2ecb 100644 --- a/_10-ex.Rmd +++ b/_10-ex.Rmd @@ -4,7 +4,7 @@ library(terra) ``` -E1. Compute global solar irradiation for an area of `system.file("raster/dem.tif", package = "spDataLarge")` for March 21 at 11:00 AM using the `r.sun` GRASS GIS through **qgisprocess**. +E1. Compute global solar irradiation for an area of `system.file("raster/dem.tif", package = "spDataLarge")` for March 21 at 11:00 am using the `r.sun` GRASS GIS through **qgisprocess**. ```{r} library(qgisprocess) @@ -29,6 +29,7 @@ plot(gsi_dem) ``` + E2. Compute catchment area\index{catchment area} and catchment slope of `system.file("raster/dem.tif", package = "spDataLarge")` using **Rsagacmd**. ```{r} @@ -93,6 +94,7 @@ tmap_arrange(tm1, tm2) ``` + E4. Attach `data(random_points, package = "spDataLarge")` and read `system.file("raster/dem.tif", package = "spDataLarge")` into R. Select a point randomly from `random_points` and find all `dem` pixels that can be seen from this point (hint: viewshed\index{viewshed} can be calculated using GRASS GIS). Visualize your result. @@ -140,8 +142,8 @@ mapview(out, col = "white", map.type = "Esri.WorldImagery") + ``` -E5. Use `gdalinfo` via a system call for a raster\index{raster} file stored on disk of your choice. -What kind of information you can find there? +E5. Use `gdalinfo` via a system call for a raster\index{raster} file stored on a disk of your choice. +What kind of information can you find there? ```{r} link2GI::linkGDAL() @@ -176,7 +178,7 @@ plot(st_geometry(ca_highways)) ``` -E8. The `ndvi.tif` raster (`system.file("raster/ndvi.tif", package = "spDataLarge")`) contains NDVI calculated for the Mongón study area based on Landsat data from September 22nd, 2000. +E8. The `ndvi.tif` raster (`system.file("raster/ndvi.tif", package = "spDataLarge")`) contains NDVI calculated for the Mongón study area based on Landsat data from September 22, 2000. Use **rstac**, **gdalcubes**, and **terra** to download Sentinel-2 images for the same area from 2020-08-01 to 2020-10-31, calculate its NDVI, and then compare it with the results from `ndvi.tif`. From 5f28c13ebf42d6abb9642d7a8377595bb2ec5814 Mon Sep 17 00:00:00 2001 From: Jakub Nowosad Date: Wed, 25 Sep 2024 15:12:20 +0200 Subject: [PATCH 2/2] Update 10-gis.Rmd --- 10-gis.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/10-gis.Rmd b/10-gis.Rmd index e9e8cc50d..d3ad40667 100644 --- a/10-gis.Rmd +++ b/10-gis.Rmd @@ -485,7 +485,7 @@ The System for Automated Geoscientific Analyses (SAGA\index{SAGA}; Table \@ref(t In addition, there is a Python interface (SAGA Python API\index{API}). **Rsagacmd**\index{Rsagacmd (package)} uses the former to run SAGA\index{SAGA} from within R. -We will use **Rsagacmd** in this section to delineate areas with similar values of the normalized difference vegetation index (NDVI) of the Mongón study area in Peru from the September 22, 2000 (Figure \@ref(fig:sagasegments), left panel) by using a seeded region growing algorithm from SAGA\index{segmentation}.^[Read Section \@ref(local-operations) on details of how to calculate NDVI from a remote-sensing image.] +We will use **Rsagacmd** in this section to delineate areas with similar values of the normalized difference vegetation index (NDVI) of the Mongón study area in Peru from September in the year 2000 (Figure \@ref(fig:sagasegments), left panel) by using a seeded region growing algorithm from SAGA\index{segmentation}.^[Read Section \@ref(local-operations) on details of how to calculate NDVI from a remote-sensing image.] ```{r, eval=FALSE} ndvi = rast(system.file("raster/ndvi.tif", package = "spDataLarge"))