Merge pull request #70 from JFormoso/main

Directory clean up
ScienceCore · Jun 5, 2024 · dda615b
2 parents 2706958 + 85790df
Showing 79 changed files with 1,433 additions and 2,692 deletions.
diff --git a/book/.netrc b/book/.netrc
diff --git a/book/01_Open_Science/Open_Science_Intro.md b/book/01_Open_Science/Open_Science_Intro.md
@@ -0,0 +1,105 @@
+---
+jupyter:
+  jupytext:
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.3'
+      jupytext_version: 1.16.2
+  kernelspec:
+    display_name: base
+    language: python
+    name: python3
+---
+
+## About this tutorial
+
+This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all. 
+We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing.
+
+
+## What is Open Science
+
+"Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity."
+
+<!-- #region -->
+
+<img src="../assets/image165.png" width="800">
+
+### Availability of Open Science Resources:
+
+- Many open science resources already exist, including over 100 petabytes of openly available NASA data.
+- Tools and practices for collaboration and code development.
+
+### Outputs and Project Openness:
+
+- Choice between openness from project inception or at publication.
+- Making data, code, and results open.
+
+### Importance of Sharing and Impact:
+
+- Enhances the discoverability and accessibility of scientific processes and outputs.
+- Open methods enhance reproducibility.
+- Transparency and verifiability enhance accuracy.
+- Scrutiny of analytic decisions promotes trust.
+- Accessible data and collective efforts accelerate discoveries.
+- Open science fosters inclusion, diversity, equity, and accessibility (IDEA).
+- And much more.
+
+
+<img src="../assets/image377.jpg" width="800">
+
+<!-- #endregion -->
+
+## Why now
+
+- The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online.
+
+- Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation.
+
+- Scientific rigor and accuracy are bolstered when researchers validate their peers' findings. However, the lack of access to original data and code in scientific articles delays this process.
+
+<!-- #region -->
+## Where to start: Open Research Products
+
+Scientific knowledge, or research products, take the form of:
+
+<img src="../assets/image5.png" width="500">
+
+### What is data?
+
+Scientifically or technically relevant information that can be stored digitally and accessed electronically, such as:
+
+- Information produced by missions and experiments, including calibrations, coefficients, and documentation.
+- Information needed to validate scientific conclusions of peer-reviewed publications.
+- Metadata.
+
+### What is code?
+
+- General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software.
+- Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services.
+- Libraries – Generic tools that implement well-known algorithms or provide statistical analysis or visualization, and that are incorporated into other categories of software.
+- Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data.
+- Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding.
+- Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way.
+
+### What are results?
+
+Results capture the different research outputs of the scientific process. Publications are the most common type of result, but results can also include other types of products:
+
+- Peer-reviewed publications
+- Computational notebooks
+- Blog posts
+- Videos and podcasts
+- Social media posts
+- Conference abstracts and presentations
+- Forum discussions
+
+The products that others need in order to reproduce the findings are created throughout the scientific process. The products of research include data, code, analysis pipelines, papers, and more!
+
+
+<img src="../assets/image7.jpeg" width="800">
+
+
+
+<!-- #endregion -->
diff --git a/book/01_Open_Science/Open_Science_Intro_Slides.md b/book/01_Open_Science/Open_Science_Intro_Slides.md
@@ -0,0 +1,105 @@
+---
+jupyter:
+  jupytext:
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.3'
+      jupytext_version: 1.16.2
+  kernelspec:
+    display_name: base
+    language: python
+    name: python3
+---
+
+## About this tutorial
+
+This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all. 
+We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing.
+
+
+## What is Open Science
+
+"Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity."
+
+<!-- #region -->
+
+<img src="../assets/image165.png" width="800">
+
+### Availability of Open Science Resources:
+
+- Many open science resources already exist, including over 100 petabytes of openly available NASA data.
+- Tools and practices for collaboration and code development.
+
+### Outputs and Project Openness:
+
+- Choice between openness from project inception or at publication.
+- Making data, code, and results open.
+
+### Importance of Sharing and Impact:
+
+- Enhances the discoverability and accessibility of scientific processes and outputs.
+- Open methods enhance reproducibility.
+- Transparency and verifiability enhance accuracy.
+- Scrutiny of analytic decisions promotes trust.
+- Accessible data and collective efforts accelerate discoveries.
+- Open science fosters inclusion, diversity, equity, and accessibility (IDEA).
+- And much more.
+
+
+<img src="../assets/image377.jpg" width="800">
+
+<!-- #endregion -->
+
+## Why now
+
+- The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online.
+
+- Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation.
+
+- Scientific rigor and accuracy are bolstered when researchers validate their peers' findings. However, the lack of access to original data and code in scientific articles delays this process.
+
+<!-- #region -->
+## Where to start: Open Research Products
+
+Scientific knowledge, or research products, take the form of:
+
+<img src="../assets/image5.png" width="500">
+
+### What is data?
+
+Scientifically or technically relevant information that can be stored digitally and accessed electronically, such as:
+
+- Information produced by missions and experiments, including calibrations, coefficients, and documentation.
+- Information needed to validate scientific conclusions of peer-reviewed publications.
+- Metadata.
+
+### What is code?
+
+- General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software.
+- Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services.
+- Libraries – Generic tools that implement well-known algorithms or provide statistical analysis or visualization, and that are incorporated into other categories of software.
+- Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data.
+- Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding.
+- Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way.
+
+### What are results?
+
+Results capture the different research outputs of the scientific process. Publications are the most common type of result, but results can also include other types of products:
+
+- Peer-reviewed publications
+- Computational notebooks
+- Blog posts
+- Videos and podcasts
+- Social media posts
+- Conference abstracts and presentations
+- Forum discussions
+
+The products that others need in order to reproduce the findings are created throughout the scientific process. The products of research include data, code, analysis pipelines, papers, and more!
+
+
+<img src="../assets/image7.jpeg" width="800">
+
+
+
+<!-- #endregion -->
diff --git a/book/2_Selecting_an_AOI.py → ...patial_fundamentals/2_Selecting_an_AOI.md b/book/2_Selecting_an_AOI.py → ...patial_fundamentals/2_Selecting_an_AOI.md
@@ -1,26 +1,26 @@
-# ---
-# jupyter:
-#   jupytext:
-#     text_representation:
-#       extension: .py
-#       format_name: light
-#       format_version: '1.5'
-#       jupytext_version: 1.16.1
-#   kernelspec:
-#     display_name: nasa_topst
-#     language: python
-#     name: python3
-# ---
-
-# Selecting and modifying Areas of Interest (AOIs) is an important part of geospatial data analysis workflows. The Python ecosystem of libraries provides a number of ways to do this, some of which are explored and demonstrated in this notebook. In particular, we will demonstrate the following:
-# 1. How to specify AOIs in different ways
-# 2. How to use `geopandas` to load shapely geometries, visualize them, and perform operations such as `intersection`
-# 3. How to query data providers using the AOIs defined, and how query results can change based on the AOI
-# 4. How to perform windowing operations when reading raster data using `rasterio`
-#
-# These techniques can be used to effectively query cloud services for datasets over specific regions.
-
-# +
+---
+jupyter:
+  jupytext:
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.3'
+      jupytext_version: 1.16.2
+  kernelspec:
+    display_name: nasa_topst
+    language: python
+    name: python3
+---
+
+Selecting and modifying Areas of Interest (AOIs) is an important part of geospatial data analysis workflows. The Python ecosystem of libraries provides a number of ways to do this, some of which are explored and demonstrated in this notebook. In particular, we will demonstrate the following:
+1. How to specify AOIs in different ways
+2. How to use `geopandas` to load shapely geometries, visualize them, and perform operations such as `intersection`
+3. How to query data providers using the AOIs defined, and how query results can change based on the AOI
+4. How to perform windowing operations when reading raster data using `rasterio`
+
+These techniques can be used to effectively query cloud services for datasets over specific regions.
+
+```python
 # library to handle filepath operations
 from pathlib import Path
 
@@ -49,15 +49,17 @@
                   GDAL_HTTP_COOKIEJAR=Path('~/cookies.txt').expanduser()
                   )
 rio_env.__enter__()
-# -
+```
 
-# AOIs are represented as `vector` data, because they consist of specific `points` or `polygons` that mark the location of interest in a given coordinate reference system (CRS). For example, the city center of [Tokyo, Japan](https://en.wikipedia.org/wiki/Tokyo) can be specified by the latitude and longitude pair (35.689722, 139.692222) in the [WGS84 CRS](https://en.wikipedia.org/wiki/World_Geodetic_System). In Python, we use the popular `shapely` library to define vector shapes, as shown below:
+AOIs are represented as `vector` data, because they consist of specific `points` or `polygons` that mark the location of interest in a given coordinate reference system (CRS). For example, the city center of [Tokyo, Japan](https://en.wikipedia.org/wiki/Tokyo) can be specified by the latitude and longitude pair (35.689722, 139.692222) in the [WGS84 CRS](https://en.wikipedia.org/wiki/World_Geodetic_System). In Python, we use the popular `shapely` library to define vector shapes, as shown below:
 
+```python
 tokyo_point = Point(35.689722, 139.692222)
+```
 
-# The following code will generate an interactive plot; feel free to pan and zoom around!
+The following code will generate an interactive plot; feel free to pan and zoom around!
 
-# +
+```python
 m = folium.Map(location=(tokyo_point.x, tokyo_point.y), control_scale = True, zoom_start=8)
 radius = 50
 folium.CircleMarker(
@@ -74,17 +76,18 @@
 ).add_to(m)
 
 m
-# -
+```
 
-# AOIs can also take the form of bounds. Typically these are specified by four values: the minimum and maximum extents in the `x` and `y` directions. For rasterio, they are given in the format `(x_min, y_min, x_max, y_max)`, with values expressed in the local CRS.
+AOIs can also take the form of bounds. Typically these are specified by four values: the minimum and maximum extents in the `x` and `y` directions. For rasterio, they are given in the format `(x_min, y_min, x_max, y_max)`, with values expressed in the local CRS.
 
-# +
+```python
 marrakesh_polygon = Polygon.from_bounds(-8.18, 31.42, -7.68, 31.92)
 
 # We will load the polygon into a geopandas dataframe for ease of plotting
 gdf = gpd.GeoDataFrame({'geometry':[marrakesh_polygon]}, crs='epsg:4326')
+```
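The bounds above list `x` (longitude) before `y` (latitude), while `folium.Map` takes its `location` as latitude first, then longitude. As a minimal sketch of that ordering, not part of the original notebook, the hypothetical helper `bounds_center_lat_lon` below computes the center of a bounding box in the order folium expects, using plain Python only. The notebook itself passes a hand-picked center to `folium.Map` rather than a computed one.

```python
def bounds_center_lat_lon(bounds):
    """Return the (latitude, longitude) center of (x_min, y_min, x_max, y_max) bounds.

    Hypothetical helper for illustration; assumes x is longitude and y is latitude.
    """
    x_min, y_min, x_max, y_max = bounds
    center_lon = (x_min + x_max) / 2  # midpoint along x (longitude)
    center_lat = (y_min + y_max) / 2  # midpoint along y (latitude)
    # folium expects latitude first, so the axes are swapped on return
    return center_lat, center_lon


# Same bounds as the Marrakesh polygon in the notebook
marrakesh_bounds = (-8.18, 31.42, -7.68, 31.92)
center = bounds_center_lat_lon(marrakesh_bounds)
print(center)
```

The returned tuple could be passed as the `location` argument of a map if a computed center is preferred over a hand-picked one.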
 
-# +
+```python
 m = folium.Map(location=[31.62, -7.88], zoom_start=10)
 for _, row in gdf.iterrows():
     sim_geo = gpd.GeoSeries(row["geometry"]).simplify(tolerance=0.001)
@@ -94,13 +97,13 @@
     geo_j.add_to(m)
 
 m
-# -
+```
 
-# GeoPandas dataframes require a `geometry` column containing `shapely` shapes (`Points`, `Polygons`, etc.) and need a `CRS` to be specified in order to work and render correctly. In this example, we specify `EPSG:4326` as our CRS; this corresponds to the WGS84 system, which refers to locations on the globe using latitude and longitude values.
-#
-# Let's add another polygon to the above example, and also see how to calculate their intersection:
+GeoPandas dataframes require a `geometry` column containing `shapely` shapes (`Points`, `Polygons`, etc.) and need a `CRS` to be specified in order to work and render correctly. In this example, we specify `EPSG:4326` as our CRS; this corresponds to the WGS84 system, which refers to locations on the globe using latitude and longitude values.
 
-# +
+Let's add another polygon to the above example, and also see how to calculate their intersection:
+
+```python
 marrakesh_polygon = Polygon.from_bounds(-8.18, 31.42, -7.68, 31.92) # Original polygon
 marrakesh_polygon_2 = Polygon.from_bounds(-8.38, 31.22, -7.68, 31.52) # Arbitrary second overlapping polygon
 intersection_polygon = marrakesh_polygon.intersection(marrakesh_polygon_2) # Calculate the intersection of polygons
@@ -119,11 +122,11 @@
     geo_j.add_to(m)
 
 m
-# -
+```
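Because both polygons here are axis-aligned rectangles, their intersection can also be checked by hand. The sketch below uses a hypothetical helper, `bounds_intersection`, written in plain Python; it is not how `shapely` computes intersections in general, and it only applies to `(x_min, y_min, x_max, y_max)` boxes.

```python
def bounds_intersection(a, b):
    """Intersect two (x_min, y_min, x_max, y_max) boxes.

    Hypothetical helper for illustration. Returns None when the boxes do not
    overlap in area (boxes that only touch along an edge also return None).
    """
    x_min = max(a[0], b[0])  # the larger of the two left edges
    y_min = max(a[1], b[1])  # the larger of the two bottom edges
    x_max = min(a[2], b[2])  # the smaller of the two right edges
    y_max = min(a[3], b[3])  # the smaller of the two top edges
    if x_min >= x_max or y_min >= y_max:
        return None
    return (x_min, y_min, x_max, y_max)


# Same bounds as the two Marrakesh polygons in the notebook
first = (-8.18, 31.42, -7.68, 31.92)
second = (-8.38, 31.22, -7.68, 31.52)
overlap = bounds_intersection(first, second)
print(overlap)  # (-8.18, 31.42, -7.68, 31.52)
```

For these two rectangles, the result should agree with the `bounds` attribute of `intersection_polygon`.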
 
-# Let us now query a Distributed Active Archive Center (DAAC) for data over a new region. We will go over the details of the query in the next chapter; here we simply look at an example. First, let's view the region in a folium map:
+Let us now query a Distributed Active Archive Center (DAAC) for data over a new region. We will go over the details of the query in the next chapter; here we simply look at an example. First, let's view the region in a folium map:
 
-# +
+```python
 lake_mead_polygon = Polygon.from_bounds(-114.52, 36.11,-114.04, 36.48)
 
 # We will load the polygon into a geopandas dataframe for ease of plotting
@@ -137,8 +140,9 @@
     geo_j.add_to(m)
 
 m
+```
 
-# +
+```python
 # URL of CMR service
 STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
 
@@ -157,26 +161,34 @@
 }
 
 search = catalog.search(**opts)
-# -
+```
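The full `opts` dictionary is not visible in this diff, so the following is only a sketch of how the spatial and temporal parts of a STAC search are commonly expressed. It assumes the standard STAC API item-search fields `bbox` (west, south, east, north) and `datetime` (an RFC 3339 interval written as `start/end`). The helper name `stac_search_params` and the dates are hypothetical, and the notebook may pass the AOI in a different way.

```python
from datetime import datetime, timezone


def stac_search_params(bounds, start, end):
    """Build the spatial and temporal parts of a STAC search request.

    Hypothetical helper for illustration; `bounds` is (x_min, y_min, x_max, y_max)
    in longitude/latitude, and `start`/`end` are timezone-aware UTC datetimes.
    """
    x_min, y_min, x_max, y_max = bounds
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamp in UTC
    return {
        "bbox": [x_min, y_min, x_max, y_max],
        "datetime": f"{start.strftime(fmt)}/{end.strftime(fmt)}",
    }


# Same bounds as the Lake Mead polygon; the date range is a placeholder
lake_mead_bounds = (-114.52, 36.11, -114.04, 36.48)
params = stac_search_params(
    lake_mead_bounds,
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 31, tzinfo=timezone.utc),
)
print(params)
```

Widening either the bounding box or the date interval generally increases the number of items a search returns.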
 
+```python
 print("Number of tiles found intersecting given polygon: ", len(list(search.items())))
+```
+
+How many search results did you get? What happens if you modify the date range in the previous cell and re-run the search? Note: if you make the time window too large, it will take a while for results to return. 
 
-# How many search results did you get? What happens if you modify the date range in the previous cell and re-run the search? Note: if you make the time window too large, it will take a while for results to return. 
-#
-# Lastly, let's visualize some of the returned data. Here's a sample search result; you can click on the keys to see the data they contain:
+Lastly, let's visualize some of the returned data. Here's a sample search result; you can click on the keys to see the data they contain:
 
+```python
 sample_result = list(search.items())[0]
 sample_result
+```
 
+```python
 data_url = sample_result.assets['0_B01_WTR'].href
+```
 
+```python
 with rasterio.open(data_url) as ds:
     img = ds.read(1)
     cmap = ds.colormap(1)
     profile = ds.profile
 cmap = ListedColormap([np.array(cmap[key]) / 255 for key in range(256)])
+```
 
-# +
+```python
 fig, ax = plt.subplots(1, 1, figsize=(10,  10))
 im = show(img, ax=ax, transform=profile['transform'], cmap=cmap, interpolation='none')
 
@@ -207,3 +219,4 @@
                      'HLS Cloud/Cloud Shadow', 
                     ],
                     fontsize=7)   
+```
diff --git a/book/remote-sensing.md → ...Geospatial_fundamentals/remote-sensing.md b/book/remote-sensing.md → ...Geospatial_fundamentals/remote-sensing.md