Merge pull request #70 from JFormoso/main
Directory clean up
JFormoso committed Jun 5, 2024
2 parents 2706958 + 85790df commit dda615b
Showing 79 changed files with 1,433 additions and 2,692 deletions.
1 change: 0 additions & 1 deletion book/.netrc

This file was deleted.

105 changes: 105 additions & 0 deletions book/01_Open_Science/Open_Science_Intro.md
@@ -0,0 +1,105 @@
---
jupyter:
jupytext:
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.2
kernelspec:
display_name: base
language: python
name: python3
---

## About this tutorial

This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all.
We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing.


## What is Open Science

"Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity."

<!-- #region -->

<img src="../assets/image165.png" width="800">

### Availability of Open Science Resources:

- Many open science resources already exist, including over 100 petabytes of openly available NASA data.
- Tools and practices for collaboration and code development.

### Outputs and Project Openness:

- Choice between openness from project inception or at publication.
- Making data, code, and results open.

### Importance of Sharing and Impact:

- Enhances the discoverability and accessibility of scientific processes and outputs.
- Open methods enhance reproducibility.
- Transparency and verifiability enhance accuracy.
- Scrutiny of analytic decisions promotes trust.
- Accessible data and collective efforts accelerate discoveries.
- Open science fosters inclusion, diversity, equity, and accessibility (IDEA).
- And much more.


<img src="../assets/image377.jpg" width="800">

<!-- #endregion -->

## Why now

- The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online.

- Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation.

- Scientific rigor and accuracy are bolstered when researchers validate their peers' findings. However, the lack of access to original data and code in scientific articles delays this process.

<!-- #region -->
## Where to start: Open Research Products

Scientific knowledge, or research products, take the form of:

<img src="../assets/image5.png" width="500">

### What is data?

Scientifically or technically relevant information that can be stored digitally and accessed electronically such as:

- Information produced by missions and experiments, including calibrations, coefficients, and documentation.
- Information needed to validate scientific conclusions of peer-reviewed publications.
- Metadata.

### What is code?

- General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software.
- Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services.
- Libraries – Generic tools that implement well-known algorithms, provide statistical analysis or visualization, and so on, and that are incorporated into software in the other categories.
- Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data.
- Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding.
- Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way.

### What are results?

Results capture the different research outputs of the scientific process. Publications are the most common type of results, but this can include a number of other types of products:

- Peer-reviewed publications
- Computational notebooks
- Blog posts
- Videos and podcasts
- Social media posts
- Conference abstracts and presentations
- Forum discussions

The products that others need in order to reproduce the findings are created throughout the scientific process. They include data, code, analysis pipelines, papers, and more!


<img src="../assets/image7.jpeg" width="800">



<!-- #endregion -->
105 changes: 105 additions & 0 deletions book/01_Open_Science/Open_Science_Intro_Slides.md
@@ -0,0 +1,105 @@
---
jupyter:
jupytext:
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.2
kernelspec:
display_name: base
language: python
name: python3
---

## About this tutorial

This tutorial is part of a project which focuses on leveraging the vast amount of Earth science data available through the NASA Earthdata Cloud to better understand and forecast environmental risks such as wildfire, drought, and floods. At its core, this project embodies the principles of open science, aiming to make data, methods, and findings accessible to all.
We aim to equip learners with the skills to analyze, visualize, and report on data related to these critical environmental risks through open science-based workflows and the use of cloud-based data computing.


## What is Open Science

"Open Science is the principle and practice of making research products and processes available to all, while respecting diverse cultures, maintaining security and privacy, and fostering collaborations, reproducibility, and equity."

<!-- #region -->

<img src="../assets/image165.png" width="800">

### Availability of Open Science Resources:

- Many open science resources already exist, including over 100 petabytes of openly available NASA data.
- Tools and practices for collaboration and code development.

### Outputs and Project Openness:

- Choice between openness from project inception or at publication.
- Making data, code, and results open.

### Importance of Sharing and Impact:

- Enhances the discoverability and accessibility of scientific processes and outputs.
- Open methods enhance reproducibility.
- Transparency and verifiability enhance accuracy.
- Scrutiny of analytic decisions promotes trust.
- Accessible data and collective efforts accelerate discoveries.
- Open science fosters inclusion, diversity, equity, and accessibility (IDEA).
- And much more.


<img src="../assets/image377.jpg" width="800">

<!-- #endregion -->

## Why now

- The internet offers numerous platforms for public hosting and free access to research and data. These platforms, coupled with advancements in computational power, empower individuals to engage in sophisticated data analysis. This connectivity facilitates the integration of participants, stakeholders, and outcomes of open science initiatives online.

- Science and science communication confront growing resistance from the public due to concerns about result reproducibility and the proliferation of misinformation. Open science practices address these challenges by leveraging community feedback to validate results more rigorously and by making findings readily accessible to the public, countering misinformation.

- Scientific rigor and accuracy are bolstered when researchers validate their peers' findings. However, the lack of access to original data and code in scientific articles delays this process.

<!-- #region -->
## Where to start: Open Research Products

Scientific knowledge, or research products, take the form of:

<img src="../assets/image5.png" width="500">

### What is data?

Scientifically or technically relevant information that can be stored digitally and accessed electronically such as:

- Information produced by missions and experiments, including calibrations, coefficients, and documentation.
- Information needed to validate scientific conclusions of peer-reviewed publications.
- Metadata.

### What is code?

- General Purpose Software – Software produced for widespread use, not specialized scientific purposes. This encompasses both commercial software and open-source software.
- Operational and Infrastructure Software – Software used by data centers and large information technology facilities to provide data services.
- Libraries – Generic tools that implement well-known algorithms, provide statistical analysis or visualization, and so on, and that are incorporated into software in the other categories.
- Modeling and Simulation Software – Software that either implements solutions to mathematical equations given input data and boundary conditions, or infers models from data.
- Analysis Software – Software developed to manipulate measurements or model results to visualize or gain understanding.
- Single-use Software – Software written for use in unique instances, such as making a plot for a paper, or manipulating data in a specific way.

### What are results?

Results capture the different research outputs of the scientific process. Publications are the most common type of results, but this can include a number of other types of products:

- Peer-reviewed publications
- Computational notebooks
- Blog posts
- Videos and podcasts
- Social media posts
- Conference abstracts and presentations
- Forum discussions

The products that others need in order to reproduce the findings are created throughout the scientific process. They include data, code, analysis pipelines, papers, and more!


<img src="../assets/image7.jpeg" width="800">



<!-- #endregion -->
@@ -1,26 +1,26 @@
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: light
# format_version: '1.5'
# jupytext_version: 1.16.1
# kernelspec:
# display_name: nasa_topst
# language: python
# name: python3
# ---

# Selecting and modifying Areas of Interest (AOIs) is an important part of geospatial data analysis workflows. The Python ecosystem of libraries provides a number of ways to do this, some of which will be explored and demonstrated in this notebook. In particular, we will demonstrate the following:
# 1. How to specify AOIs in different ways
# 2. How to use `geopandas` to load shapely geometries, visualize them, and perform operations such as `intersection`
# 3. Querying data providers using the AOIs defined, and understanding how query results can change based on AOI
# 4. Perform windowing operations when reading raster data using `rasterio`
#
# These techniques can be used to query cloud services effectively for datasets over specific regions.

# +
---
jupyter:
jupytext:
text_representation:
extension: .md
format_name: markdown
format_version: '1.3'
jupytext_version: 1.16.2
kernelspec:
display_name: nasa_topst
language: python
name: python3
---

Selecting and modifying Areas of Interest (AOIs) is an important part of geospatial data analysis workflows. The Python ecosystem of libraries provides a number of ways to do this, some of which will be explored and demonstrated in this notebook. In particular, we will demonstrate the following:
1. How to specify AOIs in different ways
2. How to use `geopandas` to load shapely geometries, visualize them, and perform operations such as `intersection`
3. Querying data providers using the AOIs defined, and understanding how query results can change based on AOI
4. Perform windowing operations when reading raster data using `rasterio`

These techniques can be used to query cloud services effectively for datasets over specific regions.

```python
# library to handle filepath operations
from pathlib import Path

@@ -49,15 +49,17 @@
GDAL_HTTP_COOKIEJAR=Path('~/cookies.txt').expanduser()
)
rio_env.__enter__()
# -
```

# AOIs are `vector` data: they are `points` or `polygons` that describe the location of interest in a given co-ordinate reference system (CRS). For example, the city center of [Tokyo, Japan](https://en.wikipedia.org/wiki/Tokyo) can be specified by the latitude and longitude pair (35.689722, 139.692222) in the [WGS84 CRS](https://en.wikipedia.org/wiki/World_Geodetic_System). In Python, we use the popular `shapely` library to define vector shapes, as shown below:
AOIs are `vector` data: they are `points` or `polygons` that describe the location of interest in a given co-ordinate reference system (CRS). For example, the city center of [Tokyo, Japan](https://en.wikipedia.org/wiki/Tokyo) can be specified by the latitude and longitude pair (35.689722, 139.692222) in the [WGS84 CRS](https://en.wikipedia.org/wiki/World_Geodetic_System). In Python, we use the popular `shapely` library to define vector shapes, as shown below:

```python
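# Built as (latitude, longitude) to match folium's location order; shapely reads these values as x, y.
tokyo_point = Point(35.689722, 139.692222)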
```

# This code will generate an interactive plot - feel free to pan/zoom around!
This code will generate an interactive plot - feel free to pan/zoom around!

# +
```python
m = folium.Map(location=(tokyo_point.x, tokyo_point.y), control_scale = True, zoom_start=8)
radius = 50
folium.CircleMarker(
@@ -74,17 +76,18 @@
).add_to(m)

m
# -
```

# AOIs can also take the form of bounds. Typically they are specified by four values: the minimum and maximum extent in each of the `x` and `y` directions. For rasterio, these are given in the format `(x_min, y_min, x_max, y_max)`, with values expressed in the local CRS.
AOIs can also take the form of bounds. Typically they are specified by four values: the minimum and maximum extent in each of the `x` and `y` directions. For rasterio, these are given in the format `(x_min, y_min, x_max, y_max)`, with values expressed in the local CRS.

# +
```python
marrakesh_polygon = Polygon.from_bounds(-8.18, 31.42, -7.68, 31.92)

# We will load the polygon into a geopandas dataframe for ease of plotting
gdf = gpd.GeoDataFrame({'geometry':[marrakesh_polygon]}, crs='epsg:4326')
```
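The bounds format described above can be explored without any geospatial library. The following is a minimal sketch, not part of the original notebook: `bounds_center` is a hypothetical helper that takes `(x_min, y_min, x_max, y_max)` bounds in longitude/latitude and returns the midpoint in the `(latitude, longitude)` order that folium expects for its `location` argument.

```python
def bounds_center(bounds):
    """Return the (latitude, longitude) midpoint of (x_min, y_min, x_max, y_max) bounds.

    Hypothetical helper for illustration; assumes x is longitude and y is latitude.
    """
    x_min, y_min, x_max, y_max = bounds
    if x_min > x_max or y_min > y_max:
        raise ValueError("bounds must be ordered as (x_min, y_min, x_max, y_max)")
    center_lon = (x_min + x_max) / 2
    center_lat = (y_min + y_max) / 2
    # folium takes locations as (latitude, longitude), so y comes first here
    return (center_lat, center_lon)


# Same bounds as the Marrakesh polygon above
marrakesh_bounds = (-8.18, 31.42, -7.68, 31.92)
center = bounds_center(marrakesh_bounds)
print(center)
```

Computing the center this way avoids hand-typing map coordinates and keeps the axis swap between the bounds order and the folium order in one place.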

# +
```python
m = folium.Map(location=[31.62, -7.88], zoom_start=10)
for _, row in gdf.iterrows():
sim_geo = gpd.GeoSeries(row["geometry"]).simplify(tolerance=0.001)
@@ -94,13 +97,13 @@
geo_j.add_to(m)

m
# -
```

# Geopandas dataframes require a `geometry` column containing `shapely` shapes (`Points`, `Polygons`, etc.) and a `CRS` in order to work and render correctly. In this example, we specify `EPSG:4326` as our CRS, which corresponds to the WGS84 system and identifies locations on the globe by latitude and longitude. Note that the `shapely` geometries themselves store coordinates in `(x, y)` order, that is, `(longitude, latitude)`.
#
# Let's add another polygon to the above example, and also see how to calculate their intersection:
Geopandas dataframes require a `geometry` column containing `shapely` shapes (`Points`, `Polygons`, etc.) and a `CRS` in order to work and render correctly. In this example, we specify `EPSG:4326` as our CRS, which corresponds to the WGS84 system and identifies locations on the globe by latitude and longitude. Note that the `shapely` geometries themselves store coordinates in `(x, y)` order, that is, `(longitude, latitude)`.

# +
Let's add another polygon to the above example, and also see how to calculate their intersection:

```python
marrakesh_polygon = Polygon.from_bounds(-8.18, 31.42, -7.68, 31.92) # Original polygon
marrakesh_polygon_2 = Polygon.from_bounds(-8.38, 31.22, -7.68, 31.52) # Arbitrary second overlapping polygon
intersection_polygon = marrakesh_polygon.intersection(marrakesh_polygon_2) # Calculate the intersection of polygons
@@ -119,11 +122,11 @@
geo_j.add_to(m)

m
# -
```
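For two axis-aligned rectangles like the ones above, the intersection can be checked by hand. The sketch below is a pure-Python illustration, not how `shapely` computes intersections in general: `intersect_bounds` is a hypothetical helper that works only on `(x_min, y_min, x_max, y_max)` tuples.

```python
def intersect_bounds(a, b):
    """Intersect two (x_min, y_min, x_max, y_max) boxes.

    Hypothetical helper for illustration. Returns the overlapping box,
    or None when the boxes do not share any area.
    """
    x_min = max(a[0], b[0])
    y_min = max(a[1], b[1])
    x_max = min(a[2], b[2])
    y_max = min(a[3], b[3])
    if x_min >= x_max or y_min >= y_max:
        return None  # boxes are disjoint or only touch along an edge
    return (x_min, y_min, x_max, y_max)


# Same bounds as marrakesh_polygon and marrakesh_polygon_2 above
first_bounds = (-8.18, 31.42, -7.68, 31.92)
second_bounds = (-8.38, 31.22, -7.68, 31.52)
overlap = intersect_bounds(first_bounds, second_bounds)
print(overlap)
```

The overlap keeps the larger of the two minimum values and the smaller of the two maximum values on each axis, which is a quick way to sanity-check the bounds of `intersection_polygon`.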

# Let us now query a DAAC (Distributed Active Archive Center) for data over a new region. We will go over the details of the query in the next chapter; here we simply look at an example of data querying. First, let's look at the region in a folium map.
Let us now query a DAAC (Distributed Active Archive Center) for data over a new region. We will go over the details of the query in the next chapter; here we simply look at an example of data querying. First, let's look at the region in a folium map.

# +
```python
lake_mead_polygon = Polygon.from_bounds(-114.52, 36.11,-114.04, 36.48)

# We will load the polygon into a geopandas dataframe for ease of plotting
@@ -137,8 +140,9 @@
geo_j.add_to(m)

m
```

# +
```python
# URL of CMR service
STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'

Expand All @@ -157,26 +161,34 @@
}

search = catalog.search(**opts)
# -
```

```python
print("Number of tiles found intersecting given polygon: ", len(list(search.items())))
```

How many search results did you get? What happens if you modify the date range in the previous cell and re-run the search? Note: if you make the time window too large, it will take a while for results to return.
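To experiment with different date ranges, it can help to build the search options in one place. The following is a minimal sketch under stated assumptions: `build_search_opts` is a hypothetical helper, the dates are arbitrary, and `EXAMPLE_COLLECTION` is a placeholder rather than a real collection ID. The keys `bbox`, `collections`, and `datetime` follow the STAC API item-search parameter names and may differ from the options used in the cell above.

```python
from datetime import date


def build_search_opts(bounds, start, end, collections):
    """Build a dictionary of STAC-style search options.

    Hypothetical helper for illustration; `bounds` is (x_min, y_min, x_max, y_max).
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    return {
        "bbox": list(bounds),
        "collections": list(collections),
        # STAC expresses a closed date range as "start/end"
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
    }


# Same bounds as the Lake Mead polygon; dates and collection are placeholders
lake_mead_bounds = (-114.52, 36.11, -114.04, 36.48)
opts_one_month = build_search_opts(
    lake_mead_bounds, date(2023, 1, 1), date(2023, 1, 31), ["EXAMPLE_COLLECTION"]
)
opts_one_year = build_search_opts(
    lake_mead_bounds, date(2023, 1, 1), date(2023, 12, 31), ["EXAMPLE_COLLECTION"]
)
print(opts_one_month["datetime"])
print(opts_one_year["datetime"])
```

Running the same search with each dictionary and comparing the number of returned items is one way to see how the time window affects both the result count and the response time.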

# How many search results did you get? What happens if you modify the date range in the previous cell and re-run the search? Note: if you make the time window too large, it will take a while for results to return.
#
# Lastly, let's visualize some of the returned data. Here's a sample returned search result - you can click on the keys and see the data contained in them:
Lastly, let's visualize some of the returned data. Here's a sample returned search result - you can click on the keys and see the data contained in them:

```python
sample_result = list(search.items())[0]
sample_result
```

```python
data_url = sample_result.assets['0_B01_WTR'].href
```

```python
with rasterio.open(data_url) as ds:
img = ds.read(1)
cmap = ds.colormap(1)
profile = ds.profile
cmap = ListedColormap([np.array(cmap[key]) / 255 for key in range(256)])
```
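The cell above reads the full band. When only the AOI is needed, a windowed read fetches just the pixels that cover it. In practice `rasterio.windows.from_bounds` computes the window from the dataset transform; the sketch below only illustrates the underlying arithmetic for a north-up raster with square pixels. The helper `window_from_bounds` and all of the numbers are hypothetical, and the bounds must be in the raster's own CRS.

```python
import math


def window_from_bounds(left, bottom, right, top, x_origin, y_origin, pixel_size):
    """Return (col_off, row_off, width, height) in pixels for the given bounds.

    Hypothetical helper for illustration. Assumes a north-up raster whose
    top-left corner is at (x_origin, y_origin) and whose pixels are square.
    """
    # Columns grow to the east, so measure from the left edge of the raster
    col_start = math.floor((left - x_origin) / pixel_size)
    col_stop = math.ceil((right - x_origin) / pixel_size)
    # Rows grow to the south, so measure down from the top edge of the raster
    row_start = math.floor((y_origin - top) / pixel_size)
    row_stop = math.ceil((y_origin - bottom) / pixel_size)
    return (col_start, row_start, col_stop - col_start, row_stop - row_start)


# Made-up projected coordinates in metres with 30 m pixels
window = window_from_bounds(
    left=500300.0, bottom=4099100.0, right=500900.0, top=4099700.0,
    x_origin=500000.0, y_origin=4100000.0, pixel_size=30.0,
)
print(window)
```

Rounding the start offsets down and the stop offsets up means the window always covers the whole AOI, at the cost of up to one extra pixel on each side.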

# +
```python
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
im = show(img, ax=ax, transform=profile['transform'], cmap=cmap, interpolation='none')

@@ -207,3 +219,4 @@
'HLS Cloud/Cloud Shadow',
],
fontsize=7)
```
File renamed without changes.