Seasonal/recurrent searches #488

Open
betolink opened this issue Mar 7, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@betolink
Member

betolink commented Mar 7, 2024

A common search pattern is a seasonal search, e.g. Landsat scenes from July for the last 10 years. This is supported by CMR (although it is not well documented) and would let us search without having to use for loops.

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", (182, 212)), # this is not being passed to CMR but this is the current notation 
)

This would return all HLS scenes from July for the last 10 years with a maximum cloud cover of 20%. I think this is very useful. However, the (182, 212) range is not straightforward to calculate. Maybe we need to parse a date instead, and use it with caution, since in leap years every date after February falls one day later in the year.
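For illustration, a minimal sketch of how the day-of-year values could be derived from calendar dates using only the standard library. The helper name `day_of_year` is hypothetical and not part of earthaccess; it just shows where 182 and 212 come from and how they shift in a leap year.

```python
from datetime import date


def day_of_year(year: int, month: int, day: int) -> int:
    """Return the ordinal day of the year (1-366) for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


# July 1 and July 31 in a non-leap year (2023) and in a leap year (2024)
print(day_of_year(2023, 7, 1), day_of_year(2023, 7, 31))  # 182 212
print(day_of_year(2024, 7, 1), day_of_year(2024, 7, 31))  # 183 213
```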

@betolink betolink added the enhancement New feature or request label Mar 7, 2024
@Rapsodia86

Because of the leap years, would it be possible to use mm-dd?

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", ("06-30","07-30")), # this is not being passed to CMR but this is the current notation 
)

And that brings me to another thing!
I do not know how consistent you want to be with the results from the portal, but it is important to also set hh:mm:ss in the temporal search.
Of the two queries below, the first one gives the same results as the portal.
Whether the end date is included, or the range only runs up to that date without including it, may be confusing for an earthaccess user. Perhaps that could be an additional setting, or is setting the exact time the quickest way? Either way, I assume it is something to specify in the documentation.

>>> granules = earthaccess.search_data(
...  short_name="ECO_L2T_LSTE",
...  temporal = ("2023-01-01", "2024-01-01-23:59:59"),
...  point =(-83.08301,42.34026),
...  count=-1,
...  version="002"
... )
Granules found: 409
>>>
>>> granules = earthaccess.search_data(
...  short_name="ECO_L2T_LSTE",
...  temporal = ("2023-01-01", "2024-01-01"),
...  point =(-83.08301,42.34026),
...  count=-1,
...  version="002"
... )
Granules found: 406
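One way to make a date-only end value behave inclusively is to expand it to the last second of that day before sending the query. The sketch below is only an illustration of that idea: `inclusive_end` is a hypothetical helper, not how earthaccess currently handles the `temporal` argument.

```python
from datetime import datetime, time


def inclusive_end(end: str) -> datetime:
    """Expand a date-only string (YYYY-MM-DD) to 23:59:59 on that day.

    Strings that already carry a time component are returned unchanged
    (as a datetime).
    """
    parsed = datetime.fromisoformat(end)
    if len(end) == 10:  # "YYYY-MM-DD": no time component was given
        return datetime.combine(parsed.date(), time(23, 59, 59))
    return parsed


print(inclusive_end("2024-01-01"))           # 2024-01-01 23:59:59
print(inclusive_end("2024-01-01T12:30:00"))  # 2024-01-01 12:30:00
```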

@betolink
Member Author

betolink commented Mar 7, 2024

I think we should try to follow the conventions from the portal. Actually, I think this behavior (bug) was already reported by @amfriesz in #190.

@Rapsodia86

Ok, that is exactly the thing!
Sorry, I should have checked all the issues first, but it just came to mind while I was writing the comment, since I had been exploring the search parameters yesterday :)

@betolink
Member Author

betolink commented Mar 7, 2024

No worries! I think we should fix that and implement the recurrent search even if we are off by a day in leap years. In the case of COGs we can have a very streamlined workflow with xarray:

  1. Seasonal search
    results = earthaccess.search_data(
        short_name=["HLSL30"],
        point=(-82.19,27.91),
        cloud_cover=(0,20),
        temporal=("2014", "2024", ("06-30","07-30")),
    )
  2. Open and load granules (filtering by band; see #428, "Add tutorial / how-to about filtering file (not granule) results by name")
    fo = earthaccess.open(results, bands=["B01", "B02"])
    ds = xr.open_mfdataset(fo, engine="rioxarray")
  3. Efficient operations (subset, sampling) with no services in between!
    summer_mean = ds.clip(polygon).mean("time")

@chuckwondo
Collaborator

A common search pattern is a seasonal search, e.g. Landsat scenes from July for the last 10 years. This is supported by CMR (although it is not well documented) and would let us search without having to use for loops.

results = earthaccess.search_data(
    short_name=["HLSL30"],
    point=(-82.19,27.91),
    cloud_cover=(0,20),
    temporal=("2014", "2024", (182, 212)), # this is not being passed to CMR but this is the current notation 
)

This would return all HLS scenes from July for the last 10 years with a maximum cloud cover of 20%. I think this is very useful. However, the (182, 212) range is not straightforward to calculate. Maybe we need to parse a date instead, and use it with caution, since in leap years every date after February falls one day later in the year.

For reference, this CMR temporal range feature is documented (you have to look very closely) under Temporal Range searches.

Specifically, it is shown in this easily-missed example at the end of the list of examples:

2000-01-01T00:00:00.000Z,2023-01-31T23:59:59.999Z,1,31 - matches data between the Julian days 1 to 31 from 2000-01-01T00:00:00.000Z to 2023-01-31T23:59:59.999Z.

It can also be seen in this example under the section "Find collections with temporal" (unlinked sub-heading):

curl "https://cmr.earthdata.nasa.gov/search/collections?temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z,30,60&temporal\[\]=2000-01-01T10:00:00Z,,30&temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z"

The first two values of the parameter together define the temporal bounds. See under Temporal Range searches for different ways of specifying the temporal bounds including ISO 8601.

For temporal range search, the default is inclusive on the range boundaries. This can be changed by specifying exclude_boundary option with options[temporal][exclude_boundary]=true. This option has no impact on periodic temporal searches.

and again under the section "Finding granules with temporal" (again, unlinked):

curl "https://cmr.earthdata.nasa.gov/search/granules?collection_concept_id=C1234567-PODAAC&temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z,30,60&temporal\[\]=2000-01-01T10:00:00Z,,30&temporal\[\]=2000-01-01T10:00:00Z,2010-03-10T12:00:00Z"

The first two values of the parameter together define the temporal bounds. See under Temporal Range searches for different ways of specifying the temporal bounds including ISO 8601.

For temporal range search, the default is inclusive on the range boundaries. This can be changed by specifying exclude_boundary option with options[temporal][exclude_boundary]=true. This option has no impact on periodic temporal searches.
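As a rough sketch, the comma-separated `temporal` values in the CMR examples above could be assembled like this. The function `cmr_temporal` is hypothetical; it only reproduces the forms shown in the quoted examples (bounds only, bounds plus a start day, bounds plus both days) and deliberately rejects an end day on its own, because that form is not spelled out in the excerpts quoted here.

```python
from typing import Optional


def cmr_temporal(
    start: str,
    end: str = "",
    start_day: Optional[int] = None,
    end_day: Optional[int] = None,
) -> str:
    """Build a CMR-style temporal value: start,end[,start_day[,end_day]]."""
    if start_day is None and end_day is not None:
        # Assumption: this form is not covered by the quoted examples.
        raise ValueError("end_day without start_day is not handled in this sketch")
    parts = [start, end]
    if start_day is not None:
        parts.append(str(start_day))
    if end_day is not None:
        parts.append(str(end_day))
    return ",".join(parts)


print(cmr_temporal("2000-01-01T10:00:00Z", "2010-03-10T12:00:00Z", 30, 60))
# 2000-01-01T10:00:00Z,2010-03-10T12:00:00Z,30,60
print(cmr_temporal("2000-01-01T10:00:00Z", start_day=30))
# 2000-01-01T10:00:00Z,,30
```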

Unfortunately, when a range spans multiple years and includes one or more leap years, there appears to be no way to handle the one-day offset for days on or after the leap day, because the CMR simply expects Julian (ordinal) days as the 3rd and 4th values in the range.

In other words, using @betolink's example, day 182 is July 1 in non-leap years, but June 30 in leap years. Since you cannot tell the CMR to use 182 or 183 depending on whether the year is a leap year, there seems to be no convenient way to deal with this. If you need very specific dates, you would likely have to adjust your day range and do a bit of filtering of the query results.
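A minimal sketch of that "widen, then filter" idea, using only the standard library. Both helpers are hypothetical, and the sketch assumes the window does not wrap around the new year and does not start or end on February 29. How the acquisition datetime is obtained from each search result is left out, since it depends on the result objects.

```python
from datetime import date, datetime
from typing import Tuple

MonthDay = Tuple[int, int]  # (month, day)


def widened_day_range(start_md: MonthDay, end_md: MonthDay) -> Tuple[int, int]:
    """Ordinal-day range that covers the window in leap and non-leap years."""
    ranges = []
    for year in (2023, 2024):  # one non-leap year and one leap year
        first = date(year, *start_md).timetuple().tm_yday
        last = date(year, *end_md).timetuple().tm_yday
        ranges.append((first, last))
    return min(f for f, _ in ranges), max(l for _, l in ranges)


def in_window(ts: datetime, start_md: MonthDay, end_md: MonthDay) -> bool:
    """True if the timestamp's month/day falls inside the window (inclusive)."""
    return start_md <= (ts.month, ts.day) <= end_md


# Query with the widened range, then drop results outside July.
print(widened_day_range((7, 1), (7, 31)))                 # (182, 213)
print(in_window(datetime(2024, 6, 30), (7, 1), (7, 31)))  # False
print(in_window(datetime(2024, 7, 1), (7, 1), (7, 31)))   # True
```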

Regardless, if we really want to be complete with what the CMR supports, in addition to being able to specify both Julian days, we must also be able to specify only one of the Julian days (start or end), which is what the examples above seem to show, and which the following working examples show:

Notice that the only difference between these 2 examples is that the first one starts with Julian day 10, and the second one ends with Julian day 10.

Regardless, I agree that being able to specify the Julian days as MM-DD values would be helpful, but we should support both formats because there may very well be cases where a user is given the Julian days to use, not the MM-DD values, which would require the reverse conversion if only MM-DD format were supported.

Finally, if we stick with the tuple format (a more specific structure might be more helpful, but that's for another discussion), I suggest that we do not specify the Julian days as a nested tuple, but rather at the top level of the tuple, e.g.: ("2014", "2024", 182, 212) or ("2014", "2024", "07-01", "07-30"), or similar.
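To make the flat-tuple suggestion concrete, here is a hedged sketch of how such a tuple could be normalized so that both integer days and MM-DD strings are accepted. The names `to_ordinal_day` and `normalize_temporal` are hypothetical, and MM-DD values are converted against a non-leap reference year, so results can be off by one day in leap years, as discussed above.

```python
from datetime import date
from typing import Union


def to_ordinal_day(value: Union[int, str], reference_year: int = 2023) -> int:
    """Accept an ordinal day (int) or an "MM-DD" string; return an ordinal day."""
    if isinstance(value, int):
        return value
    month, day = (int(part) for part in value.split("-"))
    # Assumption: a non-leap reference year is used for the conversion.
    return date(reference_year, month, day).timetuple().tm_yday


def normalize_temporal(temporal: tuple) -> tuple:
    """Turn (start, end, day, day) into (start, end, int, int)."""
    start, end, *days = temporal
    return (start, end, *[to_ordinal_day(d) for d in days])


print(normalize_temporal(("2014", "2024", 182, 212)))          # ('2014', '2024', 182, 212)
print(normalize_temporal(("2014", "2024", "07-01", "07-30")))  # ('2014', '2024', 182, 211)
```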
