
Chance pairs? #297

Closed
selipot opened this issue Oct 11, 2023 · 10 comments · Fixed by #330
Labels
archive-label-analysis-functions Oceanographic Lagrangian analysis functions help wanted Extra attention is needed

Comments

@selipot
Member

selipot commented Oct 11, 2023

Could we write a function that efficiently finds chance pairs in a Lagrangian dataset? @dhruvbalwada, what would you like such a function to look like in terms of its input/output arguments?

Input arguments could be:

def chance_pairs(lon, lat, time, rowsize, max_spatial_distance, max_time_distance):

Here rowsize is a list of integers specifying the number of data points in each row of the ragged input arrays lon, lat, and time, and the last two arguments specify the maximum spatial and temporal distances that define a chance encounter. What should the output be? Pairs of IDs (which would then also need to be passed in)? Or double indices, i.e., the row and the position within that row?
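If the output uses the double-index option, a flat index into the ragged arrays can be mapped to a (row, position-within-row) pair using rowsize alone. A minimal sketch, assuming NumPy; flat_to_row_col is a hypothetical helper, not an existing library function:

```python
import numpy as np


def flat_to_row_col(flat_index, rowsize):
    """Map flat indices into ragged arrays to (row, position within row).

    Hypothetical helper: rowsize[k] is the number of points in row k.
    """
    flat_index = np.asarray(flat_index)
    # offsets[k] is the flat index of the first point of row k
    offsets = np.concatenate(([0], np.cumsum(rowsize)))
    # side="right" skips past empty rows that share the same offset
    row = np.searchsorted(offsets, flat_index, side="right") - 1
    return row, flat_index - offsets[row]


row, col = flat_to_row_col([0, 1, 2, 4, 5], rowsize=[2, 3, 1])
print(row.tolist(), col.tolist())  # [0, 0, 1, 1, 2] [0, 1, 0, 2, 0]
```

Going the other way (row, col back to a flat index) is just offsets[row] + col, so either representation can be returned without losing information.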

@selipot selipot added help wanted Extra attention is needed archive-label-analysis-functions Oceanographic Lagrangian analysis functions labels Oct 11, 2023
@milancurcic
Member

The interface looks good. I'd rename max_spatial_distance to distance_tolerance, to be consistent with tolerances elsewhere in the library.

I think the result should at least include pairs of IDs and the times at which they came within the distance tolerance. So perhaps the result could be a list of (id1, id2, time) tuples.

What is max_time_distance?

@selipot
Member Author

selipot commented Oct 11, 2023

I am thinking of chance pairs as defined in two dimensions: distance and time. For example, you could then search for particles that came within 1 km and 1 day of each other. So we might want to define a single norm based on these two parameters.
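One way to read "a single norm" is to scale each separation by its tolerance and then combine the two. A minimal sketch under that assumption (all names are hypothetical). The choice of norm matters: the max-norm being at most 1 is exactly "within 1 km AND within 1 day", while the Euclidean norm being at most 1 selects a smaller, elliptical region of the distance-time plane.

```python
import math


def scaled_separations(distance_km, dt_days, distance_scale_km=1.0, time_scale_days=1.0):
    """Non-dimensional separations; each equals 1 exactly at its tolerance."""
    return abs(distance_km) / distance_scale_km, abs(dt_days) / time_scale_days


def max_norm(distance_km, dt_days, **scales):
    """<= 1 exactly when both the distance and the time criteria hold (AND)."""
    return max(scaled_separations(distance_km, dt_days, **scales))


def euclidean_norm(distance_km, dt_days, **scales):
    """<= 1 inside an ellipse in the (distance, time) plane; stricter than AND."""
    return math.hypot(*scaled_separations(distance_km, dt_days, **scales))


# 0.9 km and 0.9 days apart: passes the AND test, fails the elliptical one.
print(max_norm(0.9, 0.9) <= 1, euclidean_norm(0.9, 0.9) <= 1)  # True False
```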

@miniufo

miniufo commented Oct 12, 2023

I'm interested in this issue. Some papers use a range of distances to select chance pairs. For example, given an initial separation $r_0 = 5$ km, we may select any pair whose separation is within [5 - dr, 5 + dr] km (here dr could be a tolerance of 0.2 km). So I guess max_spatial_distance could be a list like [dmin, dmax] specifying a range?

I am also interested in what this function returns. Does it return particle pairs (which is related to this issue)?

@milancurcic
Member

A-ha, so it's actually useful to search for pairs where $r_{min} < r < r_{max}$, rather than just $r < r_{max}$. That makes sense. I think making this tolerance a float | tuple[float, float] would be a good approach.
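The float | tuple[float, float] idea can be handled by normalizing the argument to (lower, upper) bounds once, up front. A sketch assuming NumPy; the helper names are hypothetical, and the bounds are treated as inclusive here, which is a design choice (the comment above writes strict inequalities).

```python
from typing import Union

import numpy as np

Tolerance = Union[float, tuple[float, float]]


def as_bounds(tolerance: Tolerance) -> tuple[float, float]:
    """A scalar means [0, tolerance]; a 2-tuple means [lower, upper]."""
    if np.isscalar(tolerance):
        return 0.0, float(tolerance)
    lower, upper = tolerance
    if lower > upper:
        raise ValueError(f"lower bound {lower} exceeds upper bound {upper}")
    return float(lower), float(upper)


def within(separation, tolerance: Tolerance):
    """Boolean mask of separations inside the (inclusive) tolerance band."""
    lower, upper = as_bounds(tolerance)
    separation = np.asarray(separation)
    return (separation >= lower) & (separation <= upper)


# r0 = 5 km with dr = 0.2 km, as in @miniufo's example
print(within([4.7, 5.0, 5.3], (5 - 0.2, 5 + 0.2)).tolist())  # [False, True, False]
```

Normalizing early means the rest of the search only ever deals with one representation, so the scalar case costs nothing extra.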

@miniufo

miniufo commented Oct 12, 2023

That's good. But I am not sure about time, as many Lagrangian datasets have uniform time intervals (e.g., 1 h or 6 h), so I'm not sure whether a time range is necessary. Maybe it is for the raw data.

@dhruvbalwada

dhruvbalwada commented Oct 12, 2023

Thanks for tagging me here. I am definitely interested in this function.

A few things come to mind that might be worth thinking about.

  • I think @miniufo's suggestion to pick chance pairs within a tolerance should be implemented. When doing relative dispersion calculations, the theory requires that we consider particle pairs with a given initial separation $r_0$ (not $< r_0$). So $r_0 \pm dr$ is better.
  • It would also be good to think about selecting pairs over a range of scales. This can serve two purposes:
    • Looking at relative dispersion statistics as a function of $r_0$. In this case we would like to track particles in time, starting from the time when their separation was $r_0$ (e.g., see figure B1 in this work, where we compare two different initial $r_0$).
    • Looking at structure function statistics. In this case we don't care about tracking the pairs in time, but ideally want to divide the separation axis $r$ into bins and then consider the pairs in each bin at a single concurrent time.
  • Another issue comes up with chance pairs: the same two drifters can come close to each other again and again, so many chance pairs can be formed from a single pair of drifters. The function should be able to capture this.
  • I agree with @selipot's requirement to allow choosing pairs based on separation in both distance and time. This can be useful when we want to augment statistics by considering pairs that are allowed to be separated somewhat in time, for example when looking at larger-scale aspects, where one might consider the flow to be relatively frozen for some time.
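Repeated encounters of the same two drifters, and the "start tracking when the separation is $r_0 \pm dr$" use case, can both be expressed as contiguous runs of the pair's separation time series that fall inside the band. A sketch assuming NumPy and that the two trajectories have already been put on a common time axis (an assumption; the thread does not specify how); encounter_episodes is a hypothetical name.

```python
import numpy as np


def encounter_episodes(separation, lower, upper):
    """Return (start, stop) index pairs, stop exclusive, for every contiguous run
    where lower <= separation <= upper. Each run is one chance encounter; its
    start index is where tracking for relative dispersion would begin."""
    separation = np.asarray(separation)
    inside = (separation >= lower) & (separation <= upper)
    # Pad with False so every run has both a rising and a falling edge.
    edges = np.diff(np.concatenate(([False], inside, [False])).astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), stops.tolist()))


# Separation (km) of one drifter pair at successive common times
sep = [6.0, 5.1, 4.9, 7.0, 8.0, 5.0, 6.0]
print(encounter_episodes(sep, 4.8, 5.2))  # [(1, 3), (5, 6)]
```

Returning every run, rather than only the first, keeps all encounters of the same pair available to the caller.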
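For structure function statistics, the pairs found at a single time can be grouped by separation bin. A minimal sketch assuming NumPy and a precomputed velocity difference du for each pair (e.g., its longitudinal component; this input is an assumption, not something defined in the thread). The second-order structure function in each bin is then the mean of du**2; the function names are hypothetical.

```python
import numpy as np


def bin_pairs(separation, bin_edges):
    """Map bin k -> indices of pairs with bin_edges[k] <= r < bin_edges[k + 1].
    Pairs outside [bin_edges[0], bin_edges[-1]) are dropped."""
    # np.digitize returns i such that bin_edges[i - 1] <= r < bin_edges[i]
    k = np.digitize(np.asarray(separation), bin_edges) - 1
    return {b: np.flatnonzero(k == b) for b in range(len(bin_edges) - 1)}


def second_order_structure_function(separation, du, bin_edges):
    """Mean of du**2 per separation bin; NaN for empty bins."""
    du = np.asarray(du, dtype=float)
    return np.array([
        np.mean(du[idx] ** 2) if idx.size else np.nan
        for idx in bin_pairs(separation, bin_edges).values()
    ])


r = [0.5, 1.5, 1.7, 3.0, 10.0]    # pair separations (km) at one time
du = [1.0, 2.0, 4.0, 3.0, 100.0]  # hypothetical velocity differences
print(second_order_structure_function(r, du, [0, 1, 2, 4]).tolist())  # [1.0, 10.0, 9.0]
```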

@philippemiron
Contributor

philippemiron commented Oct 12, 2023

The algorithm for this is pretty simple:

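# Brute force: compare each observation with every other one (O(N^2))
for t in traj:                  # each trajectory
    for p in t:                 # each observation along trajectory t
        search_close(all_p, p)  # find observations near p (helper not defined here)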

Of course, we don't want to implement an O(N^2) algorithm, so we could first build a k-d tree to speed up the search for nearby points.
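A sketch of the k-d tree approach, assuming SciPy's cKDTree and a spherical Earth. Points are converted to 3-D Cartesian coordinates so that straight-line (chord) distance increases monotonically with great-circle distance; the distance tolerance is converted to the equivalent chord length, and the time criterion (plus the requirement that the two points belong to different trajectories) is applied afterwards. chance_pairs_kdtree is a hypothetical name.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption


def lonlat_to_xyz(lon, lat):
    """Degrees -> Cartesian coordinates (km) on a sphere."""
    lon, lat = np.deg2rad(lon), np.deg2rad(lat)
    return EARTH_RADIUS_KM * np.column_stack(
        (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
    )


def chance_pairs_kdtree(lon, lat, time, rowsize, distance_tolerance, time_tolerance):
    """Flat index pairs (i, j), i < j, from different rows that are within
    distance_tolerance (great-circle km) AND time_tolerance of each other."""
    time = np.asarray(time)
    row = np.repeat(np.arange(len(rowsize)), rowsize)  # row of each point
    # Chord length that corresponds to the great-circle distance tolerance
    chord = 2 * EARTH_RADIUS_KM * np.sin(distance_tolerance / (2 * EARTH_RADIUS_KM))
    tree = cKDTree(lonlat_to_xyz(lon, lat))
    pairs = np.array(sorted(tree.query_pairs(chord)), dtype=np.intp).reshape(-1, 2)
    i, j = pairs[:, 0], pairs[:, 1]
    keep = (row[i] != row[j]) & (np.abs(time[i] - time[j]) <= time_tolerance)
    return i[keep], j[keep]


# Two drifters ~0.56 km apart at t = 0 and t = 1, plus a distant third one
i, j = chance_pairs_kdtree(
    lon=[0, 0, 0.005, 0.005, 10], lat=[0, 0, 0, 0, 10], time=[0, 1, 0, 1, 0],
    rowsize=[2, 2, 1], distance_tolerance=1.0, time_tolerance=0.5,
)
print(i.tolist(), j.tolist())  # [0, 1] [2, 3]
```

When many drifters visit the same places at different times, the spatial-only query can return many pairs that the time filter then discards; adding a scaled time coordinate to the tree is one possible refinement.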

Are there any other known optimizations for this?

@milancurcic
Member

milancurcic commented Nov 8, 2023

Consider a chance_pairs call where the user provides both distance and time tolerance parameters. Should these criteria be evaluated as a boolean AND or OR when determining whether a pair is a chance pair? It seems to me that AND would be more useful.

@selipot
Member Author

selipot commented Nov 8, 2023

I say AND
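With the AND criterion, one simple alternative or complement to a spatial tree is a sweep over time-sorted observations: each point is compared only with later points inside the time window, and the great-circle distance is computed for those candidates only. A sketch assuming NumPy and a spherical Earth (haversine formula); chance_pairs_sweep is a hypothetical name. It is still O(N^2) in the worst case (a time window spanning the whole record), but short windows keep the candidate sets small.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance (km) between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.deg2rad, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def chance_pairs_sweep(lon, lat, time, rowsize, distance_tolerance, time_tolerance):
    """Sorted list of flat index pairs (i, j), i < j, from different rows that are
    within distance_tolerance (km) AND time_tolerance of each other."""
    lon, lat, time = map(np.asarray, (lon, lat, time))
    row = np.repeat(np.arange(len(rowsize)), rowsize)  # row of each point
    order = np.argsort(time, kind="stable")
    t_sorted = time[order]
    pairs = []
    for a, i in enumerate(order):
        # Later points (in sorted order) whose time falls inside the window
        stop = np.searchsorted(t_sorted, t_sorted[a] + time_tolerance, side="right")
        cand = order[a + 1:stop]
        cand = cand[row[cand] != row[i]]  # skip points on the same trajectory
        d = haversine_km(lon[i], lat[i], lon[cand], lat[cand])
        pairs.extend((int(min(i, j)), int(max(i, j))) for j in cand[d <= distance_tolerance])
    return sorted(pairs)


print(chance_pairs_sweep([0, 0, 0.005, 0.005, 10], [0, 0, 0, 0, 10],
                         [0, 1, 0, 1, 0], [2, 2, 1], 1.0, 0.5))  # [(0, 2), (1, 3)]
```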

@milancurcic milancurcic self-assigned this Nov 13, 2023
@milancurcic
Member

I'll work on this next.

@milancurcic milancurcic mentioned this issue Nov 21, 2023