Log collection from ELS API for given time block can become too large #9

Closed
hartfordfive opened this issue Nov 18, 2016 · 1 comment

Comments

@hartfordfive
Owner

Although the ELS API currently does allow a count of items to be specified along with a timestamp start and end range, it does not return any header indicating how many log items in total fall within the given time range. Due to this, a large number of logs may be downloaded, which can become quite heavy for in-memory processing.

As a solution, the logs should initially be saved to a gzip file and then read back from this file in smaller chunks.
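A minimal sketch of that approach, assuming the response has already been written to a local gzip file; the file name and the per-line handler below are illustrative placeholders, not part of the current code:

```go
package main

import (
	"bufio"
	"compress/gzip"
	"log"
	"os"
)

// readGzipInChunks streams a gzipped log file line by line instead of
// holding the whole API response in memory at once.
func readGzipInChunks(path string, handle func(line []byte) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gz.Close()

	scanner := bufio.NewScanner(gz)
	// Allow individual log entries larger than the default 64KB token limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	for scanner.Scan() {
		if err := handle(scanner.Bytes()); err != nil {
			return err
		}
	}
	return scanner.Err()
}

func main() {
	err := readGzipInChunks("els_logs_example.json.gz", func(line []byte) error {
		// Each line would be unmarshalled and published as an event here.
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```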

@hartfordfive hartfordfive added this to the beta-0.2.0 milestone Nov 18, 2016
@hartfordfive
Owner Author

The previously proposed solution was applied, although it still wasn't effective enough in terms of total download/processing time.

Description of the Issue:

Current log file download times can exceed 5 minutes and processing time can go up to 10 minutes.
Each ticker iteration currently defaults to 30 minutes, but unfortunately the ticker for the next iteration doesn't start until the current processing is completed, so in this case the effective cycle is (30 + 5 + 10) = 45 minutes.
Since each cycle runs roughly 15 minutes longer than the 30 minutes of logs it covers, the backlog grows with every iteration; over the course of a day you can easily end up with a 4 to 5 hour delay in processing time instead of a more reasonable 30 minutes. These delays will only grow as the traffic increases and the log files increase in size.

Proposed Solution:

Once the beat's ticker period has elapsed, two functions would be called asynchronously to perform the following (a rough sketch follows the list):

  1. Download ELS Log File: This function creates X goroutines (number yet to be determined, maybe a pool), each of which downloads a log file part (sequentially or in parallel) and places it on the log_files_ready channel once completed.
    • If 2 minute segments, then 15 files total for 30 minutes
    • If 5 minute segments, then 6 files total for 30 minutes
  2. Process/Publish Individual Log Entries: Have another function (also via a goroutine) process these files asynchronously from the log_files_ready channel as they become ready.
    • X goroutines (number yet to be determined, maybe a pool) are created so that each can open a log file and then send off its processed events via PublishEvent
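
A rough sketch of that pipeline, assuming 2-minute segments and a fixed worker count; downloadSegment and publishFile are hypothetical stand-ins for the real ELS download and PublishEvent calls:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// downloadSegment is a hypothetical stand-in for fetching one time segment
// from the ELS API and writing it to a local gzip file, returning its path.
func downloadSegment(start, end time.Time) (string, error) {
	return fmt.Sprintf("els_%d_%d.json.gz", start.Unix(), end.Unix()), nil
}

// publishFile is a hypothetical stand-in for reading a downloaded file and
// sending each of its entries to the beat output via PublishEvent.
func publishFile(path string) error {
	fmt.Println("publishing", path)
	return nil
}

func collect(periodStart time.Time, segment time.Duration, workers int) {
	logFilesReady := make(chan string)

	// 1. Download goroutines: split the 30 minute period into segments and
	//    push each completed file onto the log_files_ready channel.
	var dl sync.WaitGroup
	segments := int((30 * time.Minute) / segment)
	sem := make(chan struct{}, workers) // simple pool limiter
	for i := 0; i < segments; i++ {
		dl.Add(1)
		go func(i int) {
			defer dl.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			start := periodStart.Add(time.Duration(i) * segment)
			path, err := downloadSegment(start, start.Add(segment))
			if err != nil {
				fmt.Println("download failed:", err)
				return
			}
			logFilesReady <- path
		}(i)
	}

	// Close the channel once all downloads have finished.
	go func() {
		dl.Wait()
		close(logFilesReady)
	}()

	// 2. Processing goroutines: consume files as they become ready and
	//    publish their entries.
	var proc sync.WaitGroup
	for w := 0; w < workers; w++ {
		proc.Add(1)
		go func() {
			defer proc.Done()
			for path := range logFilesReady {
				if err := publishFile(path); err != nil {
					fmt.Println("publish failed:", err)
				}
			}
		}()
	}
	proc.Wait()
}

func main() {
	collect(time.Now().Add(-30*time.Minute), 2*time.Minute, 4)
}
```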

This may not be the absolute best solution, but it should be more effective than the current one. If more optimizations need to be done later, I'll deal with it then.
