Claude Opus - enhancements to video_reader.cc #302

Open
johndpope opened this issue Jun 5, 2024 · 0 comments

@johndpope

I basically concatenated the source files into one file. I only did this for /src/video, so there may be a lot more to do.

I'm not a C++ developer, but I'm interested in seeing whether Intel codecs could speed things up, or whether GPU fixes could make this go faster.

Here are a few potential performance optimizations I noticed in the provided code:

1. In the VideoReader constructor:
- Avoid calling `GetFrameCount()` multiple times. Cache the result.
- Consider moving the `SetVideoStream()` call earlier, before any other initialization that depends on the active stream being set.

2. In `VideoReader::GetFrameCount()`:
- Cache the computed frame count so it doesn't need to be recomputed on every call.
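Items 1 and 2 both come down to memoizing the frame count. The following is a minimal sketch that assumes a hypothetical `CachedFrameCountReader` class, not decord's actual `VideoReader`. Its `ComputeFrameCount()` stands in for whatever expensive path the real code uses, and `InvalidateFrameCount()` is a hypothetical hook to call when the active stream changes. As written, it is not thread-safe. If several threads can call `GetFrameCount()` concurrently, compute the value in the constructor or guard it with `std::call_once`.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical reader used only to illustrate caching; not decord's API.
class CachedFrameCountReader {
 public:
  explicit CachedFrameCountReader(int64_t true_count) : true_count_(true_count) {}

  // Returns the frame count, running the expensive computation at most once
  // until the cache is invalidated. Not thread-safe as written.
  int64_t GetFrameCount() const {
    if (!frame_count_) {
      frame_count_ = ComputeFrameCount();
    }
    return *frame_count_;
  }

  // Hypothetical hook: call after switching the active stream so the cached
  // value cannot go stale.
  void InvalidateFrameCount() { frame_count_.reset(); }

  // How many times the expensive path actually ran (for illustration).
  int compute_calls() const { return compute_calls_; }

 private:
  // Placeholder for the expensive path, e.g. scanning the stream.
  int64_t ComputeFrameCount() const {
    ++compute_calls_;
    return true_count_;
  }

  int64_t true_count_;
  mutable std::optional<int64_t> frame_count_;  // empty until first computed
  mutable int compute_calls_ = 0;
};
```

Loops such as those in items 3, 6 and 7 can then read `GetFrameCount()` once into a local `const` before iterating.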

3. In `VideoReader::NextFrameImpl()`:
- Avoid calling `GetFrameCount()` repeatedly in the loop condition. Use the cached value.
- Move the `SeekAccurate()` call outside the inner retry loop to avoid unnecessary seeking.
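The loop shape that item 3 suggests can be sketched with callbacks in place of real decoder calls. `DecodeRange`, `seek` and `try_decode` are hypothetical names, not functions in video_reader.cc. The sketch assumes that a failed decode attempt leaves the decoder correctly positioned, so a single seek before the loop is enough. If a failure in the real code can leave the demuxer in an unknown position, a seek inside the retry path is still required.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch: decodes frames [start, frame_count), retrying each frame
// up to `max_retries` extra times. `frame_count` is passed in once (cached by
// the caller) instead of being re-queried in the loop condition.
std::vector<int64_t> DecodeRange(int64_t start, int64_t frame_count, int max_retries,
                                 const std::function<void(int64_t)>& seek,
                                 const std::function<bool(int64_t)>& try_decode) {
  std::vector<int64_t> decoded;
  seek(start);  // Seek once, outside both loops.
  for (int64_t pos = start; pos < frame_count; ++pos) {
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
      if (try_decode(pos)) {  // Retries do not re-seek.
        decoded.push_back(pos);
        break;
      }
    }
    // In this sketch, a frame that fails every attempt is simply skipped.
  }
  return decoded;
}
```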

4. In `VideoReader::GetBatch()`:
- Reserve space for `indices` vector before pushing elements to avoid unnecessary reallocations.
- Check whether `unique_indices` is empty and return early, which skips any setup around the frame-copying loop. The loop itself would already run zero iterations.
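Both suggestions in item 4 fit in a small index-preparation helper. `UniqueValidIndices` is a hypothetical name. The choice to throw `std::out_of_range` for bad indices is an assumption about the desired behavior, not necessarily what `GetBatch()` does today.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: validates requested frame indices and returns them
// sorted and de-duplicated, so each frame is decoded only once.
std::vector<int64_t> UniqueValidIndices(const std::vector<int64_t>& requested,
                                        int64_t frame_count) {
  std::vector<int64_t> indices;
  if (requested.empty()) {
    return indices;  // Early exit: nothing to validate, decode or copy.
  }
  // The final size is at most requested.size(), so one allocation suffices.
  indices.reserve(requested.size());
  for (int64_t idx : requested) {
    if (idx < 0 || idx >= frame_count) {
      throw std::out_of_range("frame index out of range");  // assumed policy
    }
    indices.push_back(idx);
  }
  std::sort(indices.begin(), indices.end());
  indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
  return indices;
}
```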

5. Consider using a monotonic clock (e.g. `std::chrono::steady_clock`) instead of a real-time clock for elapsed-time and timeout bookkeeping, so that system time changes cannot make measured intervals jump or go negative.
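The following is a minimal sketch of item 5, built around a hypothetical `Stopwatch` helper. It only applies where the code measures wall time, such as timeouts or profiling. Media timestamps (PTS) come from the stream itself and are unaffected by this choice.

```cpp
#include <chrono>

// steady_clock never goes backwards, unlike system_clock, which can jump
// when NTP or a user adjusts the system time.
static_assert(std::chrono::steady_clock::is_steady, "steady_clock must be monotonic");

// Hypothetical helper for measuring elapsed time.
class Stopwatch {
 public:
  Stopwatch() : start_(std::chrono::steady_clock::now()) {}

  void Reset() { start_ = std::chrono::steady_clock::now(); }

  // Milliseconds since construction or the last Reset().
  double ElapsedMs() const {
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - start_).count();
  }

 private:
  std::chrono::steady_clock::time_point start_;
};
```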

6. In `CUThreadedDecoder::ProcessFrame()`:
- Avoid calling `GetFrameCount()` for every frame. Use a cached value.
- Move the `discard_pts_` lookup outside the inner loop to avoid repeated map lookups.
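Item 6's hoisting idea amounts to doing one lookup per frame and reusing the result inside any per-plane or per-step inner loop. This sketch assumes `discard_pts` is a set of timestamps to drop. The real `discard_pts_` member in `CUThreadedDecoder` may have a different type, and `CopyKeptFrames` is a hypothetical name.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch: "copies" every plane of each frame whose PTS is not in
// `discard_pts`, and returns the number of planes copied.
int CopyKeptFrames(const std::vector<int64_t>& frame_pts, int planes_per_frame,
                   const std::unordered_set<int64_t>& discard_pts) {
  int planes_copied = 0;
  for (int64_t pts : frame_pts) {
    // One hash lookup per frame, hoisted out of the per-plane loop.
    const bool discard = discard_pts.count(pts) != 0;
    if (discard) {
      continue;
    }
    for (int plane = 0; plane < planes_per_frame; ++plane) {
      ++planes_copied;  // Placeholder for the real per-plane copy.
    }
  }
  return planes_copied;
}
```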

7. In `FFMPEGThreadedDecoder`:
- Consider using a lock-free queue implementation for `pkt_queue_` and `frame_queue_` to reduce lock contention.
- Avoid calling `GetFrameCount()` repeatedly. Cache the result.
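One common lock-free design that item 7 could refer to is a bounded single-producer/single-consumer ring buffer. `SpscQueue` is a hypothetical class, and it is only correct if each of `pkt_queue_` and `frame_queue_` really has exactly one pushing thread and one popping thread. Multiple producers or consumers would need an MPMC design. Note also that these calls never block. The waiting that a mutex/condition-variable queue provides would have to be rebuilt (spinning, backoff, or a separate notification), which can cost more CPU than the lock it replaces.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Bounded single-producer/single-consumer queue. One slot is always left empty
// to tell "full" from "empty", so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscQueue {
  static_assert(Capacity >= 2, "Capacity must be at least 2");

 public:
  // Producer thread only. Returns false if the queue is full.
  bool TryPush(T value) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    const std::size_t next = (head + 1) % Capacity;
    if (next == tail_.load(std::memory_order_acquire)) {
      return false;  // Full.
    }
    buffer_[head] = std::move(value);
    head_.store(next, std::memory_order_release);  // Publish slot to consumer.
    return true;
  }

  // Consumer thread only. Returns std::nullopt if the queue is empty.
  std::optional<T> TryPop() {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) {
      return std::nullopt;  // Empty.
    }
    T value = std::move(buffer_[tail]);
    tail_.store((tail + 1) % Capacity, std::memory_order_release);  // Free slot.
    return value;
  }

 private:
  std::array<T, Capacity> buffer_{};
  std::atomic<std::size_t> head_{0};  // Next slot to write (producer-owned).
  std::atomic<std::size_t> tail_{0};  // Next slot to read (consumer-owned).
};
```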

8. Profile the code to identify any hot spots or bottlenecks and focus optimization efforts on those areas.

9. Make sure to compile the code with optimizations enabled and consider using profile-guided optimization (PGO) to further improve performance.

10. Experiment with different thread pool sizes and configurations to find the optimal balance between parallelism and resource utilization.
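Item 10 can be approached with a small timing harness that runs the same workload at several thread counts. `TimeWithThreads` and `CandidateThreadCounts` are hypothetical helpers operating on a generic callback, not decord's decoder thread pool. The useful thread count for real decoding also depends on the codec, I/O and GPU, so the numbers only mean something when `work` is the actual workload.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs work(i) for i in [0, num_items) across `num_threads` threads and
// returns the elapsed wall time in milliseconds (monotonic clock).
double TimeWithThreads(std::size_t num_items, unsigned num_threads,
                       const std::function<void(std::size_t)>& work) {
  num_threads = std::max(1u, num_threads);
  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) {
    threads.emplace_back([&, t] {
      // Static interleaved split: thread t handles t, t + num_threads, ...
      for (std::size_t i = t; i < num_items; i += num_threads) work(i);
    });
  }
  for (auto& th : threads) th.join();
  return std::chrono::duration<double, std::milli>(
             std::chrono::steady_clock::now() - start).count();
}

// Thread counts to try: powers of two below hardware_concurrency(), then
// hardware_concurrency() itself (treated as 1 if unknown).
std::vector<unsigned> CandidateThreadCounts() {
  const unsigned hw = std::max(1u, std::thread::hardware_concurrency());
  std::vector<unsigned> counts;
  for (unsigned n = 1; n < hw; n *= 2) counts.push_back(n);
  counts.push_back(hw);
  return counts;
}
```

A typical use is to loop over `CandidateThreadCounts()`, call `TimeWithThreads` for each count with the real per-item work, and keep the smallest count that gets close to the best time.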

Remember to measure the performance impact of each optimization and profile the code to ensure the changes are actually beneficial in your specific use case.