Video decoding problem: some downloaded videos cannot be decoded by decord #266

Open
SCZwangxiao opened this issue Nov 30, 2023 · 2 comments

Comments

@SCZwangxiao
Contributor

Some downloaded videos (around 1 in 30 in my crawled YouTube dataset) cannot be loaded by the Python decord package. Loading them raises this error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.8/dist-packages/decord/video_reader.py", line 54, in __init__
    self._handle = _CAPI_VideoReaderGetVideoReader(
  File "/usr/local/lib/python3.8/dist-packages/decord/_ffi/_ctypes/function.py", line 173, in __call__
    check_call(_LIB.DECORDFuncCall(
  File "/usr/local/lib/python3.8/dist-packages/decord/_ffi/base.py", line 78, in check_call
    raise DECORDError(err_str)
decord._ffi.base.DECORDError: [21:01:51] /github/workspace/src/video/video_reader.cc:151: Check failed: st_nb >= 0 (-1128613112 vs. 0) ERROR cannot find video stream with wanted index: -1

These videos are also unplayable in the default video player on macOS.

However, they can be loaded with PyAV and played in Jupyter Notebook, which is strange!

I've noticed that there is an underdeveloped feature, self.specify_codec, in YtDlpDownloader. The comment says it was relevant to loading HD videos with decord. Is it related to my issue?

self.tmp_dir = tmp_dir
self.encode_formats = encode_formats
# TODO: figure out when to do this
# was relevant with HD videos for loading with decord
self.specify_codec = False
def __call__(self, url):

@SCZwangxiao
Contributor Author

I've found the root cause of this problem.

Summary

The root cause is that the decord library does not currently support the AV1 codec; see dmlc/decord#221.

As a temporary fix:

video_format_string = (
    f"wv*[height>={self.video_size}][ext=mp4][vcodec!=av01.0.01M.08]/"
    f"w[height>={self.video_size}][ext=mp4][vcodec!=av01.0.01M.08]/"
    f"bv/b[ext=mp4][vcodec!=av01.0.01M.08]"
)

Note that neither [codec=avc1] nor [vcodec=avc1] works: yt-dlp reports "Requested format is not available" even though an avc1 format does exist. The format string then falls back to "bv/b[ext=mp4][codec=avc1]", making the downloaded files quite large.
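One possible explanation, offered as an assumption rather than something verified in this thread: yt-dlp's `=` filter is an exact string comparison, while the codec values in the format listing carry profile suffixes (for example avc1.4D401E), so [vcodec=avc1] matches nothing. yt-dlp also documents `^=` (starts with) and the `!` negation prefix, so a filter such as [vcodec!^=av01] should skip every AV1 variant instead of only av01.0.01M.08. The sketch below builds such a selector; the helper name `build_format_string` and its parameters are hypothetical and not part of video2dataset.

```python
def build_format_string(video_size: int, excluded_codec_prefix: str = "av01") -> str:
    """Build a yt-dlp format selector that skips codecs by prefix.

    Assumption: relies on yt-dlp's "!^=" (does not start with) string filter.
    """
    codec_filter = f"[vcodec!^={excluded_codec_prefix}]"
    return (
        f"wv*[height>={video_size}][ext=mp4]{codec_filter}/"
        f"w[height>={video_size}][ext=mp4]{codec_filter}/"
        f"bv/b[ext=mp4]{codec_filter}"
    )


# Same shape as the temporary fix, with a prefix filter instead of an exact one.
print(build_format_string(360))
```

Whether the prefix filter behaves as intended on every video would still need to be checked against real yt-dlp runs.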

Finally, I think the fundamental solution is to add a feature that lets users pass their own format_string.
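A minimal sketch of what such a feature could look like, assuming the downloader ends up handing an options dict to yt-dlp. The class `DownloadOptions`, its method names, and the default selector are hypothetical and only illustrate the idea; they are not the actual YtDlpDownloader code. The keys `format`, `outtmpl`, and `quiet` are standard yt-dlp options.

```python
from typing import Optional


class DownloadOptions:
    """Hypothetical settings holder; not the real YtDlpDownloader."""

    def __init__(self, video_size: int = 360, format_string: Optional[str] = None):
        self.video_size = video_size
        # When set, the user-defined selector replaces the built-in one.
        self.format_string = format_string

    def default_format_string(self) -> str:
        # Assumed default: the selector from the temporary fix without a codec filter.
        return (
            f"wv*[height>={self.video_size}][ext=mp4]/"
            f"w[height>={self.video_size}][ext=mp4]/"
            f"bv/b[ext=mp4]"
        )

    def ydl_opts(self, outtmpl: str) -> dict:
        # Build the options dict that would be passed to yt_dlp.YoutubeDL(...).
        return {
            "format": self.format_string or self.default_format_string(),
            "outtmpl": outtmpl,
            "quiet": True,
        }


custom = DownloadOptions(format_string="wv*[height>=360][ext=mp4][vcodec^=avc1]/bv")
print(custom.ydl_opts("/tmp/video.mp4")["format"])
```

With yt-dlp installed, the resulting dict could be used as `yt_dlp.YoutubeDL(opts).download([url])`. Keeping the default selector in one method means existing behavior is unchanged when no custom string is given.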

Detailed explanation

Take the YouTube video 6EAhKcpVtFA as an example. It has the following formats (simplified):

% yt-dlp -F 6EAhKcpVtFA
[youtube] Extracting URL: 6EAhKcpVtFA
[youtube] 6EAhKcpVtFA: Downloading webpage
[youtube] 6EAhKcpVtFA: Downloading ios player API JSON
[youtube] 6EAhKcpVtFA: Downloading android player API JSON
[youtube] 6EAhKcpVtFA: Downloading m3u8 information
[info] Available formats for 6EAhKcpVtFA:
ID  EXT   RESOLUTION FPS CH │  FILESIZE   TBR PROTO │ VCODEC          VBR ACODEC      ABR ASR MORE INFO
───────────────────────────────────────────────────────────────────────────────────────────────────────────────
395 mp4   426x240     30    │   5.98MiB  172k https │ av01.0.00M.08  172k video only          240p, mp4_dash
229 mp4   426x240     30    │ ~10.77MiB  303k m3u8  │ avc1.4D4015    303k video only
133 mp4   426x240     30    │   5.02MiB  145k https │ avc1.4D4015    145k video only          240p, mp4_dash
604 mp4   426x240     30    │ ~10.18MiB  287k m3u8  │ vp09.00.20.08  287k video only
242 webm  426x240     30    │   6.65MiB  192k https │ vp09.00.20.08  192k video only          240p, webm_dash
396 mp4   640x360     30    │  11.80MiB  341k https │ av01.0.01M.08  341k video only          360p, mp4_dash
230 mp4   640x360     30    │ ~25.98MiB  731k m3u8  │ avc1.4D401E    731k video only
134 mp4   640x360     30    │  12.32MiB  356k https │ avc1.4D401E    356k video only          360p, mp4_dash
18  mp4   640x360     30  2 │ ≈17.13MiB  482k https │ avc1.42001E         mp4a.40.2       44k 360p
605 mp4   640x360     30    │ ~20.30MiB  572k m3u8  │ vp09.00.21.08  572k video only
243 webm  640x360     30    │  12.43MiB  359k https │ vp09.00.21.08  359k video only          360p, webm_dash
397 mp4   854x480     30    │  17.88MiB  516k https │ av01.0.04M.08  516k video only          480p, mp4_dash
231 mp4   854x480     30    │ ~44.13MiB 1242k m3u8  │ avc1.4D401F   1242k video only
135 mp4   854x480     30    │  23.18MiB  669k https │ avc1.4D401F    669k video only          480p, mp4_dash
606 mp4   854x480     30    │ ~33.34MiB  939k m3u8  │ vp09.00.30.08  939k video only
244 webm  854x480     30    │  22.08MiB  637k https │ vp09.00.30.08  637k video only          480p, webm_dash

Format 396 matches the first rule, "wv*[height>=360][ext=mp4]", in

f"wv*[height>={self.video_size}][ext=mp4]{'[codec=avc1]' if self.specify_codec else ''}/"
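To make the selection concrete, here is a toy model of the filter step using the mp4 rows at 360p and above from the listing. It is only a sketch: it ranks candidates by height and then total bitrate, which is an assumption made for illustration, whereas yt-dlp's real sort order considers more fields. The names `Fmt`, `MP4_FORMATS`, and `pick_worst` are hypothetical.

```python
from typing import Callable, NamedTuple, Optional


class Fmt(NamedTuple):
    format_id: str
    height: int
    tbr_kbps: int
    vcodec: str


# mp4 rows with height >= 360, copied from the yt-dlp -F listing.
MP4_FORMATS = [
    Fmt("396", 360, 341, "av01.0.01M.08"),
    Fmt("230", 360, 731, "avc1.4D401E"),
    Fmt("134", 360, 356, "avc1.4D401E"),
    Fmt("18", 360, 482, "avc1.42001E"),
    Fmt("605", 360, 572, "vp09.00.21.08"),
    Fmt("397", 480, 516, "av01.0.04M.08"),
    Fmt("231", 480, 1242, "avc1.4D401F"),
    Fmt("135", 480, 669, "avc1.4D401F"),
    Fmt("606", 480, 939, "vp09.00.30.08"),
]


def pick_worst(min_height: int,
               rejects: Optional[Callable[[str], bool]] = None) -> Optional[str]:
    """Return the id of the 'worst' candidate under the toy ranking."""
    candidates = [
        f for f in MP4_FORMATS
        if f.height >= min_height and not (rejects and rejects(f.vcodec))
    ]
    if not candidates:
        return None
    # Toy ranking (assumption): lowest height first, then lowest total bitrate.
    return min(candidates, key=lambda f: (f.height, f.tbr_kbps)).format_id


print(pick_worst(360))                                        # 396 (AV1)
print(pick_worst(360, lambda c: c == "av01.0.01M.08"))        # 134 (avc1)
print(pick_worst(480, lambda c: c == "av01.0.01M.08"))        # 397 (still AV1)
print(pick_worst(480, lambda c: c.startswith("av01")))        # 135 (avc1)
```

Under this toy ranking, the exact-match exclusion fixes the 360p case but would still let format 397 (av01.0.04M.08) through for a 480p request, which is one reason a prefix-based or user-supplied filter may be more robust.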

@iejMac
Owner

iejMac commented Dec 3, 2023

Ah thanks very much! Yes I do remember running into this a few times. And agreed, best solution is probably to parameterize the codec arg.
