Clarify whether "short" representations are permitted #178

Closed
ojw28 opened this issue Mar 25, 2018 · 16 comments
Labels
discussion needed Issue needs general input from IOP members to move forward

@ojw28

ojw28 commented Mar 25, 2018

For on-demand DASH, within a period of a specified duration d, is it allowed for some representations to have durations significantly shorter than d if they would otherwise end with "empty" segments? This question arises specifically for caption/subtitle representations where there are no captions/subtitles near the end of the content (e.g. during credits). At least one packager I'm aware of lets caption/subtitle representations have significantly shorter durations in this case, rather than padding up to duration d with empty segments.

Is it explicitly specified either way whether this kind of "short" representation is permitted? If not, would it be possible to add a requirement one way or another to the DASH IF guidelines?

From my point of view as a player developer, I would rather see a requirement that such representations are always padded up to d with empty segments because:

  • It's common to see large gaps in subtitles in the middle of streams as well as at the end, and for that case padding with empty segments is the solution. It seems inconsistent to treat the end of the stream any differently.

  • For live streams that haven't ended yet, but haven't had any subtitles for a while, there's no real alternative to padding with empty segments, else the player can't disambiguate between "no subtitles" and "subtitle segments are late being added to the manifest". If a live stream ends and turns into an on-demand stream, it will therefore end with empty segments. It seems nice for this and content that was only ever on-demand to be consistent.

  • It simplifies player implementations, because it avoids having to handle cases like seeking into part of the period where some (but not all) representations have ended.

Conversely (and for completeness), some benefits of explicitly allowing representations to end early are:

  • It's marginally more efficient because the client doesn't need to request the empty segment(s) (and they don't need hosting / packaging etc).
@sandersaares
Member

sandersaares commented Mar 26, 2018

From my perspective of content processing and solution development, I support the view that no gaps should exist and that padding segments should always be used. However, I do not believe IOP requires this.

There is an ongoing discussion in the live task force right now with regard to period cutting and it was mentioned that DASH amd 3 added a @presentationDuration attribute to representations, with the intended use to signal that the representation ends before the period end. Would this type of signaling make it more practical to implement player behavior in the scenarios you describe?

I would add a related question to you: is there any difficulty from a player developer's perspective when a period ends in the middle of a segment? Can the partial segment be cut without issues?

@ojw28
Author

ojw28 commented Mar 26, 2018

The main scenario I describe is for on-demand DASH, where we can easily work out how long each representation is and therefore whether it's shorter than the period without additional signalling. It's not that this case is hugely impractical for a player to support. It's that it's yet another special case that just needn't exist. If a representation is allowed to end early then the player can't make nice assumptions like "there will be a segment I can request for any valid seek position within the period", so you end up with extra code paths through player implementations, which need extra tests etc. To summarize: Allowing short representations appears to add complexity pretty much everywhere for very little actual benefit.
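
To illustrate the on-demand check described above, here is a minimal sketch in Python. It assumes the player already knows every segment duration for each representation; the names `Representation`, `find_short_representations` and the `tolerance` parameter are hypothetical and do not come from any DASH library.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Representation:
    rep_id: str
    segment_durations: List[float]  # seconds, in presentation order


def find_short_representations(
    reps: List[Representation],
    period_duration: float,
    tolerance: float = 0.5,  # assumed slack in seconds; not a value from any spec
) -> Dict[str, float]:
    """Return {rep_id: missing_seconds} for representations that end early."""
    short = {}
    for rep in reps:
        covered = sum(rep.segment_durations)
        missing = period_duration - covered
        if missing > tolerance:
            short[rep.rep_id] = missing
    return short
```

A player that finds a non-empty result has to decide what a seek into the uncovered range means for that representation, which is exactly the extra code path discussed here.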

Do you think this is something IOP could address, either by explicitly justifying why allowing short representations adds significant benefit, or by recommending that packagers do not do this?

I would add a related question to you: is there any difficulty from a player developer's perspective when a period ends in the middle of a segment? Can the partial segment be cut without issues?

I would say this also falls into the "not hugely impractical, but yet another special case" bucket. So the question I'd pose is: Is this feature absolutely required for a particular use case and/or does it add a really significant benefit over an existing alternative? If the answer is yes then it seems reasonable. If not then I'd much rather see packagers not do this kind of thing.

@haudiobe
Contributor

We should discourage doing it. If it is done, we should encourage adding presentationDuration. The client behaviour in this case should also be documented.
Create a CR on this

@sandersaares
Member

sandersaares commented Apr 1, 2018

I would propose something like this as a starting point:


Segments should be provided for all Representations in a Period up until the end of the Period. If necessary, padding segments containing empty/blank/silent samples should be supplied to ensure there is no gap near the end of a Period.

The last segment may extend beyond the Period end point. Clients shall ignore any samples that exceed the bounds of the Period.

If there are missing segments at the end of a Representation, this shall be signaled by defining an adaptation set duration that is shorter than the Period duration using AdaptationSet@presentationDuration. This is discouraged due to limited client support.


Edit: updated proposal below

@ojw28
Author

ojw28 commented Apr 2, 2018

I'd be interested to know what the use case for the last paragraph in the proposal is. I think, from a complexity point of view, it's important to avoid having multiple ways of doing the same thing unless there are strong justifications as to why both are needed. It feels like the serving side should always be able to generate padding segments, and that the proposal makes this the recommended approach. So why is it beneficial to also leave the possibility of not doing so open (with presentationDuration)? Particularly if in the same sentence the practice is going to be discouraged.

@sandersaares
Member

sandersaares commented Apr 2, 2018

The use case that I heard was that one might want to e.g. terminate audio before video intentionally to ensure that it stops at a convenient transition point (e.g. moment of silence) when inserting ads.

I can sort of see the use case as having relevance given that you are not always encoding/segmenting and packaging at the same time. You might have sets of already encoded and segmented content that you assemble into multiple periods. In such a case, you cannot just dive into the media stream itself to manipulate it - it is already a done deal.

Of course, this would not be a very mainstream feature, hence why it would be discouraged. I do not expect (m)any implementations to really do this. Possibly in this case it might be more sensible to just say "not supported" to discourage even more strongly - I would say this feature is more likely to be misused than it is to be used in the way described above.

A lot of what goes into IOP is, in my view, similar to this - a feature that is not going to be commonly used but if it is used, should at least be done in an interoperable way. Perhaps this has some tie-in with regard to how we should handle the key words in v5? #175

@ojw28
Author

ojw28 commented Apr 2, 2018

That use case sounds reasonable, but it doesn't sound like there would ever be insufficient content to provide segments up to the end of the period in that case. Wouldn't it be true that there exists content at least up to the end of the period (and probably beyond)? In which case it would IMO be preferable to still require that segments are provided up to the end of the period, even if presentationDuration is going to be used to specify that actual presentation should terminate early.

@sandersaares
Member

You make a fair point.

I wonder if this might conflict with how presentationDuration is specified in DASH. I believe it is specified in DASH amd 3 but I only saw it onscreen in a screen sharing session so far as I have not felt that paying 100€ to ISO is equal in value to getting the exact definition of this 😄

Upon further meditation I realize that a far more appropriate mechanism for signaling such editorial decisions as "drop some seconds of audio" is the custom descriptors that DASH defines. There is no need to modify segment addressing just to mute audio. The descriptors would also allow for a far wider range of flexibility such as a gradual fade-out.

As this was the only use case I have run into for dropping segments, I think I can now back the viewpoint that no segments should ever be missing with a clear conscience. Accordingly, I submit an updated proposal:


Segments shall be provided for all Representations in a Period up until the end of the Period. If necessary, padding segments containing empty/blank/silent samples shall be supplied to ensure there is no gap near the end of a Period.

The last segment may extend beyond the Period end point. Clients shall ignore any samples that exceed the bounds of the Period.

AdaptationSet@presentationDuration shall not be emitted by packagers and shall be ignored by clients if present.


For editorial manipulation, custom descriptors can be proposed by whoever needs them.
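
As a rough illustration of what the first two paragraphs of the proposal ask of a packager, the sketch below plans the padding segments needed to reach the period end. It is an assumption-laden example in Python: `plan_padding_segments` is a hypothetical helper, times are in seconds for readability (a real packager would more likely work in integer timescale units), and the content of the padding segments themselves is out of scope.

```python
import math
from typing import List, Tuple


def plan_padding_segments(
    covered: float,           # seconds of the period already covered by real segments
    period_duration: float,   # seconds
    segment_duration: float,  # nominal duration of each padding segment, seconds
) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs for the empty segments that close the gap."""
    if segment_duration <= 0:
        raise ValueError("segment_duration must be positive")
    gap = period_duration - covered
    if gap <= 0:
        return []  # the representation already reaches the period end
    # Round up: the last segment is allowed to extend beyond the period end,
    # and clients are expected to ignore the samples past that point.
    count = math.ceil(gap / segment_duration)
    return [(covered + i * segment_duration, segment_duration) for i in range(count)]
```

For example, a subtitle representation covering 40 s of a 62 s period with 4 s segments would get six padding segments, the last one running from 60 s to 64 s and overshooting the period end by 2 s.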

@ojw28
Author

ojw28 commented Apr 16, 2018

That proposal sounds good to me.

@sandersaares
Member

sandersaares commented Oct 24, 2018

This came up in yesterday's call again, the point raised being that there exists a lot of on-demand content that does not conform to this.

I agree that such content exists. I claim that such content should not be classified as interoperable content and should be considered outside DASH-IF IOP profiles.

To make on-demand content with "short" representations interoperable, the following possibilities exist:

  1. Fill the missing part of the timeline with "empty" segments (blank frames or silent audio samples).
  2. Start a new period when a representation disappears.
  3. Cut the uneven portion off the end, to eliminate "short" representations.

The idea that period timing is only "rough" and should not be relied upon for exact timing was also expressed in the call. I do not agree with this interpretation and expect period timing to always be accurate - the DASH timing model would be quite badly affected if period timing could not be relied upon. By accurate I mean:

  • a period must always contain continuous segments for every representation starting from the period start timestamp, except when the DVR window start is later than period start, in which case the period must supply content starting from DVR window start
  • a period must always contain continuous segments for every representation up until the period end timestamp
  • for alignment purposes, it is fine to start/stop a period in the middle of a segment; players are expected to clip playback to the period boundary, not presenting the rest of the segment (though potentially decoding it if needed due to sample dependencies).
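
The three conditions above can be sketched as a simple conformance check. This is only an illustration in Python under stated assumptions: segments are given as `(start, duration)` pairs in seconds on the same timeline as the period, and `covers_period`, `clip_to_period`, `dvr_window_start` and `eps` are hypothetical names chosen for this example.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, duration), seconds on the period's timeline


def covers_period(
    segments: List[Segment],
    period_start: float,
    period_end: float,
    dvr_window_start: Optional[float] = None,
    eps: float = 1e-6,
) -> bool:
    """True if the segments run without gaps from the required start to the period end."""
    # Content may begin at the DVR window start when that is later than the period start.
    required_start = period_start
    if dvr_window_start is not None:
        required_start = max(period_start, dvr_window_start)
    if not segments:
        return False
    ordered = sorted(segments)
    if ordered[0][0] > required_start + eps:
        return False  # nothing available at the required start
    position = ordered[0][0] + ordered[0][1]
    for start, duration in ordered[1:]:
        if abs(start - position) > eps:
            return False  # gap (or overlap) between consecutive segments
        position = start + duration
    return position >= period_end - eps  # the last segment may overshoot the end


def clip_to_period(
    start: float, end: float, period_start: float, period_end: float
) -> Optional[Tuple[float, float]]:
    """Return the presentable part of [start, end), or None if it lies outside the period."""
    clipped_start = max(start, period_start)
    clipped_end = min(end, period_end)
    return (clipped_start, clipped_end) if clipped_start < clipped_end else None
```

Note that `clip_to_period` only describes what should be presented; the samples outside the clipped range may still need decoding because of sample dependencies.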

Missing segments are a related but separate topic that I think is not important here.

We should expect this for both live and on-demand profile.

@technogeek00

@sandersaares I agree pretty thoroughly with your reasoning here. Having streamed representations that are not aligned throughout the period makes the timing model a lot harder, and the player-side fix of blank frames or silent audio samples is extremely dependent on the underlying device platform capabilities. For a good number of mass-market devices the underlying decoding pipelines do not handle these empty segments well, resulting in pipeline failures; not mandating the usage of filler segments should make implementing an interoperable player far easier.

From our (Hulu) internal player work, the timing model works best when the period timing is accurately described by either explicit Period@start values or implied starts using a supplied Period@duration or inferred duration from the duration of the streamed representations (which we assume to be aligned). With the periods fixed within a timeline, all timing below them becomes much easier to reason about, as all child elements are relative to the period time and in parallel with each other.
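
A minimal sketch of that resolution order, in Python, might look as follows. It is not taken from any actual player: the `Period` class, its `media_duration` field (standing in for a duration inferred from aligned representations) and `resolve_period_starts` are hypothetical, and the first period is assumed to start at 0 when it carries no explicit start, as in a static presentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Period:
    start: Optional[float] = None           # explicit Period@start, seconds
    duration: Optional[float] = None        # explicit Period@duration, seconds
    media_duration: Optional[float] = None  # hypothetical: inferred from representations


def resolve_period_starts(periods: List[Period]) -> List[float]:
    """Place every period on the timeline, preferring explicit values."""
    starts: List[float] = []
    for index, period in enumerate(periods):
        if period.start is not None:
            starts.append(period.start)  # explicit start wins
        elif index == 0:
            starts.append(0.0)  # assumption: first period of a static presentation
        else:
            previous = periods[index - 1]
            previous_duration = (
                previous.duration
                if previous.duration is not None
                else previous.media_duration
            )
            if previous_duration is None:
                raise ValueError(f"cannot place period {index} on the timeline")
            starts.append(starts[-1] + previous_duration)
    return starts
```

The fallback to `media_duration` only gives a sensible answer when the representations really are aligned, which is the assumption stated above.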

With the periods defining the overall timeline it is possible for the video and audio elementary streams to be sparse, but again you have to cope with underlying pipelines not handling this sparseness well, so it is best to avoid sparseness if possible. For text and event streams that do not rely on underlying platform pipelines (at least in our experience), the sparseness is easily handled as the text and event streams have explicit timing for their elements and do not require the generation of filler data.

The one point you make that I would describe as hard to follow as a player is:

for alignment purposes, it is fine to start/stop a period in the middle of a segment; players are expected to clip playback to the period boundary, not presenting the rest of the segment (though potentially decoding it if needed due to sample dependencies).

This relies on a lot of control over the underlying media pipelines, which you cannot always achieve; MSE-based players using encrypted content would have trouble doing this for seamless transitions, for instance. That said, I believe the Period timing is free to cut mid-segment and players should attempt to respect this as closely as possible: if they can cut mid-segment they do; otherwise the timing model reflects the mid-segment cut, but internally the transition occurs following the segment, with the extra segment time counted as a type of negative space.

@ojw28
Author

ojw28 commented Nov 1, 2018

  • I also agree thoroughly with @sandersaares's reasoning.
  • I also fairly strongly disagree with the idea that period timing is only "rough". I wasn't on the call and so don't know exactly what this was intended to mean, but the timing of periods as specified in the manifest is relied upon even for basic operations like seeking to the right position. The statement that such timing is only rough seems therefore equivalent to saying that DASH doesn't support basic operations in a sensible way.

@porcelijn

👍 I like the direction you guys have taken on this topic. Indeed it makes much more sense to throw away some left over media than to play media that isn't there.

I have a small question though on how non-compliant content would be treated from an IOP perspective.

@sandersaares

I agree that such content exists. I claim that such content should not be classified as interoperable content and should be considered outside DASH-IF IOP profiles.

In the typical case of a (non-sparse) subtitle track that ends way earlier than the A/V credits, would it be fair to say that such a presentation ends instantly when the shortest (i.e. subs) track ends?

That is, in order to comply, the presentation could be altered to:

  • pad empty ttml/wvtt samples fragments till duration(subs) >= MPD@periodDuration
  • reduce MPD@periodDuration (effectively remove credits), or
  • designate subs track as "sparse"

@technogeek00 technogeek00 added this to the IOP v5 milestone Mar 19, 2019
@technogeek00 technogeek00 added question discussion needed Issue needs general input from IOP members to move forward labels Mar 19, 2019
@sandersaares
Member

sandersaares commented Apr 2, 2019

In the typical case of a (non-sparse) subtitle track that ends way earlier than the A/V credits, would it be fair to say that such a presentation ends instantly when the shortest (i.e. subs) track ends?

I would rather say that such content is non-interoperable and consistent behavior cannot be expected across a wide range of client systems.

I would not say that the presentation ends there because you can only define the end point when operating under a common understanding of the timing model, which such content violates.

That is, in order to comply, the presentation could be altered to:

pad empty ttml/wvtt samples fragments till duration(subs) >= MPD@periodDuration

That would result in following the interoperable timing model and would indeed be a good approach.

reduce MPD@periodDuration (effectively remove credits), or

Also works, although I suspect less desirable.

designate subs track as "sparse"

I am not aware of such a concept as sparse tracks in DASH.

A fourth option (and IMO the easiest) would be to start a new period at the point where the subtitle track ends, with the new period not having any subtitles. The other tracks could be designated as period-connected and (provided client system support for seamless playback) continue playback seamlessly while also properly terminating the subtitle track.
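
To make the fourth option concrete, here is a small Python sketch of the split. It is illustrative only: `PeriodPlan` and `split_at_short_track` are hypothetical names, the `tolerance` value is an assumption, only a single cut at the shortest track is handled, and the sketch does not check whether the cut point lands on a video or audio access point.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PeriodPlan:
    start: float       # seconds on the presentation timeline
    duration: float    # seconds
    tracks: List[str]  # track names present in this period


def split_at_short_track(
    period_start: float,
    period_duration: float,
    track_durations: Dict[str, float],  # seconds of media available per track
    tolerance: float = 0.5,             # assumed slack in seconds
) -> List[PeriodPlan]:
    """Split one period in two at the point where the shortest track ends."""
    cut = min(track_durations.values())
    all_tracks = sorted(track_durations)
    if period_duration - cut <= tolerance:
        # Every track already reaches the period end; nothing to split.
        return [PeriodPlan(period_start, period_duration, all_tracks)]
    # Only tracks that continue past the cut appear in the second period.
    remaining = sorted(
        name for name, duration in track_durations.items() if duration - cut > tolerance
    )
    return [
        PeriodPlan(period_start, cut, all_tracks),
        PeriodPlan(period_start + cut, period_duration - cut, remaining),
    ]
```

With 60 s of audio and video and 40 s of subtitles, this yields a 40 s period carrying all three tracks followed by a 20 s period carrying only audio and video.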

@porcelijn

A fourth option (and IMO the easiest) would be to start a new period at the point where the subtitle track ends, with the new period not having any subtitles. The other tracks could be designated as period-connected and (provided client system support for seamless playback) continue playback seamlessly while also properly terminating the subtitle track.

Multiple periods is actually a sound solution I had not thought of. That makes sense!

It obviously puts some strain on stitching implied or unintentional discontinuities introduced by the period edges, as the end of subtitle track may not coincide with start of new GOP and/or audio access point, but at least we would not need to worry about partial tracks inside those periods anymore. Should give packager and player development an incentive to focus on getting the period transitions right.

@sandersaares
Member

This topic was used as feedback into the formulation of the interoperable timing model. As there has been no further discussion here for some time, I close it.

5 participants