
Introduce AtlasEngine - A new text rendering prototype #11623

Merged · 30 commits merged into main from dev/lhecker/atlas-engine on Nov 13, 2021

Conversation

@lhecker (Member) commented Oct 27, 2021:

This commit introduces "AtlasEngine", a new text renderer based on DxEngine.
Unlike DxEngine, however, it uses DirectWrite and Direct2D only to rasterize glyphs.
Blending and placing these glyphs into the target view is done using
Direct3D and a simple HLSL shader. Since this new renderer assumes more
aggressively that the text is monospace, the implementation is simpler:
the viewport is divided into cells, and its data is stored as a simple matrix.
Modifications to this matrix involve only simple pointer arithmetic and are easy
to understand. Just as with DxEngine, however, the DirectWrite-related
code remains extremely complex and hard to understand.
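
To illustrate the cell-matrix idea, here is a minimal hypothetical sketch (invented names; not the PR's actual types):

    // Hypothetical sketch of the "viewport as a cell matrix" idea (names invented).
    // Each cell records which glyph tile to draw and with which colors; addressing
    // any cell is plain pointer arithmetic.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Cell
    {
        uint32_t glyphIndex; // index of the rasterized glyph in the texture atlas
        uint32_t foreground; // packed RGBA
        uint32_t background; // packed RGBA
    };

    struct CellMatrix
    {
        std::vector<Cell> cells; // rows * cols entries, row-major
        size_t cols = 0;

        Cell* row(size_t y) noexcept { return cells.data() + y * cols; }
        Cell& at(size_t y, size_t x) noexcept { return row(y)[x]; }
    };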

Supported features:

  • Basic text rendering with grayscale AA
  • Foreground and background colors
  • Emojis, including zero-width joiners
  • Underline, dotted underline, strikethrough
  • Custom font axes and features
  • Selections
  • All cursor styles
  • Full alpha support for all colors
  • Should work with Windows 7

Unsupported features:

  • A more conservative GPU memory usage
    The backing texture atlas for glyphs is grow-only and will not shrink.
    After 256MB of memory is used up (~20k glyphs), text output
    will be broken until the renderer is restarted.
  • ClearType
  • Remaining gridlines (left, right, top, bottom, double underline)
  • Hyperlinks don't get full underlines if hovered in WT
  • Softfonts
  • Non-default line renditions

Performance:

  • Runs at up to the native display refresh rate
    Unfortunately, the frame rate often drops below the refresh rate due to us
    fighting over the buffer lock with other parts of the application.
  • CPU consumption is up to halved compared to DxEngine
    AtlasEngine is still highly unoptimized. Glyph hashing
    consumes up to a third of the current CPU time.
  • No regressions in WT performance
    VT parsing and related buffer management take up most of the CPU time (~85%),
    which is why AtlasEngine can't show any further improvements there.
  • ~2x improvement in raw text throughput in OpenConsole
    compared to DxEngine running at 144 FPS
  • ≥10x improvement in colored VT output in WT/OpenConsole
    compared to DxEngine running at 144 FPS

@zadjii-msft (Member) left a comment:

well I'm at 29/56 but obviously haven't reviewed the scarygood stuff yet

src/host/settings.cpp (outdated; resolved)
src/host/selection.cpp (outdated; resolved)
@@ -74,5 +74,6 @@ HRESULT RenderEngineBase::PrepareLineTransform(const LineRendition /*lineRenditi
// - Blocks until the engine is able to render without blocking.
void RenderEngineBase::WaitUntilCanRender() noexcept
{
// do nothing by default
// Throttle the render loop a bit by default.
Sleep(8);
Member Author:

This Sleep(8) was removed from RenderThread and moved here, allowing AtlasEngine to run at more than 60 FPS.

Member:

Look at that beautiful use of an interrobang.

src/interactivity/win32/window.cpp (outdated; resolved)
@@ -40,98 +40,61 @@ namespace Microsoft::Console::Render
{
public:
DxEngine();
~DxEngine();
Member:

this is a lot of churn for a renderer that isn't changing in this PR!

Member:

It seems like the changes to this file include...

  • rearrange the functions
  • remove the <ctor> = default stuff
  • add const noexcept to ToggleShaderEffects
  • clean up the #includes at the top
  • remove _EnableDisplayAccess
  • remove the const after pointers (i.e. const XXX* const --> const XXX*)

Honestly, this seems mostly fine to me, just a bit jarring. Mainly concerned about the last two removals.

@@ -14,12 +14,16 @@ Author(s):

#pragma once

#include <d2d1.h>
Member:

fwiw this is some unusual layering -- giving the general interface knowledge of D2D primitives seems like the wrong way to go given that 4/6 of the engines aren't D2D-based

Member Author:

Good! Because I also don't like having D2D stuff in my API. The solution here would be to have our own enum instead. I think that would be a good fit for a followup PR, since it's a minor change.

Splitting this interface into one for 4/6 engines and one for 2/6 engines would not just unnecessarily increase the binary size further, but also unnecessarily Javarize the code base, if I'm honest. I think we are better served by reducing both the number of interfaces and the number of functions on those interfaces.
There's technically no reason for the IRenderEngine to have that many functions if we:

  • use inversion of control and write "config providers" instead of having setters for everything
  • use fat data transfer objects instead of finely granular setters
  • accept that we only render at ~240 FPS at most. 240 COM calls per second are nothing to the CPU, so if we combine the two previous points, we can just ask the IRenderData provider 240 times per second for the complete set of configuration anew, and performance would be better than it is now (due to the lack of CFG checks on each virtual call). A sketch of this idea follows below.
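
A hypothetical sketch of the "config provider + fat DTO" idea (all names invented; this is not the proposed API):

    // Hypothetical sketch of the "config provider + fat DTO" idea (all names
    // invented). Instead of dozens of setters on IRenderEngine, the engine pulls
    // one complete settings snapshot per frame.
    #include <cstdint>

    struct RenderSettings // plain data, cheap to copy once per frame
    {
        uint32_t foregroundColor = 0;
        uint32_t backgroundColor = 0;
        float fontSizeInDIP = 12.0f;
        bool retroEffect = false;
        // ...everything else an engine needs to render a frame
    };

    struct IRenderData
    {
        // One virtual call per frame (~240/s) instead of one per setting change.
        virtual RenderSettings GetSettings() const noexcept = 0;
        virtual ~IRenderData() = default;
    };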

Collaborator:

I remember discussing something like this with @lhecker, which eventually narrowed down to a single question: should we pretend that we don't know which exact engines exist in the codebase? Code-wise, we all know what the "perfect" solution might look like ("Javarize", perhaps?), but if that causes a bigger performance penalty than it should, I'm OK with a more practical solution, even if it seems a bit unorthodox.

Member:

I've been sorely tempted to do this in the past as well (add the d2d headers)... it's sort of where the initial til::point/til::coord/til::rectangle came from: avoiding that.

We've proven @lhecker is super correct about us needing to reduce the number of calls and make big data transfers in a future work item.

// be abstracted away and integrated into the above or simply get removed.

// DxRenderer - getter
virtual HRESULT Enable() noexcept { return S_OK; }
Member:

FWIW this is our first foray into having a base implementation in this renderer; we should observe the cost it has on conhost

@lhecker (Member Author), Oct 27, 2021:

Ha! I knew this question would come up. The answer is: none. At least not one we will care about for now.
The reason is that abstract classes aren't actually compiled into binaries. Only vtables are, and those only exist on concrete classes. Since the number of concrete classes hasn't changed, the size increase is limited to virtual_functions * concrete_classes * sizeof(void*), which is 168 bytes for inbox conhost on x64. If I had put these functions into a new, special-purpose interface just for AtlasEngine and DxEngine, we'd only save 84 bytes. If we merge IRenderEngine and BaseEngine together, however, we already save something like 2kB alone.
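
To make that arithmetic concrete (the 7-function/3-class split below is purely illustrative; only the 168-byte total comes from the measurement above):

    virtual_functions * concrete_classes * sizeof(void*) = vtable growth
            7         *        3         *    8 bytes    = 168 bytes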

// pair is a simple clone of std::pair, with one difference:
// copy and move constructors and operators are explicitly defaulted.
// This allows pair to be std::is_trivially_copyable, if both T and S are.
// --> pair can be used with memcpy(), unlike std::pair.
Member:

do we need a followup to deduplicate til::rle_pair?

@lhecker (Member Author), Oct 27, 2021:

No, I think it's fine if they live their independent lives...
til::pair has first/second members, while til::rle_pair calls them length/value, which is a lot more descriptive.
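
For context, a minimal sketch of such a trivially copyable pair (not the verbatim til::pair source):

    // Minimal sketch of a trivially copyable pair (not the verbatim til::pair
    // source). With all special members implicitly defaulted, the type is
    // std::is_trivially_copyable whenever T and S are, so memcpy() is legal.
    #include <type_traits>

    template<typename T, typename S>
    struct pair
    {
        T first;
        S second;
    };

    static_assert(std::is_trivially_copyable_v<pair<int, float>>);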

src/renderer/atlas/AtlasEngine.cpp (outdated; resolved)
{
_recreateSizeDependentResources();
}
if (WI_IsFlagSet(_invalidations, InvalidationFlags::font))
Member:

looks like James' enumset

Member Author:

After merging my enumset PR, I'll gladly use it.
Until then I'd like to refrain from depending on std::bitset.

static constexpr
#endif
// Why D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS:
// This flag prevents the driver from creating a large thread pool for things like shader computations
Member:

Would this flag help DxEngine as well?
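
For reference, passing the flag at device creation looks roughly like this (generic D3D11 boilerplate, not the PR's exact call site):

    // Generic D3D11 device creation with the flag discussed above (a sketch,
    // not the PR's exact call site).
    #include <d3d11.h>
    #include <wrl/client.h>

    using Microsoft::WRL::ComPtr;

    HRESULT createDevice(ComPtr<ID3D11Device>& device, ComPtr<ID3D11DeviceContext>& context)
    {
        UINT deviceFlags = D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS;
        return D3D11CreateDevice(
            nullptr,                  // default adapter
            D3D_DRIVER_TYPE_HARDWARE,
            nullptr,                  // no software rasterizer module
            deviceFlags,
            nullptr, 0,               // default feature levels
            D3D11_SDK_VERSION,
            device.GetAddressOf(),
            nullptr,                  // resulting feature level (unused here)
            context.GetAddressOf());
    }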

{
deviceFlags |= D3D11_CREATE_DEVICE_DEBUG;

const auto DXGIGetDebugInterface = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(GetProcAddress(module.get(), "DXGIGetDebugInterface"));
Member:

is all of this stuff publicly documented? Is there a header you could use to get the signature for DXGIGetDebugxxx, so we can simplify this and use the wil module helper?

Comment:

Why use D3D11 and not D3D12?

@lhecker (Member Author), Oct 31, 2021:

Let me turn this question around: why use D3D12 and not D3D11? Newer software isn't faster than old software.
But hardware is, and D3D12 allows you better access to these newer capabilities.
However, this engine is so simple that the core rendering loop only needs 3 graphics commands per frame. There's not much the newer API could help with performance-wise here.

But most importantly, this engine is supposed to work on Windows 7, which only supports D3D10 (accessible through D3D11), because it is used in Visual Studio 2019 (and that one supports Windows 7 until 2029 - not kidding). As such, D3D12 could only ever be added as an alternative engine and not be the only implementation.

Member Author:

Oh I just found out that there's a "D3D12onWin7" document!
But it's only for x64 and doesn't support WARP, which we need because we have to support DirectX 9 hardware. 😔

One day I'll probably check out D3D12 anyways, because it's interesting. 😄
I suspect that it won't improve performance, because - as far as I understand it - D3D12 helps parallelize things, whereas this engine doesn't have enough complexity to benefit from this. But maybe I'm wrong and in that case it'd be fun having a cool D3D12 engine for modern hardware.

if (SUCCEEDED(DXGIGetDebugInterface(IID_PPV_ARGS(infoQueue.addressof()))))
{
// I didn't want to link with dxguid.lib just for getting DXGI_DEBUG_ALL. This GUID is publicly documented.
static constexpr GUID dxgiDebguAll = { 0xe48ae283, 0xda80, 0x490b, { 0x87, 0xe6, 0x43, 0xe9, 0xa9, 0xcf, 0xda, 0x8 } };
Member:

debgu.

i'd encourage not excluding the entire atlas renderer from the spell checker ;)

Member Author:

The atlas engine introduces about 20 new words to the spell checker.
Personally I expect that I'm going to rewrite large chunks of the renderer in the future, potentially multiple times, until I find a good solution that fits all requirements.
That's why I've decided to put spell checking on hold for now. I'd like to enable spell checking when I feel that the implementation is good enough. Until then this saves me quite a headache since I can't run the spell check locally.

src/renderer/atlas/AtlasEngine.cpp (outdated; resolved)
src/renderer/atlas/AtlasEngine.cpp (resolved)
@DHowett (Member) commented Oct 27, 2021:

Bad news! It doesn't work on ARM64.

@zadjii-msft (Member):

  • WT: no perf change
    (Additional WT buffer management and VT processing dwarfs rendering overhead at ~85% vs. ~5% CPU usage.)
  • It's really not fast
    This PR doesn't include any changes necessary to make this engine fast.
    The current text buffer API is modeled around GDI and DirectWrite and doesn't lend itself to this engine's design paradigms yet. It'll require a series of followup PRs, each of which will unblock this engine's capabilities and bring significant performance improvements with them.

I'm curious what those proposals are, if you've got a good idea for what they will be. Is the fact that it doesn't really improve WT throughput mostly because of conpty limiting frames to 60FPS / painting a frame every time the screen scrolls? Or is there some other reason?

this kind of chart: #1064 (comment) might be helpful for illustrative purposes


@skyline75489 (Collaborator):

Regarding @zadjii-msft's question about throughput in WT: besides the reason you mentioned, the "Text Buffer Rewrite of 2021" epic (deliberately not linked here to reduce the noise) is a huge deal when it comes to pure throughput. I'm pretty sure @lhecker has a lot to say here.

If this PR is merged later, I think it definitely deserves its own follow-up issue to track future work. We don't want to touch those old issues any more, right? Because of... obvious reasons.

@@ -923,6 +923,10 @@
<value>Controls what happens when the application emits a BEL character.</value>
<comment>A description for what the "bell style" setting does. Presented near "Profile_BellStyle".{Locked="BEL"}</comment>
</data>
<data name="Profile_UseAtlasEngine.Header" xml:space="preserve">
Collaborator:

I was expecting this to be an option in the "Rendering" section, where you can choose to use "Software rendering" - like a dropdown menu to choose between DX, Atlas, and DX-software. But this is just my opinion.

Member:

meh, I kinda like this being per-profile. For example, WSL profiles with this + passthrough (#11264) would probably have the highest possible throughput, but powershell has a super hard time with passthrough.

Member Author:

I did it this way as it simplifies comparing the DxEngine and AtlasEngine with each other.
You can have two profiles for the same shell using different engines.

Member:

I was also expecting it to be in the other location... but I agree with the utility of setting it per profile.

}

// Due to the current IRenderEngine interface (that wasn't refactored yet) we need to assemble
// the current buffer line first as the remaining function operates on whole lines of text.
Collaborator:

@zadjii-msft An example of how text buffer implementation bottlenecks the throughput.

Member:

Yeah this really needs a better way of bulk transferring this information.

//
// # What do we want?
//
// Segment a line of text (_api.bufferLine) into unicode "clusters".
Collaborator:

There's a small library, https://github.com/contour-terminal/libunicode, which is used in https://github.com/contour-terminal/contour to do a similar thing. I think there might be other OSS libraries that could also be helpful.

By knowing the cluster segmentation before generating glyphs, you can safely avoid the "call IDWriteFontFallback::MapCharacters first" step. Same idea as in #10858.

Contributor:

Last I recall, @christianparpart hadn't yet finished up on that lib, though I do agree that WT could very much benefit from it once he's ready to declare it suitable for others.

@christianparpart:

Oh, I can help with that. I am in commute but can carefully read and reply tonight. :)

@christianparpart, Oct 28, 2021:

Okay. Hello!
I developed libunicode specifically to serve the Unicode needs in my terminal emulator. I could have used ICU, but I really did not feel comfortable with it, nor did its API convince me it was suitable for my use case.
So all in all, I'd love to see libunicode evolve into a general-purpose unicode library - in fact it already is one, but I'd like it to be more complete (no BiDi algorithm implemented yet, for example).

When I was learning how to properly render all kinds of emoji and other complex unicode sequences, including support for programming ligatures, I read up on "Blink's text stack" as well as Blink's source code. You may see some API similarity when you look at libunicode; the emoji segmentation and script segmentation, for instance, are inspired by Blink's API.

Since the whole subject was so god damn complicated (at least if you start from scratch with like-zero prior knowledge), I decided to write it all up so that I can read it myself whenever I need to get into the subject again.

I can highly recommend having a look at it: A Look into a terminal emulator's text stack. (That way I do not need to spam this thread with details or unnecessarily duplicate text here.)

If there are any questions (or corrections!), I'm all open and in. :-)

p.s.: wrt libunicode, I'd be happy if it serves a purpose outside contour. Licensing should not be an issue. Maybe this lib will also help others :)

Collaborator:

I'm unsubscribing from this issue for personal reasons. Huge thanks to @christianparpart for jumping in and offering valuable information on this specific topic.

src/inc/operators.hpp (outdated; resolved)
// ## The unfortunate preface
//
// DirectWrite seems "reluctant" to segment text into clusters and I found no API which offers simply that.
// What it offers are a large number of low level APIs that can sort of be used in combination to do this.

@christianparpart:

This is exactly what I have implemented in libunicode. I use harfbuzz for text shaping, and harfbuzz also does not provide this functionality.

//
// DirectWrite seems "reluctant" to segment text into clusters and I found no API which offers simply that.
// What it offers are a large number of low level APIs that can sort of be used in combination to do this.
// The resulting text parsing is very slow unfortunately, consuming up to 95% of rendering time in extreme cases.

@christianparpart:

Text segmentation isn't too expensive. Text shaping sure as hell is. But since you can also segment by word boundaries, it should be easy to use a "word" as the cache key for the text shaping result, and the glyph indices in that result can in turn be used as cache keys into the texture atlas. I know there are some terminal emulators out there trying hard not to implement complex text shaping, but in my personal experience (and with the caching explained in the document above), I do not see any reason not to do it.

Member Author:

Currently I'm using "clusters" the way harfbuzz defines them:

In text shaping, a cluster is a sequence of characters that needs to be treated as a single, indivisible unit. A single letter or symbol can be a cluster of its own. Other clusters correspond to longer subsequences of the input code points — such as a ligature or conjunct form — and require the shaper to ensure that the cluster is not broken during the shaping process.

I wrote this DirectWrite code to break the text into such clusters and use the characters in each cluster as the key for a hash map that holds texture atlas indices. I tried using glyph indices as the hash key first, but if you do that you need to pair the indices with the font face they come from (since each font has its own list of glyph indices, right?). But how do you hash an IDWriteFontFace? I didn't find a good answer to this question, so for now I'm just hashing the characters of clusters instead of the glyphs they result in.
The reason I wrote it this way (segmentation into clusters) is that it seemed like the most straightforward way of building a Unicode glyph atlas.

Does caching the shaping result of words bring a worthwhile performance advantage, despite the overhead of the cache itself?

I'm asking because the biggest CPU consumption occurs during MapCharacters in this renderer, or in other words during font fallback. This contributes about 85% of CPU usage when drawing a viewport full of Hindi characters. Both IDWriteTextAnalyzer1::GetTextComplexity and IDWriteTextAnalyzer::GetGlyphs, which do the cluster segmentation for me, are fortunately rather fast (~10% CPU usage).
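
A hypothetical sketch of the cluster-keyed cache described above (invented names; the PR's actual key type is more elaborate):

    // Hypothetical sketch of the cluster-keyed glyph cache described above
    // (invented names; the PR's actual AtlasKey is more elaborate).
    #include <cstdint>
    #include <string>
    #include <unordered_map>

    struct AtlasPosition
    {
        uint16_t x = 0, y = 0; // tile coordinates inside the glyph atlas texture
    };

    // Key: the characters of one cluster. Glyph indices would additionally need
    // to identify the font face they came from, which is hard to hash.
    std::unordered_map<std::wstring, AtlasPosition> glyphCache;

    AtlasPosition getOrRasterize(const std::wstring& cluster)
    {
        if (const auto it = glyphCache.find(cluster); it != glyphCache.end())
        {
            return it->second; // cache hit: skip fallback, shaping, rasterization
        }
        AtlasPosition pos{}; // font fallback + shaping + drawing into the atlas goes here
        glyphCache.emplace(cluster, pos);
        return pos;
    }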

@christianparpart:

In text shaping, a cluster is a sequence of characters that needs to be treated as a single, indivisible unit
...

I wrote this DirectWrite code to break the text into such clusters

Exactly. And especially with regards to what harfbuzz says, that is a grapheme cluster (a user-perceived character composed of one or more codepoints). This segmentation step should NOT be done by the renderer but on the VT parser side already. At least that is how I do it, and I highly recommend doing so, because this information is also required for correct cursor placement. The link I posted above goes into more detail on that matter. Apart from that, I started drafting a Unicode Core spec for terminals, which you can read up on here. I'd love for you (all) to have a look at it. It clearly defines the semantics on that matter, and I wanted to ping the WT team on that spec anyway in order to find a common consensus in TE land (at least a little), such that toolkits/apps can rely on this new extended behaviour. We don't need to spam feedback here; just use that repo's discussion area, where jerch and j4james have already helped me get this started.

I tried using glyph indices as the hash key first, but if you do that you need to pair the indices with the font face they come from

Yup. I'm doing that here and there - mind, I'm not using the most efficient cache keys here either. I'll very soon move to strong cache keys so that I can avoid that double-mapping and also have simple integer values as keys.

Does caching the shaping result of words bring a worthwhile performance advantage, despite the overhead of the cache itself?

I know that my code was definitely slow before I started to implement caching. So I started naive, too. Then I moved to unordered_map for caching, which is naive and dangerous, and now it's an LRU cache, which is much better in terms of resource usage. The key should (as mentioned) be an integer to further improve lookup performance and avoid an extra strcmp inside the hash map implementation.

Text shaping is definitely expensive! I chose to cache words (delimited by spaces but also by common SGR) because the text shaper cannot deal with mixed font faces in one shape call: I pass in one font face per shape call (either regular or italic/bold/...), and then walk down the font-fallback chain until a shape call can fully shape the whole word. This result ends up in the cache, so that I do not need to rerun the whole font-fallback queue again and again. (A sketch of such a cache follows below.)

I'm asking as the biggest CPU consumption occurs during MapCharacters in this renderer, or in other words during font fallback. This contributes about 85% CPU usage when drawing a viewport full of Hindi characters.

That is exactly the part that is hidden behind the cache. I shape only if I don't have it in cache, and the shape() function does all the fallback logic.

Ok, sorry - all those hidden links point to master instead of an explicit commit hash, so they may stop working as the years pass. ;-( (I may fix that later?)
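
A rough sketch of the word-keyed LRU shaping cache described above (assumed structure; not contour's actual code):

    // Rough sketch of a word-keyed LRU shaping cache (assumed structure; not
    // contour's actual code). Shaping only runs on a cache miss, and the least
    // recently used word is evicted once the cache is full.
    #include <cstdint>
    #include <list>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct ShapedGlyph { uint32_t glyphIndex = 0; /* plus font face, offsets, ... */ };

    class ShapingCache
    {
        using Entry = std::pair<std::u32string, std::vector<ShapedGlyph>>;
        std::list<Entry> _lru; // front = most recently used
        std::unordered_map<std::u32string, std::list<Entry>::iterator> _index;
        size_t _capacity = 4096;

    public:
        const std::vector<ShapedGlyph>& get(const std::u32string& word)
        {
            if (const auto it = _index.find(word); it != _index.end())
            {
                _lru.splice(_lru.begin(), _lru, it->second); // refresh recency
                return it->second->second;
            }
            _lru.emplace_front(word, shape(word));
            _index[word] = _lru.begin();
            if (_lru.size() > _capacity)
            {
                _index.erase(_lru.back().first); // evict least recently used
                _lru.pop_back();
            }
            return _lru.front().second;
        }

    private:
        static std::vector<ShapedGlyph> shape(const std::u32string&)
        {
            return {}; // font fallback + text shaping would run here
        }
    };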

@DHowett (Member), Oct 28, 2021:

This segmentation step should NOT be done by the renderer but instead on the VT parser side already. At least that is how I do it, and I highly recommend doing so because this information is also required for correct cursor placement.

This worries me. (It's possible I'm misunderstanding -- I am jumping between threads!) I know that ideally we'd be doing segmentation earlier, but I don't think we can do it that early². Cursor positioning with regards to the measurable size of a set of codepoints specifically cannot depend on the font in the terminal, right? Apart from any changes resulting from future Unicode spec docs¹, there's no easy way for a client over e.g. ssh to be able to do cursor positioning (or glyph positioning, even) accounting for the font.

¹ This is really cool, by the way! @reli-msft over here is working on one as well. Renzhi, please make sure you're looped in on Christian's spec proposal.

² Under the assumption that the font would be required for segmentation. If it's not, carry on, we should almost certainly have a better text buffer.

³ (Thanks for jumping in to discuss this with us!)

Member:

Oh. I believe my misapprehension was that we would need to use the segmented stream as the backing store. "Doing segmentation" while data streams in doesn't mean that we need to impose any requirement on the font that a client application would be able to observe.

@christianparpart, Oct 29, 2021:

Hey; so segmentation is done in multiple stages.

  1. Grapheme cluster (GC) segmentation can be done on the VT backend side (font independent). This is what's required here, because with it you can reliably associate characters (GCs) with grid cells.
  2. Glyph run segmentation is done on the renderer side, after the characters (grapheme clusters) have already been aligned into the grid. It therefore does not even impact VT backend performance, and it is a step that can be easily cached (tm) (also font independent). This stage of segmentation is required to properly feed the text shaping engine (harfbuzz, DirectWrite, CoreText); see the text-stack link I posted above :-)

Apart from any changes resulting from future Unicode spec docs¹, there's no easy way for a client over e.g. ssh to be able to do cursor positioning (or glyph positioning, even) accounting for the font.

Luckily this should never depend on the font, and it currently does not. All the algorithms (I used) are font-file independent.

¹ This is really cool, by the way! @reli-msft over here is working on one as well. Renzhi, please make sure you're looped in on Christian's spec proposal.

Yeah, it would be nice if we could converge on something here. In my TE I do not yet implement the VT sequence proposed in that spec, but the semantics are on by default in my case anyway. I'll be adding this mode soon though - unless you have strong objections against such a thing, in which case we need to rethink; but I'm pretty certain this is the right way to go. Mind: BiDi is explicitly not addressed. That's another big undefined thing in TE land that could be specced at a later point. :)


Comment on lines +407 to +409
const auto& features = _fontRenderData->DefaultFontFeatures();
#pragma warning(suppress : 26492) // Don't use const_cast to cast away const or volatile (type.3).
DWRITE_TYPOGRAPHIC_FEATURES typographicFeatures = { const_cast<DWRITE_FONT_FEATURE*>(features.data()), gsl::narrow<uint32_t>(features.size()) };
@lhecker (Member Author), Oct 29, 2021:

I don't understand how this code was passing before, but it fails AuditMode for me locally, so I was forced to change it to get this to pass. &featureList[0] isn't valid code for null pointers, so that got fixed as well (dereferencing a null pointer to a struct is UB, IIRC).

Bonus effect: the std::vector for features isn't copied on every call anymore.
Negative effect: const-correctness in the DWrite headers really isn't good. DWRITE_TYPOGRAPHIC_FEATURES wants its argument to be a mutable pointer, so I had to use const_cast to circumvent that.

Member:

Oh well. if it works it works. I'm not that sad about casting away const for APIs that didn't define it.

ghost pushed a commit that referenced this pull request Oct 29, 2021
FontInfoBase and its descendants are missing noexcept annotations, which
virally forces other code to not be noexcept as well during AuditMode checks.
Apart from adding noexcept, this commit also
* Passes std::wstring_view by reference.
* Passes the FillLegacyNameBuffer argument as a simple pointer-to-array,
  allowing us to fill the buffer with a single memcpy.
  (gsl::span's iterators inhibit any internal STL optimizations.)
* Moves operator== declarations inside the class to reduce code size.

All other changes are an effect of the virality of noexcept.

This is an offshoot from #11623.

## Validation Steps Performed
* It still compiles ✔️
Comment on lines -2323 to -2325
// expected read is either the size of the buffer or the number of characters remaining, whichever is smaller.
DWORD const dwReadExpected = (DWORD)std::min(cbBuffer, cchExpectedText - cchRead);

Member Author:

In case you wonder: I can't compile these tests locally otherwise, as the v143 toolchain introduced a few new warnings at /W4.

Member:

So do you have to #pragma suppress these? Or... do we just not need dwReadExpected?

@lhecker (Member Author), Nov 10, 2021:

The warnings were due to these variables never being used.

src/interactivity/win32/lib/win32.LIB.vcxproj (resolved)
{
_emplaceGlyph(mappedFontFace.get(), scale, idx + i, idx + i + 1u);
}
}
Member:

I feel like this doesn't save us that much in the majority case because of the regional variations in i/j that ruin it for Cascadia Code, the default font.

@lhecker (Member Author), Nov 12, 2021:

Cascadia Mono is the default font, isn't it? That one definitely doesn't have the i/j issue, and all basic ASCII characters are considered "simple" for that font (which makes it a lot faster than Cascadia Code). 🙂

// Additionally IDWriteTextAnalyzer::GetGlyphs requires an instance of DWRITE_SCRIPT_ANALYSIS,
// which can only be attained by running IDWriteTextAnalyzer::AnalyzeScript first.
//
// Font fallback with IDWriteFontFallback::MapCharacters is very slow.
Member:

Nnnn ok. Now I understand why you want that other segmenting/shaping.

// > IDXGISwapChain::Present: Partial Presentation (using a dirty rects or scroll) is not supported
// > for SwapChains created with DXGI_SWAP_EFFECT_DISCARD or DXGI_SWAP_EFFECT_FLIP_DISCARD.
// ---> No need to call IDXGISwapChain1::Present1.
// TODO: Would IDXGISwapChain1::Present1 and its dirty rects have benefits for remote desktop?
Member:

Solid maybe. Do note that the most recent versions of Remote Desktop have an automatic identification of areas of high motion on the screen and can effectively MPEG-video-streamify what's happening there to the remote client as a subsection of the display. I've successfully played Guild Wars 2 through an RDP session and watched YouTube videos through one at reasonable resolution (720p+). I bet we don't have to worry about this as much as we think we do (or did in the past).

Member Author:

RDP maintainers confirmed poggers.


AtlasEngine::f32x4 AtlasEngine::_getGammaRatios(float gamma) noexcept
{
static constexpr f32x4 gammaIncorrectTargetRatios[13]{
Member:

gammaIncorrectTargetRatios

"Incorrect"?

Member Author:

Yeah there's also gamma-correct correction, uhm, somewhere iykwim.
Now you don't start assuming that I know what I'm doing here, alright? I think - and that's a huge, the way Donald calls stuff huge - stretch of my saying "I think"... I think gamma incorrect correction applies to display-native gamma surfaces and gamma correct correction (lol) applies to sRGB surfaces. gammaIncorrectTargetRatios was """definitely""" the table to be used for D3D non-sRGB surfaces. I think. Maybe, but probably.

Member:

Okay. I thought you were making a joke on the opposite of "correct" as you explained to me the inverse correction required over chat the other day.

// (avoiding either a batch flush or the system maintaining multiple copies of the resource behind the scenes).
//
// Since our shader only draws whatever is in the atlas, and since we don't replace glyph tiles that are in use,
// we can safely (?) tell the GPU that we don't overwrite parts of our atlas that are in use.
Member:

safely (?)

the ? made me laugh

@lhecker (Member Author), Nov 12, 2021:

Don't you know how people need to print "don't eat the plastic" on food because otherwise people would choke on it?
Well, the D3D folks didn't do that. I'm choking.

miniksa previously approved these changes Nov 11, 2021

@miniksa (Member) left a comment:

I'm not going to hold this back. Overall, this is fantastic and I'm so excited. It's exactly what I hoped for and more. There's still plenty to do, but it's definitely more with-it than DxEngine was when it launched, so I think it's fair to roll with it and improve from here.

Thank you so much for all of this, Leonard!

@carlos-zamora (Member) left a comment:

Still have AtlasEngine.*.cpp to review, but I don't think I'll get through them today. So figured I'd post a few comments for now.

src/renderer/atlas/shader_vs.hlsl (outdated; resolved)
src/renderer/atlas/shader_ps.hlsl (resolved)
@@ -0,0 +1,759 @@
// Copyright (c) Microsoft Corporation.
Member:

This header file is huge. Would there be any value/detriment to breaking it up into a few separate header files in src/renderer/atlas? I get that they're all part of the atlas engine, but it's a lot.

That might also help provide descriptions for each one and make this a bit clearer (as abstracts)? It's just pretty overwhelming, and for somebody like me who doesn't know much about the renderer, it's pretty easy to get lost and lose context by the time I get to line 200/759 of this file.

Member Author:

It would definitely be helpful to break it up. I intend to do so in the future, when I'm more certain about what's needed and what isn't. The rest of AtlasEngine is just as messy as the header file and needs a similar treatment.
It would be difficult to clean it up at this point, because it's highly likely that I'll rewrite at least half of the code in the header file (basically anything related to the glyph cache).

miniksa previously approved these changes Nov 12, 2021
@DHowett (Member) left a comment:

52/56, obviously deferred the hard ones for ROUND II

@@ -1873,36 +1865,47 @@ namespace winrt::Microsoft::Terminal::Control::implementation
// The family is only used to determine if the font is truetype or
// not, but DX doesn't use that info at all.
// The Codepage is additionally not actually used by the DX engine at all.
FontInfo actualFont = { fontFace, 0, fontWeight.Weight, { 0, gsl::narrow_cast<short>(fontHeight) }, CP_UTF8, false };
FontInfo actualFont = { fontFace, 0, fontWeight.Weight, { 0, gsl::narrow_cast<short>(fontSize) }, CP_UTF8, false };
FontInfoDesired desiredFont = { actualFont };

// Create a DX engine and initialize it with our font and DPI. We'll
// then use it to measure how much space the requested rows and columns
// will take up.
// TODO: MSFT:21254947 - use a static function to do this instead of
Member:

I'm moving this MSFT backlog item out to GitHub -- we really should do this.

src/interactivity/win32/lib/win32.LIB.vcxproj (resolved)
<HeaderFileOutput>$(OutDir)$(ProjectName)\%(Filename).h</HeaderFileOutput>
<TreatWarningAsError>true</TreatWarningAsError>
<AdditionalOptions>/Zpc %(AdditionalOptions)</AdditionalOptions>
<AdditionalOptions Condition="'$(Configuration)'=='Release'">/O3 /Qstrip_debug /Qstrip_reflect %(AdditionalOptions)</AdditionalOptions>
Member:

magic compiler flags -- cause for concern?

Member Author:

They're not that magic at all! For some weird reason none of these three parameters are available as msbuild options.
But they're what we at Microsoft use ourselves in our own sample code. If you search for them inside this GitHub org you'll find dozens if not hundreds of results. 🙂
/O3 is for future compat and enables all available optimizations, and /Qstrip_debug and /Qstrip_reflect remove debug/reflection metadata from the bytecode, reducing its size a lot.

<ItemDefinitionGroup>
<ClCompile>
<PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile>
<AdditionalIncludeDirectories>$(OutDir)$(ProjectName)\;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
Member:

FWIW this is only necessary for projects that produce header files as part of the build. does Atlas have that?

Member Author:

Yes, the shaders are compiled into header files at $(OutDir)$(ProjectName)\%(Filename).h.

[[nodiscard]] HRESULT GetFontSize(_Out_ COORD* const pFontSize) noexcept override;
[[nodiscard]] HRESULT IsGlyphWideByFont(const std::wstring_view glyph, _Out_ bool* const pResult) noexcept override;
[[nodiscard]] HRESULT GetFontSize(_Out_ COORD* pFontSize) noexcept override;
[[nodiscard]] HRESULT IsGlyphWideByFont(std::wstring_view glyph, _Out_ bool* pResult) noexcept override;
Member:

I believe these signature changes either (1) do not matter for Bgfx/Wddmcon, or (2) will fail the windows build because we did not update Bgfx/Wddm 😄

Member Author:

A while ago I hacked our project files to build the WDDM one and it worked.
If you're just asking about the const-ness of these declarations, those don't matter: #11623 (comment)

@DHowett (Member) left a comment:

Header: reviewed.


bool is_inline() const noexcept
{
return (__builtin_bit_cast(uintptr_t, allocated) & 1) != 0;
Member:

Sorry, I don't totally understand. Since this is a union, how do we ensure that the inline value (which shares storage with allocated) doesn't have this bit set in this position?

Member:

Okay, I had to think about how this works for a while. This will always hold a T, either internal or allocated, but sometimes that T will have been allocated with extra space at the end.
The T inside this union needs to be complicit - it has to have a member at the end that is effectively flexible but also has the appropriate amount of padding (if it wants).

I still don't understand how we ensure that the inlined value doesn't have this bit set.

Member Author:

SmallObjectOptimizer is pretty much an implementation detail of AtlasKey and AtlasValue, and the code doesn't make any sense whatsoever outside of that context.
The reason this works is that both AtlasKeyData and AtlasValueData coincidentally don't use the first bit for anything except flagging whether inlined storage is used (for instance CellFlags::Inlined).

This code doesn't have to be super duper good, because I'm planning to make it a lot more robust and simpler with my custom hashmap implementation, which is required in order to get the LRU behavior we need. It will simultaneously allow us to write simpler code here, as we can choose ourselves how to manage our linking behavior. For instance, we don't need to allocate AtlasValue on the heap if we simply use the linked list of the LRU itself to chain multiple atlas positions together.
In either case I expect this code to be rewritten, with only the idea remaining, in an improved form.
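
A simplified sketch of the tagged-pointer trick being discussed (assumed layout; the real SmallObjectOptimizer is tied to AtlasKeyData/AtlasValueData):

    // Simplified sketch of the tagged-pointer union trick under discussion
    // (assumed layout; the real SmallObjectOptimizer is tied to AtlasKeyData /
    // AtlasValueData). It only works because T reserves its lowest bit as an
    // "is inlined" flag, while heap pointers are at least 2-byte aligned and
    // thus never have that bit set.
    #include <cstdint>

    template<typename T>
    union SmallObjectOptimizer
    {
        T inlined;    // T's first bit doubles as the "is inlined" flag
        T* allocated; // aligned allocation -> lowest bit is always 0

        bool is_inline() const noexcept
        {
            // Reinterpret the pointer bits: if the inlined T set its flag bit,
            // this reads as an odd "pointer", so inlined storage is in use.
            return (__builtin_bit_cast(uintptr_t, allocated) & 1) != 0;
        }
    };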

}

T* initialize(size_t byteSize)
{
Member:

I believe it's UB to not placement-new a T into the returned space, even if T is_trivially_copyable and has_unique_object_representation.

@lhecker (Member Author), Nov 12, 2021:

Oh it totally is! I can't even use malloc() here, because malloc() is UB in C++ in general. P0593R6 should deal with this. Until then we should just assume that the compiler will be reasonable and treat neither malloc() nor operator new() as UB....
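
For comparison, the strictly conforming variant would placement-new a T into the raw allocation (a sketch; under P0593R6 the new-less version becomes well-defined for implicit-lifetime types):

    // Sketch of the strictly conforming variant: placement-new a T into the raw
    // allocation to formally start its lifetime. (P0593R6 later makes the
    // new-less version well-defined for implicit-lifetime types.)
    #include <cstddef>
    #include <cstdlib>
    #include <new>

    template<typename T>
    T* initialize(size_t byteSize)
    {
        // byteSize >= sizeof(T); the excess is the flexible storage at the end.
        void* raw = std::malloc(byteSize);
        return raw ? ::new (raw) T{} : nullptr;
    }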

// * Minimum alignment is 4 bytes (like `#pragma pack 4`)
// * Members cannot straddle 16 byte boundaries
// This means a structure like {u32; u32; u32; u32x2} would require
// padding so that it is {u32; u32; u32; <4 byte padding>; u32x2}.
Member:

can the job here be done with alignas?

Member Author:

It is! And the way this code is written, as long as others adhere to this style, it'll produce compilation errors if you violate these packing rules.
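
To make the rule concrete, a generic example of a matching C++ struct (not the PR's actual constant buffer):

    // Generic example of mirroring HLSL cbuffer packing in C++ (not the PR's
    // actual constant buffer): members may not straddle a 16-byte register
    // boundary, so {u32; u32; u32; u32x2} needs explicit padding.
    #include <cstdint>

    struct alignas(16) ConstBufferExample
    {
        uint32_t a;        // bytes  0-3
        uint32_t b;        // bytes  4-7
        uint32_t c;        // bytes  8-11
        uint32_t _padding; // bytes 12-15: pushes the next member to a fresh register
        uint32_t d[2];     // bytes 16-23: the u32x2 no longer straddles a boundary
    };
    static_assert(sizeof(ConstBufferExample) % 16 == 0, "cbuffers are sized in 16-byte registers");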

@DHowett (Member) left a comment:

LET'S DO THIS.

@DHowett added the AutoMerge label (Marked for automatic merge by the bot when requirements are met) on Nov 13, 2021
@ghost commented Nov 13, 2021:

Hello @DHowett!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

ghost merged commit 2353349 into main on Nov 13, 2021
ghost deleted the dev/lhecker/atlas-engine branch on November 13, 2021 00:10
@jero288 left a comment:

Nice

ghost pushed a commit that referenced this pull request Dec 1, 2021
## Summary of the Pull Request

Currently, the TermControl and ControlCore receive a settings object that implements `IControlSettings`. They use this for both reading the settings they should use and storing some runtime overrides to those settings (namely, `Opacity`). The object they receive currently is a `T.S.M.TerminalSettings` object, as well as another `TerminalSettings` object if the user wants to have an `unfocusedAppearance`. These are all hosted in the same process, so everything is fine and dandy.

With the upcoming move to having the Terminal split into multiple processes, this will no longer work. If the `ControlCore` in the Content Process is given a pointer to a `TerminalSettings` in a certain Window Process, and that control is subsequently moved to another window, then there's no guarantee that the original `TerminalSettings` object continues to exist. In this scenario, when window 1 is closed, the Core is unable to read any settings, because the process that owned that object no longer exists.

The solution to this issue is to have the `ControlCore`s own their own copy of the settings they were created with. That way, they can be confident those settings will always exist. Enter `ControlSettings`, a dumb struct for just storing all the contents of the settings. I used x-macros for this, so that we don't need to copy-paste into this file every time we add a setting (see the sketch below).
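
A hypothetical sketch of the x-macro pattern mentioned above (setting names invented):

    // Hypothetical sketch of the x-macro pattern (setting names invented).
    // The setting list is declared once; each expansion stamps out a different
    // piece of per-setting boilerplate.
    #include <cstdint>

    #define CONTROL_SETTINGS(X)         \
        X(int32_t, HistorySize, 9001)   \
        X(bool, SnapOnInput, true)      \
        X(float, Opacity, 1.0f)

    struct ControlSettings
    {
        // Expansion 1: backing fields with defaults.
    #define DECLARE_FIELD(type, name, def) type _##name{ def };
        CONTROL_SETTINGS(DECLARE_FIELD)
    #undef DECLARE_FIELD

        // Expansion 2: getters.
    #define DECLARE_GETTER(type, name, def) type name() const noexcept { return _##name; }
        CONTROL_SETTINGS(DECLARE_GETTER)
    #undef DECLARE_GETTER
    };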

Changing this has all sorts of other fallout effects:
* Previewing a scheme/anything is a tad bit more annoying. Before, we could just sneak the previewed scheme into a `TerminalSettings` that lived between the settings we created the control with, and the settings they were actually using, and it would _just work_. Even explaining that here, it sounds like magic, because it was. However, now, the TermControl can't use a layered `TerminalSettings` for the settings anymore. Now we need to actually read out the current color table, and set the whole scheme when we change it. So now there's also a `Microsoft.Terminal.Core.Scheme` _struct_ for holding that data. 
  - Why a `struct`? Because that will go across the process boundary as a blob, rather than as a pointer to an object in the other process. That way we can transit the whole struct from window to core safely. 
* A TermControl doesn't have an `IControlSettings` at all anymore - it initializes itself via the settings in the `Core`. This will be useful for tear-out, when we need to have the `TermControl` initialize itself from just a `ControlCore`, without being able to rebuild the settings from scratch.
* The `TabTests` that were written under the assumption that the Control had a layered `TerminalSettings` obviously broke, as they were designed to. They've been modified to reflect the new reality.
* When we initialize the Control, we give it the settings and the `UnfocusedAppearance` all at once. If we don't give it an `unfocusedAppearance`, it will just use the focused appearance as the unfocused appearance.
* The Control no longer can _write_ settings to the `ControlSettings`. We don't want to be storing things in there. Pretty much everything we set in the control, we store somewhere other than in the settings object itself. However, `opacity` and `useAcrylic`, we need to store in a handy new `RUNTIME_SETTING` property. We can write those runtime overrides to those properties.  
* We no longer store the color scheme for a pane in the persisted state. I'm tracking that in #9800. I don't think it's too hard to add back, but I wanted this in front of eyes sooner than later.

## References

* #1256
* #5000
* #9794 has the scheme previewing in it.
* #9818 is WAY more possible now.

## PR Checklist
* [x] Surprisingly there wasn't ever a card or issue for this one. This was only ever a bullet point in #5000. 
* A bunch of these issues were fixed along the way, though I never intended to fix them:
  * [x] Closes #11571
  * [x] Closes #11586
  * [x] Closes #7219
  * [x] Closes #11067
  * [x] I think #11623 actually ended up resolving this one, but I'm double tapping on it here: Closes #5703
* [x] I work here
* [x] Tests added/passed
* [n/a] Requires documentation to be updated

## Detailed Description of the Pull Request / Additional comments

Along the way I tried to clean up code where possible, but not too aggressively.

I didn't end up converting the various `MockTerminalSettings` classes used in tests to the x macros quite yet. I wanted to merge this with #11416 in `main` before I went too crazy.

## Validation Steps Performed

* [x] Scheme previewing works
* [x] Adjusting the font size works
* [x] focused/unfocused appearances still work
* [x] mouse-wheeling opacity still works
* [x] acrylic & cleartype still does the right thing
* [x] saving the settings still works
* [x] going wild on sliding the opacity slider in the settings doesn't crash the terminal
* [x] toggling retro effects with a keybinding still works
* [x] toggling retro effects with the command palette works
* [x] The matrix of (`useAcrylic(true,false)`)x(`opacity(50,100)`)x(`antialiasingMode(cleartype, grayscale)`) works as expected. Slightly changed, falls back to grayscale more often, but looks more right.
Labels
Area-AtlasEngine · Area-Rendering (Text rendering, emoji, complex glyph & font-fallback issues) · AutoMerge (Marked for automatic merge by the bot when requirements are met) · Product-Conhost (For issues in the Console codebase) · Product-Terminal (The new Windows Terminal)