
Reduce string allocation from TextFactory #14531

Merged
merged 4 commits into dotnet:master on Oct 17, 2016

Conversation

heejaechang
Contributor

@asecchia Adrian found an issue where using the editor text buffer for a closed file allocates way more than using the compiler's implementation of SourceText.

Also, the default TextFactory implementation that OOP uses can benefit from this by not allocating strings as much as before for a big file.

@heejaechang
Contributor Author

@tmat Tomas, will this work?


chunks.Add(chunk);
private static ImmutableArray<char[]> ReadFromTextRead(TextReader reader, int maxCharRemainingGuess, bool throwIfBinaryDetected)
Member

ReadFromTextReader.

Contributor Author

will do

@tmat
Member

tmat commented Oct 14, 2016

@nguerrera worked in this area recently.

@@ -97,6 +97,37 @@ public static SourceText From(string text, Encoding encoding = null, SourceHashA
return new StringText(text, encoding, checksumAlgorithm: checksumAlgorithm);
}

#pragma warning disable RS0026 // Do not add multiple public overloads with optional parameters
Contributor

There are already some global suppressions for other overloads. Consider doing the same for this one.

Member

Why do we even have this warning if we suppress it every time?

Contributor Author

Will do. I just wanted a quick check from you guys on whether this is even okay to do.

Is it okay that the checksum and the embedded blob aren't there? I think I am using encoding wrong, though. Since I am getting a TextReader, the content should already be normalized to a certain encoding; the given encoding should just be passed along, like StringText.From(string, encoding), and not actually used anywhere else.

Contributor

I think TextReader is morally equivalent to string here: you get chars not bytes out of it. As such, it is OK to defer making the embedded blob and to do so by re-encoding it using the given encoding. It would be good to add a FromSource_TextReader test to EmbeddedTextTests. It should work as well as passing the same text to the string overload.


In reply to: 83501039 [](ancestors = 83501039)

Contributor

It's a good warning, but this class broke the rule already, so I'm suppressing to stay consistent. It is incredibly frustrating to have all of these overloads (which I observed while writing tests). For the EmbeddedText.FromXxx equivalents, the warning helped me to make separate FromBytes, FromStream, etc. factories. So I think the warning has value. Maybe we should just put a disable/restore pair around the full set of From overloads here and not suppress every time we add a new one. (Hopefully we won't add more, though.)


In reply to: 83500492 [](ancestors = 83500492)

@heejaechang changed the title from "[Review Only] reduce string allocation from TextFactory" to "Reduce string allocation from TextFactory" on Oct 14, 2016
@heejaechang
Contributor Author

@tmat @nguerrera @CyrusNajmabadi updated the PR to address all feedback and added unit tests

allow SourceText.From(TextReader) to have null encoding like the one from string.
@heejaechang
Contributor Author

@gafter I would like to add a SourceText.From overload which accepts a TextReader as the content input. This is to reduce various allocations of string, char[], etc.
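
For reviewers, a rough usage sketch of the proposed overload, based only on the public API signature added in this PR (the StreamReader setup and the length guess here are illustrative assumptions, not part of the change):

using System.IO;
using System.Text;
using Microsoft.CodeAnalysis.Text;

class TextReaderUsageSketch
{
    static SourceText ReadWithoutIntermediateString(string path)
    {
        // Previously a caller would do File.ReadAllText(path) and pass the whole
        // string to SourceText.From; for a big file that is one large allocation.
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            // 'length' is the expected character count; the file's byte length
            // is used here only as a rough guess for the sketch.
            var lengthGuess = (int)new FileInfo(path).Length;
            return SourceText.From(reader, lengthGuess, Encoding.UTF8);
        }
    }
}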

// We must compute the checksum and embedded text blob now while we still have the original bytes in hand.
// We cannot re-encode to obtain checksum and blob as the encoding is not guaranteed to round-trip.
var checksum = CalculateChecksum(stream, checksumAlgorithm);
var embeddedTextBlob = canBeEmbedded ? EmbeddedText.CreateBlob(stream) : default(ImmutableArray<byte>);
Member

What is the 'embeddedTextBlob'?

Contributor Author

@nguerrera is probably the better person to answer

Contributor

It is the blob that encodes (optionally) embedded source in a PDB. See #12625 for design info.
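
For context, a rough self-contained sketch (not part of this PR) of how an embedded source blob ends up in a PDB via EmbeddedText and Compilation.Emit; the file name and code here are hypothetical:

using System.IO;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;
using Microsoft.CodeAnalysis.Text;

class EmbeddedSourceSketch
{
    static void EmitWithEmbeddedSource()
    {
        var path = "program.cs"; // hypothetical file name
        var code = "class C { }";

        // An encoding is needed so the source can be re-encoded into the blob.
        var source = SourceText.From(code, Encoding.UTF8);
        var tree = CSharpSyntaxTree.ParseText(source, path: path);

        var compilation = CSharpCompilation.Create(
            "demo",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) },
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var peStream = new MemoryStream())
        using (var pdbStream = new MemoryStream())
        {
            // The 'embeddedTextBlob' discussed above is what gets written into this PDB.
            compilation.Emit(
                peStream,
                pdbStream,
                options: new EmitOptions(debugInformationFormat: DebugInformationFormat.PortablePdb),
                embeddedTexts: new[] { EmbeddedText.FromSource(path, source) });
        }
    }
}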

}
internal static SourceText Decode(TextReader reader, int length, Encoding encodingOpt, SourceHashAlgorithm checksumAlgorithm)
{
if (length == 0)
Member

Can you put a comment mentioning why this optimization is necessary? i.e., what percentage of readers have length == 0 and how much optimizing that case saves us. Thanks!

Member

I'm surprised we have any callers calling LargeText.Decode with a length of 0.

Contributor Author

@dotnet/roslyn-compiler — the compiler team is probably the one that should answer.
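
For readers following the thread: the guard under discussion is just an early-out before any chunk machinery runs. A minimal sketch of the pattern, assuming an empty input can simply be represented by the plain string-based SourceText (this is not the PR's exact code):

using System;
using System.IO;
using System.Text;
using Microsoft.CodeAnalysis.Text;

static class DecodeSketch
{
    internal static SourceText Decode(TextReader reader, int length, Encoding encodingOpt, SourceHashAlgorithm checksumAlgorithm)
    {
        if (length == 0)
        {
            // Nothing to read: skip allocating any chunk arrays or builders.
            return SourceText.From(string.Empty, encodingOpt, checksumAlgorithm);
        }

        // ... otherwise fall through to the chunked read path (elided in this sketch) ...
        throw new NotImplementedException();
    }
}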

{
Array.Resize(ref chunk, charsRead);
}
return new LargeText(chunks, encodingOpt, checksumAlgorithm);
Member

Why 'LargeText'? 'Large' implies that it is used for, well, large inputs. But what if the length was only '1', would we still make a "Large" text?

Or, if LargeText is for texts of any size, perhaps its name should be changed. Maybe ChunkedSourceText?

Member

Ah, n/m. This is LargeText.Decode, not SourceText.Decode. So the length check was already done before.

Contributor Author

I am following the existing decode pattern. @dotnet/roslyn-compiler are probably the better people to answer.

{
throw new InvalidDataException();
}
private static ImmutableArray<char[]> ReadFromTextReader(TextReader reader, int maxCharRemainingGuess, bool throwIfBinaryDetected)
Member

ReadChunksFromTextReader.

Contributor Author

sure

{
throw new InvalidDataException();
}
private static ImmutableArray<char[]> ReadFromTextReader(TextReader reader, int maxCharRemainingGuess, bool throwIfBinaryDetected)
Member

Should this return an ImmutableArray&lt;ImmutableArray&lt;char&gt;&gt;? Is there a value in returning mutable arrays? Are callers ever going to change them?

Contributor Author

Don't know; this is what it was doing before. But I'm not sure converting char[] to an immutable array is worth anything, since we need char[] to communicate with any other .NET API.
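
To illustrate the point about char[]: TextReader.Read only takes a mutable char[] buffer, so the chunks have to start life as mutable arrays anyway; wrapping them afterwards would just add a copy or a wrapper. A standalone sketch of the chunked read loop (not the PR's exact code; the chunk size is an illustrative assumption):

using System;
using System.Collections.Immutable;
using System.IO;

static class ChunkedReadSketch
{
    private const int ChunkSize = 40 * 1024; // illustrative chunk size

    public static ImmutableArray<char[]> ReadChunks(TextReader reader)
    {
        var chunks = ImmutableArray.CreateBuilder<char[]>();

        while (true)
        {
            // TextReader.Read requires a mutable char[]; an ImmutableArray<char>
            // could not be filled in place, we'd have to copy afterwards.
            var chunk = new char[ChunkSize];
            var charsRead = reader.Read(chunk, 0, chunk.Length);
            if (charsRead == 0)
            {
                break;
            }

            if (charsRead < chunk.Length)
            {
                Array.Resize(ref chunk, charsRead);
            }

            chunks.Add(chunk);
        }

        return chunks.ToImmutable();
    }
}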


if (charsRead < chunk.Length)
{
Array.Resize(ref chunk, charsRead);
Member

this resizing will not actually change 'chunk', it will allocate another array, copy the data into it, then point 'ref chunk' at that new array. Given that, do we want to pool chunks so that we don't leak them in the case where we need to resize?

Contributor Author

I don't believe there is any pool here for chunk.

Member

You could add one :)

Contributor Author

Only if there is data that shows it helps. But I doubt it would, since each chunk lives as long as the source text lives, the pool would stay statically alive in memory, and the chunk size would have to be big given that this is for large text.
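
For reference, Array.Resize never shrinks an array in place: it allocates a new array of the requested size, copies the elements, and repoints the ref, exactly as noted above. A tiny standalone check:

using System;

class ResizeDemo
{
    static void Main()
    {
        var chunk = new char[4096];
        var before = chunk;

        Array.Resize(ref chunk, 100);

        // The local now points at a fresh 100-element array; the original
        // 4096-element array becomes garbage unless something else still holds it.
        Console.WriteLine(ReferenceEquals(before, chunk)); // False
        Console.WriteLine(chunk.Length);                   // 100
    }
}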

@@ -337,6 +348,16 @@ public Task WriteStreamAsync(Stream stream, CancellationToken cancellationToken
}
}
}

internal class TemporaryStorageTextReader : TextReader
Member

There only appears to be one subclass of this type. Why not just roll this into DirectMemoryAccessReader?

Contributor Author

sure.
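
For anyone unfamiliar with what rolling the two types together would look like: it is just one TextReader subclass serving characters out of the in-memory buffer. A simplified, hypothetical managed-array version (the real reader works over natively allocated memory):

using System;
using System.IO;

// Simplified sketch: a TextReader over an in-memory char buffer.
// The actual DirectMemoryAccessReader reads from a native pointer instead.
internal sealed class CharBufferTextReader : TextReader
{
    private readonly char[] _buffer;
    private int _position;

    public CharBufferTextReader(char[] buffer)
    {
        _buffer = buffer ?? throw new ArgumentNullException(nameof(buffer));
    }

    public override int Peek()
        => _position < _buffer.Length ? _buffer[_position] : -1;

    public override int Read()
        => _position < _buffer.Length ? _buffer[_position++] : -1;

    public override int Read(char[] buffer, int index, int count)
    {
        var charsToCopy = Math.Min(count, _buffer.Length - _position);
        Array.Copy(_buffer, _position, buffer, index, charsToCopy);
        _position += charsToCopy;
        return charsToCopy;
    }
}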

@@ -813,6 +813,7 @@ static Microsoft.CodeAnalysis.SeparatedSyntaxList<TNode>.implicit operator Micro
static Microsoft.CodeAnalysis.SeparatedSyntaxList<TNode>.implicit operator Microsoft.CodeAnalysis.SeparatedSyntaxList<TNode>(Microsoft.CodeAnalysis.SeparatedSyntaxList<Microsoft.CodeAnalysis.SyntaxNode> nodes) -> Microsoft.CodeAnalysis.SeparatedSyntaxList<TNode>
static Microsoft.CodeAnalysis.Text.SourceText.From(System.IO.Stream stream, System.Text.Encoding encoding = null, Microsoft.CodeAnalysis.Text.SourceHashAlgorithm checksumAlgorithm = Microsoft.CodeAnalysis.Text.SourceHashAlgorithm.Sha1, bool throwIfBinaryDetected = false, bool canBeEmbedded = false) -> Microsoft.CodeAnalysis.Text.SourceText
static Microsoft.CodeAnalysis.Text.SourceText.From(System.IO.Stream stream, System.Text.Encoding encoding, Microsoft.CodeAnalysis.Text.SourceHashAlgorithm checksumAlgorithm, bool throwIfBinaryDetected) -> Microsoft.CodeAnalysis.Text.SourceText
static Microsoft.CodeAnalysis.Text.SourceText.From(System.IO.TextReader reader, int length, System.Text.Encoding encoding = null, Microsoft.CodeAnalysis.Text.SourceHashAlgorithm checksumAlgorithm = Microsoft.CodeAnalysis.Text.SourceHashAlgorithm.Sha1) -> Microsoft.CodeAnalysis.Text.SourceText
Contributor

CC @AnthonyDGreen, @jaredpar, @gafter for the new public API.

@AlekseyTs
Contributor

LGTM

1 similar comment
@nguerrera
Contributor

LGTM

@heejaechang merged commit 779bc9a into dotnet:master on Oct 17, 2016
heejaechang added a commit to heejaechang/roslyn that referenced this pull request Oct 21, 2016
Reduce string allocation from TextFactory