constexpr all the generate_canonical parameters #2498

Merged
merged 7 commits into from
Feb 7, 2022

MattStephanson
Contributor

...up to 64 bits of entropy. Leave the original implementation for
more bits and naughty generators (I'm looking at you, tr1) whose
min and max functions aren't static.

Fixes #1964. Alternative to #2452.
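The split described above hinges on telling engines with static constexpr `min`/`max` apart from ones where they are ordinary member functions. A minimal sketch of one way to detect this in C++17 follows; the trait name `has_static_constexpr_min_max` and both engine types are hypothetical illustrations, not necessarily what this PR does.

```cpp
#include <type_traits>

// Primary template: assume min()/max() cannot be used in a constant expression.
template <class G, class = void>
struct has_static_constexpr_min_max : std::false_type {};

// Chosen only when G::min() and G::max() can be called without an object
// and their results are usable inside a template argument.
template <class G>
struct has_static_constexpr_min_max<
    G, std::void_t<decltype(std::bool_constant<(G::min)() < (G::max)()>::value)>>
    : std::true_type {};

// Hypothetical engines used only to exercise the trait.
struct StaticEngine {
    using result_type = unsigned int;
    static constexpr result_type min() noexcept { return 0; }
    static constexpr result_type max() noexcept { return 10000; }
    result_type operator()() noexcept { return 0; }
};

struct Tr1StyleEngine {
    using result_type = unsigned int;
    result_type min() const noexcept { return 0; } // non-static member
    result_type max() const noexcept { return 10000; } // non-static member
    result_type operator()() noexcept { return 0; }
};

static_assert(has_static_constexpr_min_max<StaticEngine>::value,
              "static constexpr min/max are detected");
static_assert(!has_static_constexpr_min_max<Tr1StyleEngine>::value,
              "non-static min/max are rejected");
```

With a trait like this, a compile-time path can be selected for the first kind of engine while the second keeps a runtime computation.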

A slightly modified version of @CaseyCarter's release (/O2 /fp:strict for me) codegen test:

#include <random>

template <size_t kMax>
struct Engine {
    using result_type = size_t;

    static constexpr result_type min() noexcept {
        return 0;
    }
    static constexpr result_type max() noexcept {
        return kMax;
    }

    result_type operator()() noexcept;
};

template <class R, class G>
R meow(G& g) {
    return std::generate_canonical<R, ~std::size_t{0}>(g);
}

template double meow(Engine<10'000>&);
template double meow(Engine<~size_t{0}>&);
asm diff
diff --git a/before.asm b/after.asm
index 5a4e672e..a5d6f74f 100644
--- a/before.asm
+++ b/after.asm
@@ -12,102 +12,43 @@ PUBLIC	??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z ; meow<double,E
 PUBLIC	??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z ; meow<double,Engine<-1> >
 PUBLIC	??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z ; std::generate_canonical<double,-1,Engine<10000> >
 PUBLIC	??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z ; std::generate_canonical<double,-1,Engine<-1> >
-PUBLIC	?min@?$Engine@$0CHBA@@@SA_KXZ			; Engine<10000>::min
-PUBLIC	?max@?$Engine@$0CHBA@@@SA_KXZ			; Engine<10000>::max
-PUBLIC	?min@?$Engine@$0?0@@SA_KXZ			; Engine<-1>::min
-PUBLIC	?max@?$Engine@$0?0@@SA_KXZ			; Engine<-1>::max
 PUBLIC	__real@3ff0000000000000
-PUBLIC	__real@404a800000000000
-PUBLIC	__real@40c3880000000000
+PUBLIC	__real@40c3888000000000
 PUBLIC	__real@43f0000000000000
-EXTRN	ceil:PROC
-EXTRN	log2:PROC
 EXTRN	??R?$Engine@$0CHBA@@@QEAA_KXZ:PROC		; Engine<10000>::operator()
 EXTRN	??R?$Engine@$0?0@@QEAA_KXZ:PROC			; Engine<-1>::operator()
 EXTRN	_fltused:DWORD
 ;	COMDAT pdata
 pdata	SEGMENT
 $pdata$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD imagerel $LN15
-	DD	imagerel $LN15+112
+	DD	imagerel $LN15+172
 	DD	imagerel $unwind$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z
 pdata	ENDS
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$0$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD imagerel $LN15+112
-	DD	imagerel $LN15+192
-	DD	imagerel $chain$0$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD imagerel $LN15+192
-	DD	imagerel $LN15+227
-	DD	imagerel $chain$1$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
 $pdata$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD imagerel $LN15
-	DD	imagerel $LN15+112
+	DD	imagerel $LN15+69
 	DD	imagerel $unwind$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z
 pdata	ENDS
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$0$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD imagerel $LN15+112
-	DD	imagerel $LN15+192
-	DD	imagerel $chain$0$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD imagerel $LN15+192
-	DD	imagerel $LN15+227
-	DD	imagerel $chain$1$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
 $pdata$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD imagerel $LN13
-	DD	imagerel $LN13+112
+	DD	imagerel $LN13+172
 	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z
 pdata	ENDS
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$0$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD imagerel $LN13+112
-	DD	imagerel $LN13+192
-	DD	imagerel $chain$0$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD imagerel $LN13+192
-	DD	imagerel $LN13+227
-	DD	imagerel $chain$1$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
 $pdata$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD imagerel $LN13
-	DD	imagerel $LN13+112
+	DD	imagerel $LN13+69
 	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z
 pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$0$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD imagerel $LN13+112
-	DD	imagerel $LN13+192
-	DD	imagerel $chain$0$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD imagerel $LN13+192
-	DD	imagerel $LN13+227
-	DD	imagerel $chain$1$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z
-pdata	ENDS
 ;	COMDAT __real@43f0000000000000
 CONST	SEGMENT
 __real@43f0000000000000 DQ 043f0000000000000r	; 1.84467e+19
 CONST	ENDS
-;	COMDAT __real@40c3880000000000
-CONST	SEGMENT
-__real@40c3880000000000 DQ 040c3880000000000r	; 10000
-CONST	ENDS
-;	COMDAT __real@404a800000000000
+;	COMDAT __real@40c3888000000000
 CONST	SEGMENT
-__real@404a800000000000 DQ 0404a800000000000r	; 53
+__real@40c3888000000000 DQ 040c3888000000000r	; 10001
 CONST	ENDS
 ;	COMDAT __real@3ff0000000000000
 CONST	SEGMENT
@@ -115,187 +56,44 @@ __real@3ff0000000000000 DQ 03ff0000000000000r	; 1
 CONST	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$chain$1$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD 021H
-	DD	imagerel $LN13
-	DD	imagerel $LN13+112
-	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$0$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN13
-	DD	imagerel $LN13+112
-	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z
+$unwind$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD 010401H
+	DD	04204H
 xdata	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$unwind$??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z DD 0a3001H
-	DD	029830H
-	DD	038821H
-	DD	04781bH
-	DD	05680bH
-	DD	07002b206H
+$unwind$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD 0c3701H
+	DD	029837H
+	DD	03882dH
+	DD	047817H
+	DD	05680fH
+	DD	0e340aH
+	DD	07006b20aH
 xdata	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$chain$1$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD 021H
-	DD	imagerel $LN13
-	DD	imagerel $LN13+112
-	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z
+$unwind$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD 010401H
+	DD	04204H
 xdata	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$chain$0$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN13
-	DD	imagerel $LN13+112
-	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z
+$unwind$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD 0c3701H
+	DD	029837H
+	DD	03882dH
+	DD	047817H
+	DD	05680fH
+	DD	0e340aH
+	DD	07006b20aH
 xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$unwind$??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z DD 0a3001H
-	DD	029830H
-	DD	038821H
-	DD	04781bH
-	DD	05680bH
-	DD	07002b206H
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$1$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD 021H
-	DD	imagerel $LN15
-	DD	imagerel $LN15+112
-	DD	imagerel $unwind$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$0$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN15
-	DD	imagerel $LN15+112
-	DD	imagerel $unwind$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$unwind$??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z DD 0a3001H
-	DD	029830H
-	DD	038821H
-	DD	04781bH
-	DD	05680bH
-	DD	07002b206H
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$1$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD 021H
-	DD	imagerel $LN15
-	DD	imagerel $LN15+112
-	DD	imagerel $unwind$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$0$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN15
-	DD	imagerel $LN15+112
-	DD	imagerel $unwind$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$unwind$??$meow@NU?$Engine@$0CHBA@@@@@YANAEAU?$Engine@$0CHBA@@@@Z DD 0a3001H
-	DD	029830H
-	DD	038821H
-	DD	04781bH
-	DD	05680bH
-	DD	07002b206H
-xdata	ENDS
-; Function compile flags: /Ogtpy
-;	COMDAT ?max@?$Engine@$0?0@@SA_KXZ
-_TEXT	SEGMENT
-?max@?$Engine@$0?0@@SA_KXZ PROC				; Engine<-1>::max, COMDAT
-; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
-; Line 11
-	mov	rax, -1
-; Line 12
-	ret	0
-?max@?$Engine@$0?0@@SA_KXZ ENDP				; Engine<-1>::max
-_TEXT	ENDS
-; Function compile flags: /Ogtpy
-;	COMDAT ?min@?$Engine@$0?0@@SA_KXZ
-_TEXT	SEGMENT
-?min@?$Engine@$0?0@@SA_KXZ PROC				; Engine<-1>::min, COMDAT
-; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
-; Line 8
-	xor	eax, eax
-; Line 9
-	ret	0
-?min@?$Engine@$0?0@@SA_KXZ ENDP				; Engine<-1>::min
-_TEXT	ENDS
-; Function compile flags: /Ogtpy
-;	COMDAT ?max@?$Engine@$0CHBA@@@SA_KXZ
-_TEXT	SEGMENT
-?max@?$Engine@$0CHBA@@@SA_KXZ PROC			; Engine<10000>::max, COMDAT
-; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
-; Line 11
-	mov	eax, 10000				; 00002710H
-; Line 12
-	ret	0
-?max@?$Engine@$0CHBA@@@SA_KXZ ENDP			; Engine<10000>::max
-_TEXT	ENDS
-; Function compile flags: /Ogtpy
-;	COMDAT ?min@?$Engine@$0CHBA@@@SA_KXZ
-_TEXT	SEGMENT
-?min@?$Engine@$0CHBA@@@SA_KXZ PROC			; Engine<10000>::min, COMDAT
-; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
-; Line 8
-	xor	eax, eax
-; Line 9
-	ret	0
-?min@?$Engine@$0CHBA@@@SA_KXZ ENDP			; Engine<10000>::min
-_TEXT	ENDS
 ; Function compile flags: /Ogtpy
 ;	COMDAT ??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z
 _TEXT	SEGMENT
-_Gx$ = 112
+_Gx$ = 48
 ??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z PROC ; std::generate_canonical<double,-1,Engine<-1> >, COMDAT
 ; File C:\Users\Matt\source\repos\STL\stl\inc\random
-; Line 242
+; Line 272
 $LN13:
-	push	rdi
-	sub	rsp, 96					; 00000060H
-	movaps	XMMWORD PTR [rsp+80], xmm6
-	mov	rdi, rcx
-; Line 250
-	movsd	xmm6, QWORD PTR __real@3ff0000000000000
-	movaps	XMMWORD PTR [rsp+64], xmm7
-	movaps	XMMWORD PTR [rsp+48], xmm8
-	movsd	xmm8, QWORD PTR __real@43f0000000000000
-	movaps	XMMWORD PTR [rsp+32], xmm9
-	xorps	xmm9, xmm9
-	subsd	xmm8, xmm9
-	addsd	xmm8, xmm6
-; Line 252
-	movaps	xmm0, xmm8
-	call	log2
-	movsd	xmm1, QWORD PTR __real@404a800000000000
-	divsd	xmm1, xmm0
-	movaps	xmm0, xmm1
-	call	ceil
-	cvttsd2si eax, xmm0
-; Line 253
-	mov	ecx, 1
-	xorps	xmm7, xmm7
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN3@generate_c
-; Line 250
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
-$LL4@generate_c:
-; Line 259
-	mov	rcx, rdi
+	sub	rsp, 40					; 00000028H
+; Line 288
 	call	??R?$Engine@$0?0@@QEAA_KXZ		; Engine<-1>::operator()
 	mov	rcx, rax
 	xorps	xmm0, xmm0
@@ -310,25 +108,13 @@ $LN10@generate_c:
 	cvtsi2sd xmm0, rax
 	addsd	xmm0, xmm0
 $LN11@generate_c:
-	subsd	xmm0, xmm9
-	mulsd	xmm0, xmm6
-; Line 260
-	mulsd	xmm6, xmm8
-	addsd	xmm7, xmm0
-	sub	rbx, 1
-	jne	SHORT $LL4@generate_c
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN3@generate_c:
-; Line 264
-	movaps	xmm8, XMMWORD PTR [rsp+48]
-	movaps	xmm9, XMMWORD PTR [rsp+32]
-	divsd	xmm7, xmm6
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	movaps	xmm0, xmm7
-	movaps	xmm7, XMMWORD PTR [rsp+64]
-	add	rsp, 96					; 00000060H
-	pop	rdi
+	xorps	xmm1, xmm1
+	subsd	xmm0, xmm1
+	addsd	xmm0, xmm1
+; Line 292
+	divsd	xmm0, QWORD PTR __real@43f0000000000000
+; Line 293
+	add	rsp, 40					; 00000028H
 	ret	0
 ??$generate_canonical@N$0?0U?$Engine@$0?0@@@std@@YANAEAU?$Engine@$0?0@@@Z ENDP ; std::generate_canonical<double,-1,Engine<-1> >
 _TEXT	ENDS
@@ -338,45 +124,27 @@ _TEXT	SEGMENT
 _Gx$ = 112
 ??$generate_canonical@N$0?0U?$Engine@$0CHBA@@@@std@@YANAEAU?$Engine@$0CHBA@@@@Z PROC ; std::generate_canonical<double,-1,Engine<10000> >, COMDAT
 ; File C:\Users\Matt\source\repos\STL\stl\inc\random
-; Line 242
+; Line 272
 $LN13:
+	mov	QWORD PTR [rsp+8], rbx
 	push	rdi
 	sub	rsp, 96					; 00000060H
 	movaps	XMMWORD PTR [rsp+80], xmm6
 	mov	rdi, rcx
-; Line 250
-	movsd	xmm6, QWORD PTR __real@3ff0000000000000
 	movaps	XMMWORD PTR [rsp+64], xmm7
+	xorps	xmm6, xmm6
+; Line 285
+	movsd	xmm7, QWORD PTR __real@3ff0000000000000
+	mov	ebx, 4
 	movaps	XMMWORD PTR [rsp+48], xmm8
-	movsd	xmm8, QWORD PTR __real@40c3880000000000
+	xorps	xmm8, xmm8
 	movaps	XMMWORD PTR [rsp+32], xmm9
-	xorps	xmm9, xmm9
-	subsd	xmm8, xmm9
-	addsd	xmm8, xmm6
-; Line 252
-	movaps	xmm0, xmm8
-	call	log2
-	movsd	xmm1, QWORD PTR __real@404a800000000000
-	divsd	xmm1, xmm0
-	movaps	xmm0, xmm1
-	call	ceil
-	cvttsd2si eax, xmm0
-; Line 253
-	mov	ecx, 1
-	xorps	xmm7, xmm7
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN3@generate_c
-; Line 250
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
+	movsd	xmm9, QWORD PTR __real@40c3888000000000
 $LL4@generate_c:
-; Line 259
+; Line 288
 	mov	rcx, rdi
 	call	??R?$Engine@$0CHBA@@@QEAA_KXZ		; Engine<10000>::operator()
-	mov	rcx, rax
+	mov	rdx, rax
 	xorps	xmm0, xmm0
 	test	rax, rax
 	js	SHORT $LN10@generate_c
@@ -384,28 +152,26 @@ $LL4@generate_c:
 	jmp	SHORT $LN11@generate_c
 $LN10@generate_c:
 	shr	rax, 1
-	and	ecx, 1
-	or	rax, rcx
+	and	edx, 1
+	or	rax, rdx
 	cvtsi2sd xmm0, rax
 	addsd	xmm0, xmm0
 $LN11@generate_c:
-	subsd	xmm0, xmm9
-	mulsd	xmm0, xmm6
-; Line 260
-	mulsd	xmm6, xmm8
-	addsd	xmm7, xmm0
+	subsd	xmm0, xmm8
+	mulsd	xmm0, xmm7
+; Line 289
+	mulsd	xmm7, xmm9
+	addsd	xmm6, xmm0
 	sub	rbx, 1
 	jne	SHORT $LL4@generate_c
-; Line 258
+; Line 293
 	mov	rbx, QWORD PTR [rsp+112]
-$LN3@generate_c:
-; Line 264
 	movaps	xmm8, XMMWORD PTR [rsp+48]
 	movaps	xmm9, XMMWORD PTR [rsp+32]
-	divsd	xmm7, xmm6
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	movaps	xmm0, xmm7
+	divsd	xmm6, xmm7
 	movaps	xmm7, XMMWORD PTR [rsp+64]
+	movaps	xmm0, xmm6
+	movaps	xmm6, XMMWORD PTR [rsp+80]
 	add	rsp, 96					; 00000060H
 	pop	rdi
 	ret	0
@@ -414,47 +180,14 @@ _TEXT	ENDS
 ; Function compile flags: /Ogtpy
 ;	COMDAT ??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z
 _TEXT	SEGMENT
-g$ = 112
+g$ = 48
 ??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z PROC ; meow<double,Engine<-1> >, COMDAT
 ; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
 ; Line 18
 $LN15:
-	push	rdi
-	sub	rsp, 96					; 00000060H
-	movaps	XMMWORD PTR [rsp+80], xmm6
-	mov	rdi, rcx
+	sub	rsp, 40					; 00000028H
 ; File C:\Users\Matt\source\repos\STL\stl\inc\random
-; Line 250
-	movsd	xmm6, QWORD PTR __real@3ff0000000000000
-	movaps	XMMWORD PTR [rsp+64], xmm7
-	movaps	XMMWORD PTR [rsp+48], xmm8
-	movsd	xmm8, QWORD PTR __real@43f0000000000000
-	movaps	XMMWORD PTR [rsp+32], xmm9
-	xorps	xmm9, xmm9
-	subsd	xmm8, xmm9
-	addsd	xmm8, xmm6
-; Line 252
-	movaps	xmm0, xmm8
-	call	log2
-	movsd	xmm1, QWORD PTR __real@404a800000000000
-	divsd	xmm1, xmm0
-	movaps	xmm0, xmm1
-	call	ceil
-	cvttsd2si eax, xmm0
-; Line 253
-	mov	ecx, 1
-	xorps	xmm7, xmm7
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN5@meow
-; Line 250
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
-$LL6@meow:
-; Line 259
-	mov	rcx, rdi
+; Line 288
 	call	??R?$Engine@$0?0@@QEAA_KXZ		; Engine<-1>::operator()
 	mov	rcx, rax
 	xorps	xmm0, xmm0
@@ -469,30 +202,14 @@ $LN12@meow:
 	cvtsi2sd xmm0, rax
 	addsd	xmm0, xmm0
 $LN13@meow:
-	subsd	xmm0, xmm9
-	mulsd	xmm0, xmm6
-; Line 260
-	mulsd	xmm6, xmm8
-	addsd	xmm7, xmm0
-	sub	rbx, 1
-	jne	SHORT $LL6@meow
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN5@meow:
+	xorps	xmm1, xmm1
+	subsd	xmm0, xmm1
+	addsd	xmm0, xmm1
+; Line 292
+	divsd	xmm0, QWORD PTR __real@43f0000000000000
 ; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
 ; Line 20
-	movaps	xmm8, XMMWORD PTR [rsp+48]
-	movaps	xmm9, XMMWORD PTR [rsp+32]
-; File C:\Users\Matt\source\repos\STL\stl\inc\random
-; Line 263
-	divsd	xmm7, xmm6
-; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
-; Line 20
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	movaps	xmm0, xmm7
-	movaps	xmm7, XMMWORD PTR [rsp+64]
-	add	rsp, 96					; 00000060H
-	pop	rdi
+	add	rsp, 40					; 00000028H
 	ret	0
 ??$meow@NU?$Engine@$0?0@@@@YANAEAU?$Engine@$0?0@@@Z ENDP ; meow<double,Engine<-1> >
 _TEXT	ENDS
@@ -504,44 +221,26 @@ g$ = 112
 ; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
 ; Line 18
 $LN15:
+	mov	QWORD PTR [rsp+8], rbx
 	push	rdi
 	sub	rsp, 96					; 00000060H
 	movaps	XMMWORD PTR [rsp+80], xmm6
 	mov	rdi, rcx
-; File C:\Users\Matt\source\repos\STL\stl\inc\random
-; Line 250
-	movsd	xmm6, QWORD PTR __real@3ff0000000000000
 	movaps	XMMWORD PTR [rsp+64], xmm7
+	xorps	xmm6, xmm6
+; File C:\Users\Matt\source\repos\STL\stl\inc\random
+; Line 285
+	movsd	xmm7, QWORD PTR __real@3ff0000000000000
+	mov	ebx, 4
 	movaps	XMMWORD PTR [rsp+48], xmm8
-	movsd	xmm8, QWORD PTR __real@40c3880000000000
+	xorps	xmm8, xmm8
 	movaps	XMMWORD PTR [rsp+32], xmm9
-	xorps	xmm9, xmm9
-	subsd	xmm8, xmm9
-	addsd	xmm8, xmm6
-; Line 252
-	movaps	xmm0, xmm8
-	call	log2
-	movsd	xmm1, QWORD PTR __real@404a800000000000
-	divsd	xmm1, xmm0
-	movaps	xmm0, xmm1
-	call	ceil
-	cvttsd2si eax, xmm0
-; Line 253
-	mov	ecx, 1
-	xorps	xmm7, xmm7
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN5@meow
-; Line 250
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
+	movsd	xmm9, QWORD PTR __real@40c3888000000000
 $LL6@meow:
-; Line 259
+; Line 288
 	mov	rcx, rdi
 	call	??R?$Engine@$0CHBA@@@QEAA_KXZ		; Engine<10000>::operator()
-	mov	rcx, rax
+	mov	rdx, rax
 	xorps	xmm0, xmm0
 	test	rax, rax
 	js	SHORT $LN12@meow
@@ -549,33 +248,31 @@ $LL6@meow:
 	jmp	SHORT $LN13@meow
 $LN12@meow:
 	shr	rax, 1
-	and	ecx, 1
-	or	rax, rcx
+	and	edx, 1
+	or	rax, rdx
 	cvtsi2sd xmm0, rax
 	addsd	xmm0, xmm0
 $LN13@meow:
-	subsd	xmm0, xmm9
-	mulsd	xmm0, xmm6
-; Line 260
-	mulsd	xmm6, xmm8
-	addsd	xmm7, xmm0
+	subsd	xmm0, xmm8
+	mulsd	xmm0, xmm7
+; Line 289
+	mulsd	xmm7, xmm9
+	addsd	xmm6, xmm0
 	sub	rbx, 1
 	jne	SHORT $LL6@meow
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN5@meow:
 ; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
 ; Line 20
+	mov	rbx, QWORD PTR [rsp+112]
 	movaps	xmm8, XMMWORD PTR [rsp+48]
 	movaps	xmm9, XMMWORD PTR [rsp+32]
 ; File C:\Users\Matt\source\repos\STL\stl\inc\random
-; Line 263
-	divsd	xmm7, xmm6
+; Line 292
+	divsd	xmm6, xmm7
 ; File c:\Users\Matt\source\repos\STL\stl\inc\gen-can-test.cpp
 ; Line 20
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	movaps	xmm0, xmm7
 	movaps	xmm7, XMMWORD PTR [rsp+64]
+	movaps	xmm0, xmm6
+	movaps	xmm6, XMMWORD PTR [rsp+80]
 	add	rsp, 96					; 00000060H
 	pop	rdi
 	ret	0

Note that when the range is 10,001, which is not a power of two, the number of iterations is a hard-coded 4.
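The value 4 matches the standard's iteration count k = max(1, ceil(b / log2(R))) with b = 53 bits for `double` and R = 10,001, since log2(10001) is about 13.29 and 53 / 13.29 is about 3.99. A minimal sketch of computing k at compile time with integer arithmetic only follows; the helper name `iterations_for` is hypothetical, and it assumes R >= 2 and 1 <= b <= 63.

```cpp
#include <cstdint>

// Hypothetical helper, not the PR's code: returns the smallest k with
// range^k >= 2^bits, which equals ceil(bits / log2(range)).
// Assumes range >= 2 and 1 <= bits <= 63.
constexpr int iterations_for(std::uint64_t range, int bits) {
    std::uint64_t remaining = std::uint64_t{1} << bits;
    int k = 0;
    while (remaining > 1) {
        // Ceiling division; repeating it k times yields ceil(2^bits / range^k).
        remaining = remaining / range + (remaining % range != 0 ? 1 : 0);
        ++k;
    }
    return k;
}

static_assert(iterations_for(10001, 53) == 4, "range 10,001 needs 4 engine calls");
static_assert(iterations_for(std::uint64_t{1} << 32, 53) == 2, "32-bit range needs 2 calls");
static_assert(iterations_for(2, 53) == 53, "1 bit per call needs 53 calls");
```

Because the result is a constant expression, the compiler can emit it as an immediate loop bound instead of calling `log2` and `ceil` at runtime.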

@MattStephanson MattStephanson requested a review from a team as a code owner January 21, 2022 16:45
@CaseyCarter CaseyCarter added the performance Must go faster label Jan 21, 2022
@CaseyCarter CaseyCarter changed the title constexpr all the generate_canonical paramters constexpr all the generate_canonical parameters Jan 22, 2022
@MattStephanson
Contributor Author

@AlexGuteniev It looks like #2343 made _Countr_zero non-constexpr except in C++20 (possibly based on feedback from @StephanTLavavej). Is there an easy way to restore this, or should I just remove the power-of-two special case here?

@AlexGuteniev
Contributor

I think I did this on my own, and it is intentional: it enables the _Countr_zero optimization in pre-C++20 mode.

Though there are some ways to restore it:

  • Go back to making the _Countr_zero optimization C++20-only. std::countr_zero is C++20-only anyway, and there are few pre-C++20 usages.
  • Use the fallback directly in your helper. Since it is always used to evaluate a constexpr expression, it does not hurt to use _Countr_zero_fallback.
  • Change _Countr_zero to enable the optimization without guarding for C++20. Create a pre-C++20 version of std::is_constant_evaluated(). See _Is_constant_evaluated from "Implement vectorized min_ / max_element for ints" #2447:
#ifndef __CUDACC__
_NODISCARD constexpr bool _Is_constant_evaluated() noexcept { // Internal function for any standard mode
    return __builtin_is_constant_evaluated();
}
#endif // __CUDACC__

I generally think the last option is the way to go, as going further with optimization we will eventually need this anyway.
Thoughts, @StephanTLavavej ?
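As an illustration of the last option, here is a minimal sketch of a function that stays constexpr in any standard mode by branching on `__builtin_is_constant_evaluated()`. All function names are hypothetical, and the runtime path is only a stand-in for an intrinsic-based implementation; the sketch assumes a compiler that provides this builtin.

```cpp
#include <cstdint>

// Portable constexpr fallback: counts trailing zero bits one at a time.
constexpr int count_trailing_zeros_fallback(std::uint64_t value) {
    if (value == 0) {
        return 64;
    }
    int count = 0;
    while ((value & 1) == 0) {
        value >>= 1;
        ++count;
    }
    return count;
}

// Stand-in for an optimized path; deliberately not constexpr.
// A real implementation would call a hardware intrinsic here.
inline int count_trailing_zeros_runtime(std::uint64_t value) {
    return count_trailing_zeros_fallback(value);
}

// Usable in constant expressions, yet free to take the fast path at runtime.
constexpr int count_trailing_zeros(std::uint64_t value) {
    if (__builtin_is_constant_evaluated()) {
        return count_trailing_zeros_fallback(value);
    }
    return count_trailing_zeros_runtime(value);
}

static_assert(count_trailing_zeros(8) == 3, "evaluated through the fallback");
static_assert(count_trailing_zeros(0) == 64, "zero input reports the full width");
```

The same shape is what lets an optimized helper be called from a compile-time computation without a C++20 guard.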

@StephanTLavavej
Member

@MattStephanson @AlexGuteniev As the only call in product code is to create a constexpr int, I like the idea of directly calling the fallback - that is the least disruptive to the existing code, and has no downsides (of added complexity, etc.).

That said, since this code is at compile-time, the only reason to have the power-of-two special case is for throughput. Does this make a measurable difference, or prevent errors like the constexpr step limit? If there's no observable throughput difference and no correctness impact, then the power-of-two special case is just more code to reason about, and I would suggest eliminating it.

@MattStephanson
Copy link
Contributor Author

That said, since this code is at compile-time, the only reason to have the power-of-two special case is for throughput. Does this make a measurable difference, or prevent errors like the constexpr step limit? If there's no observable throughput difference and no correctness impact, then the power-of-two special case is just more code to reason about, and I would suggest eliminating it.

I haven't benchmarked it, but it does seem cleaner to just eliminate that branch. That requires special-casing the _Range = 2^64 case, but it can be folded into the _Bits == 0 check, since both just return 1;.
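A minimal sketch of that folding, under the assumption that the range is computed in 64-bit unsigned arithmetic so that a true range of 2^64 wraps around to 0. The function name `iteration_count` and its parameters are hypothetical; it also assumes gmax > gmin and 0 <= bits <= 63.

```cpp
#include <cstdint>

// Hypothetical sketch: number of engine calls needed for `bits` bits of entropy.
// Assumes gmax > gmin and 0 <= bits <= 63.
constexpr int iteration_count(std::uint64_t gmin, std::uint64_t gmax, int bits) {
    // Wraps to 0 exactly when the true range is 2^64.
    const std::uint64_t range = gmax - gmin + 1;
    if (bits == 0 || range == 0) {
        return 1; // both special cases need a single engine call
    }
    std::uint64_t remaining = std::uint64_t{1} << bits;
    int k = 0;
    while (remaining > 1) {
        // Ceiling division by the range, repeated until 2^bits is covered.
        remaining = remaining / range + (remaining % range != 0 ? 1 : 0);
        ++k;
    }
    return k;
}

static_assert(iteration_count(0, ~std::uint64_t{0}, 53) == 1, "full 64-bit range");
static_assert(iteration_count(0, 10000, 0) == 1, "zero requested bits");
static_assert(iteration_count(0, 10000, 53) == 4, "range 10,001");
```

The first and third assertions correspond to the two engines in the codegen test: the full-range engine is called once, and the 10,001-value engine is called four times.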

Member

@CaseyCarter CaseyCarter left a comment

Very nice!

@StephanTLavavej StephanTLavavej self-assigned this Feb 2, 2022
Member

@StephanTLavavej StephanTLavavej left a comment

Thanks! The issues I found were small/nitpicky so I'll go ahead and validate/push changes.

@StephanTLavavej StephanTLavavej removed their assignment Feb 4, 2022
@StephanTLavavej StephanTLavavej self-assigned this Feb 6, 2022
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@CaseyCarter CaseyCarter merged commit a3b2a89 into microsoft:main Feb 7, 2022
@CaseyCarter
Member

Thanks for picking up and generalizing my lazy special-case fix!

Successfully merging this pull request may close these issues.

<random>: generate_canonical() could avoid calling log2() at runtime
4 participants