refactor generate_canonical #2452

Closed
wants to merge 2 commits into from

CaseyCarter
Member

... to calculate parameters at compile-time for random engines with power-of-two range. This greatly improves debug codegen, and release codegen for compilers that do not constant fold `std::ceil` and/or `std::log2`.

Fixes #1964.

Release codegen diff for:

#include <random>

template<class R>
struct Engine {
    using result_type = R;

    static constexpr R min() noexcept { return 0; }
    static constexpr R max() noexcept { return ~R{0}; }

    R operator()() noexcept;
};

template <class R, class G>
R meow(G& g) {
    return std::generate_canonical<R, ~std::size_t{0}>(g);
}

template float meow(Engine<unsigned>&);
template double meow(Engine<unsigned long long>&);

is

assembly diff
diff --git a/before.asm b/after.asm
index 73492c34..84624417 100644
--- a/before.asm
+++ b/after.asm
@@ -9,525 +9,190 @@ PUBLIC	??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z	; meow<float,Engine<unsigne
 PUBLIC	??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z	; meow<double,Engine<unsigned __int64> >
 PUBLIC	??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z ; std::generate_canonical<float,-1,Engine<unsigned int> >
 PUBLIC	??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z ; std::generate_canonical<double,-1,Engine<unsigned __int64> >
-PUBLIC	__real@3f800000
-PUBLIC	__real@3ff0000000000000
-PUBLIC	__real@404a800000000000
-PUBLIC	__real@41c00000
-PUBLIC	__real@43f0000000000000
-PUBLIC	__real@4f800000
-EXTRN	log2:PROC
-EXTRN	log2f:PROC
+PUBLIC	__real@2f800000
+PUBLIC	__real@3bf0000000000000
 EXTRN	??R?$Engine@I@@QEAAIXZ:PROC			; Engine<unsigned int>::operator()
 EXTRN	??R?$Engine@_K@@QEAA_KXZ:PROC			; Engine<unsigned __int64>::operator()
-EXTRN	ceil:PROC
-EXTRN	ceilf:PROC
 EXTRN	_fltused:DWORD
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN18
-	DD	imagerel $LN18+102
+$pdata$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN13
+	DD	imagerel $LN13+39
 	DD	imagerel $unwind$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z
 pdata	ENDS
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$0$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN18+102
-	DD	imagerel $LN18+159
-	DD	imagerel $chain$0$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN18+159
-	DD	imagerel $LN18+194
-	DD	imagerel $chain$1$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
 $pdata$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD imagerel $LN15
-	DD	imagerel $LN15+102
+	DD	imagerel $LN15+83
 	DD	imagerel $unwind$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z
 pdata	ENDS
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$0$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD imagerel $LN15+102
-	DD	imagerel $LN15+185
-	DD	imagerel $chain$0$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD imagerel $LN15+185
-	DD	imagerel $LN15+220
-	DD	imagerel $chain$1$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN16
-	DD	imagerel $LN16+102
+$pdata$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN11
+	DD	imagerel $LN11+39
 	DD	imagerel $unwind$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z
 pdata	ENDS
 ;	COMDAT pdata
 pdata	SEGMENT
-$pdata$0$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN16+102
-	DD	imagerel $LN16+159
-	DD	imagerel $chain$0$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD imagerel $LN16+159
-	DD	imagerel $LN16+194
-	DD	imagerel $chain$1$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
 $pdata$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD imagerel $LN13
-	DD	imagerel $LN13+102
+	DD	imagerel $LN13+83
 	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z
 pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$0$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD imagerel $LN13+102
-	DD	imagerel $LN13+185
-	DD	imagerel $chain$0$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z
-pdata	ENDS
-;	COMDAT pdata
-pdata	SEGMENT
-$pdata$1$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD imagerel $LN13+185
-	DD	imagerel $LN13+220
-	DD	imagerel $chain$1$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z
-pdata	ENDS
-;	COMDAT __real@4f800000
-CONST	SEGMENT
-__real@4f800000 DD 04f800000r			; 4.29497e+09
-CONST	ENDS
-;	COMDAT __real@43f0000000000000
-CONST	SEGMENT
-__real@43f0000000000000 DQ 043f0000000000000r	; 1.84467e+19
-CONST	ENDS
-;	COMDAT __real@41c00000
+;	COMDAT __real@3bf0000000000000
 CONST	SEGMENT
-__real@41c00000 DD 041c00000r			; 24
+__real@3bf0000000000000 DQ 03bf0000000000000r	; 5.42101e-20
 CONST	ENDS
-;	COMDAT __real@404a800000000000
+;	COMDAT __real@2f800000
 CONST	SEGMENT
-__real@404a800000000000 DQ 0404a800000000000r	; 53
-CONST	ENDS
-;	COMDAT __real@3ff0000000000000
-CONST	SEGMENT
-__real@3ff0000000000000 DQ 03ff0000000000000r	; 1
-CONST	ENDS
-;	COMDAT __real@3f800000
-CONST	SEGMENT
-__real@3f800000 DD 03f800000r			; 1
+__real@2f800000 DD 02f800000r			; 2.32831e-10
 CONST	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$chain$1$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD 021H
-	DD	imagerel $LN13
-	DD	imagerel $LN13+102
-	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z
+$unwind$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD 010401H
+	DD	04204H
 xdata	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$chain$0$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN13
-	DD	imagerel $LN13+102
-	DD	imagerel $unwind$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$unwind$??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z DD 0a2c01H
-	DD	02982cH
-	DD	038819H
-	DD	047813H
-	DD	05680bH
-	DD	07002b206H
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$1$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD 021H
-	DD	imagerel $LN16
-	DD	imagerel $LN16+102
-	DD	imagerel $unwind$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z
+$unwind$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD 010401H
+	DD	04204H
 xdata	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$chain$0$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN16
-	DD	imagerel $LN16+102
-	DD	imagerel $unwind$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z
+$unwind$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD 010401H
+	DD	04204H
 xdata	ENDS
 ;	COMDAT xdata
 xdata	SEGMENT
-$unwind$??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z DD 0a2c01H
-	DD	02982cH
-	DD	038819H
-	DD	047813H
-	DD	05680bH
-	DD	07002b206H
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$1$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD 021H
-	DD	imagerel $LN15
-	DD	imagerel $LN15+102
-	DD	imagerel $unwind$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$0$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN15
-	DD	imagerel $LN15+102
-	DD	imagerel $unwind$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$unwind$??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z DD 0a2c01H
-	DD	02982cH
-	DD	038819H
-	DD	047813H
-	DD	05680bH
-	DD	07002b206H
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$1$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD 021H
-	DD	imagerel $LN18
-	DD	imagerel $LN18+102
-	DD	imagerel $unwind$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$chain$0$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD 020521H
-	DD	0e3405H
-	DD	imagerel $LN18
-	DD	imagerel $LN18+102
-	DD	imagerel $unwind$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z
-xdata	ENDS
-;	COMDAT xdata
-xdata	SEGMENT
-$unwind$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD 0a2c01H
-	DD	02982cH
-	DD	038819H
-	DD	047813H
-	DD	05680bH
-	DD	07002b206H
+$unwind$??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z DD 010401H
+	DD	04204H
 xdata	ENDS
 ; Function compile flags: /Ogtpy
 ;	COMDAT ??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z
 _TEXT	SEGMENT
-_Gx$ = 112
+_Gx$ = 48
 ??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z PROC ; std::generate_canonical<double,-1,Engine<unsigned __int64> >, COMDAT
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 242
-$LN13:
-	push	rdi
-	sub	rsp, 96					; 00000060H
-	movaps	XMMWORD PTR [rsp+80], xmm6
-	mov	rdi, rcx
-	movaps	XMMWORD PTR [rsp+64], xmm7
-	movaps	XMMWORD PTR [rsp+48], xmm8
-; Line 252
-	movsd	xmm8, QWORD PTR __real@43f0000000000000
-	movaps	xmm0, xmm8
-	movaps	XMMWORD PTR [rsp+32], xmm9
-	call	log2
-	movsd	xmm1, QWORD PTR __real@404a800000000000
-	divsd	xmm1, xmm0
-	movaps	xmm0, xmm1
-	call	ceil
-; Line 256
-	movsd	xmm6, QWORD PTR __real@3ff0000000000000
-	mov	ecx, 1
-	cvttsd2si eax, xmm0
-	xorps	xmm9, xmm9
-	xorps	xmm7, xmm7
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN3@generate_c
+; File c:\STL\stl\inc\random
 ; Line 252
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
-	npad	3
-$LL4@generate_c:
-; Line 259
-	mov	rcx, rdi
+$LN13:
+	sub	rsp, 40					; 00000028H
+; Line 270
 	call	??R?$Engine@_K@@QEAA_KXZ		; Engine<unsigned __int64>::operator()
 	mov	rcx, rax
 	xorps	xmm0, xmm0
 	test	rax, rax
 	js	SHORT $LN10@generate_c
 	cvtsi2sd xmm0, rax
-	jmp	SHORT $LN11@generate_c
+	xorps	xmm1, xmm1
+	addsd	xmm0, xmm1
+; Line 274
+	mulsd	xmm0, QWORD PTR __real@3bf0000000000000
+; Line 292
+	add	rsp, 40					; 00000028H
+	ret	0
 $LN10@generate_c:
+; Line 270
 	shr	rax, 1
 	and	ecx, 1
 	or	rax, rcx
+	xorps	xmm1, xmm1
 	cvtsi2sd xmm0, rax
 	addsd	xmm0, xmm0
-$LN11@generate_c:
-	subsd	xmm0, xmm9
-	mulsd	xmm0, xmm6
-; Line 260
-	mulsd	xmm6, xmm8
-	addsd	xmm7, xmm0
-	sub	rbx, 1
-	jne	SHORT $LL4@generate_c
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN3@generate_c:
-; Line 264
-	movaps	xmm8, XMMWORD PTR [rsp+48]
-	movaps	xmm9, XMMWORD PTR [rsp+32]
-	divsd	xmm7, xmm6
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	movaps	xmm0, xmm7
-	movaps	xmm7, XMMWORD PTR [rsp+64]
-	add	rsp, 96					; 00000060H
-	pop	rdi
+	addsd	xmm0, xmm1
+; Line 274
+	mulsd	xmm0, QWORD PTR __real@3bf0000000000000
+; Line 292
+	add	rsp, 40					; 00000028H
 	ret	0
 ??$generate_canonical@N$0?0U?$Engine@_K@@@std@@YANAEAU?$Engine@_K@@@Z ENDP ; std::generate_canonical<double,-1,Engine<unsigned __int64> >
 _TEXT	ENDS
 ; Function compile flags: /Ogtpy
 ;	COMDAT ??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z
 _TEXT	SEGMENT
-_Gx$ = 112
+_Gx$ = 48
 ??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z PROC ; std::generate_canonical<float,-1,Engine<unsigned int> >, COMDAT
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 242
-$LN16:
-	push	rdi
-	sub	rsp, 96					; 00000060H
-	movaps	XMMWORD PTR [rsp+80], xmm6
-	mov	rdi, rcx
-	movaps	XMMWORD PTR [rsp+64], xmm7
-	movaps	XMMWORD PTR [rsp+48], xmm8
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\cmath
-; Line 191
-	movss	xmm8, DWORD PTR __real@4f800000
-	movaps	xmm0, xmm8
-	movaps	XMMWORD PTR [rsp+32], xmm9
-	call	log2f
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
+; File c:\STL\stl\inc\random
 ; Line 252
-	movss	xmm1, DWORD PTR __real@41c00000
-	divss	xmm1, xmm0
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\cmath
-; Line 70
-	movaps	xmm0, xmm1
-	call	ceilf
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 256
-	movss	xmm7, DWORD PTR __real@3f800000
-	mov	ecx, 1
-	cvttss2si eax, xmm0
-	xorps	xmm9, xmm9
-	xorps	xmm6, xmm6
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN3@generate_c
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\cmath
-; Line 191
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
-	npad	3
-$LL4@generate_c:
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 259
-	mov	rcx, rdi
+$LN11:
+	sub	rsp, 40					; 00000028H
+; Line 270
 	call	??R?$Engine@I@@QEAAIXZ			; Engine<unsigned int>::operator()
+	xorps	xmm0, xmm0
 	mov	eax, eax
 	xorps	xmm1, xmm1
-	cvtsi2ss xmm1, rax
-	subss	xmm1, xmm9
-	mulss	xmm1, xmm7
-; Line 260
-	mulss	xmm7, xmm8
-	addss	xmm6, xmm1
-	sub	rbx, 1
-	jne	SHORT $LL4@generate_c
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN3@generate_c:
-; Line 264
-	movaps	xmm8, XMMWORD PTR [rsp+48]
-	movaps	xmm9, XMMWORD PTR [rsp+32]
-	divss	xmm6, xmm7
-	movaps	xmm7, XMMWORD PTR [rsp+64]
-	movaps	xmm0, xmm6
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	add	rsp, 96					; 00000060H
-	pop	rdi
+	cvtsi2ss xmm0, rax
+	addss	xmm0, xmm1
+; Line 274
+	mulss	xmm0, DWORD PTR __real@2f800000
+; Line 292
+	add	rsp, 40					; 00000028H
 	ret	0
 ??$generate_canonical@M$0?0U?$Engine@I@@@std@@YAMAEAU?$Engine@I@@@Z ENDP ; std::generate_canonical<float,-1,Engine<unsigned int> >
 _TEXT	ENDS
 ; Function compile flags: /Ogtpy
 ;	COMDAT ??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z
 _TEXT	SEGMENT
-g$ = 112
+g$ = 48
 ??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z PROC	; meow<double,Engine<unsigned __int64> >, COMDAT
 ; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
 ; Line 14
 $LN15:
-	push	rdi
-	sub	rsp, 96					; 00000060H
-	movaps	XMMWORD PTR [rsp+80], xmm6
-	mov	rdi, rcx
-	movaps	XMMWORD PTR [rsp+64], xmm7
-	movaps	XMMWORD PTR [rsp+48], xmm8
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 252
-	movsd	xmm8, QWORD PTR __real@43f0000000000000
-	movaps	xmm0, xmm8
-	movaps	XMMWORD PTR [rsp+32], xmm9
-	call	log2
-	movsd	xmm1, QWORD PTR __real@404a800000000000
-	divsd	xmm1, xmm0
-	movaps	xmm0, xmm1
-	call	ceil
-; Line 256
-	movsd	xmm6, QWORD PTR __real@3ff0000000000000
-	mov	ecx, 1
-	cvttsd2si eax, xmm0
-	xorps	xmm9, xmm9
-	xorps	xmm7, xmm7
-; Line 253
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN5@meow
-; Line 252
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
-	npad	3
-$LL6@meow:
-; Line 259
-	mov	rcx, rdi
+	sub	rsp, 40					; 00000028H
+; File c:\STL\stl\inc\random
+; Line 270
 	call	??R?$Engine@_K@@QEAA_KXZ		; Engine<unsigned __int64>::operator()
 	mov	rcx, rax
 	xorps	xmm0, xmm0
 	test	rax, rax
 	js	SHORT $LN12@meow
 	cvtsi2sd xmm0, rax
-	jmp	SHORT $LN13@meow
+	xorps	xmm1, xmm1
+	addsd	xmm0, xmm1
+; Line 274
+	mulsd	xmm0, QWORD PTR __real@3bf0000000000000
+; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
+; Line 16
+	add	rsp, 40					; 00000028H
+	ret	0
 $LN12@meow:
+; File c:\STL\stl\inc\random
+; Line 270
 	shr	rax, 1
 	and	ecx, 1
 	or	rax, rcx
+	xorps	xmm1, xmm1
 	cvtsi2sd xmm0, rax
 	addsd	xmm0, xmm0
-$LN13@meow:
-	subsd	xmm0, xmm9
-	mulsd	xmm0, xmm6
-; Line 260
-	mulsd	xmm6, xmm8
-	addsd	xmm7, xmm0
-	sub	rbx, 1
-	jne	SHORT $LL6@meow
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN5@meow:
-; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
-; Line 16
-	movaps	xmm8, XMMWORD PTR [rsp+48]
-	movaps	xmm9, XMMWORD PTR [rsp+32]
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 263
-	divsd	xmm7, xmm6
+	addsd	xmm0, xmm1
+; Line 274
+	mulsd	xmm0, QWORD PTR __real@3bf0000000000000
 ; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
 ; Line 16
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	movaps	xmm0, xmm7
-	movaps	xmm7, XMMWORD PTR [rsp+64]
-	add	rsp, 96					; 00000060H
-	pop	rdi
+	add	rsp, 40					; 00000028H
 	ret	0
 ??$meow@NU?$Engine@_K@@@@YANAEAU?$Engine@_K@@@Z ENDP	; meow<double,Engine<unsigned __int64> >
 _TEXT	ENDS
 ; Function compile flags: /Ogtpy
 ;	COMDAT ??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z
 _TEXT	SEGMENT
-g$ = 112
+g$ = 48
 ??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z PROC	; meow<float,Engine<unsigned int> >, COMDAT
 ; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
 ; Line 14
-$LN18:
-	push	rdi
-	sub	rsp, 96					; 00000060H
-	movaps	XMMWORD PTR [rsp+80], xmm6
-	mov	rdi, rcx
-	movaps	XMMWORD PTR [rsp+64], xmm7
-	movaps	XMMWORD PTR [rsp+48], xmm8
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\cmath
-; Line 191
-	movss	xmm8, DWORD PTR __real@4f800000
-	movaps	xmm0, xmm8
-	movaps	XMMWORD PTR [rsp+32], xmm9
-	call	log2f
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 252
-	movss	xmm1, DWORD PTR __real@41c00000
-	divss	xmm1, xmm0
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\cmath
-; Line 70
-	movaps	xmm0, xmm1
-	call	ceilf
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 256
-	movss	xmm7, DWORD PTR __real@3f800000
-	mov	ecx, 1
-	cvttss2si eax, xmm0
-	xorps	xmm9, xmm9
-	xorps	xmm6, xmm6
-; Line 253
-	cmp	eax, ecx
-	cmovl	eax, ecx
-; Line 258
-	test	eax, eax
-	jle	SHORT $LN5@meow
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\cmath
-; Line 191
-	mov	QWORD PTR [rsp+112], rbx
-	mov	ebx, eax
-	npad	3
-$LL6@meow:
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 259
-	mov	rcx, rdi
+$LN13:
+	sub	rsp, 40					; 00000028H
+; File c:\STL\stl\inc\random
+; Line 270
 	call	??R?$Engine@I@@QEAAIXZ			; Engine<unsigned int>::operator()
+	xorps	xmm0, xmm0
 	mov	eax, eax
 	xorps	xmm1, xmm1
-	cvtsi2ss xmm1, rax
-	subss	xmm1, xmm9
-	mulss	xmm1, xmm7
-; Line 260
-	mulss	xmm7, xmm8
-	addss	xmm6, xmm1
-	sub	rbx, 1
-	jne	SHORT $LL6@meow
-; Line 258
-	mov	rbx, QWORD PTR [rsp+112]
-$LN5@meow:
-; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
-; Line 16
-	movaps	xmm8, XMMWORD PTR [rsp+48]
-	movaps	xmm9, XMMWORD PTR [rsp+32]
-; File c:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.30818\include\random
-; Line 263
-	divss	xmm6, xmm7
+	cvtsi2ss xmm0, rax
+	addss	xmm0, xmm1
+; Line 274
+	mulss	xmm0, DWORD PTR __real@2f800000
 ; File c:\Users\Casey\OneDrive\Desktop\repro2.cpp
 ; Line 16
-	movaps	xmm7, XMMWORD PTR [rsp+64]
-	movaps	xmm0, xmm6
-	movaps	xmm6, XMMWORD PTR [rsp+80]
-	add	rsp, 96					; 00000060H
-	pop	rdi
+	add	rsp, 40					; 00000028H
 	ret	0
 ??$meow@MU?$Engine@I@@@@YAMAEAU?$Engine@I@@@Z ENDP	; meow<float,Engine<unsigned int> >
 _TEXT	ENDS

Note the predominance of red at the end of the diff (it's 535 lines before and 200 after). If people aren't convinced by the assembly diff alone, shout and I'll benchmark something.
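
To make the idea above concrete, here is a minimal sketch, not the PR's actual code, of how the call count k = max(1, ceil(bits / log2(R))) could be computed with integer arithmetic in a constant expression when the engine's range size R is a power of two. The names `log2_of_range` and `calls_needed` are hypothetical, and the sketch assumes C++14 `constexpr`, an unsigned result type of at most 64 bits, and a bit count already clamped to the real type's digits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: log2 of the engine's range size (max - min + 1),
// or -1 when that size is not a power of two.
constexpr int log2_of_range(std::uint64_t min, std::uint64_t max) {
    const std::uint64_t span = max - min; // range size minus one
    // span + 1 is a power of two exactly when span is all ones in its low
    // bits. For the full 64-bit range span + 1 wraps to 0, which still passes.
    if ((span & (span + 1)) != 0) {
        return -1;
    }
    int bits = 0;
    for (std::uint64_t s = span; s != 0; s >>= 1) {
        ++bits;
    }
    return bits;
}

// Hypothetical helper: number of engine calls needed to collect at least
// min_bits bits when each call yields bits_per_call bits.
// Assumes bits_per_call > 0 and min_bits small (already clamped).
constexpr int calls_needed(std::size_t min_bits, std::size_t bits_per_call) {
    const std::size_t k = (min_bits + bits_per_call - 1) / bits_per_call; // integer ceil
    return k == 0 ? 1 : static_cast<int>(k);
}

// float has 24 digits, double has 53 on the usual IEEE 754 targets.
static_assert(calls_needed(24, log2_of_range(0, 0xFFFFFFFFull)) == 1, "float, 32-bit engine");
static_assert(calls_needed(53, log2_of_range(0, ~std::uint64_t{0})) == 1, "double, 64-bit engine");
```

For the two instantiations in the repro this gives k = 1, which is consistent with the after-side assembly: a single engine call followed by one multiply by 2^-32 (`__real@2f800000`) or 2^-64 (`__real@3bf0000000000000`).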

@CaseyCarter added the `performance` (Must go faster) label on Jan 1, 2022
@CaseyCarter requested a review from a team as a code owner on January 1, 2022 05:48
@@ -238,32 +238,60 @@ private:
vector<result_type> _Myvec;
};

template <class _Real, size_t _Minbits, class _Result, _Result _Ix>
_NODISCARD constexpr int _Generate_canonical_helper() {
Contributor

"Helper" names aren't actually helpful. Maybe _Generate_canonical_times_ceil()?

stl/inc/random
Comment on lines +283 to +290
_Real _Ans{0};
_Real _Factor{1};
for (int _Idx = 0; _Idx < _Kx; ++_Idx) { // add in another set of bits
    _Ans += (static_cast<_Real>(_Gx()) - _Gxmin) * _Factor;
    _Factor *= _Rx;
}

return _Ans / _Factor;
Contributor

This repeats the block above; can it be fused?
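
One way the fusion asked about here might look, as a minimal sketch rather than the PR's code: both the compile-time and the run-time path compute their own call count and range size, then hand them to a single shared loop. The names `accumulate_canonical` and `Counter4` are hypothetical, and the toy engine exists only to make the arithmetic easy to follow.

```cpp
#include <cassert>

// Hypothetical shared loop: the caller supplies k (number of engine calls),
// the engine minimum, and the range size, however it obtained them.
template <class Real, class Gen>
Real accumulate_canonical(Gen& g, int k, Real g_min, Real range_size) {
    Real ans{0};
    Real factor{1};
    for (int i = 0; i < k; ++i) { // add in another set of bits
        ans += (static_cast<Real>(g()) - g_min) * factor;
        factor *= range_size;
    }
    return ans / factor;
}

// Toy engine for demonstration only: yields 0, 1, 2, 3, 0, 1, ...
struct Counter4 {
    unsigned next = 0;
    unsigned operator()() { return next++ % 4; }
};
```

With `Counter4 g;`, the call `accumulate_canonical<double>(g, 2, 0.0, 4.0)` consumes the values 0 and 1 and computes (0 + 1 * 4) / 16.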

_RNG_REQUIRE_REALTYPE(generate_canonical, _Real);

-const size_t _Digits = static_cast<size_t>(numeric_limits<_Real>::digits);
-const size_t _Minbits = _Digits < _Bits ? _Digits : _Bits;
+constexpr auto _Digits = static_cast<size_t>(numeric_limits<_Real>::digits);
Contributor

(Pre-existing) Convert with `{}`, as you changed `_NRAND` to do so.

@MattStephanson
Contributor

A while ago I was playing with making this constexpr for all cases by eliminating log2 entirely and instead doing some equivalent integer arithmetic (calculate powers of _Rx until >= (1 << _Minbits), taking care to avoid overflow). Would maintainers have the interest/bandwidth to consider that as an alternative to this PR?
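
The integer approach described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from any linked commit: `calls_needed_int` is a hypothetical name, the range is passed as `span = max - min` so that a full 64-bit range stays representable, and `min_bits` is assumed to be at most 64. Overflow is avoided by comparing against `limit / range_size` before multiplying.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: smallest k >= 1 such that (span + 1)^k >= 2^min_bits.
// Assumes span >= 1 (the engine has at least two values) and min_bits <= 64.
constexpr int calls_needed_int(std::uint64_t span, unsigned min_bits) {
    const std::uint64_t all_ones = ~std::uint64_t{0};
    if (span == all_ones) {
        return 1; // range size is 2^64, enough for any min_bits <= 64
    }
    const std::uint64_t range_size = span + 1;
    // R^k >= 2^b is the same as R^k > 2^b - 1, and 2^b - 1 always fits.
    const std::uint64_t limit =
        min_bits >= 64 ? all_ones : (std::uint64_t{1} << min_bits) - 1;

    std::uint64_t power = 1; // range_size^k
    int k = 0;
    while (power <= limit) {
        if (power > limit / range_size) {
            return k + 1; // the next multiplication would exceed limit
        }
        power *= range_size; // cannot overflow: the result is <= limit
        ++k;
    }
    return 1; // only reached when min_bits == 0
}

static_assert(calls_needed_int(0xFFFFFFFFull, 53) == 2, "double from a 32-bit engine");
```

Unlike a power-of-two-only shortcut, this also handles ranges such as [1, 2147483646], where the range size is not a power of two.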

@CaseyCarter
Member Author

> A while ago I was playing with making this constexpr for all cases by eliminating log2 entirely and instead doing some equivalent integer arithmetic (calculate powers of _Rx until >= (1 << _Minbits), taking care to avoid overflow). Would maintainers have the interest/bandwidth to consider that as an alternative to this PR?

Yes, please! Note that generate_canonical currently needs to tolerate the TR1 engines, including some that use non-integral result types. We really don't want to change observable behavior for anything that old and deprecated, so I tried to leave that path separate and untouched. It would probably be a better idea to just fork the function into generate_canonical (which requires a type that meets the URBG requirements) and _NRAND (which stays unchanged, and can be as terrible as it wants to be, to avoid observable behavior changes for cruft).
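
A fork like the one suggested here needs some way to tell the two kinds of engine apart. The following is a minimal sketch of one possible detection, assuming C++17; the trait `has_constexpr_range` and the example types `Modern` and `Legacy` are hypothetical and do not come from the STL sources. It treats an engine as eligible for the compile-time path only if `min()` and `max()` are static, usable in constant expressions, and the result type is an unsigned integer.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: assume the engine is not eligible.
template <class G, class = void>
struct has_constexpr_range : std::false_type {};

// Chosen only when G::min() and G::max() can be called without an object
// and used as template arguments, i.e. they are static and constexpr.
template <class G>
struct has_constexpr_range<G, std::void_t<
    std::integral_constant<typename G::result_type, G::min()>,
    std::integral_constant<typename G::result_type, G::max()>>>
    : std::is_unsigned<typename G::result_type> {};

// Example of an engine in the modern style.
struct Modern {
    using result_type = unsigned;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return ~0u; }
    result_type operator()() { return 0; }
};

// Example of a legacy-style engine: non-static range, non-integral results.
struct Legacy {
    using result_type = double;
    result_type min() const { return 0.0; }
    result_type max() const { return 1.0; }
    result_type operator()() { return 0.5; }
};

static_assert(has_constexpr_range<Modern>::value, "eligible for the compile-time path");
static_assert(!has_constexpr_range<Legacy>::value, "stays on the unchanged legacy path");
```

A dispatching function could then branch with `if constexpr (has_constexpr_range<G>::value)` and leave the legacy path untouched.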

Conflict with formatting changes on main in `<random>` - pick mine and reformat.
MattStephanson added a commit to MattStephanson/STL that referenced this pull request Jan 21, 2022
...up to 64 bits of entropy. Leave the original implementation for
more bits and naughty generators (I'm looking at you, tr1) whose
min and max functions aren't static.

Fixes microsoft#1964. Alternative to microsoft#2452.
@CaseyCarter
Member Author

Abandoning in favor of #2498.

@CaseyCarter deleted the canon branch on August 20, 2024 20:40
Linked issue: <random>: generate_canonical() could avoid calling log2() at runtime