Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory fault in tests/variables.sh #23

Closed
McDutchie opened this issue Jun 17, 2020 · 7 comments
Closed

Memory fault in tests/variables.sh #23

McDutchie opened this issue Jun 17, 2020 · 7 comments
Labels
blocker This had better be fixed before releasing bug Something is not working

Comments

@McDutchie
Copy link

McDutchie commented Jun 17, 2020

Attn: @JohnoKing
I may have been too quick in merging #22. On my system, that commit introduced a memory fault in one of the regression tests in variables.c:

$ bin/shtests -x -p variables
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test variables begins at 2020-06-17+19:37:26
+ [0s|S0,tests/variables.sh,L26,e0] alias err_exit='err_exit $LINENO'

[...long-ish trace trimmed...]

+ [0s|S0,tests/variables.sh,L172,e1] COUNT=0
+ [0s|S0,tests/variables.sh,L173,e1] (( COUNT++ ))
+ [0s|S0,tests/variables.sh,L174,e1] (( COUNT != 1 || ACCESS!=2 ))
+ [0s|S0,tests/variables.sh,L177,e0] LANG=C
+ [0s|S0,tests/variables.sh,L177,e0] 1> /dev/null 2>& 1
shtests[343]: eval: line 1: 61715: Memory fault
test variables failed at 2020-06-17+19:37:26 with exit code 267 [ 125 tests (killed by SIGSEGV) ]

Here is the backtrace from lldb:

* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x0000000105f605a3 ksh`lookup + 734
    frame #1: 0x0000000105f8808d ksh`varsub + 6329
    frame #2: 0x0000000105f84a1e ksh`copyto + 2580
    frame #3: 0x0000000105f85660 ksh`sh_macexpand + 468
    frame #4: 0x0000000105f8a91b ksh`sh_macpat + 135
    frame #5: 0x0000000105fa595d ksh`sh_exec + 4194
    frame #6: 0x0000000105fa56ea ksh`sh_exec + 3567
    frame #7: 0x0000000105f5cb2b ksh`exfile + 3243
    frame #8: 0x0000000105f5da27 ksh`sh_main + 3367
    frame #9: 0x00007fff5832b3d5 libdyld.dylib`start + 1
    frame #10: 0x00007fff5832b3d5 libdyld.dylib`start + 1

It makes little sense to me. Why SIGSTOP when it was killed by SIGSEGV?

@JohnoKing
Copy link

It looks like it was this change that caused it:

diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index 6bb4355..1a0428c 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3505,11 +3505,6 @@ static void sh_funct(Shell_t *shp,Namval_t *np,int argn, char *argv[],struct arg
 	nv_putval(SH_PATHNAMENOD,shp->st.filename,NV_NOFREE);
 	shp->pipepid = pipepid;
 	np->nvalue.rp->running  -= 2;
+	if(np->nvalue.rp && np->nvalue.rp->running==1)
+	{
+		np->nvalue.rp->running = 0;
+		_nv_unset(np, NV_RDONLY);
+	}
 }
 
 /*

@McDutchie
Copy link
Author

Interesting. If you comment out just the _nv_unset(np, NV_RDONLY); and leave the rest, then the memory fault disappears but you get funny regression test failures in variables.sh, as well as two expected ones in functions.sh:

test functions begins at 2020-06-17+20:02:15
	functions.sh[1270]: unset of POSIX function fails when it is still running
	functions.sh[1272]: unset of ksh function fails when it is still running
test functions failed at 2020-06-17+20:02:18 with exit code 2 [ 113 tests 2 errors ]
[...]
test variables begins at 2020-06-17+20:03:18
	variables.sh[193]: foo.get discipline not working after unset
	variables.sh[205]: $@ not equal to ${@}
	variables.sh[205]: $* not equal to ${*}
	variables.sh[205]: $! not equal to ${!}
	variables.sh[209]: ${!%?} not correct
	variables.sh[213]: ${!#?} not correct
	variables.sh[205]: $# not equal to ${#}
	variables.sh[209]: ${#%?} not correct
	variables.sh[213]: ${##?} not correct
	variables.sh[218]: ${##} not correct
	variables.sh[205]: $- not equal to ${-}
	variables.sh[209]: ${-%?} not correct
	variables.sh[213]: ${-#?} not correct
	variables.sh[218]: ${#-} not correct
	variables.sh[205]: $? not equal to ${?}
	variables.sh[209]: ${?%?} not correct
	variables.sh[213]: ${?#?} not correct
	variables.sh[218]: ${#?} not correct
	variables.sh[205]: $$ not equal to ${$}
	variables.sh[209]: ${$%?} not correct
	variables.sh[213]: ${$#?} not correct
test variables failed at 2020-06-17+20:03:20 with exit code 21 [ 125 tests 21 errors ]

@JohnoKing
Copy link

For the moment the patch to xec.c needs to be reverted, as regressions aren't acceptable.

@McDutchie
Copy link
Author

Agreed. But I'll have it error out instead, pending an actual fix. Silent failures aren't acceptable either.

@McDutchie
Copy link
Author

McDutchie commented Jun 17, 2020

It is more complicated than I thought. For instance, this bit from tests/variables.sh was working, and should not error out now:

unset -n foo
foo=junk
function foo.get
{
        .sh.value=stuff
        unset -f foo.get
}
if      [[ $foo != stuff ]]
then    err_exit "foo.get discipline not working"
fi
if      [[ $foo != junk ]]
then    err_exit "foo.get discipline not working after unset"
fi

@McDutchie
Copy link
Author

I have been unable to implement an error message without also making it impossible for discipline functions to unset themselves, something that used to work. So I'll have to restore old behaviour for now. The segfault stays fixed, but now both types of functions silently fail to unset themselves. :-/ I'll reopen #21 to deal with that.

@McDutchie McDutchie added blocker This had better be fixed before releasing bug Something is not working labels Jun 17, 2020
@McDutchie
Copy link
Author

McDutchie commented Jun 18, 2020

@JohnoKing, I found the cause of the crash.

As my backtrace showed, it was indeed in lookup(), which is the function for looking up a variable's discipline function (which makes sense of where the regression test crash occurred):

if(nq && nq->nvalue.rp->running==1)

That line fails to check whether nq->nvalue.rp is even a valid pointer before querying nq->nvalue.rp->running, and if it isn't, that's an instant segfault. And it makes complete sense that this pointer would be zero'd after calling _nv_unset in the backported fix.

In ksh 93v-, that line was changed to:

        if(nq && nq->nvalue.rp && nq->nvalue.rp->running==1)

Applying that along with reapplying the reverted fix, fixes everything. A commit is on the way.

McDutchie added a commit that referenced this issue Jun 18, 2020
Applying the fix for 'unset -f' exposed a crashing bug in lookup()
in sh/nvdisc.c, which is the function for looking up discipline
functions. This is what caused tests/variables.sh to crash.
Ref.: #23 (comment)

src/cmd/ksh93/sh/nvdisc.c: lookup():
- To avoid segfault, check that the function pointer nq->nvalue.rp
  is actually set before checking if nq->nvalue.rp->running==1.

src/cmd/ksh93/sh/xec.c,
src/cmd/ksh93/tests/functions.sh:
- Uncomment the 'unset -f' fix from b7932e8.

Resolves #21 (again).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This had better be fixed before releasing bug Something is not working
Projects
None yet
Development

No branches or pull requests

2 participants