
Weird token issue reproduction #1022

Closed
Wants to merge 2 commits

Conversation

ondrejmirtes
Contributor

Hi,
I'm getting a weird character at the end of the tokens array with PHP-Parser 5.x.

This is how it looks in Xdebug:

[Screenshot from Xdebug, 2024-09-04 at 23 34 10]

When I try to dump it:

		var_dump(ord($lastToken->text));
		var_dump(bin2hex($lastToken->text));
		var_dump(strlen($lastToken->text));

The output looks like this:

int(0)
string(2) "00"
int(1)

Apart from the length, this looks the same as an empty string (https://3v4l.org/60Utu).

I'm attaching a test case that shows this behaviour.

When I run the equivalent test case on PHP-Parser 4.x, the printed character is LF (newline), which is what I expect. On master (5.x) it is not.

Any help is appreciated. Thank you!

@nikic
Owner

nikic commented Sep 5, 2024

See https://github.com/nikic/PHP-Parser/blob/master/UPGRADE-5.0.md#changes-to-token-representation:

The token array is now an array of Tokens, rather than an array of arrays and strings. Additionally, the token array is now terminated by a sentinel token with ID 0.

The weird character is a null byte \0.
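A quick check: applying the three calls from the original report to a literal null byte reproduces the dumped values exactly, which confirms that the last token's text is "\0".

```php
<?php
// A single null byte: the text of the sentinel token described above.
$sentinelText = "\0";

var_dump(ord($sentinelText));     // int(0)
var_dump(bin2hex($sentinelText)); // string(2) "00"
var_dump(strlen($sentinelText));  // int(1)
```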

Does the sentinel token cause issues for phpstan?

@ondrejmirtes
Contributor Author

It breaks the assumption that the tokens can simply be concatenated to get the file contents back.

In my application I'm manipulating the tokens because I'm changing code comments. When I compare the concatenated tokens with the expected file using PHPUnit's assertStringEqualsFile, the actual value is reported as a binary string:

Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'<?php SOME PHP CODE'
+Binary String: 0x3c3f7068700a0a6e616d65737061636...a7d0a00

I've pushed a commit here that shows the behaviour: a003d03

Should I just always unset the last token to get rid of this problem? Thanks.

@nikic
Owner

nikic commented Sep 5, 2024

Should I just always unset the last token to get rid of this problem? Thanks.

The parser and pretty printer expect that token to be there, but when converting to a string you should indeed ignore the last token. That is what the -1 here does:

$result .= $this->origTokens->getTokenCode($pos, count($origTokens) - 1, 0);
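A minimal, self-contained sketch of that advice, under stated assumptions: it does not use PHP-Parser itself. Instead it builds a token array with PHP's native PhpToken::tokenize() (PHP 8.0+) and appends a sentinel by hand (id 0, text "\0") to mimic the 5.x layout described in UPGRADE-5.0.md. The helper name tokensToCode() is hypothetical.

```php
<?php
// Sketch only: simulates the PHP-Parser 5.x token layout with PHP's native
// PhpToken class. With PHP-Parser itself, the array would contain
// PhpParser\Token objects produced by the library.

/**
 * Hypothetical helper: rebuild source code from a token array,
 * skipping a trailing sentinel token (id 0) if one is present.
 *
 * @param PhpToken[] $tokens
 */
function tokensToCode(array $tokens): string
{
    $last = end($tokens);
    if ($last !== false && $last->id === 0) {
        // Drop the sentinel: its "\0" text is not part of the file contents.
        array_pop($tokens);
    }

    $code = '';
    foreach ($tokens as $token) {
        $code .= $token->text;
    }
    return $code;
}

$source = "<?php\n\nnamespace Foo;\n\n// a comment\nfunction bar() {}\n";

$tokens = PhpToken::tokenize($source);
// Assumption: append a sentinel the way PHP-Parser 5.x terminates its array.
$tokens[] = new PhpToken(0, "\0");

// Naive concatenation keeps the trailing null byte...
$naive = implode('', array_map(fn (PhpToken $t) => $t->text, $tokens));
var_dump(bin2hex(substr($naive, -1))); // string(2) "00"

// ...while skipping the sentinel gives back the original source.
var_dump(tokensToCode($tokens) === $source); // bool(true)
```

Checking the sentinel's id, rather than unconditionally dropping the last element, keeps the helper safe if it is ever given a token array without a sentinel; for a 5.x token array, which is terminated by the sentinel, it has the same effect as the -1 above.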

@ondrejmirtes
Contributor Author

Yeah, thanks, this is sufficient :)

@ondrejmirtes ondrejmirtes deleted the nul-byte branch September 5, 2024 11:36
2 participants