
[CODEGEN] ARM Popcount lowering rule and codegen updates #1235

Merged (5 commits) on Jun 12, 2018


@cowanmeg (Contributor) commented Jun 5, 2018

TVM compiler changes for low-precision operators:

  • ARM popcount lowering rule
  • Codegen updates to support reinterpreting vectors and accessing their upper and lower halves separately.
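A minimal scalar sketch of what "reinterpreting" and "upper/lower halves" mean for a 128-bit vector, assuming plain arrays stand in for vector registers. The helper names (ReinterpretAsU8, LowerHalf, UpperHalf) are hypothetical and not part of TVM; on ARM the corresponding hardware operations are a bitcast and the NEON vget_low/vget_high intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical: reinterpret four 32-bit lanes as sixteen 8-bit lanes.
// The 128 bits are unchanged; only the lane type differs (like an LLVM bitcast).
std::array<uint8_t, 16> ReinterpretAsU8(const std::array<uint32_t, 4>& v) {
  std::array<uint8_t, 16> out{};
  static_assert(sizeof(out) == sizeof(v), "total bit width must match");
  std::memcpy(out.data(), v.data(), sizeof(out));
  return out;
}

// Hypothetical: lower half is lanes [0, 8).
std::array<uint8_t, 8> LowerHalf(const std::array<uint8_t, 16>& v) {
  std::array<uint8_t, 8> out{};
  std::memcpy(out.data(), v.data(), out.size());
  return out;
}

// Hypothetical: upper half is lanes [8, 16).
std::array<uint8_t, 8> UpperHalf(const std::array<uint8_t, 16>& v) {
  std::array<uint8_t, 8> out{};
  std::memcpy(out.data(), v.data() + 8, out.size());
  return out;
}
```

Each 32-bit lane's bytes stay contiguous after reinterpretation, so the lower half always holds the bytes of the first two 32-bit lanes, regardless of endianness.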


@tqchen changed the title from "ARM Popcount lowering rule and codegen updates" to "[CODEGEN] ARM Popcount lowering rule and codegen updates" on Jun 6, 2018
@tqchen (Member) commented Jun 6, 2018

There is an unintended change that reverts the HalideIR submodule to an older version. Please update the submodule to the latest version; you can do this by running git pull inside the HalideIR folder.

```diff
@@ -366,7 +366,7 @@ llvm::Value* CodeGenLLVM::CreateBroadcast(llvm::Value* value, int lanes) {
 llvm::Value* CodeGenLLVM::CreateVecSlice(llvm::Value* vec, int begin, int extent) {
   int num_elems = static_cast<int>(vec->getType()->getVectorNumElements());
   if (extent == num_elems && begin == 0) return vec;
-  CHECK_LT(begin + extent, num_elems);
+  CHECK_LT(begin + extent, num_elems+1);
```

Review comment (Member):

CHECK_LT -> CHECK_LE, i.e. use CHECK_LE(begin + extent, num_elems) instead of comparing against num_elems+1.
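A small sketch of why the bound matters, modeling the checks as plain predicates on a slice [begin, begin + extent) of a vector with num_elems lanes. The function names are hypothetical. For integers, x <= n accepts exactly the same values as x < n + 1, so the suggested CHECK_LE form is equivalent to the num_elems+1 version but states the intent directly. The original CHECK_LT(begin + extent, num_elems) rejected any slice ending at the last lane, such as the upper half of a vector.

```cpp
#include <cassert>

// Hypothetical model of the original check: CHECK_LT(begin + extent, num_elems).
// Rejects slices that end exactly at the last lane.
bool OldSliceCheck(int begin, int extent, int num_elems) {
  return begin + extent < num_elems;
}

// Hypothetical model of the suggested check: CHECK_LE(begin + extent, num_elems).
// Accepts the same slices as CHECK_LT(begin + extent, num_elems + 1).
bool SliceInBounds(int begin, int extent, int num_elems) {
  return begin + extent <= num_elems;
}
```

For a 16-lane vector, the upper-half slice (begin = 8, extent = 8) fails the old check but passes the new one.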

```cpp
  return CodeGenCPU::CreateIntrinsic(op);
}

Expr CodeGenARM::ARMPopcount(const Call *call) {
```

Review comment (Member):

We will need a regression test for this rule. Please add a test case for ARM popcount in a new file, tests/python/unittest/test_codegen_arm.py.

Since we don't have an ARM device to verify on, we can instead dump the assembly (perhaps by patching GetSource in the LLVM module to support get_source("asm")) and check that the NEON instruction sequence is as expected.
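The check itself can be as simple as searching the dumped assembly for the expected mnemonics in order. Below is a minimal sketch of that logic, assuming the assembly is available as a string (for example via the get_source("asm") extension proposed above). ContainsInOrder is a hypothetical helper; the actual regression test requested here would live in the Python test file, and the mnemonics (vcnt.8 followed by vpaddl.u8/vpaddl.u16) assume the rule emits a byte popcount followed by pairwise widening adds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if every string in `expected` occurs in
// `asm_text`, each one after the end of the previous match.
bool ContainsInOrder(const std::string& asm_text,
                     const std::vector<std::string>& expected) {
  std::size_t pos = 0;
  for (const std::string& mnemonic : expected) {
    std::size_t found = asm_text.find(mnemonic, pos);
    if (found == std::string::npos) return false;
    pos = found + mnemonic.size();
  }
  return true;
}
```

Matching in order, rather than just checking presence, catches a rule that emits the right instructions in the wrong sequence.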

```cpp
::llvm::Intrinsic::ID vpaddu_id = ::llvm::Intrinsic::arm_neon_vpaddlu;

Type uint8_type = Type(e.type().code(), 8, e.type().bits() * e.type().lanes() / 8);
```

Review comment (Member):

Move this type definition after the fallback guard, and add a comment noting that the division always divides evenly.
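A sketch of the ordering the review asks for, assuming the fallback guard only lets 64-bit (D register) or 128-bit (Q register) vectors through to the NEON path. Uint8LanesOrFallback is a hypothetical name and the -1 return stands in for "use the generic fallback"; the point is that once the guard has passed, bits * lanes is 64 or 128, so dividing by 8 to get the uint8 lane count is always exact.

```cpp
#include <cassert>

// Hypothetical: number of uint8 lanes covering a vector of `lanes` elements of
// `bits` bits each, or -1 when the NEON rule would fall back.
// Assumption: only 64-bit and 128-bit vectors are handled by the NEON path.
int Uint8LanesOrFallback(int bits, int lanes) {
  const int total_bits = bits * lanes;
  // Fallback guard first, so the uint8 type is only computed for valid widths.
  if (total_bits != 64 && total_bits != 128) return -1;
  // Past the guard total_bits is 64 or 128, so the division by 8 is exact.
  return total_bits / 8;
}
```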

Review comment (Member):

Add a comment explaining what this specific NEON instruction sequence computes.
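A scalar model of what such a comment could describe, assuming the rule counts bits per byte (NEON vcnt.8) and then repeatedly applies an unsigned pairwise add long (vpaddl, the arm_neon_vpaddlu intrinsic in the snippet above). Each pairwise step adds adjacent lanes and halves the lane count, so for 32-bit lanes two steps sum the four byte counts of each lane. The function names are hypothetical, and the sketch keeps every intermediate in uint64_t for simplicity, whereas the real instructions widen 8 to 16 to 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Model of vcnt.8: population count of each byte.
std::vector<uint64_t> CountBitsPerByte(const std::vector<uint8_t>& bytes) {
  std::vector<uint64_t> out;
  for (uint8_t b : bytes) {
    uint64_t count = 0;
    for (unsigned x = b; x != 0; x &= x - 1) ++count;  // clear lowest set bit
    out.push_back(count);
  }
  return out;
}

// Model of vpaddl: add adjacent lanes pairwise, halving the lane count.
std::vector<uint64_t> PairwiseAddLong(const std::vector<uint64_t>& v) {
  std::vector<uint64_t> out;
  for (std::size_t i = 0; i + 1 < v.size(); i += 2) out.push_back(v[i] + v[i + 1]);
  return out;
}

// Popcount of each 32-bit lane: vcnt.8, then vpaddl.u8, then vpaddl.u16.
std::vector<uint64_t> PopcountU32Lanes(const std::vector<uint32_t>& lanes) {
  std::vector<uint8_t> bytes(lanes.size() * 4);
  std::memcpy(bytes.data(), lanes.data(), bytes.size());  // reinterpret as u8
  std::vector<uint64_t> v = CountBitsPerByte(bytes);
  v = PairwiseAddLong(v);  // per 16-bit half: sum of two byte counts
  v = PairwiseAddLong(v);  // per 32-bit lane: sum of four byte counts
  return v;
}
```

The same pattern extends to 16-bit lanes (one pairwise step) or 64-bit lanes (three steps), which is why a loop over vpaddl fits the rule.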

@tqchen added the "status: need update" label on Jun 12, 2018
@tqchen merged commit be29ac7 into apache:master on Jun 12, 2018
@tqchen (Member) commented Jun 12, 2018

Thanks, this is merged!

@ajtulloch (Contributor) commented:

Nice!

tqchen pushed a commit to tqchen/tvm that referenced this pull request Jul 6, 2018
mnuyens pushed a commit to mnuyens/tvm that referenced this pull request Jul 10, 2018
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018
@cowanmeg deleted the low-precision branch on April 5, 2019, 18:09
3 participants