Add cross-compiling support for arm architecture. #1698

Merged on Apr 11, 2017 (18 commits)

@Xreki (Contributor) commented Mar 23, 2017

Currently only the WITH_PYTHON=OFF configuration can be built.

Prepare the Android cross-compiling environment:

$ your-ndk-root/build/tools/make-standalone-toolchain.sh --arch=arm --platform=android-21 --install-dir=arm-android-21-gcc

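Cross-compiling Paddle for Android mainly involves the following changes:

  • CMake build configuration, with two supported approaches

Approach 1: use CMake's own Android cross-compiling support, which requires CMake 3.7 or later. CMake decides whether it is cross-compiling based on whether CMAKE_SYSTEM_NAME is set; it then adds the appropriate compile options automatically according to the configuration and sets the CMAKE_CROSSCOMPILING variable to TRUE. When Paddle's CMake files detect an Android cross-compile, they also automatically set WITH_AVX=OFF, WITH_GPU=OFF, WITH_RDMA=OFF, and WITH_PYTHON=OFF.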

PROTOBUF=path/to/mixed-version-of-protobuf
OPENBLAS_ROOT=path/to/openblas-arm_soft_fp_abi
cmake -DCMAKE_SYSTEM_NAME=Android \
      -DCMAKE_ANDROID_STANDALONE_TOOLCHAIN=your-standalone-toolchains/arm-android-21-gcc \
      -DCMAKE_ANDROID_ARCH_ABI=armeabi-v7a \
      -DCMAKE_ANDROID_ARM_MODE=ON \
      -DCMAKE_ANDROID_ARM_NEON=ON \
      -DOPENBLAS_ROOT=${OPENBLAS_ROOT} \
      -DWITH_SWIG_PY=OFF \
      -DCMAKE_PREFIX_PATH="$PROTOBUF" \
      .. 

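Approach 2: configure the compile options manually. In this mode CMake itself does not consider the build a cross-compile; the user controls it by hand through the compiler and the compiler flags.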

export PATH=your-standalone-toolchains/arm-android-21-gcc/bin:$PATH

PROTOBUF=path/to/mixed-version-of-protobuf
OPENBLAS_ROOT=path/to/openblas-arm_soft_fp_abi
cmake -DCMAKE_C_COMPILER=arm-linux-androideabi-gcc \
      -DCMAKE_CXX_COMPILER=arm-linux-androideabi-g++ \
      -DCMAKE_C_FLAGS="-marm -march=armv7-a -mfloat-abi=softfp -mfpu=neon -fPIE -pie" \
      -DCMAKE_CXX_FLAGS="-marm -march=armv7-a -mfloat-abi=softfp -mfpu=neon -fPIE -pie" \
      -DCMAKE_EXE_LINKER_FLAGS="-llog" \
      -DOPENBLAS_ROOT=${OPENBLAS_ROOT} \
      -DWITH_GPU=OFF \
      -DWITH_PYTHON=OFF \
      -DWITH_SWIG_PY=OFF \
      -DCMAKE_PREFIX_PATH="$PROTOBUF" \
      ..
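  • Instruction-set-specific code
  • Missing system features
    • Runtime instruction-set detection via cpuid: for now NEON support is assumed by default; later this could follow OpenBLAS's approach, and the Android build could query cpufeatures
    • Missing pthread components: pthread_spinlock_t and pthread_barrier_t
      • CMake checks whether the two types pthread_spinlock_t and pthread_barrier_t exist:
        • If they exist, the macros PADDLE_USE_PTHREAD_SPINLOCK and PADDLE_USE_PTHREAD_BARRIER are defined respectively, and the pthread versions are used directly
        • If they do not exist, the same substitute implementation as on the Mac is used (paddle/utils/arch/linux/Locks.cpp). (If this approach is adopted, paddle/utils/arch/linux/Locks.cpp and paddle/utils/arch/osx/Locks.cpp could later be merged into one file.)
    • std::to_string is missing, so a simple internal version was implemented (paddle/utils/StringUtil.h)
  • Third-party dependencies
    • protobuf: both the host executable protoc and the target library libprotobuf.a must be built
    • OpenBLAS: the arm_soft_fp_abi branch must be built
      • The toolchain shipped for Android does not include a Fortran compiler, so only a NO_LAPACK build is possible; Paddle adds the macro PADDLE_USE_LAPACK to control calls to LAPACK functions
      • OpenBLAS build command: make TARGET=ARMV7 HOSTCC=gcc CC=arm-linux-androideabi-gcc ARM_SOFTFP_ABI=1 NOFORTRAN=1 USE_THREAD=0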
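
To illustrate the std::to_string item above, here is a minimal sketch of what a stream-based replacement could look like. It is not the code in paddle/utils/StringUtil.h; the namespace `sketch` and the template shape are assumptions made only for this example.

```cpp
#include <cassert>
#include <sstream>
#include <string>

namespace sketch {  // hypothetical namespace, not Paddle's

// Stand-in for std::to_string on toolchains whose C++ runtime lacks it.
// Formats any type that supports operator<< on an output stream.
template <typename T>
std::string to_string(const T& value) {
  std::ostringstream out;
  out << value;
  return out.str();
}

}  // namespace sketch
```

Note that a stream-based version formats floating-point values differently from std::to_string (for example "1.5" rather than "1.500000"), so callers that parse the result would need to be checked.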

cmake/simd.cmake Outdated
float32x4_t b = {1.0f, 2.0f, 3.0f, 4.0f};
float32x4_t c = vaddq_f32(a, b);
return 0;
}" NEON_FOUND)
Contributor Author

Something odd happens here: after adding the compiler check for NEON instructions, running cmake on a PC fails with the following error

-- Looking for UINT64_MAX
-- Looking for UINT64_MAX - not found
-- Looking for UINT64_MAX
-- Looking for UINT64_MAX - not found
CMake Error at cmake/flags.cmake:82 (message):
  Cannot find symbol UINT64_MAX

Contributor Author

Fixed. The cause was that the last check left set(CMAKE_REQUIRED_FLAGS ${NEON_FLAG}) in effect, so every later check was also compiled with CMAKE_REQUIRED_FLAGS.

Contributor

I tried this PR and still get the Cannot find symbol UINT64_MAX error.
Also, the error is reported at flags.cmake:83; is it really related to this NEON check?

Contributor Author

Could you look at build/CMakeFiles/CMakeError.log and see what the error message says?

Contributor

Right, it was caused by a stale CMakeCache.txt that had not been cleared. Still, it would be better here to restore CMAKE_REQUIRED_FLAGS to its previous value rather than clearing it.

CMakeLists.txt Outdated
@@ -65,6 +64,7 @@ include(external/openblas) # download, build, install openblas
include(external/swig) # download, build, install swig
include(external/warpctc) # download, build, install warpctc

include(simd) # set simd flag
Contributor

Line 31 cannot see AVX_FOUND.

Contributor Author

Fixed.

@@ -13,10 +13,12 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "SIMDFunctions.h"
#ifdef __SSE__
Contributor

SSE3?

Contributor

Macros such as __SSE__/AVX/__ARM_NEON__ appear in too many places; this code needs to be sorted out separately.

Contributor Author

Should this be done together with @gangliao's work in #1607?
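
As a rough illustration of the cleanup discussed in this thread, the sketch below selects the SIMD back end in one place and keeps the intrinsics behind a single function, so callers need no `__SSE__` or `__ARM_NEON__` checks of their own. The macro names `SKETCH_SIMD_NEON` and `SKETCH_SIMD_SSE` and the function `addVectors` are hypothetical, not names from Paddle or from #1607.

```cpp
#include <cassert>
#include <cstddef>

// Decide once which SIMD path is compiled (illustrative macro names).
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_SIMD_NEON 1
#elif defined(__SSE__)
#include <xmmintrin.h>
#define SKETCH_SIMD_SSE 1
#endif

// Element-wise c[i] = a[i] + b[i], four floats per step where SIMD exists.
inline void addVectors(const float* a, const float* b, float* c,
                       std::size_t n) {
  std::size_t i = 0;
#if defined(SKETCH_SIMD_NEON)
  for (; i + 4 <= n; i += 4) {
    vst1q_f32(c + i, vaddq_f32(vld1q_f32(a + i), vld1q_f32(b + i)));
  }
#elif defined(SKETCH_SIMD_SSE)
  for (; i + 4 <= n; i += 4) {
    _mm_storeu_ps(c + i, _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
  }
#endif
  // Scalar loop: handles the tail, and everything when no SIMD is available.
  for (; i < n; ++i) c[i] = a[i] + b[i];
}
```

With this layout the scalar build and the NEON build share one call site, which is also what the later "scalar and NEON SIMD versions" follow-up item asks for.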

cmake/simd.cmake Outdated
@@ -73,4 +76,26 @@ int main()
return 0;
}" AVX2_FOUND)

mark_as_advanced(MMX_FOUND SSE2_FOUND SSE3_FOUND AVX_FOUND AVX2_FOUND)
# Check NEON
set(CMAKE_REQUIRED_FLAGS ${NEON_FLAG})
Contributor

Cross-compilation is the usual case, isn't it? Is this check needed at all?

@@ -163,8 +168,12 @@ void initMain(int argc, char** argv) {

installProfilerSwitch();

#ifdef __SSE__
Contributor

How is flush_to_zero handled on ARM?
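
The thread does not answer this question. Purely as an assumption, and not something taken from this PR, one possible approach is to set the FZ bit (bit 24) of the floating-point control register: FPSCR on 32-bit ARM with VFP, FPCR on AArch64. The helper name `enableFlushToZero` and its return strings are hypothetical; the SSE branch mirrors the kind of call that typically sits under `#ifdef __SSE__`.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: enables flush-to-zero for the calling thread and
// returns the mechanism used, or "none" if nothing was changed.
inline const char* enableFlushToZero() {
#if defined(__SSE__)
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);  // FTZ bit in MXCSR
  return "sse-mxcsr";
#elif defined(__aarch64__)
  std::uint64_t fpcr;
  __asm__ __volatile__("mrs %0, fpcr" : "=r"(fpcr));
  fpcr |= (std::uint64_t{1} << 24);  // FZ bit
  __asm__ __volatile__("msr fpcr, %0" : : "r"(fpcr));
  return "aarch64-fpcr";
#elif defined(__arm__) && !defined(__SOFTFP__)
  std::uint32_t fpscr;
  __asm__ __volatile__("vmrs %0, fpscr" : "=r"(fpscr));
  fpscr |= (std::uint32_t{1} << 24);  // FZ bit
  __asm__ __volatile__("vmsr fpscr, %0" : : "r"(fpscr));
  return "arm-fpscr";
#else
  return "none";  // no known control register on this target
#endif
}
```

The setting is per thread on all three targets, so a real implementation would need to run it in every worker thread, not only in initMain.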

inline SpinLockPrivate() { pthread_spin_init(&lock_, 0); }
inline ~SpinLockPrivate() { pthread_spin_destroy(&lock_); }
inline SpinLockPrivate() {
#ifndef __ANDROID__
Contributor

Guarding the body of every function with a macro like this basically ruins the readability of the code. It would be better to just comment out the whole Locks implementation here.

@@ -19,7 +19,7 @@ limitations under the License. */
/// for MSVC
#define CPUID(info, x) __cpuidex(info, x, 0)

#else
#elif !defined(__ANDROID__)
Contributor

In an ARM environment there is no need to compile CpuId.cpp at all, so the __ANDROID__ macro does not need to be introduced here.

See the License for the specific language governing permissions and
limitations under the License. */


Contributor

This file looks like it could be merged with hl_sse_matrix_kernel.cuh.

@hedaoyuan (Contributor) left a comment

Following this PR I was able to build the ARM version, but a few points still need to be confirmed.

  1. Building against Android API level 19 fails with missing symbols such as rand, while level 21 works. So will API level 21 be the minimum Android version that Paddle supports going forward?
  2. cmake/external/gflags.cmake and similar files need -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER} and -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER} added; otherwise the cross-compiling environment specified via cmake .. -DCMAKE_C_COMPILER=... is not passed through.
  3. Because of the protobuf build issue, a plain cmake & make does not currently work.

@wangkuiyi (Collaborator) left a comment

Is this PR meant to make sure PaddlePaddle can be built for ARM? If so, shouldn't the corresponding documentation be updated (or added)?

void SpinLock::lock() { m->lock(); }
void SpinLock::unlock() { m->unlock(); }

#ifdef PADDLE_USE_PTHREAD_BARRIER
Collaborator

Is this conditional compilation there to support ARM? I think a comment is needed here explaining why it was introduced.

@@ -36,36 +40,101 @@ void Semaphore::wait() { sem_wait(&m->sem); }

void Semaphore::post() { sem_post(&m->sem); }

#ifdef PADDLE_USE_PTHREAD_SPINLOCK
Collaborator

Is this conditional compilation there to support ARM? I think a comment is needed here explaining why it was introduced.

Contributor Author

Neither the ARM nor the Mac pthread library provides the types pthread_spinlock_t and pthread_barrier_t or their associated functions. I see two ways to handle this:

  1. Check in CMake whether pthread_spinlock_t and pthread_barrier_t exist; if they do, define the macros PADDLE_USE_PTHREAD_SPINLOCK and PADDLE_USE_PTHREAD_BARRIER. The #else ... #endif branch uses the implementation from paddle/utils/arch/osx/Locks.cpp; later the two Locks.cpp files could be merged.
  2. Add a new paddle/utils/arch/android/Locks.cpp that combines the SemaphorePrivate implementation from paddle/utils/arch/linux/Locks.cpp with the SpinLockPrivate and ThreadBarrierPrivate implementations from paddle/utils/arch/osx/Locks.cpp.

Other suggestions are welcome.
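
Option 1 can be pictured roughly as follows. This is a sketch, not the PR's Locks.cpp: the class name `SpinLockSketch` is hypothetical, and it assumes that CMake defines PADDLE_USE_PTHREAD_SPINLOCK only when pthread_spinlock_t is available, as described above.

```cpp
#include <atomic>
#include <cassert>
#ifdef PADDLE_USE_PTHREAD_SPINLOCK
#include <pthread.h>
#endif

// One class, two back ends, selected by a macro that CMake would define.
class SpinLockSketch {
public:
#ifdef PADDLE_USE_PTHREAD_SPINLOCK
  // pthread back end (Linux).
  SpinLockSketch() { pthread_spin_init(&lock_, 0); }
  ~SpinLockSketch() { pthread_spin_destroy(&lock_); }
  void lock() { pthread_spin_lock(&lock_); }
  void unlock() { pthread_spin_unlock(&lock_); }

private:
  pthread_spinlock_t lock_;
#else
  // Portable back end for platforms without pthread_spinlock_t.
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // busy-wait until the holder clears the flag
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
#endif
};
```

Keeping the `#ifdef` at class level rather than inside every member function also addresses the readability concern raised earlier about per-function macros.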

Collaborator

If SpinLockPrivate and ThreadBarrierPrivate can be implemented in-house without depending on pthread, could we just use our own implementation in all cases?

If we have to pick one of the two options, the second seems easier to follow (provided it does not duplicate too much code from arch/osx/Locks.cpp).

One suggestion: the names arch/osx and arch/android are not appropriate, because osx and android are OSes, not architectures; arm, x86, and x64 are architectures. If this is not suitable to change in this PR, could you create an issue as a reminder to rename the directories?

Contributor Author @Xreki, Mar 30, 2017

Here is a summary of what Locks.cpp implements and how:

          SemaphorePrivate                  SpinLockPrivate      ThreadBarrierPrivate
linux     sem_t                             pthread_spinlock_t   pthread_barrier_t
android   sem_t                             std::atomic_flag     pthread_mutex_t & pthread_cond_t
osx       dispatch_semaphore_t (Mac only)   std::atomic_flag     pthread_mutex_t & pthread_cond_t

Of these, the std::atomic_flag-based SpinLockPrivate and the pthread_mutex_t & pthread_cond_t-based ThreadBarrierPrivate were implemented by @gangliao for the Mac; I saw that they also work on Android, so I reused them.
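
For readers unfamiliar with the technique, a barrier built from pthread_mutex_t and pthread_cond_t can be sketched as below. This is not the code from arch/osx/Locks.cpp; the class name `ThreadBarrierSketch` and the generation counter are assumptions of this example.

```cpp
#include <cassert>
#include <pthread.h>

// Reusable barrier for a fixed number of threads, using only a mutex and a
// condition variable (no pthread_barrier_t).
class ThreadBarrierSketch {
public:
  explicit ThreadBarrierSketch(int count)
      : count_(count), waiting_(0), generation_(0) {
    assert(count > 0);
    pthread_mutex_init(&mutex_, nullptr);
    pthread_cond_init(&cond_, nullptr);
  }
  ~ThreadBarrierSketch() {
    pthread_cond_destroy(&cond_);
    pthread_mutex_destroy(&mutex_);
  }
  ThreadBarrierSketch(const ThreadBarrierSketch&) = delete;
  ThreadBarrierSketch& operator=(const ThreadBarrierSketch&) = delete;

  // Blocks until count_ threads have called wait(). Returns true in the
  // thread that arrived last and released the others.
  bool wait() {
    pthread_mutex_lock(&mutex_);
    const unsigned long gen = generation_;
    if (++waiting_ == count_) {
      waiting_ = 0;
      ++generation_;  // start a new round so the barrier can be reused
      pthread_cond_broadcast(&cond_);
      pthread_mutex_unlock(&mutex_);
      return true;
    }
    // The loop guards against spurious wakeups.
    while (gen == generation_) pthread_cond_wait(&cond_, &mutex_);
    pthread_mutex_unlock(&mutex_);
    return false;
  }

private:
  pthread_mutex_t mutex_;
  pthread_cond_t cond_;
  const int count_;
  int waiting_;
  unsigned long generation_;
};
```

The generation counter lets waiting threads distinguish "my round has finished" from "the next round has already started", which is what makes the barrier safe to reuse in a loop.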

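> The names arch/osx and arch/android are not appropriate, because osx and android are OSes, not architectures; arm, x86, and x64 are architectures. If this is not suitable to change in this PR, could you create an issue as a reminder to rename the directories?

OK. #1728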

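@hedaoyuan (Contributor) commented

> Is this PR meant to make sure PaddlePaddle can be built for ARM? If so, shouldn't the corresponding documentation be updated (or added)?

Yes, a how-to-build document is needed; I think Yiqun is already writing one? This PR actually addresses building for ARM + Android. Building for other ARM environments (Linux + ARM) will need further fixes on top of this PR.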

include(system)

if(ANDROID)
cmake_minimum_required(VERSION 3.7)
Contributor

Does this really require 3.7 as the minimum? I have 3.2.2 here and it builds fine.

Contributor Author

You are probably using approach 2. With approach 1, cmake 3.2.2 emits the following warning and does not set the compile options automatically:

CMake Warning:
  Manually-specified variables were not used by the project:

    CMAKE_ANDROID_ARCH_ABI
    CMAKE_ANDROID_ARM_MODE
    CMAKE_ANDROID_ARM_NEON
    CMAKE_ANDROID_STANDALONE_TOOLCHAIN

These CMake variables were only added in version 3.7 (see the cmake-toolchains documentation).

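@Xreki (Contributor Author) commented Mar 29, 2017

@wangkuiyi @hedaoyuan

> Building against Android API level 19 fails with missing symbols such as rand, while level 21 works. So will API level 21 be the minimum Android version that Paddle supports going forward?

I tried API level 19 (Android 4.4) and found the following problems: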

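  • Paddle itself:
    • rand_r: no definition could be found in the toolchain
    • getline: the toolchain does provide std::getline
  • Third-party dependencies
    • glog: the symbols POSIX_FADV_DONTNEED and posix_fadvise() are undefined; they are only available from API 21 (Android 5.0) (no wonder the TensorFlow demo requires at least API 21, /(ㄒoㄒ)/~~)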

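> cmake/external/gflags.cmake and similar files need -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER} and -DCMAKE_CXX_COMPILER=${CMAKE_CXX_COMPILER} added; otherwise the cross-compiling environment specified via cmake .. -DCMAKE_C_COMPILER=... is not passed through.

CMAKE_C_COMPILER and CMAKE_CXX_COMPILER seem to be passed through automatically; make VERBOSE=1 shows that gflags is compiled with arm-linux-androideabi-gcc. In the current commits, the CMake configurations of the external libraries gflags, glog, gtest, warpctc, and zlib explicitly set Paddle's CMAKE_C_COMPILER, CMAKE_CXX_COMPILER, CMAKE_C_FLAGS, and CMAKE_CXX_FLAGS.

> Because of the protobuf build issue, a plain cmake & make does not currently work.

Yes. Both protobuf and OpenBLAS have to be built in advance and passed in when running cmake. If needed, the CMake files can later be changed to build them automatically. Some cross-compiling options also have to be configured by hand when running cmake.

> A how-to-build document is needed; I think Yiqun is already writing one?

Not written yet. In this PR I briefly described the two build approaches; the documentation will follow :-D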

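@Xreki (Contributor Author) commented Apr 11, 2017

Further work related to this PR

  1. CMake
    • Add a configuration interface for the target architecture (-march=x86/arm/armv7-a/armv8-a, etc.) and set the compile flags automatically per architecture. (CMake's built-in cross-compiling support can handle this.) See: Add cross-compiling toolchain files for Android and Raspberry Pi. #1973
    • Adjust the instruction-set-specific code, which will involve some changes to the CMake build.
    • Build control for the ARM inference library: only Paddle's core and inference-related sources need to be compiled, not the whole code base.
  2. Instruction-set-specific code
    • Both scalar and NEON SIMD versions must be buildable.
    • Rework the SIMD code and simplify the macro (SSE/AVX/NEON) control in the sources.
  3. Missing system features
    • cpuid: not supported for now.
    • pthread_spinlock_t: not used in the main code, so not supported for now.
    • pthread_barrier_t: follows the implementation used on the Mac.
    • std::to_string: an internal version is implemented.
  4. Third-party dependencies that must be built manually
    • OpenBLAS
    • protobuf
  5. Documentation
    • Build instructions

Some of this work will be completed in follow-up PRs.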

@Xreki Xreki merged commit d324ed7 into PaddlePaddle:develop Apr 11, 2017
@Xreki Xreki added the Android label Sep 30, 2017
@Xreki Xreki deleted the build_arm branch October 18, 2017 06:14