Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Mac OS X port #138

Merged
merged 55 commits into from
Sep 29, 2016
Merged

Add Mac OS X port #138

merged 55 commits into from
Sep 29, 2016

Conversation

gangliao
Copy link
Contributor

No description provided.

gangliao and others added 30 commits September 9, 2016 19:46
* Also make use cmake find to zlib.
* circle link in osx, use reverse link all libs instead. But
  maybe osx just don't care circle link.
# Conflicts:
#	paddle/math/tests/test_perturbation.cpp
@@ -239,7 +239,7 @@ void Matrix::toNumpyMatInplace(float** view_data, int* dim1,
}
void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
int* dim2) throw(UnsupportError) {
static_assert(sizeof(paddle::real) == sizeof(float),
static_assert(sizeof(float) == sizeof(float),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(paddle::real)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -251,12 +251,12 @@ void Matrix::copyToNumpyMat(float** view_m_data, int* dim1,
if (auto cpuMat = dynamic_cast<paddle::CpuMatrix*>(m->mat.get())) {
auto src = cpuMat->getData();
auto dest = *view_m_data;
std::memcpy(dest, src, sizeof(paddle::real) * (*dim1) * (*dim2));
std::memcpy(dest, src, sizeof(float) * (*dim1) * (*dim2));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(paddle::real)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

} else if (auto gpuMat = dynamic_cast<paddle::GpuMatrix*>(m->mat.get())) {
auto src = gpuMat->getData();
auto dest = *view_m_data;
hl_memcpy_device2host(dest, src,
sizeof(paddle::real) * (*dim1) * (*dim2));
sizeof(float) * (*dim1) * (*dim2));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sizeof(paddle::real)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -56,9 +69,9 @@ def parent_dir_str(self):

def libs_str(self):
libs = [
"-Wl,--whole-archive",
whole_start,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for apple, do we need to use " "-Wl,-force_load" instead, as what you does in cmake

Copy link
Contributor Author

@gangliao gangliao Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done this via "-Wl, all_load" in line 48, /Users/liaogang/baidu/Paddle/paddle/setup.py.in.

NeuralNetwork* NeuralNetwork::newNeuralNetwork(
const std::string& name,
NeuralNetwork* rootNetwork) {
if (newCustomNeuralNetwork) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newCustomNeuralNetwork needs to be kept. If not possible on MacOS, we can use #ifdef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use cmake to fix this

 if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
            list(APPEND LIBS "-undefined dynamic_lookup")
        endif()

CHECK_PY(pyModule) << "Import module " << moduleName << " failed.";
PyObjectPtr pyDict(PyModule_GetDict(pyModule.get()));
CHECK_PY(pyDict) << "Get Dict failed.";
PyObjectPtr pyClass(PyDict_GetItemString(pyDict.get(), className.c_str()));
LOG(INFO) << "createPythonClass className.c_str():" << className.c_str();
// LOG(INFO) << "createPythonClass className.c_str():" << className.c_str();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either uncomment it or delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// #ifndef _XOPEN_SOURCE
// #warning "no _XOPEN_SOURCE defined in Python.h"
// #endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment or delete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

#endif
CHECK_NE(tid, -1);
return tid;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 21-32 can share code with getTID() in Stat.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

#define __NR_gettid 224
#endif
pid_t tid = syscall(__NR_gettid);
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getTID()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -146,12 +146,12 @@ TEST(compareSparse, remote_cpu) {
TEST(compareSparse, cpu10_local_vs_remote) {
FLAGS_local = 1; // disable remote sparse update in parameter config
std::vector<ParameterPtr> localParameters =
trainerOnePassTest(configFile1, true, 10);
trainerOnePassTest(configFile1, true, 2);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't "10" work?

@@ -44,8 +44,8 @@ set(ATLAS_LIB_SEARCH_PATHS
/usr/lib
/usr/lib/blas/atlas
/usr/lib/atlas
/usr/lib/atlas-base # special for ubuntu 14.04.
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line code be revert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

you will already have Python 2.7.10 and Numpy 1.8 installed.

The best option is to use the package manager homebrew to handle installations and upgrades for you.
To install homebrew, first open a terminal window (you can find Terminal in the Utilities folder in Applications), and issue the command:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give a link to homebrew site

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# Build gtest
mkdir build && cmake ..
make
# Install gtest library
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is make install no good for gtest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After make install, print 'make install' is dangerous and not supported. Instead, see README for how to integrate Google Test into your build system.

**only cpu**

```bash
cmake -DWITH_GPU=OFF -DWITH_DOC=OFF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invoke cmake in source directory will cause a lot of cmake generated file in it.

It may be better to create a build directory, them cmake .. -DWITH_GPU=OFF -DWITH_DOC=OFF.

And -DWITH_DOC is OFF by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

mkdir build
cd build
# you can add build option here, such as:
cmake -DWITH_GPU=ON -DWITH_DOC=OFF -DCMAKE_INSTALL_PREFIX=<path to install> ..
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change order to

cmake .. -DWITH_GPU=ON -DWITH_DOC=OFF -DCMAKE_INSTALL_PREFIX=<path to install>

in case of user ignore .. at last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

```
**Note**

And if you set WITH_SWIG_PY=ON, you have to install related python predict api at the same time:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only in WITH_SWIG_PY=ON. The python module is also need be installed.

Paddle will automatically install python dependencies at first time user run paddle commands, such as paddle version, paddle train. It may require sudo privileges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -95,7 +95,7 @@ float Matrix::get(size_t x, size_t y) const throw(RangeError) {
}

void Matrix::set(size_t x, size_t y, float val) throw(RangeError,
UnsupportError) {
UnsupportError) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not change this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


return __longlong_as_double(old);
}
namespace paddle {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is not related to osx port.

Could extract it to another PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug found in MAC OSX when open WITH_DOUBLE

@@ -84,4 +84,10 @@ int main(int argc, char** argv) {
return ret;
}

#else
Copy link
Collaborator

@reyoung reyoung Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lines may be can removed in mac osx.
Since the whole-archive was not correct before, this lines ware added.

I will give a PR to you, and check it is cool or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -90,7 +94,7 @@ TEST(checkGradient, multi) {
TEST(checkGradient, hsigmoid) { checkGradientTest(configFile2, false, false); }

TEST(checkGradient, chunk) {
EXPECT_EQ(0, system("python2 trainer/tests/gen_proto_data.py"));
EXPECT_EQ(0, system("python trainer/tests/gen_proto_data.py"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is python2 not good?

But in some linux distribution, python3 is default python

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAC OS X do not have python2. I will add some condition branch to avoid potential hazard.

@gangliao gangliao merged commit 2920b6b into PaddlePaddle:master Sep 29, 2016
zhhsplendid pushed a commit to zhhsplendid/Paddle that referenced this pull request Sep 25, 2019
Meiyim pushed a commit to Meiyim/Paddle that referenced this pull request May 21, 2021
Meiyim pushed a commit to Meiyim/Paddle that referenced this pull request May 21, 2021
zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this pull request May 23, 2022
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 19, 2022
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Oct 19, 2022
* optimize mem in  uniq slot feature

* cherry-pick var slot_feature

* fix kernel overflow && add max feature num flag

Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
qingshui pushed a commit to qingshui/Paddle that referenced this pull request Nov 14, 2022
* Optimizing the zero key problem in the push phase

* Optimize CUDA thread parallelism in MergeGrad phase

* Optimize CUDA thread parallelism in MergeGrad phase

* Performance optimization, segment gradient merging

* Performance optimization, segment gradient merging

* Optimize pullsparse and increase keys aggregation

* sync gpugraph to gpugraph_v2 (#86)

* change load node and edge from local to cpu (#83)

* change load node and edge

* remove useless code

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* extract pull sparse as single stage(#85)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* [GPUGraph] graph sample v2 (#87)

* change load node and edge from local to cpu (#83)

* change load node and edge

* remove useless code

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* extract pull sparse as single stage(#85)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* support ssdsparsetable;test=develop (#81)

* graph sample v2

* remove log

Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com>

* Release cpu graph

* uniq nodeid (#89)

* compatible whole HBM mode (#91)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* Gpugraph v2 (#93)

* compatible whole HBM mode

* unify flag for graph emd storage mode and graph struct storage mode

* format

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* split generate batch into multi stage (#92)

* split generate batch into multi stage

* fix conflict

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* [GpuGraph] Uniq feature (#95)

* uniq feature

* uniq feature

* uniq feature

* [GpuGraph]  global startid (#98)

* uniq feature

* uniq feature

* uniq feature

* global startid

* load node edge seperately and release graph (#99)

* load node edge seperately and release graph

* load node edge seperately and release graph

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* v2 infer (#102)

* optimize begin pass and end pass (#106)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* fix ins no (#104)

* [GPUGraph] fix FillOneStep args (#107)

* fix ins no

* fix FillOnestep args

* fix bug for whole hbm mode (#110)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* [GPUGraph] fix infer && add infer_table_cap (#108)

* fix ins no

* fix FillOnestep args

* fix infer && add infer table cap

* fix infer

* 【PSCORE】perform ssd sparse table  (#111)

* perform ssd sparsetable;test=develop

Conflicts:
	paddle/fluid/framework/fleet/ps_gpu_wrapper.cc

* perform ssd sparsetable;test=develop

* remove debug code;

* remove debug code;

* add jemalloc cmake;test=develop

* fix wrapper;test=develop

* fix sample core (#114)

* [GpuGraph] optimize shuffle batch (#115)

* fix sample core

* optimize shuffle batch

* release gpu mem when sample end (#116)

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* fix class not found err (PaddlePaddle#118)

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* optimize sample (PaddlePaddle#117)

* optimize sample

* optimize sample

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* fix clear gpu mem (PaddlePaddle#119)

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* fix sample core (PaddlePaddle#121)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* add ssd cache (PaddlePaddle#123)

* add ssd cache;test=develop

* add ssd cache;test=develop

* add ssd cache;test=develop

* add multi epoch train & fix train table change ins & save infer embeding  (PaddlePaddle#129)

* add multi epoch train & fix train table change ins & save infer embedding

* change epoch finish judge

* change epoch finish change

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* Add debug log (PaddlePaddle#131)

* Add debug log

* Add debug log

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>

* optimize mem in  uniq slot feature (PaddlePaddle#130)

* [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136)

* optimize mem in  uniq slot feature

* cherry-pick var slot_feature

Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

* [GpuGraph] fix kernel overflow (PaddlePaddle#138)

* optimize mem in  uniq slot feature

* cherry-pick var slot_feature

* fix kernel overflow && add max feature num flag

Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

* fix ssd cache;test=develop (PaddlePaddle#139)

* slot feature secondary storage (PaddlePaddle#140)

* slot feature secondary storage

* slot feature secondary storage

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>
Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com>
Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com>
Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com>
Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
zmxdream pushed a commit to zmxdream/Paddle that referenced this pull request Dec 7, 2022
* Optimizing the zero key problem in the push phase

* Optimize CUDA thread parallelism in MergeGrad phase

* Optimize CUDA thread parallelism in MergeGrad phase

* Performance optimization, segment gradient merging

* Performance optimization, segment gradient merging

* Optimize pullsparse and increase keys aggregation

* sync gpugraph to gpugraph_v2 (PaddlePaddle#86)

* change load node and edge from local to cpu (PaddlePaddle#83)

* change load node and edge

* remove useless code

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* extract pull sparse as single stage(PaddlePaddle#85)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* [GPUGraph] graph sample v2 (PaddlePaddle#87)

* change load node and edge from local to cpu (PaddlePaddle#83)

* change load node and edge

* remove useless code

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* extract pull sparse as single stage(PaddlePaddle#85)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* support ssdsparsetable;test=develop (PaddlePaddle#81)

* graph sample v2

* remove log

Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com>

* Release cpu graph

* uniq nodeid (PaddlePaddle#89)

* compatible whole HBM mode (PaddlePaddle#91)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* Gpugraph v2 (PaddlePaddle#93)

* compatible whole HBM mode

* unify flag for graph emd storage mode and graph struct storage mode

* format

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* split generate batch into multi stage (PaddlePaddle#92)

* split generate batch into multi stage

* fix conflict

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* [GpuGraph] Uniq feature (PaddlePaddle#95)

* uniq feature

* uniq feature

* uniq feature

* [GpuGraph]  global startid (PaddlePaddle#98)

* uniq feature

* uniq feature

* uniq feature

* global startid

* load node edge seperately and release graph (PaddlePaddle#99)

* load node edge seperately and release graph

* load node edge seperately and release graph

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* v2 infer (PaddlePaddle#102)

* optimize begin pass and end pass (PaddlePaddle#106)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* fix ins no (PaddlePaddle#104)

* [GPUGraph] fix FillOneStep args (PaddlePaddle#107)

* fix ins no

* fix FillOnestep args

* fix bug for whole hbm mode (PaddlePaddle#110)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* [GPUGraph] fix infer && add infer_table_cap (PaddlePaddle#108)

* fix ins no

* fix FillOnestep args

* fix infer && add infer table cap

* fix infer

* 【PSCORE】perform ssd sparse table  (PaddlePaddle#111)

* perform ssd sparsetable;test=develop

Conflicts:
	paddle/fluid/framework/fleet/ps_gpu_wrapper.cc

* perform ssd sparsetable;test=develop

* remove debug code;

* remove debug code;

* add jemalloc cmake;test=develop

* fix wrapper;test=develop

* fix sample core (PaddlePaddle#114)

* [GpuGraph] optimize shuffle batch (PaddlePaddle#115)

* fix sample core

* optimize shuffle batch

* release gpu mem when sample end (PaddlePaddle#116)

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* fix class not found err (PaddlePaddle#118)

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* optimize sample (PaddlePaddle#117)

* optimize sample

* optimize sample

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* fix clear gpu mem (PaddlePaddle#119)

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* fix sample core (PaddlePaddle#121)

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

* add ssd cache (PaddlePaddle#123)

* add ssd cache;test=develop

* add ssd cache;test=develop

* add ssd cache;test=develop

* add multi epoch train & fix train table change ins & save infer embeding  (PaddlePaddle#129)

* add multi epoch train & fix train table change ins & save infer embedding

* change epoch finish judge

* change epoch finish change

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>

* Add debug log (PaddlePaddle#131)

* Add debug log

* Add debug log

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>

* optimize mem in  uniq slot feature (PaddlePaddle#130)

* [GpuGraph] cherry pick var slot feature && fix load multi path node (PaddlePaddle#136)

* optimize mem in  uniq slot feature

* cherry-pick var slot_feature

Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

* [GpuGraph] fix kernel overflow (PaddlePaddle#138)

* optimize mem in  uniq slot feature

* cherry-pick var slot_feature

* fix kernel overflow && add max feature num flag

Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>

* fix ssd cache;test=develop (PaddlePaddle#139)

* slot feature secondary storage (PaddlePaddle#140)

* slot feature secondary storage

* slot feature secondary storage

Co-authored-by: yangjunchao <yangjunchao@baidu.com>

Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0008.yq01.baidu.com>
Co-authored-by: xuewujiao <105861147+xuewujiao@users.noreply.github.com>
Co-authored-by: miaoli06 <106585574+miaoli06@users.noreply.github.com>
Co-authored-by: root <root@yq01-inf-hic-k8s-a100-ab2-0009.yq01.baidu.com>
Co-authored-by: chao9527 <33347532+chao9527@users.noreply.github.com>
Co-authored-by: yangjunchao <yangjunchao@baidu.com>
Co-authored-by: Thunderbrook <52529258+Thunderbrook@users.noreply.github.com>
Co-authored-by: danleifeng <52735331+danleifeng@users.noreply.github.com>
Co-authored-by: huwei02 <53012141+huwei02@users.noreply.github.com>
qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this pull request Mar 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.