How can the xdl.parsers.pb format be parsed into text data? #339

Open
mrchor opened this issue Aug 13, 2020 · 4 comments

Comments

@mrchor

mrchor commented Aug 13, 2020

Is it possible to parse the xdl.parsers.pb format with ordinary Python modules? XDL is not easy to compile.
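If the `.pb` data really is serialized Protocol Buffers (an assumption; the thread does not state the file framing or the message schema XDL uses), the raw bytes of a single message can be inspected without XDL or generated classes, because the protobuf wire format is self-describing at the field level. The sketch below uses only the standard library; the helper names `read_varint` and `scan_fields` are made up for illustration and are not part of XDL.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def scan_fields(buf):
    """List (field_number, wire_type, raw_value) for one serialized message.

    Assumes buf holds exactly one message with no extra framing.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (string, bytes, nested message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

This only recovers field numbers and raw values. Turning them into named features still needs the `.proto` definition from the XDL source tree, and a length-delimited value may itself be a nested message that has to be scanned again.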

@deerluffy

What do you need to parse this for? Compiling XDL is not that bad, and setting it up with Docker is fairly easy.

@mrchor
Author

mrchor commented Aug 21, 2020

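Oh, OK. I'd also like to ask about how the Ali-CCP dataset used with the ESMM model is processed. I have two questions:
1) Are all of its features looked up in a single embedding table? Has anyone tested the effect of building a separate embedding table for each feature field?
2) Is this embedding table hashed, or is a raw embedding table with tens of millions of rows trained directly?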
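To make the second option concrete, here is a minimal sketch of what hashing an embedding table usually means: each (field, value) pair is hashed into a fixed number of rows, so the table size is chosen up front instead of growing with the vocabulary. The bucket count, the `field=value` key format, and the choice of CRC32 are illustrative assumptions, not a description of how the Ali-CCP data was actually processed.

```python
import zlib

NUM_BUCKETS = 1_000_003  # hypothetical table size, chosen for illustration


def hashed_row(field, value, num_buckets=NUM_BUCKETS):
    """Map a (field, value) pair to a row index using a stable hash."""
    key = f"{field}={value}".encode("utf-8")
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash() on str.
    return zlib.crc32(key) % num_buckets
```

The trade-off is that distinct features can collide in the same row, whereas a raw table gives every distinct feature its own row at the cost of a much larger table.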

@deerluffy

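I haven't used it, so I don't want to give a careless answer. I can only describe some processing I did before: features are mapped into a single id space, for example gender-male maps to 1 and gender-female maps to 2, so one embedding can basically represent all feature ids. XDL spreads the embedding across multiple ps (parameter server) nodes for storage.
I see this is Alibaba Tianchi data. Are you doing a competition?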
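The id-space mapping described above can be sketched roughly as follows. This is a plain-Python illustration under assumptions: the function names, the `field-value` key format, the reserved row 0 for unseen features, and the random initialisation are all invented for the example and are not XDL APIs.

```python
import random


def build_id_space(samples):
    """Give every distinct 'field-value' string a unique integer id, starting at 1."""
    ids = {}
    for sample in samples:
        for field, value in sample.items():
            ids.setdefault(f"{field}-{value}", len(ids) + 1)
    return ids


def make_table(num_ids, dim, seed=0):
    """Create one shared embedding table; row 0 is reserved for unseen features."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(num_ids + 1)]


def lookup(table, ids, sample):
    """Return one embedding row per feature in the sample, all from the same table."""
    return [table[ids.get(f"{field}-{value}", 0)]
            for field, value in sample.items()]


samples = [{"gender": "male"}, {"gender": "female", "age": "18-24"}]
ids = build_id_space(samples)  # gender-male -> 1, gender-female -> 2, age-18-24 -> 3
table = make_table(len(ids), dim=4)
vectors = lookup(table, ids, samples[1])
```

Because ids from different fields never overlap, a single table is enough. Per-field tables would instead keep a separate id space and table for each field.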

@mrchor
Author

mrchor commented Aug 24, 2020

This is not competition data. It is the dataset that Alibaba open-sourced for its multi-task model.
