Memory leak during training #33790

Closed
yuandanfei opened this issue Jun 27, 2021 · 5 comments

yuandanfei commented Jun 27, 2021

1) PaddlePaddle version: 2.1.0
2) System environment: AI Studio
Description: memory usage keeps increasing during training.

import os, cv2, paddle
import numpy as np
import io
from PIL import Image

class WeedDataset(paddle.io.Dataset):
    def __init__(self, lists_path, mode='train'):
        super().__init__()
        assert os.path.exists(lists_path), f"Error: {lists_path} does not exist!"    # check that the list file exists
        assert mode in ['train', 'valid'], "Error: mode must be 'train' or 'valid'!" # check the data reading mode

        self.lists_path = lists_path                   # path to the list file
        self.lists_dire = os.path.split(lists_path)[0] # directory containing the list file

        self.image_list = [] # list of image paths
        self.label_list = [] # list of label ids

        with open(self.lists_path, 'r') as f: # open the list file
            for line in f.readlines(): # iterate over each record
                image_path, label_id = line.strip().split() # parse one record: "<image_path> <label_id>"
                self.image_list.append(os.path.join(self.lists_dire, image_path)) # store the image path
                self.label_list.append(label_id)                                  # store the label id

        if mode == 'train':
            self.transforms = paddle.vision.transforms.Compose([
                paddle.vision.transforms.RandomCrop(size=256, padding=24), # random crop with padding
                paddle.vision.transforms.RandomHorizontalFlip(prob=0.5),   # random horizontal flip
                paddle.vision.transforms.Transpose(order=(2, 0, 1)),       # HWC -> CHW
                paddle.vision.transforms.Normalize(mean=127.5, std=127.5)  # normalize pixel values
            ])
        else:
            self.transforms = paddle.vision.transforms.Compose([
                paddle.vision.transforms.Transpose(order=(2, 0, 1)),       # HWC -> CHW
                paddle.vision.transforms.Normalize(mean=127.5, std=127.5)  # normalize pixel values
            ])

    def __getitem__(self, index):
        with open(self.image_list[index], 'rb') as f: # open the image file
            image = Image.open(io.BytesIO(f.read()))
            if image.mode != 'RGB':
                image = image.convert('RGB') # force three-channel RGB

        image = np.array(self.transforms(image), dtype='float32')

        label = np.array(self.label_list[index], dtype='int64')

        return image, label

    def __len__(self):
        return len(self.image_list)

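train_batch = 256                # training batch size
num_classes = 9                  # number of object classes
train_path  = './data/train.txt' # path to the training list file
save_path   = './out/'           # model save directory
epoch_num   = 600                # total number of training epochs
save_freq   = 200                # model save frequency
lr_start    = 0.5                # initial learning rate for cosine decay
momentum    = 0.9                # optimizer momentum
l2_decay    = 0.00005            # weight decay coefficient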

train_dataset = WeedDataset(train_path, 'train') # training data

train_loader = paddle.io.DataLoader( # training set iterator
    dataset=train_dataset,
    batch_size=train_batch,
    shuffle=True,
    num_workers=0,
    return_list=True,
    use_buffer_reader=True,
    use_shared_memory=False
)

block = paddle.vision.models.resnet.BottleneckBlock            # residual block
depth = 50                                                     # network depth
model = paddle.vision.models.ResNet(block, depth, num_classes) # build the network
model = paddle.Model(model)                                    # wrap it in the high-level API

scheduler = paddle.optimizer.lr.CosineAnnealingDecay( # learning rate schedule
    learning_rate=lr_start,
    T_max=epoch_num
)

optimizer = paddle.optimizer.Momentum(                # optimizer
    learning_rate=scheduler,
    momentum=momentum,
    weight_decay=paddle.regularizer.L2Decay(l2_decay),
    parameters=model.parameters()
)

model.prepare(
    optimizer=optimizer,              # optimizer
    loss=paddle.nn.CrossEntropyLoss() # loss function
)

model.fit(
    train_data=train_loader, # training data
    epochs    =epoch_num,    # number of epochs
    save_dir  =save_path,    # save directory
    save_freq =save_freq,    # save frequency
    verbose   =1,            # print logs
)
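
To put numbers on "memory keeps increasing", one option is to sample the process's peak resident set size while stepping through the data. The following is only a minimal sketch and not part of the original report: `peak_rss_mib` and `track_memory` are hypothetical helper names, it relies on the Unix-only `resource` module, and it assumes Linux, where `ru_maxrss` is reported in KiB.

```python
import resource


def peak_rss_mib():
    """Return this process's peak resident set size in MiB.

    Assumption: Linux, where ru_maxrss is in KiB (macOS reports bytes).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def track_memory(batches, every=1):
    """Iterate over `batches` and sample peak RSS after every `every` steps.

    Returns a list of (step, peak_rss_mib) tuples. `batches` can be any
    iterable; the items themselves are discarded.
    """
    samples = []
    for step, _ in enumerate(batches, start=1):
        if step % every == 0:
            samples.append((step, peak_rss_mib()))
    return samples


if __name__ == "__main__":
    # Stand-in iterable; in the script above one could pass train_loader.
    for step, rss in track_memory(range(1000), every=250):
        print(f"step {step}: peak RSS {rss:.1f} MiB")
```

Passing `train_loader` instead of the stand-in iterable would exercise only the data pipeline, which may help tell whether the growth comes from data loading or from the training step itself. Because this is a peak value, it can only show growth, never a decrease.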

paddle-bot-old commented

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and any error messages. You may also check the API documentation, FAQ, GitHub Issues, and the AI community for an answer. Have a nice day!

yuandanfei (Author) commented

After downgrading PaddlePaddle to version 2.0.2, the memory leak disappeared.

rs-lsl commented Jun 28, 2021

Thank you, thank you, thank you!

heavengate (Contributor) commented

Hi, which dataset are you using?

qingqing01 (Contributor) commented

Fixed in #34140
