【Hackathon 5th No.33】Fix test_atleast_nd at Mac -part #59103

Merged
merged 20 commits into from
Nov 21, 2023

megemini
Contributor

PR types

Bug fixes

PR changes

Others

Description

Verify the unit test failure on Mac #58323

paddle-bot bot commented Nov 17, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Nov 17, 2023
@luotao1 luotao1 self-assigned this Nov 17, 2023
@luotao1 luotao1 changed the title from "[Fix] Fix test_atleast_nd at Mac" to "【Hackathon 5th No.33】Fix test_atleast_nd at Mac" Nov 17, 2023
@megemini
Contributor Author

Update 20231119

The earlier test cases had two problems:

1. An older numpy version raises a different exception type

Symptom

Earlier test failures:

Unit test: test_atleast_nd, PR: 57785, CI: PR-CI-Mac-Python3, url: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/9541163
Unit test: test_atleast_nd, PR: 59070, CI: PR-CI-Mac-Python3, url: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/9541296
Unit test: test_atleast_nd, PR: 57879, CI: PR-CI-Mac-Python3, url: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/9541705
Unit test: test_atleast_nd, PR: 59040, CI: PR-CI-Mac-Python3, url: https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/detail/9541888

Analysis

The logs show:

2023-11-16 20:49:54 Requirement already satisfied: numpy>=1.20 in /Users/paddle/Library/Python/3.9/lib/python/site-packages (from -r python/unittest_py/requirements.txt (line 14)) (1.23.5)
2023-11-16 20:49:55 Collecting numpy>=1.20 (from -r python/unittest_py/requirements.txt (line 14))
2023-11-16 20:49:55   Obtaining dependency information for numpy>=1.20 from https://files.pythonhosted.org/packages/b1/c0/563ef35266a30adfb9801bd1b366bc4f67ff9cfed5e707ae2831b3f6a27c/numpy-1.26.2-cp39-cp39-macosx_10_9_x86_64.whl.metadata
2023-11-16 20:49:55   Using cached numpy-1.26.2-cp39-cp39-macosx_10_9_x86_64.whl.metadata (61 kB)

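The numpy version installed in the environment is 1.23.5. Note that the 1.26.2 that appears afterwards is only used for dependency resolution; it is not the installed version.

With this version, the mixed inputs in the unit test, such as (123, [123]), only trigger a deprecation warning instead of raising an exception: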

In [1]: import numpy as np

In [2]: np.array((123, [123]))
<ipython-input-2-ed5359ff8a8d>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  np.array((123, [123]))
Out[2]: array([123, list([123])], dtype=object)

In [3]: np.__version__
Out[3]: '1.23.5'

As a result, for mixed inputs /home/shun/Documents/Projects/paddle/megemini/Paddle/python/paddle/base/executor.py does not raise ValueError; it raises the TypeError seen in the logs:

def _as_lodtensor(data, place, dtype=None):
...
        elif isinstance(data, (list, tuple)):
            data = np.array(data) # only a warning here, no error
            if data.dtype == np.object_:
                raise TypeError(
                    "\n\tFaild to convert input data to a regular ndarray :\n\t* Usually "
                    "this means the input data contains nested lists with different lengths. "
                    "Please consider using 'base.create_lod_tensor' to convert it to a LoD-Tensor."
                )
            data = data.astype(dtype)
        else:
            raise TypeError(
                f"Convert data of type {type(data)} to Tensor is not supported"
            )

For a CI run that did not fail, the logs show:

2023-11-18 18:56:06 Requirement already satisfied: numpy>=1.20 in /Users/paddle/Library/Python/3.9/lib/python/site-packages (from -r python/unittest_py/requirements.txt (line 14)) (1.24.4)
2023-11-18 18:56:07 Collecting numpy>=1.20 (from -r python/unittest_py/requirements.txt (line 14))
2023-11-18 18:56:07   Obtaining dependency information for numpy>=1.20 from https://files.pythonhosted.org/packages/b1/c0/563ef35266a30adfb9801bd1b366bc4f67ff9cfed5e707ae2831b3f6a27c/numpy-1.26.2-cp39-cp39-macosx_10_9_x86_64.whl.metadata
2023-11-18 18:56:07   Using cached numpy-1.26.2-cp39-cp39-macosx_10_9_x86_64.whl.metadata (61 kB)

The numpy version in use is 1.24.4:

In [1]: import numpy as np

In [2]: np.array((123, [123]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 np.array((123, [123]))

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In [3]: np.__version__
Out[3]: '1.24.4'

Here a ValueError is raised, which matches the original with self.assertRaises(ValueError) in the unit test, so the CI test passes.
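
The version difference shown in the two sessions can be checked in one place with a small probe. This is a minimal sketch, not code from the PR: probe_ragged is a hypothetical helper name, and the only assumption is that numpy is installed. Warnings are suppressed locally so that older numpy versions stay quiet.

```python
import warnings

import numpy as np


def probe_ragged(data):
    """Report how the installed numpy handles np.array(data)."""
    with warnings.catch_warnings():
        # Older numpy only warns about ragged input; silence that here.
        warnings.simplefilter("ignore")
        try:
            arr = np.array(data)
        except ValueError:
            # Newer numpy refuses to build the array at all.
            return "ValueError"
    # Older numpy falls back to an object array for ragged input.
    return "object-array" if arr.dtype == np.object_ else "regular"


print(np.__version__, probe_ragged((123, [123])))
```

Based on the sessions above, numpy 1.23.5 would be expected to report the object-array outcome for (123, [123]), and numpy 1.24.4 the ValueError outcome.
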

Solution

Fixed by catching both exception types in the unit test:

class TestAtleastErrorCombineInputs(BaseTest):
    """test combine inputs, like: `at_leastNd((x, y))`, where paddle treats like numpy"""

    def test_all(self):
        with self.assertRaises((ValueError, TypeError)):
            self._test_dygraph_api(
                self.inputs, self.dtypes, self.shapes, self.names
            )

        with self.assertRaises((ValueError, TypeError)):
            self._test_static_api(
                self.inputs, self.dtypes, self.shapes, self.names
            )

The test does not branch on the numpy version, because it is unclear whether numpy's behavior will change again in later releases.
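
The idea of accepting either exception can be tried without Paddle. In this hedged sketch, to_regular_array is a hypothetical stand-in that only mimics the list/tuple branch of _as_lodtensor quoted earlier; TestRaggedInput and run_sketch are likewise illustrative names, not part of the PR.

```python
import unittest
import warnings

import numpy as np


def to_regular_array(data):
    """Hypothetical stand-in for the list/tuple branch of _as_lodtensor."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        # Newer numpy raises ValueError here for ragged input.
        arr = np.array(data)
    if arr.dtype == np.object_:
        # Older numpy builds an object array instead, so reject it explicitly.
        raise TypeError("input data contains nested sequences of different lengths")
    return arr


class TestRaggedInput(unittest.TestCase):
    def test_ragged_input_is_rejected(self):
        # assertRaises accepts a tuple, so either exception type passes.
        with self.assertRaises((ValueError, TypeError)):
            to_regular_array((123, [123]))


def run_sketch():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRaggedInput)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_sketch() should report one passing test on either side of the numpy behavior change, which is the property the fix relies on.
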

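Conclusion

Before the change:

  • Switching numpy to 1.23.5 in the aistudio environment reproduces the error
  • Switching numpy to 1.24.4 in the aistudio environment makes the error disappear

After the change:

  • With numpy 1.23.5 in the aistudio environment, the error disappears, but the warning is still emitted
  • With numpy 1.24.4 in the aistudio environment, the error disappears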

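2. The test was not guaranteed to run in dynamic graph mode

Symptom

If the environment has not switched to dynamic graph mode before the unit test TestAtleastAsTensorMethod runs, an exception is raised: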

data = var assign_0.tmp_0 : LOD_TENSOR.shape().dtype(int64).stop_gradient(True), dtype = None, place = Place(cpu)
stop_gradient = True

    def _to_tensor_non_static(data, dtype=None, place=None, stop_gradient=True):
    ...
            else:
>               raise TypeError(
                    "Can't constructs a 'paddle.Tensor' with data type {}, data type must be scalar|list|tuple|np.ndarray|paddle.Tensor".format(
                        type(data)
                    )
                )
E               TypeError: Can't constructs a 'paddle.Tensor' with data type <class 'paddle.base.framework.Variable'>, data type must be scalar|list|tuple|np.ndarray|paddle.Tensor

/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/creation.py:607: TypeError
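
Analysis

  • A unit test should first make sure it is in the right mode (static graph or dynamic graph) before it runs
  • This test case was added later, and I was careless 🫣 ... ...
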
In [1]: import paddle

In [2]: paddle.enable_static()

In [3]: tensor = paddle.to_tensor(123)

In [4]: paddle.disable_static()

In [5]: tensor.atleast_1d()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 tensor.atleast_1d()

File /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/manipulation.py:4060, in atleast_1d(name, *inputs)
   4058 out = []
   4059 for tensor in inputs:
-> 4060     tensor = paddle.to_tensor(tensor)
   4061     if tensor.dim() == 0:
   4062         result = tensor.reshape((1,))

File /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/creation.py:794, in to_tensor(data, dtype, place, stop_gradient)
    792     place = _current_expected_place_()
    793 if in_dynamic_mode():
--> 794     return _to_tensor_non_static(data, dtype, place, stop_gradient)
    796 # call assign for static graph
    797 else:
    798     re_exp = re.compile(r'[(](.+?)[)]', re.S)

File /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/tensor/creation.py:607, in _to_tensor_non_static(data, dtype, place, stop_gradient)
    605     return data
    606 else:
--> 607     raise TypeError(
    608         "Can't constructs a 'paddle.Tensor' with data type {}, data type must be scalar|list|tuple|np.ndarray|paddle.Tensor".format(
    609             type(data)
    610         )
    611     )
    612 if not dtype:
    613     if data.dtype in [
    614         'float16',
    615         'float32',
   (...)
    618         'complex128',
    619     ]:

TypeError: Can't constructs a 'paddle.Tensor' with data type <class 'paddle.base.framework.Variable'>, data type must be scalar|list|tuple|np.ndarray|paddle.Tensor

Solution

Move the tensor creation to after the switch to dynamic graph mode:

class TestAtleastAsTensorMethod(unittest.TestCase):
    def test_as_tensor_method(self):
        input = 123

        for place in PLACES:
            paddle.disable_static(place)

            tensor = paddle.to_tensor(input)

This can be verified:

In [1]: import paddle

In [2]: paddle.disable_static()

In [3]: tensor = paddle.to_tensor(123)

In [4]: tensor.atleast_1d()
W1119 12:58:47.352586 10509 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.6
W1119 12:58:47.363787 10509 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
Out[4]: 
Tensor(shape=[1], dtype=int64, place=Place(gpu:0), stop_gradient=True,
       [123])
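
Why the order of creation and mode switch matters can be illustrated with a toy model. This sketch does not use Paddle at all: ToyFramework, StaticVariable, and EagerTensor are hypothetical names invented for illustration, and the logic only loosely mirrors the traceback above, where an object created in static mode later reaches the dynamic-mode conversion path.

```python
class StaticVariable:
    """Toy placeholder for a static-graph variable (not a Paddle class)."""

    def __init__(self, value):
        self.value = value


class EagerTensor:
    """Toy placeholder for a dynamic-graph tensor (not a Paddle class)."""

    def __init__(self, value):
        self.value = value


class ToyFramework:
    """Hypothetical two-mode framework: the mode at creation fixes the type."""

    def __init__(self):
        self.static_mode = False

    def to_tensor(self, data):
        if self.static_mode:
            return StaticVariable(data)
        if isinstance(data, StaticVariable):
            # A static variable reached the dynamic path: reject it.
            raise TypeError(f"cannot build a tensor from {type(data).__name__}")
        if isinstance(data, EagerTensor):
            return data
        return EagerTensor(data)


def create_then_switch(fw):
    """Buggy order: create in static mode, then use in dynamic mode."""
    fw.static_mode = True
    t = fw.to_tensor(123)
    fw.static_mode = False
    return fw.to_tensor(t)  # raises TypeError


def switch_then_create(fw):
    """Fixed order: switch to dynamic mode first, then create."""
    fw.static_mode = False
    t = fw.to_tensor(123)
    return fw.to_tensor(t)
```

In this toy model, create_then_switch raises TypeError while switch_then_create returns an EagerTensor, which is the same ordering issue the fix addresses by creating the tensor after paddle.disable_static(place).
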

Conclusion

Before the change:

  • The unit test TestAtleastAsTensorMethod raises TypeError

After the change:

  • The unit test TestAtleastAsTensorMethod runs without exceptions

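Summary:

  • The atleast_xd APIs themselves should be fine
  • The first problem came from my unfamiliarity with the differences between numpy versions
  • The second problem was purely my own carelessness, and I need to reflect on it ... ...

In addition, the unit tests now include a set_device statement to manually switch the device used in dynamic graph mode ~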

@luotao1 @SigureMo

@luotao1 luotao1 changed the title from "【Hackathon 5th No.33】Fix test_atleast_nd at Mac" to "【Hackathon 5th No.33】Fix test_atleast_nd at Mac -part" Nov 21, 2023
@luotao1 luotao1 merged commit 9b36e53 into PaddlePaddle:develop Nov 21, 2023
28 checks passed
SecretXV pushed a commit to SecretXV/Paddle that referenced this pull request Nov 28, 2023
…9103)

* [Init] add atleast api

* [Add] add atleast test

* [Fix] import atleast

* [Change] test_atleast.py to test_atleast_nd.py and add bool data type test

* [Update] update dtype supports and unittest

* [Fix] dtype error unittest

* [Change] static test with test_with_pir_api

* [Add] atleast_Nd as tensor method

* [Fix] test_atleast_nd disable_static static graph

* [Change] rename test file

* [CAUTION] for debug

* [CAUTION] for debug

* [Fix] test cast with low numpy version

* [Add] add set_device