diff --git a/docs/en/advanced_tutorials/data_element.md b/docs/en/advanced_tutorials/data_element.md index 38965e8177..8a204c27f2 100644 --- a/docs/en/advanced_tutorials/data_element.md +++ b/docs/en/advanced_tutorials/data_element.md @@ -1008,7 +1008,7 @@ In this section, we use MMDetection to demonstrate how to migrate the abstract d ### 1. Simplify the module interface -Detector's external interfaces can be significantly simplified and unified. In the training process of a single-stage detection and segmentation algorithm in MMDet 2.X, `SingleStageDetector` requires `img`, `img_metas`, `gt_bboxes`, `gt_labels` and `gt_bboxes_ignore` as the inputs, but `SingleStageInstanceSegmentor` requires `gt_masks` as well. This causes inconsistency in the training interface and affects flexibility. +Detector's external interfaces can be significantly simplified and unified. In the training process of a single-stage detection and segmentation algorithm in MMDet 2.X, `SingleStageDetector` requires `img`, `img_metas`, `gt_bboxes`, `gt_labels` and `gt_bboxes_ignore` as the inputs, but `SingleStageInstanceSegmentor` requires `gt_masks` as well. This causes inconsistency in the training interface and affects flexibility. ```python class SingleStageDetector(BaseDetector): diff --git a/docs/en/common_usage/better_optimizers.md b/docs/en/common_usage/better_optimizers.md index c66b5e949c..23f4f075bf 100644 --- a/docs/en/common_usage/better_optimizers.md +++ b/docs/en/common_usage/better_optimizers.md @@ -4,7 +4,7 @@ This document provides some third-party optimizers supported by MMEngine, which ## D-Adaptation -[D-Adaptation](https://github.com/facebookresearch/dadaptation) provides `DAdaptAdaGrad`, `DAdaptAdam` and `DAdaptSGD` optimziers。 +[D-Adaptation](https://github.com/facebookresearch/dadaptation) provides `DAdaptAdaGrad`, `DAdaptAdam` and `DAdaptSGD` optimizers. ```{note} If you use the optimizer provided by D-Adaptation, you need to upgrade mmengine to `0.6.0`. @@ -35,7 +35,7 @@ runner.train() ## Lion-Pytorch -[lion-pytorch](https://github.com/lucidrains/lion-pytorch) provides the `Lion` optimizer。 +[lion-pytorch](https://github.com/lucidrains/lion-pytorch) provides the `Lion` optimizer. ```{note} If you use the optimizer provided by Lion-Pytorch, you need to upgrade mmengine to `0.6.0`. @@ -93,7 +93,7 @@ runner.train() ## bitsandbytes -[bitsandbytes](https://github.com/TimDettmers/bitsandbytes) provides `AdamW8bit`, `Adam8bit`, `Adagrad8bit`, `PagedAdam8bit`, `PagedAdamW8bit`, `LAMB8bit`, `LARS8bit`, `RMSprop8bit`, `Lion8bit`, `PagedLion8bit` and `SGD8bit` optimziers。 +[bitsandbytes](https://github.com/TimDettmers/bitsandbytes) provides `AdamW8bit`, `Adam8bit`, `Adagrad8bit`, `PagedAdam8bit`, `PagedAdamW8bit`, `LAMB8bit`, `LARS8bit`, `RMSprop8bit`, `Lion8bit`, `PagedLion8bit` and `SGD8bit` optimizers. ```{note} If you use the optimizer provided by bitsandbytes, you need to upgrade mmengine to `0.9.0`. @@ -124,7 +124,7 @@ runner.train() ## transformers -[transformers](https://github.com/huggingface/transformers) provides `Adafactor` optimzier。 +[transformers](https://github.com/huggingface/transformers) provides the `Adafactor` optimizer. ```{note} If you use the optimizer provided by transformers, you need to upgrade mmengine to `0.9.0`.
diff --git a/docs/en/common_usage/debug_tricks.md b/docs/en/common_usage/debug_tricks.md index 641077f260..6df05055ea 100644 --- a/docs/en/common_usage/debug_tricks.md +++ b/docs/en/common_usage/debug_tricks.md @@ -30,7 +30,7 @@ train_dataloader = dict( type=dataset_type, data_prefix='data/cifar10', test_mode=False, - indices=5000, # set indices=5000,represent every epoch only iterator 5000 samples + indices=5000, # set indices=5000 so that each epoch only iterates over 5000 samples pipeline=train_pipeline), sampler=dict(type='DefaultSampler', shuffle=True), ) diff --git a/docs/en/design/infer.md b/docs/en/design/infer.md index f0f426e03d..c340a10f9f 100644 --- a/docs/en/design/infer.md +++ b/docs/en/design/infer.md @@ -87,10 +87,10 @@ OpenMMLab requires the `inferencer(img)` to output a `dict` containing two field When performing inference, the following steps are typically executed: -1. preprocess:Input data preprocessing, including data reading, data preprocessing, data format conversion, etc. +1. preprocess: Input data preprocessing, including data reading, data preprocessing, data format conversion, etc. 2. forward: Execute `model.forward` -3. visualize:Visualization of predicted results. -4. postprocess:Post-processing of predicted results, including result format conversion, exporting predicted results, etc. +3. visualize: Visualization of predicted results. +4. postprocess: Post-processing of predicted results, including result format conversion, exporting predicted results, etc. To improve the user experience of the inferencer, we do not want users to have to configure parameters for each step when performing inference. In other words, we hope that users can simply configure parameters for the `__call__` interface without being aware of the above process and complete the inference. @@ -173,8 +173,8 @@ Initializes and returns the `visualizer` required by the inferencer, which is eq Input arguments: -- inputs:Input data, passed into `__call__`, usually a list of image paths or image data. -- batch_size:batch size, passed in by the user when calling `__call__`. +- inputs: Input data, passed into `__call__`, usually a list of image paths or image data. +- batch_size: batch size, passed in by the user when calling `__call__`. - Other parameters: Passed in by the user and specified in `preprocess_kwargs`. Return: @@ -187,7 +187,7 @@ The `preprocess` function is a generator function by default, which applies the Input arguments: -- inputs:The batch data processed by `preprocess` function. +- inputs: The batch data processed by `preprocess` function. - Other parameters: Passed in by the user and specified in `forward_kwargs`. Return: @@ -204,9 +204,9 @@ This is an abstract method that must be implemented by the subclass. Input arguments: -- inputs:The input data, which is the raw data without preprocessing. -- preds:Predicted results of the model. -- show:Whether to visualize. +- inputs: The input data, which is the raw data without preprocessing. +- preds: Predicted results of the model. +- show: Whether to visualize. - Other parameters: Passed in by the user and specified in `visualize_kwargs`. Return: @@ -221,12 +221,12 @@ This is an abstract method that must be implemented by the subclass. Input arguments: -- preds:The predicted results of the model, which is a `list` type. Each element in the list represents the prediction result for a single data item. In the OpenMMLab series of algorithm libraries, the type of each element in the prediction result is `BaseDataElement`.
-- visualization:Visualization results +- preds: The predicted results of the model, which is a `list` type. Each element in the list represents the prediction result for a single data item. In the OpenMMLab series of algorithm libraries, the type of each element in the prediction result is `BaseDataElement`. +- visualization: Visualization results - return_datasample: Whether to maintain datasample for return. When set to `False`, the returned result is converted to a `dict`. - Other parameters: Passed in by the user and specified in `postprocess_kwargs`. -Return: +Return: - The type of the returned value is a dictionary containing both the visualization and prediction results. OpenMMLab requires the returned dictionary to have two keys: `predictions` and `visualization`. @@ -234,9 +234,9 @@ Return: Input arguments: -- inputs:The input data, usually a list of image paths or image data. Each element in `inputs` can also be other types of data as long as it can be processed by the `pipeline` returned by [init_pipeline](#_init_pipeline). When there is only one inference data in `inputs`, it does not have to be a `list`, `__call__` will internally wrap it into a list for further processing. +- inputs: The input data, usually a list of image paths or image data. Each element in `inputs` can also be other types of data as long as it can be processed by the `pipeline` returned by [init_pipeline](#_init_pipeline). When there is only one inference data in `inputs`, it does not have to be a `list`, `__call__` will internally wrap it into a list for further processing. - return_datasample: Whether to convert datasample to dict for return. -- batch_size:Batch size for inference, which will be further passed to the `preprocess` function. +- batch_size: Batch size for inference, which will be further passed to the `preprocess` function. - Other parameters: Additional parameters assigned to `preprocess`, `forward`, `visualize`, and `postprocess` methods. Return: diff --git a/docs/en/design/logging.md b/docs/en/design/logging.md index 8110ef8579..68a976bfc1 100644 --- a/docs/en/design/logging.md +++ b/docs/en/design/logging.md @@ -74,11 +74,11 @@ history_buffer.min() # 1, the global minimum history_buffer.max(2) -# 3,the maximum in [2, 3] +# 3, the maximum in [2, 3] history_buffer.max() # 3, the global maximum history_buffer.mean(2) -# 2.5,the mean value in [2, 3], (2 + 3) / (1 + 1) +# 2.5, the mean value in [2, 3], (2 + 3) / (1 + 1) history_buffer.mean() # 2, the global mean, (1 + 2 + 3) / (1 + 1 + 1) history_buffer = HistoryBuffer([1, 2, 3], [2, 2, 2]) # Cases when counts are not 1 @@ -431,7 +431,7 @@ In the case of multiple processes in multiple nodes without storage, logs are or ```text # without shared storage -# node 0: +# node 0: work_dir/20230228_141908 ├── 20230306_183634_${hostname}_device0_rank0.log ├── 20230306_183634_${hostname}_device1_rank1.log @@ -442,7 +442,7 @@ work_dir/20230228_141908 ├── 20230306_183634_${hostname}_device6_rank6.log ├── 20230306_183634_${hostname}_device7_rank7.log -# node 7: +# node 7: work_dir/20230228_141908 ├── 20230306_183634_${hostname}_device0_rank56.log ├── 20230306_183634_${hostname}_device1_rank57.log diff --git a/docs/en/get_started/15_minutes.md b/docs/en/get_started/15_minutes.md index 7902c5dbee..7ec7ed09d7 100644 --- a/docs/en/get_started/15_minutes.md +++ b/docs/en/get_started/15_minutes.md @@ -1,6 +1,6 @@ # 15 minutes to get started with MMEngine -In this tutorial, we'll take training a ResNet-50 model on CIFAR-10 dataset as an example.
We will build a complete and configurable pipeline for both training and validation in only 80 lines of code with `MMEgnine`. +In this tutorial, we'll take training a ResNet-50 model on CIFAR-10 dataset as an example. We will build a complete and configurable pipeline for both training and validation in only 80 lines of code with `MMEngine`. The whole process includes the following steps: - [15 minutes to get started with MMEngine](#15-minutes-to-get-started-with-mmengine) diff --git a/docs/en/migration/hook.md b/docs/en/migration/hook.md index 0d4ac06dd2..a6276a2dd4 100644 --- a/docs/en/migration/hook.md +++ b/docs/en/migration/hook.md @@ -156,7 +156,7 @@ This tutorial compares the difference in function, mount point, usage and implem after each iteration after_train_iter - after_train_iter, with additional args: batch_idx、data_batch, and outputs + after_train_iter, with additional args: batch_idx, data_batch, and outputs Validation related @@ -187,7 +187,7 @@ This tutorial compares the difference in function, mount point, usage and implem after each iteration after_val_iter - after_val_iter, with additional args: batch_idx、data_batch and outputs + after_val_iter, with additional args: batch_idx, data_batch and outputs Test related @@ -218,7 +218,7 @@ This tutorial compares the difference in function, mount point, usage and implem after each iteration None - after_test_iter, with additional args: batch_idx、data_batch and outputs + after_test_iter, with additional args: batch_idx, data_batch and outputs diff --git a/docs/en/migration/model.md b/docs/en/migration/model.md index c956871ede..1f1b5b1d6c 100644 --- a/docs/en/migration/model.md +++ b/docs/en/migration/model.md @@ -393,7 +393,7 @@ MMCV will wrap the model with distributed wrapper before building the runner, wh cfg = dict(model_wrapper_cfg='MMSeparateDistributedDataParallel') runner = Runner( model=model, - ..., # 其他配置 + ..., launcher='pytorch', cfg=cfg) ``` diff --git a/docs/en/notes/changelog.md b/docs/en/notes/changelog.md index 30f9b0e1e6..08a54c7130 100644 --- a/docs/en/notes/changelog.md +++ b/docs/en/notes/changelog.md @@ -687,7 +687,7 @@ A total of 16 developers contributed to this release. Thanks [@BayMaxBHL](https: ### Bug Fixes - Fix error calculation of `eta_min` in `CosineRestartParamScheduler` by [@Z-Fran](https://github.com/Z-Fran) in https://github.com/open-mmlab/mmengine/pull/639 -- Fix `BaseDataPreprocessor.cast_data` could not handle string data by [@HAOCHENYE](https://github.com/HAOCHENYE) in https://github.com/open-mmlab/mmengine/pull/602 +- Fix `BaseDataPreprocessor.cast_data` could not handle string data by [@HAOCHENYE](https://github.com/HAOCHENYE) in https://github.com/open-mmlab/mmengine/pull/602 - Make `autocast` compatible with mps by [@HAOCHENYE](https://github.com/HAOCHENYE) in https://github.com/open-mmlab/mmengine/pull/587 - Fix error format of log message by [@HAOCHENYE](https://github.com/HAOCHENYE) in https://github.com/open-mmlab/mmengine/pull/508 - Fix error implementation of `is_model_wrapper` by [@HAOCHENYE](https://github.com/HAOCHENYE) in https://github.com/open-mmlab/mmengine/pull/640 diff --git a/docs/en/tutorials/model.md b/docs/en/tutorials/model.md index 5a3785457a..0d96210edb 100644 --- a/docs/en/tutorials/model.md +++ b/docs/en/tutorials/model.md @@ -43,7 +43,7 @@ Usually, we should define a model to implement the body of the algorithm. 
In MME Benefits from the `BaseModel`, we only need to make the model inherit from `BaseModel`, and implement the `forward` function to perform the training, testing, and validation process. ```{note} -BaseModel inherits from [BaseModule](../advanced_tutorials/initialize.md),which can be used to initialize the model parameters dynamically. +BaseModel inherits from [BaseModule](../advanced_tutorials/initialize.md), which can be used to initialize the model parameters dynamically. ``` [**forward**](mmengine.model.BaseModel.forward): The arguments of `forward` need to match with the data given by [DataLoader](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html). If the DataLoader samples a tuple `data`, `forward` needs to accept the value of unpacked `*data`. If DataLoader returns a dict `data`, `forward` needs to accept the key-value of unpacked `**data`. `forward` also accepts `mode` parameter, which is used to control the running branch: diff --git a/docs/en/tutorials/param_scheduler.md b/docs/en/tutorials/param_scheduler.md index d8e6ecef2e..94d779069e 100644 --- a/docs/en/tutorials/param_scheduler.md +++ b/docs/en/tutorials/param_scheduler.md @@ -40,7 +40,7 @@ for epoch in range(10): `mmengine.optim.scheduler` supports most of PyTorch's learning rate schedulers such as `ExponentialLR`, `LinearLR`, `StepLR`, `MultiStepLR`, etc. Please refer to [parameter scheduler API documentation](https://mmengine.readthedocs.io/en/latest/api/optim.html#scheduler) for all of the supported schedulers. -MMEngine also supports adjusting momentum with parameter schedulers. To use momentum schedulers, replace `LR` in the class name to `Momentum`, such as `ExponentialMomentum`,`LinearMomentum`. Further, we implement the general parameter scheduler ParamScheduler, which is used to adjust the specified hyperparameters in the optimizer, such as weight_decay, etc. This feature makes it easier to apply some complex hyperparameter tuning strategies. +MMEngine also supports adjusting momentum with parameter schedulers. To use momentum schedulers, replace `LR` in the class name to `Momentum`, such as `ExponentialMomentum`, `LinearMomentum`. Further, we implement the general parameter scheduler ParamScheduler, which is used to adjust the specified hyperparameters in the optimizer, such as weight_decay, etc. This feature makes it easier to apply some complex hyperparameter tuning strategies. Different from the above example, MMEngine usually does not need to manually implement the training loop and call `optimizer.step()`. The runner will automatically manage the training progress and control the execution of the parameter scheduler through `ParamSchedulerHook`. diff --git a/docs/en/tutorials/runner.md b/docs/en/tutorials/runner.md index 490390961a..6f0ec45821 100644 --- a/docs/en/tutorials/runner.md +++ b/docs/en/tutorials/runner.md @@ -20,7 +20,7 @@ Pros and cons lie in both approaches. For the former one, beginners may be lost We argue that the key to learning runner is using it as a memo. You should remember its most commonly used arguments and only focus on those less used when in need, since default values usually work fine. In the following, we will provide a beginner-friendly example to illustrate the most commonly used arguments of the runner, along with advanced guidelines for those less used. -### A beginer-friendly example +### A beginner-friendly example ```{hint} In this tutorial, we hope you can focus more on overall architecture instead of implementation details. 
This "top-down" way of thinking is exactly what we advocate. Don't worry, you will definitely have plenty of opportunities and guidance afterward to focus on modules you want to improve. diff --git a/mmengine/config/config.py b/mmengine/config/config.py index a782c1a00c..1fe0a9ec24 100644 --- a/mmengine/config/config.py +++ b/mmengine/config/config.py @@ -901,7 +901,7 @@ def _file2dict( # 2. Set `_scope_` for the outer dict variable for the base # config. # 3. Set `scope` attribute for each base variable. - # Different from `_scope_`, `scope` is not a key of base + # Different from `_scope_`, `scope` is not a key of base # dict, `scope` attribute will be parsed to key `_scope_` # by function `_parse_scope` only if the base variable is # accessed by the current config. diff --git a/mmengine/logging/message_hub.py b/mmengine/logging/message_hub.py index 69be60e751..82565d8832 100644 --- a/mmengine/logging/message_hub.py +++ b/mmengine/logging/message_hub.py @@ -317,7 +317,7 @@ def get_info(self, key: str, default: Optional[Any] = None) -> Any: if key not in self.runtime_info: return default else: - # TODO: There are restrictions on objects that can be saved + # TODO: There are restrictions on objects that can be saved # return copy.deepcopy(self._runtime_info[key]) return self._runtime_info[key] diff --git a/mmengine/model/base_model/data_preprocessor.py b/mmengine/model/base_model/data_preprocessor.py index af84246874..a101855203 100644 --- a/mmengine/model/base_model/data_preprocessor.py +++ b/mmengine/model/base_model/data_preprocessor.py @@ -235,7 +235,7 @@ def __init__(self, self.pad_value = pad_value def forward(self, data: dict, training: bool = False) -> Union[dict, list]: - """Performs normalization、padding and bgr2rgb conversion based on + """Performs normalization, padding and bgr2rgb conversion based on ``BaseDataPreprocessor``. Args: diff --git a/mmengine/visualization/visualizer.py b/mmengine/visualization/visualizer.py index 3b31c67727..a0d579a4d9 100644 --- a/mmengine/visualization/visualizer.py +++ b/mmengine/visualization/visualizer.py @@ -989,7 +989,7 @@ def draw_featmap(featmap: torch.Tensor, f'overlaid_image: {overlaid_image.shape[:2]} and ' f'featmap: {featmap.shape[1:]} are not same, ' f'the feature map will be interpolated. ' - f'This may cause mismatch problems !') + f'This may cause mismatch problems!') if resize_shape is None: featmap = F.interpolate( featmap[None],
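As a companion to the `better_optimizers.md` hunks above, here is a minimal sketch of how one of the listed D-Adaptation optimizers is plugged into MMEngine's `Runner` through `optim_wrapper`. The `ToyModel`, the random dataset, and the `lr`/`weight_decay` values are illustrative assumptions rather than content from the patch; only the registered name `DAdaptAdaGrad` comes from the documentation being edited, and the `dadaptation` package must be installed for it to resolve.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

from mmengine.model import BaseModel
from mmengine.runner import Runner


class ToyModel(BaseModel):
    """Placeholder regression model, used only to make the sketch self-contained."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, inputs, targets, mode='tensor'):
        preds = self.linear(inputs)
        if mode == 'loss':
            # The runner calls forward(..., mode='loss') inside train_step
            return {'loss': F.mse_loss(preds, targets)}
        return preds


def collate(batch):
    # Collate samples into the keyword arguments that ToyModel.forward expects
    inputs = torch.stack([sample[0] for sample in batch])
    targets = torch.stack([sample[1] for sample in batch])
    return {'inputs': inputs, 'targets': targets}


dataset = TensorDataset(torch.randn(64, 2), torch.randn(64, 1))
train_dataloader = DataLoader(dataset, batch_size=8, collate_fn=collate)

runner = Runner(
    model=ToyModel(),
    work_dir='./work_dir',
    train_dataloader=train_dataloader,
    # 'DAdaptAdaGrad' is one of the optimizers named in better_optimizers.md;
    # the lr/weight_decay values are placeholders, not recommendations.
    optim_wrapper=dict(
        optimizer=dict(type='DAdaptAdaGrad', lr=0.001, weight_decay=0.05)),
    train_cfg=dict(by_epoch=True, max_epochs=1),
)
runner.train()
```

The same `optim_wrapper` pattern should apply to the lion-pytorch, bitsandbytes, and transformers optimizers named in the patch, with only the `type` string changing.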