diff --git a/README.md b/README.md index a88686b..1f08631 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,10 @@ # pandas-vet -![tests](https://github.com/deppen8/pandas-vet/workflows/Test%20and%20lint/badge.svg -) +`pandas-vet` is a plugin for `flake8` that provides opinionated linting for `pandas` code. + +[![docs](https://img.shields.io/badge/deppen8.github.io%2Fpandas--vet-181717?logo=githubpages&label=docs)](https://deppen8.github.io/pandas-vet) + +[![Test and lint](https://github.com/deppen8/pandas-vet/actions/workflows/testing.yml/badge.svg)](https://github.com/deppen8/pandas-vet/actions/workflows/testing.yml) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![PyPI - License](https://img.shields.io/pypi/l/pandas-vet.svg)](https://github.com/deppen8/pandas-vet/blob/main/LICENSE) @@ -12,164 +15,50 @@ [![Conda Version](https://img.shields.io/conda/vn/conda-forge/pandas-vet.svg)](https://anaconda.org/conda-forge/pandas-vet) [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandas-vet.svg)](https://anaconda.org/conda-forge/pandas-vet) -`pandas-vet` is a plugin for `flake8` that provides opinionated linting for `pandas` code. - -It began as a project during the PyCascades 2019 sprints. - -## Motivation - -Starting with `pandas` can be daunting. The usual internet help sites are littered with different ways to do the same thing and some features that the `pandas` docs themselves discourage live on in the API. `pandas-vet` is (hopefully) a way to help make `pandas` a little more friendly for newcomers by taking some opinionated stances about `pandas` best practices. It is designed to help users reduce the `pandas` universe. +## Basic usage -The idea to create a linter was sparked by [Ania Kapuścińska](https://twitter.com/lambdanis)'s talk at PyCascades 2019, ["Lint your code responsibly!"](https://youtu.be/hAnCiTpxXPg?t=21814). 
+Take the following script, `drop_column.py`, which contains valid pandas code: -Many of the opinions stem from [Ted Petrou's](https://twitter.com/TedPetrou) excellent [Minimally Sufficient Pandas](https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428). Other ideas are drawn from `pandas` docs or elsewhere. The [Pandas in Black and White](https://deppen8.github.io/pandas-bw/) flashcards have a lot of the same opinions too. +```python +# drop_column.py +import pandas -## Installation - -`pandas-vet` is a plugin for `flake8`. If you don't have `flake8` already, it will install automatically when you install `pandas-vet`. - -The plugin is on PyPI and can be installed with: - -```bash -pip install pandas-vet +df = pandas.DataFrame({ + 'col_a': [i for i in range(20)], + 'col_b': [j for j in range(20, 40)] +}) +df.drop(columns='col_b', inplace=True) ``` -It can also be installed with `conda`: +With `pandas-vet` installed, if we run Flake8 on this script, we will see three warnings raised. ```bash -conda install -c conda-forge pandas-vet -``` - -`pandas-vet` is tested under Python 3.8, 3.9, 3.10, and 3.11 as defined in our [GitHub Actions](https://github.com/deppen8/pandas-vet/blob/main/.github/workflows/testing.yml) - -## Usage - -Once installed successfully in an environment that also has `flake8` installed, `pandas-vet` should run whenever `flake8` is run. +$ flake8 drop_column.py -```bash -flake8 ... +./drop_column.py:2:1: PD001 pandas should always be imported as 'import pandas as pd' +./drop_column.py:4:1: PD901 'df' is a bad variable name. Be kinder to your future self. +./drop_column.py:7:1: PD002 'inplace = True' should be avoided; it has inconsistent behavior ``` -See the [`flake8` docs](http://flake8.pycqa.org/en/latest/user/invocation.html) for more information. - -For a full list of implemented warnings, see [the list below](#list-of-warnings). - -## Contributing - -`pandas-vet` is still in the very early stages. 
Contributions are welcome from the community on code, tests, docs, and just about anything else. - -### Code of Conduct - -Because this project started during the PyCascades 2019 sprints, we adopt the PyCascades minimal expectation that we "Be excellent to each another". Beyond that, we follow the Python Software Foundation's [Community Code of Conduct](https://www.python.org/psf/codeofconduct/). - -### Steps to contributing - -1. Please submit an issue (or draft PR) first describing the types of changes you'd like to implement. - -2. Fork the repo and create a new branch for your enhancement/fix. - -3. Get a development environment set up with your favorite environment manager (`conda`, `virtualenv`, etc.). - - 0. You must use at least python 3.6 to develop, for [black](https://github.com/psf/black) support. - - 1. You can create one from `pip install -r requirements_dev.txt` or, if you use Docker, you can build an image from the Dockerfile included in this repo. - - 2. Once your enviroment is set up you will need to install pandas-vet in development mode. Use `pip install -e .` (use this if you are alreay in your virtual enviroment) or `pip install -e ` (use this one if not in the virtual enviroment and prefer to state explicitly where it is going). - -4. Write code, docs, etc. - -5. We use `pytest`, `flake8`, and `black` to validate our codebase. TravisCI integration will complain on pull requests if there are any failing tests or lint violations. To check these locally, run the following commands: - - ```bash - pytest --cov="pandas_vet" - ``` +We can use these to improve the code. - ```bash - flake8 pandas_vet setup.py tests --exclude tests/data - ``` +```python +# pandastic_drop_column.py +import pandas as pd - ```bash - black --check pandas_vet setup.py tests --exclude tests/data - ``` - -6. Push to your forked repo. - -7. Submit pull request to the parent repo from your branch. 
Be sure to write a clear message and reference the Issue # that relates to your pull request. - -8. Feel good about giving back to open source projects. - -### How to add a check to the linter - -1. Write tests. At a *minimum*, you should have test cases where the linter should catch "bad" `pandas` and test cases where the linter should allow "good" `pandas`. - -2. Write your check function in `/pandas-vet/__init__.py`. - -3. Run `flake8` and `pytest` on the linter itself (see [Steps to contributing](#steps-to-contributing)) - -## Contributors - -### PyCascades 2019 sprints team - -- Sam Beck -- [Jacob Deppen](https://twitter.com/jacob_deppen) -- [Walt](https://github.com/wadells) -- Charles Simchick -- [Aly Sivji](https://twitter.com/CaiusSivjus) -- Tim Smith - -### PyCascades 2020 sprints team - -- dat-boris -- [Jacob Deppen](https://twitter.com/jacob_deppen) -- jvano74 -- keturn -- Rhornberger -- tojo13 -- [Walt](https://github.com/wadells) - -### Other awesome contributors - -- Earl Clark -- Leandro Leites -- pwoolvett -- sigmavirus24 - -## List of warnings - -**PD001:** pandas should always be imported as 'import pandas as pd' - -**PD002:** 'inplace = True' should be avoided; it has inconsistent behavior - -**PD003:** '.isna' is preferred to '.isnull'; functionality is equivalent - -**PD004:** '.notna' is preferred to '.notnull'; functionality is equivalent - -**PD005:** Use arithmetic operator instead of method - -**PD006:** Use comparison operator instead of method - -**PD007:** '.ix' is deprecated; use more explicit '.loc' or '.iloc' - -**PD008:** Use '.loc' instead of '.at'. If speed is important, use numpy. - -**PD009:** Use '.iloc' instead of '.iat'. If speed is important, use numpy. 
- -**PD010** '.pivot_table' is preferred to '.pivot' or '.unstack'; provides same functionality - -**PD011** Use '.to_numpy()' instead of '.values'; 'values' is ambiguous - -**PD012** '.read_csv' is preferred to '.read_table'; provides same functionality - -**PD013** '.melt' is preferred to '.stack'; provides same functionality +ab_dataset = pd.DataFrame({ + 'col_a': [i for i in range(20)], + 'col_b': [j for j in range(20, 40)] +}) +a_dataset = ab_dataset.drop(columns='col_b') +``` -**PD015** Use '.merge' method instead of 'pd.merge' function. They have equivalent functionality. +For a full list, see the [Supported warnings](https://deppen8.github.io/pandas-vet/guides/warnings.html) page of the documentation. -### *Very* Opinionated Warnings +## Motivation -These warnings are turned off by default. To enable them, add the `-annoy` flag to your command, e.g., +Starting with [pandas](https://pandas.pydata.org/) can be daunting. The usual internet help sites are littered with different ways to do the same thing and some features that the pandas docs themselves discourage live on in the API. `pandas-vet` is (hopefully) a way to help make pandas a little more friendly for newcomers by taking some opinionated stances about pandas best practices. It is designed to help users reduce the pandas universe. -```bash -flake8 --annoy my_file.py -``` +The idea to create a linter was sparked by [Ania Kapuścińska](https://twitter.com/lambdanis)'s talk at PyCascades 2019, ["Lint your code responsibly!"](https://youtu.be/hAnCiTpxXPg?t=21814). The package was largely developed at the PyCascades 2019 sprints. -**PD901** 'df' is a bad variable name. Be kinder to your future self. +Many of the opinions stem from [Ted Petrou's](https://twitter.com/TedPetrou) excellent [Minimally Sufficient Pandas](https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428). Other ideas are drawn from pandas docs or elsewhere. 
The [Pandas in Black and White](https://deppen8.github.io/pandas-bw/) flashcards have a lot of the same opinions too. diff --git a/docs/_config.yml b/docs/_config.yml index f49caa1..3814d7c 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -33,7 +33,7 @@ parse: myst_enable_extensions: # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html # - amsmath - colon_fence - # - deflist + - deflist - dollarmath # - html_admonition # - html_image diff --git a/docs/_toc.yml b/docs/_toc.yml index b4d0283..49a6fd2 100644 --- a/docs/_toc.yml +++ b/docs/_toc.yml @@ -4,7 +4,9 @@ parts: - caption: Guides chapters: - file: guides/install + - file: guides/warnings - file: guides/dev + - file: guides/contributors - caption: API docs chapters: - glob: api/* diff --git a/docs/guides/contributors.md b/docs/guides/contributors.md new file mode 100644 index 0000000..026448a --- /dev/null +++ b/docs/guides/contributors.md @@ -0,0 +1,29 @@ +# Contributors + +An incomplete list of amazing people. If you've contributed an Issue, a PR, or anything else and aren't listed here, please open a PR to add yourself! + +## PyCascades 2019 sprints team + +- Sam Beck +- [Jacob Deppen](https://github.com/deppen8) +- [Walt](https://github.com/wadells) +- Charles Simchick +- [Aly Sivji](https://twitter.com/CaiusSivjus) +- Tim Smith + +## PyCascades 2020 sprints team + +- dat-boris +- [Jacob Deppen](https://github.com/deppen8) +- jvano74 +- keturn +- Rhornberger +- tojo13 +- [Walt](https://github.com/wadells) + +## Other awesome contributors + +- Earl Clark +- Leandro Leites +- pwoolvett +- sigmavirus24 diff --git a/docs/guides/dev.md b/docs/guides/dev.md index 6e1659c..d3ddab9 100644 --- a/docs/guides/dev.md +++ b/docs/guides/dev.md @@ -1,10 +1,10 @@ -# Development guide +# Development -## Access the repo +Contributions are welcome from the community on code, tests, docs, and just about anything else. 
+ -The `pandas-vet` repo lives in GitHub at [https://github.com/deppen8/pandas-vet](https://github.com/deppen8/pandas-vet). +## Code of Conduct -Clone the repo to get started. +Because this project started during the PyCascades 2019 sprints, we adopt the PyCascades minimal expectation that we "Be excellent to each other". Beyond that, we follow the Python Software Foundation's [Community Code of Conduct](https://www.python.org/psf/codeofconduct/). ## Project management @@ -20,50 +20,60 @@ The best way to install Hatch on your machine is to use [`pipx`](https://pipxpro pipx install hatch ``` -### Version bumping +## Contribution workflow -To bump the package version, run `hatch version `. For example, to bump the minor version number, run: +1. Please submit an [Issue](https://github.com/deppen8/pandas-vet/issues) (or draft PR) first describing the types of changes you'd like to implement. -```bash -hatch version minor -``` +2. Fork the GitHub repository. -This will bump the version number in the `pyproject.toml` file and elsewhere. +3. Create a new branch for your enhancement/fix. -```{note} -You can also bump using the full version string, e.g., `hatch version 0.1.0`. See the full [Hatch versioning documentation](https://hatch.pypa.io/latest/version) for more details. -``` +4. Write code, [tests](tests), [docs](documentation), etc. See [How to add a check to the linter](linter-check). -### Build the package +5. We use `pytest`, `flake8`, `isort`, and `black` to test, lint, and format our codebase. You can invoke these with a single Hatch command from the root of the repository. -Building the package is as simple as running `hatch build` from the root of the repo. This will build both sdist and wheel versions of the package and place them in the `/dist` directory. 
+ ```bash + hatch run dev:tests + ``` -```bash -hatch build -``` + This command will build the necessary virtual environments, install the package in editable mode, and run the tests, linting, and formatting checks. -```{note} -See the full [Hatch build documentation](https://hatch.pypa.io/latest/build) for more details. -``` + These are the same commands that are run in the CI/CD pipeline. See the [Tests](tests) section below for more details. + +6. Push your branch to your forked repository. + +7. Submit a pull request to the parent repository from your branch. Be sure to write a clear message and reference any Issue # that relates to your pull request. + +8. Feel good about giving back to open source projects. + +(linter-check)= -### Custom scripts +## How to add a check to the linter -We have a few predefined scripts that are useful for development. These are defined in the `pyproject.toml` file and can be run with `hatch run :`. For example, to run a combination of `isort`, `black`, and `flake8` you can run: +1. Write tests. At a *minimum*, you should have test cases where the linter should catch "bad" `pandas` and test cases where the linter should allow "good" `pandas`. + +2. Write your check function in `/pandas-vet/__init__.py`. + +3. Run `hatch run dev:tests` and fix any errors. + +## Custom scripts + +In addition to the `tests` script used in CI/CD, we have a few predefined scripts that are useful for development. These are defined in the `pyproject.toml` file and can be run with `hatch run dev:`. For example, to run a combination of `isort`, `black`, and `flake8` you can run: ```bash hatch run dev:format ``` -These are the same scripts that are run in the CI/CD pipeline. - The available custom scripts are: - `check` - Run `isort`, `black`, and `flake8` without making changes (i.e., a dry-run). - `format` - Run `isort`, `black`, and `flake8`. +- `tests` - Run the test suite and format. - `docs` - Build the docs with Jupyter Book. 
-- `tests` - Run the test suite. -### Testing +(tests)= + +## Tests We use `pytest` for testing. Tests live in the `/tests` directory. The use of [`pytest` fixtures](https://docs.pytest.org/en/stable/explanation/fixtures.html) is encouraged and these should typically be stored in `/tests/conftest.py`, though in some limited cases could be isolated to a particular test module. @@ -85,21 +95,23 @@ Coverage XML written to file results/coverage.xml ``` ```` -### Documentation +(documentation)= + +## Documentation The documentation is built with a combination of docstrings in Google docstring format and Jupyter Book deployment. Jupyter Book provides an easy way to combine traditional API docs built with Sphinx, Markdown docs like this page, and Jupyter Notebook guides/tutorials. -#### Docstrings +### Docstrings Docstrings should use the [Google docstring style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html). Each class and function should have a docstring. -When the docs are built with Jupyter Book, these Sphinx will detect them and turn them into some nicely-formatted API docs. +When the docs are built with Jupyter Book, Sphinx will detect the docstrings and turn them into some nicely-formatted API docs. -#### Jupyter Book +### Jupyter Book -[Jupyter Book](https://jupyterbook.org/intro.html) provides the engine for the docs. Documentation pages can be written in reStructuredText, Markdown, or Jupyter Notebooks. See the Jupyter Book documentation for additional features available. +[Jupyter Book](https://jupyterbook.org/intro.html) provides the engine for the docs. Documentation pages can be written in reStructuredText, Markdown, or Jupyter Notebooks. See the [Jupyter Book documentation](https://jupyterbook.org/intro.html) for additional features available. -#### Build the docs locally +### Build the docs locally The docs are built and deployed to GitLab Pages as part of the CI/CD pipeline in GitLab. 
However, if you'd like to examine them locally, you can build them yourself with `hatch run dev:docs`, e.g., @@ -159,4 +171,8 @@ Any errors in the build will be logged as part of this output. ## CI/CD -CI/CD is handled by GitHub Actions. Configuration can be found in the `.github/workflows/testing.yml` file. +CI/CD is handled by GitHub Actions. Configuration can be found in the `.github/workflows/` folder. + +When a pull request is submitted, the code is tested, linted, and formatted (`hatch run dev:tests`) and the docs are built (`hatch run dev:docs`). If any of these steps fail, the pull request will be marked as failing. + +When a pull request is merged into the `main` branch, the code is tested, linted, and formatted (`hatch run dev:tests`), the docs are built (`hatch run dev:docs`), and the docs are deployed to GitHub Pages. diff --git a/docs/guides/warnings.md b/docs/guides/warnings.md new file mode 100644 index 0000000..714b018 --- /dev/null +++ b/docs/guides/warnings.md @@ -0,0 +1,48 @@ +# Supported warnings + +These are the warnings that `pandas-vet` currently supports. + +PD001 +: pandas should always be imported as 'import pandas as pd' + +PD002 +: 'inplace = True' should be avoided; it has inconsistent behavior + +PD003 +: '.isna' is preferred to '.isnull'; functionality is equivalent + +PD004 +: '.notna' is preferred to '.notnull'; functionality is equivalent + +PD005 +: Use arithmetic operator instead of method + +PD006 +: Use comparison operator instead of method + +PD007 +: '.ix' is deprecated; use more explicit '.loc' or '.iloc' + +PD008 +: Use '.loc' instead of '.at'. If speed is important, use numpy. + +PD009 +: Use '.iloc' instead of '.iat'. If speed is important, use numpy. 
+ +PD010 +: '.pivot_table' is preferred to '.pivot' or '.unstack'; provides same functionality + +PD011 +: Use '.to_numpy()' instead of '.values'; 'values' is ambiguous + +PD012 +: '.read_csv' is preferred to '.read_table'; provides same functionality + +PD013 +: '.melt' is preferred to '.stack'; provides same functionality + +PD015 +: Use '.merge' method instead of 'pd.merge' function. They have equivalent functionality. + +PD901 +: 'df' is a bad variable name. Be kinder to your future self. diff --git a/docs/index.md b/docs/index.md index 29121f9..c3b7d95 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,4 +1,63 @@ # pandas-vet Documentation -```{tableofcontents} +`pandas-vet` is a plugin for Flake8 that provides opinionated linting for pandas code. + +[![GitHub repository](https://img.shields.io/badge/deppen8%2Fpandas--vet-181717?logo=github&label=repo)](https://github.com/deppen8/pandas-vet) + +[![Test and lint](https://github.com/deppen8/pandas-vet/actions/workflows/testing.yml/badge.svg)](https://github.com/deppen8/pandas-vet/actions/workflows/testing.yml) +[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) +[![PyPI - License](https://img.shields.io/pypi/l/pandas-vet.svg)](https://github.com/deppen8/pandas-vet/blob/main/LICENSE) + +[![PyPI](https://img.shields.io/pypi/v/pandas-vet.svg)](https://pypi.org/project/pandas-vet/) +[![PyPI - Status](https://img.shields.io/pypi/status/pandas-vet.svg)](https://pypi.org/project/pandas-vet/) +[![PyPI - Downloads](https://img.shields.io/pypi/dm/pandas-vet.svg)](https://pypi.org/project/pandas-vet/) +[![Conda Version](https://img.shields.io/conda/vn/conda-forge/pandas-vet.svg)](https://anaconda.org/conda-forge/pandas-vet) +[![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandas-vet.svg)](https://anaconda.org/conda-forge/pandas-vet) + +## Basic usage + +Take the following script, `drop_column.py`, which contains valid pandas code: + +```python 
+# drop_column.py +import pandas + +df = pandas.DataFrame({ + 'col_a': [i for i in range(20)], + 'col_b': [j for j in range(20, 40)] +}) +df.drop(columns='col_b', inplace=True) +``` + +With `pandas-vet` installed, if we run Flake8 on this script, we will see three warnings raised. + +```bash +$ flake8 drop_column.py + +./drop_column.py:2:1: PD001 pandas should always be imported as 'import pandas as pd' +./drop_column.py:4:1: PD901 'df' is a bad variable name. Be kinder to your future self. +./drop_column.py:7:1: PD002 'inplace = True' should be avoided; it has inconsistent behavior +``` + +We can use these to improve the code. + +```python +# pandastic_drop_column.py +import pandas as pd + +ab_dataset = pd.DataFrame({ + 'col_a': [i for i in range(20)], + 'col_b': [j for j in range(20, 40)] +}) +a_dataset = ab_dataset.drop(columns='col_b') ``` + +For a full list, see the [Supported warnings](./guides/warnings) page. + +## Motivations + +Starting with [pandas](https://pandas.pydata.org/) can be daunting. The usual internet help sites are littered with different ways to do the same thing and some features that the pandas docs themselves discourage live on in the API. `pandas-vet` is (hopefully) a way to help make pandas a little more friendly for newcomers by taking some opinionated stances about pandas best practices. It is designed to help users reduce the pandas universe. + +The idea to create a linter was sparked by [Ania Kapuścińska](https://twitter.com/lambdanis)'s talk at PyCascades 2019, ["Lint your code responsibly!"](https://youtu.be/hAnCiTpxXPg?t=21814). The package was largely developed at the PyCascades 2019 sprints. + +Many of the opinions stem from [Ted Petrou's](https://twitter.com/TedPetrou) excellent [Minimally Sufficient Pandas](https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428). Other ideas are drawn from pandas docs or elsewhere. 
The [Pandas in Black and White](https://deppen8.github.io/pandas-bw/) flashcards have a lot of the same opinions too.