Merge pull request #128 from deppen8/enhance-docs-20230809
big docs overhaul
deppen8 committed Aug 9, 2023
2 parents eef66cf + 8408c35 commit 30d0302
Showing 7 changed files with 225 additions and 182 deletions.
181 changes: 35 additions & 146 deletions README.md
@@ -1,7 +1,10 @@
# pandas-vet

![tests](https://github.com/deppen8/pandas-vet/workflows/Test%20and%20lint/badge.svg)
`pandas-vet` is a plugin for `flake8` that provides opinionated linting for `pandas` code.

[![docs](https://img.shields.io/badge/deppen8.github.io%2Fpandas--vet-181717?logo=githubpages&label=docs)](https://deppen8.github.io/pandas-vet)

[![Test and lint](https://github.com/deppen8/pandas-vet/actions/workflows/testing.yml/badge.svg)](https://github.com/deppen8/pandas-vet/actions/workflows/testing.yml)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![PyPI - License](https://img.shields.io/pypi/l/pandas-vet.svg)](https://github.com/deppen8/pandas-vet/blob/main/LICENSE)

@@ -12,164 +15,50 @@
[![Conda Version](https://img.shields.io/conda/vn/conda-forge/pandas-vet.svg)](https://anaconda.org/conda-forge/pandas-vet)
[![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/pandas-vet.svg)](https://anaconda.org/conda-forge/pandas-vet)

`pandas-vet` is a plugin for `flake8` that provides opinionated linting for `pandas` code.

It began as a project during the PyCascades 2019 sprints.

## Motivation

Starting with `pandas` can be daunting. The usual internet help sites are littered with different ways to do the same thing and some features that the `pandas` docs themselves discourage live on in the API. `pandas-vet` is (hopefully) a way to help make `pandas` a little more friendly for newcomers by taking some opinionated stances about `pandas` best practices. It is designed to help users reduce the `pandas` universe.
## Basic usage

The idea to create a linter was sparked by [Ania Kapuścińska](https://twitter.com/lambdanis)'s talk at PyCascades 2019, ["Lint your code responsibly!"](https://youtu.be/hAnCiTpxXPg?t=21814).
Take the following script, `drop_column.py`, which contains valid pandas code:

Many of the opinions stem from [Ted Petrou's](https://twitter.com/TedPetrou) excellent [Minimally Sufficient Pandas](https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428). Other ideas are drawn from `pandas` docs or elsewhere. The [Pandas in Black and White](https://deppen8.github.io/pandas-bw/) flashcards have a lot of the same opinions too.
```python
# drop_column.py
import pandas

## Installation

`pandas-vet` is a plugin for `flake8`. If you don't have `flake8` already, it will install automatically when you install `pandas-vet`.

The plugin is on PyPI and can be installed with:

```bash
pip install pandas-vet
df = pandas.DataFrame({
    'col_a': [i for i in range(20)],
    'col_b': [j for j in range(20, 40)]
})
df.drop(columns='col_b', inplace=True)
```

It can also be installed with `conda`:
With `pandas-vet` installed, if we run Flake8 on this script, we will see three warnings raised.

```bash
conda install -c conda-forge pandas-vet
```

`pandas-vet` is tested under Python 3.8, 3.9, 3.10, and 3.11 as defined in our [GitHub Actions](https://github.com/deppen8/pandas-vet/blob/main/.github/workflows/testing.yml)

## Usage

Once installed successfully in an environment that also has `flake8` installed, `pandas-vet` should run whenever `flake8` is run.
$ flake8 drop_column.py

```bash
flake8 ...
./drop_column.py:2:1: PD001 pandas should always be imported as 'import pandas as pd'
./drop_column.py:4:1: PD901 'df' is a bad variable name. Be kinder to your future self.
./drop_column.py:7:1: PD002 'inplace = True' should be avoided; it has inconsistent behavior
```
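Each line of the report above follows Flake8's default `path:row:col: code text` layout, so it is straightforward to post-process. The following is a minimal sketch, using only the standard library, that tallies the warning codes in a report. `REPORT_LINE`, `parse_report`, and `SAMPLE` are hypothetical names for this illustration and are not part of `pandas-vet` or Flake8.

```python
import re
from collections import Counter

# Matches Flake8's default report layout: path:row:col: CODE text
REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def parse_report(output):
    """Return one dict per line that matches the default report layout."""
    records = []
    for line in output.splitlines():
        match = REPORT_LINE.match(line.strip())
        if match:
            record = match.groupdict()
            record["row"] = int(record["row"])
            record["col"] = int(record["col"])
            records.append(record)
    return records


# The three report lines shown above, copied verbatim.
SAMPLE = """\
./drop_column.py:2:1: PD001 pandas should always be imported as 'import pandas as pd'
./drop_column.py:4:1: PD901 'df' is a bad variable name. Be kinder to your future self.
./drop_column.py:7:1: PD002 'inplace = True' should be avoided; it has inconsistent behavior
"""

records = parse_report(SAMPLE)
codes = Counter(record["code"] for record in records)
print(sorted(codes))  # ['PD001', 'PD002', 'PD901']
```

Lines that do not match the layout (for example, a summary line added by another plugin) are skipped rather than raising an error.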

See the [`flake8` docs](http://flake8.pycqa.org/en/latest/user/invocation.html) for more information.

For a full list of implemented warnings, see [the list below](#list-of-warnings).

## Contributing

`pandas-vet` is still in the very early stages. Contributions are welcome from the community on code, tests, docs, and just about anything else.

### Code of Conduct

Because this project started during the PyCascades 2019 sprints, we adopt the PyCascades minimal expectation that we "Be excellent to each other". Beyond that, we follow the Python Software Foundation's [Community Code of Conduct](https://www.python.org/psf/codeofconduct/).

### Steps to contributing

1. Please submit an issue (or draft PR) first describing the types of changes you'd like to implement.

2. Fork the repo and create a new branch for your enhancement/fix.

3. Get a development environment set up with your favorite environment manager (`conda`, `virtualenv`, etc.).

0. You must use at least python 3.6 to develop, for [black](https://github.com/psf/black) support.

1. You can create one from `pip install -r requirements_dev.txt` or, if you use Docker, you can build an image from the Dockerfile included in this repo.

2. Once your environment is set up you will need to install pandas-vet in development mode. Use `pip install -e .` (use this if you are already in your virtual environment) or `pip install -e <path>` (use this one if not in the virtual environment and prefer to state explicitly where it is going).

4. Write code, docs, etc.

5. We use `pytest`, `flake8`, and `black` to validate our codebase. TravisCI integration will complain on pull requests if there are any failing tests or lint violations. To check these locally, run the following commands:

```bash
pytest --cov="pandas_vet"
```
We can use these to improve the code.

```bash
flake8 pandas_vet setup.py tests --exclude tests/data
```
```python
# pandastic_drop_column.py
import pandas as pd

```bash
black --check pandas_vet setup.py tests --exclude tests/data
```

6. Push to your forked repo.

7. Submit pull request to the parent repo from your branch. Be sure to write a clear message and reference the Issue # that relates to your pull request.

8. Feel good about giving back to open source projects.

### How to add a check to the linter

1. Write tests. At a *minimum*, you should have test cases where the linter should catch "bad" `pandas` and test cases where the linter should allow "good" `pandas`.

2. Write your check function in `/pandas-vet/__init__.py`.

3. Run `flake8` and `pytest` on the linter itself (see [Steps to contributing](#steps-to-contributing))

## Contributors

### PyCascades 2019 sprints team

- Sam Beck
- [Jacob Deppen](https://twitter.com/jacob_deppen)
- [Walt](https://github.com/wadells)
- Charles Simchick
- [Aly Sivji](https://twitter.com/CaiusSivjus)
- Tim Smith

### PyCascades 2020 sprints team

- dat-boris
- [Jacob Deppen](https://twitter.com/jacob_deppen)
- jvano74
- keturn
- Rhornberger
- tojo13
- [Walt](https://github.com/wadells)

### Other awesome contributors

- Earl Clark
- Leandro Leites
- pwoolvett
- sigmavirus24

## List of warnings

**PD001:** pandas should always be imported as 'import pandas as pd'

**PD002:** 'inplace = True' should be avoided; it has inconsistent behavior

**PD003:** '.isna' is preferred to '.isnull'; functionality is equivalent

**PD004:** '.notna' is preferred to '.notnull'; functionality is equivalent

**PD005:** Use arithmetic operator instead of method

**PD006:** Use comparison operator instead of method

**PD007:** '.ix' is deprecated; use more explicit '.loc' or '.iloc'

**PD008:** Use '.loc' instead of '.at'. If speed is important, use numpy.

**PD009:** Use '.iloc' instead of '.iat'. If speed is important, use numpy.

**PD010:** '.pivot_table' is preferred to '.pivot' or '.unstack'; provides same functionality

**PD011:** Use '.to_numpy()' instead of '.values'; 'values' is ambiguous

**PD012:** '.read_csv' is preferred to '.read_table'; provides same functionality

**PD013:** '.melt' is preferred to '.stack'; provides same functionality
ab_dataset = pd.DataFrame({
    'col_a': [i for i in range(20)],
    'col_b': [j for j in range(20, 40)]
})
a_dataset = ab_dataset.drop(columns='col_b')
```

**PD015:** Use '.merge' method instead of 'pd.merge' function. They have equivalent functionality.
For a full list, see the [Supported warnings](https://deppen8.github.io/pandas-vet/guides/warnings.html) page of the documentation.

### *Very* Opinionated Warnings
## Motivation

These warnings are turned off by default. To enable them, add the `--annoy` flag to your command, e.g.,
Starting with [pandas](https://pandas.pydata.org/) can be daunting. The usual internet help sites are littered with different ways to do the same thing and some features that the pandas docs themselves discourage live on in the API. `pandas-vet` is (hopefully) a way to help make pandas a little more friendly for newcomers by taking some opinionated stances about pandas best practices. It is designed to help users reduce the pandas universe.

```bash
flake8 --annoy my_file.py
```
The idea to create a linter was sparked by [Ania Kapuścińska](https://twitter.com/lambdanis)'s talk at PyCascades 2019, ["Lint your code responsibly!"](https://youtu.be/hAnCiTpxXPg?t=21814). The package was largely developed at the PyCascades 2019 sprints.

**PD901** 'df' is a bad variable name. Be kinder to your future self.
Many of the opinions stem from [Ted Petrou's](https://twitter.com/TedPetrou) excellent [Minimally Sufficient Pandas](https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428). Other ideas are drawn from pandas docs or elsewhere. The [Pandas in Black and White](https://deppen8.github.io/pandas-bw/) flashcards have a lot of the same opinions too.
2 changes: 1 addition & 1 deletion docs/_config.yml
@@ -33,7 +33,7 @@ parse:
myst_enable_extensions: # default extensions to enable in the myst parser. See https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html
# - amsmath
- colon_fence
# - deflist
- deflist
- dollarmath
# - html_admonition
# - html_image
2 changes: 2 additions & 0 deletions docs/_toc.yml
@@ -4,7 +4,9 @@ parts:
- caption: Guides
chapters:
- file: guides/install
- file: guides/warnings
- file: guides/dev
- file: guides/contributors
- caption: API docs
chapters:
- glob: api/*
29 changes: 29 additions & 0 deletions docs/guides/contributors.md
@@ -0,0 +1,29 @@
# Contributors

An incomplete list of amazing people. If you've contributed an Issue, a PR, or anything else and aren't listed here, please open a PR to add yourself!

## PyCascades 2019 sprints team

- Sam Beck
- [Jacob Deppen](https://github.com/deppen8)
- [Walt](https://github.com/wadells)
- Charles Simchick
- [Aly Sivji](https://twitter.com/CaiusSivjus)
- Tim Smith

## PyCascades 2020 sprints team

- dat-boris
- [Jacob Deppen](https://github.com/deppen8)
- jvano74
- keturn
- Rhornberger
- tojo13
- [Walt](https://github.com/wadells)

## Other awesome contributors

- Earl Clark
- Leandro Leites
- pwoolvett
- sigmavirus24
84 changes: 50 additions & 34 deletions docs/guides/dev.md
@@ -1,10 +1,10 @@
# Development guide
# Development

## Access the repo
Contributions are welcome from the community on code, tests, docs, and just about anything else.

The `pandas-vet` repo lives in GitHub at [https://github.com/deppen8/pandas-vet](https://github.com/deppen8/pandas-vet).
## Code of Conduct

Clone the repo to get started.
Because this project started during the PyCascades 2019 sprints, we adopt the PyCascades minimal expectation that we "Be excellent to each other". Beyond that, we follow the Python Software Foundation's [Community Code of Conduct](https://www.python.org/psf/codeofconduct/).

## Project management

@@ -20,50 +20,60 @@ The best way to install Hatch on your machine is to use [`pipx`](https://pipxpro
pipx install hatch
```

### Version bumping
## Contribution workflow

To bump the package version, run `hatch version <version_segment>`. For example, to bump the minor version number, run:
1. Please submit an [Issue](https://github.com/deppen8/pandas-vet/issues) (or draft PR) first describing the types of changes you'd like to implement.

```bash
hatch version minor
```
2. Fork the GitHub repository.

This will bump the version number in the `pyproject.toml` file and elsewhere.
3. Create a new branch for your enhancement/fix.

```{note}
You can also bump using the full version string, e.g., `hatch version 0.1.0`. See the full [Hatch versioning documentation](https://hatch.pypa.io/latest/version) for more details.
```
4. Write code, [tests](tests), [docs](documentation), etc. See [How to add a check to the linter](linter-check).

### Build the package
5. We use `pytest`, `flake8`, `isort`, and `black` to test, lint, and format our codebase. You can invoke these with a single Hatch command from the root of the repository.

Building the package is as simple as running `hatch build` from the root of the repo. This will build both sdist and wheel versions of the package and place them in the `/dist` directory.
```bash
hatch run dev:tests
```

```bash
hatch build
```
This command will build the necessary virtual environments, install the package in editable mode, and run the tests, linting, and formatting checks.

```{note}
See the full [Hatch build documentation](https://hatch.pypa.io/latest/build) for more details.
```
These are the same commands that are run in the CI/CD pipeline. See the [Tests](tests) section below for more details.

6. Push your branch to your forked repository.

7. Submit a pull request to the parent repository from your branch. Be sure to write a clear message and reference any Issue # that relates to your pull request.

8. Feel good about giving back to open source projects.

(linter-check)=

### Custom scripts
## How to add a check to the linter

We have a few predefined scripts that are useful for development. These are defined in the `pyproject.toml` file and can be run with `hatch run <env_name>:<script_name>`. For example, to run a combination of `isort`, `black`, and `flake8` you can run:
1. Write tests. At a *minimum*, you should have test cases where the linter should catch "bad" `pandas` and test cases where the linter should allow "good" `pandas`.

2. Write your check function in `/pandas-vet/__init__.py`.

3. Run `hatch run dev:tests` and fix any errors.
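
As a rough illustration of steps 1 and 2, the sketch below uses only the standard-library `ast` module to flag `inplace=True` and pairs it with one "bad" and one "good" test case. `InplaceVisitor`, `check_source`, and `MESSAGE` are hypothetical names chosen for this example; they are not taken from the `pandas-vet` source, and the real check functions may be structured differently.

```python
import ast
from typing import List, Tuple

# Hypothetical constant; the wording is copied from the PD002 warning text.
MESSAGE = "PD002 'inplace = True' should be avoided; it has inconsistent behavior"


class InplaceVisitor(ast.NodeVisitor):
    """Collect (line, column, message) for every call that passes inplace=True."""

    def __init__(self) -> None:
        self.errors: List[Tuple[int, int, str]] = []

    def visit_Call(self, node: ast.Call) -> None:
        for keyword in node.keywords:
            value = keyword.value
            if (
                keyword.arg == "inplace"
                and isinstance(value, ast.Constant)
                and value.value is True
            ):
                self.errors.append((node.lineno, node.col_offset, MESSAGE))
        # Keep walking so that nested calls are checked too.
        self.generic_visit(node)


def check_source(source: str) -> List[Tuple[int, int, str]]:
    """Parse a source string and return the errors found in it."""
    visitor = InplaceVisitor()
    visitor.visit(ast.parse(source))
    return visitor.errors


# One "bad" and one "good" case, as the first step asks for.
BAD = "frame.drop(columns='col_b', inplace=True)\n"
GOOD = "smaller = frame.drop(columns='col_b')\n"

assert check_source(BAD) == [(1, 0, MESSAGE)]
assert check_source(GOOD) == []
```

A real check would also need to be hooked into the plugin so that Flake8 reports it; that wiring is not shown here.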

## Custom scripts

In addition to the `tests` script used in CI/CD, we have a few predefined scripts that are useful for development. These are defined in the `pyproject.toml` file and can be run with `hatch run dev:<script_name>`. For example, to run a combination of `isort`, `black`, and `flake8` you can run:

```bash
hatch run dev:format
```

These are the same scripts that are run in the CI/CD pipeline.

The available custom scripts are:

- `check` - Run `isort`, `black`, and `flake8` without making changes (i.e., a dry-run).
- `format` - Run `isort`, `black`, and `flake8`.
- `tests` - Run the test suite and format.
- `docs` - Build the docs with Jupyter Book.
- `tests` - Run the test suite.

### Testing
(tests)=

## Tests

We use `pytest` for testing. Tests live in the `/tests` directory. The use of [`pytest` fixtures](https://docs.pytest.org/en/stable/explanation/fixtures.html) is encouraged and these should typically be stored in `/tests/conftest.py`, though in some limited cases could be isolated to a particular test module.

@@ -85,21 +95,23 @@ Coverage XML written to file results/coverage.xml
```
````

### Documentation
(documentation)=

## Documentation

The documentation is built with a combination of docstrings in Google docstring format and Jupyter Book deployment. Jupyter Book provides an easy way to combine traditional API docs built with Sphinx, Markdown docs like this page, and Jupyter Notebook guides/tutorials.

#### Docstrings
### Docstrings

Docstrings should use the [Google docstring style](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html). Each class and function should have a docstring.

When the docs are built with Jupyter Book, these Sphinx will detect them and turn them into some nicely-formatted API docs.
When the docs are built with Jupyter Book, Sphinx will detect the docstrings and turn them into some nicely-formatted API docs.
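
For reference, a Google-style docstring looks roughly like the sketch below. The function `is_preferred_alias` is a hypothetical helper written only to show the `Args`, `Returns`, and `Raises` sections; it does not exist in `pandas-vet`.

```python
def is_preferred_alias(module: str, alias: str = "") -> bool:
    """Report whether an import uses the alias that PD001 asks for.

    Args:
        module: Name of the imported module, e.g. ``"pandas"``.
        alias: Name bound by ``import ... as ...``; empty if no alias is used.

    Returns:
        True if ``module`` is not pandas, or if pandas is imported as ``pd``.

    Raises:
        TypeError: If ``module`` is not a string.
    """
    if not isinstance(module, str):
        raise TypeError("module must be a string")
    if module != "pandas":
        return True
    return alias == "pd"
```

The one-line summary comes first, followed by a blank line and then the named sections, each with indented entries.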

#### Jupyter Book
### Jupyter Book

[Jupyter Book](https://jupyterbook.org/intro.html) provides the engine for the docs. Documentation pages can be written in reStructuredText, Markdown, or Jupyter Notebooks. See the Jupyter Book documentation for additional features available.
[Jupyter Book](https://jupyterbook.org/intro.html) provides the engine for the docs. Documentation pages can be written in reStructuredText, Markdown, or Jupyter Notebooks. See the [Jupyter Book documentation](https://jupyterbook.org/intro.html) for additional features available.

#### Build the docs locally
### Build the docs locally

The docs are built and deployed to GitHub Pages as part of the CI/CD pipeline in GitHub Actions. However, if you'd like to examine them locally, you can build them yourself with `hatch run dev:docs`, e.g.,
@@ -159,4 +171,8 @@ Any errors in the build will be logged as part of this output.
## CI/CD
CI/CD is handled by GitHub Actions. Configuration can be found in the `.github/workflows/testing.yml` file.
CI/CD is handled by GitHub Actions. Configuration can be found in the `.github/workflows/` folder.
When a pull request is submitted, the code is tested, linted, and formatted (`hatch run dev:tests`) and the docs are built (`hatch run dev:docs`). If any of these steps fail, the pull request will be marked as failing.
When a pull request is merged into the `main` branch, the code is tested, linted, and formatted (`hatch run dev:tests`), the docs are built (`hatch run dev:docs`), and the docs are deployed to GitHub Pages.