diff --git a/CHANGELOG.md b/CHANGELOG.md index 47c0d56c..db09de64 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,7 +10,7 @@ Scribe-Data tries to follow [semantic versioning](https://semver.org/), a MAJOR. Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). -# [Upcoming] Scribe-Data 3.2.0 +## [Upcoming] Scribe-Data 3.2.0 ### ✨ Features @@ -34,7 +34,7 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). - Tensorflow was removed from the download wiki process to fix build problems on Macs. -# Scribe-Data 3.1.0 +## Scribe-Data 3.1.0 ### ✨ Features @@ -47,7 +47,7 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). - Database output column names are now zero indexed to better align with Python and other language standards. -# Scribe-Data 3.0.0 +## Scribe-Data 3.0.0 ### ✨ Features @@ -79,7 +79,7 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). - The statements in translation files have been fixed as they were improperly defined after a file was moved. -# Scribe-Data 2.2.2 +## Scribe-Data 2.2.2 ### ✨ Features @@ -89,26 +89,26 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). - The export filenames for emoji keywords were renamed to reflect their usage in autosuggestions and soon autocompletions as well. -# Scribe-Data 2.2.1 +## Scribe-Data 2.2.1 ### ✨ Features - The number of suggested emojis for words can now be limited. - The total number of emojis that suggestions can be made for can now be limited. -# Scribe-Data 2.2.0 +## Scribe-Data 2.2.0 ### ✨ Features - Scribe-Data now allows the user to create JSONs of word-emoji key-value pairs ([#24](https://github.com/scribe-org/Scribe-Data/issues/24)). -# Scribe-Data 2.1.0 +## Scribe-Data 2.1.0 ### ✨ Features - Scribe-Data can now split Wikidata queries into multiple stages to break up those that were too large to run ([#21](https://github.com/scribe-org/Scribe-Data/issues/21)). 
-# Scribe-Data 2.0.0 +## Scribe-Data 2.0.0 ### ✨ Features @@ -122,7 +122,7 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). - The error messages for incorrect args in update_data.py have been updated. -# Scribe-Data 1.0.1 +## Scribe-Data 1.0.1 ### ✨ Features @@ -134,7 +134,7 @@ Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/). - Hard coded strings for Spanish formatting files were fixed. - The paths of update_data.py were changed to match the new package structure. -# Scribe-Data 1.0.0 +## Scribe-Data 1.0.0 ### 🚀 Deployment diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9444c027..9f2c6c21 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -10,7 +10,7 @@ If you have questions or would like to communicate with the team, please [join u -# **Contents** +## **Contents** - [First steps as a contributor](#first-steps) - [Learning the tech stack](#learning-the-tech) @@ -67,7 +67,7 @@ Scribe is very open to contributions from people in the early stages of their co -# Development environment [`⇧`](#contents) +## Development environment [`⇧`](#contents) The development environment for Scribe-Data can be installed via the following steps: @@ -114,7 +114,7 @@ git remote add upstream https://github.com/scribe-org/Scibe-Data.git -# Issues and projects [`⇧`](#contents) +## Issues and projects [`⇧`](#contents) The [issue tracker for Scribe-Data](https://github.com/scribe-org/Scribe-Data/issues) is the preferred channel for [bug reports](#bug-reports), [features requests](#feature-requests) and [submitting pull requests](#pull-requests). Scribe also organizes related issues into [projects](https://github.com/scribe-org/Scribe-Data/projects). @@ -125,7 +125,7 @@ Be sure to check the [`-next release-`](https://github.com/scribe-org/Scribe-Dat -# Bug reports [`⇧`](#contents) +## Bug reports [`⇧`](#contents) A bug is a _demonstrable problem_ that is caused by the code in the repository. 
Good bug reports are extremely helpful - thank you! @@ -151,13 +151,13 @@ Again, thank you for your time in reporting issues! -# Feature requests [`⇧`](#contents) +## Feature requests [`⇧`](#contents) Feature requests are more than welcome! Please take a moment to find out whether your idea fits with the scope and aims of the project. When making a suggestion, provide as much detail and context as possible, and further make clear the degree to which you would like to contribute in its development. Feature requests are marked with the [`feature`](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aopen+is%3Aissue+label%3Afeature) label, and can be made using the [feature request](https://github.com/scribe-org/Scribe-Data/issues/new?assignees=&labels=feature&template=feature_request.yml) template. -# Pull requests [`⇧`](#contents) +## Pull requests [`⇧`](#contents) Good pull requests - patches, improvements and new features - are the foundation of our community making Scribe-Data. They should remain focused in scope and avoid containing unrelated commits. Note that all contributions to this project will be made under [the specified license](https://github.com/scribe-org/Scribe-Data/blob/main/LICENSE.txt) and should follow the coding indentation and style standards ([contact us](https://matrix.to/#/#scribe_community:matrix.org) if unsure). @@ -204,7 +204,7 @@ Thank you in advance for your contributions! -# Data edits [`⇧`](#contents) +## Data edits [`⇧`](#contents) > [!NOTE]\ > Please see the [Wikidata and Scribe Guide](https://github.com/scribe-org/Organization/blob/main/WIKIDATAGUIDE.md) for an overview of [Wikidata](https://www.wikidata.org/) and how Scribe uses it. 
@@ -213,6 +213,6 @@ Scribe does not accept direct edits to the grammar JSON files as they are source -# Documentation [`⇧`](#contents) +## Documentation [`⇧`](#contents) Documentation is an invaluable way to contribute to coding projects as it allows others to more easily understand the project structure and contribute. Issues related to documentation are marked with the [`documentation`](https://github.com/scribe-org/Scribe-Data/labels/documentation) label. diff --git a/README.md b/README.md index eea67e2f..b219344e 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,8 @@ Scribe Logo -[![platforms](https://img.shields.io/badge/Wikidata-990000.svg?logo=wikidata&logoColor=ffffff)](https://github.com/scribe-org/Scribe-Data) +[![rtd](https://img.shields.io/readthedocs/scribe-data.svg?logo=read-the-docs)](http://scribe-data.readthedocs.io/en/latest/) +[![platform](https://img.shields.io/badge/Wikidata-990000.svg?logo=wikidata&logoColor=ffffff)](https://github.com/scribe-org/Scribe-Data) [![issues](https://img.shields.io/github/issues/scribe-org/Scribe-Data?label=%20&logo=github)](https://github.com/scribe-org/Scribe-Data/issues) [![language](https://img.shields.io/badge/Python%203-306998.svg?logo=python&logoColor=ffffff)](https://github.com/scribe-org/Scribe-Data/blob/main/CONTRIBUTING.md) [![pypi](https://img.shields.io/pypi/v/scribe-data.svg?label=%20&color=4B8BBE)](https://pypi.org/project/scribe-data/) diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 00000000..92f501f1 --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,19 @@ +# Minimal makefile for Sphinx documentation + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". 
+help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/make.bat b/docs/make.bat new file mode 100644 index 00000000..6247f7e2 --- /dev/null +++ b/docs/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 00000000..e802eebb --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,5 @@ +m2r2 +numpydoc +scribe-data +sphinx<7.0.0 +sphinx_rtd_theme diff --git a/docs/source/_docs_internal/CONTRIBUTING_NO_BACK_LINKS.md b/docs/source/_docs_internal/CONTRIBUTING_NO_BACK_LINKS.md new file mode 100644 index 00000000..53718882 --- /dev/null +++ b/docs/source/_docs_internal/CONTRIBUTING_NO_BACK_LINKS.md @@ -0,0 +1,218 @@ +# Contributing to Scribe-Data + +Thank you for your interest in contributing! + +Please take a moment to review this document in order to make the contribution process easy and effective for everyone involved. 
+ +Following these guidelines helps to communicate that you respect the time of the developers managing and developing this open-source project. In return, and in accordance with this project's [code of conduct](https://github.com/scribe-org/Scribe-Data/blob/main/.github/CODE_OF_CONDUCT.md), other contributors will reciprocate that respect in addressing your issue or assessing changes and features. + +If you have questions or would like to communicate with the team, please [join us in our public Matrix chat rooms](https://matrix.to/#/#scribe_community:matrix.org). We'd be happy to hear from you! + + + +## **Contents** + +- [First steps as a contributor](#first-steps) +- [Learning the tech stack](#learning-the-tech) +- [Development environment](#dev-env) +- [Issues and projects](#issues-projects) +- [Bug reports](#bug-reports) +- [Feature requests](#feature-requests) +- [Pull requests](#pull-requests) +- [Data edits](#data-edits) +- [Documentation](#documentation) + + + +## First steps as a contributor + +Thank you for your interest in contributing to Scribe-Data! We look forward to welcoming you to the community and working with you to build tools for language learners to communicate effectively :) The following are some suggested steps for people interested in joining our community: + +- Please join the [public Matrix chat](https://matrix.to/#/#scribe_community:matrix.org) to connect with the community + - [Matrix](https://matrix.org/) is a network for secure, decentralized communication + - Scribe would suggest that you use the [Element](https://element.io/) client + - The [General](https://matrix.to/#/!yQJjLmluvlkWttNhKo:matrix.org?via=matrix.org) and [Data](https://matrix.to/#/#ScribeData:matrix.org) channels would be great places to start!
+ - Feel free to introduce yourself and tell us what your interests are if you're comfortable :) +- Read through this contributing guide for all the information you need to contribute +- Look into issues marked [`good first issue`](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) and the [Projects board](https://github.com/orgs/scribe-org/projects/1) to get a better understanding of what you can work on +- Check out our [public designs on Figma](https://www.figma.com/file/c8945w2iyoPYVhsqW7vRn6/scribe_public_designs?type=design&node-id=405-464&mode=design&t=E3ccS9Z8MDVSizQ4-0) to understand Scribe's goals and direction +- Consider joining our [bi-weekly developer sync](https://etherpad.wikimedia.org/p/scribe-dev-sync)! + +> [!NOTE] +> Those new to Python or wanting to work on their Python skills are more than welcome to contribute! The team would be happy to help you on your development journey :) + + + +## Learning the tech stack + +Scribe is very open to contributions from people in the early stages of their coding journey! The following is a select list of documentation pages to help you understand the technologies we use. +
Docs for those new to programming +

+ +- [Mozilla Developer Network Learning Area](https://developer.mozilla.org/en-US/docs/Learn) + - Doing MDN sections for HTML, CSS and JavaScript is the best way to get into web development! +

+
+ +
Python learning docs +

+ +- [Python getting started guide](https://docs.python.org/3/tutorial/introduction.html) +- [Python getting started resources](https://www.python.org/about/gettingstarted/) + +

+
+ + + +## Development environment + +The development environment for Scribe-Data can be installed via the following steps: + +1. [Fork](https://docs.github.com/en/get-started/quickstart/fork-a-repo) the [Scribe-Data repo](https://github.com/scribe-org/Scribe-Data), clone your fork, and configure the remotes: + +> [!NOTE] +> +>
Consider using SSH +> +>

+> +> As an alternative to HTTPS as used in the instructions below, consider using SSH to interact with GitHub from the terminal. SSH allows you to connect without a username and password authentication flow. +> +> To run git commands with SSH, remember to substitute the HTTPS URL, `https://github.com/...`, with the SSH one, `git@github.com:...`. +> +> - e.g. Cloning now becomes `git clone git@github.com:/Scribe-Data.git` +> +> GitHub also has documentation on how to [Generate a new SSH key](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) 🔑 +> +>

+>
+ ```bash +# Clone your fork of the repo into the current directory. +git clone https://github.com//Scribe-Data.git +# Navigate to the newly cloned directory. +cd Scribe-Data +# Assign the original repo to a remote called "upstream". +git remote add upstream https://github.com/scribe-org/Scribe-Data.git +``` + +- Now, if you run `git remote -v` you should see two remote repositories named: + - `origin` (forked repository) + - `upstream` (Scribe-Data repository) + +2. Use [Anaconda](https://www.anaconda.com/) to create the local development environment within your Scribe-Data directory: + + ```bash + conda env create -f environment.yml + ``` + +> [!NOTE] +> Feel free to contact the team in the [Data room on Matrix](https://matrix.to/#/#ScribeData:matrix.org) if you're having problems getting your environment set up! + + + +## Issues and projects + +The [issue tracker for Scribe-Data](https://github.com/scribe-org/Scribe-Data/issues) is the preferred channel for [bug reports](#bug-reports), [feature requests](#feature-requests) and [submitting pull requests](#pull-requests). Scribe also organizes related issues into [projects](https://github.com/scribe-org/Scribe-Data/projects). + +> [!NOTE]\ +> Just because an issue is assigned on GitHub doesn't mean that the team isn't interested in your contribution! Feel free to write [in the issues](https://github.com/scribe-org/Scribe-Data/issues) and we can potentially reassign it to you. + +Be sure to check the [`-next release-`](https://github.com/scribe-org/Scribe-Data/labels/-next%20release-) and [`-priority-`](https://github.com/scribe-org/Scribe-Data/labels/-priority-) labels in the [issues](https://github.com/scribe-org/Scribe-Data/issues) for those that are most important, as well as those marked [`good first issue`](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) that are tailored for first-time contributors.
+ + + +## Bug reports + +A bug is a _demonstrable problem_ that is caused by the code in the repository. Good bug reports are extremely helpful - thank you! + +Guidelines for bug reports: + +1. **Use the GitHub issue search** to check if the issue has already been reported. + +2. **Check if the issue has been fixed** by trying to reproduce it using the latest `main` or development branch in the repository. + +3. **Isolate the problem** to make sure that the code in the repository is _definitely_ responsible for the issue. + +**Great Bug Reports** tend to have: + +- A quick summary +- Steps to reproduce +- What you expected would happen +- What actually happens +- Notes (why this might be happening, things tried that didn't work, etc) + +To make the above steps easier, the Scribe team asks that contributors report bugs using the [bug report template](https://github.com/scribe-org/Scribe-Data/issues/new?assignees=&labels=feature&template=bug_report.yml), with these issues further being marked with the [`bug`](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aopen+is%3Aissue+label%3Abug) label. + +Again, thank you for your time in reporting issues! + + + +## Feature requests + +Feature requests are more than welcome! Please take a moment to find out whether your idea fits with the scope and aims of the project. When making a suggestion, provide as much detail and context as possible, and further make clear the degree to which you would like to contribute in its development. Feature requests are marked with the [`feature`](https://github.com/scribe-org/Scribe-Data/issues?q=is%3Aopen+is%3Aissue+label%3Afeature) label, and can be made using the [feature request](https://github.com/scribe-org/Scribe-Data/issues/new?assignees=&labels=feature&template=feature_request.yml) template. + + + +## Pull requests + +Good pull requests - patches, improvements and new features - are the foundation of our community making Scribe-Data. 
They should remain focused in scope and avoid containing unrelated commits. Note that all contributions to this project will be made under [the specified license](https://github.com/scribe-org/Scribe-Data/blob/main/LICENSE.txt) and should follow the coding indentation and style standards ([contact us](https://matrix.to/#/#scribe_community:matrix.org) if unsure). + +**Please ask first** before embarking on any significant pull request (implementing features, refactoring code, etc), otherwise you risk spending a lot of time working on something that the developers might not want to merge into the project. With that being said, major additions are very appreciated! + +When making a contribution, adhering to the [GitHub flow](https://guides.github.com/introduction/flow/index.html) process is the best way to get your work merged: + +1. If you cloned a while ago, get the latest changes from upstream: + + ```bash + git checkout + git pull upstream + ``` + +2. Create a new topic branch (off the main project development branch) to contain your feature, change, or fix: + + ```bash + git checkout -b + ``` + +3. Commit your changes in logical chunks, and please try to adhere to [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/). + +> [!NOTE] +> The following are tools and methods to help you write good commit messages ✨ +> +> - [commitlint](https://commitlint.io/) helps write [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) +> - Git's [interactive rebase](https://docs.github.com/en/github/getting-started-with-github/about-git-rebase) cleans up commits + +4. Locally merge (or rebase) the upstream development branch into your topic branch: + + ```bash + git pull --rebase upstream + ``` + +5. Push your topic branch up to your fork: + + ```bash + git push origin + ``` + +6. [Open a Pull Request](https://help.github.com/articles/using-pull-requests/) with a clear title and description. + +Thank you in advance for your contributions! 
+ + + ## Data edits + +> [!NOTE]\ +> Please see the [Wikidata and Scribe Guide](https://github.com/scribe-org/Organization/blob/main/WIKIDATAGUIDE.md) for an overview of [Wikidata](https://www.wikidata.org/) and how Scribe uses it. + +Scribe does not accept direct edits to the grammar JSON files as they are sourced from [Wikidata](https://www.wikidata.org/). Edits can be discussed and the [Scribe-Data](https://github.com/scribe-org/Scribe-Data) queries will be changed and run before an update. If there is a problem with one of the files, then the fix should be made on [Wikidata](https://www.wikidata.org/) and not on Scribe. Feel free to let us know that edits have been made by [opening an issue](https://github.com/scribe-org/Scribe-Data/issues) and we'll be happy to integrate them! + + + +## Documentation + +Documentation is an invaluable way to contribute to coding projects as it allows others to more easily understand the project structure and contribute. Issues related to documentation are marked with the [`documentation`](https://github.com/scribe-org/Scribe-Data/labels/documentation) label. diff --git a/docs/source/_docs_internal/index.rst b/docs/source/_docs_internal/index.rst new file mode 100644 index 00000000..3298a748 --- /dev/null +++ b/docs/source/_docs_internal/index.rst @@ -0,0 +1,4 @@ +.. toctree:: + :hidden: + + CONTRIBUTING_NO_BACK_LINKS diff --git a/docs/source/_static/index.rst b/docs/source/_static/index.rst new file mode 100644 index 00000000..e69de29b diff --git a/docs/source/checkquery.rst b/docs/source/checkquery.rst new file mode 100644 index 00000000..c935119c --- /dev/null +++ b/docs/source/checkquery.rst @@ -0,0 +1,6 @@ +checkquery +========== + +.. automodule:: scribe_data.checkquery + :members: + :private-members: diff --git a/docs/source/conf.py b/docs/source/conf.py new file mode 100644 index 00000000..92dc06b9 --- /dev/null +++ b/docs/source/conf.py @@ -0,0 +1,177 @@ +# Configuration file for the Sphinx documentation builder.
+# +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. + +import os +import sys + +sys.path.insert(0, os.path.abspath("..")) + +# -- Project information ----------------------------------------------------- + +project = "Scribe-Data" +copyright = "2024, Scribe-Data developers (GPL 3.0 License)" +author = "Scribe-Data developers" + +# The full version, including alpha/beta/rc tags +release = "3.1.0" + + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + "m2r2", + "sphinx.ext.autodoc", + "numpydoc", + "sphinx.ext.viewcode", + "sphinx.ext.imgmath", +] + +numpydoc_show_inherited_class_members = False +numpydoc_show_class_members = False + +# NOT to sort autodoc functions in alphabetical order +autodoc_member_order = "bysource" + +# To avoid installing dependencies when building doc +# https://stackoverflow.com/a/15912502/8729698 +autodoc_mock_imports = [ + "beautifulsoup4", + "emoji", + "langcodes", + "language_data", + "mwparserfromhell", + "pandas", + "PyICU", + "pytest", + "pytest-cov", + "sentencepiece", + "SPARQLWrapper", + "tabulate", + "transformers", +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ["_templates"] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. 
+# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +# source_suffix = ['.rst', '.md'] +source_suffix = ".rst" + +# The master toctree document. +master_doc = "index" + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = "sphinx" + + +# -- Options for HTML output ---------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +import sphinx_rtd_theme + +html_theme = "sphinx_rtd_theme" + +html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +# html_theme_options = {} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ["_static"] + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# This is required for the alabaster theme +# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars +html_sidebars = { + "**": ["relations.html", "searchbox.html"] +} # needs 'show_related': True theme option to display + + +# -- Options for HTMLHelp output ------------------------------------------ + +# Output file base name for HTML help builder. +htmlhelp_basename = "Scribe-Data_doc" + + +# -- Options for LaTeX output --------------------------------------------- + +latex_elements = { + # The paper size ('letterpaper' or 'a4paper'). + # + # 'papersize': 'letterpaper', + # The font size ('10pt', '11pt' or '12pt'). + # + # 'pointsize': '10pt', + # Additional stuff for the LaTeX preamble. 
+ # + # 'preamble': '', + # Latex figure (float) alignment + # + # 'figure_align': 'htbp', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, +# author, documentclass [howto, manual, or own class]). +latex_documents = [ + ( + master_doc, + "Scribe-Data.tex", + "Scribe-Data Documentation", + "scribe-org", + "manual", + ) +] + + +# -- Options for manual page output --------------------------------------- + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +man_pages = [(master_doc, "Scribe-Data", "Scribe-Data Documentation", [author], 1)] + + +# -- Options for Texinfo output ------------------------------------------- + +# Grouping the document tree into Texinfo files. List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +texinfo_documents = [ + ( + master_doc, + "Scribe-Data", + "Scribe-Data Documentation", + author, + "Scribe-Data", + "Wikidata and Wikipedia data extraction for Scribe applications", + "Miscellaneous", + ) +] diff --git a/docs/source/extract_transform/index.rst b/docs/source/extract_transform/index.rst new file mode 100644 index 00000000..053586c5 --- /dev/null +++ b/docs/source/extract_transform/index.rst @@ -0,0 +1,2 @@ +extract_transform +================= diff --git a/docs/source/index.rst b/docs/source/index.rst new file mode 100644 index 00000000..262f9d2e --- /dev/null +++ b/docs/source/index.rst @@ -0,0 +1,78 @@ +.. image:: https://raw.githubusercontent.com/scribe-org/Scribe-Data/main/.github/resources/images/ScribeDataLogo.png + :height: 150 + :align: center + :target: https://github.com/scribe-org/Scribe-Data +============ + +|rtd| |platform| |issues| |language| |pypi| |pypistatus| |license| |coc| |mastodon| |matrix| |codestyle| + +.. 
|rtd| image:: https://img.shields.io/readthedocs/scribe-data.svg?logo=read-the-docs + :target: http://scribe-data.readthedocs.io/en/latest/ + +.. |platform| image:: https://img.shields.io/badge/Wikidata-990000.svg?logo=wikidata&logoColor=ffffff + :target: https://github.com/scribe-org/Scribe-Data + +.. |issues| image:: https://img.shields.io/github/issues/scribe-org/Scribe-Data?label=%20&logo=github + :target: https://github.com/scribe-org/Scribe-Data/issues + +.. |language| image:: https://img.shields.io/badge/Python%203-306998.svg?logo=python&logoColor=ffffff + :target: https://github.com/scribe-org/Scribe-Data/blob/main/CONTRIBUTING.md + +.. |pypi| image:: https://img.shields.io/pypi/v/scribe-data.svg?label=%20&color=4B8BBE + :target: https://pypi.org/project/scribe-data/ + +.. |pypistatus| image:: https://img.shields.io/pypi/status/scribe-data.svg?label=%20 + :target: https://pypi.org/project/scribe-data/ + +.. |license| image:: https://img.shields.io/github/license/scribe-org/Scribe-Data.svg?label=%20 + :target: https://github.com/scribe-org/Scribe-Data/blob/main/LICENSE.txt + +.. |coc| image:: https://img.shields.io/badge/Contributor%20Covenant-ff69b4.svg + :target: https://github.com/scribe-org/Scribe-Data/blob/main/.github/CODE_OF_CONDUCT.md + +.. |mastodon| image:: https://img.shields.io/badge/Mastodon-6364FF.svg?logo=mastodon&logoColor=ffffff + :target: https://wikis.world/@scribe + +.. |matrix| image:: https://img.shields.io/badge/Matrix-000000.svg?logo=matrix&logoColor=ffffff + :target: https://matrix.to/#/#scribe_community:matrix.org + +.. |codestyle| image:: https://img.shields.io/badge/black-000000.svg + :target: https://github.com/psf/black + +Wikidata and Wikipedia data extraction for Scribe applications + +Installation +------------ +.. code-block:: shell + + pip install scribe-data + +.. code-block:: shell + + git clone https://github.com/scribe-org/Scribe-Data.git + cd Scribe-Data + python setup.py install + +.. 
code-block:: python + + import scribe_data + +.. toctree:: + :maxdepth: 2 + :caption: Contents + + extract_transform/index + load/index + checkquery + utils + notes + +.. toctree:: + :hidden: + + _docs_internal/index + +Project Indices +=============== + +* :ref:`genindex` diff --git a/docs/source/load/index.rst b/docs/source/load/index.rst new file mode 100644 index 00000000..74b060d7 --- /dev/null +++ b/docs/source/load/index.rst @@ -0,0 +1,2 @@ +load +==== diff --git a/docs/source/notes.rst b/docs/source/notes.rst new file mode 100644 index 00000000..18cfa461 --- /dev/null +++ b/docs/source/notes.rst @@ -0,0 +1,9 @@ +.. mdinclude:: _docs_internal/CONTRIBUTING_NO_BACK_LINKS.md + +License +======= + +.. literalinclude:: ../../LICENSE.txt + :language: text + +.. mdinclude:: ../../CHANGELOG.md diff --git a/docs/source/utils.rst b/docs/source/utils.rst new file mode 100644 index 00000000..0606fca6 --- /dev/null +++ b/docs/source/utils.rst @@ -0,0 +1,8 @@ +utils +===== + +The :py:mod:`utils` module provides utility functions for data extraction, formatting and loading. + +.. 
automodule:: scribe_data.utils + :members: + :private-members: diff --git a/environment.yml b/environment.yml index 6ea655db..581c9211 100644 --- a/environment.yml +++ b/environment.yml @@ -15,10 +15,10 @@ dependencies: - pip: - black>=23.7.0 - emoji>=2.8.0 + - langcodes>=3.0.0 + - language_data>=1.0.0 - mwparserfromhell>=0.6.5 - PyICU>=2.10.2 # Make sure to fulfill PyICU dependencies, see https://gitlab.pyicu.org/main/pyicu#installing-pyicu - python-dateutil>=2.8.2 - regex>=2023.8.8 - SPARQLWrapper>=2.0.0 - - langcodes>=3.0.0 - - language_data>=1.0.0 diff --git a/requirements.txt b/requirements.txt index 5134ebf9..8a372a93 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,6 +3,8 @@ black>=19.10b0 certifi>=2020.12.5 defusedxml==0.7.1 emoji>=2.2.0 +langcodes>=3.0.0 +language_data>=1.0.0 mwparserfromhell>=0.6 packaging>=20.9 pandas>=1.5.3 @@ -15,5 +17,3 @@ SPARQLWrapper>=2.0.0 tabulate>=0.8.9 tqdm==4.56.1 transformers>=4.12 -langcodes>=3.0.0 -language_data>=1.0.0 diff --git a/src/scribe_data/extract_transform/process_wiki.py b/src/scribe_data/extract_transform/process_wiki.py index fbbe2de2..c29a9cba 100644 --- a/src/scribe_data/extract_transform/process_wiki.py +++ b/src/scribe_data/extract_transform/process_wiki.py @@ -21,7 +21,7 @@ from SPARQLWrapper import JSON, POST, SPARQLWrapper from tqdm.auto import tqdm -from scribe_data.utils import ( # get_android_data_path, get_desktop_data_path, +from scribe_data.utils import ( get_ios_data_path, get_language_qid, get_language_words_to_ignore, diff --git a/src/scribe_data/utils.py b/src/scribe_data/utils.py index e4a13299..b93422eb 100644 --- a/src/scribe_data/utils.py +++ b/src/scribe_data/utils.py @@ -27,8 +27,9 @@ from importlib import resources from pathlib import Path from typing import Any + import langcodes -from langcodes import * +from langcodes import Language PROJECT_ROOT = "Scribe-Data" @@ -164,6 +165,7 @@ def get_language_iso(language: str) -> str: ) from e return iso_code + def 
get_language_from_iso(iso: str) -> str: """ Returns the language name for the given ISO.