A tool to extract plain (unformatted) multilingual text, redirects, links, and categories from Wikipedia backups (dumps). Designed to prepare clean training data for AI / machine learning software.
Updated Nov 11, 2023 - Python
(Atlas-of-Learning-I) (Term-base) Source code and all the data extracted while developing the Atlas of learning model, using the Python extractor developed for PDF and Wikipedia parsing.
A Touhou Wiki parser that returns a list of Touhou Arranges with their Circles and Albums, also available as HTML via GitHub Pages.
Analysis of human typing and keyboard layout efficiency.
A Touhou Wiki parser that returns the "Touhou Puppet Play" Touhoudex with every Touhoumon and their Stats.