Initial set of user docs translated xliff files from Crowdin #17106

Merged (12 commits) on Sep 3, 2024

@michaelDCurran (Member) commented Sep 3, 2024

This PR adds a newly generated changes.xliff for English, which has also been uploaded to Crowdin.

It also updates the user docs GitHub Action to upload the English changes.xliff to Crowdin whenever it has changed.

Finally, it includes the initial set of translated user docs xliff files from Crowdin.
So far that is 20 translations of the user guide and 7 translations of the changes file (What's New).

scons will see that these xliff files are newer than their markdown files, rebuild the markdown files from them, and then build the HTML from the rebuilt markdown files.
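The "newer than" check described above can be pictured with a small sketch. This is only an illustration of a timestamp comparison, not the project's sconstruct logic: the function name `needs_rebuild` is hypothetical, and SCons itself decides what is out of date according to its configured decider.

```python
import os


def needs_rebuild(source_path: str, target_path: str) -> bool:
	"""Return True if the target is missing or older than the source.

	Hypothetical helper: illustrates a timestamp-based check only.
	"""
	if not os.path.exists(target_path):
		# Nothing has been generated yet, so it must be built.
		return True
	# Rebuild when the source (e.g. an xliff file) was modified after the target (e.g. its markdown file).
	return os.path.getmtime(source_path) > os.path.getmtime(target_path)
```

Under this assumption, a freshly downloaded xliff file would have a later modification time than the markdown file beside it, so the markdown would be regenerated.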

coderabbitai bot (Contributor) commented Sep 3, 2024

Walkthrough

The changes introduce a new GitHub Actions workflow for automating the synchronization of English user documentation with translation files. Additionally, a Python script is added to manage markdown translations through XLIFF files, including functionalities for generating, updating, and translating markdown content. Comprehensive unit tests for the translation module are also introduced to validate its functionality.

Changes

| File(s) | Change Summary |
|---|---|
| .github/workflows/regenerate_english_userDocs_translation_source.yml | New workflow to update English user documentation for translation, handling Markdown and XLIFF files. |
| sconstruct | Added functionality to generate Markdown files from localized XLIFF files, excluding English. |
| tests/unit/test_markdownTranslate.py | Introduced unit tests for the markdownTranslate module, validating various translation functionalities. |
| user_docs/markdownTranslate.py | New script for managing markdown translations through XLIFF files, including generation and updating functions. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GitHub Actions
    participant Python Script
    participant Crowdin

    User->>GitHub Actions: Push changes to beta branch
    GitHub Actions->>Python Script: Check modified Markdown files
    Python Script->>Python Script: Update corresponding XLIFF files
    Python Script-->>GitHub Actions: Commit changes to XLIFF
    GitHub Actions->>Crowdin: Upload updated XLIFF file
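
The "check modified Markdown files" step in the diagram can be sketched as a filter over changed paths. This is a rough illustration, not the workflow's code: the function name `xliff_paths_for` is hypothetical, and it assumes each English markdown file has an xliff file with the same base name beside it, as the file names in this PR suggest.

```python
from pathlib import PurePosixPath


def xliff_paths_for(changed_files: list[str], docs_dir: str = "user_docs/en") -> dict[str, str]:
	"""Map each changed English markdown file to its assumed xliff counterpart."""
	result: dict[str, str] = {}
	for name in changed_files:
		path = PurePosixPath(name)
		# Only markdown files directly inside the English docs directory qualify.
		if path.suffix == ".md" and path.parent == PurePosixPath(docs_dir):
			result[name] = str(path.with_suffix(".xliff"))
	return result


changed = ["user_docs/en/changes.md", "source/core.py", "user_docs/fr/userGuide.md"]
print(xliff_paths_for(changed))
```

Files outside the English docs directory, and non-markdown files, are ignored in this sketch.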


@seanbudd seanbudd changed the base branch from master to beta September 3, 2024 03:54
@seanbudd seanbudd added this to the 2024.4 milestone Sep 3, 2024
@michaelDCurran (Member, Author) commented:

Err, this should be for beta.

@michaelDCurran (Member, Author) commented:

Fixed now.

@coderabbitai coderabbitai bot (Contributor) left a comment

Caution

Inline review comments failed to post

Actionable comments posted: 14

Outside diff range, codebase verification and nitpick comments (2)
tests/unit/test_markdownTranslate.py (1)

15-39: LGTM: Well-structured test class setup with a minor suggestion.

The test class is well-organized with appropriate setup and teardown methods. The helper method runMarkdownTranslateCommand is a good practice for running commands and handling errors.

Consider capturing and including the command output in the error message for easier debugging:

	def runMarkdownTranslateCommand(self, description: str, args: list[str]):
		try:
			result = subprocess.run([sys.executable, self.markdownTranslateScriptPath, *args], check=True, capture_output=True, text=True)
		except subprocess.CalledProcessError as e:
			message = f"Failed when trying to {description} with command: {' '.join(args)}\nOutput: {e.output}\nError: {e.stderr}"
			self.fail(message)
user_docs/markdownTranslate.py (1)

50-61: Enhance security by sanitizing filePath in getLastCommitID.

The getLastCommitID function does not sanitize the filePath argument before using it in a subprocess.run call. This poses a potential security risk for command injection. Consider adding validation or sanitization to ensure filePath is safe to use.

  • Location: user_docs/markdownTranslate.py, lines 50-61
Analysis chain

Enhance security in git command execution.

The getLastCommitID function uses subprocess.run to execute a git command. While it uses check=True to raise an exception on non-zero exit codes, it's important to ensure that the filePath argument is properly sanitized to prevent command injection.

Verify that filePath is properly sanitized before being used in the git command:

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if filePath is properly sanitized before being used in git commands
rg -n 'subprocess\.run\(\["git".*\$\{?filePath' user_docs/markdownTranslate.py

Length of output: 80


Script:

#!/bin/bash
# Search for the getLastCommitID function definition and examine the usage of filePath
ast-grep --lang python --pattern $'def getLastCommitID(filePath: $_) {\n  $$$\n}' user_docs/markdownTranslate.py

Length of output: 577


Script:

#!/bin/bash
# Correct the pattern to search for the getLastCommitID function definition and examine the usage of filePath
ast-grep --lang python --pattern 'def getLastCommitID(filePath):' user_docs/markdownTranslate.py -A 10

Length of output: 104


Script:

#!/bin/bash
# Use rg to search for the getLastCommitID function definition in the file
rg -A 10 'def getLastCommitID' user_docs/markdownTranslate.py

Length of output: 480
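
One way to reason about the filePath concern raised above is sketched here. It is not the body of getLastCommitID, which is not shown in this review; the function names and the exact git arguments are assumptions. The sketch relies on two known behaviours: `subprocess.run` given a list does not pass the arguments through a shell, and `--` tells git that everything after it is a path rather than an option.

```python
import subprocess


def build_last_commit_command(file_path: str) -> list[str]:
	"""Build a git command as an argument list (hypothetical helper).

	The path is a single list element, so shell metacharacters in it are not interpreted.
	"""
	return ["git", "log", "-n", "1", "--pretty=format:%H", "--", file_path]


def get_last_commit_id(file_path: str) -> str:
	"""Run the command and return the commit hash of the last commit touching file_path."""
	result = subprocess.run(
		build_last_commit_command(file_path),
		check=True,
		capture_output=True,
		text=True,
	)
	return result.stdout.strip()
```

With this shape, a path such as `-x; rm -rf` reaches git as one literal path argument after `--`.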

Comments failed to post (14)
.github/workflows/regenerate_english_userDocs_translation_source.yml (4)

11-29: Consider using a requirements.txt file for dependency management.

The current setup installs the required Python packages directly in the workflow. While this works, using a requirements.txt file would be a more maintainable approach, especially if the number of dependencies grows in the future.

Consider creating a requirements.txt file in your repository with the following content:

lxml
requests

Then, modify the "Install dependencies" step as follows:

- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt

This approach centralizes dependency management and makes it easier to maintain and update dependencies in the future.


30-53: Enhance error handling and logging in the XLIFF update process.

While the current implementation is functional, it could benefit from improved error handling and more detailed logging. This would make troubleshooting easier if issues arise during the XLIFF update process.

Consider the following enhancements:

  1. Add error handling around the Python script execution:
try {
    python user_docs/markdownTranslate.py updateXliff -x $xliff -m $file -o $tempXliff
    if ($LASTEXITCODE -ne 0) {
        throw "Python script failed with exit code $LASTEXITCODE"
    }
} catch {
    Write-Error "Failed to update XLIFF file: $_"
    exit 1
}
  2. Add more detailed logging:
Write-Host "Starting XLIFF update process for $file"
# ... existing code ...
Write-Host "XLIFF update process completed successfully for $file"
  3. Consider adding a summary at the end of the process:
$updatedFiles = @()
# ... in the foreach loop ...
$updatedFiles += $xliff
# ... after the loop ...
Write-Host "XLIFF update process completed. Updated files: $($updatedFiles -join ', ')"

These changes will provide more visibility into the process and make it easier to identify and resolve any issues that may occur.


55-83: Improve security handling of SSH key.

The current implementation writes the SSH private key to a file, which could potentially be a security risk if the runner is compromised.

Consider using the ssh-agent to manage the SSH key more securely. Here's a suggested improvement:

- name: Set up SSH key
  env:
    SSH_PRIVATE_KEY: ${{ secrets.XLIFF_DEPLOY_PRIVATE_KEY }}
    SSH_AUTH_SOCK: /tmp/ssh_agent.sock
  run: |
    mkdir -p ~/.ssh
    ssh-keyscan github.com >> ~/.ssh/known_hosts
    ssh-agent -a $SSH_AUTH_SOCK > /dev/null
    echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -

- name: Commit and Push changes
  env:
    # Reuse the agent socket from the previous step; the key lives in ssh-agent, so no key file is passed with -i.
    SSH_AUTH_SOCK: /tmp/ssh_agent.sock
    GIT_SSH_COMMAND: "ssh -o StrictHostKeyChecking=no"
  run: |
    # ... rest of your existing script ...

This approach uses ssh-agent to manage the key in memory, reducing the risk of the key being exposed on disk.

Additionally, consider using a GitHub Action like webfactory/ssh-agent which handles SSH key setup securely:

- uses: webfactory/ssh-agent@v0.5.0
  with:
    ssh-private-key: ${{ secrets.XLIFF_DEPLOY_PRIVATE_KEY }}

This would replace the manual SSH key setup and provide a more secure, tested solution.


85-103: Enhance Crowdin upload process.

While the current implementation works, it could benefit from improved error handling and more flexibility.

Consider the following improvements:

  1. Add error handling for the Python script execution:
try {
    python appVeyor/crowdinSync.py uploadSourceFile 18 user_docs/en/userguide.xliff
    if ($LASTEXITCODE -ne 0) {
        throw "Crowdin upload failed with exit code $LASTEXITCODE"
    }
    Write-Host "Successfully uploaded userGuide.xliff to Crowdin"
} catch {
    Write-Error "Failed to upload to Crowdin: $_"
    exit 1
}
  2. Make the file ID configurable:

Instead of hardcoding the file ID (18), consider storing it in a GitHub secret or variable:

env:
  crowdinProjectID: ${{ vars.CROWDIN_PROJECT_ID }}
  crowdinAuthToken: ${{ secrets.CROWDIN_AUTH_TOKEN }}
  crowdinUserGuideFileID: ${{ vars.CROWDIN_USERGUIDE_FILE_ID }}

Then use this variable in your script:

python appVeyor/crowdinSync.py uploadSourceFile $env:crowdinUserGuideFileID user_docs/en/userguide.xliff
  3. Consider adding a mechanism to upload other XLIFF files if needed in the future:
$xliffFiles = @{
    "userGuide.xliff" = $env:crowdinUserGuideFileID
    # Add more files here as needed
}

foreach ($file in $xliffFiles.Keys) {
    $changed = git diff --name-only ${{GITHUB.SHA}}.. -- "user_docs/en/$file"
    if ($changed) {
        Write-Host "Uploading $file to Crowdin"
        python appVeyor/crowdinSync.py uploadSourceFile $xliffFiles[$file] "user_docs/en/$file"
    } else {
        Write-Host "Not uploading $file to Crowdin as it has not changed"
    }
}

These changes will make the Crowdin upload process more robust, flexible, and easier to maintain as the project grows.

tests/unit/test_markdownTranslate.py (1)

41-135: LGTM: Comprehensive test method with suggestions for improvement.

The test_markdownTranslate method covers various functionalities of the markdownTranslate script, including generating XLIFF files, regenerating markdown files, updating XLIFF files, and translating to French. Each step is followed by a verification, which is a good practice.

Consider the following improvements for better readability and maintainability:

  1. Break down the large test method into smaller, focused test methods. This will make it easier to identify which specific functionality fails if a test doesn't pass.

  2. Use parameterized tests to reduce code duplication for similar test cases.

  3. Create helper methods for common operations, such as file path creation.

Here's an example of how you could refactor a part of the test:

import unittest
from parameterized import parameterized

class TestMarkdownTranslate(unittest.TestCase):
    # ... (existing setup code) ...

    def _get_file_path(self, filename):
        return os.path.join(self.outDir.name if filename.startswith("rebuilt_") else self.testDir, filename)

    @parameterized.expand([
        ("2024.2", "en_2024.2_userGuide"),
        ("2024.3beta6", "en_2024.3beta6_userGuide"),
    ])
    def test_generate_and_verify_markdown(self, version, file_prefix):
        xliff_file = f"{file_prefix}.xliff"
        md_file = f"{file_prefix}.md"
        rebuilt_md_file = f"rebuilt_{md_file}"

        self.runMarkdownTranslateCommand(
            f"Generate an xliff file from the English {version} user guide markdown file",
            ["generateXliff", "-m", self._get_file_path(md_file), "-o", self._get_file_path(xliff_file)],
        )

        self.runMarkdownTranslateCommand(
            f"Regenerate the {version} markdown file from the generated {version} xliff file",
            ["generateMarkdown", "-x", self._get_file_path(xliff_file), "-o", self._get_file_path(rebuilt_md_file), "-u"],
        )

        self.runMarkdownTranslateCommand(
            f"Ensure the regenerated {version} markdown file matches the original {version} markdown file",
            ["ensureMarkdownFilesMatch", self._get_file_path(rebuilt_md_file), self._get_file_path(md_file)],
        )

    # ... (other test methods) ...

This refactoring improves readability, reduces duplication, and makes it easier to add new test cases in the future.
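
If adding the third-party parameterized package is not desirable, the standard library's `subTest` offers a similar effect. The following is a minimal sketch under that assumption; `xliff_name` and `TestFileNames` are hypothetical names used only to show the pattern, and the version strings are the ones from the example above.

```python
import unittest


def xliff_name(file_prefix: str) -> str:
	"""Hypothetical helper: derive the xliff file name from a file prefix."""
	return f"{file_prefix}.xliff"


class TestFileNames(unittest.TestCase):
	def test_xliff_names(self):
		cases = [
			("2024.2", "en_2024.2_userGuide"),
			("2024.3beta6", "en_2024.3beta6_userGuide"),
		]
		for version, prefix in cases:
			# Each case is reported separately on failure, without stopping the loop.
			with self.subTest(version=version):
				self.assertEqual(xliff_name(prefix), f"en_{version}_userGuide.xliff")
```

The trade-off is that all cases still count as a single test method in the run summary, whereas parameterized generates one test per case.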

user_docs/markdownTranslate.py (9)

544-649: Consider organizing CLI setup into a separate function.

The command-line interface setup is quite lengthy and could benefit from being organized into a separate function for better readability and maintainability.

Consider refactoring the CLI setup into a separate function:

+def setup_cli():
+    mainParser = argparse.ArgumentParser()
+    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)
+    
+    # Setup for generateXliff
+    generateXliffParser = commandParser.add_parser("generateXliff")
+    generateXliffParser.add_argument(
+        "-m",
+        "--markdown",
+        dest="md",
+        type=str,
+        required=True,
+        help="The markdown file to generate the xliff file for",
+    )
+    generateXliffParser.add_argument(
+        "-o", "--output", dest="output", type=str, required=True, help="The file to output the xliff file to"
+    )
+    
+    # Setup for other commands...
+    
+    return mainParser

 if __name__ == "__main__":
-    mainParser = argparse.ArgumentParser()
-    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)
-    # ... (rest of the CLI setup)
+    parser = setup_cli()
+    args = parser.parse_args()
     
-    args = mainParser.parse_args()
     match args.command:
         case "generateXliff":
             generateXliff(mdPath=args.md, outputPath=args.output)
         # ... (rest of the command handling)

Committable suggestion was skipped due to low confidence.
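
A side benefit of moving parser construction into a function is that it can be exercised without running the script. The sketch below is a cut-down, self-contained version covering only the generateXliff command from the diff above; `build_parser` is a hypothetical name and the sample file names are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
	"""Build a parser with a single sub-command (illustrative subset only)."""
	parser = argparse.ArgumentParser()
	commands = parser.add_subparsers(title="commands", dest="command", required=True)
	generate = commands.add_parser("generateXliff")
	generate.add_argument("-m", "--markdown", dest="md", type=str, required=True)
	generate.add_argument("-o", "--output", dest="output", type=str, required=True)
	return parser


# Parse an explicit argument list instead of sys.argv, as a unit test might.
args = build_parser().parse_args(["generateXliff", "-m", "userGuide.md", "-o", "userGuide.xliff"])
print(args.command, args.md, args.output)
```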


38-47: Consider adding error handling for file operations.

The createAndDeleteTempFilePath_contextManager function is well-implemented as a context manager. However, it might be beneficial to add error handling for file creation and deletion operations.

Consider wrapping the file operations in try-except blocks to handle potential IOErrors:

 @contextlib.contextmanager
 def createAndDeleteTempFilePath_contextManager(
 	dir: str | None = None, prefix: str | None = None, suffix: str | None = None
 ) -> Generator[str, None, None]:
 	"""A context manager that creates a temporary file and deletes it when the context is exited"""
 	with tempfile.NamedTemporaryFile(dir=dir, prefix=prefix, suffix=suffix, delete=False) as tempFile:
 		tempFilePath = tempFile.name
 		tempFile.close()
 		yield tempFilePath
+		try:
 			os.remove(tempFilePath)
+		except OSError as e:
+			print(f"Error deleting temporary file {tempFilePath}: {e}")

Committable suggestion was skipped due to low confidence.


397-442: Consider adding a progress bar for better user feedback.

The generateMarkdown function processes potentially large files. Adding a progress bar could provide better feedback to the user.

Consider using the tqdm library to add a progress bar:

+from tqdm import tqdm
+
 def generateMarkdown(xliffPath: str, outputPath: str, translated: bool = True) -> Result_generateMarkdown:
 	print(f"Generating markdown file {prettyPathString(outputPath)} from {prettyPathString(xliffPath)}...")
 	res = Result_generateMarkdown()
 	with contextlib.ExitStack() as stack:
 		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
 		xliff = lxml.etree.parse(xliffPath)
 		xliffRoot = xliff.getroot()
 		namespace = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}
 		if xliffRoot.tag != "{urn:oasis:names:tc:xliff:document:2.0}xliff":
 			raise ValueError("Not an xliff file")
 		skeletonNode = xliffRoot.find("./xliff:file/xliff:skeleton", namespaces=namespace)
 		if skeletonNode is None:
 			raise ValueError("No skeleton found in xliff file")
 		skeletonContent = xmlUnescape(skeletonNode.text).strip()
+		total_lines = len(skeletonContent.splitlines())
+		pbar = tqdm(total=total_lines, desc="Generating Markdown")
 		for line in skeletonContent.splitlines(keepends=True):
 			res.numTotalLines += 1
 			if m := re_translationID.match(line):
 				prefix, ID, suffix = m.groups()
 				res.numTranslatableStrings += 1
 				unit = xliffRoot.find(f'./xliff:file/xliff:unit[@id="{ID}"]', namespaces=namespace)
 				if unit is not None:
 					segment = unit.find("./xliff:segment", namespaces=namespace)
 					if segment is not None:
 						source = segment.find("./xliff:source", namespaces=namespace)
 						if translated:
 							target = segment.find("./xliff:target", namespaces=namespace)
 						else:
 							target = None
 						if target is not None and target.text:
 							res.numTranslatedStrings += 1
 							translation = xmlUnescape(target.text)
 						elif source is not None and source.text:
 							translation = xmlUnescape(source.text)
 						else:
 							raise ValueError(f"No source or target found for unit {ID}")
 					else:
 						raise ValueError(f"No segment found for unit {ID}")
 				else:
 					raise ValueError(f"Cannot locate Unit {ID} in xliff file")
 				outputFile.write(f"{prefix}{translation}{suffix}\n")
 			else:
 				outputFile.write(line)
+			pbar.update(1)
+		pbar.close()
 		print(
 			f"Generated markdown file with {res.numTotalLines} total lines, {res.numTranslatableStrings} translatable strings, and {res.numTranslatedStrings} translated strings"
 		)
 		return res

Committable suggestion was skipped due to low confidence.


75-80: Consider adding error handling for file path operations.

The getRawGithubURLForPath function combines multiple operations. It might be beneficial to add error handling for cases where the file path is invalid or not within the git repository.

Consider adding a try-except block to handle potential OSError or ValueError exceptions:

 def getRawGithubURLForPath(filePath: str) -> str:
-	gitDirPath = getGitDir()
-	commitID = getLastCommitID(filePath)
-	relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
-	relativePath = relativePath.replace("\\", "/")
-	return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
+	try:
+		gitDirPath = getGitDir()
+		commitID = getLastCommitID(filePath)
+		relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
+		relativePath = relativePath.replace("\\", "/")
+		return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
+	except (OSError, ValueError) as e:
+		raise ValueError(f"Error generating GitHub URL for {filePath}: {e}")
Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def getRawGithubURLForPath(filePath: str) -> str:
	try:
		gitDirPath = getGitDir()
		commitID = getLastCommitID(filePath)
		relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
		relativePath = relativePath.replace("\\", "/")
		return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
	except (OSError, ValueError) as e:
		raise ValueError(f"Error generating GitHub URL for {filePath}: {e}")
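
When re-raising as in the suggestion above, `raise ... from e` keeps the original exception attached as the cause, which helps when reading tracebacks. The sketch below isolates the path-to-URL step so it can run without git; `build_raw_url` and its parameters are hypothetical, and the base URL in the usage lines is a placeholder.

```python
import os


def build_raw_url(base_url: str, commit_id: str, repo_root: str, file_path: str) -> str:
	"""Join a base URL, a commit ID and a repository-relative path (illustrative only)."""
	try:
		relative = os.path.relpath(os.path.abspath(file_path), repo_root)
	except ValueError as e:
		# On Windows, relpath raises ValueError when the two paths are on different drives.
		raise ValueError(f"Error generating URL for {file_path}: {e}") from e
	# URLs always use forward slashes, whatever the local separator is.
	return f"{base_url}/{commit_id}/{relative.replace(os.sep, '/')}"


root = os.path.abspath("repo")
md = os.path.join(root, "user_docs", "en", "userGuide.md")
print(build_raw_url("https://example.invalid/raw", "abc123", root, md))
```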

650-673: Consider adding global error handling and logging.

The main execution block could benefit from a global try-except block to catch and log any unexpected errors that might occur during command execution.

Consider adding a global try-except block and logging:

+import logging
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+
 if __name__ == "__main__":
-    match args.command:
-        case "generateXliff":
-            generateXliff(mdPath=args.md, outputPath=args.output)
-        case "updateXliff":
-            updateXliff(
-                xliffPath=args.xliff,
-                mdPath=args.md,
-                outputPath=args.output,
-            )
-        case "generateMarkdown":
-            generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
-        case "translateXliff":
-            translateXliff(
-                xliffPath=args.xliff,
-                lang=args.lang,
-                pretranslatedMdPath=args.pretranslatedMd,
-                outputPath=args.output,
-            )
-        case "pretranslateLangs":
-            pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
-        case "ensureMarkdownFilesMatch":
-            ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
-        case _:
-            raise ValueError(f"Unknown command: {args.command}")
+    try:
+        match args.command:
+            case "generateXliff":
+                generateXliff(mdPath=args.md, outputPath=args.output)
+            case "updateXliff":
+                updateXliff(
+                    xliffPath=args.xliff,
+                    mdPath=args.md,
+                    outputPath=args.output,
+                )
+            case "generateMarkdown":
+                generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
+            case "translateXliff":
+                translateXliff(
+                    xliffPath=args.xliff,
+                    lang=args.lang,
+                    pretranslatedMdPath=args.pretranslatedMd,
+                    outputPath=args.output,
+                )
+            case "pretranslateLangs":
+                pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
+            case "ensureMarkdownFilesMatch":
+                ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
+            case _:
+                raise ValueError(f"Unknown command: {args.command}")
+    except Exception as e:
+        logging.error(f"An error occurred: {e}")
+        raise
Committable suggestion

import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

if __name__ == "__main__":
	try:
		match args.command:
			case "generateXliff":
				generateXliff(mdPath=args.md, outputPath=args.output)
			case "updateXliff":
				updateXliff(
					xliffPath=args.xliff,
					mdPath=args.md,
					outputPath=args.output,
				)
			case "generateMarkdown":
				generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
			case "translateXliff":
				translateXliff(
					xliffPath=args.xliff,
					lang=args.lang,
					pretranslatedMdPath=args.pretranslatedMd,
					outputPath=args.output,
				)
			case "pretranslateLangs":
				pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
			case "ensureMarkdownFilesMatch":
				ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
			case _:
				raise ValueError(f"Unknown command: {args.command}")
	except Exception as e:
		logging.error(f"An error occurred: {e}")
		raise
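
A variation on the global handler, offered as a sketch rather than as the script's design: `logger.exception` records the full traceback along with the message, and returning an exit code lets the caller decide how to terminate. The wrapper name `run_command` and the logger name are assumptions.

```python
import logging

logger = logging.getLogger("markdownTranslate")


def run_command(command, *args, **kwargs) -> int:
	"""Call command and return 0 on success, or log the traceback and return 1 on failure."""
	try:
		command(*args, **kwargs)
	except Exception:
		# logger.exception logs at ERROR level and appends the current traceback.
		logger.exception("Command %s failed", getattr(command, "__name__", command))
		return 1
	return 0
```

A script could then end with `sys.exit(run_command(...))` so that failures produce a non-zero exit status for the calling workflow.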

117-135: Consider adding more detailed logging.

The generateSkeleton function could benefit from more detailed logging, especially for larger files where the process might take some time.

Consider adding more granular logging:

 def generateSkeleton(mdPath: str, outputPath: str) -> Result_generateSkeleton:
 	print(f"Generating skeleton file {prettyPathString(outputPath)} from {prettyPathString(mdPath)}...")
 	res = Result_generateSkeleton()
 	with (
 		open(mdPath, "r", encoding="utf8") as mdFile,
 		open(outputPath, "w", encoding="utf8", newline="") as outputFile,
 	):
+		total_lines = sum(1 for _ in mdFile)
+		mdFile.seek(0)
 		for mdLine in mdFile.readlines():
 			res.numTotalLines += 1
 			skelLine = skeletonizeLine(mdLine)
 			if skelLine:
 				res.numTranslationPlaceholders += 1
 			else:
 				skelLine = mdLine
 			outputFile.write(skelLine)
+			if res.numTotalLines % 1000 == 0:
+				print(f"Processed {res.numTotalLines}/{total_lines} lines...")
 		print(
 			f"Generated skeleton file with {res.numTotalLines} total lines and {res.numTranslationPlaceholders} translation placeholders"
 		)
 		return res

Committable suggestion was skipped due to low confidence.


165-209: Consider breaking down the updateSkeleton function.

The updateSkeleton function is quite long and complex. Consider breaking it down into smaller, more manageable functions to improve readability and maintainability.

Consider extracting the main loop into a separate function:

 def updateSkeleton(
 	origMdPath: str, newMdPath: str, origSkelPath: str, outputPath: str
 ) -> Result_updateSkeleton:
 	print(
 		f"Creating updated skeleton file {prettyPathString(outputPath)} from {prettyPathString(origSkelPath)} with changes from {prettyPathString(origMdPath)} to {prettyPathString(newMdPath)}..."
 	)
 	res = Result_updateSkeleton()
 	with contextlib.ExitStack() as stack:
 		origMdFile = stack.enter_context(open(origMdPath, "r", encoding="utf8"))
 		newMdFile = stack.enter_context(open(newMdPath, "r", encoding="utf8"))
 		origSkelFile = stack.enter_context(open(origSkelPath, "r", encoding="utf8"))
 		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
 		mdDiff = difflib.ndiff(origMdFile.readlines(), newMdFile.readlines())
 		origSkelLines = iter(origSkelFile.readlines())
+		res = process_diff_lines(mdDiff, origSkelLines, outputFile)
+		print(
+			f"Updated skeleton file with {res.numAddedLines} added lines "
+			f"({res.numAddedTranslationPlaceholders} translation placeholders), "
+			f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
+			f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
+		)
+		return res
+
+def process_diff_lines(mdDiff, origSkelLines, outputFile) -> Result_updateSkeleton:
+	res = Result_updateSkeleton()
 		for mdDiffLine in mdDiff:
 			if mdDiffLine.startswith("?"):
 				continue
 			if mdDiffLine.startswith(" "):
 				res.numUnchangedLines += 1
 				skelLine = next(origSkelLines)
 				if re_translationID.match(skelLine):
 					res.numUnchangedTranslationPlaceholders += 1
 				outputFile.write(skelLine)
 			elif mdDiffLine.startswith("+"):
 				res.numAddedLines += 1
 				skelLine = skeletonizeLine(mdDiffLine[2:])
 				if skelLine:
 					res.numAddedTranslationPlaceholders += 1
 				else:
 					skelLine = mdDiffLine[2:]
 				outputFile.write(skelLine)
 			elif mdDiffLine.startswith("-"):
 				res.numRemovedLines += 1
 				origSkelLine = next(origSkelLines)
 				if re_translationID.match(origSkelLine):
 					res.numRemovedTranslationPlaceholders += 1
 			else:
 				raise ValueError(f"Unexpected diff line: {mdDiffLine}")
-		print(
-			f"Updated skeleton file with {res.numAddedLines} added lines "
-			f"({res.numAddedTranslationPlaceholders} translation placeholders), "
-			f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
-			f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
-		)
 		return res
Committable suggestion

def updateSkeleton(
	origMdPath: str, newMdPath: str, origSkelPath: str, outputPath: str
) -> Result_updateSkeleton:
	print(
		f"Creating updated skeleton file {prettyPathString(outputPath)} from {prettyPathString(origSkelPath)} with changes from {prettyPathString(origMdPath)} to {prettyPathString(newMdPath)}..."
	)
	res = Result_updateSkeleton()
	with contextlib.ExitStack() as stack:
		origMdFile = stack.enter_context(open(origMdPath, "r", encoding="utf8"))
		newMdFile = stack.enter_context(open(newMdPath, "r", encoding="utf8"))
		origSkelFile = stack.enter_context(open(origSkelPath, "r", encoding="utf8"))
		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
		mdDiff = difflib.ndiff(origMdFile.readlines(), newMdFile.readlines())
		origSkelLines = iter(origSkelFile.readlines())
		res = process_diff_lines(mdDiff, origSkelLines, outputFile)
		print(
			f"Updated skeleton file with {res.numAddedLines} added lines "
			f"({res.numAddedTranslationPlaceholders} translation placeholders), "
			f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
			f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
		)
		return res

def process_diff_lines(mdDiff, origSkelLines, outputFile) -> Result_updateSkeleton:
	res = Result_updateSkeleton()
	for mdDiffLine in mdDiff:
		if mdDiffLine.startswith("?"):
			continue
		if mdDiffLine.startswith(" "):
			res.numUnchangedLines += 1
			skelLine = next(origSkelLines)
			if re_translationID.match(skelLine):
				res.numUnchangedTranslationPlaceholders += 1
			outputFile.write(skelLine)
		elif mdDiffLine.startswith("+"):
			res.numAddedLines += 1
			skelLine = skeletonizeLine(mdDiffLine[2:])
			if skelLine:
				res.numAddedTranslationPlaceholders += 1
			else:
				skelLine = mdDiffLine[2:]
			outputFile.write(skelLine)
		elif mdDiffLine.startswith("-"):
			res.numRemovedLines += 1
			origSkelLine = next(origSkelLines)
			if re_translationID.match(origSkelLine):
				res.numRemovedTranslationPlaceholders += 1
		else:
			raise ValueError(f"Unexpected diff line: {mdDiffLine}")
	return res
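
For reference, both versions of the loop above depend on the two-character prefixes that `difflib.ndiff` puts on every output line: two spaces for unchanged, `+ ` for added, `- ` for removed, and `? ` for hint lines that belong to neither file. The following is a minimal standalone sketch of that dispatch, not the project's code; `classifyDiffLines` is a hypothetical helper name used only for illustration.

```python
import difflib


def classifyDiffLines(origLines: list[str], newLines: list[str]) -> dict[str, int]:
	"""Count ndiff output lines by their two-character prefix."""
	counts = {"unchanged": 0, "added": 0, "removed": 0}
	for diffLine in difflib.ndiff(origLines, newLines):
		if diffLine.startswith("? "):
			# Intraline hint; it does not correspond to a line in either input.
			continue
		if diffLine.startswith("  "):
			counts["unchanged"] += 1
		elif diffLine.startswith("+ "):
			counts["added"] += 1
		elif diffLine.startswith("- "):
			counts["removed"] += 1
		else:
			raise ValueError(f"Unexpected diff line: {diffLine!r}")
	return counts


# One line removed ("b"), one line added ("d"), two lines kept.
counts = classifyDiffLines(["a\n", "b\n", "c\n"], ["a\n", "c\n", "d\n"])
print(counts)
```

The payload of each diff line starts at index 2, which is why the code under review slices with `mdDiffLine[2:]`.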

217-326: Consider adding progress logging for long-running operations.

The generateXliff function might benefit from progress logging, especially when processing large files.

Consider adding progress logging:

 def generateXliff(
 	mdPath: str,
 	outputPath: str,
 	skelPath: str | None = None,
 ) -> Result_generateXliff:
 	# If a skeleton file is not provided, first generate one
 	with contextlib.ExitStack() as stack:
 		if not skelPath:
 			skelPath = stack.enter_context(
 				createAndDeleteTempFilePath_contextManager(
 					dir=os.path.dirname(outputPath),
 					prefix=os.path.basename(mdPath),
 					suffix=".skel",
 				)
 			)
 			generateSkeleton(mdPath=mdPath, outputPath=skelPath)
 		with open(skelPath, "r", encoding="utf8") as skelFile:
 			skelContent = skelFile.read()
 	res = Result_generateXliff()
 	print(
 		f"Generating xliff file {prettyPathString(outputPath)} from {prettyPathString(mdPath)} and {prettyPathString(skelPath)}..."
 	)
 	with contextlib.ExitStack() as stack:
 		mdFile = stack.enter_context(open(mdPath, "r", encoding="utf8"))
 		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
 		fileID = os.path.basename(mdPath)
 		mdUri = getRawGithubURLForPath(mdPath)
 		print(f"Including Github raw URL: {mdUri}")
 		outputFile.write(
 			'<?xml version="1.0"?>\n'
 			f'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
 			f'<file id="{fileID}" original="{mdUri}">\n'
 		)
 		outputFile.write(f"<skeleton>\n{xmlEscape(skelContent)}\n</skeleton>\n")
 		res.numTranslatableStrings = 0
+		total_lines = sum(1 for _ in mdFile)
+		mdFile.seek(0)
 		for lineNo, (mdLine, skelLine) in enumerate(
 			zip_longest(mdFile.readlines(), skelContent.splitlines(keepends=True)), start=1
 		):
 			mdLine = mdLine.rstrip()
 			skelLine = skelLine.rstrip()
 			if m := re_translationID.match(skelLine):
 				res.numTranslatableStrings += 1
 				prefix, ID, suffix = m.groups()
 				if prefix and not mdLine.startswith(prefix):
 					raise ValueError(f'Line {lineNo}: does not start with "{prefix}", {mdLine=}, {skelLine=}')
 				if suffix and not mdLine.endswith(suffix):
 					raise ValueError(f'Line {lineNo}: does not end with "{suffix}", {mdLine=}, {skelLine=}')
 				source = mdLine[len(prefix) : len(mdLine) - len(suffix)]
 				outputFile.write(
 					f'<unit id="{ID}">\n' "<notes>\n" f'<note appliesTo="source">line: {lineNo + 1}</note>\n'
 				)
 				if prefix:
 					outputFile.write(f'<note appliesTo="source">prefix: {xmlEscape(prefix)}</note>\n')
 				if suffix:
 					outputFile.write(f'<note appliesTo="source">suffix: {xmlEscape(suffix)}</note>\n')
 				outputFile.write(
 					"</notes>\n"
 					f"<segment>\n"
 					f"<source>{xmlEscape(source)}</source>\n"
 					"</segment>\n"
 					"</unit>\n"
 				)
 			else:
 				if mdLine != skelLine:
 					raise ValueError(f"Line {lineNo}: {mdLine=} does not match {skelLine=}")
+			if lineNo % 1000 == 0:
+				print(f"Processed {lineNo}/{total_lines} lines...")
 		outputFile.write("</file>\n" "</xliff>")
 		print(f"Generated xliff file with {res.numTranslatableStrings} translatable strings")
 		return res
Committable suggestion


def generateXliff(
	mdPath: str,
	outputPath: str,
	skelPath: str | None = None,
) -> Result_generateXliff:
	# If a skeleton file is not provided, first generate one
	with contextlib.ExitStack() as stack:
		if not skelPath:
			skelPath = stack.enter_context(
				createAndDeleteTempFilePath_contextManager(
					dir=os.path.dirname(outputPath),
					prefix=os.path.basename(mdPath),
					suffix=".skel",
				)
			)
			generateSkeleton(mdPath=mdPath, outputPath=skelPath)
		with open(skelPath, "r", encoding="utf8") as skelFile:
			skelContent = skelFile.read()
	res = Result_generateXliff()
	print(
		f"Generating xliff file {prettyPathString(outputPath)} from {prettyPathString(mdPath)} and {prettyPathString(skelPath)}..."
	)
	with contextlib.ExitStack() as stack:
		mdFile = stack.enter_context(open(mdPath, "r", encoding="utf8"))
		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
		fileID = os.path.basename(mdPath)
		mdUri = getRawGithubURLForPath(mdPath)
		print(f"Including Github raw URL: {mdUri}")
		outputFile.write(
			'<?xml version="1.0"?>\n'
			f'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
			f'<file id="{fileID}" original="{mdUri}">\n'
		)
		outputFile.write(f"<skeleton>\n{xmlEscape(skelContent)}\n</skeleton>\n")
		res.numTranslatableStrings = 0
		total_lines = sum(1 for _ in mdFile)
		mdFile.seek(0)
		for lineNo, (mdLine, skelLine) in enumerate(
			zip_longest(mdFile.readlines(), skelContent.splitlines(keepends=True)), start=1
		):
			mdLine = mdLine.rstrip()
			skelLine = skelLine.rstrip()
			if m := re_translationID.match(skelLine):
				res.numTranslatableStrings += 1
				prefix, ID, suffix = m.groups()
				if prefix and not mdLine.startswith(prefix):
					raise ValueError(f'Line {lineNo}: does not start with "{prefix}", {mdLine=}, {skelLine=}')
				if suffix and not mdLine.endswith(suffix):
					raise ValueError(f'Line {lineNo}: does not end with "{suffix}", {mdLine=}, {skelLine=}')
				source = mdLine[len(prefix) : len(mdLine) - len(suffix)]
				outputFile.write(
					f'<unit id="{ID}">\n' "<notes>\n" f'<note appliesTo="source">line: {lineNo + 1}</note>\n'
				)
				if prefix:
					outputFile.write(f'<note appliesTo="source">prefix: {xmlEscape(prefix)}</note>\n')
				if suffix:
					outputFile.write(f'<note appliesTo="source">suffix: {xmlEscape(suffix)}</note>\n')
				outputFile.write(
					"</notes>\n"
					f"<segment>\n"
					f"<source>{xmlEscape(source)}</source>\n"
					"</segment>\n"
					"</unit>\n"
				)
			else:
				if mdLine != skelLine:
					raise ValueError(f"Line {lineNo}: {mdLine=} does not match {skelLine=}")
			if lineNo % 1000 == 0:
				print(f"Processed {lineNo}/{total_lines} lines...")
		outputFile.write("</file>\n" "</xliff>")
		print(f"Generated xliff file with {res.numTranslatableStrings} translatable strings")
		return res
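
Note that the suggestion above iterates over `mdFile` once to count lines, seeks back to the start, and then reads it again with `readlines()`. A possible alternative, sketched below under the assumption that the whole file fits in memory (which `readlines()` already requires), is to read the lines once and take the total from the list length. `iterWithProgress` and its `report` parameter are hypothetical names, not part of the module under review.

```python
from typing import Callable, Iterator


def iterWithProgress(
	lines: list[str],
	every: int = 1000,
	report: Callable[[str], None] = print,
) -> Iterator[tuple[int, str]]:
	"""Yield (lineNo, line) pairs, reporting progress after every `every` lines."""
	total = len(lines)  # known up front, so no second pass over the file
	for lineNo, line in enumerate(lines, start=1):
		yield lineNo, line
		if lineNo % every == 0:
			report(f"Processed {lineNo}/{total} lines...")


# Small demonstration: collect the messages instead of printing them.
messages: list[str] = []
seen = [lineNo for lineNo, _ in iterWithProgress(["x\n"] * 5, every=2, report=messages.append)]
```

Passing the reporting callable in as a parameter keeps the helper easy to unit test and leaves the choice between `print` and a logger to the caller.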

328-387: Consider adding input validation for the lang parameter.

In the translateXliff function, it might be beneficial to add input validation for the lang parameter to ensure it's a valid language code.

Consider adding a validation step for the lang parameter:

 def translateXliff(
 	xliffPath: str,
 	lang: str,
 	pretranslatedMdPath: str,
 	outputPath: str,
 	allowBadAnchors: bool = False,
 ) -> Result_translateXliff:
+	# Validate lang parameter
+	if not re.match(r'^[a-z]{2,3}(-[A-Z]{2,3})?$', lang):
+		raise ValueError(f"Invalid language code: {lang}")
 	print(
 		f"Creating {lang} translated xliff file {prettyPathString(outputPath)} from {prettyPathString(xliffPath)} using {prettyPathString(pretranslatedMdPath)}..."
 	)
 	res = Result_translateXliff()
 	# ... rest of the function ...

Committable suggestion was skipped due to low confidence.
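
One caveat on the pattern suggested above: it only accepts a hyphen between the language and region parts. If callers pass locale-style codes with an underscore, such as `pt_BR` (an assumption, not confirmed in this thread), the check would reject them. A minimal sketch that accepts either separator follows; `RE_LANG_CODE` and `isPlausibleLangCode` are hypothetical names, and the check is only a shape test, not a lookup against a list of real language codes.

```python
import re

# Two or three lowercase letters, optionally followed by "_" or "-"
# and two or three uppercase letters, e.g. "fr", "pt_BR", "pt-BR".
RE_LANG_CODE = re.compile(r"[a-z]{2,3}([_-][A-Z]{2,3})?")


def isPlausibleLangCode(lang: str) -> bool:
	"""Return True if lang has the shape of a language or locale code."""
	return RE_LANG_CODE.fullmatch(lang) is not None


for candidate in ("fr", "pt_BR", "pt-BR", "english", ""):
	print(f"{candidate!r}: {isPlausibleLangCode(candidate)}")
```

Using `fullmatch` avoids having to anchor the pattern with `^` and `$`.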

@zstanecic
Contributor

zstanecic commented Sep 3, 2024 via email

@michaelDCurran
Member Author

@zstanecic For now at least, we are going to keep the changes file as is. However, now that it is on Crowdin, translators can choose to simply not translate those strings if they wish.
We may revisit the structure of the changes file at some point, but only after we have completed the move to Crowdin.
I will have more to say on the translators list once this pr is merged.

@wmhn1872265132
Contributor

Please note that the .xliff file is now included in the installer. Should this file be excluded?

@michaelDCurran
Member Author

@wmhn1872265132 thanks for catching this. I've excluded xliff files now.

@cary-rowen
Contributor

Hi @michaelDCurran

Just wanted to ask:
Will we still have a way to directly edit the markdown version of the user guide in the future?

Since we plan to make extensive revisions to the Simplified Chinese version of the User Guide in the future, I would like to see a way to directly edit the Markdown version of the User Guide instead of a po file.

For large-scale changes, we may prefer to use a text editor to edit the markdown file.

We would also like a script that uses structure comparison to ensure that the document structure is not broken.

Thanks

@michaelDCurran
Member Author

@cary-rowen I'm sorry, but going forward documentation such as the user guide and changes files can only be translated on Crowdin, either through its interface, or using poedit to translate the xliff files.
If a translation requires specific sections that are not yet in the English source, we can consider adding a placeholder in the English version.
I will have more to say on the translators list in the coming days once I have merged this pr.

@cary-rowen
Contributor

Thanks Mic,

Will NV Access open-source the script for converting markdown in the future? This would allow us to convert edited markdown to xliff in a highly customizable way.

Best,
Cary

@hwf1324
Contributor

hwf1324 commented Sep 3, 2024

Or can I use a project like https://github.com/cataria-rocks/md2xliff to do so?

@michaelDCurran michaelDCurran merged commit 51cd079 into beta Sep 3, 2024
4 checks passed
@michaelDCurran michaelDCurran deleted the updatetranslations branch September 3, 2024 10:05