XLSX output is truncated #1360

Open
chinyeungli opened this issue Aug 8, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@chinyeungli
Contributor

Describe the bug
The description field in the JSON output is as follows:

  "description": "add and remove users and groups\n This package includes the 'adduser' and 'deluser' commands for creating\n and removing users.\n .\n  - 'adduser' creates new users and groups and adds existing users to\n    existing groups;\n  - 'deluser' removes users and groups and removes users from a given\n    group.\n .\n Adding users with 'adduser' is much easier than adding them manually.\n Adduser will choose appropriate UID and GID values, create a home\n directory, copy skeletal user configuration, and automate setting\n initial values for the user's password, real name and so on.\n .\n Deluser can back up and remove users' home directories\n and mail spool or all the files they own on the system.\n .\n A custom script can be executed after each of the commands.",

However, in the XLSX output, it is truncated at one of the newline characters:

add and remove users and groups
 This package includes the 'adduser' and 'deluser' commands for creating
 and removing users.
 .
  - 'adduser' creates new users and groups and adds existing users to

chinyeungli added the bug label Aug 8, 2024
@tdruez
Contributor

tdruez commented Aug 12, 2024

@chinyeungli The truncated description is the expected behavior for XLSX outputs.

From https://github.com/nexB/scancode.io/blob/main/scanpipe/pipes/output.py#L384

  • Truncate the "description" field to the first five lines.
if fieldname == "description":
    max_description_lines = 5
    value = "\n".join(value.splitlines(False)[:max_description_lines])
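
To see why the report stops where it does, here is a minimal sketch that applies the same expression to an abridged copy of the description from the issue. The `truncate_description` helper name is hypothetical and only wraps the line quoted above; it is not a function from scancode.io.

```python
def truncate_description(value, max_description_lines=5):
    """Keep only the first `max_description_lines` lines of `value`."""
    # splitlines(False) drops the line terminators, so the kept lines
    # are rejoined with "\n".
    return "\n".join(value.splitlines(False)[:max_description_lines])


# Abridged from the "description" value in the report.
description = (
    "add and remove users and groups\n"
    " This package includes the 'adduser' and 'deluser' commands for creating\n"
    " and removing users.\n"
    " .\n"
    "  - 'adduser' creates new users and groups and adds existing users to\n"
    "    existing groups;\n"
    "  - 'deluser' removes users and groups and removes users from a given\n"
    "    group.\n"
)

truncated = truncate_description(description)
print(truncated)
```

The printed text ends on the "'adduser' creates new users and groups and adds existing users to" line, which is the fifth line and matches the XLSX cell shown in the report.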

@pombredanne Could you provide some insight on those design decisions?

@chinyeungli As you can see, the XLSX format has some limitations and is not ideal for sharing "scan data". If your main concern is data integrity, the JSON format is preferred.

Convert the value to a string and perform these adaptations:
- Keep only unique values in lists, preserving ordering.
- Truncate the "description" field to the first five lines.
- Truncate any field too long to fit in an XLSX cell and report error.
- Create a combined license expression for expressions
- Normalize line endings
- Truncate the value to a maximum_length supported by XLSX.
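
The adaptations listed above could be sketched roughly as follows. This is an illustration under assumptions, not the code in `scanpipe/pipes/output.py`: the `adapt_value_for_xlsx` name, its signature, and the error message wording are hypothetical, the combined license expression step is left out, and 32767 is Excel's documented limit for characters in a cell.

```python
XLSX_MAX_CELL_LENGTH = 32767  # Excel's per-cell character limit


def adapt_value_for_xlsx(fieldname, value, maximum_length=XLSX_MAX_CELL_LENGTH):
    """Return (adapted_value, error_message_or_None). Hypothetical sketch."""
    error = None

    # Keep only unique values in lists, preserving ordering.
    if isinstance(value, (list, tuple)):
        unique = list(dict.fromkeys(str(item) for item in value))
        value = "\n".join(unique)
    else:
        value = str(value)

    # Normalize line endings to "\n".
    value = value.replace("\r\n", "\n").replace("\r", "\n")

    # Truncate the "description" field to the first five lines.
    if fieldname == "description":
        value = "\n".join(value.splitlines(False)[:5])

    # Truncate anything too long for an XLSX cell and report it.
    if len(value) > maximum_length:
        error = (
            f"The value of {fieldname} was truncated from "
            f"{len(value)} to {maximum_length} characters."
        )
        value = value[:maximum_length]

    return value, error
```

Returning the error alongside the value lets the caller record which cells lost data, which is the reason JSON remains the safer format when the full field content matters.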
