Check glyph CID difference across different SHS versions #259

Closed
NightFurySL2001 opened this issue Apr 18, 2020 · 9 comments

Comments

@NightFurySL2001

Is there any list or file that specifies the CID-to-glyph relationship? I am trying to get a modified subset from Genne Gothic compared with the SHS 2.001 KR version, but some of the CIDs in the cmap changed between 1.004 (Genne Gothic) and 2.001 (SHS). Is there any documentation of the CID list, or of the CID changes between versions?

P/S: Also, how was this image generated? Is there any software that lets devs look through all the CIDs and variants per font?

@NightFurySL2001 NightFurySL2001 changed the title Check glyph CID difference Check glyph CID difference across different SHS versions Apr 18, 2020
@punchcutter
Member

Sorry for the delay in responding. One way you can do this is to check the mapping with fontTools, which lets you compare any fonts, not just these. Here's something I threw together in a couple of minutes, so it could definitely be improved, but it gives the basic idea. You just provide the first and second font and you get the diffs.

import sys
from fontTools.ttLib import TTFont

path1 = sys.argv[1]
path2 = sys.argv[2]
font1 = TTFont(path1)
font2 = TTFont(path2)
cmap1 = font1['cmap'].getBestCmap()
cmap2 = font2['cmap'].getBestCmap()


def report_missing(cmap1, cmap2):
	# Report mappings in the first font that are not in the second font
	missing = []
	for uni, cid in cmap1.items():
		if uni not in cmap2:
			missing.append(hex(uni))

	if missing:
		report = "\n".join(missing)
		print(f'Mappings not found in {path2}:\n{report}')


def report_changed(cmap1, cmap2):
	# Report mappings that have changed
	changed = []
	for uni, cid in cmap1.items():
		if uni in cmap2:
			if cmap2.get(uni) != cid:
				changed.append(f'{hex(uni)} {cid} -> {cmap2.get(uni)}')
	if changed:
		report = "\n".join(changed)
		print(f'Mappings differ between {path1} and {path2}:\n{report}')

report_missing(cmap1, cmap2)
report_changed(cmap1, cmap2)

For what you are asking there will be a lot of differences, but a really minor example is the diff between SourceHanSansJP-Regular.otf and NotoSansJP-Regular.otf (explained here):

Mappings not found in NotoSansJP-Regular.otf:
0x2252
0x25c8

I'm not sure where that image was generated. It looks like @tamcy has a lot of those. That's definitely useful, wherever it came from. @tamcy is that something that can be shared?

@adobe-fonts adobe-fonts deleted a comment Apr 27, 2020
@NightFurySL2001
Author

NightFurySL2001 commented Apr 27, 2020

@punchcutter Sadly this can't be used for my issue, as I'm checking across different SHS versions, which have different CID mappings: new glyphs were introduced and old ones were removed, drastically changing the CIDs in SHS between 1.004 (which Genne Gothic is based on) and 2.001. A few glyphs (notably inherited glyphs that can be used in inherited-glyph projects) were removed from SHS 2.001, and Genne Gothic had used them as substitute glyphs. From the SHS Readme, page 28:

As a result of removing approximately 1,750 glyphs in order to make room for approximately 1,750 new glyphs, the CID assignments of the glyphs necessarily—and drastically—changed. The CID assignments of exactly 200 glyphs are unchanged from Version 1.004: 0–107, 2570–2633, 47223–47232, 47262–47272, 47281–47286, and 65484.
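As a quick sanity check on the quoted ranges, the short sketch below counts the CIDs they cover and tests whether a given CID falls inside them. The helper names `count_cids` and `is_unchanged` are made up for illustration; only the ranges themselves come from the Readme text above.

```python
# CID ranges the Readme lists as unchanged from Version 1.004 (inclusive bounds).
UNCHANGED_RANGES = [
    (0, 107),
    (2570, 2633),
    (47223, 47232),
    (47262, 47272),
    (47281, 47286),
    (65484, 65484),
]


def count_cids(ranges):
    """Count the CIDs covered by a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in ranges)


def is_unchanged(cid, ranges=UNCHANGED_RANGES):
    """Return True if the CID lies in one of the unchanged ranges."""
    return any(start <= cid <= end for start, end in ranges)


total = count_cids(UNCHANGED_RANGES)
print(total)  # 108 + 64 + 10 + 11 + 6 + 1 = 200
```

The total matches the "exactly 200 glyphs" in the quote, so every CID outside these ranges should be treated as potentially pointing at a different glyph in the two versions.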

The picture below illustrates how CID changes do not reflect glyph changes across versions:
image
Characters shown are from Genne Gothic (based on SHS v1.004) on the left and SHS v2.001 on the right. The range was picked between U+9F00 and U+9FFF to illustrate differences within the same Unicode area.

Unless there is software that can detect vector outline changes, or the official site can provide the full glyph/CID changes across versions, I don't think there's an easy way to find out which characters changed between SHS v1.004 and v2.001.

@punchcutter
Member

@NightFurySL2001 A script like this does exactly what you said about finding differences in CID-mapping between different versions of SHS or any other fonts. If you also want to have visual differences then that's a completely different thing. I guess I'm not entirely clear what you are trying to do.

@NightFurySL2001
Author

@punchcutter Well I want to find what characters are different visually between Genne Gothic and SHS KR v2.001.

The problem here is that the same glyph from v1.004 will probably have a different CID number in v2.001, while different glyphs in v1.004 and v2.001 can share the same CID number. E.g. in the picture, CID-47131 (and a lot of others) has the same CID number but a different glyph shape (pointing either to the same character or to a different character entirely), while others with fully identical glyphs, like U+9F2D, differ in CID number between versions, e.g. CID-47221 in v1.004 moved to CID-47220 in v2.001.

I want to find out which glyphs differ in visual appearance between the fonts, like the 龍 in Genne Gothic, which is different from the one in SHS KR v2.001 and was probably removed in v2.001.
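One possible way to approach this with fontTools is to ignore CIDs altogether and compare the drawn outlines per code point. The sketch below is only an assumption about how that could look, not a tested workflow for these fonts: `load_outlines` and `compare_outlines` are hypothetical helper names, and the comparison is exact, so any coordinate change counts as a difference.

```python
def load_outlines(path):
    """Map each code point in the font's best cmap to its recorded outline."""
    # fontTools is imported lazily so compare_outlines() works without it.
    from fontTools.ttLib import TTFont
    from fontTools.pens.recordingPen import RecordingPen

    font = TTFont(path)
    cmap = font["cmap"].getBestCmap()  # {code point: glyph name}
    glyph_set = font.getGlyphSet()
    outlines = {}
    for uni, glyph_name in cmap.items():
        pen = RecordingPen()
        glyph_set[glyph_name].draw(pen)
        outlines[uni] = pen.value  # list of (operator, arguments) tuples
    return outlines


def compare_outlines(outlines1, outlines2):
    """Return sorted code points present in both maps whose outlines differ."""
    shared = outlines1.keys() & outlines2.keys()
    return sorted(uni for uni in shared if outlines1[uni] != outlines2[uni])


def main(path1, path2):
    """Print the code points whose default glyphs are drawn differently."""
    for uni in compare_outlines(load_outlines(path1), load_outlines(path2)):
        print(f"U+{uni:04X}")
```

Calling `main()` with the paths of the two fonts would print one code point per visually different default glyph. This only covers glyphs reachable through the cmap (not alternates reached through GSUB or IVS), and it assumes both fonts use the same units per em.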

@tamcy

tamcy commented May 5, 2020

@punchcutter Sorry for the late response. Had been very busy since last month.

Indeed, what @NightFurySL2001 posted in the first post came from a tool that I wrote.

This was developed for two purposes: (1) to help me inspect and identify issues regarding the Source Han Sans HK release so that I can report them, and (2) to help the development of my "opinionated" version of the font, Chiron Sans HK. The tool, which is simply a webpage printing characters in different regions on each row, helps me decide which glyph to use in my font, or whether a redesign is needed for a codepoint (okay, there's a search function, but you get my point).

Because of this goal, I made a lot of assumptions when writing it. In addition, it's HK-version-centric (the HK reference glyph is shown alongside the TW glyph), and you can only view glyphs by codepoint (not by CID; non-default glyphs, such as those only accessible through IVS, are currently not supported). So this probably isn't what @NightFurySL2001 is looking for.

All Source Han Sans related information used by this tool came from this repository. In particular, the CMap file and the CID file are very useful to me. The former describes how codepoints are mapped to CIDs, while the latter contains a map of CIDs to glyph names - which looks like the "CID List" @NightFurySL2001 had asked for.
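As a rough illustration of how the CID file could be used to track glyphs across versions, the sketch below matches glyph names between two CID lists and reports CIDs that moved or names that disappeared. It assumes each line has the form `CID<TAB>glyph name`, which should be checked against the actual file; the function names and the sample glyph names are hypothetical, and a matching name does not guarantee an unchanged shape.

```python
def parse_cid_list(text):
    """Parse lines assumed to be 'CID<TAB>glyph name' into {glyph name: CID}."""
    names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not in the assumed two-column layout
        names[fields[1]] = int(fields[0])
    return names


def cid_moves(old, new):
    """Return ({name: (old CID, new CID)} for moved glyphs, [removed names])."""
    moved = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    removed = sorted(old.keys() - new.keys())
    return moved, removed


# Hypothetical sample data; the glyph names are invented for illustration.
old_list = "47221\tuni9F2D-JP\n1\tspace\n500\tremoved-glyph\n"
new_list = "47220\tuni9F2D-JP\n1\tspace\n"
moved, removed = cid_moves(parse_cid_list(old_list), parse_cid_list(new_list))
print(moved, removed)
```

Matching by name rather than by CID sidesteps the renumbering between versions, at the cost of trusting that names were not reassigned.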

@NightFurySL2001
Author

@tamcy Thank you for your reply.

  1. Thanks! The CID file is just what I am looking for. I am downloading the v1.004 source files (which Genne Gothic is based on) to check whether the same CID file is in there so I can start comparing. There may still be a few catches (like a glyph, mentioned in an issue here, whose TW and CN names were swapped in v2.000, so it has a different shape but the same name across v1.004 and v2.001). It's still a good start, so I'll try approaching from there.

  2. About the program: it'd be really good to have a tool for seeing the different versions of the glyphs for a codepoint without having to mess with the languages in Adobe and Office (which btw have a littttttle bit of a problem handling language proofing and fonts). The left side, which states the glyph name (-KR/-JP/-CN/-TW/-HK), would be useful for checking whether a character is shared across languages, as some differences are very minor and may be hard to detect by eye. Additionally, you could add an option to show the CID below the glyph name so both are shown at the same time (suggestion). Tl;dr: we'd like a program that can do what your picture is showing, i.e. show the versions of glyphs across language regions.
    (P/S: also maybe add support for other standards, like 中易宋体 for Simplified Chinese and others for Japanese/Korean, so that comparison with the original standard can be made not only for Traditional Chinese but for all languages as well, which is probably hard.)

@hfhchan

hfhchan commented May 5, 2020

中易宋體 has tons of errors compared to latest GB18030. Better not rely on it.

@NightFurySL2001
Author

I have contacted a developer of the project and got the info that I need. Closing this issue as solved.

@NightFurySL2001
Author

NightFurySL2001 commented Aug 3, 2021

With a little help from my friend in sorting through all the info, here is the full list:
shs-v1.004-full-removed.pdf
shs-1.004-full-removed.txt


4 participants