Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSSI data issues #3

Closed
iannesbitt opened this issue Jan 30, 2019 · 11 comments
Closed

GSSI data issues #3

iannesbitt opened this issue Jan 30, 2019 · 11 comments

Comments

@iannesbitt
Copy link

iannesbitt commented Jan 30, 2019

Hi Alain,

I tried to open some of my own GSSI data (example file TEST__001.DZT attached)
with GPRPy and ran into some trouble. My own import function reads it just fine.

I'm relatively sure that your import script thinks the header of my DZT is 8 bytes larger than it really is. After some head scratching, I was able to at least rewrite gprIO_DZT.py to at least be able to reshape the array correctly. But it seems to still get the datatype wrong (or perhaps there's still a byteshift somewhere), so the profile looks kind of weird. I still can't quite figure out why this is the case. What I've done to temporarily lazily patch it is rewrite gprpy.importdata to use a function I wrote called readgssi.dzt.readdzt_gprpy that loads DZT header data and array in the same format as gprpy.toolbox.gprIO_DZT does.

Line 17 in gprpy.py:

from readgssi import dzt

Lines 87 and 88:

#self.data, self.info = gprIO_DZT.readdzt(filename)
self.data, self.info = dzt.readdzt_gprpy(filename)

However since my file did not use a DMI (this was a lake profile so it uses scans per second instead), it has no distance data attached to it. So I've also rewritten line 90 to handle a ZeroDivisionError in case scans per meter is zero in order to calculate profile position:

try:
    self.profilePos = self.info["startposition"]+np.linspace(0.0,
                                                             self.data.shape[1]/self.info["scpmeter"],
                                                             self.data.shape[1])
except ZeroDivisionError:
    self.profilePos = np.linspace(0.0, self.data.shape[1]/self.info["scpsec"],
                                  self.data.shape[1])

This seems to work reasonably well but since I haven't changed anything in the gui it still displays the plot position in meters.

I have the changes in a fork and will open a pull request if you'd like. Otherwise, take a look at how gprpy and readgssi import DZTs and see if you can figure out how they read the array differently.

Best,
Ian

@iannesbitt
Copy link
Author

P.S. The fork is here: https://github.com/iannesbitt/GPRPy

@iannesbitt
Copy link
Author

Also, this file format description GSSI sent me may be of help to you: https://github.com/NSGeophysics/GPRPy/files/2810655/DZT.File.Format.6-14-16.pdf

@AlainPlattner
Copy link
Member

Hi Ian,

Thanks so much for pointing this out and for providing a solution! I wonder why the header in your DZT file is different from the headers in the DZT files I had been working with. Have you tested your import with the provided example data (gprpy/exampledata/GSSI/FILE____032.DZT)?

Could it be that perhaps GSSI updated their headers at some point and you have a newer system? In that case I should probably write gprIO_dzt such that it can discern between the two versions.

Do you have an example data file that you are ok sharing that I can use to test? What year was your system manufactured? Or do you have a special version?

Thanks and cheers,
Alain

@iannesbitt
Copy link
Author

iannesbitt commented Jan 30, 2019

Alain,

From what I understand, GSSI headers have changed many times over the years. The PDF I linked to describes some calculations you can do to determine which type of header a certain file has.

Yes, I can open your DZT with readgssi:
file____032_400mhzg1

I'm still not sure if I can open all of the old versions of DZT though. I've tested and been successful with probably three generations of the file structure (including some new ones with more than one radar channel) but I think there are more I haven't come across yet.

If you'd like to test with one of my files, I've provided one here:
https://github.com/NSGeophysics/GPRPy/files/2810391/TEST__001.DZT.zip

I love your software and can't wait to use it with my data!

Ian

@iannesbitt
Copy link
Author

Here's some more potentially useful info I have kicking around:

GSSI user manual from 2015. Pages 55-57 have similar information to that given in the 2016 DZT File Format pdf linked above:
GSSI - SIR-3000 Operation Manual.pdf

A 2002 USGS document which describes the DZT file format from that era (p.85-86):
usgs-fileformat.pdf

@AlainPlattner
Copy link
Member

Thank so much Ian,
It means a lot to me that you like GPRPy and I'm very thankful for your help with improving it. I'll try to figure out how to best work with the different DZT file formats this Saturday. I think with all the work that you have already done this should be solvable within a day or so.

@iannesbitt
Copy link
Author

This may be a bigger can of worms than you're willing to open, but I also have a dual-channel example file here if you'd like to test with that as well. Antenna 0 frequency is 800 MHz, and antenna 1 frequency is 300 MHz. Previous versions of my software could read dual channel files but I'm working on fixing a bug in the current version that causes it to crash when reading.

TEST__002.DZT.zip

@AlainPlattner
Copy link
Member

AlainPlattner commented Feb 2, 2019

I think I solved the original issue. I hadn't implemented extended headers, as it says in the file format documentation that you had attached:
// if rh_data < MINHEADSIZE then
// offset is MINHEADSIZE * rh_data
// else offset is MINHEADSIZE *rh_nchan
I added this distinction and also corrected that while 8 bit and 16 bit samples are unsigned int, 32 bit samples are signed int. The documentation sheet says: "Eight byte and sixteen byte samples are
unsigned integers. Thirty-two bit samples are signed integers." (they of course mean bit, not byte).

This should now work. I haven't implemented the automatic detection if profile is distance or recording time as if you know the speed, you can just calculate the length of the profile by hand and then use "adj profile" to set the distance, or if you want to keep it in recording time but want to change the x axis label in a figure, you can export your script and then change the label (see attached script).
The data looks as follows:
test1

The following script:
# Automatically generated by GPRPy
import gprpy.gprpy as gp
mygpr = gp.gprpyProfile()
mygpr.importdata('TEST__001.DZT')
#This creates the standard plot (automatically generated by GPRPy)
mygpr.printProfile('Test1.pdf', color='bwr', contrast=3, yrng=[0,2000], xrng=[-200,35.125], dpi=300)
# This changes the profile axis label
import matplotlib.pyplot as plt
mygpr.showProfile( color='bwr', contrast=3, yrng=[0,2000], xrng=[-200,35.125])
plt.gca().set_xlabel('recording time [s]')
# This if to make a pdf
plt.savefig('Test1p.pdf',dpi=300)
# This to just view it
plt.show()

Creates this figure:
test1p

Please let me know if this is what it should be or if there is still something I'm missing.

@AlainPlattner
Copy link
Member

About the multichannel:
I haven't implemented it but if there is a way that you could build into your readgssi code an option to read multichannel and then export them individually (even just as a binary file with the raw data and an ASCII header file with information), then they could be read into GPRPy afterward.

@iannesbitt
Copy link
Author

It works! Thank you!

I will add to my readgssi to-do list a way to export multichannel array data to separate files with the same basename (numpy binary and JSON, for example). Would something like this work?

FILE__001_800MHz.json
FILE__001_800MHz.npy
FILE__001_300MHz.json
FILE__001_300MHz.npy

Alternatively, I could try writing to multiple separate DZTs.

@AlainPlattner
Copy link
Member

Awesome! Thanks again for pointing out the issue and providing help with solving it!

For exporting multichannel data as individual files for each channel: I think something like json and npy could work. Or perhaps just a binary file of the raw data per channel and a separate ASCII text file header containing some information with keywords such as "ntraces", "traces per meter", "trace length [ns]". I'll leave it up to you what you find the most intuitive. You could also create DZT files but that could be a bit tricky as you'd need to write the header bits exactly like they do it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants