GPS Inconsistent behavior #431

Closed
stronnag opened this issue Aug 6, 2016 · 188 comments

@stronnag
Collaborator

stronnag commented Aug 6, 2016

Spoiler: Some days, an adequate number of satellites and a reasonable HDOP don't mean nav functions will work as expected.

Today I flew 85fe245 and f440ead on my usually utterly reliable quad. This machine has not failed to execute PH, RTH or missions since we fixed SBAS months ago ---- until today.

Result. Completely random PH behaviour (if PH fails, then I don't even try any other nav function). Randomly, on hard reset, nav functions will either work or fail. Some log files at
http://seyrsnys.myzen.co.uk/inav_ph_woes/. All the log files were created in a short time frame with good satellite coverage (16-19) and good HDOP (1.1 - 1.3).

  • LOG00279.TXT: 85fe245. PH fails miserably, repeated.
  • LOG00280.TXT: f440ead (having persistent mag/acc makes checkout / build / flash / apply config in the field so easy:). Works OK.

... does some betaflight stuff on the minis for 30 minutes

  • LOG00281.TXT: f440ead: nav functions fail miserably, repeatedly (multiple arm / disarms)
  • LOG00282.TXT: f440ead: powercycle nav functions work again !!!!
  • LOG00283.TXT: f440ead: powercycle nav functions still work. Out of lipos, otherwise I'd have tried 43eaf10 (which behaved perfectly across multiple lipos earlier in the week).

Note 0. LOG00281 Index 1, PH attempt 4 is a good example of a 'fly off' (screenshot: selection_473).

Note 1. 281-283 were executed in quick succession, it is unlikely that there was some environmental change that affected the results.
Note 2. If PH is not obviously (visually) working, it is rapidly aborted.

Exec Summary : nav post 43eaf10 seems pretty random to me at the moment. Other experiences solicited.

stronnag changed the title from "RFC: Do we PH random failures post 43eaf10 ?" to "RFC: Do we have PH random failures post 43eaf10 ?" on Aug 6, 2016
@digitalentity
Member

digitalentity commented Aug 6, 2016

Seems like a GPS-related problem. GPS (and estimated navVel) velocity doesn't seem to be correlated with UAV attitude and stick input:
image

It might also be a compass issue:
In log 281 there is a moment when the UAV is flying with forward pitch only. This normally means flying straight ahead, so the GPS course should be roughly equal to the magnetic heading; however, they are opposite:
image

However, I doubt it's compass - UAV heading strongly correlates with yaw stick input. I'd blame GPS reception that reported invalid coordinates/velocities/course for some reason.

@stronnag was there strong wind that day, if so - what direction?
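
The heading-versus-course check described above can be sketched in a few lines. This is only an illustration for working with exported log values, not INAV code: the function names, the 2 m/s minimum speed and the 45 degree tolerance are assumptions chosen for the example. Wind matters here, because a craft drifting downwind can legitimately have a GPS course that differs from its compass heading.

```python
def angle_diff_deg(a, b):
    """Signed smallest difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def course_disagrees_with_heading(mag_heading_deg, gps_course_deg,
                                  ground_speed_ms,
                                  min_speed_ms=2.0, max_diff_deg=45.0):
    """Return True if GPS course and compass heading disagree while moving.

    GPS course is unreliable at very low ground speed, so slow samples are
    never flagged. Both thresholds are assumed values for illustration.
    """
    if ground_speed_ms < min_speed_ms:
        return False
    diff = angle_diff_deg(gps_course_deg, mag_heading_deg)
    return abs(diff) > max_diff_deg
```

For example, a compass heading of 30 degrees with a GPS course of 210 degrees at 5 m/s is flagged, while the same pair at 0.5 m/s is ignored.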

@stronnag
Collaborator Author

stronnag commented Aug 7, 2016

@digitalentity, thanks for the analysis.

There was less than 8 m/s wind; I often fly in more. My initial thought (from the field) was that it was just a bad GPS day. I guess that happens from time to time. I did not suspect the compass; much of the random flying / drifting around was spent manually correlating the observed heading with the LTM reported heading (which seemed consistent), whereas the mwp calculated 'range from home' often seemed incongruous.

It was an object lesson in 'try PH before engaging more advanced nav functions'.

I'll do some more missions today before closing.

@digitalentity
Member

@stronnag
Interestingly GPS glitch detection did fire an alarm a few times. We definitely need better glitch detection logic.

digitalentity changed the title from "RFC: Do we have PH random failures post 43eaf10 ?" to "Better GPS glitch detection is needed" on Aug 7, 2016
@stronnag
Collaborator Author

stronnag commented Aug 7, 2016

A day later .... updated the firmware to abf1015 (so no significant change). Starts with similarly less than stellar sat statistics (13-15 sats, 1.3-1.5 HDOP); a couple of hours later it's back to normal, 19-20 sats and 1.1 HDOP.

PH and other nav functions back to their awesome performance and consistency, even at the lower than normal end of the sat coverage range. Someone in the DoD really didn't like me yesterday.

Maybe we should use some of those spare X-FRAME bits for a glitch detection alert or counter?

@digitalentity
Member

@stronnag thanks for your report; indeed, something must be wrong with GPS reception. We can use a byte in X-Frame to indicate INAV internal status flags; however, we still don't have good logic for detecting GPS failures...

@Linjieqiang
Contributor

The solution to the problem is to use an EKF algorithm, although it looks messy and complex.

@digitalentity
Member

EKF is nothing close to a glitch detection and protection, it's merely an algorithm to blend data from available sensors.

@Linjieqiang
Contributor

If there is a large difference between the data received from the GPS and the prediction from the EKF, the program will remove the GPS data. Maybe I'm wrong.

@digitalentity
Member

It's not the EKF itself, it's a supporting code logic that does the detection.
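
The distinction in this exchange can be illustrated with a small sketch: the estimator only blends data, and a separate gate decides whether a measurement is allowed in at all. The code below is a one-dimensional toy with a fixed blending gain standing in for a real filter. The names `innovation_gate` and `fuse`, the gain and the 3-sigma limit are all hypothetical and not taken from INAV or any EKF library.

```python
def innovation_gate(predicted, measured, sigma, n_sigma=3.0):
    """Return True if the measurement is consistent with the prediction.

    The innovation is the difference between what was measured and what
    the estimator expected. n_sigma is an assumed tuning value.
    """
    return abs(measured - predicted) <= n_sigma * sigma


def fuse(predicted, measured, sigma, gain=0.2, n_sigma=3.0):
    """Blend a measurement into the prediction unless the gate rejects it.

    Returns (new_estimate, accepted). The fixed gain is a stand-in for
    the gain a real filter would compute.
    """
    if not innovation_gate(predicted, measured, sigma, n_sigma):
        # Supporting logic, separate from the blending itself:
        # keep the prediction and drop the suspicious sample.
        return predicted, False
    return predicted + gain * (measured - predicted), True
```

With a prediction of 100.0 and sigma of 1.0, a measurement of 101.0 is blended in, while a measurement of 150.0 is rejected and the prediction is kept.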

@Linjieqiang
Contributor

Oh, sorry. My mistake.

@xdigyx

xdigyx commented Aug 10, 2016

Today I had a situation where, in pos hold mode, my hexacopter rapidly started to fly away. Not sure whether it was caused by losing some sats or... Anyway, maybe it would be a good idea either to take an average of 2-3 readings, or, for pos hold mode, to ignore a position change greater than... (say 10cm/sample, depending on model speed from the previous cycle / significant sat. num loss)?
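
Both suggestions can be sketched briefly. The example swaps the proposed average for a median of three, since a plain average would still be pulled by a single bad sample, and it gates each new position on the step size since the last accepted one. Function names are hypothetical, the 10 cm default comes from the comment above, and any real threshold would need tuning against normal GPS noise and the craft's actual speed.

```python
import math


def median_of_three(a, b, c):
    """Median of three readings; a single outlier cannot win."""
    return sorted([a, b, c])[1]


def gate_position(prev_pos_cm, new_pos_cm, max_step_cm=10.0):
    """Accept new_pos_cm only if it is within max_step_cm of prev_pos_cm.

    Positions are (x, y) tuples in centimetres.
    Returns (position_to_use, accepted).
    """
    step = math.hypot(new_pos_cm[0] - prev_pos_cm[0],
                      new_pos_cm[1] - prev_pos_cm[1])
    if step > max_step_cm:
        # Implausible jump: keep using the last accepted position.
        return prev_pos_cm, False
    return new_pos_cm, True
```

A gate like this needs an escape hatch in practice, for example a timeout after which new data is accepted again; otherwise a craft that really has moved would keep rejecting every valid fix.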

@stronnag
Collaborator Author

GPS glitches are very rare, my report was a once a year occurrence, for a frequent flyer. Your problem is more likely symptomatic of mag interference. Have you verified that the mag works perfectly at all throttle settings?

@xdigyx

xdigyx commented Aug 11, 2016

Not sure what's telling you it's mag problem. If it was, then I believe the model would make slowly bigger and bigger circles, but it would not rapidly fly away. Do you agree?
Although I have used pos hold for max 1h, it has happened only once so far; apart from this one time, pos hold works great.
Unfortunately I had no BBox enabled...

@digitalentity
Member

Hard to diagnose w/o logs. Circles happen when the heading is slightly wrong (up to maybe 30 deg) - this results in the quad moving in a slightly wrong direction. A bigger error will result in the quad going in a significantly wrong direction and a case of fly away.
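
A toy kinematic model shows why a small heading error curves the path while a large one causes a fly-away. At every step the craft tries to move a fixed fraction of the way to the target, but the move it actually flies is rotated by the heading error. This is a deliberately simplified sketch (no inertia, wind or integral terms), and the function name, gain and step count are assumptions made for the example.

```python
import math


def simulate_hold(heading_error_deg, start=(10.0, 0.0), gain=0.1, steps=200):
    """Return the final distance from the hold point at the origin.

    Each step commands a move of gain * (vector to target); the heading
    error rotates the move that is actually flown.
    """
    x, y = start
    err = math.radians(heading_error_deg)
    cos_e, sin_e = math.cos(err), math.sin(err)
    for _ in range(steps):
        # Correction the controller intends: straight towards the origin.
        dx, dy = -gain * x, -gain * y
        # Correction actually flown: rotated by the heading error.
        x += dx * cos_e - dy * sin_e
        y += dx * sin_e + dy * cos_e
    return math.hypot(x, y)
```

In this model an error of 30 degrees still converges, along a curved path, because most of each correction points at the target. Once the error approaches 90 degrees almost none of it does, and at 180 degrees every correction pushes the craft further away.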

@xdigyx

xdigyx commented Aug 11, 2016

There were no circles (for sure if it were then I would see it every time when using pos hold for a while). But there was just sudden pitch inclination increase by approx 30 deg. Now I know how to enable onboard dataflash bbox, so hope to catch it next time.

@digitalentity
Member

It's either a magnetic anomaly, a compass failure, or a real GPS glitch. If the compass wiring is not good, a sporadic connectivity fault may cause the compass chip to freeze and give out the same heading over and over again. Please also check the wiring to the compass.

@digitalentity
Member

#453

@xdigyx

xdigyx commented Aug 11, 2016

I am using the onboard compass, so no wiring; I can try to treat it with hot air, but the chance is low. The FC is shielded (50x50mm 35um Cu PCB) from the bottom and grounded.

@DzikuVx
Member

DzikuVx commented Aug 11, 2016

@xdigyx shielding like that would not block the magnetic field from power cables. It might shield an oscillating magnetic field, but the frequency would have to be > 100kHz.
Interference from power cables is almost non-oscillating, so a grounded Cu PCB or even a full Faraday cage is ineffective.
The only solution is to move the power cables from the battery/ESC further away from the FC and (better option) use an external magnetometer on a mast.

@xdigyx

xdigyx commented Aug 11, 2016

Got your point, I was not thinking about this as the yaw heading reads were changing only 1-3deg depending on THR level.
Firstly I will try to catch the event with blackbox to see what's the root cause and maybe discuss what can be done to avoid such situation . Then I will use an external mag.

@stronnag
Collaborator Author

@digitalentity. I no longer think that this is an external GPS issue.

  • I've been flying this quad for months without nav issue; I have 100s of logs to prove it;
  • The problems coincidentally only happened after major I/O changes late in the 1.2 cycle.

Today.

  • Arm, PH fails miserably (no log alas, straight line 'fly away').
  • Land. Power cycle. PH works LOG00292.TXT /1.
  • Land. Disarm (still powered), fly a mission perfectly. LOG00292.TXT /2
  • Land. Power cycle. PH Fails miserably (straight line 'fly away') LOG00293.TXT
  • Land. Power cycle. PH works perfectly. LOG00294.TXT

(all logs at http://seyrsnys.myzen.co.uk/inav_ph_woes/). There was perhaps 30 seconds in the land / power-cycle sequence. I find it hard to believe the satellite performance is varying in that short time period. The whole sequence described above was within 7 minutes, with consistently between 17-19 satellites and c. 1.1 HDOP.

That I can have nav functions randomly work / fail on power cycle within very short time periods looks to me like a firmware issue rather than a celestial GPS issue. I'm encouraged in this theory by your recent CC3D sensor woes.

@martinbudden
Contributor

I'm certainly willing to believe this is a firmware problem. The IO changes were pervasive, and although I was very careful, there is certainly a possibility that I've introduced a bug somewhere.

@stronnag
Collaborator Author

stronnag commented Aug 12, 2016

It's all circumstantial at the moment, but definitely a regression since 2016-7-30, the last flawless nav experience (with this hardware).

I should add that so far the Dodo has behaved OK, whilst the SPRF3 has not (same firmware). Tomorrow, it's the Dodo.

@xdigyx

xdigyx commented Aug 12, 2016

Today I did catch the same issue. For sure there was some yaw drift, but it seems like the model was trying to face the starting position all the time. Approx 2 min after the start, the number of sats dropped to 0 for just one read cycle and my model moved rapidly.
I've got the log file. Some lines here:
time (us)  GPS_fixType  GPS_numSat  GPS_altitude  GPS_speed (m/s)  GPS_ground_course  GPS_hdop  GPS_eph  GPS_epv
105125924  2            16          123           0.06             181.4              129       88       132
105326384  2            16          123           0.06             181.4              129       88       132
105526783  0            0           123           0.07             181.4              9999      95       132
105727183  2            16          124           0.07             181.4              129       95       132
105927635  2            16          124           17.76            171.4              131       95       132
106128086  2            16          124           0.56             171.3              131       95       132
106328488  2            16          124           0.11             171.3              131       95       132
106528932  2            16          124           0.31             127.9              131       94       132
106729408  2            16          124           0.59             22.1               148       94       132
106929832  2            16          124           0.86             359.3              148       94       132
107130275  2            16          124           0.86             359.3              116       94       132
107330739  2            16          124           0.83             356.9              122       94       132
107531103  2            16          124           0.82             357.4              119       94       131
107535112  2            16          124           0.82             357.4              119       94       131
107731575  2            16          124           0.65             355.4              119       94       131
107931951  2            16          124           0.44             355.3              122       93       131
108132427  2            16          124           0.20             0.2                129       93       130
108332876  2            16          124           0.20             0.2                129       93       130
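
Rows like these can be scanned automatically for the two events visible above: the single sample with no fix, and the 17.76 m/s speed spike two samples later. The sketch assumes whitespace-separated rows with the nine columns shown in the header. The function name and the 5 m/s jump threshold are assumptions for illustration and not part of any blackbox tool.

```python
ROWS = """\
105326384 2 16 123 0.06 181.4 129 88 132
105526783 0 0 123 0.07 181.4 9999 95 132
105727183 2 16 124 0.07 181.4 129 95 132
105927635 2 16 124 17.76 171.4 131 95 132
106128086 2 16 124 0.56 171.3 131 95 132
"""


def find_glitches(text, max_speed_jump_ms=5.0):
    """Return a list of (time_us, reason) for suspicious GPS rows.

    A row is flagged when the fix is lost, or when the speed differs from
    the last unflagged row by more than max_speed_jump_ms.
    """
    glitches = []
    last_good_speed = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skip headers and malformed lines
        time_us = int(fields[0])
        fix_type = int(fields[1])
        num_sat = int(fields[2])
        speed = float(fields[4])
        if fix_type == 0 or num_sat == 0:
            glitches.append((time_us, "fix lost"))
            continue
        if (last_good_speed is not None
                and abs(speed - last_good_speed) > max_speed_jump_ms):
            glitches.append((time_us, "speed jump"))
            continue
        last_good_speed = speed
    return glitches
```

Only unflagged rows update the reference speed, so the return to normal speed after a spike is not itself reported as a second jump.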

The full log file I can send on email to whom it may concern (pm me).

Just for clarification: I am using ver 1.1.

@digitalentity
Member

digitalentity commented Aug 12, 2016

@stronnag, I agree, this is very likely a software issue. However, I'm thinking it was there before the IO changes - I remember having odd mixer issues a while ago, before that major change happened.

Which direction was your machine facing on powering up? Maybe something in firmware is messing up the mag...

@digitalentity
Member

digitalentity commented Aug 12, 2016 via email

@digitalentity
Member

@stronnag I have a strong feeling that something is wrong with either the IMU or the magnetometer code.

From your latest logs:
LOG 292
heading 31 - PH works

LOG 293
heading 22 - PH fails
heading 28 - PH fails

LOG 294
heading 200 - PH works

Interestingly, in LOG292 and LOG293 the machine does the same (correct) tilt to correct for the error; however, the actual correction differs. This can only happen when the heading is incorrect.

Can you send exactly the same hex/dump file you've been using during these tests so I can check it on my SPRF3 board?

@martinbudden
Contributor

So, can you clarify what you have done. As I understand it you have taken your Dodo FC off your tricopter (which does PH fine) and put it on the quadcopter which previously had the SPRACINGF3. The quadcopter didn't do PH before and now still doesn't do PH. Is that correct?

Also, can you put the SPRACINGF3 on the tricopter and see how that flies?

@digitalentity
Member

I'd say it's a hardware issue, however we have one single weird thing here - issue is not present on older commit (from before complete IO migration). I'm confused.

@stronnag do you, by any chance, have a SerialRX receiver (S.Bus or something like that)? I'd really like to rule out the PPM receiver code. 0489eb8 did some changes to timer handling which may be the thing that's haunting us now. Also, try disabling gyro sync (just to make sure EXTI is not responsible).

@digitalentity
Member

If I understand correctly, the difference between Tri and Quad is the airframe and GPS module, the rest is the same. I'd say it's something wrong with GPS code or GPS processing code in NAV system.

Also worth trying - SPRacingF3 EVO on a quad. I know it's not that simple but it will rule out the specific firmware issue.

Facts we have:

  1. GPS modules are different on Tri and Quad
  2. Issue is not related to the FC hardware (to be confirmed)
  3. Issue is sometimes resolved by soft-resetting
  4. If PH works - it works perfectly w/o issue, it never fails between flights if machine wasn't rebooted
  5. In failed PH attempts we usually have uncorrelated heading and GPS course

Did I miss something?

@stronnag
Collaborator Author

I have a spektrum satellite that I can steal off another machine (moving tele off or soft serial). I can also disable gyro sync. I could put the SPRF3 on the tri, but that's my last resort. I hope to get out later today with spek sat and gyro off (and working logging).

Summary:

Setup                           | Result
TRI + Dodo                      | No failure with any firmware
TRI + Evo                       | No failure with any firmware (only since last 10 days)
Quad + SPRF3 / BMP280 / GPS mag | PH fails 50% post 27822a5
Quad + SPRF3 / BMP180 +5583     | PH fails 50% post 27822a5
Quad + Dodo                     | PH fails 50% (not tested with 27822a5)

@stronnag
Collaborator Author

Did you miss something? No.
To confirm:

Vehicle | GPS
Quad    | Beitian BN880 + V3 firmware
Tri     | popular 'banggood' M8N (in black plastic cover) + V2 firmware

@digitalentity
Member

@stronnag thanks for testing my crazy ideas 😄

I think the issue is isolated to your specific setup, however since it wasn't evident with earlier revisions, we should try hard and fix it as it's clearly a regression in the latest code.

@stronnag
Collaborator Author

And thanks for all the developer effort that has gone into this. I'm more than willing to agree that it could well be setup specific post 27822a5; on the other hand, we have a very small testing community.

@stronnag
Collaborator Author

Further testing / option elimination

Test                            | Effect
gyro_sync = off                 | No difference
gyro_lpf = 98HZ (vice 188)      | No difference
Raise GPS tower by 25mm         | No difference
Use serial RX                   | No difference
move 3DR further away from GPS  | No difference

However, one thing you mentioned did seem worthy of further experimentation.

Most of time, after a hard reset PH will initially fail; a single soft reboot will iirc always fix the condition.
Once PH is working, it will not fail again after subsequent soft reboots (tested with about 6 PH / land / reboot cycles).
Is this indicative of a cold start timing or race condition that did not exist in 27822a5 and is mitigated during a soft reboot as the hardware is ready?

@digitalentity
Member

@stronnag now this is something!

Quite possible it's a hardware not ready condition. Do you know if it's HMC5883L or HMC5983 chip on BN880 GPS? They use the same driver but HMC5983 is usually taking longer to initialise.

In the sensorsAutodetect() compass is detected last, so we have this sequence of initialisation:
PWM -> Serial -> GYRO -> ACC -> BARO -> MAG

If during migration to new IO we saved some time on initialising all of them then the delay for compass to initialise on power-up is no longer enough.

Another possibility is that we start initialising GPS too early, not giving it much time to boot properly.

Simply looking at the diff output I see that we changed a delay in bmp085 detection from 200ms to 20ms, maybe some other code is working faster as well. If so - simply adding a pre-init delay when detecting a power-up condition (not a soft-reset) will solve the problem.

I'll prepare a debug branch to test as soon as possible!

@digitalentity
Member

I've implemented two extra delays on cold-booting in #517. One is imposed prior to sensor detection (500ms), the other delays GPS initialisation to start at least 2sec from power-up.

You get PH working some of the time, so the missing delay is probably marginal. I suspect that the reduced BMP085 detection delay (180ms less boot time) is what caused the problem. If so, an extra half-second delay should put things back to a perfect state.
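
The timing logic of the two delays can be modelled in a few lines. This is a Python model of the arithmetic only, not the firmware change in #517: the 500 ms and 2 s figures come from the comment above, while the function names and the `cold_boot` flag (however the firmware tells a power-up from a soft reset) are hypothetical.

```python
SENSOR_PRE_INIT_DELAY_MS = 500  # delay before sensor detection, per the comment
GPS_MIN_UPTIME_MS = 2000        # GPS init starts no earlier than this uptime


def sensor_pre_init_delay_ms(cold_boot):
    """Extra wait before sensor detection; applied on cold boot only."""
    return SENSOR_PRE_INIT_DELAY_MS if cold_boot else 0


def gps_init_wait_ms(uptime_ms, cold_boot):
    """Time still to wait before GPS initialisation may start.

    On a soft reset the hardware is assumed to be already powered and
    ready, so no wait is added.
    """
    if not cold_boot:
        return 0
    return max(0, GPS_MIN_UPTIME_MS - uptime_ms)
```

If the rest of initialisation has already taken 700 ms on a cold boot, the GPS wait is the remaining 1300 ms. Expressing it as a minimum uptime rather than a fixed sleep means boards that boot slowly pay no extra penalty.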

@stronnag
Collaborator Author

In the field. 7 successful PH cycles on boot-delay and naze32. Big grin. Changing fc to dodo for more tests.

@stronnag
Collaborator Author

10 consecutive PH cycles on dodo with boot-delay. We're fixed. Time for another RC?

@digitalentity
Member

Yee-haw! We finally did it!
Cleaned up lots of stuff on the way and finally tracked it down to merely needing a bigger boot delay.
Everybody involved, great job!

@martinbudden
Contributor

Fantastic!

Thanks again @stronnag for all your help in testing.

And lots of good cleanup resulted from this.

And I am SO GLAD that it was not a problem with the conversion to the new IO!

@digitalentity
Member

It's definitely time for another RC. I'll merge the following and release an RC3:

  • Improved gyro initialisation #456
  • Add extra delays when cold-booting the UAV #517
  • Changed mag.init to return false if failed #515 (after testing)

@martinbudden
Contributor

@digitalentity can you also merge #521 - updated targets.

It just adds some BARO and MAG #defines to some targets, adds the REVO_NANO and REV OPBL targets and makes a minor tidy to the makefile.

@bk79

bk79 commented Aug 29, 2016

Also please merge ibus telemetry

@stronnag
Collaborator Author

And EVO SDCARD ....

@digitalentity
Member

@bk79 did anybody do a PR for iBus telem?
@stronnag of course, forgot that one, already merged.

@hydra
Contributor

hydra commented Aug 29, 2016

@stronnag I really appreciate all the time and test flights and willingness to put your aircraft at risk in order to help solve this problem.

Well done to everyone else involved too, of course!

@randyoo
Contributor

randyoo commented Oct 3, 2017

I know this bug is marked closed, but I'm pretty sure I'm experiencing it (or something closely related), because engaging PH seems to always result in the aircraft flying away, but after a soft reboot is performed, PH works fine. (still gathering data, but haven't observed any occasions where PH works on hard reset, or doesn't work on soft reboot)

This is on an Omnibus clone (f4 v5pro) on 1.7.2 with a Beitian BN-880 GPS/compass. There's no blackbox capability on this FC, or I'd share a log. Willing to perform any steps that might help narrow down the cause of the issue.

EDIT: After some more research, it appears very likely that this is issue #1791 that I'm experiencing.
