disappearing host

Viewing 15 posts - 1 through 15 (of 18 total)
  • #28407
    Ken Jamrogowicz
    Participant

    New user here. I have been watching my new monitor here for a day or two (A3 82000277 HW:108 SW:76). It seems that it periodically “disappears” from the LAN. That is to say, I get “host unreachable” and no response to pings. The blackout seems to last perhaps 3-5 minutes and then it comes back for 10-15 minutes. And it repeats. While it is “disappeared” it appears that it is still sending data to the uradmonitor.com host and is indicated to be online (green light) – although I have no idea how (tunnel?).

    At first I thought that something was wrong with my network, or with the unit itself. But now I have convinced myself that this is just the way it works. Is this explained anywhere? What is the actual protocol used? What is the best way to determine when the unit is available to get JSON data? Do I just ping it until I get a response? etc.

    Or is there something really wrong with my unit?

    #28408
    uRADMonitor
    Keymaster

    hi Ken,

    This might be caused by some security settings on your router.

    Please try one of the following:
    – enable DMZ for your A3 on your router
    – connect the A3 to a different network / router to see if this happens again.

    From what you are saying it seems that the unit reboots periodically (triggered by the internal watchdog) due to network communication errors.

    Radu

    #28410
    Ken Jamrogowicz
    Participant

    I put the A3 address into the DMZ, but it is still doing the same thing.
    The router is a Verizon G1100 “Quantum FiOS” that is quite common in the US. I really have no other place to try it.
    I can send you the IP although I suspect you can find it from your end 😉

    I note that the parameter “Autoreboot” is set to 0s – is that OK? Anything else I can check (or change)?

    I am concerned that continual re-booting will eventually destroy some eeprom or similar …

    Ken

    #28411
    Ken Jamrogowicz
    Participant

    I should add that since the data is arriving at the community server OK and I can access it fine (between reboots), it seems to me that the unit is rebooting needlessly or incorrectly. How can I stop it from doing that?

    Ken

    #28412
    Ken Jamrogowicz
    Participant

    OK – here is another hypothesis. It looks like the unit can send data to the community server even when I cannot access it locally. I think what could cause that is if DHCP renewal is not handled correctly. My router would take the unit out of the ARP table and it would seem to disappear from the LAN. But it is still on the network and able to send data (stream it anyway). However, when the server tries to send a message back to the A3, it would get ‘destination unreachable’ from the router. Eventually, not seeing something from the server, the A3 reboots. It gets the initial DHCP information correctly and carries on until the first DHCP renewal operation where it gets kicked out of the ARP table again … and repeat.

    Is there some way to make the unit use a static IP?

    Ken

    #28414
    Wolferl
    Moderator

    Hi Ken,

    Have you tried giving your A3 a constant IP in the DHCP server of your router?

    When the uradmonitor server answers the data your device sent, the reply should be routed correctly, assuming you are using NAT.
    We have seen defective routers doing weird things like yours…

    Cheers,
    Wolferl

    #28415
    Ken Jamrogowicz
    Participant

    Fixing the IP to a static value has no effect. I tried it. The problem is not that the unit is jumping to different IP addresses.

    I have been watching the A3 on a packet sniffer and what I see is that the A3 simply stops responding to TCP requests. The web browser will repeat its SYN request about 4 times and then declare the site can’t be reached. That “site” of course is on the very same LAN as my PC. This communication path does not traverse the router. There is no further communication between the A3 and the router until the unit reboots and requests a DHCP assignment – and my router always gives it the same address and refresh time of 1440 minutes.
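
    The same check can be made without a browser. Below is a minimal sketch of a TCP reachability probe, assuming the A3 serves its web page on port 80 (the port is an assumption, it is not stated in this thread). The function name `tcp_reachable` is made up for illustration. A connection attempt that gets no SYN-ACK within the timeout is reported as unreachable, which is the behaviour the sniffer shows.

```python
import socket


def tcp_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Port 80 is an assumption about where the unit's web page is served.
    """
    try:
        # create_connection performs the full TCP handshake (SYN, SYN-ACK, ACK).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (no answer to the SYN), refused connections
        # and "host unreachable" errors.
        return False
```

    Calling `tcp_reachable("192.168.1.69")` against the address from the ping log should return False during a blackout and True otherwise.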

    After watching it for a while, what I see is that it runs quite normally for 10-11 minutes, updating the screen every 10 seconds. The watchdog never goes above 60 seconds. Somewhere just before 650 seconds, the A3 simply stops talking on the LAN. Since the watchdog will typically have 520 seconds on it, there is no further activity until the WD times out 520 seconds later and the unit reboots. While it is “incommunicado” on the LAN, it appears that the A3 is still sending data to the WAN side – the LEDs blink and no data is lost on the community server. This communication does not show up on the packet sniffer, so I cannot confirm 100%.

    It seems unlikely my router is defective – I have 48 devices on my LAN and the URADMonitor is the only one misbehaving. And the communications passing through the router seem to work OK.

    The shut-down of the LAN port is rather sudden but always in a predictable time frame of just under 11 minutes:

    Reply from 192.168.1.69: bytes=32 time<1ms TTL=64
    Reply from 192.168.1.69: bytes=32 time<1ms TTL=64
    Reply from 192.168.1.69: bytes=32 time<1ms TTL=64
    Reply from 192.168.1.69: bytes=32 time=1ms TTL=64
    Reply from 192.168.1.69: bytes=32 time=1ms TTL=64
    Reply from 192.168.1.69: bytes=32 time<1ms TTL=64
    Reply from 192.168.1.69: bytes=32 time=1ms TTL=64
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    Reply from 192.168.1.19: Destination host unreachable.
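
    To put numbers on a pattern like the one above, the probe results can be reduced to a list of outage intervals. This is only a sketch: it assumes the results have already been collected as (timestamp in seconds, reachable) pairs, and the function name `outage_intervals` is made up for illustration.

```python
from typing import Iterable, List, Tuple


def outage_intervals(
    samples: Iterable[Tuple[float, bool]]
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs for each run of failed probes.

    samples: (timestamp_seconds, reachable) pairs in chronological order.
    An outage starts at the first failed probe and ends at the next
    successful one. An outage still open when the data ends is closed
    at the last timestamp seen.
    """
    outages: List[Tuple[float, float]] = []
    start = None
    last_ts = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failure of a new outage
        elif ok and start is not None:
            outages.append((start, ts))  # device answered again
            start = None
        last_ts = ts
    if start is not None:
        outages.append((start, last_ts))
    return outages
```

    The difference between each pair gives the blackout length, and the gap between one outage's end and the next one's start gives the run time in between.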

    I think it’s clear that there is some kind of problem with the A3 software. The problem is 100% repeatable.

    Ken

    #28416
    Wolferl
    Moderator

    Hi Ken,

    Thanks for your extensive test. That leaves little to be desired 🙂

    Just to be sure, your A3 is connected to wired LAN, not via Wifi?
    On Wifi, I have seen some routers (Fritzboxes come to mind) which can perform some sort of combination of WPA and WPA2, which can cause problems. PCs usually have no problem with that. Setting the mode to one OR the other fixed that every single time.

    One thing you need to test is the LAN cable to the A3. Can you replace it temporarily?

    Cheers,
    Wolferl

    #28417
    Ken Jamrogowicz
    Participant

    PROBLEM SOLVED

    As the problem manifested itself with loss of LAN (wired) communications, that is where I was focussing my attention.

    In fact, the problem was the 9v/1A power supply included with the unit. I was using the included supply for my bench testing. It was never my intention to use the “wall wart” in the final installation.

    Powering the A3 at 24 VDC from a POE splitter has resulted in the unit running for nearly an hour now without a hiccup. So I think the problem is solved.

    My conclusion is that the tiny (1A) supply is simply inadequate, and at the 11th minute the A3 was doing something that required an extra bit of power, causing a momentary partial undervoltage condition.

    Usually the micro can signal an undervoltage condition (typically in /var/log/syslog). A feature request could be the addition of such an alarm – usually many voltage “near misses” are seen before the one that actually impacts operation. That certainly would have saved me a *lot* of time.

    Ken

    #28418
    Wolferl
    Moderator

    Hi Ken,

    Glad you found the culprit!

    As is often the case with those cheap Chinese wall warts, the problem is not a low voltage that can be detected, but unstable voltage and spikes (dropouts), which cause the microcontroller to reset or latch up.

    Cheers,
    Wolferl

    #28421
    Ken Jamrogowicz
    Participant

    I tested the power supply today and found that its output voltage sags above 200 mA and it goes into full current limit “foldback” by 300 mA, despite its alleged 1 ampere rating.

    I should remind you that you sold this power adapter with the A3 as suitable for use. I noticed today that uRad has reduced the selling price of the A3 by some $27. I guess this is possible due to the savings accrued from use of “cheap Chinese wall warts”?

    #28422
    Wolferl
    Moderator

    Hi Ken,

    There are regular promo sales on the sensors. That has nothing to do with the power supply or other problems.
    Your wall wart is defective and should be replaced by uRADMonitor. Contact user Radhoo if you like.
    Problem is, you can’t tell a better quality product from a lower quality one simply by comparing the price.
    As an EE I have gone through a lot of problems like that…

    Cheers,
    Wolferl

    #28433
    Ken Jamrogowicz
    Participant

    I would like to update this subject. After discarding the power supply and switching to POE, I got a nice run of some 46 hours – and then the A3 started re-booting again. However, this time it is distinctly different:

    > the LAN activity does not stop (except for the 1 minute the reboot actually takes)
    > the run times between reboots are random – see attachment (before it was like clockwork)

    This is not desirable, but it is an improvement – since, at most, one data point is lost.

    Looking around on the system, it seems others have this issue too. For example, #8200074 rebooted some 13 times in June, #82000110 rebooted some 15 times in the last two weeks of June, #82000112 rebooted about 26 times in June. #8200012E is interesting – it reboots every single day at 7 AM.

    It’s too soon to say what my situation will settle out to – it looks like it could be 10-20 times *per day*, which seems a bit excessive.

    I have the A3’s address in my router’s DMZ at the moment, to remove one variable. But that does not seem to have anything to do with the problem.

    As mentioned previously – no other devices on my network are having this sort of problem AFAIK. A number of them are continuously reporting to some server, or other, like the A3 is doing.

    If there is something I can check or test to help trouble-shoot, I would be happy to do that.

    Ken

    #28456
    Ken Jamrogowicz
    Participant

    Ok – I have been able to do some analysis of the rebooting shown above. I am running a small script that does a “curl” to the JSON page on the local LAN and time-stamps it. It’s clear from this that the A3 simply stops sending data when requested. (see attached text file). Sometimes it stumbles and restarts. Other times it simply stops until a timeout causes a restart. The timing and duration seem to be random but one thing is certain – this has nothing to do with my router or ISP. The A3 has got an internal problem of some sort.
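
    For anyone who wants to repeat this, here is a rough Python equivalent of that kind of script. It is a sketch, not the script that produced the attachment: the URL path `/j`, the 10 second interval and the function names are all assumptions, so substitute whatever address your unit serves its JSON on.

```python
import http.client
import json
import time
import urllib.request
from datetime import datetime
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch url and decode the body as JSON (raises on any failure)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_once(url: str, fetch: Callable[[str], dict] = fetch_json) -> str:
    """Return one time-stamped log line for a single request."""
    stamp = datetime.now().isoformat(timespec="seconds")
    try:
        data = fetch(url)
        return f"{stamp} OK {json.dumps(data, sort_keys=True)}"
    except (OSError, ValueError, http.client.HTTPException) as exc:
        # OSError covers timeouts and connection errors (URLError included),
        # ValueError covers a body that is not valid JSON.
        return f"{stamp} FAIL {type(exc).__name__}: {exc}"


def poll_forever(url: str, interval: float = 10.0) -> None:
    """Print one log line per request until interrupted."""
    while True:
        print(poll_once(url), flush=True)
        time.sleep(interval)

# Example (address taken from the ping log, path "/j" is an assumption):
# poll_forever("http://192.168.1.69/j")
```

    Redirecting the output to a file gives a time-stamped record in which the FAIL lines mark the periods when the unit stopped answering.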

    Ken

    #28503
    uRADMonitor
    Keymaster

    Hello Ken, thanks for the detailed observations.

    Given all the above and your other posts, it seems that there is an issue that we still don’t clearly understand. Please keep in mind that the FW running on your unit is stable and there are multiple other units using it with no reported issues. This is why I would like you to run another test and connect your unit to a separate network. Either give it to a friend, or create a small separate network (via GSM, etc.). Please connect it to this separate setup to rule out this direction. Yes, I fully understand that you have other devices using it, but it is important to understand more.
