
7. Troubleshooting

There is a wealth of information online in the OpenNMS Troubleshooting FAQ. Check there first, or at least review the list of questions it answers in the next section.

7.1 Questions Answered in the Online FAQ

* 1 Q: Why do I see a frightening number of Java processes/memory allocated to Java with ps or top?

* 2 Q: Why are jar_cacheXXXX.tmp files filling up my /tmp?

* 3 Q: Why do I get "can't parse argument 'RRA:AVERAGE:0.5:1:8928'"?

* 4 Q: I installed OpenNMS, and admin/admin Does Not Log Me On, Why?

* 5 Q: Tomcat won't start, complains about JAVA_HOME, why?

* 6 Q: Why do I get '"FATAL 1: IDENT authentication failed for user "postgres"'?

* 7 Q: Why does apt complain about zebra and gated?

* 8 Q: Why does OpenNMS Says My DNS Server is Down, When It Is Up?

* 9 Q: Why are some of my XML files all one line?

* 10 Q: Why Don't My Linux Servers with the UCD SNMP Agent Show Up in Performance Reports?

* 11 Q: opennms.sh status returns nothing, what's happening?

* 12 Q: Why does an RPM install hang on RedHat 8.0?

* 13 Q: Why does the webUI give me an "Unable to compile class for JSP" exception?

* 14 Q: Why do I see JDBC related Exceptions in the log files?

* 15 Q: Why do I get node level SNMP information, but no interface level information?

* 16 Q: OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.

* 17 Q: How Can I Best Test My XML Files?

* 18 Q: Why Do I Get an Invalid ifIndex Error?

* 19 Q: How are node labels determined?

* 20 Q: How Do I Log Out of the webUI?

* 21 Q: I upgraded to 1.1.1. Why does "Manage/Unmanage" not work?

* 22 Q: Why doesn't the dhcpd process ever start?

* 23 Q: I can snmpwalk a device, but OpenNMS won't collect data on it, why?

* 24 Q: Why Does My Windows DHCP Server Show as Down?

* 25 Q: Why do I get opennms startup failed?

* 26 Q: Why does OpenNMS 1.2.0 fail to complete SNMP discovery and refuse to perform SNMP polling on some nodes?

* 27 Q: Looking in output.log I see lots of references to 'java.lang.Exception' that appears to be 'Caused by: org.jrobin.core.RrdException: Bad sample timestamp ..... Last update time was ....., at least one second step is required'

Next, check the Troubleshooting section of the Official Installation Guide and the OpenNMS Troubleshooting FAQ.

7.2 Web Interface Messages

The OpenNMS web interface will report error messages to the user. Depending on the type of message, these may also appear in the opennms log files. To troubleshoot effectively, you must understand the OpenNMS logs, and how to browse them. Below are error messages you may experience in the web interface, and how to solve them.

A connection error has occurred: FATAL: IDENT authentication failed for user "opennms"

The OpenNMS server cannot access the OpenNMS database. Ensure that the Postgres access configuration file (pg_hba.conf) grants the opennms user the correct permissions.


Backend start-up failed: FATAL: database "opennms" does not exist

The OpenNMS database does not exist. You must run the installation script to set up the SQL tables.

7.3 OpenNMS Log Messages

To troubleshoot effectively, you must understand the OpenNMS logs and how to browse them. To emphasize the use of the logs to identify problems, this section lists the ERROR and WARN messages that point to problems in the logs, and then describes the symptoms the user may experience and how to solve the problem.
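
As a starting point for browsing the logs, the following is a minimal sketch that lists every ERROR and WARN line in a log directory. The function name scan_opennms_logs and the default directory /var/log/opennms are assumptions made for illustration; pass the directory your installation actually logs to.

```shell
# Sketch: print ERROR and WARN lines from every *.log file in a directory.
# The default directory is an assumption; pass your own as the first argument.
scan_opennms_logs() {
    log_dir="${1:-/var/log/opennms}"
    # -H prefixes each match with its file name; -E enables the (A|B) pattern.
    # The surrounding spaces avoid matching the words inside longer tokens.
    grep -HE ' (ERROR|WARN) ' "$log_dir"/*.log
}
```

Calling scan_opennms_logs with no argument scans the assumed default directory. Each output line starts with the log file name, in the same form as the collectd.log excerpt shown later in this section.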

7.4 [Capsd Suspect Pool-fiber0] IfCollector: IfCollector: Caught undeclared throwable exception when testing for protocol

7.5 Caused by: java.lang.OutOfMemoryError: unable to create new native thread

Detailed Log Excerpt

2006-03-24 12:24:28,498 WARN  [Capsd Suspect Pool-fiber0] IfCollector: IfCollector: Caught undeclared throwable exception when testing for protocol SNMP on host X.X.X.X
java.lang.reflect.UndeclaredThrowableException
        at org.opennms.netmgt.capsd.SnmpPlugin.isProtocolSupported(SnmpPlugin.java:239)
        at org.opennms.netmgt.capsd.IfCollector.probe(IfCollector.java:200)
        at org.opennms.netmgt.capsd.IfCollector.run(IfCollector.java:361)
        at org.opennms.netmgt.capsd.SuspectEventProcessor.run(SuspectEventProcessor.java:1264)
        at org.opennms.core.concurrent.RunnableConsumerThreadPool$FiberThreadImpl.run(RunnableConsumerThreadPool.java:412)
        at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:574)
        at org.opennms.protocols.snmp.SnmpTimer.<init>(SnmpTimer.java:237)
        at org.opennms.protocols.snmp.SnmpSession.<init>(SnmpSession.java:678)
        at org.opennms.netmgt.capsd.SnmpPlugin.isProtocolSupported(SnmpPlugin.java:192)

OpenNMS.org FAQ:

OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.

Debian Solution Notes:

For enterprise users with many hosts, it may be that you actually need more memory. If you are only monitoring a dozen hosts, you can tune your settings to make better use of the resources.

Version 1.2.X ships with a modified /usr/share/opennms/bin/opennms.sh, in which ulimit -s 2048 is changed to ulimit -s 8192 and ulimit -n 10240.

In the same file, reduce the Java heap by changing JAVA_HEAP_SIZE=256 to JAVA_HEAP_SIZE=128.

In /etc/default/tomcat4, uncomment the CATALINA_OPTS="-Djava.awt.headless=true -Xmx128M -server" line and lower the -Xmx setting to 64 MB, so that it reads CATALINA_OPTS="-Djava.awt.headless=true -Xmx64M -server".
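
The two heap changes above can be scripted. The following is a rough sketch only: the function names lower_heap_size and lower_tomcat_heap are hypothetical, and the sed patterns assume the lines appear exactly as quoted above. Each function keeps a .bak copy of the file it edits.

```shell
# Sketch: change JAVA_HEAP_SIZE=256 to JAVA_HEAP_SIZE=128 in the given file.
lower_heap_size() {
    file="$1"
    cp "$file" "$file.bak" &&
    sed 's/JAVA_HEAP_SIZE=256/JAVA_HEAP_SIZE=128/' "$file.bak" > "$file"
}

# Sketch: uncomment the CATALINA_OPTS line and lower -Xmx128M to -Xmx64M.
# Note: this uncomments every commented CATALINA_OPTS line in the file.
lower_tomcat_heap() {
    file="$1"
    cp "$file" "$file.bak" &&
    sed -e 's/^#[[:space:]]*CATALINA_OPTS=/CATALINA_OPTS=/' \
        -e '/^CATALINA_OPTS=/s/-Xmx128M/-Xmx64M/' "$file.bak" > "$file"
}
```

For example, run lower_heap_size /usr/share/opennms/bin/opennms.sh and lower_tomcat_heap /etc/default/tomcat4 as root, then compare each file with its .bak copy before restarting the services.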

Common Mistakes

Ensure you are using Java 1.4.2 (currently 1.4.2_10), and not the newer Java 1.5. OpenNMS 1.2.X works only with Java 1.4.

Check to make sure your swap space is available. A quick way to check is using the top command.
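
If you prefer a one-line answer to reading the top display, the sketch below reads the swap figures from /proc/meminfo on Linux. The function name swap_status is hypothetical, and the optional file argument exists only so the function can be tried against a sample file.

```shell
# Sketch: print total and free swap (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo, which is Linux-specific.
swap_status() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { printf "swap total=%s kB free=%s kB\n", total, free }' \
        "${1:-/proc/meminfo}"
}
```

A total of 0 kB means no swap is configured or enabled.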

7.6 HTTP Status 500 - java.lang.UnsupportedClassVersionError: org/apache/jsp/index_jsp (Unsupported major.minor version 49.0)

Class file version 49.0 corresponds to Java 1.5.X; Java 1.4.X produces version 48.0. This error usually appears after starting with Java 1.5 and then switching back to Java 1.4: the JSP classes that Tomcat cached were compiled under 1.5 and cannot be loaded by 1.4. Clean out your Tomcat cache files before you continue using the web interface.

Debian Solution Notes:

onms~# /etc/init.d/tomcat4 stop
onms~# rm -rf /var/cache/tomcat4/Standalone/localhost/opennms/*
onms~# /etc/init.d/tomcat4 start
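
To confirm that a cached class really was compiled by the wrong Java version, you can read the version number straight from the class file. This is a sketch: the function name class_major_version is hypothetical. It relies on the class file layout, in which the two bytes at offsets 6 and 7 hold the major version in big-endian order.

```shell
# Sketch: print the major version number stored in a Java class file.
class_major_version() {
    # Read the two bytes at offsets 6 and 7 as unsigned decimal numbers.
    bytes=$(od -An -t u1 -j 6 -N 2 "$1")
    # Split the two numbers into $1 (high byte) and $2 (low byte).
    set -- $bytes
    echo $(( $1 * 256 + $2 ))
}
```

Run it on a compiled JSP class under the Tomcat cache directory. A result of 49 means the class was compiled by Java 1.5, and 48 means Java 1.4.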

7.7 WARN [SnmpPortal--1] SnmpIfCollector: snmpReceivedPDU: (69.194.166.64) Error during interface SNMP collection for interface /69.194.166.64, SNMP error text: ErrNoSuchName

collectd.log:2006-03-26 18:02:50,279 WARN [SnmpPortal--1] SnmpIfCollector: snmpReceivedPDU: (69.194.166.64) Error during interface SNMP collection for interface /69.194.166.64, SNMP error text: ErrNoSuchName 

The 127.0.0.1 listening address needs to be removed from the snmpd startup options. Debian Etch/Sid added it by default, which makes snmpd listen only on the loopback interface and prevents SNMP access from remote hosts.

Debian Solution Notes:

In /etc/default/snmpd, change

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -I -smux -p /var/run/snmpd.pid 127.0.0.1' 

to:

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -I -smux -p /var/run/snmpd.pid' 

and restart snmpd.
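
The same edit can be made with sed. This is a sketch only: the function name strip_loopback_from_snmpdopts is hypothetical, and the pattern assumes that 127.0.0.1 is the last item inside the single-quoted SNMPDOPTS value, as in the line shown above. Pass the defaults file that holds SNMPDOPTS on your system; a .bak copy is kept.

```shell
# Sketch: remove a trailing 127.0.0.1 from the SNMPDOPTS='...' line.
strip_loopback_from_snmpdopts() {
    file="$1"
    cp "$file" "$file.bak" &&
    # Only touch the SNMPDOPTS line; drop " 127.0.0.1" before the closing quote.
    sed "/^SNMPDOPTS=/s/ 127\.0\.0\.1'/'/" "$file.bak" > "$file"
}
```

Compare the file with its .bak copy afterwards, and then restart snmpd as described above.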

7.8 Tomcat HTTP Status 500 Error

Look for something like java.io.FileNotFoundException: /usr/share/opennms/etc/users.xml (Permission denied) in your output. If you see (Permission denied), you are probably running the Tomcat front end as the tomcat4 user. The error means that the tomcat4 user cannot access the named file (users.xml above), and you must change its ownership or permissions manually to resolve the problem.

bash# chown tomcat4 /usr/share/opennms/etc/users.xml

There is more information on running Tomcat as the tomcat4 user in the installation section on Tomcat.

7.9 Interface Anomalies like strange characters or response time graphs not displaying properly in the GUI

This is likely a corrupt Tomcat cache. To clear the cache and restart the GUI:

  1. Stop Tomcat
    bash# /etc/init.d/tomcat4 stop

  2. Clear the cache
    bash# rm -rf /var/cache/tomcat4/*

  3. Restart Tomcat
    bash# /etc/init.d/tomcat4 start
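
A slightly more defensive way to do the cache-clearing step is sketched below. The function name clear_tomcat_cache is hypothetical. It refuses to run without an existing directory argument, which guards against an empty variable turning the command into a delete of the wrong path. The -mindepth and -delete options are assumed to be available, as they are in GNU find.

```shell
# Sketch: delete everything inside a cache directory, but keep the directory.
clear_tomcat_cache() {
    # Abort with a usage message if no directory was given.
    cache_dir="${1:?usage: clear_tomcat_cache DIR}"
    [ -d "$cache_dir" ] || { echo "no such directory: $cache_dir" >&2; return 1; }
    # -mindepth 1 skips the directory itself; -delete removes children first.
    find "$cache_dir" -mindepth 1 -delete
}
```

Stop Tomcat first, run clear_tomcat_cache /var/cache/tomcat4, and then start Tomcat again.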
    
     
    

7.10 You have VOIP dial peers and OpenNMS 'List All Nodes' takes more than 5 minutes to display

Your devices may have hundreds or thousands of interfaces due to VOIP dial peers. On Cisco devices (AS5300s), the SNMP process will time out if the interface table is too long. SNMP views can be used to limit which SNMP interfaces are made available to OpenNMS. Limiting this information allows OpenNMS to gather a complete (but restricted) interface table.

