There is a wealth of information online in the OpenNMS Troubleshooting FAQ. Check here first, or at least review the list of questions answered in the next section.
* 1 Q: Why do I see a frightening number of Java processes/memory allocated to Java with ps or top?
* 2 Q: Why are jar_cacheXXXX.tmp files filling up my /tmp?
* 3 Q: Why do I get "can't parse argument 'RRA:AVERAGE:0.5:1:8928'"?
* 4 Q: I installed OpenNMS, and admin/admin Does Not Log Me On, Why?
* 5 Q: Tomcat won't start, complains about JAVA_HOME, why?
* 6 Q: Why do I get '"FATAL 1: IDENT authentication failed for user "postgres"'?
* 7 Q: Why does apt complain about zebra and gated?
* 8 Q: Why Does OpenNMS Say My DNS Server Is Down, When It Is Up?
* 9 Q: Why are some of my XML files all one line?
* 10 Q: Why Don't My Linux Servers with the UCD SNMP Agent Show Up in Performance Reports?
* 11 Q: opennms.sh status returns nothing, what's happening?
* 12 Q: Why does an RPM install hang on RedHat 8.0?
* 13 Q: Why does the webUI give me an "Unable to compile class for JSP" exception?
* 14 Q: Why do I see JDBC related Exceptions in the log files?
* 15 Q: Why do I get node level SNMP information, but no interface level information?
* 16 Q: OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.
* 17 Q: How Can I Best Test My XML Files?
* 18 Q: Why Do I Get an Invalid ifIndex Error?
* 19 Q: How are node labels determined?
* 20 Q: How Do I Log Out of the webUI?
* 21 Q: I upgraded to 1.1.1. Why does "Manage/Unmanage" not work?
* 22 Q: Why doesn't the dhcpd process ever start?
* 23 Q: I can snmpwalk a device, but OpenNMS won't collect data on it, why?
* 24 Q: Why Does My Windows DHCP Server Show as Down?
* 25 Q: Why do I get opennms startup failed?
* 26 Q: Why does OpenNMS 1.2.0 fail to complete SNMP discovery and refuse to perform SNMP polling on some nodes?
* 27 Q: Looking in output.log I see lots of references to 'java.lang.Exception' that appears to be 'Caused by: org.jrobin.core.RrdException: Bad sample timestamp ..... Last update time was ....., at least one second step is required'
Next, check the Troubleshooting section of the Official Installation Guide.
OpenNMS Troubleshooting FAQ
The OpenNMS web interface will report error messages to the user. Depending on the type of message, these may also appear in the opennms log files. To troubleshoot effectively, you must understand the OpenNMS logs, and how to browse them. Below are error messages you may experience in the web interface, and how to solve them.
The OpenNMS server cannot access the OpenNMS database. Ensure that the correct permissions are in the Postgres access configuration file.
The OpenNMS database does not exist. You must run the installation script to set up the SQL tables.
To emphasize using the logs to identify problems, this section lists the ERROR and WARN messages that point to problems in the logs, and then describes the symptoms the user may experience and how to solve the problem.
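As a starting point for browsing the logs, the following is a minimal sketch that counts WARN and ERROR lines per log file. The directory `/var/log/opennms` and the `*.log` naming are assumptions, not taken from this page, and the function name is made up for illustration. Adjust `LOG_DIR` to wherever your installation writes its logs.

```shell
#!/bin/sh
# Sketch: count WARN and ERROR lines per log file.
# LOG_DIR is an assumed location; adjust it to your installation.
LOG_DIR="${LOG_DIR:-/var/log/opennms}"

# Print "<count> <file>" for every *.log file in the given directory
# that contains at least one WARN or ERROR line. The pattern relies on
# the level appearing as a space-delimited word, as in the excerpts
# quoted in this section.
summarize_log_problems() {
    dir="$1"
    for f in "$dir"/*.log; do
        [ -f "$f" ] || continue
        n=$(grep -c -E ' (WARN|ERROR) ' "$f")
        [ "$n" -gt 0 ] && echo "$n $f"
    done
    return 0
}

summarize_log_problems "$LOG_DIR"
```

Files with a high count are a reasonable place to start reading. Running `grep -E ' (WARN|ERROR) '` on a single file then shows the messages themselves.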
2006-03-24 12:24:28,498 WARN [Capsd Suspect Pool-fiber0] IfCollector: IfCollector: Caught undeclared throwable exception when testing for protocol SNMP on host X.X.X.X
java.lang.reflect.UndeclaredThrowableException
at org.opennms.netmgt.capsd.SnmpPlugin.isProtocolSupported(SnmpPlugin.java:239)
at org.opennms.netmgt.capsd.IfCollector.probe(IfCollector.java:200)
at org.opennms.netmgt.capsd.IfCollector.run(IfCollector.java:361)
at org.opennms.netmgt.capsd.SuspectEventProcessor.run(SuspectEventProcessor.java:1264)
at org.opennms.core.concurrent.RunnableConsumerThreadPool$FiberThreadImpl.run(RunnableConsumerThreadPool.java:412)
at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at org.opennms.protocols.snmp.SnmpTimer.<init>(SnmpTimer.java:237)
at org.opennms.protocols.snmp.SnmpSession.<init>(SnmpSession.java:678)
at org.opennms.netmgt.capsd.SnmpPlugin.isProtocolSupported(SnmpPlugin.java:192)
OpenNMS stops working after about 1 hour or intermittent servlet crashes are seen in the Web GUI.
Enterprise users with many hosts may actually need more memory. If you are only monitoring a dozen hosts, you can tune your settings to make better use of the resources.
Version 1.2.X ships with a modified /usr/share/opennms/bin/opennms.sh
in which ulimit -s 2048 is changed to ulimit -s 8192 and ulimit -n 10240.
In the same file, reduce the heap by changing JAVA_HEAP_SIZE=256
to JAVA_HEAP_SIZE=128.
In /etc/default/tomcat4, uncomment the line
CATALINA_OPTS="-Djava.awt.headless=true -Xmx128M -server"
and lower the -Xmx setting to 64 MB so that it reads
CATALINA_OPTS="-Djava.awt.headless=true -Xmx64M -server".
Ensure you are using Java 1.4.2 (currently 1.4.2_10), and not the newer Java 1.5. OpenNMS 1.2.X works only with Java 1.4.
Check to make sure your swap space is available. A quick way
to check is using the top command.
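The heap change and the swap check above can be sketched as two small shell functions. This is an illustration only. The function names are hypothetical, the exact form of the `JAVA_HEAP_SIZE=` line in opennms.sh is assumed from the text above, and the swap check reads `/proc/meminfo`, which is Linux-specific.

```shell
#!/bin/sh
# Sketch only: the line formats matched here are assumptions.

# Rewrite a "JAVA_HEAP_SIZE=<number>" line in the given script,
# leaving the original in <file>.bak.
set_heap_size() {
    file="$1"
    size="$2"
    cp "$file" "$file.bak" || return 1
    sed "s/^\([[:space:]]*JAVA_HEAP_SIZE=\)[0-9][0-9]*/\1$size/" "$file.bak" > "$file"
}

# Print total and free swap in kB from a meminfo-style file
# (defaults to /proc/meminfo).
swap_summary() {
    awk '/^SwapTotal:/ {t = $2}
         /^SwapFree:/  {f = $2}
         END {printf "total=%s free=%s\n", t, f}' "${1:-/proc/meminfo}"
}

# Example usage (run as root, then restart OpenNMS):
#   set_heap_size /usr/share/opennms/bin/opennms.sh 128
#   swap_summary
```

A swap total of 0, or free swap close to 0, suggests the swap space is missing or exhausted.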
Class file version 49.0 corresponds to Java 1.5.X; Java 1.4.X uses class file version 48.0 (major version 48, minor version 0). Often this is caused by starting with Java 1.5 and then switching back to Java 1.4. You need to clean out your Tomcat cache files before you can continue using the web interface.
onms~#/etc/init.d/tomcat4 stop
onms~#rm -rf /var/cache/tomcat4/Standalone/localhost/opennms/*
onms~#/etc/init.d/tomcat4 start
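To confirm which Java version compiled the cached classes, you can read the version stored in each class file header. The sketch below assumes the standard class file layout: a 4-byte magic number, a 2-byte minor version, then a 2-byte big-endian major version. The function name is hypothetical, and the cache directory is the one used in the cleanup commands above.

```shell
#!/bin/sh
# Print the major class file version (48 = Java 1.4, 49 = Java 1.5)
# stored in bytes 7-8 of a compiled .class file.
class_major_version() {
    # Skip the 4-byte magic number and 2-byte minor version, then read
    # the two bytes of the major version as unsigned decimals.
    set -- $(od -An -t u1 -j 6 -N 2 "$1")
    [ $# -eq 2 ] || return 1
    echo $(( $1 * 256 + $2 ))
}

# Cache location taken from the cleanup commands in this section.
CACHE_DIR="${CACHE_DIR:-/var/cache/tomcat4/Standalone/localhost/opennms}"

# List "<major version> <file>" for every cached class file.
find "$CACHE_DIR" -name '*.class' 2>/dev/null | while read -r f; do
    echo "$(class_major_version "$f") $f"
done
```

If any file reports 49 while Tomcat is running on Java 1.4, the cache still holds classes compiled under Java 1.5 and should be cleared as described.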
collectd.log:2006-03-26 18:02:50,279 WARN [SnmpPortal--1] SnmpIfCollector: snmpReceivedPDU: (69.194.166.64) Error during interface SNMP collection for interface /69.194.166.64, SNMP error text: ErrNoSuchName
The 127.0.0.1 argument needs to be removed from the snmpd startup options. It was added in an Etch/Sid change and prevents SNMP access from remote hosts.
In /etc/default/snmpd, change
SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -I -smux -p /var/run/snmpd.pid 127.0.0.1'
to:
SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -I -smux -p /var/run/snmpd.pid'
and restart snmpd.
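The edit can also be scripted. The following is a minimal sketch with a hypothetical function name. It assumes the `SNMPDOPTS` line is single-quoted and ends in ` 127.0.0.1'`, exactly as shown above. It takes the file to edit as an argument and keeps a `.bak` copy.

```shell
#!/bin/sh
# Remove a trailing " 127.0.0.1" from the single-quoted SNMPDOPTS line
# in the given file, leaving the original in <file>.bak.
strip_loopback() {
    file="$1"
    cp "$file" "$file.bak" || return 1
    sed "/^SNMPDOPTS=/s/ 127\.0\.0\.1'/'/" "$file.bak" > "$file"
}

# Example usage (run as root on the snmpd defaults file, then restart snmpd):
#   strip_loopback <path to the snmpd defaults file>
```

Lines that do not match the assumed form are left unchanged, so compare the file with its `.bak` copy after running the function to confirm the edit took effect.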
Look for something like java.io.FileNotFoundException: /usr/share/opennms/etc/users.xml
(Permission denied) in your output. If you see something related
to (Permission denied) then you are probably running the Tomcat front-end
under the tomcat4 user. This error is telling you that the tomcat4
user cannot access the specified file (users.xml above) and you must
manually change the permissions to resolve this problem.
bash#chown tomcat4 /usr/share/opennms/etc/users.xml
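Before changing ownership, it can help to check who owns the file now. The sketch below uses a hypothetical helper, `owned_by`, built on `find -user`. The file and user names are the ones from this section. Ownership is only one way a file can be readable, so treat the result as a hint rather than a full permission check.

```shell
#!/bin/sh
# Succeed if the given file is owned by the given user (name or numeric ID).
owned_by() {
    user="$1"
    file="$2"
    [ -n "$(find "$file" -prune -user "$user" 2>/dev/null)" ]
}

USERS_XML="${USERS_XML:-/usr/share/opennms/etc/users.xml}"
WEB_USER="${WEB_USER:-tomcat4}"

# Report only when the file exists and belongs to someone else.
if [ -e "$USERS_XML" ] && ! owned_by "$WEB_USER" "$USERS_XML"; then
    echo "$USERS_XML is not owned by $WEB_USER; consider: chown $WEB_USER $USERS_XML"
fi
```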
There is more information on running Tomcat as the tomcat4 user
in the installation section on Tomcat.
This is likely a corrupt Tomcat cache. To clear the cache and restart the GUI:
bash#/etc/init.d/tomcat4 stop
bash#rm -rf /var/cache/tomcat4/*
bash#/etc/init.d/tomcat4 start
Your devices may have many (hundreds or thousands of) interfaces due to VoIP dial peers. On Cisco devices (AS5300s), the SNMP process will time out if the interface table is too long. SNMP views can be used to limit which SNMP interfaces are made available to ONMS. Limiting this information allows ONMS to gather a complete (but restricted) interface table.