Tools to Help You Diagnose Issues in a Linux Environment

Have you ever SSH’ed into a Linux server or connected to a Docker container and asked yourself, “What the heck is going on here?” I have, countless times. It can be difficult to know where to start, or what tools you even have to diagnose an issue you’re observing from the outside. These are a handful of tools that I’ve found handy over the years when I need to diagnose issues in a Linux environment. They’re also present on pretty much any Linux-based environment, even stripped-down distros like Alpine Linux.

ps

The process status command, ps, will list running processes. This command is helpful to capture a quick snapshot of what’s running on a system. More specifically, it’s a good tool to verify whether a specific process is or isn’t running. I recommend using the arguments ps aux. This will display all processes (not just the ones you’ve launched) as well as some additional fields for each process.
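
To make the "is it running?" check concrete, here is a minimal sketch. The ps_grep helper is a name I made up for illustration, and sshd is only an example process name. It relies on the old bracket trick so that the grep process doesn't show up as a match for itself.

```shell
# Hypothetical helper (not a standard command): filter `ps` output for a
# process name. Wrapping the first character in [] keeps the pattern from
# matching the grep process's own command line.
ps_grep() {
  name=$1
  first=${name%"${name#?}"}   # first character of the name
  rest=${name#?}              # everything after the first character
  grep -- "[$first]$rest"
}

# On a live system (sshd is only an example name):
#   ps aux | ps_grep sshd
```

Because grep exits non-zero when nothing matches, something like ps aux | ps_grep sshd >/dev/null && echo running should also work as a quick yes/no test.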

top

Think of top as an interactive ps. It displays a sorted list of processes in a table that refreshes every few seconds (the default interval varies by implementation, and the -d option changes it). Look at top --help or man top (if manual pages are available) to see how to change which column is used for sorting the list. top also provides a summary of system-wide stats at the top, including overall CPU and memory usage. top is useful for verifying whether any process is using an inordinate amount of resources or whether the system is, overall, under an unusually heavy load. I’d also recommend checking out top’s upgraded cousin, htop, where it’s installed. This tool provides a much-improved user experience and more robust interactive controls.
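
When there's no interactive terminal (a script, or a container exec without a TTY), top can still print a single snapshot in batch mode. This is a rough sketch. The top_snapshot wrapper and the file path in the comment are hypothetical names, and I'm assuming a top that understands -b and -n, which both the procps and BusyBox versions do.

```shell
# Hypothetical wrapper: print the system summary plus the first few process
# rows once, then exit. -b is batch (non-interactive) mode and -n 1 limits
# top to a single refresh.
top_snapshot() {
  top -b -n 1 | head -n "${1:-15}"
}

# Example: save the first 20 lines for later comparison.
#   top_snapshot 20 > /tmp/top-snapshot.txt
```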

netstat

netstat prints a list of various network resources. I primarily use it to verify which ports are in use. netstat has a ton of options that vary depending on your operating system. The most common way I use it is the following command, which displays all TCP ports that are bound to accept incoming connections: netstat -atln | grep LIST. Note the -n option, which forces netstat to display numeric addresses and port numbers instead of host and service names. On most Linux systems, you can also add the -p option if you want to see which process is using the port (run it as root to see processes owned by other users).
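
If all you want is the list of listening port numbers, a small filter can pull them out of the netstat output. This is a sketch that assumes the usual Linux column layout (local address in the fourth column, state in the sixth). The listening_ports name is made up for the example.

```shell
# Hypothetical filter: read `netstat -tln` style output on stdin and print
# the unique listening TCP port numbers, sorted numerically.
listening_ports() {
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
         n = split($4, parts, ":")   # works for 0.0.0.0:22 and :::80 alike
         print parts[n]
       }' | sort -nu
}

# On a live system:
#   netstat -atln | listening_ports
```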

lsof

lsof lists all open files on your system. This is a tool I don’t use often. However, it comes in handy if you need to verify whether a file-based resource is currently being used by a process or to verify whether a particular process has a ton of open files.
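
As a sketch of the "who has a ton of open files?" check, the filter here tallies lsof rows per command. It assumes the standard lsof layout, where the first line is a header and the first column is the command name (BusyBox's lsof prints a different format). open_files_by_command is a hypothetical name.

```shell
# Hypothetical filter: count open-file rows per command from lsof output
# on stdin, busiest command first. Skips the header row.
open_files_by_command() {
  awk 'NR > 1 { count[$1]++ }
       END { for (cmd in count) print count[cmd], cmd }' | sort -rn
}

# On a live system:
#   lsof | open_files_by_command | head
# Related one-liners (path and PID are placeholders):
#   lsof /path/to/file     # which processes have this file open
#   lsof -p 1234           # which files this process has open
```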

df

df reports free disk space for each mounted filesystem. This command is helpful when trying to see whether a filesystem is full or nearly full. df -h is my go-to version of this command because it displays “human readable” output, particularly the units for the volume sizes (e.g. M and G suffixes instead of raw block counts).
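
To spot the nearly full filesystems without reading the whole table, you could filter the POSIX-format output of df -P. This is a minimal sketch. nearly_full is a made-up name, the 90% default is arbitrary, and it assumes mount points without spaces.

```shell
# Hypothetical filter: read `df -P` output on stdin and print the mount
# point and usage of every filesystem at or above a percentage threshold.
nearly_full() {
  awk -v limit="${1:-90}" 'NR > 1 {
         pct = $5
         sub(/%/, "", pct)               # "95%" -> "95"
         if (pct + 0 >= limit + 0) print $6, $5
       }'
}

# On a live system:
#   df -P | nearly_full 80
```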

du

If you determine a filesystem is at or near capacity, du can help you figure out which directories are taking up that space. du will print disk usage stats for files and directories. I usually start at the root of the filesystem and (with admin privileges) run du -sh * to get a human-readable summary. Then I’ll run du -s * | sort -n to get a list of directories sorted by disk usage (reported in blocks rather than bytes; 1 KiB each with GNU du). Finally I’ll either cd into directories and rinse/repeat, or swap -s for the -d option to increase the depth of the sorted report: du -d 2 * | sort -n, du -d 3 * | sort -n, etc. (GNU du refuses to combine -s with a nonzero -d.)
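
The drill-down can be wrapped in a small function so each step is one command. This is a sketch under a few assumptions: biggest_dirs is a hypothetical name, and your du supports -k, -x and -d (GNU, BSD and BusyBox all do, as far as I know).

```shell
# Hypothetical helper: list the largest entries one level below a directory,
# smallest first, so the biggest land at the bottom of the terminal.
# -k reports sizes in KiB, -x stays on one filesystem, -d 1 limits depth.
biggest_dirs() {
  dir=${1:-.}
  count=${2:-10}
  du -k -x -d 1 "$dir" 2>/dev/null | sort -n | tail -n "$count"
}

# On a live system, as root:
#   biggest_dirs / 10
#   biggest_dirs /var 10    # then repeat on whichever directory is largest
```

The last line of the output is always the total for the directory you passed in, so the real culprits are the lines just above it.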

free

free prints a snapshot of memory usage stats including free, unallocated memory. This command is useful when you want to determine whether all memory is used on a system and/or if a system is “thrashing” due to over-reliance on swapping RAM out to disk.
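
For a one-line answer to "are we out of memory, and are we swapping?", the output of free can be boiled down with awk. This is a sketch that assumes the procps layout, where the Mem: row has "available" as its seventh field and the Swap: row has "used" as its third. Older or BusyBox versions may not match, and mem_summary is a made-up name.

```shell
# Hypothetical filter: read `free -m` output on stdin and print the share of
# memory still available plus the amount of swap in use.
mem_summary() {
  awk '$1 == "Mem:"  { avail = int($7 * 100 / $2) }
       $1 == "Swap:" { swap = $3 }
       END { printf "available_pct=%d swap_used=%s\n", avail, swap }'
}

# On a live system:
#   free -m | mem_summary
```

A low available percentage together with a swap figure that keeps growing is the pattern to watch for.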

watch

watch turns any command that prints its output and then exits into a live display. Specifically, watch reruns a command every two seconds by default and redisplays its output on the terminal, e.g. watch free or watch df -h. This command is handy when you want to monitor the status of an aspect of the system in real time without using an interactive tool such as top.
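
If watch isn't installed, a plain shell loop gets you most of the way there. This is only a rough stand-in to show the idea, not how watch is implemented. repeat_cmd is a hypothetical name, and unlike watch it stops after a fixed number of runs and doesn't clear the screen.

```shell
# Hypothetical stand-in for watch: run a command COUNT times, sleeping
# INTERVAL seconds between runs.
repeat_cmd() {
  count=$1
  interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    "$@"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Examples:
#   repeat_cmd 30 2 df -h     # thirty samples, two seconds apart
#   watch -n 5 df -h          # the real thing, refreshing every five seconds
```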

head / tail / cat / less

These tools are helpful when you want to peek at the contents of a file without opening it with an editor. head and tail display the beginning and end of a file, respectively (ten lines by default): head /some/file, tail /some/file. cat will print the entire contents of a file to the terminal: cat /some/file. less is like an interactive cat. It allows you to navigate forward and backward through lines of a file with the j and k keys. It also lets you search for text if you press /.
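
For big log files I often want both ends at once. Here's a small sketch that combines head and tail. peek is a made-up helper name, and the default of five lines is arbitrary.

```shell
# Hypothetical helper: show the first and last N lines of a file.
peek() {
  file=$1
  n=${2:-5}
  printf '== first %s lines of %s ==\n' "$n" "$file"
  head -n "$n" "$file"
  printf '== last %s lines of %s ==\n' "$n" "$file"
  tail -n "$n" "$file"
}

# Examples:
#   peek /some/file 20
#   tail -f /some/file    # keep printing new lines as they are appended
```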

grep

grep is really handy when you need to search the contents of files, even if you don’t know which file holds what you’re looking for. grep -r "mypattern" . will recursively search all files under your current working directory and display every line that matches the example string “mypattern”. The trailing dot names the starting directory explicitly, since some grep versions read standard input when no path is given.
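
When the question is "which files mention this at all?", the -l option trims the output down to file names. This is a minimal sketch. files_mentioning is a hypothetical helper name and the paths in the comments are placeholders.

```shell
# Hypothetical helper: list the files under a directory (default: the
# current one) that contain the pattern. -r recurses, -l prints names only.
files_mentioning() {
  grep -rl -- "$1" "${2:-.}"
}

# Examples:
#   files_mentioning "mypattern" /etc
#   grep -rn "mypattern" .     # also show line numbers and matching lines
#   grep -rni "mypattern" .    # same, ignoring case
```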

What Tools Do You Use to Diagnose Issues in a Linux Environment?

What commands have you found useful when trying to figure out what the heck is going on in a Linux server environment?

Conversation
  • casymir says:

    journalctl, for querying log events: journalctl --since "1 hour ago" -t sshd
