Single Line | Output is displayed from all specified nodes for the named columns, based on --cols, and displayed on a single line. |
Multi Line | All output is displayed on multiple lines (similar to the top command) and sorted by the column specified with --column. This is the default mode and the default column is 1. The column and sort order can be changed dynamically with the arrow keys (if TermReadKey is installed). |
Real-Time | In single line format the output for all nodes is always shown. To ensure all nodes report data, the last seen values are reported, and if any samples are older than --age (the default is 5 seconds), a value of -1 is displayed.
In multi-line format, all specified nodes must be reachable or they will be dropped from the list of monitored nodes when colmux first initializes. The display will refresh at the same rate as the collectl monitoring interval. |
Playback | Data is played back from collectl raw files which must be in the same directory on all nodes or in the same, single directory on the node colmux is being run on. |
As you can see in the following example using -test, the output of the collectl command is shown in the Headers section with columns 3 and 4 shown as bold characters (in fact they are displayed on the terminal in reverse video for easier identification). The individual column numbering is shown in the second section. This format can be especially helpful when a collectl command can produce dozens of columns, such as with -command "-sD -P", remembering that the only way to get detail data on a single line is to display it in plot format.
Tip: When first using -test, it is often easiest not to specify a column, accepting the default of 1, and to let the output tell you the correct number(s) to choose rather than trying to figure them out yourself.
colmux -command "-sn -oT" -cols 3,4 --test
>>> Headers <<<
#               <----------Network---------->
#Host Time      KBIn  PktIn  KBOut  PktOut
>>> Column Numbering <<<
 0 #Host   1 Time   2 KBIn   3 PktIn   4 KBOut   5 PktOut
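The column numbering shown by -test can be reproduced by hand for any header line. The following is a minimal Python sketch, not colmux's own code; the function name number_columns is hypothetical, and it assumes header fields are separated by whitespace and numbered from 0, as in the sample output above.

```python
def number_columns(header_line):
    """Pair each whitespace-separated header field with its 0-based index."""
    return list(enumerate(header_line.split()))


# Header line taken from the -test example above.
header = "#Host Time KBIn PktIn KBOut PktOut"
numbered = number_columns(header)
print("  ".join(f"{index} {name}" for index, name in numbered))
```

With this header, index 3 maps to PktIn and index 4 to KBOut, which are the two columns selected by -cols 3,4.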
If you're already familiar with pdsh, the parallel distributed shell utility, node names can be specified in the same way. If you're not familiar with it I highly suggest downloading it and trying it out.
With colmux you use -addr to specify one of three things:
You can also include individual nodes or multiple expressions like this: node[1-10,13,18,20-30] and include multiple prefixes and even suffixes: node[1-10,13],pre[3-4]fix.
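To make the bracket syntax concrete, here is a rough Python sketch of how such an expression might be expanded into individual host names. It is an illustration only, not the parser used by pdsh or colmux; the function names are hypothetical, and it assumes a single bracket group per term and does not preserve zero-padded numbers.

```python
import re


def split_terms(expr):
    """Split an expression on commas that are not inside square brackets."""
    terms, depth, current = [], 0, ""
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            terms.append(current)
            current = ""
        else:
            current += ch
    if current:
        terms.append(current)
    return terms


def expand_term(term):
    """Expand one term of the form prefix[ranges]suffix into host names."""
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", term)
    if not match:
        return [term]  # plain host name with no bracket group
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            low, high = part.split("-")
            hosts.extend(f"{prefix}{n}{suffix}" for n in range(int(low), int(high) + 1))
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts


def expand_hosts(expr):
    """Expand a full comma-separated expression."""
    return [host for term in split_terms(expr) for host in expand_term(term)]


print(expand_hosts("node[1-3,5],pre[3-4]fix"))
# ['node1', 'node2', 'node3', 'node5', 'pre3fix', 'pre4fix']
```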
So now that we've gotten the basics out of the way, let's get started.
Multi-Line Format in Real-Time Mode
Since everyone is familiar with the top command, let's begin with this mode since it is also more straightforward than single-line mode. Just think of top on steroids. You simply specify a collectl command and its output is sorted by any column and displayed up to the number of lines required to fill the display. Don't worry if you choose the wrong column, because you can always change it dynamically with the arrow keys if you've installed TermReadKey or by typing in the column number followed by the enter key. Since the selected column is highlighted you won't lose track of where you are.
In the following example, we've chosen to look at the slab memory usage on half a dozen nodes. As you can see, like top, the current time is displayed and colmux also displays how many nodes are reporting data, which in this case is all 6. You can also see that the slab column is highlighted. It's that simple.
colmux -addr cn[5-10] -command "-sm" -colum 5
# Thu Nov 17 07:01:41 2011  Connected: 6 of 6
#      <-----------Memory----------->
#Host  Free Buff Cach Inac Slab  Map
cn8    123G    0  42M  26M 103M  87M
cn5    495G    0  58M  41M 102M  89M
cn7    123G    0  41M  27M  99M  88M
cn10   123G    0 158M 121M  97M  47M
cn9    123G    0 158M 121M  97M  47M
cn6    123G    0  41M  27M  92M  88M
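The sorting behavior in the memory example can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than colmux's implementation: the helper names are hypothetical, the K/M/G suffixes are assumed to be powers of 1024, and the descending order is inferred from the sample output.

```python
# Assumed multipliers for the unit suffixes seen in the sample output.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_number(field):
    """Convert a field such as '103M' or '0' to a number for sorting."""
    if field[-1] in UNITS:
        return float(field[:-1]) * UNITS[field[-1]]
    return float(field)


def sort_rows(rows, column):
    """Sort rows in descending order of the given 0-based column."""
    return sorted(rows, key=lambda row: to_number(row[column]), reverse=True)


# Rows from the memory example, in host order; column 0 is the host name.
rows = [line.split() for line in """\
cn5 495G 0 58M 41M 102M 89M
cn6 123G 0 41M 27M 92M 88M
cn7 123G 0 41M 27M 99M 88M
cn8 123G 0 42M 26M 103M 87M
cn9 123G 0 158M 121M 97M 47M
cn10 123G 0 158M 121M 97M 47M
""".splitlines()]

for row in sort_rows(rows, 5):  # column 5 is Slab
    print(" ".join(row))
```

Sorting on column 5 puts cn8 first and cn6 last, as in the sample. The relative order of hosts with equal values (cn9 and cn10 here) depends on input order in this sketch.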
colmux -addr cn[5-10] -command "-sN" -colum 4
# NETWORK STATISTICS (/sec) Thu Nov 17 07:08:13 2011  Connected: 6 of 6
#Host Num Name  KBIn PktIn SizeIn MultI CmpI ErrsI KBOut PktOut SizeO CmpO ErrsO
cn6     1 eth0:    0     5     54     0    0     0     1      5   239    0     0
cn10    1 eth0:    0     4     54     0    0     0     1      4   288    0     0
cn9     1 eth0:    0     4     54     0    0     0     1      4   285    0     0
cn8     1 eth0:    0     4     54     0    0     0     1      4   285    0     0
cn7     1 eth0:    0     4     54     0    0     0     1      4   285    0     0
cn5     1 eth0:    0     4     54     0    0     0     1      4   285    0     0
cn6     0 lo:      0     0      0     0    0     0     0      0     0    0     0
cn5     8 ib3:     0     0      0     0    0     0     0      0     0    0     0
cn5     7 ib2:     0     0      0     0    0     0     0      0     0    0     0
cn5     6 ib1:     0     0      0     0    0     0     0      0     0    0     0
colmux -addr cn[5-10] -command "-sZ -i:1" -column 11
# PROCESS SUMMARY (counters are /sec) Thu Nov 17 07:14:56 2011  Connected: 6 of 6
#Host   PID User PR  PPID THRD S  VSZ  RSS CP SysT UsrT Pct AccuTime RKB WKB MajF MinF Command
cn5   36723 mjs  20 36722    0 R 162M  23M  0 0.11 0.47  57  0:01.82   0   0    0   42 /usr/bin/perl
cn9    3075 root 20     1    0 S   9M 660K  1 0.05 0.01   6 19:41.10   0   0    0  213 irqbalance
cn6     221 root 20     2    0 S    0    0 19 0.00 0.00   0  0:00.00   0   0    0    0 kintegrityd/19
cn6     220 root 20     2    0 S    0    0 18 0.00 0.00   0  0:00.00   0   0    0    0 kintegrityd/18
cn6     219 root 20     2    0 S    0    0 17 0.00 0.00   0  0:00.00   0   0    0    0 kintegrityd/17
cn6     218 root 20     2    0 S    0    0 16 0.00 0.00   0  0:00.00   0   0    0    0 kintegrityd/16
Single-Line Format in Real-Time Mode
This is a very powerful mechanism, but when best to use it will vary by situation. Do you remember that the basic concept of collectl's brief mode is to let you display everything on a single line to make it easier to spot change? While colmux in multi-line mode makes it easy to sort the output, like top it can make change very difficult to spot. Remember, the top resource consumers can often change from cycle to cycle, which makes the output difficult to watch.
Tip - if you want to look at multi-line output across a set of nodes and not have the sort field continually changing, simply sort on the hostname field!
Getting back to single line format, the thing to remember is that less is more. In other words to get the most out of this you should probably settle on one or two variables you want to examine. Let's go back to our command that displays slab memory and change -column 5 to -cols 3,5, which will let us watch cache and slab memory at the same time.
In the following example, you can see that when colmux first starts the values are set to -1. This is because the remote nodes have not yet been connected. Once they are, the values start to show, with column 3 data on the left side of the display and column 5 data on the right.
colmux -addr cn[5-10] -command "-sm" -cols 3,5
   cn5    cn6    cn7    cn8    cn9   cn10 |    cn5    cn6    cn7    cn8    cn9   cn10
    -1     -1     -1     -1     -1     -1 |     -1     -1     -1     -1     -1     -1
    -1     -1     -1     -1     -1     -1 |     -1     -1     -1     -1     -1     -1
 59816     -1     -1     -1     -1     -1 | 108340     -1     -1     -1     -1     -1
 59816  42516  43036  43416 162692 162692 | 108340  96916 105016 109356 102576 101748
 59816  42516  43036  43416 162688 162692 | 108340  96916 105016 109356 102664 101740
 59816  42516  43036  43416 162688 162692 | 108348  96916 105024 109356 102656 101740
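The -1 placeholders follow from the rule described for single-line real-time mode: the last seen value is shown unless there is none yet or it is older than --age. A minimal Python sketch of that rule follows; the function name, the timestamps, and the use of None for "no sample received yet" are assumptions made for illustration, not colmux internals.

```python
def displayed_value(last_value, last_seen, now, age=5):
    """Return the last seen value, or -1 if it is missing or older than `age` seconds."""
    if last_value is None or now - last_seen > age:
        return -1
    return last_value


# Hypothetical state: (last value, time it was seen) per node.
samples = {
    "cn5": (59816, 100.0),  # fresh sample
    "cn6": (None, 0.0),     # node not connected yet
    "cn7": (43036, 90.0),   # sample older than the 5-second default
}
now = 102.0
print([displayed_value(value, seen, now) for value, seen in samples.values()])
# [59816, -1, -1]
```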
In some cases, you might also be interested in the totals for each column across all the nodes, and that's what -coltot is for. As a side benefit, -coltot includes column names as well.
colmux -addr cn[5-10] -command "-sm -oT" -cols 4,6 -colk -coltot
#Time    cn5 cn6 cn7 cn8 cn9 cn10 | cn5 cn6 cn7 cn8 cn9 cn10 | Cach Slab
08:21:21  -1  -1  -1  -1  -1   -1 |  -1  -1  -1  -1  -1   -1 |    0    0
08:21:22  58  41  42  42 158  158 | 105  94 102 106 100   99 |  499  606
08:21:23  58  41  42  42 158  158 | 105  94 102 106 100   99 |  499  606
08:21:24  58  41  42  42 158  158 | 105  94 102 106 100   99 |  499  606
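The totals on the right are simply the per-column sums across nodes. The short Python sketch below reproduces them from the sample values; the function name is hypothetical, and skipping -1 placeholders is an assumption that is consistent with the 0 totals on the first row but is not confirmed by the text.

```python
def column_total(values):
    """Sum one column across nodes, skipping -1 placeholders (assumed behavior)."""
    return sum(value for value in values if value != -1)


# Per-node values from the 08:21:22 row of the example.
cach = [58, 41, 42, 42, 158, 158]
slab = [105, 94, 102, 106, 100, 99]

print(column_total(cach), column_total(slab))  # 499 606
print(column_total([-1] * 6))                  # 0
```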
Even though you may not easily be able to read the numbers, you can still see that this is a read test because of the high output rates (these are being reported in MB) on the servers with essentially 0 output on the clients. You can also tell servers 1 and 2 are not participating. Similarly, the bulk of the InfiniBand input is all on the clients and minimal on the servers. It is also easy to see something is wrong with clients 1 and 3 since their input rates are all 0. You can also see erratic behavior on the servers since the numbers are not evenly balanced; this is affecting the clients as well, which do not finish the test together.
Playback Mode
Playback mode works exactly the same way as real-time mode except you include the -p switch with the collectl command to instruct it to play back the data from a previously recorded raw file. Since the contents of -command are actually passed directly to collectl, there are a few things to remember:
colmux -addr cn[5-10] -command "-sm" -colum 5
colmux -addr cn[5-10] -command "-sm -p /var/log/collectl/*20110219*" -colum 5
Also keep in mind that you can use --from and --thru switches just as you'd do running collectl standalone.
The exact same technique applies to single-line format, both for the playback switch and for --from and --thru.
updated November 21, 2011