A rules-based statistical algorithm (RBSA) identifies the packets in any TCP connection that are client keystrokes of an ssh login. The inputs to the algorithm are the arrival times and TCP/IP headers of the connection's packets, measured at a point along the path of the connection. The algorithm is applied to all connections seen by a network monitor: ssh connections on port 22 are classified as client-keystroke connections or scp file transfers, and ssh keystroke connections are discovered on all other ports. The result is a network login database that can be further analyzed for network security monitoring and forensics. One application is to an "inside" network for which the monitor sees all connections between the inside and the outside. The algorithm, which uses packet sizes, flags, and interarrival times, first passes through the packets to identify epochs of different activities, and then goes back and uses more detailed information for the classification. Performance on three types of packet traces is excellent. Previous work has proceeded by forming connection-level summary statistics from the headers and timestamps and using those statistics to classify the connection as one with keystrokes or not. The RBSA takes on the much more ambitious task of classifying each packet as a client keystroke packet or not, yet the resulting classification of connections has extremely low rates of false positives and false negatives. One important property of the RBSA is that it does not use packet payload, as some connection-level surveillance methods do, so an attacker cannot defeat it by encrypting the payload. A second important property is that the inside network can be a large enterprise, allowing monitoring and forensics across a very large number of hosts from a single device.
Paul Kidwell
Department of Statistics
Purdue University