A statistical technique for computer identification of outliers in multivariate data
Ram Swaroop, William R. Winter; United States National Aeronautics and Space Administration, Flight Research Center (U.S.)
National Aeronautics and Space Administration, 1971 - Mathematics - 29 pages
A statistical technique and the necessary computer program for editing multivariate data are presented. The technique is particularly useful when large quantities of data are collected and the editing must be performed by automatic means. One task in the editing process is the identification of outliers, or observations which deviate markedly from the rest of the sample. A statistical technique, and the related computer program, for identifying the outliers in univariate data was presented in NASA TN D-5275. The current report is a multivariate analog that takes into account the statistical linear relationship between the variables when identifying the outliers. The program requires as inputs the number of variables, the data set, and the level of significance at which outliers are to be identified. It is assumed that the data are from a multivariate normal population and that the sample size is at least two greater than the number of variables. Although the technique has been used primarily in editing biodata, the method is applicable to any multivariate data encountered in engineering and the physical sciences. An example is presented to illustrate the technique.