CREATEBOXPLOTDATA

The CREATEBOXPLOTDATA function takes a raw input dataset and generates the data needed as input to the BOXPLOT function.

CREATEBOXPLOTDATA returns five values for each input dataset: the minimum (excluding possible outliers), the lower quartile, the median, the upper quartile, and the maximum (excluding possible outliers). If neither outliers nor suspected outliers are calculated, the minimum and maximum returned are the minimum and maximum of the dataset. If outliers or suspected outliers are calculated, the minimum and maximum returned are the smallest and largest values (respectively) in the dataset that are not included in the outlier or suspected outlier data.

Examples

Copy and paste the following code to the IDL command line to create data for use in BOXPLOT.

; Create an array of average speeds on two different bicycles
; to use in CREATEBOXPLOTDATA
bike_mph = [ $
   [12.2, 16.2], $
   [12.1, 16.4], $
   [10.7, 16.9], $
   [11.6, 17.0], $
   [10.2, 16.5], $
   [10.9, 16.1], $
   [11.8, 17.1], $
   [10.9, 16.0], $
   [12.4, 16.8], $
   [12.9, 16.9], $
   [13.1, 17.5], $
   [13.0, 17.4]]

; Create the data and store mean and outlier values
bpd = CREATEBOXPLOTDATA(bike_mph, MEAN_VALUES=means, OUTLIER_VALUES=outliers)

; Display the data created to be used in BOXPLOT
PRINT, bpd

IDL displays:

10.200000 16.000000
11.250000 16.450001
12.150000 16.900000
12.950000 17.250000
13.100000 17.500000

; Display the mean values created
PRINT, means

IDL displays:

11.8167 16.7333

; Display the outlier values created
PRINT, outliers

IDL displays:

!NULL

Syntax

result = CREATEBOXPLOTDATA(data [, IGNORE=value] [, CI_VALUES=variable] [, FINITE_INDICES=variable] [, MEAN_VALUES=variable] [, OUTLIER_VALUES=variable] [, SUSPECTED_OUTLIER_VALUES=variable])

Return Value

An M x 5 array containing data for use in BOXPLOT, where M is the number of distinct datasets. For each dataset, IDL returns the values in the order needed for BOXPLOT: minimum, lower quartile, median, upper quartile, and maximum.
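
As a minimal sketch of how the return value is typically consumed, the following passes the result directly to BOXPLOT. The sample values and the variable names data, bpd, and bp are illustrative only and do not come from this topic.

```idl
; Two illustrative datasets (one per column), six samples each
data = [[12.2, 16.2], [12.1, 16.4], [10.7, 16.9], $
        [11.6, 17.0], [10.2, 16.5], [10.9, 16.1]]

; For each dataset, bpd holds the minimum, lower quartile, median,
; upper quartile, and maximum, in that order
bpd = CREATEBOXPLOTDATA(data)

; Inspect the dimensions of the result
HELP, bpd

; Pass the five-value summaries to BOXPLOT
bp = BOXPLOT(bpd)
```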

Arguments

Data

The input data used to generate the results for the BOXPLOT function. The input data may be any of the following:

Keywords

IGNORE

Set this keyword to a value to treat as bad data and to ignore when calculating the results.
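
The following sketch shows one way IGNORE might be used, assuming missing readings were recorded with a sentinel value. The sentinel -999.0 and the variable names are hypothetical choices for illustration.

```idl
; -999.0 is a hypothetical sentinel marking missing readings
speeds = [[12.2, 16.2], [-999.0, 16.4], [10.7, 16.9], $
          [11.6, -999.0], [10.2, 16.5], [10.9, 16.1]]

; Exclude the sentinel from the quartile calculations
bpd = CREATEBOXPLOTDATA(speeds, IGNORE=-999.0)
PRINT, bpd
```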

CI_VALUES

Set this keyword to a named variable to return an N-element array denoting the confidence interval value around the median for each box. These values are used for the boundaries of the notch in the BOXPLOT function, if displayed.

FINITE_INDICES

Set this keyword to a named variable to return a vector containing the indices of the datasets for which valid data was returned. This is useful when your data contains NaN or infinite values and some datasets therefore cannot be used to create the five values needed for BOXPLOT.
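
A short sketch of FINITE_INDICES, assuming three datasets of which the second contains only NaN values. The data and the variable name good are illustrative; the exact contents of the returned vector are not shown because they are not stated in this topic.

```idl
; Three datasets (columns); the middle one is entirely NaN
nan = !VALUES.F_NAN
data = [[12.2, nan, 16.2], $
        [12.1, nan, 16.4], $
        [10.7, nan, 16.9], $
        [11.6, nan, 17.0], $
        [10.2, nan, 16.5]]

; good receives the indices of the datasets that produced valid results
bpd = CREATEBOXPLOTDATA(data, FINITE_INDICES=good)
PRINT, good
```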

MEAN_VALUES

Set this keyword to a named variable to return an M-element vector containing the mean values for each input dataset.

OUTLIER_VALUES

Set this keyword to a named variable to return a 2 x N-element array containing any outliers from each input dataset. For each value [x, y], x represents the box location and y represents the value at that location.

SUSPECTED_OUTLIER_VALUES

Set this keyword to a named variable to return a 2 x N-element array containing any suspected outliers from each input dataset. For each value [x, y], x represents the box location and y represents the value at that location.
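
The outlier keywords can return !NULL when nothing is flagged, as in the example at the top of this topic, so it is worth checking before indexing. The sketch below uses illustrative data with one deliberately extreme reading; whether that reading is classified as an outlier or a suspected outlier depends on the calculation conventions, so no output is shown.

```idl
; Illustrative data; 40.0 is deliberately far from the other readings
data = [[12.2, 16.2], [12.1, 16.4], [10.7, 16.9], $
        [11.6, 17.0], [10.2, 16.5], [40.0, 16.1], $
        [11.8, 17.1], [10.9, 16.0]]

bpd = CREATEBOXPLOTDATA(data, OUTLIER_VALUES=outliers, $
                        SUSPECTED_OUTLIER_VALUES=suspected)

; N_ELEMENTS returns 0 for !NULL, so this guards against an empty result.
; Each printed row is one [box location, value] pair.
IF N_ELEMENTS(outliers) GT 0 THEN $
  PRINT, outliers $
ELSE $
  PRINT, 'No outliers flagged'

IF N_ELEMENTS(suspected) GT 0 THEN $
  PRINT, suspected $
ELSE $
  PRINT, 'No suspected outliers flagged'
```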

Notes on CREATEBOXPLOTDATA Calculations

Values returned by CREATEBOXPLOTDATA are calculated using the conventions outlined below. Given an ordered dataset with n elements:

Version History

8.2.2 Introduced

See Also

BOXPLOT