#C12935. CSV Cleaning by Median Replacement
You are given a CSV file on standard input. The first line of the input is the name of the column to clean. The remaining lines form the CSV file itself, whose first row is a header of column names. Some entries in the specified column may be missing, which is represented by an empty field. Compute the median of the non-missing numeric values in that column, replace every missing value in that column with that median, and write the cleaned CSV data, including the header, to standard output.
Note: if the column has an odd number of non-missing entries, the median is the middle value after sorting; if it has an even number, the median is the average of the two middle values. The input CSV is simple: fields are separated by commas, with no quoting.
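The median rule in the note can be written as a small helper. This is a minimal Python sketch: the name `median` is illustrative, and raising an error on an empty list is an assumption, since the problem does not say what should happen when every entry in the column is missing.

```python
def median(values: list[float]) -> float:
    """Median as defined in the note: the middle value for an odd count,
    the average of the two middle values for an even count."""
    if not values:
        # Assumption: the problem does not define the all-missing case.
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 4, 5]))  # 3.0  (average of 2 and 4)
print(median([5, 1, 3]))     # 3    (middle of 1, 3, 5)
```

Sorting a copy with `sorted` leaves the caller's list unchanged; for the input sizes of a problem like this, the O(n log n) sort is simpler than a selection algorithm.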
inputFormat
The input is provided via standard input (stdin) with the following format:
- The first line contains a string: the name of the column to clean.
- The second line is the header of the CSV file (comma-separated column names).
- The remaining lines each represent a row of the CSV file (comma-separated values). Missing values are represented by empty fields.
For example:
A
A,B
1,10
2,20
,30
4,40
5,50
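One way to read this input is to split it into lines, take the first line as the target column name, and split the header and data rows on commas. The sketch below is illustrative only: `parse_input` is a hypothetical helper name, and it assumes the column name appears exactly once in the header. Plain `str.split(",")` is enough here because the problem guarantees there is no quoting.

```python
def parse_input(text: str) -> tuple[list[str], list[list[str]], int]:
    """Split the raw input into header fields, data rows, and the target column's index."""
    lines = text.splitlines()  # also handles a trailing newline and \r\n endings
    column = lines[0].strip()
    header = lines[1].split(",")
    # str.split(",") keeps empty fields, so ",30" becomes ["", "30"].
    # Lines are not filtered: in a one-column file a blank line is itself
    # a missing value (an assumption about how such input is written).
    rows = [line.split(",") for line in lines[2:]]
    col_idx = header.index(column)  # raises ValueError if the column is absent
    return header, rows, col_idx


header, rows, col_idx = parse_input("A\nA,B\n1,10\n,30\n")
print(header, rows, col_idx)  # ['A', 'B'] [['1', '10'], ['', '30']] 0
```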
outputFormat
Write the cleaned CSV data to standard output (stdout), including the header. In every row where the specified column is empty, replace that field with the median of the column's non-missing numeric values; all other fields stay as they are, and the comma-separated structure must be preserved.
For the example input above, the non-missing values in column A are 1, 2, 4, and 5, so the median is (2 + 4) / 2 = 3, and the expected output is:
A,B
1,10
2,20
3,30
4,40
5,50
## sample
A
A,B
1,10
2,20
,30
4,40
5,50
A,B
1,10
2,20
3,30
4,40
5,50
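The replacement step described under outputFormat can be sketched as follows. This is a hypothetical Python helper, not a reference solution: `fill_missing` and `format_number` are illustrative names, the median comes from the standard library's `statistics.median` (which uses the same odd/even rule as the note), and every non-empty field in the target column is assumed to parse as a number. The sample prints the median as `3` rather than `3.0`, so the sketch formats whole-number medians without a decimal part; how a non-integer median should be printed is not specified, and falling back to `str(float)` is only a guess.

```python
import statistics


def format_number(x: float) -> str:
    # Assumption: whole-number medians print as integers ("3", not "3.0"),
    # matching the sample; other values fall back to Python's str(float).
    return str(int(x)) if float(x).is_integer() else str(x)


def fill_missing(header: list[str], rows: list[list[str]], col_idx: int) -> str:
    """Replace empty fields in column col_idx with the column median; return the CSV text."""
    # Assumption: every non-empty field in the target column is numeric.
    present = [float(row[col_idx]) for row in rows if row[col_idx].strip() != ""]
    # statistics.median raises StatisticsError if no value is present.
    fill = format_number(statistics.median(present))
    out = [",".join(header)]
    for row in rows:
        cells = list(row)  # copy so the caller's rows are not mutated
        if cells[col_idx].strip() == "":
            cells[col_idx] = fill
        out.append(",".join(cells))
    return "\n".join(out)


if __name__ == "__main__":
    header = ["A", "B"]
    rows = [["1", "10"], ["2", "20"], ["", "30"], ["4", "40"], ["5", "50"]]
    print(fill_missing(header, rows, 0))
```

Run on the sample rows, this prints the sample output. The median is formatted once, before the loop, so every missing field receives the identical string.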