#C14231. Gini Impurity and Dataset Splitting Calculation
In this problem, you are given a list of integers representing a dataset and a list of integer thresholds. Your task is to compute the Gini impurity of the original dataset and of the two subsets produced by splitting it at each threshold.
Recall that the Gini impurity is defined as:
$$Gini = 1 - \sum_{i=1}^{n} p_i^2$$
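where \(n\) is the number of distinct classes and \(p_i\) is the proportion of elements belonging to class \(i\). Here each distinct integer value in the dataset is taken to be its own class. For each threshold \(t\), split the dataset into two parts: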
- Left subset: all elements \(\leq t\).
- Right subset: all elements \(> t\).
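The formula and the split rule can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the helper names `gini` and `split` are hypothetical, each distinct integer is assumed to be its own class, and an empty subset is assumed to have impurity 0.0 (which agrees with the sample output).

```python
from collections import Counter


def gini(values):
    """Gini impurity of a list, treating each distinct value as a class."""
    if not values:
        # Assumption: an empty subset has impurity 0.0 (matches the sample).
        return 0.0
    total = len(values)
    counts = Counter(values)
    # 1 - sum of squared class proportions
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def split(values, t):
    """Return (left, right): elements <= t and elements > t."""
    left = [v for v in values if v <= t]
    right = [v for v in values if v > t]
    return left, right


data = [1, 1, 2, 2]
left, right = split(data, 1)
print(gini(data), gini(left), gini(right))  # prints: 0.5 0.0 0.0
```

In this example the full dataset has two equally frequent classes, so its impurity is \(1 - (0.5^2 + 0.5^2) = 0.5\), while each subset contains a single class and is therefore pure.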
Compute the Gini impurity for the original dataset and for each subset, then output the results in a JSON object. The keys of the JSON object are:
- `original_gini`: Gini impurity of the entire dataset.
- `left_split_gini_t` and `right_split_gini_t` for each threshold \(t\).
inputFormat
The input consists of two lines:
- The first line contains space-separated integers representing the dataset.
- The second line contains space-separated integers representing the thresholds.
outputFormat
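Output a JSON object (as a single line) with the calculated Gini impurity values. The JSON object should include the key `original_gini` for the impurity of the full dataset and, for each threshold `t`, the keys `left_split_gini_t` and `right_split_gini_t`, where the trailing `t` is replaced by the threshold value (for example, `left_split_gini_2` for a threshold of 2). A subset with no elements has impurity 0.0, as the right subset in the sample below shows.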
1 1 1
2
{"original_gini":0.0,"left_split_gini_2":0.0,"right_split_gini_2":0.0}
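The sample above can be reproduced with a short end-to-end sketch. The function name `solve` is hypothetical, and several details are assumptions rather than requirements stated by the problem: keys are emitted in threshold order, values are printed as plain Python floats with no rounding, and `json.dumps` is called with compact separators so that the output has no spaces, as in the sample.

```python
import json
from collections import Counter


def gini(values):
    """Gini impurity; an empty list is assumed to have impurity 0.0."""
    if not values:
        return 0.0
    total = len(values)
    return 1.0 - sum((c / total) ** 2 for c in Counter(values).values())


def solve(text):
    """Parse the two input lines and return the JSON result as one line."""
    lines = text.split("\n")
    data = [int(x) for x in lines[0].split()]
    thresholds = [int(x) for x in lines[1].split()] if len(lines) > 1 else []

    result = {"original_gini": gini(data)}
    for t in thresholds:
        # Left subset: elements <= t; right subset: elements > t.
        result[f"left_split_gini_{t}"] = gini([v for v in data if v <= t])
        result[f"right_split_gini_{t}"] = gini([v for v in data if v > t])

    # Compact separators give output without spaces, as in the sample.
    return json.dumps(result, separators=(",", ":"))


print(solve("1 1 1\n2\n"))
# prints: {"original_gini":0.0,"left_split_gini_2":0.0,"right_split_gini_2":0.0}
```

For submission to a judge, the sample string would be replaced by the contents of standard input, for example `solve(sys.stdin.read())` after importing `sys`.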