Gini Impurity and Dataset Splitting Calculation

ID: 43858

Type: Default

Time Limit: 1000ms

Memory Limit: 256MiB

In this problem, you are given a list of integers representing a dataset and a list of integer thresholds. Your task is to compute the Gini impurity of the original dataset and of the two subsets produced by splitting the dataset at each threshold.

Recall that the Gini impurity is defined as:

$$Gini = 1 - \sum_{i=1}^{n} p_i^2$$

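where $p_i$ is the proportion of elements belonging to class $i$ and $n$ is the number of classes. Here each distinct integer value in the dataset is treated as its own class. For each threshold $t$, split the dataset into two parts: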

Left subset: all elements $\leq t$.
Right subset: all elements $> t$.
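The definition and the split rule above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the function names `gini_impurity` and `split_by_threshold` are hypothetical, and treating an empty subset as having impurity 0.0 is an assumption taken from the sample output.

```python
from collections import Counter


def gini_impurity(values):
    """Return 1 - sum(p_i^2), treating each distinct integer as a class."""
    if not values:
        # Assumption: an empty subset counts as pure (impurity 0.0),
        # which matches the empty right subset in the sample.
        return 0.0
    total = len(values)
    return 1.0 - sum((count / total) ** 2 for count in Counter(values).values())


def split_by_threshold(values, t):
    """Split into (left, right): elements <= t and elements > t."""
    left = [v for v in values if v <= t]
    right = [v for v in values if v > t]
    return left, right
```

Counting with `Counter` keeps the impurity computation linear in the subset size; each threshold is handled independently, so the overall cost is proportional to the dataset size times the number of thresholds.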

Compute the Gini impurity for the original dataset and, for each threshold, for both subsets, then output the results in a JSON object. An empty subset has impurity 0.0, as the right subset in the sample shows. The keys of the JSON object are:

original_gini: Gini impurity of the entire dataset.
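left_split_gini_t and right_split_gini_t for each threshold $t$, where the trailing t is replaced by the threshold's value (for example, left_split_gini_2 for threshold 2).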

Input Format

The input consists of two lines:

The first line contains space-separated integers representing the dataset.
The second line contains space-separated integers representing the thresholds.

Output Format

Output a JSON object (as a single line) with the calculated Gini impurity values. The JSON object should include the key original_gini for the impurity of the full dataset and for each threshold t, include keys left_split_gini_t and right_split_gini_t.
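One way to put the input and output formats together is sketched below. The names `gini` and `solve` are hypothetical, and several details are assumptions rather than stated requirements: the key order follows the sample, the compact separators mirror the sample's layout without spaces, and floats are printed with Python's default formatting because the statement does not specify a precision.

```python
import json
from collections import Counter


def gini(values):
    # Assumption: an empty subset has impurity 0.0, as in the sample.
    if not values:
        return 0.0
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def solve(text):
    """Parse the two input lines and return the JSON result as one line."""
    lines = text.splitlines()
    data = [int(x) for x in lines[0].split()] if lines else []
    thresholds = [int(x) for x in lines[1].split()] if len(lines) > 1 else []

    result = {"original_gini": gini(data)}
    for t in thresholds:
        result[f"left_split_gini_{t}"] = gini([v for v in data if v <= t])
        result[f"right_split_gini_{t}"] = gini([v for v in data if v > t])

    # Compact separators reproduce the sample's layout without spaces.
    return json.dumps(result, separators=(",", ":"))


# Demo on the sample input; a real submission would pass sys.stdin.read().
print(solve("1 1 1\n2\n"))
```

Python dictionaries preserve insertion order, so the keys appear in the order the thresholds are given.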

## sample

1 1 1
2

{"original_gini":0.0,"left_split_gini_2":0.0,"right_split_gini_2":0.0}
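
As a check on the sample, the dataset 1 1 1 contains a single class with proportion $p_1 = 1$, so

$$Gini = 1 - p_1^2 = 1 - 1^2 = 0$$

With threshold 2, every element satisfies $\leq 2$, so the left subset equals the full dataset and again has impurity 0. The right subset is empty, and the expected output reports 0.0 for it.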
