#C13588. Titanic Baseline Model Evaluation
In this problem, you are given the Titanic passenger dataset in CSV format on standard input. The dataset consists of a header row followed by the data rows. Each row contains several feature columns, and the last column is the label 'Survived' (0 or 1). Your task is to implement a baseline model that works as follows:
- Split the dataset into a training set and a test set. Use the first ⌊0.8 * N⌋ rows (where N is the total number of data rows, excluding the header) as the training set, and the remaining rows as the test set.
- In the training set, determine the most frequent value (mode) of the 'Survived' column. In the event of a tie, choose 0.
- For every row in the test set, predict the mode as its label.
- Compute the following metrics on the test set, treating a label of 1 as positive:
  - Accuracy = (number of correct predictions) / (number of test samples).
  - Precision = (number of true positives) / (number of predicted positives); if there are no predicted positives, precision is 0.
  - Recall = (number of true positives) / (number of actual positives); if there are no actual positives, recall is 0.
  - F1-score = 2 * Precision * Recall / (Precision + Recall); if Precision + Recall is 0, the F1-score is 0.
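The four steps above can be sketched in Python as follows. This is a minimal sketch, assuming the 'Survived' labels have already been parsed into a list of integers; the helper names `split_point`, `majority_label`, `evaluate`, and `baseline` are hypothetical, and returning 0.0 accuracy for an empty test set is an assumption the statement does not cover.

```python
from typing import List, Tuple


def split_point(n: int) -> int:
    """Number of training rows, floor(0.8 * n), computed in exact integer arithmetic."""
    return (4 * n) // 5


def majority_label(labels: List[int]) -> int:
    """Mode of the 0/1 labels; a tie (including an empty list) resolves to 0."""
    ones = sum(1 for y in labels if y == 1)
    zeros = len(labels) - ones
    return 1 if ones > zeros else 0


def evaluate(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float, float]:
    """Accuracy, precision, recall and F1 with the zero-denominator conventions above."""
    n = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(1 for p in y_pred if p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)

    accuracy = correct / n if n else 0.0  # assumption: empty test set gives 0.0
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


def baseline(labels: List[int]) -> Tuple[float, float, float, float]:
    """Train on the first floor(0.8 * N) labels and predict their mode for the rest."""
    cut = split_point(len(labels))
    train, test = labels[:cut], labels[cut:]
    mode = majority_label(train)
    return evaluate(test, [mode] * len(test))
```

For the sample labels `[0, 1, 1, 1, 0, 0]`, `baseline` returns `(0.0, 0.0, 0.0, 0.0)`. Writing the split as `(4 * n) // 5` gives exactly ⌊0.8 * N⌋ because 0.8 * N = 4N / 5, and it sidesteps any floating-point rounding in `0.8 * n`.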
Additionally, output an empty list for the model coefficients, since the baseline learns nothing beyond the majority label.
The program should read the CSV data from standard input and write the computed metrics and coefficients to standard output, each on a separate line, in this order: accuracy, precision, recall, F1-score, and then the coefficients (as a list). All numeric values must be printed as floating-point numbers with exactly 4 decimal places.
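As a sketch, assuming the four metrics have already been computed, the required output could be produced as below; `format_report` is a hypothetical helper name. Python's `.4f` format specifier rounds to four decimal places, so a value of 2/3 prints as `0.6667`.

```python
from typing import Sequence


def format_report(metrics: Sequence[float]) -> str:
    """Render accuracy, precision, recall and F1 (in that order) with 4 decimals,
    followed by the empty coefficient list on its own line."""
    lines = [f"{value:.4f}" for value in metrics]
    lines.append("[]")  # the baseline learns no coefficients
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report([0.0, 0.0, 0.0, 0.0]))
```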
## Input Format
The input is provided on standard input as CSV data. The first line contains the column headers; the columns include Pclass, Sex, Age, SibSp, Parch, Fare, Cabin, Embarked, and Survived. Each subsequent line contains one passenger's values separated by commas, and some fields (for example Cabin) may be empty.
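A minimal parsing sketch using Python's standard `csv` module, which keeps empty fields such as a missing Cabin value in place, so the label can still be taken from the last column. `read_labels` is a hypothetical helper, and skipping blank lines is an assumption, since the statement does not mention them. Only the label column matters for this baseline, so the feature columns are read but ignored; in a full solution the stream would be `sys.stdin`.

```python
import csv
import io
from typing import List, TextIO


def read_labels(stream: TextIO) -> List[int]:
    """Read CSV with a header row and return the last column ('Survived') as ints."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:  # empty input: no header and no rows
        return []
    labels = []
    for row in reader:
        if not row:  # csv.reader yields [] for a blank line; skip it
            continue
        labels.append(int(row[-1]))
    return labels


if __name__ == "__main__":
    sample = (
        "Pclass,Sex,Age,SibSp,Parch,Fare,Cabin,Embarked,Survived\n"
        "3,male,22,1,0,7.25,,S,0\n"
    )
    print(read_labels(io.StringIO(sample)))  # [0]
```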
## Output Format
The output should be printed to standard output. It consists of 5 lines:
- Accuracy, printed with 4 decimal places.
- Precision, printed with 4 decimal places.
- Recall, printed with 4 decimal places.
- F1-score, printed with 4 decimal places.
- An empty list for coefficients, printed as [].

## Sample

Input:

Pclass,Sex,Age,SibSp,Parch,Fare,Cabin,Embarked,Survived
3,male,22,1,0,7.25,,S,0
1,female,38,1,0,71.2833,C85,C,1
3,female,26,0,0,7.925,,S,1
1,female,35,1,0,53.1,C123,S,1
3,male,28,0,0,8.05,,Q,0
3,male,30,0,0,8.05,,S,0

Output:

0.0000
0.0000
0.0000
0.0000
[]
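Tracing the sample: N = 6, so the first ⌊0.8 * 6⌋ = 4 rows form the training set. Their labels are 0, 1, 1, 1, so the mode is 1. Both test rows have label 0, so both predictions are wrong (accuracy 0), no predicted positive is a true positive (precision 0), and there are no actual positives (recall 0 by convention), which makes the F1-score 0 as well. A complete program might look like the sketch below; `solve` is a hypothetical name, and returning 0.0 accuracy for an empty test set is an assumption, not part of the statement.

```python
import csv
import sys
from typing import List, TextIO


def solve(stream: TextIO) -> str:
    """Run the majority-class baseline on CSV text and return the 5 output lines."""
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row
    labels: List[int] = [int(row[-1]) for row in reader if row]  # last column = Survived

    cut = (4 * len(labels)) // 5  # floor(0.8 * N) without floating-point error
    train, test = labels[:cut], labels[cut:]

    ones = sum(train)
    mode = 1 if ones > len(train) - ones else 0  # a tie resolves to 0

    correct = sum(1 for y in test if y == mode)
    tp = sum(1 for y in test if y == 1 and mode == 1)
    predicted_pos = len(test) if mode == 1 else 0
    actual_pos = sum(test)

    accuracy = correct / len(test) if test else 0.0  # assumption: empty test set
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    lines = [f"{v:.4f}" for v in (accuracy, precision, recall, f1)]
    lines.append("[]")  # no coefficients are learned
    return "\n".join(lines)


if __name__ == "__main__":
    print(solve(sys.stdin))
```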