#C13530. Titanic Survival Logistic Regression Evaluation

    ID: 43079 Type: Default 1000ms 256MiB

This problem requires you to build and evaluate a logistic regression model for predicting the survival of Titanic passengers. The task is divided into several steps:

  1. Load the Titanic dataset from a given URL.
  2. Preprocess the data by handling missing values, encoding categorical features, and scaling numerical features. Only the following features should be retained for prediction: Pclass, Sex, Age, SibSp, Parch, Fare, and Embarked.
  3. Split the dataset into training (80%) and testing (20%) sets using stratified sampling.
  4. Build a logistic regression model with the preprocessing pipeline. The model should be built so that both fit and predict methods are available.
  5. Evaluate the model on the test set by computing the precision, recall, and F1-score. These metrics must be rounded to two decimal places and printed in the following format:
    Precision: X
    Recall: Y
    F1-score: Z

  6. Perform 5-fold cross-validation on the entire dataset to compute the F1 score for each fold and the mean F1 score. Print these results in the following format:
    CV Scores: s1 s2 s3 s4 s5
    Mean CV Score: M

The expected output consists of exactly five lines:

Precision: X
Recall: Y
F1-score: Z
CV Scores: s1 s2 s3 s4 s5
Mean CV Score: M

where X, Y, Z, s1 to s5, and M are float values rounded to two decimal places. For the purposes of this problem, you may assume that the dataset is available at the specified URL and that its structure is fixed. In an actual contest setting, a dummy version with fixed output is acceptable as long as the structure and function signatures are maintained.
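The three metrics can be illustrated with a small pure-Python sketch. It assumes the survival label is encoded as 1 (survived) and 0 (did not survive) and that 1 is the positive class; the function name `precision_recall_f1` is hypothetical and not part of the problem statement.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for the positive class, assumed to be label 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.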

Note: The program should use standard input (stdin) and standard output (stdout), although no input is required for this problem.
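Steps 2 to 6 could be sketched with pandas and scikit-learn roughly as follows. This is one possible arrangement, not a reference solution. Several details are assumptions that the statement does not fix: the target column name `Survived`, median and most-frequent imputation, treating Pclass as numeric, `random_state=42`, and the helper names `build_model` and `evaluate`. Because the dataset URL is not given here, the sketch takes an already loaded DataFrame (for example, the result of `pd.read_csv` on that URL).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["Pclass", "Age", "SibSp", "Parch", "Fare"]  # Pclass as numeric: an assumption
CATEGORICAL = ["Sex", "Embarked"]
TARGET = "Survived"  # assumed name of the label column


def build_model():
    """Preprocessing plus logistic regression in one estimator with fit/predict."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed imputation choice
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # assumed imputation choice
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("classify", LogisticRegression(max_iter=1000)),
    ])


def evaluate(df, seed=42):
    """Return test-set precision/recall/F1 and the 5-fold F1 scores."""
    X, y = df[NUMERIC + CATEGORICAL], df[TARGET]
    # 80/20 split, stratified on the label.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = build_model().fit(X_train, y_train)
    pred = model.predict(X_test)
    # 5-fold cross-validation on the entire dataset, scored by F1.
    cv_scores = cross_val_score(build_model(), X, y, cv=5, scoring="f1")
    return {
        "precision": float(precision_score(y_test, pred, zero_division=0)),
        "recall": float(recall_score(y_test, pred, zero_division=0)),
        "f1": float(f1_score(y_test, pred, zero_division=0)),
        "cv_scores": [float(s) for s in cv_scores],
    }
```

Keeping the imputers, scaler, and encoder inside the pipeline means they are refit on each training fold during cross-validation, so no statistics from held-out rows leak into preprocessing.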

inputFormat

This problem does not require any input. The program should read from stdin, which will be empty.

outputFormat

Print exactly five lines to stdout in the following format:

Precision: X
Recall: Y
F1-score: Z
CV Scores: s1 s2 s3 s4 s5
Mean CV Score: M

where X, Y, and Z are the evaluation metrics (precision, recall, and F1-score), s1 to s5 are the F1 scores from 5-fold cross-validation, and M is their mean. All numbers must be rounded to two decimal places.

## sample

Precision: 0.79
Recall: 0.66
F1-score: 0.72
CV Scores: 0.70 0.73 0.72 0.71 0.74
Mean CV Score: 0.72

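The five-line output format can be produced with a small formatting helper, sketched below. The name `format_report` is hypothetical. The sketch assumes the mean is taken over the unrounded fold scores and that every number is shown with exactly two decimals, which matches the trailing zero in the sample's `0.70`.

```python
def format_report(precision, recall, f1, cv_scores):
    """Build the five output lines, each number formatted to two decimals."""
    mean_cv = sum(cv_scores) / len(cv_scores)
    lines = [
        f"Precision: {precision:.2f}",
        f"Recall: {recall:.2f}",
        f"F1-score: {f1:.2f}",
        "CV Scores: " + " ".join(f"{s:.2f}" for s in cv_scores),
        f"Mean CV Score: {mean_cv:.2f}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Values taken from the sample output in this problem statement.
    print(format_report(0.79, 0.66, 0.72, [0.70, 0.73, 0.72, 0.71, 0.74]))
```

Called with the sample's numbers, the helper yields the five sample lines, with the fold scores separated by single spaces.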