Can I get an explanation of this code to better understand it?
---------------------------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
data = pd.read_csv('creditcard.csv')
data.head()
# "Time" column is the time difference between the current transaction and the first transaction in the database
# "Amount" column is the amount that was transacted
# Columns "V1" to "V28" are a result of PCA
# checking missing values
data.isna().sum()
# There are no missing values
# Normalizing the "Amount" values
data['Amount'] = (data['Amount'] - data['Amount'].min()) / (data['Amount'].max() - data['Amount'].min())
data.head()
# Checking correlation of features with target "Class"
plt.figure(figsize=(12, 6))
data.corr()['Class'].plot(kind='bar')
plt.show()
# Specifying training variables and target variables
X = data.drop(['Class'], axis=1).to_numpy()
y = data['Class'].to_numpy()
# Setting random seed
np.random.seed(0)
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
# Creating Logistic Regression classifier and training ("max_iter=1500" raises the solver's iteration limit so it has room to converge)
clf = LogisticRegression(max_iter=1500)
clf.fit(X_train, y_train)
# Model accuracy on train and test sets
# scikit-learn metric functions take the true labels first, then the predictions
preds = clf.predict(X_train)
print("Training accuracy =", accuracy_score(y_train, preds))
preds = clf.predict(X_test)
print("Testing accuracy =", accuracy_score(y_test, preds))
# The classification report including precision and recall on the testing set
print(classification_report(y_test, preds))

Question

trudie.mraz · Accepted Answer

Here is an explanation of what the code above does, step by step:

1) The code reads the CSV file "creditcard.csv" into a pandas DataFrame called "data" and displays its first 5 rows with the head() function.
2) It checks for missing values by calling isna() and summing the result for each column. The output confirms that there are no missing values in the DataFrame.
3) It normalizes the "Amount" column with min-max scaling: subtracting the column minimum and dividing by the range (maximum minus minimum) maps every value into the interval [0, 1], which keeps large amounts from dominating the model purely because of their scale.
4) It draws a bar plot of each column's correlation with the target "Class", taken from the "Class" column of the DataFrame's correlation matrix.
5) It separates the features (every column except "Class") from the target and splits them into training and testing sets with the train_test_split() function from sklearn.model_selection, using 75% of the rows for training. The function returns 4 arrays: "X_train", "X_test", "y_train", and "y_test".
6) It creates a logistic regression classifier with the LogisticRegression class from sklearn.linear_model and fits it on the training data. The argument max_iter=1500 raises the limit on solver iterations so that the optimizer has enough steps to converge.
7) It generates predictions on the training and testing data with the predict() method of the classifier, then calculates and prints the training accuracy, the testing accuracy, and the classification report for the test set. scikit-learn's metric functions take the true labels first and the predictions second; the order does not change the accuracy, but it does decide which figure is reported as precision and which as recall.
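The steps above can be tried without the real file. The following is only a minimal sketch under stated assumptions: the data is synthetic (a made-up "V1" feature and a made-up "Amount" column, not creditcard.csv), and random_state=0 is used in place of np.random.seed(0) to make the split repeatable. It shows the min-max scaling, the 75/25 split, the fit, and why the order of arguments to the metric functions matters.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for creditcard.csv (hypothetical data, not the real file).
rng = np.random.default_rng(0)
n = 2000
labels = (rng.random(n) < 0.1).astype(int)       # roughly 10% positive class
v1 = rng.normal(loc=2.0 * labels, scale=1.0)     # feature shifted for positives
amount = rng.exponential(scale=50.0, size=n)     # skewed, like transaction amounts
data = pd.DataFrame({"V1": v1, "Amount": amount, "Class": labels})

# Min-max scaling: maps "Amount" onto the interval [0, 1].
amt = data["Amount"]
data["Amount"] = (amt - amt.min()) / (amt.max() - amt.min())

# Features are every column except the target "Class".
X = data.drop(columns=["Class"]).to_numpy()
y = data["Class"].to_numpy()

# 75% of the rows go to training; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# max_iter only raises the cap on solver iterations.
clf = LogisticRegression(max_iter=1500)
clf.fit(X_train, y_train)

# Metric functions take the true labels first, then the predictions.
test_preds = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, test_preds)
test_precision = precision_score(y_test, test_preds, zero_division=0)
test_recall = recall_score(y_test, test_preds, zero_division=0)

# Swapping the two arguments turns precision into recall (and vice versa).
swapped_recall = recall_score(test_preds, y_test, zero_division=0)

print("Testing accuracy =", test_accuracy)
print("Precision =", test_precision, "Recall =", test_recall)
print("Recall with swapped arguments =", swapped_recall)
```

In this sketch the last printed value equals the precision, not the recall, which is why the true labels should be passed first when reading a classification report.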

To learn more about DataFrames, visit:

https://brainly.com/question/32218725
