Can I get an explanation of this code to better understand it?
---------------------------------------------------------------------------------------------------
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
data = pd.read_csv('creditcard.csv')
data.head()
# "Time" column is the time difference from current transaction and first transaction in the database
# "Amount" column is the amount that was transacted
# Columns "V1" to "V28" are a result of PCA
# checking missing values
data.isna().sum()
# There are no missing values
# Normalizing the "Amount" values
data['Amount'] = (data['Amount'] - data['Amount'].min()) / (data['Amount'].max() - data['Amount'].min())
data.head()
# Checking correlation of features with target "Class"
plt.figure(figsize=(12, 6))
data.corr()['Class'].plot(kind='bar')
plt.show()
# Specifying training variables and target variables
X = data.drop(['Class'], axis=1).to_numpy()
y = data['Class'].to_numpy()
# Setting random seed
np.random.seed(0)
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
# Creating Logistic Regression classifier and training ("max_iter=1500" raises the solver's iteration limit so it has room to converge)
clf = LogisticRegression(max_iter=1500)
clf.fit(X_train, y_train)
# Model accuracy on train and test sets (scikit-learn metrics take the true labels first, then the predictions)
preds = clf.predict(X_train)
print("Training accuracy =", accuracy_score(y_train, preds))
preds = clf.predict(X_test)
print("Testing accuracy =", accuracy_score(y_test, preds))
# The classification report including precision and recall on the testing set
# (swapping the two arguments would swap precision and recall in the report)
print(classification_report(y_test, preds))

Answers

Answer 1

Here is what the code above does, step by step:

1) It reads the CSV file "creditcard.csv" into a pandas DataFrame called "data", then displays the first 5 rows of the DataFrame with head().

2) It checks for missing values with isna() and sums them per column with sum(). The result confirms that there are no missing values in the DataFrame.

3) It normalizes the "Amount" column with min-max scaling: the column minimum is subtracted from each value and the result is divided by the range (maximum minus minimum), so every amount ends up between 0 and 1. This keeps large transaction amounts from dominating the model purely because of their scale.

4) It draws a bar plot of each column's correlation with the target "Class", taken from the "Class" column of the correlation matrix returned by data.corr().

5) It separates the features X (every column except "Class") from the target y, converts both to NumPy arrays, and splits them with train_test_split() from sklearn.model_selection, using 75% of the rows for training and the rest for testing. The function returns 4 arrays: "X_train", "X_test", "y_train", and "y_test". The call to np.random.seed(0) beforehand makes the random split reproducible.

6) It creates a logistic regression classifier with the LogisticRegression class from sklearn.linear_model, allowing up to 1500 solver iterations, and fits it on the training data with fit().

7) It generates predictions on the training and testing data with the classifier's predict() method, then prints the training accuracy, the testing accuracy, and the classification report (precision and recall) for the testing set.
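The steps above can be tried without the original file. The following is only a minimal sketch: it assumes a small synthetic dataset from make_classification as a stand-in for "creditcard.csv", with made-up column names "V1" to "V5" and a random "Amount" column, and it passes random_state=0 to train_test_split instead of calling np.random.seed(0). None of these choices come from the original code, and the accuracy it prints says nothing about the real credit card data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for creditcard.csv (assumption: not the real data).
X_raw, y_raw = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
data = pd.DataFrame(X_raw, columns=["V1", "V2", "V3", "V4", "V5"])
rng = np.random.default_rng(0)
data["Amount"] = rng.uniform(0.0, 500.0, size=len(data))  # hypothetical amounts
data["Class"] = y_raw

# Step 2: count missing values across the whole DataFrame.
missing_total = int(data.isna().sum().sum())

# Step 3: min-max scaling maps the smallest amount to 0 and the largest to 1.
amount = data["Amount"]
data["Amount"] = (amount - amount.min()) / (amount.max() - amount.min())

# Step 5: separate features from the target, then split 75% / 25%.
X = data.drop(["Class"], axis=1).to_numpy()
y = data["Class"].to_numpy()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# Step 6: fit the classifier; max_iter only caps the number of solver iterations.
clf = LogisticRegression(max_iter=1500)
clf.fit(X_train, y_train)

# Step 7: evaluate on the test set (true labels first, predictions second).
test_preds = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, test_preds)
print("Testing accuracy =", test_accuracy)
print(classification_report(y_test, test_preds))
```

With 1000 rows and train_size=0.75, the split produces 750 training rows and 250 testing rows, each with 6 feature columns.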

To know more about Data Frame visit:

https://brainly.com/question/32218725


