《------Recommended past posts------》
1. [100 Hands-On Deep Learning Projects] [link], continuously updated
2. Machine Learning in Practice column [link], 31 installments so far, follow for more
3. Deep Learning [Pytorch] column [link]
4. [Stable Diffusion Painting Series] column [link]
5. YOLOv8 Improvements column [link], continuously updated
6. YOLO Performance Comparison column [link], continuously updated
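《------Main text------》
Contents
1. Analysis of the raw data
  1.1 Basic information about the data
  1.2 Plotting the data distributions
2. Data preprocessing
  2.1 Feature encoding and one-hot processing
3. Model training and tuning
  3.1 Splitting the data
  3.2 Model training and tuning
  3.3 Model evaluation

1. Analysis of the raw data
1.1 Basic information about the data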
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load data
data = pd.read_csv("/kaggle/input/brain-tumor-dataset/brain_tumor_dataset.csv")

# First look at the data
data.head()

          Tumor Type        Location  Size (cm) Grade  Patient Age  Gender
0  Oligodendroglioma  Occipital Lobe       9.23     I           48  Female
1         Ependymoma  Occipital Lobe       0.87    II           47    Male
2         Meningioma  Occipital Lobe       2.33    II           12  Female
3         Ependymoma  Occipital Lobe       1.45   III           38  Female
4         Ependymoma       Brainstem       6.45     I           35  Female

data.shape

(1000, 6)

Check the tumor types: there are 5 in total.

data["Tumor Type"].unique()

array(['Oligodendroglioma', 'Ependymoma', 'Meningioma', 'Astrocytoma',
       'Glioblastoma'], dtype=object)

data.describe()

         Size (cm)  Patient Age
count  1000.000000  1000.000000
mean      5.221500    43.519000
std       2.827318    25.005818
min       0.510000     1.000000
25%       2.760000    22.000000
50%       5.265000    43.000000
75%       7.692500    65.000000
max      10.000000    89.000000
# Percentage of missing values in each column
missing_percentage = (data.isnull().sum() / len(data)) * 100
print(missing_percentage)

Tumor Type     0.0
Location       0.0
Size (cm)      0.0
Grade          0.0
Patient Age    0.0
Gender         0.0
dtype: float64

There is no missing data.
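As a small aside, the same percentage can be computed with the mean of the boolean mask, since the mean of True/False values is the fraction of True. The sketch below uses a hypothetical toy frame (`toy` is not the tumor dataset) so that a non-zero result is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not the tumor dataset) with one missing size value.
toy = pd.DataFrame({
    "Size (cm)": [1.2, np.nan, 3.4, 5.6],
    "Grade": ["I", "II", "II", "III"],
})

# Same calculation as in the article: missing count divided by row count.
pct_sum = toy.isnull().sum() / len(toy) * 100

# Equivalent shortcut: the mean of a boolean mask is the fraction of True values.
pct_mean = toy.isna().mean() * 100

print(pct_mean)
```

Both forms give one percentage per column; which one to use is a matter of taste.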
1.2 Plotting the data distributions
import seaborn as sns

# Histogram of patient ages with a KDE overlay
plt.figure(figsize=(10, 6))
sns.histplot(data["Patient Age"], bins=10, kde=True, color="skyblue")
plt.title("Distribution of Patient Ages")
plt.xlabel("Age")
plt.ylabel("Count")
plt.grid(True)
plt.show()

# Box plot of tumor size for each tumor type
plt.figure(figsize=(10, 6))
sns.boxplot(x="Tumor Type", y="Size (cm)", data=data, palette="pastel")
plt.title("Tumor Sizes by Type")
plt.xticks(rotation=45)
plt.xlabel("Tumor Type")
plt.ylabel("Size (cm)")
plt.grid(True)
plt.show()

# Number of samples per tumor type
plt.figure(figsize=(8, 6))
sns.countplot(x="Tumor Type", data=data, palette="Set3")
plt.title("Count of Tumor Types")
plt.xlabel("Tumor Type")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
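
The count plot shows class balance visually; the same information can be read as numbers with `value_counts`. This is only a sketch on a hypothetical five-row label series (`labels` is made up for illustration), not output from the tumor dataset.

```python
import pandas as pd

# Hypothetical labels for illustration only.
labels = pd.Series(
    ["Ependymoma", "Meningioma", "Ependymoma", "Glioblastoma", "Ependymoma"],
    name="Tumor Type",
)

# Absolute counts per class, sorted from most to least frequent.
counts = labels.value_counts()

# Relative frequencies per class (they sum to 1).
shares = labels.value_counts(normalize=True)

print(pd.DataFrame({"count": counts, "share": shares}))
```

On the real data the equivalent call would be `data["Tumor Type"].value_counts(normalize=True)`.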

# Tumor size against patient age, colored by tumor type
plt.figure(figsize=(10, 6))
sns.scatterplot(x="Size (cm)", y="Patient Age", hue="Tumor Type", data=data, palette="Set2", s=100)
plt.title("Tumor Sizes vs. Patient Ages")
plt.xlabel("Size (cm)")
plt.ylabel("Patient Age")
plt.grid(True)
plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left")
plt.show()

# Share of each tumor location
location_counts = data["Location"].value_counts()
plt.figure(figsize=(8, 8))
plt.pie(location_counts, labels=location_counts.index, autopct="%1.1f%%", colors=sns.color_palette("pastel"))
plt.title("Distribution of Tumor Locations")
plt.axis("equal")
plt.show()

2. Data preprocessing
2.1 Feature encoding and one-hot processing
# Data preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# LabelEncoder assigns integer codes in sorted order of the class labels
data["Gender"] = LabelEncoder().fit_transform(data["Gender"])  # Encode Gender (0 for Female, 1 for Male)
data["Location"] = LabelEncoder().fit_transform(data["Location"])  # Encode Location
data["Grade"] = LabelEncoder().fit_transform(data["Grade"])  # Encode Grade
data["Tumor Type"] = LabelEncoder().fit_transform(data["Tumor Type"])  # Encode Tumor Type (the target)

columns = ["Gender", "Location", "Grade"]
enc = OneHotEncoder()
# One-hot encode the three columns [Gender, Location, Grade]
new_data = enc.fit_transform(data[columns]).toarray()
new_data.shape

(1000, 12)

data.head()

   Tumor Type  Location  Size (cm)  Grade  Patient Age  Gender
0           4         3       9.23      0           48       0
1           1         3       0.87      1           47       1
2           3         3       2.33      1           12       0
3           1         3       1.45      2           38       0
4           1         0       6.45      0           35       0
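from sklearn.preprocessing import StandardScaler

# 1. Instantiate the transformer
transfer = StandardScaler()
# 2. Call fit_transform to standardize the two numeric columns
data[["Size (cm)", "Patient Age"]] = transfer.fit_transform(data[["Size (cm)", "Patient Age"]])

# Target plus the two standardized numeric columns
old_data = data[["Tumor Type", "Size (cm)", "Patient Age"]]
old_data.head()

# One-hot features as a DataFrame (columns are named 0 to 11)
one_hot_data = pd.DataFrame(new_data)
one_hot_data.head()

     0    1    2    3    4    5    6    7    8    9   10   11
0  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0
1  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0
2  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0
3  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0
4  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0

# Join the target, the numeric columns and the one-hot features column-wise
final_data = pd.concat([old_data, one_hot_data], axis=1)
final_data.head()

   Tumor Type  Size (cm)  Patient Age    0    1    2    3    4    5    6    7    8    9   10   11
0           4   1.418484     0.179288  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.0  0.0  0.0
1           1  -1.539861     0.139277  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0
2           3  -1.023212    -1.261097  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  1.0  0.0  0.0
3           1  -1.334617    -0.220819  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0  0.0
4           1   0.434728    -0.340851  1.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0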
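
The standardized values can be checked by hand. `StandardScaler` computes z = (x - mean) / std with the population standard deviation (ddof=0), while `describe()` reports the sample standard deviation (ddof=1), so the value from `describe()` has to be rescaled first. The sketch below uses only numbers printed earlier in the article: the first row's size of 9.23 cm, and the mean and std of Size (cm).

```python
import math

n = 1000                     # number of rows, from data.shape
mean_size = 5.221500         # mean of Size (cm), from data.describe()
sample_std_size = 2.827318   # std of Size (cm), from data.describe() (ddof=1)

# Convert the sample std (ddof=1) to the population std (ddof=0) used by StandardScaler.
pop_std_size = sample_std_size * math.sqrt((n - 1) / n)

# Standardize the first row's tumor size.
z = (9.23 - mean_size) / pop_std_size

print(f"{z:.4f}")
```

The result agrees with the 1.418484 shown for row 0 of `final_data`, up to the rounding of the printed statistics.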
final_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Tumor Type   1000 non-null   int64
 1   Size (cm)    1000 non-null   float64
 2   Patient Age  1000 non-null   float64
 3   0            1000 non-null   float64
 4   1            1000 non-null   float64
 5   2            1000 non-null   float64
 6   3            1000 non-null   float64
 7   4            1000 non-null   float64
 8   5            1000 non-null   float64
 9   6            1000 non-null   float64
 10  7            1000 non-null   float64
 11  8            1000 non-null   float64
 12  9            1000 non-null   float64
 13  10           1000 non-null   float64
 14  11           1000 non-null   float64
dtypes: float64(14), int64(1)
memory usage: 117.3 KB

3. Model training and tuning
3.1 Splitting the data
# Define features and target
X = final_data.iloc[:, 1:].values   # every column except the first (Tumor Type)
y = final_data["Tumor Type"].values  # target variable

# Split the data into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape

(800, 14)

3.2 Model training and tuning
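
Two details of the workflow in this article are worth a second look. The scaler in section 2.1 is fit on all 1000 rows before the split, so statistics from the test rows leak into the training features. The grid in this section also crosses `degree` with every kernel, although only the polynomial kernel uses it, which makes 32 candidates where 20 are distinct. One possible alternative, sketched below under assumptions, puts the scaler inside a `Pipeline` so it is refit on each training fold, and passes the grid as a list of dicts. The data comes from `make_classification` as a hypothetical stand-in, the step names `scale` and `svc` are arbitrary, and `stratify` is an addition that the article does not use.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data; the article's X and y would be used instead.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=6, n_informative=4, n_classes=3, random_state=0
)

# stratify keeps the class proportions equal in both splits (an assumption, not in the article).
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The scaler is a pipeline step, so each CV fold fits it on that fold's training part only.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(random_state=42))])

# A list of dicts: degree is searched only together with the polynomial kernel.
param_grid = [
    {"svc__kernel": ["linear", "rbf", "sigmoid"], "svc__C": [0.1, 1, 10, 100]},
    {"svc__kernel": ["poly"], "svc__C": [0.1, 1, 10, 100], "svc__degree": [3, 5]},
]
n_candidates = len(ParameterGrid(param_grid))  # 12 + 8 = 20 instead of 32

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)

print(n_candidates, search.best_params_)
print(search.score(X_te, y_te))
```

The demo scales every column because the stand-in data is fully numeric. For the article's mix of numeric and one-hot columns, a `ColumnTransformer` would be one way to restrict scaling to Size (cm) and Patient Age.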
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid for the SVM
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "degree": [3, 5],  # only affects the polynomial kernel
}

# 5-fold cross-validated grid search using all CPU cores
grid_search = GridSearchCV(SVC(random_state=42), param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
print("Best Parameters from Grid Search:")
print(best_params)

Best Parameters from Grid Search:
{'C': 0.1, 'degree': 3, 'kernel': 'linear'}

3.3 Model evaluation
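
The classification report printed in this section is easier to judge next to a trivial baseline, such as always predicting the most frequent class. The sketch below shows one way to compute that with `DummyClassifier`; the label arrays are hypothetical and are not taken from the tumor dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration; class 0 is the most frequent in training.
y_train_demo = np.array([0, 0, 0, 1, 1, 2, 3, 4])
y_test_demo = np.array([0, 1, 2, 3, 4, 0, 0, 1])

# DummyClassifier ignores the features, so placeholder zeros are enough.
X_train_demo = np.zeros((len(y_train_demo), 1))
X_test_demo = np.zeros((len(y_test_demo), 1))

# Always predict the majority class seen during training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train_demo, y_train_demo)

baseline_acc = accuracy_score(y_test_demo, baseline.predict(X_test_demo))
print(f"Majority-class baseline accuracy: {baseline_acc:.3f}")
```

With the article's variables, the same comparison would fit the baseline on `X_train, y_train` and score it on `X_test, y_test` alongside `best_model`.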
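# Evaluate the best model found by the grid search on the held-out test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
print("Best Model Classification Report:")
print(classification_report(y_test, y_pred))
# Print the confusion matrix
print(confusion_matrix(y_test, y_pred))

That's all for this article. If it helped you, thanks for liking and following!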