Hello world

Machine learning (Python)

Last updated: 2025-02-01 | Word count: 1.1k | Estimated reading time: 5 minutes

supervised learning
unsupervised learning algorithm
Numpy
SciPy
matplotlib
pandas

Iris example

supervised learning

The automated decision process is generalized from known examples: the algorithm is fed pairs of inputs and expected outputs.
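To make "pairs of inputs and expected outputs" concrete, here is a minimal sketch. The toy data and the choice of a nearest-neighbor classifier are my own assumptions for illustration, not something these notes prescribe; any scikit-learn classifier with fit and predict would do.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training set (made up for illustration): each input is paired
# with the expected output the algorithm should learn to reproduce.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

# Assumption: a 1-nearest-neighbor classifier stands in for "the algorithm".
model = KNeighborsClassifier(n_neighbors=1)
model.fit(X_train, y_train)

# Generalize to inputs that were not in the training set.
predictions = model.predict([[0.2], [2.8]])
print(predictions)
```

The model never saw 0.2 or 2.8, yet it assigns each one the label of the closest known example.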

unsupervised learning algorithm

Only the input data is known; no expected output is provided to the algorithm.
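A small sketch of the same idea without labels. The data points are made up, and k-means clustering is only one possible unsupervised algorithm, chosen here as an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Only inputs are given; there is no y this time.
X = np.array([[0.0], [0.2], [5.0], [5.2]])

# Assumption: k-means with two clusters; the cluster ids it assigns
# are arbitrary numbers, not meaningful labels.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)
print(labels)
```

The algorithm groups the two small values together and the two large values together, but which group gets id 0 and which gets id 1 carries no meaning.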

Both supervised and unsupervised learning require building a good representation of the data (feature extraction).

$ pip install numpy scipy matplotlib ipython scikit-learn pandas

Numpy

The core feature is the ndarray class. All elements of an array must have exactly the same type.

scikit-learn accepts data in the form of NumPy arrays, so all data must be converted to NumPy arrays.

import numpy as np
x=np.array([[1,2,3],[4,5,6]])
print("x:\n{}".format(x))
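The "same type" rule is easy to see when converting a mixed Python list. This is a small sketch with made-up values; the variable names are my own.

```python
import numpy as np

# A nested list that mixes ints and one float
raw = [[1, 2, 3], [4.5, 5, 6]]

# Converting to an ndarray upcasts every element to a common type
arr = np.asarray(raw)
print(arr.dtype)
print(arr.shape)
```

Because one element is a float, NumPy stores the whole array as floating point rather than keeping the integers as integers.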

SciPy

A collection of functions for scientific computing.

scikit-learn uses SciPy functions to implement its algorithms.

Sparse matrices: another way to represent data. [see on the page]
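As a rough sketch of what a sparse matrix looks like in practice, the snippet below converts a mostly-zero NumPy array into SciPy's CSR format. The 4x4 identity matrix is just a convenient example of my own choosing.

```python
import numpy as np
from scipy import sparse

# A 4x4 array with ones on the diagonal and zeros everywhere else
eye = np.eye(4)

# CSR (compressed sparse row) format stores only the nonzero entries
sparse_matrix = sparse.csr_matrix(eye)
print("Stored nonzero entries: {}".format(sparse_matrix.nnz))

# The sparse matrix can be turned back into a regular ndarray
dense_again = sparse_matrix.toarray()
print(dense_again)
```

Only 4 of the 16 entries are stored, which is where the memory savings come from on large, mostly-zero data.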

matplotlib

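%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Generate a sequence of 100 numbers between -10 and 10
x = np.linspace(-10, 10, 100)
# Create a second array using the sine function
y = np.sin(x)
# plot draws a line chart of one array against the other
plt.plot(x, y, marker="x")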

The %matplotlib notebook and %matplotlib inline commands display figures directly in the browser.
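The % commands only exist inside IPython or Jupyter. When running the same plot as a plain script, one option is to write the figure to a file instead. This is a sketch under my own assumptions: the Agg backend is used so no display is needed, and the file name sine_example.png is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window or browser is needed
import matplotlib.pyplot as plt
import numpy as np

# Same data as the notebook example
x = np.linspace(-10, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, marker="x")

# Hypothetical output location: a PNG in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "sine_example.png")
fig.savefig(out_path)
plt.close(fig)
print(out_path)
```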

pandas

DataFrame: a tabular data structure.

import pandas as pd
from IPython.display import display

# Create a simple dataset about people
data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location' : ["New York", "Paris", "Berlin", "London"],
        'Age' : [24, 13, 53, 33]
       }

data_pandas = pd.DataFrame(data)
# IPython.display can print a "pretty" DataFrame in a Jupyter Notebook
display(data_pandas)
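Since a DataFrame is a table, rows can be selected by a condition on a column. A minimal sketch using the same dataset; the Age threshold of 30 and the variable name older are my own choices.

```python
import pandas as pd

data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location': ["New York", "Paris", "Berlin", "London"],
        'Age': [24, 13, 53, 33]}
data_pandas = pd.DataFrame(data)

# Boolean mask: keep only the rows where the Age column is greater than 30
older = data_pandas[data_pandas["Age"] > 30]
print(older)
```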

Iris example

We are given three species of iris along with their corresponding measurements. This is a supervised learning problem.

Class: one of the possible outputs.

The expected output for a single data point is the species of that flower.

For a single data point, its species is called its label.
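The difference between classes and labels can be checked directly on the dataset. A short sketch; the variable names are my own.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# The classes are the distinct values that can appear as an output
classes = np.unique(iris["target"])
print("Classes: {}".format(classes))

# The label of one data point is the class assigned to that flower
first_label = iris["target"][0]
first_name = iris["target_names"][first_label]
print("Label of the first flower: {} ({})".format(first_label, first_name))
```

The integer label is an index into target_names, which is how the numeric target is turned back into a species name.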

# The iris dataset is included in scikit-learn's datasets module; call load_iris to load it
from sklearn.datasets import load_iris
iris_database=load_iris()
# load_iris returns a Bunch object, which is very similar to a dictionary and contains keys and values:
print("Key of iris_database: \n{}".format(iris_database.keys()))
# The DESCR key holds a brief description of the dataset
print("Value of DESCR Key: \n{}".format(iris_database["DESCR"][:193])+"\n")
# target_names is an array of strings containing the flower species
print("Value of target_names Key: \n{}".format(iris_database["target_names"])+"\n")
# feature_names is a list of strings describing each feature
print("Value of feature_names Key: \n{}".format(iris_database["feature_names"])+"\n")
# The data itself is in the target and data fields. data holds the sepal length, sepal width, petal length, and petal width measurements as a NumPy array
print("Shape of data: {}".format(iris_database['data'].shape)+"\n")
# The shape of the data array is the number of samples times the number of features. This is a scikit-learn convention
print("First five rows of data:\n{}".format(iris_database['data'][:5])+"\n")
# The target array holds the species of each measured flower, also as a NumPy array, with one entry per flower
print("Target:\n{}".format(iris_database['target']))

D:\python\机器学习\.venv\Scripts\python.exe D:\python\机器学习\Iris.py 
Key of iris_database: 
dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
Value of DESCR Key: 
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, pre

Value of target_names Key: 
['setosa' 'versicolor' 'virginica']

Value of feature_names Key: 
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Shape of data: (150, 4)

First five rows of data:
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

Process finished with exit code 0
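The output above can be summarized with a quick sanity check: data is (samples, features), target has one entry per sample, and each class has the same number of flowers. A sketch with my own variable names:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# scikit-learn convention: rows are samples, columns are features
n_samples, n_features = iris["data"].shape
print("Samples: {}, features: {}".format(n_samples, n_features))

# One target entry per row of data
print("Targets: {}".format(iris["target"].shape[0]))

# Count how many flowers carry each label (0, 1, 2)
counts = np.bincount(iris["target"])
for name, count in zip(iris["target_names"], counts):
    print("{}: {}".format(name, count))
```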

2024-11-25 This post was filed by Stone Ocean under the category: Study