dask_ml.preprocessing.PolynomialFeatures

class dask_ml.preprocessing.PolynomialFeatures(degree: int = 2, interaction_only: bool = False, include_bias: bool = True, preserve_dataframe: bool = False)

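Generate polynomial and interaction features.

Generate a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two-dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2].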
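The ordering in the [a, b] example can be reproduced with a short pure-Python sketch. It assumes, as scikit-learn's implementation does, that terms are enumerated degree by degree with itertools.combinations_with_replacement over feature indices. The helper name polynomial_terms is hypothetical; this illustrates the combinatorics and is not dask_ml's code.

```python
from itertools import chain, combinations_with_replacement
from math import prod


def polynomial_terms(sample, degree):
    """Hypothetical helper: all monomials of total degree <= `degree`.

    Each monomial is a multiset of feature indices; the empty multiset
    (degree 0) yields the constant term 1.
    """
    n = len(sample)
    index_combos = chain.from_iterable(
        combinations_with_replacement(range(n), d) for d in range(degree + 1)
    )
    # Multiply the selected features together; prod of nothing is 1 (the bias).
    return [prod(sample[i] for i in combo) for combo in index_combos]


# With [a, b] = [2, 3] this follows the pattern [1, a, b, a^2, ab, b^2].
print(polynomial_terms([2, 3], degree=2))  # [1, 2, 3, 4, 6, 9]
```

The result matches the second row of the fit_transform output in the Examples section.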

Read more in the User Guide.

Parameters

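degree : int or tuple (min_degree, max_degree), default=2

If a single int is given, it specifies the maximal degree of the polynomial features. If a tuple (min_degree, max_degree) is passed, then min_degree is the minimum and max_degree is the maximum polynomial degree of the generated features. Note that min_degree=0 and min_degree=1 are equivalent, because whether the degree-zero term is output is determined by include_bias.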
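A minimal sketch of which terms a (min_degree, max_degree) tuple keeps, assuming the selection described above: non-constant terms start at degree max(1, min_degree), and the degree-0 bias term is added only when include_bias is True. The helper term_index_combos is hypothetical and returns index combinations rather than computed values.

```python
from itertools import chain, combinations_with_replacement


def term_index_combos(n_features, min_degree, max_degree, include_bias=True):
    """Hypothetical helper: index combinations kept for (min_degree, max_degree).

    The bias (the empty combination) is controlled only by include_bias,
    so the non-constant terms start at degree max(1, min_degree).
    """
    start = max(1, min_degree)
    combos = chain.from_iterable(
        combinations_with_replacement(range(n_features), d)
        for d in range(start, max_degree + 1)
    )
    if include_bias:
        combos = chain([()], combos)
    return list(combos)


# min_degree=0 and min_degree=1 produce the same terms.
assert term_index_combos(2, 0, 2) == term_index_combos(2, 1, 2)

# (2, 3) without bias keeps only the degree-2 and degree-3 terms.
print(term_index_combos(2, 2, 3, include_bias=False))
# [(0, 0), (0, 1), (1, 1), (0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
```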

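interaction_only : bool, default=False

If True, only interaction features are produced: features that are products of at most degree distinct input features, i.e. terms with a power of 2 or higher of the same input feature are excluded:

included: x[0], x[1], x[0] * x[1], etc.
excluded: x[0] ** 2, x[0] ** 2 * x[1], etc.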
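The difference from the full expansion can be sketched by swapping combinations_with_replacement for itertools.combinations, which never repeats an index. This mirrors the documented behaviour under that assumption; the helper interaction_terms is hypothetical.

```python
from itertools import chain, combinations
from math import prod


def interaction_terms(sample, degree=2, include_bias=True):
    """Hypothetical helper: products of at most `degree` distinct features."""
    n = len(sample)
    start = 0 if include_bias else 1
    # combinations() never picks the same index twice, so x[0] ** 2 cannot appear.
    index_combos = chain.from_iterable(
        combinations(range(n), d) for d in range(start, degree + 1)
    )
    return [prod(sample[i] for i in combo) for combo in index_combos]


print(interaction_terms([2, 3]))  # [1, 2, 3, 6], i.e. [1, a, b, ab]
```

This agrees with the second row of the interaction_only=True output in the Examples section.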

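include_bias : bool, default=True

If True (default), include a bias column: the feature in which all polynomial powers are zero (i.e., a column of ones, which acts as an intercept term in a linear model).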

order : {'C', 'F'}, default='C'

Order of the output array in the dense case. 'F' order is faster to compute, but may slow down subsequent estimators.

New in version 0.21.
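In NumPy terms, 'C' order stores each row contiguously and 'F' (Fortran) order stores each column contiguously. The sketch below only illustrates the two layouts with plain NumPy arrays of the same shape as the degree-2 example output; it does not call PolynomialFeatures, and the final copy is just one example of extra work a consumer expecting C order might do.

```python
import numpy as np

# A 3 x 6 float array, the same shape as the degree-2 example output.
c_out = np.zeros((3, 6), order="C")  # rows are contiguous in memory
f_out = np.zeros((3, 6), order="F")  # columns are contiguous in memory

print(c_out.flags["C_CONTIGUOUS"], f_out.flags["F_CONTIGUOUS"])  # True True
print(f_out.flags["C_CONTIGUOUS"])  # False

# A downstream step that needs C order would have to copy the F-order array.
c_copy = np.ascontiguousarray(f_out)
print(c_copy.flags["C_CONTIGUOUS"])  # True
```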

preserve_dataframe : boolean

If True, preserve pandas and dask dataframes after transforming. Using False (default) returns numpy or dask arrays and mimics sklearn's default behaviour.

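Attributes

powers_ : ndarray of shape (n_output_features_, n_features_in_)

The exponent for each of the inputs in the output.

n_features_in_ : int

Number of features seen during fit.

New in version 0.24.

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

n_output_features_ : int

The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.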
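A sketch of how powers_ and n_output_features_ relate, assuming the default settings (interaction_only=False, include_bias=True): each output term is a combination of feature indices, and counting how often each index occurs gives that term's row of exponents. The helper powers_matrix is hypothetical.

```python
from collections import Counter
from itertools import chain, combinations_with_replacement


def powers_matrix(n_features, degree):
    """Hypothetical helper: exponent of each input feature in each output term."""
    combos = chain.from_iterable(
        combinations_with_replacement(range(n_features), d)
        for d in range(degree + 1)
    )
    rows = []
    for combo in combos:
        counts = Counter(combo)  # how often each feature index is multiplied in
        rows.append([counts[j] for j in range(n_features)])
    return rows


powers = powers_matrix(n_features=2, degree=2)
print(powers)       # [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
print(len(powers))  # 6, the number of output columns in the degree-2 example
```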

See Also

SplineTransformer : Transformer that generates univariate B-spline bases for features.

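Notes

Be aware that the number of features in the output array scales polynomially in the number of features of the input array, and exponentially in the degree. High degrees can cause overfitting.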
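To make the growth concrete: with interaction_only=False and include_bias=True, the outputs are all monomials of total degree at most d in n variables, which number C(n + d, d). For a fixed degree d this behaves like n^d / d! for large n, so the degree acts as the exponent. The function below is a hypothetical helper for this one configuration only.

```python
from math import comb


def n_output_features(n_features, degree):
    """Closed form assuming interaction_only=False and include_bias=True.

    Monomials of total degree <= degree in n variables: C(n + degree, degree).
    """
    return comb(n_features + degree, degree)


# Growth in the number of input features at degree 2 ...
print([n_output_features(n, 2) for n in (2, 10, 100)])  # [6, 66, 5151]
# ... and in the degree for 10 input features.
print([n_output_features(10, d) for d in (1, 2, 3, 4)])  # [11, 66, 286, 1001]
```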

See examples/linear_model/plot_polynomial_interpolation.py

Examples

>>> import numpy as np
>>> from sklearn.preprocessing import PolynomialFeatures
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(2)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])
>>> poly = PolynomialFeatures(interaction_only=True)
>>> poly.fit_transform(X)
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  2.,  3.,  6.],
       [ 1.,  4.,  5., 20.]])

Methods

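`fit`(X[, y])	Compute the number of output features.
`fit_transform`(X[, y])	Fit to data, then transform it.
`get_feature_names_out`([input_features])	Get output feature names for transformation.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_output`(*[, transform])	Set output container.
`set_params`(**params)	Set the parameters of this estimator.
`transform`(X[, y])	Transform data to polynomial features.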
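The names returned by get_feature_names_out can be derived from the rows of powers_. The sketch below assumes scikit-learn's naming scheme (a power of 1 is the bare name, higher powers are written name^p, factors are joined by a space, the all-zero row is "1") and default input names x0, x1; the helper feature_names is hypothetical.

```python
def feature_names(powers, input_features):
    """Hypothetical helper mirroring scikit-learn's output-name scheme."""
    names = []
    for row in powers:
        factors = [
            name if exp == 1 else f"{name}^{exp}"
            for name, exp in zip(input_features, row)
            if exp != 0
        ]
        # A row of all zeros is the bias column.
        names.append(" ".join(factors) if factors else "1")
    return names


powers = [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
print(feature_names(powers, ["x0", "x1"]))
# ['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
```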

__init__(degree: int = 2, interaction_only: bool = False, include_bias: bool = True, preserve_dataframe: bool = False)
