Datasets
Dataset Helper
- features_labels_from_data(X, y, train_size=None, test_size=None, n_features=None, *, use_pca=False, return_bunch=False)
Splits a dataset according to the requested train size, test size, and number of features.
- Parameters
X – raw data from dataset
y – labels from dataset
test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
n_features – number of desired features
use_pca – whether to use PCA for dimensionality reduction; default False
return_bunch – whether to return a sklearn.utils.Bunch (similar to a dictionary); default False
- Returns
Preprocessed dataset as available in sklearn
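The interaction between train_size and test_size described above can be sketched in plain Python. This is only an illustration of the documented rules, not the library's implementation: the helper name resolve_split_sizes is hypothetical, and the rounding (ceiling for the test split, floor for the train split) is an assumption.

```python
import math


def resolve_split_sizes(n_samples, train_size=None, test_size=None):
    """Hypothetical helper: turn train_size/test_size into sample counts."""
    if train_size is None and test_size is None:
        test_size = 0.25  # documented default when both are None

    def to_count(size, round_fn):
        # None is resolved later as the complement of the other split.
        if size is None:
            return None
        if isinstance(size, float):
            if not 0.0 < size < 1.0:
                raise ValueError("float sizes must lie between 0.0 and 1.0")
            # Rounding direction is an assumption, not taken from the docs.
            return int(round_fn(size * n_samples))
        return int(size)  # an int is an absolute number of samples

    n_test = to_count(test_size, math.ceil)
    n_train = to_count(train_size, math.floor)

    # A missing size becomes the complement of the one that was given.
    if n_test is None:
        n_test = n_samples - n_train
    if n_train is None:
        n_train = n_samples - n_test

    if n_train < 0 or n_test < 0 or n_train + n_test > n_samples:
        raise ValueError("requested split does not fit the dataset")
    return n_train, n_test


print(resolve_split_sizes(150))                  # (112, 38)
print(resolve_split_sizes(150, train_size=100))  # (100, 50)
```

With 150 samples and neither size given, the test split defaults to a quarter of the data; giving only train_size=100 leaves the remaining 50 samples for testing.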
- label_to_class_name(predicted_labels, classes)
Converts numeric labels to class names (strings).
- Parameters
predicted_labels (numpy.ndarray) – Nx1 array
classes (dict or list) – a mapping from label (numeric) to class name (str)
- Return type
List[str]
- Returns
list of predicted class names of each datum
Example
>>> classes = ['sepal length (cm)',
...            'sepal width (cm)',
...            'petal length (cm)',
...            'petal width (cm)']
>>> predicted_labels = [0, 2, 1, 2, 0]
>>> print(label_to_class_name(predicted_labels, classes))
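For readers who want to see the kind of lookup this helper describes, here is a minimal stand-alone sketch. The function name label_to_class_name_sketch is hypothetical and the body is an assumption about the behaviour (index into classes with each label), not the library's source; it uses plain Python lists rather than a numpy.ndarray.

```python
def label_to_class_name_sketch(predicted_labels, classes):
    """Hypothetical stand-in: map each numeric label to its class name.

    `classes` may be a list (label used as an index) or a dict
    (label used as a key); both support `classes[label]`.
    """
    return [classes[int(label)] for label in predicted_labels]


names_as_list = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']
names_as_dict = {0: 'first', 1: 'second', 2: 'third'}

print(label_to_class_name_sketch([0, 2, 1, 2, 0], names_as_list))
# ['sepal length (cm)', 'petal length (cm)', 'sepal width (cm)',
#  'petal length (cm)', 'sepal length (cm)']
print(label_to_class_name_sketch([2, 0], names_as_dict))
# ['third', 'first']
```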
Breast Cancer
- load_breast_cancer(train_size=None, test_size=None, n_features=None, *, use_pca=False, return_bunch=False)
Loads the breast cancer dataset from sklearn and splits it according to the requested train size, test size, and number of features.
- Parameters
test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
n_features – number of desired features
use_pca – whether to use PCA for dimensionality reduction; default False
return_bunch – whether to return a Bunch (similar to a dictionary); default False
- Returns
Breast Cancer dataset as available in sklearn
Iris
- load_iris(train_size=None, test_size=None, n_features=None, *, use_pca=False, return_bunch=False)
Loads the iris dataset from sklearn and splits it according to the requested train size, test size, and number of features.
- Parameters
test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
n_features – number of desired features
use_pca – whether to use PCA for dimensionality reduction; default False
return_bunch – whether to return a Bunch (similar to a dictionary); default False
- Returns
Iris dataset as available in sklearn
Wine
- load_wine(train_size=None, test_size=None, n_features=None, *, use_pca=False, return_bunch=False)
Loads the wine dataset from sklearn and splits it according to the requested train size, test size, and number of features.
- Parameters
test_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. If train_size is also None, it will be set to 0.25.
train_size – float or int, default=None. If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
n_features – number of desired features
use_pca – whether to use PCA for dimensionality reduction; default False
return_bunch – whether to return a Bunch (similar to a dictionary); default False
- Returns
Wine dataset as available in sklearn