.. _Modules_Data_Handling:

Data Handling
-----------------------------------------------

Description
+++++++++++

The Data Handling module facilitates the preparation of raw data for machine learning experiments using the ``DataHandler`` class.

Main Class
++++++++++

.. class:: DataHandler

   A class for handling dataset operations including creation, enhancement, splitting, and saving images.

   .. _DataHandler_init:

   .. method:: __init__()

      Initializes the DataHandler.

   .. _DataHandler_load_dataset:

   .. method:: load_dataset(data: Union[tf.data.Dataset, dict, pandas.DataFrame])

      Loads a dataset from the given data and stores it in the 'datasets_container' under 'complete_dataset'.

      :param data: The data to load. It can be:

         1. A TensorFlow dataset of tuples (image, label), where image shape is (height, width, 1|3).
         2. A dictionary or pandas DataFrame with 'path' and 'label' columns.

      :type data: Union[tf.data.Dataset, dict, pandas.DataFrame]

   .. _DataHandler_prepare_datasets:

   .. method:: prepare_datasets(dataset_names: Optional[List[str]] = None, batch_size: Optional[int] = None, shuffle_seed: Optional[int] = None, prefetch_buffer_size: int = tf.data.experimental.AUTOTUNE, repeat_num: Optional[int] = None)

      Prepares datasets by applying transformations and updates them in the 'datasets_container'.

      :param dataset_names: The names of the datasets to enhance. Can be 'complete_dataset' or any split datasets ('train_dataset', 'val_dataset', 'test_dataset'). If None, all datasets are processed.
      :type dataset_names: Optional[List[str]]
      :param batch_size: The batch size for the dataset. If None, no batching is applied.
      :type batch_size: Optional[int]
      :param shuffle_seed: The seed for shuffling. If None, no shuffling is applied.
      :type shuffle_seed: Optional[int]
      :param prefetch_buffer_size: The prefetch buffer size. Defaults to tf.data.experimental.AUTOTUNE.
      :type prefetch_buffer_size: int
      :param repeat_num: The number of times to repeat the dataset. If None, no repetition is applied.
      :type repeat_num: Optional[int]

   .. _DataHandler_split_dataset:

   .. method:: split_dataset(train_split: float = 0.8, val_split: float = 0.1, test_split: float = 0.1, dataset_size: Optional[int] = None)

      Splits 'complete_dataset' into 'train_dataset', 'val_dataset', and 'test_dataset'. Removes the 'complete_dataset' after splitting.

      :param train_split: Proportion of the dataset for training. Defaults to 0.8.
      :type train_split: float
      :param val_split: Proportion of the dataset for validation. Defaults to 0.1.
      :type val_split: float
      :param test_split: Proportion of the dataset for testing. Defaults to 0.1.
      :type test_split: float
      :param dataset_size: The dataset size. If None, the size is determined using the 'cardinality' method.
      :type dataset_size: Optional[int]

   .. _DataHandler_save_images:

   .. method:: save_images(output_dir: str, prefix: Optional[Union[str, Callable[[Any], str]]] = None, num_images: Optional[int] = None)

      Saves images from the dataset to a specified directory.

      :param output_dir: The directory to save the images.
      :type output_dir: str
      :param prefix: The prefix for the image files. If callable, it should take the label as input and return a string. If None, a default prefix is used.
      :type prefix: Optional[Union[str, Callable[[Any], str]]]
      :param num_images: The number of images to save. If None, the complete dataset is taken.
      :type num_images: Optional[int]

   .. _DataHandler_backup_datasets:

   .. method:: backup_datasets()

      Creates a backup of the current dataset container.

   .. _DataHandler_restore_datasets:

   .. method:: restore_datasets()

      Restores the dataset container from the backup.
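To make the ``split_dataset`` proportions concrete, the sketch below shows one plausible way the default 0.8 / 0.1 / 0.1 split could map onto element counts. It does not call ``DataHandler``. The helper ``split_sizes`` and the file paths are hypothetical, and the rounding rule (floor the train and validation sizes, give the remainder to the test split) is an assumption, not documented behaviour. The input uses the dictionary form with 'path' and 'label' columns described for ``load_dataset``.

```python
from typing import Dict, List, Tuple


def split_sizes(dataset_size: int,
                train_split: float = 0.8,
                val_split: float = 0.1,
                test_split: float = 0.1) -> Tuple[int, int, int]:
    """Hypothetical helper: turn split proportions into element counts.

    Assumption: train and validation sizes are floored and the test split
    takes the remainder; the real ``split_dataset`` may round differently.
    """
    total = train_split + val_split + test_split
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"splits must sum to 1.0, got {total}")
    train_size = int(dataset_size * train_split)
    val_size = int(dataset_size * val_split)
    # The remainder goes to the test split so that no element is dropped.
    test_size = dataset_size - train_size - val_size
    return train_size, val_size, test_size


# Dictionary input with 'path' and 'label' columns (hypothetical file paths).
data: Dict[str, List] = {
    "path": [f"images/img_{i}.png" for i in range(10)],
    "label": [i % 2 for i in range(10)],
}

sizes = split_sizes(len(data["path"]))
print(sizes)  # (8, 1, 1)
```

With the real class, the corresponding flow would presumably be ``load_dataset(data)`` followed by ``split_dataset()``, after which 'train_dataset', 'val_dataset', and 'test_dataset' replace 'complete_dataset' in the container.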