libuplift.datasets.pbc
======================

.. py:module:: libuplift.datasets.pbc

.. autoapi-nested-parse::

   The pbc dataset from the R survival package.

   .. !! processed by numpydoc !!


Functions
---------

.. autoapisummary::

   libuplift.datasets.pbc.fetch_pbc


Module Contents
---------------

.. py:function:: fetch_pbc(data_home=None, download_if_missing=True, random_state=None, shuffle=False, categ_as_strings=False, return_X_y=False, as_frame=False)

   Load the pbc dataset from the R survival package (uplift survival).

   Download it if necessary.

   Only the first 312 records, those with an assigned treatment, are kept.

   Following the original dataset, the edema variable is numerical
   but can also be treated as categorical: 0 = no edema,
   0.5 = untreated or successfully treated,
   1 = edema despite diuretic therapy.

   **Variables** chol, copper, trig, platelet contain missing data.

   :Parameters:

       **data_home** : string, optional
           Specify another download and cache folder for the datasets.
           By default all scikit-learn data is stored in
           '~/scikit_learn_data' subfolders.

       **download_if_missing** : boolean, default=True
           If False, raise an IOError if the data is not locally available
           instead of trying to download the data from the source site.

       **random_state** : int, RandomState instance or None (default)
           Determines random number generation for dataset shuffling.
           Pass an int for reproducible output across multiple function calls.

       **shuffle** : bool, default=False
           Whether to shuffle the dataset.

       **categ_as_strings** : bool, default=False
           Whether to return categorical variables as strings.

       **return_X_y** : boolean, default=False
           If True, returns ``(data, target_time, target_status)``
           instead of a Bunch object.

       **as_frame** : boolean, default=False
           If True, features are returned as a pandas DataFrame.
           If False, features are returned as an object or float array.
           A float array is returned if all features are floats.

   :Returns:

       **dataset** : dict-like object with the following attributes:
           ..

       **dataset.data** : numpy array
           Each row corresponds to the features in the dataset.

       **dataset.target_status** : numpy array
           Censoring status: 0=censored, 1=transplant, 2=dead.

       **dataset.target_time** : numpy array
           Censoring, transplant or death time.

       **dataset.DESCR** : string
           Description of the dataset.

       **(data, target_time, target_status)** : tuple if ``return_X_y`` is True
           ..

   .. !! processed by numpydoc !!
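
The encodings described above (status codes 0/1/2, edema levels 0/0.5/1, and missing values in some columns) can be post-processed along the following lines. This is a minimal sketch, not part of ``libuplift``: the helper names ``death_indicator``, ``edema_as_category`` and ``count_missing`` are hypothetical, the sample values are made up for illustration, and it assumes that missing entries appear as ``NaN`` or ``None``. Treating a transplant as censored is a modelling choice made here for illustration, not something the dataset prescribes.

```python
import math

# Labels taken from the edema description in the docstring above.
EDEMA_LABELS = {
    0.0: "no edema",
    0.5: "untreated or successfully treated",
    1.0: "edema despite diuretic therapy",
}


def death_indicator(target_status):
    """Return 1 where status is 2 (dead), else 0.

    Assumption: transplant (1) is treated like censoring (0).
    """
    return [int(s == 2) for s in target_status]


def edema_as_category(edema):
    """Map numerical edema codes (0, 0.5, 1) to descriptive labels."""
    return [EDEMA_LABELS[float(v)] for v in edema]


def count_missing(column):
    """Count entries that are None or NaN (assumed missing-value encoding)."""
    return sum(
        1
        for v in column
        if v is None or (isinstance(v, float) and math.isnan(v))
    )


# Hypothetical values standing in for target_status, edema and chol.
status = [0, 1, 2, 2]
edema = [0, 0.5, 1]
chol = [261.0, float("nan"), None, 302.0]

events = death_indicator(status)
edema_labels = edema_as_category(edema)
n_missing_chol = count_missing(chol)
```

The helpers accept any iterable of numbers, so they should also work on the arrays returned by ``fetch_pbc``, provided the values are encoded as described in the docstring.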