Large datasets: support for Dask arrays? #249
Open
Labels: enhancement (New feature or request)
Description
Hi, I tried training an ExplainableBoostingRegressor using Dask arrays, but I keep running into the following issue:
```
ERROR:interpret.utils.all:Could not unify data of type: <class 'dask.array.core.Array'>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-53-b5e1d33190e1> in <module>
      1 model = create_model()
      2
----> 3 fold_models, train_preds, valid_preds = model_cv(model)

<ipython-input-48-178d0374bff6> in model_cv(model)
     17
---> 18 fold_model = sklearn.clone(model).fit(x_train_fold, y_train_fold)
     19

/opt/anaconda/envs/ebm/lib/python3.8/site-packages/interpret/glassbox/ebm/ebm.py in fit(self, X, y)
    744     # TODO: PK don't overwrite self.feature_names here (scikit-learn rules), and it's also confusing to
    745     # user to have their fields overwritten. Use feature_names_out_ or something similar
--> 746     X, y, self.feature_names, _ = unify_data(
    747         X, y, self.feature_names, self.feature_types, missing_data_allowed=False
    748     )

/opt/anaconda/envs/ebm/lib/python3.8/site-packages/interpret/utils/all.py in unify_data(data, labels, feature_names, feature_types, missing_data_allowed)
    325     msg = "Could not unify data of type: {0}".format(type(data))
    326     log.error(msg)
--> 327     raise ValueError(msg)
    328
    329     new_labels = unify_vector(labels)

ValueError: Could not unify data of type: <class 'dask.array.core.Array'>
```

Each of my folds is a 2D array consisting of 56 features and occupying ~16 GB of memory.
Passing `model.fit(X.compute(), y.compute())` crashes on memory after some time, probably because joblib copies the data around unnecessarily.
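Until Dask arrays are supported natively, one possible interim workaround is to materialize only a random row subsample that fits in memory before calling `fit`. The sketch below is a minimal illustration with NumPy stand-ins for the Dask arrays; the helper name `sample_rows` is hypothetical, and the comment about calling `.compute()` on a lazily sliced Dask array is an assumption, not a verified interpret/Dask integration:

```python
import numpy as np

def sample_rows(X, y, n_rows, seed=0):
    # Draw a random subset of rows small enough to fit in memory.
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=min(n_rows, X.shape[0]), replace=False)
    return X[idx], y[idx]

# Stand-in data shaped like one fold (56 features). With real Dask
# arrays one would slice lazily first and only .compute() the much
# smaller result, e.g. X[idx].compute() (assumption, not tested
# against interpret's unify_data).
X = np.zeros((1_000, 56))
y = np.zeros(1_000)
X_small, y_small = sample_rows(X, y, 100)
```

`X_small` and `y_small` can then be passed to `fit` as ordinary in-memory NumPy arrays, at the cost of training on a subsample rather than the full 16 GB fold.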