Alternative scoring metrics #965

Closed · wants to merge 25 commits

Commits
7549f76
Fix typos in the Customer Segmentation case study (#957)
KuanHaoHuang Mar 21, 2025
f231d61
update datasets host url for notebooks (#959)
kgao Mar 28, 2025
47c0134
Copy everything from the other branch, make sure to use a --sign-off …
carl-offerfit Mar 31, 2025
214c510
Fix: No default y,t scoring so it defaults to the models built in
carl-offerfit Apr 2, 2025
3881da5
Switch to passing non sklearn scoring as a function argument
carl-offerfit May 1, 2025
395fec7
Fix imperative nature of docstring
carl-offerfit May 1, 2025
d9813fe
Update tests to use pearsonr function in the test
carl-offerfit May 1, 2025
4a4f430
Fix for naming of return result in test
carl-offerfit May 1, 2025
a28827b
Cleaner handling of the scorer name when it is a function
carl-offerfit May 1, 2025
3c5c233
Correct the docstring to include the other alternatives
carl-offerfit May 1, 2025
71a7a6d
Fix type hints to be compatible with earlier Python versions
carl-offerfit May 15, 2025
2d26998
Fix for if a scoring function is partial to np.array vs. np.ndarray (…
carl-offerfit May 15, 2025
1028c72
Add comment
carl-offerfit May 15, 2025
a309559
Adding tests of score function validation
carl-offerfit May 15, 2025
3946e38
Fix the tests to pass
carl-offerfit May 15, 2025
992eca0
Add sensitivity analysis methods (#967)
fverac May 22, 2025
01b8115
Remove the validation of the score function.
Jun 9, 2025
28c12bc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 9, 2025
52e5099
minor readme fix (#974)
fverac May 28, 2025
5d485e3
Bugfix rscorer discrete outcome treatment (#977)
maartenvanhooftds Jun 7, 2025
bd85963
Fix bug in else condition on application of squeeze
Jun 11, 2025
7e7b306
add Validation docs (#975)
fverac Jun 12, 2025
b2eb70f
allow missing values in X when inferencing for specific ests (#982)
fverac Jun 25, 2025
0164ae6
ENH: _check_sample_weight dtype made kwarg in sklearn 1.7.x (#980)
atharva-novi Jul 3, 2025
149b10c
Revert the change to pyproject.toml for local unit test runs
Jul 7, 2025
6 changes: 3 additions & 3 deletions README.md
@@ -357,9 +357,9 @@ lb, ub = est.effect_interval(X_test, alpha=0.05) # OLS confidence intervals
```Python
from econml.iv.dml import NonParamDMLIV

est = NonParamDMLIV(projection=False,
discrete_treatment=True,
discrete_instrument=True)
est = NonParamDMLIV(discrete_treatment=True,
discrete_instrument=True,
model_final=RandomForestRegressor())
est.fit(Y, T, Z=Z, X=X, W=W) # no analytical confidence interval available
treatment_effects = est.effect(X_test)
```
6 changes: 6 additions & 0 deletions doc/spec/references.rst
@@ -17,6 +17,12 @@ References
Two-Stage Estimation with a High-Dimensional Second Stage.
2018.

.. [Chernozhukov2022]
V. Chernozhukov, C. Cinelli, N. Kallus, W. Newey, A. Sharma, and V. Syrgkanis.
Long Story Short: Omitted Variable Bias in Causal Machine Learning.
*NBER Working Paper No. 30302*, 2022.
URL https://www.nber.org/papers/w30302.

.. [Hartford2017]
Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy.
Deep IV: A flexible approach for counterfactual prediction.
1 change: 1 addition & 0 deletions doc/spec/spec.rst
@@ -13,6 +13,7 @@ EconML User Guide
estimation_dynamic
inference
model_selection
validation
interpretability
federated_learning
references
68 changes: 68 additions & 0 deletions doc/spec/validation.rst
@@ -0,0 +1,68 @@
Validation
======================

Validating causal estimates is inherently challenging, as the true counterfactual outcome for a given treatment is
unobservable. However, there are several checks and tools available in EconML to help assess the credibility of causal
estimates.


Sensitivity Analysis
---------------------

For many EconML estimators, unobserved confounding can lead to biased causal estimates.
Moreover, it is impossible to prove the absence of unobserved confounders.
This is a fundamental problem for observational causal inference.

To mitigate this problem, EconML provides a suite of sensitivity analysis tools,
based on [Chernozhukov2022]_,
to assess the robustness of causal estimates to unobserved confounding.

Specifically, select estimators (subclasses of :class:`.DML` and :class:`.DRLearner`)
have access to ``sensitivity_analysis``, ``robustness_value``, and ``sensitivity_summary`` methods.

``sensitivity_analysis`` provides an updated confidence interval for the ATE based on a specified level of unobserved confounding.


``robustness_value`` computes the minimum level of unobserved confounding required
so that confidence intervals around the ATE would begin to include the given point (0 by default).


``sensitivity_summary`` provides a summary of the two methods above.
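
As a rough illustration (a sketch, not taken verbatim from the EconML docs), the workflow might look as follows. The
no-argument calls assume sensible defaults for the confounding level; the actual signatures may require explicit
arguments.

```Python
import numpy as np
from econml.dml import LinearDML

# Toy data with one observed confounder driving both T and Y
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
T = X[:, 0] + rng.normal(size=n)
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)

est = LinearDML(random_state=0)
est.fit(Y, T, X=X)

# Text summary of the sensitivity analysis (assumed to use default settings)
print(est.sensitivity_summary())

# ATE confidence interval under an assumed level of unobserved confounding
print(est.sensitivity_analysis())

# Minimum confounding strength at which the ATE interval starts to cover 0
print(est.robustness_value())
```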

DRTester
----------------

EconML provides the :class:`.DRTester` class, which implements Best Linear Predictor (BLP), calibration r-squared,
and uplift modeling methods for validation.

See an example notebook `here <https://github.com/py-why/EconML/blob/main/notebooks/CATE%20validation.ipynb>`__.
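
The notebook's workflow is roughly the sketch below; the import path and the ``fit_nuisance`` / ``evaluate_all``
method names are recalled from that notebook and may not match the current API exactly.

```Python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from econml.dr import DRLearner
from econml.validate import DRTester  # import path assumed

# Synthetic data with a binary treatment D
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))
D = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
Y = D * (1.0 + X[:, 1]) + X[:, 0] + rng.normal(size=n)
Xtr, Xval, Dtr, Dval, Ytr, Yval = train_test_split(X, D, Y, random_state=0)

est = DRLearner().fit(Ytr, Dtr, X=Xtr)

tester = DRTester(
    model_regression=GradientBoostingRegressor(),
    model_propensity=GradientBoostingClassifier(),
    cate=est,
)
tester.fit_nuisance(Xval, Dval, Yval, Xtr, Dtr, Ytr)
results = tester.evaluate_all(Xval, Xtr)  # BLP, calibration, and uplift (QINI) checks
print(results.summary())
```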

Scoring
-------

Many EconML estimators implement a ``.score`` method to evaluate the goodness-of-fit of the final model. Because raw
``.score`` values can be hard to interpret on their own, EconML also offers the :class:`RScorer` class to facilitate
model selection based on scoring.

:class:`RScorer` enables comparison and selection among different causal models.

See an example notebook `here
<https://github.com/py-why/EconML/blob/main/notebooks/Causal%20Model%20Selection%20with%20the%20RScorer.ipynb>`__.
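
For instance, a sketch of scoring two candidate models on a held-out split (the :class:`RScorer` constructor
arguments shown here are assumptions; see the notebook for the exact usage):

```Python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from econml.dml import LinearDML, CausalForestDML
from econml.score import RScorer  # import path assumed

rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 5))
T = X[:, 0] + rng.normal(size=n)
Y = (1.0 + X[:, 1]) * T + X[:, 0] + rng.normal(size=n)
X_tr, X_val, T_tr, T_val, Y_tr, Y_val = train_test_split(X, T, Y, random_state=0)

# Candidate CATE models, each fit on the training split
models = [LinearDML(random_state=0), CausalForestDML(random_state=0)]
models = [m.fit(Y_tr, T_tr, X=X_tr) for m in models]

# Fit the scorer's nuisance models on the validation split, then compare candidates
scorer = RScorer(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor(),
                 cv=3, random_state=0)
scorer.fit(Y_val, T_val, X=X_val)
print([scorer.score(m) for m in models])  # a higher R-score suggests a better causal fit
```

The notebook also shows a convenience method for selecting the best of several candidates directly.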

Confidence Intervals and Inference
----------------------------------

Most EconML estimators allow for inference, including standard errors, confidence intervals, and p-values for
estimated effects. A common validation approach is to check whether the p-values are below a chosen significance level
(e.g., 0.05). If not, the null hypothesis that the causal effect is zero cannot be rejected.
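
For example (a sketch; the exact columns of the summary depend on the estimator and the inference method used):

```Python
import numpy as np
from econml.dml import LinearDML

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
T = X[:, 0] + rng.normal(size=n)
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)

est = LinearDML(random_state=0)
est.fit(Y, T, X=X)

# Per-row point estimates, standard errors, confidence intervals, and p-values
inf = est.effect_inference(X[:5])
print(inf.summary_frame(alpha=0.05))

# Aggregate view: is the average effect significantly different from zero?
print(inf.population_summary())
```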

**Note:** Inference results are only valid if the model specification is correct. For example, if a linear model is used
but the true data-generating process is nonlinear, the inference may not be reliable. It is generally not possible to
guarantee correct specification, so p-value inspection should be considered a surface-level check.

DoWhy Refutation Tests
----------------------

The DoWhy library, which complements EconML, includes several refutation tests for validating causal estimates. These
tests work by comparing the original causal estimate to estimates obtained from perturbed versions of the data, helping
to assess the robustness of causal conclusions.
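
A minimal sketch of a DoWhy refutation run on a toy dataset (plain DoWhy with a simple linear estimator here; DoWhy
can also wrap EconML estimators, see the DoWhy documentation for that syntax):

```Python
import numpy as np
import pandas as pd
from dowhy import CausalModel

rng = np.random.default_rng(0)
n = 2000
w = rng.normal(size=n)
t = (w + rng.normal(size=n) > 0).astype(int)
y = 2.0 * t + w + rng.normal(size=n)
df = pd.DataFrame({"y": y, "t": t, "w": w})

model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["w"])
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")

# Replace the treatment with random noise; the refuted estimate should shrink toward 0
refutation = model.refute_estimate(estimand, estimate,
                                   method_name="placebo_treatment_refuter")
print(refutation)
```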
7 changes: 6 additions & 1 deletion econml/_cate_estimator.py
@@ -880,7 +880,12 @@ def _postfit(self, Y, T, *args, **kwargs):
self._set_transformed_treatment_names()

def _expand_treatments(self, X=None, *Ts, transform=True):
X, *Ts = check_input_arrays(X, *Ts)
if 'X' in self._gen_allowed_missing_vars():
force_all_finite = 'allow-nan'
else:
force_all_finite = False
X, = check_input_arrays(X, force_all_finite=force_all_finite)
Ts = check_input_arrays(*Ts)
n_rows = 1 if X is None else shape(X)[0]
outTs = []
for T in Ts:
56 changes: 48 additions & 8 deletions econml/_ortho_learner.py
@@ -990,7 +990,11 @@ def _fit_final(self, Y, T, X=None, W=None, Z=None, nuisances=None, sample_weight
groups=groups))

def const_marginal_effect(self, X=None):
X, = check_input_arrays(X)
if 'X' in self._gen_allowed_missing_vars():
force_all_finite = 'allow-nan'
else:
force_all_finite = False
X, = check_input_arrays(X, force_all_finite=force_all_finite)
self._check_fitted_dims(X)
if X is None:
return self._ortho_learner_model_final.predict()
@@ -1000,34 +1004,52 @@ def const_marginal_effect(self, X=None):
const_marginal_effect.__doc__ = LinearCateEstimator.const_marginal_effect.__doc__

def const_marginal_effect_interval(self, X=None, *, alpha=0.05):
X, = check_input_arrays(X)
if 'X' in self._gen_allowed_missing_vars():
force_all_finite = 'allow-nan'
else:
force_all_finite = False
X, = check_input_arrays(X, force_all_finite=force_all_finite)
self._check_fitted_dims(X)
return super().const_marginal_effect_interval(X, alpha=alpha)

const_marginal_effect_interval.__doc__ = LinearCateEstimator.const_marginal_effect_interval.__doc__

def const_marginal_effect_inference(self, X=None):
X, = check_input_arrays(X)
if 'X' in self._gen_allowed_missing_vars():
force_all_finite = 'allow-nan'
else:
force_all_finite = False
X, = check_input_arrays(X, force_all_finite=force_all_finite)
self._check_fitted_dims(X)
return super().const_marginal_effect_inference(X)

const_marginal_effect_inference.__doc__ = LinearCateEstimator.const_marginal_effect_inference.__doc__

def effect_interval(self, X=None, *, T0=0, T1=1, alpha=0.05):
X, T0, T1 = check_input_arrays(X, T0, T1)
if 'X' in self._gen_allowed_missing_vars():
force_all_finite = 'allow-nan'
else:
force_all_finite = False
X, = check_input_arrays(X, force_all_finite=force_all_finite)
T0, T1 = check_input_arrays(T0, T1)
self._check_fitted_dims(X)
return super().effect_interval(X, T0=T0, T1=T1, alpha=alpha)

effect_interval.__doc__ = LinearCateEstimator.effect_interval.__doc__

def effect_inference(self, X=None, *, T0=0, T1=1):
X, T0, T1 = check_input_arrays(X, T0, T1)
if 'X' in self._gen_allowed_missing_vars():
force_all_finite = 'allow-nan'
else:
force_all_finite = False
X, = check_input_arrays(X, force_all_finite=force_all_finite)
T0, T1 = check_input_arrays(T0, T1)
self._check_fitted_dims(X)
return super().effect_inference(X, T0=T0, T1=T1)

effect_inference.__doc__ = LinearCateEstimator.effect_inference.__doc__

def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None):
def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None, scoring=None):
"""
Score the fitted CATE model on a new data set.

@@ -1055,6 +1077,9 @@ def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None):
Weights for each samples
groups: (n,) vector, optional
All rows corresponding to the same group will be kept together during splitting.
scoring: name of an sklearn scoring function to use instead of the default, optional
Supports f1_score, log_loss, mean_absolute_error, mean_squared_error, r2_score,
and roc_auc_score.

Returns
-------
@@ -1113,9 +1138,24 @@ def score(self, Y, T, X=None, W=None, Z=None, sample_weight=None, groups=None):

accumulated_nuisances += nuisances

score_kwargs = {
'X': X,
'W': W,
'Z': Z,
'sample_weight': sample_weight,
'groups': groups
}
# If using an _rlearner, the scoring parameter can be passed along, if provided
if scoring is not None:
# Import here rather than at module level to avoid circular imports
from .dml._rlearner import _ModelFinal
if isinstance(self._ortho_learner_model_final, _ModelFinal):
score_kwargs['scoring'] = scoring
else:
raise NotImplementedError("scoring parameter only implemented for "
"_rlearner._ModelFinal")
return self._ortho_learner_model_final.score(Y, T, nuisances=accumulated_nuisances,
**filter_none_kwargs(X=X, W=W, Z=Z,
sample_weight=sample_weight, groups=groups))
**filter_none_kwargs(**score_kwargs))

@property
def ortho_learner_model_final_(self):
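
Illustrative usage of the new ``scoring`` argument from this diff (a sketch, not part of the diff itself; it applies
to DML-family estimators whose final model is an ``_rlearner._ModelFinal``):

```Python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from econml.dml import DML

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
W = rng.normal(size=(n, 2))
T = X[:, 0] + W[:, 0] + rng.normal(size=n)
Y = 2.0 * T + X[:, 0] + W[:, 0] + rng.normal(size=n)

est = DML(model_y=LassoCV(), model_t=LassoCV(),
          model_final=LinearRegression(fit_intercept=False), random_state=0)
est.fit(Y, T, X=X, W=W)

# Default behavior: the built-in score of the final (residual-on-residual) model
print(est.score(Y, T, X=X, W=W))

# With the scoring argument: request a named sklearn metric instead
print(est.score(Y, T, X=X, W=W, scoring='mean_squared_error'))
```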