Consider explicit declaration of landings partitions in AnalyticalRatioEstimate #125

Open
edvinf opened this issue Aug 27, 2024 · 6 comments · Fixed by #152
Labels
analytical issues for new analytical estimation workflow release1.6 Issues that must be solved before release of 1.6

Comments

@edvinf
Contributor

edvinf commented Aug 27, 2024

In the current pre-release (-9006) of the analytical ratio estimate, correspondence between strata and PSU domains is established automatically, and an error is raised if no correspondences are found in terms of column names, or if no code correspondences are found between estimates and landings. This makes it difficult for users to spot potential misconfigurations. The correspondences should rather be explicitly declared, similar to how fixed effects are declared in Reca.
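To illustrate what explicit declaration could look like, here is a minimal Python sketch (not the StoX implementation). The helper `check_declared_partition` and the row layout are hypothetical: the user names the matching columns, a missing column fails loudly, and landing partitions without estimates are returned so they can be inspected rather than silently dropped.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]  # hypothetical row layout: column name -> code


def check_declared_partition(
    estimate_rows: List[Row],
    landing_rows: List[Row],
    declared_columns: Sequence[str],
) -> List[Tuple[str, ...]]:
    """Validate explicitly declared matching columns (hypothetical helper)."""
    # Every declared column must exist in both tables; fail loudly otherwise.
    for name, rows in (("estimates", estimate_rows), ("landings", landing_rows)):
        missing = [c for c in declared_columns if any(c not in r for r in rows)]
        if missing:
            raise ValueError(f"declared columns {missing} not found in {name}")

    def key(row: Row) -> Tuple[str, ...]:
        return tuple(row[c] for c in declared_columns)

    estimate_keys = {key(r) for r in estimate_rows}
    landing_keys = {key(r) for r in landing_rows}
    if not estimate_keys & landing_keys:
        raise ValueError("no code correspondences between estimates and landings")
    # Report landing partitions lacking estimates instead of hiding them.
    return sorted(landing_keys - estimate_keys)


estimates = [{"Quarter": "1", "Gear": "Trawl"}]
landings = [{"Quarter": "1", "Gear": "Trawl"}, {"Quarter": "2", "Gear": "Trawl"}]
print(check_declared_partition(estimates, landings, ["Quarter", "Gear"]))
# [('2', 'Trawl')]
```

Declaring `["Quarter", "Vessel"]` instead would raise immediately, which is the kind of misconfiguration that automatic matching makes hard to spot.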

@edvinf edvinf self-assigned this Aug 27, 2024
@edvinf
Contributor Author

edvinf commented Aug 27, 2024

This can also be used to allow RatioEstimates with TotalDomainWeight to ignore some stratification columns. Consider whether that is desirable, or whether LiftStrata needs to be exposed to StoX.

@edvinf
Contributor Author

edvinf commented Sep 20, 2024

Could also consider adding an explicit imputation function to be applied after estimation.

@edvinf
Contributor Author

edvinf commented Sep 20, 2024

Summary of discussion with a user:

  • It would be desirable if domains for which no estimates are provided were imputed with NAs by RatioEstimates, rather than the function stopping and complaining about incomplete sampling. In many cases, reporting NAs for domains is acceptable and desirable.
  • Unsampled strata are a bit different. In these cases users should make explicit choices, but it would typically be desirable to extrapolate out of the sampling frame, within sampled domains. E.g. extrapolate to all vessels in Q1, even if only vessels above 15 m are in the sampled strata / sampling frame.
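The two user requests can be sketched together in Python (illustrative only; `complete_estimates` and the `(stratum, domain)` key layout are hypothetical). Missing domains become NaN (standing in for NA), while extrapolation out of the sampling frame happens only through an explicit mapping from an unsampled stratum to the sampled stratum whose estimates it should borrow.

```python
import math
from typing import Dict, Iterable, Mapping, Tuple

Key = Tuple[str, str]  # (stratum, domain); hypothetical layout


def complete_estimates(
    estimates: Mapping[Key, float],
    requested: Iterable[Key],
    stratum_source: Mapping[str, str],
) -> Dict[Key, float]:
    """Return a value for every requested (stratum, domain) pair."""
    result: Dict[Key, float] = {}
    for stratum, domain in requested:
        # Explicit user choice: borrow estimates from a sampled stratum.
        source = stratum_source.get(stratum, stratum)
        # Domains without estimates become NaN (NA) instead of raising an error.
        result[(stratum, domain)] = estimates.get((source, domain), math.nan)
    return result


estimates = {("Q1 >=15m", "cod"): 2.5}
requested = [("Q1 >=15m", "cod"), ("Q1 <15m", "cod"), ("Q1 >=15m", "haddock")]
completed = complete_estimates(estimates, requested, {"Q1 <15m": "Q1 >=15m"})
print(completed[("Q1 <15m", "cod")])      # 2.5 (extrapolated out of frame)
print(completed[("Q1 >=15m", "haddock")])  # nan (no estimate for this domain)
```

Keeping the stratum mapping as a required, user-supplied argument reflects the point that out-of-frame extrapolation should be an explicit choice, whereas NA reporting for domains can be the default.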

@edvinf
Contributor Author

edvinf commented Sep 20, 2024

Consider:

  • change RatioEstimate to impute NAs, rather than complaining about missing domains.
  • change RatioEstimate to always require a match with all stratification columns in the analytical estimates, but provide options to explicitly ignore some of them.
  • change the provision of PSU design parameters, so that by default it annotates stratification columns reflecting vessel flag, vessel length, and target species.
  • Add ImputeRatioEstimate, which provides options for imputing selected values and parameters based on ratio estimation and the average of sampled domains in the same stratum. Provide DefinitionMethod with one option, and consider whether other imputation strategies should be supported (explicit domain-to-domain imputation, imputation from coarser domain definitions, selective imputation to only some domains, etc.)
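The single proposed DefinitionMethod for ImputeRatioEstimate (imputing from the average of sampled domains in the same stratum) could work roughly as in this Python sketch. It assumes an unweighted mean and a hypothetical `(stratum, domain)` key layout. The issue does not say how the average should be weighted, so treat `impute_stratum_mean` as illustrative only.

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (stratum, domain); hypothetical layout


def impute_stratum_mean(values: Dict[Key, float]) -> Dict[Key, float]:
    """Replace NaN domain values by the unweighted mean of sampled domains
    in the same stratum; strata with no sampled domain keep their NaNs."""
    sampled: Dict[str, List[float]] = defaultdict(list)
    for (stratum, _), v in values.items():
        if not math.isnan(v):
            sampled[stratum].append(v)

    imputed: Dict[Key, float] = {}
    for (stratum, domain), v in values.items():
        if math.isnan(v) and sampled[stratum]:
            imputed[(stratum, domain)] = mean(sampled[stratum])
        else:
            imputed[(stratum, domain)] = v
    return imputed


values = {("Q1", "A"): 2.0, ("Q1", "B"): 4.0, ("Q1", "C"): math.nan, ("Q2", "A"): math.nan}
imputed = impute_stratum_mean(values)
print(imputed[("Q1", "C")])  # 3.0
print(imputed[("Q2", "A")])  # nan (nothing sampled in Q2)
```

Other strategies listed above, such as explicit domain-to-domain imputation, would replace the stratum mean with a user-supplied source per domain.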

Need to update:

  • examples on gitlab
  • vignette
  • RstoxDocumentation project and vignette

@edvinf
Contributor Author

edvinf commented Oct 1, 2024

Note that lottery data from 2019 and 2020 only have sampling weights, not exact probabilities. AnalyticalRatioEstimate with MeanDomainWeight needs to work for these data. That is complicated by:

  • New standard stratification columns in the lottery parameter files include some columns that are not in the landings. This will be relieved by the ignore option.
  • Stratification from lower levels (Species Category). This will be relieved by the ignore option. We may also consider exposing CollapseStrata to StoX, as that will also help herring cases with inconsistent species code annotation (see NSSH examples).
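A possible reason MeanDomainWeight can work with sampling weights alone: if it amounts to a ratio-type (Hájek) estimate of a domain mean, which is an assumption here, then only the relative weights matter, because any common scale factor cancels between numerator and denominator. A minimal Python sketch with a hypothetical `weighted_domain_mean` helper:

```python
from typing import Sequence


def weighted_domain_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ratio (Hajek-type) estimate of a domain mean from sampling weights.

    Multiplying all weights by the same constant leaves the result unchanged,
    so weights proportional to inverse inclusion probabilities are enough.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(weighted_domain_mean([10.0, 20.0], [1.0, 3.0]))      # 17.5
print(weighted_domain_mean([10.0, 20.0], [100.0, 300.0]))  # 17.5, same after rescaling
```

By contrast, a Horvitz-Thompson total, `sum(v * w)`, scales with the weights and therefore needs exact inclusion probabilities.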

@edvinf edvinf added the analytical issues for new analytical estimation workflow label Oct 24, 2024
@edvinf edvinf added the release2.0 Issues that must be solved before release of 2.0 label Dec 23, 2024
@edvinf edvinf linked a pull request Dec 27, 2024 that will close this issue
@edvinf edvinf added release1.6 Issues that must be solved before release of 1.6 and removed release2.0 Issues that must be solved before release of 2.0 labels Dec 27, 2024
@edvinf
Contributor Author

edvinf commented Dec 30, 2024

We will split up what's happening here into some more steps:

  1. Match estimates to the census, and infer parameters beyond the sampling frame and for missing domains (Add explicit out-of-frame inference #154)
  2. Do ratio estimation

In the ratio estimation, we will specify

  • variables used for matching with samples and landings (mean weights and frequencies)
  • any additional variables used for matching with landings (total weights). That is, use total weights at a higher resolution than the estimates.
  • variables to retain in the results (among those above). That is, collapse the results to the desired resolution.

We will then do explicit matching, rather than optionally ignoring columns. I think that is better, as it will encourage explicit considerations.
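The three kinds of variables in the plan above could fit together as in this Python sketch (illustrative only; `ratio_estimate`, its arguments, and the data layout are hypothetical). Estimates of mean weight and frequencies are keyed by the sample-matching columns. Landings carry those columns plus landing-only columns, so total weights enter at a finer resolution. Results are summed over everything not in `retain_cols`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ratio_estimate(
    estimates: Dict[Tuple[str, ...], dict],
    landings: List[dict],
    sample_cols: Sequence[str],
    landing_cols: Sequence[str],
    retain_cols: Sequence[str],
) -> Dict[Tuple[Tuple[str, ...], str], float]:
    """Ratio-estimate numbers per group from landed weights (hypothetical)."""
    if not set(retain_cols) <= set(sample_cols) | set(landing_cols):
        raise ValueError("retain_cols must be among sample_cols and landing_cols")
    totals: Dict[Tuple[Tuple[str, ...], str], float] = defaultdict(float)
    for row in landings:
        # Explicit matching: a landing row without an estimate raises KeyError.
        est = estimates[tuple(row[c] for c in sample_cols)]
        number = row["weight"] / est["mean_weight"]  # kg / (kg per individual)
        out_key = tuple(row[c] for c in retain_cols)
        for group, proportion in est["frequencies"].items():
            totals[(out_key, group)] += number * proportion
    return dict(totals)


estimates = {("Q1",): {"mean_weight": 2.0, "frequencies": {"age3": 0.25, "age4": 0.75}}}
landings = [
    {"Quarter": "Q1", "Area": "North", "weight": 1000.0},
    {"Quarter": "Q1", "Area": "South", "weight": 600.0},
]
print(ratio_estimate(estimates, landings, ["Quarter"], ["Area"], ["Quarter"]))
# {(('Q1',), 'age3'): 200.0, (('Q1',), 'age4'): 600.0}
```

Passing `["Quarter", "Area"]` as `retain_cols` keeps the finer landing resolution in the output instead of collapsing over areas.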
