
Median nightly delays #795

Open: wants to merge 62 commits into main
Conversation

@aewallwi (Collaborator) commented May 23, 2022

Tools to better ensure that redcal degeneracies are relatively constant in time. Depends on #772, #798, and #815. The remaining coverage gaps will be addressed in some form in #798.

@codecov bot commented May 23, 2022

Codecov Report

Merging #795 (beb3c23) into main (d974d1a) will increase coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head beb3c23 differs from pull request most recent head 77a85fe. Consider uploading reports for the commit 77a85fe to get more accurate results

@@            Coverage Diff             @@
##             main     #795      +/-   ##
==========================================
+ Coverage   97.04%   97.07%   +0.03%     
==========================================
  Files          18       18              
  Lines        8294     8389      +95     
==========================================
+ Hits         8049     8144      +95     
  Misses        245      245              
Impacted Files          | Coverage Δ
------------------------|------------------------------
hera_cal/frf.py         | 97.98% <ø> (ø)
hera_cal/reflections.py | 92.68% <ø> (ø)
hera_cal/apply_cal.py   | 94.54% <100.00%> (ø)
hera_cal/io.py          | 98.16% <100.00%> (+<0.01%) ⬆️
hera_cal/redcal.py      | 99.30% <100.00%> (+0.06%) ⬆️
hera_cal/smooth_cal.py  | 96.66% <100.00%> (+0.01%) ⬆️
hera_cal/vis_clean.py   | 97.76% <100.00%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d974d1a...77a85fe.

@aewallwi (Collaborator Author):

The problem that I am trying to fix arises from transient misidentified antenna polarities. I found that using remove_degen_gains with a median delay and polarity over the night does not seem to fix the problem. What does seem to fix the problem is just multiplying each antenna gain by (-1) ** abs(median_polarity - per_time_polarity) (a sketch of this correction follows the plots below). Here are some examples:

Original phases: [plot]

Using remove_degen_gains where degen_gains is set to the median delay and polarity: [plot]

Here is just multiplying by (-1) ** abs(median_polarity - per_time_polarity): [plot]

And finally, here is using remove_degen_gains only to fix delays and then using a separate step of (-1) ** abs(median_polarity - per_time_polarity) to fix the polarities: [plot]
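
(For reference, a minimal sketch of the polarity correction described above, assuming per-antenna polarity-flip arrays of 0/1 per integration and gains shaped (Ntimes, Nfreqs); the helper name and data layout are illustrative, not the hera_cal API.)

import numpy as np

def flip_gains_to_median_polarity(gains, polarity_flips):
    """Hypothetical helper: multiply each gain by (-1) ** |median_polarity - per_time_polarity|.

    gains: dict mapping antenna -> (Ntimes, Nfreqs) complex gain array
    polarity_flips: dict mapping antenna -> (Ntimes,) array of 0/1 polarity flags
    """
    for ant in gains:
        median_polarity = np.nanmedian(polarity_flips[ant])
        # sign-flip every integration whose polarity disagrees with the nightly median
        gains[ant] = gains[ant] * (-1.0 + 0j) ** np.abs(median_polarity - polarity_flips[ant][:, None])
    return gains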

@jsdillon (Member) commented May 23, 2022

I'm surprised that last step does literally nothing... Have you checked multiple antennas? What's your reference antenna?

We saw jumps like these in H1C IDR2 before there was ever any polarity flipping, e.g. figure 6 of https://arxiv.org/pdf/2108.02263.pdf

@aewallwi (Collaborator Author):

> I'm surprised that last step does literally nothing... Have you checked multiple antennas? What's your reference antenna?
>
> We saw jumps like these in H1C IDR2 before there was ever any polarity flipping, e.g. figure 6 of https://arxiv.org/pdf/2108.02263.pdf

The last plot had a bug. This is actually what happens when you try to fix the delays without paying attention to polarity flips and then flip polarities as a separate step.

[plot]

@aewallwi (Collaborator Author) commented May 29, 2022

I've updated this PR so that the default recipe for updating first-cal is to both flip the polarities and update the offsets and delays in the degenerate subspace. I found that flipping the polarities fixes the polarity misidentification. Updating the degeneracies, while it has minimal impact on the 2459140 data, is functionality that might be desirable in the future to guarantee that we have constant first-cal parameters.

@aewallwi aewallwi marked this pull request as ready for review May 30, 2022 05:54
@aewallwi aewallwi requested a review from jsdillon May 30, 2022 05:54

@jsdillon (Member) left a comment

Looking pretty good. Just a few minor tweaks.

Also, now that the PR is almost done, can you add some detail to the PR description for posterity's sake and show some plots of this procedure in action on real data?

----------
redcal_meta_file_list: list of str
list of file names containing redcal meta data. All files should have same number of times.

Member:

Please document the other kwargs, even though they are pretty obvious.

output_replace='.redcal_meta.hdf5'):
"""
Find the median delay and polarity for list of firstcal meta files.
and write them out with that median.

Member:

"write them out with that median." is unclear. What you're actually doing is making new redcal meta files with median-ed info.

Also, this function is misleadingly named, since it also edits offsets (missing from the docstring) and polarities. Perhaps just call it median_nightly_firstcal_metadata or similar?

Comment on lines +1978 to +1980
redcal_metas = []
for meta_file in redcal_meta_file_list:
    redcal_metas.append((meta_file, ) + read_redcal_meta(meta_file))

Member:

I would have used a list comprehension
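
(For illustration, the equivalent list comprehension, using the same names as the snippet above, would look something like this.)

redcal_metas = [(meta_file,) + read_redcal_meta(meta_file) for meta_file in redcal_meta_file_list]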

# compute medians.
for ant in delays:
    delays[ant] = np.nanmedian(delays[ant])
    polarity_flips[ant] = bool(np.nanmedian(polarity_flips[ant]))

Member:

If all the polarity flips are np.nan for a given antenna, this will turn into True, though I think you want it to be False by default. It probably doesn't matter, but you might want a check on not np.any(np.isfinite(...)).
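
(A possible guard along those lines, sketched with the same variable names as the snippet above; not the actual PR code.)

flips = np.asarray(polarity_flips[ant], dtype=float)
if not np.any(np.isfinite(flips)):
    # no finite polarity data for this antenna: default to False rather than bool(np.nan) == True
    polarity_flips[ant] = False
else:
    polarity_flips[ant] = bool(np.nanmedian(flips))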

Comment on lines +2034 to +2035
# get redundant calibrator.
# get reds

Member:

I think these comments got misplaced

path to output file. Optional to write output to.
replace_degens: bool, optional
If True, replace degenerate portion of calibration solution with solution in redcal_meta_file.
fix_polarities: bool, optional

Member:

If you have this option, it's weird that old_redcal_meta_file is a required parameter. Perhaps you want to allow the user to pass None for old_redcal_meta_file when this is False?

if not dont_replace_polarities:
    for ant in fc_meta['polarity_flips']:
        gains[ant] *= (-1. + 0j) ** np.abs(fc_meta['polarity_flips'][ant][:, None] - fc_meta_old['polarity_flips'][ant][:, None])
ntimes = hc.Ntimes

Member:

ntimes isn't used

ntimes = hc.Ntimes
if not dont_replace_degens:
    for ant in gains:
        dly_factor = np.exp(2j * np.pi * freqs[None, :] * fc_meta['dlys'][ant][:, None]) * np.exp(1j * fc_meta['offsets'][ant][:, None])

Member:

a bit weird to call it a dly_factor when it includes an offset (which can be summed inside the exponent)
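
(A sketch of folding the offset into the same exponent, using the variables from the snippet above; degen_phasor is just an illustrative name, not the PR's.)

degen_phasor = np.exp(2j * np.pi * freqs[None, :] * fc_meta['dlys'][ant][:, None] + 1j * fc_meta['offsets'][ant][:, None])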

@@ -1986,7 +2109,7 @@ def redcal_argparser():
Default None loads all integrations.")
redcal_opts.add_argument("--upsample", default=False, action="store_true", help="Upsample BDA files to the highest temporal resolution.")
redcal_opts.add_argument("--downsample", default=False, action="store_true", help="Downsample BDA files to the highest temporal resolution.")
redcal_opts.add_argument("--pol_mode", type=str, default='2pol', help="polarization mode of redundancies. Can be '1pol', '2pol', '4pol', or '4pol_minV'. See recal.get_reds documentation.")
redcal_opts.add_argument("--pol_mode", type=str, default='2pol', help="polarization mode of redundancies. Can be '1pol', '2pol', '4pol', or '4pol_minV'. See redcal.get_reds documentation.")

Member:

lol, thanks

# check that files have been created
for ofile in omnifiles:
    assert os.path.exists(ofile.replace('.calfits', '.medphase.calfits'))

Member:

Is there any other test we can do? Like, does doing this make the calibration solutions smoother in time? If so, can we quantify that and make an assertion about it?
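
(For what it's worth, a quantitative version of that check could look something like the sketch below; the helper name and the gains_before/gains_after variables are hypothetical, not part of the PR.)

import numpy as np

def mean_phase_jump(gains):
    """Hypothetical metric: mean absolute integration-to-integration gain-phase jump over all antennas."""
    return np.mean([np.abs(np.diff(np.angle(g), axis=0)).mean() for g in gains.values()])

# a possible assertion: the median-referenced solutions jump around less in time
# assert mean_phase_jump(gains_after) <= mean_phase_jump(gains_before)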

Collaborator Author:

I think it's overkill to show whether or not the calibration is smoother. Anecdotally, I find that the differences in the solutions after replacing the degeneracies are quite small. I will document this above.

Member:

Fair point. Mostly I'm interested in the question of whether polarities are handled appropriately.

@aewallwi (Collaborator Author) commented Jun 3, 2022

I am unable to reproduce the macOS unittest failures on a Mac. I have tried creating a fresh conda environment, and still no luck. Should we remove the macOS tests?

@jsdillon (Member) commented Jun 3, 2022

Maybe @bhazelton or @plaplant have ideas about how to debug testing failures that appear only to happen in the CI but can't be reproduced locally...

@plaplant (Member) commented Jun 3, 2022

@aewallwi @jsdillon I ran the tests locally on my (M1-based) Mac in a fresh environment, and I'm getting a test failure too, though on a different test than the CI failures. I've posted the full traceback at the bottom of my comment.

As a way to debug testing failures in CI that don't appear locally, I tend to ssh into the machine running the job and poke around on the command line there. This way you're using the exact same environment and hardware that produced the error, so it's easier to figure out what's going wrong. I would highly (can't stress this enough) recommend against ignoring test failures or turning off CI checks. It's better in the long run to get to the bottom of failing tests and understand what's going wrong.

My local test failure:

__________________________________________________________________ Test_ReflectionFitter_XTalk.test_misc_svd_funcs ___________________________________________________________________

self = <hera_cal.tests.test_reflections.Test_ReflectionFitter_XTalk object at 0x12fb78640>

    def test_misc_svd_funcs(self):
        # setup RF object
        RF = reflections.ReflectionFitter(self.uvd)
        # add noise
        np.random.seed(0)
        Namp = 3e0
        for k in RF.data:
            RF.data[k] += stats.norm.rvs(0, Namp, RF.Ntimes * RF.Nfreqs).reshape(RF.Ntimes, RF.Nfreqs) + 1j * stats.norm.rvs(0, Namp, RF.Ntimes * RF.Nfreqs).reshape(RF.Ntimes, RF.Nfreqs)
        bl = (23, 24, 'ee')

        # fft data
        RF.fft_data(data=RF.data, window='blackmanharris', overwrite=True)

        # sparse sv decomp
        svd_wgts = RF.svd_weights(RF.dfft, RF.delays, min_dly=150, max_dly=500, side='both')
        RF.sv_decomp(RF.dfft, wgts=svd_wgts, keys=[bl], overwrite=True, Nkeep=None, sparse_svd=True)
        assert RF.umodes[bl].shape == (100, 98)
        assert RF.vmodes[bl].shape == (98, 128)

>       RF.sv_decomp(RF.dfft, wgts=svd_wgts, keys=[bl], overwrite=True, Nkeep=10, sparse_svd=True)

hera_cal/tests/test_reflections.py:429:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
hera_cal/reflections.py:589: in sv_decomp
    u, svals, v = sparse.linalg.svds(dfft[k] * wgts[k], k=Nk, which='LM')
../../../../miniforge3/envs/hera_cal_tests/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/_svds.py:350: in svds
    eigvals, eigvec = eigsh(XH_X, k=k, tol=tol ** 2, maxiter=maxiter,
../../../../miniforge3/envs/hera_cal_tests/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1566: in eigsh
    ret = eigs(A, k, M=M, sigma=sigma, which=which, v0=v0,
../../../../miniforge3/envs/hera_cal_tests/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:1345: in eigs
    params.iterate()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <scipy.sparse.linalg._eigen.arpack.arpack._UnsymmetricArpackParams object at 0x2da3c35b0>

    def iterate(self):
        if self.tp in 'fd':
            self.ido, self.tol, self.resid, self.v, self.iparam, self.ipntr, self.info =\
                self._arpack_solver(self.ido, self.bmat, self.which, self.k,
                                    self.tol, self.resid, self.v, self.iparam,
                                    self.ipntr, self.workd, self.workl,
                                    self.info)
        else:
            self.ido, self.tol, self.resid, self.v, self.iparam, self.ipntr, self.info =\
                self._arpack_solver(self.ido, self.bmat, self.which, self.k,
                                    self.tol, self.resid, self.v, self.iparam,
                                    self.ipntr, self.workd, self.workl,
                                    self.rwork, self.info)

        xslice = slice(self.ipntr[0] - 1, self.ipntr[0] - 1 + self.n)
        yslice = slice(self.ipntr[1] - 1, self.ipntr[1] - 1 + self.n)
        if self.ido == -1:
            # initialization
            self.workd[yslice] = self.OP(self.workd[xslice])
        elif self.ido == 1:
            # compute y = Op*x
            if self.mode in (1, 2):
                self.workd[yslice] = self.OP(self.workd[xslice])
            else:
                Bxslice = slice(self.ipntr[2] - 1, self.ipntr[2] - 1 + self.n)
                self.workd[yslice] = self.OPa(self.workd[Bxslice])
        elif self.ido == 2:
            self.workd[yslice] = self.B(self.workd[xslice])
        elif self.ido == 3:
            raise ValueError("ARPACK requested user shifts.  Assure ISHIFT==0")
        else:
            self.converged = True

            if self.info == 0:
                pass
            elif self.info == 1:
                self._raise_no_convergence()
            else:
>               raise ArpackError(self.info, infodict=self.iterate_infodict)
E               scipy.sparse.linalg._eigen.arpack.arpack.ArpackError: ARPACK error 3: No shifts could be applied during a cycle of the Implicitly restarted Arnoldi iteration. One possibility is to increase the size of NCV relative to NEV.

../../../../miniforge3/envs/hera_cal_tests/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py:757: ArpackError

@jsdillon (Member) commented Jun 3, 2022

Thanks @plaplant. How do you ssh into GitHub Actions?

@aewallwi (Collaborator Author) commented Jun 15, 2022

@jsdillon it looks like there isn't a built-in way to do it for GitHub Actions (unlike in CircleCI). I found this: https://github.com/marketplace/actions/debugging-with-ssh. Maybe that will work?

The tests pass on Linux, which is the OS that will be used for our science runs. I think this raises an interesting question: how much time do we want to invest in fully supporting multiple operating systems when Linux is the standard for our production analysis, the error here is not reproducible on other macOS machines, and it could potentially be a bug on GitHub's end?
