From: Roger Cappallo <rjc@haystack.mit.edu>
Subject: Re: data flagging in fourfit
Date: Fri, 18 May 2018 10:48:22 -0400
Cc: John Barrett <barrettj@haystack.mit.edu>
To: Geoff Crew <gbc@haystack.mit.edu>


   Hi Geoff,
   For the record, the original code to support difx weights was added to ff
   in norm.c on 2013.12.2 in rev 884.
   It took a bit of head-scratching to figure out why the +/-0.0 code shows
   up only in the spectral (now norm_fx) code there. At first I thought maybe
   it was a simple mistake - that I’d stuck it in the wrong section.
   The backward compatibility, though, is referring to being compatible with
   fx (i.e. difx) output that was generated by difx2mark4 *prior* to the code
   changes to d2m4. For those old files the flag field, which was co-opted to
   also carry weights, was simply zero (0x00000000 by the IEEE 754 standard),
   aka +0.0. The ff code changes the weight to be 1.0 if that value is
   detected, whereas actual zero weights are encoded by d2m4 as -0.0
   (0x80000000). The code should probably be kept as is, since someone might
   go back and re-fourfit the older difx outputs.
   So far as generalized data editing goes, I see 3 reasonable options:
   1) rely on the “natural” difx weighting scheme by generating .flag
   files for use at correlation time, or more precisely, at the time of
   Swinburne file to mk4 fileset translation. It would be necessary to go
   back and rerun d2m4 if the desired flagging is only known after examining
   the data.
   2) as you suggest, one could have a parameter specifying a data weight
   threshold below which data are simply discarded. I like this mechanism a
   lot, due to its simplicity and clarity. In the ff code there are still
   vestiges of a parameter called max_parity, which served this function back
   in the bad old days of spotty tape playback. One could easily imagine a
   similar mechanism, with min_weight setting the bar for data discard within
   norm_fx, using the usb/lsb_frac variables.
   3) also as you suggest, an ancillary file having data weights by antenna
   or baseline, by time and frequency channel, would be the most general. If
   this route is followed, one might look to the ad hoc phase file (Appendix
   D of the ff manual) as an example to follow. Keeping similarity of the
   external files as close as possible would ease the job of future users.
   If this option (#3) is implemented, I would also advise writing a separate
   subroutine to access the weighting data, thus keeping the norm_fx mods to
   a minimum.  One could call the new routine with time and frequency (ap,
   fr) and have the weights returned.
   Cheers,
   Roger

--------------
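
The +/-0.0 convention described in the message above can be
sketched as follows.  This is only an illustration, not the
norm_fx code: decode_weight() and float_bits() are hypothetical
names, and a 32-bit IEEE 754 float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Raw bit pattern of a 32-bit float; +0.0 and -0.0 differ only in
 * the sign bit (bit 31).  Assumes float is IEEE 754 binary32. */
uint32_t float_bits(float w)
{
    uint32_t u;
    memcpy(&u, &w, sizeof u);
    return u;
}

/* Hypothetical helper: interpret a weight that shares storage with
 * the old flag field. */
float decode_weight(float w)
{
    if (w == 0.0f)      /* compares true for both +0.0 and -0.0 */
        return (float_bits(w) >> 31) ? 0.0f   /* -0.0: real zero weight */
                                     : 1.0f;  /* +0.0: old file, no weight */
    return w;           /* any other value is used as given */
}
```

A plain (w == 0.0) test cannot tell the two zeros apart, which is
why the sketch looks at the sign bit.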

To add a new (simple) parameter, one must modify a number
of the sources.  The ones marked * are optional--depends
on what you're changing.

The a/b/c entries under each step track three example changes:

a: min_weight changes
b: data flagging changes
c: dump plot data (for amp(ap,fr))

  1. control.h
    most likely the parameter value needs to be kept in struct c_block

  a: max_parity -> min_weight
  b: char adhoc_flag_files[2][256];
  c: char plot_data_dir[2][256];

  2. parser.h
    most likely you'll need to provide a new token for the parameter

  a: MAX_PARITY_ -> MIN_WEIGHT_
  b: ADHOC_FLAG_FILE_
  c: PLOT_DATA_DIR_

* 3. param_struct.h
    it depends on what you're doing

  a: no change for min_weight
  b: char ah_flag_files[2][256];    /* Ad hoc flag files */
  c: char plot_data_dir[2][256];

  4. init_tokens.c
    maps your tokens to the control file strings

  a: max_parity -> min_weight takes a float
  b: adhoc_flags takes a filename
      tokenize (ADHOC_FLAG_FILE_,  "adhoc_flag_file", STRING_PARAM)
  c: tokenize (PLOT_DATA_DIR_,  "plot_data_dir", STRING_PARAM)

  5. default_cblock.c
    set the default value

  a: max_parity -> min_weight 0.000
  b: adhoc_flags is an empty string
    cb_ptr -> adhoc_flag_files[i][0] = 0;
  c: plot_data_dir is an empty string:
    cb_ptr -> plot_data_dir[i][0] = 0;

  6. nullify_cblock.c
    set to recognizable null value

  a: max_parity -> min_weight NULLFLOAT
  b: adhoc_flags is an empty string
    cb_ptr -> adhoc_flag_files[i][0] = 0;     
  c: cb_ptr -> plot_data_dir[i][0]    = 0;

  7. copy_cblock_parts.c
    a line to copy it when non-null

  a: max_parity -> min_weight
  b: copy the string
    if (f->adhoc_flag_files[i][0] != 0)
        strcpy (t->adhoc_flag_files[i], f->adhoc_flag_files[i]);
  c: if (f->plot_data_dir[i][0] != 0)
      strcpy (t->plot_data_dir[i], f->plot_data_dir[i]);

  8. parser.c
    appropriate fsm changes

  a: max_parity/MAX_PARITY_ -> min_weight/MIN_WEIGHT_
  b: capture the filename
    duplicate of the (toknum == ADHOC_FILE_) logic
  c: dup of ADHOC_FLAG_FILE_ logic

* 9. create_fsm.c
    if your change also requires changes to the fsm

  a:  min_weight no changes
  b:  no changes
  c:  no changes

* 10. implementation:
    precorrect.c -- if changes are needed prior to fringe search
    ...
  a:
    norm_fx.c:   DONE: l.212ff:
    if (pass->control.min_weight > 0.0 &&
        pass->control.min_weight > t120->fw.weight) continue;
  b:
    norm_fx.c    datum->flag replaced by datum_flag = ah_flags(...)

    and copy the files in precorrect.c
  c: hook into output.c towards dump_plot_data2dir()
        plot_data_dir.h
        plot_data_dir.c
    and copy the files in precorrect.c

  11. ../../help/fourfit.doc
    document what the new parameter does

  a: documented
  b: ...
  c: and also alphabetized words

* 12. ../../data/ff_testdata/chk..sh
    consider adding a test to verify that what you changed
    does what you want (now and into the future).

  a: modified chk_ff_3571.sh into a new test
  b: similar, perhaps
  c: TODO
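
Steps 5-7 for a string parameter follow one pattern, which can be
sketched as below.  struct mini_cblock and the mini_* functions
are hypothetical stand-ins for struct c_block and the real
default/nullify/copy routines; only the one field is shown.

```c
#include <string.h>

/* Hypothetical stand-in for the single c_block field used here. */
struct mini_cblock {
    char plot_data_dir[2][256];
};

/* Steps 5 and 6: for a string parameter the default and the null
 * value are both the empty string. */
void mini_nullify(struct mini_cblock *cb_ptr)
{
    int i;
    for (i = 0; i < 2; i++)
        cb_ptr->plot_data_dir[i][0] = 0;
}

/* Step 7: copy from f to t only where f holds a non-null value, so
 * that unset entries in f do not wipe out what t already has. */
void mini_copy_parts(const struct mini_cblock *f, struct mini_cblock *t)
{
    int i;
    for (i = 0; i < 2; i++)
        if (f->plot_data_dir[i][0] != 0)
            strcpy(t->plot_data_dir[i], f->plot_data_dir[i]);
}
```

The non-null test in the copy step is what lets a later control
block override only the entries it actually sets.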

[gbc@gefera trunk]$ svn status
M       chops/Makefile.am
M       help/fourfit.doc
M       postproc/fourfit/control.h
M       postproc/fourfit/copy_cblock_parts.c
A       postproc/fourfit/data-flagging.txt
M       postproc/fourfit/default_cblock.c
M       postproc/fourfit/init_tokens.c
M       postproc/fourfit/nullify_cblock.c
M       postproc/fourfit/parser.c
M       postproc/fourfit/parser.h
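
The min_weight test from step 10a can be exercised in isolation
with a sketch like the one below.  below_min_weight() and
count_kept() are hypothetical names; weights[] stands in for the
per-AP t120->fw.weight values.

```c
/* Hypothetical helper mirroring the condition in step 10a: returns 1
 * when an AP with this weight should be discarded. */
int below_min_weight(double min_weight, double weight)
{
    return min_weight > 0.0 && min_weight > weight;
}

/* Count the APs that survive the cut. */
int count_kept(const float *weights, int nap, double min_weight)
{
    int ap, kept = 0;
    for (ap = 0; ap < nap; ap++) {
        if (below_min_weight(min_weight, weights[ap]))
            continue;       /* skip this AP, as the continue in 10a does */
        kept++;
    }
    return kept;
}
```

With the default of 0.000 the first half of the condition is false,
so nothing is discarded and existing behavior is unchanged.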

----------- surgical approach:

control:
    adhoc_flag_file     filename

file contains lines of the form
    tboy  flaggingnibbles
    tboy is time as with other ad hoc files--time from start of year,
        in days as the division by 8.64e4 implies:
        thyme = ((ap + 0.5) * param.acc_period + param.start) / 8.64e4;
        (param.start is seconds from beginning of year)
    flaggingnibbles is one bit / pol, one nibble per sideband, so one
        byte per channel; length is one byte per channel, with the last
        value repeated to fill any remaining channels
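
One possible reading of this line format, as a sketch only:
parse_flag_line() and MAX_CHAN are hypothetical, and the encoding
assumed here (two hex digits per channel, last byte repeated for
any channels not given) is an interpretation of the notes above.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CHAN 64     /* assumption: upper limit on channels */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical parser for one "tboy flaggingnibbles" line.  Returns
 * the number of bytes taken from the line, or -1 on a bad line.  On
 * success mask[0..nchan-1] is filled. */
int parse_flag_line(const char *line, int nchan, double *tboy,
                    unsigned char mask[MAX_CHAN])
{
    char hex[2 * MAX_CHAN + 1];
    int ch, nread, hi, lo;
    size_t len;

    if (nchan < 1 || nchan > MAX_CHAN) return -1;
    if (sscanf(line, "%lf %128s", tboy, hex) != 2) return -1;
    len = strlen(hex);
    if (len < 2 || len % 2 != 0) return -1;
    nread = (int)(len / 2);
    if (nread > nchan) nread = nchan;
    for (ch = 0; ch < nread; ch++) {
        hi = hexval(hex[2 * ch]);
        lo = hexval(hex[2 * ch + 1]);
        if (hi < 0 || lo < 0) return -1;
        mask[ch] = (unsigned char)((hi << 4) | lo);
    }
    for (ch = nread; ch < nchan; ch++)  /* fill remainder with last value */
        mask[ch] = mask[nread - 1];
    return nread;
}
```

Under this reading, "123.5 ff0f" with 4 channels gives masks
ff 0f 0f 0f at time 123.5.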

control.h
    adhoc_weight_file   char[256]

as filenames are found, they are read, parsed and stored with
a lookup by filename returning a pointer to an array of
    one nibble per sideband (4 pols), so a byte of 0xFF retains all 4 pols of both USB and LSB
    0x1 == use it, 0x0 == lose it
    bits ordered as per datum->flag values

    { double thyme, char wgt[(#freqs)*(#pols)*sb] }

    so that normfx can have datum_flag = lookup(datum->flag, ap, fr)
    to replace datum->flag


rotate_pcal.c:
    thyme = ((ap + 0.5) * param.acc_period + param.start) / 8.64e4;
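
The same expression wrapped as a function, to make the units
explicit.  ap_midpoint_days() is a hypothetical name; the
arguments correspond to ap, param.acc_period and param.start.

```c
/* Hypothetical wrapper around the expression quoted above.
 * acc_period is the AP length in seconds and start is in seconds from
 * the beginning of the year; dividing by 8.64e4 (seconds per day)
 * gives the midpoint of the AP in days. */
double ap_midpoint_days(int ap, double acc_period, double start)
{
    return ((ap + 0.5) * acc_period + start) / 8.64e4;
}
```

For example, ap = 0 with a 2 s accumulation period and a start of
86399 s has its midpoint at exactly 86400 s, i.e. 1.0 days.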

2d array over time and channels: 1 = retain, 0 = ignore, with each row taking effect at its starting (change) time.
wild-carding on pols & freqs?  64 x 4bits = 32 or 64 bytes

binary search on thyme with implicit unflagged at bounds
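
A sketch of such a lookup, under stated assumptions: struct
flag_row, lookup_mask() and apply_mask() are hypothetical, rows
are sorted by ascending thyme, a row stays in effect until the
next one, and times before the first row are treated as unflagged.
The mask bits are taken to line up with datum->flag, which is not
verified here.

```c
#define NCHAN 4     /* kept small for illustration */

struct flag_row {
    double thyme;               /* time at which this row takes effect */
    unsigned char mask[NCHAN];  /* one byte per channel, 1 bits = retain */
};

/* Mask for channel fr at time t.  Binary search for the first row
 * with thyme > t; the row just before it is the one in effect. */
unsigned char lookup_mask(const struct flag_row *rows, int nrows,
                          double t, int fr)
{
    int lo = 0, hi = nrows;     /* search the half-open range [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (rows[mid].thyme <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0xFF;            /* before first row, or empty table */
    return rows[lo - 1].mask[fr];
}

/* Combine with the existing flag: a 0 bit in the mask drops that bit. */
int apply_mask(int datum_flag, unsigned char mask)
{
    return datum_flag & mask;
}
```

In norm_fx this would amount to something like
datum_flag = apply_mask(datum->flag, lookup_mask(rows, nrows, thyme, fr)).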

adhoc_flag.h
adhoc_flag.c

M       help/fourfit.doc
M       postproc/fourfit/Makefile.am
A       postproc/fourfit/adhoc_flag.c
A       postproc/fourfit/adhoc_flag.h
M       postproc/fourfit/control.h
M       postproc/fourfit/copy_cblock_parts.c
M       postproc/fourfit/data-flagging.txt
M       postproc/fourfit/default_cblock.c
M       postproc/fourfit/init_tokens.c
M       postproc/fourfit/norm_fx.c
M       postproc/fourfit/nullify_cblock.c
M       postproc/fourfit/param_struct.h
M       postproc/fourfit/parser.c
M       postproc/fourfit/parser.h

plot_data_dir.?

M       help/fourfit.doc
M       postproc/fourfit/Makefile.am
M       postproc/fourfit/control.h
M       postproc/fourfit/copy_cblock_parts.c
M       postproc/fourfit/data-flagging.txt
M       postproc/fourfit/default_cblock.c
M       postproc/fourfit/init_tokens.c
M       postproc/fourfit/nullify_cblock.c
M       postproc/fourfit/output.c
M       postproc/fourfit/param_struct.h
M       postproc/fourfit/parser.c
M       postproc/fourfit/parser.h
A       postproc/fourfit/plot_data_dir.c
A       postproc/fourfit/plot_data_dir.h
M       postproc/fourfit/precorrect.c


eof
