# The First Phase: PointcloudPhase
The first phase of the default Pipeline is the transformation of raw AT-TPC digitized waveforms (often called traces) into a point cloud geometry. There are four steps in this part of the analysis:
- Removal of the trace baselines
- Identifying signal peaks in the traces
- Consolidation of the signals into point clouds
- Identifying associated auxiliary detectors and applying corrections
Below we'll walk through each of these steps, all of which are run through the `PointcloudPhase`.
## Baseline Removal
Each trace has some DC offset, that is, a shift from 0 on the ADC scale of the trace. To correct for this, a Fourier analysis technique is employed. This technique was first used in AT-TPC analysis by J. Bradt in his thesis work. Very briefly, we will go over some of the details here. A Fourier transform can be written as

\[
\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) e^{-2 \pi i x \xi} \, dx
\]
This transforms a function into the frequency domain; however, the main useful tool here is what is known as the Convolution Theorem, which states that for two functions \(f(x)\) and \(g(x)\) with Fourier transforms \(\hat{f}(\xi)\) and \(\hat{g}(\xi)\), the convolution \(\{f * g\}(x)\) can be expressed as the inverse Fourier transform of the product of the Fourier transforms; that is

\[
\{f * g\}(x) = \mathcal{F}^{-1}\left[\hat{f}(\xi) \, \hat{g}(\xi)\right](x)
\]
This is really powerful! It means calculating a convolution is as simple as multiplying the Fourier transforms of the functions and taking the inverse Fourier transform of that product. Numpy provides the routines for calculating Fourier transforms, and we can speed them up using the Rocket-FFT library and Numba.
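As a quick sanity check, here is a minimal Numpy sketch (not part of Spyral) verifying that the Fourier-product route reproduces a direct convolution; the trace and window here are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
f = rng.normal(size=512)  # a made-up noisy "trace"
g = np.ones(32) / 32.0    # a simple averaging window

# Convolution Theorem route: zero-pad to the full output length,
# multiply the transforms, then invert.
n = len(f) + len(g) - 1
conv_fft = np.fft.irfft(np.fft.rfft(f, n) * np.fft.rfft(g, n), n)

# Direct convolution for comparison
conv_direct = np.convolve(f, g)

print(np.allclose(conv_fft, conv_direct))  # True
```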
We still haven't exactly explained why this is useful. Removing a baseline is kind of like removing an average, or applying a low-pass filter. In this way, we basically want to convolve a window or averaging range with our signal to extract a smooth baseline to remove (remember that the baseline is noisy as well). Such a function is basically a step

\[
g(x) =
\begin{cases}
1, & |x| \leq \frac{1}{2} \\
0, & \text{otherwise}
\end{cases}
\]
Conveniently for us, the Fourier transform of a step has a simple description

\[
\hat{g}(\xi) = \text{sinc}(\xi) = \frac{\sin(\pi \xi)}{\pi \xi}
\]
We introduce an additional scaling parameter \(a\) so that \(\hat{g}(\xi) = \text{sinc}(\xi/a)\) and the size of the baseline window can be controlled. This means that all we need to do is take the Fourier transform of our trace, multiply it by our \(\hat{g}\), and then take the inverse Fourier transform of that! We can even do this in one shot for the GET traces, correcting all of the baselines at once using the 2-D Fourier transform algorithms of Numpy.
There are some small details worth noting. Before taking the Fourier transform of the trace, we need to remove signals from the trace; otherwise they will upset the baseline, moving the average up. This is done by removing any time buckets which have a value \(\ge 1.5 \sigma\) away from the mean amplitude of the trace, where \(\sigma\) is the standard deviation of the trace amplitude.
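Putting the pieces together, a minimal single-trace sketch of the method might look like the following. Treat this as an illustration under assumptions: the exact window scaling, the masking behavior, and the fact that Spyral processes all traces at once with 2-D FFTs all differ from this simplified version.

```python
import numpy as np

def remove_baseline(trace: np.ndarray, window_scale: float = 20.0) -> np.ndarray:
    """Illustrative Fourier baseline removal for a single trace."""
    # Mask signal regions so peaks don't pull the baseline up: replace
    # samples more than 1.5 sigma from the mean with the mean itself.
    mean, sigma = trace.mean(), trace.std()
    masked = np.where(np.abs(trace - mean) < 1.5 * sigma, trace, mean)

    # Low-pass filter: multiply the transform by a sinc window (the
    # transform of a step) and invert to get the smooth baseline.
    # `window_scale` stands in for the parameter `a` above; its exact
    # relationship to Spyral's configuration is an assumption here.
    freqs = np.fft.rfftfreq(len(trace))
    window = np.sinc(freqs * window_scale)
    baseline = np.fft.irfft(np.fft.rfft(masked) * window, len(trace))

    return trace - baseline
```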
## Identifying Signal Peaks
Identifying peaks in spectra (traces) can be a tedious, delicate, and time-consuming task. Thankfully, it is a common enough problem that someone already solved it. Scipy has a `signal` library that contains the `find_peaks` function. `find_peaks` takes a lot of different arguments that can greatly impact the quality of the peak finding. Check out our configuration description to see which parameters are exposed and what they do.
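For example, a toy usage of `find_peaks` might look like the following; the parameter values here are purely illustrative, not Spyral's defaults:

```python
import numpy as np
from scipy.signal import find_peaks

# A toy baseline-removed trace: two Gaussian signals.
tb = np.arange(512)
trace = (
    1000.0 * np.exp(-0.5 * ((tb - 120) / 4.0) ** 2)
    + 400.0 * np.exp(-0.5 * ((tb - 300) / 4.0) ** 2)
)

peaks, props = find_peaks(
    trace,
    height=50.0,      # minimum peak amplitude
    distance=10,      # minimum separation between peaks, in time buckets
    prominence=30.0,  # reject small ripples riding on larger structures
)
print(peaks)                  # peak centroids in time buckets: [120 300]
print(props["peak_heights"])  # peak amplitudes
```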
Once peaks are identified, centroids, amplitudes, and integrals are extracted. Centroids are smeared within a single time bucket when converted to floating-point values for subsequent calculations; this accounts for the binning behavior of sampling the electronic signal.
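As a sketch of what this smearing means (the exact offset convention Spyral uses is an assumption here):

```python
import numpy as np

rng = np.random.default_rng()

centroid_tb = 120  # integer peak location from find_peaks
# Spread the centroid uniformly across its one-time-bucket bin when
# converting to a float, rather than always landing on the bin edge.
centroid = float(centroid_tb) + rng.uniform(0.0, 1.0)
```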
## Consolidation into Point Clouds
Once peaks have been identified, it is time to convert from signals into geometry. Each signal represents a measurement of the electrons ionized along the flight path of a particle through the AT-TPC (in principle, anyway; there is a lot of noise). Coordinates in the AT-TPC pad plane (X-Y) are taken by using the pad ID in which the signal occurred and looking up the pad position from a CSV file (again, see the configuration docs). The beam-axis coordinate (Z) is extracted by calibrating the centroid location of the peak, which is originally in units of GET time buckets, to millimeters. This is done using two absolute reference points: the window and the micromegas. The time buckets of the window and micromegas can be extracted by examining events with long trajectories, such as events where the beam reacted with the window. The calibration is then

\[
z = \frac{t_{w} - t}{t_{w} - t_{mm}} \, l_{attpc}
\]
where \(t\) is the peak centroid time bucket, \(t_{w}\) and \(t_{mm}\) are the window and micromegas time buckets, and \(l_{attpc}\) is the length of the AT-TPC in millimeters. However, for some experiments the window time bucket can be difficult to determine. In this case it can be better to use an experimentally determined drift velocity and the micromegas time bucket. Spyral provides a `calculate_window_time` function which can be used to get a window time from a known drift velocity, GET sampling frequency, and micromegas time.
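As a sketch, the calibration and the drift-velocity route might look like this. Note that `window_time_from_drift` is a hypothetical stand-in written for illustration; Spyral's actual `calculate_window_time` signature and unit conventions may differ:

```python
def calibrate_z(centroid_tb: float, window_tb: float, mm_tb: float,
                length_mm: float = 1000.0) -> float:
    """Linear two-point calibration from GET time buckets to millimeters.

    Assumes z = 0 at the window and z = length_mm at the micromegas,
    with electrons from the window arriving latest (largest time bucket).
    """
    return (window_tb - centroid_tb) / (window_tb - mm_tb) * length_mm

def window_time_from_drift(mm_tb: float, drift_speed_mm_per_us: float,
                           sample_freq_mhz: float,
                           length_mm: float = 1000.0) -> float:
    """Hypothetical stand-in for Spyral's calculate_window_time."""
    drift_time_us = length_mm / drift_speed_mm_per_us
    # One time bucket is 1 / f_sample microseconds for f_sample in MHz.
    return mm_tb + drift_time_us * sample_freq_mhz
```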
The available columns (in order) in a point cloud are listed below (a sketch of accessing them follows the list):
- x-position (mm)
- y-position (mm)
- z-position (mm if calibrated, TB if not)
- charge (amplitude)
- charge (integral)
- pad ID
- z-time (TB)
- scale (relative; indicates Big Pad or Small Pad)
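Here is a hedged sketch of unpacking those columns from a cloud stored as a Numpy array. The file name, group, and dataset names below are assumptions for illustration; check the point cloud files in your own workspace for the actual layout:

```python
import h5py

# Hypothetical file and dataset names, for illustration only.
with h5py.File("run_0001.h5", "r") as point_file:
    cloud = point_file["cloud"]["cloud_42"][:]  # shape (n_points, 8)

x, y, z = cloud[:, 0], cloud[:, 1], cloud[:, 2]
charge_amp, charge_int = cloud[:, 3], cloud[:, 4]
pad_id, z_tb, scale = cloud[:, 5], cloud[:, 6], cloud[:, 7]
```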
## Auxiliary Detectors and Corrections
The AT-TPC has at minimum one auxiliary detector, the upstream ion chamber. The ion chamber (IC) is used to generate the AT-TPC trigger (in conjunction with the mesh) and to identify beam species, and is recorded using the FRIBDAQ framework. Its data is analyzed very similarly to the GET data: baselines are removed and signals are extracted. Users can specify the maximum allowed ion chamber multiplicity to reject data with an inconsistent trigger. See the code in `spyral/traces/frib_traces.py` and `spyral/traces/frib_event.py` for details.
Additionally, it has been found that in some cases the AT-TPC needs a correction for distortions to the electric field. The correction is calculated using Garfield++, externally to Spyral. Spyral takes in a correction file containing initial electron positions and final drift positions and times, and generates an interpolation mesh to calculate corrections for each point. The mesh is calculated and stored in a Numpy format file (`.npy`). It is not clear how impactful this correction is anymore; it was seemingly most important early on in the AT-TPC's development. This correction is optional, and can be turned off and on using the configuration.
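A rough sketch of the idea, using Scipy's scattered-data interpolation; the input file name, its column layout, and the exact correction Spyral applies are all assumptions here:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical Garfield++ output: initial electron (x, y) and the
# (x, y) where it actually lands on the pad plane after drifting.
data = np.loadtxt("garfield_correction.txt")  # placeholder file name
initial_xy = data[:, 0:2]
final_xy = data[:, 2:4]

# Interpolate the distortion (landing point minus origin) over the pad
# plane; measured points can then be shifted back by this amount.
distortion = LinearNDInterpolator(final_xy, final_xy - initial_xy, fill_value=0.0)

measured = np.array([[50.0, -30.0]])         # a measured point (mm)
corrected = measured - distortion(measured)  # undo the drift distortion
```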
Point clouds are also sorted in z for ease of use later.
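For example, with a cloud array like the one read above, the sort is just:

```python
# Sort the point cloud by its z column (column 2) for downstream phases.
cloud = cloud[cloud[:, 2].argsort()]
```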
## The Ion Chamber and the Trigger
Some additional discussion is warranted about the behavior of the ion chamber as it pertains to the trigger (start) of events. The ion chamber receives a beam particle, which causes a signal to be registered. This ion chamber signal is then delayed by 100 μs. The beam then enters the AT-TPC active volume, electrons are ionized, and the electrons drift toward the mesh. The mesh registers a signal, and this signal generates a 120 μs gate. The coincidence of this mesh gate and the delayed ion chamber signal is then the trigger for an event.

This affects the analysis of the ion chamber because the recording window for the module receiving the ion chamber sees this delayed IC signal, and so can contain peaks in the waveform that occurred before the event trigger. To compensate for this, there is a configuration parameter to set the delay in FRIBDAQ time buckets; all peaks before the delay time bucket are ignored and not used in subsequent ion chamber analysis. See here for more details. (We should probably put a diagram of what is happening here.)
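In code, the compensation is conceptually just a cut on peak centroids; the values below are illustrative, not Spyral's actual configuration:

```python
ic_peaks = [310, 520, 760]  # example IC peak centroids (FRIBDAQ time buckets)
ic_delay_tb = 500           # illustrative delay; set from the configuration

# Peaks arriving before the delayed IC signal predate the event trigger
# and are ignored in the ion chamber analysis.
good_peaks = [p for p in ic_peaks if p > ic_delay_tb]  # -> [520, 760]
```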
## Legacy Data
In the past, the AT-TPC was not run with a split DAQ. In general we don't support everything needed for legacy analysis, and a more complete example is given by Z. Serkiow here.
## Scalers
The FRIBDAQ data does not only contain auxiliary detectors. It also contains scalers, or counters. These scalers are read out independently from the main acquisition and represent statistics of the experiment. The most common scaler to use is the downscaled ion chamber trigger, which is a measurement of the total beam count over the course of the experiment. Scalers are read out run by run and are written to the scaler directory of the workspace as dataframes in parquet files. The available columns in the dataframe are listed below (a sketch of reading the scalers follows the list):
- `clock_free`: The time elapsed while running the data acquisition
- `clock_live`: The amount of time for which the acquisition is "live" (able to accept triggers)
- `trigger_free`: The total number of trigger signals received by the acquisition
- `trigger_live`: The total number of triggers which actually cause events in the acquisition
- `ic_sca`: The total number of ion chamber signals received by the acquisition
- `mesh_sca`: The total number of mesh signals received by the acquisition
- `si1_cfd`: The total number of Si detector 1 signals received by the acquisition
- `si2_cfd`: The total number of Si detector 2 signals received by the acquisition
- `sipm`: Unclear
- `ic_ds`: The downscaled ion chamber rate, typically downscaled by a factor of 1000
- `ic_cfd`: Unclear
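As a sketch of using these scalers, one might estimate the total beam count like so. The file path is hypothetical, and the assumption that scaler entries are incremental counts per readout (so that a sum gives the run total) should be verified for your data:

```python
import pandas as pd

# Hypothetical path; scaler parquet files live in the scaler directory
# of the workspace, one per run.
scalers = pd.read_parquet("workspace/scaler/run_0001.parquet")

# Undo the downscaling (typically a factor of 1000, as noted above) to
# estimate the total number of beam particles seen by the ion chamber.
beam_count = scalers["ic_ds"].sum() * 1000
print(beam_count)
```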
Additionally, when the `ic_multiplicity` parameter is used, it applies a gate to essentially the trigger of the data that is actually analyzed. This means that the scalers recorded by FRIBDAQ are now inconsistent with the analysis and cannot be used as a beam normalization without accounting for the effect of the gate. To compensate for this, Spyral will make its own scaler, counting the number of events which have an IC that matches the multiplicity condition. This scaler is written to a simple text file stored in the Pointcloud directory of the workspace (`run_#_gated_ic_scaler.txt`).
## Final Thoughts
The first phase is very intense and represents a major data transformation. Typically, the trace data is somewhere between 10-50 GB per run, while the output of the point cloud phase is something like 250 MB to 1 GB. This is an enormous reduction of data, and as such this is usually the slowest phase, taking anywhere from 10 minutes to an hour depending on the hardware being used. The bottleneck is typically I/O speed; reading in so much data is a serious issue depending on the type of storage used to hold the traces. In general, if the traces are stored on a network drive without a hardline 10 Gb connection, this phase will be slow. HDDs can also be a slowdown, as can older USB-connected external drives. The general recommendation is to move data to a fast local SSD for analysis when possible.