reboost package¶

Subpackages¶

Submodules¶

reboost.build_evt module¶

A program for combining the hits from various detectors, to build events.

It parses a configuration file with the following format:

channels:
    geds_on:
    - det001
    - det002
    geds_ac:
    - det003

outputs:
- energy
- multiplicity

operations:
 energy_id:
    channels: geds_on
    aggregation_mode: gather
    query: "hit.energy > 25"
    expression: tcm.channel_id

 energy:
    aggregation_mode: keep_at_ch:evt.energy_id
    expression: "hit.energy > 25"
    channels: geds_on

 multiplicity:
    channels: geds_on
    aggregation_mode: sum
    expression: "hit.energy > 25"
    initial: 0

Must contain:

“channels”: dictionary of channel groupings
“outputs”: fields for the output file
“operations”: operations to perform; see pygama.evt.build_evt.evaluate_expression() for more details.
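As a rough illustration, the same kind of configuration can be written as a Python dictionary and checked for the three required keys before it is passed as config. The helper check_evt_config below is hypothetical and is not part of reboost; the dictionary is a trimmed version of the YAML example, keeping only the multiplicity operation.

```python
# Hypothetical helper, not part of reboost: checks the required top-level keys.
REQUIRED_KEYS = ("channels", "outputs", "operations")


def check_evt_config(config: dict) -> list[str]:
    """Return the required top-level keys that are missing from config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Trimmed version of the YAML example, written as a dictionary.
config = {
    "channels": {"geds_on": ["det001", "det002"], "geds_ac": ["det003"]},
    "outputs": ["multiplicity"],
    "operations": {
        "multiplicity": {
            "channels": "geds_on",
            "aggregation_mode": "sum",
            "expression": "hit.energy > 25",
            "initial": 0,
        },
    },
}

missing = check_evt_config(config)
print(missing)  # an empty list means all required keys are present
```

Checking the keys up front gives a clearer error than a failure deep inside the event building.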

reboost.build_evt.build_evt(hit_file, tcm_file, evt_file, config, buffer=5000000)¶

Generates the event tier from the hit and tcm tiers.

Parameters:

hit_file (str) – path to the hit tier file
tcm_file (str) – path to the tcm tier file
evt_file (str | None) – path to the evt tier (output) file; if None, the data is returned in memory
config (dict) – dictionary of the configuration.
buffer (int) – number of events to process simultaneously

Returns:

ak.Array of the evt tier data (if the data is not saved to disk)

Return type:

Array | None

reboost.build_glm module¶

reboost.build_glm.build_glm(stp_files, glm_files, *, out_table_name='glm', id_name='g4_evtid', evtid_buffer=10000000, stp_buffer=10000000)¶

Builds a g4_evtid look up (glm) from the stp data.

This object is used by reboost to efficiently iterate through the data. It consists of a lgdo.VectorOfVectors for each lh5_table in the input files. The rows of this lgdo.VectorOfVectors correspond to the id_name, while the data are the stp indices for this event.

Parameters:

stp_files (str | list[str]) – path to the stp (input) file.
glm_files (str | list[str] | None) – path to the glm data, can also be None in which case an ak.Array is returned in memory.
out_table_name (str) – name for the output table.
id_name (str) – name of the evtid field, default g4_evtid.
stp_buffer (int) – the number of rows of the step file to read at a time
evtid_buffer (int) – the number of evtids to read at a time

Returns:

either None or an ak.Array

Return type:

Array | None

reboost.build_glm.get_glm_rows(stp_evtids, vert, *, start_row=0)¶

Get the rows of the Geant4 event lookup map (glm).

Parameters:

stp_evtids (ArrayLike) – Array of evtids for the steps.
vert (ArrayLike) – Array of simulated evtids for the vertices.
start_row (int) – The index of the first element of stp_evtids.

Returns:

an awkward array of the glm.

Return type:

Array
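The idea of a lookup row can be sketched in plain Python: for each vertex evtid, find the first row of stp_evtids that belongs to it and how many rows follow. This is only an illustration and not the reboost implementation. It assumes that stp_evtids is sorted in ascending order, and the function name glm_rows_sketch and the field names evtid, start_row and n_rows are chosen for this example.

```python
from bisect import bisect_left, bisect_right


def glm_rows_sketch(stp_evtids, vert, start_row=0):
    """For each vertex evtid, locate its steps inside stp_evtids.

    Assumes stp_evtids is sorted in ascending order. start_row is the
    global row index of stp_evtids[0].
    """
    rows = []
    for evtid in vert:
        first = bisect_left(stp_evtids, evtid)
        n_rows = bisect_right(stp_evtids, evtid) - first
        rows.append(
            {
                "evtid": evtid,
                # an event without steps has no meaningful start row
                "start_row": start_row + first if n_rows > 0 else None,
                "n_rows": n_rows,
            }
        )
    return rows


rows = glm_rows_sketch([0, 0, 2, 2, 2, 5], vert=[0, 1, 2, 5], start_row=100)
for row in rows:
    print(row)
```

In this sketch, events that produced no steps (evtid 1 here) still get a row, so the result has one entry per simulated vertex.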

reboost.build_glm.get_stp_evtids(lh5_table, stp_file, id_name, start_row, last_vertex_evtid, stp_buffer)¶

Extracts the rows of a stp file corresponding to a particular range of evtid.

The reading starts at start_row to allow for iterating through the file. The iteration stops when the evtid being read is larger than last_vertex_evtid.

Parameters:

lh5_table (str) – the table name to read.
stp_file (str) – the file name path.
id_name (str) – the name of the evtid field.
start_row (int) – the row to begin reading.
last_vertex_evtid (int) – the last evtid to read up to.
stp_buffer (int) – the number of rows to read at once.

Returns:

a tuple of the updated start_row, the first row for the chunk and an awkward Array of the steps.

Return type:

tuple[int, int, Array]
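The chunked reading described above can be sketched without any file access. In the following illustration a plain list stands in for the evtid column of the stp file, and the function read_until_evtid is hypothetical. The exact meaning of the returned rows in reboost may differ from this sketch.

```python
def read_until_evtid(evtids, start_row, last_vertex_evtid, stp_buffer):
    """Read evtids in chunks of stp_buffer rows, starting at start_row.

    Stops at the first row whose evtid is larger than last_vertex_evtid.
    Returns the row at which the next call should resume, the first row of
    this read, and the evtids that were selected.
    """
    first_row = start_row
    selected = []
    row = start_row
    while row < len(evtids):
        chunk = evtids[row : row + stp_buffer]
        for value in chunk:
            if value > last_vertex_evtid:
                return row, first_row, selected
            selected.append(value)
            row += 1
    return row, first_row, selected


evtids = [0, 0, 1, 1, 2, 3, 3]
next_row, first_row, steps = read_until_evtid(evtids, 0, 1, stp_buffer=3)
print(next_row, first_row, steps)
```

Passing the returned next_row as start_row of the following call continues the iteration where the previous one stopped.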

reboost.build_hit module¶

Routines to build the hit tier from the stp tier.

build_hit() parses a configuration file such as the following:

# dictionary of objects useful for later computation. they are constructed with
# auxiliary data (e.g. metadata). They can be accessed later as OBJECTS (all caps)
objects:
 lmeta: LegendMetadata(ARGS.legendmetadata)
 geometry: pyg4ometry.load(ARGS.gdml)
 user_pars: dbetto.TextDB(ARGS.par)
 dataprod_pars: dbetto.TextDB(ARGS.dataprod_cycle)

# processing chain is defined to act on a group of detectors
processing_groups:

    # start with HPGe stuff, give it an optional name
    - name: geds

      # this is a list of included detectors (part of the processing group)
      detector_mapping:
        - output: OBJECTS.lmeta.channelmap(on=ARGS.timestamp)
         .group('system').geds
         .group('analysis.status').on
         .map('name').keys()

      # which columns we actually want to see in the output table
      outputs:
         - t0
         - evtid
         - energy
         - r90
         - drift_time

      # in this section we define objects that will be instantiated at each
      # iteration of the for loop over input tables (i.e. detectors)
      detector_objects:
         # The following assumes that the detector metadata is stored in the GDML file
         pyobj: legendhpges.make_hpge(pygeomtools.get_sensvol_metadata(OBJECTS.geometry, DETECTOR))
         phyvol: OBJECTS.geometry.physical_volume_dict[DETECTOR]
         drift_time_map: lgdo.lh5.read(DETECTOR, ARGS.dtmap_file)

      # this defines "hits", i.e. layout of the output hit table
      # group steps by time and evtid with 10us window
      hit_table_layout: reboost.shape.group_by_time(STEPS, window=10)

      # finally, the processing chain
      operations:

        t0: ak.fill_none(ak.firsts(HITS.time, axis=-1), np.nan)

        evtid: ak.fill_none(ak.firsts(HITS.__evtid, axis=-1), np.nan)

        # distance to the nplus surface in mm
        distance_to_nplus_surface_mm: reboost.hpge.distance_to_surface(
            HITS.__xloc, HITS.__yloc, HITS.__zloc,
            DETECTOR_OBJECTS.pyobj,
            DETECTOR_OBJECTS.phyvol.position.eval(),
            surface_type='nplus')

        # activeness based on FCCD (no TL)
        activeness: ak.where(
            HITS.distance_to_nplus_surface_mm <
                lmeta.hardware.detectors.germanium.diodes[DETECTOR].characterization.combined_0vbb_fccd_in_mm.value,
            0,
            1
            )

        activeness2: reboost.math.piecewise_linear(
            HITS.distance_to_nplus_surface_mm,
            PARS.tlayer[DETECTOR].start_in_mm,
            PARS.fccd_in_mm,
            )

        # summed energy of the hit accounting for activeness
        energy_raw: ak.sum(HITS.__edep * HITS.activeness, axis=-1)

        # energy with smearing
        energy: reboost.math.sample_convolve(
            scipy.stats.norm, # resolution distribution
            loc=HITS.energy_raw, # parameters of the distribution (observable to convolve)
            scale=np.sqrt(PARS.a + PARS.b * HITS.energy_raw) # another parameter
            )

        # this is going to return "run lengths" (awkward jargon)
        clusters_lengths: reboost.shape.cluster.naive(
            HITS, # can also pass the exact fields (x, y, z)
            size=1,
            units="mm"
            )

        # example of low level reduction on clusters
        energy_clustered: ak.sum(ak.unflatten(HITS.__edep, HITS.clusters_lengths), axis=-1)

        # example of using a reboost helper
        steps_clustered: reboost.shape.reduction.energy_weighted_average(HITS, HITS.clusters_lengths)

        r90: reboost.hpge.psd.r90(HITS.steps_clustered)

        drift_time: reboost.hpge.psd.drift_time(
            HITS.steps_clustered,
            DETECTOR_OBJECTS.drift_time_map
            )

    # example basic processing of steps in scintillators
    - name: lar
      detector_mapping:
       - output: scintillators

      outputs:
        - evtid
        - tot_edep_wlsr

      operations:
        tot_edep_wlsr: ak.sum(HITS[(HITS.__detuid == 0) & (HITS.__zloc < 3000)].__edep, axis=-1)

    - name: spms

      # by default, reboost looks in the steps input table for a table with the
      # same name as the current detector. This can be overridden for special processors

      detector_mapping:
       - output: OBJECTS.lmeta.channelmap(on=ARGS.timestamp)
        .group("system").spms
        .group("analysis.status").on
        .map("name").keys()
         input: lar

      outputs:
        - t0
        - evtid
        - pe_times

      detector_objects:
         meta: pygeomtools.get_sensvol_metadata(OBJECTS.geometry, DETECTOR)
         optmap_lar: lgdo.lh5.read(DETECTOR, "optmaps/pen", ARGS.optmap_path)
         optmap_pen: lgdo.lh5.read(DETECTOR, "optmaps/lar", ARGS.optmap_path)

      hit_table_layout: reboost.shape.group_by_time(STEPS, window=10)

      operations:
        pe_times_lar: reboost.spms.detected_photoelectrons(
            STEPS,
            DETECTOR_OBJECTS.optmap_lar,
            0
         )

        pe_times_pen: reboost.spms.detected_photoelectrons(
            STEPS,
            DETECTOR_OBJECTS.optmap_pen,
            1
         )

        pe_times: ak.concatenate([HITS.pe_times_lar, HITS.pe_times_pen], axis=-1)

reboost.build_hit.build_hit(config, args, stp_files, glm_files, hit_files, *, start_evtid=0, n_evtid=None, in_field='stp', out_field='hit', buffer=5000000)¶

Build the hit tier from the remage step files.

Parameters:

config (Mapping | str) – dictionary or path to YAML file containing the processing chain.
args (Mapping | AttrsDict) – dictionary or dbetto.AttrsDict of the global arguments.
stp_files (str | list[str]) – list of strings or string of the stp file path.
glm_files (str | list[str]) – list of strings or string of the glm file path.
hit_files (str | list[str] | None) – list of strings or string of the hit file path. The hit file can also be None in which case the hits are returned as an ak.Array in memory.
start_evtid (int) – first evtid to read.
n_evtid (int | None) – number of evtid to read, if None read all.
in_field (str) – name of the input field in the remage output.
out_field (str) – name of the output field
buffer (int) – buffer size for use in the LH5Iterator.

Return type:

None | Array
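The operations block of the configuration is a chain: each expression can use the columns produced by the earlier ones through the HITS keyword. A minimal sketch of that idea, using only the standard library, is shown below. It is not how reboost evaluates expressions (the real chain works on LGDO tables and awkward arrays); the function run_operations and the column names are made up for this example.

```python
from types import SimpleNamespace


def run_operations(hits: dict, operations: dict) -> dict:
    """Evaluate each expression in order, adding the result as a new column."""
    for name, expression in operations.items():
        # HITS exposes the columns computed so far as attributes.
        namespace = {"HITS": SimpleNamespace(**hits), "sum": sum, "min": min}
        hits[name] = eval(expression, {"__builtins__": {}}, namespace)
    return hits


# One hit made of three steps; column names are made up for this example.
hits = {"edep": [10.0, 20.0, 5.0], "time": [3.0, 1.0, 2.0]}
operations = {
    "t0": "min(HITS.time)",
    "energy_raw": "sum(HITS.edep)",
    "energy_doubled": "2 * HITS.energy_raw",  # depends on an earlier column
}
result = run_operations(hits, operations)
print(result["t0"], result["energy_raw"], result["energy_doubled"])
```

Because the expressions are evaluated in the order they are listed, a column must be defined before any operation that refers to it.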

reboost.build_tcm module¶

reboost.build_tcm.build_tcm(hit_file, out_file, channels, time_name='t0', idx_name='global_evtid', time_window_in_us=10)¶

Build the Time Coincidence Map (TCM) from the hit tier.

Parameters:

hit_file (str) – path to hit tier file.
out_file (str) – output path for tcm.
channels (list[str]) – list of channel names to include.
time_name (str) – name of the hit tier field used for time grouping.
idx_name (str) – name of the hit tier field used for index grouping.
time_window_in_us (float) – time window used to define the grouping.

Return type:

None

reboost.build_tcm.get_tcm_from_ak(hit_data, channels, *, window=10, time_name='t0', idx_name='global_evtid')¶

Builds a time-coincidence map from a list of hit-tier data, one entry per channel.

The map is built in three steps:

build an ak.Array of the data by merging the channels, keeping the fields named by “time_name” and “idx_name”, and adding a field rawid from the channel index as well as the row (hit_idx);
sort this array by the “idx_name” and then the “time_name” fields;
group by “idx_name” and “time_name” based on the window parameter.

Parameters:

hit_data (list[Array]) – list of hit tier data for each channel
channels (list[int]) – list of channel indices
window (float) – time window for selecting coincidences (in us)
time_name (str) – name of the field for time information
idx_name (str) – name of the decay index field

Returns:

an LGDO.VectorOfVectors containing the time-coincidence map

Return type:

Table
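The merge, sort and group steps can be sketched in plain Python. This is an illustration only: the function tcm_sketch is hypothetical, hits are represented as (evtid, t0) tuples, and the grouping rule (a new group starts when the index changes or when the gap to the previous hit exceeds the window) is an assumption about how the window is applied.

```python
def tcm_sketch(hit_data, channels, window=10.0):
    """Group hits from several channels into coincidences.

    hit_data: one list of (evtid, t0) tuples per channel.
    channels: the channel identifier for each entry of hit_data.
    Returns a list of groups; each group is a list of (channel, hit_idx).
    """
    # 1. merge all channels, remembering the channel and the row in that channel
    merged = [
        (evtid, t0, channel, hit_idx)
        for channel, hits in zip(channels, hit_data)
        for hit_idx, (evtid, t0) in enumerate(hits)
    ]
    # 2. sort by evtid, then by time
    merged.sort(key=lambda hit: (hit[0], hit[1]))

    # 3. start a new group when the evtid changes or the time gap exceeds the window
    groups = []
    previous = None
    for evtid, t0, channel, hit_idx in merged:
        new_group = (
            previous is None
            or evtid != previous[0]
            or t0 - previous[1] > window
        )
        if new_group:
            groups.append([])
        groups[-1].append((channel, hit_idx))
        previous = (evtid, t0)
    return groups


hit_data = [[(0, 1.0), (1, 5.0)], [(0, 4.0), (0, 50.0)]]
groups = tcm_sketch(hit_data, channels=[1, 2], window=10.0)
print(groups)
```

Each group plays the role of one row of the map: it lists which channel fired and which row of that channel's hit table holds the data.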

reboost.cli module¶

reboost.cli.cli(args=None)¶

Return type:

None

reboost.core module¶

reboost.core.evaluate_hit_table_layout(steps, expression, time_dict=None)¶

Evaluate the hit_table_layout expression, producing the hit table.

This expression should be a function call which performs a restructuring of the steps, i.e. it sets the number of rows. The steps array should be referred to by “STEPS” in the expression.

Parameters:

steps (Array | Table) – awkward array or Table of the steps.
expression (str) – the expression to evaluate to produce the hit table.
time_dict (dict | None) – time profiling data structure.

Returns:

lgdo.Table of the hits.

Return type:

Table

reboost.core.evaluate_object(expression, local_dict)¶

Evaluate an expression returning any object.

The expression should be a function call. It can depend on any objects contained in the local dict. In addition, the expression can use packages which are then imported.

Parameters:

expression (str) – the expression to evaluate.
local_dict (dict) – local dictionary to pass to eval().

Returns:

the evaluated object.

Return type:

Any

reboost.core.evaluate_output_column(hit_table, expression, local_dict, *, table_name='HITS', time_dict=None, name=' ')¶

Evaluate an expression returning an LGDO.

Uses lgdo.Table.eval() to compute a new column for the hit table. The expression can depend on any field in the Table (prefixed with table_name.) or objects contained in the local dict. In addition, the expression can use packages which are then imported.

Parameters:

hit_table (Table) – the table containing the hit fields.
expression (str) – the expression to evaluate.
local_dict (dict) – local dictionary to pass to lgdo.Table.eval().
table_name (str) – keyword used to refer to the fields in the table.
time_dict (ProfileDict | None) – time profiling data structure.
name (str) – name to use in time_dict.

Returns:

an LGDO with the new field.

Return type:

LGDO

reboost.core.get_detector_objects(output_detectors, expressions, args, global_objects, time_dict=None)¶

Get the detector objects for each detector.

This computes a set of objects per output detector. The objects are defined by the expressions (given in the expressions input), which can depend on the keywords:

ARGS: values from the args parameter (an AttrsDict) can be referenced,
DETECTOR: the detector name (key of the detector mapping),
OBJECTS: the global objects.

For example expressions like:

compute_object(arg=ARGS.first_arg, detector=DETECTOR, obj=OBJECTS.meta)

are supported.

Parameters:

output_detectors (list) – list of output detectors,
expressions (dict) – dictionary of expressions to evaluate.
args (AttrsDict) – any arguments the expression can depend on, is passed as locals to eval().
global_objects (AttrsDict) – a dictionary of objects the expression can depend on.
time_dict (ProfileDict | None) – time profiling data structure.

Returns:

An AttrsDict of the objects for each detector.

Return type:

AttrsDict
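The role of the three keywords can be illustrated with a small standard-library sketch. It is not the reboost implementation: detector_objects_sketch is a hypothetical function, SimpleNamespace stands in for AttrsDict, and the names masses and scale are invented for the example.

```python
from types import SimpleNamespace


def detector_objects_sketch(output_detectors, expressions, args, global_objects):
    """Evaluate every expression once per detector."""
    result = {}
    for detector in output_detectors:
        # the three keywords an expression may refer to
        local_dict = {"ARGS": args, "DETECTOR": detector, "OBJECTS": global_objects}
        result[detector] = {
            name: eval(expression, {}, local_dict)
            for name, expression in expressions.items()
        }
    return result


args = SimpleNamespace(scale=2.0)
objects = SimpleNamespace(masses={"det001": 1.5, "det002": 2.0})
expressions = {
    "scaled_mass": "OBJECTS.masses[DETECTOR] * ARGS.scale",
    "label": "DETECTOR.upper()",
}
det_objects = detector_objects_sketch(["det001", "det002"], expressions, args, objects)
print(det_objects)
```

The same set of expressions yields a different object for every detector, because DETECTOR changes on each iteration.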

reboost.core.get_detectors_mapping(output_detector_expression, objects=None, input_detector_name=None)¶

Extract the output detectors and the mapping of input to output detectors by parsing the expressions.

The output_detector_expression can be a name or a string evaluating to a list of names. This expression can depend on any objects in the objects dictionary, referred to by the keyword “OBJECTS”.

The function produces a dictionary mapping input detectors to output detectors with the following format:

{
    "input1": ["output1", "output2"],
    "input2": ["output3", ...],
}

If only output_detector_expression is supplied, the mapping is one-to-one (i.e. every input detector maps to the output detector of the same name). If input_detector_name is also supplied, it will be the only key, with all output detectors mapped to it.

Parameters:

output_detector_expression (str) – An output detector name or a string evaluating to a list of output tables.
objects (AttrsDict | None) – dictionary of objects that can be referenced in the expression.
input_detector_name (str | None) – Optional input detector name for all the outputs.

Returns:

a dictionary with the input detectors as key and a list of output detectors for each.

Return type:

dict

Examples

For a direct one-to-one mapping:

>>> get_detectors_mapping("[str(i) for i in range(3)]")
{'0': ['0'], '1': ['1'], '2': ['2']}

With an input detector name:

>>> get_detectors_mapping("[str(i) for i in range(3)]", input_detector_name="dets")
{'dets': ['0', '1', '2']}

With objects:

>>> objs = AttrsDict({"format": "ch"})
>>> get_detectors_mapping("[f'{OBJECTS.format}{i}' for i in range(3)]",
...                       input_detector_name="dets", objects=objs)
{'dets': ['ch0', 'ch1', 'ch2']}

reboost.core.get_global_objects(expressions, *, local_dict, time_dict=None)¶

Extract global objects used in the processing.

Parameters:

expressions (dict[str, str]) – a dictionary containing the expressions to evaluate for each object.
local_dict (dict) – other objects used in the evaluation of the expressions, passed to eval() as the locals keyword.
time_dict (dict | None) – time profiling data structure.

Returns:

dictionary of objects with the same keys as the expressions.

Return type:

dict

reboost.core.merge(hit_table, output_table)¶

Merge the table with the array.

Parameters:

hit_table (Table)
output_table (Array | None)

reboost.core.remove_columns(tab, outputs)¶

Remove columns from the table not found in the outputs.

Parameters:

tab (Table) – the table to remove columns from.
outputs (list) – a list of output fields.

Returns:

the table with columns removed.

Return type:

Table

reboost.iterator module¶

class reboost.iterator.GLMIterator(glm_file, stp_file, lh5_group, start_row, n_rows, *, stp_field='stp', read_vertices=False, buffer=10000, time_dict=None)¶

Bases: object

A class to iterate over the rows of an event lookup map.

Constructor for the GLMIterator.

Parameters:

glm_file (str) – the file containing the event lookup map.
stp_file (str) – the file containing the steps to read.
lh5_group (str) – the name of the lh5 group to read.
start_row (int) – the first row to read.
n_rows (int | None) – the number of rows to read, if None read them all.
stp_field (str) – name of the group.
read_vertices (bool) – whether to read also the vertices table.
buffer (int) – the number of rows to read at once.
time_dict (dict | None) – time profiling data structure.

reboost.log_utils module¶

reboost.log_utils.setup_log(level=None, multiproc=False)¶

Set up a colored logger for this package.

Parameters:

level (int | None) – initial log level, or None to use the default.
multiproc (bool) – set to True to include process ID in log output (i.e. for multiprocessing setups)

Return type:

None

reboost.profile module¶

class reboost.profile.ProfileDict(value=None)¶

Bases: AttrsDict

A class to store the results of time profiling.

Construct an AttrsDict object.

Note

The input dictionary is copied.

Parameters:

value (dict | None) – a dict object to initialize the instance with.

_format(data, indent=1)¶

Recursively format the dictionary.

Parameters:

data (ProfileDict) – The dictionary to format.
indent (int) – The current indentation level.

Returns:

the formatted print out.

Return type:

str

update_field(name, time_start)¶

Update the stored time.

Parameters:

name (str) – the name of the field to update. If it contains / this will be interpreted as subdictionaries.
time_start (float) – the starting time of the block to evaluate

Return type:

None
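How a name containing / maps onto sub-dictionaries can be sketched with plain dictionaries. The function update_field_sketch is hypothetical, and the choice to accumulate the elapsed time into an existing entry (rather than overwrite it) is an assumption made for this illustration.

```python
import time


def update_field_sketch(profile: dict, name: str, time_start: float) -> None:
    """Add the time elapsed since time_start to the entry called name."""
    *parents, leaf = name.split("/")
    node = profile
    for key in parents:
        node = node.setdefault(key, {})  # create sub-dictionaries on demand
    node[leaf] = node.get(leaf, 0.0) + (time.time() - time_start)


profile = {}
start = time.time()
update_field_sketch(profile, "read/stp", start)
update_field_sketch(profile, "read/stp", start)  # accumulates into the same entry
update_field_sketch(profile, "write", start)
print(profile)
```

Here "read/stp" ends up as profile["read"]["stp"], while "write" is stored at the top level.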

reboost.utils module¶

reboost.utils._check_input_file(parser, file, descr='input')¶

Parameters:

file (str | Iterable[str])
descr (str)

Return type:

None

reboost.utils._check_output_file(parser, file)¶

Parameters:

file (str | Iterable[str])

Return type:

None

reboost.utils._search_string(string)¶

Capture the characters matching the pattern for a function call.

Parameters:

string (str)

reboost.utils.filter_logging(level)¶

reboost.utils.get_file_dict(stp_files, glm_files, hit_files=None)¶

Get the file info as an AttrsDict.

Parameters:

stp_files (list[str] | str)
glm_files (list[str] | str)
hit_files (list[str] | str | None)

Return type:

AttrsDict

reboost.utils.get_file_list(path, threads=None)¶

Get a list of files accounting for the multithread index.

Parameters:

path (str | None)
threads (int | None)

Return type:

list[str]

reboost.utils.get_function_string(expr, aliases=None)¶

Get a function call to evaluate.

Search the expression for any text matching the pattern of a function call. Package aliases are also detected, by default just np for numpy and ak for awkward. In this case, the full name is replaced with the alias in the expression and also in the output globals dictionary.

It is possible to chain functions together, e.g.:

ak.num(np.array([1, 2]))

and all packages will be imported.

Parameters:

expr (str) – expression to evaluate.
aliases (dict | None) – dictionary of package aliases for names used in the expression. These allow shorter names to be given to packages. This is combined with two defaults, ak for awkward and np for numpy. If None is supplied, only these are used.

Returns:

a tuple of call string and dictionary of the imported global packages.

Return type:

tuple[str, dict]
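The general approach (find the function calls, import the packages they belong to, and hand back a globals dictionary for eval()) can be sketched with the standard library. This is not the reboost implementation: function_string_sketch and CALL_PATTERN are hypothetical, the sketch does not rewrite the expression, and it assumes that an alias dictionary maps the short name to the real package name.

```python
import importlib
import re

# Matches dotted names directly followed by "(", e.g. "math.sqrt(".
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)\s*\(")


def function_string_sketch(expr, aliases=None):
    """Return expr and a globals dict holding the packages it calls into."""
    aliases = aliases or {}
    globs = {}
    for dotted in CALL_PATTERN.findall(expr):
        top_level = dotted.split(".")[0]
        # assumption: an alias maps a short name (key) to the package name (value)
        package = aliases.get(top_level, top_level)
        try:
            globs[top_level] = importlib.import_module(package)
        except ImportError:
            # not a package (for example HITS.energy.sum()), leave it alone
            continue
    return expr, globs


call, globs = function_string_sketch("m.floor(math.sqrt(17))", aliases={"m": "math"})
value = eval(call, globs)
print(value)
```

Both the alias m and the full name math end up in the globals dictionary, so the chained call can be evaluated directly.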

reboost.utils.merge_dicts(dict_list)¶

Merge a list of dictionaries, concatenating the items where they exist.

Parameters:

dict_list (list) – list of dictionaries to merge

Returns:

a new dictionary after merging.

Return type:

dict

Examples

>>> merge_dicts([{"a":[1,2,3],"b":[2]},{"a":[4,5,6],"c":[2]}])
{'a': [1, 2, 3, 4, 5, 6], 'b': [2], 'c': [2]}