reboost package¶

Subpackages¶

Submodules¶

reboost.build_evt module¶

A program for combining the hits from various detectors, to build events.

It parses a configuration file with the following format:

channels:
    geds_on:
    - det001
    - det002
    geds_ac:
    - det003

outputs:
- energy
- multiplicity

operations:
 energy_id:
    channels: geds_on
    aggregation_mode: gather
    query: "hit.energy > 25"
    expression: tcm.channel_id

 energy:
    aggregation_mode: keep_at_ch:evt.energy_id
    expression: "hit.energy > 25"
    channels: geds_on

 multiplicity:
    channels: geds_on
    aggregation_mode: sum
    expression: "hit.energy > 25"
    initial: 0

Must contain:

- “channels”: dictionary of channel groupings
- “outputs”: fields for the output file
- “operations”: operations to perform; see pygama.evt.build_evt.evaluate_expression() for more details.
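
As a rough illustration, the same configuration can be written as a Python dictionary and checked for the three required keys before it is handed to build_evt(). The helper check_evt_config below is hypothetical and not part of reboost; it only mirrors the layout described above.

```python
# Hypothetical helper, not part of reboost: it only illustrates the config layout.
REQUIRED_KEYS = ("channels", "outputs", "operations")


def check_evt_config(config: dict) -> list:
    """Return the required top-level keys that are missing from an evt config."""
    return [key for key in REQUIRED_KEYS if key not in config]


config = {
    "channels": {"geds_on": ["det001", "det002"], "geds_ac": ["det003"]},
    "outputs": ["energy", "multiplicity"],
    "operations": {
        "multiplicity": {
            "channels": "geds_on",
            "aggregation_mode": "sum",
            "expression": "hit.energy > 25",
            "initial": 0,
        },
    },
}

missing = check_evt_config(config)  # empty list when all keys are present
```

A config that passes this check still has to satisfy whatever build_evt() itself requires of each operation.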

reboost.build_evt.build_evt(hit_file, tcm_file, evt_file, config, buffer=5000000)¶

Generates the event tier from the hit and tcm tiers.

Parameters:

hit_file (str) – path to the hit tier file
tcm_file (str) – path to the tcm tier file
evt_file (str | None) – path to the evt tier (output) file; if None, the evt data is returned in memory
config (dict) – dictionary of the configuration.
buffer (int) – number of events to process simultaneously

Returns:

ak.Array of the evt tier data (if the data is not saved to disk)

Return type:

Array | None

reboost.build_glm module¶

reboost.build_glm.build_glm(stp_files, glm_files, lh5_groups=None, *, out_table_name='glm', id_name='g4_evtid', evtid_buffer=10000000, stp_buffer=10000000)¶

Builds a g4_evtid look up (glm) from the stp data.

This object is used by reboost to efficiently iterate through the data. It consists of an lgdo.VectorOfVectors for each lh5_table in the input files. The rows of this lgdo.VectorOfVectors correspond to the id_name, while the data are the stp indices for this event.

Parameters:

stp_files (str | list[str]) – path to the stp (input) file.
glm_files (str | list[str] | None) – path to the glm data, can also be None in which case an ak.Array is returned in memory.
out_table_name (str) – name for the output table.
id_name (str) – name of the evtid field, default g4_evtid.
stp_buffer (int) – the number of rows of the step file to read at a time
evtid_buffer (int) – the number of evtids to read at a time
lh5_groups (list | None)

Returns:

either None or an ak.Array

Return type:

Array | None

reboost.build_glm.get_glm_rows(stp_evtids, vert, *, start_row=0)¶

Get the rows of the Geant4 event lookup map (glm).

Parameters:

stp_evtids (ArrayLike) – Array of evtids for the steps
vert (ArrayLike) – Array of simulated evtid for the vertices.
start_row (int) – The index of the first element of stp_evtids.

Returns:

an awkward array of the glm.

Return type:

Array
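
To make the idea of a lookup row concrete, here is a minimal pure-Python sketch that records, for every vertex evtid, where its steps start and how many there are. The function name glm_rows_sketch and the record keys evtid, start_row and n_rows are assumptions chosen for illustration; the actual function returns an awkward array and its layout is not described here.

```python
from collections import Counter


def glm_rows_sketch(stp_evtids, vert, start_row=0):
    """Illustrative only: build one lookup record per vertex evtid.

    Assumes all steps belonging to one evtid are contiguous in stp_evtids.
    """
    counts = Counter(stp_evtids)
    first_index = {}
    for idx, evtid in enumerate(stp_evtids):
        # remember the first position at which each evtid appears
        first_index.setdefault(evtid, idx)

    rows = []
    for evtid in vert:
        n_rows = counts.get(evtid, 0)
        # events without any step get no start row (an assumption of this sketch)
        start = start_row + first_index[evtid] if n_rows else None
        rows.append({"evtid": evtid, "start_row": start, "n_rows": n_rows})
    return rows


rows = glm_rows_sketch([0, 0, 2, 2, 2], vert=[0, 1, 2], start_row=10)
```

Here start_row=10 offsets the indices, as it would when stp_evtids is a chunk read from the middle of a file.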

reboost.build_glm.get_stp_evtids(lh5_table, stp_file, id_name, start_row, last_vertex_evtid, stp_buffer)¶

Extracts the rows of a stp file corresponding to a particular range of evtid.

The reading starts at start_row to allow for iterating through the file. The iteration stops when the evtid being read is larger than last_vertex_evtid.

Parameters:

lh5_table (str) – the table name to read.
stp_file (str) – the file name path.
id_name (str) – the name of the evtid field.
start_row (int) – the row to begin reading.
last_vertex_evtid (int) – the last evtid to read up to.
stp_buffer (int) – the number of rows to read at once.

Returns:

a tuple of the updated start_row, the first row for the chunk and an awkward Array of the steps.

Return type:

tuple[int, int, Array]
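
The chunked reading described above can be sketched as follows. This is not the reboost implementation: an in-memory list stands in for the stp file, and the name read_evtids_sketch is hypothetical.

```python
def read_evtids_sketch(all_evtids, start_row, last_vertex_evtid, stp_buffer):
    """Illustrative chunked read; all_evtids stands in for the evtid column on disk."""
    first_row = start_row
    collected = []
    while start_row < len(all_evtids):
        chunk = all_evtids[start_row : start_row + stp_buffer]
        collected.extend(chunk)
        start_row += len(chunk)
        # stop once the data read reaches past the last vertex evtid
        if chunk[-1] > last_vertex_evtid:
            break
    return start_row, first_row, collected


new_start, first, evtids = read_evtids_sketch(
    [0, 0, 1, 2, 2, 3, 4], start_row=0, last_vertex_evtid=1, stp_buffer=2
)
```

The returned start row can be passed back in on the next call, which is what allows iterating through the file.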

reboost.build_hit module¶

Routines to build the hit tier from the stp tier.

build_hit() parses a configuration file such as the following:

# dictionary of objects useful for later computation. they are constructed with
# auxiliary data (e.g. metadata). They can be accessed later as OBJECTS (all caps)
objects:
 lmeta: LegendMetadata(ARGS.legendmetadata)
 geometry: pyg4ometry.load(ARGS.gdml)
 user_pars: dbetto.TextDB(ARGS.par)
 dataprod_pars: dbetto.TextDB(ARGS.dataprod_cycle)

# processing chain is defined to act on a group of detectors
processing_groups:

    # start with HPGe stuff, give it an optional name
    - name: geds

      # this is a list of included detectors (part of the processing group)
      detector_mapping:
        - output: OBJECTS.lmeta.channelmap(on=ARGS.timestamp)
         .group('system').geds
         .group('analysis.status').on
         .map('name').keys()

      # which columns we actually want to see in the output table
      outputs:
         - t0
         - evtid
         - energy
         - r90
         - drift_time

      # in this section we define objects that will be instantiated at each
      # iteration of the for loop over input tables (i.e. detectors)
      detector_objects:
         # The following assumes that the detector metadata is stored in the GDML file
         pyobj: legendhpges.make_hpge(pygeomtools.get_sensvol_metadata(OBJECTS.geometry, DETECTOR))
         phyvol: OBJECTS.geometry.physical_volume_dict[DETECTOR]
         drift_time_map: lgdo.lh5.read(DETECTOR, ARGS.dtmap_file)

      # finally, the processing chain
      operations:

        t0: ak.fill_none(ak.firsts(HITS.time, axis=-1), np.nan)

        evtid: ak.fill_none(ak.firsts(HITS.__evtid, axis=-1), np.nan)

        # distance to the nplus surface in mm
        distance_to_nplus_surface_mm: reboost.hpge.distance_to_surface(
            HITS.__xloc, HITS.__yloc, HITS.__zloc,
            DETECTOR_OBJECTS.pyobj,
            DETECTOR_OBJECTS.phyvol.position.eval(),
            surface_type='nplus')

        # activeness based on FCCD (no TL)
        activeness: ak.where(
            HITS.distance_to_nplus_surface_mm <
                OBJECTS.lmeta.hardware.detectors.germanium.diodes[DETECTOR].characterization.combined_0vbb_fccd_in_mm.value,
            0,
            1
            )

        activeness2: reboost.math.piecewise_linear(
            HITS.distance_to_nplus_surface_mm,
            PARS.tlayer[DETECTOR].start_in_mm,
            PARS.fccd_in_mm,
            )

        # summed energy of the hit accounting for activeness
        energy_raw: ak.sum(HITS.__edep * HITS.activeness, axis=-1)

        # energy with smearing
        energy: reboost.math.sample_convolve(
            scipy.stats.norm, # resolution distribution
            loc=HITS.energy_raw, # parameters of the distribution (observable to convolve)
            scale=np.sqrt(PARS.a + PARS.b * HITS.energy_raw) # another parameter
            )

        # this is going to return "run lengths" (awkward jargon)
        clusters_lengths: reboost.shape.cluster.naive(
            HITS, # can also pass the exact fields (x, y, z)
            size=1,
            units="mm"
            )

        # example of low level reduction on clusters
        energy_clustered: ak.sum(ak.unflatten(HITS.__edep, HITS.clusters_lengths), axis=-1)

        # example of using a reboost helper
        steps_clustered: reboost.shape.reduction.energy_weighted_average(HITS, HITS.clusters_lengths)

        r90: reboost.hpge.psd.r90(HITS.steps_clustered)

        drift_time: reboost.hpge.psd.drift_time(
            HITS.steps_clustered,
            DETECTOR_OBJECTS.drift_time_map
            )

    # example basic processing of steps in scintillators
    - name: lar
      detector_mapping:
       - output: scintillators

      outputs:
        - evtid
        - tot_edep_wlsr

      operations:
        tot_edep_wlsr: ak.sum(HITS[(HITS.__detuid == 0) & (HITS.__zloc < 3000)].__edep, axis=-1)

    - name: spms

      # by default, reboost looks in the steps input table for a table with the
      # same name as the current detector. This can be overridden for special processors

      detector_mapping:
       - output: OBJECTS.lmeta.channelmap(on=ARGS.timestamp)
        .group("system").spms
        .group("analysis.status").on
        .map("name").keys()
       - input: lar

      outputs:
        - t0
        - evtid
        - pe_times

      detector_objects:
         meta: pygeomtools.get_sensvol_metadata(OBJECTS.geometry, DETECTOR)
         optmap_lar: lgdo.lh5.read(DETECTOR, "optmaps/lar", ARGS.optmap_path)
         optmap_pen: lgdo.lh5.read(DETECTOR, "optmaps/pen", ARGS.optmap_path)

      hit_table_layout: reboost.shape.group_by_time(STEPS, window=10)

      operations:
        pe_times_lar: reboost.spms.detected_photoelectrons(
            STEPS,
            DETECTOR_OBJECTS.optmap_lar,
            0
         )

        pe_times_pen: reboost.spms.detected_photoelectrons(
            STEPS,
            DETECTOR_OBJECTS.optmap_pen,
            1
         )

        pe_times: ak.concatenate([HITS.pe_times_lar, HITS.pe_times_pen], axis=-1)

# can list here some lh5 objects that should just be forwarded to the
# output file, without any processing
forward:
  - /vtx
  - /some/dataset

reboost.build_hit.build_hit(config, args, stp_files, glm_files, hit_files, *, start_evtid=0, n_evtid=None, out_field='hit', buffer=5000000, overwrite=False)¶

Build the hit tier from the remage step files.

Parameters:

config (Mapping | str) – dictionary or path to YAML file containing the processing chain.
args (Mapping | AttrsDict) – dictionary or dbetto.AttrsDict of the global arguments.
stp_files (str | list[str]) – list of strings or string of the stp file path.
glm_files (str | list[str] | None) – list of strings or string of the glm file path; if None, the glm will be built in memory.
hit_files (str | list[str] | None) – list of strings or string of the hit file path. The hit file can also be None in which case the hits are returned as an ak.Array in memory.
start_evtid (int) – first evtid to read.
n_evtid (int | None) – number of evtid to read, if None read all.
out_field (str) – name of the output field
buffer (int) – buffer size for use in the LH5Iterator.
overwrite (bool) – flag to overwrite the existing output.

Return type:

None | Array

reboost.cli module¶

reboost.cli.cli(args=None)¶

Return type:: None

reboost.core module¶

reboost.core._get_table_keys(tab)¶

Get keys in a table.

Parameters:: tab (Table)

reboost.core._remove_col(field, tab)¶

Remove column accounting for nesting.

Parameters:

field (str)
tab (Table)

reboost.core.add_field_with_nesting(tab, col, field)¶

Add a field handling the nesting.

Parameters:

tab (Table)
col (str)
field (LGDO)

Return type:

Table

reboost.core.evaluate_hit_table_layout(steps, expression, time_dict=None)¶

Evaluate the hit_table_layout expression, producing the hit table.

This expression should be a function call which performs a restructuring of the steps, i.e. it sets the number of rows. The steps array should be referred to by “STEPS” in the expression.

Parameters:

steps (Array | Table) – awkward array or Table of the steps.
expression (str) – the expression to evaluate to produce the hit table.
time_dict (dict | None) – time profiling data structure.

Returns:

lgdo.Table of the hits.

Return type:

Table

reboost.core.evaluate_object(expression, local_dict)¶

Evaluate an expression returning any object.

The expression should be a function call. It can depend on any objects contained in the local dict. In addition, the expression can use packages which are then imported.

Parameters:

expression (str) – the expression to evaluate.
local_dict (dict) – local dictionary to pass to eval().

Returns:

the evaluated object.

Return type:

Any
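
A minimal sketch of this pattern, using only the standard library: the packages named by the caller are imported and the expression is then evaluated with the supplied local dictionary. The function evaluate_object_sketch and its packages argument are hypothetical; how reboost discovers which packages to import is not shown here.

```python
import importlib


def evaluate_object_sketch(expression, local_dict, packages=()):
    """Illustrative only: import the named packages, then eval the expression."""
    # imported packages become the globals seen by the expression
    global_dict = {name: importlib.import_module(name) for name in packages}
    return eval(expression, global_dict, local_dict)


value = evaluate_object_sketch(
    "math.hypot(ARGS['x'], ARGS['y'])",
    {"ARGS": {"x": 3.0, "y": 4.0}},
    packages=["math"],
)
```

Because eval() executes arbitrary code, expressions of this kind should only come from trusted configuration files.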

reboost.core.evaluate_output_column(hit_table, expression, local_dict, *, table_name='HITS', time_dict=None, name=' ')¶

Evaluate an expression returning an LGDO.

Uses lgdo.Table.eval() to compute a new column for the hit table. The expression can depend on any field in the Table (prefixed with table_name.) or objects contained in the local dict. In addition, the expression can use packages which are then imported.

Parameters:

hit_table (Table) – the table containing the hit fields.
expression (str) – the expression to evaluate.
local_dict (dict) – local dictionary to pass to lgdo.Table.eval().
table_name (str) – keyword used to refer to the fields in the table.
time_dict (ProfileDict | None) – time profiling data structure.
name (str) – name to use in time_dict.

Returns:

an LGDO with the new field.

Return type:

LGDO

reboost.core.get_detector_mapping(detector_mapping, global_objects)¶

Get all the detector mappings using get_one_detector_mapping().

Parameters:

detector_mapping (dict) – dictionary of detector mapping
global_objects (AttrsDict) – dictionary of global objects to use in evaluating the mapping.

Return type:

dict

reboost.core.get_detector_objects(output_detectors, expressions, args, global_objects, time_dict=None)¶

Get the detector objects for each detector.

This computes a set of objects per output detector, as defined by the expressions input. The expressions can depend on the keywords:

ARGS: values from the args parameter (an AttrsDict) can be referenced,
DETECTOR: the detector name (key of the detector mapping),
OBJECTS: the global objects.

For example expressions like:

compute_object(arg=ARGS.first_arg, detector=DETECTOR, obj=OBJECTS.meta)

are supported.

Parameters:

output_detectors (list) – list of output detectors.
expressions (dict) – dictionary of expressions to evaluate.
args (AttrsDict) – any arguments the expression can depend on, is passed as locals to eval().
global_objects (AttrsDict) – a dictionary of objects the expression can depend on.
time_dict (ProfileDict | None) – time profiling data structure.

Returns:

An AttrsDict of the objects for each detector.

Return type:

AttrsDict
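
The per-detector evaluation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the reboost code: detector_objects_sketch is a hypothetical name, SimpleNamespace stands in for AttrsDict, and the expressions are simple enough not to need any imports.

```python
from types import SimpleNamespace


def detector_objects_sketch(output_detectors, expressions, args, global_objects):
    """Illustrative only: evaluate every expression once per output detector."""
    result = {}
    for detector in output_detectors:
        # the three keywords an expression may refer to
        local_dict = {"ARGS": args, "DETECTOR": detector, "OBJECTS": global_objects}
        result[detector] = {
            name: eval(expr, {}, local_dict) for name, expr in expressions.items()
        }
    return result


args = SimpleNamespace(prefix="ch")
objects = SimpleNamespace(thresholds={"det001": 25, "det002": 30})
objs = detector_objects_sketch(
    ["det001", "det002"],
    {"label": "ARGS.prefix + '_' + DETECTOR", "threshold": "OBJECTS.thresholds[DETECTOR]"},
    args,
    objects,
)
```

Each detector ends up with its own dictionary of evaluated objects, keyed by the names given in expressions.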

reboost.core.get_global_objects(expressions, *, local_dict, time_dict=None)¶

Extract global objects used in the processing.

Parameters:

expressions (dict[str, str]) – a dictionary containing the expressions to evaluate for each object.
local_dict (dict) – other objects used in the evaluation of the expressions, passed to eval() as the locals keyword.
time_dict (dict | None) – time profiling data structure.

Returns:

dictionary of objects with the same keys as the expressions.

Return type:

AttrsDict

reboost.core.get_one_detector_mapping(output_detector_expression, objects=None, input_detector_name=None)¶

Extract the output detectors and the mapping of inputs to outputs by parsing the expressions.

The output_detector_expression can be a name or a string evaluating to a list of names. This expression can depend on any objects in the objects dictionary, referred to by the keyword “OBJECTS”.

The function produces a dictionary mapping input detectors to output detectors with the following format:

{
    "input1": ["output1", "output2"],
    "input2": ["output3", ...],
}

If only output_detector_expression is supplied, the mapping is one-to-one (i.e. every input detector maps to the output detector of the same name). If input_detector_name is also supplied, it is the only key and all output detectors are mapped to it.

Parameters:

output_detector_expression (str | list) – An output detector name or a string evaluating to a list of output tables.
objects (AttrsDict | None) – dictionary of objects that can be referenced in the expression.
input_detector_name (str | None) – Optional input detector name for all the outputs.

Returns:

a dictionary with the input detectors as key and a list of output detectors for each.

Return type:

dict

Examples

For a direct one-to-one mapping:

>>> get_one_detector_mapping("[str(i) for i in range(2)]")
{'0': ['0'], '1': ['1']}

With an input detector name:

>>> get_one_detector_mapping("[str(i) for i in range(2)]", input_detector_name="dets")
{'dets': ['0', '1']}

With objects:

>>> objs = AttrsDict({"format": "ch"})
>>> get_one_detector_mapping("[f'{OBJECTS.format}{i}' for i in range(2)]",
...                          input_detector_name="dets", objects=objs)
{'dets': ['ch0', 'ch1']}
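
For readers who want to see the mapping rules spelled out, the following pure-Python sketch reproduces the two cases described above. It is not the reboost implementation: one_detector_mapping_sketch is a hypothetical name, SimpleNamespace stands in for AttrsDict, and treating a string as an expression only when it contains a bracket is a simplification made for this example.

```python
from types import SimpleNamespace


def one_detector_mapping_sketch(output_detector_expression, objects=None, input_detector_name=None):
    """Illustrative only: build the input -> outputs dictionary."""
    if isinstance(output_detector_expression, str):
        if "(" in output_detector_expression or "[" in output_detector_expression:
            # OBJECTS is passed as a global so it is visible inside comprehensions
            outputs = list(eval(output_detector_expression, {"OBJECTS": objects}))
        else:
            # a plain detector name
            outputs = [output_detector_expression]
    else:
        outputs = list(output_detector_expression)

    if input_detector_name is None:
        # one-to-one: each detector is read from the table of the same name
        return {name: [name] for name in outputs}
    return {input_detector_name: outputs}


one_to_one = one_detector_mapping_sketch("[str(i) for i in range(2)]")
objs = SimpleNamespace(format="ch")
shared = one_detector_mapping_sketch(
    "[f'{OBJECTS.format}{i}' for i in range(2)]",
    objects=objs,
    input_detector_name="dets",
)
```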

reboost.core.merge(hit_table, output_table)¶

Merge the table with the array.

Parameters:

hit_table (Table)
output_table (Array | None)

reboost.core.remove_columns(tab, outputs)¶

Remove columns from the table not found in the outputs.

Parameters:

tab (Table) – the table to remove columns from.
outputs (list) – a list of output fields.

Returns:

the table with columns removed.

Return type:

Table

reboost.iterator module¶

class reboost.iterator.GLMIterator(glm_file, stp_file, lh5_group, start_row, n_rows, *, stp_field='stp', buffer=10000, time_dict=None, reshaped_files=False)¶

Bases: object

A class to iterate over the rows of an event lookup map.

Constructor for the GLMIterator.

The GLM iterator provides a way to iterate over the simulated Geant4 evtids, extracting the number of hits or steps for each range of evtids. This ensures that a single simulated event is not split between two iterations, and allows a start and an end evtid to be specified for extraction.

If the data is already reshaped and no specific range of evtids needs to be read, this iterator just loops over the input stp field. Otherwise, if the GLM file is not provided, the GLM is created in memory.

Parameters:

glm_file (str | None) – the file containing the event lookup map, if None the glm will be created in memory if needed.
stp_file (str) – the file containing the steps to read.
lh5_group (str) – the name of the lh5 group to read.
start_row (int) – the first row to read.
n_rows (int | None) – the number of rows to read, if None read them all.
stp_field (str) – name of the group.
buffer (int) – the number of rows to read at once.
time_dict (dict | None) – time profiling data structure.
reshaped_files (bool) – flag for whether the files are reshaped.

get_n_rows()¶: Get the number of rows to read.
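
The reason for iterating over a lookup map rather than over raw step rows can be sketched as follows. The generator below is a hypothetical illustration, not the GLMIterator itself: it assumes one (start_row, n_rows) pair per evtid, in file order, and takes buffer to count events per iteration.

```python
def iterate_events_sketch(glm_rows, buffer):
    """Illustrative only: yield (start_row, n_rows) spans that never split an event.

    glm_rows holds one (start_row, n_rows) pair per evtid, in file order.
    """
    for i in range(0, len(glm_rows), buffer):
        chunk = glm_rows[i : i + buffer]
        first_start = chunk[0][0]
        # the span covers every step row of every event in this chunk
        total = sum(n_rows for _, n_rows in chunk)
        yield first_start, total


spans = list(iterate_events_sketch([(0, 2), (2, 3), (5, 1)], buffer=2))
```

Each span can then be used to read a contiguous block of step rows whose boundaries always coincide with event boundaries.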

reboost.log_utils module¶

reboost.log_utils.setup_log(level=None, multiproc=False)¶

Set up a colored logger for this package.

Parameters:

level (int | None) – initial log level, or None to use the default.
multiproc (bool) – set to True to include process ID in log output (i.e. for multiprocessing setups)

Return type:

None

reboost.profile module¶

class reboost.profile.ProfileDict(value=None)¶

Bases: AttrsDict

A class to store the results of time profiling.

Construct an AttrsDict object.

Note

The input dictionary is copied.

Parameters:: value (dict | None) – a dict object to initialize the instance with.

_format(data, indent=1)¶

Recursively format the dictionary.

Parameters:

data (ProfileDict) – The dictionary to format.
indent (int) – The current indentation level.

Returns:

the formatted print out.

Return type:

str

update_field(name, time_start)¶

Update the stored time.

Parameters:

name (str) – the name of the field to update. If it contains / this will be interpreted as subdictionaries.
time_start (float) – the starting time of the block to evaluate

Return type:

None
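
A sketch of how a name containing / can address nested dictionaries, using a plain dict in place of ProfileDict. The function update_field_sketch is hypothetical, and accumulating the elapsed time on repeated calls is an assumption of this example rather than documented behaviour.

```python
import time


def update_field_sketch(profile, name, time_start):
    """Illustrative only: add the elapsed time to a possibly nested entry."""
    *parents, leaf = name.split("/")
    node = profile
    for key in parents:
        # create intermediate dictionaries on demand
        node = node.setdefault(key, {})
    node[leaf] = node.get(leaf, 0.0) + (time.time() - time_start)


profile = {}
start = time.time()
update_field_sketch(profile, "read/stp", start)
update_field_sketch(profile, "read/stp", start)
update_field_sketch(profile, "write", start)
```

After these calls, profile has a nested "read" dictionary with an "stp" entry and a top-level "write" entry.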

reboost.units module¶

reboost.units.pg4_to_pint(obj)¶

Convert pyg4ometry object to pint Quantity.

Return type:: Quantity

reboost.units.unit_to_lh5_attr(unit)¶

Convert Pint unit to a string that can be used as attrs[“units”] in an LGDO.

Parameters:: unit (Unit)
Return type:: str

reboost.units.units_convfact(data, target_units)¶

Calculate numeric conversion factor to reach target_units.

Parameters:

data (Any) – starting data structure. If an LGDO, try to determine units by peeking into its attributes. Otherwise, just return 1.
target_units (pint.Unit) – units you wish to convert data to.

Return type:

float

reboost.units.unwrap_lgdo(data, library='ak')¶

Return a view of the data held by the LGDO and its physical units.

Parameters:

data (Any) – the data container. If not an LGDO, it will be returned as is with None units.
library (str) – forwarded to lgdo.view_as().

Returns:

A tuple of the un-lgdo’d data and the data units.

Return type:

tuple[Any, pint.Unit | None]

reboost.units.ureg = <pint.registry.ApplicationRegistry object>¶: The physical units registry.

reboost.utils module¶

reboost.utils._check_input_file(parser, file, descr='input')¶

Parameters:

file (str | Iterable[str])
descr (str)

Return type:

None

reboost.utils._check_output_file(parser, file, optional=False)¶

Parameters:

file (str | Iterable[str] | None)
optional (bool)

Return type:

None

reboost.utils._search_string(string)¶

Capture the characters matching the pattern for a function call.

Parameters:: string (str)

reboost.utils.assign_units(tab, units)¶

Copy the attributes from the map of attributes to the table.

Parameters:

tab (Table) – Table to add attributes to.
units (Mapping) – mapping (dictionary like) of units of each field

Returns:

an updated table with LGDO attributes.

Return type:

Table

reboost.utils.copy_units(tab)¶

Extract a dictionary of attributes (i.e. units).

Parameters:: tab (Table) – Table to get the units from.
Returns:: a dictionary with the units for each field in the table.
Return type:: dict

reboost.utils.filter_logging(level)¶

reboost.utils.get_channels_from_groups(names, groupings=None)¶

Get a list of channels from a list of groups.

Parameters:

names (list | str | None) – list of channel names
groupings (dict | None) – dictionary of the groupings of channels

Returns:

list of channels

Return type:

list
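
One plausible reading of this function, sketched in plain Python: every name that matches a group is expanded into its channels, and anything else is kept as a channel name. The name channels_from_groups_sketch is hypothetical, and both the pass-through of unknown names and the empty list returned for None are assumptions of this example.

```python
def channels_from_groups_sketch(names, groupings=None):
    """Illustrative only: expand group names into a flat list of channels."""
    if names is None:
        return []
    if isinstance(names, str):
        names = [names]
    groupings = groupings or {}
    channels = []
    for name in names:
        # a group name expands to its channels; other names are kept as they are
        channels.extend(groupings.get(name, [name]))
    return channels


groups = {"geds_on": ["det001", "det002"], "geds_ac": ["det003"]}
expanded = channels_from_groups_sketch(["geds_on", "det009"], groups)
```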

reboost.utils.get_file_dict(stp_files, glm_files, hit_files=None)¶

Get the file info as an AttrsDict.

Creates a dbetto.AttrsDict with keys stp_files, glm_files and hit_files. Each key contains a list of file paths (or None).

Parameters:

stp_files (list[str] | str) – string or list of strings of the stp files.
glm_files (list[str] | str | None) – string or list of strings of the glm files, or None in which case the glm will be created in memory.
hit_files (list[str] | str | None) – string or list of strings of the hit files, if None the output files will be created in memory.

Return type:

AttrsDict

reboost.utils.get_file_list(path, threads=None)¶

Get a list of files accounting for the multithread index.

Parameters:

path (str | None)
threads (int | None)

Return type:

list[str]

reboost.utils.get_function_string(expr, aliases=None)¶

Get a function call to evaluate.

Search for any substrings matching the pattern for a function call. We also detect any cases of aliases being used, by default just for numpy as np and awkward as ak. In this case, the full name is replaced with the alias in the expression and also in the output globals dictionary.

It is possible to chain together functions, e.g.:

ak.num(np.array([1, 2]))

and all packages will be imported.

Parameters:

expr (str) – expression to evaluate.
aliases (dict | None) – dictionary of package aliases for names used in the expression. These allow shorter names to be given to packages. This is combined with two defaults, ak for awkward and np for numpy. If None is supplied only these are used.

Returns:

a tuple of call string and dictionary of the imported global packages.

Return type:

tuple[str, dict]

reboost.utils.get_wo_mode(group, out_det, in_det, chunk, new_hit_file, overwrite=False)¶

Get the mode for lh5 file writing.

Parameters:

group (int)
out_det (int)
in_det (int)
chunk (int)
new_hit_file (bool)
overwrite (bool)

reboost.utils.merge_dicts(dict_list)¶

Merge a list of dictionaries, concatenating the items where they exist.

Parameters:: dict_list (list) – list of dictionaries to merge
Returns:: a new dictionary after merging.
Return type:: dict

Examples

>>> merge_dicts([{"a":[1,2,3],"b":[2]},{"a":[4,5,6],"c":[2]}])
{"a":[1,2,3,4,5,6],"b":[2],"c":[2]}
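
A minimal sketch of this kind of merge, assuming every value is a list; merge_dicts_sketch is a hypothetical stand-in and may differ from the reboost function in details such as how non-list values are handled.

```python
def merge_dicts_sketch(dict_list):
    """Illustrative only: concatenate list values that share a key."""
    merged = {}
    for mapping in dict_list:
        for key, items in mapping.items():
            # build new lists so that the input dictionaries are left untouched
            merged[key] = merged.get(key, []) + list(items)
    return merged


merged = merge_dicts_sketch([{"a": [1, 2, 3], "b": [2]}, {"a": [4, 5, 6], "c": [2]}])
```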

reboost.utils.write_lh5(hit_table, file, time_dict, out_field, out_detector, wo_mode)¶

Write the lh5 file. This function first writes the data as a struct and then appends to it.

Parameters:

hit_table (Table) – the table to write
file (str) – the file to write to
time_dict (ProfileDict) – the dictionary of timing information to update.
out_field (str) – output field
out_detector (str) – output detector name
wo_mode (str) – the mode to pass to lh5.write