Skip to content

Select runs and get values

Dmitry Romanov edited this page Mar 15, 2017 · 21 revisions

Contents:


To experiment with the examples on this page, one can download daily recreated SQLite database: https://halldweb.jlab.org/dist/rcdb.sqlite

Using connection string:

sqlite:///<path to file>/rcdb.sqlite

Or connect to readonly mysql:

mysql://rcdb@hallddb.jlab.org/rcdb


Selecting runs and getting values


Get values

Suppose, one wants to get all event_count-s and beam_current-s for production runs:

import rcdb

# Connect to database
db = rcdb.RCDBProvider("mysql://rcdb@hallddb.jlab.org/rcdb")

# Select runs and get values
table = db.select_runs("@is_production")\
          .get_values(['event_count', 'beam_current'], insert_run_number=True)
print table

As the result one gets something like:

[
[1023, 3984793, 0.145]
[1024, 4569873, 0.230]
...
]

The first column is a run number (we set insert_run_number=True above). The other two columns are 'event_count' and 'beam_current' as we gave it above.

If run number is not needed insert_run_number may be skipped:

table = db.select_runs("@is_production")
          .get_values(['event_count', 'beam_current'])

A nice way to iterate the values:

for row in table:
    event_count, beam_current = tuple(row)
    print event_count, beam_current

Run range

If one wants to apply a run range, say for a particular run period:

table = db.select_runs("@is_production", 10000, 20000)\
          .get_values(['event_count', 'beam_current'], True)

No filtration

To get values for all runs without filtration a search pattern may be skipped:

table = db.select_runs(run_min=10000, run_max=20000)\
               .get_values(['event_count', 'beam_current'], insert_run_number=True)

(note that parameter names are used here, so the python could figure function parameters out)


Sort order

The table is always sorted by run number. It is just a 'feature' of getting runs DB query (that is under the hood). However, the order in with run numbers are sorted could be changed:

table = db.select_runs(run_min=10000, run_max=20000, sort_desc=True)\
               .get_values(['event_count', 'beam_current'], insert_run_number=True)

sort_desc=True - makes rows to be sorted by descending run_number



Iterating over runs and getting conditions


Getting runs

select_runs function returns RunSelectionResult object that contains all selected runs and some other information about how the runs where selected. The RunSelectionResult implements list interface returning ```Run`-s. Thus one can do:

import rcdb
db = rcdb.RCDBProvider("mysql://rcdb@hallddb.jlab.org/rcdb")
result = db.select_runs("@is_production")
for run in result:
    print run.number

As one could guess the selected run numbers are printed as the result.


Get any condition of the run

Run has the next useful functions:

def get_conditions_by_name(self):              
    # Get all conditions and returns dictionary of condition.name -> condition
def get_condition(self, condition_name):       
    # Gets Condition object by name if such name condition exist or None
def get_condition_value(self, condition_name): 
    # Gets the condition value if such condition exist or None

So one can iterate selected runs and get any desired condition:

import rcdb
db = rcdb.RCDBProvider("mysql://rcdb@hallddb.jlab.org/rcdb")
result = db.select_runs("@is_production")
for run in result:
    print run.get_condition_value('event_count')



Performance

In the performance point of view, the fastest way to get values by using

db.select_runs(...).get_values(...)

Because get_values makes just a single database call to retrieve all values for selected runs.

In case of iterating:

result = db.select_runs("@is_production")
for run in result:
    print run.get_condition_value('event_count')

Database is queried on each get_condition_value