GitHub - stevedh/readingdb: readingdb time series database

readingdb is a time-series database designed for efficiency and speed.

Time series data is any data which is a series of (time, sequence,
value) points.  ReadingDB buckets and compresses your data using a
delta encoding and zlib, and then writes the result into a Berkeley DB
(bdb) installation with a bdb index.  It uses the bdb transaction
manager for write-ahead logging so that the volume will not become
corrupted.
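
To make the "delta encoding and zlib" step concrete, here is a minimal
sketch of the general technique.  It is not readingdb's actual on-disk
format: the record layout ("<IId") and the names pack_bucket and
unpack_bucket are assumptions made for illustration only.

```python
import struct
import zlib

# Hypothetical record layout: timestamp delta, seqno, value.
RECORD = "<IId"


def pack_bucket(points):
    """Delta-encode sorted (timestamp, seqno, value) tuples, then deflate."""
    prev_ts = 0
    raw = bytearray()
    for ts, seqno, value in points:
        # Store the gap to the previous timestamp, not the absolute time;
        # regularly sampled data then repeats the same bytes over and over.
        raw += struct.pack(RECORD, ts - prev_ts, seqno, value)
        prev_ts = ts
    return zlib.compress(bytes(raw))


def unpack_bucket(blob):
    """Inverse of pack_bucket: inflate, then rebuild absolute timestamps."""
    raw = zlib.decompress(blob)
    size = struct.calcsize(RECORD)
    points, ts = [], 0
    for offset in range(0, len(raw), size):
        delta, seqno, value = struct.unpack_from(RECORD, raw, offset)
        ts += delta
        points.append((ts, seqno, value))
    return points


# One hour of readings taken every 10 seconds.
points = [(1000 + 10 * i, 0, 20.5) for i in range(360)]
blob = pack_bucket(points)
```

Because the deltas of evenly spaced readings are identical, the
compressed bucket is far smaller than the raw records, and
unpack_bucket(blob) returns the original points.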

To use it, first follow the instructions in INSTALL to build the
database and the python interface module.  The key objects in
readingdb are "streams", which have an integer id.  The readingdb
python module lets you talk to the server process; traffic between the
client and server is encoded using google protocol buffer definitions
found in c6/pbuf.

A simple python script for inserting data would be:

--
import readingdb as rdb

# specify default host/port
rdb.db_setup('localhost', 4242)

# create a connection
db = rdb.db_open('localhost')

# add data.  the tuples are  (timestamp, seqno, value)
rdb.db_add(db, 1, [(x, 0, x) for x in xrange(0, 100)])

# read back the data we just wrote using the existing connection
# the args are streamid, start_timestamp, end_timestamp
print rdb.db_query(1, 0, 100, conn=db)
# close
rdb.db_close(db)

# read back the data again using a connection pool this time.  You can
# specify a list of streamids to range-query multiple streams at once.
rdb.db_query([1], 0, 100) 
--

ReadingDB supports efficient range querying and iteration using the
db_query, db_prev, and db_next operations; you can delete data with
db_del.

As you can see, db_query can re-use an existing connection if desired.
As of April 20, 2012, the result of querying the TSDB is returned as a
list of numpy matrices.  Numpy matrices are a very memory-efficient
data structure, and creating them using the C API is a lot more
efficient than building them later in python.

If no connection is specified for db_query/db_prev/db_next, new
connections will be opened to the host/port specified with db_setup.
The most efficient way of downloading data from a large number of
streams is to specify a list of streamids, which allows the client
library to conduct multiple parallel downloads of the data.  Using
this approach we have observed readingdb easily saturating a 100Mb
NIC.

Sketches
--------

As of 0.7.0 (October, 2014), readingdb supports computing sketches
over the data in order to allow for more interactive performance on
very large data sets.  As of this release, it supports precomputing
min, max, mean, and count at 5-minute, 15-minute, and 1-hour resolutions.
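
As an illustration of what such a sketch contains, the following
computes min, max, mean, and count per window in plain python.  It is
a sketch of the idea only, not the reading-sketch implementation: the
function name sketch_windows and the choice to align windows to
multiples of the resolution are assumptions.

```python
from collections import OrderedDict


def sketch_windows(points, resolution):
    """Return {window_start: (min, max, mean, count)} for (timestamp, value) points."""
    windows = OrderedDict()
    for ts, value in sorted(points):
        # Assumed alignment: each window starts at a multiple of the resolution.
        start = ts - (ts % resolution)
        windows.setdefault(start, []).append(value)
    return OrderedDict(
        (start, (min(vals), max(vals), sum(vals) / float(len(vals)), len(vals)))
        for start, vals in windows.items())


readings = [(0, 1.0), (60, 3.0), (299, 2.0),
            (300, 10.0), (3599, 4.0), (3600, 7.0)]
five_min = sketch_windows(readings, 300)    # 5-minute resolution
hourly = sketch_windows(readings, 3600)     # 1-hour resolution
```

Here hourly[0] is (1.0, 10.0, 4.0, 5): five readings fall in the first
hour, so a client plotting a long time range can fetch one row per
window instead of every raw point.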

By default, the behavior is the same as before.  If sketches are
enabled on reading-server (by starting it with the -r flag), it will
stream a log of regions of streams which have new data to disk; these
are the places where the sketches should be updated.

The sketches may be updated using the new reading-sketch program; when
run, it reads in the log, computes new sketches for the regions of
time which have changed, and exits.  This should be called
periodically (e.g., by cron); the debian package installs a disabled
crontab for this purpose into /etc/cron.d/readingdb.
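
The log-then-recompute pattern described above can be sketched as
follows.  The log record shape (streamid, start, end) and the name
dirty_windows are hypothetical; the real log format written by
reading-server is not documented here.

```python
def dirty_windows(regions, resolution):
    """Map each stream id to the sorted window starts touched by its dirty regions."""
    dirty = {}
    for streamid, start, end in regions:
        # First window overlapping the region, assuming aligned windows.
        first = start - (start % resolution)
        for window in range(first, end + 1, resolution):
            dirty.setdefault(streamid, set()).add(window)
    return dict((sid, sorted(ws)) for sid, ws in dirty.items())


# Hypothetical log: two overlapping writes to stream 1, one point in stream 2.
log = [(1, 120, 400), (1, 350, 610), (2, 7200, 7200)]
todo = dirty_windows(log, 300)
```

Overlapping writes collapse into a single set of windows per stream,
so each window is recomputed once per run however many times it was
written since the last run.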

Clients may request sketches through the new sketch kwarg to db_query.
For instance:

--
import time

# load hourly minima from stream number 2, over all time.
min_data = rdb.db_query([2], 0, int(time.time()), sketch=("min", 3600))[0]
--

The value should be a tuple of (sketchname, resolution (in seconds)).
The server will raise an exception on the client if an invalid sketch
is passed.  
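
If you would rather catch a bad sketch before making a request, a
client-side check along these lines may help.  The helper check_sketch
is hypothetical and not part of the readingdb module; the allowed
names and resolutions are simply the ones listed for the 0.7.0
release, so a newer server may accept more.

```python
# Sketches listed for readingdb 0.7.0; treat these tables as assumptions.
SKETCH_NAMES = ("min", "max", "mean", "count")
SKETCH_RESOLUTIONS = (300, 900, 3600)   # 5 minutes, 15 minutes, 1 hour


def check_sketch(sketch):
    """Return the (sketchname, resolution) tuple unchanged, or raise ValueError."""
    name, resolution = sketch
    if name not in SKETCH_NAMES:
        raise ValueError("unknown sketch name: %r" % (name,))
    if resolution not in SKETCH_RESOLUTIONS:
        raise ValueError("unsupported resolution: %r seconds" % (resolution,))
    return (name, resolution)


hourly_min = check_sketch(("min", 3600))
# Intended use, with a running server:
#   rdb.db_query([2], 0, end_time, sketch=hourly_min)
```

The server remains the authority on which sketches exist; this only
turns an obvious typo into a local error instead of a failed request.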

Note: older servers may instead return the underlying data.  Older
client libraries will raise an exception since sketch is not a valid
kwarg there.

Dependencies
------------

For reading-server
   libdb4.8, libdb4.8-dev (berkeley database)
   libprotobuf, libprotobuf-dev (google protocol buffers)
   libprotoc6, libprotoc6-dev (c bindings for protobufs)
   zlib, zlib-dev (for compression)
   gcc, make, automake

For python bindings
   python, python-dev, python-numpy (python deps)
   libdb4.8-dev
   swig (interface generator)