GWOSC API Examples
In this tutorial, we will use the sample API and a segment list to read in some GWOSC data that we know to be of good quality. This tutorial will emphasize reading in data only for target GPS segments.
The method getstrain() is the main tool in the API for loading GWOSC data. Instead of taking the file name as input, getstrain() takes a GPS start and stop time, and uses the GWOSC file naming convention to find the right file to read in. So, you can download a lot of GWOSC data files to a directory or a directory tree. Then, you can tell getstrain() only the times and instrument for which you want data, and the API will find the files and load the data for you.
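The file lookup rests on the naming convention visible in the file names used later in this tutorial, such as H-H1_LOSC_4_V1-842653696-4096.hdf5, where the last two fields are the GPS start time and the duration in seconds. The sketch below is not part of readligo: parse_gwosc_name and covers are hypothetical helpers, and the regular expression assumes that every file follows the pattern of that one example.

```python
import re

# Assumed pattern, inferred from names like H-H1_LOSC_4_V1-842653696-4096.hdf5:
# <site>-<detector>_<label>-<GPS start>-<duration in seconds>.<extension>
NAME_RE = re.compile(
    r"^(?P<site>[A-Z])-(?P<ifo>[A-Z]\d)_(?P<label>.+)"
    r"-(?P<start>\d+)-(?P<duration>\d+)\.(?P<ext>hdf5|gwf)$"
)

def parse_gwosc_name(filename):
    """Hypothetical helper: split a data file name into its fields."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError("unrecognised file name: " + filename)
    fields = match.groupdict()
    fields["start"] = int(fields["start"])
    fields["duration"] = int(fields["duration"])
    return fields

def covers(filename, gps):
    """Return True if the span [start, start + duration) of the file contains gps."""
    fields = parse_gwosc_name(filename)
    return fields["start"] <= gps < fields["start"] + fields["duration"]

info = parse_gwosc_name("H-H1_LOSC_4_V1-842653696-4096.hdf5")
print(info["ifo"], info["start"], info["duration"])
print(covers("H-H1_LOSC_4_V1-842653696-4096.hdf5", 842656000))
```

A lookup of this kind is what lets a GPS time stand in for a file name: each candidate file is checked until one covers the requested time.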
Before you start, you will need:
- An installation of ReadLIGO
- Some GWOSC data files from the Bulk Data Archive
Download some data files
The GWOSC API will work with a directory of GWOSC data files. To see how it works, try downloading a few GWOSC data files.
- Go to the Data Archive Query Form
- Choose the H1 detector
- Enter start time 842656000 and end time 842670000
- Click Continue to get a list of data files. You should see 4 data files.
- Click HDF5 to download all 4 data files, and save them in the same directory where you have been working.
Collect the data files in a single directory. Then, start the Python interpreter (or Canopy) and make sure your working directory is the directory where you stored these files.
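As a quick check on the download, the number of files to expect can be worked out from the query times. This is a rough sketch that assumes each file spans 4096 seconds and that file boundaries fall on integer multiples of 4096 s, which is consistent with the file name H-H1_LOSC_4_V1-842653696-4096.hdf5 used in this tutorial; expected_file_starts is a hypothetical helper, not part of the API.

```python
FILE_DURATION = 4096  # seconds per file, taken from the file names in this tutorial

def expected_file_starts(gps_start, gps_stop, duration=FILE_DURATION):
    """GPS start times of the files overlapping [gps_start, gps_stop).

    Assumes file boundaries fall on integer multiples of `duration`.
    """
    first = (gps_start // duration) * duration
    return list(range(first, gps_stop, duration))

starts = expected_file_starts(842656000, 842670000)
print(len(starts))
for gps in starts:
    print(gps)
```

Under those assumptions the query above maps to four files, matching the count given in the download steps.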
Load all data from a single file
The command loaddata() can be used to load all data from a single GWOSC data file.
#----------------------------------------------------------------
# Load all GWOSC data from a single file
#----------------------------------------------------------------
import readligo as rl

strain, time, dq = rl.loaddata('ligo_data/H-H1_LOSC_4_V1-842653696-4096.hdf5', 'H1')
- STRAIN is a numpy array of all strain data in the file
- TIME is a numpy array of GPS times of each sample, corresponding to STRAIN.
- DQ is a dictionary of data quality flags.
Including the optional argument tvec=False will result in returning a dictionary of GPS start, stop, and sample time instead of the TIME vector. So, with tvec set to False, the return value is (STRAIN, META, DQ).
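When only META is returned, the GPS time of any sample can still be recovered from the start time and the sample spacing. The sketch below uses a hand-written dictionary with the keys 'start', 'stop', and 'dt' that appear later in this tutorial; the values are illustrative rather than read from a file, and the three helper functions are hypothetical.

```python
# Illustrative META dictionary for one 4096-second file sampled at 4096 Hz.
# The values are filled in by hand here rather than read from a data file.
meta = {"start": 842653696, "stop": 842657792, "dt": 1.0 / 4096}

def n_samples(meta):
    """Number of strain samples implied by the META dictionary."""
    return int(round((meta["stop"] - meta["start"]) / meta["dt"]))

def sample_time(meta, index):
    """GPS time of the sample at position `index` in the strain array."""
    return meta["start"] + index * meta["dt"]

def sample_index(meta, gps):
    """Position in the strain array of the sample at or just before `gps`."""
    return int((gps - meta["start"]) / meta["dt"])

print(n_samples(meta))
print(sample_time(meta, 4096))
print(sample_index(meta, 842656000))
```

Working from META avoids holding a second array as long as the strain itself, which is presumably why the option exists.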
To loop over segments of usable strain data, you can do the following:
slice_list = rl.dq_channel_to_seglist(dq['DATA'])
for seg_slice in slice_list:    # named seg_slice to avoid shadowing the built-in slice
    time_seg = time[seg_slice]
    strain_seg = strain[seg_slice]
    # -- Process strain segment here
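To make the idea behind this loop concrete, here is a minimal sketch of how a 1 Hz channel of 0/1 flags can be turned into slices that index a faster-sampled strain array. It is an illustration only, not readligo's implementation; channel_to_slices is a hypothetical name, and the tiny sample rate of 4 Hz is chosen just to keep the output readable.

```python
def channel_to_slices(channel, fs=4096):
    """Turn a 1 Hz channel of 0/1 flags into slices into an array sampled at fs Hz."""
    slices = []
    run_start = None
    for second, flag in enumerate(channel):
        if flag and run_start is None:
            run_start = second               # a run of flagged seconds begins
        elif not flag and run_start is not None:
            slices.append(slice(run_start * fs, second * fs))
            run_start = None                 # the run ended at this second
    if run_start is not None:                # channel ended while still inside a run
        slices.append(slice(run_start * fs, len(channel) * fs))
    return slices

flags = [0, 1, 1, 0, 1]                      # made-up 1 Hz flags, five seconds long
strain = list(range(len(flags) * 4))         # stand-in "strain" sampled at 4 Hz
for seg in channel_to_slices(flags, fs=4):
    print(seg.start, seg.stop, strain[seg])
```

Each slice selects one contiguous stretch of samples for which the flag is set, so the same slice can be applied to both the time and strain arrays.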
Basic use case: Load data using a segment list
getstrain() and getsegs() will not work for releases of individual events. To work with released events, see the example above.
A common task is to read in data associated with each segment, and then perform some analysis on that data. The API is designed to make this easy.
import readligo as rl
import numpy as np

start = 842656000
stop = 842670000
segs = rl.getsegs(start, stop, 'H1')
for (begin, end) in segs:
    strain, meta, dq = rl.getstrain(begin, end, 'H1')
    #-- Your analysis code goes here
This default configuration assumes that the needed GWOSC data files are available in the current working directory or a subdirectory. The variable strain is now a numpy array of strain values, between GPS times meta['start'] and meta['stop']. The time between samples (in seconds) can be seen in the variable meta['dt'].
The data files used in this tutorial are sampled at 4096 Hz.
The return values are as follows:
- STRAIN is a numpy array of strain data between the input GPS times
- META is a dictionary of gps start, gps stop, and the sample time.
- DQ is a dictionary of data quality flags.
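One pattern that could go where the analysis placeholder sits is to break long segments into shorter chunks, so that each getstrain() call loads a bounded amount of data. The sketch below shows only the chunking step; split_segment is a hypothetical helper, and the segment values and the 64-second chunk length are made up for illustration.

```python
def split_segment(begin, end, max_len=64):
    """Yield (start, stop) chunks of at most max_len seconds covering [begin, end)."""
    start = begin
    while start < end:
        stop = min(start + max_len, end)
        yield (start, stop)
        start = stop

# Made-up segment list in the (begin, end) form used in the loop above.
segs = [(842656000, 842656100), (842657000, 842657030)]
for begin, end in segs:
    for start, stop in split_segment(begin, end, max_len=64):
        print(start, stop, stop - start)
```

Each (start, stop) pair could then be passed to getstrain() in place of the full segment.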
Segment Lists
The "Basic Use Case" example above loops over segments in a segment list. The segment list is constructed with the getsegs() method. By default, getsegs() returns Science Mode segments. However, the optional argument flag may be used to return segments based on any data quality flag:
segs = rl.getsegs(start, stop, 'H1', flag='BURST_CAT2')
Alternatively, a segment list may be downloaded from the GWOSC web site using the Timeline Query Form. A segment list as an ASCII text file may be read in using the SegmentList class.
segs = rl.SegmentList('H1_segs.txt')
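For readers who want to inspect such a file without the API, the sketch below parses segment text by hand. The layout is an assumption: one segment per line, with whitespace-separated GPS start and stop times, an optional third column that is ignored, and '#' starting a comment. parse_segment_text is a hypothetical helper and the sample values are made up.

```python
def parse_segment_text(text):
    """Parse lines of 'start stop [extra]' into a list of (start, stop) tuples."""
    segments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue                           # skip empty lines
        columns = line.split()
        segments.append((int(columns[0]), int(columns[1])))
    return segments

example = """\
# start      stop
842656000 842656500
842657000 842658200 1200
"""
for begin, end in parse_segment_text(example):
    print(begin, end, end - begin)
```

The resulting list of (start, stop) tuples has the same shape as the segments iterated over in the basic use case.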
It is also possible to construct a segment list by combining 2 or more channels from a DQ dictionary. The data quality channels contained in the DQ dictionary are sampled at 1 Hz, and may be easily combined with logical operators. The method dq2segs() will convert a 1 Hz DQ channel to a segment list.
# -- Construct segment list based on multiple DQ channels
strain, time, dq = rl.loaddata('ligo_data/H-H1_LOSC_4_V1-842653696-4096.hdf5', 'H1')
bcat3 = dq['BURST_CAT3']
cbccat3 = dq['CBCLOW_CAT3']
clean = bcat3 & cbccat3
segs = rl.dq2segs(clean, time[0])
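The logic of this example can be traced on a toy input. The sketch below combines two made-up 1 Hz channels with a logical AND and converts the result into (start, stop) GPS tuples; it uses plain lists rather than numpy arrays, and flags_to_gps_segments is a hypothetical stand-in that illustrates the idea without claiming to match the dq2segs implementation.

```python
from itertools import groupby

def flags_to_gps_segments(channel, gps_start):
    """Convert a 1 Hz channel of 0/1 flags into (gps_begin, gps_end) tuples."""
    segments = []
    offset = 0                                   # seconds since gps_start
    for flag, run in groupby(channel, key=bool):
        length = len(list(run))                  # duration of this run in seconds
        if flag:
            segments.append((gps_start + offset, gps_start + offset + length))
        offset += length
    return segments

# Made-up flag values, six seconds long.
bcat3 = [1, 1, 1, 0, 1, 1]
cbccat3 = [1, 0, 1, 1, 1, 1]
clean = [a & b for a, b in zip(bcat3, cbccat3)]  # set only where both flags are set
print(clean)
print(flags_to_gps_segments(clean, 842656000))
```

A second is kept only when both channels mark it as good, so adding more channels to the AND can only shorten the resulting segments.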
File Lists
The GWOSC API uses the class FileList to identify GWOSC data files that are stored locally. By default, the current working directory and its sub-directories are searched for GWOSC data files. If the GWOSC data files are in another directory tree, first construct a FileList, and then pass it as the parameter filelist to getsegs() and getstrain().
filelist = rl.FileList(directory='/home/ligodata')
segList = rl.getsegs(842657792, 842658792, 'H1', filelist=filelist)
for start, stop in segList:
    strain, meta, dq = rl.getstrain(start, stop, 'H1', filelist=filelist)
    # -- Analysis code here
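The directory search that a file list performs can be pictured with a few lines of standard-library code. This is a sketch of the general technique only: find_data_files is a hypothetical helper, the '*.hdf5' pattern is an assumption, and the demonstration runs on a temporary directory containing empty placeholder files rather than real data.

```python
import fnmatch
import os
import tempfile

def find_data_files(directory, pattern="*.hdf5"):
    """Recursively collect paths under `directory` whose names match `pattern`."""
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return sorted(matches)

# Demonstration on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, "H1")
    os.makedirs(sub)
    for name in ("H-H1_LOSC_4_V1-842653696-4096.hdf5", "notes.txt"):
        open(os.path.join(sub, name), "w").close()
    found = find_data_files(top)
    found_names = [os.path.basename(path) for path in found]
    print(found_names)
```

Because the walk descends into sub-directories, files can be organised per detector or per run without changing how they are located.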
For large data sets, it is also possible to save a file list as a cache file with the method filelist.writecache('cache.txt'). A previously saved cache file may be loaded as a working FileList object by passing the cache file name to the FileList constructor:
filelist = rl.FileList(cache='cache.txt')
An existing FileList may be searched for a file containing a given GPS time using the method FileList.findfile():
gps = 842656000
filename = filelist.findfile(gps, 'H1')
print("The file {0} contains time {1} for IFO H1".format(filename, gps))
Working with channel names
Some GWF files require additional parameters to name the strain, DQ, and hardware injection channels:
strain, time, dq = rl.loaddata('H-H1_LOSC_16_V1-1127415808-4096.gwf', 'H1',
                               strain_chan='H1:GWOSC-16KHZ_R1_STRAIN',
                               dq_chan='H1:GWOSC-16KHZ_R1_DQMASK',
                               inj_chan='H1:GWOSC-16KHZ_R1_INJMASK')
These channel names may be found on the documentation page for each data set: e.g. O1 documentation.
That's it! Additional documentation is available on the ReadLIGO homepage. You can also download the examples shown here as api_examples.py.