GWOSC API Examples

In this tutorial, we will use the sample API and a segment list to read in some GWOSC data that we know to be of good quality. This tutorial will emphasize reading in data only for target GPS segments.

The method getstrain() is the main tool in the API for loading GWOSC data. Instead of taking a file name as input, getstrain() takes a GPS start time, a GPS stop time, and an instrument, and uses the GWOSC file naming convention to find the right files to read. This means you can download many GWOSC data files into a directory or a directory tree, then tell getstrain() only the times and instrument you want; the API will find the files and load the data for you.
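To make "finding files by name" concrete, here is a minimal sketch of how a list of file names could be matched against a GPS interval. It is not ReadLIGO's implementation. The helpers parse_name and files_for_interval are hypothetical, and the name pattern is inferred from the LOSC-style file names used in this tutorial (for example H-H1_LOSC_4_V1-842653696-4096.hdf5), so it will not match files named under other conventions.

```python
import os
import re

# Assumed layout, inferred from names like H-H1_LOSC_4_V1-842653696-4096.hdf5:
#   <site>-<IFO>_LOSC_<rate>_<version>-<GPS start>-<duration>.<hdf5|gwf>
NAME_RE = re.compile(
    r"^(?P<site>[A-Z])-(?P<ifo>[A-Z]\d)_LOSC_(?P<rate>\d+)_(?P<version>V\d+)"
    r"-(?P<start>\d+)-(?P<duration>\d+)\.(?:hdf5|gwf)$"
)


def parse_name(path):
    """Return (ifo, gps_start, gps_stop) for a LOSC-style file name, else None."""
    match = NAME_RE.match(os.path.basename(path))
    if match is None:
        return None
    start = int(match.group("start"))
    return match.group("ifo"), start, start + int(match.group("duration"))


def files_for_interval(paths, start, stop, ifo):
    """Paths whose [gps_start, gps_stop) overlaps [start, stop) for IFO,
    sorted by GPS start time."""
    selected = []
    for path in paths:
        info = parse_name(path)
        if info is None:
            continue  # skip anything that is not a LOSC-style data file
        file_ifo, file_start, file_stop = info
        if file_ifo == ifo and file_start < stop and file_stop > start:
            selected.append((file_start, path))
    return [path for _, path in sorted(selected)]


names = [
    "H-H1_LOSC_4_V1-842657792-4096.hdf5",
    "H-H1_LOSC_4_V1-842653696-4096.hdf5",
    "L-L1_LOSC_4_V1-842653696-4096.hdf5",
    "notes.txt",
]
print(files_for_interval(names, 842656000, 842660000, "H1"))
# ['H-H1_LOSC_4_V1-842653696-4096.hdf5', 'H-H1_LOSC_4_V1-842657792-4096.hdf5']
```

The L1 file is skipped because of its instrument, and notes.txt because its name does not fit the pattern.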

For this tutorial, you will need:

An installation of ReadLIGO
Some GWOSC data files from the Bulk Data Archive

Download some data files

The GWOSC API will work with a directory of GWOSC data files. To see how it works, try downloading a few GWOSC data files.

Go to the Data Archive Query Form
Choose the H1 detector
Enter start time 842656000 and end time 842670000
Click Continue to get a list of data files. You should see 4 data files.
Click HDF5 to download all 4 data files, and save them in the same directory where you have been working.
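As a check on the count of 4 files, the arithmetic can be sketched as follows. It assumes, from the example file names rather than from any GWOSC specification, that each file spans 4096 s and begins on a multiple of 4096 s (842653696 = 205726 × 4096). covering_file_starts is a hypothetical helper.

```python
# Assumption: every file is 4096 s long and starts on a multiple of 4096 s,
# which holds for the example name H-H1_LOSC_4_V1-842653696-4096.hdf5.
FILE_DURATION = 4096


def covering_file_starts(start, stop, duration=FILE_DURATION):
    """GPS start times of the aligned files needed to cover [start, stop)."""
    first = (start // duration) * duration       # file holding the first second
    last = ((stop - 1) // duration) * duration   # file holding the last second
    return list(range(first, last + 1, duration))


starts = covering_file_starts(842656000, 842670000)
for gps in starts:
    print("H-H1_LOSC_4_V1-{0}-4096.hdf5".format(gps))
print(len(starts))  # 4
```

The query starts 2304 s into the first file and ends 4016 s into the fourth, which is why four files are needed for a 14000 s request.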

Collect the data files in a single directory. Then start the Python interpreter (or Canopy) and make sure your working directory is the directory where you stored these files.

Load all data from a single file

The command loaddata can be used to load all data from a single GWOSC data file.

#----------------------------------------------------------------
# Load all GWOSC data from a single file
#----------------------------------------------------------------
import readligo as rl

# -- Adjust the path if you saved the file somewhere else
strain, time, dq = rl.loaddata('ligo_data/H-H1_LOSC_4_V1-842653696-4096.hdf5', 'H1')

STRAIN is a NumPy array of strain data covering the whole file.
TIME is a NumPy array of the GPS time of each sample, corresponding to STRAIN.
DQ is a dictionary of data quality flags.

Passing the optional argument tvec=False makes loaddata return a dictionary of the GPS start time, GPS stop time, and sample time instead of the TIME vector. With tvec=False, the return value is therefore (STRAIN, META, DQ).
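If you use tvec=False but later need per-sample times, they can be rebuilt from META. A minimal sketch, assuming META carries the keys 'start' and 'dt' that this tutorial describes for getstrain(); time_vector is a hypothetical helper, and the META values are made up for illustration.

```python
def time_vector(meta, n_samples):
    """Per-sample GPS times rebuilt from a META dictionary.

    Assumes META['start'] is the GPS time of the first sample and
    META['dt'] is the spacing between samples in seconds.
    """
    return [meta['start'] + i * meta['dt'] for i in range(n_samples)]


# Hypothetical META for one second of data sampled at 4096 Hz
meta = {'start': 842653696, 'stop': 842653697, 'dt': 1.0 / 4096}
n = int(round((meta['stop'] - meta['start']) / meta['dt']))
times = time_vector(meta, n)
print(n, times[0], times[-1])
```

With NumPy, the equivalent one-liner is meta['start'] + meta['dt'] * np.arange(len(strain)).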

To loop over segments of usable strain data, you can do the following:

# -- Each element of slice_list is a slice into the strain and time arrays
slice_list = rl.dq_channel_to_seglist(dq['DATA'])
for seg_slice in slice_list:
    time_seg = time[seg_slice]
    strain_seg = strain[seg_slice]
    # -- Process strain segment here
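The slices returned by dq_channel_to_seglist() index strain and time directly. To show where such slices can come from, here is a minimal, hypothetical re-implementation of the idea (not ReadLIGO's code): each 1 Hz DQ sample covers fs strain samples, so a run of passing seconds maps to one slice. The name dq_channel_to_slices is made up, and fs=4 in the demo only keeps the output short; the files in this tutorial are sampled at 4096 Hz.

```python
def dq_channel_to_slices(channel, fs):
    """Convert a 1 Hz DQ channel into slices of a strain array sampled at fs Hz.

    Each DQ sample covers one second, i.e. fs strain samples, so a run of
    passing seconds [i, j) maps to strain indices [i * fs, j * fs).
    """
    slices = []
    run_start = None
    for second, value in enumerate(channel):
        if value and run_start is None:
            run_start = second            # a passing run begins
        elif not value and run_start is not None:
            slices.append(slice(run_start * fs, second * fs))
            run_start = None              # the run ended at this second
    if run_start is not None:             # run extends to the end of the channel
        slices.append(slice(run_start * fs, len(channel) * fs))
    return slices


dq_data = [1, 1, 0, 1]
print(dq_channel_to_slices(dq_data, fs=4))
# [slice(0, 8, None), slice(12, 16, None)]
```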

Basic use case: Load data using a segment list

The methods getstrain() and getsegs() do not work with data released for individual events. To work with event data, use loaddata() as in the examples above.

A common task is to read in data associated with each segment, and then perform some analysis on that data. The API is designed to make this easy.

import readligo as rl
import numpy as np
start = 842656000
stop  = 842670000

segs = rl.getsegs(start, stop, 'H1')
for (begin, end) in segs:
    strain, meta, dq = rl.getstrain(begin, end, 'H1')
    #-- Your analysis code goes here

This default configuration assumes that the needed GWOSC data files are in the current working directory or one of its subdirectories. The variable strain is now a NumPy array of strain values between GPS times meta['start'] and meta['stop']. The time between samples, in seconds, is given by meta['dt']. The data files used in this tutorial are sampled at 4096 Hz, so meta['dt'] is 1/4096 s; some GWOSC data sets, such as the 16 KHZ files shown at the end of this tutorial, are sampled at 16384 Hz.
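Because meta['start'] and meta['dt'] fully describe the sampling, a GPS time of interest can be converted into an index into strain. A minimal sketch, where gps_to_index is a hypothetical helper and meta is a made-up dictionary shaped like the one getstrain() returns:

```python
def gps_to_index(gps, meta):
    """Index of the strain sample at, or just before, GPS time GPS.

    Assumes META has 'start', 'stop', and 'dt' (seconds per sample),
    like the dictionary getstrain() returns.
    """
    if not meta['start'] <= gps < meta['stop']:
        raise ValueError("GPS time {0} is outside this segment".format(gps))
    return int((gps - meta['start']) / meta['dt'])


# Hypothetical META for a 1000 s segment sampled at 4096 Hz
meta = {'start': 842656000, 'stop': 842657000, 'dt': 1.0 / 4096}
n_samples = int(round((meta['stop'] - meta['start']) / meta['dt']))
print(n_samples)                        # 4096000
print(gps_to_index(842656010.5, meta))  # 43008
```

Raising an error outside [start, stop) avoids silently indexing into the wrong segment when looping over several segments.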

The return values are as follows:

STRAIN is a NumPy array of strain data between the input GPS times.
META is a dictionary of the GPS start time, GPS stop time, and sample time.
DQ is a dictionary of data quality flags.

Segment Lists

The "Basic Use Case" example above loops over segments in a segment list. The segment list is constructed with the getsegs() method. By default, getsegs() returns Science Mode segments. However, the optional argument flag may be used to return segments based on any data quality flag:

segs = rl.getsegs(start, stop, 'H1', flag='BURST_CAT2')

Alternatively, a segment list may be downloaded from the GWOSC web site using the Timeline Query Form. A segment list saved as an ASCII text file can be read in with the SegmentList class:

segs = rl.SegmentList('H1_segs.txt')

It is also possible to construct a segment list by combining two or more channels from a DQ dictionary. The data quality channels in the DQ dictionary are sampled at 1 Hz and can easily be combined with logical operators. The method dq2segs() converts a 1 Hz DQ channel, together with the GPS time of its first sample, into a segment list.

# -- Construct segment list based on multiple DQ channels
strain, time, dq = rl.loaddata('ligo_data/H-H1_LOSC_4_V1-842653696-4096.hdf5', 'H1')
bcat3 = dq['BURST_CAT3']
cbccat3 = dq['CBCLOW_CAT3']
clean = bcat3 & cbccat3
segs = rl.dq2segs(clean, time[0])
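The combination works because each channel holds one 0/1 value per second, so & keeps only the seconds that pass both flags (for values other than 0 and 1, bitwise & is not the same as logical AND). The sketch below illustrates this, plus a conversion to GPS (start, stop) pairs of the kind dq2segs() describes, using plain Python lists instead of NumPy arrays. dq_to_gps_segments and the flag values are hypothetical; it assumes sample k covers the second starting at gps_start + k, and it is not ReadLIGO's implementation.

```python
from itertools import groupby


def dq_to_gps_segments(channel, gps_start):
    """Turn a 1 Hz DQ channel into [(start, stop), ...] GPS segments.

    Sample k of the channel is assumed to describe the second
    [gps_start + k, gps_start + k + 1).
    """
    segments = []
    k = 0
    for passing, group in groupby(channel, key=bool):
        length = len(list(group))          # length of this run of equal values
        if passing:
            segments.append((gps_start + k, gps_start + k + length))
        k += length
    return segments


# Two hypothetical 1 Hz flags (1 = the second passes the flag)
bcat3 = [1, 1, 1, 0, 1, 1]
cbccat3 = [0, 1, 1, 1, 1, 0]
clean = [a & b for a, b in zip(bcat3, cbccat3)]  # element-wise AND
print(clean)                                     # [0, 1, 1, 0, 1, 0]
print(dq_to_gps_segments(clean, 842653696))
# [(842653697, 842653699), (842653700, 842653701)]
```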

File Lists

The GWOSC API uses the class FileList to identify GWOSC data files that are stored locally. By default, the current working directory and its subdirectories are searched for GWOSC data files. If the GWOSC data files are in another directory tree, first construct a FileList, and then pass it to getsegs() and getstrain() as the filelist parameter.

filelist = rl.FileList(directory='/home/ligodata')
seg_list = rl.getsegs(842657792, 842658792, 'H1', filelist=filelist)
for start, stop in seg_list:
    strain, meta, dq = rl.getstrain(start, stop, 'H1', filelist=filelist)
    # -- Analysis code here
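FileList performs the directory search itself. Purely to illustrate what "searching a directory tree" involves, here is a minimal sketch using os.walk. find_gwosc_files is a hypothetical helper, and its rule for recognizing a data file ('_LOSC_' in the name plus a .hdf5 or .gwf extension) is an assumption based on the file names in this tutorial, not on how FileList decides.

```python
import os
import tempfile


def find_gwosc_files(directory, extensions=(".hdf5", ".gwf")):
    """Recursively list files under DIRECTORY that look like GWOSC data files.

    Assumption: a data file has '_LOSC_' in its name (as in this tutorial's
    examples) and ends in one of EXTENSIONS.
    """
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if "_LOSC_" in name and name.endswith(extensions):
                found.append(os.path.join(root, name))
    return sorted(found)


# Usage: build a small throwaway tree and search it
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "run1"))
    for rel in ("H-H1_LOSC_4_V1-842653696-4096.hdf5",
                os.path.join("run1", "H-H1_LOSC_4_V1-842657792-4096.hdf5"),
                "notes.txt"):
        open(os.path.join(top, rel), "w").close()
    demo = [os.path.relpath(p, top) for p in find_gwosc_files(top)]

print(demo)
```

Walking a large tree on every call can be slow, which is one reason to save the result, as the cache file described next does.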

For large data sets, it is also possible to save a file list as a cache file with the method filelist.writecache('cache.txt'). A previously saved cache file may be loaded as a working FileList object by passing the cache file name to the FileList constructor:

filelist = rl.FileList(cache='cache.txt')

An existing FileList can be searched for the file containing a given GPS time with the method FileList.findfile():

gps = 842656000
filename = filelist.findfile(gps, 'H1')
print("The file {0} contains time {1} for IFO H1".format(filename, gps))

Working with channel names

Some GWF files require additional parameters that name the strain, DQ, and hardware injection channels:

strain, time, dq = rl.loaddata('H-H1_LOSC_16_V1-1127415808-4096.gwf', 'H1',
    strain_chan='H1:GWOSC-16KHZ_R1_STRAIN',
    dq_chan='H1:GWOSC-16KHZ_R1_DQMASK',
    inj_chan='H1:GWOSC-16KHZ_R1_INJMASK')

These channel names can be found on the documentation page for each data set, for example the O1 documentation.

That's it! Additional documentation is available on the ReadLIGO homepage. You can also download the examples shown here as api_examples.py.