Tutorial: Getting Data Automatically

In this tutorial, we'll fetch LIGO data associated with specific times, doing it all by code and web services rather than by clicking.

Automatic data discovery

Let us assume we want to investigate a number of astrophysical events, each precisely timed, and we want to run a search in the minutes around each event. Suppose we have GPS times for each event. As an example, we can take the times of the 22 gamma-ray bursts that happened during LIGO's S5 run, as listed here, taken from this paper.

Here are the first few, written as a Python list:

eventList = [
    [815976703, 'GRB051114'],
    [818228794, 'GRB051210'],
    [818304618, 'GRB051211'],
    [821917508, 'GRB060121'],
]

Obviously we could build a list like this in many ways: the data model is an 'event' as a pair (GPStime, name), and we have a list of N of these. Even though the list above has only 22 events, let us try to build the software for N >> 1.
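As a quick sanity check on such a list, a GPS time can be converted to an approximate UTC date with only the standard library. This is a sketch, not part of the tutorial's own code: GPS time starts at 1980-01-06 and, unlike UTC, has no leap seconds, so this simple conversion is off by the leap seconds accumulated since 1980 (about 14 s during S5), which does not matter for checking a date.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)

def gps_to_utc_approx(gps_seconds):
    """Approximate UTC from GPS seconds, ignoring leap seconds (~14 s in 2005)."""
    return GPS_EPOCH + timedelta(seconds=gps_seconds)

print(gps_to_utc_approx(815976703).date())  # 2005-11-14, matching the name GRB051114
```

The date encoded in each GRB name (e.g. 051114 = 2005-11-14) should agree with the converted GPS time.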

Fetching Timeline data: Is There Good Data?

First we will want to find out which detectors were operating when, or more specifically, when they were operating with good data quality. You can code against this with the JSON version of the Timeline tool. Here is code to make this decision: it asks for the duty cycle of the CAT2 data for H1 in the 128 seconds surrounding a given time:

import json
import urllib.request

def fetchTimeline(timelineID, GPSstart, GPSend, level, detector, dataset='S5'):
    urlformat = 'https://www.gw-osc.org/timeline/json/{0}/{1}_{2}/{3}/{4}/{5}/'
    url = urlformat.format(dataset, detector, timelineID, GPSstart, GPSend - GPSstart, level)
    print("Fetching", url)
    r = urllib.request.urlopen(url).read()
    timelines = json.loads(r)
    return timelines[0]   # the response is a list of timelines; we asked for one

for event in eventList:
    t = event[0]       # GPS time of a GRB
    detector = 'H1'    # one of the detectors we are interested in
    level = 7          # so that 2^level = 128 seconds, about 2 minutes
    timelineData = fetchTimeline('CBCLOW_CAT2', t, t, level, detector)
    duty = timelineData[0][1]    # it returns a list that has one [time, duty] pair
    print("%s CAT2 duty cycle over 128 seconds: %3.2f" % (detector, duty))

Thus we can select for further study only those GRBs where there is good data (passes CAT2) from both detectors (H1 and L1).
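The selection step just described can be sketched without any network access. The helper `passes_cat2` and the duty-cycle numbers below are illustrative assumptions, not real S5 values:

```python
def passes_cat2(duty_by_detector, threshold=1.0):
    """True if every detector's CAT2 duty cycle meets the threshold."""
    return all(duty >= threshold for duty in duty_by_detector.values())

# Hypothetical duty cycles for two GRBs (made-up numbers, not real data):
duties = {
    'GRB051114': {'H1': 1.00, 'L1': 1.00},   # both detectors good
    'GRB051210': {'H1': 0.40, 'L1': 1.00},   # H1 fails CAT2 part of the time
}
selected = [name for name, d in duties.items() if passes_cat2(d)]
print(selected)  # only GRBs with good data in both H1 and L1
```

In practice the inner dictionaries would be filled by calling fetchTimeline once per detector for each event.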

Fetching the Strain Data

For those GRBs where there is good data, we can download the 4096-second HDF5 files of strain data. These will be up to 120 Mbyte in size.

def fetchStrain(GPStime, detector, dataset='S5'):
    observatory = detector[0]           # first letter of the detector, H or L
    hour        = GPStime & 0xFFFFF000  # the filename: round down to a multiple of 4096
    fortnight   = GPStime & 0xFFF00000  # the directory: round down to a multiple of 4096*256
    filename = '{0}-{1}_LOSC_4_V1-{2}-4096.hdf5'.format(observatory, detector, hour)
    urlformat = 'https://www.gw-osc.org/archive/data/{0}/{1}/{2}'
    url = urlformat.format(dataset, fortnight, filename)
    print("Fetching data file from", url)
    r = urllib.request.urlopen(url).read()
    with open(filename, 'wb') as f:   # write it to the right filename
        f.write(r)

t = eventList[3][0]   # GPS time of a GRB
detector = 'H1'
fetchStrain(t, detector)
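The bit masks in fetchStrain work because 4096 = 2^12 and 4096*256 = 2^20: clearing the low 12 (or 20) bits of a GPS time rounds it down to a multiple of the file length (or the directory span). A quick check, using the GRB060121 time from the event list:

```python
t = 821917508                  # GPS time of GRB060121
hour      = t & 0xFFFFF000     # round down to a multiple of 4096
fortnight = t & 0xFFF00000     # round down to a multiple of 4096*256

# Clearing low bits is the same as integer division followed by multiplication:
assert hour      == (t // 4096) * 4096
assert fortnight == (t // (4096 * 256)) * (4096 * 256)
print(hour, fortnight)
```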

Now that we have the data files, we can go back to the earlier tutorials, read in the data, and run a search at the times of the GRBs.
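Once a file is read in, locating the samples around a GPS time is pure index arithmetic. This sketch uses a zero-filled stand-in array rather than real HDF5 contents; the 4 kHz files hold strain sampled at 4096 Hz:

```python
import numpy as np

fs = 4096                      # sample rate of the 4 kHz files, in Hz
t = 821917508                  # GPS time of a GRB
file_start = t & 0xFFFFF000    # GPS start time of the 4096-second file holding t

strain = np.zeros(4096 * fs)   # stand-in for the file's strain vector

# Take 64 seconds on each side of the event (128 seconds total):
i0 = (t - file_start - 64) * fs
i1 = (t - file_start + 64) * fs
segment = strain[i0:i1]
print(len(segment) / fs, "seconds of data around the event")
```

If an event falls within 64 seconds of a file boundary, the neighboring 4096-second file is needed as well.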

Fetching the Data Catalog

Another way to look at data quality is through the GWOSC data catalog. For each of the 4096-second data files, there are statistics available about the proportion of time (duty cycle) that the various data quality flags are on, similar to the 128-second averages computed above from the Timelines. There are also gross statistics about the strain vector during the time of the file (the minimum, maximum, mean, and standard deviation), as well as band-limited RMS values in three bands: 30-40 Hz, 40-100 Hz, and 100-1000 Hz.
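The gross strain statistics in the catalog are the ordinary minimum, maximum, mean, and standard deviation. For a strain vector already in memory they can be reproduced with numpy; the array below is random stand-in data, not real strain:

```python
import numpy as np

rng = np.random.default_rng(0)
strain = rng.normal(0.0, 1e-21, size=4096 * 16)   # fake strain, not real data

# The same four gross statistics the catalog reports for each file:
stats = {
    'min':  strain.min(),
    'max':  strain.max(),
    'mean': strain.mean(),
    'std':  strain.std(),
}
for name, value in stats.items():
    print("%s = %g" % (name, value))
```

Comparing statistics like these against the catalog values is a cheap consistency check after downloading a file.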

def fetchTileCatalog(GPSstart, GPSend, detector, dataset='S5'):
    urlformat = 'https://www.gw-osc.org/archive/links/{0}/{1}/{2}/{3}/json/'
    url = urlformat.format(dataset, detector, GPSstart, GPSend)
    print("Tile catalog URL is", url)
    r = urllib.request.urlopen(url).read()
    tiles = json.loads(r)
    attrnames = tiles[0]   # first row of the JSON is the column names
    rows = tiles[1:]       # subsequent records are rows of the table
    return (attrnames, rows)

t = eventList[3][0]   # GPS time of a GRB
detector = 'H1'
(attrnames, rows) = fetchTileCatalog(t, t, detector)
print("Data for %s at time %d" % (detector, t))
for row in rows:   # expect only one row, since GPSstart == GPSend
    for i in range(len(attrnames)):
        print("%s = %s" % (attrnames[i], row[i]))

There is another tutorial about automatically downloading all the data from a Run, or all the open data from all Events.