O1 Data Set Technical Details
GWOSC data downsampling and repackaging
We have chosen to repackage our data to make it more accessible to casual users, both within the LVC and outside it.
- Frame files are delivered from the CIT cluster. However, the frame format is unfamiliar to people outside the GW community, and a "lightweight" frame reader is not readily available. The data are therefore converted to HDF5, eliminating the need for a frame reader. HDF5 is a popular format (easily readable in Python, MATLAB, Mathematica, C, ...) and will remain readable for many years. Frame files are also released (repackaged as described below), in case the user already has frame-reading software.
- The strain data are resampled from 16384 Hz to 4096 Hz. Almost all LVC searches already do this in pre-processing, because of the increased shot noise and the dearth of astrophysical source targets at higher frequencies. The data quality (DQ) is less well studied above 2 kHz, and the strain calibration is valid only up to 5 kHz. This resampling reduces the size of our data by a factor of 4, making downloads faster and reducing disk-space needs for our users.
- The downsampling is done using the Python package SciPy, with the function scipy.signal.decimate.
- Advanced LIGO data are not calibrated or valid below 10 Hz or above 5 kHz, and the data sampled at 4096 Hz are not valid above 2 kHz. In most searches for astrophysical sources, data below 20 Hz are not used because the noise is too high.
- Our HDF5/frame files have a fixed duration (4096 s) and fixed boundaries. Tutorial 4 presents a user API to fetch the data and load it into Python, giving users access to a list of data segments. This approach is now also adopted for aLIGO frames.
- We provide Timelines and My Sources to help the user find data (including DQ and HW injection info) from a particular time, instead of using segDB queries. From Timeline, you can see multiple DQ and injection flags, zoom in, and download segments.
- The data quality (DQ) and hardware injections (HW) are summarized in 1Hz vectors, in both the hdf5 and frame files. This approach is now also adopted for aLIGO frames.
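As a concrete illustration of the resampling step described above, the sketch below downsamples a synthetic 16384 Hz time series to 4096 Hz with scipy.signal.decimate. The filter choices (ftype='fir', zero_phase=True) and the synthetic tone are illustrative assumptions, not the exact production settings used to prepare the GWOSC files.

```python
import numpy as np
from scipy.signal import decimate

fs_in, fs_out = 16384, 4096
factor = fs_in // fs_out  # 4, the factor quoted above

# One second of a synthetic 100 Hz tone standing in for strain data.
t = np.arange(fs_in) / fs_in
strain = np.sin(2 * np.pi * 100 * t)

# FIR anti-aliasing filter applied forward and backward (zero phase),
# then every 4th sample is kept.
down = decimate(strain, factor, ftype='fir', zero_phase=True)
```

After decimation the array holds 4096 samples per second, and a tone well below the new Nyquist frequency (2048 Hz) passes through with essentially unit amplitude.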
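The bullets above describe the HDF5 contents in general terms; the sketch below shows one way to read the strain and the 1 Hz DQ bitmask with h5py. The dataset paths (strain/Strain, quality/simple/DQmask, quality/simple/DQShortnames) follow the usual GWOSC layout but should be checked against your file; the tiny file written here is synthetic, for demonstration only.

```python
import os
import tempfile
import numpy as np
import h5py

def read_gwosc_hdf5(path):
    """Read strain and the 1 Hz data-quality bitmask from a GWOSC-style
    HDF5 file. Dataset paths follow the usual GWOSC convention; adjust
    them if your file differs."""
    with h5py.File(path, 'r') as f:
        strain = f['strain/Strain'][:]
        dt = f['strain/Strain'].attrs['Xspacing']   # sample spacing (s)
        dq_mask = f['quality/simple/DQmask'][:]     # one integer per second
        dq_names = [n.decode() for n in f['quality/simple/DQShortnames'][:]]
    return strain, dt, dq_mask, dq_names

# Build a tiny synthetic file with the same layout, purely for demonstration.
path = os.path.join(tempfile.gettempdir(), 'gwosc_demo.hdf5')
with h5py.File(path, 'w') as f:
    ds = f.create_dataset('strain/Strain', data=np.zeros(3 * 4096))
    ds.attrs['Xspacing'] = 1.0 / 4096
    f.create_dataset('quality/simple/DQmask',
                     data=np.array([3, 3, 1], dtype=np.uint32))
    f.create_dataset('quality/simple/DQShortnames',
                     data=np.array([b'DATA', b'CBC_CAT1']))

strain, dt, dq, names = read_gwosc_hdf5(path)

# Each second passes a given flag if the corresponding bit of the mask is set.
data_ok = (dq >> names.index('DATA')) & 1
```

Testing bit i of each 1 Hz mask value against entry i of the short-name list is how the per-second flags are recovered; here the DATA bit is on for all three seconds of the synthetic file.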
Note about the O1 start and stop dates
The official start date for the first Advanced LIGO observation run, O1, was 2015 Sep 18. It was preceded by Engineering Run 8 (ER8), which began on 2015 Aug 17. However, the detectors were running well and were well calibrated by 2015 Sep 12, and from that time on they were left in undisturbed observation mode for long periods, as an "unofficial" early start to O1. This turned out to be a good thing, because GW150914 arrived two days later! For the purposes of this data release, all of the strain data from 2015 Sep 12 0:00 UTC (GPS 1126051217) to 2016 Jan 19 16:00 UTC (GPS 1137254417) are referred to as "O1" data.
Notes about the DATA flag
See the Defining the DATA Flag page.
O1 DQ flags
During O1, the Burst and CBC groups removed hardware injections at various DQ category levels. However, the GWOSC data do not mask out data containing injections; instead, lists of the injection times are provided. A search pipeline should therefore expect to find many "chirps", which can then be matched against the injection lists. This differs from the conventional (LSC) search, which does not find injections because it analyzes data from which injection times have already been removed.
Note about hardware injections
The hardware injection timelines (BURST and CBC) indicate that injections were happening at some times where GWOSC does not supply data, that is, where the DATA bit is off. These should be ignored. The correct lists of injections are here.
Note about UTC Time in O1 HDF5 files
A one-second offset was found and fixed in the UTC times stored in the O1 HDF5 files. The GPS times in these files were not affected. (The GWF files were also not affected, as they store only GPS times.) As of May 25, 2018, all O1 (4 kHz and 16 kHz) HDF5 files have had their UTC times corrected. The md5sums for all HDF5 files were regenerated to match the new files with the correct UTC times; the files were modified in place, so the current md5sums match the corrected files. No changes were made other than correcting the one-second UTC offset and updating the md5sums.
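Since the md5sums were regenerated, users who downloaded HDF5 files before the fix may want to verify their local copies against the published checksums. A minimal sketch of a chunked MD5 check (the scratch file written here is a stand-in, not a real GWOSC file):

```python
import hashlib
import os
import tempfile

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so
    large strain files do not have to fit in memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Demo on a small scratch file; in practice, compare the digest against
# the md5sum published alongside the corresponding GWOSC HDF5 file.
path = os.path.join(tempfile.gettempdir(), 'md5_demo.bin')
with open(path, 'wb') as f:
    f.write(b'hello')
digest = md5sum(path)
```

If the digest of a previously downloaded file does not match the current published value, the file predates the UTC fix and should be re-downloaded.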