O4 Data Set Technical Details

Some of the links on this page are to internal pages that require ligo.org credentials. If you need to gain access to the information there, please contact us at gwosc@igwn.org.

GWOSC data downsampling and repackaging

GWOSC builds files from standard LIGO, Virgo and KAGRA h(t) frames. We have chosen to create a repackaged version of our data to make it more accessible to casual users.

  • Data are made available both as frame files (GWF) and HDF5 (HDF). The GWF frame format is a standard within the GW community, but may be unfamiliar to people in other fields. HDF5 is a popular format, easily readable in many languages, including Python, MATLAB, Mathematica, and C.
  • The channel names used to collect data from the original files are: H1:GDS-CALIB_STRAIN_CLEAN_AR for H1 and L1:GDS-CALIB_STRAIN_CLEAN_AR for L1.
  • The strain data are made available both at 16384 Hz and 4096 Hz sample rates. Users should choose which sampling rate is most appropriate for their search. The data quality (DQ) is less well studied above 2 kHz, and the strain calibration is valid only up to 5 kHz. Of course, the down-sampled dataset is smaller, reducing both the download time and storage requirements.
  • In the 4096 Hz data set, the use of an anti-aliasing filter corrupts the data near the Nyquist frequency. For studies involving frequencies of around 1700 Hz or above, the 16384 Hz data should be used instead.
  • The down-sampling is done using the function scipy.signal.decimate from the Python package SciPy.
  • Our HDF5/frame files have fixed duration (4096 seconds) and boundaries. Before downsampling the data from 16384 Hz to 4096 Hz, a padding of 8 seconds is requested for each file to avoid border effects. However, these padding data are not always available, so in some cases tiny border effects could still be present in the 4096 Hz data.
  • We provide Timelines and My Sources to aid the user in finding data (including DQ and HWinj info) from a particular time, instead of segDB queries. From Timeline, you can see multiple DQ and Injection flags, zoom in, and download segments.
  • The data quality (DQ) and hardware injection (HW) information is summarized in 1 Hz vectors, in both the HDF5 and frame files. See the bit mask definition for details. The bit mask definition is the same for the files sampled at 16 kHz and at 4 kHz. Step 3 of the introductory tutorial shows how to apply data-quality flags (in the tutorial, use the flags for the O4a run from the bit mask definition).
  • Metadata on each file include an estimate of the Binary Neutron Star (BNS) range, as seen on the O4a_4KHZ_R1 and O4a_16KHZ_R1 archives with the "includes statistics of each file" option. Note that, starting from O4a, the BNS range is calculated using the sensemon_range method from gwpy.astro with a minimum frequency of 10 Hz.
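
As an illustration of the down-sampling and padding described above, here is a minimal sketch; it is not the GWOSC production code. It assumes scipy.signal.decimate is called with its default filter settings (the settings used for the release are not stated on this page), and it uses a short synthetic sine wave in place of real strain data. The names downsample_with_padding and padded_strain are hypothetical.

```python
import numpy as np
from scipy.signal import decimate

FS_IN = 16384      # original sample rate in Hz
FS_OUT = 4096      # target sample rate in Hz
PAD_SECONDS = 8    # padding requested on each side of a file


def downsample_with_padding(padded, fs_in=FS_IN, fs_out=FS_OUT, pad=PAD_SECONDS):
    """Decimate a padded strain array, then drop the padded samples.

    `padded` must contain `pad` seconds of extra data before and after
    the stretch of interest, all sampled at `fs_in`.
    """
    q = fs_in // fs_out                 # decimation factor (4 here)
    low_rate = decimate(padded, q)      # SciPy default filter: an assumption
    trim = pad * fs_out                 # padding length after decimation
    return low_rate[trim:len(low_rate) - trim]


# Toy input: 4 s of a 100 Hz sine, plus 8 s of padding on each side.
duration = 4
t = np.arange((duration + 2 * PAD_SECONDS) * FS_IN) / FS_IN
padded_strain = np.sin(2 * np.pi * 100.0 * t)

strain_4k = downsample_with_padding(padded_strain)
print(len(strain_4k), "samples at", FS_OUT, "Hz")
```

Filter transients from the anti-aliasing step fall in the padded region, which is discarded. When the padding is not available, the transients land inside the file itself, which is the origin of the border effects mentioned above.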

Notes on calibration

The detector strain h(t) is only calibrated between 10 Hz and 5000 Hz for Advanced LIGO. In most searches for astrophysical sources, data below 20 Hz are not used because the noise is too high.
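
The frequency limits quoted on this page can be combined into a simple rule of thumb for choosing an analysis band. The sketch below is illustrative only: the helper usable_band is hypothetical, and treating 1700 Hz as a hard upper limit for the 4096 Hz data is an assumption based on the "around 1700 Hz" guidance above.

```python
CAL_LOW_HZ = 10.0      # lower edge of the calibrated band
CAL_HIGH_HZ = 5000.0   # upper edge of the calibrated band
SEARCH_LOW_HZ = 20.0   # typical lower cutoff used by searches
HIGH_4K_HZ = 1700.0    # approximate upper limit for the 4096 Hz data (assumed hard limit)


def usable_band(sample_rate_hz, f_low=SEARCH_LOW_HZ):
    """Return a (f_low, f_high) band in Hz for a given sample rate."""
    nyquist = sample_rate_hz / 2
    f_high = min(CAL_HIGH_HZ, nyquist)
    if sample_rate_hz == 4096:
        # The anti-aliasing filter corrupts data near the Nyquist frequency.
        f_high = min(f_high, HIGH_4K_HZ)
    # Never go below the calibrated band, whatever the caller requests.
    return max(f_low, CAL_LOW_HZ), f_high


for rate in (4096, 16384):
    print(rate, "Hz data:", usable_band(rate))
```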

Check the following papers for details about the calibration and calibration uncertainties:

Files containing the uncertainty in the calibration, both magnitude and phase, as a function of frequency, with associated documentation, are available at this link.

Noise Subtraction

After data collection, several independently-measured terrestrial contributions to the detector noise were subtracted from the LIGO data at both sites. This subtraction removed calibration lines and 60 Hz AC power mains harmonics from both LIGO data streams. Additional noise contributions due to non-stationary couplings of the power mains were also subtracted.

For reference, see:

  • "Improving the sensitivity of Advanced LIGO using noise subtraction" arXiv:1809.05348
  • "Machine-learning non-stationary noise out of gravitational-wave detectors" arXiv:1911.09083

GWF Channel Names

The O4 4 kHz and 16 kHz GWF files (with extension .gwf) use the channel names in the table below:

Channel names found inside GWF files

O4a (4KHz samples per second)    O4a (16KHz samples per second)
{ifo}:GWOSC-4KHZ_R1_STRAIN       {ifo}:GWOSC-16KHZ_R1_STRAIN
{ifo}:GWOSC-4KHZ_R1_DQMASK       {ifo}:GWOSC-16KHZ_R1_DQMASK
{ifo}:GWOSC-4KHZ_R1_INJMASK      {ifo}:GWOSC-16KHZ_R1_INJMASK

NOTES:

  • {ifo} is a placeholder for either H1 or L1, e.g., H1:GWOSC-16KHZ_R1_STRAIN or L1:GWOSC-16KHZ_R1_STRAIN.
  • The _R1_ substring represents the revision number of the named channel.
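
The naming pattern in the table can be expressed as a small helper. This is a sketch: the function gwosc_channel and its arguments are hypothetical, and only the name pattern itself comes from the table above.

```python
VALID_IFOS = ("H1", "L1")
VALID_RATES = ("4KHZ", "16KHZ")
VALID_KINDS = ("STRAIN", "DQMASK", "INJMASK")


def gwosc_channel(ifo, rate, kind, revision=1):
    """Build a channel name such as 'H1:GWOSC-16KHZ_R1_STRAIN'."""
    if ifo not in VALID_IFOS:
        raise ValueError(f"unknown detector: {ifo!r}")
    if rate not in VALID_RATES:
        raise ValueError(f"unknown sample-rate label: {rate!r}")
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown channel kind: {kind!r}")
    return f"{ifo}:GWOSC-{rate}_R{revision}_{kind}"


# List all six channel names for one detector.
for rate in VALID_RATES:
    for kind in VALID_KINDS:
        print(gwosc_channel("L1", rate, kind))
```

The resulting string is what a frame reader expects as the channel argument, for example when calling gwpy's TimeSeries.read on a downloaded .gwf file.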

Notes about the DATA flag

See the Defining the DATA Flag page.

O4 Hardware Injections

The O4 data contain hardware injections that appear as (simulated) gravitational wave signals in the data.

However, the GWOSC data do not mask out times that contain injections; instead, lists of these times are provided. See the O4 Hardware Injection page for details.

Segment lists of hardware injections may include times when data are not publicly available. Details of these injections are not included in the documentation.
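
Since injections are not masked out, candidate times from a search need to be checked against the injection segment lists. The sketch below shows one way to do this. The segment and candidate GPS times are made up for illustration, the helper in_any_segment is hypothetical, and segments are assumed to be half-open intervals [start, end).

```python
def in_any_segment(gps_time, segments):
    """Return True if gps_time lies inside any [start, end) segment."""
    return any(start <= gps_time < end for start, end in segments)


# Hypothetical injection segments (GPS seconds), not real O4 values.
injection_segments = [
    (1370000100, 1370000110),
    (1370000500, 1370000532),
]

# Hypothetical candidate event times from a search.
candidates = [1370000050.2, 1370000104.7, 1370000531.9, 1370000532.0]

injected = [t for t in candidates if in_any_segment(t, injection_segments)]
clean = [t for t in candidates if not in_any_segment(t, injection_segments)]

print("coincident with an injection:", injected)
print("not coincident with an injection:", clean)
```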

Data Quality

Data-quality categories, or flags, are defined by each analysis group: Compact Binary Coalescence (CBC), Burst, Continuous Waves (CW) and Stochastic. This is because periods of noisy data will affect each type of analysis differently.

For each flag, GWOSC data files contain a corresponding 1 Hz time series that marks times that pass the flag as "1" (good data) and times that fail the flag as "0" (bad data). A full list of O4a data-quality categories can be seen on the O4a data quality definitions page (for the 16 kHz release; the data-quality categories for the 4 kHz files are identical). Data quality is described in these categories:

  • DATA (Data Available): Failing this level indicates that GW strain data are not publicly available because the instruments were not operating in an acceptable condition. For O4, DATA is equivalent to Category 1.
  • CAT1 (Category 1), CAT2 (Category 2), CAT3 (Category 3): See O4a LIGO Detector Characterization Paper

In general, data-quality levels are defined in a cumulative way: a time that fails a given category automatically fails all higher categories. For example, if the only known problem with a given time fails a burst category 2 flag, then the data are said to pass DATA and BURST_CAT1, but fail BURST_CAT2 and BURST_CAT3. However, the different analysis groups are independent: if a time fails BURST_CAT2, it may still pass CBC_CAT2.
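
The example in the previous paragraph can be written out with a bit mask. This is a sketch only: the bit positions in DQ_BITS are an assumption and should be checked against the bit mask definition page, and the helpers passed_flags and passes are hypothetical.

```python
# Assumed bit layout; confirm against the bit mask definition page.
DQ_BITS = {
    "DATA": 0,
    "CBC_CAT1": 1,
    "CBC_CAT2": 2,
    "CBC_CAT3": 3,
    "BURST_CAT1": 4,
    "BURST_CAT2": 5,
    "BURST_CAT3": 6,
}


def passes(mask_value, flag):
    """Return True if the bit for `flag` is set in one 1 Hz mask sample."""
    return bool((mask_value >> DQ_BITS[flag]) & 1)


def passed_flags(mask_value):
    """Return the set of flag names whose bits are set."""
    return {name for name in DQ_BITS if passes(mask_value, name)}


# One second of data whose only problem is a burst category 2 flag:
# DATA, all CBC categories and BURST_CAT1 pass; BURST_CAT2 and BURST_CAT3 fail.
example = 0b0011111

print(sorted(passed_flags(example)))
print("passes BURST_CAT2:", passes(example, "BURST_CAT2"))
print("passes CBC_CAT2:", passes(example, "CBC_CAT2"))
```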

These graduated categories of quality allow a data pipeline to adjust its behavior depending on the data-quality. An example is running the numerical search (template matching) against all the data segments that pass CAT1, but ignoring any candidate events from data that do not pass CAT3. This strategy allows long sections of data to be used, increasing search efficiency.
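
The strategy described above can be sketched with two 1 Hz flag series: search over the segments that pass CAT1, then keep only the candidates at times that also pass CAT3. All values below are made up for illustration, and the helpers flag_to_segments and passes_at are hypothetical.

```python
def flag_to_segments(flags, gps_start):
    """Turn a 1 Hz series of 0/1 flags into half-open [start, end) GPS segments."""
    segments = []
    seg_start = None
    for i, ok in enumerate(flags):
        if ok and seg_start is None:
            seg_start = gps_start + i            # a passing stretch begins
        elif not ok and seg_start is not None:
            segments.append((seg_start, gps_start + i))
            seg_start = None                     # the passing stretch ends
    if seg_start is not None:
        segments.append((seg_start, gps_start + len(flags)))
    return segments


def passes_at(flags, gps_start, gps_time):
    """Return True if the flag is 1 for the second containing gps_time."""
    return bool(flags[int(gps_time) - gps_start])


gps_start = 1370000000                           # hypothetical GPS start time
cat1 = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1]            # 10 s of made-up CAT1 flags
cat3 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]            # cumulative: 0 wherever cat1 is 0

# Step 1: run the search on every segment that passes CAT1.
search_segments = flag_to_segments(cat1, gps_start)

# Step 2: discard candidates from times that do not pass CAT3.
candidates = [1370000001, 1370000002, 1370000008]
kept = [t for t in candidates if passes_at(cat3, gps_start, t)]

print(search_segments)
print(kept)
```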

Note to LVK members: Conventionally, hardware injections are vetoed by CAT flags so that searches do not see them. However, GWOSC strain data provide h(t) at these times, so a search with GWOSC data will find many chirps that must be compared with the lists of injections (see above).

For information on how to use data-quality information:

  • LIGO Detector Characterization in the first half of the fourth Observing run: arXiv:2409.02831
  • Step 3 of the introductory tutorial shows how to apply data-quality flags
  • Data Quality definitions for the O4 data set (the link refers to the 16 kHz files, the data quality categories for the files at 4 kHz are identical).
  • Plot and download segment lists from Timeline for O4a

MD5 Check Sums