pySYD paths

Whether you choose to script or import pysyd as a module, it’s important that you always import

When running the software from a terminal or command prompt, the __init__ file saves two important locations, defined as _ROOT and PACKAGEDIR, for software-related files.

The _ROOT directory holds everything from input data and information files to target results, so by default it is set to an easily accessible place: the current working directory. From this, the software assumes there are three subdirectories:

  • INFDIR : ‘~/path/to/local/pysyd/directory/info’

  • INPDIR : ‘~/path/to/local/pysyd/directory/data’

  • OUTDIR : ‘~/path/to/local/pysyd/directory/results’
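As a minimal sketch of the layout above, the three subdirectories can be built from the root location with the standard library. The function name `default_directories` is hypothetical and only illustrates the idea; it is not part of the pySYD API.

```python
import os

# Assumption: the root defaults to the current working directory, as described above.
ROOT = os.path.abspath(os.getcwd())


def default_directories(root=ROOT):
    """Return the three subdirectories expected under the root directory."""
    return {
        "INFDIR": os.path.join(root, "info"),     # information files
        "INPDIR": os.path.join(root, "data"),     # input data
        "OUTDIR": os.path.join(root, "results"),  # target results
    }


dirs = default_directories()
for name, path in dirs.items():
    print(f"{name}: {path}")
```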

The second location, PACKAGEDIR, should never need to be touched unless the installation or setup was modified by the user and package data was intentionally left out (which is not recommended). For example, the pySYD matplotlib stylesheet is saved there, as well as relevant info dictionaries. Since these files are used a lot by the software and do not need to be modified by a user, they are typically installed with the package itself, within the pysyd directory (e.g., /usr/local/lib/python3.10/site-packages/pysyd/data/).

pySYD inputs

If you haven’t done so already, running the pySYD setup feature will conveniently provide all of the files discussed in detail on this page.

Required

The only thing that’s really required is the data.

For a given star ID, possible input data are its:
  1. light curve ('ID_LC.txt') and/or

  2. power spectrum ('ID_PS.txt').

Light curve: The Kepler, K2 & TESS missions have provided billions of stellar light curves, where a light curve is a measure of an object’s brightness (or flux) in time. Like most standard photometric data, we require that the time array is in units of days. This is really important if the software is calculating the power spectrum for you! The y-axis is less critical here: it can be anything from fractional flux to brightness as a function of time, along with any other normalization(s). Units: time (days) vs. normalized flux (ppm)

Power spectrum: the frequency series or power spectrum is what’s most important for the asteroseismic analyses applied and performed in this software. Thanks to open-source languages like Python, we have many powerful community-driven libraries like astropy that can fortunately compute these things for us. Units: frequency (\(\rm \mu Hz\)) vs. power density (\(\rm ppm^{2} \mu Hz^{-1}\))

Cases

Therefore for a given star, there are four different scenarios that arise from a combination of these two inputs and we describe how the software handles each of these cases.

Additionally, we will list these in the recommended order, where the top is the most preferred and the bottom is the least.

Case 1: light curve and power spectrum

Here, everything can be inferred and/or calculated from the data when both are provided. This includes the time series cadence, which sets the Nyquist frequency, or the highest frequency that can be sampled. The total duration of the time series sets an upper limit on the time scales we can measure and also sets the resolution of the power spectrum. From these, we can determine if the power spectrum is oversampled or critically-sampled and make the appropriate arrays for all input data.

The following are attributes saved to the pysyd.target.Target object in this scenario:

  • Parameter(s):

    • time series cadence (star.cadence)

    • Nyquist frequency (star.nyquist)

    • total time series length or baseline (star.baseline)

    • upper limit for granulation time scales (star.tau_upper)

    • frequency resolution (star.resolution)

    • oversampling factor (star.oversampling_factor)

  • Array(s):

    • time series (star.time & star.flux)

    • power spectrum (star.frequency & star.power)

    • copy of input power spectrum (star.freq_os & star.pow_os)

    • critically-sampled power spectrum (star.freq_cs & star.pow_cs)
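The parameters listed above follow from standard sampling relations, which can be sketched with numpy as below. This is an illustration under stated assumptions (time in days, evenly spaced frequencies in \(\rm \mu Hz\)), not the pySYD implementation; the function name `sampling_parameters` is hypothetical, and the granulation time scale limit is left out because its definition is not given here.

```python
import numpy as np


def sampling_parameters(time, frequency):
    """Derive sampling parameters from a light curve and its power spectrum.

    time      : time stamps in days
    frequency : evenly spaced frequencies in muHz
    """
    cadence = float(np.median(np.diff(time))) * 86400.0      # seconds
    nyquist = 1.0e6 / (2.0 * cadence)                        # muHz
    baseline = float(np.max(time) - np.min(time)) * 86400.0  # seconds
    resolution = 1.0e6 / baseline                            # muHz, critically sampled
    spacing = float(np.median(np.diff(frequency)))           # muHz, as provided
    # Ratio of the critical spacing to the spacing of the supplied grid
    oversampling_factor = int(round(resolution / spacing))
    return {
        "cadence": cadence,
        "nyquist": nyquist,
        "baseline": baseline,
        "resolution": resolution,
        "oversampling_factor": oversampling_factor,
    }


# Synthetic example: 10 days of 30-minute cadence, spectrum oversampled by 5
time = np.arange(480) * 30.0 / 1440.0
critical = 1.0e6 / ((time[-1] - time[0]) * 86400.0)
frequency = np.arange(1, 1001) * critical / 5.0
params = sampling_parameters(time, frequency)
print(params)
```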

Issue(s)

  1. The only problem that can arise from this case is if the power spectrum is not normalized correctly or is not in the proper units (i.e. frequency is in \(\rm \mu Hz\) and power is in \(\rm ppm^{2} \mu Hz^{-1}\)). This is actually more common than you think, so if this might be the case, we recommend trying Case 2 instead.

Case 2: light curve only

Again we can determine the baseline and cadence, which set important features in the frequency domain as well. Since the power spectrum is not yet calculated, we can control whether it’s oversampled or critically-sampled. For this case, we can calculate all the same things as in Case 1, with a few extra steps that may take a little more time.

The following are attributes saved to the pysyd.target.Target object in this scenario:

  • Parameter(s):

    • time series cadence (star.cadence)

    • Nyquist frequency (star.nyquist)

    • total time series length or baseline (star.baseline)

    • upper limit for granulation time scales (star.tau_upper)

    • frequency resolution (star.resolution)

    • oversampling factor (star.oversampling_factor)

  • Array(s):

    • time series (star.time & star.flux)

    • newly-computed power spectrum (star.frequency & star.power)

    • copy of oversampled power spectrum (star.freq_os & star.pow_os)

    • critically-sampled power spectrum (star.freq_cs & star.pow_cs)
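To illustrate how the oversampling can be controlled when only a light curve is available, here is a small numpy-only sketch that evaluates a direct Fourier sum on a frequency grid whose spacing is the critical resolution divided by the chosen oversampling factor. It is an assumption-laden illustration (hypothetical function name, amplitude-squared scaling rather than power density), not how pySYD computes its power spectrum.

```python
import numpy as np


def simple_power_spectrum(time, flux, oversampling_factor=1):
    """Direct Fourier sum of a light curve on a chosen frequency grid.

    time : time stamps in days
    flux : flux values (any normalization)
    Returns frequency in muHz and power in squared flux units.
    """
    t = (np.asarray(time) - np.min(time)) * 86400.0   # seconds
    y = np.asarray(flux) - np.mean(flux)
    cadence = np.median(np.diff(t))
    nyquist = 1.0e6 / (2.0 * cadence)                 # muHz
    resolution = 1.0e6 / (t.max() - t.min())          # muHz, critical sampling
    step = resolution / oversampling_factor           # finer grid if oversampled
    frequency = np.arange(step, nyquist, step)
    phase = 2.0 * np.pi * np.outer(frequency * 1.0e-6, t)
    real = np.cos(phase) @ y
    imag = np.sin(phase) @ y
    # Scaled so a sinusoid of amplitude A peaks near A**2 (not power density)
    power = 4.0 * (real**2 + imag**2) / y.size**2
    return frequency, power


# Synthetic light curve: 10 days at 30-minute cadence with a 100 muHz signal
time = np.arange(480) * 30.0 / 1440.0
flux = 50.0 * np.sin(2.0 * np.pi * 100.0e-6 * time * 86400.0)
freq_cs, pow_cs = simple_power_spectrum(time, flux, oversampling_factor=1)
freq_os, pow_os = simple_power_spectrum(time, flux, oversampling_factor=5)
print(len(freq_cs), len(freq_os), freq_os[np.argmax(pow_os)])
```

The direct sum scales with the number of data points times the number of frequencies, so for long time series a dedicated periodogram routine is the more practical choice.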

Issue(s)

Case 3: power spectrum only

This case can be OK, as long as additional information is provided.

Calculation(s)
  • Parameter(s):

  • Array(s):

Issue(s)

  1. if the oversampling factor is not provided

  2. if the power spectrum is not normalized properly
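The oversampling factor matters here because, without a light curve, it is the only way to recover a critically-sampled spectrum from the one provided. One simple way to do this, sketched below with a hypothetical helper, is to keep every n-th point of an evenly spaced, oversampled grid.

```python
import numpy as np


def critically_sample(frequency, power, oversampling_factor):
    """Keep every n-th point of an oversampled, evenly spaced power spectrum."""
    n = int(oversampling_factor)
    if n < 1:
        raise ValueError("oversampling factor must be a positive integer")
    # Start at index n-1 so the kept points fall on multiples of the critical spacing
    return frequency[n - 1::n], power[n - 1::n]


# Synthetic grid: critical spacing of 1 muHz, oversampled by a factor of 5
frequency = np.arange(1, 11) * 0.2
power = np.ones_like(frequency)
freq_cs, pow_cs = critically_sample(frequency, power, 5)
print(freq_cs)
```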

Case 4: no data

Well, we all know what happens when zero input is provided… but just in case, this will raise a PySYDInputError.
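The four cases can be told apart simply by checking which files exist for a given star ID. The sketch below assumes the 'ID_LC.txt' and 'ID_PS.txt' naming described earlier; the function `input_case` and the `InputError` class are hypothetical stand-ins used for illustration, not the package's own objects.

```python
import os
import tempfile


class InputError(Exception):
    """Hypothetical stand-in for the input error raised when no data are found."""


def input_case(star, inpdir):
    """Return 1, 2 or 3 depending on which input files exist for a star."""
    has_lc = os.path.isfile(os.path.join(inpdir, f"{star}_LC.txt"))
    has_ps = os.path.isfile(os.path.join(inpdir, f"{star}_PS.txt"))
    if has_lc and has_ps:
        return 1   # light curve and power spectrum
    if has_lc:
        return 2   # light curve only
    if has_ps:
        return 3   # power spectrum only
    raise InputError(f"no input data found for star {star}")


# Example with a temporary directory that only holds a light curve
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "11618103_LC.txt"), "w").close()
    case = input_case("11618103", tmp)
print(case)
```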

Case 1 summary: light curve and power spectrum

  • Calculation(s):

    • time series cadence (\(\Delta t\))

    • Nyquist frequency (\(\rm \nu_{nyq}\))

    • time series duration or baseline (\(\Delta T\))

    • frequency resolution (\(\Delta f\))

    • oversampling factor (i.e. a critically-sampled spectrum has of=1)

    • critically-sampled power spectrum

  • Issue(s):

    • the only problem that can arise from this case is if the power spectrum is not normalized correctly or is not in the proper units (i.e. frequency is in \(\rm \mu Hz\) and power is in \(\rm ppm^{2} \mu Hz^{-1}\)). This is actually more common than you think, so if this might be the case, we recommend trying Case 2 instead.

Case 2 summary: light curve only

Again we can determine the baseline and cadence, which set important features in the frequency domain as well. Since the power spectrum is not yet calculated, we can control whether it’s oversampled or critically-sampled.

Case 3 summary: power spectrum only

This case can be alright, as long as additional information is provided.

  • Issue(s):

    • if the oversampling factor is not provided

    • if the power spectrum is not normalized properly

Important

For the saved power spectrum, the frequency array has units of \(\rm \mu Hz\) and the power array is power density, which has units of \(\rm ppm^{2} \, \mu Hz^{-1}\). We normalize the power spectrum according to Parseval’s theorem, which loosely means that the Fourier transform is unitary. This is incredibly important for two main reasons, both tied to the noise properties in the power spectrum: 1) different instruments (e.g., Kepler, TESS) have different systematics and hence different noise properties, and 2) the amplitude of the noise becomes smaller as your time series gets longer. Therefore when we normalize the power spectrum, we can make direct comparisons between power spectra of not only different stars, but from different instruments as well!
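As a quick numerical check of what a Parseval-type normalization implies, the sketch below builds a one-sided power density spectrum from an evenly sampled, mean-subtracted time series and confirms that integrating it over frequency recovers the variance of the time series. This uses one common convention with a plain FFT and synthetic white noise; it is an illustration and not necessarily the exact normalization pySYD applies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                           # number of data points
cadence = 1800.0                   # seconds (assumed 30-minute sampling)
flux = rng.normal(0.0, 100.0, n)   # synthetic white noise in ppm
flux -= flux.mean()

transform = np.fft.rfft(flux)
frequency = np.fft.rfftfreq(n, d=cadence) * 1.0e6   # muHz

# One-sided spectrum: double every bin except zero frequency and,
# for an even number of points, the Nyquist bin
weights = np.full(transform.size, 2.0)
weights[0] = 1.0
if n % 2 == 0:
    weights[-1] = 1.0

# ppm^2 per Hz, converted to ppm^2 per muHz
power = weights * (cadence / n) * np.abs(transform) ** 2 * 1.0e-6

resolution = frequency[1] - frequency[0]            # muHz
integrated = np.sum(power) * resolution             # ppm^2
variance = np.mean(flux ** 2)                       # ppm^2
print(integrated, variance)
```

Because the spectrum is expressed as a density, the white noise level depends on the point-to-point scatter and the cadence rather than on the frequency grid, which is what makes spectra from different data sets directly comparable.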

Optional

There are two main information files that can be provided but both are optional – whether you choose to use them or not is ultimately up to you!

Target list

For example, providing a star list via a basic text file is convenient for running a large sample of stars. We provide an example with the rest of the setup, but essentially it is just a list with one star ID per line. Each star ID must match the ID associated with the data.

$ cat todo.txt
11618103
2309595
1435467

Note: If no stars are specified via command line or in a notebook, pySYD will read in this text file and process the list of stars by default.
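Reading such a list takes only a few lines. The helper below is a hypothetical sketch that keeps the IDs as strings (an assumption, so that they can be matched against file names) and skips blank lines.

```python
import io


def read_star_list(handle):
    """Return one star ID per non-empty line of an open text file."""
    return [line.strip() for line in handle if line.strip()]


# Same content as the example list, read from memory instead of from disk
example = io.StringIO("11618103\n2309595\n1435467\n")
stars = read_star_list(example)
print(stars)
```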

Star info

As suggested by the name of the file, this contains star information on an individual basis. Similar to the data, target IDs must exactly match the given name in order to be successfully crossmatched – but this also means that the information in this file need not be in any particular order.

Below is a snippet of what the csv would look like:

Star info

stars,rs,logg,teff,numax,lower_se,upper_se,lower_bg
1435467,,,,,100.0,5000.0,100.0
2309595,,,,,100.0,,100.0

Just like the input data, the star IDs must match, and the options must also adhere to a special format. In fact, the columns in this csv are exactly the names (or destinations) that the command-line parser saves each option to. Since there are a ton of available columns, we won’t list them all here, but there are a few ways you can view the columns for yourself.

The first is by visiting our special command-line glossary, which explicitly states how each of the variables is defined. You can also see them fairly easily by calling the pysyd.utils.get_dict function and doing a basic print statement.

>>> from pysyd import utils
>>> columns = utils.get_dict('columns')
>>> print(columns['all'])
['rs', 'rs_err', 'teff', 'teff_err', 'logg', 'logg_err', 'cli', 'inpdir',
 'infdir', 'outdir', 'overwrite', 'show', 'ret', 'save', 'test', 'verbose',
 'dnu', 'gap', 'info', 'ignore', 'kep_corr', 'lower_ff', 'lower_lc', 'lower_ps',
 'mode', 'notching', 'oversampling_factor', 'seed', 'stars', 'todo', 'upper_ff',
 'upper_lc', 'upper_ps', 'stitch', 'n_threads', 'ask', 'binning', 'bin_mode',
 'estimate', 'adjust', 'lower_se', 'n_trials', 'smooth_width', 'step',
 'upper_se', 'background', 'basis', 'box_filter', 'ind_width', 'n_laws',
 'lower_bg', 'metric', 'models', 'n_rms', 'upper_bg', 'fix_wn', 'functions',
 'cmap', 'clip_value', 'fft', 'globe', 'interp_ech', 'lower_osc', 'mc_iter',
 'nox', 'noy', 'npb', 'n_peaks', 'numax', 'osc_width', 'smooth_ech', 'sm_par',
 'smooth_ps', 'threshold', 'upper_osc', 'hey', 'samples']
>>> len(columns['all'])
77

Note: This file is especially helpful for running many stars with different options - you can make your experience as customized as you’d like!
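The crossmatching described for the star info file can be sketched with the standard csv module. The rows below are illustrative only (the placement of values in columns is an assumption), and `load_star_info` is a hypothetical helper rather than a pySYD function. Empty cells are skipped so that only explicitly provided options are kept for each star.

```python
import csv
import io

# Illustrative content; which column each value belongs to is an assumption
EXAMPLE = """stars,rs,logg,teff,numax,lower_se,upper_se,lower_bg
1435467,,,,,100.0,5000.0,100.0
2309595,,,,,100.0,,100.0
"""


def load_star_info(handle):
    """Map each star ID to the options that are filled in for it."""
    info = {}
    for row in csv.DictReader(handle):
        star = row.pop("stars").strip()
        # Keep values as strings and drop empty cells
        info[star] = {key: value.strip() for key, value in row.items()
                      if value and value.strip()}
    return info


info = load_star_info(io.StringIO(EXAMPLE))
# Row order does not matter because lookups are by ID
print(info["2309595"])
# A star without an entry simply has no custom options
print(info.get("11618103", {}))
```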