Examples
======================================================
Examples may eventually be found in one or more Jupyter notebooks. This is very preliminary.
Note that pyasdm cannot read ASDMs with versions < "3" (the schemaVersion
value in the ASDM.xml file).
.. toctree::
:maxdepth: 2
Example 1
---------
Import the pyasdm module and open an existing ASDM. The pyasdm method
*setFromFile* uses *os.path.expanduser* to expand the path it is given to the
full path. This example uses an SDM from the casatestdata repository;
use an appropriate path for where you have already installed casatestdata.::
>>> import pyasdm
>>> asdm = pyasdm.ASDM()
>>> asdm.setFromFile("~/casa/casatestdata/sdm/uid___A002_X71e4ae_X317_short")
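
The path expansion mentioned above can be tried on its own with the standard library. This is only a sketch of what *os.path.expanduser* does to a path like the one used here; the second path is a made-up example, and nothing in this block touches pyasdm.

```python
import os.path

# Path as typed in the example; "~" stands for the current user's home directory.
sdm_path = "~/casa/casatestdata/sdm/uid___A002_X71e4ae_X317_short"

# expanduser replaces a leading "~" with the home directory and leaves
# paths without a leading "~" unchanged.
full_path = os.path.expanduser(sdm_path)
unchanged = os.path.expanduser("/data/sdm/example")  # hypothetical absolute path

print(full_path)
print(unchanged)
```
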
Following the C++ and Java class design, an ASDM is always instantiated
first and then set to a specific ASDM by giving it the path to the directory
containing that ASDM as shown here. Table instances are then fetched from
that ASDM and individual rows can be retrieved from a table. The classes at the
*pyasdm* layer are the container (*ASDM*), *enumerations*, *exceptions*, *types*, and *utils*.
The *bdf* classes deal with reading a BDF (binary data file/format) file
(eventually that will also include classes to write a BDF to a file).
The *enumerations* are classes that handle the enumerations found in the SDM
by limiting the set of allowed values to those known to the model and translating
to and from the form in which the enumerations are stored in an SDM (generally as integers). The
*types* are specific types known to the model, built on top of the standard
types (this is largely so that they can be read and written correctly) and
the *utils* and *exceptions* hold utility methods and the specific exceptions
added and used by pyasdm.
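
The role of the enumeration classes (a closed set of allowed values plus translation to and from the stored form) can be pictured with Python's standard *enum* module. This is a stand-in sketch and not how pyasdm implements its enumerations; the class name and the integer values are assumptions made for illustration, while the two member names are those of the *TimeSampling* enumeration.

```python
from enum import Enum


class TimeSamplingSketch(Enum):
    """Stand-in enumeration; the integer values are assumed for illustration."""
    SUBINTEGRATION = 0
    INTEGRATION = 1


# Translate from a stored integer or from a name to an enumeration member.
from_int = TimeSamplingSketch(1)
from_name = TimeSamplingSketch["INTEGRATION"]
print(from_int.name, from_int is from_name)

# Values outside the known set are rejected.
try:
    TimeSamplingSketch(7)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```
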
Note that only the ASDM container is loaded in that first example. None of the
contained tables that may exist have been read yet. In pyasdm, tables
are always loaded on demand.::
>>> asdm.status()
'Main' : IS present in _tableEntity, presentInMemory = False size = 0
'AlmaRadiometer' : IS present in _tableEntity, presentInMemory = False size = 0
'Annotation' : IS NOT present in _tableEntity, presentInMemory = True size = 0
'Antenna' : IS present in _tableEntity, presentInMemory = False size = 0
'CalAmpli' : IS present in _tableEntity, presentInMemory = False size = 0
'CalAntennaSolutions' : IS NOT present in _tableEntity, presentInMemory = True size = 0
'CalAppPhase' : IS NOT present in _tableEntity, presentInMemory = True size = 0
'CalAtmosphere' : IS present in _tableEntity, presentInMemory = False size = 0
'CalBandpass' : IS NOT present in _tableEntity, presentInMemory = True size = 0
Shown here are the first few lines of the output of *status* for this SDM. There
is one output line for each possible table that might exist in an SDM. The
line indicates "IS present" for a table that exists on disk in that
SDM and "IS NOT present" if that table does not exist on disk. Note that initially
the tables that exist have *False* for "presentInMemory". That's because they have
not yet been loaded. That is also why their size is 0. Tables that do not exist
on disk DO exist in memory as zero-sized tables, which is why they show *True*
for "presentInMemory". Rows could be added to them. If that SDM
is then written to disk, a table with newly added rows would then be written
to disk. That's not shown in this example.::
>>> mt = asdm.getMain()
>>> mt.size()
30
>>> mt.getName()
'Main'
>>> mt.getKeyName()
['time', 'configDescriptionId', 'fieldId']
>>> mr = mt.get()
>>> len(mr)
30
>>> print(mr[0].toXML())
30 INTEGRATION 10080000000 5 1 1 9610595 ConfigDescription_0 ExecBlock_0 Field_0 1 30 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0
This first gets the Main table. You can see that its size is 30 rows and its name is "Main".
The *getKeyName* method returns the list of key fields in the table. These can be
used to select the row that matches a specific set of key values, but that lookup doesn't
support wildcards or ranges and so is not terribly useful for a table like Main. The
full list of rows found in Main is returned by *get* and this example shows that it returned
30 rows. Each element is a MainRow instance. The *toXML* method returns the full row
as an XML string (this is what is used to write the rows of a table to disk).::
>>> mr[0].getDataSize()
9610595
>>> mr[0].getNumAntenna()
30
>>> mr[0].getConfigDescriptionId()
>>> print(mr[0].getConfigDescriptionId())
ConfigDescription_0
>>> print(mr[0].getDataUID())
>>> print(mr[0].getTimeSampling())
INTEGRATION
>>> pyasdm.enumerations.TimeSampling.names()
['SUBINTEGRATION', 'INTEGRATION']
There are getters and setters for each of the fields in a row. For standard types, they return
a value as shown here for *dataSize* and *numAntenna* (note that the method name for
a getter is *get* followed by the field name with its first letter in upper case).
Here, *configDescriptionId* is stored as a *Tag* type (which is how fields that
indicate rows in other tables are stored) and *dataUID* is an *EntityRef* type.
Specific types have a *__str__* method so they should print out in a useful
form.
Note also that *stateId* is a 1-D array of values. In this case, there
is 1 value for each antenna. In the XML form the dimensionality is shown first
followed by the number of values and then the individual values. In this case,
the values are all *Tag* types indicating a row in the State table and they
all have the same value.
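
The array layout just described (dimensionality, then the size of each dimension, then the values) can be unpacked with a few lines of plain Python. This is a sketch under stated assumptions, not pyasdm's parser: the function name is hypothetical, the tokens are assumed to be separated by whitespace, and the values are assumed to be in row-major order. The first input is a shortened version of the *stateId* content (3 values instead of 30).

```python
def parse_array_text(text, convert=str):
    """Parse 'ndim dim_1 ... dim_ndim v_1 v_2 ...' into nested lists.

    Assumes whitespace-separated tokens and row-major value order.
    """
    tokens = text.split()
    ndim = int(tokens[0])
    shape = [int(t) for t in tokens[1:1 + ndim]]
    values = [convert(t) for t in tokens[1 + ndim:]]

    expected = 1
    for n in shape:
        expected *= n
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, found {len(values)}")

    # Fold the flat list into nested lists, innermost axis first.
    for n in reversed(shape[1:]):
        values = [values[i:i + n] for i in range(0, len(values), n)]
    return values


state_ids = parse_array_text("1 3 State_0 State_0 State_0")
direction = parse_array_text("2 1 2 5.082621016035585 -0.5103639492133764", float)
print(state_ids)
print(direction)
```

A 1-D field comes back as a flat list and a 2-D field as a list of lists, which mirrors how multi-valued fields are presented as Python lists.
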
The *timeSampling* field is an example of an *enumeration* instance. The
static member *names* can be used for any enumeration class to show all of
the allowed names for that type of enumeration. For the *TimeSampling*
enumeration there are 2 allowed names.::
>>> frow = mr[0].getFieldUsingFieldId()
>>> print(frow.toXML())
Field_0 J1924-2914 1 2 1 2 5.082621016035585 -0.5103639492133764 2 1 2 5.082621016035585 -0.5103639492133764 2 1 2 5.082621016035585 -0.5103639492133764 none J2000 0
>>> srow = mr[0].getStateUsingStateId(10)
>>> print(srow.toXML())
State_0 NONE True True True 1.0
Some fields in a row indicate a row in another table. The row class has methods that can be used to get
the instance of the indicated row as shown here for *fieldId*. For cases
like *stateId* you can either get a single StateRow instance as shown
here (getting the one for element 10 in that list of stateId values)
or you can get all of the rows as a list of rows using the
appropriate getter for that field in that row (here it is
*getStatesUsingStateId* ).
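
The difference between resolving one element and resolving the whole list can be sketched with stand-in classes. Everything below is hypothetical (class names, method names in snake case, and the dictionary used as a table); it only illustrates the lookup pattern and is not the pyasdm implementation.

```python
class StateRowSketch:
    """Stand-in for a row in the State table (hypothetical)."""
    def __init__(self, state_id):
        self.state_id = state_id


class MainRowSketch:
    """Stand-in for a Main row holding one State tag per antenna (hypothetical)."""
    def __init__(self, state_ids, state_table):
        self._state_ids = state_ids      # list of tag strings, e.g. "State_0"
        self._state_table = state_table  # maps tag string -> StateRowSketch

    def get_state_using_state_id(self, i):
        # Resolve only element i of the stateId list.
        return self._state_table[self._state_ids[i]]

    def get_states_using_state_id(self):
        # Resolve every element, preserving order.
        return [self._state_table[tag] for tag in self._state_ids]


state_table = {"State_0": StateRowSketch("State_0")}
row = MainRowSketch(["State_0"] * 30, state_table)
one = row.get_state_using_state_id(10)
all_rows = row.get_states_using_state_id()
print(one.state_id, len(all_rows))
```
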
For the ASDM tables and rows and types and enumerations, python
lists are used when there are multiple values (and sometimes
lists of lists or lists of lists of lists when the number of
dimensions in that field is more than 1; the SDM currently has
cases up to 4 dimensions). For the binary file, the binary
data itself is returned as numpy arrays.::
>>> mr[0].getBDFPath()
'/users/bgarwood/casa/casatestdata/sdm/uid___A002_X71e4ae_X317_short/ASDMBinary/uid___A002_X71e4ae_X328'
>>> bdf = pyasdm.bdf.BDFReader()
>>> bdf.open(mr[0].getBDFPath())
>>> print(bdf.getHeader())
XML Schema version = 2
Byte order = Little_Endian
startTime = 4890409733520000000
dataOID = uid://A002/X71e4ae/X328
title = ALMA BL Correlator Spectral Data
dimensionality = 1
execBlockUID = uid://A002/X71e4ae/X317
execBlockNum = 1
scanNum = 1
subscanNum = 1
numAntenna = 30
correlationMode = CROSS_AND_AUTO
spectralResolutionType = FULL_RESOLUTION
processorType = CORRELATOR
atmospheric phase correction = AP_UNCORRECTED
baseband #0:
name = BB_1
spectralWindow #0:
sw = 1
crossPolProducts = XX YY
sdPolProducts = XX YY
scaleFactor = 168374.57812
numSpectralPoint = 128
numBin = 1
sideband = LSB
baseband #1:
name = BB_2
spectralWindow #0:
sw = 2
crossPolProducts = XX YY
sdPolProducts = XX YY
scaleFactor = 168374.57812
numSpectralPoint = 128
numBin = 1
sideband = LSB
baseband #2:
name = BB_3
spectralWindow #0:
sw = 3
crossPolProducts = XX YY
sdPolProducts = XX YY
scaleFactor = 168374.57812
numSpectralPoint = 128
numBin = 1
sideband = USB
baseband #3:
name = BB_4
spectralWindow #0:
sw = 4
crossPolProducts = XX YY
sdPolProducts = XX YY
scaleFactor = 168374.57812
numSpectralPoint = 128
numBin = 1
sideband = USB
flags:
size = 3720
axes = BAL ANT BAB POL
actualTimes:
size = 3720
axes = BAL ANT BAB POL
actualDurations
size = 3720
axes = BAL ANT BAB POL
zeroLags
size = 240
axes = BAL BAB POL
correlatorType = FXF
crossData
size = 890880
axes = BAL BAB SPP POL
autoData
size = 30720
axes = ANT BAB SPP POL
normalized = True
Here, the *getHeader* method returns a BDFHeader instance. It has getters for
each of these fields. Some of them are simple values, others are
enumerations, etc. This shows the organization of the data, namely that there
are 4 basebands here and that each baseband has a single spectral window.
Each spectral window has an item that lists the cross and single dish (auto)
correlation products. The scale factor is used to scale the crossData values
for that spectral window in that baseband. In general, each baseband could
have multiple spectral windows, with different scale factors and cross and
sd pol products. The baseband information summarized here is currently
available as a list of dict values, with one dict value for each
baseband, in order, through *getBasebandsList()* .
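
Walking that list of dicts is ordinary Python. The sketch below uses a hand-written stand-in with the same keys and values as the header printed above, so it runs without pyasdm; as an assumption made for simplicity, the pol products and sideband are plain strings here rather than enumeration instances.

```python
# Hand-written stand-in for the result of getBasebandsList(), following the
# layout of the header printed above; pol products and sideband are plain
# strings here for simplicity (an assumption).
basebands = [
    {"name": f"BB_{i}",
     "spectralWindows": [{"sw": str(i),
                          "crossPolProducts": ["XX", "YY"],
                          "sdPolProducts": ["XX", "YY"],
                          "scaleFactor": 168374.57812,
                          "numSpectralPoint": 128,
                          "numBin": 1,
                          "sideband": "LSB" if i <= 2 else "USB"}]}
    for i in range(1, 5)
]

# Collect one summary tuple per spectral window, baseband by baseband.
summary = []
for bb in basebands:
    for spw in bb["spectralWindows"]:
        summary.append((bb["name"], spw["sw"], spw["numSpectralPoint"],
                        len(spw["crossPolProducts"]), spw["scaleFactor"]))

for entry in summary:
    print(entry)
```
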
At the end are shown the possible binary components that may be found in
this BDF. The binary components are found in data subsets. A subset is
retrieved by the *getSubset* method and the *hasSubset* method indicates
whether there are still more subsets available from that BDF. After the
*open* method returns, the position of the BDF is just before the first
subset. For each type of binary data the header shows the expected size
in a subset (if that type is found) and the axes used. See the BDF format
documentation for more details on the meanings.
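
The *hasSubset* / *getSubset* pair lends itself to a simple loop. The sketch below shows that calling pattern against a stand-in reader so that it runs without a BDF; the class is hypothetical, and the subsets are placeholder dictionaries whose labels are made up.

```python
class ReaderSketch:
    """Stand-in with the hasSubset/getSubset calling pattern (hypothetical)."""
    def __init__(self, subsets):
        self._subsets = list(subsets)
        self._next = 0  # positioned just before the first subset

    def hasSubset(self):
        return self._next < len(self._subsets)

    def getSubset(self):
        subset = self._subsets[self._next]
        self._next += 1
        return subset


# Five placeholder subsets; the projectPath-like labels are made up.
reader = ReaderSketch({"projectPath": f"1/1/1/{i}"} for i in range(1, 6))

paths = []
while reader.hasSubset():
    paths.append(reader.getSubset()["projectPath"])
print(paths)
```

With a real reader the body of the loop would inspect each subset rather than collect a label.
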
Eventually the binary data will be available in one or more views that
are useful to downstream users. The C++ code provides views that are
appropriate when filling a single row of a MS v2 or multiple rows of a
MS v2. Neither of those views is likely to be useful to VIPER.::
>>> bh = bdf.getHeader()
>>> bh.getTitle()
'ALMA BL Correlator Spectral Data'
>>> print(bh.getProcessorType())
CORRELATOR
>>> bb_dict = bh.getBasebandsList()[1]
>>> bb_dict
{'name': 'BB_2', 'spectralWindows': [{'crossPolProducts': [, ], 'sdPolProducts': [, ], 'scaleFactor': np.float32(168374.58), 'numSpectralPoint': 128, 'numBin': 1, 'sideband': , 'sw': '2'}]}
>>> bh.getBinaryTypes()
['flags', 'actualTimes', 'actualDurations', 'zeroLags', 'crossData', 'autoData']
>>> bh.getAxesNames('crossData')
['BAL', 'BAB', 'SPP', 'POL']
>>> bh.getSize('crossData')
890880
This illustrates how some of these values are enumerations. It shows the
available binary types and the list of axes and size for the crossData
type. In this case, BAL is the baseline axis, BAB is the baseband axis,
SPP is the spectral axis (channels) and POL is the axis needed to hold
all of the pol products shown for the spectral window in question. In
the more general case that can be complicated, but here there are 30
antennas so the number of BAL elements is 30x29/2 (435), there are
4 basebands, each baseband has a single spectral window with 128 channels,
and there are 2 elements along the POL axis. In addition, crossData is
stored as a pair of values, one each for the real and imaginary parts.
So the expected size here is 435x4x128x2x2 or 890880, as indicated by
the size value.::
>>> bdf.hasSubset()
True
>>> ss = bdf.getSubset()
>>> ss.keys()
dict_keys(['projectPath', 'integrationNumber', 'subIntegrationNumber', 'midpointInNanoSeconds', 'intervalInNanoSeconds', 'aborted', 'stopTime', 'abortReason', 'actualTimes', 'actualDurations', 'crossData', 'autoData', 'flags', 'zeroLags'])
>>> ss['projectPath']
'1/1/1/1'
>>> ss['actualTimes']
{'present': False, 'startsAt': -1, 'arr': None, 'type': 'INT64_TYPE', 'np_type': dtype('int64')}
>>> for t in bh.getBinaryTypes():
... print('%s : %s' % (t,ss[t]['present']))
...
flags : True
actualTimes : False
actualDurations : False
zeroLags : True
crossData : True
autoData : True
>>> ss['crossData']
{'present': True, 'startsAt': 18130, 'arr': array([ 13, 43, 72, ..., 29, -59, 26], shape=(890880,), dtype=int16), 'type': 'INT16_TYPE', 'np_type': dtype('int16')}
>>> # iterate through to the end (not all shown, there are 5 subsets in this BDF)
>>> bdf.hasSubset()
True
>>> ss = bdf.getSubset()
>>> # until bdf.hasSubset() returns False
The subset is a dictionary at the moment. It may become a class
as development of pyasdm continues. The dictionary has several
fields that describe the subset and then it has a field for each
of the binary types indicated by the global header. Each of those
is itself a dictionary. Not all binary types will be found in each
subset. Here, this subset has flags, zeroLags, crossData and autoData.
The *present* field indicates whether it's present in that subset.
The *arr* field is the array of values found, when present, and
*startsAt* is the starting location of those bytes in the file (that
could be used to skip to that location and read just those values
once the type and byte order are known). Note that crossData can be
stored as a scaled integer (either 16 or 32 bit integers). The
floating point values are recovered by dividing the integer values
by the scaleFactor from the global header for that spectral window
and baseband. The format also allows crossData to be stored as 32 bit
floating point values.
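
The scaling step can be sketched in plain Python. The integer values below are illustrative only (they are not read from a BDF), and the real-then-imaginary ordering of each pair is taken from the description of crossData as pairs of values; the scale factor is the one printed in the header above.

```python
scale_factor = 168374.57812          # from the header for this spectral window
raw = [13, 43, 72, -5, 29, -59]      # illustrative integers, as (re, im) pairs

# Divide by the scale factor, then pair consecutive values as real/imaginary.
scaled = [v / scale_factor for v in raw]
visibilities = [complex(re, im) for re, im in zip(scaled[0::2], scaled[1::2])]
print(visibilities)
```
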
Eventually additional code will exist that will serve these values
up in a form that is useful for downstream processing, with the
accompanying meta information from ASDM as necessary (the views
discussed earlier).
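
The crossData size arithmetic worked through earlier in this example (baselines x basebands x channels x pol products x 2) is easy to check in code. This is a sketch that assumes every baseband has a single spectral window with the same number of channels and pol products, as is the case here; the function name is hypothetical.

```python
def expected_cross_size(n_ant, n_baseband, n_chan, n_pol):
    """Number of stored crossData values when every baseband has one
    spectral window with the same channel and pol-product counts."""
    n_baseline = n_ant * (n_ant - 1) // 2
    return n_baseline * n_baseband * n_chan * n_pol * 2  # 2 = real + imaginary


print(expected_cross_size(30, 4, 128, 2))  # 890880, matching the header
```
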
Example 2
---------
The WSU data will have a single spectral window in each BDF and so
extracting and scaling the crossData will be simplified because
the BAB axis will always have a single element and the SPP axis
will then be a single number instead of something that depends
on the BAB element being used.
This example is from an SDM whose BDFs were split to resemble how we
expect WSU data to look. The script used to split a
BDF is not robust enough for general use and is not part of pyasdm. The
data used here are from a personal copy and serve only to illustrate the difference.
Note: eventually it will be possible to close and reopen an ASDM, as
it already is with a BDF. Tests indicate that something isn't being
cleared properly so that does not yet work. When working with another
ASDM or BDF you should currently create a new instance each time, as in
this example.
::
>>> asdm = pyasdm.ASDM()
>>> asdm.setFromFile('~/casa/split_data/uid___A002_X10d9399_X6279')
>>> mt = asdm.getMain()
>>> mr = mt.get()
>>> bdf = pyasdm.bdf.BDFReader()
>>> bdf.open(mr[4].getBDFPath())
>>> bh = bdf.getHeader()
>>> print(bh)
XML Schema version = 2
Byte order = Little_Endian
startTime = 5202971372592000000
dataOID = uid://A002/X10d9399/Xe6d
title = ALMA ACA Correlator Spectral Data
dimensionality = 1
execBlockUID = uid://A002/X10d9399/X6279
execBlockNum = 1
scanNum = 1
subscanNum = 1
numAntenna = 9
correlationMode = CROSS_AND_AUTO
spectralResolutionType = FULL_RESOLUTION
processorType = CORRELATOR
atmospheric phase correction = AP_UNCORRECTED
baseband #0:
name = BB_1
spectralWindow #0:
sw = 1
crossPolProducts = XX YY
sdPolProducts = XX YY
scaleFactor = 168374.57812
numSpectralPoint = 128
numBin = 1
sideband = LSB
flags:
size = 90
axes = BAL ANT BAB POL
actualTimes:
size = 90
axes = BAL ANT BAB POL
actualDurations
size = 90
axes = BAL ANT BAB POL
crossData
size = 18432
axes = BAL BAB SPP POL
autoData
size = 2304
axes = ANT BAB SPP POL
normalized = True
>>> ss = bdf.getSubset()
>>> ss['crossData']
{'present': True, 'startsAt': 4200, 'arr': array([ 81, 6, -46, ..., 25, 53, -44], shape=(18432,), dtype=int16), 'type': 'INT16_TYPE', 'np_type': dtype('int16')}
>>> import numpy as np
>>> farr = ss['crossData']['arr'].astype(np.float32)
>>> spw = bh.getBasebandsList()[0]['spectralWindows'][0]
>>> scaleFactor = spw['scaleFactor']
>>> farr /= scaleFactor
>>> nAnt = bh.getNumAntenna()
>>> nbl = int(nAnt*(nAnt-1)/2)
>>> nchan = spw['numSpectralPoint']
>>> npol = len(spw['crossPolProducts'])
>>> shape = (nbl,nchan,npol,2)
>>> farr_shaped = farr.reshape(shape)
>>> carr_shaped = farr_shaped[:,:,:,0] + 1j * farr_shaped[:,:,:,1]
>>> carr_shaped.shape
(36, 128, 2)
>>> carr_shaped[0,0,0]
np.complex64(0.00048107025+3.5634832e-05j)
This subset also has actualTimes and actualDurations and does not have
flags or zeroLags data.
Knowing that there is a single baseband here with a single spectral window
simplifies extracting the data. Here, the crossData array is converted
to a 32-bit float array and the scale factor is applied. The array is
then reshaped, with the real and imaginary parts as the
final axis (note that the baseband axis is skipped in making the shape
as it has a single element here). Then a complex array is created from the
real and imaginary parts of the floating point values and the value at the
origin of the resulting array is printed. The axes are baseline, channel,
and polarization (here "XX" and "YY"). See the BDF documentation for
how baselines are ordered.
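
The reshape above can also be read as an index calculation on the flat array. The sketch below spells that out in plain Python, assuming a single baseband and spectral window and the row-major order that numpy's *reshape* uses by default; the function name and the synthetic data are made up for illustration.

```python
def cross_value(flat, bl, chan, pol, n_chan, n_pol, scale_factor):
    """Complex value for (baseline, channel, pol) from a flat crossData list.

    Assumes a single baseband and spectral window and row-major order with
    axes (baseline, channel, pol, real/imaginary).
    """
    start = ((bl * n_chan + chan) * n_pol + pol) * 2
    return complex(flat[start], flat[start + 1]) / scale_factor


# Synthetic data: 3 baselines, 2 channels, 2 pol products; value = its index.
n_bl, n_chan, n_pol = 3, 2, 2
flat = list(range(n_bl * n_chan * n_pol * 2))

first = cross_value(flat, 0, 0, 0, n_chan, n_pol, 1.0)
last = cross_value(flat, 2, 1, 1, n_chan, n_pol, 1.0)
print(first, last)
```

Applied to the first two integers of the crossData array shown above (81 and 6) with the scale factor from the header, the same calculation gives a value consistent with the one printed for the origin of the complex array.
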
All of this illustrates the need for well-defined views that are useful for
VIPER use. Presenting the data in such a view isn't difficult, but
that detail obviously needs to be hidden from the user.