Examples
======================================================

Examples may eventually be found in one or more Jupyter notebooks. This is very preliminary.

Note that pyasdm cannot read an ASDM with a version < "3" (the schemaVersion value in the ASDM.xml file).

.. toctree::
   :maxdepth: 2

Example 1
---------

Import the pyasdm module and open an existing ASDM. The pyasdm method *setFromFile* uses *os.path.expanduser* to expand this path to the full path. This example uses an SDM from the casatestdata repository; substitute the path where you have installed casatestdata.::

    >>> import pyasdm
    >>> asdm = pyasdm.ASDM()
    >>> asdm.setFromFile("~/casa/casatestdata/sdm/uid___A002_X71e4ae_X317_short")

Following the C++ and Java class design, an ASDM is always instantiated first and then set to a specific ASDM by giving it the path to the directory containing that ASDM, as shown here. Table instances are then fetched from that ASDM, and individual rows can be retrieved from a table.

The classes at the *pyasdm* layer are the container (*ASDM*), *enumerations*, *exceptions*, *types*, and *utils*. The *bdf* classes deal with reading a BDF (binary data file/format) file (eventually that will also include classes to write a BDF to a file). The *enumerations* are classes that handle the enumerations found in the SDM by limiting the set of allowed values to those known to the model and translating to and from the form in which the enumerations are stored in an SDM (generally as integers). The *types* are specific types known to the model, built on top of the standard types (this is largely so that they can be read and written correctly), and the *utils* and *exceptions* hold utility methods and the specific exceptions added and used by pyasdm.

Note that only the ASDM container is loaded in that first example. None of the contained tables that may exist have been read yet.
In pyasdm, tables are always loaded on demand.::

    >>> asdm.status()
    'Main' : IS present in _tableEntity, presentInMemory = False size = 0
    'AlmaRadiometer' : IS present in _tableEntity, presentInMemory = False size = 0
    'Annotation' : IS NOT present in _tableEntity, presentInMemory = True size = 0
    'Antenna' : IS present in _tableEntity, presentInMemory = False size = 0
    'CalAmpli' : IS present in _tableEntity, presentInMemory = False size = 0
    'CalAntennaSolutions' : IS NOT present in _tableEntity, presentInMemory = True size = 0
    'CalAppPhase' : IS NOT present in _tableEntity, presentInMemory = True size = 0
    'CalAtmosphere' : IS present in _tableEntity, presentInMemory = False size = 0
    'CalBandpass' : IS NOT present in _tableEntity, presentInMemory = True size = 0

Shown here are the first few lines of the output of *status* for this SDM. There is one output line for each possible table that might exist in an SDM. The line indicates "IS present" for a table that exists on disk in that SDM and "IS NOT present" if that table does not exist on disk. Note that initially the tables that exist have *False* for "presentInMemory" because they have not yet been loaded. That is also why their size is 0. Tables that do not exist on disk DO exist in memory as zero-sized tables. Rows could be added to them. If that SDM is then written to disk, a table with newly added rows would be written to disk as well.
That's not shown in this example.::

    >>> mt = asdm.getMain()
    >>> mt.size()
    30
    >>> mt.getName()
    'Main'
    >>> mt.getKeyName()
    ['time', 'configDescriptionId', 'fieldId']
    >>> mr = mt.get()
    >>> len(mr)
    30
    >>> print(mr[0].toXML())
    30 INTEGRATION 10080000000 5 1 1 9610595 ConfigDescription_0 ExecBlock_0 Field_0 1 30 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0 State_0

This first gets the Main table. You can see that its size is 30 rows and its name is "Main". The *getKeyName* method returns the list of key fields in the table. These can be used to select the row that matches a specific set of key values, but that method doesn't support wildcards or ranges and so is not terribly useful for a table like Main. The full list of rows found in Main is returned by *get*, and this example shows that it returned 30 rows. Each element is a MainRow instance. The *toXML* method returns the full row as an XML string (this is what is used to write the rows of a table to disk).::

    >>> mr[0].getDataSize()
    9610595
    >>> mr[0].getNumAntenna()
    30
    >>> mr[0].getConfigDescriptionId()
    >>> print(mr[0].getConfigDescriptionId())
    ConfigDescription_0
    >>> print(mr[0].getDataUID())
    >>> print(mr[0].getTimeSampling())
    INTEGRATION
    >>> pyasdm.enumerations.TimeSampling.names()
    ['SUBINTEGRATION', 'INTEGRATION']

There are getters and setters for each of the fields in a row. For standard types, they return a value as shown here for *dataSize* and *numAntenna* (note that the method name for a getter is "get" followed by the field name with the first letter in upper case). Here, *configDescriptionId* is stored as a *Tag* type (which is how fields that indicate rows in other tables are stored) and *dataUID* is an *EntityRef* type. Specific types have a *__str__* method so they should print out in a useful form.
Note also that *stateId* is a 1-D array of values. In this case, there is 1 value for each antenna. In the XML form the dimensionality is shown first, followed by the number of values and then the individual values. In this case, the values are all *Tag* types indicating a row in the State table and they all have the same value. The *timeSampling* field is an example of an *enumeration* instance. The static member *names* can be used for any enumeration class to show all of the allowed names for that type of enumeration. For the *TimeSampling* enumeration there are 2 types.::

    >>> frow = mr[0].getFieldUsingFieldId()
    >>> print(frow.toXML())
    Field_0 J1924-2914 1 2 1 2 5.082621016035585 -0.5103639492133764 2 1 2 5.082621016035585 -0.5103639492133764 2 1 2 5.082621016035585 -0.5103639492133764 none J2000 0
    >>> srow = mr[0].getStateUsingStateId(10)
    >>> print(srow.toXML())
    State_0 NONE True True True 1.0

Some fields in a row indicate a row in another table. The row class has methods that can be used to get the instance of the indicated row, as shown here for *fieldId*. For cases like *stateId* you can either get a single StateRow instance as shown here (getting the one for element 10 in that list of stateId values) or get all of the rows as a list using the appropriate getter for that field in that row (here it is *getStatesUsingStateId*).

For the ASDM tables, rows, types, and enumerations, Python lists are used when there are multiple values (and sometimes lists of lists or lists of lists of lists when the number of dimensions in that field is more than 1; the SDM currently has cases up to 4 dimensions).
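
The array layout described above (dimensionality first, then the sizes, then the values) and the nested-list representation can be illustrated with a small stand-alone sketch. This is not pyasdm's own parser: the function name ``parse_array_text`` is hypothetical, and the sketch assumes that a multi-dimensional field lists one size per dimension after the dimensionality (as the ``2 1 2 ...`` values in the Field row suggest) and that the values are stored with the last dimension varying fastest.

```python
def parse_array_text(text, convert=str):
    """Turn SDM-style array text into nested Python lists (illustrative only).

    Assumed layout: <ndim> <size of each dimension> <values...>,
    with the last dimension varying fastest.
    """
    tokens = text.split()
    ndim = int(tokens[0])
    dims = [int(t) for t in tokens[1:1 + ndim]]
    values = [convert(t) for t in tokens[1 + ndim:]]

    # Check that the number of values matches the declared sizes.
    expected = 1
    for d in dims:
        expected *= d
    if len(values) != expected:
        raise ValueError("expected %d values, found %d" % (expected, len(values)))

    # Fold the flat list into nested lists, innermost dimension first.
    for d in reversed(dims[1:]):
        values = [values[i:i + d] for i in range(0, len(values), d)]
    return values


# 1-D example in the style of the stateId field (shortened to 3 values).
state_ids = parse_array_text("1 3 State_0 State_0 State_0")
# 2-D example taken from the direction values in the Field row above.
direction = parse_array_text("2 1 2 5.082621016035585 -0.5103639492133764", float)
print(state_ids)
print(direction)
```

A 1-D field comes back as a flat list and a 2-D field as a list of lists, which matches the way multi-valued fields are described in this section.
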
For the binary file, the binary data itself is returned as numpy arrays.::

    >>> mr[0].getBDFPath()
    '/users/bgarwood/casa/casatestdata/sdm/uid___A002_X71e4ae_X317_short/ASDMBinary/uid___A002_X71e4ae_X328'
    >>> bdf = pyasdm.bdf.BDFReader()
    >>> bdf.open(mr[0].getBDFPath())
    >>> print(bdf.getHeader())
    XML Schema version = 2
    Byte order = Little_Endian
    startTime = 4890409733520000000
    dataOID = uid://A002/X71e4ae/X328
    title = ALMA BL Correlator Spectral Data
    dimensionality = 1
    execBlockUID = uid://A002/X71e4ae/X317
    execBlockNum = 1
    scanNum = 1
    subscanNum = 1
    numAntenna = 30
    correlationMode = CROSS_AND_AUTO
    spectralResolutionType = FULL_RESOLUTION
    processorType = CORRELATOR
    atmospheric phase correction = AP_UNCORRECTED
    baseband #0:
      name = BB_1
      spectralWindow #0:
        sw = 1
        crossPolProducts = XX YY
        sdPolProducts = XX YY
        scaleFactor = 168374.57812
        numSpectralPoint = 128
        numBin = 1
        sideband = LSB
    baseband #1:
      name = BB_2
      spectralWindow #0:
        sw = 2
        crossPolProducts = XX YY
        sdPolProducts = XX YY
        scaleFactor = 168374.57812
        numSpectralPoint = 128
        numBin = 1
        sideband = LSB
    baseband #2:
      name = BB_3
      spectralWindow #0:
        sw = 3
        crossPolProducts = XX YY
        sdPolProducts = XX YY
        scaleFactor = 168374.57812
        numSpectralPoint = 128
        numBin = 1
        sideband = USB
    baseband #3:
      name = BB_4
      spectralWindow #0:
        sw = 4
        crossPolProducts = XX YY
        sdPolProducts = XX YY
        scaleFactor = 168374.57812
        numSpectralPoint = 128
        numBin = 1
        sideband = USB
    flags:
      size = 3720
      axes = BAL ANT BAB POL
    actualTimes:
      size = 3720
      axes = BAL ANT BAB POL
    actualDurations
      size = 3720
      axes = BAL ANT BAB POL
    zeroLags
      size = 240
      axes = BAL BAB POL
      correlatorType = FXF
    crossData
      size = 890880
      axes = BAL BAB SPP POL
    autoData
      size = 30720
      axes = ANT BAB SPP POL
      normalized = True

Here, the *getHeader* method returns a BDFHeader instance. It has getters for each of these fields. Some of them are simple values, others are enumerations, etc. This shows the organization of the data.
Specifically, there are 4 basebands here and each baseband has a single spectral window. Each spectral window has an item that indicates the cross and single dish (auto) correlation products. The scale factor is used to scale the crossData values for that spectral window in that baseband. In general, each baseband could have multiple spectral windows, with different scale factors and cross and sd pol products. The baseband information summarized here is currently available as a list of dict values, with one dict value for each baseband, in order, through *getBasebandsList()*.

At the end are shown the possible binary components that may be found in this BDF. The binary components are found in data subsets. A subset is retrieved by the *getSubset* method, and the *hasSubset* method indicates whether there are still more subsets available from that BDF. After the *open* method returns, the position of the BDF is just before the first subset. For each type of binary data the header shows the expected size in a subset (if that type is found) and the axes used. See the BDF format documentation for more details on the meanings.

Eventually the binary data will be available in one or more views that are useful to downstream users. The C++ code provides views that are appropriate when filling a single row of a MS v2 or multiple rows of a MS v2.
Neither of those views is likely to be useful to VIPER.::

    >>> bh = bdf.getHeader()
    >>> bh.getTitle()
    'ALMA BL Correlator Spectral Data'
    >>> print(bh.getProcessorType())
    CORRELATOR
    >>> bb_dict = bh.getBasebandsList()[1]
    >>> bb_dict
    {'name': 'BB_2', 'spectralWindows': [{'crossPolProducts': [, ], 'sdPolProducts': [, ], 'scaleFactor': np.float32(168374.58), 'numSpectralPoint': 128, 'numBin': 1, 'sideband': , 'sw': '2'}]}
    >>> bh.getBinaryTypes()
    ['flags', 'actualTimes', 'actualDurations', 'zeroLags', 'crossData', 'autoData']
    >>> bh.getAxesNames('crossData')
    ['BAL', 'BAB', 'SPP', 'POL']
    >>> bh.getSize('crossData')
    890880

This illustrates how some of these values are enumerations. It shows the available binary types and the list of axes and size for the crossData type. In this case, BAL is the baseline axis, BAB is the baseband axis, SPP is the spectral axis (channels), and POL is the axis necessary to hold all of the pol products shown for the spectral window in question. In the more general case that can be complicated, but here there are 30 antennas so the number of BAL elements is 30x29/2 (435), there are 4 basebands, each baseband has a single spectral window with 128 channels, and there are 2 elements along the POL axis. In addition, crossData is stored as a pair of values, one each for the real and imaginary parts. So the expected size here is 435x4x128x2x2 or 890880, as indicated by the size value.::

    >>> bdf.hasSubset()
    True
    >>> ss = bdf.getSubset()
    >>> ss.keys()
    dict_keys(['projectPath', 'integrationNumber', 'subIntegrationNumber', 'midpointInNanoSeconds', 'intervalInNanoSeconds', 'aborted', 'stopTime', 'abortReason', 'actualTimes', 'actualDurations', 'crossData', 'autoData', 'flags', 'zeroLags'])
    >>> ss['projectPath']
    '1/1/1/1'
    >>> ss['actualTimes']
    {'present': False, 'startsAt': -1, 'arr': None, 'type': 'INT64_TYPE', 'np_type': dtype('int64')}
    >>> for t in bh.getBinaryTypes():
    ...     print('%s : %s' % (t,ss[t]['present']))
    ...
    flags : True
    actualTimes : False
    actualDurations : False
    zeroLags : True
    crossData : True
    autoData : True
    >>> ss['crossData']
    {'present': True, 'startsAt': 18130, 'arr': array([ 13, 43, 72, ..., 29, -59, 26], shape=(890880,), dtype=int16), 'type': 'INT16_TYPE', 'np_type': dtype('int16')}
    >>> # iterate through to the end by repeating the following (not all shown, there are 5 subsets in this BDF)
    >>> bdf.hasSubset()
    True
    >>> ss = bdf.getSubset()
    >>> # until bdf.hasSubset() returns False

The subset is a dictionary at the moment. It may become a class as development of pyasdm continues. The dictionary has several fields that describe the subset and then a field for each of the binary types indicated by the global header. Each of those is itself a dictionary. Not all binary types will be found in each subset. Here, this subset has flags, zeroLags, crossData and autoData. The *present* field indicates whether that type is present in that subset. The *arr* field is the array of values found, when present, and *startsAt* is the starting location of those bytes in the file (that could be used to skip to that location and read just those values once the type and byte order are known).

Note that crossData can be stored as a scaled integer (either 16 or 32 bit integers). The floating point values are recovered by dividing the integer values by the scaleFactor from the global header for that spectral window and baseband. The format also allows crossData to be stored as 32 bit floating point values. Eventually additional code will exist that will serve these values up in a form that is useful for downstream processing, with the accompanying meta information from the ASDM as necessary (the views discussed earlier).
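
The repeated *hasSubset* / *getSubset* calls above can be wrapped in a loop. The following is a minimal sketch of that pattern, not part of pyasdm: ``iter_subsets`` and ``present_types`` are hypothetical helper names, and ``FakeReader`` is a stand-in with made-up subset contents so that the block runs without a BDF file. With a real, opened ``pyasdm.bdf.BDFReader`` the same loop should apply, since it only relies on the two methods and the *present* field shown in this example.

```python
def iter_subsets(reader):
    """Yield each remaining subset from an opened reader, in file order."""
    while reader.hasSubset():
        yield reader.getSubset()


def present_types(subset, binary_types):
    """Return the binary type names whose 'present' field is True."""
    return [name for name in binary_types if subset[name]["present"]]


class FakeReader:
    """Hypothetical stand-in for an opened BDFReader (illustration only)."""

    def __init__(self, subsets):
        self._subsets = list(subsets)

    def hasSubset(self):
        return len(self._subsets) > 0

    def getSubset(self):
        return self._subsets.pop(0)


# Made-up subsets that carry only the fields used by this sketch.
binary_types = ["flags", "crossData"]
fake = FakeReader([
    {"projectPath": "1/1/1/1",
     "flags": {"present": True}, "crossData": {"present": True}},
    {"projectPath": "1/1/1/2",
     "flags": {"present": False}, "crossData": {"present": True}},
])

summary = [(ss["projectPath"], present_types(ss, binary_types))
           for ss in iter_subsets(fake)]
print(summary)
```

Because the generator consumes subsets as it goes, the reader is positioned at the end of the file once the loop finishes, just as it is after the manual calls.
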

Example 2
---------

The WSU data will have a single spectral window in each BDF, so extracting and scaling the crossData will be simplified: the BAB axis will always have a single element and the length of the SPP axis will then be a single number instead of something that depends on the BAB element being used. This example is from an SDM where the BDFs were split to look the way we think WSU data will look. The script to split a BDF is not robust for general use and is not part of pyasdm. The data used here is from a personal copy to illustrate the difference.

Note: eventually it will be possible to close and reopen an ASDM, as it already is with a BDF. Tests indicate that something isn't being cleared properly so that does not yet work. If trying another ASDM or BDF you should currently create a new instance each time, as in this example. ::

    >>> asdm = pyasdm.ASDM()
    >>> asdm.setFromFile('~/casa/split_data/uid___A002_X10d9399_X6279')
    >>> mt = asdm.getMain()
    >>> mr = mt.get()
    >>> bdf = pyasdm.bdf.BDFReader()
    >>> bdf.open(mr[4].getBDFPath())
    >>> bh = bdf.getHeader()
    >>> print(bh)
    XML Schema version = 2
    Byte order = Little_Endian
    startTime = 5202971372592000000
    dataOID = uid://A002/X10d9399/Xe6d
    title = ALMA ACA Correlator Spectral Data
    dimensionality = 1
    execBlockUID = uid://A002/X10d9399/X6279
    execBlockNum = 1
    scanNum = 1
    subscanNum = 1
    numAntenna = 9
    correlationMode = CROSS_AND_AUTO
    spectralResolutionType = FULL_RESOLUTION
    processorType = CORRELATOR
    atmospheric phase correction = AP_UNCORRECTED
    baseband #0:
      name = BB_1
      spectralWindow #0:
        sw = 1
        crossPolProducts = XX YY
        sdPolProducts = XX YY
        scaleFactor = 168374.57812
        numSpectralPoint = 128
        numBin = 1
        sideband = LSB
    flags:
      size = 90
      axes = BAL ANT BAB POL
    actualTimes:
      size = 90
      axes = BAL ANT BAB POL
    actualDurations
      size = 90
      axes = BAL ANT BAB POL
    crossData
      size = 18432
      axes = BAL BAB SPP POL
    autoData
      size = 2304
      axes = ANT BAB SPP POL
      normalized = True
    >>> ss = bdf.getSubset()
    >>> ss['crossData']
    {'present': True, 'startsAt': 4200, 'arr': array([ 81, 6, -46, ..., 25, 53, -44], shape=(18432,), dtype=int16), 'type': 'INT16_TYPE', 'np_type': dtype('int16')}
    >>> import numpy as np
    >>> # convert the scaled integers to floats and apply the scale factor
    >>> farr = ss['crossData']['arr'].astype(np.float32)
    >>> spw = bh.getBasebandsList()[0]['spectralWindows'][0]
    >>> scaleFactor = spw['scaleFactor']
    >>> farr /= scaleFactor
    >>> # build the shape: baseline, channel, pol, (real, imaginary)
    >>> nAnt = bh.getNumAntenna()
    >>> nbl = int(nAnt*(nAnt-1)/2)
    >>> nchan = spw['numSpectralPoint']
    >>> npol = len(spw['crossPolProducts'])
    >>> shape = (nbl,nchan,npol,2)
    >>> farr_shaped = farr.reshape(shape)
    >>> # combine the real and imaginary parts into a complex array
    >>> carr_shaped = farr_shaped[:,:,:,0] + 1j * farr_shaped[:,:,:,1]
    >>> carr_shaped.shape
    (36, 128, 2)
    >>> carr_shaped[0,0,0]
    np.complex64(0.00048107025+3.5634832e-05j)

This subset also has actualTimes and actualDurations and does not have flags or zeroLags data. Knowing that there is a single baseband here with a single spectral window simplifies extracting the data. Here, the crossData array is converted to a 32-bit float array and the scale factor is applied. The array is then reshaped, with the real and imaginary pair as the final axis (note that the baseband axis is skipped in making the shape as it has a single element here). Then a complex array is created from the real and imaginary parts of the floating point values, and the value at the origin of the resulting array is printed. The axes are baseline, channel, and polarization (here "XX" and "YY"). See the BDF documentation for how baselines are ordered.

All of this illustrates the need for well-defined views that are useful for VIPER use. Presenting the data in such a view isn't difficult, but that detail obviously needs to be hidden from the user.
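
The crossData sizes reported by the two headers in these examples can be checked with a few lines of plain Python. This is only a sketch of the arithmetic described in Example 1 and does not call pyasdm. The function name and its arguments are hypothetical, and it assumes what holds for both files shown here: every spectral window has the same number of POL elements, numBin is 1, and each value is stored as a real/imaginary pair.

```python
def expected_cross_data_size(n_antenna, channels_per_baseband, n_pol):
    """Expected number of crossData values in one subset (illustrative only).

    n_antenna: number of antennas (numAntenna in the BDF header)
    channels_per_baseband: one list per baseband holding numSpectralPoint
        for each spectral window in that baseband
    n_pol: number of elements along the POL axis (2 for XX YY)
    """
    n_baseline = n_antenna * (n_antenna - 1) // 2
    # The BAB and SPP axes together cover every channel of every window.
    total_channels = sum(sum(windows) for windows in channels_per_baseband)
    # The final factor of 2 accounts for the real and imaginary parts.
    return n_baseline * total_channels * n_pol * 2


# Example 1: 30 antennas, 4 basebands with one 128-channel window each.
example_1 = expected_cross_data_size(30, [[128], [128], [128], [128]], 2)
# Example 2: 9 antennas, 1 baseband with one 128-channel window.
example_2 = expected_cross_data_size(9, [[128]], 2)
print(example_1, example_2)  # prints: 890880 18432
```

Both results agree with the crossData size values printed by the headers, which is a quick consistency check before reshaping the array.
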