Hi! I've been working on bringing an observational oceanographic dataset into xarray and am having trouble getting the dimensions to look right.

Currently, it starts out as a pandas DataFrame with one row per sample, identified by sample number ("unique_ID"). Each row carries metadata specifying the depth and station number the sample was collected at, and there are multiple replicate samples for a given (depth, station number) combination. Each row has a number of data columns unique to that sample, plus other values that are unique to the (depth, station number) combination but shared between replicate samples (e.g. salinity, which is measured separately from the samples but copied to match the rows). Additionally, certain columns "re-label" the depth and station number dimensions: potential density for the depth values, and lat and lon values for the station numbers.

When I bring this into xarray, I can set the sample number ("unique_ID") as the index, which becomes the single dimension for the dataset. I then use set_coords() to set the station number and depth as coordinates.
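A minimal sketch of that setup, assuming a small synthetic table: the `nitrate` and `salinity` columns and all values are placeholders, not the real data, and `to_xarray()` requires xarray to be installed.

```python
import pandas as pd

# Synthetic stand-in for the sample table (hypothetical columns and values).
df = pd.DataFrame(
    {
        "unique_ID": [1, 2, 3, 4, 5, 6],
        "station_ID": [1, 1, 1, 2, 2, 2],
        "depth": [10, 10, 50, 10, 50, 50],
        "nitrate": [1.0, 3.0, 5.0, 2.0, 4.0, 8.0],  # unique to each sample
        "salinity": [35.0, 35.0, 35.5, 34.0, 34.5, 34.5],  # shared by replicates
    }
)

# The sample number becomes the only dimension of the Dataset.
ds = df.set_index("unique_ID").to_xarray()

# Station number and depth become non-index coordinates along unique_ID.
ds = ds.set_coords(["station_ID", "depth"])
```

At this point every variable, including `salinity`, is still indexed by `unique_ID` alone.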
This allows me to do a groupby mean operation on (depth, station number), which gives me a dataset with depth and station number as coordinates and the means of all the sample values. However, it drops all the data that was common to each (depth, station number) combination (e.g. the potential densities, which correspond to depths but differ between stations, or the salinities, which are collected for each (depth, station number) but don't actually correspond to sample numbers).
My question is how do I reorganize this dataset so that I keep all the data variables when I do a groupby operation like this? Do I need to set station_ID and depth as dimensions before doing the groupby, and then make the data variables that only depend on those be indexed on those instead of on unique_ID alone? What functions would I use to do those steps? Thanks in advance, and please let me know if I can clarify anything!
Hi @rmshkv!
IIUC To help more, we'd need a sample dataset (synthetic is fine)
Hi Deepak,
Thanks! That's actually pretty much what I ended up doing. To record in case it's useful to anyone else: it ended up being more straightforward to do this in Pandas before converting to xarray. I set `station_ID` and `depth` as a Pandas MultiIndex on my metadata dataset, then converted that to xarray. From there the dimensions were the same as my data so I was able to merge the two xarray datasets (metadata and data) and get what I wanted.
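A minimal sketch of this approach with synthetic data is below. The `nitrate`, `salinity`, and `lat` columns and their values are placeholders, and averaging the replicates in pandas (rather than in xarray) is an assumption made to keep the example short, not necessarily the original workflow.

```python
import pandas as pd
import xarray as xr

# Synthetic sample table (hypothetical columns and values).
df = pd.DataFrame(
    {
        "unique_ID": [1, 2, 3, 4, 5, 6],
        "station_ID": [1, 1, 1, 2, 2, 2],
        "depth": [10, 10, 50, 10, 50, 50],
        "nitrate": [1.0, 3.0, 5.0, 2.0, 4.0, 8.0],  # unique to each sample
        "salinity": [35.0, 35.0, 35.5, 34.0, 34.5, 34.5],  # shared by replicates
        "lat": [60.0, 60.0, 60.0, 61.0, 61.0, 61.0],  # depends on station only
    }
)
keys = ["station_ID", "depth"]

# Per-sample data: average the replicates for each (station_ID, depth) pair.
data = df.groupby(keys)[["nitrate"]].mean().to_xarray()

# Metadata shared by replicates: keep one row per (station_ID, depth) pair,
# index it with a MultiIndex, and convert. The dims match those of `data`.
meta = df.drop_duplicates(subset=keys).set_index(keys)[["salinity"]].to_xarray()

# Station-level labels only need the station_ID dimension.
stations = (
    df.drop_duplicates(subset="station_ID")
    .set_index("station_ID")[["lat"]]
    .to_xarray()
)

# Merge everything on the shared dimensions; treat lat as a coordinate.
merged = xr.merge([data, meta, stations]).set_coords("lat")
```

The result has `station_ID` and `depth` as dimensions, with `nitrate` and `salinity` indexed on both and `lat` attached to `station_ID` only.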