Hi!
I've built many neural networks over the last few years, but always with the data fully loaded into memory. Now I have to scale up to feed and train my models with big data from climate-model grids.
My data is stored in a .zarr store, and I open it with xarray (built on Dask). I need to do some data engineering before feeding it to my models.
Working in graph (lazy-loading) mode is new to me, so I'm struggling with many details at this point.
1- Can I turn my xarray/Dask dataset into a tf.data.Dataset without loading it fully into memory?
2- Is it possible to get rid of xarray and build the graph directly with TensorFlow from the raw data?
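On question 1: one common pattern is tf.data.Dataset.from_generator wrapping a generator that slices the lazy Dataset and materialises one batch at a time. Below is a minimal sketch, not a drop-in solution: the stacked `indx` dimension, the `ptype` label variable, and the feature names in the commented TensorFlow wiring (`t2m`, `rh`) are assumptions based on your snippet.

```python
import numpy as np
import xarray as xr

def batch_generator(ds, var_names, batch_size=1024):
    """Yield (features, labels) numpy batches from a chunked xarray Dataset.

    Only the current slice is materialised; Dask reads the needed
    zarr chunks on demand when .values is called.
    """
    n = ds.sizes["indx"]  # assumes a stacked 'indx' sample dimension
    for start in range(0, n, batch_size):
        chunk = ds.isel(indx=slice(start, start + batch_size))  # still lazy
        # loading happens here, for this slice only
        x = np.stack([chunk[v].values for v in var_names], axis=-1)
        y = chunk["ptype"].values
        yield x, y

# Hypothetical TensorFlow wiring ("t2m"/"rh" are placeholder variable names):
# import tensorflow as tf
# dataset = tf.data.Dataset.from_generator(
#     lambda: batch_generator(xdf, ["t2m", "rh"]),
#     output_signature=(
#         tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
#         tf.TensorSpec(shape=(None,), dtype=tf.int64),
#     ),
# ).prefetch(tf.data.AUTOTUNE)
```

With from_generator, only the generator body runs eagerly, one batch at a time, so peak memory is roughly one batch plus the zarr chunks it touches; the rest of the pipeline stays in graph mode.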
I did a lot of searching, and all the tutorials seem to go one of two ways: TensorFlow reads from CSV or similarly simple files, or the data is converted to TFRecord, which means duplicating my data…
Maybe I'm approaching the problem the wrong way; some help pointing me in the right direction would be greatly appreciated.
Every attempt I've made ends in a memory error, as if I'm failing to stay in graph mode…
Here is an example:
I'm trying to split precipitation into wet, solid and mixed, so all samples too far from the freezing point are useless (or they leave me with unbalanced data).
One of my data-engineering tasks is to group labels 1, 3, 5 into one category and 7, 7, 12 into another one.
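That grouping can be done lazily with xr.where and .isin, which work element-wise on Dask-backed arrays without loading anything. A sketch — I'm taking the second group as {7, 12}, since 7 appears twice in your list, and mapping everything else to -1 so it can be dropped later:

```python
import numpy as np
import xarray as xr

def group_ptype(ptype):
    """Map raw ptype codes to two classes, lazily on Dask-backed arrays.

    Assumed mapping: {1, 3, 5} -> class 0, {7, 12} -> class 1,
    anything else -> -1 (to be filtered out afterwards).
    """
    return xr.where(ptype.isin([1, 3, 5]), 0,
           xr.where(ptype.isin([7, 12]), 1, -1))
```

Because xr.where and isin are element-wise, this adds nodes to the Dask graph instead of computing anything, so it composes with the batch-wise loading later on.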
ds = xr.open_zarr(path_spatial, consolidated=True, chunks='auto')
# stack to get an easier way to drop useless samples
xdf = ds.stack(indx=('time', 'latitude', 'longitude'))
red_df = xdf.sel(indx=(xdf['ptype'] != 0) & (xdf['ptype'].notnull()), drop=True)
The xdf.sel(… call sits idle and finally crashes with a memory error.
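A likely culprit: boolean label-based .sel on a stacked multi-index forces the mask and the index to be evaluated eagerly. A workaround sketch: compute only the 1-D mask (cheap — one value per sample), convert it to integer positions, and use positional .isel, which keeps the data variables themselves lazy for Dask-backed arrays.

```python
import numpy as np
import xarray as xr

def lazy_filter(xdf, mask_da):
    """Filter a stacked Dataset without materialising its data variables.

    Only the 1-D boolean mask is computed; the data variables stay
    lazy and are loaded later, batch by batch.
    """
    mask = mask_da.values       # computes just the mask, not the whole Dataset
    idx = np.flatnonzero(mask)  # integer positions of the samples to keep
    return xdf.isel(indx=idx)   # positional selection, lazy for Dask arrays

# Hypothetical usage with your snippet:
# mask = (xdf["ptype"] != 0) & xdf["ptype"].notnull()
# red_df = lazy_filter(xdf, mask)
```

If even the mask is too big to hold at once, the alternative is to skip the global filter entirely and drop unwanted samples inside the batch generator, one slice at a time.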
Any help is welcome!