Comparing gradients from GradientTape with a finite-difference approximation

Background

I am part of a team developing a TensorFlow model trained to emulate a physical model. This CNN-based model takes physical parameters and calculates physical properties such as flow or temperature. Part of the capability involves using automatic differentiation of the forward model to tackle the inverse problem: inferring physical model parameters from observations.

While the forward-model emulator works very well, I have been trying to diagnose problems with the inverse capability, which performs far less well than the emulated physical model. One test that I thought would be very simple is to compare the gradients I get from GradientTape with gradients calculated by finite differencing of small perturbations in my parameters, i.e. a sort of brute-force approach. Ultimately these gradients are used to minimize a cost function representing the misfit between the forward model and observations, so clearly it is very important that these gradients are accurate in order to find a good minimum.
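To make the test concrete, here is a minimal, self-contained sketch of the kind of check I mean, on a toy cost function in float64 rather than the real emulator (all names here are illustrative, not from my actual code):

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the parameter field (float64 to keep rounding error tiny).
x = tf.Variable([[0.5, 1.0], [1.5, 2.0]], dtype=tf.float64)

def cost_fn(v):
    # Toy scalar cost; the real cost would be the model/observation misfit.
    return tf.reduce_sum(tf.sin(v) * v ** 2)

# Gradient from automatic differentiation.
with tf.GradientTape() as tape:
    cost = cost_fn(x)
grad = tape.gradient(cost, x).numpy()

# Central differences, perturbing one entry at a time and restoring it.
h = 1e-6
Ny, Nx = x.shape
fd = np.zeros((Ny, Nx))
for i in range(Ny):
    for j in range(Nx):
        x[i, j].assign(x[i, j] + h)
        j_plus = cost_fn(x).numpy()
        x[i, j].assign(x[i, j] - 2.0 * h)
        j_minus = cost_fn(x).numpy()
        x[i, j].assign(x[i, j] + h)  # restore the original value
        fd[i, j] = (j_plus - j_minus) / (2.0 * h)

print(np.max(np.abs(fd - grad)))  # tiny in float64
```

On a smooth toy function in double precision the two agree to many digits, which is the behaviour I expected from my model as well.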
A simplified version of the code I am running to compare the gradients looks like this:
def fwd_wrapper(params, state, i, j, h):
    # Perturb one entry of the parameter field by h.
    state.slidingco[i, j].assign(state.slidingco[i, j] + h)
    fieldin = [vars(state)[f] for f in params.iflo_fieldin]
    Y = state.forwardModel(fieldin)
    cost = misfit(params, state)
    # Undo the perturbation so the state is unchanged on return.
    state.slidingco[i, j].assign(state.slidingco[i, j] - h)

    return cost


def compareGradients(params, state):
    with tf.GradientTape() as tape:
        tape.watch(state.slidingco)
        cost = fwd_wrapper(params, state, 0, 0, 0)
    grad = tape.gradient(cost, state.slidingco)

    Ny, Nx = state.slidingco.shape
    h = 1e-4
    dJdC = np.zeros((Ny, Nx))
    for i in range(Ny):
        for j in range(Nx):
            J1 = fwd_wrapper(params, state, i, j, h)
            J2 = fwd_wrapper(params, state, i, j, -h)
            dJdC[i, j] = (J1 - J2) / (2.0 * h)

So specifically I’m comparing dJdC with grad. I have three problems:

  1. For small h, which would be desirable for a more accurate finite-difference gradient, the change in cost is zero and hence dJdC is also zero, even though the corresponding entry of grad from the GradientTape is non-zero, and more generally I know it should not be zero.
  2. Where dJdC is not zero, it is generally not close to grad and at best you could say there is some correlation between the two.
  3. In many cases, particularly for small h, the resulting dJdC matrix consists of only a few distinct values, i.e. the cost seems to change in discrete steps even though I would expect it to vary continuously.
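Problems 1 and 3 look to me like floating-point precision effects rather than a tape bug. A minimal, self-contained sketch (toy function f(x) = x², nothing to do with my model) shows a central difference collapsing to exactly zero in float32 once h drops below the representable spacing around x:

```python
import numpy as np

def fd_grad(x, h, dtype):
    # Central difference of f(x) = x**2 at x, computed entirely in `dtype`.
    x, h = dtype(x), dtype(h)
    return ((x + h) ** 2 - (x - h) ** 2) / (2 * h)

# Exact derivative at x = 3 is 6. In float32, 3 + 1e-8 rounds back to 3
# (the spacing between floats near 3 is about 2.4e-7), so the numerator
# is exactly zero and the "gradient" comes out as 0.
for h in (1e-2, 1e-4, 1e-6, 1e-8):
    print(h, fd_grad(3.0, h, np.float32), fd_grad(3.0, h, np.float64))
```

In float64 the same h values all recover something close to 6, while in float32 the estimate first gets noisy and then hits exactly zero, which matches the stepwise, few-distinct-values behaviour I am seeing.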

Both 2. and 3. are improved somewhat by (a) retraining the forward model on my specific setup before running the test and (b) increasing h (up to a certain point).

Clearly I do not believe there is a problem with TensorFlow's GradientTape, so my question is: what am I missing here? I must be doing something very stupid. The only other explanation I can think of for some of this is that I'm running into precision issues. I should add that I'm not very experienced with Python, which might be clear from the code sample above. I'd really appreciate any help, as I don't feel I can proceed until I convince myself there is not a problem with the gradients or with how they are being applied to minimize the cost.
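For comparing the two gradient fields, a scale-aware element-wise relative error might be more informative than looking at raw differences or a visual correlation; a minimal sketch (the arrays here are stand-ins for my dJdC and grad):

```python
import numpy as np

def relative_error(a, b, eps=1e-12):
    # Element-wise relative error; eps keeps it finite when both entries are ~0.
    return np.abs(a - b) / (np.abs(a) + np.abs(b) + eps)

# Stand-ins for grad (from the tape) and dJdC (from finite differences).
grad = np.array([[1.0, -2.0], [0.5, 3.0]])
dJdC = grad * (1.0 + 1e-6)  # pretend the FD estimate is off by 1e-6 relatively

print(relative_error(dJdC, grad).max())  # ~5e-7: the two gradients agree
```

With this metric, agreement to a few significant digits gives values well below 1; values near 1 flag entries where one gradient is essentially zero and the other is not.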