Unexpected behavior when using batch_jacobian with multiple inputs/outputs in quantum-classical neural network

I’m implementing a neural network that includes quantum layers (using PennyLane’s qml.qnn.KerasLayer) to solve ODEs. I want to encode several points at once and get the corresponding ODE solution values (one output per input) in a single forward pass.
Currently I’m running a toy model where I try to recover u(x) = sin(x) by minimizing the residual du_dx - cos(x) in the loss function.
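
For context, the loss is assembled roughly like the sketch below (the u(0) = 0 boundary term and the mean-squared reductions are illustrative, not my exact code; the derivative helper is shown further down):

    def ode_loss(self, x_batch):
        # x_batch: (batch_size, in_out_size), each column is one encoded point x
        u, du_dx, _ = self.compute_gradients_1st_der_jac(x_batch)
        residual = du_dx - tf.cos(x_batch)                 # du/dx - cos(x) should vanish
        u0 = self.model(tf.zeros((1, x_batch.shape[1])))   # assumed boundary condition u(0) = 0
        return tf.reduce_mean(tf.square(residual)) + tf.reduce_mean(tf.square(u0))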

The network structure is:

NN = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(in_out_size,)),
    tf.keras.layers.Dense(n_qubits, activation="tanh"),
    qml.qnn.KerasLayer(qnode, weight_shapes, output_dim=n_qubits),
    tf.keras.layers.Dense(in_out_size)
])

The quantum circuit uses StronglyEntanglingLayers:

@qml.qnode(dev, diff_method='best')
def qnode(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.templates.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(wires=i)) for i in range(n_qubits)]
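
For completeness, the two snippets above assume a setup along these lines (the concrete values of n_qubits, n_layers and in_out_size are placeholders):

import pennylane as qml
import tensorflow as tf

n_qubits = 4                  # placeholder
n_layers = 2                  # placeholder: depth of StronglyEntanglingLayers
in_out_size = n_qubits        # the problematic case; in_out_size = 1 behaves as expected

dev = qml.device("default.qubit", wires=n_qubits)

# StronglyEntanglingLayers expects weights of shape (n_layers, n_qubits, 3)
weight_shapes = {"weights": (n_layers, n_qubits, 3)}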

The gradient method:

    def compute_gradients_1st_der_jac(self, inputs):
        # inputs: (batch_size, in_out_size)
        with tf.GradientTape(persistent=True) as tape1:
            tape1.watch(inputs)
            outputs = self.model(inputs)                     # (batch_size, in_out_size)
        # per-sample Jacobian of shape (batch_size, in_out_size, in_out_size)
        jacobian = tape1.batch_jacobian(outputs, inputs)
        # keep only the diagonal entries du_i/dx_i of each per-sample Jacobian
        n_features = tf.shape(inputs)[1]
        diagonal_mask = tf.eye(n_features)
        first_derivatives = tf.reduce_sum(jacobian * diagonal_mask, axis=[2])
        del tape1
        return outputs, first_derivatives, first_derivatives
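
For reference, I call it roughly like this (net stands for the object that owns the method above and is hypothetical here; the sizes and the random input grid are just illustrative):

batch_size, in_out_size = 8, 4                         # illustrative sizes
x = tf.random.uniform((batch_size, in_out_size), 0.0, 6.28)
u, du_dx, _ = net.compute_gradients_1st_der_jac(x)     # net: hypothetical wrapper instance
# u:      (batch_size, in_out_size) -> network output at each encoded point
# du_dx:  (batch_size, in_out_size) -> intended per-column derivatives du_i/dx_i
# the full batch_jacobian has shape (batch_size, in_out_size, in_out_size)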

When computing derivatives using batch_jacobian:

  • With in_out_size=1: derivatives correctly correspond to spatial derivatives
  • With in_out_size=n_qubits: derivatives don’t match expected spatial derivatives

Question: Why does increasing the input/output dimension affect the derivative computation, even when the Jacobian appears block-diagonal across the batch, as in TensorFlow’s documentation example (Advanced automatic differentiation | TensorFlow Core)?

The same behavior is observed when I replace the quantum layer with tf.keras.layers.Dense(n_qubits, activation="tanh"), or when I compute the derivatives with tape.gradient instead of batch_jacobian, so I don’t think it’s caused by the quantum circuit or by the particular way the derivative is taken.
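
Here is a minimal classical-only sketch of that check (no quantum layer; sizes are illustrative), just to show the shapes involved:

import tensorflow as tf

in_out_size = 4
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(in_out_size,)),
    tf.keras.layers.Dense(in_out_size, activation="tanh"),
    tf.keras.layers.Dense(in_out_size),
])

x = tf.random.uniform((8, in_out_size), 0.0, 6.28)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = model(x)
jac = tape.batch_jacobian(y, x)                           # (8, in_out_size, in_out_size)
diag = tf.reduce_sum(jac * tf.eye(in_out_size), axis=2)   # du_i/dx_i per sample

print(jac[0])    # one per-sample Jacobian: check whether off-diagonal entries vanish
print(diag[0])   # the "spatial derivatives" that the diagonal mask extracts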

Results for the two cases (plots attached):

Thanks in advance!
