I can't speak from experience about using a channels_first architecture, although all of the Keras models support it.
That said, channels_last is the default ("native") data format of the API, and you are better off using it.
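tf.transpose() performs an n-dimensional transpose of the input data: it reorders the axes according to the perm argument.
For a single 2D image stored as (channels, height, width), this turns channels_first into channels_last: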
tf.transpose(data, perm=[1, 2, 0])
It shifts the two spatial axes from positions (1, 2) to (0, 1) and moves the channel axis from position 0 to position 2.
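To make the axis shuffle concrete, here is a minimal sketch that checks the shapes before and after. The tensor sizes (3 channels, 4 rows, 5 columns) and the variable names are made up for illustration.

```python
import tensorflow as tf

# Hypothetical single image in channels_first layout: (channels, height, width).
data = tf.reshape(tf.range(3 * 4 * 5), (3, 4, 5))

# perm[i] names the input axis that becomes output axis i.
channels_last = tf.transpose(data, perm=[1, 2, 0])

print(data.shape)           # (3, 4, 5)
print(channels_last.shape)  # (4, 5, 3)
```

The value at data[c, h, w] ends up at channels_last[h, w, c], so no pixel values change, only their positions.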
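If you use Keras, a Lambda layer wrapping this call will do what you need. Keep in mind that inside a model the tensor also carries a leading batch axis, so the permutation becomes perm=[0, 2, 3, 1]. This should probably be a built-in convenience layer in Keras.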
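A rough sketch of the Lambda approach, assuming batched input of shape (batch, channels, height, width). The batch size and image size are placeholders, and the layer name to_channels_last is just an illustrative choice. For comparison, it also shows the existing Permute layer, which takes 1-indexed dims that leave out the batch axis and should give the same result here.

```python
import tensorflow as tf

# Assumes batched input (batch, channels, height, width); axis 0 stays in place.
to_channels_last = tf.keras.layers.Lambda(
    lambda x: tf.transpose(x, perm=[0, 2, 3, 1])
)

batch = tf.zeros((8, 3, 32, 32))  # hypothetical batch of 8 three-channel 32x32 images
lambda_out = to_channels_last(batch)
print(lambda_out.shape)  # (8, 32, 32, 3)

# Permute uses 1-indexed dims that exclude the batch axis.
permute_out = tf.keras.layers.Permute((2, 3, 1))(batch)
print(permute_out.shape)  # (8, 32, 32, 3)
```

Either layer can go first in a model so that everything downstream sees channels_last data.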