@ylecun
@arthur_spirling Arguably, what one would ideally need to compute is not the exact gradient but the direction of steepest descent assuming a step size commensurate with the expected change in parameters. No need for this fancy infinitesimally small fluxion stuff!
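
A rough sketch of the idea, under assumptions that are not in the tweet: read "steepest descent for a given step size" as the unit direction d that minimizes f(x + step * d), and try it on a made-up ill-conditioned quadratic. The function `f`, the point `x`, the step sizes, and the brute-force angle search in `finite_step_direction` are all hypothetical choices for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical ill-conditioned quadratic bowl (assumption, illustration only).
    return 0.5 * (x[0] ** 2 + 25.0 * x[1] ** 2)

def grad_f(x):
    # Exact (infinitesimal) gradient of f.
    return np.array([x[0], 25.0 * x[1]])

def finite_step_direction(f, x, step, n_angles=3600):
    """Unit direction d minimizing f(x + step * d), by brute force over angles (2D only)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    values = np.array([f(x + step * d) for d in dirs])
    return dirs[np.argmin(values)]

x = np.array([5.0, 1.0])
g = grad_f(x)
neg_grad_dir = -g / np.linalg.norm(g)

tiny = finite_step_direction(f, x, step=1e-3)   # nearly infinitesimal step
large = finite_step_direction(f, x, step=1.0)   # step comparable to the parameter scale

# Cosine similarity between each best direction and the negative gradient.
cos_tiny = float(tiny @ neg_grad_dir)
cos_large = float(large @ neg_grad_dir)
print("cosine with -gradient, tiny step: ", cos_tiny)
print("cosine with -gradient, large step:", cos_large)
print("f after large step along -gradient:", f(x + 1.0 * neg_grad_dir))
print("f after large step along best dir: ", f(x + 1.0 * large))
```

With these assumed settings, the tiny step should recover the negative gradient almost exactly, while the larger step tilts toward the low-curvature axis and reaches a lower value of f than the same-length step along the negative gradient.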