Let's compare the two. When the input is very large or very small (that is, large in magnitude), the output of both functions flattens out and the gradient becomes very small, which is not conducive to weight updates. The difference between them is the output interval.
The output interval of tanh is (-1, 1), whereas that of sigmoid is (0, 1). The tanh function is therefore zero-centered, which generally makes it a better choice than sigmoid.
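To make the comparison concrete, here is a minimal sketch that evaluates both functions and their derivatives at a few sample points. The helper names (`sigmoid`, `sigmoid_grad`, `tanh_grad`) and the sample inputs are chosen for illustration only; the derivatives use the standard identities s(x)(1 - s(x)) for sigmoid and 1 - tanh(x)^2 for tanh.

```python
import math

def sigmoid(x):
    # Logistic sigmoid: output lies in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s(x) * (1 - s(x)), at most 0.25 (reached at x = 0)
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, at most 1 (reached at x = 0)
    return 1.0 - math.tanh(x) ** 2

# Illustrative inputs: two saturated points on each side and the origin
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  "
          f"sigmoid={sigmoid(x):.4f} (grad {sigmoid_grad(x):.6f})  "
          f"tanh={math.tanh(x):+.4f} (grad {tanh_grad(x):.6f})")
```

Running it should show both gradients shrinking toward zero as |x| grows, while only tanh produces outputs that are symmetric around zero.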
In general, for binary classification problems, the tanh function is used for the hidden layers and the sigmoid function is used for the output layer. However, these choices are not fixed: the activation function to use should be chosen according to the specific problem, or determined through experiments.
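As a rough illustration of that arrangement, the sketch below runs a forward pass through a tiny network with one tanh hidden layer and a sigmoid output unit. The function name `forward`, the layer sizes, and all weight values are made up for the example; a real model would learn the weights during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    # Hidden layer: tanh keeps each activation in (-1, 1), centered on zero
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    # Output layer: sigmoid maps the score to (0, 1), read as P(class = 1)
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, sigmoid(z)

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output
W1 = [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.9]]
b1 = [0.0, 0.1, -0.1]
w2 = [1.0, -1.5, 0.7]
b2 = 0.05

hidden, p = forward([1.0, 2.0], W1, b1, w2, b2)
print("hidden activations:", hidden)
print("predicted probability of the positive class:", p)
```

The sigmoid output suits binary classification because it can be thresholded (for example at 0.5) to pick a class, whereas the hidden layer has no such constraint and can benefit from zero-centered activations.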