Explain the activation function in a neural network with an actual case and demonstrate it in Python code
In a neural network, an activation function introduces non-linearity into the output of a neuron. It is applied to the weighted sum of the neuron's inputs plus its bias, and the result is the neuron's output.
One commonly used activation function is the Rectified Linear Unit (ReLU), which is defined as:
f(x) = max(0, x)
The ReLU function returns the input value if it is greater than zero, and returns zero otherwise. In other words, the neuron is "activated" only when its input is positive; for any non-positive input it outputs zero. This non-linear behavior allows the neural network to model more complex functions than a purely linear model.
Here's an example of how to implement ReLU activation in Python using NumPy:
import numpy as np

# Define the ReLU function
def relu(x):
    return np.maximum(0, x)

# Example input vector
x = np.array([-1, 2, 3, -4, 0])

# Apply ReLU activation
y = relu(x)

# Print the output
print(y)
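# Expected output: [0 2 3 0 0]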
In this example, we define the ReLU function using NumPy's maximum function, which returns the element-wise maximum of two arrays. We then apply the function to an example input vector x and print the output. The ReLU activation function sets any negative input values to zero while leaving positive values unchanged.
How many types of activation functions are there?
There are several types of activation functions commonly used in neural networks. Here are a few examples; a small NumPy sketch of each follows the list:
- Sigmoid: This function is defined as f(x) = 1 / (1 + exp(-x)). It produces a smooth "S"-shaped curve that maps any input value to a value between 0 and 1. This function is often used in the output layer of a neural network to produce a probability value.
- ReLU (Rectified Linear Unit): This function is defined as f(x) = max(0, x). It returns the input value if it is greater than zero, and returns zero otherwise. This function is commonly used in hidden layers of neural networks, as it introduces non-linearity and helps to prevent the vanishing gradient problem.
- Tanh (hyperbolic tangent): This function is defined as f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)). It produces a smooth "S"-shaped curve that maps any input value to a value between -1 and 1. This function is similar to the sigmoid function, but it produces negative output values as well.
- Softmax: This function is used in the output layer of a neural network that has multiple classes. It takes a vector of inputs and normalizes them so that the output values sum to 1, producing a probability distribution over the classes.
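As a rough sketch of how these definitions translate to code, here are minimal NumPy implementations. The max-subtraction in softmax is a common numerical-stability trick, not part of the mathematical definition:

import numpy as np

def sigmoid(x):
    # Maps any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps any real input into the range (-1, 1); NumPy provides this directly
    return np.tanh(x)

def softmax(x):
    # Subtracting the max does not change the result (softmax is shift-invariant)
    # but avoids overflow in exp for large inputs
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-1.0, 2.0, 3.0, -4.0, 0.0])
print(sigmoid(x))   # each value lies between 0 and 1
print(tanh(x))      # each value lies between -1 and 1
print(softmax(x))   # non-negative values that sum to 1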
There are also other types of activation functions, such as the Leaky ReLU, ELU, and Swish functions, among others. Each of these functions has different properties and may be more suitable for different types of neural networks or specific use cases.
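For instance, here is a minimal sketch of the Leaky ReLU, assuming the commonly used slope of 0.01 for negative inputs:

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Passes positive inputs through unchanged and scales negative inputs
    # by alpha, so the gradient is never exactly zero
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-1.0, 2.0, 3.0, -4.0, 0.0])))
# yields -0.01, 2.0, 3.0, -0.04, 0.0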