Difference between Sigmoid and Softmax
A dialogue between Alex, a Machine Learning Engineer, and Jordan, a Product Manager, discussing the use of activation functions in the context of predicting loan defaults.
Jordan: Alex, I'm trying to understand how we're using neural networks for our loan default prediction model. Specifically, what's this about using different activation functions?
Alex: Sure, Jordan. In our neural network, an activation function decides whether a neuron should be activated or not. It's like deciding if a piece of information is relevant for the prediction.
Jordan: Okay, and what’s the role of the Sigmoid function here?
Alex: The Sigmoid function is perfect when we’re making a binary decision. In the context of loan defaults, it helps us decide between two classes: will default or will not default. It outputs a value between 0 and 1, which we can interpret as a probability.
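For concreteness, here is a minimal sketch of the Sigmoid function Alex describes, sigmoid(x) = 1 / (1 + e^(-x)); the logit value below is a made-up illustration, not output from a real model:

import numpy as np

def sigmoid(x):
    # squashes any real-valued score into the range (0, 1)
    return 1 / (1 + np.exp(-x))

# hypothetical raw score (logit) for one loan application
print(sigmoid(2.0))  # ~0.88, readable as an 88% chance of default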
Jordan: Got it. And the Softmax function?
Alex: Softmax is used when we have more than two classes. Although for loan defaults we generally have a yes or no decision, if we had multiple levels of risk we wanted to classify, like 'low', 'medium', or 'high', Softmax would be suitable as it gives a probability distribution across those classes.
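A minimal sketch of the multi-class case Alex mentions, assuming three hypothetical risk levels and made-up logits:

import numpy as np

def softmax(v):
    e = np.exp(v)       # exponentiate each score
    return e / e.sum()  # normalize so the probabilities sum to 1

# hypothetical logits for 'low', 'medium', and 'high' risk
risk_logits = np.array([2.0, 1.0, 0.1])
probs = softmax(risk_logits)
print(dict(zip(['low', 'medium', 'high'], probs)))
# roughly {'low': 0.66, 'medium': 0.24, 'high': 0.10}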
Jordan: That makes sense. But what do you mean by using them in different layers?
Alex: In a neural network, we have an input layer, hidden layers, and an output layer. The Sigmoid can be used in the final layer for binary outcomes like our case, while Softmax is typically used in the final layer for multi-class problems. However, we can also use them in hidden layers to help model complex relationships.
Jordan: So in hidden layers, they help in understanding the complex patterns regarding who might default on a loan?
Alex: Exactly! They determine what information is passed forward through the network, contributing to our final prediction.
Jordan: Makes sense now. The activation function is crucial in shaping the output at each layer, whether it's recognizing simple patterns or making the final prediction in our loan default scenario.
Alex: Precisely. Each function plays a significant role in our model's ability to learn from the data and make accurate predictions.
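To make the layering concrete, here is a minimal NumPy sketch of the architecture Alex describes; the applicant features and weights are random placeholders, not our production model:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# hypothetical applicant features (e.g. scaled income, debt ratio, credit score)
x = np.array([0.4, 0.7, 0.1])

# placeholder weights for one hidden layer and a binary output layer
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

hidden = sigmoid(x @ W1 + b1)          # sigmoid inside a hidden layer
p_default = sigmoid(hidden @ W2 + b2)  # sigmoid in the final layer: P(default)
print(p_default)  # a single probability of default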
A dialogue between Taylor, a Data Scientist, and Casey, a Data Analyst, discussing the code below.
Code from Softmax/SoftmaxActivation.py at main · AIMLModeling/Softmax (github.com)
import numpy as np
import matplotlib.pyplot as plt

# calculate the softmax of a vector
def softmax(vector):
    e = np.exp(vector)
    return e / e.sum()

# calculate the sigmoid of a single value
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# define example data
data = [-1.5, 2.2, -0.8, 3.6]
print(f"Input vector: {data}")

# convert the list of numbers to a list of probabilities
result_softmax = softmax(data)
print(f"softmax result: {result_softmax}")

# verify that the softmax probabilities sum to 1
sum_softmax = 0.0
for i in range(len(result_softmax)):
    sum_softmax += result_softmax[i]
print(f"Sum of all the elements of softmax results: {sum_softmax}")
print("")

# apply sigmoid to each value individually
sig_result = [0] * len(data)
sum_sigmoid = 0
for i in range(len(data)):
    sig_result[i] = sigmoid(data[i])
    print(f"Sigmoid result {i}: {sig_result[i]}")
    sum_sigmoid += sig_result[i]
print(f"Sum of all the elements of Sigmoid results: {sum_sigmoid}")

# plot softmax over a range of inputs
x = np.linspace(-10, 10, 100)
y = softmax(x)
plt.scatter(x, y)
plt.title('Softmax Function')
plt.show()
Casey: Hey Taylor, I came across this code snippet that uses softmax and sigmoid functions, and I'm having trouble understanding it. Can you walk me through it?
Taylor: Of course, Casey! Let's start with the basics. Both softmax and sigmoid are activation functions in neural networks, which you already know. This code defines these functions and then applies them to a data vector.
Casey: Okay, I see two functions defined here, softmax and sigmoid. What's the difference between them?
Taylor: The softmax function is used to convert a vector of numbers into a vector of probabilities, where the probability of each value is proportional to the exponential of the input number. The sigmoid function, on the other hand, gives us a probability between 0 and 1 for an individual value.
Casey: I see, so softmax is about the whole vector, and sigmoid is for individual values. Why do we need to convert numbers into probabilities?
Taylor: In the context of machine learning, probabilities help us make decisions. For instance, if we're trying to classify data into categories, probabilities give us a measure of confidence about our classifications.
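A small, verifiable aside on how the two functions relate: for exactly two classes, softmax reduces to sigmoid applied to the difference of the two scores. A sketch, with arbitrary example scores a and b:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(v):
    e = np.exp(v)
    return e / e.sum()

a, b = 1.3, -0.4  # arbitrary example scores
print(softmax(np.array([a, b]))[0])  # probability assigned to the first score
print(sigmoid(a - b))                # identical: softmax([a, b])[0] == sigmoid(a - b)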
Casey: Got it. Now, the code has a data vector. What does it represent?
Taylor: It's just an example data vector to demonstrate the functions. Think of it as raw scores or logits that you might get from the output layer of a neural network before activation.
Casey: Makes sense. And then we apply softmax to this vector, right?
Taylor: Yes, we pass the data vector through the softmax function, which normalizes these values into probabilities that sum to 1, making it a proper probability distribution.
Casey: The code prints the result and the sum of the softmax results. Why is the sum important?
Taylor: It's to show that softmax has done its job correctly. The sum of the probabilities should be 1, which confirms that we have a valid probability distribution.
Casey: Okay, and the sigmoid function is applied in a loop. Why is that?
Taylor: The sigmoid function is meant for individual numbers. The loop applies sigmoid to each number in the data vector separately, giving us a list of probabilities.
Casey: So we end up with two different lists of probabilities, one from softmax and another from sigmoid?
Taylor: Exactly. softmax gives a distribution across our vector, useful for multi-class classification. sigmoid gives individual probabilities, useful for binary classification.
Casey: What about the scatter plot at the end?
Taylor: That's a visual representation of the softmax function. It plots the softmax probabilities computed over 100 evenly spaced values from -10 to 10. Because softmax normalizes across the entire vector, each point's height is its share of the total, so the curve rises like a scaled exponential. It's useful to see how the function behaves across different inputs.
Casey: Now it's clearer. We're using these functions to understand the probabilities of different outcomes, and the plot shows how softmax assigns probabilities.
Taylor: You've got it, Casey! And remember, understanding the output of these functions is key in predicting outcomes like whether a loan will default or not, based on the learned patterns.
Casey: Thanks, Taylor. This was really helpful!
A similar conversation between Taylor and Casey, discussing how to apply the softmax and sigmoid functions in the context of loan default predictions.
Refined Code
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid function for binary classification
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define logits for loan default probabilities.
# A real-world model would output these logits based on application data.
loan_logits = np.array([0.8, -1.2, 3.0])  # Example logits from our model

# Calculate the probability of default using sigmoid
default_probabilities = sigmoid(loan_logits)

# Print probabilities of default
print(f"Loan default probabilities: {default_probabilities}")

# Plot the sigmoid function
x = np.linspace(-10, 10, 100)
y = sigmoid(x)
plt.plot(x, y)
plt.title('Sigmoid Function')
plt.xlabel('Logits')
plt.ylabel('Default Probability')
plt.show()
Taylor: Here, we have a list of logits, loan_logits, which our neural network has determined based on loan application data. The sigmoid function is then used to calculate the probability of default for each application.
Casey: I understand now. We're not using softmax here because we don't have multiple categories, right?
Taylor: Exactly. We're only predicting if someone will default or not, which is a binary outcome. If we were assigning applications to different risk categories, that's when softmax would come into play.
Casey: And the plot shows us how the sigmoid function translates logits into probabilities?
Taylor: Correct. It's a visual way to understand how changes in logits affect the probability of default.
Casey: Thanks, that makes it clear how we apply these functions to loan defaults.
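As a hedged follow-on to the refined code: turning probabilities into decisions usually means applying a cutoff. The 0.5 threshold below is an illustrative assumption, not a lending policy:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

loan_logits = np.array([0.8, -1.2, 3.0])  # same example logits as above
default_probabilities = sigmoid(loan_logits)

THRESHOLD = 0.5  # assumed cutoff, chosen only for illustration
flagged = default_probabilities > THRESHOLD
print(flagged)  # [ True False  True] -> flag the first and third applications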
Reference:
- Author: Jason Siu
- URL: https://jason-siu.com/article/softmax-sigmoid
- Copyright: All articles in this blog, except for special statements, adopt the BY-NC-SA agreement. Please indicate the source!