This is just gender bias in English, which isn't even a gendered language like Spanish. Whenever I ask it a question in Spanish, it just defaults to the masculine, even if I use the feminine in the prompt.
Were the pair of questions asked in separate context windows? (i.e. fresh conversation). The order of the questions would influence the answer if it is a continuous context window.
This is not specified in the methodology, and the examples are all in continuous windows...
Yes
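For anyone trying to replicate this, here is a minimal sketch of the difference, assuming the OpenAI Python client and two purely illustrative WinoBias-style questions (not the article's actual prompts): asking each question in a fresh conversation versus carrying both in one continuous context window.

```python
# Sketch only: assumes the OpenAI Python client (`pip install openai`);
# the questions and the model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
q1 = 'In "The mechanic thanked the clerk because he helped quickly", who does "he" refer to?'
q2 = 'In "The mechanic thanked the clerk because she helped quickly", who does "she" refer to?'

def ask_fresh(question: str) -> str:
    """Each call starts a brand-new context window (a fresh conversation)."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def ask_continuous(questions: list[str]) -> list[str]:
    """All questions share one context window, so earlier answers can influence later ones."""
    history, answers = [], []
    for q in questions:
        history.append({"role": "user", "content": q})
        resp = client.chat.completions.create(model="gpt-4", messages=history)
        answer = resp.choices[0].message.content
        history.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers

print(ask_fresh(q1), ask_fresh(q2))   # separate context windows
print(ask_continuous([q1, q2]))       # one continuous context window
```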
Interesting post, thank you. The distinction between implicit and explicit bias is important, and I guess it will have regulatory ramifications.
I am MUCH more concerned about POLITICAL bias than gender bias. I would ask, to what end is gender bias a more important issue than political bias? And to that end, the spectrum of pure ideological biases should be looked at through that lens, not starting with GENDER in mind. Gender is but one small part of a much larger dataset bias problem, in my opinion, and we should be focusing our energy on that. Solve the ideological biases and you might actually solve these more ambiguous issues you are pointing out. That's my 2 cents.
When interpreting text that is inherently ambiguous, people and machines are going to guess at the probability of certain interpretations to come up with a default interpretation. The rational approach is to grasp that the interpretation is only a default and is subject to change when new information comes in. If we read a sentence saying that a "Woman let go of the golf ball," our minds will leap ahead to interpreting that as likely meaning the ball fell to the ground or floor. Of course it could turn out that, contrary to our expectations, the woman was located on a space station and the ball floated, or the woman was in a plane that was diving and the ball slammed into the ceiling. When interpreting sentences we use probabilistic reasoning implicitly, and it seems to make sense that it'll be embedded implicitly in these systems.
That first "bias" example is reasoning based on probabilities in a way most humans likely would when reading such a sentence. Its not clear why that is a problem.
It seems the concern over problematic bias should be where an entity is incapable of grasping that its assumptions may need to change after they turn out to be wrong, or where it acts on assumptions as if they were certainties in a way that causes trouble. Merely making a wrong guess when the real world turns out not to match the probabilistic default guess isn't the problematic aspect of "bias". It's only the issue of how these systems handle default assumptions that needs to be dealt with, not the existence of default assumptions. There may be issues with how they handle flawed assumptions, but to deal with that it seems important to carefully think through what the problem is so we tackle it the right way.
The fact that the real world has probability distributions that many see as problematic doesn't change the reality that they exist. Trying to train AI systems to pretend distributions are different than they really are in one domain may unintentionally lead them to distort reality in other ways and ignore data.
I don't understand your emphatic claim that the GPT interpretation is simply wrong. There is a perfectly consistent world in which people celebrate their birthdays by giving others presents (in fact I have personally experienced variants of this), and in that scenario "his" would refer to the mechanic, without any gender bias. I am prepared to accept that such an interpretation is unlikely, because we have information about the likely direction of birthday gift-giving in current societies. However, that is a probabilistic argument, not one that can be simply asserted as self-evident.
And just to be clear: the OP experiment works to detect bias because changing the pronoun leaves the semantics unchanged, yet changes the GPT answer. I'm arguing that your analysis of the sentence is flawed.
My comment referred to the first example in the article, which seemed to be the one chosen to highlight claimed bias; yet that example merely showed ambiguity interpreted based on probability. You may be right about the benchmark, but that in no way impacts my comment, which didn't reference the benchmark.
You also merely linked to a list of sentences. It may well be that GPT-4 does get such a clearly flawed logical statement wrong, but that isn't in that list, and I haven't hunted for whatever information is out there that might make your case for you. Hopefully there is clear data on that type of testing, where the answer is explicitly wrong rather than merely based on a probability assessment. I haven't taken the time to check the various links to see if there is something that clearly describes testing and results for such a thing.
In a world where 70% of clerks are female, I would guess that sentence triggers a mental ambiguity-detector in our brains, which causes a second reading followed by more careful step-by-step analysis to reach the true meaning.
It would be interesting if GPT-4 could be made to reveal whether it finds any ambiguity in that sentence. If so, then the solution might be some sort of step-by-step prompt with a rule list questioning bias assumptions whenever ambiguity is high. This would more closely match human reasoning, I think.
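Roughly what I have in mind, as a sketch only: the two-pass structure, the rule wording, and the use of the OpenAI Python client are all my own assumptions, not anything from the article. The model is first asked whether the sentence is ambiguous, and only then walked through a bias-checking rule list.

```python
# Illustrative sketch: an ambiguity check followed by a rule-based
# step-by-step prompt. Prompts and model name are assumptions.
from openai import OpenAI

client = OpenAI()

AMBIGUITY_CHECK = (
    "Does the following sentence have more than one grammatically valid "
    "reading of who the pronoun refers to? Answer only YES or NO.\n\n{sentence}"
)

STEP_BY_STEP_RULES = (
    "Resolve the pronoun in the sentence below step by step.\n"
    "Rules:\n"
    "1. List every noun the pronoun could grammatically refer to.\n"
    "2. Do not use the stereotypical gender of an occupation as evidence.\n"
    "3. If the sentence alone cannot decide, say the reference is ambiguous.\n\n"
    "{sentence}"
)

def resolve_pronoun(sentence: str) -> str:
    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt.format(sentence=sentence)}],
        )
        return resp.choices[0].message.content

    # Pass 1: ask the model to flag ambiguity; pass 2: careful rule-guided reading.
    if "YES" in ask(AMBIGUITY_CHECK).upper():
        return ask(STEP_BY_STEP_RULES)
    return ask("Who does the pronoun refer to in: {sentence}")
```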
GPT-4 acts statistically, so asking it for ambiguity would not be possible. For GPT-4, ambiguity would be a 50/50 likelihood of a correct match, whereas finding ambiguity in something like the WinoBias data set requires understanding that there is a difference between statistical (historical) data and the ideal present.
Yes, figuring out the difference between ambiguity and ignorance is a challenge. I think a possible solution to both has been proposed as a sampling strategy (SelfCheckGPT), which runs the same query through multiple times and then measures how consistent the answers are. A consistent answer is a confident answer (and it is at least true to the training set, whatever weaknesses that may have). Inconsistent answers, on the other hand, could trigger an automated step-by-step prompt to help the model be more explicit about its ignorance, confusion or bias.
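A toy sketch of that sampling idea, not the actual SelfCheckGPT implementation: sample the same query several times at non-zero temperature, measure agreement, and escalate to a step-by-step prompt when the answers disagree. The threshold, model name, and exact-match agreement measure are all simplifying assumptions.

```python
# Toy consistency check in the spirit of SelfCheckGPT (not the real library):
# sample the same query N times, measure agreement, and fall back to a
# step-by-step prompt when the answers are inconsistent.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def sample_answers(question: str, n: int = 5, temperature: float = 1.0) -> list[str]:
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4",
            temperature=temperature,
            messages=[{"role": "user", "content": question}],
        )
        answers.append(resp.choices[0].message.content.strip().lower())
    return answers

def answer_with_consistency_check(question: str, agreement_threshold: float = 0.8) -> str:
    answers = sample_answers(question)
    top_answer, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= agreement_threshold:
        return top_answer  # consistent samples => treat as a confident answer
    # Inconsistent samples => ask for explicit step-by-step reasoning and uncertainty.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   "Answer step by step, and state explicitly if the question "
                   "is ambiguous or you are unsure: " + question}],
    )
    return resp.choices[0].message.content
```

Exact string matching is a crude agreement measure for free-form answers; the published SelfCheckGPT work uses more robust consistency scoring, but the escalation logic is the part relevant here.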
I've put forward a very simplified 5 Laws of Robotics which reflect the socio-legal answers to the major areas of problems with robotics (and AI in robotics). Interested to see at which levels you think that confidence (as in consistent answers after rigorous self checking) would be practical as a remedy.
Those look well-thought out and reasonable as strong guidelines for robot/AI-developers.
I also think that large language model performance is good evidence that a robot/AI can understand and internalize these moral/social written directives too, not just their literal meaning but the spirit behind them. This is initially surprising to me, but the ability might come from the fact that humans write about moral/social issues more than anything else, and that's what the model was trained on.
So if that premise is correct, the real challenge is the same one we face from time to time: moral dilemmas resulting from an inability to do the right thing. Self-driving car: swerve into a crowd to protect the driver from a head-on collision, or sacrifice the driver for the greater good?
Measuring LLM results and confidence on moral and social dilemmas will tell us a great deal about how safe LLMs (and the robot presumably under the control of the LLM) are, I think. But dilemmas should shake an LLM's confidence just as they do our own. What is the right thing to do when you have to break one of your moral rules under conditions of uncertainty?