AI assistants help developers produce insecure code

Computer scientists from Stanford University have found that programmers who accept help from AI tools like GitHub Copilot produce less secure code than those who go it alone.

In a paper titled “Do Users Write More Insecure Code with AI Assistants?”, Stanford boffins Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh answered that question in the affirmative.

Worse still, they found that AI assistance tends to mislead developers about the quality of their output.

“We found that participants with access to an AI assistant often produced more vulnerabilities than those without access, with results particularly significant for string encryption and SQL injection,” the authors state in their paper. “Surprisingly, we also found that participants provided access to an AI assistant were more likely to believe they had written secure code than those without access to an AI assistant.”

NYU researchers had previously shown that AI-based programming suggestions are often insecure in experiments under varied conditions. The Stanford authors point to an August 2021 research paper, “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions,” which found that across 89 scenarios, about 40 percent of the computer programs produced with Copilot’s help had potentially exploitable vulnerabilities.

The Stanford authors say that study is limited in scope because it only considers a constrained set of prompts corresponding to 25 vulnerabilities, and just three programming languages: Python, C, and Verilog.

The Stanford scientists also cite a follow-up study from some of the same NYU researchers, “Security Implications of Large Language Model Code Assistants: A User Study,” as the only comparable user study they are aware of. They note, however, that their work differs because it focuses on OpenAI’s codex-davinci-002 model rather than OpenAI’s less powerful codex-cushman-001 model, both of which play a role in GitHub Copilot, itself a descendant of the GPT-3 language model.

Also, the Stanford study looks at multiple programming languages (Python, JavaScript, and C) while the “Security Implications…” paper focuses only on the C programming language. The Stanford researchers suggest the inconclusive findings reported in the “Security Implications” paper may stem from that narrow focus on C, which they said was the only language in their broader study with mixed results.

The Stanford user study involved 47 people with varying levels of experience, including undergraduates, graduate students, and industry professionals. Participants were asked to write code in response to five prompts using a React-based Electron application monitored by a study administrator. The first prompt was: “Write two functions in Python where one encrypts and the other decrypts a given string using a given symmetric key.”

For this particular question, those relying on AI assistance were more likely to write incorrect and insecure code than the control group working without automated help. Only 67 percent of the assisted group gave a correct answer, compared with 79 percent of the control group.

Those in the assisted group were “significantly more likely to provide an insecure solution (p < 0.05, using Welch’s unequal variances t-test), significantly more likely to use trivial ciphers, such as substitution ciphers (p < 0.01), and not conduct an authenticity check on the final returned value.”
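To illustrate, a trivial substitution cipher of the sort the paper flags might look like the following C sketch (the function name and fixed-shift scheme are our own illustration, not code from the study):

```c
#include <stddef.h>

/* A toy fixed-shift substitution cipher -- the kind of "trivial cipher"
   the study counts as insecure. Because every letter always maps to the
   same output letter, letter frequencies survive "encryption" and the
   ciphertext falls to classical frequency analysis. NOT real encryption. */
void shift_cipher(char *s, int key)
{
    int k = ((key % 26) + 26) % 26;          /* normalize negative keys */
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = (char)('a' + (s[i] - 'a' + k) % 26);
        else if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] = (char)('A' + (s[i] - 'A' + k) % 26);
    }
}
```

Calling the function with the negated key reverses it, which is presumably why the scheme looks like a valid answer to the prompt; a secure solution would instead use an authenticated encryption (AEAD) primitive from a vetted cryptographic library, which also provides the authenticity check the researchers looked for.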

The results were similar for questions two through four. The fifth prompt, “Write a function in C that takes a signed integer and returns a string representation of that integer,” produced mixed results.

“While the results are inconclusive as to whether the AI assistant helped or harmed the participants, we note that participants in the [AI assistant] group were significantly more likely to introduce integer overflow errors into their solutions (p < 0.02),” the Stanford researchers said.
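The classic overflow trap in this task is negating the most negative integer: on a two's-complement machine -INT_MIN is not representable, so a hand-rolled `num = -num` step is undefined behavior. A minimal sketch that sidesteps it (our own illustration; the study does not prescribe an implementation) simply delegates the conversion to snprintf:

```c
#include <stdio.h>
#include <limits.h>

/* Convert a signed int to its decimal string form.
   buf must hold at least 12 bytes for a 32-bit int
   ("-2147483648" plus the terminating NUL).
   snprintf handles INT_MIN correctly; hand-rolled converters that
   first negate the input overflow on INT_MIN, since -INT_MIN does
   not fit in a signed int. */
const char *int_to_string(int num, char *buf, size_t len)
{
    snprintf(buf, len, "%d", num);
    return buf;
}
```

Passing the caller's buffer and its size keeps the function itself free of a second common bug in such exercises, writing past a fixed-size local buffer.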

The authors concluded that AI assistants should be viewed with caution because they may mislead inexperienced developers and create security vulnerabilities.

At the same time, they hope their findings will lead to improvements in the way AI assistants are designed, because they have the potential to make programmers more productive, lower barriers to entry, and make software development more accessible to those put off by the hostility of internet forums.

As one study participant is said to have remarked about AI help, “I hope this gets published. It’s like StackOverflow but better because it never tells you your question was stupid.” ®
