Can GPT-4 improve its own code? MIT and Microsoft's experiments put self-repair to the test
A 机器之心 (Machine Heart) report
Editor: 赵阳 (Zhao Yang)
As the most advanced large language model available, GPT-4 can correct the code it generates, and incorporating human feedback further improves its self-repair ability.
Large language models (LLMs) have been shown to generate code snippets from natural language, but they still struggle with complex coding tasks such as programming competitions and professional software-engineering interviews. Recent work tries to improve coding performance through self-repair, in which the model reflects on and corrects errors in its own code.
The figure below shows the typical self-repair workflow. Given a specification, programs are first sampled from a code-generation model; each program is then executed against the provided unit tests; if a program fails any test, the error message and the faulty program are handed to a feedback-generation model, which outputs a short explanation of why the code failed; finally, that feedback is passed to a repair model, which produces the final, corrected version of the program.
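To make the loop concrete, here is a minimal sketch of that workflow in Python. The callables it receives (sample_program, run_unit_tests, explain_failure, repair_program) are hypothetical placeholders for the code model, test harness, feedback model, and repair model; the names are illustrative and not taken from the paper.

```python
# Minimal sketch of the self-repair loop described above. All callables are
# placeholder stand-ins supplied by the caller (names invented for illustration).
def self_repair(spec, unit_tests,
                sample_program, run_unit_tests,
                explain_failure, repair_program):
    program = sample_program(spec)                        # 1. sample an initial program
    passed, error = run_unit_tests(program, unit_tests)   # 2. execute the unit tests
    if passed:
        return program
    feedback = explain_failure(spec, program, error)      # 3. short explanation of the failure
    return repair_program(spec, program, feedback)        # 4. final, repaired program
```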
This design is appealing because it lets the system recover from unlucky samples produced during decoding, and because it can easily incorporate feedback from symbolic systems such as compilers, static analysis tools, and execution engines, mirroring the trial-and-error process a human software engineer follows when writing code.
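As one illustration of how such execution-engine feedback can be collected, the sketch below runs a candidate program together with a test script in a subprocess and captures the interpreter's error output. The file handling and timeout are arbitrary choices for the example, not details from the paper.

```python
# Illustrative only: gather execution feedback by running a candidate program
# plus a test script in a subprocess and capturing stderr.
import subprocess
import tempfile

def run_candidate(program_source: str, test_source: str, timeout: float = 5.0):
    """Return (passed, error_text) for one candidate program."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program_source + "\n\n" + test_source)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True,
                              text=True, timeout=timeout)
        return proc.returncode == 0, proc.stderr
    except subprocess.TimeoutExpired:
        return False, "TimeoutExpired: candidate exceeded the time limit"
```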
The experiments plot bootstrapped estimates (a statistical technique for quantifying the uncertainty of an estimate) of the two quantities of interest, the pass rate and the tree size. To obtain them, the authors first generate a very large repair tree for each task specification, containing many initial program samples, several feedback strings for each failing program, and several candidate repairs for each feedback string. Smaller trees are then subsampled from this frozen dataset to compute pass rates and tree sizes.
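A rough sketch of that subsampling procedure is shown below. It assumes the frozen data for one task is stored as a list of initial samples, each with a passed flag and a list of candidate repairs; the field names and layout are invented for illustration, and the frozen tree is assumed to hold at least n_p initial programs and n_fr repairs per program.

```python
# Rough sketch of the bootstrapped pass-rate estimate for one task, under an
# assumed data layout: tree[i]["passed"] marks whether the i-th initial program
# already passes its unit tests, and tree[i]["repairs"] is its list of candidate
# feedback-repair outcomes (field names invented for illustration).
import random

def bootstrap_pass_rate(tree, n_p, n_fr, n_boot=1000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        roots = rng.sample(tree, n_p)                    # subsample n_p initial programs
        solved = False
        for root in roots:
            if root["passed"]:
                solved = True
                break
            repairs = rng.sample(root["repairs"], n_fr)  # subsample n_fr feedback-repair pairs
            if any(r["passed"] for r in repairs):
                solved = True
                break
        hits += solved
    return hits / n_boot
```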
The experimental results show that, on challenging programming problems, the proposed self-repair technique is more effective than generating code without it. In addition, a stronger feedback model improves repair performance, and even for the strongest model, having humans provide the feedback leads to better repairs still. These questions were evaluated on a dataset of Python programming challenges.
Self-repair requires strong models and diverse initial samples. For GPT-3.5, pass@t is always at or below the baseline (the black line), indicating that self-repair is not effective for GPT-3.5. For GPT-4, however, there are several settings of n_p and n_fr (the number of initial programs and of feedback-repair pairs, respectively) at which self-repair's pass@t clearly outperforms the baseline.
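For background, the baseline corresponds to drawing i.i.d. samples from the model with no repair step. The standard unbiased pass@k estimator for n i.i.d. samples of which c are correct (from the Codex evaluation methodology, Chen et al., 2021) is sketched below purely as a point of reference; how the paper converts the sampling budget into t for the repair trees may differ in detail.

```python
# Standard unbiased pass@k estimator for n i.i.d. samples with c correct ones:
# the probability that at least one of k drawn samples passes. Shown only as
# background for the no-repair baseline curve.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```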
GPT-4's feedback improves GPT-3.5's self-repair. The experiments show that using a separate, stronger model to generate the feedback (M_P = GPT-3.5 for programs, M_F = GPT-4 for feedback) lifts absolute performance slightly above independent sampling from the same distribution.
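In terms of the earlier sketch, this setup simply plugs different models into the program-sampling and feedback steps. The snippet below is a hypothetical illustration of that decoupling; call_model stands in for whatever chat-completion API is used, and the prompts are schematic rather than the paper's.

```python
# Hypothetical illustration of decoupling the program model M_P from the
# feedback model M_F. `call_model(name, prompt)` is a placeholder for an
# actual API call; prompt text and model names are schematic.
def make_components(call_model, m_p="gpt-3.5", m_f="gpt-4"):
    def sample_program(spec):
        return call_model(m_p, f"Write a Python program for this task:\n{spec}")

    def explain_failure(spec, program, error):
        return call_model(m_f, f"Task:\n{spec}\nProgram:\n{program}\n"
                               f"Error:\n{error}\nBriefly explain the bug.")

    def repair_program(spec, program, feedback):
        return call_model(m_p, f"Task:\n{spec}\nBuggy program:\n{program}\n"
                               f"Feedback:\n{feedback}\nReturn a corrected program.")

    return sample_program, explain_failure, repair_program
```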
Human feedback markedly improves GPT-4's self-repair: when human programmers provide the feedback instead of GPT-4 debugging its own code, the number of successfully repaired programs increases by a factor of 1.57.
However, the feedback written by the human participants was mostly natural language, with occasional mathematical or code expressions, whereas much of GPT-4's feedback was inaccurate and merely suggested small, obvious changes.
In conclusion, the study shows that while these models are useful for generating code snippets from natural language, they still lag behind humans in understanding complex coding challenges and in providing accurate, actionable feedback during software development.
More broadly, despite the progress of deep learning models such as the GPT series across many tasks, these systems still lack common sense, and closing that gap between human and machine intelligence remains an open research problem, spanning both theoretical work and practical applications such as clearer communication between humans and machines.
The underlying point is worth keeping in mind: machine learning systems can process vast amounts of data far faster, and in narrow settings more accurately, than any human, but they do not replace human intuition or creativity, which remain integral to how we work.
That is an argument for embracing technologies such as deep neural networks, which may eventually enable programs that move beyond pattern matching toward genuine, context-aware understanding without explicit instructions for every case.
If that happens, such systems could take over routine work and free people to pursue more creative endeavors, a prospect some frame in terms of a technological singularity in which machines surpass human intelligence.