Everything being asked here is incredibly straightforward, just polished to look impressive. It is impressive by the standards of other tools in this space, but that's because those tools are pathetic.
They pointed it at a set of ~574 problems in a benchmark (out of 2,294) and it solved ~80 of them.
What's interesting looking at "Devin" is what's _not_ shown.
Take a close look at the problems they are illustrating in their demos, the length of the videos, and what they show.
https://hachyderm.io/@jenniferplusplus/112086778379338138
-
These are all designed to _look_ impressive, and they mostly impress people who don't examine them too closely.
Now, it isn't unusual to highlight the best or the most aspirational parts in demos like this, regardless of what your product can actually do.
So it may be that they can do something more impressive and are just building these demos for investors and media outlets or whatever.
But if you are a SWE, it is worth looking more closely and reading between the lines.
-
In their demo for adding a feature to an open source repository, this was the issue that they picked:
Convey exit status for stopped processes · Issue #116 · pvolok/mprocs
I have some processes which I expect to exit eventually. When this happens, they show as DOWN in red letters. At a glance, this looks like an error has occurred. If the process exited with a success code, I think it would be nice to show...
Here is the pull request it generated:
Convey exit status for stopped processes by devinbot · Pull Request #118 · pvolok/mprocs
Addresses #116.
That pull request is far more interesting than the video, and it does not make me think anyone's job is in any danger.
-
@hrefna @jenniferplusplus did this project consent to being used as a testing ground for garbage PRs containing code that doesn't do anything, or is it another casualty like Ghost?
-
A _very_ good question. I don't mind them screwing around with this in their own repos (including forked ones), but pushing it upstream, where a human reviewer has to deal with it, is another problem entirely.
Though tbh I'm more curious about that with their other submission, since it even changes the training data:
Implement ROPE positional encodings by devinbot · Pull Request #450 · karpathy/nanoGPT
This PR includes the implementation of ROPE positional encodings and adjustments to the training script for the Shakespeare dataset.