Hrefna (DHC)

Everything being asked here is incredibly straightforward, polished to look impressive. It is impressive by the standards of other tools in this space, but that's because those tools are pathetic.

They pointed it at a set of ~574 problems in a benchmark (out of 2,294) and it solved ~80 of them.

What's interesting looking at "Devin" is what's _not_ shown.

Take a close look at the problems they are illustrating in their demos, the length of the videos, and what they show.
https://hachyderm.io/@jenniferplusplus/112086778379338138

Hrefna (DHC)

These are all designed to _look_ impressive, and look impressive mostly to people who don't look too closely.

Now, it isn't unusual to highlight the best or the most aspirational parts in demos like this, regardless of what your product can actually do.

So it may be that they can do something more impressive and are just building these demos for investors and media outlets or whatever.

But if you are a SWE, then it is worthwhile to look more closely and read between the lines.

Hrefna (DHC)

In their demo for adding a feature to an open source repository, this was the issue that they picked:

https://github.com/pvolok/mprocs/issues/116

Here is the pull request it generated:

https://github.com/pvolok/mprocs/pull/118/commits/fc379cc6571937377c8e9591e01ecd8ef0589c8a

That pull request is far more interesting than the video, and it does not make me think anyone's job is in any danger.

cc @jenniferplusplus

aburka 🫣

@hrefna @jenniferplusplus did this project consent to being used as a testing ground for garbage PRs containing code that doesn't do anything, or is it another casualty like Ghost?

Hrefna (DHC)

@aburka

A _very_ good question. I don't mind them screwing around with this in their own repos (including forked ones), but pushing it upstream, where it requires a human reviewer, is another problem entirely.

Though tbh I'm more curious about that with their other submission, since it even changes the training data:

https://github.com/karpathy/nanoGPT/pull/450/files

@jenniferplusplus