Why not minimize the sum of the residuals, instead of the sum of the squared residuals?

Whichever you use, you will obtain the same estimate, so the use of the squared residuals is just convention.

The minimum point of the sum of the residuals will always be the sample average of $y$, so we gain nothing from using this method.

By squaring the residuals, large deviations make a bigger impact on the sum, forcing the fit of the OLS line to be better.

There is no minimum point for the sum of the residuals, so it is not possible to obtain any estimates via this method.
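
One way to check the options above is to evaluate both criteria numerically. The sketch below is only an illustration: the toy data, the fixed slope of 2.0, and the function names `sum_residuals` and `sum_squared_residuals` are all assumptions made for the example, not part of the question. It raises the intercept step by step and records what happens to each sum.

```python
# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_residuals(b0, b1):
    """Sum of the raw residuals y_i - (b0 + b1 * x_i)."""
    return sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))


def sum_squared_residuals(b0, b1):
    """Sum of the squared residuals (the OLS criterion)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hold the slope fixed (an arbitrary choice) and push the intercept upward.
slope = 2.0
intercepts = [0.0, 10.0, 100.0, 1000.0]
raw_sums = [sum_residuals(b0, slope) for b0 in intercepts]
squared_sums = [sum_squared_residuals(b0, slope) for b0 in intercepts]

for b0, raw, sq in zip(intercepts, raw_sums, squared_sums):
    print(f"b0={b0:>7}: sum of residuals={raw:.1f}, sum of squares={sq:.1f}")
```

Each unit increase in the intercept lowers the raw sum by exactly $n$ (here 5), so the raw sum keeps falling however large the intercept becomes. The squared sum, by contrast, grows as the line moves away from the data, which is what lets it have a minimum.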