People often say that the ordinary least squares (OLS) method finds the line that is the "best fit" for the data. What do they mean by that?

The estimates of the coefficients are chosen so that the sum of the squared residuals equals zero.

The estimates of the coefficients are chosen so that the resulting line intersects each of the data points.

The estimates of the coefficients are chosen to minimize the sum of the squared residuals.

The estimates of the coefficients are chosen so that the sum of the error terms equals zero.
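
To check the options above numerically, here is a minimal sketch in Python with NumPy. The data points, the helper `ssr`, and the perturbation sizes are all made up for illustration and are not part of the question. The sketch computes the closed-form OLS estimates for a simple regression with an intercept, then compares the sum of squared residuals (SSR) at those estimates with the SSR of nearby lines.

```python
import numpy as np

# Hypothetical toy data, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def ssr(intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))


# Closed-form OLS estimates for simple linear regression with an intercept.
x_mean, y_mean = x.mean(), y.mean()
slope_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept_hat = y_mean - slope_hat * x_mean

ssr_ols = ssr(intercept_hat, slope_hat)

# SSR for lines obtained by nudging the coefficients away from the OLS
# estimates (the unperturbed pair itself is excluded from the list).
ssr_perturbed = [
    ssr(intercept_hat + d_a, slope_hat + d_b)
    for d_a in (-0.5, 0.0, 0.5)
    for d_b in (-0.2, 0.0, 0.2)
    if (d_a, d_b) != (0.0, 0.0)
]

print(f"OLS line: y = {intercept_hat:.3f} + {slope_hat:.3f} x")
print(f"SSR at OLS estimates: {ssr_ols:.4f}")
print(f"Smallest SSR among perturbed lines: {min(ssr_perturbed):.4f}")
```

On this toy data the SSR at the OLS estimates is positive, so the fitted line does not pass through every point, yet it is smaller than the SSR of every perturbed line. That is the sense in which the line is the "best fit".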