The gradient descent direction is cheaper to calculate, and performing a line search along that direction is a more reliable, steady source of progress toward an optimum. In short, gradient descent is relatively reliable.
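For concreteness, here's a minimal sketch of that idea in Python: steepest descent with a simple backtracking (Armijo) line search. The test function, tolerances, and constants are my own illustrative choices, not anything from the discussion above.

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Backtracking (Armijo) line search along the steepest-descent direction -g.
        t = 1.0
        while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
            t *= 0.5
        x = x - t * g
    return x

# Illustrative test: a simple elongated quadratic bowl.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(gradient_descent(f, grad, [3.0, -2.0]))  # close to [0, 0]
```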
Newton's method is relatively expensive in that you need to calculate the Hessian on the first iteration. Then, on each subsequent iteration, you can either fully recalculate the Hessian (as in Newton's method proper) or merely "update" the prior iteration's Hessian (as quasi-Newton methods do), which is cheaper but less robust.
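A rough sketch of that distinction, with the function names and demo values being my own illustrative choices: the full Newton step re-solves against a freshly computed Hessian, while a BFGS-style quasi-Newton update only revises an existing approximation using the latest step and gradient change.

```python
import numpy as np

def newton_step(grad, hess, x):
    # Full Newton: recompute the Hessian and solve H p = -g on every iteration.
    return x - np.linalg.solve(hess(x), grad(x))

def bfgs_update(B, s, y):
    # Quasi-Newton: cheaply revise an existing Hessian approximation B using only
    # the latest step s = x_new - x_old and gradient change y = g_new - g_old.
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# One BFGS update starting from the identity, with an arbitrary step/gradient pair.
B = bfgs_update(np.eye(2), np.array([1.0, 0.0]), np.array([2.0, 0.5]))
print(B)
```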
In the extreme case of a very well-behaved function, especially a perfectly quadratic one, Newton's method is the clear winner: on a perfectly quadratic objective it converges in a single iteration.
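You can check the single-iteration claim directly. For a quadratic f(x) = (1/2) x^T A x - b^T x the Hessian is A everywhere, so one Newton step from any starting point lands exactly on the minimizer A^{-1} b; the particular A, b, and starting point below are arbitrary.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite Hessian
b = np.array([1.0, 2.0])

x0 = np.array([10.0, -7.0])         # arbitrary starting point
g = A @ x0 - b                      # gradient of f(x) = 0.5*x@A@x - b@x at x0
x1 = x0 - np.linalg.solve(A, g)     # one Newton step (the Hessian is A)

print(np.allclose(x1, np.linalg.solve(A, b)))  # True: landed on the exact minimizer
```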
In the opposite extreme case of a very poorly behaved function, gradient descent will tend to win out. It'll pick a search direction, search along that direction, and ultimately take a small-but-productive step. By contrast, Newton's method will tend to fail in these cases, especially if you try to use the quasi-Newton approximations, because the Hessian (or its approximation) is a poor model of the local landscape and the Newton step may not even point downhill.
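Here's a tiny 1-D illustration of that failure mode (the function is an arbitrary pick): where the second derivative is negative, the raw Newton step increases the objective, while a small gradient step still makes progress.

```python
# f(x) = x**4 - 3*x**2 is an arbitrary non-convex example; at x = 0.5 the second
# derivative is negative, so Newton's local quadratic model is an upside-down parabola.
f   = lambda x: x**4 - 3 * x**2
df  = lambda x: 4 * x**3 - 6 * x
d2f = lambda x: 12 * x**2 - 6

x = 0.5
newton_x = x - df(x) / d2f(x)   # raw Newton step
grad_x   = x - 0.05 * df(x)     # small gradient-descent step

print(f(newton_x) > f(x))       # True: the Newton step made things worse
print(f(grad_x) < f(x))         # True: the gradient step still made progress
```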
In between gradient descent and Newton's method, there are methods like the Levenberg–Marquardt algorithm (LMA), though I've seen the names confused a bit. The gist is to use a more gradient-descent-informed search when things are chaotic and confusing, then switch to a more Newton-method-informed search when things are getting more linear and reliable.
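Here's a minimal sketch of that damping idea, with the caveat that it damps a generic Hessian with lambda*I rather than the Gauss–Newton J^T J of the classic least-squares LMA, and that all the parameter values are arbitrary. A large lambda gives short, gradient-descent-like steps; a small lambda gives Newton-like steps; the code adjusts lambda based on whether the last step actually helped.

```python
import numpy as np

def damped_newton(f, grad, hess, x0, lam=1.0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        # Large lam -> step shrinks toward scaled gradient descent;
        # small lam -> step approaches the pure Newton step.
        step = np.linalg.solve(H + lam * np.eye(len(x)), -g)
        if f(x + step) < f(x):
            x, lam = x + step, lam * 0.5   # step accepted: trust the Newton model more
        else:
            lam *= 10.0                    # step rejected: lean toward gradient descent
    return x

# Illustrative test on the same kind of quadratic bowl as before.
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
print(damped_newton(f, grad, hess, [3.0, -2.0]))  # close to [0, 0]
```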