It's there so that the lifetime of the stored object can be managed separately from the lifetime of the control block.
The control block is not destroyed until weak_count is zero. The storage object is destroyed as soon as count reaches zero. That means you need to directly call the destructor of storage when the count reaches zero, and not in the destructor of the control block.
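To prevent the destructor of the control block from calling the destructor of storage, the actual type of storage cannot be T: a member of type T would be destroyed automatically, a second time, when the control block itself is destroyed.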
If we only had the strong reference count, then a T would be fine (and much simpler).
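To make the two lifetimes concrete, here is a minimal sketch (C++17), not how any particular standard library does it. The names inline_block, Probe, live_probes and probes_alive_after_last_owner are made up for illustration, an aligned byte array plays the role of the aligned storage, and I assume the common convention that all strong owners together hold one weak reference.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <utility>

// Hypothetical probe type: counts how many instances are currently alive.
int live_probes = 0;
struct Probe {
    Probe() { ++live_probes; }
    ~Probe() { --live_probes; }
};

template <typename T>
struct inline_block {
    std::atomic<long> count{1};
    // Assumed convention: all strong owners together hold one weak reference.
    std::atomic<long> weak_count{1};
    // Raw bytes, not a T, so ~inline_block() leaves the object alone.
    alignas(T) unsigned char storage[sizeof(T)];

    template <typename... Args>
    explicit inline_block(Args&&... args) {
        ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
    }

    T* get() { return std::launder(reinterpret_cast<T*>(storage)); }

    void release_strong() {
        if (count.fetch_sub(1) == 1) {
            get()->~T();     // the object dies here...
            release_weak();  // ...and the owners give up their weak reference
        }
    }

    void release_weak() {
        if (weak_count.fetch_sub(1) == 1) delete this;  // the block dies here
    }
};

// Returns the number of live Probe objects after the last owner has gone,
// while a weak reference still keeps the control block allocated.
int probes_alive_after_last_owner() {
    auto* cb = new inline_block<Probe>();
    assert(live_probes == 1);
    cb->weak_count.fetch_add(1);  // stands in for one weak_ptr
    cb->release_strong();         // stands in for the last shared_ptr going away
    int alive = live_probes;
    cb->release_weak();           // the weak_ptr goes away; the block is freed
    return alive;
}
```

If storage were declared as a T, the delete in release_weak would run ~T() again on an object that release_strong had already destroyed.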
In actual fact, the implementation is a bit more complex than this. Remember that a shared_ptr can be constructed by allocating a T with new, and then constructing the shared_ptr from that. Thus the actual control block looks more like:
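#include <atomic>

template<typename T>
struct shared_ptr_control_block {
    std::atomic<long> count;       // strong references (shared_ptr)
    std::atomic<long> weak_count;  // weak references (weak_ptr)
    T* ptr;                        // the managed object
};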
and what make_shared allocates is:
#include <type_traits>

template<typename T>
struct both {
    shared_ptr_control_block<T> cb;
    std::aligned_storage_t<sizeof(T), alignof(T)> storage;
};
cb.ptr is then set to the address of storage. Allocating the both structure in make_shared means that we get a single memory allocation rather than two (and memory allocations are expensive).
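As a rough sketch of that single allocation: make_both, destroy_both and object_lives_inside_block below are hypothetical helpers, not part of any standard library, and an aligned byte array stands in for the aligned storage member. The point is only that one new expression yields both the counters and the space for the T.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <utility>

template <typename T>
struct shared_ptr_control_block {
    std::atomic<long> count{1};
    std::atomic<long> weak_count{1};
    T* ptr = nullptr;
};

template <typename T>
struct both {
    shared_ptr_control_block<T> cb;
    alignas(T) unsigned char storage[sizeof(T)];  // space for a T, no T yet
};

// Hypothetical helper: one allocation, then construct the T in place.
template <typename T, typename... Args>
both<T>* make_both(Args&&... args) {
    auto* b = new both<T>;  // the only heap allocation
    try {
        b->cb.ptr = ::new (static_cast<void*>(b->storage))
            T(std::forward<Args>(args)...);
    } catch (...) {
        delete b;  // T's constructor threw: give the memory back
        throw;
    }
    return b;
}

// Hypothetical helper: destroy the T explicitly, then free the one allocation.
template <typename T>
void destroy_both(both<T>* b) {
    b->cb.ptr->~T();
    delete b;
}

// Checks that the object really sits inside the same allocation as the counters.
bool object_lives_inside_block() {
    both<int>* b = make_both<int>(42);
    assert(*b->cb.ptr == 42);
    auto* begin = reinterpret_cast<unsigned char*>(b);
    auto* obj = reinterpret_cast<unsigned char*>(b->cb.ptr);
    bool inside = obj >= begin && obj + sizeof(int) <= begin + sizeof(both<int>);
    destroy_both(b);
    return inside;
}
```

With the shared_ptr<T>(new T) route there would be two allocations: one for the T and another for the control block.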
Note: I have simplified. There has to be a way for the shared_ptr destructor to know whether the control block is part of both (in which case the object's memory cannot be released until the control block is finished with, i.e. weak_count reaches zero), or not (in which case it can be freed earlier, as soon as count reaches zero). This could be a simple bool flag (which makes the control block bigger), or some spare bits in a pointer (which is not portable, but the standard library implementation doesn't have to be portable). The implementation can be even more complex to avoid storing the pointer at all in the make_shared case.
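The bool flag variant might look something like the sketch below. This is only an illustration of the idea: control_block, in_both, adopt, make_combined and the release functions are invented names, and I again assume that the strong owners collectively hold one weak reference. Real implementations tend to use other mechanisms to pick the right cleanup.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <type_traits>

template <typename T>
struct control_block {
    std::atomic<long> count{1};
    std::atomic<long> weak_count{1};  // assumed: owners share one weak reference
    T* ptr = nullptr;
    bool in_both = false;             // true if this block is the cb member of a both<T>
};

template <typename T>
struct both {
    control_block<T> cb;
    alignas(T) unsigned char storage[sizeof(T)];
};

template <typename T>
void release_weak(control_block<T>* cb) {
    if (cb->weak_count.fetch_sub(1) != 1) return;
    // cb is the first member of a standard-layout both<T>, so the cast is valid.
    static_assert(std::is_standard_layout<both<T>>::value, "layout assumption");
    if (cb->in_both) delete reinterpret_cast<both<T>*>(cb);
    else delete cb;
}

template <typename T>
void release_strong(control_block<T>* cb) {
    if (cb->count.fetch_sub(1) != 1) return;
    if (cb->in_both) cb->ptr->~T();  // destroy only; the memory belongs to both<T>
    else delete cb->ptr;             // destroy and free the object right away
    release_weak(cb);
}

// shared_ptr<T>(new T) style: a separate control block adopts p.
// A real implementation would delete p if this allocation throws.
template <typename T>
control_block<T>* adopt(T* p) {
    auto* cb = new control_block<T>;
    cb->ptr = p;
    return cb;
}

// make_shared style: one allocation holding both parts.
template <typename T>
control_block<T>* make_combined() {
    auto* b = new both<T>;
    b->cb.in_both = true;
    try {
        b->cb.ptr = ::new (static_cast<void*>(b->storage)) T();
    } catch (...) {
        delete b;
        throw;
    }
    return &b->cb;
}

// Hypothetical probe type: counts how many instances are currently alive.
int live_probes = 0;
struct Probe {
    Probe() { ++live_probes; }
    ~Probe() { --live_probes; }
};

// True if, in both layouts, the object is destroyed when the last owner goes,
// even though a weak reference keeps each control block alive a little longer.
bool object_dies_with_last_owner() {
    control_block<Probe>* separate = adopt(new Probe);
    control_block<Probe>* combined = make_combined<Probe>();
    assert(live_probes == 2);
    separate->weak_count.fetch_add(1);  // one weak_ptr on each
    combined->weak_count.fetch_add(1);
    release_strong(separate);
    release_strong(combined);
    bool destroyed = (live_probes == 0);
    release_weak(separate);             // frees the control block only
    release_weak(combined);             // frees the whole both<Probe>
    return destroyed;
}
```

The difference between the two layouts is only in what gets freed when: in the separate case the object's memory goes back at the last strong release, while in the combined case it stays allocated until the last weak reference is gone.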