I can tell you this process will never yield a clean result; you will always be left with a fair amount of artifacts from the original song. One thing you can try is using mid/side processing to extract the mono vocal track. You will still be using phase cancellation, but instead of just layering the phase-inverted instrumental on top of the song with vocals, you layer it twice and pan one copy left and the other right. Then you'll be left with mostly just an a cappella in the mono (mid) channel.
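If it helps to see the idea numerically, here is a simplified NumPy sketch. It is not a DAW workflow. It assumes the song and the instrumental are already loaded as float arrays of shape `(samples, 2)`, sample-aligned, and at identical gain. The function names (`mid_side`, `cancel_instrumental`, `extract_vocal`) are made up for illustration. It cancels the instrumental by adding its polarity-inverted copy, then keeps only the mid channel, `(L + R) / 2`, where a centered vocal lives.

```python
import numpy as np

def mid_side(stereo):
    """Split an (n, 2) stereo array into mid (L+R)/2 and side (L-R)/2."""
    left, right = stereo[:, 0], stereo[:, 1]
    mid = (left + right) / 2.0
    side = (left - right) / 2.0
    return mid, side

def cancel_instrumental(mix, instrumental):
    """Phase cancellation: add the polarity-inverted instrumental to the mix.

    Assumes both arrays are sample-aligned and at the same gain; any timing
    offset or mastering difference leaves residue (the artifacts mentioned above).
    """
    n = min(len(mix), len(instrumental))  # trim to the shorter file
    return mix[:n] + (-instrumental[:n])

def extract_vocal(mix, instrumental):
    """Cancel the instrumental, then keep only the mid (mono) channel."""
    residue = cancel_instrumental(mix, instrumental)
    mid, _side = mid_side(residue)
    return mid

# Synthetic example: a centered mono "vocal" over a stereo "instrumental".
rng = np.random.default_rng(0)
vocal = rng.standard_normal(1000)
inst = rng.standard_normal((1000, 2))
mix = inst + np.stack([vocal, vocal], axis=1)
recovered = extract_vocal(mix, inst)
```

With perfectly matched synthetic data `recovered` matches `vocal`; with real files, anything not exactly centered or not exactly cancelled ends up as leftover noise, which is why the result is never fully clean.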