If you don't create the thread, which is the case for many libraries, thread locals are the only way to hold thread private data (other than passing it on the call stack of course, which should be your first attempt).
A thread local implies a map lookup (assumed order 1). It scales well since the map size is to the number of still living threads that accessed this threadlocal instance (and didn't call threadlocal.set(null)).
If you control thread creation and you have a thread variable, you have to test if instanceof and typecast a Thread.currentThread() to your thread class. If the class isn't your for any future usages of this code, the functionality is broken.
Therefore, a threadlocal in your code is a safer design than a custom thread class. However, if you are doing millions of lookups per seconds, maybe this approach isn't the best. Consider injecting arguments in your call stack if possible (gather many arguments in a context class, see IOC pattern).