I don't know of any conference server written in Java, sorry. I use Asterisk, though. It is also C/C++ based, which is actually great, because it can do the media processing MUCH faster than Java. However, my SIP client is in Java, which is fine with me, because I don't expect much processing there.
Anyway, SEMS and Asterisk can both be easily used with a SIP server. In my case, I'm using SER (SIP Express Router from iptel.org) as my SIP server; it forwards SIP calls to Asterisk, and Asterisk does the rest (i.e., places my call into the conference room).
As for SEMS... SER has a module to interact directly with SEMS. This means that you'll have a SIP server and a conference server (SEMS) on a single machine. On top of that, SER is capable of proxying RTP for clients that are stuck behind NAT, like many of those on ADSL lines. Works like a charm and saves you extra headaches.
With SEMS and Asterisk you also get the advantages of voicemail, IVR, recording, etc. Or you can just use Asterisk alone to act as your SIP server and handle calls for you. Btw, Asterisk supports not just SIP, but also IAX, H.323 (if I'm not mistaken) and PSTN. It is a full-blown PBX.
I know many people (including me of course ;) who use SER+SEMS/Asterisk+RTPProxy to handle all their SIP stuff. It has worked very well for them. :))
Good luck anyway!
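What do you mean by media mixing in SIP? SIP is a signaling protocol: it sets up, modifies and tears down sessions. The playing, mixing, receiving, etc. of the actual audio (or video) data is not part of SIP; the media itself is usually carried separately, over RTP.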
What exactly do you mean by audio mixing? Multiple inputs (all in the same format) into a single output? Multiple inputs (in different formats) into a single output? Or something else?
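For the first case (several inputs already in the same format), mixing usually boils down to adding the samples together. Below is a minimal sketch that assumes every input has already been decoded from RTP into 16-bit signed linear PCM with the same sample rate, channel count and frame length. The class and method names (PcmMixer, mix) are hypothetical and not part of any SIP or media library. If the inputs use different codecs or sample rates (the second case), each one would first have to be decoded and resampled to a common format before a step like this.

```java
import java.util.Arrays;

public class PcmMixer {

    // Mixes several 16-bit signed PCM frames into one frame by summing the
    // samples at each position and clamping the result to the 16-bit range.
    // Assumes all frames share sample rate, channel count and length.
    public static short[] mix(short[][] inputs) {
        if (inputs.length == 0) {
            return new short[0];
        }
        int length = inputs[0].length;
        for (short[] input : inputs) {
            if (input.length != length) {
                throw new IllegalArgumentException("all frames must have the same length");
            }
        }

        short[] out = new short[length];
        for (int i = 0; i < length; i++) {
            // Accumulate in an int so the intermediate sum cannot wrap around.
            int sum = 0;
            for (short[] input : inputs) {
                sum += input[i];
            }
            // Clamp (saturate) instead of letting the cast to short overflow.
            if (sum > Short.MAX_VALUE) {
                sum = Short.MAX_VALUE;
            } else if (sum < Short.MIN_VALUE) {
                sum = Short.MIN_VALUE;
            }
            out[i] = (short) sum;
        }
        return out;
    }

    public static void main(String[] args) {
        short[] a = {1000, -2000, 30000};
        short[] b = {500, -500, 10000};
        // Prints [1500, -2500, 32767]; the last sample is clamped.
        System.out.println(Arrays.toString(mix(new short[][] {a, b})));
    }
}
```

Plain summation with clamping is the simplest approach; with many loud participants talking at once it will audibly clip, which is why real mixers often scale or normalize the inputs as well.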