> Other than explicitly wanting to create new objects,
> is there any reason why you would ever want to call
> the String constructor that takes another String as
> an argument?
Simply put: no. Some time ago, it seemed like a good idea to have a copy c'tor. Too bad that it's an immutable object, or it even would have made sense.
There are actually times when you want to invoke the constructor that takes a String as its argument, but doing so with a string literal is rather pointless.
The method String.substring returns a new string which is a view into the original string. They share the same data array. The original string might be very large (an XML document), and the data array of that string can't be garbage collected since it is shared with the substring. Calling new String(theSubstring) will create a new data array of the correct size, and the larger array can now be garbage collected.
Kaj
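A minimal sketch of the substring case described above. The class and method names (SubstringCopy, detachedSlice) are made up for illustration. Whether substring really shares the backing array depends on the JDK in use (older Sun JDKs did, newer ones copy), so treat the memory saving as an assumption about the runtime rather than a guarantee. What is guaranteed is that new String(...) yields a distinct String object with the same contents.

```java
public class SubstringCopy {

    // Hypothetical helper: returns a small slice of a large string as a
    // fresh String object that can outlive the large one.
    static String detachedSlice(String large, int begin, int end) {
        String slice = large.substring(begin, end);
        // On JDKs where substring shares the parent's char array (an
        // assumption about the runtime), this copy lets the big array be
        // garbage collected once 'large' is unreachable.
        return new String(slice);
    }

    public static void main(String[] args) {
        // Build a large stand-in for an XML document.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb.append("<item/>");
        }
        String xml = sb.toString();

        String tag = detachedSlice(xml, 0, 7);
        xml = null; // the large string is no longer referenced from here

        System.out.println(tag); // prints "<item/>"
    }
}
```

Copying only pays off when the small piece is kept much longer than the large string it was cut from.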
If you look at the source code for String, you'll see most operations result in creating 'new', not 'internalized', String objects. Imagine a situation where you're doing intensive String manipulation that generates many temporary String objects. Here, having all your Strings internalized would be a Bad Idea, because you run the risk of filling up your 'permanent' String pool with throwaway objects. Here, it's better to have Strings that are eligible for garbage collection.
> The method String.substring returns a new string
> which is a view into the original string. They share
> the same data array. The original string might be
> very large (an xml document), and the data array of
> that string can't be garbage collected since it is
> shared with the substring. calling new
> String(theSubstring) will create a new data array of
> correct size, and the larger array can now be garbage
> collected.
I didn't know that. Makes sense. Thanks.
> If you look at the source code for String, you'll see
> most operations result in creating 'new', not
> 'internalized' String objects. Imagine a situation
> where you're doing intensive String manipulation that
> generates many temporary String Objects. Here,
> having all your Strings internalized would be a Bad
> Idea, because you run the risk of filling up your
> 'permanent' String pool with throwaway objects.
> Here, it's better to have Strings that are eligible
> for Garbage Collection.
As I understand it (and some testing has borne this out) the interned strings are referred to via weak references so they are still eligible for collection once external references are released. Is this not correct?
> As I understand it (and some testing has borne this
> out) the interned strings are referred to via weak
> references so they are still eligible for collection
> once external references are released. Is this not
> correct?
I researched this some time ago. IIRC, the JLS is silent on how (or even if) internalized Strings are GC'd. Sun's JVM might very well hold weak refs, but it doesn't appear to be a requirement.
> If you look at the source code for String, you'll see
> most operations result in creating 'new', not
> 'internalized' String objects. Imagine a situation
> where you're doing intensive String manipulation that
> generates many temporary String Objects. Here,
> having all your Strings internalized would be a Bad
> Idea, because you run the risk of filling up your
> 'permanent' String pool with throwaway objects.
> Here, it's better to have Strings that are eligible
> for Garbage Collection.
Interesting explanation. It makes sense. But, by the way (well, not so by the way), I have always been curious about how Java works internally with the String pool: for instance, how big a String can be, how the JVM handles all the Strings in the pool, etc. I've always supposed that the more the number of Strings increases, the more JVM performance decreases. Of course that is just an assumption; I don't really know. I imagine a lot of Strings stored in the String pool, and whenever a new String appears, it is stored in the pool only if it doesn't exist there yet. I also imagine a maximum number of Strings in this pool. And I also think about the case of a very big String, like a long phrase. Does the JVM do any kind of checking to see whether this big String is already stored in the pool, so that creating a new String for this phrase is not needed?
Any links that clarify my questions? Any pointers? Is my reply bullshit?
Thanks.
One case is if you're using WeakHashMap and using Strings as keys.
map.put("myKey",myValue);
If the String, "myKey", is interned then the key will be strongly reachable and never eligible for garbage collection. But if you use:
map.put(new String("myKey"),myValue);
Then the "myKey" key will only be weakly reachable and eligible for garbage collection when weak references are cleaned up.
> If the String, "myKey", is interned then the key will
> be strongly reachable and never eligible for garbage
> collection. But if you use:
>
> map.put(new String("myKey"),myValue);
>
> Then the "myKey" key will always be weakly reachable
> and eligible for garbage collection when weak
> references are cleaned up.
Not sure I believe that. The act of interning does not guarantee that the interned string will always be strongly reachable. If, for example, I create and intern a string inside a method but without making that interned string available outside the method (either via a return value or modification of another reference) then that string is eligible for GC *assuming* the intern mechanism uses weak references -- I'm pretty sure the Sun JVM does.
Run the following test class to convince yourself:
package com.test;

import java.util.WeakHashMap;

public class WeakMapTester {
    static WeakHashMap<String, Object> map = new WeakHashMap<String, Object>();

    public static void main(String[] args) {
        long l = Long.MIN_VALUE;
        while (l++ < Long.MAX_VALUE) {
            addToMap();
        }
    }

    static long i = Long.MIN_VALUE;

    private static void addToMap() {
        map.put(new String("Key_" + (i++)).intern(), new Object());
    }
}
Whether you intern that key or not, the code will run through with minimal memory impact.
That said, if the string is interned because it's a very common string then the WeakHashMap may degrade simply because there are always strong references to the key -- but that's a different problem.
> I've
> always supposed that the more the amount of Strings
> increases, the more the JVM performance decreases.
More than likely, Strings are stored in the pool based on hash, which gives very efficient access speeds - even for very large data sets. Memory consumption; now that's a different story.
> Of course it is just an assumption, but I don't know
> very well. I imagine a lot of Strings stored in the
> String pool, and whenever a new String appears, it is
> stored in the pool, only if this String doesn't
> exist yet. I also imagine a maximum number of Strings
> in this pool. And I also think about a case of a very
> big String, like a big phrase. Does the JVM do any
> kind of checking, in order to check if this big
> String is already stored in the pool, so that the
> creation of a new String for this phrase is not
> needed?
The JLS clears this up a bit. Basically, if an internalized String is created that doesn't already exist, memory is allocated in the pool, and the String is stored there. If the created String already exists, it is *not* duplicated.
For example
String s1 = "I'm a String"; // Creates this String in the String pool and stores its reference in s1.
String s2 = "I'm a String"; // Copies the reference of the String that s1 points to.
String s3 = new String("I'm a String"); // Allocates a new String, outside of the pool.
s3 = s3.intern(); // s3 now holds the reference of the pooled String that s1 and s2 point to.
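The snippet above can be checked with reference comparisons. This is only a sketch: the class name InternDemo and the compare() helper are invented for illustration. It relies on two things the language specification does promise: identical literals refer to the same String object, and intern() returns the pooled instance.

```java
public class InternDemo {

    // Returns { s1 == s2, s1 == s3, s1 == s3.intern(), s1.equals(s3) }.
    static boolean[] compare() {
        String s1 = "I'm a String";             // literal
        String s2 = "I'm a String";             // same literal, same object
        String s3 = new String("I'm a String"); // distinct object, same contents
        String s4 = s3.intern();                // the pooled object
        return new boolean[] { s1 == s2, s1 == s3, s1 == s4, s1.equals(s3) };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("s1 == s2          : " + r[0]); // true
        System.out.println("s1 == s3          : " + r[1]); // false
        System.out.println("s1 == s3.intern() : " + r[2]); // true
        System.out.println("s1.equals(s3)     : " + r[3]); // true
    }
}
```

Note that intern() does not modify the String it is called on; the result has to be assigned to be of any use.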
> > > More than likely, Strings are stored in the pool
> > > based on hash,
> >
> > Eh?
>
>
> Ok, I should've said Strings are retrieved
> from the pool based on hash.
Eh?
Can you elaborate?
> > > > More than likely, Strings are stored in the pool
> > > > based on hash,
> > >
> > > Eh?
> >
> >
> > Ok, I should've said Strings are retrieved
> > from the pool based on hash.
>
> Eh?
>
> Can you elaborate?
Do you have a doubt about it?
> > I don't follow... Aren't internalized Strings
> > indexed via a hashtable?
>
> What's the key and what's the value, and when is the
> lookup done?
What I'm getting at is that no, it's not a hash. It's just an indexed lookup.
The compiler might very well use a hashed lookup to keep track of which string literals it knows about so far and what index they're at. But at runtime the VM simply pulls the strings from the constant pool at the compiler-specified index.
import java.io.*;

public class StrPool {
    String abc = "abc";
    static String xyz = "xyz";

    public static void main(String args[]) {
        String s123 = "123";
        String sabc = "abc";
        System.out.println("Before");
        System.out.println(sabc);
        System.out.println(xyz);
        System.out.println(s123);
        System.out.println("after");
    }
}
:; javap -c -classpath . StrPool | cli
Compiled from "StrPool.java"
public class StrPool extends java.lang.Object{
java.lang.String abc;
static java.lang.String xyz;

public StrPool();
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: aload_0
   5: ldc #2; //String abc
   7: putfield #3; //Field abc:Ljava/lang/String;
   10: return

public static void main(java.lang.String[]);
  Code:
   0: ldc #4; //String 123
   2: astore_1
   3: ldc #2; //String abc
   5: astore_2
   6: getstatic #5; //Field java/lang/System.out:Ljava/io/PrintStream;
   9: ldc #6; //String Before
   11: invokevirtual #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   14: getstatic #5; //Field java/lang/System.out:Ljava/io/PrintStream;
   17: aload_2
   18: invokevirtual #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   21: getstatic #5; //Field java/lang/System.out:Ljava/io/PrintStream;
   24: getstatic #8; //Field xyz:Ljava/lang/String;
   27: invokevirtual #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   30: getstatic #5; //Field java/lang/System.out:Ljava/io/PrintStream;
   33: aload_1
   34: invokevirtual #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   37: getstatic #5; //Field java/lang/System.out:Ljava/io/PrintStream;
   40: ldc #9; //String after
   42: invokevirtual #7; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   45: return

static {};
  Code:
   0: ldc #10; //String xyz
   2: putstatic #8; //Field xyz:Ljava/lang/String;
   5: return

}
> What's the key and what's the value, and when is the
> lookup done?
I'm running purely on assumptions here, but the key would be the result of String.hashCode() and the value would be the String, itself. When a new internalized String was created, either by saying
String s1 = "Hello, jverd!";
or
s1.internalize();
The String in question would be hashed and checked for existence in the pool. If it does not yet exist, it will be pooled; if it does exist, the object gets a copy of the pooled string's reference.
Make sense?
> I'm running purely on assumptions here, but the key
> would be the result of String.hashCode() and the
> value would be the String, itself.
There'd obviously have to be some further lookup, as multiple strings will have the same hash value, but that's a side issue.
> When a new
> internalized String was created, either by saying
>
> String s1 = "Hello, jverd!";
See the above code and javap output. That string is put into the pool when the class is loaded and is retrieved by its index in the pool.
> or
> s1.internalize();
intern(), no?
Yes, for that we have to look up the string. There might be hashing going on there, or a sorted set on which we can do a binary search.
> The String in question would be hashed, and checked
> for existence on the Pool. If it does not yet exist,
> it will be pooled, if it does exist, the object gets
> a copy of the pooled string's reference.
What I'm not sure of is what happens to the same string in different classes. The bytecode shows a single constant-index lookup. Whether that just gives another index into a global pool, or whether each class has its own pool, I'm not sure.
In the former case, then when the class is loaded, some sort of (one would hope) hashed or binary search lookup will tell us whether that string is already in the global pool. Else it's just at a fixed place in that class' private pool and it's known at compile time.
Note, however, that the hash lookup or binary search (by string contents) would only occur on class load and on calls to intern(), but NOT on execution of str = "abc";
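A small sketch of the distinction drawn above, with invented names (LiteralLookup, concat). A literal, or a constant expression the compiler can fold, needs no explicit lookup by contents in the running code. A string assembled at run time is a separate object until intern() is called on it, and that call is where the lookup by contents has to happen.

```java
public class LiteralLookup {

    // Concatenation of non-constant operands happens at run time and
    // produces a newly created String.
    static String concat(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        String literal = "abc";
        String folded = "a" + "bc";        // constant expression, resolved by the compiler
        String built = concat("a", "bc");  // built while the program runs

        System.out.println(literal == folded);         // true
        System.out.println(literal == built);          // false
        System.out.println(literal == built.intern()); // true
        System.out.println(literal.equals(built));     // true
    }
}
```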
> There'd obviously have to be some further lookup, as
> multiple strings will have the same hash value, but
> that's a side issue.
Sure, but then again, what's a little collision between friends. :)
> Yes, for that we have to look up the string. There
> might be hashing going on there, or a sorted set on
> which we can do a binary search
Makes sense.
> Note, however, that the hash lookup or binary search
> (by string contents) would only occur on class load
> and on calls to intern(), but NOT on execution of
> str = "abc";
Yes. My chronic case of Friday-itis made me forget that this would be created and indexed at compile-time.
> Run the following test class to convince yourself:
It depends on the test.
import java.util.WeakHashMap;
import java.util.LinkedList;

public class WeakMapTester {
    static final LinkedList<Object> fillMemory = new LinkedList<Object>();
    static final WeakHashMap<String, Object> map = new WeakHashMap<String, Object>();

    public static void main(String[] args) {
        for (int i=0; i<4; i++) {
            map.put("Key1", new Object());
            map.put(new String("Key2"), new Object());
            map.put("Key3", new Object());
            map.put(new String("Key4"), new Object());
        }
        int mapSize = map.size();
        System.err.println("mapSize is " + mapSize);
        for (;;) {
            if (mapSize != map.size()) {
                mapSize = map.size();
                System.err.println("mapSize is now " + mapSize);
            }
            fillMemory.add(new Object());
        }
    }
}
In this example the map starts with 4 entries and soon shrinks to 2, but it never goes below 2.
>
> As I understand it (and some testing has borne this
> out) the interned strings are referred to via weak
> references so they are still eligible for collection
> once external references are released. Is this not
> correct?
Sounds reasonable, but literals won't be eligible until the class is collected. And the class won't be collected until the class loader is. That means that, without a custom class loader, the literal will never be collected.
>
> Interesting explanation. It makes sense. But, by the
> way (well, not so by the way), I was always curious
> about how Java works internally, with the String
> pool. For instance, how big a String can be, how the
> JVM handles all the Strings in the pool, etc. I've
> always supposed that the more the amount of Strings
> increases, the more the JVM performance decreases.
The memory footprint, regardless of how it grows, can always reach a point where the OS must swap due to virtual memory limits. That would definitely impact performance.
Otherwise no, since the addressing (pointers into memory) does not change with size.
Intern must do a lookup, but that is a creation mechanism. Other than that there is no reason references wouldn't be a direct lookup (no hashing required).
> Of
> course it is just an assumption, but I don't know
> very well. I imagine a lot of Strings stored in the
> String pool, and whenever a new String appears, it is
> stored in the pool, only if this String doesn't
> exist yet. I also imagine a maximum number of Strings
> in this pool. And I also think about a case of a very
> big String, like a big phrase.
Everything has a limit on a computer.
Not sure what you mean by a 'big' string. To construct a big string you have to build it dynamically, for example by loading it from a file. That is just an object at that point. If you interned it, it might cause problems, but it would also demonstrate how really stupid it is to do that with a large string.
> Does the JVM do any
> kind of checking, in order to check if this big
> String is already stored in the pool, so that the
> creation of a new String for this phrase is not
> needed?
>
Interned strings are required to be unique.
Other than that there is no 'pool'.
> One case is if you're using WeakHashMap and using
> Strings as keys.
>
> map.put("myKey",myValue);
>
> If the String, "myKey", is interned then the key will
> be strongly reachable and never eligible for garbage
> collection. But if you use:
The literal will always be interned on a compliant VM.
>
> map.put(new String("myKey"),myValue);
>
> Then the "myKey" key will only be weakly reachable
> and eligible for garbage collection when weak
> references are cleaned up.
Again the literal will always be interned.
The literal will only be available for collection (potentially) when the containing class loader is collected.
The String created with the new expression, which is not the literal, is just an object. It will be collectable once it no longer is reachable (period.)
It is unclear to me why you are using the term 'weakly'.
> More than likely, Strings are stored in the pool
> based on hash, which gives very efficient access
> speeds - even for very large data sets. Memory
> consumption; now that's a different story.
The interned strings would need to be hashed for the creation process.
Other than that (for just access) there is no reason I can see that a hash would be used. There is no reason to not use a direct reference. And at least as far as I have tracked the reference implementation in the Sun VM source code that is true.
> > Run the following test class to convince yourself:
>
> It depends on the test.
> ...
>
> In this example the map starts with 4 entries and
> soon shrinks to 2. But never goes below 2.
What exactly do you think is being collected?
It most assuredly is not the literals.
For this discussion I will exclude the possibility that the VM can defer creation. The result is still ultimately the same.
The class file contains structures, not strings, which are contained in the constant pool (part of the class structure.)
The byte code uses indexes into the constant pool for things like string literals (and other things as well). But that code is not 'using' the object references at that point. Look at the byte code output for how length() is called on a literal for what I mean here.
When a class is loaded the VM uses the string info structure in the class file to create an interned string that represents that literal. That creation process is required to use an existing interned string if it already exists.
This is the creation process for the literal. It has nothing to do with code that says "new String(...)".
Because the creation process must use existing interned strings that means that it is likely (but not assured) that a hash is used to look up existing strings.
Note that this is all creation. It has nothing to do with accessing a string via a java reference.
So now the VM has a class and it starts executing the byte codes. And at some point the byte codes access a reference that points to a string. Now a VM can certainly implement a reference any way it wants but the Sun VM, as far as I traced it, uses a modified pointer and when resolving that pointer it points directly at the relevant string object. It doesn't index into a string pool, nor hash nor any other collection.
> Because the creation process must use existing
> interned strings that means that it is likely (but
> not assured) that a hash is used to look up existing
> strings.
Yes, I believe that a hash is used. I also believe that there is a kind of indexed ordering in this hashing, something like primary keys in database tables, so that the searching operation for an existing string becomes faster.
After all, I believe this, I believe that, I suppose this, that... ;-). It would be better if the JLS detailed more how Java works...
Well, below is some test code that I've written (and tested, of course):
*********************************
/**
 * Example 1
 */
import java.util.Calendar;

public class Main {
    public static void main(String[] args) {
        Object[][] objArray = new Object[50][2];
        int i=0;
        //Putting a lot of strings
        //into the string pool
        for (char c='A' ; c<'Z' ; c++) {
            Object[] temp = fillArrays(c);
            objArray[i][0] = temp[0];
            objArray[i][1] = temp[1];
            i++;
        }
        System.out.println(Calendar.getInstance().getTimeInMillis());
        for (i=0 ; i<30000000 ; i++) {
            //Testing with a string that IS NOT
            //in that previous string pool
            String test = "1234";
            test.intern();
            test = null;
        }
        System.out.println(Calendar.getInstance().getTimeInMillis());
    }

    /**
     * The objective of this method is just
     * putting a lot of strings into the
     * string pool.
     */
    static Object[] fillArrays(char c) {
        final int quant = 1500;
        StringBuffer[] strBufArray = new StringBuffer[quant];
        String[] strArray = new String[quant];
        strBufArray[0] = new StringBuffer();
        strBufArray[0].append(c);
        strArray[0] = strBufArray[0].toString();
        for (int i=1 ; i < quant ; i++) {
            strBufArray[i] = strBufArray[i-1];
            strBufArray[i].append(c);
            strArray[i] = strBufArray[i].toString().intern();
        }
        Object[] o = new Object[2];
        o[0] = strBufArray;
        o[1] = strArray;
        return o;
    }
}
*********************************
Example 1 Result:
1165631612488 - 1165631589515 = 22973
*********************************
*********************************
/**
 * Example 2
 * Now, just modifying the main method
 */
public static void main(String[] args) {
    Object[][] objArray = new Object[50][2];
    int i=0;
    //Putting a lot of strings
    //into the string pool
    for (char c='A' ; c<'Z' ; c++) {
        Object[] temp = fillArrays(c);
        objArray[i][0] = temp[0];
        objArray[i][1] = temp[1];
        i++;
    }
    System.out.println(Calendar.getInstance().getTimeInMillis());
    for (i=0 ; i<30000000 ; i++) {
        //Testing with a string that IS
        //in that previous string pool.
        //Next line of code is the only difference,
        //compared to the example 1.
        String test = "AAA";
        test.intern();
        test = null;
    }
    System.out.println(Calendar.getInstance().getTimeInMillis());
}
*********************************
Example 2 Result:
1165631769574 - 1165631738529 = 31045
A strangely big and significant difference
compared to Example 1.
What I was expecting is exactly the
opposite behaviour!
*********************************
*********************************
/**
 * Example 3
 * Again, just modifying the main method
 */
public static void main(String[] args) {
    Object[][] objArray = new Object[50][2];
    int i=0;
    //Putting a lot of strings
    //into the string pool
    for (char c='A' ; c<'Z' ; c++) {
        Object[] temp = fillArrays(c);
        objArray[i][0] = temp[0];
        objArray[i][1] = temp[1];
        i++;
    }
    //forcing garbage collection.
    //The following two lines of code
    //don't exist in the previous examples.
    objArray = null;
    System.gc();
    System.out.println(Calendar.getInstance().getTimeInMillis());
    for (i=0 ; i<30000000 ; i++) {
        //This string IS NOT in
        //any string pool.
        String test = "9876";
        test.intern();
        test = null;
    }
    System.out.println(Calendar.getInstance().getTimeInMillis());
}
*********************************
Example 3 Result:
1165632349287 - 1165632326455 = 22832
About equal to Example 1
*********************************
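One thing to keep in mind about the timings above: "1234", "AAA" and "9876" are all literals, and as discussed elsewhere in this thread, literals are interned on behalf of the class, so all three loops call intern() on a string that is already pooled. A rough variation, offered only as a sketch, builds the string at run time and keeps the result of intern() so the call cannot be discarded. The names (InternTiming, timeIntern) are made up, and the numbers it prints will vary with the JVM and the machine.

```java
public class InternTiming {

    // Hypothetical helper: times repeated intern() calls on a string that
    // is assembled at run time, so it is not a literal from this class.
    // Returns the elapsed time in nanoseconds.
    static long timeIntern(String prefix, int iterations) {
        String candidate = new StringBuilder(prefix)
                .append('_')
                .append(System.nanoTime())
                .toString();
        String last = candidate;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            last = candidate.intern(); // keep the result so the call is not dead code
        }
        long elapsed = System.nanoTime() - start;
        if (!last.equals(candidate)) {
            throw new IllegalStateException("intern() changed the contents");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 1000000;
        timeIntern("warmup", iterations); // give the JIT a chance to compile the loop
        long nanos = timeIntern("run", iterations);
        System.out.println("intern() x " + iterations + ": " + (nanos / 1000000) + " ms");
    }
}
```

Only the first call in the loop can add the string to the pool; every later call finds it there, so this measures lookups of an existing entry.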
Hi,
Strings are stored in a global pool in the currently running JVM context. So String str0 = new String("A String"); will create a new string object in the pool, whereas String str1 = "A String"; will create a new object and keep it in the pool if and only if there is no similar string currently being referred to there; otherwise it will only make a reference to the first occurrence (I have a doubt here) of the similar string that it finds. See the String API for the intern() method. This will resolve two similar string objects to a single reference. I hope this sheds some more light on the topic. Discuss further for more clarification.
Regards,
Thomas.
> String str1 = "A String"; will
> create only a new object and keep in the pool if and
> only if there is no similar string there currently
> getting referred;
No.
That line will never create a string. It will assign a reference to an existing String. The String is created when the class is loaded (initialized, actually, but close enough).
> Strings are getting stored in a global pool in the
> current running JVM context. So String str0 = new
> String("A String"); will create a new object string
> in the pool
No. That is a new object just like any other object in java. It will be stored, managed and GC'd just like any other object.
> whereas String str1 = "A String"; will
> create only a new object and keep in the pool if and
> only if there is no similar string there currently
> getting referred; otherwise it will make a reference
> only to the first occurrence (I have doubt here) to
> the similar string that it find.
That is an inaccurate or at least too general description for what happens in terms of this particular thread.
The literal will be created as a String at some point during the loading of the class itself.
The assignment will occur when the code runs. No creation will occur when the code (assignment) runs.
And if you have a doubt about that then you should look at the byte codes.
> See the String API for the intern() method. This will resolve two similar string objects to a single reference.
No. The intern method either returns an existing object or puts a new object into the intern pool and returns a reference to the new object. This has nothing to do with general string usage, however, and certainly nothing to do with the 'new' expression as it appears in code.
Note again that although there is a pool that intern uses, this is NOT where all string objects are kept. Only interned strings are kept there, and many strings are not interned strings.
> Note again that although there is a pool that intern
> uses that this is NOT where all string objects are
> kept. Only interned strings are kept there and many
> strings are not interned strings.
And which are the ones that are not interned Strings? Those that are instantiated with "new String("Blah")"? Are all the rest interned strings?
> > And which are the ones that are not interned Strings? Those
> > that are instantiated with "new String("Blah")"? Are
> > all the rest interned strings?
>
> String str1 = foo.toString();
> String str2 = str1 + bar.toString();
> String str3 = br.readLine();
> etc.
Oh, finally, I got it!
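For anyone who wants to see it rather than take it on faith, here is a small sketch with invented names (NotInterned, fromBuilder, fromReader). Strings produced at run time have the same contents as the literal but are ordinary objects. They only become the pooled instance if intern() is called on them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class NotInterned {

    // StringBuilder.toString() allocates a new String each time.
    static String fromBuilder(String text) {
        return new StringBuilder(text).toString();
    }

    // Reads the first line of 'text', standing in for br.readLine() on a file.
    static String fromReader(String text) {
        try {
            BufferedReader br = new BufferedReader(new StringReader(text));
            return br.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String literal = "Blah";
        String str1 = fromBuilder("Blah");
        String str3 = fromReader("Blah\nsecond line");

        System.out.println(str1.equals(literal));     // true
        System.out.println(str1 == literal);          // false
        System.out.println(str3.equals(literal));     // true
        System.out.println(str3.intern() == literal); // true
    }
}
```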