A Look at Java 9's Compact Strings

temp10 2024-09-11 09:15:29

This article takes a look at Compact Strings in Java 9.

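Compressed Strings (Java 6)

Java 6 introduced Compressed Strings: strings that need only one byte per character were stored in a byte[], while strings that need two bytes per character continued to use a char[]. The feature was enabled with -XX:+UseCompressedStrings, but it was abandoned in Java 7 and then removed in Java 8.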

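Compact Strings (Java 9)

Java 9 introduced Compact Strings to replace the Compressed Strings of Java 6. The implementation is more thorough: byte[] replaces char[] entirely, and a new field, coder, records whether the bytes are encoded as LATIN1 or UTF16.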

String

java.base/java/lang/String.java

public final class String
 implements java.io.Serializable, Comparable<String>, CharSequence,
 Constable, ConstantDesc {
 /**
 * The value is used for character storage.
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 *
 * Additionally, it is marked with {@link Stable} to trust the contents
 * of the array. No other facility in JDK provides this functionality (yet).
 * {@link Stable} is safe here, because value is never null.
 */
 @Stable
 private final byte[] value;
 /**
 * The identifier of the encoding used to encode the bytes in
 * {@code value}. The supported values in this implementation are
 *
 * LATIN1
 * UTF16
 *
 * @implNote This field is trusted by the VM, and is a subject to
 * constant folding if String instance is constant. Overwriting this
 * field after construction will cause problems.
 */
 private final byte coder;
 /** Cache the hash code for the string */
 private int hash; // Default to 0
 /** use serialVersionUID from JDK 1.0.2 for interoperability */
 private static final long serialVersionUID = -6849794470754667710L;
 /**
 * If String compaction is disabled, the bytes in {@code value} are
 * always encoded in UTF16.
 *
 * For methods with several possible implementation paths, when String
 * compaction is disabled, only one code path is taken.
 *
 * The instance field value is generally opaque to optimizing JIT
 * compilers. Therefore, in performance-sensitive place, an explicit
 * check of the static boolean {@code COMPACT_STRINGS} is done first
 * before checking the {@code coder} field since the static boolean
 * {@code COMPACT_STRINGS} would be constant folded away by an
 * optimizing JIT compiler. The idioms for these cases are as follows.
 *
 * For code such as:
 *
 * if (coder == LATIN1) { ... }
 *
 * can be written more optimally as
 *
 * if (coder() == LATIN1) { ... }
 *
 * or:
 *
 * if (COMPACT_STRINGS && coder == LATIN1) { ... }
 *
 * An optimizing JIT compiler can fold the above conditional as:
 *
 * COMPACT_STRINGS == true => if (coder == LATIN1) { ... }
 * COMPACT_STRINGS == false => if (false) { ... }
 *
 * @implNote
 * The actual value for this field is injected by JVM. The static
 * initialization block is used to set the value here to communicate
 * that this static final field is not statically foldable, and to
 * avoid any possible circular dependency during vm initialization.
 */
 static final boolean COMPACT_STRINGS;
 static {
 COMPACT_STRINGS = true;
 }
 /**
 * Class String is special cased within the Serialization Stream Protocol.
 *
 * A String instance is written into an ObjectOutputStream according to
 * <a href="{@docRoot}/../specs/serialization/protocol.html#stream-elements">
 * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
 */
 private static final ObjectStreamField[] serialPersistentFields =
 new ObjectStreamField[0];
 /**
 * Initializes a newly created {@code String} object so that it represents
 * an empty character sequence. Note that use of this constructor is
 * unnecessary since Strings are immutable.
 */
 public String() {
 this.value = "".value;
 this.coder = "".coder;
 }

 //......

 public char charAt(int index) {
 if (isLatin1()) {
 return StringLatin1.charAt(value, index);
 } else {
 return StringUTF16.charAt(value, index);
 }
 }
 public boolean equals(Object anObject) {
 if (this == anObject) {
 return true;
 }
 if (anObject instanceof String) {
 String aString = (String)anObject;
 if (coder() == aString.coder()) {
 return isLatin1() ? StringLatin1.equals(value, aString.value)
 : StringUTF16.equals(value, aString.value);
 }
 }
 return false;
 }
 public int compareTo(String anotherString) {
 byte v1[] = value;
 byte v2[] = anotherString.value;
 if (coder() == anotherString.coder()) {
 return isLatin1() ? StringLatin1.compareTo(v1, v2)
 : StringUTF16.compareTo(v1, v2);
 }
 return isLatin1() ? StringLatin1.compareToUTF16(v1, v2)
 : StringUTF16.compareToLatin1(v1, v2);
 }
 public int hashCode() {
 int h = hash;
 if (h == 0 && value.length > 0) {
 hash = h = isLatin1() ? StringLatin1.hashCode(value)
 : StringUTF16.hashCode(value);
 }
 return h;
 }
 public int indexOf(int ch, int fromIndex) {
 return isLatin1() ? StringLatin1.indexOf(value, ch, fromIndex)
 : StringUTF16.indexOf(value, ch, fromIndex);
 }
 public String substring(int beginIndex) {
 if (beginIndex < 0) {
 throw new StringIndexOutOfBoundsException(beginIndex);
 }
 int subLen = length() - beginIndex;
 if (subLen < 0) {
 throw new StringIndexOutOfBoundsException(subLen);
 }
 if (beginIndex == 0) {
 return this;
 }
 return isLatin1() ? StringLatin1.newString(value, beginIndex, subLen)
 : StringUTF16.newString(value, beginIndex, subLen);
 }

 //......

 byte coder() {
 return COMPACT_STRINGS ? coder : UTF16;
 }
 byte[] value() {
 return value;
 }
 private boolean isLatin1() {
 return COMPACT_STRINGS && coder == LATIN1;
 }
 @Native static final byte LATIN1 = 0;
 @Native static final byte UTF16 = 1;
 //......
}
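  • COMPACT_STRINGS defaults to true, so the feature is enabled by default (it can be turned off with -XX:-CompactStrings)
  • coder() returns the coder field when COMPACT_STRINGS is true and UTF16 otherwise; isLatin1() returns true only when COMPACT_STRINGS is true and coder is LATIN1
  • Methods such as charAt, equals, hashCode, indexOf and substring rely on isLatin1() to dispatch to either StringLatin1 or StringUTF16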
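To make the byte[] plus coder layout concrete, here is a minimal sketch of the idea. It is not the JDK implementation: CompactTextSketch and fitsLatin1 are hypothetical names, and the big-endian UTF-16 byte order is an assumption made for simplicity (the real code lives in StringLatin1 and StringUTF16).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a compact string: a byte[] plus a coder flag.
public class CompactTextSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value;
    final byte coder;

    CompactTextSketch(String s) {
        if (fitsLatin1(s)) {
            // Every char fits in one byte, so store one byte per char.
            value = s.getBytes(StandardCharsets.ISO_8859_1);
            coder = LATIN1;
        } else {
            // Otherwise fall back to two bytes per char (big-endian is an assumption here).
            value = s.getBytes(StandardCharsets.UTF_16BE);
            coder = UTF16;
        }
    }

    // True when no char is above 0xFF, i.e. the text is representable in Latin-1.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Shifting by the coder halves the byte count for UTF16 and leaves LATIN1 unchanged.
    int length() {
        return value.length >> coder;
    }

    // Dispatch on the coder, in the same spirit as String.charAt dispatching on isLatin1().
    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new StringIndexOutOfBoundsException(index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        int hi = value[2 * index] & 0xFF;
        int lo = value[2 * index + 1] & 0xFF;
        return (char) ((hi << 8) | lo);
    }

    public static void main(String[] args) {
        CompactTextSketch latin = new CompactTextSketch("tom");
        CompactTextSketch wide = new CompactTextSketch("\u4e2d\u6587");
        System.out.println("coder=" + latin.coder + ", bytes=" + latin.value.length); // coder=0, bytes=3
        System.out.println("coder=" + wide.coder + ", bytes=" + wide.value.length);   // coder=1, bytes=4
    }
}
```

The sketch shows where the saving comes from: a three-character Latin-1 string takes three bytes instead of six, while a string with any character above 0xFF keeps the two-bytes-per-char cost.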

StringConcatFactory

Example

public class Java9StringDemo {

    public static void main(String[] args) {
        String stringLiteral = "tom";
        String stringObject = stringLiteral + "cat";
    }
}
  • Here stringObject is built by concatenating the variable stringLiteral with the literal "cat"

javap

javac src/main/java/com/example/javac/Java9StringDemo.java
javap -v src/main/java/com/example/javac/Java9StringDemo.class

 Last modified Apr 7, 2019; size 770 bytes
 MD5 checksum fecfca9c829402c358c4d5cb948004ff
 Compiled from "Java9StringDemo.java"
public class com.example.javac.Java9StringDemo
 minor version: 0
 major version: 56
 flags: (0x0021) ACC_PUBLIC, ACC_SUPER
 this_class: #4 // com/example/javac/Java9StringDemo
 super_class: #5 // java/lang/Object
 interfaces: 0, fields: 0, methods: 2, attributes: 3
Constant pool:
 #1 = Methodref #5.#14 // java/lang/Object."<init>":()V
 #2 = String #15 // tom
 #3 = InvokeDynamic #0:#19 // #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
 #4 = Class #20 // com/example/javac/Java9StringDemo
 #5 = Class #21 // java/lang/Object
 #6 = Utf8 <init>
 #7 = Utf8 ()V
 #8 = Utf8 Code
 #9 = Utf8 LineNumberTable
 #10 = Utf8 main
 #11 = Utf8 ([Ljava/lang/String;)V
 #12 = Utf8 SourceFile
 #13 = Utf8 Java9StringDemo.java
 #14 = NameAndType #6:#7 // "<init>":()V
 #15 = Utf8 tom
 #16 = Utf8 BootstrapMethods
 #17 = MethodHandle 6:#22 // REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
 #18 = String #23 // \u0001cat
 #19 = NameAndType #24:#25 // makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
 #20 = Utf8 com/example/javac/Java9StringDemo
 #21 = Utf8 java/lang/Object
 #22 = Methodref #26.#27 // java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
 #23 = Utf8 \u0001cat
 #24 = Utf8 makeConcatWithConstants
 #25 = Utf8 (Ljava/lang/String;)Ljava/lang/String;
 #26 = Class #28 // java/lang/invoke/StringConcatFactory
 #27 = NameAndType #24:#32 // makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
 #28 = Utf8 java/lang/invoke/StringConcatFactory
 #29 = Class #34 // java/lang/invoke/MethodHandles$Lookup
 #30 = Utf8 Lookup
 #31 = Utf8 InnerClasses
 #32 = Utf8 (Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
 #33 = Class #35 // java/lang/invoke/MethodHandles
 #34 = Utf8 java/lang/invoke/MethodHandles$Lookup
 #35 = Utf8 java/lang/invoke/MethodHandles
{
 public com.example.javac.Java9StringDemo();
 descriptor: ()V
 flags: (0x0001) ACC_PUBLIC
 Code:
 stack=1, locals=1, args_size=1
 0: aload_0
 1: invokespecial #1 // Method java/lang/Object."<init>":()V
 4: return
 LineNumberTable:
 line 8: 0
 public static void main(java.lang.String[]);
 descriptor: ([Ljava/lang/String;)V
 flags: (0x0009) ACC_PUBLIC, ACC_STATIC
 Code:
 stack=1, locals=3, args_size=1
 0: ldc #2 // String tom
 2: astore_1
 3: aload_1
 4: invokedynamic #3, 0 // InvokeDynamic #0:makeConcatWithConstants:(Ljava/lang/String;)Ljava/lang/String;
 9: astore_2
 10: return
 LineNumberTable:
 line 11: 0
 line 12: 3
 line 13: 10
}
SourceFile: "Java9StringDemo.java"
InnerClasses:
 public static final #30= #29 of #33; // Lookup=class java/lang/invoke/MethodHandles$Lookup of class java/lang/invoke/MethodHandles
BootstrapMethods:
 0: #17 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/String;[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
 Method arguments:
 #18 \u0001cat
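  • The javap output shows that, since Java 9, the compiler emits an invokedynamic instruction bootstrapped by StringConcatFactory.makeConcatWithConstants to perform the concatenation, whereas the Java 8 compiler translated it into StringBuilder calls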
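The bootstrap call seen in the javap output can also be made by hand, which helps show what the invokedynamic call site ends up bound to. The following is only an illustrative sketch: ManualConcatDemo, concatWithCat and java8Style are hypothetical names, and the recipe "\u0001cat" is copied from the constant pool above, where \u0001 marks the position of the single dynamic argument.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class ManualConcatDemo {

    // Roughly what the invokedynamic call site does: bootstrap once, then invoke the handle.
    static String concatWithCat(String prefix) {
        try {
            // Same shape as the descriptor in the javap output: (String)String.
            MethodType concatType = MethodType.methodType(String.class, String.class);
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),      // lookup with private access to this class
                    "makeConcatWithConstants",   // name, as in the constant pool
                    concatType,
                    "\u0001cat");                // recipe: one argument followed by the literal "cat"
            MethodHandle concat = site.dynamicInvoker();
            return (String) concat.invokeExact(prefix);
        } catch (Throwable t) {
            // invokeExact declares Throwable; wrap it to keep the sketch easy to call.
            throw new IllegalStateException(t);
        }
    }

    // Roughly what a Java 8 compiler generated for the same expression.
    static String java8Style(String prefix) {
        return new StringBuilder().append(prefix).append("cat").toString();
    }

    public static void main(String[] args) {
        System.out.println(concatWithCat("tom")); // tomcat
        System.out.println(java8Style("tom"));    // tomcat
    }
}
```

In real compiled code the bootstrap runs once per call site and the JVM caches the resulting handle; the sketch re-bootstraps on every call only to stay short.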

StringConcatFactory.makeConcatWithConstants

java.base/java/lang/invoke/StringConcatFactory.java

public final class StringConcatFactory {
 //......
 /**
 * Concatenation strategy to use. See {@link Strategy} for possible options.
 * This option is controllable with -Djava.lang.invoke.stringConcat JDK option.
 */
 private static Strategy STRATEGY;
 /**
 * Default strategy to use for concatenation.
 */
 private static final Strategy DEFAULT_STRATEGY = Strategy.MH_INLINE_SIZED_EXACT;
 private enum Strategy {
 /**
 * Bytecode generator, calling into {@link java.lang.StringBuilder}.
 */
 BC_SB,
 /**
 * Bytecode generator, calling into {@link java.lang.StringBuilder};
 * but trying to estimate the required storage.
 */
 BC_SB_SIZED,
 /**
 * Bytecode generator, calling into {@link java.lang.StringBuilder};
 * but computing the required storage exactly.
 */
 BC_SB_SIZED_EXACT,
 /**
 * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
 * This strategy also tries to estimate the required storage.
 */
 MH_SB_SIZED,
 /**
 * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
 * This strategy also estimate the required storage exactly.
 */
 MH_SB_SIZED_EXACT,
 /**
 * MethodHandle-based generator, that constructs its own byte[] array from
 * the arguments. It computes the required storage exactly.
 */
 MH_INLINE_SIZED_EXACT
 }
 static {
 // In case we need to double-back onto the StringConcatFactory during this
 // static initialization, make sure we have the reasonable defaults to complete
 // the static initialization properly. After that, actual users would use
 // the proper values we have read from the properties.
 STRATEGY = DEFAULT_STRATEGY;
 // CACHE_ENABLE = false; // implied
 // CACHE = null; // implied
 // DEBUG = false; // implied
 // DUMPER = null; // implied
 Properties props = GetPropertyAction.privilegedGetProperties();
 final String strategy =
 props.getProperty("java.lang.invoke.stringConcat");
 CACHE_ENABLE = Boolean.parseBoolean(
 props.getProperty("java.lang.invoke.stringConcat.cache"));
 DEBUG = Boolean.parseBoolean(
 props.getProperty("java.lang.invoke.stringConcat.debug"));
 final String dumpPath =
 props.getProperty("java.lang.invoke.stringConcat.dumpClasses");
 STRATEGY = (strategy == null) ? DEFAULT_STRATEGY : Strategy.valueOf(strategy);
 CACHE = CACHE_ENABLE ? new ConcurrentHashMap<>() : null;
 DUMPER = (dumpPath == null) ? null : ProxyClassesDumper.getInstance(dumpPath);
 }
 public static CallSite makeConcatWithConstants(MethodHandles.Lookup lookup,
 String name,
 MethodType concatType,
 String recipe,
 Object... constants) throws StringConcatException {
 if (DEBUG) {
 System.out.println("StringConcatFactory " + STRATEGY + " is here for " + concatType + ", {" + recipe + "}, " + Arrays.toString(constants));
 }
 return doStringConcat(lookup, name, concatType, false, recipe, constants);
 }
 private static CallSite doStringConcat(MethodHandles.Lookup lookup,
 String name,
 MethodType concatType,
 boolean generateRecipe,
 String recipe,
 Object... constants) throws StringConcatException {
 Objects.requireNonNull(lookup, "Lookup is null");
 Objects.requireNonNull(name, "Name is null");
 Objects.requireNonNull(concatType, "Concat type is null");
 Objects.requireNonNull(constants, "Constants are null");
 for (Object o : constants) {
 Objects.requireNonNull(o, "Cannot accept null constants");
 }
 if ((lookup.lookupModes() & MethodHandles.Lookup.PRIVATE) == 0) {
 throw new StringConcatException("Invalid caller: " +
 lookup.lookupClass().getName());
 }
 int cCount = 0;
 int oCount = 0;
 if (generateRecipe) {
 // Mock the recipe to reuse the concat generator code
 char[] value = new char[concatType.parameterCount()];
 Arrays.fill(value, TAG_ARG);
 recipe = new String(value);
 oCount = concatType.parameterCount();
 } else {
 Objects.requireNonNull(recipe, "Recipe is null");
 for (int i = 0; i < recipe.length(); i++) {
 char c = recipe.charAt(i);
 if (c == TAG_CONST) cCount++;
 if (c == TAG_ARG) oCount++;
 }
 }
 if (oCount != concatType.parameterCount()) {
 throw new StringConcatException(
 "Mismatched number of concat arguments: recipe wants " +
 oCount +
 " arguments, but signature provides " +
 concatType.parameterCount());
 }
 if (cCount != constants.length) {
 throw new StringConcatException(
 "Mismatched number of concat constants: recipe wants " +
 cCount +
 " constants, but only " +
 constants.length +
 " are passed");
 }
 if (!concatType.returnType().isAssignableFrom(String.class)) {
 throw new StringConcatException(
 "The return type should be compatible with String, but it is " +
 concatType.returnType());
 }
 if (concatType.parameterSlotCount() > MAX_INDY_CONCAT_ARG_SLOTS) {
 throw new StringConcatException("Too many concat argument slots: " +
 concatType.parameterSlotCount() +
 ", can only accept " +
 MAX_INDY_CONCAT_ARG_SLOTS);
 }
 String className = getClassName(lookup.lookupClass());
 MethodType mt = adaptType(concatType);
 Recipe rec = new Recipe(recipe, constants);
 MethodHandle mh;
 if (CACHE_ENABLE) {
 Key key = new Key(className, mt, rec);
 mh = CACHE.get(key);
 if (mh == null) {
 mh = generate(lookup, className, mt, rec);
 CACHE.put(key, mh);
 }
 } else {
 mh = generate(lookup, className, mt, rec);
 }
 return new ConstantCallSite(mh.asType(concatType));
 }
 private static MethodHandle generate(Lookup lookup, String className, MethodType mt, Recipe recipe) throws StringConcatException {
 try {
 switch (STRATEGY) {
 case BC_SB:
 return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.DEFAULT);
 case BC_SB_SIZED:
 return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.SIZED);
 case BC_SB_SIZED_EXACT:
 return BytecodeStringBuilderStrategy.generate(lookup, className, mt, recipe, Mode.SIZED_EXACT);
 case MH_SB_SIZED:
 return MethodHandleStringBuilderStrategy.generate(mt, recipe, Mode.SIZED);
 case MH_SB_SIZED_EXACT:
 return MethodHandleStringBuilderStrategy.generate(mt, recipe, Mode.SIZED_EXACT);
 case MH_INLINE_SIZED_EXACT:
 return MethodHandleInlineCopyStrategy.generate(mt, recipe);
 default:
 throw new StringConcatException("Concatenation strategy " + STRATEGY + " is not implemented");
 }
 } catch (Error | StringConcatException e) {
 // Pass through any error or existing StringConcatException
 throw e;
 } catch (Throwable t) {
 throw new StringConcatException("Generator failed", t);
 }
 }
 //......
}
  • makeConcatWithConstants delegates to doStringConcat, which validates its arguments and then calls generate to produce a MethodHandle. generate picks the generator according to STRATEGY, which is one of BC_SB, BC_SB_SIZED, BC_SB_SIZED_EXACT, MH_SB_SIZED, MH_SB_SIZED_EXACT or MH_INLINE_SIZED_EXACT. The default is MH_INLINE_SIZED_EXACT, and it can be overridden with -Djava.lang.invoke.stringConcat
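The way the strategy is resolved from the system property can be sketched in isolation. This is a simplified model of the static initializer quoted above, not the JDK code itself: StrategySelectionSketch and resolve are hypothetical names, and the enum merely mirrors the constant names from the listing.

```java
public class StrategySelectionSketch {

    // Mirrors the strategy names in the StringConcatFactory version quoted above.
    enum Strategy {
        BC_SB,
        BC_SB_SIZED,
        BC_SB_SIZED_EXACT,
        MH_SB_SIZED,
        MH_SB_SIZED_EXACT,
        MH_INLINE_SIZED_EXACT
    }

    static final Strategy DEFAULT_STRATEGY = Strategy.MH_INLINE_SIZED_EXACT;

    // No property set: use the default. Otherwise the value must match a constant name exactly.
    static Strategy resolve(String propertyValue) {
        return (propertyValue == null) ? DEFAULT_STRATEGY : Strategy.valueOf(propertyValue);
    }

    public static void main(String[] args) {
        // Run with e.g. -Djava.lang.invoke.stringConcat=BC_SB to see a non-default value.
        String configured = System.getProperty("java.lang.invoke.stringConcat");
        System.out.println("strategy = " + resolve(configured));
    }
}
```

Because the lookup goes through Enum.valueOf, a misspelled strategy name results in an IllegalArgumentException rather than a silent fallback to the default.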

Summary

  • Java 9 introduced Compact Strings to replace the Compressed Strings of Java 6. The implementation is more thorough: byte[] replaces char[] entirely, and a new field, coder, records whether the bytes are LATIN1 or UTF16
  • isLatin1() returns true only when COMPACT_STRINGS is true and coder is LATIN1; methods such as charAt, equals, hashCode, indexOf and substring rely on it to dispatch to either StringLatin1 or StringUTF16
  • Java 9 uses invokedynamic to call StringConcatFactory.makeConcatWithConstants for string concatenation. Where Java 8 compiled concatenation into StringBuilder calls, Java 9 offers several strategies: BC_SB (equivalent to the Java 8 approach), BC_SB_SIZED, BC_SB_SIZED_EXACT, MH_SB_SIZED, MH_SB_SIZED_EXACT and MH_INLINE_SIZED_EXACT. The default is MH_INLINE_SIZED_EXACT, and it can be overridden with -Djava.lang.invoke.stringConcat
