diff --git a/okhttp/src/main/java/com/squareup/okhttp/HttpUrl.java b/okhttp/src/main/java/com/squareup/okhttp/HttpUrl.java
index dd446009a..f0e34e535 100644
--- a/okhttp/src/main/java/com/squareup/okhttp/HttpUrl.java
+++ b/okhttp/src/main/java/com/squareup/okhttp/HttpUrl.java
@@ -129,8 +129,55 @@ import okio.Buffer;
  * The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query
  * the fragment is not sent to the webserver: it's private to the client.
  *
- *
Percent encoding is used in every URL component except for the hostname. But the set of
+ * characters that need to be encoded is different for each component. For example, the path
+ * component must escape all of its {@code ?} characters, otherwise it could be interpreted as the
+ * start of the URL's query. But within the query and fragment components, the {@code ?} character
+ * doesn't delimit anything and doesn't need to be escaped.
{@code
+ *
+ * HttpUrl url = HttpUrl.parse("http://who-let-the-dogs.out").newBuilder()
+ * .addPathSegment("_Who?_")
+ * .query("_Who?_")
+ * .fragment("_Who?_")
+ * .build();
+ * System.out.println(url);
+ * }
+ *
+ * This prints: {@code
+ *
+ * http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_
+ * }
+ *
+ * When parsing URLs that lack percent encoding where it is required, this class will percent encode
+ * the offending characters.
+ *
In order to avoid confusion and discourage phishing attacks,
+ * IDNA Mapping transforms names to avoid
+ * confusing characters. This includes basic case folding: transforming shouting {@code SQUARE.COM}
+ * into cool and casual {@code square.com}. It also handles more exotic characters. For example, the
+ * Unicode trademark sign (™) could be confused for the letters "TM" in {@code http://ho™mail.com}.
+ * To mitigate this, the single character (™) maps to the string (tm). There is a similar policy for
+ * all of the 1.1 million Unicode code points. Note that some code points such as "\ud83c\udf69" are
+ * not mapped and cannot be used in a hostname.
+ *
+ *
Punycode converts a Unicode string to an ASCII
+ * string to make international domain names work everywhere. For example, "σ" encodes as
+ * "xn--4xa". The encoded string is not human readable, but can be used with classes like {@link
+ * InetAddress} to establish connections.
  *
  *
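
Review note: the behavior this Javadoc describes can be roughly reproduced with JDK classes alone, which may help when checking the documented examples. The sketch below is not OkHttp's implementation. It assumes `java.net.URI` as a stand-in for per-component percent encoding and `java.net.IDN` (which implements IDNA 2003) as a stand-in for hostname mapping and Punycode, so results for unusual code points may differ from what `HttpUrl` produces. The class and method names (`UrlEncodingSketch`, `assemble`, `toAsciiHost`) are hypothetical.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper; illustrates the documented encoding rules using only the JDK. */
final class UrlEncodingSketch {
  private UrlEncodingSketch() {}

  /**
   * Assembles an http URL from raw components. The multi-argument URI constructor
   * quotes illegal characters separately for the path, the query and the fragment.
   */
  static String assemble(String host, String path, String query, String fragment) {
    try {
      return new URI("http", host, path, query, fragment).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /**
   * Approximates hostname canonicalization: lowercase first (IDN.toASCII leaves
   * all-ASCII labels untouched), then apply IDNA 2003 mapping and Punycode.
   */
  static String toAsciiHost(String host) {
    return IDN.toASCII(host.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // '?' is escaped in the path but left alone in the query and fragment.
    // prints http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_
    System.out.println(assemble("who-let-the-dogs.out", "/_Who?_", "_Who?_", "_Who?_"));

    // Basic case folding. prints square.com
    System.out.println(toAsciiHost("SQUARE.COM"));

    // \u03c3 is the Greek small letter sigma. prints xn--4xa
    System.out.println(toAsciiHost("\u03c3"));

    // \u2122 is the trademark sign; the mapped result contains only ASCII characters.
    System.out.println(toAsciiHost("ho\u2122mail.com"));
  }
}
```

The first three printed values match the examples given in the Javadoc above, which suggests the documented output is consistent with general URI and IDNA rules rather than specific to this class.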