Formatting & Tokenizing
In our final lesson of the API Contents section we look at formatting and tokenizing our data. We begin with formatting our output, for which Java offers several options. In this lesson we format data using the java.util.Formatter class as well as the static format() method of the java.lang.String class. We finish off our look at formatting output with the printf() method contained in the java.io.PrintStream and java.io.PrintWriter classes.
We finish off our tour of the Java API by looking at tokenizing our data. For this we first look at the split() method of the String class, which uses a regular expression delimiter to tokenize our data. After this we look at the java.util.Scanner class; objects of this class allow us to break input into tokens using a delimiter pattern, which defaults to whitespace or can be set using a regular expression.
Formatting Overview
All the methods we look at here that produce formatted output require a format string and an argument list. The formatted output is a String object derived from the format string, which may contain fixed text as well as one or more embedded format specifiers. The specifiers are then applied to the argument list, which can be empty (or null) when no specifier needs an argument.
Format specifiers that do not correspond to an argument, such as %n (line separator) and %% (literal percent sign), have the following syntax:
// Format specifier syntax for specifiers that take no argument
%[flags][width]conversion
- The optional flags is a set of characters that modify the output format where the set of valid flags depends on the conversion.
- The optional width is a non-negative decimal integer indicating the minimum number of characters to be written to the output.
- The required conversion is a character indicating content to be inserted in the output.
Format specifiers used to represent date and time types have the following syntax:
// Format specifier syntax with argument list for date and time types
%[argument_index$][flags][width]conversion
- The optional argument_index is a decimal integer indicating the position of the argument in the argument list. The first argument is referenced by "1$", the second by "2$" and so on.
- The optional flags and width are defined as above.
- With dates the required conversion is a two character sequence where the first character is 't' or 'T' and the second character indicates the format to be used.
Format specifiers for general, character, and numeric types have the following syntax:
// Format specifier syntax with argument list for general, character, and numeric types
%[argument_index$][flags][width][.precision]conversion
- The optional argument_index, flags and width are defined as above.
- The optional precision is a non-negative decimal integer generally used to restrict the number of characters but specific behavior depends on the conversion.
- The required conversion is a character indicating how the argument should be formatted, where the set of valid conversions for a given argument depends on the argument's data type.
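As a rough illustration of how the optional parts combine, the sketch below applies a width, a precision, a flag and an argument index in turn. The class name SpecifierSyntaxDemo and its fmt() helper are made up for this example, and Locale.UK is passed explicitly so the results do not depend on the machine's default locale.

```java
import java.util.Locale;

class SpecifierSyntaxDemo {
    // Hypothetical helper: formats with a fixed locale so results are repeatable
    static String fmt(String format, Object... args) {
        return String.format(Locale.UK, format, args);
    }

    public static void main(String[] args) {
        // Width 8, precision 3: rounded to 3 decimals, right-aligned in 8 characters
        System.out.println(fmt("[%8.3f]", 3.14159));     // [   3.142]
        // The '-' flag left-justifies within the width
        System.out.println(fmt("[%-6s]", "ab"));         // [ab    ]
        // For strings, precision limits the number of characters written
        System.out.println(fmt("[%.2s]", "abcdef"));     // [ab]
        // The '0' flag pads with zeros up to the width
        System.out.println(fmt("[%05d]", 42));           // [00042]
        // Argument indices let the format string reorder the argument list
        System.out.println(fmt("%2$s %1$s", "a", "b"));  // b a
        // %% takes no argument and writes a literal percent sign
        System.out.println(fmt("%d%%", 50));             // 50%
    }
}
```

Note that precision means different things per conversion: decimal places for f, but a maximum character count for s.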
The table below lists the conversions used in this lesson with their descriptions. You can find the complete list of flags and conversions in the API documentation for the java.util.Formatter class.
Conversion Symbols | Description |
---|---|
b | Formats as boolean true or false |
c | Formats as a Unicode character |
d | Formats as a decimal integer |
f | Formats as a floating-point decimal |
o | Formats as an octal integer |
s | Formats as a string |
x | Formats as a hexadecimal integer |
A | Date/time conversion (after the t or T prefix): locale-specific full name of the day of the week, "Monday", "Tuesday".... |
B | Date/time conversion (after the t or T prefix): locale-specific full month name, "January", "February".... |
Y | Date/time conversion (after the t or T prefix): year formatted as at least four digits, with leading zeros for years less than 1000 |
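To see each conversion from the table in action, here is a small sketch. The class name ConversionDemo and the sample values are made up for illustration, and a fixed java.time.LocalDate (accepted as a date/time argument from Java 8 onwards) stands in for the current date so that the output is repeatable.

```java
import java.time.LocalDate;
import java.util.Locale;

class ConversionDemo {
    static String basics() {
        // b, c, d, f, o, s and x applied to simple values
        return String.format(Locale.UK, "%b %c %d %f %o %s %x",
                true, 'A', 255, 1.5, 255, "text", 255);
    }

    static String date() {
        // A, B and Y are date/time conversions, so each needs the 't' prefix
        LocalDate day = LocalDate.of(2024, 1, 15);
        return String.format(Locale.UK, "%1$tA %1$tB %1$tY", day);
    }

    public static void main(String[] args) {
        System.out.println(basics()); // true A 255 1.500000 377 text ff
        System.out.println(date());   // Monday January 2024
    }
}
```

The f conversion defaults to six decimal places when no precision is given, which is why 1.5 appears as 1.500000.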
The java.util.Formatter Class
The java.util.Formatter class lets us send formatted output to a variety of destinations, selected through its wide range of constructors. The API documentation is extremely detailed, so we just show one example to give you the idea:
/*
  java.util.Formatter Example
*/
import java.util.Date; // Import the Date class from java.util package
import java.util.Formatter; // Import the Formatter class from java.util package
import java.util.Locale; // Import the Locale class from java.util package
class TestFormatter {
    public static void main(String[] args) {
        // Some types for formatting
        Date a = new Date();
        double b = 123456789.345678;
        // Create appendable StringBuilder object to output to
        StringBuilder sb = new StringBuilder();
        // Send all output to Appendable object sb using UK locale
        Formatter f = new Formatter(sb, Locale.UK);
        // Format into sb
        f.format("Formatted output: %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA | %2$,.3f", a, b);
        // Rearrange output using indices.
        f.format("...Rearranged output: %2$,.3f | %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA", a, b);
        // Display the contents of sb on the console
        System.out.println(sb);
        // Create appendable StringBuilder object to output to
        StringBuilder sb2 = new StringBuilder();
        // Send all output to Appendable object sb2 using GERMANY locale
        Formatter f2 = new Formatter(sb2, Locale.GERMANY);
        // Format into sb2
        f2.format("Formatted output: %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA | %2$,.3f", a, b);
        // Rearrange output using indices.
        f2.format("...Rearranged output: %2$,.3f | %1$tA-%1$tB-%1$tY | %1$tY-%1$tB-%1$tA", a, b);
        // Display the contents of sb2 on the console
        System.out.println(sb2);
    }
}
Save, compile and run the TestFormatter test class in directory c:\_APIContents2 in the usual way.
The above screenshot shows the output of compiling and running the TestFormatter class. First off we create a Date object and a double to be formatted for output, and a StringBuilder object to output our formatted data to. We then pass the StringBuilder object and the UK locale as arguments to our Formatter constructor. We then format some output using the format() method. Let's go through the format specifiers used:
Format Specifier | Description |
---|---|
%1$tA | 1$ selects the first argument; t is the prefix marking a date/time conversion; A gives the locale-specific full name of the day of the week. |
%1$tB | 1$ selects the first argument; t is the prefix marking a date/time conversion; B gives the locale-specific full month name. |
%1$tY | 1$ selects the first argument; t is the prefix marking a date/time conversion; Y gives the year as at least four digits. |
%2$,.3f | 2$ selects the second argument; the , flag adds locale-specific grouping separators; .3 sets a precision of 3 decimal places; f gives a floating-point conversion. |
Using the format specifiers described above we display the dates in different orders, rearrange the output using the argument indices and also produce the same output for the GERMANY locale.
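Two behaviours of the example are easy to miss: each call to format() appends to the same destination, and the locale handed to the constructor decides the separators. The sketch below isolates both. The class name FormatterLocaleDemo and its render() helper are made up for illustration; the use of try-with-resources relies on Formatter implementing Closeable.

```java
import java.util.Formatter;
import java.util.Locale;

class FormatterLocaleDemo {
    // Hypothetical helper: formats one amount twice into the same StringBuilder
    static String render(Locale locale, double amount) {
        StringBuilder sb = new StringBuilder();
        // Formatter implements Closeable, so try-with-resources closes it for us
        try (Formatter f = new Formatter(sb, locale)) {
            f.format("%,.2f", amount);    // first call writes to sb
            f.format(" | %.1f", amount);  // second call appends to the same sb
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(Locale.UK, 1234.5));      // 1,234.50 | 1234.5
        System.out.println(render(Locale.GERMANY, 1234.5)); // 1.234,50 | 1234,5
    }
}
```

Closing the Formatter does not discard what was written; the StringBuilder keeps its contents and can still be read afterwards.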
The String.format() Method
The String.format() static method allows us to format an output string and is overloaded to accept either a format string and argument list, or a locale, format string and argument list. In our example we use the second overload, which accepts a locale, format string and argument list:
/*
  String.format() Example
*/
import java.util.Locale; // Import the Locale class from java.util package
class TestStringFormat {
    public static void main(String[] args) {
        // Some types for formatting
        int a = 123456789;
        boolean b = true;
        char c = 65; // Unicode code point 65 is 'A'
        // Create a formatted String object using UK locale
        String s = String.format(Locale.UK, "UK Dec: %1$,d %1$s Bool: %2$b Char: %3$c", a, b, c);
        System.out.println(s);
        // Create a formatted String object using GERMANY locale
        String s2 = String.format(Locale.GERMANY, "GER Dec: %1$,d %1$s Bool: %2$b Char: %3$c", a, b, c);
        System.out.println(s2);
    }
}
Save, compile and run the TestStringFormat test class in directory c:\_APIContents2 in the usual way.
The above screenshot shows the output of compiling and running the TestStringFormat class. First off we create some primitives with values, then format these into a String object using the format() method with a UK locale, before outputting the results. Let's go through the format specifiers used:
Format Specifier | Description |
---|---|
%1$,d | 1$ selects the first argument; the , flag adds locale-specific grouping separators; d gives a decimal integer conversion. |
%1$s | 1$ selects the first argument; s gives a string conversion. |
%2$b | 2$ selects the second argument; b gives a boolean conversion. |
%3$c | 3$ selects the third argument; c gives a character conversion. |
Using the format specifiers described above we display the formatted primitives and also produce the same output for the GERMANY locale.
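One detail is worth seeing in code: a specifier such as %2b has no $, so the 2 is read as a width and not as an argument index. The sketch below, using a made-up class name IndexingDemo and a hypothetical fmt() helper, contrasts explicit indices such as %2$s with ordinary ones such as %s, which step through the argument list on their own regardless of any explicit indices.

```java
import java.util.Locale;

class IndexingDemo {
    // Hypothetical helper: formats with a fixed locale
    static String fmt(String format, Object... args) {
        return String.format(Locale.UK, format, args);
    }

    public static void main(String[] args) {
        // Explicit index: 2$ selects the second argument
        System.out.println(fmt("%2$b", 7, false));        // false
        // No '$': 2 is a width, and the specifier takes the next ordinary argument (7).
        // The b conversion gives "true" for any non-null argument that is not a Boolean.
        System.out.println(fmt("%2b", 7, false));         // true
        // Ordinary indices ignore explicit ones: the first %s starts at argument one
        System.out.println(fmt("%2$s %s %s", "a", "b"));  // b a b
    }
}
```

Because a missing $ still produces plausible output, this kind of slip can go unnoticed until the arguments change.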
The printf() Method
The printf() method allows us to format output to a java.io.PrintStream or java.io.PrintWriter stream. These classes also contain a method called format() which produces the same results, so whatever you read here for the printf() method can also be applied to the format() method. For our example we use the printf() method from the PrintStream class. System.out is of type PrintStream and so will be used for convenience:
/*
  printf() Example
*/
import java.io.PrintStream; // Import the PrintStream class from java.io package
class TestStringf {
    public static void main(String[] args) {
        // Some types for formatting
        int a = 1234;
        // Send formatted output to the PrintStream System.out
        System.out.printf("Dec: %1$,d Octal: %1$o Hex: %1$x", a);
    }
}
Save, compile and run the TestStringf test class in directory c:\_APIContents2 in the usual way.
The above screenshot shows the output of compiling and running the TestStringf class. First off we create an integer primitive with a value, then output this to the console using the printf() method. There is also an overload that accepts a locale as its first argument. Let's go through the format specifiers used:
Format Specifier | Description |
---|---|
%1$,d | 1$ selects the first argument; the , flag adds locale-specific grouping separators; d gives a decimal integer conversion. |
%1$o | 1$ selects the first argument; o gives an octal conversion. |
%1$x | 1$ selects the first argument; x gives a hexadecimal conversion. |
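To check that printf() and format() really behave alike, and to try the locale-taking overload, the following sketch writes to an in-memory PrintStream instead of the console. The class name PrintfCaptureDemo and the capture() helper are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Locale;

class PrintfCaptureDemo {
    // Hypothetical helper: runs printf() or format() against an in-memory PrintStream
    static String capture(boolean usePrintf, int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(bytes, true);
        String pattern = "Dec: %1$,d Octal: %1$o Hex: %1$x";
        if (usePrintf) {
            out.printf(Locale.UK, pattern, value);  // locale-taking overload of printf()
        } else {
            out.format(Locale.UK, pattern, value);  // same arguments, same result
        }
        out.flush();
        return bytes.toString();
    }

    public static void main(String[] args) {
        System.out.println(capture(true, 1234));   // Dec: 1,234 Octal: 2322 Hex: 4d2
        System.out.println(capture(false, 1234));  // identical text via format()
    }
}
```

Passing the locale explicitly keeps the grouping separator fixed; without it, the default locale of the machine decides.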
Tokenizing Our Data
In this part of the lesson we look at splitting our data into separate tokens. For this we first look at the split() method of the String class, which uses a regular expression delimiter to tokenize our data. After this we look at the java.util.Scanner class; objects of this class allow us to break input into tokens using a delimiter pattern, which defaults to whitespace or can be set using a regular expression.
The split() Method
The split() method will split a string around matches of the given regular expression, returning the results in a String array. The split() method is overloaded: the first form accepts a regex string and a limit argument of type int that controls the number of times the pattern is applied. The second form just requires a regex string and is the same as invoking the first form with the limit set to zero. An explanation of how values passed to the limit parameter affect the number of times the pattern is applied follows:
- limit < 0: The pattern is applied as many times as possible and the output array can have any length.
- limit = 0: The pattern is applied as many times as possible, the output array can have any length and trailing empty strings are discarded.
- limit > 0: The pattern is applied at most limit - 1 times, the output array has a length of at most limit, and the last entry contains all input beyond the last matched delimiter.
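The three cases are easiest to tell apart on input that ends with delimiters. The sketch below is a minimal illustration; the class name SplitLimitDemo, the show() helper and the sample string are made up for this example.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: renders the resulting array, including empty strings
    static String show(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String csv = "a,b,c,,";
        // limit = 0: trailing empty strings are discarded
        System.out.println(show(csv, ",", 0));   // [a, b, c]
        // limit < 0: trailing empty strings are kept
        System.out.println(show(csv, ",", -1));  // [a, b, c, , ]
        // limit = 2: pattern applied once, the rest stays in the last entry
        System.out.println(show(csv, ",", 2));   // [a, b,c,,]
    }
}
```

Any negative limit behaves the same way, so -1 and -3 give identical results.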
For our example we use the split() method to split our str1 String object using regular expressions for whitespace and the word "and".
/*
  The split() method
*/
public class TestSplit {
    public static void main(String[] args) {
        String str1 = "1 and 2 and 3 and 4";
        // Whitespace delimiter
        String[] sOut = str1.split("\\s", 0);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("\\s", -1);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("\\s", 3);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        // "and" delimiter
        sOut = str1.split("and", 0);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("and", -3);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
        System.out.println(" ");
        sOut = str1.split("and", 3);
        for (int i=0; i<sOut.length; i++) { System.out.print("idx" + i + ": " + sOut[i] + " | "); }
    }
}
The screenshot above shows the results of running the TestSplit class. We output using different limits to show how this parameter affects the output String array.
The java.util.Scanner Class
The java.util.Scanner class is a simple text scanner which allows us to parse primitive data types and strings using regular expressions. Objects of this class allow us to break input into tokens using a delimiter pattern. The resulting tokens can then be converted into values of different types using the various next methods, such as nextInt() and nextFloat(), available in the java.util.Scanner class. In our example we show how to use the Scanner class with the default delimiter of whitespace and also with a delimiter created using a regular expression.
/*
  Scanner Examples
*/
import java.util.Scanner; // Import the Scanner class from java.util package
class TestScanner {
    public static void main(String[] args) {
        String input1 = "1 2.0 3.1 4";
        String input2 = "1 and 2.0 and 3.1 and 4";
        // Using default delimiter (whitespace)
        Scanner s1 = new Scanner(input1);
        System.out.println(s1.nextInt());
        System.out.println(s1.nextFloat());
        System.out.println(s1.nextFloat());
        System.out.println(s1.nextInt());
        s1.close();
        // Using ' and ' as delimiter
        Scanner s2 = new Scanner(input2).useDelimiter("\\s*and\\s*");
        System.out.println(s2.nextInt());
        System.out.println(s2.nextFloat());
        System.out.println(s2.nextFloat());
        System.out.println(s2.nextInt());
        s2.close();
    }
}
Save, compile and run the TestScanner test class in directory c:\_APIContents2 in the usual way.
The above screenshot shows the output of compiling and running the TestScanner class. The examples use a default and a custom delimiter to break the input held by our Scanner objects into tokens. We then use the nextInt() and nextFloat() methods to extract the required tokens and print these to the console.
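When the number of tokens is not known in advance, the hasNext family of methods can drive a loop instead of a fixed sequence of calls. The sketch below is one way to do this; the class name ScannerLoopDemo and the sum() helper are made up for illustration. It also sets the locale explicitly, on the assumption that the input always uses '.' as the decimal separator, because a Scanner otherwise parses numbers according to the default locale.

```java
import java.util.Locale;
import java.util.Scanner;

class ScannerLoopDemo {
    // Hypothetical helper: sums every numeric token, whatever the token count
    static double sum(String input, String delimiterRegex) {
        double total = 0;
        // try-with-resources closes the Scanner when we are done
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter(delimiterRegex);
            // Fix the locale so "2.5" is read with '.' as the decimal separator
            s.useLocale(Locale.UK);
            while (s.hasNextDouble()) {
                total += s.nextDouble();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum("1 and 2.5 and 3", "\\s*and\\s*")); // 6.5
    }
}
```

The loop stops at the first token that is not a number, so malformed input ends the sum early without throwing an exception.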
There are other ways to use the java.util.Scanner class, such as using the match() method, which returns the match result of the last scanning operation performed by this Scanner object. I will leave it as an exercise for you to investigate this method of the Scanner class.
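As one possible starting point for that exercise, the sketch below pairs match() with findInLine(), which searches for a pattern while ignoring delimiters. The class name ScannerMatchDemo, the firstPair() helper and the pattern are made up for illustration.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

class ScannerMatchDemo {
    // Hypothetical helper: pulls the two numbers out of the first "<n> and <m>"
    static String[] firstPair(String input) {
        try (Scanner s = new Scanner(input)) {
            // findInLine() ignores delimiters and searches for the pattern itself
            String found = s.findInLine("(\\d+) and (\\d+)");
            if (found == null) {
                return new String[0]; // nothing matched, so match() must not be called
            }
            // match() exposes the capture groups of that last scanning operation
            MatchResult result = s.match();
            return new String[] { result.group(1), result.group(2) };
        }
    }

    public static void main(String[] args) {
        String[] pair = firstPair("1 and 2 and 3 and 4");
        System.out.println(pair[0] + " / " + pair[1]); // 1 / 2
    }
}
```

The null check matters because match() throws an IllegalStateException when the last scanning operation did not succeed.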
Lesson 9 Complete
In our final look at the Java API we examined formatting and tokenizing our data.
What's Next?
We start a new section on Concurrency with an introduction to threads.