Processing data stored in text files is a common task. Here we’ll look at how to perform conditional filtering on a text file with Java through an example: read employee information from the text file employee.txt and select the female employees who were born on or after January 1, 1981.

The text file employee.txt has the following format:

EID  NAME      SURNAME  GENDER  STATE         BIRTHDAY    HIREDATE    DEPT       SALARY
1    Rebecca   Moore    F       California    1974-11-20  2005-03-11  R&D        7000
2    Ashley    Wilson   F       New York      1980-07-19  2008-03-16  Finance    11000
3    Rachel    Johnson  F       New Mexico    1970-12-17  2010-12-01  Sales      9000
4    Emily     Smith    F       Texas         1985-03-07  2006-08-15  HR         7000
5    Ashley    Smith    F       Texas         1975-05-13  2004-07-30  R&D        16000
6    Matthew   Johnson  M       California    1984-07-07  2005-07-07  Sales      11000
7    Alexis    Smith    F       Illinois      1972-08-16  2002-08-16  Sales      9000
8    Megan     Wilson   F       California    1979-04-19  1984-04-19  Marketing  11000
9    Victoria  Davis    F       Texas         1983-12-07  2009-12-07  HR         3000
10   Ryan      Johnson  M       Pennsylvania  1976-03-12  2006-03-12  R&D        13000
11   Jacob     Moore    M       Texas         1974-12-16  2004-12-16  Sales      12000
12   Jessica   Davis    F       New York      1980-09-11  2008-09-11  Sales      7000
13   Daniel    Davis    M       Florida       1982-05-14  2010-05-14  Finance    10000


The Java approach is to read the file row by row, save the rows in a List object, traverse the list, and save the eligible records in a result List object. Finally, it prints the number of eligible employees. The detailed code is as follows:

       import java.io.*;
       import java.text.SimpleDateFormat;
       import java.util.*;

       public static void myFilter() throws Exception {
              List<Map<String,String>> sourceList = new ArrayList<Map<String,String>>();
              List<Map<String,String>> resultList = new ArrayList<Map<String,String>>();
              //try-with-resources closes the reader even if an exception is thrown
              try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("D:\\employee.txt")))) {
                     String line = null;
                     if ((line = br.readLine()) == null) return; //skip the header line; exit if the file is empty
                     while ((line = br.readLine()) != null) { //load the file into memory
                            String[] info = line.split("\t");
                            Map<String,String> emp = new HashMap<String,String>();
                            emp.put("EID", info[0]);
                            emp.put("NAME", info[1]);
                            emp.put("SURNAME", info[2]);
                            emp.put("GENDER", info[3]);
                            emp.put("STATE", info[4]);
                            emp.put("BIRTHDAY", info[5]);
                            sourceList.add(emp);
                     }
              }
              for (int i = 0, len = sourceList.size(); i < len; i++) { //process data row by row
                     Map<String,String> emp = sourceList.get(i);
                     SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
                     //keep female employees born on or after 1981-01-01
                     if (emp.get("GENDER").equals("F") && !sdf.parse(emp.get("BIRTHDAY")).before(sdf.parse("1981-01-01"))) {
                            resultList.add(emp);
                     }
              }
              System.out.println("count=" + resultList.size()); //print the number of eligible employees
       }
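To try the same logic without a file at D:\employee.txt, the following is a minimal, self-contained sketch that reads the tab-separated rows from any Reader. It is only an illustration under stated assumptions: the class name EmployeeFilterSketch, the helper methods load and femalesBornSince, and the in-memory sample are invented here, and java.time.LocalDate is used in place of SimpleDateFormat for the date comparison.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EmployeeFilterSketch {
    // Column names in file order; only the first six are kept, as in the article's code.
    static final String[] FIELDS = {"EID", "NAME", "SURNAME", "GENDER", "STATE", "BIRTHDAY"};

    /** Reads tab-separated rows (header line first) into a list of field maps. */
    static List<Map<String, String>> load(Reader source) {
        List<Map<String, String>> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            if (br.readLine() == null) return rows; // skip the header; empty input gives an empty list
            String line;
            while ((line = br.readLine()) != null) {
                String[] info = line.split("\t");
                if (info.length < FIELDS.length) continue; // ignore blank or short lines
                Map<String, String> emp = new LinkedHashMap<>();
                for (int i = 0; i < FIELDS.length; i++) emp.put(FIELDS[i], info[i].trim());
                rows.add(emp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    /** Keeps female employees born on or after the cutoff date. */
    static List<Map<String, String>> femalesBornSince(List<Map<String, String>> rows, LocalDate cutoff) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> emp : rows) {
            LocalDate birthday = LocalDate.parse(emp.get("BIRTHDAY")); // ISO format yyyy-MM-dd
            if ("F".equals(emp.get("GENDER")) && !birthday.isBefore(cutoff)) result.add(emp);
        }
        return result;
    }

    public static void main(String[] args) {
        // A four-row excerpt of employee.txt, held in memory for the example.
        String sample = String.join("\n",
            "EID\tNAME\tSURNAME\tGENDER\tSTATE\tBIRTHDAY\tHIREDATE\tDEPT\tSALARY",
            "1\tRebecca\tMoore\tF\tCalifornia\t1974-11-20\t2005-03-11\tR&D\t7000",
            "4\tEmily\tSmith\tF\tTexas\t1985-03-07\t2006-08-15\tHR\t7000",
            "6\tMatthew\tJohnson\tM\tCalifornia\t1984-07-07\t2005-07-07\tSales\t11000",
            "9\tVictoria\tDavis\tF\tTexas\t1983-12-07\t2009-12-07\tHR\t3000");
        List<Map<String, String>> result =
            femalesBornSince(load(new StringReader(sample)), LocalDate.of(1981, 1, 1));
        System.out.println("count=" + result.size()); // prints count=2 (Emily and Victoria)
    }
}
```

Passing a FileReader instead of the StringReader would apply the same filter to the real file.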


The filtering condition of this function is fixed. If the condition changes, the conditional statement in the program must be modified accordingly. Multiple conditions require multiple pieces of code, and the program cannot handle ad hoc, dynamic conditions. Now we’ll rewrite the code to make it somewhat more general by slightly changing the loop that traverses sourceList:

              for (int i = 0, len = sourceList.size(); i < len; i++) {
                     Map<String,String> emp = (Map<String,String>) sourceList.get(i);
                     SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
                     Date birthday = sdf.parse(emp.get("BIRTHDAY")); //parse BIRTHDAY once per row
                     boolean isRight = true;
                     if (gender != null && !emp.get("GENDER").equals(gender)) { //process the condition on GENDER
                            isRight = false;
                     }
                     if (start != null && birthday.before(start)) { //process the start condition on BIRTHDAY
                            isRight = false;
                     }
                     if (end != null && birthday.after(end)) { //process the end condition on BIRTHDAY
                            isRight = false;
                     }
                     if (isRight) resultList.add(emp); //save the eligible records in the result list
              }

In the rewritten code, gender, start and end are input parameters of the function myFilter. The program can handle the cases where the GENDER field equals the input value gender, and the BIRTHDAY field is greater than or equal to the input value start and less than or equal to the input value end. If any of the input values is null, the corresponding condition is ignored. The conditions are joined by AND.
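The effect of the null parameters can be seen in a small runnable sketch. It is not the article’s full function: the class name ParamFilterSketch and the helper emp are invented for illustration, the source list is passed in as an argument instead of being read from the file, and LocalDate is assumed in place of Date so that no checked exception needs handling.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParamFilterSketch {
    /** Builds one employee record; a helper invented for this example. */
    static Map<String, String> emp(String name, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    /** Applies the non-null conditions, joined by AND; a null argument means "no constraint". */
    static List<Map<String, String>> myFilter(List<Map<String, String>> sourceList,
                                              String gender, LocalDate start, LocalDate end) {
        List<Map<String, String>> resultList = new ArrayList<>();
        for (Map<String, String> e : sourceList) {
            LocalDate birthday = LocalDate.parse(e.get("BIRTHDAY"));
            boolean isRight = true;
            if (gender != null && !gender.equals(e.get("GENDER"))) isRight = false;
            if (start != null && birthday.isBefore(start)) isRight = false;
            if (end != null && birthday.isAfter(end)) isRight = false;
            if (isRight) resultList.add(e);
        }
        return resultList;
    }

    public static void main(String[] args) {
        List<Map<String, String>> source = new ArrayList<>();
        source.add(emp("Rebecca", "F", "1974-11-20"));
        source.add(emp("Emily", "F", "1985-03-07"));
        source.add(emp("Matthew", "M", "1984-07-07"));
        source.add(emp("Victoria", "F", "1983-12-07"));

        LocalDate cutoff = LocalDate.of(1981, 1, 1);
        System.out.println(myFilter(source, "F", cutoff, null).size());  // 2: Emily, Victoria
        System.out.println(myFilter(source, null, cutoff, null).size()); // 3: Matthew is included too
        System.out.println(myFilter(source, "F", null, cutoff).size());  // 1: Rebecca only
    }
}
```

Each call changes the query without touching the loop, but only within the three conditions the function anticipates.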


If we want to make myFilter a more general function, for example by joining conditions with OR or allowing computation between fields, the code becomes much more complicated: it requires a program that parses and evaluates dynamic expressions. Such a program can be as flexible and general as SQL in a database, but it is difficult to develop.
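To illustrate where the difficulty lies, here is a hedged sketch that combines conditions with AND and OR through java.util.function.Predicate. The class name PredicateFilterSketch and the helpers emp and select are invented for this example. Note what it does not do: the conditions are still written in Java and fixed at compile time, so it does not parse a condition supplied as text at run time, which is the hard part described above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class PredicateFilterSketch {
    /** Builds one employee record; a helper invented for this example. */
    static Map<String, String> emp(String name, String surname, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("SURNAME", surname);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    /** Returns the rows for which the given condition holds. */
    static List<Map<String, String>> select(List<Map<String, String>> rows,
                                            Predicate<Map<String, String>> where) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> e : rows) {
            if (where.test(e)) result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(emp("Rebecca", "Moore", "F", "1974-11-20"));
        rows.add(emp("Emily", "Smith", "F", "1985-03-07"));
        rows.add(emp("Matthew", "Johnson", "M", "1984-07-07"));
        rows.add(emp("Victoria", "Davis", "F", "1983-12-07"));

        Predicate<Map<String, String>> female = e -> "F".equals(e.get("GENDER"));
        Predicate<Map<String, String>> bornSince1981 =
            e -> !LocalDate.parse(e.get("BIRTHDAY")).isBefore(LocalDate.of(1981, 1, 1));
        // A condition that computes across two fields.
        Predicate<Map<String, String>> isRebeccaMoore =
            e -> "RebeccaMoore".equals(e.get("NAME") + e.get("SURNAME"));

        // (bornSince1981 AND female) OR isRebeccaMoore
        Predicate<Map<String, String>> where = bornSince1981.and(female).or(isRebeccaMoore);
        System.out.println("count=" + select(rows, where).size()); // prints count=3
    }
}
```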


In view of this, we can turn to esProc to assist with this task. esProc is a programming language designed for processing structured and semi-structured data. It handles the general query task above easily and integrates with Java, so that Java can query text file data as flexibly as it would with SQL. For example, to query female employees who were born on or after January 1, 1981, esProc can take an external input parameter “where” as the dynamic condition, as shown in the following chart:

   java filter1.jpg

The value of “where” is: BIRTHDAY>=date(1981,1,1) && GENDER=="F". esProc needs only three lines of code, as follows:

java filter2.jpg

 

A1: Define a file object and import the data from it. The first row is the header row, and tab is the default field separator. esProc’s IDE can display the imported data visually, as shown on the right of the above chart.

A2: Filter according to the condition. Here a macro is used to parse the expression dynamically. “where” is the input parameter. esProc first computes the expression enclosed in ${…}, then replaces ${…} with the computed result as the macro string value, and finally interprets and executes the resulting code. In this example, the code finally executed is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").

A3: Return the eligible result set to the external program.


When the filtering condition changes, we only need to change the parameter “where”, without rewriting the code. For example, suppose the condition becomes: female employees who were born on or after January 1, 1981, or employees whose NAME+SURNAME equals “RebeccaMoore”. The value of the parameter where can then be written as: BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After execution, the result set in A2 is shown in the following chart:

java filter3.jpg

Finally, call this esProc code from Java through the JDBC driver provided by esProc to get the filtering result. With the above esProc code saved as the file test.dfx, the Java code that calls it is as follows:

       //requires java.sql.Connection, java.sql.DriverManager and java.sql.ResultSet
       //create the esProc JDBC connection
       Class.forName("com.esproc.jdbc.InternalDriver");
       Connection con = DriverManager.getConnection("jdbc:esproc:local://");
       //call the esProc program (like a stored procedure); test is the name of the dfx file
       com.esproc.jdbc.InternalCStatement st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
       //set the parameter: the dynamic filtering condition
       st.setObject(1, "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\"");
       //execute the esProc program
       st.execute();
       //get the result set: the eligible employees
       ResultSet set = st.getResultSet();


For relatively simple scripts, we can write the esProc code directly in the Java code that calls esProc JDBC. This saves us from having to write the esProc script file (test.dfx):

       st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
       ResultSet set = st.executeQuery("=file(\"D:\\\\esProc\\\\employee.txt\").import@t().select(BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\")");


This Java code directly executes a single line of esProc script: it gets the data from the text file, filters it according to the specified condition, and returns the result set to set, the ResultSet object.