Showing posts with label sort. Show all posts
Showing posts with label sort. Show all posts

Tuesday, May 5, 2009

Hibernate annotations sort using Comparator

Collections/Sets in Hibernate can be sorted in 2 ways:

  • One way is by Using “OrderBy” annotation (@OrderBy).
  • Second way is by using “Sort” annotation (@Sort).

The first way does the sort on the query level, meaning the SQL code that is generated for the object has the ORDER BY clause. The second way does the sort on Java.

Doing the sort on the SQL level may be more efficient, but doing the sort on Java gives us a Set (or List) that is always sorted, even if new items are added to it. In some cases we want to keep the list sorted as we add new items to it.

Using the Sort annotation is very simple, two things are needed to be done:

  • Add the @Sort annotation to the Set we would like to sort.
  • Implement a Comparator that will be used for the sort.

Here is example of 2 objects (tables):

  • The first object is called Owner. It represent a person. It’s columns are:
    • owner_id
    • first_name
    • last_name
  • The second is called Disk. It represents a music disk. It’s columns are:
    • disk_id
    • owner_id
    • title
    • artist

The relation between Owner and Disk is one-to-many. The example shows how to use the @Sort annotations to keep the disks Set sorted by the disk title.

This is the Disk class. It contains the DiskComparator:

package com.bashan.blog;
import javax.persistence.*;
import java.util.Comparator;
@Entity
@Table(name="disk")
public class Disk {
@Id
@Column(name="disk_id")
private Integer diskId;
@Column(name="title")
private String title;
@Column(name="artist")
private String artist;
@ManyToOne
@JoinColumn(name="owner_id")
private Owner owner
public Integer getDiskId() {
return diskId;
}
public void setDiskId(Integer diskId) {
this.diskId = diskId;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public String getArtist() {
return artist;
}
public void setArtist(String artist) {
this.artist = artist;
}
public Disk getDisk() {
return disk;
}
public void setDisk(Disk disk) {
this.disk = disk;
}
public static class DiskComparator implements Comparator<Disk>
{
public int compare(Disk d1, Disk d2)
{
String t1 = d1.getTitle();
String t2 = d2.getTitle();
if (t1 == null && t2 == null)
{
return 0;
}
else if (t1 == null)
{
return -1;
}
else if (t2 == null)
{
return 1;
}
else
{
return t1.toLowerCase().compareTo(t2.toLowerCase());
}
} 
}
}
This is the Owner class. It contains the use of @Sort annotation:
package com.bashan.blog;
import org.hibernate.annotations.Sort;
import org.hibernate.annotations.SortType;
import javax.persistence.*;
import java.util.SortedSet;
@Entity
@Table(name="owner")
public class Owner {
@Id
@Column(name="owner_id")
private Integer ownerId;
@Column(name="first_name")
private String firstName;
@Column(name="last_name")
private String lastName;
@OneToMany(cascade=CascadeType.ALL, fetch=FetchType.LAZY)
@JoinColumn(name="owner_id")
@Sort(type = SortType.COMPARATOR, comparator = Disk.DiskComparator.class)
private SortedSet<Disk> disks;
public Integer getOwnerId() {
return ownerId;
}
public void setOwnerId(Integer ownerId) {
this.ownerId = ownerId;
}
public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public String getLastName() {
return lastName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}
public SortedSet<Disk> getDisks() {
return disks;
}
public void setDisks(SortedSet<Disk> disks) {
this.disks = disks;
}
}

You can also download the 2 classes from Here.

Monday, March 2, 2009

Sorting large text files

Sorting a very big text file is a problem in terms of memory resources. Ideally, if all data could load to memory, it was an easy task: simply sort data in memory and write it to a new file. But, there are times we need to sort a really big file and the physical memory is not enough.

I once had to write such a large file sorter, and had a hard time finding something ready on the web. Therefore, I wrote a sorter of my own. The logic behind sorting a very large file is not hard to understand once you get the point. In general, the steps needed are:

Phase 1: Split big file to several small sorted files

  1. Read n lines from file to memory.
  2. Sort lines in memory.
  3. Write sorted lines to a temp file.
  4. Repeat step 1 while reading next n lines to memory and writing sorted lines to a new temporary file.

Phase 2: Merge smaller files to a new sorted big file

  1. Find the smallest line among all files.
  2. Write line to a new file (this file is the new merged file).
  3. Move to the next row in file (the file that had the smallest line).
  4. Repeat step 1 until all lines in all files were read.

Here is the code for such a file sorter:

import java.io.*;
import java.util.*;
public class FileSort
{
  protected String filenameToSort;
  protected String filenameSorted;
  protected Comparator comparator;
  protected int maxCapacity;
  public FileSort(String filenameToSort, String filenameSorted, Comparator comparator, int maxCapacity)
  {
    this.filenameToSort = filenameToSort;
    this.filenameSorted = filenameSorted;
    this.comparator = comparator;
    this.maxCapacity = maxCapacity;
  }
  public void sort() throws IOException
  {
    BufferedReader bufferedReader = new BufferedReader(new FileReader(filenameToSort));
    String line = null;
    int fileIndex = 0;
    BufferedWriter bufferedWriter;
    do
    {
      List<String> lines = new ArrayList<String>(maxCapacity);
      for (int i = 0; i < maxCapacity; i++)
      {
        line = bufferedReader.readLine();
        if (line == null)
        {
          break;
        }
        else
        {
          lines.add(line);
        }
      }
      Collections.sort(lines, comparator);
      bufferedWriter = new BufferedWriter(new FileWriter(filenameToSort + ".tmp" + fileIndex++));
      for (String _line : lines)
      {
        bufferedWriter.write(_line);
        bufferedWriter.newLine();
      }
      bufferedWriter.flush();
      bufferedWriter.close();
    }
    while (line != null);
    bufferedReader.close();
    mergeFiles(fileIndex);
  }
  public void mergeFiles(int numFiles) throws IOException
  {
    BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(filenameSorted));
    List<MergeFile> mergeFiles = new ArrayList<MergeFile>();
    for (int i = 0; i < numFiles; i++)
    {
      mergeFiles.add(new MergeFile(filenameToSort + ".tmp" + i));
    }
    do
    {
      // Find smallest line
      Iterator<MergeFile> iterator = mergeFiles.iterator();
      MergeFile minMergeFile = null;
      while (iterator.hasNext())
      {
        MergeFile mergeFile = iterator.next();
        if (mergeFile.line != null) // File has more lines
        {
          if (minMergeFile == null || comparator.compare(mergeFile.line, minMergeFile.line) < 0)
          {
            minMergeFile = mergeFile;
          }
        }
        else // No more lines in file. No need to iterate it
        {
          mergeFile.removeFile();
          iterator.remove();
        }
      }
      // Write smallest line to file
      if (minMergeFile != null)
      {    
        bufferedWriter.write(minMergeFile.line);
        bufferedWriter.newLine();
        minMergeFile.readLine();
      }
    }
    while (mergeFiles.size() > 0); // As long as there are files to read
    bufferedWriter.flush();
    bufferedWriter.close();
  }
  private final class MergeFile
  {
    BufferedReader bufferedReader;
    String filename;
    String line;
    boolean isReadNextRow = true;
    public MergeFile(String filename) throws IOException
    {
      this.filename = filename;
      bufferedReader = new BufferedReader(new FileReader(filename));
      readLine();
    }
    public void readLine() throws IOException
    {
      isReadNextRow = (line = bufferedReader.readLine()) != null;
    }
    public void removeFile() throws IOException
    {
      bufferedReader.close();
      new File(filename).delete();
    }
  }
}

Note for several things on this file sorter:
  1. It doesn’t limit the number of simultaneous open files. It shouldn't be much of a problem, since the maximum open files limit is pretty high and also can be configured (at least on Linux machines). If such limit exists you can always merge a group of files at a time (the size of the group is the maximum allowed open files). Then merge the new grouped files to a single file.
  2. This file sorter does not define the logic in which the lines are sorted.It should be supplied using a Comparator.
  3. The maximum number of rows to read to memory is also defined passed as a parameter to the sorter.

Suppose you would like to sort several big files to a single huge sorted file. The FileSort class can be easily extended to support sorting multiple big files. It can be done by first merging all files to a single file. Then sorting it using the FileSorter:
public class MultipleFileSort extends FileSort {
    private static final int BUFFER_SIZE = 1024 * 4;
    protected String[] filesToSort;
    public MultipleFileSort(String[] filesToSort, String filenameSorted, Comparator comparator, int maxReadLines) {
        super(filenameSorted + ".tmp", filenameSorted, comparator, maxReadLines);
        this.filesToSort = filesToSort;
    }
    public static void mergeFiles(String[] files, String filenameMerged) throws IOException {
        // Open all files for reading
        InputStream[] inputs = new InputStream[files.length];
        OutputStream os = null;
        try {
            for (int i = 0; i < files.length; i++) {
                inputs[i] = new FileInputStream(files[i]);
            }
            // Open file for writing
            os = new FileOutputStream(filenameMerged);
            byte[] buffer = new byte[BUFFER_SIZE];
            int len;
            for (InputStream is : inputs) {
                while ((len = is.read(buffer)) > 0) {
                    os.write(buffer, 0, len);
                }
            }
        }
        finally {
            for (InputStream is : inputs) {
                FileUtils.close(is);
            }
            FileUtils.close(os);
        }
    }
    public void sort() throws IOException {
        FileUtils.mergeFiles(filesToSort, filenameToSort);
        super.sort();
        new File(filenameToSort).delete();
    }
}

And finally example of using the MultipleFileSort class:

    public static void main(String[] args) {
        try {
            MultipleFileSort multipleFileSort = new MultipleFileSort(
                    new String[]{"in_file1.txt", "in_file2.txt", "in_file3.txt"},
                    "out_sorted.txt", new Comparator<String>() {
                        public int compare(String o1, String o2) {
                            return (o1).compareTo(o2);
                        }
                    }, 1000);
            multipleFileSort.sort();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }