Friday, May 6, 2011

What is the fastest way to read a large number of small files into memory?

I need to read ~50 files on every server start and place each text file's representation into memory. Each text file will have its own string (which is the best type to use for the string holder?).

What is the fastest way to read the files into memory, and what is the best data structure/type to hold the text in so that I can manipulate it in memory (search and replace mainly)?

Thanks

From stackoverflow
  • It depends a lot on the internal structure of your text files and what you intend to do with them.

    Are the files key-value dictionaries (i.e. "properties" files)? XML? JSON? You have standard structures for those.

    If they have a formal structure you may also use JavaCC to build an object representation of the files.

    Otherwise, if they are just blobs of data, well, read the files and put them in a String.

    Edit: about search & replace: just use String's replaceAll function.
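
    One caveat with replaceAll: its first argument is a regular expression, so characters such as "." or "$" are interpreted rather than matched literally. A minimal sketch of the difference follows; the class name, method names and template text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the template text and placeholder are made up.
final class ReplaceDemo {

    // Literal replacement: String.replace treats both arguments as plain text.
    static String replaceLiteral(String text, String target, String replacement) {
        return text.replace(target, replacement);
    }

    // Regex replacement with target and replacement quoted so they behave literally.
    static String replaceQuoted(String text, String target, String replacement) {
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String template = "price: ${amount} (a.b)";
        System.out.println(replaceLiteral(template, "${amount}", "$5"));
        System.out.println(replaceQuoted(template, "${amount}", "$5"));
        // Unquoted, "." is a regex wildcard and matches every character.
        System.out.println("a.b".replaceAll(".", "x"));
    }
}
```

    For plain text search and replace, String.replace is usually the simpler choice; replaceAll is only needed when the search term really is a pattern.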

  • If you want to do search / replace in a text file then you could use StringBuilder to store the content of your files.

    I have no advice for reading the files...

  • The most efficient way is:

    • Determine the length of the file (File.length())
    • Create a char buffer with the same size (or slightly larger)
    • Determine the encoding of the file
    • Use new InputStreamReader (new FileInputStream(file), encoding) to read
    • Read the whole file into the buffer with a single call to read(). Note that read() might return early (not having read the whole file). In that case, call it again with an offset to read the next batch.
    • Create the string: new String(buffer)
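
    The steps above can be sketched roughly as follows. The class and method names are hypothetical, the encoding is assumed to be known by the caller, and files larger than 2 GB are not handled.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper; the names are not from the original answer.
final class WholeFileReader {

    static String readFile(File file, Charset encoding) throws IOException {
        // File.length() is in bytes. For common encodings such as UTF-8 or
        // ISO-8859-1 the decoded text never has more chars than bytes, so
        // this is an upper bound for the char buffer.
        char[] buffer = new char[(int) file.length()];
        int filled = 0;
        Reader reader = new InputStreamReader(new FileInputStream(file), encoding);
        try {
            // read() may return fewer chars than requested, so loop until the
            // buffer is full or end of stream (-1) is reached.
            while (filled < buffer.length) {
                int n = reader.read(buffer, filled, buffer.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        } finally {
            reader.close();
        }
        // Only the filled part of the buffer holds text.
        return new String(buffer, 0, filled);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(readFile(new File(args[0]), Charset.forName("UTF-8")));
        }
    }
}
```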

    If you need to search&replace once at startup, use String.replaceAll().

    If you need to do it repeatedly, you may consider using StringBuilder. It has no replaceAll() but you can use it to manipulate the character array in place (-> no allocation of memory).
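
    A replace-all on a StringBuilder can be built from indexOf() and replace(). This is only a sketch with a hypothetical helper name; it avoids creating a new String per replacement, though the builder may still shift characters or grow its internal array when the replacement has a different length.

```java
// Hypothetical helper; not part of the JDK or of the original answer.
final class BuilderReplace {

    // Replaces every occurrence of target in the builder and returns the count.
    static int replaceAll(StringBuilder text, String target, String replacement) {
        if (target.isEmpty()) {
            return 0; // nothing sensible to search for
        }
        int count = 0;
        int index = text.indexOf(target);
        while (index >= 0) {
            text.replace(index, index + target.length(), replacement);
            count++;
            // Continue after the inserted text so the replacement is never re-scanned.
            index = text.indexOf(target, index + replacement.length());
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("aXbXc");
        int replaced = replaceAll(text, "X", "YY");
        System.out.println(replaced + " replacements: " + text);
    }
}
```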

    That said:

    1. Make your code as short and simple as possible.
    2. Measure the performance
    3. If it's too slow, fix it.

    There is no reason to waste a lot of time making this code run fast if it takes just 0.1s to execute.

    If you still have a performance problem, consider putting all the text files into a JAR, adding it to the classpath and using Class.getResourceAsStream() to read the files. Loading things from the Java classpath is highly optimized.

    : if I load all the text files with Class.getResourceAsStream(), how can I iterate through the files inside the JAR?
    TofuBeer : java.util.zip.ZipFile will let you work with the files in a JAR (a JAR is just a zip file).
    Aaron Digulla : Either use ZipFile or iterate over a list of filenames (instead of trying to iterate over the resource).
    TofuBeer : cannot make a single call to read - have to loop it because read may not read in the entire file in one go
    Hosam Aly : @TofuBeer is right; you should loop checking how many bytes have actually been read.
    Aaron Digulla : @TofuBeer: fixed. thanks!
    kohlerm : using String.replaceAll() is definitely not a good idea. It will not replace Strings in place, but allocate new Strings.
    Aaron Digulla : kohlerm: Since there is no way to modify a String in place in Java, it doesn't really matter how you do it. As I said: If the String is really large, use a StringBuilder instead.
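
    The ZipFile suggestion from the comments could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, every entry in the archive is treated as text, and all entries share one encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper; the names are not from the original thread.
final class JarTextLoader {

    // Reads every non-directory entry of the archive into a name -> text map.
    static Map<String, String> loadAll(File jar, Charset encoding) throws IOException {
        Map<String, String> texts = new LinkedHashMap<String, String>();
        ZipFile zip = new ZipFile(jar);
        try {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                InputStream in = zip.getInputStream(entry);
                try {
                    texts.put(entry.getName(), new String(readFully(in), encoding));
                } finally {
                    in.close();
                }
            }
        } finally {
            zip.close();
        }
        return texts;
    }

    // Drains the stream, looping because read() may return partial chunks.
    private static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(loadAll(new File(args[0]), Charset.forName("UTF-8")).keySet());
        }
    }
}
```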
  • A memory mapped file will be fastest... something like this:

        final File             file;
        final FileInputStream  fin;
        final FileChannel      channel;
        final MappedByteBuffer buffer;
    
        file    = new File(fileName);
        fin     = new FileInputStream(file);
        channel = fin.getChannel();
        buffer  = channel.map(MapMode.READ_ONLY, 0, file.length());
    

    and then proceed to read from the byte buffer.

    This will be significantly faster than FileInputStream or FileReader.
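
    For text files the mapped bytes still have to be decoded into characters. A minimal sketch, assuming the encoding is known; the class and method names are hypothetical. Charset.decode is used rather than a raw char view of the buffer so that encodings such as UTF-8 come out right.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.charset.Charset;

// Hypothetical helper; the names are not from the original answer.
final class MappedTextReader {

    static String readFile(File file, Charset encoding) throws IOException {
        FileInputStream in = new FileInputStream(file);
        try {
            FileChannel channel = in.getChannel();
            // Map the whole file read-only; the OS pages the content in on demand.
            MappedByteBuffer buffer = channel.map(MapMode.READ_ONLY, 0, channel.size());
            // Decode the mapped bytes into chars with the given encoding.
            return encoding.decode(buffer).toString();
        } finally {
            // Closing the stream also closes its channel; the mapping itself
            // stays valid until the buffer is garbage collected.
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(readFile(new File(args[0]), Charset.forName("UTF-8")));
        }
    }
}
```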

    EDIT:

    After a bit of investigation with this it turns out that, depending on your OS, you might be better off using a new BufferedInputStream(new FileInputStream(file)) instead. However reading the whole thing all at once into a char[] the size of the file sounds like the worst way.

    So BufferedInputStream should give roughly consistent performance on all platforms, while the memory mapped file may be slow or fast depending on the underlying OS. As with everything that is performance critical you should test your code and see what works best.

    EDIT:

    Ok here are some tests (the first one is done twice to get the files into the disk cache).

    I ran it on the rt.jar class files, extracted to the hard drive, this is under Windows 7 beta x64. That is 16784 files with a total of 94,706,637 bytes.

    First the results...

    (remember the first is repeated to get the disk cache setup)

    • ArrayTest

      • time = 83016
      • bytes = 118641472
    • ArrayTest

      • time = 46570
      • bytes = 118641472
    • DataInputByteAtATime

      • time = 74735
      • bytes = 118641472
    • DataInputReadFully

      • time = 8953
      • bytes = 118641472
    • MemoryMapped

      • time = 2320
      • bytes = 118641472

    Here is the code...

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileChannel.MapMode;
    import java.util.HashSet;
    import java.util.Set;
    
    public class Main
    {
        public static void main(final String[] argv)
        {
            ArrayTest.main(argv);
            ArrayTest.main(argv);
            DataInputByteAtATime.main(argv);
            DataInputReadFully.main(argv);
            MemoryMapped.main(argv);
        }
    }
    
    abstract class Test
    {
        public final void run(final File root)
        {
            final Set<File> files;
            final long      size;
            final long      start;
            final long      end;
            final long      total;
    
            files = new HashSet<File>();
            getFiles(root, files);
    
            start = System.currentTimeMillis();
    
            size = readFiles(files);
    
            end = System.currentTimeMillis();
            total = end - start;
    
            System.out.println(getClass().getName());
            System.out.println("time  = " + total);
            System.out.println("bytes = " + size);
        }
    
        private void getFiles(final File      dir,
                              final Set<File> files)
        {
            final File[] childeren;
    
            childeren = dir.listFiles();
    
            for(final File child : childeren)
            {
                if(child.isFile())
                {
                    files.add(child);
                }
                else
                {
                    getFiles(child, files);
                }
            }
        }
    
        private long readFiles(final Set<File> files)
        {
            long size;
    
            size = 0;
    
            for(final File file : files)
            {
                size += readFile(file);
            }
    
            return (size);
        }
    
        protected abstract long readFile(File file);
    }
    
    class ArrayTest
        extends Test
    {
        public static void main(final String[] argv)
        {
            final Test test;
    
            test = new ArrayTest();
            test.run(new File(argv[0]));
        }
    
        protected long readFile(final File file)
        {
            InputStream stream;
    
            stream = null;
    
            try
            {
                final byte[] data;
                int          soFar;
                int          sum;
    
                stream = new BufferedInputStream(new FileInputStream(file));
                data   = new byte[(int)file.length()];
                soFar  = 0;
    
                do
                {
                    soFar += stream.read(data, soFar, data.length - soFar);
                }
                while(soFar != data.length);
    
                sum = 0;
    
                for(final byte b : data)
                {
                    sum += b;
                }
    
                return (sum);
            }
            catch(final IOException ex)
            {
                ex.printStackTrace();
            }
            finally
            {
                if(stream != null)
                {
                    try
                    {
                        stream.close();
                    }
                    catch(final IOException ex)
                    {
                        ex.printStackTrace();
                    }
                }
            }
    
            return (0);
        }
    }
    
    class DataInputByteAtATime
        extends Test
    {
        public static void main(final String[] argv)
        {
            final Test test;
    
            test = new DataInputByteAtATime();
            test.run(new File(argv[0]));
        }
    
        protected long readFile(final File file)
        {
            DataInputStream stream;
    
            stream = null;
    
            try
            {
                final int fileSize;
                int       sum;
    
                stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                fileSize = (int)file.length();
                sum      = 0;
    
                for(int i = 0; i < fileSize; i++)
                {
                    sum += stream.readByte();
                }
    
                return (sum);
            }
            catch(final IOException ex)
            {
                ex.printStackTrace();
            }
            finally
            {
                if(stream != null)
                {
                    try
                    {
                        stream.close();
                    }
                    catch(final IOException ex)
                    {
                        ex.printStackTrace();
                    }
                }
            }
    
            return (0);
        }
    }
    
    class DataInputReadFully
        extends Test
    {
        public static void main(final String[] argv)
        {
            final Test test;
    
            test = new DataInputReadFully();
            test.run(new File(argv[0]));
        }
    
        protected long readFile(final File file)
        {
            DataInputStream stream;
    
            stream = null;
    
            try
            {
                final byte[] data;
                int          sum;
    
                stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                data   = new byte[(int)file.length()];
                stream.readFully(data);
    
                sum = 0;
    
                for(final byte b : data)
                {
                    sum += b;
                }
    
                return (sum);
            }
            catch(final IOException ex)
            {
                ex.printStackTrace();
            }
            finally
            {
                if(stream != null)
                {
                    try
                    {
                        stream.close();
                    }
                    catch(final IOException ex)
                    {
                        ex.printStackTrace();
                    }
                }
            }
    
            return (0);
        }
    }
    
    class DataInputReadInChunks
        extends Test
    {
        public static void main(final String[] argv)
        {
            final Test test;
    
            test = new DataInputReadInChunks();
            test.run(new File(argv[0]));
        }
    
        protected long readFile(final File file)
        {
            DataInputStream stream;
    
            stream = null;
    
            try
            {
                final byte[] data;
                int          size;
                final int    fileSize;
                int          sum;
    
                stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                fileSize = (int)file.length();
                data     = new byte[512];
                size     = 0;
                sum      = 0;
    
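                do
                {
                    final int count;

                    // read() returns how many bytes landed in data, or -1 at end of stream
                    count = stream.read(data);

                    if(count < 0)
                    {
                        break;
                    }

                    size += count;

                    // only the first count bytes of the 512-byte buffer are valid
                    for(int i = 0; i < count; i++)
                    {
                        sum += data[i];
                    }
                }
                while(size != fileSize);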
    
                return (sum);
            }
            catch(final IOException ex)
            {
                ex.printStackTrace();
            }
            finally
            {
                if(stream != null)
                {
                    try
                    {
                        stream.close();
                    }
                    catch(final IOException ex)
                    {
                        ex.printStackTrace();
                    }
                }
            }
    
            return (0);
        }
    }
    
    class MemoryMapped
        extends Test
    {
        public static void main(final String[] argv)
        {
            final Test test;
    
            test = new MemoryMapped();
            test.run(new File(argv[0]));
        }
    
        protected long readFile(final File file)
        {
            FileInputStream stream;
    
            stream = null;
    
            try
            {
                final FileChannel      channel;
                final MappedByteBuffer buffer;
                final int              fileSize;
                int                    sum;
    
                stream   = new FileInputStream(file);
                channel  = stream.getChannel();
                buffer   = channel.map(MapMode.READ_ONLY, 0, file.length());
                fileSize = (int)file.length();
                sum      = 0;
    
                for(int i = 0; i < fileSize; i++)
                {
                    sum += buffer.get();
                }
    
                return (sum);
            }
            catch(final IOException ex)
            {
                ex.printStackTrace();
            }
            finally
            {
                if(stream != null)
                {
                    try
                    {
                        stream.close();
                    }
                    catch(final IOException ex)
                    {
                        ex.printStackTrace();
                    }
                }
            }
    
            return (0);
        }
    }
    
    Hosam Aly : Interesting! Do you have a benchmark or comparison between both approaches?
    TofuBeer : I had some code (long gone) that parsed every class file in rt.jar (6000+). Using FileInputStream (wrapped with a BufferedInputStream) it took 30 seconds; with a memory mapped file it took 4. Other than the way the bytes were read there was no difference in the code.
    TofuBeer : I did extract all of the files from the JAR to the file system before doing it.
    eaolson : This will read bytes and not necessarily character data, though, correct?
    TofuBeer : You can make use of CharBuffer via ByteBuffer.asCharBuffer. Also the speed will be very OS dependent - nio integrates tightly with the underlying OS (updating my answer)
    Aaron Digulla : @Tofu: Care to explain why using a big char buffer sounds like the worst way? It allocates memory only twice (once for the char array and once to copy it into a String). Can't get cheaper than that.
    Aaron Digulla : @Tofu: Also, I use a single command to read the whole file, so only one IO request. Your approach uses a lot of objects, MMU table changes, etc. I figure no matter the OS, that should be slower than a single file.read() call.
    TofuBeer : Reading the whole file into a byte[] was the slowest in my tests (by a large amount). Also you need to do repetitive reads to get the whole array (read returns an int of how many were read, it may be less than the length of the array).
    TofuBeer : updated with code for people to test with
    Hosam Aly : Great answer @TofuBeer! +1 from me.
    kohlerm : As you said, memory mapping might take long on some OSes. So for small files it's probably not a good idea.
  • I won't go over the points already explained by previous posters, but I usually do something more when load speed is really critical.

    This operation is essentially limited by I/O throughput. You could also load the files using 2 or more threads, which can cut the load time roughly in half. So, for example, if you use 2 threads, assign 25 files to each thread.

    Your mileage may vary but in some cases this may give an incredible boost.
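
    A rough sketch of the idea with a fixed thread pool. The class and method names are hypothetical, and it uses java.nio.file.Files, which needs Java 7 or later. Instead of assigning 25 files to each thread by hand, every file becomes one task and the pool distributes them.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; the names are not from the original answer.
final class ParallelLoader {

    // Reads all files on a pool of worker threads and returns file -> text.
    static Map<File, String> loadAll(List<File> files, final Charset encoding, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final File file : files) {
                pending.add(pool.submit(new Callable<String>() {
                    @Override
                    public String call() throws IOException {
                        return new String(Files.readAllBytes(file.toPath()), encoding);
                    }
                }));
            }
            // Collect results in the original order; get() blocks until each is done.
            Map<File, String> texts = new LinkedHashMap<File, String>();
            for (int i = 0; i < files.size(); i++) {
                texts.put(files.get(i), pending.get(i).get());
            }
            return texts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<File> files = new ArrayList<File>();
        for (String name : args) {
            files.add(new File(name));
        }
        System.out.println(loadAll(files, Charset.forName("UTF-8"), 2).size() + " files loaded");
    }
}
```

    Whether this helps depends on the disk and the OS cache, so it is worth timing with 1, 2 and 4 threads before keeping it.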

  • Any conventional approach is going to be limited in speed. I'm not sure you'll see much of a difference from one approach to the next.

    I would concentrate on business tricks that could make the entire operation faster.

    For instance, if you read all the files and stored them in a single file with the timestamps from each of your original file, then you could check to see if any of the files have changed without actually opening them. (a simple cache, in other words).
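
    The timestamp check could be sketched like this. The names are hypothetical, and persisting the recorded timestamps alongside the combined file is left out; File.lastModified() only reads file metadata, so no file content is opened.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the names are not from the original answer.
final class TimestampCache {

    // Records the current modification time of each file.
    static Map<File, Long> snapshot(List<File> files) {
        Map<File, Long> stamps = new HashMap<File, Long>();
        for (File file : files) {
            stamps.put(file, Long.valueOf(file.lastModified()));
        }
        return stamps;
    }

    // True if any file's modification time differs from the recorded one.
    static boolean anyChanged(Map<File, Long> stamps) {
        for (Map.Entry<File, Long> entry : stamps.entrySet()) {
            if (entry.getKey().lastModified() != entry.getValue().longValue()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<File>();
        for (String name : args) {
            files.add(new File(name));
        }
        Map<File, Long> stamps = snapshot(files);
        System.out.println("changed since snapshot: " + anyChanged(stamps));
    }
}
```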

    If your problem was getting a GUI up quickly, you might find a way to open the files in a background thread after your first screen was displayed.

    The OS can be pretty good with files, if this is part of a batch process (no user I/O), you could start with a batch file that appends all the files into one big one before launching java, using something like this:

    echo "file1" > file.all
    type "file1" >> file.all
    echo "file2" >> file.all
    type "file2" >> file.all
    

    Then just open file.all (I'm not sure how much faster this will be, but it's probably the fastest approach for the conditions I just stated)

    I guess I'm just saying that a solution to a speed issue often requires expanding your viewpoint a little and completely rethinking the solution using new parameters. Modifications of an existing algorithm usually only give minor speed enhancements at the cost of readability.

  • You should be able to read all the files in under a second using standard tools like Commons IO FileUtils.readFileToString(File).

    You can use writeStringToFile(File, String) to save the modified file as well.

    http://commons.apache.org/io/api-release/index.html?org/apache/commons/io/FileUtils.html

    BTW: 50 is not a large number of files. A typical PC can have 100K files or more.
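
    If adding Commons IO is not an option, a similar one-call read and write is possible with the JDK alone. Note that this swaps in java.nio.file.Files (Java 7 and later) in place of FileUtils; the wrapper class and method names below are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical wrapper around java.nio.file.Files; not the Commons IO API.
final class JdkFileText {

    // Reads the whole file and decodes it with the given encoding.
    static String readToString(Path path, Charset encoding) throws IOException {
        return new String(Files.readAllBytes(path), encoding);
    }

    // Encodes the text and writes it, replacing any existing content.
    static void writeString(Path path, String text, Charset encoding) throws IOException {
        Files.write(path, text.getBytes(encoding));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            Charset utf8 = Charset.forName("UTF-8");
            Path path = Paths.get(args[0]);
            // Round trip: read, modify in memory, write back.
            String text = readToString(path, utf8);
            writeString(path, text.replace("\r\n", "\n"), utf8);
        }
    }
}
```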

  • After searching Google for existing tests on IO speed in Java, I must say TofuBeer's test case completely opened my eyes. You have to run his test on your own platform to see what is fastest for you.

    After running his test, and adding a few of my own (credit to TofuBeer for posting his original code), it appears you may get even more speed by using your own custom buffer vs. using the BufferedInputStream.

    To my dismay the NIO ByteBuffer did not perform well.

    NOTE: The static byte[] buffer shaved off a few ms, but the static ByteBuffers actually increased the time to process! Is there anything wrong with the code?

    I added a few tests:

    1. ArrayTest_CustomBuffering (Read data directly into my own buffer)

    2. ArrayTest_CustomBuffering_StaticBuffer (Read Data into a static buffer that is created only once in the beginning)

    3. FileChannelArrayByteBuffer (use NIO ByteBuffer and wrapping your own byte[] array)

    4. FileChannelAllocateByteBuffer (use NIO ByteBuffer with .allocate)

    5. FileChannelAllocateByteBuffer_StaticBuffer (same as 4 but with a static buffer)

    6. FileChannelAllocateDirectByteBuffer (use NIO ByteBuffer with .allocateDirect)

    7. FileChannelAllocateDirectByteBuffer_StaticBuffer (same as 6 but with a static buffer)

    Here are my results, using Windows Vista and jdk1.6.0_13 on the extracted rt.jar:

    • ArrayTest

      • time = 2075
      • bytes = 2120336424
    • ArrayTest

      • time = 2044
      • bytes = 2120336424
    • ArrayTest_CustomBuffering

      • time = 1903
      • bytes = 2120336424
    • ArrayTest_CustomBuffering_StaticBuffer

      • time = 1872
      • bytes = 2120336424
    • DataInputByteAtATime

      • time = 2668
      • bytes = 2120336424
    • DataInputReadFully

      • time = 2028
      • bytes = 2120336424
    • MemoryMapped

      • time = 2901
      • bytes = 2120336424
    • FileChannelArrayByteBuffer

      • time = 2371
      • bytes = 2120336424
    • FileChannelAllocateByteBuffer

      • time = 2356
      • bytes = 2120336424
    • FileChannelAllocateByteBuffer_StaticBuffer

      • time = 2668
      • bytes = 2120336424
    • FileChannelAllocateDirectByteBuffer

      • time = 2512
      • bytes = 2120336424
    • FileChannelAllocateDirectByteBuffer_StaticBuffer

      • time = 2590
      • bytes = 2120336424

    My hacked version of TofuBeer's code:

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.MappedByteBuffer;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileChannel.MapMode;
    import java.util.HashSet;
    import java.util.Set;
    public class Main { 
        public static void main(final String[] argv)     { 
            ArrayTest.mainx(argv);
            ArrayTest.mainx(argv);
            ArrayTest_CustomBuffering.mainx(argv);
            ArrayTest_CustomBuffering_StaticBuffer.mainx(argv);
            DataInputByteAtATime.mainx(argv);
            DataInputReadFully.mainx(argv);
            MemoryMapped.mainx(argv);
            FileChannelArrayByteBuffer.mainx(argv);
            FileChannelAllocateByteBuffer.mainx(argv);
            FileChannelAllocateByteBuffer_StaticBuffer.mainx(argv);
            FileChannelAllocateDirectByteBuffer.mainx(argv);
            FileChannelAllocateDirectByteBuffer_StaticBuffer.mainx(argv);
         } 
     } 
    abstract class Test { 
        static final int BUFF_SIZE = 20971520;
        static final byte[] StaticData = new byte[BUFF_SIZE];
        static final ByteBuffer StaticBuffer =ByteBuffer.allocate(BUFF_SIZE);
        static final ByteBuffer StaticDirectBuffer = ByteBuffer.allocateDirect(BUFF_SIZE);
        public final void run(final File root)     { 
            final Set<File> files;
            final long      size;
            final long      start;
            final long      end;
            final long      total;
            files = new HashSet<File>();
            getFiles(root, files);
            start = System.currentTimeMillis();
            size = readFiles(files);
            end = System.currentTimeMillis();
            total = end - start;
            System.out.println(getClass().getName());
            System.out.println("time  = " + total);
            System.out.println("bytes = " + size);
         } 
        private void getFiles(final File dir,final Set<File> files)     { 
            final File[] childeren;
            childeren = dir.listFiles();
            for(final File child : childeren)         { 
                if(child.isFile())             { 
                    files.add(child);
                 } 
                else             { 
                    getFiles(child, files);
                 } 
             } 
         } 
        private long readFiles(final Set<File> files)     { 
            long size;
            size = 0;
            for(final File file : files)         { 
                size += readFile(file);
             } 
            return (size);
         } 
        protected abstract long readFile(File file);
     } 
    class ArrayTest    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new ArrayTest();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            InputStream stream;
            stream = null;
            try         { 
                final byte[] data;
                int          soFar;
                int          sum;
                stream = new BufferedInputStream(new FileInputStream(file));
                data   = new byte[(int)file.length()];
                soFar  = 0;
                do             { 
                    soFar += stream.read(data, soFar, data.length - soFar);
                 } 
                while(soFar != data.length);
                sum = 0;
                for(final byte b : data)             { 
                    sum += b;
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    
     class ArrayTest_CustomBuffering    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new ArrayTest_CustomBuffering();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            InputStream stream;
            stream = null;
            try         { 
                final byte[] data;
                int          soFar;
                int          sum;
                stream = new FileInputStream(file);
                data   = new byte[(int)file.length()];
                soFar  = 0;
                do             { 
                    soFar += stream.read(data, soFar, data.length - soFar);
                 } 
                while(soFar != data.length);
                sum = 0;
                for(final byte b : data)             { 
                    sum += b;
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     }
    
     class ArrayTest_CustomBuffering_StaticBuffer    extends Test { 
    
    
    
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new ArrayTest_CustomBuffering_StaticBuffer();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            InputStream stream;
            stream = null;
            try         { 
                int          soFar;
                int          sum;
                final int    fileSize;
                stream   = new FileInputStream(file);
                fileSize = (int)file.length();
                soFar    = 0;
                do             { 
                    soFar += stream.read(StaticData, soFar, fileSize - soFar);
                 } 
                while(soFar != fileSize);
                sum = 0;
                for(int i=0;i<fileSize;i++)             { 
                    sum += StaticData[i];
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     }
    
    class DataInputByteAtATime    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new DataInputByteAtATime();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            DataInputStream stream;
            stream = null;
            try         { 
                final int fileSize;
                int       sum;
                stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                fileSize = (int)file.length();
                sum      = 0;
                for(int i = 0; i < fileSize; i++)             { 
                    sum += stream.readByte();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    class DataInputReadFully    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new DataInputReadFully();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            DataInputStream stream;
            stream = null;
            try         { 
                final byte[] data;
                int          sum;
                stream = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                data   = new byte[(int)file.length()];
                stream.readFully(data);
                sum = 0;
                for(final byte b : data)             { 
                    sum += b;
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    class DataInputReadInChunks    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new DataInputReadInChunks();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            DataInputStream stream;
            stream = null;
            try         { 
                final byte[] data;
                int          size;
                final int    fileSize;
                int          sum;
                stream   = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                fileSize = (int)file.length();
                data     = new byte[512];
                size     = 0;
                sum      = 0;
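                do             { 
                    // read() returns the number of bytes read by this call, or -1 at end of file
                    final int n = stream.read(data);
                    if(n < 0)                 { 
                        break;
                     } 
                    size += n;
                    // sum only the bytes of this chunk; data never holds more than 512 bytes
                    for(int i = 0; i < n; i++)                 { 
                        sum += data[i];
                     } 
                 } 
                while(size < fileSize);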
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    class MemoryMapped    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new MemoryMapped();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            FileInputStream stream;
            stream = null;
            try         { 
                final FileChannel      channel;
                final MappedByteBuffer buffer;
                final int              fileSize;
                int                    sum;
                stream   = new FileInputStream(file);
                channel  = stream.getChannel();
                buffer   = channel.map(MapMode.READ_ONLY, 0, file.length());
                fileSize = (int)file.length();
                sum      = 0;
    
                for(int i = 0; i < fileSize; i++)             { 
                    sum += buffer.get();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    
     class FileChannelArrayByteBuffer    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new FileChannelArrayByteBuffer();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            FileInputStream stream;
            stream = null;
            try         { 
          final byte[] data;
                final FileChannel      channel;
          final ByteBuffer     buffer;
          int        nRead=0;
          final int              fileSize;
                int                    sum;
          stream = new  FileInputStream(file);
                data   = new byte[(int)file.length()];
          buffer = ByteBuffer.wrap(data);
    
                channel  = stream.getChannel();
          fileSize = (int)file.length();
          nRead += channel.read(buffer);
    
          buffer.rewind();
                sum      = 0;
                for(int i = 0; i < fileSize; i++)             { 
                    sum += buffer.get();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    
     class FileChannelAllocateByteBuffer    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new FileChannelAllocateByteBuffer();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            FileInputStream stream;
            stream = null;
            try         { 
          final byte[] data;
                final FileChannel      channel;
          final ByteBuffer     buffer;
          int        nRead=0;
          final int              fileSize;
                int                    sum;
          stream = new  FileInputStream(file);
                //data   = new byte[(int)file.length()];
          buffer = ByteBuffer.allocate((int)file.length());
    
                channel  = stream.getChannel();
          fileSize = (int)file.length();
          nRead += channel.read(buffer);
    
          buffer.rewind();
                sum      = 0;
                for(int i = 0; i < fileSize; i++)             { 
                    sum += buffer.get();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     } 
    
     class FileChannelAllocateDirectByteBuffer    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new FileChannelAllocateDirectByteBuffer();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            FileInputStream stream;
            stream = null;
            try         { 
          final byte[] data;
                final FileChannel      channel;
          final ByteBuffer     buffer;
          int        nRead=0;
          final int              fileSize;
                int                    sum;
          stream = new  FileInputStream(file);
                //data   = new byte[(int)file.length()];
          buffer = ByteBuffer.allocateDirect((int)file.length());
    
                channel  = stream.getChannel();
          fileSize = (int)file.length();
          nRead += channel.read(buffer);
    
          buffer.rewind();
                sum      = 0;
                for(int i = 0; i < fileSize; i++)             { 
                    sum += buffer.get();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     }
    
     class FileChannelAllocateByteBuffer_StaticBuffer    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new FileChannelAllocateByteBuffer_StaticBuffer();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            FileInputStream stream;
            stream = null;
            try         { 
          final byte[] data;
                final FileChannel      channel;
          int        nRead=0;
          final int              fileSize;
                int                    sum;
          stream = new  FileInputStream(file);
                //data   = new byte[(int)file.length()];
                StaticBuffer.clear();
          StaticBuffer.limit((int)file.length());
                channel  = stream.getChannel();
          fileSize = (int)file.length();
          nRead += channel.read(StaticBuffer);
    
          StaticBuffer.rewind();
                sum      = 0;
                for(int i = 0; i < fileSize; i++)             { 
                    sum += StaticBuffer.get();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     }
    
     class FileChannelAllocateDirectByteBuffer_StaticBuffer    extends Test { 
        public static void mainx(final String[] argv)     { 
            final Test test;
            test = new FileChannelAllocateDirectByteBuffer_StaticBuffer();
            test.run(new File(argv[0]));
         } 
        protected long readFile(final File file)     { 
            FileInputStream stream;
            stream = null;
            try         { 
          final byte[] data;
                final FileChannel      channel;
          int        nRead=0;
          final int              fileSize;
                int                    sum;
          stream = new  FileInputStream(file);
                //data   = new byte[(int)file.length()];
                StaticDirectBuffer.clear();
          StaticDirectBuffer.limit((int)file.length());
                channel  = stream.getChannel();
          fileSize = (int)file.length();
          nRead += channel.read(StaticDirectBuffer);
    
          StaticDirectBuffer.rewind();
                sum      = 0;
                for(int i = 0; i < fileSize; i++)             { 
                    sum += StaticDirectBuffer.get();
                 } 
                return (sum);
             } 
            catch(final IOException ex)         { 
                ex.printStackTrace();
             } 
            finally         { 
                if(stream != null)             { 
                    try                 { 
                        stream.close();
                     } 
                    catch(final IOException ex)                 { 
                        ex.printStackTrace();
                     } 
                 } 
             } 
            return (0);
         } 
     }
    
    Jason S : The times are suspicious to me. I'm not sure I'd trust comparisons of times that are "only" a few seconds in length; the JVM needs time to start up. I'd use larger files or more iterations. As TofuBeer pointed out, you also need to include a loop of at least one additional iteration prior to starting the "real" timing, to prime the disk cache and also to let the JVM's JIT do its fancy work.
    jkaufmann : I did keep TofuBeer's "priming" copy as the first iteration in the test. The files were already cached from a previous run. I am re-running the test on an XP box and coming up with similar results. I am really perplexed by the speed decrease when reusing a static buffer vs. creating a new buffer for each file. Perhaps this is an optimization in the underlying JVM? Not sure. I'd rather believe it's an issue with the code as written. I think I may trawl through the underlying source of the Java API to see what's going on with the NIO channels vs. the FileInputStreams. Thanks for the input.

Python sockets buffering

Let's say I want to read a line from a socket, using the standard socket module:

def read_line(s):
    ret = ''

    while True:
        c = s.recv(1)

        if c == '\n' or c == '':
            break
        else:
            ret += c

    return ret

What exactly happens in s.recv(1)? Will it issue a system call each time? I guess I should add some buffering, anyway:

For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.

http://docs.python.org/library/socket.html#socket.socket.recv

But it doesn't seem easy to write efficient and thread-safe buffering. What if I use file.readline()?

# does this work well, is it efficiently buffered?
s.makefile().readline()
From stackoverflow
  • The recv() call is handled directly by calling the C library function.

    It will block waiting for the socket to have data. In reality it will just let the recv() system call block.

    file.readline() is an efficient buffered implementation. It is not threadsafe, because it presumes it's the only one reading the file. (For example by buffering upcoming input.)

    If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it's already buffered.

    It would be buffered if:

    • you had called readline(), which reads a full buffer

    • the end of the line was before the end of the buffer

    Thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.
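
The buffering described above can be observed with a small experiment. This is only a sketch in Python 3 (where sockets carry bytes); `socket.socketpair()` stands in for a real connection, which is an assumption made purely for the demo.

```python
import socket

# Two connected sockets in one process, standing in for client and server.
left, right = socket.socketpair()
left.sendall(b"first line\nsecond line\n")
left.close()

# makefile() wraps the socket in a buffered file object. A single recv()
# may pull in both lines; later readline() calls are served from the buffer.
reader = right.makefile("rb")
lines = [reader.readline(), reader.readline(), reader.readline()]
reader.close()
right.close()

print(lines)
```

The third `readline()` returns an empty bytes object because the sending side has closed the connection.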

    The goal of the question is not clear. If you need to see whether data is available before reading, you can use select() or set the socket to non-blocking mode with s.setblocking(False). In non-blocking mode, recv() raises an error (EWOULDBLOCK/EAGAIN) instead of blocking when there is no waiting data.
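
A rough Python 3 sketch of both options follows. In Python 3, a non-blocking `recv()` with nothing to read raises `BlockingIOError`. The `socketpair()` and the `ping` payload are assumptions for the demo only.

```python
import select
import socket

left, right = socket.socketpair()

# Non-blocking mode: recv() on an empty socket raises instead of waiting.
right.setblocking(False)
try:
    right.recv(4096)
    raised = False
except BlockingIOError:
    raised = True

# select() with a zero timeout polls for readability without blocking.
readable_before, _, _ = select.select([right], [], [], 0)

left.sendall(b"ping\n")
# Wait up to one second for the data to become readable, then read it.
readable_after, _, _ = select.select([right], [], [], 1.0)
data = right.recv(4096) if readable_after else b""

left.close()
right.close()
```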

    Are you reading one file or socket with multiple threads? I would put a single worker on reading the socket and feeding received items into a queue for handling by other threads.
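
One possible shape for the single-reader arrangement, sketched in Python 3. The `reader` function, the `None` end-of-stream marker and the `socketpair()` are all hypothetical choices for illustration.

```python
import queue
import socket
import threading

def reader(sock, out):
    """Read lines from sock and put them on the queue; None marks end of stream."""
    with sock.makefile("rb") as stream:
        for line in stream:
            out.put(line)
    out.put(None)

left, right = socket.socketpair()
lines = queue.Queue()
worker = threading.Thread(target=reader, args=(right, lines))
worker.start()

left.sendall(b"job 1\njob 2\n")
left.close()

# Any number of consumer threads could call lines.get(); here the main
# thread drains the queue itself.
handled = []
while True:
    item = lines.get()
    if item is None:
        break
    handled.append(item)

worker.join()
right.close()
```

Only the worker thread ever touches the buffered file object, so its lack of thread safety does not matter.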

    Suggest consulting Python Socket Module source and C Source that makes the system calls.

    Bastien Léonard : I don't really know why I asked about thread-safety, I don't need it in my current project. In fact I want to rewrite a Java program in Python. In Java it's easy to get buffered reading, and I was wondering if Python's socket module provides the same buffering (in fact, I wonder why someone wouldn't want buffering and directly call system calls instead).
  • If you are concerned with performance and control the socket completely (you are not passing it into a library for example) then try implementing your own buffering in Python -- Python string.find and string.split and such can be amazingly fast.

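    def linesplit(socket):
        # untested
        # socket objects expose recv(), not read()
        buffer = socket.recv(4096)
        done = False
        while not done:
            if "\n" in buffer:
                # hand back one complete line, keep the remainder buffered
                (line, buffer) = buffer.split("\n", 1)
                yield line+"\n"
            else:
                more = socket.recv(4048)
                if not more:
                    # empty result: the peer closed the connection
                    done = True
                else:
                    buffer = buffer+more
        # whatever is left has no trailing newline
        if buffer:
            yield buffer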
    

    If you expect the payload to consist of lines that are not too huge, that should run pretty fast, and avoid jumping through too many layers of function calls unnecessarily. I'd be interested in knowing how this compares to file.readline() or using socket.recv(1).
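
For comparison, here is the same buffering idea as a self-contained Python 3 sketch that works on bytes. The name `iter_lines`, the 4096-byte default and the `socketpair()` test data are assumptions, not part of the answer above.

```python
import socket

def iter_lines(sock, bufsize=4096):
    """Yield newline-terminated lines (as bytes) from sock until it closes."""
    buffer = b""
    while True:
        newline = buffer.find(b"\n")
        if newline >= 0:
            # A complete line is buffered: return it, keep the rest.
            line, buffer = buffer[:newline + 1], buffer[newline + 1:]
            yield line
            continue
        chunk = sock.recv(bufsize)
        if not chunk:
            # Empty result: the peer closed the connection.
            break
        buffer += chunk
    if buffer:
        # Trailing data without a final newline.
        yield buffer

left, right = socket.socketpair()
left.sendall(b"alpha\nbeta\ngam")
left.sendall(b"ma\ntail")
left.close()
received = list(iter_lines(right))
right.close()
```

A line split across two sends (`gam` + `ma`) is reassembled because the remainder stays in the buffer between `recv()` calls.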

    Christian Witts : 4048 is a rather odd buffer size. You should stick to powers of 2.
  • I am trying to achieve a similar thing in Python for S60 (symbian), running a .recv on a socket object without it freezing my UI. I have tried the setblocking(False) method to prevent this function from blocking the main app, however it has not worked for me. (tested on both the socket instance and the connection instance from socket.accept() )

    Have also attempted using select() to create a non-blocking function, however have also had no success with this method.

    My application does not require separate threads, however I just want to be able to access the UI while this function is running.

    Any help would be appreciated.. Sorry for crashing your question, but hopefully I can contribute some tests and results?

    Bastien Léonard : I think you would get more help if you posted a new question. About your problem, I guess threads are the way to go if you had no success with select(). Or maybe you could find a library which manages that.

Is there a way to get or calculate true north in Cocoa Touch?

Hello

I would like to determine a direction moving x degrees clockwise starting from true north. Is there a way for me to get or calculate true north based on a set of lat & long coordinates?

I am interested in implementing this in Cocoa Touch. I am sure this is used in many of the applications already out there. Any comments, pointers, or advice will be highly appreciated.

Thanks

ldj

From stackoverflow
  • Do you mean magnetic north?

    If so google for magnetic variation and you will have it.

    ldj : I mean true north.
  • If you get one set of coordinates, then another set say ten seconds later, you could work out which direction you had travelled in. (Not sure if this is what you wanted.)

    You would have to work out the change in longitude and latitude, then use a bit of trigonometry and simply take arctan(long/lat) (arctan is the inverse function of tan). You'll have to avoid dividing by 0 when the change in lat is 0. However, when the change in long or lat is 0 you know you have travelled directly north, east, south or west.

    Also, arctan in most APIs outputs radians, so you must multiply it by 180 and divide by pi to get degrees.
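
As a rough sketch of that calculation in Python (`heading_degrees` is a hypothetical name): `atan2` sidesteps the division by zero and picks the correct quadrant. The cosine scaling of the longitude change is an added refinement that the answer does not mention. This is a flat-earth approximation, reasonable only over short distances.

```python
import math

def heading_degrees(lat1, lon1, lat2, lon2):
    """Approximate heading from point 1 to point 2, in degrees clockwise from north.

    Inputs are decimal degrees; valid only for short distances.
    """
    d_north = lat2 - lat1
    # Meridians get closer together away from the equator, so scale the
    # longitude change by the cosine of the mean latitude.
    d_east = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    # atan2 handles d_north == 0 and returns the angle in the right quadrant.
    return math.degrees(math.atan2(d_east, d_north)) % 360
```

For example, `heading_degrees(0, 0, 0, 1)` describes travel due east along the equator and `heading_degrees(0, 0, -1, 0)` travel due south.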

    Edit: Isn't true north simply the direction towards the geographic North Pole, i.e. along your line of longitude towards latitude 90°N? My first impression was that this was grid north, but I think grid north is the variation you get because a map grid draws lines of longitude parallel while on the curved earth they converge.

  • Zekian is on the right path, though... It looks like you need your current long/lat and the current magnetic pole's long/lat, a bit of spherical geometry to figure out the magnetic deviation ("declination" is the word that pops up a lot), and then apply that correction to your current magnetic north reading. And because the earth's magnetic field keeps moving around, you need to compensate for this too, or at least offer an update for the magnetic north's long/lat.

    These are links I found along the way, but my head started spinning like in The Exorcist before I could come up with anything clever:
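
Once a declination value is known, the last step (applying it to the compass reading) is small. A minimal sketch, assuming the usual sign convention that declination is positive when magnetic north lies east of true north. The function name and the sample numbers are hypothetical, and a real declination would have to come from a geomagnetic model or lookup service.

```python
def true_heading(magnetic_heading, declination):
    """Convert a magnetic compass heading to a true heading, both in degrees.

    declination is positive when magnetic north lies east of true north.
    """
    return (magnetic_heading + declination) % 360

# Compass reads 350 degrees, local declination is 15 degrees east.
example = true_heading(350, 15)
```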

    http://www.nauticalissues.com/en/math1.html

    http://mathforum.org/library/drmath/view/60398.html

    http://mathforum.org/library/drmath/view/55417.html

    http://www.ngdc.noaa.gov/geomag/magfield.shtml

    http://en.wikipedia.org/wiki/Magnetic_declination

How do I use mod_perl2 and Apache Bucket Brigades?

I'm writing an application to do proxying and rewriting of webpages on the fly and am pretty settled on using mod_perl2 - there is an existing implementation using mod_perl (v1) that I'm working from. In mod_perl2, there's this idea of APR::Brigades and APR::Buckets which, from my vague understanding, are an efficient way to do the sort of filtering & rewriting that I want. I can't, however, find anything but the Perldoc pages for these modules, so I'm really quite unsure how to utilize them.

Can anyone explain mod_perl2 Bucket Brigades to me, point me to a tutorial, or even show me some open-source app that uses mod_perl2 that I could learn from?

From stackoverflow

Automating QA on Flex Application

I have a Flex application that needs to be tested, and our QA department is really adamant about using some form of automated-testing tool like HP's QuickTest Pro (QTP). However, QTP requires that you write some custom code if you wish to automate home-made components. Unfortunately, we use some 3rd-party components for which we do not have the source code, so we can't really write that custom code.

Is there any existing framework and/or tool that would allow me to automate testing without having to write custom code, and that could be used by a non-programmer (i.e. a QA guy who has no idea what a pointer is)?

I've taken a quick look at Flex-Monkey (free open-source software), which seems to be a promising project, but it's still in its infancy and I need something soon (i.e. yesterday).

Any ideas?

From stackoverflow
  • Haven't tried any of them personally. Just some googling.

  • Unfortunately, Flex/Flash automation just isn't very strong right now. QTP has a monopoly on the only "official" solution. Adobe needs to do more here :(

    However, one automation tool is pretty interesting and completely sidesteps the traditional API mode of automation. Check out Eggplant, which uses graphical bitmaps to determine how/where/when to click on visual elements. This means your "scripts" are now text + bitmaps, but it also means it can test almost anything.

    For full disclosure, I'm one of the Selenium Remote Control founders and have done a lot of work with Selenium and Flash automation in the past.

  • I know this post is a bit late in the game (almost a month), but if you haven't done so, check out FlexMonkey. I'm currently investigating Flex automation at work, and this is the most promising Flex test suite I've come across.

    Note: Selenium Flex is only compatible with Selenium running on Firefox 2.x. It's not compatible with the latest beta, which runs on FF 3. Because of this, I found it to be an inadequate solution.

  • Hi, I am using the Selenium Flex API integrated with Selenium RC. After launching the Flex application, Selenium fails to identify the fields inside the module box on the login page.

    Error trace:

    com.thoughtworks.selenium.SeleniumException: ERROR: Error: The element 'logonId' was not found in the application

    Can you please provide a solution on this?

  • (coming in really late in the game)

    Another option now is Borland's SilkTest. They've recently added support for Flex and it seems to work pretty well.

    My company tried a couple other options, including RIATest; but ended up using SilkTest because our QA dept. was already trained on it.

  • I've just released a new version of the SeleniumFlex API, bringing it up to 0.2.5.

    This release fixes a lot of major issues and probably makes the API the best free alternative for Flex test automation. You can get it on sourceforge here: Selenium-Flex API

    EDIT Update:

    The project has moved to Google Code: http://code.google.com/p/sfapi/

  • AFAIK the following tools currently support Flex GUI automation (alphabetical order):

    1. QuickTest Professional
    2. Ranorex
    3. Rational Functional Tester
    4. RIATest
    5. Selenium
    6. SilkTest
    7. TestComplete

    ANY of the above will require you to "instrument" custom components to support full automation. However depending on how "custom" your components are you may find that you may not need to do anything, they will just be represented by their base classes. All built-in Flex components are already "instrumented".

    Disclosure: I am a RIATest team member and am naturally biased to believe this is the best tool if you need Flex-only automation. It is the only tool that is focused on providing the best possible automation for Flex.

  • Ranorex might be interesting for you. The RanorexLoader enables automation of your flash/flex application without including the automation lib directly into your application.

    And you can also capture/replay your UI tests, edit your actions and generate C#, VB.NET and IronPython code.

    Here you find an example using the loader: http://www.ranorex.com/support/user-guide-20/flash-flex-testing.html#c2305

  • Hi, I am trying to automate things on Flex apps using UI Automation framework. Most of the things are possible with it, like enumerating the fields on the page, clicking on buttons, but I am not able to interact with text fields. Is there any special sdk or C# classes that I need to use to achieve this? Is there any standard sdk that's built for development in C# or C/C++ ?

Web Charting, serverside or client side?

I'm trying to establish whether we are better off rendering charts serverside and pushing them to the browser as images or using a javascript charting library to draw the charts clientside.

I'm looking at the JFreeChart and Dojo Charting libraries, my target is an existing JSP site that runs in JBoss.

Does anyone have any experience with this who might be able to point out the merits/flaws of each?

At the moment I'm seeing client-side rendering as being a hell of a lot more efficient unless the dataset is huge.

From stackoverflow
  • I would like to recommend a library I wrote: charts4j which is a server-side charting solution. Here are some features:

    • Supports nearly all features of the Google Chart API
    • Hides the ugly details of creating the URL parameters that are necessary for communicating with the Google Chart API
    • Can be incorporated into any Internet enabled Swing or web application environment (JSP/Servlet, GWT, Spring MVC, etc.)
    • 100% pure core Java solution. No need for special graphics libraries, etc.
    • Super-scalable & Lightweight. Only one 160Kb jar and an Internet connection required
    • Well documented
    • Best of all, it is FREE!

    Here are some testimonials. Also check out the FAQ.

    I have an example of incorporating this technology into a Spring MVC (using JSPs) app on my blog.

    Omar Kooheji : Looks interesting. The reason we've strayed away from the Google API is that it (seemingly) requires a connection to Google and the internet, which is bad because our applications often have to run in intranets with no outside connection.
    Julien Chastang : That's right. No Internet connection, no Google Charts. It solves some charting problems very well, but not all.
  • JFreeChart is very well established and has been around for many years. I've used it on previous projects and it's worked very well. It can be used from a rich client application or from a web application, and it has example applications for both scenarios. If you're distributing your application, note that it is LGPL licensed.

    The advantage of doing it server-side is that you can render the resulting chart as an image and not worry about cross-browser compatibility. I've incorporated JFreeChart by rendering from a Servlet and from Struts; it works very well.

    I can't speak for Dojo charting, as it's reasonably new.

  • The first deciding factor should be whether or not you need the charts to be accessible with JavaScript disabled. If you do or think you might, it rules out JavaScript completely.

  • I see a lot of valid points on either side, but one thing that I like about doing the charting client side is the ability to do some interaction with the chart. Using the Dojo charting library, you have a variety of methods for chart interaction such as dojox.charting.action2d.Highlight and dojox.charting.action2d.Tooltip. You can also have your charts update dynamically without the need to refresh, and I can see some situations where that can be useful.

    Of course, this is all up to you, but I like charts I can interact with a whole lot better than images rendered from the server, and I think a lot of people agree with me on that one.

  • I would recommend determining your performance/provisioning needs and making the decision from there. If you are expecting a large number of clients, each requiring a large number of charts which may need to update periodically, offloading the processing onto the clients will likely be the better solution. As jesper mentioned, you would also be able to do more interaction directly with the charts on the client, rather than requiring callbacks to the server for more complex functionality.

    If the general use-model for your charts is simple (e.g. static charts being generated on the fly by the server, w/out needs for updating), and the number of clients is low, you might be fine using hardware to better improve performance. Server-side would probably be sufficient in this case.

    Scalability and performance can be hard to implement later down the road. If you have the potential to mitigate this from the beginning, you should do so, since current use models so often change as future users decide they need faster/better functionality.

  • If you go with javascript you may also want a server-side solution, in case javascript is turned off.

    If you use unobtrusive javascript then if javascript is turned on it can change the links for the server-side solution to use a javascript charting tool.

    I found it best to use php on the server-side as gdlib is nice, and my tomcat server would throw errors at times due to not being able to connect to an x11 server even if it was running in headless mode.

    There are many solutions possible, or you can do it yourself. Graphing isn't hard, it depends on what you need. For example, if you can go with using svg then that would allow animated charts, but needs a plugin for many browsers.

  • I'm also trying to figure out which to use. This page seems to have a good analysis of the advantages and disadvantages:

    http://www.linux.com/archive/feature/124993

sqlite in vb.net, can't find table even though it exists

I'm using the following code (System.Data.SQLite within VB.net):

        Dim SQLconnect As New SQLite.SQLiteConnection()
        Dim SQLcommand As SQLiteCommand
        SQLconnect.ConnectionString = "Data Source=vault.db;"
        SQLconnect.Open()
        SQLcommand = SQLconnect.CreateCommand
        SQLcommand.CommandText = "INSERT INTO entries VALUES ('" & appinput.Text & "','" & userinput.Text & "','" & passinput.Text & "')"
        SQLcommand.ExecuteNonQuery()
        SQLcommand.Dispose()
        SQLconnect.Close()
        Me.Hide()

I get an error back that says it can't find the table "entries". I know the table exists because I can write to it via the command line through sqlite and through AutoIt, and I can see it and edit it in the SQLite browser when I open the database. I don't understand why VB can't see it (I know it sees and opens the database file just fine). Any ideas?

From stackoverflow
  • Argh! There are 3 big issues in that code. Please update it like this to fix two of them:

    Using cn As New SQLite.SQLiteConnection("Data Source=vault.db;"), _
          cmd As New SQLiteCommand("INSERT INTO entries VALUES (@AppInput, @UserInput, @PassInput)", cn)
    
        ' Values are bound as parameters, never concatenated into the SQL text
        cmd.Parameters.AddWithValue("@AppInput", appinput.Text)
        cmd.Parameters.AddWithValue("@UserInput", userinput.Text)
        cmd.Parameters.AddWithValue("@PassInput", passinput.Text)
    
        cn.Open()
        cmd.ExecuteNonQuery()
    End Using
    

    This will prevent SQL injection by parameterizing your query instead of substituting values directly, and it prevents db locking issues by making sure your connection is disposed of properly, even if an exception is thrown.
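
The effect of parameter binding is easy to see in isolation. The sketch below uses Python's built-in `sqlite3` module rather than System.Data.SQLite, with an in-memory database and made-up column names, purely to illustrate the principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real "entries" table may look different.
conn.execute("CREATE TABLE entries (app TEXT, user TEXT, pass TEXT)")

# Input that would break out of a string-concatenated INSERT statement.
hostile = "x'); DROP TABLE entries; --"

# The ? placeholders bind the values as data, never as SQL text.
with conn:
    conn.execute(
        "INSERT INTO entries VALUES (?, ?, ?)",
        ("mail", hostile, "placeholder-value"),
    )

rows = conn.execute("SELECT app, user FROM entries").fetchall()
conn.close()
```

The hostile string is stored verbatim as a value and the table is still there afterwards.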

    The third problem is that you should NEVER store plain-text passwords in your database (or anywhere else for that matter). Go read up on how to hash values in .Net and hash and salt your password before storing or comparing it.
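
To make "hash and salt" concrete, here is a minimal sketch using Python's standard library (PBKDF2 with SHA-256). The function names and the iteration count are assumptions; a .NET program would use the equivalent framework classes instead.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and the digest are stored. Note that a hash can verify a password but cannot recover it, so it fits login checks, not a store that must show the password again later.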

    Once you've done that re-test your code to see if you still get the same errors reported as before. We need to make sure this didn't solve the problem or introduce something new. Then we can start addressing the missing table issue, perhaps by checking your connection string.

    MaQleod : thanks, I'll try it, and I know about the plain text passwords, I will be adding encryption, I just wanted to make sure the program writes properly first, encryption comes second.
  • Most likely your problem is with relative paths (directories).

    sqlite will create a database file if it does not exist so you will never get a "db file not found message". The first indication of an incorrect path is "table missing".
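
This behaviour can be reproduced in a few lines. The sketch uses Python's `sqlite3` module and a temporary directory (both assumptions for the demo) to open a path where no database exists yet.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    wrong_path = os.path.join(workdir, "vault.db")  # no such file yet
    conn = sqlite3.connect(wrong_path)              # succeeds anyway
    try:
        conn.execute("SELECT * FROM entries")
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)
    conn.close()
    # An empty database file has been created at the wrong location.
    created = os.path.exists(wrong_path)

print(error)
```

Opening the connection never fails; the first sign of trouble is the missing table.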

    In my personal experience, although it goes against my programmer's instinct, it is best to always use an absolute (fully qualified) path/file name for an sqlite database.

    If you put in the full file location like "/var/myapp/vault.db" you should be OK. If this is likely to move around, pick up the file name from a properties/config file -- 'config file not found' is much easier to deal with than "table not found".

    MaQleod : That solved the problem, but it brings up another issue: without the use of a relative path, the program is then limited to a strict install path.
    Joel Coehoorn : No, it's not. You just have to be able to pick up the installed path at runtime and use it to compose the full connection string. .Net makes that pretty easy.
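
The idea of composing the path at runtime, sketched in Python for illustration. `database_path` is a hypothetical helper, and a .NET program would ask the framework for its startup directory instead.

```python
import os

def database_path(filename="vault.db", base_dir=None):
    """Return an absolute path to filename inside base_dir.

    When base_dir is omitted, the directory containing this script is used.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(os.path.abspath(base_dir), filename)

# Example: resolve the database relative to a known install directory.
install_dir = os.getcwd()
full_path = database_path("vault.db", install_dir)
```

The result is always absolute, so the database is found regardless of the process's current working directory, and the program can still be installed anywhere.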