Friday, May 6, 2011

C++ adding a carriage return at beginning of string when reading file

I have two questions:

1) Why is my code adding a carriage return at the beginning of the selected_line string?
2) Do you think the algorithm I'm using to return a random line from the file is good enough and won't cause any problems?

A sample file is:

line
number one
#
line number two

My code:

#include <cstdlib>   // srand, rand
#include <ctime>     // time
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    srand(time(0));
    ifstream read("myfile.dat");
    string line;
    string selected_line;
    int nlines = 0;
    while(getline(read, line, '#')) {
        if((rand() % ++nlines) == 0)
            selected_line = line;
    }
    // this is adding a \n at the beginning of the string
    cout << selected_line << endl;
}

Thanks in advance for your help.

EDIT: OK, what some of you suggested makes a lot of sense. The string is probably being read as "\nmystring". So I guess my question now is, how would I remove the first \n from the string?

From stackoverflow
  • Because you don't specify \n as a delimiter.

  • Your "random" selection is completely wrong. In fact, it will always select the first line: rand() % 1 is always 0.

    There is no way to uniformly select a random line without knowing the number of lines present.

    In addition, why are you using # as a delimiter? Getline, by default, gets a line (ending with \n).
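One thing worth checking about the loop in the question: selected_line is overwritten whenever a later draw comes up 0, so the result is not pinned to the first record. Keeping the k-th record with probability 1/k is the single-item reservoir sampling technique, which gives each of n records a 1/n chance without counting them first. The sketch below is one way to test that empirically. The names pick_record and count_picks are hypothetical, and std::mt19937 (C++11) is swapped in for rand() only so that runs are repeatable.

```cpp
#include <cstddef>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: applies the question's selection rule to '#'-separated
// records, using std::mt19937 instead of rand() so the run is repeatable.
std::string pick_record(const std::string& text, std::mt19937& rng) {
    std::istringstream in(text);
    std::string record;
    std::string selected;
    std::size_t seen = 0;
    while (std::getline(in, record, '#')) {
        ++seen;
        // Keep the k-th record with probability 1/k, where k == seen.
        std::uniform_int_distribution<std::size_t> dist(0, seen - 1);
        if (dist(rng) == 0) {
            selected = record;
        }
    }
    return selected;
}

// Hypothetical helper: counts how often `wanted` is picked over `trials` runs.
int count_picks(const std::string& text, const std::string& wanted,
                int trials, unsigned seed) {
    std::mt19937 rng(seed);
    int hits = 0;
    for (int i = 0; i < trials; ++i) {
        if (pick_record(text, rng) == wanted) {
            ++hits;
        }
    }
    return hits;
}
```

Calling count_picks("a#b#c", "a", 30000, 1u), and likewise for "b" and "c", should give counts close to 10000 each if the selection is uniform.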

  • The newlines appear starting from the second record that you print. This is because the getline function stops when it sees the # character and, the next time it is called, resumes from where it left off, i.e. one character past the #, which in your input file is a newline. Also read the C FAQ 13.16 on using rand() effectively.

    One suggestion is to read the entire file in one go, store the lines in a vector and then output them as required.

    Pukku : Yep - when you have the lines in a vector, it will be easy to pick one at random.
  • Because # is your delimiter, the \n that comes right after that delimiter becomes the beginning of your next line, which puts the \n in front of your line.
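The behaviour described in these answers can be reproduced in memory. This is a minimal sketch, assuming a hypothetical helper named split_on_hash that reads from a std::istringstream rather than the real myfile.dat:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on '#' the same way the question's
// getline(read, line, '#') loop does, and returns every record it read.
std::vector<std::string> split_on_hash(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> records;
    std::string record;
    // getline extracts and discards the '#', but keeps every '\n' it meets.
    while (std::getline(in, record, '#')) {
        records.push_back(record);
    }
    return records;
}
```

Given the sample file contents "line\nnumber one\n#\nline number two\n", the second record comes back with the newline that followed the # still attached to its front.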

  • 1) You're not adding a \n to selected_line. Instead, by specifying '#' you are simply not removing the extra \n characters in your file. Note that your file actually looks something like this:

    line\n
    number one\n
    #\n
    line number two\n

    So line number two is actually "\nline number two\n".

    2) No. If you want to randomly select a line then you need to determine the number of lines in your file first.

    Naaff : To remove whitespace from an ifstream (before you call getline), you can do something like this: while(isspace(read.peek())) read.ignore();
  • What you probably want is something like this:

    std::vector<std::string> allParagraphs;
    std::string currentParagraph;
    std::string line;

    while (std::getline(read, line)) {
        if (line == "#") { // modify this condition, if needed
            // paragraph ended, store to vector
            allParagraphs.push_back(currentParagraph);
            currentParagraph.clear();
        } else {
            // paragraph continues...
            if (!currentParagraph.empty()) {
                currentParagraph += "\n";
            }
            currentParagraph += line;
        }
    }

    // store the last paragraph, as well
    // (in case it was not terminated by #)
    if (!currentParagraph.empty()) {
        allParagraphs.push_back(currentParagraph);
    }

    // this is not extremely random, but will get you started
    // (assumes at least one paragraph was read; % 0 is undefined)
    size_t selectedIndex = rand() % allParagraphs.size();

    std::string selectedParagraph = allParagraphs[selectedIndex];
    

    For better randomness, you could opt for this instead:

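    // RAND_MAX + 1.0 is computed as a double; the integer expression
    // RAND_MAX + 1 overflows when RAND_MAX == INT_MAX
    size_t selectedIndex
        = static_cast<size_t>(rand() / (RAND_MAX + 1.0) * allParagraphs.size());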
    

    This is because, in some implementations, the least significant bits returned by rand() are not very random.
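If a C++11 compiler is available, the <random> header avoids the rand() scaling question altogether. This is a different technique from the one in the answer above, shown as a minimal sketch. The helper name pick_uniform is hypothetical.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: returns a uniformly chosen element of `items`.
const std::string& pick_uniform(const std::vector<std::string>& items,
                                std::mt19937& rng) {
    if (items.empty()) {
        throw std::out_of_range("pick_uniform: empty vector");
    }
    // The distribution covers the closed range [0, size - 1].
    std::uniform_int_distribution<std::size_t> dist(0, items.size() - 1);
    return items[dist(rng)];
}
```

A caller would typically seed the generator once, for example std::mt19937 rng(std::random_device{}());, and then call pick_uniform(allParagraphs, rng).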

    nmuntz : Excellent Solution! Thank you very very much! I have learned a lot from this solution that you have posted. Thanks again!
    Pukku : You are welcome. I hope it wasn't homework :)
  • You could use the substr method of the std::string class to remove the \n after you decide which line to use:

    if ( line.substr(0,1) == "\n" ) { line = line.substr(1); }
    

    As others have said, if you want to select the lines with uniform randomness, you'll need to read all the lines first and then select a line number. You could also use if (rand() % (++nlines+1)) which will select line 1 with 1/2 probability, line 2 with 1/2*1/3 probability, etc.
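As a variation on the substr suggestion in this answer, std::string::erase together with find_first_not_of removes any run of leading newline characters in place. This is a minimal sketch. The helper name strip_leading_newlines is hypothetical, and treating '\r' as a newline character is an assumption made to cover files with Windows line endings.

```cpp
#include <string>

// Hypothetical helper: removes leading '\n' and '\r' characters in place.
void strip_leading_newlines(std::string& s) {
    // find_first_not_of returns npos when s holds only newline characters
    // (or is empty); erase(0, npos) then clears the whole string.
    s.erase(0, s.find_first_not_of("\r\n"));
}
```

In the question's program this would be called as strip_leading_newlines(selected_line); just before printing.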
