This is the first time I've come across this. I just printed a list and each element seems to have a u in front of it, i.e.
[u'hello', u'hi', u'hey']
What does it mean and why would a list have this in front of each element?
As I don't know how common this is, if you'd like to see how I came across it, I'll happily edit the post.
From stackoverflow
-
The `u` just means that the following string is a unicode string (as opposed to a plain byte string, which is what an unprefixed literal is in Python 2). It has nothing to do with the list that happens to contain the (unicode) strings.
-
I believe the u'' prefix creates a unicode string instead of a regular ASCII string.
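To illustrate that the prefix belongs to each string's representation and not to the list, here is a small sketch. The variable names `greetings` and `joined` are made up for illustration. It assumes nothing beyond the example list from the question, and it runs on both Python 2 and Python 3, although only Python 2 shows the prefix when the list is printed.

```python
from __future__ import print_function

greetings = [u'hello', u'hi', u'hey']

# Printing a list shows the repr() of each element. On Python 2 that
# repr includes the u prefix; on Python 3, where every str is already
# unicode, it does not.
print(greetings)

# Printing an element directly writes its text, with no prefix or quotes.
for word in greetings:
    print(word)

# The prefix is not part of the string's contents, so joining works as usual.
joined = u', '.join(greetings)
print(joined)  # hello, hi, hey
```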
-
It's an indication of a unicode string, similar to r'' for a raw string.
>>> type(u'abc')
<type 'unicode'>
>>> r'ab\c'
'ab\\c'
day_trader : Ah, I thought r'' meant something to do with a regular expression?
Samir Talwar : It's generally used for regular expressions so we can write things like `r'/[ \t]+/'` instead of `'/[ \\t]+/'` (note the double backslash - you don't have to escape things in raw strings unless you're escaping the closing quote).
SilentGhost : It's often used in regex to avoid all the escaping backslashes.
day_trader : I see. If I iterate through a unicode list and check if some string is 'in' the list, will that recognise the string? I'm currently checking each element to see if it matches a certain string and it keeps escaping every time. Is this because it's Unicode?
Mike Graham : r and u are a bit different. u indicates the type of the string, whereas r (or ur, if you want to use raw unicode literals) makes a normal str (or unicode, if u and r are both used) that is parsed differently at compile time. `>>> repr(r'foo') "'foo'" >>> repr(u'foo') "u'foo'"` Notice how the r goes away (that's just a matter of what backslashes do) and the u does not (because it makes an object of a different type).
SilentGhost : If your string is a unicode string that uses only ASCII characters (as in your example), the `in` operation would cast the strings implicitly and you'll get `True`: `'abc' in [u'abc']` results in `True`. If your unicode string uses characters outside of the ASCII charset, you naturally would get `False` in such a test.
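The two points from the comments above, that r'' only changes how backslashes are parsed and that `in` finds ASCII-only strings in a list of unicode strings, can be checked with a short sketch. All variable names here are hypothetical; the code is written to give the same results on Python 2 and Python 3.

```python
# A raw literal keeps backslashes as-is; it is still an ordinary string.
raw = r'ab\c'
escaped = 'ab\\c'
same_text = (raw == escaped)   # True: both hold a, b, backslash, c

raw_len = len(r'\t')           # 2: a backslash followed by the letter t
tab_len = len('\t')            # 1: a single tab character

# Membership test against a list of unicode strings.
words = [u'hello', u'hi', u'hey']
found = 'hi' in words          # True: ASCII-only text compares equal
missing = 'bye' in words       # False: no element has this text

print(same_text, raw_len, tab_len, found, missing)
```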