Python Regex: Matching Multiple Lines
By default, the . in a Python regular expression does not match a newline, so a pattern built from .* stops at the end of a line. But at times one needs to extract data from a pattern that is spread over multiple lines.
Example:
Raw String:
<li>commercial software available for piracy on the same day it is released to the public.</li> <p class="attrib">by <a href="/users/11053/contributions/" rel="nofollow">
The text to be extracted is:
commercial software available for piracy on the same day it is released to the public
and the minimum amount of text that has to be matched (to rule out other, similar raw strings that we want to ignore) is:
<li>commercial software available for piracy on the same day it is released to the public.
<p class="attrib">by
 <a href=
A regular expression that works here is:
re.compile(r"<li>(.*?)\.(?:\n|\r|\r\n?)<p class=\"attrib\">by(?:\n|\r|\r\n?) <a href=\".*", re.MULTILINE)
Notice these:
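- (?:\n|\r|\r\n?) matches a line break, which, depending on the platform, can be \r\n, \n or \r. The (?: ... ) makes the group non-capturing, so this group of ORs is not returned as a match group. Since \r\n? already covers both \r and \r\n, the shorter (?:\r\n?|\n) is equivalent.
- The re.MULTILINE flag is not actually required: it only changes the meaning of ^ and $ (so that they also match at line boundaries), and this pattern uses neither. The line breaks are matched explicitly by the group above. If you would rather let . itself cross line breaks, that is what re.DOTALL does.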
Rather simple stuff, but only after you know it 🙂
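A small self-contained sketch of the same idea follows. It uses the shorter line-break group (?:\r\n?|\n), leaves out re.MULTILINE, and wraps the search in a helper. The names NEWLINE, PATTERN and extract_quote, and the sample string (assumed to have the same line breaks as the minimal text above), are illustrative assumptions rather than part of the original code.

```python
import re

# Line-break alternatives: "\r\n?" covers Windows (\r\n) and old Mac (\r)
# endings, "\n" covers Unix. The group is non-capturing.
NEWLINE = r"(?:\r\n?|\n)"

# No re.MULTILINE: the pattern uses neither ^ nor $, and the line breaks are
# matched explicitly by NEWLINE. Without re.DOTALL, (.*?) cannot cross a newline.
PATTERN = re.compile(
    r"<li>(.*?)\." + NEWLINE + r'<p class="attrib">by' + NEWLINE + r" <a href="
)


def extract_quote(html):
    """Return the text between <li> and the period before the line break, or None."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical sample, laid out like the minimal text above, with \r\n endings.
    sample = (
        "<li>commercial software available for piracy on the same day "
        "it is released to the public.\r\n"
        '<p class="attrib">by\r\n'
        ' <a href="/users/11053/contributions/" rel="nofollow">'
    )
    print(extract_quote(sample))
```

Running it prints the extracted sentence without its trailing period. The same pattern also matches when the lines end in a bare \n or \r.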
Google Wave Hacked??
This is what Google Wave suddenly looked like in my Firefox today:
My first thought was that Wave had been hacked!
But after a little googling, it turns out that it's a message from Wave itself... and a known issue.
Now, I like an application to talk to the user with some informality, but making it sound like hacker speak is stretching it too far :O
Getting NumPy (1.4.0) and SciPy (0.7.1) to work
A note on getting NumPy and SciPy to work together:
The latest versions of NumPy (1.4.0) and SciPy (0.7.1) don't seem to like each other. A typical error is:
ValueError: numpy.dtype does not appear to be the correct type object
While compiling for your own system is always an option, a simple solution is to use NumPy 1.3.0 (available from SourceForge), which seems to work fine with SciPy 0.7.1.
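If you would rather have a script warn you up front than hit the ValueError above, a version check along these lines is one option. This is a sketch under assumptions: it knows only the single NumPy/SciPy combination reported in this post, the helper names are hypothetical, and it relies on importlib.metadata, which needs Python 3.8 or later (much newer than the Python 2 releases these NumPy versions shipped for).

```python
from importlib.metadata import PackageNotFoundError, version

# Only the combination reported in this post (assumption: extend as needed).
KNOWN_BAD = {("1.4.0", "0.7.1")}


def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_numpy_scipy(numpy_version, scipy_version):
    """Return a warning message for a known-bad pair, otherwise None."""
    if (numpy_version, scipy_version) in KNOWN_BAD:
        return (
            "NumPy %s with SciPy %s may raise 'ValueError: numpy.dtype does not "
            "appear to be the correct type object'; consider NumPy 1.3.0."
            % (numpy_version, scipy_version)
        )
    return None


if __name__ == "__main__":
    message = check_numpy_scipy(installed_version("numpy"), installed_version("scipy"))
    print(message or "No known NumPy/SciPy conflict detected.")
```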
If you have a better solution, please post in the comments 🙂
Building MySQLdb for Python on Windows
MySQLdb is perhaps the most widely used interface between Python and MySQL. Sadly for Python developers on Windows, there are no official pre-built MySQLdb binaries for Windows; you are expected to build it for your own system using the scripts provided.
As is almost always the case, this path is fraught with problems. I have documented below the problems I ran into while building MySQLdb myself.
XPCOM: Converting wstring to nsAString
While standard C++ uses the class std::wstring for wide-character strings (wchar_t, which is 16 bits wide on Windows and typically 32 bits on Linux and Mac OS X) to represent Unicode, the XPCOM equivalent is the nsAString type. As a result, interfacing pure C++ code that handles Unicode with XPCOM requires converting between these two types. Here is how to do it:
Include nsStringAPI.h:
#include "nsStringAPI.h"
and use the function NS_StringSetData. For example, to return a member variable called mPath from class MyClass, do this:
NS_IMETHODIMP MyClass::GetPath(nsAString & aPath)
{
    // mPath is a std::wstring member; c_str() returns a null-terminated wide string
    NS_StringSetData(aPath, mPath.c_str());
    return NS_OK;
}
Writing A Wrapper Around printf : C++
Often one needs to write a wrapper around the printf function (e.g., in a custom logging module).
The prototype usually required is of the form:
int Log(LogLevel aLogLevel, const char* aFormat, ...);
so that we can accept multiple arguments the same way printf does and pass them on to printf once we have done our processing (e.g., filtering on the basis of the log level).
Here is how to achieve it (using va_list):
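#include <cstdarg>
#include <cstdio>

// LogLevel, OK and ERROR_FAILURE are defined elsewhere in the logging module.
static int Log(LogLevel aLogLevel, const char* aFormat, ...)
{
    va_list argptr;
    va_start(argptr, aFormat);   // argptr now refers to the first variadic argument
    int rv = OK;
    FILE* f = fopen("c:/xxx.txt", "a+");
    if (!f) {
        rv = ERROR_FAILURE;
    } else {
        // vfprintf takes the va_list in place of "..."; it returns < 0 on error
        if (vfprintf(f, aFormat, argptr) < 0) {
            rv = ERROR_FAILURE;
        }
        fclose(f);
    }
    va_end(argptr);              // every va_start needs a matching va_end
    return rv;
}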
Notice a few things:
- The "..." at the end of the Log function's parameter list: this allows a variable number of arguments to be passed in.
- The vfprintf that replaces fprintf: when a function accepts a variable number of arguments, the only way (I know!) to access them is through the macros defined in stdarg.h (cstdarg in C++), which give you a va_list (an opaque type; on some platforms it is simply a char*). The catch, however, is that printf and fprintf do not accept a va_list; they accept a variable number of arguments (reread that if it's confusing :)). Coming to our rescue are vprintf and vfprintf, which behave like printf and fprintf except that they take a va_list as a parameter.
Changing Directory in Windows Batch File
Just noticed a small thing yesterday.
set curr_dir=%cd%
chdir /D J:\
REM ...do your stuff...
chdir /D %curr_dir%
Notice two important things here:
- The script changes the current working directory to whatever it requires but then switches back, so that a user who runs the script from a command window does not find himself in a strange directory after the script has finished.
- The /D switch has to be added to chdir so that it changes the drive as well. Without it, chdir J:\ only sets the current directory for drive J: and leaves you on the original drive (it had me baffled for quite some time 🙂)
Forcing Digest Authentication in Web Applications
A simple way to password-protect a page in the Apache web server is an .htaccess file. However, sometimes you don't have enough privileges to place or modify an .htaccess file, e.g., in Google App Engine. Here is a very simple way to achieve this at the application level (using GAE as an example).
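The core of application-level Digest authentication is framework-independent: send a 401 challenge, then recompute the client's digest and compare. Below is a minimal sketch of those two steps following RFC 2617 (MD5, qop="auth"). The function names and the plain username-to-password mapping are hypothetical, and it deliberately leaves out pieces a real deployment needs: remembering issued nonces and rejecting stale or replayed ones, and checking that the uri field matches the requested path.

```python
import hashlib
import re
import secrets


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_challenge(realm):
    """Value for the WWW-Authenticate header of a 401 response."""
    # A real server must remember issued nonces; this sketch does not track them.
    nonce = secrets.token_hex(16)
    return 'Digest realm="%s", qop="auth", nonce="%s"' % (realm, nonce)


# key="quoted value" or key=bare-token
_PARAM = re.compile(r'(\w+)=(?:"([^"]*)"|([^,\s]*))')


def parse_authorization(header):
    """Parse a 'Digest k="v", k2=v2' Authorization header into a dict, or None."""
    if not header or not header.startswith("Digest "):
        return None
    pairs = _PARAM.findall(header[len("Digest "):])
    return {key: quoted or bare for key, quoted, bare in pairs}


def expected_response(params, method, password):
    """RFC 2617 response digest (MD5, with qop=auth or without qop)."""
    ha1 = md5_hex("%s:%s:%s" % (params["username"], params["realm"], password))
    ha2 = md5_hex("%s:%s" % (method, params["uri"]))
    if params.get("qop") == "auth":
        return md5_hex(":".join(
            [ha1, params["nonce"], params["nc"], params["cnonce"], "auth", ha2]))
    return md5_hex("%s:%s:%s" % (ha1, params["nonce"], ha2))


def check_digest(header, method, realm, passwords):
    """True if the Authorization header carries a valid digest.

    `passwords` is a hypothetical {username: password} mapping.
    """
    params = parse_authorization(header)
    if params is None or params.get("qop") not in (None, "auth"):
        return False
    required = ["username", "realm", "nonce", "uri", "response"]
    if params.get("qop") == "auth":
        required += ["nc", "cnonce"]
    if any(key not in params for key in required):
        return False
    if params["realm"] != realm or params["username"] not in passwords:
        return False
    expected = expected_response(params, method, passwords[params["username"]])
    # Constant-time comparison of the two hex digests.
    return secrets.compare_digest(expected.encode(), params["response"].encode("utf-8"))
```

In a GAE (or any other) request handler, you would call check_digest with the request's Authorization header and HTTP method, and otherwise reply with status 401 and a WWW-Authenticate header built by make_challenge, which makes the browser prompt for a username and password.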