Friday, December 16, 2011

My Latest Odd Project

So since the USGS is releasing a bunch of its old scanned paper maps through its Historical Topographic Map Collection, I thought it would be interesting to take the GeoPDFs there and georeference them into GeoTIFFs to compare against modern maps.  I've always had a thing for history, so I thought this would be a fun side project to get my cartography skills back in shape.

Since QGIS doesn't yet import GeoPDFs, I'm first loading them into the GIMP (rasterized at around 300 dpi from the PDF), clipping out the collar, and saving them as LZW-compressed TIFFs.  Then I'm importing the TIFFs into QGIS and using the Georeferencer plugin to mark the grid points on the map.  I've read that the old maps were based on the Clarke 1866 ellipsoid, so in QGIS I'm setting the source projection to NAD27, which is based on Clarke 1866.  Yes, I know that technically this isn't fully correct in the cartographic sense, but then again georeferencing old paper maps like this won't exactly produce a highly accurate GIS product either :P  I'm outputting them to WGS84 from the Georeferencer plugin.  Times on my older Dell E1705 are around 5 minutes using a polynomial transformation with Lanczos resampling.  Then again, I remember back in the mid-to-late 1990s when DRG production at the USGS on the old Data Generals would take a whole lot longer, so I'm not going to complain :)

The output isn't so bad, really.  Here's a sample of the output draped on top of Yahoo Street Maps (Google Maps and on-the-fly reprojection don't seem to play well together in QGIS right now).



Setting the Fredericksburg 1889 map to 50 percent transparency in QGIS and zooming in to old town, you can see that it lines up with a modern map surprisingly well.


I'll probably play around with this some more and maybe upload the georeferenced maps to archive.org or something along those lines.

Thursday, August 18, 2011

Building OpenCV 2.3.1 on Ubuntu 11.04

Getting OpenCV 2.3.1 to compile on Ubuntu can be interesting.  The first issue is tracking down all of the dependencies you need to get its different parts working properly.  There is more information on the needed dependencies at the OpenCV wiki here and here.  I found this blog post as well, which helps track down a lot of the dependencies.

For me specifically, I had a couple of problems on 11.04.  Make sure you have the following packages installed to enable gstreamer and unicap support:

libgstreamer-plugins-base0.10-dev
libgstreamer0.10-dev
libunicap2
libunicap2-dev
libucil2-dev

The second major problem is that OpenCV 2.3.1 doesn't fully track the latest releases of ffmpeg.  This patch helps, but bear in mind that it's for OpenCV 2.3.0 and there have been changes between versions.  You'll have to apply parts of the patch by hand to take care of some of the differences.

Once this is done, you should be ready to build.  On my system I ran cmake as:

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D BUILD_PYTHON_SUPPORT=ON -D WITH_TBB=ON -D WITH_XINE=ON -D WITH_UNICAP=ON -D BUILD_EXAMPLES=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_DOCUMENTATION=ON -D BUILD_SHARED_LIBS=ON ..
If you have all of the extra media repositories on Ubuntu enabled, I'd highly recommend NOT disabling shared libraries (i.e., leave BUILD_SHARED_LIBS=ON) when building OpenCV.  You'll avoid some linking errors caused by conflicting versions of some of the multimedia libraries that might be installed.

After that, when you compile against OpenCV, make sure you add something like -I/usr/local/include and -L/usr/local/lib to your makefile so you pull in the version you just built instead of the system default, and you should be good to go.

Parallel Programming: Understanding the impact of Critical Sections

Found this link earlier.  More and more people need to learn how to do parallel programming properly, but we seem to be getting fewer and fewer graduates who understand how to do it.

Fun With Verizon

So we had an odd outage here at the Maddox house.  Internet quit working.  Phone quit working.  FIOS boxes couldn't access the Internet either (tested On Demand, Widgets, etc).  So I called Verizon tech support.

Me: Yeah, everything except for the television video stream has quit.
Tech: Have you changed anything on your router?
Me: Not for about a year now.
Tech: Let me run some tests and check.  Oh I found the problem, your wireless on your router was turned off.
Me: Yeah, it's been off for about a year now.  I'm using a Wireless N router.
Tech: I turned your router's Wireless back on.  Did that fix things?
Me (thinking): WTF?  The wireless has been off for a year, and why would that affect the FIOS boxes and phones anyway?
Me: Nope.  Maybe you should reset the equipment here?
Tech: Well, let's see if turning the wireless on fixed things.
Me (beating my forehead so hard I made it flat): Um, ok.

Quick VNCViewer Tip

Yes, yes I know this is insecure so shush :)

You might know that vncserver on UNIX boxes pretty much requires you to set a password for logins.  If you're on a secure system or just don't really care, it's pretty easy to write a shell script that logs into such a server without prompting you for a password:

echo "yourpassword here" | vncviewer "your options here" -autopass host:whichever screen you used

Saturday, July 16, 2011

The Case of the Broken OpenGL Extensions

Like a good boy I did my apt-get update / apt-get upgrade the other day and thought everything was ok.  It pulled down some X.Org updates and a new nvidia-current (yes, I'm running Ubuntu with some development PPAs).  Upon rebooting I noticed that compositing in KWin had stopped.  I thought maybe the sequence of updates had messed something up, because one xorg update came in after nvidia-current.

So I ran glxinfo and, lo and behold, I had no GL extensions whatsoever.  Hrm, this is a problem, I thought.  I also ran nvidia-settings to see what it said, and its OpenGL tab also reported nothing.  A quick web check showed no one else reporting problems with the version of the nvidia driver I had.

I then did a quick ldd `which glxinfo` to see what was up, thinking that one of the xorg updates might have sneaked in some Mesa configs.  Sure enough, glxinfo was pulling libGL.so from Mesa instead of the nvidia binary.  I looked in /etc/ld.so.conf.d and found an i386-linux-gnu_GL.conf file that indeed placed Mesa higher than nvidia in ldconfig's path.  A quick rm of i386-linux-gnu_GL.conf and a restart of X fixed things.

The moral of the story is, if you're like me and have the xorg edgers ppa installed and have suddenly lost OpenGL acceleration, check your /etc/ld.so.conf.d dir.

Thursday, July 14, 2011

The most useful GCC options and extensions

Found this post on my RSS feed and thought I'd share.  It's a listing of options for gcc that many people might not be familiar with.

Tuesday, July 5, 2011

Visual Basic / Microsoft Compilers

I came across this article on The Daily WTF, and it reminded me of the many issues I've had with Microsoft compilers.  For example, back in the VB6 days, I remember how variables in a structure would disappear if the database field they referenced was NULL.  This made things oh so much fun.

I also remember how Visual C++ decided to screw with new so it wouldn't throw std::bad_alloc on allocation failure.  I had to put things like foo = new(nothrow) all throughout my code to help keep it cross-platform.  Or having two separate and incompatible iostreams implementations, neither of which was really feature-complete, while I needed to find a way to use both to get things to work.

Ahh the good old days :)

Sunday, July 3, 2011

Fun with Compilers 2: Optimizations

For my second post about compilers, I thought I'd show what happens with various optimization levels in GCC.  Note I'm still focusing on gcc because I'm too lazy to fire up VirtualBox and Windows right now.

As a reminder from the first post, here's my wonderfully useless sample program.  All it does is run through a loop and never uses the result.

int main(int argc, char* argv[])
{
  int i = 0;
  int loop;
 
  for (loop = 0; loop < 50; loop++)
    ++i;
  return 0;
}
So last time the programs were purposely compiled without optimization, as I just wanted quick and dirty assembler output.  Below we'll examine the assembly at different optimization levels for just the main function, since that's the one we're most interested in (there are actually more sections in an ELF executable, but that's a story for another day).


The option flag -O1 turns on the first level of optimizations in gcc/g++.  Instead of copying them here, I'll refer you to the GNU manpage for optimizations for gcc/g++.

gcc -O1 main.c -o main_c1

main():
55                   push   ebp
89 e5                 mov    ebp,esp
b8 32 00 00 00       mov    eax,0x32
83 e8 01             sub    eax,0x1
75 fb                 jne    804839c <main+0x8>
b8 00 00 00 00       mov    eax,0x0
5d                   pop    ebp
c3                   ret  
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop
90                   nop

Compared to the previous post (no optimization), we see here that the assembly is better optimized.  This version does not allocate space for the local variables i and loop, set them to zero, or increment both of them on each pass.  Instead, it loads 50 into the EAX register, subtracts 1, and checks whether the zero flag is set.  If not, it loops back to the subtraction and continues until it reaches zero.  It then tears down its local stack and returns, with the function padded out with nops (again, I'll go over that another day).  As we turn more optimization on, the compiler does more analysis and realizes that it doesn't need to increment the variables, since we're not doing anything with them.  For brevity, note that the assembly output from g++ is identical to the main function from gcc.

gcc -O2 main.c -o main_c2

At -O2, the compiler does even more analysis and has more "smarts" turned on.  Let's look at the assembler output below.

main():
 55                   push   ebp
 31 c0                 xor    eax,eax
 89 e5                 mov    ebp,esp
 5d                   pop    ebp
 c3                   ret  
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop
 90                   nop

Here the compiler realized that nothing is done with the variables and nothing is done with the output, so it doesn't even emit the loop any more.  It still pushes the stack frame pointer.  It moves 0 into EAX (xor eax,eax generates a shorter opcode and is a touch faster than actually moving zero there).  It still sets up the stack frame (main is a function, after all), then restores the previous frame and returns.  Again, with optimization on, the g++ assembly matches the gcc output for main().  gcc/g++ -O3 generates the same output as -O2.

You might ask yourself "Why do the compilers even set up the stack frame for the main function?"  The answer is, they have to.  main() HAS to exist in a C or C++ program, even if it really doesn't do anything.

So have fun, and poke around your programs to see what all gets generated from the compiler.  You'll see that there's a lot more in your program than you realized.  Plus, the Art of Assembly Language is now available in a Linux version if you want a decent reference to learn assembler.

Monday, June 27, 2011

Random Number Generators

This is an article after my own heart.  Due to my modeling and cryptography work, I've spent a lot of time studying random number generators.  The biggest problem with RNGs is that there are two camps: the academics, who freak out if the initial source of entropy isn't 100% random and well thought out, and the practical folks, who focus more on the algorithm being correct and performing well.  You can browse the Linux Kernel Mailing List to see this in action.

Sunday, June 26, 2011

Parallel Memory Subtlety

Found this interesting article in Dr. Dobb's about how memory hardware can impact parallel algorithms.

Sunday, June 19, 2011

First Post: Fun with Compilers

For the first post on my new blog, I thought I'd be random and show what happens when you take the same small piece of code and compile it with both gcc and g++.  You might be surprised that, while identical for the most part, there are some small differences in the assembly output when targeting C versus C++.

The test program was the same for both compilers, except that I saved it as both main.c and main.cpp.  Note that I could have kept a single file, but I was lazy and didn't feel like forcing the compiler's language mode by hand (gcc compiles as C++ when it sees a .cpp extension).


   int main(int argc, char* argv[])
   {
     int i = 0;
     int loop;

     for (loop = 0; loop < 50; loop++)
       ++i;

     return 0;
   }


A quick gcc main.c -o main_c and g++ main.cpp -o main_cpp took care of the output.  After that, I used objdump -DCslx -M intel to fully disassemble the programs.

Ignoring the differences (for now) in the ELF output, we examine the main functions in assembler.

Output using gcc:

8048394 <main>:
main():
 8048394: 55                   push   ebp
 8048395: 89 e5                 mov    ebp,esp
 8048397: 83 ec 10             sub    esp,0x10
 804839a: c7 45 fc 00 00 00 00 mov    DWORD PTR [ebp-0x4],0x0
 80483a1: c7 45 f8 00 00 00 00 mov    DWORD PTR [ebp-0x8],0x0
 80483a8: eb 08                 jmp    80483b2 <main+0x1e>
 80483aa: 83 45 fc 01           add    DWORD PTR [ebp-0x4],0x1
 80483ae: 83 45 f8 01           add    DWORD PTR [ebp-0x8],0x1
 80483b2: 83 7d f8 31           cmp    DWORD PTR [ebp-0x8],0x31
 80483b6: 7e f2                 jle    80483aa <main+0x16>
 80483b8: b8 00 00 00 00       mov    eax,0x0
 80483bd: c9                   leave  
 80483be: c3                   ret    
 80483bf: 90                   nop


Output using g++:
080483f4 <main>:

main():
 80483f4: 55                   push   ebp
 80483f5: 89 e5                 mov    ebp,esp
 80483f7: 83 ec 10             sub    esp,0x10
 80483fa: c7 45 fc 00 00 00 00 mov    DWORD PTR [ebp-0x4],0x0
 8048401: c7 45 f8 00 00 00 00 mov    DWORD PTR [ebp-0x8],0x0
 8048408: eb 08                 jmp    8048412 <main+0x1e>
 804840a: 83 45 fc 01           add    DWORD PTR [ebp-0x4],0x1
 804840e: 83 45 f8 01           add    DWORD PTR [ebp-0x8],0x1
 8048412: 83 7d f8 31           cmp    DWORD PTR [ebp-0x8],0x31
 8048416: 0f 9e c0             setle  al
 8048419: 84 c0                 test   al,al
 804841b: 75 ed                 jne    804840a <main+0x16>
 804841d: b8 00 00 00 00       mov    eax,0x0
 8048422: c9                   leave  
 8048423: c3                   ret    
 8048424: 90                   nop
 8048425: 90                   nop
 8048426: 90                   nop
 8048427: 90                   nop
 8048428: 90                   nop
 8048429: 90                   nop
 804842a: 90                   nop
 804842b: 90                   nop
 804842c: 90                   nop
 804842d: 90                   nop
 804842e: 90                   nop
 804842f: 90                   nop



The first part of main() is the same for both the C- and C++-targeted code.  We'll go through it here to discuss what is happening, for those who are more familiar with higher-level languages than with Intel x86 assembly.  These lines are pretty much standard for any subroutine across different compilers: they set up the new stack frame for the subroutine and save the caller's frame pointer so it can be restored upon exiting.


   push   ebp      Push the base pointer to the stack
   mov    ebp,esp  Base and top of stack pointer equal
   sub    esp,0x10 Allocate space for local variables in our new stack frame.



Now we get to the meat of the program.  If you'll recall, I had two local variables in the program declared as:


     int i = 0;
     int loop;


In the assembler output, both the variables (referenced through their aliases on the stack) are initialized to zero.


  mov    DWORD PTR [ebp-0x4],0x0 - i is explicitly set to zero
  mov    DWORD PTR [ebp-0x8],0x0 - gcc sets loop to zero from the for statement's initializer


We then jump to the comparison, where loop (again referenced off the stack) is compared against the value 49 (0x31 in hex).  This is where we see the difference in output between targeting pure C and C++.  I'll copy the loops from both targets below, since we've scrolled quite a bit from up top ;)

Loop in C target:

  jmp    80483b2 <main+0x1e>
  add    DWORD PTR [ebp-0x4],0x1
  add    DWORD PTR [ebp-0x8],0x1
  cmp    DWORD PTR [ebp-0x8],0x31
  jle    80483aa <main+0x16>

Loop in C++ target:

  jmp    8048412 <main+0x1e>
  add    DWORD PTR [ebp-0x4],0x1
  add    DWORD PTR [ebp-0x8],0x1
  cmp    DWORD PTR [ebp-0x8],0x31
  setle  al
  test   al,al
  jne    804840a <main+0x16>


Now, for the most part they're identical except for how we test whether to break out of the loop.  In the C version, gcc uses the jle (jump if less than or equal) instruction after the comparison (cmp) to go back up and add one to i and loop.  In the C++ version, we first use setle (set if less than or equal) to set the al register to 1 if the comparison matched.  We then test whether al is nonzero and use jne to jump back up in the function to do the additions.  Once loop passes 49, we put 0 into the EAX register and exit the routine.

Until I go into WHY there's this slight difference in output, for now let's just say that gcc and g++ use some similar and some different code paths when generating machine code.  Have fun, and Happy Father's Day everyone.