Showing posts with label Ubuntu. Show all posts

Tuesday, June 10, 2008

Consolidation

I'm sitting here this evening with a MacBook that has slowed to a crawl. It further delays my inevitable completion of a school project (fine with me), but I really just want to finish what I'm doing (not the school project) and go to bed.

It's not the MacBook. It's the parasite I installed on it.

For the past month, I've been juggling the Vista desktop I use at home for development, the Vista notebook I use for school, the Windows XP tablet that I have for my day job, and this wonderful MacBook that doesn't have many applications I actually use to produce things. It's cool to blog with it, chat, and play with the camera, but it's really just eye candy. I can't do schoolwork with it (they require Office 2007 documents), I can't find an FTP program for it, and I really can't figure out how to edit raw text - a very important feature I need for editing HTML and programming.

To combat my two-computer dining room table, I installed Windows (the aforementioned parasite) on the MacBook using VMware Fusion. It's a wonderful piece of software and it is very similar to Parallels, only cheaper ($40 vs. $80). I installed the trial of Fusion and an old copy of Windows XP. It has effectively slowed my MacBook to the point where it takes a full eight seconds to open a new tab in Firefox. It's working really hard right now on installing SP3, and I'm sure a slew of updates is in store after that is finished. It has Office 2007, and I shouldn't need much more to do everything that I need to do on this beautiful 13-inch MacBook.

The cool thing is that if I ever get really sick of the reduced speed, I can close the Virtual Machine and Windows goes away like a little troll in the closet. I feel powerful.

Okay, I know it's slow because this computer has only 1 GB of RAM and two operating systems running on it. A fix (4 GB) is on the way. After the updates and the memory upgrade I should have no problem. I might even install Ubuntu on another VM.

Must go now; I have to write a post to tell everyone that the Web Spider project is not dead - I'm just busy.

Friday, May 23, 2008

How to Write a Search Engine

It seems a bit strange using the world's best search engine to find out how to build your own. Google is my first resource in this project, though Google itself provides nothing but the idea. There is a paper from Stanford by Larry Page and Sergey Brin, and that is basically the starting point. That is Google's only contribution so far, aside from the many searches I will perform.

There are three main parts to the search engine: the crawler, which tirelessly captures data from the web; the database, which holds everything; and the actual search engine - the queries that put the data together in a meaningful format for you.

I could write a search engine that actually crawls the web looking for my search criteria each time, but that is very VERY inefficient. Google (and many others) solved this inefficiency by effectively downloading the Web (that's right - as much of it as they can) to their own computers, so they can search it much faster and have it available in one place. They've done a whole lot more to increase the efficiency and effectiveness of searches, but downloading the web was the first thing they did. It turns out they needed a lot of computers.
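To make the three parts concrete, here is a toy sketch in Python. It is only an illustration of the idea, not the project's code: the "web" is a hypothetical three-page dictionary (FAKE_WEB), the "database" is an in-memory word-to-URL map, and the function names are made up for this example. The point is that the crawl happens once, up front, and every search afterward only touches the local index.

```python
from collections import defaultdict, deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
FAKE_WEB = {
    "a.example/": ("search engines crawl the web", ["b.example/", "c.example/"]),
    "b.example/": ("a crawler downloads pages", ["a.example/"]),
    "c.example/": ("the database holds the pages", []),
}

def crawl(start, fetch):
    """Part 1, the crawler: follow links breadth-first, return {url: text}."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        pages[url] = text
        for link in links:
            if link not in seen:  # never fetch the same page twice
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Part 2, the database: map each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Part 3, the query: return URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

pages = crawl("a.example/", FAKE_WEB.__getitem__)
index = build_index(pages)
print(sorted(search(index, "the pages")))
```

A real crawler would swap the dictionary lookup for an HTTP fetch and the in-memory map for a database on its own machine, but the division of labor stays the same.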

I'm going to start with two. I have three desktops that no one wants to buy, and I am really tired of looking at them. I will probably need more if I get this index working soon, but there will be software considerations to make too. You can't fit the web on one computer, no matter how big. I will learn a lot.

I have always had an interest in distributed systems and cluster computing, so this will be fun. I have a lot to learn about distributed databases and algorithm analysis. But all that is later - I haven't even really finished thinking out the preliminaries yet. So one development/crawling machine, and one database machine. After I figure out how to crawl the web, I will begin work on performing searches. If this project holds my interest long enough, I might publish statistics at 49times.com, so keep looking. I will be posting here if I come up with anything worth publishing. I'm going to try to journal my progress and decisions without publishing code, but I realize that I very well could lose interest in this. If I get started, I will likely enjoy it and keep going, but no one can say. If you have some confidence that I will continue, you can subscribe to this blog and get the updates. Beware, though, that you'll get everything else I write too.