Image processing en-masse…

From time to time, I receive a CD-ROM filled with images from one client or another with the idea that I can just slap them up on the web. Sometimes it is just that simple, and I love the clients who make it so. But sometimes a little more work is needed. One client I work with frequently has an extensive background in print marketing, and routinely sends me disks full of TIFF images. It’s times like this when knowing the right tool for the job can save you hours of work.

Today’s tool of choice is ImageMagick, the Swiss Army knife of image processing tools. If you have never used this tool, I recommend you download it and get familiar with it. It makes short work of a lot of the heavy lifting associated with ensuring that images are web-ready. It is a simple one-liner for me to convert a directory of TIFF files to web-ready JPEGs:

mogrify -format jpg -path _forUpload *.tif

It takes all the *.tif files, converts them to JPEG format, and writes them into the given sub-directory (_forUpload, which must already exist), retaining the base filenames. If I need to resize them to 1280 pixels in width while retaining the aspect ratio, it’s just about the same (quote the geometry as "1280>" instead if images narrower than that should be left alone rather than enlarged):

mogrify -format jpg -path _forUpload -resize 1280 *.tif

Many more examples of how to use the various command-line utilities to do anything from simple format changes to filtering and serious editing can be found on the ImageMagick website.

It’s a tool I’ve used in the past but, for some reason, had since forgotten about. Starting work this morning with the task of converting a couple of hundred files for upload to a client’s site sent me running to look for tools (I could have written a script to do it in Photoshop, I guess), and seeing ImageMagick pop up in the Google search was just one of those “Duh! I’m a moron!” moments. I was happy to see a Windows version for the x64 architecture as well. The install was about as simple as it gets, and within minutes I was happily crunching images.

I recommend that all serious web workers keep this one in their toolbox.

Now if I could just find a tool to extract images and text from a folder full of PDF documents…any suggestions out there?

Why I will Never Buy HP again!

I’m sorry HP, but I will never again buy a product from you.  Ever!

Now I am sure that someone, somewhere will ask the obvious question… why?

Well I can understand a business needs to make a profit, but I am sick to death of your ridiculously short time to obsolescence for all of your hardware products.

I bought a laptop from you about 2 years ago. I needed one on short notice, and yours seemed a good balance of price and performance. Not to mention I have a fetish for blue LED indicators and yours had plenty. I knew I would be upgrading to Vista and yours had a ‘Vista Ready’ sticker on it (remember when those were all the rage?) and a 64 bit Athlon processor in it. So I thought ‘Outstanding, this will be perfect for work!’. I bought it, and a couple of weeks later bought a copy of Vista Ultimate to install. I whipped out that 64 bit version DVD and installed it. No problems, it installed, and booted up. No sound, but hey, I need to grab the drivers. I check your support site, and no dice. Okay, it’s only been a couple of months, I can suffer with 32 bit for a month or two until the drivers are out. That was two years ago, and even today, over 2 years after Vista hit the open market, there are still no 64 bit drivers for the audio components on that laptop. I’m sorry, but that kinda tells me that it wasn’t Vista ready folks, sorry to burst your bubble. The cooling fan began to make a horrendous noise recently, so I finally gave up on it, and purchased a laptop from one of your competitors that came with a 64 bit processor AND a 64 bit OS that has all the drivers it needs.

About the same time I bought the laptop, I purchased one of your OfficeJet business printers. I am a tele-commuting programmer and the $25 home printer was just not holding up well to some of my printing needs. This printer served me well. It was attached for quite a while to a computer with XP on it, simply because that’s what the desktop had on it. I recently moved it to a wireless print server, and started to install drivers, and guess what I found? Again, no drivers for Vista. Why, after over 2 years, do you still not have a “full featured” printer driver for Vista available for my OfficeJet printer? For that matter, Vista was out to companies like you to be able to develop and test drivers even before it was out to the general public. So we’re talking at least 6 months longer. It’s not like I bought the $29 printer from Walmart. I went out and purchased a business class printer, one that can stand up to the punishment I put it through, yet you refuse to support it? You’re still selling me ink cartridges at about $60 a pop for a full set. So it’s not like you’ve told the world that particular model is dead. Why can’t I get a driver set that works flawlessly with Vista, rather than having to deal with the cut down basic printing services that I get natively? It’s not like you haven’t had time to get the drivers put together.

Furthermore, I have a cheap ($40) photo printer I bought a while back as well, made by one of your competitors, and guess what? It has drivers for Vista, and 64 bit ones at that. Seems like they want my business a lot more than you do. They provide a lower cost, higher quality product, and they damn well stand behind it.

Now I’m sure that I am going to hear the “But you’re an early adopter…” refrain from the chorus. Sorry folks, but after over 2 years, it’s no longer early adoption. 64 bit is here, and it’s here to stay. Any company that refuses to provide support to the x64 crowd is going to go under, or at least it will never get my business again.

Sorry HP, despite your advertisements, there is nothing personal about your support of your customers.  But for me, your lack of support is personal, and I refuse to put up with it any more.  Good-bye.

Getting up to speed with SVN on Windows

I’ve been saying it for a couple of years now. “I’ve got to get all my projects under source control!” We all know it’s the right thing to do. The smart thing. But for some reason it never got done, or got partially done only to fall apart later because of the half measures taken. Well last night I finally decided to sit down and kick myself into doing it. Surprisingly, it was much easier than I feared.

I’ve had a WAMP stack on my laptop for quite some time, for all of my personal projects, and for work projects if I need to be offline from the office. My personal favorite is XAMPP, because of the ability to drop it on a USB stick if I need to and go.

I’ve also had SVN installed for quite some time. I use the package with the Apache 2.2 bindings located at Tigris.org (they also have an Apache 2.0 version available).

It was as simple as going into the file c:\xampp\apache\conf\httpd.conf and adding a couple of lines to get things up.

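# Load WebDAV support, then the Subversion module that builds on it
LoadModule dav_module modules/mod_dav.so
LoadModule dav_svn_module "C:/Program Files/svn/bin/mod_dav_svn.so"

# Serve every repository found under C:/svn_repository at /repos/<name>
<Location /repos>
	DAV svn
	SVNParentPath C:/svn_repository
</Location>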

Basically this loads the WebDAV module (mod_dav) and the SVN extension to it (mod_dav_svn). Then it defines the location /repos to be the root of my Subversion repositories. The SVNParentPath directive tells SVN to look in C:\svn_repository for a directory that matches the name of the repository you are trying to access.

Example:
I build a repository Foo at C:\svn_repository\foo. I access it by visiting http://localhost/repos/foo

Once this was up and running I installed TortoiseSVN (http://tortoisesvn.tigris.org/). This makes it simple to create repositories and manage working directories by integrating the SVN functions with the Windows Explorer context menu.

Finally, I use Eclipse to do my development, so I installed Subclipse to add SVN integration to Eclipse.

Now came the most tedious portion. I had SVN installed, the server configured, and the management tools in place. Now it’s time to import some projects.

Going to the C:\svn_repository directory I create a new folder {project-name} for the project I am going to import. I right-click on the folder, select the TortoiseSVN sub-menu and click on Create Repository Here. I select Native File System and click on OK. Congrats, your repository is waiting. You can confirm this by visiting http://localhost/repos/{project-name} . You should see revision 0 and an empty directory.

I use the recommended repository structure from the book Version Control with Subversion. I create a temporary directory and within it I create a directory named for the repository. Within that directory I create the tags, branches and trunk directories and copy the project files into the trunk directory. I back out to the temporary directory, and right-click on the project directory. From the context menu I select the TortoiseSVN sub-menu and click on Import. You will be asked what repository you would like to import to. Enter http://localhost/repos/{project-name} , where project-name is the name of the folder in which you created the repository above. Enter an initial comment to describe what you are doing, and maybe what the project is. Click on OK and sit back and watch as your project is imported.

Congratulations, you are now under revision control!

PREG_REPLACE_EVAL

This afternoon, while trying to come up with a solution for a problem on a client’s site, I had one of those “AHA!” moments. I’ve been working with PCRE (Perl Compatible Regular Expressions) just about as long as I have been using PHP, but I never really looked into the docs until today. I discovered the ‘e’ expression modifier for use with preg_replace.

The problem is fairly simple. I cannot trust the users to input names and headlines with a consistent capitalization at times. So normally I trust the simple text handling methods in PHP and do something like:

echo ucwords(strtolower($name));

Now for most cases this will work like a charm, until you come across a hyphenated last name, like:

Kathy Jones-Smith

You end up with:

Kathy Jones-smith

Not real pretty, and when Mrs. Jones-Smith sees her name on the site like that, she can get a little irate (Do you blame her?). So I started looking into ways to resolve this. I stumbled across the example on the preg_replace documentation page that shows the use of the ‘e’ modifier to capitalize all HTML tags on a page. The light popped on and I replaced my code above with:

echo preg_replace("/(^|\s|-)([a-z])/e","'\\1'.strtoupper('\\2')",$name);

To explain it, the expression looks for any lowercase letter immediately following one of: the beginning of the string, a whitespace character or a hyphen. It captures the preceding character and the lowercase letter into two numbered capture groups. It passes them into the replacement string and then eval()’s the string as PHP code, allowing the strtoupper() function to do its work.
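For anyone reading this on a newer PHP: the ‘e’ modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the one-liner above will not run there. A minimal sketch of the same idea using preg_replace_callback() follows. The function name properCaseName() is my own invention for illustration, and unlike the one-liner it lowercases the input first so that all-caps input is normalized too.

```php
<?php
// Sketch: capitalise the first letter of each word, treating the start of
// the string, whitespace and hyphens as word boundaries.
// properCaseName() is a hypothetical helper name, not part of PHP.
function properCaseName(string $name): string
{
    return preg_replace_callback(
        '/(^|\s|-)([a-z])/',
        function (array $m): string {
            // $m[1] is the boundary, $m[2] is the lowercase letter after it
            return $m[1] . strtoupper($m[2]);
        },
        strtolower($name)
    );
}

echo properCaseName('kathy JONES-smith'), "\n"; // prints "Kathy Jones-Smith"
```

The callback receives the same two capture groups as the original replacement string, only as array elements instead of \\1 and \\2, and nothing gets passed through eval().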

Regular expressions for the win yet again.

Bad Information

I get really frustrated when I see bad information given out. On any topic, it doesn’t really matter, if I know it’s false, I hate to hear it get perpetuated. Today has been no different, except it has really torqued me because the subject is information security.

I read in another blog that the best way to prevent POST requests to your site from originating elsewhere is to review the value stored in $_SERVER['HTTP_REFERER'], and make sure that it matches the domain of the site itself before processing the POST request. Yes folks, it’s that simple. Or is it?

The last time I checked the PHP Language Documentation I recall it saying that this value was not to be trusted. In fact it still says just this, here is the relevant text from the PHP web-site:

The address of the page (if any) which referred the user agent to the current page. This is set by the user agent. Not all user agents will set this, and some provide the ability to modify HTTP_REFERER as a feature. In short, it cannot really be trusted.

Hmm… so it seems that this is a bad idea all around. All I would need to do as an attacker would be to crank out a script using cURL, set the CURLOPT_REFERER option to my target’s web-site, and BANG, I’m golden, happily cranking away at POST requests to his contact book form and filling it with Viagra spam.

Well, if this is a bad idea, what is a good way to prevent this?

Well, my first thought is to take advantage of PHP sessions to do this.

In the source form file, initiate a session, and store a secret value in the session data store. In the processing script, check this value for validity, clear it out (to prevent simply hijacking the session to send repeated posts), and then process the input only if valid. Furthermore, I would add abuse checking to prevent repeated attempts at submitting the form: a simple counter variable, again stored in the session data store, or an array of time-stamps, with a threshold check to prevent more than X submissions in Y seconds for a given session, before the IP address is added to a block list and denied access to the processing form logic at all.
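As a rough sketch of that approach (not necessarily how any particular site does it), the token and the time-stamp check might look like the following. The function names and the 'form_token' and 'post_times' keys are made up for illustration, and $store stands in for $_SESSION so the logic can be run on its own. The IP block list is left out.

```php
<?php
// Sketch only: $store stands in for $_SESSION (call session_start() first
// and pass $_SESSION in a real page). All function names are hypothetical.

// Form page: create a one-time secret, keep it in the session, and return
// it so it can be written into a hidden form field.
function issueFormToken(array &$store): string
{
    $store['form_token'] = bin2hex(random_bytes(16));
    return $store['form_token'];
}

// Processing script: the token is cleared before it is compared, so a
// replayed POST in the same session fails.
function consumeFormToken(array &$store, string $submitted): bool
{
    $expected = $store['form_token'] ?? '';
    unset($store['form_token']);
    return $expected !== '' && hash_equals($expected, $submitted);
}

// Abuse check: allow at most $max submissions per $window seconds.
function withinRateLimit(array &$store, int $now, int $max, int $window): bool
{
    $recent = array_filter(
        $store['post_times'] ?? [],
        function (int $t) use ($now, $window): bool {
            return $t > $now - $window;
        }
    );
    $recent[] = $now;
    $store['post_times'] = array_values($recent);
    return count($recent) <= $max;
}
```

In a real page the form script would call issueFormToken($_SESSION) and echo the result into a hidden input, and the processing script would only touch the POST data when both consumeFormToken($_SESSION, $_POST['token']) and withinRateLimit($_SESSION, time(), X, Y) return true. The 'token' field name is likewise just an assumption.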

Now this won’t prevent a determined attacker, nor will it help you if someone just decides to use their pet bot-net to flood the site with POST requests simply to create a Denial of Service situation.  But it should put a crimp in the operations of your basic comment spammer who simply wants to sell the world on the benefits of cheap, questionable-quality, pharmaceuticals.


Is there such a thing as too much?

Recent (okay not so recent) developments in the area of web development have led a lot of designers to hop on the latest and greatest and slam WAY too much 2.0 into their sites.  Sadly, I have to admit that I am guilty as well.  Some points to remember as you are AJAX-ing and flashing your site to death:

1)  It’s only a good thing if it’s useful to the user and his or her end experience. If you are just adding flashy elements because you can, stop. Go back to some solid HTML or dynamically generated content with your favorite coding language.

2)  Search engines are NOT, repeat NOT, going to be indexing your super cool AJAX and, unless you’ve coded it properly, Flash elements. Rather than try to explain the specifics around this, know that Google is your friend. Spend some time researching.

That being said, there are times when adding some Flash elements or some nice AJAX to a page is certainly called for, but be careful to pay attention to the two points above. I personally love a good AJAX login, voting system, or modal box/light box element shoved in when called for.

Expect the unexpected

I have seen many examples lately of “newbie” help posts where the code given, though technically correct, does not suit a wide range of situations. The most recent example of this I found on DZone’s PHP Zone. I read the post, and couldn’t help but comment on the limited view that was embraced by the original poster. Yes, checking for port 443 will indeed work to determine if the incoming request is SSL encrypted, but only provided your server is using the standard ports. I work with a situation where, when we have a client site with an installed SSL certificate, we place our beta server on a non-standard port with the same domain name as the live site. This allows us to ensure that there are no issues with the certificate while not having to purchase, or bill our clients for, an additional certificate. For this situation we use PHP’s built-in support for detecting HTTPS communication.

// $secure_port: the non-standard HTTPS port, which you must supply
if (empty($_SERVER['HTTPS']) || $_SERVER['HTTPS'] === 'off') {
    // HTTP_HOST may already carry the plain-HTTP port, so strip it first
    $host = preg_replace('/:\d+$/', '', $_SERVER['HTTP_HOST']);
    // redirect here to the proper hostname, port number and page
    header("Location: https://{$host}:{$secure_port}{$_SERVER['REQUEST_URI']}");
    exit();
}

This code will redirect your non-HTTPS communication to HTTPS when using non-standard ports. You will need to supply the $secure_port variable to ensure that the redirection goes to the proper target. Note that the Location value has to be a single line: a line break inside the string ends up inside the header and breaks the redirect.
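If you want to exercise this logic without a web server, one option is to pull the decision into a small function that takes the server array as a parameter. This is only a sketch: secureRedirectUrl() is a name I made up, and the port 8443 in the usage comment is an arbitrary example value.

```php
<?php
// secureRedirectUrl() is a hypothetical helper; $server stands in for $_SERVER.
// Returns the HTTPS URL to redirect to, or null if the request is already secure.
function secureRedirectUrl(array $server, int $securePort): ?string
{
    $https = $server['HTTPS'] ?? '';
    if ($https !== '' && strtolower($https) !== 'off') {
        return null;
    }
    // Drop any port the client sent so the HTTPS port can be appended cleanly
    $host = preg_replace('/:\d+$/', '', $server['HTTP_HOST']);
    // Leave the port out of the URL when it is the HTTPS default
    $port = ($securePort === 443) ? '' : ':' . $securePort;
    return 'https://' . $host . $port . $server['REQUEST_URI'];
}

// Usage inside a page (8443 is an assumed example port):
// $url = secureRedirectUrl($_SERVER, 8443);
// if ($url !== null) { header('Location: ' . $url); exit(); }
```

Because the function never touches $_SERVER or header() directly, it can be fed hand-built arrays to confirm what it does for standard and non-standard ports.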

Zend Framework

I’ve finally been working with the Zend Framework for the last week or so, doing some tutorials to get the basics, and I am now starting to build a website based on it. I’m pretty impressed with what I have seen so far. I plan on writing a series of posts around my experiences.

Welcome friends and foes alike

Recently, I stumbled across Oliver Steele’s site and found his link to The Programmer’s Food Pyramid. Looking it over, I recognized the importance of most of the items there. Reading code, and reading about code of course. Writing code, how obvious. Revising code, okay, I had always lumped that one into the reading code and writing code blocks, but I could see how it could be considered a separate activity. Then, up there at the top, the one that made me think for a minute.

Writing about code.

I had never thought about that one before. But in seeing it there, it makes complete sense. It’s something I should have realized much earlier. It’s something I have always done when reading code. When I find a particularly dense piece of code, something that is far from being intuitive, how did I work it out, and understand it? I would go through it, line-by-line, instruction-by-instruction, and write out what it was doing, in plain English. I was writing about code, while reading it. But it was never a consistent thing, a tool only reserved for special occasions.

This blog is my attempt to change all that. If writing about code once in a great while has helped me in the past, what about writing about it far more often? Once or twice a week? Find some piece of code and analyze it. Tear it down, put it back together, and explain how I think it could be improved in the process. I have to think, it can’t hurt.

Though the blog title contains PHP, it is only one of the languages I plan on writing about here. I’m the type who is always trying something new, so I plan on using this to write about everything I try. So expect that you might see some Java, C, C++, C#, Groovy, Ruby, and who knows what else.

Maybe my musings here will eventually help me become a better programmer, but even better, it might help someone else become a better programmer as well. Feel free to comment, criticize and debate. It can’t hurt, and it might just help.

Musings From the World of PHP