Dokan SSHFS

New job, unique challenges.

Today I found an easier way to work on a remote codebase.  On Linux you can mount a remote filesystem using SSHFS.  It turns out you can do roughly the same thing on Windows with Dokan SSHFS: map a remote filesystem to a drive letter, and it opens an SSH tunnel and uses SFTP behind the scenes to treat the remote system like a mapped network drive.
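
For comparison, here’s the Linux version as a one-liner (assuming the sshfs package is installed; the host and paths are placeholders):

sshfs user@remotehost:/path/to/code ~/remote-code

fusermount -u ~/remote-code    # unmount when done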

Download the Dokan library, unzip, and run the installer.

Then do the same with Dokan SSHFS.

Run the DokanSSHFS executable, specify the remote host, username, and password (or key), and pick your Windows drive letter.  Voila!

PHPT

PHPT is a test format designed for testing internal PHP functions.  But you can use it to test your own functions as well.

Essentially, the PHPT format consists of a few blocks of text marked by the following delimiters:

  • --TEST--
  • --FILE--
  • --EXPECT--

Here’s an example:

# IncrementTest.phpt
--TEST--
increment() function - basic test for increment()

--FILE--
<?php
   $integer = increment(1);
   $string = increment("one");

   var_dump($integer);
   var_dump($string);
?>
--EXPECT--
int(2)
string(3) "two"
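
Note that for this test to pass, the --FILE-- section would also need access to an increment() implementation (via include, for example).  Here’s a minimal, purely hypothetical sketch of such a function:

<?php
# increment.php -- hypothetical function the test above exercises
function increment($value) {
    if (is_int($value)) {
        return $value + 1;
    }
    # toy word-based increment, just enough for the example
    $words = array("one", "two", "three");
    $pos = array_search($value, $words);
    return ($pos === false) ? false : $words[$pos + 1];
}
?>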

PHPT scripts are executed by “run-tests.php”, which you can invoke through PEAR:

pear run-tests IncrementTest.phpt

Running 1 tests
FAIL increment() function - basic test for increment()[IncrementTest.phpt]
wrote log to "C:\dev\eclipse\workspaces\aaron\MyTestClass\t\run-tests.log"
TOTAL TIME: 00:03
0 PASSED TESTS
0 SKIPPED TESTS
1 FAILED TESTS:
IncrementTest.phpt

You can see why it failed in the .diff file (a diff of the .exp and .out files).  All the generated files are:

IncrementTest.diff -- this is a diff of the expected and actual output
IncrementTest.exp -- this is the expected output
IncrementTest.log -- this is the same test output sent to stdout (see above)
IncrementTest.out -- this is the actual output
IncrementTest.php -- this is the --FILE-- section of your test 
IncrementTest.phpt -- this is your original test
run-tests.log -- this is the summary of your test results (# of tests passed/failed/skipped)

PHPT test stubs can be autogenerated with generate-phpt (http://qa.php.net/autogenerate.php):

php generate-phpt.php -f increment


Here’s a good introductory tutorial for PHPT: http://qa.php.net/write-test.php

Here’s the detailed PHPT spec: http://qa.php.net/phpt_details.php

POST – Plain Old SQL Tables

There was a movement in Java over the past several years to get away from complex inheritance and the implementation of byzantine interfaces; specifically, the complexity of EJBs, where you had to implement at least six methods that you, personally, likely never cared about.

The idea was to separate out the details of persistence, transactions, and other logic irrelevant to the business domain.  The movement called for POJOs, or Plain Old Java Objects.  Whether that succeeded is a matter for another debate, but in a language that generally calls for you to implement clone(), Serializable, toString(), GUIDs, etc., I’ll let you decide.

Many developers have moved away from Java (or, in the newer generation, never taken it up) in favor of more agile, dynamic languages such as Ruby and Python, which are still object-oriented but dynamically typed, interpreted, and more concise.

Frameworks like Rails have led the way in making web development much easier and more enjoyable.  But unfortunately, the number of frameworks (most inspired by, if not directly copied from, Rails) has multiplied exceedingly.  Sadly, many of these frameworks require specific formatting of database tables.  Often they dismiss the importance of SQL and tend to consider data storage a mere “implementation detail”.  Some even make it difficult or impossible for other applications to access the data.

That completely misses the point of a database.  If you are confident that your application will be the only one using the data, by all means store it however you like: as serialized objects, in a key-value store, in flat files, or cached as much as you can.  But the reason you probably chose a database as your “implementation detail” is that you had some inkling you’d like to share this data at some point.

Java probably started it.  ORMs like Hibernate wanted to write your SQL queries for you.  The people who wanted this either didn’t like typing “SELECT * FROM FOO” or just didn’t know how.  Unfortunately, this type of person probably didn’t know anything else about databases either, and so tended to want to keep everything in one big ugly table.  The true benefit of ORM wasn’t really in the SQL generation anyway; mapping tables to objects can be tedious, but it isn’t complex.  What really made ORMs catch on was the built-in connection pooling and, later, recordset caching.  All of a sudden, you could keep your recordset in memory as long as it was valid, or at least as long as you had spare memory.  And even when you needed a change, your database connection was already open.  Performance soared, and even the hypothetical database-agnostic application became possible.

But at the cost of the database.  In order to keep track of how “fresh” data was, you had to store its state somewhere, and what better place than right there in the database itself!  And thus once-clean (hypothetically) tables got polluted with all sorts of metadata that wasn’t what you wanted to store at all; it was merely an “implementation detail” of your ORM and caching mechanism.

Agile, dynamic, lightweight systems followed this lead.  Since they were truly lightweight frameworks in dynamic languages, they needed to store metadata even more than a complex statically typed language like Java.  And because they could only count on the least common denominator of compatibility between databases, more metadata was needed to track abstractions that any one database could have taken for granted.

Now a table called Employees has more than your name, salary, and manager (and maybe a sequential id for a primary key); it has things like “created_by” and “date_last_modified”, and maybe application-specific information like “is_active” or “version”.

What I’m calling for are POSTs, or Plain Old SQL Tables.  Let’s get rid of as much application implementation cruft from our domain tables as possible.

I don’t think we can get rid of ids, but I do think we can get access control, caching, and versioning info out of the main tables.  I realize that denormalization can sometimes be a good thing for performance, but this is not the time to optimize for performance.

I spent some time yesterday thinking about how I could move historical data out of a table, so that I can do a SELECT * FROM EMPLOYEES and get only the relevant information.

Maybe the answer is that SELECT * isn’t the answer, but I think we can do better: make our tables more readable, and find ways to improve performance later.

I came up with a generic CHANGES table that keeps track of the last_modified for all tables that want it, with associated join tables such as EMPLOYEES_CHANGES that map to it.  There could then also be an EMPLOYEES_REVISIONS table that stores previous revisions if needed; it essentially mirrors EMPLOYEES, but has a REVISION_ID that can be used as a foreign key by CHANGES.

+------------+
| Employees  |
+------------+
| id         |
| name       |
| salary     |
| manager_id |
+------------+

+-------------------+
| employees_changes |
+-------------------+
| employee_id       |
| change_id         |
+-------------------+

+-------------+
| changes     |
+-------------+
| id          |
| timestamp   |
| editor_id   |
| revision_id |
+-------------+

+---------------------+
| employees_revisions |
+---------------------+
| id                  |
| employee_id         |
| name                |
| salary              |
| manager_id          |
+---------------------+
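
In MySQL-flavored DDL, the same design might look something like this (column types are guesses for illustration):

create table employees (
    id         int primary key,
    name       varchar(100),
    salary     decimal(10,2),
    manager_id int               -- references employees(id)
);

create table changes (
    id          int primary key,
    timestamp   datetime,
    editor_id   int,
    revision_id int              -- points at a row in employees_revisions
);

create table employees_changes (
    employee_id int,             -- references employees(id)
    change_id   int              -- references changes(id)
);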

Of course, all change information could be stored in the REVISIONS table, but that is another pollution, and separating interesting change information from the bulk of the data is also a performance improvement.  The downside is that getting commonly used change information like “last_modified” now requires a somewhat complex and costly join.  Perhaps there is a way to improve database optimization for such joins, or the caching used by ORMs could be used to keep track of it.

select * from changes where timestamp =
  (select max(timestamp) from changes where id in
      (select change_id from employees_changes where employee_id = 1));

That’s not the best query, but it illustrates the process.
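
A join-based version reads more naturally and gives the optimizer more to work with (LIMIT is MySQL-flavored; adjust for your database):

select c.*
  from changes c
  join employees_changes ec on ec.change_id = c.id
 where ec.employee_id = 1
 order by c.timestamp desc
 limit 1;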

I’m not saying this is a good implementation, just that the concept of keeping your database tables clean is probably at least as worthwhile as keeping your objects clean.

Daemons at the Seattle PHP Meetup

Last night I attended the Seattle PHP Meetup at Office Nomads on Capitol Hill for the first time.  It took a bit of work to find the place, and a bit more to find parking, so I got there a little late, just in time to be the last to introduce myself.

There was quite a turnout, maybe 15-20 people, and after introductions there was a round of announcements from people looking for work or with positions that need filling.

Then there was a presentation.  The group split at this point: one part went off for a more introductory PHP discussion, and the rest stayed for the presentation, which this week was on ‘daemons’, or long-running processes.

A daemon usually runs on a server, and many daemons are in fact servers; Apache, for instance, typically runs as a daemon.  It runs in an infinite loop, typically listening for events that trigger some amount of processing (such as serving web requests).  What makes it a daemon is that it isn’t typically started or stopped as a user process.  It may start up at boot time or via cron, and it runs until killed (hopefully on purpose).  It listens for signals from the system for this.  On Unix (and Linux, of course) these are SIGHUP, SIGINT, SIGTERM, etc., which (along with SIGKILL) are varying ways to tell the process to end.

SIGHUP notifies the process that its controlling terminal has “HUNG UP”; it was traditionally used to tell a process that a modem connection had closed.  It is not explicitly a “kill” command, but it is often used to let a process know its services are no longer needed.

SIGINT tells the process to “INTERRUPT” what it is doing; by default it terminates the process.  CTRL+C sends a SIGINT, for instance.

SIGTERM tells a process to terminate.  It can be “trapped” (handled) by the process so it can do whatever cleanup is necessary before exiting.

SIGKILL is also known as “SIGNAL 9 FROM OUTER SPACE” or “DIE DIE DIE!”  It’s what happens when you do kill -9, and it cannot be trapped.  While it’s sometimes very necessary, operating systems are getting less respectful of users wanting (and having the authority) to kill their own processes.  But that’s another discussion.

These signals are POSIX mechanisms, and so may not be supported equally on all systems (because it’s a standard, of course).  Anyway, back to the presentation.

I didn’t get the presenter’s name, but I think he works at Big Fish Games, where they apparently use PHP processes as daemons sometimes.  He presented a utility class called util_daemon, designed to be included in a daemon to help with signal handling.  It also has PID file handling built in.  It is available at http://isnoop.net/pub/daemon.phps

(As a side note, he mentioned something I didn’t know: the Apache PHP module has a source formatter that will display syntax-highlighted PHP files with the .phps extension.  That’s a handy tidbit.)

Essentially, if you’re writing a daemon, you’d include util_daemon.php and use its methods to start, stop, and handle signals.  For instance:

<?php
include 'util_daemon.php';

$processName = "mydaemon";
$timeoutSeconds = 30;

$util = new util_daemon($processName, $timeoutSeconds);
$running = $util->start();

while ($running) {               # loop until a signal tells us to stop
  # ... do the daemon's actual work here ...
  $running = $util->heartbeat(); # false once the kill flag is set
}

# clean up after myself
$util->stop();
?>

start() returns true if it is able to register the PID.  It will fail, for instance, if another process has a lock on the PID file.  You can have more than one instance of the same daemon running by specifying the optional $maxPeers argument to the constructor.  start() also registers listeners for the system signals using PHP’s built-in pcntl_signal() function, specifying that the signal_handler() function be called when a signal is received.  Finally, it performs the first heartbeat() check and returns the result.

heartbeat() returns true or false based on whether it can read the PID file.  If it cannot obtain a lock on the file, it returns false, and something is wrong.

When a signal is received (the normal case), the handler sets the killFlag, which tells our process that it is time to end.  Obviously, if you want more complex handling, you would override signal_handler() and handle each signal separately.
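
For illustration, per-signal handling with PHP’s pcntl extension might look like this (standard pcntl usage, not util_daemon’s actual internals):

<?php
# declare(ticks = 1) is required for signal delivery in older PHP versions
declare(ticks = 1);

function my_signal_handler($signo) {
    global $killFlag;
    switch ($signo) {
        case SIGTERM:
        case SIGINT:
            $killFlag = true;   # finish the current iteration, then exit
            break;
        case SIGHUP:
            # reload configuration instead of dying, for example
            break;
    }
}

pcntl_signal(SIGTERM, 'my_signal_handler');
pcntl_signal(SIGINT, 'my_signal_handler');
pcntl_signal(SIGHUP, 'my_signal_handler');
?>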

stop() empties the PID file and unregisters the process.  Our daemon can now exit cleanly.

One question I have is whether it deletes the PID file, or what happens to it.

There was discussion about what daemons could be used for; he mentioned they are typically used for batch processing.  He mentioned having one run via cron, which confused me, because running something via cron is usually a substitute for a daemon.  I think I eventually understood: there is a cron job that monitors whether the daemon is running and respawns it if necessary.

One example I gave of using a daemon is sending emails.  Your application could log (to a database, for instance) that user X placed an order, and your daemon could periodically check the table (or get direct messages from the webserver) and send mails out of process.  This improves response time by moving email handling out of the request sequence, so it doesn’t delay the response.

It was brought up that this could also be used to throttle sending emails (to ensure you don’t get flagged for spam or exceed your allotted usage, for instance).  Another advantage is that if your request process dies, or sending the email fails, you can try again later.  The generalization was that moving things that can be handled asynchronously out of process is a good thing.
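
A sketch of what that mail-sending loop might look like inside the daemon (table and column names are invented; mysql_* was the common API of the era, and connection setup is omitted):

<?php
# Hypothetical mail-queue polling loop, reusing $util from the example above.
while ($running) {
    $result = mysql_query(
        "SELECT id, recipient, subject, body FROM mail_queue WHERE sent = 0 LIMIT 10");
    while ($row = mysql_fetch_assoc($result)) {
        if (mail($row['recipient'], $row['subject'], $row['body'])) {
            mysql_query("UPDATE mail_queue SET sent = 1 WHERE id = " . (int) $row['id']);
        }
    }
    sleep(60);                       # throttle: wake up once a minute
    $running = $util->heartbeat();   # keep honoring kill signals
}
?>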

I was actually surprised that PHP is used for daemons; I would have guessed that its garbage collection and memory management would not be up to snuff.  Of course, there are a number of issues you need to worry about when writing daemons:

1. You may not have access to environment and request variables that are available when running as a web process.

2. You need to be sure to cleanly handle things like open files, sockets, and database connections that you don’t always have to worry about with short-running processes that die on completion.

3. You need to be aware of resource usage.  Not just for sending emails: you could really peg your database or file I/O if you’re doing something perpetually instead of just on request.

SugarCRM Relationships 101

I’m going through the Relationships 101 tutorial for SugarCRM.

First Comment:

Why keep database metadata in code?  It’s better in SQL, but if you’re going to keep it outside the database anyway, why not something better, like YAML?  Better still would be to discover it from the database and cache it, but something like this:

table: cases_opportunities
fields:
  - name: id
    type: varchar
    len: 36
    ...

is easier to understand and more concise than SugarCRM’s metadata file:

$dictionary['cases_opportunities'] = array (
    'table' => 'cases_opportunities',
    'fields' => array (
        array('name' => 'id', 'type' => 'varchar', 'len' => '36'),
        ...

You could then use Spyc to slurp it up.  But of course, PHP’s shared-nothing state and lack of compilation would make that tricky: you’d end up checking timestamps on files, and that’s no way to run a real server.
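
To make that concrete, here’s what the Spyc approach with its awkward timestamp-checking cache might look like (Spyc::YAMLLoad is Spyc’s entry point; the file names are hypothetical):

<?php
# Hypothetical metadata loader using Spyc, with the timestamp-checking
# cache that makes this awkward in PHP's shared-nothing world.
include 'spyc.php';

$yamlFile  = 'cases_opportunities.yml';    # hypothetical metadata file
$cacheFile = $yamlFile . '.cache.php';

if (!file_exists($cacheFile) || filemtime($yamlFile) > filemtime($cacheFile)) {
    $metadata = Spyc::YAMLLoad($yamlFile); # parse the YAML
    file_put_contents($cacheFile, '<?php return ' . var_export($metadata, true) . ';');
} else {
    $metadata = include $cacheFile;        # use the cached copy
}
?>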

What’s more, the file is a standard format, and everything can be deduced from the two table names (and primary key names, which should always be ‘id’ by convention).

So a helper function that takes the two table names (‘cases’ and ‘opportunities’) could generate all the needed metadata.  I realize there might be a need to override this, but hashes of hashes isn’t the solution.  PHP 4 has enough OOP capability and reflection to do this.
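
A sketch of the idea (nothing here is actual SugarCRM API; the function name and conventions are invented for illustration):

<?php
# Derive join-table metadata from the two table names by convention,
# with an optional override hash for the exceptional cases.
function relationship_metadata($left, $right, $overrides = array()) {
    $metadata = array(
        'table'  => $left . '_' . $right,
        'fields' => array(
            array('name' => 'id',           'type' => 'varchar', 'len' => '36'),
            array('name' => $left . '_id',  'type' => 'varchar', 'len' => '36'),
            array('name' => $right . '_id', 'type' => 'varchar', 'len' => '36'),
        ),
    );
    return array_merge($metadata, $overrides);
}

$dictionary['cases_opportunities'] = relationship_metadata('cases', 'opportunities');
?>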

Freelance testing and marketing yourself


I posted on Craigslist this morning:

I’m a freelance tester. I specialize in test automation with open source tools:

Selenium, Watir, JUnit, TestNG, PHP, Ruby, Webrat, RSpec, Cukes, Hudson, Cruise Control, Ant, Maven

I believe in getting the job done. I can do all of the following:

* set up a web test environment (Linux, Apache, MySQL, Tomcat, JBoss, Passenger)
* install tools (Jira, Bugzilla, Confluence, xWiki, etc.)
* create test plans
* manage remote testers
* develop an automation framework
* implement continuous integration

Call me at 425-242-4304 or visit my website: http://one-shore.com/
-Aaron Evans

I haven’t had much luck with Craigslist in the past. Most listings under computer services offered are for either web design or PC services.

I’m curious what other freelancers do to market themselves, since this seems to be the biggest challenge for me.