PHP and Solaris: getcwd() Behavior November 10, 2007 1 Comment
For quite some time, Solaris has restricted when getcwd() will return a full, real path. The behavior dates back to at least SunOS 5.8 and continues through 5.10. Many functions within the PHP codebase relied on the C getcwd() call working everywhere, both to expand paths and to find out where a script is being executed. In particular, Solaris does not let getcwd() succeed for users in directories that lack ‘r’ (read) permission, even when those directories have ‘x’ (execute) permission. For example, on Solaris, getcwd() will fail if any directory along your current path has ‘--x’ permissions.
pwd vs. getcwd()
The ‘pwd’ command under most shells cannot be used to reliably test for this behavior, since the shell caches and keeps track of ‘cd’ operations. ‘pwd’ will show you the name of your current location on Solaris, but it does not get it from getcwd(). Instead, it simply echoes the path that it recorded when you issued ‘cd /directory’. Therefore, it is necessary to call getcwd() explicitly to test this behavior. The following code can be used to test the operation of getcwd() in a directory:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char str[200];

    /* getcwd() returns NULL on failure */
    if (getcwd(str, sizeof(str)) == NULL)
        printf("getcwd failed!!!\n");
    else
        printf("CWD = %s\n", str);
    return 0;
}
Under Linux, getcwd() behaves normally with the restrictive permissions:
# uname -a
Linux x.y.z 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 athlon i386 GNU/Linux
# find ./localfs -ls
 32001    8 d--x--x--x   3 nobody   nobody       4096 Oct  1 13:05 ./localfs
 32002    8 d--x--x--x   2 nobody   nobody       4096 Oct  1 13:12 ./localfs/test123
 32004   12 -rwsr-xr-x   1 nobody   nobody       4878 Oct  1 13:06 ./localfs/test123/cwdtest
# su nobody
# cd /localfs/test123
# ./cwdtest
CWD = /localfs/test123
# pwd
/localfs/test123
#
Under Solaris, getcwd() does not work with the restrictive --x permissions:
# uname -a
SunOS opteron 5.10 Generic_118855-14 i86pc i386 i86pc
# find ./localfs -ls
449384    1 d--x--x--x   3 nobody   nobody        512 Oct  1 13:13 ./localfs
449386    1 d--x--x--x   2 nobody   nobody        512 Oct  1 13:13 ./localfs/test123
449388    7 -rwsr-xr-x   1 nobody   nobody       6552 Oct  1 12:57 ./localfs/test123/cwdtest
# su nobody
# cd /localfs/test123
# ./cwdtest
getcwd failed!!!
# pwd
/localfs/test123
#
Bug or “security” feature?
There are some who feel that this is a bug, not a feature. I tend to agree that it is a bug, especially since I cannot find a single other OS that implements this “feature”. Either way, it broke PHP in situations where site owners were attempting to implement tight security on websites with many different users who each have their own directory for PHP content.
Changes to PHP 5.2.5 that deal with this
Prior to PHP 5.2.5, PHP would fail on relative includes in these situations, e.g. include("../dir/"). As of PHP 5.2.5, we have added proper handling for this Solaris quirk to the code base so that relative paths in include()/require() still work under these conditions. In addition, safe_mode, which used to show similar behavior when the getcwd() call failed, was patched.
[1245][root@opteron:/test/testa/test]$ ls -al
total 56150
d--x--x--x 2 root root 512 Nov 11 12:45 .
d--x--x--x 4 root root 512 Nov 11 12:42 ..
-rwxr-xr-x 1 root root 16719040 Nov 11 12:44 php-5.2.0
-rwxr-xr-x 1 root root 12007004 Nov 11 12:45 php-5.2.5
-rw-r--r-- 1 root root 35 Nov 11 12:43 test.php
[1245][root@opteron:/test/testa/test]$ cat ./test.php
<?php
include('../b/file.php');
?>
[1245][root@opteron:/test/testa/test]$ cat ../b/file.php
<?php
echo "I am a file in directory b.\n";
?>
[1245][root@opteron:/test/testa/test]$ su rob
[1246][rob@opteron:/test/testa/test]$ uname -a
SunOS opteron 5.10 Generic_118855-14 i86pc i386 i86pc
[1246][rob@opteron:/test/testa/test]$ ./php-5.2.0 ./test.php
Warning: include(../b/file.php): failed to open stream: No such file or directory in /test.php on line 2
Warning: include(): Failed opening '../b/file.php' for inclusion (include_path='.:/usr/local/lib/php') in /test.php on line 2
[1246][rob@opteron:/test/testa/test]$ ./php-5.2.5 ./test.php
I am a file in directory b.
[1246][rob@opteron:/test/testa/test]$
NFS Mounts can hide the problem
The issue can initially be perplexing to pin down. I originally ran into this problem on a machine with an NFS-mounted directory like /a/b/NFS/webstuff/, where ‘NFS’ was an external FS. Looking at the directory path, all permissions looked OK (r-x), so at first I dismissed them as the cause of the broken include() functionality. On closer inspection, it turned out that the local directory called ‘NFS’ had, before the mount happened, the restrictive permissions (--x) that trigger the quirky Solaris behavior. This was not immediately evident because once the mount is in place, the /NFS path is (to the eye) replaced with the inode of the remotely mounted directory, which was (r-x). Nonetheless, the OS must walk this directory tree, including the /NFS directory inode on the local FS, to get to the remotely mounted content.
Directory: /a/b/NFS/webstuff
inode(a) --> inode(b) --> inode(NFS) --> inode(NFS)   --> inode(webstuff)
[local]      [local]      [local]        [remote NFS]     [remote NFS]
                              ^
                              ^
                     hidden after mount
Upgrade your PHP!
So, long story short, I strongly encourage anyone using Solaris to upgrade to PHP 5.2.5, as you are likely to run into this situation if you have multiple users, NFS-mounted web dirs, safe_mode=on, or another reason to implement strict directory permissions in your installation.
Wapache – Win GUI Framework October 29, 2007 No Comments
Here is a project that deserves much more attention: http://wapache.sourceforge.net.
It is a modified Apache installation for Windows that links the power of Apache (including PHP/Perl/etc. extensions) with the Windows GUI. Custom Apache directives tie an Internet Explorer instance to the Windows GUI and to URLs in your code, so you can create Windows GUI applications simply and easily. It does not require a local TCP port and is very stable. Long story short, you can create Windows GUIs in your web-based programming language of choice, including [File Edit Help] style menus, system tray support, right-click menus and lots of other features. Very cool. It is very well thought out and extremely powerful. I think this will eventually outpace GTK libs and other attempts at GUI-ing PHP, Python and Perl.
Combine this with an InstallShield package or the like and you can develop and distribute full Windows apps in just about any web language of your choice.
Here is a screenshot from the standard install:
SetGID and SetUID Shell Scripts October 22, 2007 No Comments
Under most UNIX implementations, shell scripts cannot be made setuid. This is both a security precaution and a hindrance. For example, let’s say that you need to allow your users to RSYNC content from one server (a development server) that they have an account on to a production server that they don’t have direct access to. Here is a script called rolldata.sh that accomplishes just that:
#!/usr/bin/sh
/opt/local/bin/rsync -avP -e "/bin/ssh -i /sshdir/rolluser-id_rsa" \
    "/localdir/" "rolluser@securehost.domain.com:/remotedir/"
Now, this works, but it leaves a few glaring security problems:
1) The user needs to be able to READ the /sshdir/rolluser-id_rsa file in order for the ssh command to work without a password. If they can read this file, then they could just SSH right to the box and do lots of other stuff if they know how, which is not very hard.
2) Since ssh will only accept identity files with -r-------- permissions, every user who needs to be able to execute the above script must have their own rolluser-id_rsa file somewhere (and their own personal script) that can be read by them and nobody else.
So, the obvious solution I thought would be to setgid the shell script, make it owned by root and make it only executable like:
---s--s---   1 root     cit          229 Oct 22 19:32 rolldata.sh
-r--------   1 root     cit          883 Oct 22 18:59 rolluser-id_rsa
Therefore, only root would be able to read the contents of the rolldata.sh script and the rolluser-id_rsa file, but any user in group cit could run it. The only problem is that most UNIX kernels (including Solaris) don’t allow you to setuid/setgid a shell script!
However, this restriction obviously does not apply to compiled code, so this is what I came up with to get around this restriction. Create a C file called rolldata.c that looks like:
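#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Become root; stop if the binary was not installed setuid root. */
    if (setuid(0) != 0) {
        perror("setuid");
        return 1;
    }

    /* Replace this process with the pre-defined script. */
    execl("/mydir/rolldata.sh", "/mydir/rolldata.sh", "-1", (char *)0);

    /* execl() only returns on failure. */
    perror("execl");
    return 1;
}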
This program simply runs a pre-defined script as the user root, by way of the setuid(0) call. After compiling this (gcc ./rolldata.c), rename the resulting binary to rolldata and set the setuid/setgid permissions like:
---s--s---   1 root     cit         6656 Oct 22 20:05 rolldata
---x------   1 root     cit          229 Oct 22 19:32 rolldata.sh
-r--------   1 root     cit          883 Oct 22 18:59 rolluser-id_rsa
Now any user in group cit can execute the rolldata.sh script by way of running rolldata, but cannot read the script or the ssh identity key!
[user@computer mydir]$ ./rolldata
building file list ...
109 files to consider
wrote 2998 bytes  read 20 bytes  6036.00 bytes/sec
total size is 2532169264  speedup is 839022.29
[user@computer mydir]$ cat ./rolluser-id_rsa
cat: cannot open ./rolluser-id_rsa
Multipart Mime Headers in PHP September 16, 2007 2 Comments
Here is a snippet of code that shows how multi-part mime headers can be assembled without the help of a mailer class.
<?php
// $textmessage, $filename and $doc are assumed to be set earlier in the script.
$mime_boundary = md5(time());

// Top-level headers: declare a multipart message and its boundary.
$headers  = 'From: My Email <me@company.com>' . "\n";
$headers .= 'MIME-Version: 1.0' . "\n";
$headers .= "Content-Type: multipart/mixed; boundary=\"" . $mime_boundary . "\"" . "\n";

// First part: the plain-text body.
$msg  = "--" . $mime_boundary . "\n";
$msg .= "Content-Type: text/plain; charset=iso-8859-1" . "\n";
$msg .= "Content-Transfer-Encoding: 7bit" . "\n\n";
$msg .= $textmessage . "\n\n";

// Second part: the PDF attachment, base64 encoded.
$msg .= "--" . $mime_boundary . "\n";
$msg .= "Content-Type: application/pdf; name=\"" . $filename . "\"" . "\n";
$msg .= "Content-Transfer-Encoding: base64" . "\n";
$msg .= "Content-Disposition: attachment; filename=\"" . $filename . "\"" . "\n\n";
$msg .= chunk_split(base64_encode($doc)) . "\n\n";

// Closing boundary.
$msg .= "--" . $mime_boundary . "--" . "\n\n";
?>
Parallelizing RSYNC Processes August 18, 2007 4 Comments
Rsync: receiving file list…
If you’ve ever used RSYNC, you are probably a dedicated fan. It’s a fast, stable, mature, easy-to-use black-box tool that you can use to exactly backup/mirror directories between machines over the network. The attraction of RSYNC is that it only transfers what it has to in order to synchronize file sets. It does lots more, but using RSYNC as a backup/mirroring tool seems to be among the most popular applications.
However, synchronizing huge file systems across networks can cause some issues. It will work every time, but the initial “receiving file list…” operation can sometimes take an entire day to complete if you have a file system with 3.2 million small files on either side, like one that I work with.
The RSYNC daemon/client negotiates a list of files to “deal with” when you first start the operation. This initial negotiation runs at a constant pace, in my case around 10,000 files/minute. Not at all bad, but for 3,200,000 files this equates to 5.3 hours. If you use the --delete option, this is doubled, since for some reason it does the same operation on your local receiving side after the remote file list is gathered.
Parallelizing RSYNC Processes
The good news is that RSYNC is very good at running parallel instances. With a little planning, you can reduce the amount of time it takes for RSYNC to gather its file list. The trick is to have multiple processes each take a section of the directory tree that you are trying to back up. These processes all run at the same time, with little impact on each other. For example, if you have a filesystem with 4,000,000 files, the file list would take around 6.7 hours to complete. However, if you go through the contents of your directory tree and pick some criteria that split it into 2 x 2,000,000 file operations, and run a separate RSYNC process for each, the time until the RSYNC processes actually start transferring files is cut in half, to around 3.3 hours.
This does increase the amount of CPU and I/O that both your sending and receiving side use, but I’ve been able to run ~25 parallel instances without remotely degrading the rest of the system or slowing down the other RSYNC instances.
The key is to use the --include and --exclude command line switches to create selection criteria.
Example
drwxr-xr-x   2 root     root         179 Jul 19 16:22 directory_a
drwxr-xr-x   2 root     root         179 Aug 12 00:08 directory_b
If directory_a has 2,000,000 files underneath it and directory_b also has 2,000,000 files, use the following idea to split them up. The --exclude="/*" option says in essence to “exclude everything that is not explicitly included”.
#!/bin/bash
rsync -av --include="/directory_a*" --exclude="/*" --progress remote::/ /localdir/ > /tmp/myoutputa.log &
rsync -av --include="/directory_b*" --exclude="/*" --progress remote::/ /localdir/ > /tmp/myoutputb.log &
The following will take about twice as long to gather files as the above:
#!/bin/bash
rsync -av --progress remote::/ /localdir/ > /tmp/myoutput.log &
This is sure to save you a lot of time, especially if you have services quiesced and down while you are doing your rsync. You just need to be careful not to leave anything out when you are explicitly including files, as this is rather easy to do.
At some point, I’d like to see the RSYNC daemon have an option to automatically split up the workload into manageable chunks at the expense of your CPU and I/O. However, I don’t know exactly how this would be accomplished, since RSYNC would somehow need to know what the directories looked like before it started its process.
One option I could think of would be to add a feature to RSYNC to gather, save and use a “statistics file”. It could generate this statistics file on a remote directory tree, saving general information such as how many files are in each part of the tree. Then it could read this file to decide how to split up the file gathering tasks in order to optimize time. This file could then be reused during subsequent operations. Of course, this would only work if data was not constantly moving around between those directories and they stayed relatively proportional to each other over time. This could also be dealt with entirely outside of the RSYNC process, perhaps in a script that would gather statistics and fork RSYNC processes based on what it finds.
Well, until then, the above method works great for me.
Oracle Tree Relationships August 17, 2007 1 Comment
Working with parent-child relationships between rows in tables can be a headache to deal with, especially if you need to display those relationships graphically. However, Oracle provides some great constructs that make this much easier to deal with. Here are some of the challenges that I ran into and the solutions I came up with:
1) Maintaining and managing integrity of the tree table. Solution: Oracle CONNECT BY, LEVEL and START WITH clauses
2) Enforcing business rules based on the relationships. Solution: Oracle functions and procedures
3) Displaying the parent-child relationships graphically. Solution: Recursive functions in ColdFusion and PHP
The material below is tailored to the application that I worked on, but the way it was solved may be useful to you if you have any table with a tree relationship and a need to show users that tree.
The Data
Here is the table structure I worked with. The ID field is the ID of the node itself, and the PARENT_ID field is the reference to its parent.
Name                                      Null?    Type
----------------------------------------- -------- ----------------------------
ID                                        NOT NULL NUMBER(10)
OBJECT_ID                                 NOT NULL NUMBER(10)
ACCESS                                    NOT NULL VARCHAR2(6)
DELEGATE                                  NOT NULL NUMBER(1)
PARENT_ID                                          NUMBER(10)
ENABLED                                   NOT NULL NUMBER(1)
CREATED                                   NOT NULL DATE
CREATED_BY                                NOT NULL VARCHAR2(6)
MODIFIED                                           DATE
MODIFIED_BY                                        VARCHAR2(6)
NOTE                                               VARCHAR2(400)
The data in the table looks like the following. The ACCESS field is a system user and the delegate field is a row-specific attribute that determines in this case whether the user can delegate authority to other users for OBJECT_ID 1:
ACCESS         ID  OBJECT_ID  PARENT_ID   DELEGATE
------ ---------- ---------- ---------- ----------
zz1116        174          1        173          0
zz1117        175          1        173          1
zz1112        172          1        164          0
zz1113        173          1        164          1
ab5602          8          1                     1
ak1520         87          1          8          1
zz1111        164          1          8          1
ab7096        165          1          8          1
aa7648        166          1          8          1
af0449        167          1          8          1
Maintaining Tree Integrity: Oracle CONNECT BY, START WITH and LEVEL
Oracle functions and procedures are an excellent way to accomplish this. By using a function, you can ensure that all data inserted into the tree table is inserted properly and according to the business rules that you want to enforce. If you are working with several people who need access, you can grant them access to this function and not to the table itself. The key to dealing with tree relationships in Oracle is the pair of CONNECT BY and START WITH clauses. Here is a select that uses these on my table:
SELECT ID,
       OBJECT_ID,
       ACCESS,
       DELEGATE,
       PARENT_ID,
       LEVEL,
       SYS_CONNECT_BY_PATH(access||':'||delegate, '/') AS PATH
FROM YOUR_TABLE
WHERE OBJECT_ID=1
CONNECT BY PARENT_ID = PRIOR ID
START WITH LOWER(TRIM(ACCESS))='ab5602'
Notice that the PATH column shows the tree relationship between the different rows and the LEVEL column has the tree level that the user is at in the tree!
   ID  OBJECT_ID ACCESS   DELEGATE  PARENT_ID      LEVEL PATH
----- ---------- ------ ---------- ---------- ---------- ----------------------------------------
    8          1 ab5602          1                     1 /ab5602:1
   87          1 ak1520          1          8          2 /ab5602:1/ak1520:1
  164          1 zz1111          1          8          2 /ab5602:1/zz1111:1
  172          1 zz1112          0        164          3 /ab5602:1/zz1111:1/zz1112:0
  173          1 zz1113          1        164          3 /ab5602:1/zz1111:1/zz1113:1
  174          1 zz1116          0        173          4 /ab5602:1/zz1111:1/zz1113:1/zz1116:0
  175          1 zz1117          1        173          4 /ab5602:1/zz1111:1/zz1113:1/zz1117:1
  165          1 ab7096          1          8          2 /ab5602:1/ab7096:1
  166          1 aa7648          1          8          2 /ab5602:1/aa7648:1
  167          1 af0449          1          8          2 /ab5602:1/af0449:1
Making Graphical Trees
Although the above information will really help while developing the procedures/functions needed to determine who is above and below a specific user in the tree when enforcing business rules, you will still need to write some recursive functions for your web page if you want to display this information graphically. Here is what the recursive functions I wrote produce. The hand icon denotes that the user has DELEGATE=1:

Here is the script to generate the visual tree in ColdFusion. The way I implemented it was to create a function that opens table structures and walks the tree by querying WHERE PARENT_ID=[this child]. The HTML tables do not get their end tags until the script reaches the end of the tree.
<cffunction name="GenerateTree" access="private" returntype="string" output="yes">
<cfargument name="object" type="numeric" required="true">
<cfargument name="startwith" type="string" required="true">
<cfargument name="parent_id" type="numeric" required="false">
<cfargument name="thislevel" type="string" required="false">
<cfargument name="thisiteration" type="string" required="false">
<cfif NOT IsDefined("thisiteration")>
<cfset thisiteration=1>
<cfset thislevel=1>
</cfif>
<cfquery name="DATreeCol#thisiteration##thislevel#" datasource="#application.m_datasource#">
select id,
object_id,
access,
delegate,
parent_id,
enabled,
created,
created_by,
modified,
modified_by,
note,
level,
SYS_CONNECT_BY_PATH(access||':'||delegate, '/') as Path
from your_table
WHERE
object_id = #object# AND
level = #thislevel# AND
<cfif thisiteration GT 1>
parent_id = #parent_id# AND
</cfif>
enabled=1
connect by parent_id = prior id
start with lower(trim(access))='#startwith#'
</cfquery>
<cfloop query="DATreeCol#thisiteration##thislevel#">
<table border=0>
<tr>
<td nowrap width=5 valign=center style='font-size: .75em;
width:10px;height:100%;border-top:1px solid ##AAAAAA;
border-bottom:1px solid ##AAAAAA;
border-left:1px solid ##AAAAAA;'>
</td><td>
<span class='fixeduid' style='cursor:pointer;
font-size:.75em;'
onclick="javascript:userinfo('#access#');">#access#
</span>
</td>
<td>
<cfif #delegate# EQ 1>
<img src='icons/hand.jpg'>
<cfelse>
<img src='icons/x.jpg'>
</cfif>
</td>
<td>
<cfset thislevel=#level# + 1>
<cfset thisiteration=thisiteration + 1>
<CFINVOKE object="#object#" thislevel="#thislevel#"
startwith="#startwith#"
thisiteration="#thisiteration#"
parent_id=#id# RETURNVARIABLE="foo"
METHOD="GenerateTree">
</CFINVOKE>
</td>
</tr>
</table>
</cfloop>
<cfreturn "test">
</cffunction>
<CFINVOKE object=#URL.object# startwith="#session.access#" METHOD="GenerateTree">
</CFINVOKE>
Here is a similar script, written in PHP, that produces the same output:
<?php
// $obj (the OBJECT_ID to display) and the $oracle_* settings are assumed to be set earlier.

// Returns the root rows (PARENT_ID is NULL) for one object.
function generateList_tree($obj)
{
    global $oracle_server, $oracle_user, $oracle_password, $oracle_tns;
    $conn = oci_connect($oracle_user, $oracle_password, $oracle_tns);
    $query = "SELECT A.ID, A.OBJECT_ID, A.ACCESS, A.DELEGATE, A.PARENT_ID, B.OBJECT_NAME
              FROM MSP_SEC_DELEGATE A
              LEFT JOIN MSP_SEC_OBJECTS B ON A.OBJECT_ID = B.ID
              WHERE A.PARENT_ID is NULL AND A.OBJECT_ID=" . $obj . " AND A.ENABLED=1
              ORDER BY A.ACCESS";
    $statement = oci_parse($conn, $query);
    oci_execute($statement);
    oci_fetch_all($statement, $res, 0, -1, OCI_ASSOC);
    $list = array();
    for ($i = 0; $i < sizeof($res['ID']); $i++) {
        $list[$i]['ID'] = $res['ID'][$i];
        $list[$i]['ACCESS'] = $res['ACCESS'][$i];
        $list[$i]['DELEGATE'] = $res['DELEGATE'][$i];
        $list[$i]['OBJECT_NAME'] = $res['OBJECT_NAME'][$i];
    }
    return $list;
}

// Prints the children of $root as a nested table, recursing into each child.
// Returns false when $root has no children.
function generateChildList_tree($root)
{
    global $oracle_server, $oracle_user, $oracle_password, $oracle_tns;
    $conn = oci_connect($oracle_user, $oracle_password, $oracle_tns);
    $query = "SELECT A.ID, A.OBJECT_ID, A.ACCESS, A.DELEGATE, A.PARENT_ID
              FROM MSP_SEC_DELEGATE A
              LEFT JOIN MSP_SEC_OBJECTS B ON A.OBJECT_ID = B.ID
              WHERE A.PARENT_ID=" . $root . " AND A.ENABLED=1
              ORDER BY A.ACCESS";
    $statement = oci_parse($conn, $query);
    oci_execute($statement);
    oci_fetch_all($statement, $res, 0, -1, OCI_ASSOC);
    $list = array();
    for ($i = 0; $i < sizeof($res['ID']); $i++) {
        $list[$i]['ID'] = $res['ID'][$i];
        $list[$i]['ACCESS'] = $res['ACCESS'][$i];
        $list[$i]['DELEGATE'] = $res['DELEGATE'][$i];
    }
    if (sizeof($list) == 0) {
        return false;
    }
    echo "<table border=0 cellpadding=0 cellspacing=0>";
    for ($j = 0; $j < sizeof($list); $j++) {
        echo "<tr>";
        echo "<td valign=center style='font-family:courier;'>";
        echo "<a href=''>" . $list[$j]['ACCESS'] . "</a> ";
        echo "</td>";
        echo "<td valign=center>";
        if ($list[$j]['DELEGATE'] == 1) {
            echo "<img src='icons/hand.png'>";
            echo " </td>";
            echo "<td style='width:10px;height:100%;border-top:1px solid #AAAAAA;border-bottom:1px solid #AAAAAA;border-left:1px solid #AAAAAA;' valign=center>";
            echo " </td>";
        } else {
            echo "<img src='icons/x.png'>";
            echo " </td>";
            echo "<td style='width:10px;height:100%;' valign=center>";
            echo " </td>";
        }
        echo "<td valign=center>";
        generateChildList_tree($list[$j]['ID']);   // recurse into this child's subtree
        echo "</td>";
        echo "</tr>";
        echo "<tr><td></td></tr>";
    }
    echo "</table>";
    return true;
}

// Top level: print each root row, then its subtree.
$list = generateList_tree($obj);
echo "<h4>Tree For: " . $list[0]['OBJECT_NAME'] . "</h4>";
echo "<table border=0 cellpadding=0 cellspacing=0>";
for ($i = 0; $i < sizeof($list); $i++) {
    echo "<tr>";
    echo "<td valign=center style='font-family:courier;'>";
    echo "<a href=''>" . $list[$i]['ACCESS'] . "</a> ";
    echo "</td>";
    echo "<td valign=center>";
    if ($list[$i]['DELEGATE'] == 1) {
        echo "<img src='icons/hand.png'>";
        echo " </td>";
        echo "<td style='width:10px;height:100%;border-top:1px solid #AAAAAA;border-bottom:1px solid #AAAAAA;border-left:1px solid #AAAAAA;' valign=center> ";
        echo " </td>";
    } else {
        echo "<img src='icons/x.png'>";
        echo " </td>";
        echo "<td style='width:10px;height:100%;' valign=center>";
        echo " </td>";
    }
    echo "<td valign=center>";
    $thischild = generateChildList_tree($list[$i]['ID']);
    echo "</td>";
    echo "</tr>";
    echo "<tr><td></td></tr>";
}
echo "</table>";
?>
PHP: Find Orphaned Files August 16, 2007 No Comments
For a project at work, I needed a good way to find orphaned files (files that are not linked to by anything) in a web directory. I looked for a good [free] application and could not find anything worthwhile except for a few Windows applications. This PHP script searches recursively through an entire directory of files and compares each file’s contents against the filenames to find orphaned files.
All you should need to change is the $dir var to use your own Directory:
$dir = "/Your/directory/here";
Add any filename patterns that you want to include in the $extensions array.
Add any filename patterns that you want to exclude in the $excludes array.
Excludes are removed from the set of files that match the patterns in the $extensions array. This is to exclude files that are named something like file.cfm.svn and such.
$extensions = array('.cfm','.html','.htm','.css','.php','.gif','.jpg','.png','.jpeg','.dwt');
$excludes = array('.svn');
And here is the code. You’ll need PHP 4.3 or later because of the file_get_contents() function. Enjoy!
<?php
$findex = array();
$findex['path'] = array();
$findex['file'] = array();
$extensions = array('.cfm','.html','.htm','.css','.php','.gif','.jpg','.png','.jpeg','.dwt');
$excludes = array('.svn');

// Recursively collects every file under $dir whose name matches $extensions
// and none of $excludes, recording path and name in the global $findex.
function rec_scandir($dir)
{
    global $findex;
    global $extensions;
    global $excludes;
    $files = array();
    if ($handle = opendir($dir)) {
        while (($file = readdir($handle)) !== false) {
            if ($file != ".." && $file != ".") {
                if (is_dir($dir . "/" . $file)) {
                    $files[$file] = rec_scandir($dir . "/" . $file);
                } else {
                    $found = false;
                    for ($i = 0; $i < sizeof($extensions); $i++) {
                        if (strpos(strtolower($file), strtolower($extensions[$i])) !== false) {
                            $found = true;
                        }
                    }
                    for ($i = 0; $i < sizeof($excludes); $i++) {
                        if (strpos(strtolower($file), strtolower($excludes[$i])) !== false) {
                            $found = false;
                        }
                    }
                    if ($found) {
                        $files[] = $file;
                        $dirlink = $dir . "/" . $file;
                        array_push($findex['path'], $dirlink);
                        array_push($findex['file'], $file);
                    }
                }
            }
        }
        closedir($handle);
    }
    return $findex;
}

$dir = "/Your/directory/here";
echo "\n";
echo " Searching " . $dir . " for matching files\n";
$files = rec_scandir($dir);
echo " Found " . sizeof($files['file']) . " matching extensions\n";
echo " Scanning for orphaned files....\n";

// A file counts as referenced if its name appears anywhere in any file's contents.
// strpos() returns 0 for a match at the very start, so compare against false.
$findex['found'] = array();
for ($i = 0; $i < sizeof($findex['path']); $i++) {
    echo $i . " ";
    $contents = file_get_contents($findex['path'][$i]);
    for ($j = 0; $j < sizeof($findex['file']); $j++) {
        if (strpos($contents, $findex['file'][$j]) !== false) {
            $findex['found'][$j] = 1;
        }
    }
}
echo "\n";

// Report every file that was never referenced.
$counter = 1;
for ($i = 0; $i < sizeof($findex['path']); $i++) {
    if (empty($findex['found'][$i])) {
        echo " " . $counter . ") " . substr($findex['path'][$i], 0, 1000) . " is orphaned\n";
        $counter++;
    }
}
?>