PHP: Find Orphaned Files
August 16, 2007

For a project at work, I needed a way to find orphaned files (files that nothing links to) in a web directory. I looked for a free application and found nothing worthwhile apart from a few Windows applications. This PHP script searches a directory recursively, then checks the contents of every matching file against the list of matching filenames; a file whose name appears in none of them is reported as orphaned.
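
The comparison itself can be sketched on in-memory data, without touching the filesystem. This is only an illustration of the idea, not part of the script: the function name find_orphans() and the sample file names and contents are made up for the example.

```php
<?php
// Hypothetical helper: $contents maps each file name to that file's text.
// Returns the names that are not mentioned in any file's text.
function find_orphans(array $contents)
{
    $orphans = array();
    foreach (array_keys($contents) as $name) {
        $linked = false;
        foreach ($contents as $text) {
            // strpos() returns false when the name does not occur at all.
            if (strpos($text, $name) !== false) {
                $linked = true;
                break;
            }
        }
        if (!$linked) {
            $orphans[] = $name;
        }
    }
    return $orphans;
}

// Made-up sample data for illustration only.
$site = array(
    'index.html' => '<a href="about.html">About</a> <img src="logo.gif">',
    'about.html' => '<a href="index.html">Home</a>',
    'logo.gif'   => 'GIF89a...',
    'old.html'   => '<p>Nothing links here.</p>',
);

print_r(find_orphans($site));
```

Like the script, this is a plain substring match on the bare file name, so two files with the same name in different directories cannot be told apart.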

All you should need to change is the $dir variable, which should point to your own directory:

$dir = "/Your/directory/here";

Add any filename patterns that you want to include in the $extensions array.
Add any filename patterns that you want to exclude in the $excludes array.

Excludes are applied after the $extensions patterns: a file that matches one of the $extensions patterns is dropped again if it also matches an exclude pattern. This keeps out files with names like file.cfm.svn.

$extensions = array('.cfm','.html','.htm','.css','.php','.gif','.jpg','.png','.jpeg','.dwt');
$excludes = array('.svn');
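
To make the include/exclude rule concrete, here is a small sketch of the decision for a single file name. The helper name is_candidate() is hypothetical and only used for this illustration; the sample names are made up.

```php
<?php
// Hypothetical helper mirroring the include/exclude rule described above.
function is_candidate($file, array $extensions, array $excludes)
{
    $name = strtolower($file);
    $found = false;
    // Include the file if its name contains any of the patterns...
    foreach ($extensions as $pattern) {
        if (strpos($name, strtolower($pattern)) !== false) {
            $found = true;
        }
    }
    // ...then drop it again if it also contains an exclude pattern.
    foreach ($excludes as $pattern) {
        if (strpos($name, strtolower($pattern)) !== false) {
            $found = false;
        }
    }
    return $found;
}

$extensions = array('.cfm', '.html');
$excludes = array('.svn');

var_dump(is_candidate('page.cfm', $extensions, $excludes));     // bool(true)
var_dump(is_candidate('page.cfm.svn', $extensions, $excludes)); // bool(false)
var_dump(is_candidate('notes.txt', $extensions, $excludes));    // bool(false)
```

Note that the patterns are matched anywhere in the name, not only at the end, so a name such as page.html.bak would still count as a match for '.html'.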

And here is the code. It relies on the file_get_contents() function, so you need PHP 4.3.0 or later. Enjoy!

<?php
// Index of matching files: full paths and bare file names, in parallel arrays.
$findex = array();
$findex['path'] = array();
$findex['file'] = array();
 
$extensions = array('.cfm','.html','.htm','.css','.php','.gif','.jpg','.png','.jpeg','.dwt');
$excludes = array('.svn');
 
function rec_scandir($dir)
{
    global $findex;
    global $extensions;
    global $excludes;

    $handle = opendir($dir);
    if ($handle === false) {
        // Unreadable directory: skip it and keep what has been indexed so far.
        return $findex;
    }

    while (($file = readdir($handle)) !== false) {
        if ($file == "." || $file == "..") {
            continue;
        }
        $path = $dir . "/" . $file;
        if (is_dir($path)) {
            // Recurse; matches are appended to the global $findex.
            rec_scandir($path);
            continue;
        }

        $name = strtolower($file);
        $found = false;
        // Include the file if its name contains any of the patterns...
        // (strpos() returns 0 for a match at the start, so compare against false.)
        foreach ($extensions as $pattern) {
            if (strpos($name, strtolower($pattern)) !== false) {
                $found = true;
            }
        }
        // ...unless it also contains an exclude pattern (e.g. file.cfm.svn).
        foreach ($excludes as $pattern) {
            if (strpos($name, strtolower($pattern)) !== false) {
                $found = false;
            }
        }
        if ($found) {
            $findex['path'][] = $path;
            $findex['file'][] = $file;
        }
    }
    closedir($handle);
    return $findex;
}
 
 
$dir = "/Your/directory/here";
 
echo "\n";
echo " Searching ". $dir ." for matching files\n";
 
$files = rec_scandir($dir);
 
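echo " Found " . sizeof($files['file']) . " matching files\n";

echo " Scanning for orphaned files....\n";

// $findex['found'][$j] is set to 1 once file $j is mentioned in any file.
$findex['found'] = array();

for ($i = 0; $i < sizeof($findex['path']); $i++) {
    echo $i . " ";
    $contents = file_get_contents($findex['path'][$i]);
    if ($contents === false) {
        // Unreadable file: nothing to search in.
        continue;
    }
    for ($j = 0; $j < sizeof($findex['file']); $j++) {
        // strpos() returns 0 for a match at the start, so compare against false.
        if (strpos($contents, $findex['file'][$j]) !== false) {
            $findex['found'][$j] = 1;
        }
    }
}

echo "\n";

$counter = 1;
for ($i = 0; $i < sizeof($findex['path']); $i++) {
    // empty() also covers indexes that were never set.
    if (empty($findex['found'][$i])) {
        echo " " . $counter . ") " . substr($findex['path'][$i], 0, 1000) . " is orphaned\n";
        $counter++;
    }
}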
 
?>