Apache:cachemire

From Lowgeek wiki

Cachemire

Cachemire consists of two parts:

  • The generation script, a bash script built on wget
  • An Apache configuration that serves requests from the cache

The goal of the cache is to reduce the number of requests that reach the application server (JBoss, Tomcat, PHP).

The generation script can, for example, target the site's home page and download the HTML pages linked from it. Everything is configurable, of course.
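As a rough illustration of the idea, the sketch below only builds and prints a wget command for a shallow, HTML-only mirror of a start page. The function name `build_mirror_cmd`, the shortened reject list and the destination directory are assumptions made for this example; the real script further down uses more options.

```shell
#!/bin/bash
# Hypothetical helper: prints the wget command for a shallow HTML-only mirror.
# Nothing is downloaded; the command is only displayed.
build_mirror_cmd() {
    local url=$1 level=${2-1} dest=${3-./mirror}
    printf "wget --recursive --level=%s --no-host-directories --directory-prefix=%s --reject='%s' --user-agent=cachemire %s\n" \
        "$level" "$dest" '*.css,*.js,*.jpg,*.png' "$url"
}

# Show the command for the wiki home page, following links one level deep.
build_mirror_cmd "http://wiki.lowgeek.net/index.php" 1 /tmp/cachemire-demo
```

Printing the command instead of running it makes it easy to review the options before pointing the mirror at a production site.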


The cache generator

Here is the main script, the engine of cachemire. Each site needs its own configuration file. You can also define a pre-processing script and a post-processing script.
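The per-site configuration and the optional hook can be pictured with the minimal sketch below. It is not the author's loader: `load_site_config` and `HOOK_DIR` are names invented for this example, and only two of the defaults are shown. It relies on the `${VAR-default}` expansion, which applies the default only when the variable is unset.

```shell
#!/bin/bash
# Hypothetical loader: sources a per-site config, then fills in defaults.
load_site_config() {
    local config_file=$1
    [ -f "$config_file" ] || { echo "config [$config_file] not found" >&2; return 1; }
    source "$config_file"
    [ -n "${SITE}" ] || { echo "SITE not set" >&2; return 1; }
    # ${VAR-default} only applies when VAR is unset; an explicitly empty value is kept.
    KEY=${KEY-default}
    MIRROR_LEVEL=${MIRROR_LEVEL-3}
    # Optional pre-processing hook named after the site and key (HOOK_DIR is illustrative).
    local pre_hook="${HOOK_DIR-./scripts}/pre_${SITE}-${KEY}.sh"
    if [ -f "$pre_hook" ]; then source "$pre_hook"; fi
    return 0
}

# Usage: write a throwaway config, load it, and show the resulting values.
demo_config=$(mktemp)
printf 'SITE=lowgeek\nURL=http://wiki.lowgeek.net/index.php\n' > "$demo_config"
load_site_config "$demo_config" && echo "site=$SITE key=$KEY level=$MIRROR_LEVEL"
rm -f "$demo_config"
```

Because the config file is sourced, it is plain bash: any variable it sets overrides the default, and arrays can be declared in it as well.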

<syntaxhighlight lang="bash">

#!/bin/bash
# Written by erreur404 and guest ;)

CONFIG_FILE=$1
SCRIPT_DIR=~/scripts/cachemire
DATA_DIR=~/scripts/cachemire/data
LOG_DIR_ROOT=$DATA_DIR/log
CONFIG_DIR=$SCRIPT_DIR/config
ENV=recette
MONITOR_CACHEMIRE=$(date "+%s")
GEN_TIME=$(date "+%F A %T")
# Any second argument enables debug logging
if [ -n "$2" ]; then
    DEBUG=1
else
    DEBUG=0
fi

function die() {
    echo "$(date +'%F %T') ERROR $*" 1>&2
    exit 1
}

test -f "$CONFIG_DIR/$CONFIG_FILE" || die "Config file [$CONFIG_FILE] not found"

source "$CONFIG_DIR/$CONFIG_FILE"
source ~/bin/global.func

# Check needed params
test -n "${SITE}" || die "Site not set"

# Set default values
KEY=${KEY-default}
MIRROR_LEVEL=${MIRROR_LEVEL-3}
GLOBAL_TIMEOUT=${GLOBAL_TIMEOUT-30}
EXCLUDE_FILES=${EXCLUDE_FILES-"*.css,*.js,*.jpg,*.jpeg,*.gif,*.png,*.pdf,*.mpg,*.mpeg,*.avi,*.flv,*.swf"}
FOLLOW_TAGS=${FOLLOW_TAGS-"a,meta"}
# TD is not set in this script; it has to come from the config file or global.func
LOG_DIR=$LOG_DIR_ROOT/$TD/$(date +%y-%m)/$(date +%d)
LOG_FILE=$LOG_DIR/$KEY-$(date +%Hh%M.%S).log
#LOG_FILE=$LOG_DIR/$KEY-last.log
LOG_FILE_RSYNC=$LOG_DIR/$KEY-$(date +%Hh%M.%S)-rsync.log
MIN_FILE_SIZE=${MIN_FILE_SIZE-100}
EXCLUDE_FOLDER=${EXCLUDE_FOLDER-"no-cache,nocache"}
if [[ "${MIRROR_LEVEL}" != "0" ]]; then
    RECURSIVE="--recursive"
fi

PID_FILE=${SCRIPT_DIR}/pid/${SITE}-${KEY}.pid
# check-pid is not defined here; it has to come from ~/bin/global.func
check-pid "${PID_FILE}"

# Run pre-commands if the script exists
test -f ${SCRIPT_DIR}/scripts/pre_${SITE}-${KEY}.sh && source ${SCRIPT_DIR}/scripts/pre_${SITE}-${KEY}.sh

# Print a timestamped message and append it to the log file
function log-echo {
    DATE=$(date +%Hh%M.%S)
    echo "[$DATE] $*"
    echo "[$DATE] $*" >> "$LOG_FILE"
}

# For a file whose name matches one of MONITOR_TS_PAGES, check that its content
# contains the MONITOR_LIST strings. Reads the current file from the global f.
# Sets RESULT_TEST (0 = found, 1 = missing) and RESULT_MON (message).
function test_mon {
    RESULT_TEST=
    for PAGE in "${MONITOR_TS_PAGES[@]}"
    do
        ls "${f}" | grep "${PAGE}" > /dev/null
        RESULT_PAGE=${?}
        if [[ "${RESULT_PAGE}" == "0" ]]
        then
            for MONITOR in "${MONITOR_LIST[@]}"
            do
                fgrep -a -i "${MONITOR}" "${f}" > /dev/null
                RESULT_TEST=${?}
                if [[ "${RESULT_TEST}" != "0" ]] ; then RESULT_TEST=1; RESULT_MON="NOK: no monitor in file ${f}"; continue ; fi
                if [[ "${RESULT_TEST}" == "0" ]] ; then RESULT_MON="OK: ${MONITOR} found in ${f}"; fi
            done
        fi
    done
}

CACHE_DIR=/var/www/${SITE}/generated/web/cachemire/$KEY
WORK_DIR_MIRROR=$DATA_DIR/${SITE}/$KEY/mirror
WORK_DIR_PROC=$DATA_DIR/${SITE}/$KEY/proc

# Test final values
test -d $CACHE_DIR || die "Folder [$CACHE_DIR] does not exist"
test -w $CACHE_DIR || die "Folder [$CACHE_DIR] is not writable"

# Create the log directory and start from empty work directories
test -d $LOG_DIR || mkdir -p $LOG_DIR
test -d $WORK_DIR_MIRROR && rm -rf $WORK_DIR_MIRROR ; mkdir -p $WORK_DIR_MIRROR
test -d $WORK_DIR_PROC && rm -rf $WORK_DIR_PROC ; mkdir -p $WORK_DIR_PROC

cd $WORK_DIR_MIRROR || die "Cannot cd to [$WORK_DIR_MIRROR]"

log-echo "start (log: $LOG_FILE)"

log-echo "wget ..."

wget -o $LOG_FILE -nc --progress=dot --timeout=$GLOBAL_TIMEOUT --html-extension \
    ${WGET_PARAM} ${URL} \
    --no-host-directories --directory-prefix=$WORK_DIR_MIRROR --user-agent=cachemire \
    ${RECURSIVE} --level=$MIRROR_LEVEL \
    --reject "$EXCLUDE_FILES" --exclude-directories="${EXCLUDE_FOLDER}" \
    --follow-tags=$FOLLOW_TAGS -e robots=off

# Check for "saved [0/0]" in the wget log (empty downloads) and mail a warning
RET=$(grep -Rn "saved \[0/0\]" $LOG_FILE)
if [ $? -eq 0 ]; then
    echo "$RET"|mailx -s "[$HOSTNAME] cachemire on $CONFIG_FILE: 0 size files found" admin@mondomaine
fi

log-echo "search and delete 0 sized ..."
find . -type f -size 0 -print -delete|tee -a $LOG_FILE

log-echo "proc ..."

# Split command output on newlines, so file names containing spaces survive the loops below
IFS=$(echo -en "\n\b")

log-echo "Convert url-encoded to real"
for FILE in $(ls | grep "%2F")
do
    # %2F becomes a real directory separator, ? becomes __PI__
    CONVERTER=${FILE//%2F//}
    CONVERTED=$(echo "${CONVERTER}"|sed 's/?/__PI__/g')
    DIR_NAME=${CONVERTED%/*}
    FILE_NAME=${CONVERTED##*/}
    [ "$DEBUG" = 1 ] && log-echo "mkdir -p $DIR_NAME"
    test -d "$DIR_NAME" || mkdir -p "$DIR_NAME"
    [ "$DEBUG" = 1 ] && log-echo "mv ${FILE} ${DIR_NAME}/${FILE_NAME}"
    mv "${FILE}" "${DIR_NAME}/${FILE_NAME}"
done

log-echo "mkdir $WORK_DIR_PROC path"
for d in $(find . -type d)
do
    [ "$DEBUG" = 1 ] && log-echo "test -d $WORK_DIR_PROC/$d || mkdir -p $WORK_DIR_PROC/$d"
    test -d "$WORK_DIR_PROC/$d" || mkdir -p "$WORK_DIR_PROC/$d"
done

log-echo "Check and cp file to proc"
for f in $(find . -type f)
do
    FILE_SIZE=$(stat -c "%s" "$f")
    # The monitor check only runs when MONITOR_PAGES is set to a non-empty value
    if [[ "x${MONITOR_PAGES}" != "x" ]]
    then
        test_mon
        if [[ "${RESULT_TEST}" == "1" ]]
        then
            # Report the failure, then skip this file
            echo "${RESULT_MON}"
            continue
        fi
    fi
    # Check file size
    if [ "$FILE_SIZE" -lt "$MIN_FILE_SIZE" ]
    then
        log-echo "file [$f] has size $FILE_SIZE, pass"
        continue
    fi
    # Set final name: ? becomes __PI__, the leading ./ is dropped
    new_name=$(echo "${f}"|sed 's/?/__PI__/g'| sed 's|^\./||')
    # Copy file to proc
    [ "$DEBUG" = 1 ] && log-echo "rsync -rt \"$f\" \"${WORK_DIR_PROC}/${new_name}\""
    rsync -rt "${f}" "${WORK_DIR_PROC}/${new_name}"|tee -a $LOG_FILE
    # Append to the page so that its age can be monitored
    test -f "${WORK_DIR_PROC}/${new_name}" && echo "" >> "${WORK_DIR_PROC}/${new_name}"
    test -f "${WORK_DIR_PROC}/${new_name}" && echo "" >> "${WORK_DIR_PROC}/${new_name}"
done

# Run post-commands if the script exists
test -f ${SCRIPT_DIR}/scripts/post_${SITE}-${KEY}.sh && source ${SCRIPT_DIR}/scripts/post_${SITE}-${KEY}.sh

log-echo "rsync ..."
rsync -rc --delay-updates --log-file=$LOG_FILE_RSYNC $WORK_DIR_PROC/ $CACHE_DIR/

# Restore default word splitting
unset IFS

cat $LOG_FILE_RSYNC >> $LOG_FILE
rm -f $LOG_FILE_RSYNC

log-echo "end"
gzip -f $LOG_FILE
rm -f ${PID_FILE}
</syntaxhighlight>
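The file naming used by the script is what lets Apache find a cached page later, so it may help to see it in isolation. The sketch below reproduces the three renaming steps with bash parameter expansion instead of sed; `cache_name` is a name made up for this example and the sample paths are illustrative.

```shell
#!/bin/bash
# Hypothetical helper reproducing the renaming steps of the script:
#   %2F -> /        (url-encoded slashes become real directories)
#   ?   -> __PI__   (query-string separator, so Apache can look the file up)
#   a leading ./ is dropped (find prints paths relative to the mirror directory)
cache_name() {
    local name=$1
    name=${name//%2F//}
    name=${name//\?/__PI__}
    name=${name#./}
    printf '%s\n' "$name"
}

cache_name './news%2F2012%2Findex.html'   # prints news/2012/index.html
cache_name './article?id=42'              # prints article__PI__id=42
```

The `__PI__` token stands in for the question mark, so a request with a query string can be matched to a file name that contains the query.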



The config file

Here is a sample config file used by the script for one site:

<syntaxhighlight lang="bash">
pwd
~/scripts/cachemire
erreur404@lowgeek:~/scripts/cachemire$ ls -l
-rwxr-xr-x 1 erreur404 erreur404 6367 2012-08-29 13:29 cachemire.sh
drwxr-xr-x 2 erreur404 erreur404 4096 2012-10-04 16:42 config
drwxr-xr-x 2 erreur404 erreur404 4096 2012-10-10 16:01 pid
drwxr-xr-x 2 erreur404 erreur404 4096 2012-10-10 16:00 scripts

cat ./config/lowgeek
URL=http://wiki.lowgeek.net/index.php
SITE=lowgeek
MIRROR_LEVEL=3
MONITOR_LIST=("MediaWiki")
MONITOR_TS_PAGES=("index.php")
EXCLUDE_FILES="*.css,*.js,*.jpg,*.jpeg,*.gif,*.png,*.pdf,*.mpg,*.mpeg,*.avi,*.flv,*.swf,*;index.php?title=Sp*"
</syntaxhighlight>
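The two MONITOR_ arrays drive a content check: a downloaded page is only trusted if it contains the expected marker strings. The sketch below shows one way to express that check. `has_all_monitors` is a hypothetical name, the page content is made up, and unlike `test_mon` in the script (where the last marker tested decides the result) this version requires every marker to be present.

```shell
#!/bin/bash
# Hypothetical check: succeeds only if FILE contains every marker string given.
# grep -F matches fixed strings, -a treats binary data as text,
# -i ignores case, -q prints nothing.
has_all_monitors() {
    local file=$1 marker
    shift
    for marker in "$@"; do
        grep -Faiq -- "$marker" "$file" || return 1
    done
    return 0
}

# Usage with the marker from the sample config and a made-up page.
MONITOR_LIST=("MediaWiki")
page=$(mktemp)
echo '<meta name="generator" content="MediaWiki">' > "$page"
if has_all_monitors "$page" "${MONITOR_LIST[@]}"; then
    echo "OK: markers found"
else
    echo "NOK: marker missing"
fi
rm -f "$page"
```

A check of this kind keeps error pages or truncated responses out of the cache, since they usually lack the marker that a complete page carries.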



Apache configuration

Here is a sample excerpt from the site's vhost config file:

<syntaxhighlight lang="apache">
# CACHEMIRE

AddDefaultCharset UTF-8

RewriteEngine On

# RewriteLogLevel 4
# RewriteLog "/var/log/apache2/lowgeek_rewrite.log"

# No query string: serve the cached file named exactly like the URI
RewriteCond %{HTTP_USER_AGENT} !^cachemire$
RewriteCond /var/www/lowgeek/cachemire-disable !-f
RewriteCond %{REQUEST_URI} !^(/images.*|/js.*)$
RewriteCond %{QUERY_STRING} ^$
RewriteCond /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI} -f
RewriteRule ^ /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI} [E=no-jk:true,T=text/html,L]

# Query string: serve URI__PI__QUERY
RewriteCond %{HTTP_USER_AGENT} !^cachemire$
RewriteCond /var/www/lowgeek/cachemire-disable !-f
RewriteCond %{REQUEST_URI} !^(/images.*|/js.*)$
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteCond /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI}__PI__%1 -f
RewriteRule ^ /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI}__PI__%1 [E=no-jk:true,T=text/html,L]

# No query string: serve URI.html
RewriteCond %{HTTP_USER_AGENT} !^cachemire$
RewriteCond /var/www/lowgeek/cachemire-disable !-f
RewriteCond %{REQUEST_URI} !^(/images.*|/js.*)$
RewriteCond %{QUERY_STRING} ^$
RewriteCond /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI}.html -f
RewriteRule ^ /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI}.html [E=no-jk:true,T=text/html,L]

# Query string: serve URI.html__PI__QUERY
RewriteCond %{HTTP_USER_AGENT} !^cachemire$
RewriteCond /var/www/lowgeek/cachemire-disable !-f
RewriteCond %{REQUEST_URI} !^(/images.*|/js.*)$
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteCond /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI}.html__PI__%1 -f
RewriteRule ^ /var/www/lowgeek/generated/web/cachemire/default%{REQUEST_URI}.html__PI__%1 [E=no-jk:true,T=text/html,L]
</syntaxhighlight>
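To reason about which file a request ends up with, the lookup order of the four rules can be modelled in bash. This is only a sketch of the logic as read from the rules above, not something Apache runs: `resolve_cache` and its argument order are invented for this example, and the cache directory and disable file are passed in rather than hard-coded.

```shell
#!/bin/bash
# Hypothetical model of the four rewrite rules.
# Usage: resolve_cache CACHE_DIR DISABLE_FILE URI QUERY [USER_AGENT]
# Prints the cache file that would be served, or returns 1 when the
# request would fall through to the application server.
resolve_cache() {
    local cache_dir=$1 disable_file=$2 uri=$3 query=$4 agent=${5-browser}
    local candidate
    local -a candidates=()
    [ "$agent" = "cachemire" ] && return 1        # the generator itself bypasses the cache
    [ -f "$disable_file" ] && return 1            # kill switch: the file exists
    case $uri in /images*|/js*) return 1 ;; esac  # these paths are never rewritten
    # Same order as the rules: exact URI, URI__PI__QUERY, URI.html, URI.html__PI__QUERY
    [ -z "$query" ] && candidates+=("$uri")
    candidates+=("${uri}__PI__${query}")
    [ -z "$query" ] && candidates+=("${uri}.html")
    candidates+=("${uri}.html__PI__${query}")
    for candidate in "${candidates[@]}"; do
        if [ -f "$cache_dir$candidate" ]; then
            printf '%s\n' "$cache_dir$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: a throwaway cache with one page, then the kill switch.
demo=$(mktemp -d)
mkdir -p "$demo/cache"
echo '<html>cached</html>' > "$demo/cache/index.php__PI__title=Home"
resolve_cache "$demo/cache" "$demo/cachemire-disable" /index.php "title=Home"
touch "$demo/cachemire-disable"
resolve_cache "$demo/cache" "$demo/cachemire-disable" /index.php "title=Home" \
    || echo "cache disabled, request goes to the backend"
rm -rf "$demo"
```

In the vhost above, creating the file /var/www/lowgeek/cachemire-disable plays the role of the kill switch: every rule tests that it does not exist, so its presence sends all traffic back to the application server without a reload.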