BNCweb Installation Guide

Prerequisites:

xsltproc (from the Gnome LibXSLT package)
MySQL 5.0 or higher (http://www.mysql.com)
Perl 5.8 (version 5.6 may work, too, but might require installation of
additional modules)
Perl modules:
 - DBI
 - DBD::mysql
 - HTML::Entities
 - Parse::RecDescent

Step-by-step guide to the installation:

1. Install the IMS Corpus Workbench and the CWB/Perl libraries - the source
code for both is provided as part of the BNCweb distribution. Please note
that the version available via SourceForge is not compatible with BNCweb.
The two versions will be synchronized in the near future. Please refer to
the documentation available with the source code. (You can ignore the error
messages produced during the compilation process. Also, "make test" for the
Perl modules will only work if you also install the sample corpora available
at ftp://ftp.ims.uni-stuttgart.de/pub/outgoing/cwb-beta/index.html; this is
not a necessary step.)

Make sure that the path to the corpus registry is set correctly
(DEFAULT_REGISTRY).

2. If it isn't already on your server, install MySQL (4.1 or higher; version
5 recommended). BNCweb is not compatible with MySQL version 4.0.

3. Copy the original texts of BNC-XML from the DVD. You do not need to
install anything else (index, documentation, etc.) unless you also wish to
access the BNC with other corpus tools, such as Xaira.

4. Run EncodeBNC.perl and then MakeFreqTables.perl (in the directory
BNCweb-encoder-0.6.2) to create the CQP index and some required frequency
tables for the BNC (which are written to the directory "tables"). In the
simplest case, you can run the scripts with the following commands:

    perl EncodeBNC.perl -v /Path/to/new/data-directory/ /path/to/BNC_XML/Texts/
    perl MakeFreqTables.perl -v

Here's what this step could look like if you wish to make some more specific
choices - this is how it ran on our computers, and it is highly likely that
you would have to make changes for your set-up.

    perl EncodeBNC.perl -n BNC-XML -r /Corpora/registry/ -t tables/bnc -f -M 500 -v /Corpora/BNC-XML/ /Volumes/Data/Corpora/BNC_XML/Texts/
    perl MakeFreqTables.perl -n BNC-XML -r /Corpora/registry/ -t tables/bnc -f -M 500 -v

Please consult "perldoc EncodeBNC.perl" and "perldoc MakeFreqTables.perl"
for further information about the available options.

Please note that it will take several hours for the script EncodeBNC.perl
to complete.

5. Make the necessary changes in the library file bncConfigXML.pm. The
script used in the next step of the installation relies on this information.

The directory indicated in $bwTempPath must be an existing directory whose
permissions are set to be world readable and writable by the Web server.

VERY IMPORTANT: The variable $bwMySQLTempPath must refer to the same
directory as $bwTempPath, as this is where CQP and MySQL exchange data with
each other. If MySQL and CQP run on the same server, the two variables must
therefore contain identical values. However, if MySQL runs on a different
server and the directory is mounted externally, the same directory may of
course have different paths depending on which server it is accessed from.

6. Run make_MySQL_tables.pl - this will configure MySQL and import the data
created in Step 4 into a number of tables. You will be asked to enter the
username and password of a MySQL user with administrator privileges
("CREATE DATABASE..." and "GRANT ALL/FILE...").
Alternatively, if you do not have admin access to your MySQL server, please
ask your system administrator to set up the following users, access
permissions and databases - the variables given below must be replaced by
the values of your setup in bncLibCQPXML.pm:

    DROP DATABASE IF EXISTS $bwMYSQLtable;
    DROP DATABASE IF EXISTS $bwMYSQLusertable;
    DROP DATABASE IF EXISTS $bwMYSQLcategorizetable;
    DROP DATABASE IF EXISTS $bwMYSQLfrequency;
    CREATE DATABASE $bwMYSQLtable CHARACTER SET latin1 COLLATE latin1_general_ci;
    CREATE DATABASE $bwMYSQLusertable CHARACTER SET latin1 COLLATE latin1_general_ci;
    CREATE DATABASE $bwMYSQLcategorizetable CHARACTER SET latin1 COLLATE latin1_general_ci;
    CREATE DATABASE $bwMYSQLfrequency CHARACTER SET latin1 COLLATE latin1_general_ci;
    GRANT ALL ON $bwMYSQLtable.* TO '$bwMysqlUser'@'$bwServer' IDENTIFIED BY '$bwMysqlPwd';
    GRANT ALL ON $bwMYSQLusertable.* TO '$bwMysqlUser'@'$bwServer' IDENTIFIED BY '$bwMysqlPwd';
    GRANT ALL ON $bwMYSQLcategorizetable.* TO '$bwMysqlUser'@'$bwServer' IDENTIFIED BY '$bwMysqlPwd';
    GRANT ALL ON $bwMYSQLfrequency.* TO '$bwMysqlUser'@'$bwServer' IDENTIFIED BY '$bwMysqlPwd';
    GRANT FILE ON *.* TO '$bwMysqlUser'@'$bwServer' IDENTIFIED BY '$bwMysqlPwd';

Once this has been done, run the script make_MySQL_tables.pl.

7. Configure your web-server (e.g. by making changes to httpd.conf) and
restart it - here's what it could look like:

    Alias /bncwebXML/ /Library/WebServer/bncwebXML/
    Alias /bncwebXML /Library/WebServer/bncwebXML/
    Options Indexes FollowSymLinks ExecCGI
    AuthType Basic
    AuthName bncweb
    AuthUserFile /etc/bncpass
    require valid-user
    SetEnv PERL5LIB /Library/WebServer/bncwebXML/lib_files
    SetEnv CORPUS_REGISTRY /Library/WebServer/bncwebXML/Corpora/registry
    ScriptAlias /cgi-binbncXML/ /Library/WebServer/bncwebXML/cgi-bin/

(In Apache, the Options, Auth* and require directives are typically placed
inside a <Directory> container for the BNCweb web directory.)

In order to be able to set environment variables, you will have to
un-comment the following lines in your Apache configuration - they are
commented out by default:

    LoadModule env_module libexec/httpd/mod_env.so
    AddModule mod_env.c

The second line may not be present if you have a recent version of Apache
(Version 2) and can then be ignored.

If you do not wish to handle the Perl libraries in the web server
configuration, you will need to make sure that the CGI scripts have access
to these libraries in an alternative way. For example, you could change the
first line of each script to something like:

    #!/usr/bin/perl -I/Library/Webserver/bncweb/lib_files/

The web directory for BNCweb must contain all items of the BNCweb
distribution apart from the directories CQP_source, BNCweb-encoder-0.6.2
and the file readme.txt.

*****************************************************************************
It is important that your installation of BNCweb requires authentication.
Apart from licensing issues in case your server is accessible via the
Internet, this is also *very* highly recommended even if you are planning to
run BNCweb as a stand-alone solution. BNCweb expects a user to have a
username, and some of its functionality may not work properly if you do not
set up proper authentication.
*****************************************************************************

Please note that BNCweb is not compatible with usernames that contain
special characters (e.g. "@").
If a user attempts to log in with an incompatible username, an error message
is displayed instead of the main BNCweb page.

8. Unless you require them for a different purpose, you can now delete the
original BNC-XML files.

9. You can now access BNCweb at http://your.server.name/cgi-alias/BNCweb.pl

Troubleshooting/comments:

The BNCweb scripts assume that perl is located in /usr/bin/perl - if this is
not the case, you will have to change the first line of each script
(starting with #!) to reflect your local set-up (e.g. #!/usr/local/bin/perl
or #!/sw/bin/perl).

Installing BNCweb on Web servers other than Apache:

- read the documentation to find out about configuration files, options and
  syntax
- if you cannot set the required aliases, distribute the BNCweb files as
  follows:
  - Simple_query_language.pdf, genres.html, wz_tooltip.js,
    FileMaker_template.zip go in subdir bncweb/ of your HTML document tree
    (typically a directory named Documents/, http/ or html/ found in your
    Web server's data directory -- see local documentation)
  - all *.pl files in cgi-bin go into subdir bncweb/ of your CGI script tree
    (typically a directory named cgi-bin/)
  - move lib_files/ to subdirectory lib/ of cgi-bin/bncweb/
  - settings in bncConfigXML.pm: $bwCGIalias = 'cgi-bin/bncweb'; and
    $bwHTMLalias = 'bncweb';
- it will be crucial to set the PERL5LIB environment variable to the
  directory .../lib_files/ (or .../cgi-bin/lib/ if you followed the steps
  above)
  - full absolute paths are mandatory, starting with the full path of the
    data directory of your Web server
  - check the Web server documentation (e.g. on LightTPD you have to load
    mod_setenv); googling also helps and often digs up complete recipes
  - on LightTPD, PERL5LIB can't be set because of an internal bug; in this
    case, use PERLLIB instead
  - if you absolutely fail to set the required environment variable and have
    installed the Perl modules locally in lib/ (as described above), you can
    also change the first line of _every_ .pl script in cgi-bin/ to
    "#!/usr/bin/perl -Ilib" (-I is "capital i")
- depending on your CWB installation, you may also have to set the
  CORPUS_REGISTRY environment variable
  - for a well-done default installation, this shouldn't be necessary; check
    this before you spend a lot of time getting it to work
- a general remark: even if a Web server release is marked as "stable", it
  doesn't really have to be and may crash unexpectedly; in that case, try
  upgrading to the latest version (even if "unstable") ... and vice versa,
  of course (based on experience with LightTPD on Debian Linux)
- it is also crucial that you find out how to force users to authenticate
  themselves before accessing the BNCweb scripts
  - BNCweb needs the authentication information for its user management
  - _all_ Web servers support authentication, which can be activated for
    individual directories; just read the documentation

If you see an error message saying that your corpus is "not defined", it is
likely that the path to the corpus registry is not set correctly. In this
case, make the necessary change in config.mk and recompile the Corpus
Workbench. If this does not help, check whether the corpus names are correct
(e.g. are the characters all upper case?).

If you see an error message saying that MySQL cannot execute an SQL command
containing the string "into outfile", check that the CQP-temp directory has
the correct write permissions.

Please report errors and problems to bncweb@mac.com.

Last updated: 15/11/2007
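
The temp-directory requirement from Step 5, which is also the usual suspect
behind the "into outfile" error described above, can be checked from a shell.
The following is only a minimal sketch under stated assumptions:
check_temp_dir is a hypothetical helper that is not part of the BNCweb
distribution, and it only inspects the permission string reported by
"ls -ld" (read and write bits for "others"). It does not prove that the Web
server or MySQL can actually write to the directory.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the BNCweb distribution.
# Reports whether a directory exists and is readable and writable for
# "others", as Step 5 requires for the directory named in $bwTempPath.
check_temp_dir() {
    dir=$1

    if [ ! -d "$dir" ]; then
        echo "missing: $dir is not an existing directory"
        return 1
    fi

    # First ten characters of "ls -ld" are the mode string, e.g. drwxrwxrwx
    mode=$(ls -ld "$dir" | cut -c1-10)

    case $mode in
        # d, six user/group bits, then r and w for others, then x, t or -
        d??????rw?)
            echo "ok: $dir ($mode)"
            return 0
            ;;
        *)
            echo "check permissions: $dir ($mode)"
            return 1
            ;;
    esac
}

# Example call (placeholder path; use your own $bwTempPath value):
# check_temp_dir /path/to/your/temp-directory
```

Call the function with the value you assigned to $bwTempPath in
bncConfigXML.pm. If MySQL runs on a different server, run the same check
there with the value of $bwMySQLTempPath.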