Integrating Condor and Queue Bounds Estimation from Time Series (QBETS) Into the UCLA Grid Portal

Report ID: 2007-06
Authors: Kerby Johnson
Date: 2007-06-01 05:00:00

Abstract

Grid Computing facilitates the sharing of distributed, heterogeneous resources owned by different organizations, each with its own goals, policies, and security requirements. One of the challenges facing Grid Computing is finding an effective way for users to interact with Grid resources without needing to know the underlying details of the Grid. Grid Portals have been proposed as a means to address this challenge by providing transparent access to the Grid through a Web Portal interface.

The UCLA Grid Portal is being developed to create Grids for the UC Campuses. The Portal provides cluster scheduler information, data management, and job management to the user. It supports clusters running PBS, SGE, and LSF as cluster schedulers. To increase the resources available through the Portal and to provide better support for some types of jobs, this thesis adds support for Condor, a prominent cluster scheduler. Condor support was added by using a Condor job manager (provided by Globus) and by adding scripts that collect cluster status information.

The UCLA Grid Portal provides limited feedback to its users; in particular, it is difficult for users to know where they should submit jobs for fastest execution. This thesis enhances the Portal by adding QBETS (Queue Bounds Estimation from Time Series) as a standalone portlet. QBETS provides either an upper-bound prediction of the queue delay a job will experience at a cluster or the probability that a job will start at a cluster by a given deadline. The integration has two parts: monitors that communicate job accounting information to the QBETS backend database, and a standalone portlet that uses a web service query to retrieve predictions for the clusters available to a user.
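
To make the idea of an upper-bound prediction concrete, the following is a minimal sketch, not the QBETS implementation. It assumes that a bound on a quantile of queue delay can be read off the sorted history of observed waits using a binomial argument, and it ignores anything the real system may do to cope with changing workloads. The function names, the cluster names, and the sample waits are all hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(waits, quantile=0.95, confidence=0.95):
    """Bound the given quantile of queue delay from above.

    Returns the smallest observed wait that is at least as large as the
    true quantile with probability >= confidence, or None when the
    history is too short to support such a bound.
    """
    ordered = sorted(waits)
    n = len(ordered)
    for k in range(1, n + 1):
        # The k-th smallest wait falls below the true quantile only if
        # at least k observations do, so it is an upper bound with
        # probability P(X <= k - 1), X ~ Binomial(n, quantile).
        if binomial_cdf(k - 1, n, quantile) >= confidence:
            return ordered[k - 1]
    return None


def best_cluster(history, quantile=0.95, confidence=0.95):
    """Pick the cluster with the smallest bound (hypothetical helper)."""
    bounds = {name: upper_bound(w, quantile, confidence)
              for name, w in history.items()}
    usable = {name: b for name, b in bounds.items() if b is not None}
    return min(usable, key=usable.get) if usable else None


# Hypothetical wait histories in seconds, one list per cluster.
history = {
    "cluster_a": list(range(1, 60)),     # 59 observations
    "cluster_b": list(range(101, 160)),  # 59 observations, longer waits
    "cluster_c": [1, 2, 3],              # too little history for a bound
}
print(best_cluster(history))
```

With these settings at least 59 observations are needed before any bound can be stated, which is why the short history yields no prediction. A portlet built on such predictions could rank the clusters available to a user by their bounds, which is the kind of feedback the paragraph above describes.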

Document

2007-06.pdf