The specified address was excluded from the index. The crawl rules may have to be modified to include this address. (The item was deleted because it was either not found or the crawler was denied access to it.)
We were using a Robots.txt file to block search engines from indexing our public site. When I first tried to index the site from the SSP, it would crawl only a small part of the site. After experimenting with permissions and other settings, I was told that we had added a Robots.txt file to the site. Once we removed the file from the C:\inetpub\wwwroot\wss\VirtualDirectories\YourSiteName folder and rebooted the server, I was able to index the entire site from the SSP. The trouble began the following day: when I started a Full Crawl, the crawl status changed from Crawling to Idle within a few seconds and the error above was displayed.
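Here is how I fixed the problem. In the content database, create this stored procedure (you can delete it afterwards).
What does it do? The stored procedure searches every char, varchar, nchar and nvarchar column of every user table in the current database for the text you pass in, and returns the table.column where each match was found along with the matching value.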
CREATE PROCEDURE [dbo].[SearchAllTables]
(
    @SearchStr nvarchar(100)   -- text to look for, e.g. 'Robots.txt'
)
AS
BEGIN
    SET NOCOUNT ON

    -- One row per match: the table.column it was found in, plus the value
    CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))

    DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
    SET @TableName = ''
    -- Wrap the search text in % wildcards and single quotes so it can be
    -- spliced into the dynamic LIKE clause below
    SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%', '''')

    -- Outer loop: visit each user table in name order
    WHILE @TableName IS NOT NULL
    BEGIN
        SET @ColumnName = ''
        SET @TableName =
        (
            SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
            FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_TYPE = 'BASE TABLE'
                AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
                -- skip objects that ship with SQL Server itself
                AND OBJECTPROPERTY(
                        OBJECT_ID(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)),
                        'IsMSShipped'
                    ) = 0
        )

        -- Inner loop: visit each character column of the current table
        WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
        BEGIN
            SET @ColumnName =
            (
                SELECT MIN(QUOTENAME(COLUMN_NAME))
                FROM INFORMATION_SCHEMA.COLUMNS
                WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
                    AND TABLE_NAME = PARSENAME(@TableName, 1)
                    AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
                    AND QUOTENAME(COLUMN_NAME) > @ColumnName
            )

            IF @ColumnName IS NOT NULL
            BEGIN
                -- Dynamic SQL: return 'table.column' and the first 3630
                -- characters of every value that matches the pattern
                INSERT INTO #Results
                EXEC
                (
                    'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
                    FROM ' + @TableName + ' WITH (NOLOCK) ' +
                    ' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
                )
            END
        END
    END

    SELECT ColumnName, ColumnValue FROM #Results
END
Once you have created the stored procedure, run this command:
EXEC dbo.SearchAllTables 'Robots.txt'
You should see a single record returned, showing the table and column where Robots.txt is stored.
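If you already suspect the AllDocs table, a narrower check is to look at the matching rows directly before deleting anything. This is a sketch that assumes the WSS 3.0 AllDocs columns Id, SiteId, DirName and LeafName; confirm them against your own content database first. DirName shows where each Robots.txt lives, which helps if more than one row comes back.

```sql
-- Narrow check before deleting: list every AllDocs row named Robots.txt.
-- Assumes the WSS 3.0 / MOSS 2007 AllDocs columns Id, SiteId, DirName, LeafName.
SELECT Id, SiteId, DirName, LeafName
FROM dbo.AllDocs WITH (NOLOCK)
WHERE LeafName = 'Robots.txt';
```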
Then run this command:
DELETE FROM AllDocs
WHERE LeafName = 'Robots.txt'
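Because this edits the content database directly, a cautious variation is to run the DELETE inside a transaction and keep it only if it touched exactly the one row that SearchAllTables reported. The sketch below is an assumed safeguard, not the exact method used here; the @Deleted variable and the single-row check are illustrative choices.

```sql
-- Assumed safeguard around the DELETE above (not the original steps).
-- XACT_ABORT makes any run-time error roll the whole transaction back.
SET XACT_ABORT ON;

DECLARE @Deleted int;

BEGIN TRANSACTION;

DELETE FROM AllDocs
WHERE LeafName = 'Robots.txt';

-- Capture the count immediately; later statements can reset @@ROWCOUNT.
SET @Deleted = @@ROWCOUNT;

IF @Deleted = 1
    COMMIT TRANSACTION;
ELSE
BEGIN
    -- Anything other than the single expected row: undo the delete.
    ROLLBACK TRANSACTION;
    PRINT 'Expected 1 row but matched ' + CAST(@Deleted AS varchar(10)) + '; rolled back.';
END
```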
After this, I rebooted the server and went to lunch. Why lunch? Well, for some reason SharePoint takes a little time to propagate the change (a timer job, I'm sure).
If only part of your site is being indexed, try adding crawl rules.
Example: add a rule for your site's address, set it to Include all items in this path, then select the default content access account or specify an account that has read access to the site.