
Commit b039718

peff authored and gitster committed
drop support for "experimental" loose objects
In git v1.4.3, we introduced a new loose object format that encoded some object information outside of the zlib stream. Ultimately the format was dropped in v1.5.3, but we kept the reading side around to help people migrate objects. Each time we open a loose object, we use a heuristic to check whether it is in the normal loose format or the experimental one.

This heuristic is robust in the face of valid data, but it tends to treat corrupted or garbage data as an experimental object. With the regular format, we would notice quickly that zlib's checksum does not check out and complain. With the experimental object, we are likely to extract a nonsensical object size and try to allocate a huge buffer, resulting in xmalloc calling "die".

This latter behavior is much worse, for two reasons. One, git reports an allocation error when the real error is corruption. And two, the program dies unconditionally, so you cannot even run fsck (which would otherwise ignore the broken object and keep going).

We could try to improve the heuristic to err on the side of normal objects in the face of corruption, but there is really little point. The experimental format is long dead, and was never enabled by default to begin with. We can instead simply remove it. The only affected repository would be one that explicitly set core.legacyheaders in 2007 and then never repacked in the intervening 6 years.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent becb433 commit b039718

19 files changed: +0 / -143 lines

sha1_file.c

Lines changed: 0 additions & 74 deletions
@@ -1372,51 +1372,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 	return map;
 }
 
-/*
- * There used to be a second loose object header format which
- * was meant to mimic the in-pack format, allowing for direct
- * copy of the object data. This format turned up not to be
- * really worth it and we no longer write loose objects in that
- * format.
- */
-static int experimental_loose_object(unsigned char *map)
-{
-	unsigned int word;
-
-	/*
-	 * We must determine if the buffer contains the standard
-	 * zlib-deflated stream or the experimental format based
-	 * on the in-pack object format. Compare the header byte
-	 * for each format:
-	 *
-	 * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
-	 * Experimental pack-based : Stttssss : ttt = 1,2,3,4
-	 *
-	 * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
-	 * in standard loose-object format, UNLESS it is a Git-pack
-	 * format object *exactly* 8 bytes in size when inflated.
-	 *
-	 * However, RFC1950 also specifies that the 1st 16-bit word
-	 * must be divisible by 31 - this checksum tells us our buffer
-	 * is in the standard format, giving a false positive only if
-	 * the 1st word of the Git-pack format object happens to be
-	 * divisible by 31, ie:
-	 *      ((byte0 * 256) + byte1) % 31 = 0
-	 *   =>        0ttt10000www1000 % 31 = 0
-	 *
-	 * As it happens, this case can only arise for www=3 & ttt=1
-	 * - ie, a Commit object, which would have to be 8 bytes in
-	 * size. As no Commit can be that small, we find that the
-	 * combination of these two criteria (bitmask & checksum)
-	 * can always correctly determine the buffer format.
-	 */
-	word = (map[0] << 8) + map[1];
-	if ((map[0] & 0x8F) == 0x08 && !(word % 31))
-		return 0;
-	else
-		return 1;
-}
-
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		unsigned long len, enum object_type *type, unsigned long *sizep)
 {
@@ -1444,42 +1399,13 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
 
 int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
 {
-	unsigned long size, used;
-	static const char valid_loose_object_type[8] = {
-		0, /* OBJ_EXT */
-		1, 1, 1, 1, /* "commit", "tree", "blob", "tag" */
-		0, /* "delta" and others are invalid in a loose object */
-	};
-	enum object_type type;
-
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
 	stream->next_in = map;
 	stream->avail_in = mapsize;
 	stream->next_out = buffer;
 	stream->avail_out = bufsiz;
 
-	if (experimental_loose_object(map)) {
-		/*
-		 * The old experimental format we no longer produce;
-		 * we can still read it.
-		 */
-		used = unpack_object_header_buffer(map, mapsize, &type, &size);
-		if (!used || !valid_loose_object_type[type])
-			return -1;
-		map += used;
-		mapsize -= used;
-
-		/* Set up the stream for the rest.. */
-		stream->next_in = map;
-		stream->avail_in = mapsize;
-		git_inflate_init(stream);
-
-		/* And generate the fake traditional header */
-		stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
-						 typename(type), size);
-		return 0;
-	}
 	git_inflate_init(stream);
 	return git_inflate(stream, 0);
 }

t/t1013-loose-object-format.sh

Lines changed: 0 additions & 66 deletions
This file was deleted.
Binary files not shown (7 deleted files).

t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb

Lines changed: 0 additions & 2 deletions
This file was deleted.
Binary files not shown (8 deleted files).

t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a

Lines changed: 0 additions & 1 deletion
This file was deleted.

0 commit comments

Comments
 (0)