20 out of 48 OSDs refuse to start after our Ceph cluster was hit by a power outage

By zshongyi

One of our Ceph clusters was hit by a power outage. Afterwards we managed to recover the monitor and some of the OSDs, but 20 of the 48 OSDs refuse to start. Their logs show errors like the following (this one is from osd.1):

2017-07-06 11:41:14.165597 7f9316b3a800 0 set uid:gid to 64045:64045 (ceph:ceph)

2017-07-06 11:41:14.165619 7f9316b3a800 0 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-osd, pid 25387

2017-07-06 11:41:14.166409 7f9316b3a800 0 pidfile_write: ignore empty --pid-file

2017-07-06 11:41:14.174987 7f9316b3a800 0 filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)

2017-07-06 11:41:14.175294 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option

2017-07-06 11:41:14.175299 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option

2017-07-06 11:41:14.175311 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: splice is supported

2017-07-06 11:41:14.176056 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)

2017-07-06 11:41:14.176081 7f9316b3a800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize is disabled by conf

2017-07-06 11:41:14.176550 7f9316b3a800 1 leveldb: Recovering log #44233

2017-07-06 11:41:14.177987 7f9316b3a800 1 leveldb: Delete type=0 #44233

2017-07-06 11:41:14.178010 7f9316b3a800 1 leveldb: Delete type=3 #44232

2017-07-06 11:41:14.178404 7f9316b3a800 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled

2017-07-06 11:41:14.179148 7f9316b3a800 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 18: 21474836480 bytes, block size 4096 bytes, directio = 1, aio = 1

2017-07-06 11:41:14.179781 7f9316b3a800 -1 journal Unable to read past sequence 97495989 but header indicates the journal has committed up through 97496025, journal is corrupt

2017-07-06 11:41:14.182137 7f9316b3a800 -1 os/filestore/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f9316b3a800 time 2017-07-06 11:41:14.179788

os/filestore/FileJournal.cc: 2031: FAILED assert(0)

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f9317538dab]

2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x8f1) [0x7f93172f0a21]

3: (JournalingObjectStore::journal_replay(unsigned long)+0x1ce) [0x7f93172453ae]

4: (FileStore::mount()+0x3bf6) [0x7f931721d686]

5: (OSD::init()+0x27d) [0x7f9316ef78bd]

6: (main()+0x29d1) [0x7f9316e60c81]

7: (__libc_start_main()+0xf5) [0x7f9313a5aec5]

8: (()+0x353957) [0x7f9316ea9957]

NOTE: a copy of the executable, or objdump -rdS is needed to interpret this.

--- begin dump of recent events ---

-59> 2017-07-06 11:41:14.163302 7f9316b3a800 5 asok(0x7f9321532280) register_command perfcounters_dump hook 0x7f9321512050

-58> 2017-07-06 11:41:14.163316 7f9316b3a800 5 asok(0x7f9321532280) register_command 1 hook 0x7f9321512050

-57> 2017-07-06 11:41:14.163319 7f9316b3a800 5 asok(0x7f9321532280) register_command perf dump hook 0x7f9321512050

-56> 2017-07-06 11:41:14.163321 7f9316b3a800 5 asok(0x7f9321532280) register_command perfcounters_schema hook 0x7f9321512050

-55> 2017-07-06 11:41:14.163322 7f9316b3a800 5 asok(0x7f9321532280) register_command 2 hook 0x7f9321512050

-54> 2017-07-06 11:41:14.163324 7f9316b3a800 5 asok(0x7f9321532280) register_command perf schema hook 0x7f9321512050

-53> 2017-07-06 11:41:14.163325 7f9316b3a800 5 asok(0x7f9321532280) register_command perf reset hook 0x7f9321512050

-52> 2017-07-06 11:41:14.163327 7f9316b3a800 5 asok(0x7f9321532280) register_command config show hook 0x7f9321512050

-51> 2017-07-06 11:41:14.163329 7f9316b3a800 5 asok(0x7f9321532280) register_command config set hook 0x7f9321512050

-50> 2017-07-06 11:41:14.163331 7f9316b3a800 5 asok(0x7f9321532280) register_command config get hook 0x7f9321512050

-49> 2017-07-06 11:41:14.163335 7f9316b3a800 5 asok(0x7f9321532280) register_command config diff hook 0x7f9321512050

-48> 2017-07-06 11:41:14.163340 7f9316b3a800 5 asok(0x7f9321532280) register_command log flush hook 0x7f9321512050

-47> 2017-07-06 11:41:14.163342 7f9316b3a800 5 asok(0x7f9321532280) register_command log dump hook 0x7f9321512050

-46> 2017-07-06 11:41:14.163344 7f9316b3a800 5 asok(0x7f9321532280) register_command log reopen hook 0x7f9321512050

-45> 2017-07-06 11:41:14.165597 7f9316b3a800 0 set uid:gid to 64045:64045 (ceph:ceph)

-44> 2017-07-06 11:41:14.165619 7f9316b3a800 0 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-osd, pid 25387

-43> 2017-07-06 11:41:14.165633 7f9316b3a800 5 object store type is filestore

-42> 2017-07-06 11:41:14.166360 7f9316b3a800 1 -- 10.20.10.11:0/0 learned my addr 10.20.10.11:0/0

-41> 2017-07-06 11:41:14.166367 7f9316b3a800 1 accepter.accepter.bind my_inst.addr is 10.20.10.11:6805/25387 need_addr=0

-40> 2017-07-06 11:41:14.166381 7f9316b3a800 1 -- 10.30.10.11:0/0 learned my addr 10.30.10.11:0/0

-39> 2017-07-06 11:41:14.166383 7f9316b3a800 1 accepter.accepter.bind my_inst.addr is 10.30.10.11:6806/25387 need_addr=0

-38> 2017-07-06 11:41:14.166393 7f9316b3a800 1 -- 10.30.10.11:0/0 learned my addr 10.30.10.11:0/0

-37> 2017-07-06 11:41:14.166395 7f9316b3a800 1 accepter.accepter.bind my_inst.addr is 10.30.10.11:6807/25387 need_addr=0

-36> 2017-07-06 11:41:14.166404 7f9316b3a800 1 -- 10.20.10.11:0/0 learned my addr 10.20.10.11:0/0

-35> 2017-07-06 11:41:14.166408 7f9316b3a800 1 accepter.accepter.bind my_inst.addr is 10.20.10.11:6806/25387 need_addr=0

-34> 2017-07-06 11:41:14.166409 7f9316b3a800 0 pidfile_write: ignore empty --pid-file

-33> 2017-07-06 11:41:14.167620 7f9316b3a800 5 asok(0x7f9321532280) init /var/run/ceph/ceph-osd.1.asok

-32> 2017-07-06 11:41:14.167627 7f9316b3a800 5 asok(0x7f9321532280) bind_and_listen /var/run/ceph/ceph-osd.1.asok

-31> 2017-07-06 11:41:14.167686 7f9316b3a800 5 asok(0x7f9321532280) register_command 0 hook 0x7f932150e0a0

-30> 2017-07-06 11:41:14.167691 7f9316b3a800 5 asok(0x7f9321532280) register_command version hook 0x7f932150e0a0

-29> 2017-07-06 11:41:14.167693 7f9316b3a800 5 asok(0x7f9321532280) register_command git_version hook 0x7f932150e0a0

-28> 2017-07-06 11:41:14.167696 7f9316b3a800 5 asok(0x7f9321532280) register_command help hook 0x7f9321512230

-27> 2017-07-06 11:41:14.167700 7f9316b3a800 5 asok(0x7f9321532280) register_command get_command_descriptions hook 0x7f9321512220

-26> 2017-07-06 11:41:14.167733 7f93107d6700 5 asok(0x7f9321532280) entry start

-25> 2017-07-06 11:41:14.167747 7f9316b3a800 10 monclient(hunting): build_initial_monmap

-24> 2017-07-06 11:41:14.174599 7f9316b3a800 5 adding auth protocol: cephx

-23> 2017-07-06 11:41:14.174604 7f9316b3a800 5 adding auth protocol: cephx

-22> 2017-07-06 11:41:14.174721 7f9316b3a800 5 asok(0x7f9321532280) register_command objecter_requests hook 0x7f9321512270

-21> 2017-07-06 11:41:14.174755 7f9316b3a800 1 -- 10.20.10.11:6805/25387 messenger.start

-20> 2017-07-06 11:41:14.174782 7f9316b3a800 1 -- :/0 messenger.start

-19> 2017-07-06 11:41:14.174805 7f9316b3a800 1 -- 10.20.10.11:6806/25387 messenger.start

-18> 2017-07-06 11:41:14.174824 7f9316b3a800 1 -- 10.30.10.11:6807/25387 messenger.start

-17> 2017-07-06 11:41:14.174849 7f9316b3a800 1 -- 10.30.10.11:6806/25387 messenger.start

-16> 2017-07-06 11:41:14.174871 7f9316b3a800 1 -- :/0 messenger.start

-15> 2017-07-06 11:41:14.174940 7f9316b3a800 2 osd.1 0 mounting /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal

-14> 2017-07-06 11:41:14.174987 7f9316b3a800 0 filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)

-13> 2017-07-06 11:41:14.175294 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option

-12> 2017-07-06 11:41:14.175299 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option

-11> 2017-07-06 11:41:14.175311 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: splice is supported

-10> 2017-07-06 11:41:14.176056 7f9316b3a800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)

-9> 2017-07-06 11:41:14.176081 7f9316b3a800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize is disabled by conf

-8> 2017-07-06 11:41:14.176550 7f9316b3a800 1 leveldb: Recovering log #44233

-7> 2017-07-06 11:41:14.177987 7f9316b3a800 1 leveldb: Delete type=0 #44233

-6> 2017-07-06 11:41:14.178010 7f9316b3a800 1 leveldb: Delete type=3 #44232

-5> 2017-07-06 11:41:14.178404 7f9316b3a800 0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled

-4> 2017-07-06 11:41:14.179098 7f9316b3a800 2 journal open /var/lib/ceph/osd/ceph-1/journal fsid 204fd686-6387-40da-b4e1-3f51a26b9e90 fs_op_seq 97495988

-3> 2017-07-06 11:41:14.179148 7f9316b3a800 1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 18: 21474836480 bytes, block size 4096 bytes, directio = 1, aio = 1

-2> 2017-07-06 11:41:14.179727 7f9316b3a800 2 journal read_entry 5461938176 : seq 97496026 18271 bytes

-1> 2017-07-06 11:41:14.179781 7f9316b3a800 -1 journal Unable to read past sequence 97495989 but header indicates the journal has committed up through 97496025, journal is corrupt

0> 2017-07-06 11:41:14.182137 7f9316b3a800 -1 os/filestore/FileJournal.cc: In function 'bool FileJournal::read_entry(ceph::bufferlist&, uint64_t&, bool*)' thread 7f9316b3a800 time 2017-07-06 11:41:14.179788

os/filestore/FileJournal.cc: 2031: FAILED assert(0)

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f9317538dab]

2: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x8f1) [0x7f93172f0a21]

3: (JournalingObjectStore::journal_replay(unsigned long)+0x1ce) [0x7f93172453ae]

4: (FileStore::mount()+0x3bf6) [0x7f931721d686]

5: (OSD::init()+0x27d) [0x7f9316ef78bd]

6: (main()+0x29d1) [0x7f9316e60c81]

7: (__libc_start_main()+0xf5) [0x7f9313a5aec5]

8: (()+0x353957) [0x7f9316ea9957]

NOTE: a copy of the executable, or objdump -rdS is needed to interpret this.

--- logging levels ---

0/ 5 none

0/ 1 lockdep

0/ 1 context

1/ 1 crush

1/ 5 mds

1/ 5 mds_balancer

1/ 5 mds_locker

1/ 5 mds_log

1/ 5 mds_log_expire

1/ 5 mds_migrator

0/ 1 buffer

0/ 1 timer

0/ 1 filer

0/ 1 striper

0/ 1 objecter

0/ 5 rados

0/ 5 rbd

0/ 5 rbd_mirror

0/ 5 rbd_replay

0/ 5 journaler

0/ 5 objectcacher

0/ 5 client

0/ 5 osd

0/ 5 optracker

0/ 5 objclass

1/ 3 filestore

1/ 3 journal

0/ 5 ms

1/ 5 mon

0/10 monc

1/ 5 paxos

0/ 5 tp

1/ 5 auth

1/ 5 crypto

1/ 1 finisher

1/ 5 heartbeatmap

1/ 5 perfcounter

1/ 5 rgw

1/10 civetweb

1/ 5 javaclient

1/ 5 asok

1/ 1 throttle

0/ 0 refs

1/ 5 xio

1/ 5 compressor

1/ 5 newstore

1/ 5 bluestore

1/ 5 bluefs

1/ 3 bdev

1/ 5 kstore

4/ 5 rocksdb

4/ 5 leveldb

1/ 5 kinetic

1/ 5 fuse

-2/-2 (syslog threshold)

-1/-1 (stderr threshold)

max_recent 10000

max_new 1000

log_file /var/log/ceph/ceph-osd.1.log

--- end dump of recent events ---

2017-07-06 11:41:14.185028 7f9316b3a800 -1 *** Caught signal (Aborted) **

in thread 7f9316b3a800 thread_name:ceph-osd

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

1: (()+0x8ebb02) [0x7f9317441b02]

2: (()+0x10340) [0x7f9315a0f340]

3: (gsignal()+0x39) [0x7f9313a6fcc9]

4: (abort()+0x148) [0x7f9313a730d8]

5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x7f9317538f85]

6: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x8f1) [0x7f93172f0a21]

7: (JournalingObjectStore::journal_replay(unsigned long)+0x1ce) [0x7f93172453ae]

8: (FileStore::mount()+0x3bf6) [0x7f931721d686]

9: (OSD::init()+0x27d) [0x7f9316ef78bd]

10: (main()+0x29d1) [0x7f9316e60c81]

11: (__libc_start_main()+0xf5) [0x7f9313a5aec5]

12: (()+0x353957) [0x7f9316ea9957]

NOTE: a copy of the executable, or objdump -rdS is needed to interpret this.

--- begin dump of recent events ---

 0> 2017-07-06 11:41:14.185028 7f9316b3a800 -1 *** Caught signal (Aborted) **

in thread 7f9316b3a800 thread_name:ceph-osd

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

1: (()+0x8ebb02) [0x7f9317441b02]

2: (()+0x10340) [0x7f9315a0f340]

3: (gsignal()+0x39) [0x7f9313a6fcc9]

4: (abort()+0x148) [0x7f9313a730d8]

5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x265) [0x7f9317538f85]

6: (FileJournal::read_entry(ceph::buffer::list&, unsigned long&, bool*)+0x8f1) [0x7f93172f0a21]

7: (JournalingObjectStore::journal_replay(unsigned long)+0x1ce) [0x7f93172453ae]

8: (FileStore::mount()+0x3bf6) [0x7f931721d686]

9: (OSD::init()+0x27d) [0x7f9316ef78bd]

10: (main()+0x29d1) [0x7f9316e60c81]

11: (__libc_start_main()+0xf5) [0x7f9313a5aec5]

12: (()+0x353957) [0x7f9316ea9957]

NOTE: a copy of the executable, or objdump -rdS is needed to interpret this.

--- logging levels ---

0/ 5 none

0/ 1 lockdep

0/ 1 context

1/ 1 crush

1/ 5 mds

1/ 5 mds_balancer

1/ 5 mds_locker

1/ 5 mds_log

1/ 5 mds_log_expire

1/ 5 mds_migrator

0/ 1 buffer

0/ 1 timer

0/ 1 filer

0/ 1 striper

0/ 1 objecter

0/ 5 rados

0/ 5 rbd

0/ 5 rbd_mirror

0/ 5 rbd_replay

0/ 5 journaler

0/ 5 objectcacher

0/ 5 client

0/ 5 osd

0/ 5 optracker

0/ 5 objclass

1/ 3 filestore

1/ 3 journal

0/ 5 ms

1/ 5 mon

0/10 monc

1/ 5 paxos

0/ 5 tp

1/ 5 auth

1/ 5 crypto

1/ 1 finisher

1/ 5 heartbeatmap

1/ 5 perfcounter

1/ 5 rgw

1/10 civetweb

1/ 5 javaclient

1/ 5 asok

1/ 1 throttle

0/ 0 refs

1/ 5 xio

1/ 5 compressor

1/ 5 newstore

1/ 5 bluestore

1/ 5 bluefs

1/ 3 bdev

1/ 5 kstore

4/ 5 rocksdb

4/ 5 leveldb

1/ 5 kinetic

1/ 5 fuse

-2/-2 (syslog threshold)

-1/-1 (stderr threshold)

max_recent 10000

max_new 1000

log_file /var/log/ceph/ceph-osd.1.log

--- end dump of recent events ---
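Since 20 OSDs are affected, it may help to pull the one relevant line out of each OSD log rather than reading every dump in full. The following is only a rough sketch in Python: the function name `find_journal_gaps` and the sample string are made up for illustration, and the regular expression simply assumes the other OSDs log the same "Unable to read past sequence ... committed up through ..." message as osd.1 above. It only reads log text and does not touch the journal or the OSD.

```python
import re

# Matches the FileJournal message from the osd.1 log above.
# Assumption: the other failing OSDs emit the same wording.
CORRUPT_RE = re.compile(
    r"journal Unable to read past sequence (\d+) "
    r"but header indicates the journal has committed up through (\d+)"
)


def find_journal_gaps(log_text):
    """Return unique (read_seq, committed_seq, difference) tuples found in log_text."""
    found = []
    for match in CORRUPT_RE.finditer(log_text):
        read_seq = int(match.group(1))
        committed_seq = int(match.group(2))
        entry = (read_seq, committed_seq, committed_seq - read_seq)
        # The same message appears again in the "recent events" dump, so skip repeats.
        if entry not in found:
            found.append(entry)
    return found


# Hypothetical usage with the line copied from the log above.
sample = (
    "2017-07-06 11:41:14.179781 7f9316b3a800 -1 journal Unable to read past "
    "sequence 97495989 but header indicates the journal has committed up "
    "through 97496025, journal is corrupt"
)
print(find_journal_gaps(sample))
```

For a real log, the text could be read from the path shown in the dump (for osd.1, /var/log/ceph/ceph-osd.1.log) and passed to the same function.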

Source: Stack Overflow

    
